Edward A. Hirsch
Sergei O. Kuznetsov
Jean-Éric Pin
Nikolay K. Vereshchagin (Eds.)
LNCS 8476
Computer Science -
Theory and Applications
9th International Computer Science Symposium
in Russia, CSR 2014
Moscow, Russia, June 7–11, 2014
Proceedings
Lecture Notes in Computer Science 8476
Commenced Publication in 1973
Founding and Former Series Editors:
Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board
David Hutchison
Lancaster University, UK
Takeo Kanade
Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler
University of Surrey, Guildford, UK
Jon M. Kleinberg
Cornell University, Ithaca, NY, USA
Alfred Kobsa
University of California, Irvine, CA, USA
Friedemann Mattern
ETH Zurich, Switzerland
John C. Mitchell
Stanford University, CA, USA
Moni Naor
Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz
University of Bern, Switzerland
C. Pandu Rangan
Indian Institute of Technology, Madras, India
Bernhard Steffen
TU Dortmund University, Germany
Demetri Terzopoulos
University of California, Los Angeles, CA, USA
Doug Tygar
University of California, Berkeley, CA, USA
Gerhard Weikum
Max Planck Institute for Informatics, Saarbruecken, Germany
Volume Editors
Edward A. Hirsch
Steklov Institute of Mathematics at St. Petersburg
Russian Academy of Sciences, St. Petersburg, Russia
E-mail: hirsch@pdmi.ras.ru
Sergei O. Kuznetsov
National Research University - Higher School of Economics, Moscow, Russia
E-mail: skuznetsov@hse.ru
Jean-Éric Pin
CNRS and University Paris Diderot, Paris, France
E-mail: jean-eric.pin@liafa.univ-paris-diderot.fr
Nikolay K. Vereshchagin
Moscow State University, Russia
E-mail: ver@mech.math.msu.su
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology
now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection
with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and
executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication
or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher’s location,
in its current version, and permission for use must always be obtained from Springer. Permissions for use
may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution
under the respective Copyright Law.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
While the advice and information in this book are believed to be true and accurate at the date of publication,
neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or
omissions that may be made. The publisher makes no warranty, express or implied, with respect to the
material contained herein.
Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)
Preface
The 9th International Computer Science Symposium in Russia (CSR 2014) was
held during June 7–11, 2014, in Moscow, hosted by the Moscow Center for Con-
tinuous Mathematical Education. It was the ninth event in the series of regular
international meetings following CSR 2006 in St. Petersburg, CSR 2007 in Eka-
terinburg, CSR 2008 in Moscow, CSR 2009 in Novosibirsk, CSR 2010 in Kazan,
CSR 2011 in St. Petersburg, CSR 2012 in Nizhny Novgorod, and CSR 2013 in
Ekaterinburg.
The opening lecture was given by Shafi Goldwasser and six other invited
plenary lectures were given by Mark Braverman, Volker Diekert, Martin Grohe,
Benjamin Rossman, Alexei Semenov, and Igor Walukiewicz.
This volume contains the accepted papers and abstracts of the invited talks.
The scope of the proposed topics for the symposium is quite broad and covers
a wide range of areas in theoretical computer science and its applications. We
received 76 papers in total, and out of these the Program Committee selected 27
papers for presentation at the symposium and for publication in the proceedings.
As usual, Yandex provided the Best Paper Awards. The recipients of these
awards were selected by the Program Committee, but due to potential conflicts
of interest, the procedure of selecting award winners did not involve the PC chair
and was chaired by Edward A. Hirsch. The winners are:
– Best Paper Award: Akinori Kawachi, Benjamin Rossman, and Osamu Watanabe, “The Query Complexity of Witness Finding”
– Konrad Schwerdtfeger, “The Connectivity of Boolean Satisfiability: Dichotomies for Formulas and Circuits”
The reviewing process was organized using the EasyChair conference system
created by Andrei Voronkov. We would like to acknowledge that this system
helped greatly to improve the efficiency of the committee work.
The following satellite events were co-located with CSR 2014:
– Workshop on Current Trends in Cryptology (CTCrypt)
– Extremal Graph Theory
– New Directions in Cryptography
– Program Semantics, Specification and Verification (PSSV 2014)
We are grateful to our sponsors:
– Russian Foundation for Basic Research
– Higher School of Economics (HSE)
– Yandex
– Talksum
– IPONWEB
– CMA Small Systems AB
– Dynasty Foundation
Program Committee
Eric Allender Rutgers, USA
Andris Ambainis University of Latvia, Latvia
Christel Baier TU Dresden, Germany
Petra Berenbrink Simon Fraser University, Canada
Mikolaj Bojanczyk University of Warsaw, Poland
Andrei A. Bulatov Simon Fraser University, Canada
Victor Dalmau Pompeu Fabra University, Spain
Manfred Droste University of Leipzig, Germany
Zoltan Esik University of Szeged, Hungary
Fedor Fomin University of Bergen, Norway
Edward A. Hirsch Steklov Institute of Mathematics at
St. Petersburg, Russia
Gregory Kucherov CNRS and University of Marne-la-Vallée,
France
Michal Kunc Masaryk University, Czech Republic
Leonid Libkin University of Edinburgh, UK
Konstantin Makarychev Microsoft Research, Redmond, USA
Kurt Mehlhorn Max Planck Institute, Germany
Georg Moser University of Innsbruck, Austria
Alexander Okhotin University of Turku, Finland
Giovanni Pighizzini University of Milan, Italy
Jean-Éric Pin LIAFA, CNRS and University of Paris-Diderot,
France
Symposium Co-chairs
Nikolai K. Vereshchagin MCCME and Moscow State University, Russia
Edward A. Hirsch Steklov Institute of Mathematics at
St. Petersburg, Russia
Sergei O. Kuznetsov School of Applied Mathematics and
Information Science, Higher School of
Economics, Russia
Organizing Committee
Alexander Kulikov Steklov Institute of Mathematics at
St. Petersburg, Russia
Daniil Musatov Moscow Institute of Physics and Technology,
Russia
Vladimir Podolskii Steklov Institute of Mathematics, Moscow,
Russia
Alexander Smal Steklov Institute of Mathematics at
St. Petersburg, Russia
Tatiana Starikovskaya School of Applied Mathematics and
Information Science, Higher School
of Economics, Russia
Steering Committee
Anna Frid Sobolev Institute of Mathematics, Russia
Edward A. Hirsch Steklov Institute of Mathematics at
St. Petersburg, Russian Academy
of Sciences, Russia
Juhani Karhumäki University of Turku, Finland
Ernst W. Mayr Technical University of Munich, Germany
Alexander Razborov University of Chicago, USA, and Steklov
Mathematical Institute, Russia
Mikhail Volkov Ural Federal University, Russia
External Reviewers
Mark Braverman
Finding All Solutions of Equations
in Free Groups and Monoids with Involution
Volker Diekert, Artur Jeż, and Wojciech Plandowski
Introduction
A word equation is a simple object. It consists of a pair (U, V ) of words over
constants and variables and a solution is a substitution of the variables by words
in constants such that U and V are identical words. The study of word equa-
tions has a long tradition. Let WordEquation be the problem to decide whether
a given word equation has a solution. It is fairly easy to see that WordEquation
reduces to Hilbert’s 10th Problem (in Hilbert’s famous list presented in 1900 for
his address at the International Congress of Mathematicians). Hence, in the
mid-1960s the Russian school of mathematics outlined a roadmap to prove the
undecidability of Hilbert's 10th Problem via the undecidability of WordEquation. The program failed
in the sense that Matiyasevich proved Hilbert’s 10th Problem to be undecidable
in 1970, but by a completely different method, which employed number theory.
The missing piece in the proof of the undecidability of Hilbert’s 10th Problem
* Supported by Humboldt Research Fellowship for Postdoctoral Researchers.
¹ A full version of the present paper with detailed proofs can be found on arXiv.
E.A. Hirsch et al. (Eds.): CSR 2014, LNCS 8476, pp. 1–15, 2014.
c Springer International Publishing Switzerland 2014
2 V. Diekert, A. Jeż, and W. Plandowski
was based on methods due to Robinson, Davis, and Putnam [20]. On the other
hand, in 1977 Makanin showed in a seminal paper [17] that WordEquation is
decidable! The program went a different way, but its outcome were two major
achievements in mathematics. Makanin’s algorithm became famous since it set-
tled a long standing problem and also because his algorithm had an extremely
complex termination proof. In fact, his paper showed that the existential the-
ory of equations in free monoids is decidable. This is close to the borderline
of decidability as already the ∀∃3 positive theory of free monoids is undecid-
able [7]. Furthermore Makanin extended his results to free groups and showed
that the existential and positive theories in free groups are decidable [18, 19].
Later Razborov was able in [26] (partly shown also in [27]) to describe the set
of all solutions for systems of equations in free groups (see also [14] for a de-
scription of Razborov’s work). This line of decidability results culminated in the
proof of Tarski’s conjectures by Kharlampovich and Myasnikov in a series of
papers ending in [15]. In particular, they showed that the theory of free groups
is decidable. In order to prove this fundamental result the description of all so-
lutions of an equation in a free group is crucial. Another branch of research was
to extend Makanin’s result to more general algebraic structures including free
partially commutative monoids [21, 5], free partially commutative monoids with
involution, graph groups (also known as right-angled Artin groups) [6], graph
products [4], and hyperbolic groups [28, 2]. In all these cases the existential the-
ory of equations is decidable. The proofs used the notion of an equation with rational
constraints, first developed in the habilitation of Schulz, see [29]; this concept
is also used throughout the present paper.
In parallel to these developments there were drastic improvements in the com-
plexity of deciding word equations. It is fairly easy to see that the problem is
NP-hard. Thus, NP is a lower bound. First estimations for the time complexity
on Makanin’s algorithm for free monoids led to a tower of several exponentials,
but it was lowered over time to EXPSPACE in [9]. On the other hand, it
was shown in [16] that Makanin’s scheme for solving equations in free groups is
not primitive recursive. (Already in the mid-1990s this statement was somewhat
puzzling and counter-intuitive, as it suggested a strange crossing of complexi-
ties: The existential theory in free monoids seemed to be easier than the one
in free groups, whereas it was already known at that time that the positive
theory in free monoids is undecidable, but decidable in free groups.) The next
important step was taken by Plandowski and Rytter, whose approach [25] was
the first essentially different from Makanin's original solution. They showed that
the Lempel-Ziv encoding of a minimal solution of a word equation leads to an
exponential compression (if the solution itself is at least exponential in the length
of the equation). Moreover, the compression turned out to be extremely simple.
As a consequence they formulated the still valid conjecture that WordEquation
is NP-complete. Following the idea to keep the equation and possible solution
in compressed form and employing a novel type of factorisations Plandowski
showed that WordEquation is in PSPACE, i.e., it can be solved in polynomial
space and exponential time [22]. His method was quite different from Makanin’s
Solutions of Equations 3
1 Preliminaries
Word Equations. Let A and Ω be two finite disjoint sets, called the alphabet
of constants and the alphabet of variables (or unknowns), respectively. For the
purpose of this paper, A and Ω are endowed with an involution, which is
a mapping $x \mapsto \overline{x}$ such that $\overline{\overline{x}} = x$ for all elements. In particular, an involution
is a bijection. Since the identity mapping is an involution, there is no harm
in thinking that all sets come with an involution. If M is a monoid, then we
additionally require $\overline{xy} = \overline{y}\,\overline{x}$ for all $x, y \in M$. This applies in particular to a free
monoid $\Sigma^*$ over a set with involution: for a word $w = a_1 \cdots a_m$ we thus have
$\overline{w} = \overline{a_m} \cdots \overline{a_1}$. If $\overline{a} = a$ for all $a \in \Sigma$, then $\overline{w}$ simply means reading the word
from right to left.
A word equation is a pair (U, V) of words over $A \cup \Omega$, usually denoted by
U = V. A solution σ of a word equation U = V is a substitution σ of the unknowns
in Ω by words over the constants such that replacing the unknowns by the
substituted words in U and in V gives the same word. Moreover, as we work with
involutions, we additionally demand that the solution satisfies $\sigma(\overline{X}) = \overline{\sigma(X)}$ for
all $X \in \Omega$.
Example 1. Let $\Omega = \{X, Y, \overline{X}, \overline{Y}\}$ and $A = \{a, b\}$ with $\overline{b} = a$. Then XabY =
YbaX behaves as a word equation without involution. One of its solutions is
the substitution σ(X) = bab, σ(Y) = babab. Under this substitution we have
σ(X)abσ(Y) = bababbabab = σ(Y)baσ(X). It can be proved that the solution
set of the equation XabY = YbaX is closely related to Sturmian words [12].
Input Size. The input size for the reduction is given by the sum of the lengths
of the equations and inequalities, plus the size of Γ, plus the sum of the numbers
of states of the NFAs in the lists for the constraints. This measure is accurate
enough with respect to polynomial time and/or space. For example, note that if
an NFA has n states, then its number of transitions is bounded by $2n|\Gamma|$. Note
also that |Γ| can be much larger than the sum of the lengths of the equations
and inequalities plus the sum of the numbers of states of the NFAs in the lists
for the constraints. Recall that we encode $X \neq 1$ by a rational constraint, which
introduces an NFA with $2|\Gamma| + 1$ states. Since |Γ| is part of the input, this does
not cause any problem.
$\sigma(U) = \sigma(V)$,
$\sigma(\overline{X}) = \overline{\sigma(X)}$ for all $X \in \Omega$,
$\rho\sigma(X) = \rho(X)$ for all $X \in \Omega$.
In the following, we denote the size of the instance by n, i.e., by the same letter
as the size of the matrix. This is not a problem, as we can always increase the
size of the matrix.
Equations during the Algorithm. During the procedure we will create var-
ious other equations and introduce new constants. Still, the original alphabet A
never changes, and the new constants represent words in $A^*$. As a consequence,
we will work with equations over $B \cup \Omega$, where B is the smallest alphabet con-
taining A and all constants occurring in $U$, $V$, $\overline{U}$, $\overline{V}$. In this setting a solution σ
assigns words over B to variables. To track the meaning of the constants from
$B \setminus A$, we additionally require that a solution comes with a morphism $h : B^* \to A^*$
which leaves A fixed. Then, given an equation U = V, the word $h(\sigma(U))$
corresponds to a solution of the original equation. Note that $|B| \leq |A| + 2|UV|$,
and therefore we can ignore |B| for the complexity.
The weight of a solution (σ, h) of an equation (U, V) is
$$w(\sigma, h) = |U| + |V| + 2n \sum_{X \in \Omega} |h(\sigma(X))|.$$
Note that we implicitly assume here that if X does not occur in the equation,
then $\sigma(X) = \varepsilon$. Each subsequent equation in the sequence will have a smaller
weight, which ensures that we do not cycle.
Two solutions $(\sigma_1, h)$ and $(\sigma_2, h)$ of (U, V) that satisfy $h(\sigma_1(X)) = h(\sigma_2(X))$
for each variable X represent the same solution of the original equation and so
in some sense are equivalent. We formalise this notion in the following way: for
an equation U = V, the solution $(\sigma_1, h)$ is a simpler equivalent of the solution
$(\sigma_2, h)$, written $(\sigma_1, h) \preceq (\sigma_2, h)$, if for each variable X the word $\sigma_1(X)$ is obtained
from $\sigma_2(X)$ by replacing some letters $b \in B$ by $h(b)$. It follows directly from the
definition that if $(\sigma_1, h) \preceq (\sigma_2, h)$, then $\sigma_1$ and $\sigma_2$ have the same weight.
Note that h is a technical tool used in the analysis; it is not stored or
transformed by the algorithm, nor is it used in the graph representation of all
solutions.
3 Compression Step
In this section we describe procedures that show the claim of Lemma 1. In
essence, given a word equation (U, V ) with a solution σ we want to compress the
word σ(U ) directly on the equation, i.e., without the knowledge of the actual
solution. These compression steps replace the ab-blocks, as defined later in this
section. To do this, we sometimes need to modify the equation (U, V ).
The crucial observation is that a properly chosen sequence of such compression
steps guarantees that the obtained equation is strictly proper (assuming (U, V) was),
see Lemma 7.
– for every equation $U' = V'$ that can be obtained from U = V, any of its
solutions $(\sigma', h')$, and every operator $\varphi \in \Phi$, there is an h such that $(\varphi[\sigma'], h)$
is a solution of U = V and $h(\varphi[\sigma'](U)) = h'(\sigma'(U'))$.
Note that both $U' = V'$ and Φ depend on the nondeterministic choices, so it
might be that for different choices we can transform U = V to $U' = V'$ (with
$\Phi'$) and to $U'' = V''$ (with a family $\Phi''$).
We also say that the equation U = V with its solution (σ, h) is transformed
into $U' = V'$ with $(\sigma', h')$, and that Φ is the corresponding family of inverse
operators. In many cases Φ consists of a single operator ϕ; in such a case we call
it the corresponding inverse operator. Furthermore, in some cases ϕ depends
neither on $U' = V'$ nor on the nondeterministic choices.
ab-blocks. In an earlier paper using the recompression technique [13] there were
two types of compression steps: compression of pairs ab, where $a \neq b$ are two
different constants, and compression of maximal blocks $a^\ell$ (i.e., ones that cannot
be extended to the right nor to the left). In both cases such factors were replaced
with a single fresh constant, say c. The advantage of these compression steps was
that the replaced factors were non-overlapping, in the sense that once we fix a pair
or block to be compressed, each constant in a word w belongs to at most one
replaced factor.
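For intuition, the two classical compression steps of [13] can be sketched as follows; this is a minimal illustration of ours (the function names are our own), not the paper's implementation:

```python
def compress_pair(w, a, b, c):
    # pair compression: replace every occurrence of the pair ab
    # (with a != b) by the fresh letter c; since a != b, occurrences
    # of ab cannot overlap, so a single left-to-right pass suffices
    assert a != b
    return w.replace(a + b, c)

def compress_block(w, a, fresh):
    # block compression: replace every maximal block a^i (i >= 2),
    # i.e. one that cannot be extended left or right, by fresh(i)
    out, i = [], 0
    while i < len(w):
        if w[i] != a:
            out.append(w[i])
            i += 1
            continue
        j = i
        while j < len(w) and w[j] == a:
            j += 1
        out.append(fresh(j - i) if j - i >= 2 else a)
        i = j
    return "".join(out)
```

For example, compress_pair("abab", "a", "b", "c") yields "cc", and compress_block("aaabaa", "a", lambda i: f"[a^{i}]") yields "[a^3]b[a^2]". The replaced factors never overlap, which is precisely the property that the involutive setting below breaks.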
We would like to use similar compression rules also in the case of monoids
with involution; however, one needs to take into account that when some
string w is replaced with a constant c, then also $\overline{w}$ should be replaced with $\overline{c}$.
The situation gets complicated when some of the letters in w are fixed points of the
involution, i.e., $\overline{a} = a$. In the worst case, when $\overline{a} = a$ and $\overline{b} = b$, the occurrences
of ab and $\overline{ab} = ba$ are overlapping, so the previous approach no longer directly
applies. (If we start with an equation over a free group, then the involution
has no fixed points in A, but fixed points are produced during the algorithm.
They cannot be avoided in our approach, see below.)
Still, the problem can be resolved by replacing factors from a more general
class (for a fixed pair of constants ab).
Definition 2. Depending on a and b, the ab-blocks are:
1. If a = b, then there are two types of ab-blocks: $a^i$ for $i \geq 2$ and $\overline{a}^i$ for $i \geq 2$.
2. If $a \neq b$, $\overline{a} \neq a$ and $\overline{b} \neq b$, then ab and $\overline{ab} = \overline{b}\,\overline{a}$ are the two types of ab-blocks.
3. If $a \neq b$, $\overline{a} = a$ and $\overline{b} \neq b$, then ab, $\overline{ab} = \overline{b}a$ and $\overline{b}ab$ are the three types of
ab-blocks.
4. If $a \neq b$, $\overline{a} \neq a$ and $\overline{b} = b$, then ab, $\overline{ab} = b\overline{a}$ and $ab\overline{a}$ are the three types of
ab-blocks.
5. If $a \neq b$, $\overline{a} = a$ and $\overline{b} = b$, then $(ba)^i$ for $i \geq 1$, $a(ba)^i$ for $i \geq 1$, $(ba)^i b$ for
$i \geq 1$ and $(ab)^i$ for $i \geq 1$ are the four types of ab-blocks.
An occurrence of an ab-block in a word is an ab-factor; it is maximal if it is not
contained in any other ab-factor.
For a fixed ab-block s, the s-reduction of a word w is the word obtained from w
by replacing all maximal factors s (respectively $\overline{s}$) by a new constant $c_s$
(respectively $\overline{c_s}$). The inverse operation is s-expansion.
either wholly from U (V , respectively) or from σ(X) or σ(X). The former are
replaced by our procedure and the latter are replaced implicitly, by changing the
solution. Thus it can be shown that the solutions of the new and old equation
are in one-to-one correspondence.
Reduction for Crossing ab. Since we already know how to compress a non-
crossing ab, a natural way to deal with a crossing ab is to “uncross” it and
then compress it using CompNCr. To this end we pop from the variables those
parts of maximal ab-blocks which cause the block to be crossing. Afterwards
all maximal ab-blocks are noncrossing, and so they can be compressed using
CompNCr(U, V, ab).
As an example, consider the equation aaXaXaX = aXaY aY aY (for sim-
plicity, without constraints). It is easy to see that all solutions are of the form
$\sigma(X) = a^{3k}$ and $\sigma(Y) = a^{2k}$, for an arbitrary $k \geq 0$. After the popping this equation is
turned into $a^{3\ell_X + 4} = a^{\ell_X + 3\ell_Y + 4}$, for which aa is noncrossing. Thus a solution of
the original equation corresponds to a solution of the Diophantine equation
$3\ell_X + 4 = \ell_X + 3\ell_Y + 4$. This points out another idea of the popping: when
we pop the whole part of a block that is crossing, we do not immediately guess
its length; instead, we treat the length as a parameter, identify ab-blocks of the
same length, and only afterwards verify whether our guesses were correct. The
verification is formalised as a linear system of Diophantine equations. Each of its
solutions corresponds to one choice of “real” lengths of ab-blocks popped from variables.
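The Diophantine condition from the example can be checked by brute force; the following throwaway check is ours, not part of the paper:

```python
# Enumerate small solutions of 3*lx + 4 == lx + 3*ly + 4,
# which simplifies to 2*lx == 3*ly.
sols = [(lx, ly)
        for lx in range(30) for ly in range(30)
        if 3 * lx + 4 == lx + 3 * ly + 4]

# Every solution has the shape lx = 3k, ly = 2k, matching the
# solution family X = a^{3k}, Y = a^{2k} of the original equation.
assert all(lx % 3 == 0 and ly % 2 == 0 and lx // 3 == ly // 2
           for lx, ly in sols)
```

The enumeration finds exactly the pairs (0, 0), (3, 2), (6, 4), and so on, i.e., the lengths realised by the solutions of the word equation.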
Note that we still need to calculate the transition of the popped ab-block,
which depends on its actual length (i.e., on the particular values $\ell_X$ or $r_X$). How-
ever, this ab-block is long because of a repeated ab (or ba). Now, when we look at
$\rho(ab), \rho(ab)^2, \ldots$, then starting from some (at most exponential) value the sequence
becomes periodic, and the period is also at most exponential. More precisely, if $P \in M_{2n}$ is
a matrix, then we can compute in PSPACE an idempotent power $P^p = P^{2p}$
with $p \leq 2^{n^2}$. Thus, if $\ell$ is a parameter, we can guess (and fix) the remainder
$r \equiv \ell \bmod p$ with $0 \leq r < p$. We can guess whether $r < \ell$, and in this case we substitute
the parameter $\ell$ by $c \cdot p + r$ and view c as the new integer parameter with
the constraint $c \geq 1$. This can be written as a Diophantine equation and added
to the constructed linear Diophantine system, which has polynomial size if coef-
ficients are written in binary. We can check solvability (and compute a minimal
solution) in NP, see e.g. [11]. (For a more accurate estimation of the constants
see [3].)
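The eventual periodicity of the powers $\rho(ab), \rho(ab)^2, \ldots$ can be illustrated over the Boolean semiring; the sketch below is our own illustration (it finds the preperiod and period by brute force and does not reproduce the PSPACE bounds of the text):

```python
def bool_matmul(P, Q):
    # matrix product over the Boolean semiring (or as +, and as *)
    n = len(P)
    return [[any(P[i][k] and Q[k][j] for k in range(n))
             for j in range(n)]
            for i in range(n)]

def preperiod_and_period(P):
    # The sequence P, P^2, P^3, ... ranges over a finite monoid, so it
    # must eventually repeat; return (t, p) with P^(t+p) = P^t, found
    # by remembering which powers have been seen.
    seen, power, i = {}, P, 1
    while True:
        key = tuple(map(tuple, power))
        if key in seen:
            return seen[key], i - seen[key]
        seen[key] = i
        power = bool_matmul(power, P)
        i += 1
```

For the 2x2 swap matrix [[0, 1], [1, 0]] this returns (1, 2): the powers alternate between the swap and the identity, so the remainder of the exponent modulo the period determines the transition, exactly the idea used for long ab-blocks above.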
Let D be the system created by CompCr. In this case there is no single op-
erator but rather a family $\Phi_D$, whose elements $\varphi_{D,\{\ell_X, r_X\}_{X\in\Omega}}$ are defined using
a solution $\{\ell_X, r_X\}_{X\in\Omega}$ of D. Given an arithmetic expression $e_i$ with integer pa-
rameters $\{x_X, y_X\}_{X\in\Omega}$, by $e_i[\{\ell_X, r_X\}_{X\in\Omega}]$ we denote its evaluation at the values
$x_X = \ell_X$ and $y_X = r_X$. In order to obtain $\varphi_{D,\{\ell_X, r_X\}_{X\in\Omega}}[\sigma](X)$, we first replace
each letter $c_{e_i}$ with the appropriate type of ab-block of length $e_i[\{\ell_X, r_X\}_{X\in\Omega}]$. Af-
terwards, we prepend an ab-block of length $\ell_X$ and append an ab-block of length $r_X$
(in both cases taking the types into account). Concerning $h'$, we
extend h to the new letters by setting $h'(c_{e_i})$ to the h-image of the ab-block of length
$e_i[\{\ell_X, r_X\}_{X\in\Omega}]$ substituted for $c_{e_i}$.
Lemma 6. Let (U, V) have a solution (σ, h), and let S be the set of all ab-blocks in U, V
or crossing in σ. CompCr(U, V, ab) transforms (U, V) with (σ, h) into an equation
$(U', V')$ with $(\sigma', h')$. If at least one ab-factor was replaced, then $w(\sigma', h') < w(\sigma, h)$.
Furthermore, the family $\Phi_D$ is the corresponding family of inverse operators.
Lemma 7. Suppose that (U, V) is a strictly proper equation. Then during Trans-
formEq, (U, V) remains a proper equation, and after TransformEq it is again strictly proper.
Proof. Consider how many letters are popped into the equation during Trans-
formEq. For a fixed ab, CompCr may introduce long ab-blocks at the sides of each
variable, but they are immediately replaced with one letter, so we can count
them as one letter (and in the meantime each such popped prefix and suffix is
represented by at most four constants). Thus, 2n letters are popped in this way.
There are at most 4n crossing pairs, see Lemma 4, so in total $8n^2$ letters are
introduced into the equation.
Consider a constant initially present in the equation, say a, which is not
followed by a variable and is not the last letter of U or V. When the equation has
size m, there are at least $m - 2n - 2$ such letters. This a is followed by
a letter, say b, so ab is in P, and we tried to compress the maximal ab-factor
containing this occurrence of ab. The only reason why we could have failed is that one
of a, b was already compressed as part of a different factor. Thus, if such an a (not
initially followed by a variable, nor the last letter of U or V) was not compressed
during TransformEq, then the two following constants were. The left one of those
constants was initially present in the equation, so at least $(m - 2n - 2)/3$ initial
constants were removed from the equation. Hence the result.
Algorithm 3. TransformEq(U, V)
1: P ← list of explicit ab's in U, V
2: P′ ← crossing ab's  ▷ guessed via the first and last letters of each σ(X); |P′| ≤ 4n
3: P ← P \ P′
4: for ab ∈ P do
5:     CompNCr(U, V, ab)
6: for ab ∈ P′ do
7:     CompCr(U, V, ab)
8: return (U, V)
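Line 1 of the listing, collecting the explicit pairs, might look as follows; this is our own paraphrase (the nondeterministic guessing of crossing pairs in line 2 is not reproduced):

```python
def explicit_pairs(U, V, constants):
    # Collect all pairs ab of two consecutive explicit constants
    # occurring in U or V; a variable in between interrupts a pair.
    pairs = set()
    for word in (U, V):
        for x, y in zip(word, word[1:]):
            if x in constants and y in constants:
                pairs.add((x, y))
    return pairs
```

For instance, on U = [a, b, X, a] and V = [X, b, a] over the constants {a, b}, the explicit pairs are (a, b) and (b, a); the pairs straddling the variable X are exactly the crossing ones that line 2 has to guess.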
References
[1] Benois, M.: Parties rationnelles du groupe libre. C. R. Acad. Sci. Paris, Sér. A 269,
1188–1190 (1969)
[2] Dahmani, F., Guirardel, V.: Foliations for solving equations in groups: free, vir-
tually free and hyperbolic groups. J. of Topology 3, 343–404 (2010)
[3] Diekert, V., Gutiérrez, C., Hagenah, C.: The existential theory of equations with
rational constraints in free groups is PSPACE-complete. Inf. and Comput. 202,
105–140 (2005); Conference version in Ferreira, A., Reichel, H. (eds.) STACS 2001.
LNCS, vol. 2010, pp. 170–182. Springer, Heidelberg (2001)
[4] Diekert, V., Lohrey, M.: Word equations over graph products. IJAC 18(3), 493–533
(2008)
[5] Diekert, V., Matiyasevich, Y., Muscholl, A.: Solving word equations modulo partial
commutations. TCS 224, 215–235 (1999); Special issue of LFCS 1997
[6] Diekert, V., Muscholl, A.: Solvability of equations in free partially commuta-
tive groups is decidable. International Journal of Algebra and Computation 16,
1047–1070 (2006); Journal version of Orejas, F., Spirakis, P.G., van Leeuwen, J.
(eds.) ICALP 2001. LNCS, vol. 2076, pp. 543–554. Springer, Heidelberg (2001)
[7] Durnev, V.G.: Undecidability of the positive ∀∃3 -theory of a free semi-group.
Sibirsky Matematicheskie Jurnal 36(5), 1067–1080 (1995) (in Russian); English
translation: Sib. Math. J. 36(5), 917–929 (1995)
[8] Eilenberg, S.: Automata, Languages, and Machines, vol. A. Academic Press, New
York (1974)
[9] Gutiérrez, C.: Satisfiability of word equations with constants is in exponential
space. In: Proc. 39th FOCS 1998, pp. 112–119. IEEE Computer Society Press,
Los Alamitos (1998)
[10] Gutiérrez, C.: Satisfiability of equations in free groups is in PSPACE. In: Proc.
32nd STOC 2000, pp. 21–27. ACM Press (2000)
[11] Hopcroft, J.E., Ullman, J.D.: Introduction to Automata Theory, Languages and
Computation. Addison-Wesley (1979)
[12] Ilie, L., Plandowski, W.: Two-variable word equations. Theoretical Informatics
and Applications 34, 467–501 (2000)
[13] Jeż, A.: Recompression: a simple and powerful technique for word equations.
In: Portier, N., Wilke, T. (eds.) STACS. LIPIcs, vol. 20, pp. 233–244. Schloss
Dagstuhl-Leibniz-Zentrum fuer Informatik, Dagstuhl (2013)
[14] Kharlampovich, O., Myasnikov, A.: Irreducible affine varieties over a free group.
II: Systems in triangular quasi-quadratic form and description of residually free
groups. J. of Algebra 200(2), 517–570 (1998)
[15] Kharlampovich, O., Myasnikov, A.: Elementary theory of free non-abelian groups.
J. of Algebra 302, 451–552 (2006)
[16] Kościelski, A., Pacholski, L.: Complexity of Makanin’s algorithm. J. Association
for Computing Machinery 43(4), 670–684 (1996)
[17] Makanin, G.S.: The problem of solvability of equations in a free semigroup. Math.
Sbornik 103, 147–236 (1977) English transl. in Math. USSR Sbornik 32 (1977)
[18] Makanin, G.S.: Equations in a free group. Izv. Akad. Nauk SSR, Ser. Math. 46,
1199–1273 (1983) English transl. in Math. USSR Izv. 21 (1983)
[19] Makanin, G.S.: Decidability of the universal and positive theories of a free group.
Izv. Akad. Nauk SSSR, Ser. Mat. 48, 735–749 (1984) (in Russian); English trans-
lation in: Math. USSR Izvestija 25, 75–88 (1985)
[20] Matiyasevich, Y.: Hilbert’s Tenth Problem. MIT Press, Cambridge (1993)
[21] Matiyasevich, Y.: Some decision problems for traces. In: Adian, S., Nerode, A.
(eds.) LFCS 1997. LNCS, vol. 1234, pp. 248–257. Springer, Heidelberg (1997)
[22] Plandowski, W.: Satisfiability of word equations with constants is in PSPACE. J.
Association for Computing Machinery 51, 483–496 (2004)
[23] Plandowski, W.: An efficient algorithm for solving word equations. In: Proc. 38th
STOC 2006, pp. 467–476. ACM Press (2006)
[24] Plandowski, W.: (unpublished, 2014)
[25] Plandowski, W., Rytter, W.: Application of Lempel-Ziv encodings to the solution
of word equations. In: Larsen, K.G., Skyum, S., Winskel, G. (eds.) ICALP 1998.
LNCS, vol. 1443, pp. 731–742. Springer, Heidelberg (1998)
[26] Razborov, A.A.: On Systems of Equations in Free Groups. PhD thesis, Steklov
Institute of Mathematics (1987) (in Russian)
[27] Razborov, A.A.: On systems of equations in free groups. In: Combinatorial and
Geometric Group Theory, pp. 269–283. Cambridge University Press (1994)
[28] Rips, E., Sela, Z.: Canonical representatives and equations in hyperbolic groups.
Inventiones Mathematicae 120, 489–512 (1995)
[29] Schulz, K.U.: Makanin’s algorithm for word equations — Two improvements and a
generalization. In: Schulz, K.U. (ed.) IWWERT 1990. LNCS, vol. 572, pp. 85–150.
Springer, Heidelberg (1992)
Algorithmic Meta Theorems
for Sparse Graph Classes
Martin Grohe
Introduction
It is often the case that a wide range of algorithmic problems can be solved by
essentially the same technique. Think of dynamic programming algorithms on
graphs of bounded tree width or planar graph algorithms based on layerwise
(or outerplanar) decompositions. In such situations, it is natural to try to find
general conditions under which an algorithmic problem can be solved by these
techniques—this leads to algorithmic meta theorems. However, it is not always
easy to describe such conditions in a way that is both mathematically precise
and sufficiently general to be widely applicable. Logic gives us convenient ways
of doing this. An early example of an algorithmic meta theorem based on logic
is Papadimitriou and Yannakakis’s [40] result that all optimisation problems in
the class MAXSNP, which is defined in terms of a fragment of existential second-
order logic, admit constant-ratio polynomial time approximation algorithms.
Besides logic, most algorithmic meta theorems have structural graph theory
as a second important ingredient in that they refer to algorithmic problems re-
stricted to specific graph classes. The archetypal example of such a meta theorem
is Courcelle’s Theorem [3], stating that all properties of graphs of bounded tree
width that are definable in monadic second-order logic are decidable in linear
time.
The main motivation for algorithmic meta theorems is to understand the core
and the scope of certain algorithmic techniques by abstracting from problem-
specific details. Sometimes meta theorems are also crucial for obtaining new
algorithmic results. Two recent examples are
E.A. Hirsch et al. (Eds.): CSR 2014, LNCS 8476, pp. 16–22, 2014.
c Springer International Publishing Switzerland 2014
When we say that problems like DOMINATING SET are definable in FO, we mean that for each k there is an FO-sentence φk stating
that a graph has a dominating set of size at most k. Thus we define (the decision
version of) the DOMINATING SET problem by a family of FO-sentences, and if
we prove that FO-definable properties can be decided in linear time on a certain
class of graphs, this implies that DOMINATING SET parameterized by the size of
the solution is fixed-parameter tractable on this class of graphs. By comparison, we
can define 3-COLOURABILITY by a single MSO-sentence, and if we prove that
MSO-definable properties can be decided in linear time on a certain class of graphs,
this implies that 3-COLOURABILITY can be decided in linear time on this class
of graphs.
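For concreteness, one standard way to write out the sentence φk expressing the existence of a dominating set of size at most k is the following (our formulation; the variable names are ours):

```latex
\varphi_k \;=\; \exists x_1 \cdots \exists x_k \,\forall y\,
  \bigvee_{i=1}^{k} \bigl( y = x_i \,\lor\, E(y, x_i) \bigr),
```

where E is the edge relation. The length of φk grows with k, which is why DOMINATING SET is treated as a parameterized family of sentences rather than a single sentence.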
After this digression, let us turn to the results. The first notable meta theorem
for deciding FO-definable properties, due to Seese [41], says that FO-definable
properties of bounded-degree graphs can be decided in linear time. Frick and
Grohe [21] gave linear-time algorithms for deciding FO-definable properties of
planar graphs and all apex-minor-free graph classes, and O(n^{1+ε}) algorithms for
graph classes of bounded local tree width. Flum and Grohe [18] proved that deciding
FO-definable properties is fixed-parameter tractable on graph classes with
excluded minors, and Dawar, Grohe, and Kreutzer [8] extended this to classes
of graphs locally excluding a minor. Dvořák, Král, and Thomas [13] proved that
FO-definable properties can be decided in linear time on graph classes of bounded
expansion and in time O(n^{1+ε}) on classes of locally bounded expansion. Finally,
Grohe, Kreutzer, and Siebertz [30] proved that FO-definable properties can be
decided in linear time on nowhere dense graph classes. Figure 1 shows the con-
tainment relation between all these and other sparse graph classes. Nowhere
dense classes were introduced by Nešetřil and Ossona de Mendez [38,39] (also
see [29]) as a formalisation of classes of “sparse” graphs. They include most fa-
miliar examples of sparse graph classes like graphs of bounded degree and planar
graphs. Notably, classes of bounded average degree or bounded degeneracy are
not necessarily nowhere dense.
The meta theorem for FO-definable properties of nowhere dense classes is op-
timal if we restrict our attention to graph classes closed under taking subgraphs:
if C is a class of graphs closed under taking subgraphs that is somewhere dense
(that is, not nowhere dense), then deciding FO-properties of graphs in C is as
hard as deciding FO-properties of arbitrary graphs, with respect to a suitable
form of reduction [13,33]. Thus, under the widely believed complexity-theoretic
assumption FPT ≠ AW[∗], which is implied by more familiar assumptions like the
exponential time hypothesis or FPT ≠ W[1], deciding FO-definable properties of
graphs from C is not fixed-parameter tractable.
There are a few meta theorems for FO-definable properties of graph classes
that are somewhere dense (and hence not closed under taking subgraphs). Ga-
nian et al. [24] give quasilinear-time algorithms for certain classes of interval
graphs. By combining the techniques of [5] and [21], it can easily be shown
that deciding FO-definable properties is fixed-parameter tractable on graphs of
bounded local rank width (see [26]). It is also easy to prove fixed-parameter
tractability for classes of unbounded, but slowly growing degree [11,25].
[Figure 1: diagram of the containment relations among sparse graph classes, including trees, planar graphs, bounded tree width, bounded genus, bounded degree, bounded local tree width, excluded minor, excluded topological subgraph, locally excluded minor, bounded expansion, nowhere dense, bounded degeneracy, and somewhere dense.]
Many of the meta theorems above have variants for counting and enumeration
problems, where we are given a formula φ(x1 , . . . , xk ) with free variables and
want to compute the number of tuples satisfying the formula in a given graph
or compute a list of all such tuples, and also for optimisation problems. (See, for
example, [1,2,4,6,7,9,12,11,17,20,31,32].)
Uniformity
In this paper, we stated all meta theorems in the form: for every property definable
in a logic L on a class C of graphs there is an O(n^c) algorithm. Here n is the
number of vertices of the input graph, and the exponent c is a small constant,
most often 1 or 1 + ε. However, all these theorems hold in a uniform version of
the form: there is an algorithm that, given an L-sentence φ and a graph G ∈ C,
decides whether φ holds in G in time f(k) · n^c, where k is the length of the sentence
φ and f is some computable function (the exponent c remains the same).
For families of classes constrained by an integer parameter ℓ, such as the classes
of all graphs of tree width at most ℓ or the classes of all graphs that exclude an
References
1. Arnborg, S., Lagergren, J., Seese, D.: Easy problems for tree-decomposable graphs.
Journal of Algorithms 12, 308–340 (1991)
2. Bagan, G.: MSO queries on tree decomposable structures are computable with
linear delay. In: Ésik, Z. (ed.) CSL 2006. LNCS, vol. 4207, pp. 167–181. Springer,
Heidelberg (2006)
3. Courcelle, B.: Graph rewriting: An algebraic and logic approach. In: van Leeuwen,
J. (ed.) Handbook of Theoretical Computer Science, vol. B, pp. 194–242. Elsevier
Science Publishers (1990)
4. Courcelle, B.: Linear delay enumeration and monadic second-order logic. Discrete
Applied Mathematics 157(12), 2675–2700 (2009)
5. Courcelle, B., Makowsky, J., Rotics, U.: Linear time solvable optimization problems
on graphs of bounded clique width. Theory of Computing Systems 33(2), 125–150
(2000)
6. Courcelle, B., Makowsky, J., Rotics, U.: On the fixed-parameter complexity of
graph enumeration problems definable in monadic second-order logic. Discrete Ap-
plied Mathematics 108(1-2), 23–52 (2001)
7. Courcelle, B., Mosbah, M.: Monadic second-order evaluations on tree-
decomposable graphs. Theoretical Computer Science 109, 49–82 (1993)
8. Dawar, A., Grohe, M., Kreutzer, S.: Locally excluding a minor. In: Proceedings of
the 22nd IEEE Symposium on Logic in Computer Science, pp. 270–279 (2007)
9. Dawar, A., Grohe, M., Kreutzer, S., Schweikardt, N.: Approximation schemes for
first-order definable optimisation problems. In: Proceedings of the 21st IEEE Sym-
posium on Logic in Computer Science, pp. 411–420 (2006)
10. Downey, R., Fellows, M.: Fundamentals of Parameterized Complexity. Springer
(2013)
11. Durand, A., Schweikardt, N., Segoufin, L.: Enumerating first-order queries over
databases of low degree. In: Proceedings of the 33rd ACM Symposium on Principles
of Database Systems (2014)
12. Durand, A., Grandjean, E.: First-order queries on structures of bounded degree
are computable with constant delay. ACM Transactions on Computational Logic
8(4) (2007)
13. Dvořák, Z., Král, D., Thomas, R.: Deciding first-order properties for sparse graphs.
In: Proceedings of the 51st Annual IEEE Symposium on Foundations of Computer
Science, pp. 133–142 (2010)
14. Elberfeld, M., Jakoby, A., Tantau, T.: Logspace versions of the theorems of
Bodlaender and Courcelle. In: Proceedings of the 51st Annual IEEE Symposium on
Foundations of Computer Science, pp. 143–152 (2010)
15. Elberfeld, M., Jakoby, A., Tantau, T.: Algorithmic meta theorems for circuit classes
of constant and logarithmic depth. In: Dürr, C., Wilke, T. (eds.) Proceedings
of the 29th International Symposium on Theoretical Aspects of Computer Sci-
ence. LIPIcs, vol. 14, pp. 66–77. Schloss Dagstuhl, Leibniz-Zentrum fuer Informatik
(2012)
16. Elberfeld, M., Kawarabayashi, K.I.: Embedding and canonizing graphs of bounded
genus in logspace. In: Proceedings of the 46th ACM Symposium on Theory of
Computing (2014)
17. Flum, J., Frick, M., Grohe, M.: Query evaluation via tree-decompositions. Journal
of the ACM 49(6), 716–752 (2002)
18. Flum, J., Grohe, M.: Fixed-parameter tractability, definability, and model checking.
SIAM Journal on Computing 31(1), 113–145 (2001)
19. Flum, J., Grohe, M.: Parameterized Complexity Theory. Springer (2006)
20. Frick, M.: Generalized model-checking over locally tree-decomposable classes. The-
ory of Computing Systems 37(1), 157–191 (2004)
21. Frick, M., Grohe, M.: Deciding first-order properties of locally tree-decomposable
structures. Journal of the ACM 48, 1184–1206 (2001)
22. Frick, M., Grohe, M.: The complexity of first-order and monadic second-order logic
revisited. Annals of Pure and Applied Logic 130, 3–31 (2004)
23. Ganian, R., Hliněný, P., Langer, A., Obdržálek, J., Rossmanith, P., Sikdar, S.: Lower
bounds on the complexity of MSO1 model-checking. In: Dürr, C., Wilke, T. (eds.)
Proceedings of the 29th International Symposium on Theoretical Aspects of Com-
puter Science. LIPIcs, vol. 14, pp. 326–337. Schloss Dagstuhl, Leibniz-Zentrum fuer
Informatik (2012)
24. Ganian, R., Hliněný, P., Král’, D., Obdržálek, J., Schwartz, J., Teska, J.: FO model
checking of interval graphs. In: Fomin, F.V., Freivalds, R., Kwiatkowska, M., Peleg,
D. (eds.) ICALP 2013, Part II. LNCS, vol. 7966, pp. 250–262. Springer, Heidelberg
(2013)
25. Grohe, M.: Generalized model-checking problems for first-order logic. In: Ferreira,
A., Reichel, H. (eds.) STACS 2001. LNCS, vol. 2010, pp. 12–26. Springer, Heidel-
berg (2001)
26. Grohe, M.: Logic, graphs, and algorithms. In: Flum, J., Grädel, E., Wilke, T. (eds.)
Logic and Automata – History and Perspectives. Texts in Logic and Games, vol. 2,
pp. 357–422. Amsterdam University Press (2007)
27. Grohe, M., Kawarabayashi, K., Reed, B.: A simple algorithm for the graph minor
decomposition – logic meets structural graph theory. In: Proceedings of the 24th
Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 414–431 (2013)
28. Grohe, M., Kreutzer, S.: Methods for algorithmic meta theorems. In: Grohe, M.,
Makowsky, J. (eds.) Model Theoretic Methods in Finite Combinatorics. Contem-
porary Mathematics, vol. 558, pp. 181–206. American Mathematical Society (2011)
29. Grohe, M., Kreutzer, S., Siebertz, S.: Characterisations of nowhere dense graphs.
In: Seth, A., Vishnoi, N. (eds.) Proceedings of the 32nd IARCS Annual Conference
on Foundations of Software Technology and Theoretical Computer Science. LIPIcs,
vol. 24, pp. 21–40. Schloss Dagstuhl, Leibniz-Zentrum fuer Informatik (2013)
30. Grohe, M., Kreutzer, S., Siebertz, S.: Deciding first-order properties of nowhere
dense graphs. In: Proceedings of the 46th ACM Symposium on Theory of Com-
puting (2014)
31. Kazana, W., Segoufin, L.: First-order query evaluation on structures of bounded
degree. Logical Methods in Computer Science 7(2) (2011)
32. Kazana, W., Segoufin, L.: Enumeration of first-order queries on classes of struc-
tures with bounded expansion. In: Proceedings of the 32nd ACM Symposium on
Principles of Database Systems, pp. 297–308 (2013)
33. Kreutzer, S.: Algorithmic meta-theorems. In: Esparza, J., Michaux, C., Steinhorn,
C. (eds.) Finite and Algorithmic Model Theory. London Mathematical Society
Lecture Note Series, ch. 5, pp. 177–270. Cambridge University Press (2011)
34. Kreutzer, S., Tazari, S.: Lower bounds for the complexity of monadic second-order
logic. In: Proceedings of the 25th IEEE Symposium on Logic in Computer Science,
pp. 189–198 (2010)
35. Kreutzer, S., Tazari, S.: On brambles, grid-like minors, and parameterized in-
tractability of monadic second-order logic. In: Proceedings of the 21st Annual
ACM-SIAM Symposium on Discrete Algorithms, pp. 354–364 (2010)
36. Kreutzer, S.: On the parameterised intractability of monadic second-order logic.
In: Grädel, E., Kahle, R. (eds.) CSL 2009. LNCS, vol. 5771, pp. 348–363. Springer,
Heidelberg (2009)
37. Langer, A., Reidl, F., Rossmanith, P., Sikdar, S.: Evaluation of an MSO-solver.
In: Proceedings of the 14th Meeting on Algorithm Engineering & Experiments,
pp. 55–63 (2012)
38. Nešetřil, J., Ossona de Mendez, P.: On nowhere dense graphs. European Journal
of Combinatorics 32(4), 600–617 (2011)
39. Nešetřil, J., Ossona de Mendez, P.: Sparsity. Springer (2012)
40. Papadimitriou, C., Yannakakis, M.: Optimization, approximation, and complexity
classes. Journal of Computer and System Sciences 43, 425–440 (1991)
41. Seese, D.: Linear time computable problems and first-order descriptions. Mathe-
matical Structures in Computer Science 6, 505–526 (1996)
The Lattice of Definability.
Origins, Recent Developments,
and Further Directions
A. Semenov, S. Soprunov, and V. Uspensky
In our short historical survey we use materials from [BuDa, Smi, Hod].
We try to trace back original sources and motivations. In some important cases
the understanding of problems and the meaning of notions changed considerably
over time. It is important to consider original ideas along with their mature forms,
reached 30 years and even much later. As we will see, the scene of events was
quite international.
Theorem 1. [Sem] There are definability spaces of any finite or countable width.
These “four types of order” will play a special role in the further developments
discussed in the present paper.
Indirectly, the notion of truth, and even more indirectly definability, were present
from the beginning of the 1900s and even earlier. For example, the word “satisfy” in this
context may be due to Huntington (for example, in [Hun2]). We mentioned the works
of Löwenheim and Skolem.
But only the formal (“usual” inductive) definition of truth by Tarski gives the
modern (model-theoretic) understanding of the semantics of a formula as a relation
over a domain [Tar].
Complications in today's understanding of Tarski's and Lindenbaum's meaning of
Padoa's method (relevant for our considerations) are discussed in [Hod1].
– ⟨Q; <⟩ – 0.
– Dense order [0, 1] – 0 (if we include the endpoints 0 and 1 in the signature).
– ⟨Z; +1⟩ – 1.
– Presburger arithmetic – 1. Linear forms and congruences modulo m
can be introduced via existential quantifiers. Extensions of + with
rapidly growing functions [Sem1].
– Tarski algebra – 1. Again, polynomials can be explained with existential
quantifiers only.
– Skolem arithmetic – 1.
– Multiple successor arithmetic (automata proofs) – 1.
– Arithmetic of + and × – infinity.
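For instance, the fact that congruences modulo m need only existential quantifiers over + can be seen in the simplest case m = 2 (our example):

```latex
x \equiv 0 \pmod{2} \;\iff\; \exists y\, (x = y + y).
```

Similar formulas handle arbitrary moduli m and linear forms with integer coefficients.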
A priori, the length of the hierarchy for the space S can depend on the choice
of the (finite) set F.
Problem 1. Can the hierarchy length really differ for different choices of
F?
Definition 2. The depth of a definability space is the minimal (over all finite
sets of generators) length of the quantifier hierarchy for it.
In [Sem] the question of whether other options exist was formulated. The answer was
obtained in 2010:
2.6 Decidability
Decidability, in the sense of the existence of an algorithm deciding whether a statement
(closed formula) is true or false, was a key question of study. For example, Tarski's result
on the field of reals implies the decidability of geometry. The decidability results
for multiple successor arithmetic led Elgot and Rabin to the following problem.
Problem 4. [ElRa] Does there exist a structure with a maximally decidable theory?
In [BeCe], Bès and Cégielski consider the related question of whether any structure
with a decidable theory can be expanded by some constant in such a way that the
resulting structure still has a decidable theory. They answer this question negatively
by proving that there exists a structure M with a decidable theory (even a decidable
monadic theory) such that any expansion of M by a constant has an undecidable theory.
In [BeCe1] they indicate a sufficient condition for a space with a decidable
theory not to be maximal.
In our context it is natural to consider also the decidability of elements of a
definability space. Of course, we need a “constructivisation” of the domain D;
for example, we can take the natural numbers as the domain.
Definition 3. We call a space decidable if all its elements are decidable. We call
a finitely generated space uniformly decidable if there is an algorithm providing
a decision procedure for any formula (using the generators) and any vector of
its arguments.
Problem 5. [Sem], 2003. Are there spaces of arbitrary finite or infinite depth
with decidable theory?
Problem 6. Are there decidable and uniformly decidable spaces of arbitrary finite
or infinite depth?
“At this time it might have seemed that most of the basic problems of el-
ementary axiom systems were solved. A more careful observer however,
upon reading the papers of Tarski [Tar2, Tar3], might have wondered
about the existence of general theorems which would explain elemen-
tary definability as the above theorems explain the basic properties of
elementary logical consequence.
One such theorem, the completeness, in the sense of definability, of
elementary logic was proved by Beth in 1953 [Bet]. In 1959 Svenonius
[Sve] published a further result on elementary definability. Just as with
the earlier results of Beth and Craig, logicians seem slow in recognizing
Svenonius’ theorem as a basic tool in the theory of definability, perhaps
because it is not generally known to be available.”
The idea here is to use a structure additional to the original one and to consider
its elementary extensions. The additional structure narrows the class of extensions
and makes the extensions more comprehensible, so that we can find the needed
automorphism.
In fact, we can use one universe only in a modification of the theorem as was
shown in [SeSo1].
By F we denote the set of everywhere defined functions f : N → D. If R is
an n-ary relation on D and ϕ is a mapping F → F, then we say that ϕ almost
preserves R if the set {i | R(f1(i), . . . , fn(i)) ≢ R(ϕ(f1)(i), . . . , ϕ(fn)(i))} is finite for
any f1, . . . , fn in Dom(ϕ).
The remarkable feature of this form of Svenonius Theorem is that the condition
(2) is purely combinatorial, not appealing to any logical language.
Numerous results were devoted to the study of specific definability spaces. For
example, Ivan Korec in [Kor] surveyed different natural generating sets for the
definability space generated by addition and multiplication of integers.
The Cobham–Semenov theorem [Sem2] states that a nontrivial intersection of
spaces generated by automata working in different bases must be exactly the
space generated by +. (This will be considered later in the context of Muchnik's
self-definability.)
S(x, y, z) ≡ (z = (x + y)/2) (or, equivalently, the structure ⟨Q; f(x, y, z)⟩ where
f(x, y, z) = x − y + z) admits no definable reduct. Though the Svenonius theorem
is not used explicitly in the proof, the approach is rather similar. They note that
the structure ⟨Q^{<ω}; +⟩ is the saturated elementary extension of ⟨Q; +⟩, so
it is enough to consider permutations of the structure ⟨Q^{<ω}; +⟩ only. Now the
fact that Aut(⟨Q^{<ω}; f⟩) is a maximal closed nontrivial subgroup (proved in the
same paper) is used.
The structure ⟨Z; +1⟩, the integers with the successor relation, is not
ω-categorical, and has depth 1. For any natural number n we define spaces by
their generators:
“x1 −x2 = n” — An ,
“x1 −x2 = x3 −x4 = n ∨ x1 −x2 = x3 −x4 = −n” — Bn , and
“|x1 −x2 | = n” — Cn .
Problem 10. Describe the lattice of subspaces for natural numbers with multiple
successors.
We leave aside the research on the reducts of the field of real [MaPe, Pet] and
complex [MaPi] numbers.
References
[Add] Addison Jr., J.W.: The undefinability of the definable. Notices Amer. Math.
Soc. 12, 347 (1965)
[AhZi] Ahlbrandt, G., Ziegler, M.: Invariant subgroups of V V . J. Algebra 151(1),
26–38 (1992)
[BeCe] Bès, A., Cégielski, P.: Weakly maximal decidable structures. RAIRO - The-
oretical Informatics and Applications 42(1), 137–145 (2008)
[BeCe1] Bès, A., Cégielski, P.: Nonmaximal decidable structures. Journal of Mathe-
matical Sciences 158(5), 615–622 (2009)
[Bet] Beth, E.W.: On Padoa’s method in the theory of definition. Indag. Math. 15,
330–339 (1953)
[BoMa] Bodirsky, M., Macpherson, D.: Reducts of structures and maximal-closed
permutation groups. arXiv:1310.6393. (2013)
[Boo] Boole, G.: The mathematical analysis of logic. Philosophical Library (1847)
[BoPiPo] Bodirsky, M., Pinsker, M., Pongrácz, A.: The 42 reducts of the random or-
dered graph. arXiv:1309.2165 (2013)
[BoPiTs] Bodirsky, M., Pinsker, M., Tsankov, T.: Decidability of definability. In:
26th Annual IEEE Symposium on Logic in Computer Science (LICS). IEEE
(2011)
[BuDa] Buchi, J.R., Danhof, K.J.: Definability in normal theories. Israel Journal of
Mathematics 14(3), 248–256 (1973)
[Cam] Cameron, P.J.: Aspects of infinite permutation groups. London Mathematical
Society Lecture Note Series 339, 1 (2007)
[Cam1] Cameron, P.J.: Transitivity of permutation groups on unordered sets. Math-
ematische Zeitschrift 148(2), 127–139 (1976)
[ElRa] Elgot, C.C., Rabin, M.O.: Decidability and Undecidability of Extensions of
Second (First) Order Theory of (Generalized) Successor. J. Symb. Log. 31(2),
169–181 (1966)
[Fra] Frasnay, C.: Quelques problèmes combinatoires concernant les ordres totaux
et les relations monomorphes. Annales de l'Institut Fourier 15(2) (1965)
[SeSo] Semenov, A., Soprunov, S.: Finite quantifier hierarchies in relational alge-
bras. Proceedings of the Steklov Institute of Mathematics 274(1), 267–272
(2011)
[SeSo1] Semenov, A.L., Soprunov, S.F.: Remark on Svenonius theorem.
arXiv:1301.2412 (2013)
[SeSo2] Semenov, A.L., Soprunov, S.F.: Lattice of relational algebras definable in
integers with successor. arXiv:1201.4439 (2012)
[Sko] Skolem, T.: Logisch-kombinatorische Untersuchungen über die Erfüllbarkeit
oder Beweisbarkeit mathematischer Sätze nebst einem Theorem über dichte
Mengen. Videnskapsselskapets skrifter. I. Matematisk-naturvidenskabelig
klasse 4 (1920)
[Sko1] Skolem, T.: Über gewisse Satzfunktionen in der Arithmetik. Skrifter utgit av
Videnskapsselskapet i Kristiania, I. klasse 7 (1930)
[Smi] Smith, J.T.: Definitions and Nondefinability in Geometry. The American
Mathematical Monthly 117(6), 475–489 (2010)
[Sop] Soprunov, S.: Decidable expansions of structures. Vopr. Kibern. 134, 175–179
(1988) (in Russian)
[Sve] Svenonius, L.: A theorem on permutations in models. Theoria 25(3), 173–178
(1959)
[Tan] Tanaka, H.: Some results in the effective descriptive set theory. Publications
of the Research Institute for Mathematical Sciences 3(1), 11–52 (1967)
[Tar] Tarski, A.: The concept of truth in formalized languages. In: Logic, Semantics,
Metamathematics, 2nd edn., trans. J.H. Woodger, ed. and introduced by
J. Corcoran, pp. 152–278. Hackett, Indianapolis (1983)
[Tar1] Tarski, A.: A Decision Method for Elementary Algebra and Geometry Re-
port R-109 (second revised edn.). The Rand Corporation, Santa Monica, CA
(1951)
[Tar2] Tarski, A.: Der Wahrheitsbegriff in den formalisierten Sprachen. Studia
Philosophica 1 (1935); reprinted in Tarski 2, 51–198 (1986)
[Tar3] Tarski, A.: Einige methodologische Untersuchungen über die Definierbarkeit
der Begriffe. Erkenntnis 5(1), 80–100 (1935)
[Tar4] Tarski, A.: What are logical notions? History and Philosophy of Logic 7(2),
143–154 (1986)
[Tho] Thomas, S.: Reducts of random hypergraphs. Annals of Pure and Applied
Logic 80(2), 165–193 (1996)
[Tho1] Thomas, S.: Reducts of the random graph. Journal of Symbolic Logic 56(1),
176–181 (1991)
[Win] Winkler, P.: Random structures and zero-one laws. In: Finite and Infinite
Combinatorics in Sets and Logic, pp. 399–420. Springer, Netherlands (1993)
Counting Popular Matchings
in House Allocation Problems

R. Acharyya, S. Chakraborty, and N. Jha
1 Introduction
A popular matching problem instance I comprises a set A of agents and a set H
of houses. Each agent a in A ranks (numbers) a subset of houses in H (a lower
rank specifies a higher preference). The ordered list of houses ranked by a ∈ A is
called a's preference list. For an agent a, let Ea be the set of pairs (a, h) such that
the house h appears on a's preference list. Define E = ∪a∈A Ea. The problem
instance I is then represented by a bipartite graph G = (A ∪ H, E). A matching
M of I is a matching of the bipartite graph G. We use M(a) to denote the house
assigned to agent a in M and M(h) to denote the agent that is assigned house h
in M. An agent a prefers a matching M to a matching M′ if (i) a is matched in M
and unmatched in M′, or (ii) a is matched in both M and M′ but a prefers the
house M(a) to M′(a). Let φ(M, M′) denote the number of agents that prefer M
to M′. We say M is more popular than M′ if φ(M, M′) > φ(M′, M), and denote
it by M ≻ M′. A matching M is called popular if there exists no matching M′
such that M′ ≻ M.
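For concreteness, φ and the more-popular-than comparison can be computed directly from the preference lists. This is an illustrative sketch; the dictionary representation (ranks indexed by agent and house) is our own, not from the paper:

```python
def prefers(prefs, agent, m_new, m_old):
    """True if `agent` prefers matching m_new to m_old.

    prefs[agent] maps each house on the agent's preference list to its
    rank (a lower rank means a higher preference); m_new and m_old map
    matched agents to houses (an absent agent is unmatched).
    """
    if agent in m_new and agent not in m_old:
        return True  # being matched beats being unmatched
    if agent in m_new and agent in m_old:
        return prefs[agent][m_new[agent]] < prefs[agent][m_old[agent]]
    return False

def phi(prefs, m1, m2):
    """phi(M1, M2): the number of agents that prefer M1 to M2."""
    return sum(prefers(prefs, a, m1, m2) for a in prefs)

def more_popular(prefs, m1, m2):
    """M1 is more popular than M2 iff phi(M1, M2) > phi(M2, M1)."""
    return phi(prefs, m1, m2) > phi(prefs, m2, m1)
```

A matching M is then popular exactly when no matching M′ satisfies `more_popular(prefs, M_prime, M)`.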
The popular matching problem was introduced in [5] as a variation of the
stable marriage problem [4]. The idea of popular matching has been studied
extensively in various settings in recent times [1,14,12,10,8,11,13], mostly in the
context where only one side has preferences over the other side, while the other side
has no preference at all. We will also focus on this setting. Much of the earlier
work focuses on finding efficient algorithms to output a popular matching, if one
exists.
The problem of counting the number of “solutions” to a combinatorial ques-
tion falls into the complexity class #P. An area of interest that has recently gath-
ered a certain amount of attention is the problem of counting stable matchings
in graphs. The Gale-Shapely algorithm [4] gives a simple and efficient algorithm
to output a stable matching, but counting them was proved to be #P-hard in
[6]. Bhatnagar, Greenberg and Randall [2] showed that the random walks on the
stable marriage lattice are slowly mixing, even in very restricted versions of the
problem. [3] gives further evidence towards the conjecture that there may not
exist an FPRAS at all for this problem.
Our motivation for this study is largely due to the similarity of structures
between stable matchings and popular matchings (although no direct relation-
ship is known). The interest is further fueled by the existence of a linear time
algorithm to exactly count the number of popular matchings in the standard
setting [12]. We look at generalizations of the standard version: preferences with
ties and houses with capacities. In the case where preferences may have ties, it
is already known that the counting version is #P-hard [13]. We give an FPRAS
for this problem. In the case where houses have capacities, we prove that the
counting version is #P-hard. While the FPRAS for the case of ties is obtained
via a reduction to a well-known algorithm, the #P-hardness proof for the capacitated
case is more involved, making it the more interesting setting of the problem.
We now formally describe the different variants of the popular matching prob-
lem (borrowing the notation from [14]) and also describe our results alongside.
[Figure 1: the Gallai-Edmonds decomposition of the bipartite graph, with the agent side A partitioned into Ul, Ol, El and the house side H into Ur, Er, Or.]
For each agent a, define s(a) to be a’s most preferred house(s) in Er . Note that
s(a) always exists after the inclusion of last-resort houses l(a). The following is
proved in [1].
We now give an FPRAS for counting the number of popular matchings in the
case of ties. As before, let G = (A ∪ H, E) be our HAT instance. We assume
that G admits at least one popular matching (this can be tested using
the characterization). We reduce our problem to the problem of counting perfect
matchings in a bipartite graph. We start with the first-choice graph G1 of G, and
perform a Gallai-Edmonds decomposition of G1 using any maximum matching
of G1 . In order to get a perfect matching instance, we extend the structure
obtained from Gallai-Edmonds decomposition described in Figure 1. Let F be
the set of f -houses and S be the set of s-houses. We make use of the following
observations in the decomposition.
— Every agent in Ul and Ol gets one of their first-choice houses in every popular
matching.
— Er can be further partitioned into the following sets:
– Er^f := {h ∈ Er : h ∈ F \ S},
– Er^s := {h ∈ Er : h ∈ S \ F},
– Er^{f/s} := {h ∈ Er : h ∈ F ∩ S}, and
– Er^− := {h ∈ Er : h ∉ F ∪ S}.
— Ol can only match with houses in Er^f ∪ Er^{f/s} in every popular matching.
These observations are described in Figure 2(a).
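Reading Er^f, Er^s, Er^{f/s}, and Er^− as the houses of Er that are f-houses only, s-houses only, both, or neither, the partition can be computed directly. A minimal sketch (the set representation and function name are ours):

```python
def partition_er(er, f_houses, s_houses):
    """Split Er by membership in F (the f-houses) and S (the s-houses)."""
    er_f = {h for h in er if h in f_houses and h not in s_houses}
    er_s = {h for h in er if h not in f_houses and h in s_houses}
    er_fs = {h for h in er if h in f_houses and h in s_houses}
    er_rest = {h for h in er if h not in f_houses and h not in s_houses}
    return er_f, er_s, er_fs, er_rest
```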
Next, we observe that every agent in El that is not already matched to a house
in Or must match to a house in Er^s ∪ Er^{f/s}. We facilitate this by adding all edges
(a, s(a)) for each agent a in El. Finally, we add a set of dummy agent vertices D on
the left side to balance the bipartition. The size of D is |A| − (|H| − |Er^−|). This
difference is non-negative as long as the preference lists of agents are complete.
We make the bipartition (D, Er^f ∪ Er^{f/s} ∪ Er^s) a complete bipartite graph by adding
the appropriate edges. This allows us to move from one popular matching to
another by switching between first and second choices, and among the second choices
of agents. Finally, we remove the set Er^− from the right side. The new structure is
described in Figure 2(b). Denote the new graph by G′.
[Figure 2: (a) the decomposition of G with Er partitioned into Er^f, Er^{f/s}, Er^s, and Er^−; (b) the perfect-matching instance G′, with the dummy agents D added on the left and Er^− removed from the right.]
We now make use of the following result of Jerrum et al. from [7].
Lemma 6 (Theorem 1.1 in [7]). There exists an FPRAS for the problem of
counting the number of perfect matchings in a bipartite graph.
Theorem 1. There exists an FPRAS for counting the number of popular match-
ings in the House Allocation problem with Ties.
its preference list. The preference list of ai ∈ A defines a set of edges Ei from
ai to houses in H. Define E = ∪i∈[n] Ei . The problem instance I can then be
represented by a bipartite graph G = (A ∪ H, E).
For the instance I, a matching M is a subset of E such that each agent appears
in at most one edge in M and each house h appears in at most c(h) edges in M .
The definitions of the more popular than relation between two matchings and of a
popular matching are the same as described earlier in Section 1.
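The capacity constraint in this definition is easy to state in code. A minimal sketch, with M represented as a list of (agent, house) pairs of our own choosing:

```python
from collections import Counter

def is_matching(M, capacity):
    """Check that M, a list of (agent, house) pairs, is a CHA matching:
    each agent appears in at most one pair, and each house h appears in
    at most capacity[h] pairs."""
    agents = Counter(a for a, _ in M)
    houses = Counter(h for _, h in M)
    return (all(c <= 1 for c in agents.values())
            and all(c <= capacity[h] for h, c in houses.items()))
```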
We now outline a characterization of popular matchings in CHA from [14].
As before, denote by f (a) the first choice of an agent a ∈ A. A house which is
the first choice of at least one agent is called an f -house. For each house h ∈ H,
define f(h) = {a ∈ A : f(a) = h}. For each agent a ∈ A, we add a unique
last-resort house l(a) with least priority and capacity 1.
For each agent a ∈ A, define s(a) to be the highest ranked house h on a’s
preference list such that one of the following is true:
Notice that s(a) always exists after the inclusion of last-resort houses l(a). The
following lemma gives the characterization of popular matchings in G.
Reduction(S):
1. while (there exists a switching cycle C in S):
let S := S \ C
2. while (S is non-empty):
(a) find a longest path P in S which alternates between weights +1 and −1
(b) let S := S \ P
48 R. Acharyya, S. Chakraborty, and N. Jha
At the end of every iteration of the while loop in Step 1, Lemma 9 still holds
true. We now prove a very crucial invariant of the while loop in Step 2.
Lemma 10. In every iteration of the while loop in Step 2 of the algorithm
Reduction, the longest path in step 2(a) is a switching path for GM .
Proof. Let us denote the stages of the run of algorithm Reduction by t. Initially,
at t = 0, before any of the while loops run, S is exactly the difference of the edges
in EM and EM′. Let the while loop in Step 1 run t1 times and the while loop
in Step 2 run t2 times.
Let the current stage be t = t1 + i. Let P be the maximal path in step 2(a) at
this stage. We show that P starts with an edge of weight +1. For contradiction,
suppose that (hi, hj), an edge of weight −1, is the first edge of the path P. Let
aij be the agent associated with the edge (hi, hj).
Property 5 of switching sets precludes any incoming edge of weight −1
on the vertex hi . Hence, no switching path could have ended at hi at any stage
t < t1 + i. Similarly, no switching cycle with an incoming edge −1 was incident
on hi at an earlier stage.
Let us assume that there were r cycles that were incident at hi at t = 0. At
stage t = t1 + i, let the number of outgoing −1 edges be m. Hence at t = 0,
hi had r incoming +1 edges and r + m outgoing −1 edges. But this would also
imply that at t = 0, hi had r + m incoming +1 edges in GM′. This contradicts
Property 2, requiring the number of incoming +1 edges to be constant in the
switching graphs corresponding to different popular matchings.
A similar argument shows that the path P can only end with an edge of weight
−1 and that P ends at an unsaturated vertex.
Proof.
(i) We verify that the new matching generated by applying a switching move on
GM satisfies the characterization in Lemma 8. Call the new switching graph
GM′ and the associated matching M′. First, observe that M′ is indeed an
agent complete matching since GM still has a directed edge for each agent
in A. Next, each agent a is still matched to f (a) or s(a) as the switching
move either reverses an edge of GM or leaves it as it is. Finally, for each
house h, f(h) ⊆ M′(h) if |f(h)| < c(h), and |M′(h)| = c(h) with M′(h) ⊆
f(h) otherwise. This is true because |M′(h)| = |M(h)|, by the definition of
switching moves.
(ii) This is implied by Theorem 2.
Counting Popular Matchings in House Allocation Problems 49
Using this information, we can create the description of the instance I so that it
meets our requirement. For simplicity, we assume G to be connected (as isolated
vertices do not affect the count). We orient all the edges of G from A to B and
call the directed graph G′ = (A ∪ B, E′). Using G′, we construct a graph S,
which will be the switching graph.
Let |A| = n1, |B| = n2 and |E′| = m. S is constructed by augmenting G′.
We keep all the vertices and edges of G′ in S and assign each edge a weight of
−1. Further, for each vertex u ∈ A, add a copy u′ and a directed edge from
u′ to u, and assign a weight of +1 to this edge. Call the new set of vertices A′.
The sets A and B contain s-houses and the set A′ contains f-houses. We label
every vertex in A and A′ as saturated and for each vertex v in B, we label v
as unsaturated with unsaturation degree 1. Hence, the switching graph S has
2n1 + n2 vertices and n1 + m edges.
The CHA instance I corresponding to the switching graph S has 2n1 + n2
houses and n1 + m agents. Each agent has a preference list of length 2 that is
naturally defined by the weight of edges in S.
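The construction just described is mechanical enough to sketch in code. The following is a minimal sketch (function and variable names are ours, not from the paper; the orientation of the +1 edges follows our reading of the construction), which also checks the vertex and edge counts stated above:

```python
# Sketch of the switching-graph construction above; the representation
# (an edge dict mapping (tail, head) to a weight) is an illustrative choice.

def build_switching_graph(A, B, E):
    """Build the switching graph S from a bipartite graph G = (A u B, E):
    edges of G are oriented from A to B with weight -1, and each u in A
    gets a copy u' with a +1 edge from u' to u."""
    edges = {}
    for (u, v) in E:                      # orient A -> B, weight -1
        edges[(u, v)] = -1
    for u in A:                           # +1 edge from the copy u' to u
        edges[(("copy", u), u)] = +1
    A_prime = [("copy", u) for u in A]    # the f-houses A'
    vertices = list(A) + list(B) + A_prime
    return vertices, edges

A, B = ["a1", "a2"], ["b1", "b2", "b3"]
E = [("a1", "b1"), ("a1", "b2"), ("a2", "b2"), ("a2", "b3")]
V, W = build_switching_graph(A, B, E)
assert len(V) == 2 * len(A) + len(B)      # 2*n1 + n2 vertices
assert len(W) == len(A) + len(E)          # n1 + m edges
```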
Let the popular matching represented by S be Mφ . This corresponds to the
empty matching of G. Every non-empty matching of G can be obtained by a
switching move on S. We make this more explicit in the following theorem.
For the converse, observe that S can only have switching paths of length 2
and it has no switching cycles. An edge disjoint set of such paths corresponds
to a matching of G. By the definition of S, it is easy to see that every matching
in M can be obtained by a switching set of S.
Conclusions and Acknowledgements: We obtained an FPRAS for the #P-hard
problem of counting popular matchings where instances could have ties. We
presented a switching graph characterization for Capacitated House Allocation
problem. Though our motivation for studying this structure was to prove a hardness
result for counting popular matchings in CHA, the characterization may
be of independent interest. It also broadens the picture of House Allocation
problems, as such characterizations were previously known only for HA and
HAT instances. We believe that this structure
could be used to give an FPRAS for the case of CHA. This remains an open
question.
We thank Meghana Nasre for fruitful discussions. We also thank anonymous
reviewers for their input.
References
1. Abraham, D.J., Irving, R.W., Kavitha, T., Mehlhorn, K.: Popular matchings.
SIAM J. Comput. 37(4), 1030–1045 (2007)
2. Bhatnagar, N., Greenberg, S., Randall, D.: Sampling stable marriages: why spouse-
swapping won’t work. In: SODA, pp. 1223–1232 (2008)
3. Chebolu, P., Goldberg, L.A., Martin, R.A.: The complexity of approximately count-
ing stable matchings. In: Serna, M., Shaltiel, R., Jansen, K., Rolim, J. (eds.)
APPROX 2010. LNCS, vol. 6302, pp. 81–94. Springer, Heidelberg (2010)
4. Gale, D., Shapley, L.S.: College admissions and the stability of marriage. The
American Mathematical Monthly 69(1), 9–15 (1962)
5. Gärdenfors, P.: Match making: assignments based on bilateral preferences. Behav-
ioral Science 20(3), 166–173 (1975)
6. Irving, R.W., Leather, P.: The complexity of counting stable marriages. SIAM J.
Comput. 15(3), 655–667 (1986)
7. Jerrum, M., Sinclair, A., Vigoda, E.: A polynomial-time approximation algorithm
for the permanent of a matrix with non-negative entries. In: STOC, pp. 712–721
(2001)
8. Kavitha, T., Mestre, J., Nasre, M.: Popular mixed matchings. Theor. Comput.
Sci. 412(24), 2679–2690 (2011)
9. Lovász, L., Plummer, M.D.: Matching theory. North-Holland Mathematics Studies,
vol. 121. North-Holland Publishing Co., Amsterdam (1986), Annals of Discrete
Mathematics, 29
10. Mahdian, M.: Random popular matchings. In: ACM Conference on Electronic
Commerce, pp. 238–242 (2006)
11. McCutchen, R.M.: The least-unpopularity-factor and least-unpopularity-margin
criteria for matching problems with one-sided preferences. In: Laber, E.S., Born-
stein, C., Nogueira, L.T., Faria, L. (eds.) LATIN 2008. LNCS, vol. 4957, pp. 593–604.
Springer, Heidelberg (2008)
12. McDermid, E., Irving, R.W.: Popular matchings: structure and algorithms. J.
Comb. Optim. 22(3), 339–358 (2011)
13. Nasre, M.: Popular matchings: Structure and cheating strategies. In: STACS,
pp. 412–423 (2013)
14. Sng, C.T.S., Manlove, D.: Popular matchings in the weighted capacitated house
allocation problem. J. Discrete Algorithms 8(2), 102–116 (2010)
Vertex Disjoint Paths in Upward Planar Graphs
Abstract. The k-vertex disjoint paths problem is one of the most stud-
ied problems in algorithmic graph theory. In 1994, Schrijver proved that
the problem can be solved in polynomial time for every fixed k when
restricted to the class of planar digraphs and it was a long standing open
question whether it is fixed-parameter tractable (with respect to param-
eter k) on this restricted class. Only recently, Cygan et al. [5] achieved
a major breakthrough and answered the question positively. Despite the
importance of this result, it is mainly of theoretical interest: their proof
technique is technically extremely involved and has a
doubly exponential parameter dependence. Thus, it seems unrealistic
that the algorithm could actually be implemented. In this paper, there-
fore, we study a smaller but well studied class of planar digraphs, the
class of upward planar digraphs which can be drawn in a plane such that
all edges are drawn upwards. We show that on this class the problem
(i) remains NP-complete and (ii) is fixed-parameter tractable.
While membership in FPT follows immediately from the general result of [5],
our algorithm is very natural and has only singly exponential parameter
dependence and linear dependence on the graph size, compared to the
doubly exponential parameter dependence and much higher polynomial
dependence on the graph size for general planar digraphs. Furthermore,
our algorithm can easily be implemented, in contrast to the algorithm
in [5].
1 Introduction
E.A. Hirsch et al. (Eds.): CSR 2014, LNCS 8476, pp. 52–64, 2014.
c Springer International Publishing Switzerland 2014
every directed edge points “upward”, i.e. each directed edge is represented by a
curve that is monotone increasing in the y direction. Upward planar digraphs are
very well studied in a variety of settings, in particular in graph drawing applica-
tions (see e.g. [2]). In contrast to the problem of finding a planar embedding for a
planar graph, which is solvable in linear time, the problem of finding an upward
planar embedding is NP-complete in general [11]. Much work has gone into find-
ing even more restricted classes inside the upward planar class that allow finding
such embeddings in polynomial time [4,3,15].
By definition, upward planar graphs are acyclic graphs. Hence, by the above
results, the k-vertex disjoint paths problem can be solved in polynomial time on
upward planar graphs for any fixed k. As a first result in this paper we show
that the problem remains NP-complete on upward planar graphs, i.e. that we
cannot hope to find a general polynomial-time algorithm. Our construction even
shows that the problem is NP-complete on directed grid graphs.
Our second result is that the problem is fixed-parameter tractable with re-
spect to parameter k on the class of upward planar digraphs if we are given an
upward planar graph together with an upward planar embedding. We present
a linear time algorithm that has singly exponential parameter dependence. The
idea of our algorithm is straightforward but the proof of its correctness requires
some work.
2 Preliminaries
By N we denote the set of non-negative integers and for n ∈ N, we write [n] for
the set {1, . . . , n}. We assume familiarity with the basic concepts from (directed)
graph theory, planar graphs and graph drawings and refer the reader to [1,2,6]
for more details. For background on parameterized complexity theory we refer
the reader to [7].
An embedding of a graph G = (V, E) in the real plane is a mapping ϕ that
maps vertices v ∈ V to points ϕv ∈ R² and edges e = (u, v) ∈ E to continuous
functions ϕe : [0, 1] → R² such that ϕe(0) = ϕu and ϕe(1) = ϕv. A plane
embedding is an embedding such that ϕe(z) = ϕe′(z′) only if z, z′ ∈ {0, 1}, for all
e ≠ e′ ∈ E. An upward plane embedding is a plane embedding such that every
edge is drawn "upward", i.e. for all edges e ∈ E, if ϕe(z) = (x, y), ϕe(z′) = (x′, y′)
and z′ > z, then y′ ≥ y. An upward planar graph is a graph that has an upward
plane embedding. To improve readability, we will draw all graphs in this paper
from left to right, instead of upwards.
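For straight-line drawings, where each edge is the segment between its endpoints, the upward condition reduces to a simple coordinate check. A minimal sketch (graph and positions are our own illustration):

```python
# An edge (u, v) drawn as the straight segment from pos[u] to pos[v] is
# monotone in y exactly when the y-coordinate does not decrease from u to v.

def is_upward(edges, pos):
    """Return True if every directed edge points (weakly) upward."""
    return all(pos[u][1] <= pos[v][1] for (u, v) in edges)

# A small diamond-shaped DAG with an upward straight-line drawing.
pos = {"s": (0.0, 0.0), "a": (-1.0, 1.0), "b": (1.0, 1.0), "t": (0.0, 2.0)}
edges = [("s", "a"), ("s", "b"), ("a", "t"), ("b", "t")]
assert is_upward(edges, pos)
assert not is_upward([("t", "s")], pos)   # a downward edge fails the check
```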
The k-vertex disjoint paths problem on upward planar graphs is the following
problem.
3 NP-Completeness of UpPlan-VDPP
Before we formally prove the theorem, we give a brief and informal overview
of the proof structure. The proof of NP-completeness is by a reduction from
SAT, the satisfiability problem for propositional logic, which is well-known to
be NP-complete [10]. On a high level, our proof method is inspired by the NP-
completeness proof in [13] but the fact that we are working in a restricted class
of planar graphs requires a number of changes and additional gadgets.
Let V = {V1 , . . . , Vn } be a set of variables and C = {C1 , . . . , Cm } be a set of
clauses over the variables from V. For 1 ≤ i ≤ m let Ci = {Li,1 , Li,2 , . . . , Li,ni }
where each Li,t is a literal, i.e. a variable or the negation thereof. We will con-
struct an upward planar graph GC = (V, E) together with a set of pairs of
vertices in GC such that GC contains a set of pairwise vertex disjoint directed
paths connecting each source to its corresponding target if, and only if, C is
satisfiable. The graph GC is roughly sketched in Fig. 1.
We will have the source/target pairs (Vi, Vi′) ∈ V² for i ∈ [n] and (Cj, Cj′) ∈
V² for j ∈ [m], as well as some other source/target pairs inside the gadgets
Gi,j,t that guarantee further properties. As the picture suggests, there will be
two possible paths from Vi to Vi′, an upper path and a lower path, and our
construction will ensure that these paths cannot interleave. Any interpretation
of the variable Vi will thus correspond to the choice of a unique path from Vi to
Vi′. Furthermore, we will ensure that there is a path from Cj to Cj′ if and only
if some literal is interpreted such that Cj is satisfied under this interpretation.
56 S.A. Amiri et al.
We need some additional gadgets which we describe first to simplify the pre-
sentation of the main proof.
Routing Gadget: The rôle of a routing gadget is to act as a planar routing
device. It has two incoming connections, the edges et from the top and el from
the left, and two outgoing connections, the edges eb to the bottom and er to the
right. The gadget is constructed in a way that in any solution to the disjoint
paths problem it allows for only two ways of routing a path through the gadget,
either using et and eb or el and er .
Fig. 2. The routing gadget. In the following, when a routing gadget appears as a
subgadget in a figure, it will be represented by a black box as shown on the left.
Formally, the gadget is defined as the graph displayed in Fig. 2 with source/target
pairs (i, i′) for i ∈ [4]. Immediately from the construction of the gadget we
get the following lemma which captures the properties of routing gadgets needed
in the sequel.
Lemma 3.2. Let R be a routing gadget with source/target pairs (i, i′) for i ∈ [4].
Crossing Gadget: A crossing gadget has two incoming connections to its left
via the vertices H^in and L^in and two outgoing connections to its right via the
vertices H^out and L^out. Furthermore, it has one incoming connection at the
top via the vertex T and one outgoing connection at the bottom via the vertex B.
Intuitively, we want that in any solution to the disjoint paths problem, there is
exactly one path P going from left to right and exactly one path P′ going from
top to bottom. Furthermore, if P enters the gadget via H^in then it should leave
it via H^out and if it enters the gadget via L^in then it should leave it via L^out. Of
course, in a planar graph there cannot be such disjoint paths P, P′ as they must
cross at some point. We will have to split one of the paths, say P′, by removing
the outward source/sink pair and introducing two new source/sink pairs, one to
the left of P and one to its right.
[Fig. 3: the crossing gadget, with boundary vertices H^in, H^out, X, W, Z, Y, internal vertices b1, ..., b6 and m0, ..., m12, and the edges e+ and e−.]
Formally, the gadget is defined as the graph displayed in Fig. 3. The following
lemma follows easily from Lemma 3.2.
The next lemma shows that we can connect crossing gadgets in rows in a useful
way. It follows easily by induction from Lemma 3.3.
58 S.A. Amiri et al.
Lemma 3.4. Let G be a row of crossing gadgets. Then both associated vertex
disjoint paths problems P_r^+, P_r^− have unique solutions. For all i ∈ [t − 1], each
path in the solution of P_r^+ from Zi to Wi+1 passes through H^in_{i+1} and each path
in the solution of P_r^− from Zi to Wi+1 passes through L^in_{i+1}.
The next lemma shows that we can force a relation between rows and columns
of crossing gadgets.
Let G1 , . . . , Gt be a sequence of crossing gadgets drawn from top to bottom in
that order. For each i ∈ [t − 1], we add the edge (Bi , Ti+1 ) and call the resulting
graph a column of crossing gadgets. We equip this graph with the source/target
pairs (Xi , Yi ) for i ∈ [t] and with the pair (T1 , Bt ) to obtain an associated vertex
disjoint paths problem P.
Note that the paths Pi as stated in the lemma exist and they are uniquely
determined by Lemma 3.3.
We are now ready to construct a vertex disjoint paths instance for any SAT
instance C.
– For j ∈ [m], t ∈ [nj] we add the edges (Cj, T1,j,t) and (Bn,j,t, Cj′).
– Finally, we delete the edge e+ for all i ∈ [n], j ∈ [m], t ∈ [nj] in Gi,j,t if
Lj,t is a variable and the edge e− if it is a negated variable.
We draw the graph GC as shown in Fig. 1.
2. We define the following vertex disjoint paths problem PC on GC . We add all
source/target pairs that are defined inside the routing gadgets. Furthermore:
– For i ∈ [n], j ∈ [m], t ∈ [nj − 1], we add the pairs
• (Vi, Wi,1,1),
• (Zi,m,nm, Vi′),
• (Xi,j,t, Yi,j,t) and
• (Zi,j,t, Wi,j,t+1).
– For i ∈ [n], j ∈ [m − 1], we add the pairs (Zi,j,nj, Wi,j+1,1).
– For j ∈ [m], we add the pairs (Cj, Cj′).
The proof of the following theorem is based on the fact that in our construction,
edge e+ is present in gadget Gi,j,t , if and only if Cj does not contain variable Vi
negatively and e− is present in gadget Gi,j,t , if and only if Cj does not contain
variable Vi positively (especially, both edges are present if the clause does not
contain the variable at all). In particular, every column contains exactly one
gadget where one edge is missing. Now it is easy to conclude with Lemma 3.4
and Lemma 3.5.
Theorem 4.1. The problem UpPlan-VDPP can be solved in time O(k! · k · n),
where n := |V (G)|.
For the rest of the section we fix a planar upward graph G together with an
upward planar embedding and k pairs (s1 , t1 ), . . . , (sk , tk ) of vertices. We will not
distinguish notationally between G and its upward planar embedding. Whenever
we speak about a vertex v on a path P we mean a vertex v ∈ V (G) which is
contained in P . If we speak about a point on the path we mean a point (x, y) ∈ R2
which is contained in the drawing of P with respect to the upward planar drawing
of G. The algorithm is based on the concept of a path in G being to the right of
another path which we define next.
The next two lemmas follow immediately from the definition of upward planar
drawings.
Lemma 4.3. Let P and Q be vertex disjoint paths in an upward planar drawing
of G. Then either right(P ) ∩ Q = ∅ or left(P ) ∩ Q = ∅.
We show next that for every set P of pairwise vertex disjoint paths in G the
relation ≺∗ is a partial order on P. Towards this aim, we first show that ≺ is
irreflexive and anti-symmetric on P.
Proof. The first claim immediately follows from the definition of ≺. Towards
the second statement, suppose there are P1 , P2 ∈ P such that P1 ≺P P2 and
P2 ≺P P1 .
Hence, for j = 1, 2 and i = 1, 2 there are points p^i_j = (x^i_j, y^i_j) such that p^i_j ∈ Pi
and x^1_1 < x^2_1, y^1_1 = y^2_1 and x^1_2 > x^2_2, y^1_2 = y^2_2. W.l.o.g. we assume that y^1_1 < y^1_2.
Let Q ⊆ P1 be the subpath of P1 from p^1_1 to p^1_2, including the endpoints. Let
Q1 := {(x^1_1, z) : z < y^1_1} and Q2 := {(x^1_2, z) : z > y^1_2} be the two lines parallel to
the y-axis going from p^1_1 towards negative infinity and from p^1_2 towards infinity.
Then Q1 ∪ Q ∪ Q2 separates the plane into two disjoint regions R1 and R2 each
containing a point of P2. As P1 and P2 are vertex disjoint but p^2_1 and p^2_2 are
connected by P2, P2 must contain a point in Q1 or Q2 which, on P2, lies between
p^2_1 and p^2_2. But the y-coordinate of any point in Q1 is strictly smaller than y^2_1
and y^2_2 whereas the y-coordinate of any point in Q2 is strictly bigger than y^2_1
and y^2_2. This contradicts Lemma 4.4.
We use the previous lemma to show that ≺∗P is a partial order for all sets P of
pairwise vertex disjoint paths.
Lemma 4.7. Let P be a set of pairwise vertex disjoint directed paths. Then ≺∗P
is a partial order.
Proof. By definition, ≺∗P is transitive. Hence we only need to show that it is anti-
symmetric for which, by transitivity, it suffices to show that ≺∗P is irreflexive.
To show that ≺∗P is irreflexive, we prove by induction on k that if P0 , . . . , Pk ∈
P are paths such that P0 ≺P · · · ≺P Pk then Pk ̸≺P P0. As P ̸≺P P for all
P ∈ P, this proves the lemma.
Towards a contradiction, suppose the claim was false and let k be minimum
such that there are paths P0 , . . . , Pk ∈ P with P0 ≺P · · · ≺P Pk and Pk ≺P P0 .
By Lemma 4.6, k > 1.
Let R := ⋃_{i=0}^{k−2} right(Pi). Note that k − 2 ≥ 0, so R is not empty. Furthermore,
as for all P, Q with P ≺ Q, right(P) ∩ right(Q) ≠ ∅, R is a connected region in
R² without holes. Let L := ⋃_{i=1}^{k−1} left(Pi). Again, as k > 1, L ≠ ∅ and L is a
connected region without holes.
As Pk−2 ≺P Pk−1, we have L ∩ R ≠ ∅ and therefore L ∪ R separates the plane
into two unbounded regions, the upper region T and the lower region B.
The minimality of k implies that Pi ̸≺P Pk for all i < k − 1 and therefore
R ∩ Pk = ∅. Analogously, as Pk ̸≺P Pi for any i > 0, we have L ∩ Pk = ∅.
Hence, either Pk ⊆ B or Pk ⊆ T . W.l.o.g. we assume Pk ⊆ B. We will show that
left(P0 ) ∩ B = ∅.
Suppose there was a point (x, y) ∈ P0 and some x′ < x such that (x′, y) ∈ B.
This implies that y < v for all (u, v) ∈ L. But this implies that B is bounded by
right(P0 ) and L contradicting the fact that right(Pk−1 ) ∩ B = ∅.
We have shown so far that ≺∗ is a partial order on every set of pairwise vertex
disjoint paths.
Remark 4.8. Note that if two paths P, Q ∈ P are incomparable with respect to
≺∗P then one path is strictly above the other, i.e. (right(P )∪left(P ))∩(right(Q)∪
left(Q)) = ∅. This is used in the next lemma.
Definition 4.9. Let s, t ∈ V (G) be vertices in G such that there is a directed
path from s to t. The right-most s-t-path in G is an s-t-path P such that for all
s-t-paths P′, P′ ⊆ P ∪ right(P).
Lemma 4.10. Let s, t ∈ V(G) be two vertices and let P be a path from s to t in
an upward planar drawing of G. If P′ is an s-t path such that P′ ∩ right(P) ≠ ∅
then there is an s-t path Q such that Q ⊆ P ∪ right(P) and Q ∩ right(P) ≠ ∅.
The corollary states that between any two vertices s and t, if there is an s-t path then
there is a rightmost one. The proof of Lemma 4.10 also indicates how such a
path can be computed. This is formalised in the next lemma.
Lemma 4.12. There is a linear time algorithm which, given an upward planar
drawing of a graph G and two vertices s, t ∈ V (G) computes the right-most
s-t-path in G, if such a path exists.
Proof. We first use a depth-first search starting at s to compute the set of vertices
U ⊆ V(G) reachable from s. Clearly, if t ∉ U then there is no s-t-path and
we can stop. Otherwise we use a second, inverse depth-first search to compute the
set U′ ⊆ U of vertices from which t can be reached. Finally, we compute the right-most
s-t path inductively by starting at s ∈ U′ and always choosing the right-most
successor of the current vertex within U′ until we reach t. The right-most successor
is determined by the planar embedding of G. As G is acyclic, this procedure produces
the right-most path and can clearly be implemented in linear time.
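The procedure in this proof can be sketched as follows, assuming the embedding is supplied as adjacency lists whose successor lists are already ordered right-most first (all names are illustrative, not from the paper):

```python
# Two reachability sweeps followed by a greedy right-most walk, as in the
# proof of Lemma 4.12. adj/radj are forward and reverse adjacency lists,
# with successors assumed sorted right-most first in the drawing.

def reachable(adj, s):
    """Vertices reachable from s by a depth-first search."""
    seen, stack = set(), [s]
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(adj.get(v, []))
    return seen

def rightmost_path(adj, radj, s, t):
    """Right-most s-t-path, or None if t is unreachable from s."""
    U = reachable(adj, s)               # forward reachability from s
    if t not in U:
        return None
    U2 = reachable(radj, t) & U         # vertices that can still reach t
    path, v = [s], s
    while v != t:                       # greedily take the right-most
        v = next(w for w in adj[v] if w in U2)  # viable successor
        path.append(v)
    return path

# Illustrative DAG: from s, successor "a" lies to the right of "b".
adj = {"s": ["a", "b"], "a": ["t"], "b": ["t"], "t": []}
radj = {"t": ["a", "b"], "a": ["s"], "b": ["s"], "s": []}
assert rightmost_path(adj, radj, "s", "t") == ["s", "a", "t"]
```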
We show next that in any solution P to the disjoint paths problem in an upward
planar digraph, if P ∈ P is a maximal element with respect to ≺∗P , we can
replace P by the right-most s-t path and still get a valid solution, where s and
t are the endpoints of P .
Lemma 4.13. Let G be an upward planar graph with a fixed upward planar
embedding and let (s1 , t1 ), . . . , (sk , tk ) be pairs of vertices. Let P be a set of
pairwise disjoint paths connecting (si, ti) for all i. Let P ∈ P be a path connecting
si and ti, for some i, which is maximal with respect to ≺∗P. Let P′ be the right-most
si-ti-path in G. Then (P \ {P}) ∪ {P′} is also a valid solution to the disjoint
paths problem on G and (s1, t1), . . . , (sk, tk).
The previous lemma yields the key to the proof of Theorem 4.1:
Proof of Theorem 4.1. Let G with an upward planar drawing of G and k pairs
(s1 , t1 ), . . . , (sk , tk ) be given. To decide whether there is a solution to the disjoint
paths problem on this instance we proceed as follows. In the first step we compute
for each si the set of vertices reachable from si . If for some i this does not include
ti we reject the input as obviously there cannot be any solution.
In the second step, for every possible permutation π of {1, . . . , k} we proceed
as follows. Let i1 := π(k), . . . , ik := π(1) be the numbers 1 to k ordered as
indicated by π and let uj := sij and vj := tij , for all j ∈ [k]. We can view π as
a linear order on 1, . . . , k and for every such π we will search for a solution P of
the disjoint paths problem for which ≺∗P is consistent with π.
For a given π as above we inductively construct a sequence P0 , . . . , Pk of sets
of pairwise vertex disjoint paths such that for all i, Pi contains a set of i paths
P1 , . . . , Pi such that for all j ∈ [i] Pj links uj to vj . We set P0 := ∅ which
obviously satisfies the condition. Suppose for some 0 ≤ i < k, Pi has already
been constructed. To obtain Pi+1 we compute the right-most path linking ui+1
to vi+1 in the graph G \ ⋃ Pi. By Lemma 4.12, this can be done in linear time for
each such pair (ui+1, vi+1). If there is such a path P we define Pi+1 := Pi ∪ {P}.
Otherwise we reject the input. Once we reach Pk we stop and output Pk as
solution.
Clearly, for every permutation π the algorithm can be implemented to run in
time O(k · n), using Lemma 4.12, so that the total running time is O(k! · k · n)
as required.
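Putting the pieces together, the whole procedure can be sketched as below. As before we assume successor lists ordered right-most first; the greedy routine is a simplified stand-in for the right-most path computation of Lemma 4.12, and all names are ours:

```python
# Sketch of the overall O(k! * k * n) procedure: for each ordering of the
# terminal pairs, route each pair greedily along a right-most path in the
# graph with the vertices of previously routed paths removed.
from itertools import permutations

def reach(adj, s, banned):
    """Vertices reachable from s avoiding the banned set."""
    seen, stack = set(), [s]
    while stack:
        v = stack.pop()
        if v not in seen and v not in banned:
            seen.add(v)
            stack.extend(adj.get(v, []))
    return seen

def greedy_path(adj, radj, s, t, banned):
    """Right-most s-t-path avoiding banned vertices, or None."""
    U = reach(adj, s, banned)
    if t not in U:
        return None
    U2 = reach(radj, t, banned) & U            # vertices that can still reach t
    path, v = [s], s
    while v != t:
        v = next(w for w in adj[v] if w in U2)  # right-most viable successor
        path.append(v)
    return path

def disjoint_paths(adj, radj, pairs):
    for order in permutations(range(len(pairs))):
        used, sol = set(), {}
        for i in order:
            p = greedy_path(adj, radj, pairs[i][0], pairs[i][1], used)
            if p is None:
                break
            used |= set(p)
            sol[i] = p
        else:                                   # all pairs routed disjointly
            return [sol[i] for i in range(len(pairs))]
    return None

# Illustrative instance: both pairs compete for the middle vertex "m",
# but (s1, t1) also has a direct edge, so a solution exists.
adj = {"s1": ["m", "t1"], "s2": ["m"], "m": ["t1", "t2"], "t1": [], "t2": []}
radj = {"t1": ["m", "s1"], "t2": ["m"], "m": ["s1", "s2"], "s1": [], "s2": []}
assert disjoint_paths(adj, radj, [("s1", "t1"), ("s2", "t2")]) == \
    [["s1", "t1"], ["s2", "m", "t2"]]
```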
Obviously, if the algorithm outputs a set P of disjoint paths then P is indeed a
solution to the problem. What is left to show is that whenever there is a solution
to the disjoint path problem, then the algorithm will find one.
So let P be a solution, i.e. a set of k paths P1 , . . . , Pk so that Pi links si
to ti . Let ≤ be a linear order on {1, . . . , k} that extends ≺∗P and let π be
the corresponding permutation such that (u1 , v1 ), . . . , (uk , vk ) is the ordering
of (s1 , t1 ), . . . , (sk , tk ) according to ≤. We claim that for this permutation π the
algorithm will find a solution. Let P be the right-most uk -vk -path in G as com-
puted by the algorithm. By Lemma 4.13, (P \ {Pk}) ∪ {P} is also a valid solution so
we can assume that Pk = P . Hence, P1 , . . . , Pk−1 form a solution of the disjoint
paths problem for (u1 , v1 ), . . . , (uk−1 , vk−1 ) in G\ P . By repeating this argument
we get a solution P′ := {P1′, . . . , Pk′} such that each Pi′ links ui to vi and is the
right-most ui-vi-path in G \ ⋃_{j>i} Pj′. But this is exactly the solution found by
the algorithm. This proves the correctness of the algorithm and concludes the
proof of the theorem.
We remark that we can easily extend this result to “almost upward planar”
graphs, i.e., to graphs such that the deletion of at most h edges yields an upward
planar graph. As testing whether a graph admits an upward planar drawing
is NP-complete, this might be of use if we have an approximation algorithm that
produces almost upward planar embeddings.
5 Conclusion
In this paper we showed that the k-vertex disjoint paths problem is NP-complete
on a restricted and yet very interesting class of planar digraphs. On the other
References
1. Bang-Jensen, J., Gutin, G.Z.: Digraphs - Theory, Algorithms and Applications,
2nd edn. Springer (2010)
2. Battista, G.D., Eades, P., Tamassia, R., Tollis, I.G.: Graph Drawing: Algorithms
for the Visualization of Graphs. Prentice-Hall (1999)
3. Bertolazzi, P., Di Battista, G., Liotta, G., Mannino, C.: Upward drawings of tri-
connected digraphs. Algorithmica 12(6), 476–497 (1994)
4. Bertolazzi, P., Di Battista, G., Mannino, C., Tamassia, R.: Optimal upward pla-
narity testing of single-source digraphs. SIAM J. Comput. 27(1), 132–169 (1998)
5. Cygan, M., Marx, D., Pilipczuk, M., Pilipczuk, M.: The planar directed k-vertex-
disjoint paths problem is fixed-parameter tractable. In: 2013 IEEE 54th Annual
Symposium on Foundations of Computer Science, pp. 197–206 (2013)
6. Diestel, R.: Graph Theory, 3rd edn. Springer (2005)
7. Downey, R., Fellows, M.: Parameterized Complexity. Springer (1998)
8. Fortune, S., Hopcroft, J.E., Wyllie, J.: The directed subgraph homeomorphism
problem. Theor. Comput. Sci. 10, 111–121 (1980)
9. Ganian, R., Hliněný, P., Kneis, J., Langer, A., Obdržálek, J., Rossmanith, P.: On
digraph width measures in parameterized algorithmics. In: Chen, J., Fomin, F.V.
(eds.) IWPEC 2009. LNCS, vol. 5917, pp. 185–197. Springer, Heidelberg (2009)
10. Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory
of NP-Completeness. W.H. Freeman (1979)
11. Garg, A., Tamassia, R.: On the computational complexity of upward and rectilinear
planarity testing. SIAM J. Comput. 31(2), 601–625 (2001)
12. Johnson, T., Robertson, N., Seymour, P.D., Thomas, R.: Directed tree-width. J.
Comb. Theory, Ser. B 82(1), 138–154 (2001)
13. Lynch, J.F.: The equivalence of theorem proving and the interconnection problem.
ACM SIGDA Newsletter 5(3), 31–36 (1975)
14. Ohtsuki, T.: The two disjoint path problem and wire routing design. In: Saito, N.,
Nishizeki, T. (eds.) Graph Theory and Algorithms. LNCS, vol. 108, pp. 207–216.
Springer, Heidelberg (1981)
15. Papakostas, A.: Upward planarity testing of outerplanar dags (extended abstract)
(1995), doi:10.1007/3-540-58950-3_385
16. Robertson, N., Seymour, P.D.: Graph minors XIII. The disjoint paths problem.
Journal of Combinatorial Theory, Series B 63, 65–110 (1995)
17. Schrijver, A.: Finding k disjoint paths in a directed planar graph. SIAM Journal on
Computing 23(4), 780–788 (1994)
18. Seymour, P.D.: Disjoint paths in graphs. Discrete Math. 29, 293–309 (1980)
19. Shiloach, Y.: A polynomial solution to the undirected two paths problem. J.
ACM 27, 445–456 (1980)
20. Slivkins, A.: Parameterized tractability of edge-disjoint paths on directed acyclic
graphs. In: European Symposium on Algorithms, pp. 482–493 (2003)
21. Thomassen, C.: 2-linked graphs. European Journal of Combinatorics 1, 371–378
(1980)
On Lower Bounds for Multiplicative Circuits and
Linear Circuits in Noncommutative Domains
Abstract. In this paper we show some lower bounds for the size of
multiplicative circuits computing multi-output functions in some non-
commutative domains such as monoids and finite groups. We also intro-
duce and study a generalization of linear circuits in which the goal is to
compute M Y where Y is a vector of indeterminates and M is a matrix
whose entries come from noncommutative rings. We show some lower
bounds in this setting as well.
1 Introduction
Let (S, ◦) be a semigroup, i.e., S is a set closed under the binary operation ◦
which is associative. A natural multi-output computational model is a circuit
over (S, ◦). The circuit is given by a directed acyclic graph with input nodes
labeled x1 , ..., xn of indegree 0 and output nodes y1 , ..., ym of outdegree 0.
The gates of the circuit all compute the semigroup product. We assume that all
gates have fanin 2. The size of the circuit is the number of nodes in it, and it
computes a function f : S^n → S^m.
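As a concrete illustration (our own, not from the paper), a circuit over a semigroup can be evaluated by processing its gates in topological order, each gate combining two earlier values with the semigroup operation:

```python
# Minimal evaluator for a circuit over a semigroup (S, o). Gates are
# given in topological order as pairs of indices into the list of values
# computed so far; the representation is an illustrative choice.

def eval_circuit(inputs, gates, op):
    """inputs: list of input values; gates: list of (i, j) index pairs
    referring to earlier values; op: the associative operation."""
    vals = list(inputs)
    for (i, j) in gates:
        vals.append(op(vals[i], vals[j]))
    return vals

# Over the free monoid ({0,1}*, concatenation), inputs x0 = "0", x1 = "1":
# three gates of repeated squaring build a word of length 2^3.
vals = eval_circuit(["0", "1"], [(0, 1), (2, 2), (3, 3)], lambda a, b: a + b)
assert vals[-1] == "01010101"

# The same circuit over (N, +) is a SUM circuit computing 8 from 1 + 1.
assert eval_circuit([1, 1], [(0, 1), (2, 2), (3, 3)],
                    lambda a, b: a + b)[-1] == 8
```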
This provides a general setting to some well studied problems in circuit com-
plexity. For example:
(1) If S = F2 and ◦ is addition in F2 , the problem is one of computing Ax for
an m × n matrix over F2 . The problem of giving an explicit A such that the size
of any circuit for it is superlinear is a longstanding open problem. By means of
counting arguments, we know that there exist such matrices A [11].
This problem has a rich literature with many interesting developments. Mor-
genstern [7] showed an Ω(n log n) lower bound for the Hadamard matrix in the
bounded coefficient model when F = C. Valiant [11] developed matrix rigid-
ity as a means to attack the problem in the case of logarithmic depth circuits.
In spite of many interesting results and developments, superlinear size lower
bounds remain elusive over any field F even for the special case of log-depth
circuits (Lokam’s monograph [6] contains most of the recent results).
(2) When S = {0, 1} and ◦ is the boolean OR, this problem is also well studied,
and due to its monotone nature it has explicit circuit-size lower bounds of n^{2−o(1)}
(e.g., see Section 3.4 in [3]).
E.A. Hirsch et al. (Eds.): CSR 2014, LNCS 8476, pp. 65–76, 2014.
© Springer International Publishing Switzerland 2014
66 V. Arvind, S. Raja, and A.V. Sreejith
A more restricted form is S = (N, +), giving the well-studied SUM circuits (see,
e.g., [3]). While in the monotone settings (OR and SUM circuits) there are nontrivial
lower bounds, in the commutative case for S we do not have strong lower bound
results. In this paper, we explore the case when (S, ◦) is noncommutative and
manage to prove strong lower bounds in some cases.
An interesting aspect is that the number of inputs can be restricted to just two:
x0, x1. The explicit functions yi, 1 ≤ i ≤ m, are defined as words yi = yi1 yi2 ... yin
where yij ∈ {x0, x1} and {y1, y2, ..., ym} are explicitly defined. We show that any
circuit C : {x0, x1} → {y1, y2, ..., ym} is of size Ω(mn/log² n) in the following four
settings:
constant d > 0.
We consider the free monoid X* where X is a finite alphabet and the monoid
operation is concatenation with the empty string as identity. The notion of
a multiplicative circuit over a free monoid is also known in the area of data
compression as a straight line program [5].
Notice that when X is a singleton set X = {1}, then (1*, ◦) is essentially
the semigroup (N, +). We consider the simplest noncommutative setting, with
X = {0, 1}. In this problem, we consider circuits that take the "generating set"
X as input and output the m words y1, y2, ..., ym ∈ X^n (where n is the "input"
parameter).
Since each yi is of length n, clearly n gates are sufficient to compute each
yi, and hence O(mn) is an obvious upper bound for the circuit size. We will
give an explicit set y1, y2, ..., ym ∈ {0, 1}^n for which Ω(mn/log² n) is a lower
bound on the circuit size.
Construction of S
Consider the set [n²] of the first n² natural numbers. Each i ∈ [n²] requires
2 log₂ n bits to represent in binary. Initially let D = [n²].
for i = 1, ..., n do
This defines the set S = {y1, y2, ..., yn}. Each yi constructed has the property
that yi has ≥ n/(2 log n) distinct substrings of length 2 log n. We show the following
two results about these strings:
– For each yi ∈ S, any concatenation circuit that generates yi from input
X = {0, 1} requires size Ω(n/log² n).
– Any concatenation circuit that takes X = {0, 1} as input and outputs S =
{y1, y2, ..., yn} at n output gates requires size Ω(n²/log² n).
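The loop body of the construction above is lost in extraction; one natural reading consistent with the stated substring property (a hypothetical reconstruction, not the authors' verbatim construction) is that each y_i concatenates n/(2 log n) fresh 2 log n-bit encodings of numbers drawn without replacement from D:

```python
import math

def build_S(n):
    """Hypothetical reconstruction: each y_i concatenates n/(2 log n)
    fresh 2*log n-bit encodings of numbers drawn (without replacement)
    from D = [n^2], so the aligned blocks of the y_i are pairwise
    distinct length-2*log n substrings."""
    block = 2 * int(math.log2(n))      # block length 2 log n
    per = n // block                   # blocks per string: n/(2 log n)
    fresh = iter(range(n * n))         # D = [n^2], consumed left to right
    return [''.join(format(next(fresh), '0%db' % block)
                    for _ in range(per))
            for _ in range(n)]

S = build_S(16)
assert len(S) == 16 and all(len(y) == 16 for y in S)
```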
Lemma 1. Let s ∈ X^n be any string, where |X| ≥ 2, such that the number of
distinct substrings of s of length l is N. Then any concatenation circuit for s
will require Ω(N/l) gates.
Proof. Let C be any circuit that computes the string s. Now each gate g of C
computes some string sg . Suppose g = g1 ◦ g2 is a gate whose inputs are gates
g1 , g2 .
Suppose sg1 has k1 distinct substrings of length l and sg2 has k2 distinct
substrings of length l. Now, in sg notice that the new substrings of length l (not
occurring in sg1 or sg2 ) could only arise as a concatenation of some suffix of sg1
and prefix of sg2 such that neither of them is the empty string. The number of
such substrings is at most l − 1.
Hence, sg can have at most k1 + k2 + l − 1 distinct substrings of length l. Thus,
each new gate of C can generate at most l − 1 new substrings of length l. Since
the output string s has N distinct length-l substrings, it follows that the number
of gates in C is Ω(N/l).
Note the case not covered by the lemma: |X| = 1. In that case we know that
every string of length n (for every n) has a concatenation circuit of size ≤ 2 log₂ n;
the circuit exploits the fact that for each length l there is a unique string.
A result similar to Lemma 1 was known earlier (e.g., see Lemma 3 in [2]).
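The key counting step of the proof — a concatenation creates at most l − 1 new length-l substrings, all straddling the boundary — can be checked empirically (a small sketch, not from the paper):

```python
import random

def distinct_substrings(s, l):
    return {s[i:i + l] for i in range(len(s) - l + 1)}

random.seed(0)
l = 4
for _ in range(200):
    s1 = ''.join(random.choice('01') for _ in range(random.randint(l, 30)))
    s2 = ''.join(random.choice('01') for _ in range(random.randint(l, 30)))
    new = (distinct_substrings(s1 + s2, l)
           - distinct_substrings(s1, l) - distinct_substrings(s2, l))
    # any new length-l substring must straddle the boundary: <= l-1 of them
    assert len(new) <= l - 1
```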
Theorem 1. Let S ⊆ {0, 1}^n be the explicit set of n strings defined above. Any
concatenation circuit that takes X = {0, 1} as input and outputs S at its n output
gates will require size Ω(n²/log² n).
Proof. Let S = {y1, y2, ..., yn} as defined above. Notice that each yi can be
generated by a circuit of size n. Let C be any concatenation circuit that takes X =
{0, 1} as input and at its n output gates generates y1, y2, ..., yn, respectively. Let
C′ be a concatenation circuit obtained from C by adding n − 1 new gates such
that C′ outputs the concatenation y = y1 y2 ... yn. By construction, size(C′) =
size(C) + n − 1. The number of distinct length-2 log n substrings of the string y is, by
construction, ≥ n²/(2 log n). This is because each yi has ≥ n/(2 log n) distinct substrings
and these are disjoint for different yi. Hence, by Lemma 1, size(C′) = Ω(n²/log² n),
which implies size(C) = Ω(n²/log² n).
Theorem 2. Any circuit over (M, ◦) that takes M0, M1 as input and computes
{M_{yi} | yi ∈ S} at its n output gates is of size Ω(n²/log² n).
apply these rules and obtain from it a normal form w ∈ G_X which cannot be
simplified further. This normal form is, by the Church-Rosser property, unique and
independent of how we apply the rules.
Recall the set of binary strings we constructed in Section 2. Replacing 0 by
x1 and 1 by x2, we obtain S = {y1, y2, ..., yn} ⊆ {x1, x2}^n ⊆ G_X. Each word yi
constructed has the property that yi has ≥ n/(2 log n) distinct subwords of length
2 log n. These words are already in their normal forms.
Proof. Let C be any circuit that computes the word w. Now each gate g of C
computes some word wg and, as above, wg denotes its normal form.
Suppose g = g1 ◦ g2 is a gate whose inputs are gates g1 , g2 . Then, by the
Church-Rosser property of cancellations, the normal form for wg satisfies
Proof. Let S = {y1, y2, ..., yn} as defined above and let C be any concatenation
circuit that takes X = {x1, x2, x1^{-1}, x2^{-1}} as input and at its n output gates
generates y1, y2, ..., yn, respectively. Let C′ be a concatenation circuit obtained
from C by adding n − 1 new gates such that C′ outputs the concatenation
y = y1 y2 ... yn. By construction, size(C′) = size(C) + n − 1. The number of
distinct length-2 log n subwords of the word y is, by construction, ≥ n²/(2 log n). This is
because each yi has ≥ n/(2 log n) distinct subwords and these are disjoint for different
yi. Hence, by Lemma 2, size(C′) = Ω(n²/log² n), which implies size(C) = Ω(n²/log² n).
Remark 1. Let M0 = ( 1 2 ; 0 1 ) and M1 = ( 1 0 ; 2 1 ) (rows separated by
semicolons) be 2 × 2 matrices. Consider the infinite group G generated by these
elements and their inverses over the field of rationals Q. It is well known (e.g., see
[4] for a nice complexity-theoretic
application) that the group G is isomorphic to the free group GX , where the
isomorphism is defined by x1 → M0 and x2 → M1 . It follows that Theorem 3
also applies to the group G by setting x1 = M0 and x2 = M1 .
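The freeness of the (semi)group generated by M0 and M1 can be sanity-checked for small word lengths with exact integer arithmetic (a sketch, not from the paper): distinct positive words in M0, M1 must yield distinct matrices.

```python
from itertools import product

M0 = ((1, 2), (0, 1))
M1 = ((1, 0), (2, 1))

def matmul(a, b):
    """Exact 2x2 integer matrix product."""
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(2))
                       for j in range(2)) for i in range(2))

def word_to_matrix(w):
    m = ((1, 0), (0, 1))
    for bit in w:
        m = matmul(m, M0 if bit == '0' else M1)
    return m

# distinct positive words give distinct matrices (freeness, small lengths)
seen = set()
for n in range(1, 8):
    for w in product('01', repeat=n):
        mat = word_to_matrix(w)
        assert mat not in seen
        seen.add(mat)
```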
Definition of π0, π1:
We pick r primes p1, p2, ..., pr, where r = n², such that n < p1 < p2 < ... <
pr < n⁴. The permutation π0 is defined as the product of r + 1 disjoint cycles,
π0 = C0·C1···Cr, where C0, C1 are of length p1 and, for i ≥ 2, Ci is of length
pi. Similarly, π1 = C0′·C1′···Cr′ is a product of r + 1 disjoint cycles with C0′ and
C1′ of length p1 and, for i ≥ 2, Ci′ of length pi. Let supp(C) denote the set
of points moved by a cycle C (i.e., if we write C = (i1 i2 ... ip), it means
C maps i1 to i2 and so on, ip to i1, and moves no other element of the domain;
hence supp(C) = {i1, i2, ..., ip}). In the construction above we pick the cycles
Ci and Ci′, 0 ≤ i ≤ r, such that supp(C0) ∩ supp(C0′) = {1} and, for all (i, j) ≠ (0, 0),
supp(Ci) ∩ supp(Cj′) = ∅. The domain [N] on which these permutations are
defined is ∪_{i=0}^{r} (supp(Ci) ∪ supp(Ci′)). Note that N ≤ 4p1 + 2∑_{i=2}^{r} pi = O(n⁶).
Thus, the problem we consider is that of designing a circuit over S_N that takes
as input x0, x1 and outputs at the n output gates π_{yi} = ∏_{j=1}^{n} x_{yi[j]}, where yi[j]
is the j-th bit of the string yi, for each yi ∈ S.
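Assembling π_y from the bits of y is just iterated permutation composition; a small sketch with toy permutations on 5 points standing in for π0, π1 (the actual construction above uses much larger disjoint-cycle products):

```python
def compose(p, q):
    """(p ∘ q)(i) = p(q(i)); permutations as tuples over range(len(p))."""
    return tuple(p[q[i]] for i in range(len(p)))

# toy stand-ins for pi_0, pi_1 (not the construction from the paper):
pi0 = (1, 2, 0, 3, 4)          # 3-cycle (0 1 2)
pi1 = (0, 1, 2, 4, 3)          # transposition (3 4)

def pi_of_string(y):
    """pi_y = product of x_{y[1]} ... x_{y[n]} with x0 = pi0, x1 = pi1."""
    result = tuple(range(5))   # identity
    for bit in y:
        result = compose(result, pi0 if bit == '0' else pi1)
    return result

assert pi_of_string("01") == compose(pi0, pi1)
```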
Theorem 4. Any circuit over the group (S_N, ◦) that takes as input π0, π1 and
computes G_S = {π_{yi} | yi ∈ S} as output is of size Ω(n²/log² n).
Proof. Let C be the circuit that solves this problem of computing G_S from x0, x1.
We fix the input as x0 = π0 and x1 = π1. Now, consider the corresponding
concatenation circuit C′ with input x0, x1 ∈ X. At each output gate gi, 1 ≤ i ≤
m, circuit C′ computes some word wi ∈ X* such that, for all i, π_{wi} = π_{yi}, where π_{wi}
is the permutation in S_N obtained by putting x0 = π0 and x1 = π1 in wi. If
wi = yi for all i, then in fact C′ as a concatenation circuit computes the set S
at its output gates. This implies by Theorem 1 that size(C′) = Ω(n²/log² n) and
size(C) = Ω(n²/log² n).
Suppose wi ≠ yi at some output gate gi. We can write wi = u ◦ b2 ◦ s and
yi = v ◦ b1 ◦ s, where b1 ≠ b2. Assume, without loss of generality, that b1 = 0 and
b2 = 1. Since π_{wi} = π_{yi}, we know π_u π_{b2} π_s = π_v π_{b1} π_s (i.e., π_u π1 π_s = π_v π0 π_s).
Let α ∈ [N] be such that π_s(α) = 1. In π_{yi} = π_v π0 π_s, the permutation π0 will
When R is a field we get the well-studied linear circuits model [7,11,6]. How-
ever, no explicit superlinear size lower bounds are known for this model over
fields (except for some special cases like the bounded coefficient model [7] or in
the cancellation free case [1]).
When the coefficients come from a noncommutative ring R, we prove lower
bounds for certain restricted linear circuits. Suppose the coefficient ring is R =
F⟨x0, x1⟩, consisting of polynomials over the field F in noncommuting variables
x0 and x1.
Let M ∈ F⟨x0, x1⟩^{n×n}, where x0, x1 are noncommuting variables, and let Y =
(y1, y2, . . . , yn)^T be a column vector of input variables. The first restriction we
consider is homogeneous linear circuits over the ring F⟨x0, x1⟩ for computing
MY. The restriction is that for every gate g in the circuit, if g has its two
incoming edges from nodes g1 and g2, then the edges (g1, g) and (g2, g) are
labeled by α and β respectively, where α, β ∈ F⟨x0, x1⟩ are restricted to be
homogeneous polynomials of the same degree in the variables x0 and x1. It follows,
as a consequence of this restriction, that each gate g of the circuit computes a
linear form ∑_{i=1}^{n} αi yi, where the αi ∈ F⟨x0, x1⟩ are all homogeneous polynomials
of the same degree. Our goal is to construct an explicit matrix M ∈ F⟨x0, x1⟩^{n×n}
such that MY cannot be computed by any circuit C with size O(n) and depth
O(log n). We prove this by suitably generalizing Valiant's matrix rigidity method
[11], as explained below.
Consider the n × n matrices F^{n×n} over a field F. The support of a matrix A ∈ F^{n×n}
is the set of locations supp(A) = {(i, j) | A_{ij} ≠ 0}.
Definition 1. Let F be any field. The rigidity ρ_r(A) of a deck of matrices A =
{A1, A2, . . . , AN} ⊆ F^{n×n} is the smallest number t for which there are a set of
t positions S ⊆ [n] × [n] and a deck of matrices B = {B1, B2, . . . , BN} such that
for all i: supp(Bi) ⊆ S and the rank of Ai + Bi is bounded by r. A collection
A = {A1, A2, . . . , AN} ⊆ F^{n×n} is a rigid deck if ρ_{ε·n}(A) = Ω(n^{2−o(1)}), where
ε > 0 is a constant.
Notice that for N = 1 this is precisely the notion of rigid matrices. We are
interested in constructing explicit rigid decks, i.e., a deck A such that for each
k ∈ [N] and each 1 ≤ i, j ≤ n there is a polynomial (in n) time algorithm that
outputs the (i, j)-th entry of Ak. We describe an explicit deck of size N = 2^{n²}
over any field F and use it to prove our first lower bound result. It is convenient
to write the deck as A = {A_m | m ∈ {x0, x1}^{n²}}, with matrices A_m indexed by
monomials m of degree n² in the noncommuting variables x0 and x1. The matrix
A_m is defined as follows:
A_m[i, j] = 1 if m_{ij} = x1, and A_m[i, j] = 0 if m_{ij} = x0,

where m_{ij} denotes the ((i−1)n+j)-th variable of the monomial m.
Note that all the matrices A_m in the deck A are in F^{n×n}. Clearly, A is an
explicit deck. We prove that it is a rigid deck.
Lemma 3. The deck A = {A_m | m ∈ {x0, x1}^{n²}} is an explicit rigid deck for
any field F.
Proof. Valiant [11] showed that almost all n × n 0-1 matrices over any field F
have rigidity Ω((n − r)²/log n) for target rank r. In particular, for r = ε·n, over any
field F, there is a 0-1 matrix R for which we have ρ_r(R) ≥ δ·n²/log n for some constant
δ > 0 depending on ε.
We claim that for the deck A we have ρ_{εn}(A) ≥ δ·n²/log n. To see this, let
E = {E_m ∈ F^{n×n} | m ∈ {x0, x1}^{n²}} be any collection of matrices such that
|supp(E_m)| ≤ δn²/log n for each m. Since the deck A contains all 0-1 matrices, in
particular R ∈ A and R = A_m for some monomial m. From the rigidity of R
we know that the rank of R + E_m is at least εn. This proves the claim and the
lemma follows.
We now turn to the lower bound result for homogeneous linear circuits where
the coefficient ring is F⟨x0, x1⟩. We define an explicit n × n matrix M as

M_{ij} = (x0 + x1)^{(i−1)n+j−1} · x1 · (x0 + x1)^{n²−((i−1)n+j)}.   (1)

It is easy to see that we can express the matrix M as M = ∑_{m ∈ {x0,x1}^{n²}} A_m m,
where A = {A_m | m ∈ {x0, x1}^{n²}} is the deck defined above.
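Equation (1) and the deck decomposition M = ∑_m A_m m can be checked mechanically for tiny n (a sketch, not from the paper; we encode x0 as '0', x1 as '1' and expand each entry of M as a multiset of words):

```python
from itertools import product
from collections import Counter

n = 2
N = n * n  # monomials have degree n^2

def expand_entry(p):
    """Expand (x0+x1)^{p-1} * x1 * (x0+x1)^{N-p} into a word multiset:
    every word with '1' at position p appears exactly once."""
    words = Counter()
    for pre in product('01', repeat=p - 1):
        for suf in product('01', repeat=N - p):
            words[''.join(pre) + '1' + ''.join(suf)] += 1
    return words

for i in range(1, n + 1):
    for j in range(1, n + 1):
        p = (i - 1) * n + j
        entry = expand_entry(p)
        for m in product('01', repeat=N):
            w = ''.join(m)
            A_m_ij = 1 if w[p - 1] == '1' else 0  # deck entry A_m[i,j]
            assert entry[w] == A_m_ij             # coefficient of m in M_ij
```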
Proof. Assume to the contrary that C is a homogeneous linear circuit of size O(n)
and depth O(log n) computing MY. We know by Valiant's graph-theoretic
argument (see, e.g., [6]) that in the circuit C there is a set of gates V of cardinality
s = c1·n/log log n = o(n) such that at least n² − n^{1+δ}, for δ < 1, input-output pairs
have all their paths going through V. Thus, we can write M = B1 B2 + E,
where B1 ∈ F⟨x0, x1⟩^{n×s}, B2 ∈ F⟨x0, x1⟩^{s×n}, and E ∈ F⟨x0, x1⟩^{n×n} with
|supp(E)| ≤ n^{1+δ}. By collecting the matrix coefficient of each monomial we can
express M and E as

M = ∑_{m ∈ {x0,x1}^{n²}} A_m m,  and  E = ∑_{m ∈ {x0,x1}^{n²}} E_m m,

where the A_m are already defined and |∪_{m ∈ {x0,x1}^{n²}} supp(E_m)| ≤ n^{1+δ}. Now
consider the matrix B1 B2. By collecting matrix coefficients of monomials we can
write B1 B2 = ∑_{m ∈ {x0,x1}^{n²}} B_m m.
We now analyze the matrices B_m. Crucially, by the homogeneity condition
on the circuit C, we can partition V = V1 ∪ V2 ∪ . . . ∪ V_ℓ, where each gate g in Vi
computes a linear form ∑_{j=1}^{n} γj yj and each γj ∈ F⟨x0, x1⟩ is a homogeneous
polynomial of degree di. Let si = |Vi|, 1 ≤ i ≤ ℓ. Then we have s = s1 + s2 + . . . + s_ℓ. Every
monomial m has a unique prefix of length di for each degree di associated with
the gates in V. Thus, we can write B_m = ∑_{j=1}^{ℓ} B_{m,j,1} B_{m,j,2}, where B_{m,j,1} is the
n × sj matrix corresponding to the dj-prefix of m and B_{m,j,2} is the sj × n matrix
corresponding to the (n² − dj)-suffix of m. It follows that for each monomial m
Remark 2. For the matrix M = (m_{ij}), as defined above, it does not seem that the
Shoup-Smolensky dimension method [10] can be used to prove a similar lower
bound. To see this, suppose Γ_M(n) is the set of all monomials of degree n in
{m_{ij}} and let D_M(n) be the dimension of the vector space over F spanned by the
set Γ_M(n). The upper bound for D_M(n) that we can show for a depth-d, size-O(n)
linear circuit over the ring F⟨x0, x1⟩ is as large as (O(n)/d)^{dn}. This bound,
unfortunately, is much larger than the bounds obtainable for the commutative
case [10]. On the other hand, the lower bound for D_M(n) is only n^{Θ(n)}. Thus,
we do not get a superlinear lower bound on the size using Shoup-Smolensky
dimensions when the coefficient ring is F⟨x0, x1⟩.
We next consider homogeneous depth-2 linear circuits. These are linear circuits
of depth 2, where each addition gate can have unbounded fanin. More precisely,
if g is an addition gate with inputs from g1, g2, . . . , gt, then the gate g computes
∑_{i=1}^{t} αi gi, where each edge (gi, g) is labeled by αi ∈ F⟨x0, x1⟩ such that the αi, 1 ≤
i ≤ t, are all homogeneous polynomials of the same degree. We again consider the
problem of computing MY for M ∈ F⟨x0, x1⟩^{n×n}. The goal is to lower bound
the number of wires in the linear circuit. This problem is also well studied for
linear circuits over fields, and only an explicit Ω(n log² n/log log n) lower bound is
known for it [6,9], although for random matrices the lower bound is Ω(n²/log n).
We show that for the explicit matrix M as defined above, computing MY by
a depth-2 homogeneous linear circuit (with unbounded fanin) requires Ω(n²/log n)
wires.
Proof. Let C be such a linear circuit computing MY. Since edges can be labeled
by constant-degree polynomials, we can first obtain a linear circuit C′
computing MY such that each edge is labeled by a homogeneous linear form,
with size(C′) = O(size(C) log n). From C′, we can obtain a noncommutative
algebraic branching program Ĉ that computes the palindrome polynomial
∑_{w ∈ {x0,x1}^{2 log n}} w·w^R such that size(Ĉ) = O(size(C′)). By Nisan's lower bound
[8], size(Ĉ) = Ω(n²), which implies size(C) = Ω(n²/log n).
Theorem 8. Any linear circuit, whose edge labels are restricted to be either
a homogeneous degree-4 log n polynomial or a scalar, computing MY requires
size Ω(n²), where M is the palindrome matrix. Moreover, there is a matching
upper bound.
Proof. Let C be any linear circuit computing MY. Each entry m_{ij} of the matrix
M can be written as a sum of products of polynomials, m_{ij} = ∑_{ρij} ∏_{e ∈ ρij} l(e),
where ρij ranges over the paths from input yj to output gate i in C and l(e) is the
label of edge e in C. Let S be the set of all edge labels in C that are degree-4 log n
polynomials. Thus, each m_{ij} is a linear combination of elements of S over F. This
implies that m_{ij} ∈ Span(S), where 1 ≤ i, j ≤ n. Since all the m_{ij} are distinct,
|S| ≥ n². Since the fanin is 2, size(C) ≥ n²/2 = Ω(n²).
For the upper bound, we use n² edges (n edges starting from each input yi), each
labeled by a corresponding monomial in M (of degree 4 log n), and then we add
relevant edges to get the output gates. Thus, the upper bound is O(n²) for computing
MY.
Note that, since we have not used noncommutativity in the proof, Theorem 8
also holds in the commutative setting (we require the Ω(n²) entries of M to be
distinct).
7 Concluding Remarks
For multiplicative circuits we could prove lower bounds only for large monoids
and large groups. The main question here is whether we can prove lower bounds
for an explicit function f : S n → S m , for some constant size nonabelian group
or monoid S.
We introduced the notion of rigidity for decks of matrices, but the only
explicit example we gave was the trivial one with a deck of size 2^{n²}. A natural
question is to give explicit constructions of smaller rigid decks of n × n
matrices, say of size n! or less. Or is the construction of rigid decks of smaller
size equivalent to the original matrix rigidity problem?
References
1. Boyar, J., Find, M.G.: Cancellation-free circuits in unbounded and bounded depth.
In: Gasieniec, L., Wolter, F. (eds.) FCT 2013. LNCS, vol. 8070, pp. 159–170.
Springer, Heidelberg (2013)
2. Charikar, M., Lehman, E., Liu, D., Panigrahy, R., Prabhakaran, M., Sahai, A.,
Shelat, A.: The smallest grammar problem. IEEE Transactions on Information
Theory 51(7), 2554–2576 (2005)
3. Jukna, S., Sergeev, I.: Complexity of linear boolean operators. Foundations and
Trends in Theoretical Computer Science 9(1), 1–123 (2013)
4. Lipton, R.J., Zalcstein, Y.: Word problems solvable in logspace. Journal of the
ACM (JACM) 24(3), 522–526 (1977)
5. Lohrey, M.: Algorithmics on SLP-compressed strings: A survey. Groups Complexity
Cryptology 4(2), 241–299 (2012)
6. Lokam, S.V.: Complexity lower bounds using linear algebra. Foundations and
Trends in Theoretical Computer Science 4(1-2), 1–155 (2009)
7. Morgenstern, J.: Note on a lower bound on the linear complexity of the fast Fourier
transform. Journal of the ACM (JACM) 20(2), 305–306 (1973)
8. Nisan, N.: Lower bounds for non-commutative computation (extended abstract).
In: STOC, pp. 410–418 (1991)
9. Pudlak, P.: Large communication in constant depth circuits. Combinatorica 14(2),
203–216 (1994)
10. Shoup, V., Smolensky, R.: Lower bounds for polynomial evaluation and interpola-
tion problems. Computational Complexity 6(4), 301–311 (1996)
11. Valiant, L.G.: Graph-theoretic arguments in low-level complexity. In: Gruska, J.
(ed.) MFCS 1977. LNCS, vol. 53, pp. 162–176. Springer, Heidelberg (1977)
Testing Low Degree Trigonometric Polynomials
1 Introduction
Probabilistically checkable proofs and property testing represent some of the
most important areas in theoretical computer science within the last two decades.
Among the many deep results obtained one highlight is the proof of the PCP
theorem [2,3] giving an alternative characterization of complexity class NP in
the Turing model. An alternative proof of the theorem has been given by Dinur
more recently [10].
A branch of computability and complexity theory alternative to the Turing
approach, dealing with real and complex number computations, has been
developed by Blum, Shub, and Smale in [8]; see also [7]. It presents a model of
uniform algorithms in an algebraic context, following the tradition of algebraic
complexity theory [9]. As a major open problem, the analogue of the classical
P versus NP question remains unsolved in the BSS model, both for real and for
complex number complexity theory. We assume the reader to be familiar with
the basic definitions of complexity classes in this model; see [7].
Given the tremendous importance that probabilistically checkable proofs and the
PCP theorem have in classical complexity theory, for example with respect
to the areas of property testing and approximation algorithms, it seems natural
to analyze such verification procedures and the corresponding classes in the BSS
model as well. This refers both to the question of which kinds of interesting
properties can be verified with high probability in the algebraic framework and
to the validity of PCP theorems.
Both authors were supported under projects ME 1424/7-1 and ME 1424/7-2 by the
Deutsche Forschungsgemeinschaft DFG. We gratefully acknowledge the support.
E.A. Hirsch et al. (Eds.): CSR 2014, LNCS 8476, pp. 77–96, 2014.
© Springer International Publishing Switzerland 2014
78 M. Baartse and K. Meer
The starting point of this paper is the original proof of the classical PCP theorem
for NP [2,3]. It consists basically of three major steps. The easiest is the
construction of so-called long transparent proofs for problems in NP. It relies on
testing linear functions: given a table of function values, a verifier checks
with high probability whether the table, in a certain sense, is close to a linear
function. Next, another verification procedure is designed, this time based on testing
low-degree (algebraic) polynomials over finite fields instead of linear functions.
It is combined with a so-called sum-check procedure to obtain a different verifier
for problems in NP. In a third step, the two verification procedures are then
combined by applying a clever new technique of composing verifiers.
Our main result in the present work is that the second step above, i.e., a
structurally similar verification procedure for certain low degree trigonometric
polynomials can be set up in the real number model as well.
dealing with finite fields, new difficulties arise that way. The major drawback is
the following. All the above mentioned tests rely on restricting the given function
table to one-dimensional subsets, thus working with univariate polynomials
during the test. However, in contrast to algebraic polynomials, the degree of a
univariate restriction of such a trigonometric polynomial to an arbitrary line in
F^k is not bounded by its original multivariate degree. Depending on the line
chosen, the degree of such a restriction can grow too much. This implies that
not all lines are appropriate as restrictions to work with. As a consequence, the
design of a suitable set H ⊂ F^k of directions of test lines and the analysis of
a corresponding test require considerable additional technical effort. The latter
is twofold. First, using the theory of expander graphs, one has to establish that
the set H is small, but still rich enough to cover all of F^k in a reasonable sense.
Secondly, it must be shown that a function table which does not give errors on
H with high probability is close to a trigonometric polynomial on F^k.
As main result we obtain a verification procedure for trigonometric polyno-
mials that inspects a constant number of relatively small blocks of proof compo-
nents, thus giving a low degree test which respects the structural requirements
necessary for verifier composition. Independently of this aspect, we extend the
still small list of interesting real number properties for which a probabilistic ver-
ification is possible. In particular, as far as we know trigonometric polynomials
have not yet been used in the realm of real number complexity theory. Given
the huge importance of Fourier analysis this might be interesting to be studied
further.
The paper is organized as follows. Section 2 introduces trigonometric poly-
nomials that map elements from a k-dimensional vector space over a finite field
into the reals. The main task of testing whether a table of real values arises from
such a polynomial is described precisely and a test to figure this out is given.
The rest of the paper then is devoted to prove that the test has the required
properties. Towards this aim two major theorems have to be shown; this is done
in Section 3. The concluding section will discuss the importance of the result in
view of (re-)proving the real PCP theorem; we explain how to choose the pa-
rameters in our statements to use it for PCPs and outline what has to be done
to obtain the full PCP theorem over R.
A final remark: Some of the proofs necessary to establish our main result are
quite technical. Given the page restriction in this extended abstract we thus
focus on presenting the main proof structure. Full proofs have to be postponed
in most cases to the full paper.
Let us start with defining the main objects of this paper, namely trigonometric
polynomials. Let F be a finite field with q := |F| being prime. As usual, we
identify F with {0, . . . , |F| − 1}. We want to consider particular real-valued
functions defined on some F^k.
f(x1, . . . , xk) = a0(x1, . . . , x_{k−1}) + ∑_{m=1}^{d} am(x1, . . . , x_{k−1}) · cos(2πm·xk/q)
+ ∑_{m=1}^{d} bm(x1, . . . , x_{k−1}) · sin(2πm·xk/q),

or equivalently, in exponential form,

f(x1, . . . , xk) = ∑_t ct · exp((2πi/q) ∑_{j=1}^{k} xj tj),

where the sum is taken over all t := (t1, . . . , tk) ∈ Z^k with |t1| ≤ d, . . . , |tk| ≤ d
and the ct ∈ C satisfy ct = the complex conjugate of c_{−t} for all such t.
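The conjugate-symmetry condition on the ct is exactly what makes the exponential form real-valued; a numerical sketch with assumed toy parameters q = 11, k = 2, d = 3 (not from the paper):

```python
import cmath
import random

q, k, d = 11, 2, 3
random.seed(1)

# coefficients c_t for |t_j| <= d with c_t = conj(c_{-t})  =>  f is real
coeffs = {}
ts = [(t1, t2) for t1 in range(-d, d + 1) for t2 in range(-d, d + 1)]
for t in ts:
    if t in coeffs:
        continue
    neg = tuple(-x for x in t)
    if neg == t:
        coeffs[t] = complex(random.random(), 0.0)  # self-paired => real
    else:
        c = complex(random.random(), random.random())
        coeffs[t] = c
        coeffs[neg] = c.conjugate()

def f(x):
    return sum(c * cmath.exp(2j * cmath.pi / q
                             * sum(xj * tj for xj, tj in zip(x, t)))
               for t, c in coeffs.items())

for x in [(0, 0), (3, 7), (10, 5)]:
    assert abs(f(x).imag) < 1e-9   # real-valued on F^k
```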
In all situations below the potential degrees we work with will be much less than
the field’s cardinality.
Since we shall mainly deal with trigonometric polynomials in this paper we
drop most of the times the term ’trigonometric’. Whenever we refer to usual
algebraic polynomials we state it explicitly.
The ultimate goal of this paper is to design a verifier which performs a test
whether a given table of function values is generated with high probability by a
multivariate polynomial. More precisely, the following is our main result.
The verifier first uniformly generates O(k·log q) random bits; next, it uses
the random bits to determine a point x ∈ F k together with one segment in
the proof string it wants to read. Finally, using the values of f (x) and those
of the chosen segment it performs a test (to be described below). According
to the outcome of the test the verifier either accepts or rejects the input.
The running time of the verifier is polynomially bounded in the quantity
k · log q, i.e., polylogarithmic in the input size O(k · q^{2k}).
2. For every function value table representing a trigonometric max-degree d
polynomial there exists a proof string such that the verifier accepts with prob-
ability 1.
3. For any 0 < ε < 10^{−19} and for every function value table whose distance
to a closest max-degree 2hkd polynomial is at least 2ε, the probability that
the verifier rejects is at least ε, no matter what proof string is given. Here,
for two functions f, g : F^k → R, their distance is defined as dist(f, g) :=
|{x ∈ F^k | f(x) ≠ g(x)}| / |F^k|.
The first and the second property in the theorem will follow directly from the
description of the test. Proving the last property - as usual with such statements
- is the main task. Repeating the verifier’s computation constantly many times
decreases the error probability below any given fixed positive constant.
Performing (one round of) the test, the verifier reads 2hkd + k + 1 real numbers.
Thus, it can only test for a local property of low degree polynomials f : F^k → R.
A major amount of work will be to figure out what this local property should
look like. The starting idea is common for low degree tests, namely to consider
univariate restrictions along certain lines of F^k. The segments of a proof string
mentioned above precisely represent the coefficients of such a univariate restriction.
An advantage of using a finite field as domain is that such lines only contain |F|
many points. So we do not have to deal with the problem of splitting the domain
into a large test domain and a small safe domain as is, for example, the case
with the real linearity test from [12]. On the other hand, it will turn out not to
be a good idea to allow an arbitrary line in F^k for such a test, as is done in
the classical approach [2]. A fair amount of work will be necessary to figure out
a suitable subset H ⊂ F^k of lines for which the test shall be performed.
As mentioned above, the verifier expects each segment of the proof to specify a
univariate polynomial of appropriate degree on a line. Since univariate restrictions
of trigonometric polynomials along a line behave a bit differently than those of
algebraic polynomials, some care is necessary. Let f : F^k → R be a (trigonometric)
max-degree d polynomial and let ℓ := {x + tv | t ∈ F}, x, v ∈ F^k, be a line. For
determining an upper bound on the degree of the univariate restriction of f to ℓ,
it turns out to be helpful to define a kind of absolute value for the elements of F.
The definition is inspired by the fact that if we later on restrict a trigonometric
polynomial to lines with small components in absolute value, the resulting
univariate polynomials have a relatively small degree. For t ∈ F = {0, . . . , |F| − 1}
put

|t| = t if t < |F|/2,  and  |t| = |F| − t if t > |F|/2.
If a univariate polynomial p : F → R has degree d, then for a, b ∈ F the
polynomial t → p(a + bt) has degree at most d · |b|, and thus t → f(x + tv) has
degree at most d · ∑_{i=1}^{k} |vi|, where v = (v1, . . . , vk). This is an easy consequence
of Definition 1.
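The frequency-growth fact behind this bound can be checked numerically (a sketch with assumed toy parameters q = 101, m = 3, a = 7, b = 5, not from the paper): restricting cos(2πmx/q) along the line x = a + bt yields a univariate trigonometric polynomial whose only frequency is mb mod q, of absolute value at most m·|b|.

```python
import numpy as np

q = 101  # prime field size
def absval(t):
    t %= q
    return t if t < q / 2 else q - t

# restrict cos(2*pi*m*x/q) along the line x = a + b*t:
m, a, b = 3, 7, 5
ts = np.arange(q)
vals = np.cos(2 * np.pi * m * (a + b * ts) / q)
spec = np.fft.fft(vals)
# frequencies actually present in the restriction (up to numerical noise)
present = [absval(f) for f in range(q) if abs(spec[f]) > 1e-6]
# the univariate restriction has degree (max frequency) at most m*|b|
assert max(present) <= m * absval(b)
```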
For the test performed by the verifier we want to specify a suitable set H ⊂ F^k
of lines along which univariate restrictions are considered. Suitable here refers
to the maximal degree such a restriction could have, given a max-degree d
multivariate polynomial. This maximal degree should, in a certain sense, be small. The
constant parameter h in Theorem 1 determines what we mean by small. Though
h = 10^{15} of course is a large constant, the decisive point is its independence of
d, k, q.
Definition 2. Let F be a finite field, k ∈ N, and let h be as in Theorem 1. The set
H is defined to be any subset of F^k \ {0} satisfying the following two conditions:
i) For every v := (v1, . . . , vk) ∈ H, v ≠ 0, we have |v| := max{|v1|, . . . , |vk|} ≤ h,
and
ii) if for a fixed v ∈ F^k several points in the set {tv | t ∈ F} satisfy condition i),
only one of them is included in H.
Condition i) requires the direction of lines that we consider to have small com-
ponents, whereas condition ii) just guarantees that each line (as point set) is
included at most once. We abstain from specifying which v we include in such a
case and just fix H as one such set.
If a k-variate polynomial of max-degree d is restricted to a line {x + tv|t ∈ F }
whose direction v belongs to H, then the resulting univariate polynomial has
degree at most hkd. Note that for the values chosen hkd is much smaller than |F|/2. In later arguments the cardinality of H will be important, so let us say
something about it already now. For h sufficiently smaller than |F| there are (2h + 1)^k − 1 elements v ∈ F^k \ {0} such that |v| ≤ h. For every such v also |−v| ≤ h, therefore (1/2)((2h + 1)^k − 1) is an upper bound for |H|. It can be shown that for increasing k the fraction |H| / ((1/2)((2h + 1)^k − 1)) approaches 1.
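For small parameters such a set H can be enumerated directly, confirming the cardinality bound (a sketch; q, k, h are toy values, not those of Theorem 1):

```python
from itertools import product

def field_abs(t, q):
    return t if t < q / 2 else q - t

def build_H(q, k, h):
    """One admissible choice of H: all v != 0 with max_i |v_i| <= h,
    keeping a single representative per line {t*v | t in F}."""
    H, seen = [], set()
    for v in product(range(q), repeat=k):
        if v == (0,) * k or max(field_abs(c, q) for c in v) > h:
            continue
        # identify the line through v by its set of nonzero multiples
        line = frozenset(tuple((t * c) % q for c in v) for t in range(1, q))
        if line not in seen:
            seen.add(line)
            H.append(v)
    return H

q, k, h = 101, 2, 3
H = build_H(q, k, h)
# |H| <= (1/2)*((2h+1)^k - 1), since v and -v always span the same line:
assert 2 * len(H) <= (2 * h + 1) ** k - 1
```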
Given a table of function values for an f : F k → R the verifier now expects
the following information from a potential proof that f is a trigonometric poly-
nomial of max-degree d. For every line in F k which has a direction v ∈ H
the proof should provide a segment of real numbers which represent a univariate
polynomial as follows. The segment consists of a point x ∈ F^k on the line as well as reals a_0, . . . , a_{hkd}, b_1, . . . , b_{hkd}. The verifier will interpret this information as the univariate polynomial with coefficients a_i, b_j that ideally, i.e., for a trigonometric polynomial, represents f's restriction to ℓ, parametrized as t → f(x + tv). Obviously, there are several different parametrizations depending on the point x.
Since F is discrete the objects picked in Step 1 can be addressed using O(k ·
log |F |) random bits. Note that this first step is the same as saying we pick
a random direction v ∈ H together with a random line among those having
direction v. There are |F | many points on each such line, i.e., |F | choices for x
result in the same line.
theorem gives for the rejection probability gets very small (and even can become
negative). So intuitively the theorem states that if the rejection probability is
small, then f is either very close or very far away from a max-degree 2hkd
polynomial.
Thus we have to deal with the case that, though the rejection probability might be small given the above lower bound, f is far away from such a polynomial. Theorem
3 basically shows that this case cannot occur using the following idea: if for a
function f and a proof string π the probability of rejection is small, then f and π
can be changed in a number of small steps such that these changes do not increase
the rejection probability too much and in the end a max-degree 2hkd polynomial
f_s is obtained. Since by Theorem 2 such a transformation process would not be possible if f were far away from any max-degree 2hkd polynomial (the
process would have to cross functions for which the test rejects with higher
probability), it follows that a reasonably small rejection probability only occurs
for functions f that were already close to a max-degree 2hkd polynomial.
Theorem 3. Let 0 < ε ≤ 10^−19 and let a function f_0 together with a proof π_0 be given. If the low-degree test rejects with probability at most ε, then there is a sequence (f_0, π_0), (f_1, π_1), . . . , (f_s, π_s) such that
1. for every i ≤ s the probability that the test rejects input (f_i, π_i) is at most 2ε,
2. for every i < s the functions f_i and f_{i+1} differ in at most |F| arguments and
3. the function f_s is a max-degree 2hkd polynomial.
Note that it is only the existence of such a sequence that we need for the proof of Theorem 1; it does not describe anything that the test does. So it does not
have to be (efficiently) computable.
Assuming validity of both theorems the Main Theorem can be proven as
follows:
Proof. (of Main Theorem 1) Statements 1 and 2 of the theorem being obvious from the foregoing explanations, let us assume we are given a function value table for f : F^k → R together with a verification proof π such that the low-degree test rejects with probability at most ε ≤ 10^−19. We will show that the distance δ from f to the set of max-degree 2hkd polynomials is at most 2ε. In order to
avoid tedious calculations the arguments are given on a slightly more abstract
level based on continuity, however it is no problem to fill in the precise values
for δ in all cases.
|H| lines and taking into account that there are |U| = δ|F|^k choices for x, there can be at most

δ|F|^k · |H| · 4h^2k^2d

triples in C satisfying the first alternative.
Next we aim for a lower bound on the total number of triples in C. This easily
implies a lower bound on the number of triples satisfying the second alternative;
from that a lower bound for the probability that the test rejects can be obtained.
A lower bound on |C| is given in the following proposition. It basically says
that the set H of test directions is sufficiently large. Its proof requires consider-
able technical efforts relying on the theory of expander graphs, so we just give
the statement here.
Proposition 1. Let U ⊆ F^k be as above (or any other arbitrary set with cardinality δ|F|^k) with 0 ≤ δ ≤ 1. Then there are at least

(2h/(2h + 1)) · δ(1 − δ)|H|(|F| − 1)|F|^k

pairs (x, y) ∈ U × (F^k − U) such that the line through x and y has direction in H.
This proposition together with the above claim implies that the number of triples in C satisfying the second alternative is at least

(2h/(2h + 1)) · δ(1 − δ)|H|(|F| − 1)|F|^k − δ|F|^k · |H| · 4h^2k^2d.
In order to finish the proof of Theorem 2 an alternative view of the low-degree
test helps. The test can as well be seen as first choosing randomly two points x, y
such that they determine a direction v ∈ H. Since there are |H| directions and |F| points on each line there are |H|(|F| − 1)|F|^k such triples (x, v, y) in total. Then, with probability 1/2 the test decides whether to check if p_{v,x}(τ_x) = f(x) or if p_{v,x}(τ_y) = f(y). Since triples in C that satisfy alternative 2 result in an error for the low-degree test if the appropriate point for evaluation is chosen, its probability for rejection is at least

(2h/(2h + 1)) · δ(1 − δ) − (4h^2k^2d/(|F| − 1)) · δ.
Half of this value is contributed by triples (x, v, y) ∈ C for which p_{v,x}(τ_x) ≠ f(x) or p_{v,x}(τ_y) ≠ f(y). The other half arises from triples (y, v, x) for which (x, v, y) ∈ C and p_{v,x}(τ_x) ≠ f(x) or p_{v,x}(τ_y) ≠ f(y). □
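The lower bound just derived can be evaluated numerically (an illustrative sketch; the chosen h, k, d are hypothetical toy values, far below the actual h = 10^15):

```python
def rejection_lower_bound(delta, h, k, d, q):
    """The bound (2h/(2h+1)) * delta*(1-delta) - (4h^2k^2d/(q-1)) * delta."""
    return (2 * h / (2 * h + 1)) * delta * (1 - delta) \
        - (4 * h**2 * k**2 * d / (q - 1)) * delta

# With a field satisfying |F| >= 10^4 * (2hkd + 1)^3 the negative term is tiny:
h, k, d = 1, 2, 1
q = 10**4 * (2 * h * k * d + 1) ** 3
assert rejection_lower_bound(0.1, h, k, d, q) > 0.05
```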
Proof. For a fixed x ∈ F^k let m_x be the number of v ∈ H such that the value p_{v,x}(τ_x) specified by π satisfies p_{v,x}(τ_x) = f(x). The rejection probability is 1 − (Σ_x m_x)/(|F|^k · |H|). Now for any v ∈ H there cannot be more than m_x many v′ ∈ H such that (x, v, v′) satisfies p_{v,x}(τ_x) = p_{v′,x}(τ_x). Applying this to the |H| many directions v, there are at most |H| · Σ_x m_x triples that do not contribute to π's inconsistency. Rearranging shows that the inconsistency is at least as large as the rejection probability.
Vice versa, for fixed x there must be an H′ ⊂ H with |H′| = m_x such that for all v, v′ ∈ H′ the m_x^2 many equations p_{v,x}(τ_x) = p_{v′,x}(τ_x) are satisfied. Thus the inconsistency is upper bounded by 1 − (Σ_x m_x^2)/(|F|^k · |H|^2). The latter is easily shown to be at most twice the rejection probability, i.e., 2 · (1 − (Σ_x m_x)/(|F|^k · |H|)). □
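The final step ("easily shown") reduces to the pointwise inequality 2a − a² = 1 − (1 − a)² ≤ 1 for a_x := m_x/|H| ∈ [0, 1]; a quick numeric sanity check with random stand-in values:

```python
import random

random.seed(0)
# a_x = m_x / |H| lies in [0, 1]; since 2a - a^2 = 1 - (1 - a)^2 <= 1,
# averaging over x gives 1 - mean(a^2) <= 2 * (1 - mean(a)).
a = [random.random() for _ in range(1000)]
upper = 1 - sum(x * x for x in a) / len(a)   # bound on the inconsistency
twice = 2 * (1 - sum(a) / len(a))            # twice the rejection probability
assert upper <= twice
```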
Without loss of generality we may assume that for the input pair (f_0, π_0) function f_0 is already a majority function for π_0. Else we could just define the first pair in the sequence of (f_i, π_i) by changing stepwise one value of f_0 while
The proof needs several additional technical results. Let us first collect them and
then prove the proposition. The following definition specifies certain point sets
important in the further analysis.
Definition 5. Let a pair (f, π) as above be given. Let α := 10^−2 for the rest of the paper.
a) Define S ⊆ F^k to consist of those points x for which the fraction of directions v ∈ H satisfying p_{v,x}(τ_x) = f(x) is less than 1 − α.
b) For v ∈ H define S(v) ⊆ F^k as S(v) := {x ∈ F^k | x ∉ S and p_{v,x}(τ_x) ≠ f(x)}.
The set S contains those points for which there are relatively many, namely
at least α|H|, inconsistencies between different line polynomials through x and
the value f (x). The set S(v) on the other hand contains the points for which
most of the line polynomials agree with f on x, but the particular pv,x does
not. As a consequence, the latter disagrees with most of the others with respect to
point x.
The main purpose of the following proposition is to pick out a line along which the given proof can be changed in such a way that its inconsistency reduces. For obtaining this line ℓ_{v*,x*} the objects x* and v* are determined by the following crucial proposition. Due to its length and technical nature the proof has to be postponed to the full version. We also note that at this point the significance of Proposition 3 may be hard to see. Its meaning will become clear in the proof of Proposition 2.
Proposition 3. There exist a point x* ∈ F^k, a direction v* ∈ H and a subset H′ ⊆ H such that
1. x* ∈ S(v*);
2. at most (1/40)α · |F| points on ℓ_{v*,x*} belong to S;
3. |H′| ≥ (1 − 4α)|H| and
4. for all v′ ∈ H′
i) the fraction of pairs (t, s) ∈ F^2 for which p_{v*, x*+sv′}(τ_{x*+tv*+sv′}) ≠ p_{v′, x*+tv*}(τ_{x*+tv*+sv′}) is at most 1/4 and
ii) the fraction of s ∈ F for which p_{v*, x*+sv′}(τ_{x*+sv′}) ≠ p_{v′, x*}(τ_{x*+sv′}) is at most 1/2.
Fig. 1. The figure shows those values that are compared in the fourth item of Propo-
sition 3
The second technical result that we need is a direct adaptation of a similar lemma
by Arora and Safra [3] to trigonometric polynomials. It says that if the entries
of an |F | × |F | matrix both row-wise and column-wise arise to a large extent
from univariate polynomials, then the majority of values of the entire matrix
arise from a bivariate polynomial.
Lemma 2. (see [2], adapted for trigonometric polynomials) Let d̃ ∈ N, |F| ≥ 10^4(2d̃ + 1)^3. Suppose there are two sets of univariate trigonometric degree d̃ polynomials {r_s}_{s∈F} and {c_t}_{t∈F} such that the fraction of pairs (s, t) ∈ F^2 for which there is a disagreement, i.e., r_s(t) ≠ c_t(s), is at most 1/4. Then there exists a bivariate trigonometric max-degree d̃ polynomial Q(s, t) such that for at least a 2/3-fraction of rows s it holds that r_s(t) ≡ Q(s, t); similarly for at least a 2/3-fraction
1/3 − 2hkd/|F|.
Proof of Claim 1: By Lemma 2, R^{(v′)} is the unique polynomial to which the function t → c_t^{(v′)}(0) is close with a distance of at most 1/3. If R^{(v′)} ≢ R^{(v″)}, then as polynomials of degree hkd they differ in at least |F| − 2hkd points, thus t → c_t^{(v′)}(0) and t → c_t^{(v″)}(0) have at least the claimed distance.
Next consider the number of inconsistencies on ℓ_{v*,x*}, i.e., the number of triples (y, v, w) ∈ ℓ_{v*,x*} × H^2 for which p_{v,y}(τ_y) ≠ p_{w,y}(τ_y). Proposition 3 intuitively implies that the number of inconsistencies cannot be too large. On the other hand, Claim 1 above implies that any two v′, v″ for which R^{(v′)} ≢ R^{(v″)} will lead to many inconsistencies on ℓ_{v*,x*}. Hence, for most v′, v″ ∈ H′ it will be the case that R^{(v′)} ≡ R^{(v″)}. More precisely:
Claim 2: The number of pairs (v′, v″) ∈ (H′)^2 for which R^{(v′)} ≡ R^{(v″)} is at least

( (1 − 4α)^2 − 2α / (1/3 − (1/40)α − 2hkd/|F|) ) · |H|^2. (1)
For any y ∉ S there are at least (1 − α)|H| directions w ∈ H such that the values p_{w,y}(τ_y) coincide with f(y) and thus with each other; so for such a fixed y at least (1 − α)^2|H|^2 triples will not result in an inconsistency. Vice versa, at most (1 − (1 − α)^2)|H|^2 ≤ 2α|H|^2 inconsistencies can occur. Since there are at most |F| choices for y ∈ ℓ_{v*,x*} we have the following upper bound:

|{(y, v, w) ∈ ℓ_{v*,x*} × H^2 | y ∉ S and p_{v,y}(τ_y) ≠ p_{w,y}(τ_y)}| ≤ 2α|H|^2|F|. (2)
Next consider inconsistencies (y, v′, v″) caused by (v′, v″) ∈ (H′)^2 such that R^{(v′)} ≢ R^{(v″)}. According to Claim 1 each such pair (v′, v″) implies the existence of at least (1/3)|F| − 2hkd points y ∈ ℓ_{v*,x*} such that (y, v′, v″) is an inconsistency for π. Requiring in addition y ∉ S according to Proposition 3 will still give at least (1/3)|F| − 2hkd − (1/40)α|F| many such y, i.e., for each (v′, v″) ∈ (H′)^2 with R^{(v′)} ≢ R^{(v″)} it holds

|{y ∈ ℓ_{v*,x*} | y ∉ S and p_{v′,y}(τ_y) ≠ p_{v″,y}(τ_y)}| ≥ (1/3)|F| − 2hkd − (1/40)α|F|. (3)
Combining (2) and (3) it follows that the number of pairs (v′, v″) ∈ (H′)^2 for which R^{(v′)} ≢ R^{(v″)} is upper bounded by

2α|H|^2|F| / ((1/3)|F| − 2hkd − (1/40)α|F|) = ( 2α / (1/3 − (1/40)α − 2hkd/|F|) ) · |H|^2.
From α := 10^−2 in Definition 5 and our assumption that |F| ≥ 10^4(2hkd + 1)^3 it follows that β > 0.84.
Claim 4: The majority polynomial R and r_0 are different: R ≢ r_0.
Proof of Claim 4: Recall that by definition r_0(t) equals r_0^{(v′)}(t) for each v′ ∈ H′ and is the polynomial which is claimed by π on ℓ_{v*,x*}. Similarly, for the majority of v′ ∈ H′ the polynomial R(t) equals R^{(v′)}(t). We prove Claim 4 by showing that the particular value R(0) is attained for more choices of v′ ∈ H′ than r_0(0).
First note that item 4.ii) of Proposition 3 for all v′ ∈ H′ implies c_0^{(v′)}(s) := p_{v*, x*+sv′}(τ_{x*+sv′}) = p_{v′, x*}(τ_{x*+sv′}) for at least (1/2)|F| values of s. Next, Lemma 2 implies for each v′ ∈ H′ that for at least (2/3)|F| values of s it holds r_s^{(v′)}(t) ≡ Q^{(v′)}(s, t) as polynomials in t. For those s it follows in particular that Q^{(v′)}(s, 0) = r_s^{(v′)}(0) = p_{v*, x*+sv′}(τ_{x*+sv′}).
Combining the two equations results for each v′ in at least (2/3 − 1/2)|F| many values for s for which c_0^{(v′)}(s) = Q^{(v′)}(s, 0). Now since both functions are univariate polynomials of degree at most hkd they are equal as long as |F| is large enough.
Next, it follows that p_{v′,x*}(τ_{x*}) = c_0^{(v′)}(0) = Q^{(v′)}(0, 0) = R^{(v′)}(0); by definition of R the latter, and hence p_{v′,x*}(τ_{x*}), equals the value R(0) for at least β|H| choices of v′ ∈ H′.
On the other hand it is x* ∈ S(v*), thus for at most α|H| many w ∈ H the value r_0(0) = p_{v*,x*}(τ_{x*}) coincides with p_{w,x*}(τ_{x*}). But β > α, therefore the claim R ≢ r_0 follows.
What remains to be done is to show that using R instead of r0 in the corre-
sponding segment of π strictly reduces its inconsistency.
Claim 5: The number of pairs (y, w) with y ∈ ℓ_{v*,x*} and w ∈ H for which p_{w,y} agrees with R on y is larger than the number of such pairs for which p_{w,y} agrees with r_0 on y.
Proof of Claim 5: Since R ≢ r_0 they agree in at most 2hkd points on ℓ_{v*,x*}. By the inclusion-exclusion principle it thus suffices to show that among the |F||H| triples of the form (y, v*, w), y ∈ ℓ_{v*,x*}, w ∈ H (note that v* is fixed) there are more than ((1/2)|F| + 2hkd)|H| many for which p_{w,y} agrees with R on y.
By Lemma 2 and Claim 3 there exist β|H| directions v′ ∈ H′ for which the distance from t → c_t^{(v′)}(0) to R is at most 1/3. It follows that |{(y, w) ∈ ℓ_{v*,x*} × H | p_{w,y} agrees with R on y}| ≥ β|H| · (2/3)|F|.
Plugging in the bounds for β and |F| gives β|H| · (2/3)|F| > ((1/2)|F| + 2hkd)|H|. This finishes the proof of Claim 5 and thus also the one of Proposition 2. □
Proof. (of Theorem 3) We have shown that given a verification proof π and a function f which is a majority function of π and not a max-degree 2hkd polynomial, we can construct a verification proof π′ with a majority function f′ such that the following holds.
– The univariate polynomials that π and π′ claim differ on one line (i.e. π and π′ differ in one segment) and f and f′ disagree in at most |F| places.
– The inconsistency of π′ is strictly less than the inconsistency of π.
If we apply this construction iteratively it must come to an end after finitely
many steps because the inconsistency cannot be reduced an unbounded number
of times. Hence, at some point we must obtain a function f_s which is a max-degree 2hkd polynomial. Lemma 1 implies that for each (f_i, π_i) in the sequence the rejection probability is at most 2ε and this finishes the proof. □
Here is a brief outline of how to finish a proof of the real PCP theorem along
these lines. The next step is to use the low-degree test of the present paper
to design a sum-check procedure establishing (once again) the characterization
NPR = PCPR (O(log n), poly log n). This in principle is done using the ideas from
[13]. As in the original proof the verifier resulting from combining the low-degree
test and the sum-checking procedure lacks the necessary segmentation properties
for applying verifier composition; it reads too many blocks. To repair this a
second test is developed which together with using the low-degree test allows to
restructure sum-checking in an appropriate way so that a properly segmented
version is obtained. Though this procedure is not as general as the original one by Arora et al., which gives quite a general segmentation procedure, it turns out to be sufficient for our purposes. In a final step, composition of real
verifiers has to be worked out in detail and applied to the long transparent verifier
from [12,6] and the verifier obtained from the above considerations. Filling all
details requires a significant amount of work and space. Therefore we postpone
it to the full version as explained in the introduction.
Let us mention briefly another outcome once a proof of the PCP theorem as
indicated above is at hand. In our main result there seemingly is a discrepancy
with respect to the test accepting degree d polynomials on one side and rejecting
functions far away from degree 2hkd polynomials on the other. This lack of sharpness is of no concern for our results but seems a bit unusual in comparison
to similar results in testing. However, the full proof can be used to close this
degree gap and make the result sharp from that point of view.
Another line of future research addresses the area of locally testable codes. The
paper shows that trigonometric polynomials can be used as such codes over the
real numbers. So far, not many results in this direction have been obtained in
the BSS model. For example, what about tests for algebraic polynomials? Using
our test for trigonometric polynomials we expect it is possible to design a test
for algebraic polynomials which uses a logarithmic number of random bits and
makes a constant number of queries.
Acknowledgement. Thanks go to the anonymous referees for very helpful
remarks.
References
1. Arora, S., Barak, B.: Computational Complexity: A Modern Approach. Cambridge
University Press (2009)
2. Arora, S., Lund, C., Motwani, R., Sudan, M., Szegedy, M.: Proof verification and
hardness of approximation problems. Journal of the ACM 45(3), 501–555 (1998)
3. Arora, S., Safra, S.: Probabilistic checking of proofs: A new characterization of NP.
Journal of the ACM 45(1), 70–122 (1998)
4. Ausiello, G., Crescenzi, P., Gambosi, G., Kann, V., Marchetti-Spaccamela, A., Pro-
tasi, M.: Complexity and Approximation: Combinatorial Optimization Problems
and Their Approximability Properties. Springer (1999)
5. Baartse, M., Meer, K.: The PCP theorem for NP over the reals. To appear in
Foundations of Computational Mathematics. Springer
6. Baartse, M., Meer, K.: Topics in real and complex number complexity theory. In:
Montana, J.L., Pardo, L.M. (eds.) Recent Advances in Real Complexity and Com-
putation, Contemporary Mathematics, vol. 604, pp. 1–53. American Mathematical
Society (2013)
7. Blum, L., Cucker, F., Shub, M., Smale, S.: Complexity and Real Computation.
Springer (1998)
8. Blum, L., Shub, M., Smale, S.: On a theory of computation and complexity over
the real numbers: NP-completeness, recursive functions and universal machines.
Bull. Amer. Math. Soc. 21, 1–46 (1989)
9. Bürgisser, P., Clausen, M., Shokrollahi, M.A.: Algebraic Complexity Theory.
Springer (1997)
10. Dinur, I.: The PCP theorem by gap amplification. Journal of the ACM 54(3) (2007)
11. Friedl, K., Hátsági, Z., Shen, A.: Low-degree tests. In: Proc. SODA, pp. 57–64
(1994)
12. Meer, K.: Transparent long proofs: A first PCP theorem for NPR. Foundations of Computational Mathematics 5(3), 231–255 (2005)
13. Meer, K.: Almost transparent short proofs for NPR . In: Owe, O., Steffen, M., Telle,
J.A. (eds.) FCT 2011. LNCS, vol. 6914, pp. 41–52. Springer, Heidelberg (2011)
Property Testing Bounds for Linear and
Quadratic Functions via Parity Decision Trees
1 Introduction
The field of property testing broadly deals with determining whether a given
object satisfies a property P or is very different from all the objects that satisfy
P. In this paper, the objects of interest are Boolean functions on n variables,
i.e. functions of the form
f : {0, 1}n → {0, 1}.
A Boolean function property P is a collection of Boolean functions. Given a
function g and a parameter ε, the goal of a tester is to distinguish between the
following two cases:
E.A. Hirsch et al. (Eds.): CSR 2014, LNCS 8476, pp. 97–110, 2014.
c Springer International Publishing Switzerland 2014
98 A. Bhrushundi, S. Chakraborty, and R. Kulkarni
– g∈P
– g differs from every function in P in at least an ε fraction of points in {0, 1}^n.
The query complexity for testing P is the number of queries (of the form “what
is the value of g at x ∈ {0, 1}n?”) made by the best tester that distinguishes
between the above two cases. If the queries made by the tester depend on the
answers to the previous queries, the tester is called adaptive. Also, if the tester
accepts whenever g ∈ P, it is called one-sided.
Testing of Boolean function properties has been extensively studied over the
last couple of decades (See [16,28]). Examples of problems that have been stud-
ied are linearity testing [11], k-junta testing [17,7], monotonicity testing [18,13],
k-linearity testing [19,8,12] etc. An important problem in the area is to char-
acterize Boolean function properties whose query complexity is constant (i.e.,
independent of n, though it can depend on ε). For example, such a characterization is known in the case of graph properties [1]. Though a general characterization for function properties is not yet known, there has been progress for some
special classes of properties. In this paper, we attempt to characterize one such class: properties which only consist of linear functions. More specifically, we try to characterize all properties P of linear Boolean functions which can be tested using a constant number of queries.
An example of a property of linear functions is one that contains all parities
on k variables. The problem of testing this property is known as k-linearity
testing. While this problem had been studied earlier [19], recently Blais et al. [8]
used communication complexity to obtain a lower bound of Ω(k) on the query
complexity of adaptive testers for k-linearity. The best known upper bound in
the case of adaptive testers is O(k log k). Whereas a tight bound of Θ(k log k) is
known for the non-adaptive case [12], a gap still exists for adaptive testing: Ω(k)
vs O(k log k). In this paper we give another approach to obtain the Ω(k) lower
bound for the adaptive query complexity. While the lower bound technique of
Blais et al.[8] is unlikely to give a bound beyond Ω(k), our technique has the
potential of proving a better lower bound. We remark that other proof techniques
for the lower bound have also been studied [9].
A rich class of properties for which characterizing constant query testability
has been studied are properties that are invariant under natural transformations
of the domain. For example, [22,4,3] study invariance under affine/linear trans-
formations in this context. Properties that consist of functions isomorphic to a
given function also form an important subclass. The testing of such properties
is commonly referred to as isomorphism testing, and has seen two directions of
study: testing if a function is equivalent to a given function up to permutation
of coordinates [14,10], and testing affine/linear isomorphism.
Our second result1 concerns testing affine isomorphism. A function f is affine
isomorphic to g if there is an invertible affine transformation T such that f ◦ T =
g. Recently, Wimmer and Yoshida [30] characterized the query complexity of
testing affine/linear isomorphism to a function in terms of its Fourier norm. We
complement their work by providing the first example of a function for which
¹ This result appears in the preprint [5].
This is the first such characterization of linear function properties and we hope our result is a small step towards the understanding of function properties testable in a constant number of queries.
Testing k-linearity. We also obtain an alternate proof of the lower bound for
testing k-linearity due to Blais et al. [8].
Theorem 4. Any adaptive two-sided tester for testing k-linearity requires Ω(k)
queries.
The idea behind the proof is as follows. Applying Theorem 1 in the case of
k-linearity, EP turns out to be equal to the function Ek that outputs 1 if and
only if there are exactly k 1s in the input string. Thus, to prove Theorem 4 we
lower bound the randomized parity decision tree complexity of Ek by Ω(k).
Note that this leaves open the possibility of proving a tight Ω(k log k) bound
for testing k-linearity by improving our lower bound on the randomized parity
decision tree complexity of Ek .
Lower Bound for Testing Affine Isomorphism. Let IP_n(x) denote the inner product function Σ_{i=1}^{n/2} x_i x_{n/2+i}. We consider the problem of testing affine isomorphism to IP_n(x) and prove a tight lower bound.
Theorem 5. Any adaptive two-sided tester for testing affine isomorphism to IP_n(x) requires Ω(n^2) queries.
The proof of Theorem 5 is similar to that of Theorem 4, though in this case,
EP turns out to be En , a function that maps graphs on n vertices to {0, 1}, and
outputs 1 if and only if the input graph’s adjacency matrix is nonsingular over
F_2. As mentioned before, this is the first example of a function for which testing affine isomorphism requires Ω(n^2) queries (O(n^2) is a trivial upper bound for any function and follows from a folklore result).
It can be shown that testing the set of quadratic Bent functions reduces to
testing affine isomorphism to IPn (x). Thus, Theorem 5 gives a lower bound for
testing the set of quadratic Bent functions. Furthermore, using a result from
[15], the following corollary can be obtained.
Corollary 1. Any adaptive two-sided tester for testing the set of Bent functions requires Ω(n^2) queries.
2 Preliminaries
2.1 Boolean Functions
Recall that functions mapping {0, 1}^n to {0, 1} are called Boolean functions. A Boolean function is linear if it is expressible as Σ_{i∈S} x_i for some S ⊆ [n] over F_2. The set of linear functions will be denoted by L.
Parity decision trees extend the model of ordinary decision trees such that one may query the parity of a subset of input bits, i.e. the queries are of the form "is Σ_{i∈S} x_i ≡ 1 (mod 2)?" for an arbitrary subset S ⊆ [n]. We call such queries parity queries.
Let f be a Boolean function. For a parity decision tree P_f for f, let C(P_f, x) denote the number of parity queries made by P_f on input x. The parity decision tree complexity of f is D_⊕(f) = min_{P_f} max_x C(P_f, x).
Note that D⊕ (f ) ≤ D(f ), where D(f ) is the deterministic decision tree com-
plexity of f , as the queries made by a usual decision tree, “is xi = 1?”, are also
valid parity queries.
A bounded error randomized parity decision tree R_⊕^f is a probability distribution over all deterministic parity decision trees such that for every input the expected error of the algorithm is bounded by 1/3. The cost C(R_⊕^f, x) is the highest possible number of queries made by R_⊕^f on x, and the bounded error randomized parity decision tree complexity of f is R_⊕(f) = min_{R_⊕^f} max_x C(R_⊕^f, x).
For a Boolean function f , it turns out that R⊕ (f ) can be lower bounded by the
randomized communication complexity of the so-called XOR function f (x ⊕ y)
(See [23] for the definition of randomized communication complexity and XOR
functions). So we have the following lemma.
Proof. Given a Boolean function f : {0, 1}^n → {0, 1} on n variables, consider the communication game where x is with Alice and y is with Bob and they want to compute f(x ⊕ y) with error bounded by 1/3. Let RCC(f(x ⊕ y)) denote the randomized communication complexity of this communication game.
Given a randomized parity decision tree R_⊕^f, Alice and Bob can convert it into a protocol by simulating each parity query made by R_⊕^f with two bits of communication, and thus the inequality follows.
Let ei ∈ {0, 1}n denote the Boolean string whose ith bit is 1 and all other bits
are 0. For any linear function f let us define a string B(f ) ∈ {0, 1}n such that
the ith bit of B(f ) is 1 iff f (ei ) = 1. The following lemma is easy to prove:
Lemma 3. The map B : L → {0, 1}n gives a bijection between the set L and
strings of length n.
We omit the proof of Lemma 4 as it follows directly from Lemma 3 and Obser-
vation 1.
Now, by Lemma 4, testing whether f is in P or is 1/2-far from P is exactly the same as deciding if B(f) ∈ S_P. Furthermore, we can translate the queries made by the tester T to the truth table of f into parity queries to the string B(f), and vice versa. Since f is linear, we have f(x) = Σ_i x_i · f(e_i). Now, if S_x := {i | x_i = 1} then, whenever T queries f at x, it can be equivalently viewed as the parity query ⊕_{i∈S_x}(B(f))_i made to B(f).
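This translation between queries to f and parity queries to B(f) can be checked on a small hypothetical linear function (the set S below is an arbitrary example):

```python
n = 6
S = {0, 2, 5}                         # hypothetical linear f(x) = XOR of x_i, i in S
def f(x):
    return sum(x[i] for i in S) % 2

e = lambda i: tuple(int(j == i) for j in range(n))
B = tuple(f(e(i)) for i in range(n))  # the string B(f): indicator of S

# A query f(x) equals the parity of B(f) over S_x = {i | x_i = 1}:
x = (1, 0, 1, 1, 0, 0)
Sx = [i for i in range(n) if x[i] == 1]
assert f(x) == sum(B[i] for i in Sx) % 2
```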
Consider the Boolean function E_P : {0, 1}^n → {0, 1}, where E_P(x) = 1 iff B^{-1}(x) ∈ P. From the above discussion, deciding "is x ∈ S_P?" is the same as deciding "is E_P(x) = 1?" Thus we have:
Notice that Theorem 1 follows from this by using Q_{1/2}(P) ≥ Q_1(P) from Observation 2.
c_2 · (||f̂||_{1,1/4})^2

² For the purpose of this section, it will be convenient to assume that the range of a Boolean function is {−1, +1}.
Proof. For the first inequality, we obtain from Lemma 2 that RCC(f(x ⊕ y)) ≤ 2R_⊕(f). Now, it is well known that RCC(f(x ⊕ y)) = Ω(log ||f̂||_{1,1/4}).
To see the second inequality, we will construct a randomized parity decision tree³ T with query complexity O((||f̂||_{1,1/4})^2) that computes f. Let g : {0, 1}^n → R be a function that point-wise 1/4-approximates f (i.e. for all x, |f(x) − g(x)| ≤ 1/4) such that ||ĝ||_1 is the minimum among all functions that 1/4-approximate f. Let D_g denote a distribution on subsets of [n] such that a set S has probability |ĝ(S)|/||ĝ||_1.
We define the randomized parity decision tree T as follows. T makes d (the parameter d will be fixed later) random parity queries S_1, S_2, . . . , S_d, such that each S_i is distributed according to D_g. Let X_1, X_2, . . . , X_d be random variables such that

X_i = sign(ĝ(S_i)) · (−1)^{Σ_{j∈S_i} x_j} / ||ĝ||_1

Here the sign function sign(x) outputs −1 if x < 0, and 1 otherwise. Finally, the tree outputs sign(Σ_{i=1}^d X_i).
The first thing to note is that

E[X_i] = Σ_{S⊆[n]} (|ĝ(S)|/||ĝ||_1) · sign(ĝ(S)) · (−1)^{Σ_{j∈S} x_j} / ||ĝ||_1 = g(x)/(||ĝ||_1)^2

Let X = Σ_{i=1}^d X_i. Then E[X] = d · g(x)/(||ĝ||_1)^2. Setting d = 100 · (||ĝ||_1)^2, we get E[X] = 100 · g(x).
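The identity E[X_i] = g(x)/(||ĝ||_1)² can be verified exactly for a toy table of Fourier coefficients ĝ(S) (the coefficients below are hypothetical):

```python
# hypothetical Fourier coefficients of an approximating function g
ghat = {frozenset(): 0.5, frozenset({0, 1}): -0.75, frozenset({2}): 0.25}
norm1 = sum(abs(c) for c in ghat.values())      # ||g^||_1

def g(x):
    # g(x) = sum_S g^(S) * (-1)^(sum of x_j, j in S)
    return sum(c * (-1) ** sum(x[j] for j in S) for S, c in ghat.items())

def sign(t):
    return -1 if t < 0 else 1

x = (1, 0, 1)
# E[X_i] = sum over S of Pr[S] * X_i(S), with Pr[S] = |g^(S)| / ||g^||_1:
EX = sum((abs(c) / norm1) * sign(c) * (-1) ** sum(x[j] for j in S) / norm1
         for S, c in ghat.items())
assert abs(EX - g(x) / norm1 ** 2) < 1e-12
```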
Now each X_i is bounded and lies in [−1/||ĝ||_1, +1/||ĝ||_1]. Thus by Hoeffding's inequality we have

Pr[|X − E[X]| ≥ 50] ≤ exp(−2 · (50)^2 / 400) = exp(−25/2). (2)
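The arithmetic behind (2): with d = 100·(||ĝ||_1)² samples, each ranging over an interval of length 2/||ĝ||_1, the Hoeffding denominator is 400 regardless of ||ĝ||_1 (the norm value below is an arbitrary stand-in):

```python
import math

norm1 = 3.0                          # hypothetical value of ||g^||_1
d = 100 * norm1 ** 2                 # number of samples
# sum of squared interval lengths in Hoeffding's inequality:
denom = d * (2 / norm1) ** 2
assert denom == 400.0
# Pr[|X - E[X]| >= 50] <= exp(-2 * 50^2 / 400) = exp(-25/2)
assert math.isclose(math.exp(-2 * 50 ** 2 / denom), math.exp(-12.5))
```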
Let P be a property of linear functions, and Q_1(P) denote the query complexity of testing P when the input function is promised to be linear. Then, from the above lemma and Theorem 6, there exist constants c_1, c_2 > 0 such that for large enough n,

c_1 · log ||Ê_P||_{1,1/4} ≤ Q_1(P) ≤ c_2 · (||Ê_P||_{1,1/4})^2 (4)

Using Observations 2 and 3 and Equation 4, we get, for ε ∈ (0, 1/4), that there exists a constant c_2 that depends on ε such that for large enough n:
Thus, we can conclude Theorem 3 from the discussion: a property P of linear functions is testable using a constant number of queries if and only if ||Ê_P||_{1,1/4} is constant.
5 Testing k-linearity
In this section we apply the result from Section 3 to prove a lower bound for
testing k-linearity. We shall use P to denote the set of k-linear functions on n
variables.
Let Ek : {0, 1}n → {0, 1} denote the Boolean function that outputs 1 if and
only if the number of 1s is exactly k. Recall a notation from Section 3: for any
linear function f we can define a string B(f ) ∈ {0, 1}n such that B(f )i = 1 iff
f (ei ) = 1. We observe the following:
Observation 5. A Boolean function f is k-linear if and only if B(f ) has exactly
k 1s.
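The correspondence in Observation 5 can be checked directly. The following is my own sketch (the names B and is_k_linear are invented; f is taken as a Python function on 0/1 tuples):

```python
def B(f, n):
    """The string B(f) from Section 3: B(f)_i = f(e_i),
    where e_i is the i-th unit vector in {0,1}^n."""
    return [f(tuple(int(j == i) for j in range(n))) for i in range(n)]

def is_k_linear(f, n, k):
    """Observation 5: a linear Boolean f is k-linear iff B(f) has exactly k ones."""
    return sum(B(f, n)) == k
```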
Thus, EP is exactly the function Ek . Using Theorem 6 we have the following:
Thus we obtain a lower bound of Ω(k) using the lower bound for the randomized
communication complexity of the XOR function Ek (x ⊕ y). Note that using this
method we cannot expect to obtain a better lower bound as there is an upper
bound of O(k) on the communication complexity. But there is hope that one may
be able to obtain a better lower bound for the parity decision tree complexity of
Ek directly.
On the other hand, if one is able to construct a randomized parity decision tree of
depth O(k) for deciding Ek , Lemma 5 immediately implies a tester for k-linearity
that makes O(k) queries under the promise that the input function is linear. Notice that the exact complexity of even the promise problem is not known: the best upper bound is O(k log k), while, naturally, the lower bound is Ω(k).
Let Mn (F2 ) denote the set of n × n matrices over F2 , and Detn : Mn (F2 ) →
{0, 1} be the function such that Detn (A) = 1 iff A ∈ Mn (F2 ) is nonsingular.
The following result from [29] analyzes the communication complexity of Detn .
Lemma 11. RCC(Detn (x ⊕ y)) = Ω(n2 )
It turns out that the communication complexity of Detn relates to that of En .
Lemma 12. RCC(Detₙ(x ⊕ y)) ≤ RCC(E₂ₙ(x ⊕ y))
Proof. Let A ∈ Mₙ(F₂). Consider the 2n × 2n matrix A′ given by

    A′ := ( 0   Aᵗ )
          ( A   0  )

A′ ∈ G₂ₙ by construction, and it can be easily verified that A′ is nonsingular iff A is nonsingular.
Now, let the inputs to Alice and Bob be A and B respectively. Since (A ⊕ B)′ = A′ ⊕ B′, we have Detₙ(A ⊕ B) = 1 iff E₂ₙ((A ⊕ B)′) = 1 iff E₂ₙ(A′ ⊕ B′) = 1. Thus, to
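The block construction in the proof of Lemma 12 can be checked numerically. A sketch under my own conventions (matrices over F₂ as lists of 0/1 rows; the names gf2_nonsingular and block_lift are invented):

```python
def gf2_nonsingular(M):
    """Gaussian elimination over F_2; M is a square list of 0/1 rows."""
    n = len(M)
    M = [row[:] for row in M]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col]), None)
        if pivot is None:
            return False                      # no pivot: rank deficient
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col and M[r][col]:
                M[r] = [a ^ b for a, b in zip(M[r], M[col])]
    return True

def block_lift(A):
    """The 2n x 2n matrix A' = [[0, A^t], [A, 0]] from the proof."""
    n = len(A)
    At = [[A[j][i] for j in range(n)] for i in range(n)]
    return [[0] * n + At[i] for i in range(n)] + \
           [A[i] + [0] * n for i in range(n)]
```

Since block_lift is entrywise linear, (A ⊕ B)′ = A′ ⊕ B′ also holds, as the proof uses.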
Proof. It is well known via [2] that Q is testable with a constant number of queries (say q₁(ε)). Suppose there is a tester that tests B using q₂(ε, n) queries. From Lemma 14, we know that dist(B \ Q, Q \ B) ≥ 1/4. Thus, by Lemma 13, there is a tester that makes O(max{q₁(ε), q₁(1/8)} + max{q₂(ε, n), q₂(1/8, n)}) queries to ε-test B ∩ Q.
Setting ε = 1/4, we have a tester that makes O(q₁(1/8) + q₂(1/8, n)) queries to test whether a given f is in B ∩ Q or 1/4-far from it. Since Q_{1/4}(B ∩ Q) = Ω(n²) and q₁(1/8) is a constant, we get q₂(1/8, n) = Ω(n²), which completes the proof.
References
1. Alon, N., Fischer, E., Newman, I., Shapira, A.: A combinatorial characterization
of the testable graph properties: It’s all about regularity. SIAM J. Comput. 39(1),
143–167 (2009)
2. Alon, N., Kaufman, T., Krivelevich, M., Litsyn, S.N., Ron, D.: Testing low-
degree polynomials over GF(2). In: Arora, S., Jansen, K., Rolim, J.D.P., Sahai, A.
(eds.) APPROX 2003 + RANDOM 2003. LNCS, vol. 2764, pp. 188–199. Springer,
Heidelberg (2003)
3. Bhattacharyya, A., Fischer, E., Hatami, H., Hatami, P., Lovett, S.: Every locally char-
acterized affine-invariant property is testable. In: Proceedings of the 45th Annual
ACM Symposium on Symposium on Theory of Computing, STOC 2013, pp. 429–436.
ACM Press, New York (2013), http://doi.acm.org/10.1145/2488608.2488662
4. Bhattacharyya, A., Grigorescu, E., Shapira, A.: A unified framework for testing
linear-invariant properties. In: Proceedings of the 51st Annual IEEE Symposium
on Foundations of Computer Science, pp. 478–487 (2010)
5. Bhrushundi, A.: On testing bent functions. Electronic Colloquium on Computa-
tional Complexity (ECCC) 20, 89 (2013)
6. Bhrushundi, A., Chakraborty, S., Kulkarni, R.: Property testing bounds for lin-
ear and quadratic functions via parity decision trees. Electronic Colloquium on
Computational Complexity (ECCC) 20, 142 (2013)
7. Blais, E.: Testing juntas nearly optimally. In: Proc. ACM Symposium on the The-
ory of Computing, pp. 151–158. ACM, New York (2009)
8. Blais, E., Brody, J., Matulef, K.: Property testing via communication complexity.
In: Proc. CCC (2011)
9. Blais, E., Kane, D.: Tight bounds for testing k-linearity. In: Gupta, A., Jansen,
K., Rolim, J., Servedio, R. (eds.) APPROX/RANDOM 2012. LNCS, vol. 7408,
pp. 435–446. Springer, Heidelberg (2012)
10. Blais, E., Weinstein, A., Yoshida, Y.: Partially symmetric functions are efficiently
isomorphism-testable. In: FOCS, pp. 551–560 (2012)
11. Blum, M., Luby, M., Rubinfeld, R.: Self-testing/correcting with applications to
numerical problems. In: STOC, pp. 73–83 (1990)
110 A. Bhrushundi, S. Chakraborty, and R. Kulkarni
12. Buhrman, H., Garcı́a-Soriano, D., Matsliah, A., de Wolf, R.: The non-adaptive
query complexity of testing k-parities. CoRR abs/1209.3849 (2012)
13. Chakrabarty, D., Seshadhri, C.: A o(n) monotonicity tester for boolean functions
over the hypercube. CoRR abs/1302.4536 (2013)
14. Chakraborty, S., Fischer, E., Garcı́a-Soriano, D., Matsliah, A.: Junto-symmetric
functions, hypergraph isomorphism and crunching. In: IEEE Conference on Com-
putational Complexity, pp. 148–158 (2012)
15. Chen, V., Sudan, M., Xie, N.: Property testing via set-theoretic operations. In:
ICS, pp. 211–222 (2011)
16. Fischer, E.: The art of uninformed decisions: A primer to property testing. Sci-
ence 75, 97–126 (2001)
17. Fischer, E., Kindler, G., Ron, D., Safra, S., Samorodnitsky, A.: Testing juntas.
Journal of Computer and System Sciences 68(4), 753–787 (2004), Special Issue on
FOCS 2002
18. Fischer, E., Lehman, E., Newman, I., Raskhodnikova, S., Rubinfeld, R., Samorod-
nitsky, A.: Monotonicity testing over general poset domains. In: STOC, pp. 474–483
(2002)
19. Goldreich, O.: On testing computability by small width obdds. In: Serna, M.,
Shaltiel, R., Jansen, K., Rolim, J. (eds.) APPROX 2010. LNCS, vol. 6302,
pp. 574–587. Springer, Heidelberg (2010)
20. Grigorescu, E., Wimmer, K., Xie, N.: Tight lower bounds for testing linear isomor-
phism. In: Raghavendra, P., Raskhodnikova, S., Jansen, K., Rolim, J.D.P. (eds.)
RANDOM 2013 and APPROX 2013. LNCS, vol. 8096, pp. 559–574. Springer,
Heidelberg (2013)
21. Huang, W., Shi, Y., Zhang, S., Zhu, Y.: The communication complexity of the
hamming distance problem. Inf. Process. Lett. 99(4), 149–153 (2006)
22. Kaufman, T., Sudan, M.: Algebraic property testing: the role of invariance. In:
STOC, pp. 403–412 (2008)
23. Lee, T., Shraibman, A.: Lower bounds in communication complexity. Foundations
and Trends in Theoretical Computer Science 3(4), 263–398 (2009)
24. MacWilliams, F.J., Sloane, N.J.A.: The Theory of Error-Correcting Codes (North-
Holland Mathematical Library). North Holland Publishing Co. (June 1988),
http://www.worldcat.org/isbn/0444851933
25. Neumann, T.: Bent functions, Master’s thesis (2006)
26. O’Donnell, R.: Analysis of boolean functions (2012),
http://www.analysisofbooleanfunctions.org
27. Rothaus, O.: On bent functions. Journal of Combinatorial Theory, Series
A 20(3), 300–305 (1976), http://www.sciencedirect.com/science/article/
pii/0097316576900248
28. Rubinfeld, R., Shapira, A.: Sublinear time algorithms. Electronic Colloquium on
Computational Complexity (ECCC) 11(013) (2011)
29. Sun, X., Wang, C.: Randomized communication complexity for linear algebra prob-
lems over finite fields. In: STACS, pp. 477–488 (2012)
30. Wimmer, K., Yoshida, Y.: Testing linear-invariant function isomorphism. In:
Fomin, F.V., Freivalds, R., Kwiatkowska, M., Peleg, D. (eds.) ICALP 2013, Part
I. LNCS, vol. 7965, pp. 840–850. Springer, Heidelberg (2013)
31. Zhang, Z., Shi, Y.: On the parity complexity measures of boolean functions. Theor.
Comput. Sci. 411(26-28), 2612–2618 (2010)
A Fast Branching Algorithm
for Cluster Vertex Deletion
1 Introduction
The problem of clustering objects based on their pairwise similarities arises in applications in both computational biology [6] and machine learning [5]. In the language of graph theory, we are given as input a graph whose vertices correspond to objects, and two objects are connected by an edge if they are observed to be similar. The goal is to transform the graph into a cluster graph (a disjoint union of cliques) using a minimum number of modifications.
The set of allowed modifications depends on the particular problem variant and the application considered. Probably the most studied variant is the Cluster Editing problem, also known as Correlation Clustering, where we seek a minimum number of edge edits to obtain a cluster graph. The study of Cluster
Editing includes [3, 4, 14, 20, 31] and, from the parameterized perspective,
[7–11, 15, 16, 19, 22–24, 27–29].
The main principle of parameterized complexity is that we seek algorithms
that are efficient if the considered parameter is small. However, the distance
Partially supported by NCN grant N206567140 and Foundation for Polish Science.
E.A. Hirsch et al. (Eds.): CSR 2014, LNCS 8476, pp. 111–124, 2014.
c Springer International Publishing Switzerland 2014
112 A. Boral et al.
measure in Cluster Editing, the number of edge edits, may be quite large
in practical instances, and, in the light of recent lower bounds refuting the ex-
istence of subexponential FPT algorithms for Cluster Editing [19, 27], it
seems reasonable to look for other distance measures (see e.g. Komusiewicz’s
PhD thesis [27]) and/or different problem formulations.
In 2008, Hüffner et al. [25, 26] initiated the parameterized study of the
Cluster Vertex Deletion problem (ClusterVD for short). Here, the al-
lowed modifications are vertex deletions.
Cluster Vertex Deletion (ClusterVD) Parameter: k
Input: An undirected graph G and an integer k.
Question: Does there exist a set S of at most k vertices of G such that G \ S
is a cluster graph, i.e., a disjoint union of cliques?
In terms of motivation, we want to refute as few objects as possible to make
the set of observations completely consistent. Since a vertex deletion also removes all its incident edges, we may expect that this new editing measure may be
significantly smaller in practical applications than the edge-editing distance.
As ClusterVD can be equivalently stated as the problem of hitting, with a minimum number of vertices, all induced P3s (paths on 3 vertices) in the input graph, ClusterVD can be solved in O(3^k (n + m)) time by a straightforward branching algorithm [13], where n and m denote the number of vertices and edges
of G, respectively. The dependency on k can be improved by considering a more elaborate case distinction in the branching algorithm, either directly [21], or via a
general algorithm for 3-Hitting Set [17]. Hüffner et al. [26] provided an elegant O(2^k k^9 + nm)-time algorithm, using the iterative compression principle [30]
and a reduction to the weighted maximum matching problem. This algorithm,
presented at LATIN 2008 [25], quickly became one of the textbook examples of
an application of the iterative compression technique.
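The straightforward O(3^k (n + m)) branching mentioned above is short enough to sketch in full. The following is my own illustration (the names cluster_vd and find_induced_p3 are invented, and the graph is a plain adjacency dictionary rather than any data structure from the paper):

```python
from itertools import combinations

def find_induced_p3(adj):
    """Return (u, v, w) with edges uv, vw and no edge uw, or None."""
    for v, nbrs in adj.items():
        for u, w in combinations(sorted(nbrs), 2):
            if w not in adj[u]:
                return u, v, w
    return None

def cluster_vd(adj, k):
    """O(3^k poly) branching: every induced P3 must lose one of its vertices."""
    p3 = find_induced_p3(adj)
    if p3 is None:
        return k >= 0          # already a disjoint union of cliques
    if k == 0:
        return False
    for x in p3:               # branch: delete u, v, or w
        sub = {v: nbrs - {x} for v, nbrs in adj.items() if v != x}
        if cluster_vd(sub, k - 1):
            return True
    return False
```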
In our work we pick up this line of research and obtain the fastest algorithm
for (unweighted) ClusterVD.
Theorem 1. Cluster Vertex Deletion can be solved in O(1.9102^k (n + m)) time and polynomial space on an input (G, k) with |V (G)| = n and |E(G)| = m.
The source of the exponential 2^k factor in the time complexity of the algorithm
of [26] comes from enumeration of all possible intersections of the solution we
are looking for with the previous solution of size (k + 1). As the next step in
each subcase is a reduction to the weighted maximum matching problem (with
a definitely nontrivial polynomial-time algorithm), it seems hard to break the 2^k barrier using the approach of [26]. Hence, in the proof of Theorem 1 we go
back to the bounded search tree approach. However, to achieve the promised
time bound, and at the same time avoiding very extensive case analysis, we
do not follow the general 3-Hitting Set approach. Instead, our methodology
is to carefully investigate the structure of the graph and an optimum solution
around a vertex already guessed to be not included in the solution. We note
that a somehow similar approach has been used in [26] to cope with a variant of
ClusterVD where we restrict the number of clusters in the resulting graph.
More precisely, the main observation in the proof of Theorem 1 is that, if for
some vertex v we know that there exists a minimum solution S not containing
v, in the neighbourhood of v the ClusterVD problem reduces to Vertex
Cover. Let us define N1 and N2 to be the vertices at distance 1 and 2 from v,
respectively, and define the auxiliary graph Hv to be a graph on N1 ∪ N2 having
an edge for each edge of G between N1 and N2 and for each non-edge in G[N1 ].
In other words, two vertices are connected by an edge in Hv if, together with
v, they form a P3 in G. We observe that a minimum solution S not containing
v needs to contain a vertex cover of Hv . Moreover, one can show that we may
greedily choose a vertex cover with inclusion-wise maximal intersection with N2 ,
as deleting vertices from N2 helps us resolve the remaining part of the graph.
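Constructing the auxiliary graph Hv is mechanical. The sketch below is my own code (the name build_Hv and the tuple return convention are invented for illustration); it builds Hv from an adjacency-set representation exactly as defined above:

```python
def build_Hv(adj, v):
    """Auxiliary graph H_v on N1 ∪ N2: an edge for each G-edge between
    N1 and N2, and for each non-edge inside N1 -- i.e. one edge per pair
    of vertices that forms a P3 of G together with v."""
    N1 = set(adj[v])
    N2 = {w for u in N1 for w in adj[u]} - N1 - {v}
    edges = set()
    for u in N1:
        for w in N1:
            if u < w and w not in adj[u]:
                edges.add((u, w))                    # non-edge in G[N1]
        for w in adj[u] & N2:
            edges.add((min(u, w), max(u, w)))        # G-edge N1 -- N2
    return N1, N2, edges
```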
Branching to find the ‘correct’ vertex cover of Hv is very efficient, with worst-
case (1, 2) (i.e., golden-ratio) branching vector. However, we do not have the
vertex v beforehand, and branching to obtain such a vertex is costly. Our ap-
proach is to get as much gain as possible from the vertex cover-style branching
on the graph Hv , to be able to balance the loss from some inefficient branches
used to obtain the vertex v to start with. Consequently, we employ involved
analysis of properties and branching algorithms for the auxiliary graph Hv .
Note that the algorithm of Theorem 1 can be pipelined with the kernelization
algorithm of 3-Hitting Set [1], yielding the following corollary.
Corollary 2. Cluster Vertex Deletion can be solved in O(1.9102^k k^4 + nm) time and polynomial space on an input (G, k) with |V (G)| = n and |E(G)| = m.
However, due to the O(nm) summand in the complexity of Corollary 2, for a
wide range of input instances the running time bound of Theorem 1 is better
than the one of Corollary 2. In fact, the advantage of our branching approach
is that the obtained dependency on the graph size in the running time is linear,
whereas with the approach of [26], one needs to spend at least quadratic time
either on computing weighted maximum matching or on kernelizing the instance.
In the full version [12] we also analyse the co-cluster setting, where one aims
at obtaining a co-cluster graph instead of a cluster one, and show that the linear
dependency on the size of the input can be maintained also in this case.
The paper is organised as follows. We give some preliminary definitions and
notation in Section 2. In Section 3 we analyse the structural properties of the
auxiliary graph Hv . Then, in Section 4 we prove Theorem 1, with the main tool
being a subroutine branching algorithm finding all relevant vertex covers of Hv .
2 Preliminaries
We use standard graph notation. All our graphs are undirected and simple. For
a graph G, by V (G) and E(G) we denote its vertex- and edge-set, respectively.
For v ∈ V (G), the set NG (v) = {u | uv ∈ E(G)} is the neighbourhood of v in G
and NG [v] = NG (v) ∪ {v} is the closed neighbourhood.
We extend these notions to sets of vertices X ⊆ V (G) by NG[X] = ⋃_{v∈X} NG[v] and NG(X) = NG[X] \ X.
We omit the subscript if it is clear from the context. For a set X ⊆ V (G) we
Lemma 3. Let G be a connected graph which is not a clique. Then, for every
v ∈ V (G), there is a P3 containing v.
E(Hv) = {uw | u, w ∈ N1, uw ∉ E} ∪ {uw | u ∈ N1, w ∈ N2, uw ∈ E}.
We now investigate the case when Hv has a very specific structure. The motiva-
tion for this analysis will become clear in Section 4.3.
A seagull is a connected component of Hv that is isomorphic to a P3 with
middle vertex in N1 and endpoints in N2 . The graph Hv is called an s-skein if
it is a disjoint union of s seagulls and some isolated vertices.
4 Algorithm
In this section we show our algorithm for ClusterVD, proving Theorem 1. The
algorithm is a typical branching algorithm, where at each step we choose one
branching rule and apply it. In each subcase, a number of vertices is deleted,
and the parameter k drops by this number. If k becomes negative, the current
subcase is terminated with a negative answer. On the other hand, if k is non-
negative and G is a cluster graph, the vertices deleted in this subcase form a
solution of size at most k.
4.1 Preprocessing
At each step, we first preprocess simple connected components of G.
We are now ready to present a branching algorithm that guesses the ‘correct’
vertex cover of Hv , for a fixed vertex v. That is, we are now working in the setting
where we look for a minimum solution to ClusterVD on (G, k) not containing
v, thus, by Corollary 5, containing a vertex cover of Hv . Our goal is to branch
into a number of subcases, in each subcase picking a vertex cover of Hv . By
Corollary 8, our branching algorithm, to be correct, needs only to generate at
least one element from each equivalence class of the ‘equivalent’ relation, among
maximal elements in the ‘dominate’ relation.
The algorithm consists of a number of branching steps; in each subcase of
each step we take a number of vertices into the constructed vertex cover of Hv
and, consequently, into the constructed minimum solution to ClusterVD on
G. At any point, the first applicable rule is applied.
First, we disregard isolated vertices in Hv . Second, we take care of large-degree
vertices.
Note that Rule 1 yields a branching vector (1, d), where d ≥ 3 is the degree
of u in Hv . Henceforth, we can assume that vertices have degree 1 or 2 in Hv .
Assume there exists u ∈ N1 of degree 1, with uw ∈ E(Hv ). Moreover, assume
there exists a minimum solution S containing u. If w ∈ S, then, by Lemma 7,
S \ {u} is also a solution, a contradiction. If w ∈ N2 \ S, then (S \ {u}) ∪ {w}
dominates S. Finally, if w ∈ N1 \ S, then (S \ {u}) ∪ {w} is equivalent to S.
Hence, we infer the following greedy rule.
Rule 3. If there are vertices u, w ∈ N1 , uw ∈ E(Hv ) then include either NHv (w)
or NHv (u) into the vertex cover, that is, use the branching step (NHv (w), NHv (u)).
In the previous discussion we have argued that invoking each branching step takes linear time. As in each branch we decrease the parameter k by at least one, the depth of the recursion is at most k. In this section we analyse the branching vectors occurring in our algorithm. To finish the proof of Theorem 1 we need to show that the largest positive root of the equation 1 = Σᵢ₌₁ʳ x^(−aᵢ) among all
Case 1. Here the algorithm of Section 4.3 performs branchings with vectors not
worse than (1, 2).
Case 3. The situation is analogous to the previous case. The script invokes the procedure branch_Hv with h = 3 and allow_skein=False to obtain a list of possible branching vectors. For each such vector, we append the entry (1) from the subcase when v is deleted.
Summary. We infer that the largest root of the equation 1 = Σᵢ₌₁ʳ x^(−aᵢ) occurs for the branching vector (1, 3, 3, 4, 4, 5) and is less than 1.9102. This branching vector
corresponds to Case 3 and the algorithm of Section 4.3, invoked on Hv , first
performs a branching step with the vector (1, 3) and in the branch with 1 deleted
vertex, finds Hv to be a 2-skein and performs two independent branching steps
with vectors (1, 2).
This analysis concludes the proof of Theorem 1. We remark that the worst
branching vector in Case 2 is (2, 2, 3, 3, 3) (with solution x < 1.8933), corre-
sponding to the case with single (1, 2)-branch when v is deleted and a 2-skein in
the case when v is kept. Obviously, the worst case in Case 1 is the golden-ratio
branch (1, 2) with solution x < 1.6181.
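The branching numbers quoted above are the largest positive roots of 1 = Σᵢ x^(−aᵢ), which can be found numerically, since the left-hand side is strictly decreasing for x > 1. A minimal sketch (bisection; the function name branching_number is mine):

```python
def branching_number(vector, lo=1.000001, hi=4.0, iters=200):
    """Largest positive root of 1 = sum_i x^(-a_i) for a branching
    vector (a_1, ..., a_r), found by bisection."""
    f = lambda x: sum(x ** -a for a in vector) - 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:      # still above 1: root is to the right
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

This reproduces the constants in the text: the golden-ratio vector (1, 2) gives about 1.6180, (2, 2, 3, 3, 3) stays below 1.8933, and the worst vector (1, 3, 3, 4, 4, 5) stays below 1.9102.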
References
1. Abu-Khzam, F.N.: A kernelization algorithm for d-hitting set. Journal of Computer
and System Sciences 76(7), 524–531 (2010)
2. Abu-Khzam, F.N., Fernau, H.: Kernels: Annotated, proper and induced. In: Bod-
laender, H.L., Langston, M.A. (eds.) IWPEC 2006. LNCS, vol. 4169, pp. 264–275.
Springer, Heidelberg (2006)
3. Ailon, N., Charikar, M., Newman, A.: Aggregating inconsistent information: Rank-
ing and clustering. Journal of the ACM 55(5), 23:1–23:27 (2008)
4. Alon, N., Makarychev, K., Makarychev, Y., Naor, A.: Quadratic forms on graphs.
In: Proceedings of STOC 2005, pp. 486–493. ACM (2005)
5. Bansal, N., Blum, A., Chawla, S.: Correlation clustering. Machine Learning 56,
89–113 (2004)
6. Ben-Dor, A., Shamir, R., Yakhini, Z.: Clustering gene expression patterns. Journal
of Computational Biology 6(3/4), 281–297 (1999)
7. Böcker, S.: A golden ratio parameterized algorithm for cluster editing. Journal of
Discrete Algorithms 16, 79–89 (2012)
8. Böcker, S., Briesemeister, S., Bui, Q.B.A., Truß, A.: Going weighted: Parameterized
algorithms for cluster editing. Theoretical Computer Science 410(52), 5467–5480
(2009)
9. Böcker, S., Briesemeister, S., Klau, G.W.: Exact algorithms for cluster editing:
Evaluation and experiments. Algorithmica 60(2), 316–334 (2011)
10. Böcker, S., Damaschke, P.: Even faster parameterized cluster deletion and cluster
editing. Information Processing Letters 111(14), 717–721 (2011)
11. Bodlaender, H.L., Fellows, M.R., Heggernes, P., Mancini, F., Papadopoulos, C.,
Rosamond, F.A.: Clustering with partial information. Theoretical Computer Sci-
ence 411(7-9), 1202–1211 (2010)
12. Boral, A., Cygan, M., Kociumaka, T., Pilipczuk, M.: Fast branching algorithm for
cluster vertex deletion. CoRR, abs/1306.3877 (2013)
13. Cai, L.: Fixed-parameter tractability of graph modification problems for hereditary
properties. Information Processing Letters 58(4), 171–176 (1996)
14. Charikar, M., Wirth, A.: Maximizing quadratic programs: Extending
Grothendieck’s inequality. In: Proceedings of FOCS 2004, pp. 54–60. IEEE
Computer Society (2004)
15. Damaschke, P.: Fixed-parameter enumerability of cluster editing and related prob-
lems. Theory of Computing Systems 46(2), 261–283 (2010)
16. Fellows, M.R., Guo, J., Komusiewicz, C., Niedermeier, R., Uhlmann, J.: Graph-
based data clustering with overlaps. Discrete Optimization 8(1), 2–17 (2011)
17. Fernau, H.: A top-down approach to search-trees: Improved algorithmics for 3-
hitting set. Algorithmica 57(1), 97–118 (2010)
18. Fomin, F.V., Kratsch, D.: Exact Exponential Algorithms. Texts in theoretical com-
puter science. Springer, Heidelberg (2010)
19. Fomin, F.V., Kratsch, S., Pilipczuk, M., Pilipczuk, M., Villanger, Y.: Tight bounds
for parameterized complexity of cluster editing. In: Proceedings of STACS 2013.
LIPIcs, vol. 20, pp. 32–43. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik,
Leibniz-Zentrum fuer Informatik (2013)
20. Giotis, I., Guruswami, V.: Correlation clustering with a fixed number of clusters.
Theory of Computing 2(1), 249–266 (2006)
21. Gramm, J., Guo, J., Hüffner, F., Niedermeier, R.: Automated generation of search
tree algorithms for hard graph modification problems. Algorithmica 39(4), 321–347
(2004)
22. Gramm, J., Guo, J., Hüffner, F., Niedermeier, R.: Graph-modeled data cluster-
ing: Exact algorithms for clique generation. Theory of Computing Systems 38(4),
373–392 (2005)
23. Guo, J., Kanj, I.A., Komusiewicz, C., Uhlmann, J.: Editing graphs into disjoint
unions of dense clusters. Algorithmica 61(4), 949–970 (2011)
24. Guo, J., Komusiewicz, C., Niedermeier, R., Uhlmann, J.: A more relaxed model
for graph-based data clustering: s-plex cluster editing. SIAM Journal of Discrete
Mathematics 24(4), 1662–1683 (2010)
25. Hüffner, F., Komusiewicz, C., Moser, H., Niedermeier, R.: Fixed-parameter algo-
rithms for cluster vertex deletion. In: Laber, E.S., Bornstein, C., Nogueira, L.T.,
Faria, L. (eds.) LATIN 2008. LNCS, vol. 4957, pp. 711–722. Springer, Heidelberg
(2008)
26. Hüffner, F., Komusiewicz, C., Moser, H., Niedermeier, R.: Fixed-parameter algo-
rithms for cluster vertex deletion. Theory of Computing Systems 47(1), 196–217
(2010)
27. Komusiewicz, C.: Parameterized Algorithmics for Network Analysis: Clustering &
Querying. PhD thesis, Technische Universität Berlin (2011),
http://fpt.akt.tu-berlin.de/publications/diss-komusiewicz.pdf
28. Komusiewicz, C., Uhlmann, J.: Alternative parameterizations for cluster editing.
In: Černá, I., Gyimóthy, T., Hromkovič, J., Jefferey, K., Králović, R., Vukolić, M.,
Wolf, S. (eds.) SOFSEM 2011. LNCS, vol. 6543, pp. 344–355. Springer, Heidelberg
(2011)
29. Protti, F., da Silva, M.D., Szwarcfiter, J.L.: Applying modular decomposition
to parameterized cluster editing problems. Theory of Computing Systems 44(1),
91–104 (2009)
30. Reed, B.A., Smith, K., Vetta, A.: Finding odd cycle transversals. Operations Re-
search Letters 32(4), 299–301 (2004)
31. Shamir, R., Sharan, R., Tsur, D.: Cluster graph modification problems. Discrete
Applied Mathematics 144(1-2), 173–182 (2004)
Separation Logic with One Quantified Variable⋆
Abstract. We investigate first-order separation logic with one record field re-
stricted to a unique quantified variable (1SL1). Undecidability is known when
the number of quantified variables is unbounded and the satisfiability problem is
PSPACE-complete for the propositional fragment. We show that the satisfiability problem for 1SL1 is PSPACE-complete, and we characterize its expressive power
by showing that every formula is equivalent to a Boolean combination of atomic
properties. This contributes to our understanding of fragments of first-order sepa-
ration logic that can specify properties about the memory heap of programs with
singly-linked lists. When the number of program variables is fixed, the complex-
ity drops to polynomial time. All the fragments we consider contain the magic
wand operator and first-order quantification over a single variable.
1 Introduction
Separation Logic for Verifying Programs with Pointers. Separation logic [20] is a well-
known logic for analysing programs with pointers stemming from BI logic [14]. Such
programs have specific errors to be detected, and separation logic is used as an assertion language for Hoare-like proof systems [20] dedicated to verifying programs
manipulating heaps. Any procedure mechanizing the proof search requires subroutines
that check the satisfiability or the validity of formulæ from the assertion language. That
is why characterizing the computational complexity of separation logic and its fragments, and designing optimal decision procedures, remain essential tasks. Separation
logic contains a structural separating connective and its adjoint (the separating impli-
cation, also known as the magic wand). The main concern of the paper is to study a
non-trivial fragment of first-order separation logic with one record field as far as ex-
pressive power, decidability and complexity are concerned. Herein, the models of sep-
aration logic are pairs made of a variable valuation (store) and a partial function with
finite domain (heap), also known as memory states.
Decidability and Complexity. The complexity of the satisfiability and model-checking problems for fragments of separation logic has been studied extensively [6,20,7] (see also new decidability results in [13] or undecidability results in [5,16] in an alternative setting).
Separation logic is equivalent to a Boolean propositional logic [17,18] if first-order
quantifiers are disabled. Separation logic without first-order quantifiers is decidable, but
⋆
Work partially supported by the ANR grant DynRes (project no. ANR-11-BS02-011) and by
the EU Seventh Framework Programme under grant agreement No. PIOF-GA-2011-301166
(DATAVERIF).
E.A. Hirsch et al. (Eds.): CSR 2014, LNCS 8476, pp. 125–138, 2014.
c Springer International Publishing Switzerland 2014
126 S. Demri et al.
2 Preliminaries
2.1 First-Order Separation Logic with One Selector 1SL
Let PVAR = {x₁, x₂, …} be a countably infinite set of program variables and FVAR = {u₁, u₂, …} be a countably infinite set of quantified variables. A memory state (also called a model) is a pair (s, h) such that s is a variable valuation of the form s : PVAR → ℕ (the store) and h is a partial function h : ℕ ⇀ ℕ with finite domain (the heap); we write dom(h) to denote its domain and ran(h) to denote its range. Two heaps h₁ and h₂ are said to be disjoint, noted h₁ ⊥ h₂, if their domains are disjoint; when this holds, we write h₁ ⊎ h₂ to denote the heap corresponding to the disjoint union of the graphs of h₁ and h₂, hence dom(h₁ ⊎ h₂) = dom(h₁) ⊎ dom(h₂). When the domains of h₁ and h₂ are not disjoint, the composition h₁ ⊎ h₂ is not defined, even if h₁ and h₂ have the same values on dom(h₁) ∩ dom(h₂).
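The heap model above translates directly into code. A minimal sketch, assuming heaps are modelled as Python dicts from locations to locations (the names disjoint and compose are mine):

```python
def disjoint(h1, h2):
    """h1 ⊥ h2: the heaps (finite partial functions) have disjoint domains."""
    return not (h1.keys() & h2.keys())

def compose(h1, h2):
    """h1 ⊎ h2, defined only when h1 ⊥ h2 (None otherwise) --
    undefined even if h1 and h2 agree on the overlap."""
    if not disjoint(h1, h2):
        return None
    return {**h1, **h2}
```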
Formulæ of 1SL are built from expressions of the form e ::= x | u, where x ∈ PVAR and u ∈ FVAR, and atomic formulæ of the form π ::= e = e′ | e ↪ e′ | emp. Formulæ are defined by the grammar

    A ::= π | A ∧ B | ¬A | A ∗ B | A −∗ B | ∃u A,

where u ∈ FVAR. The connective ∗ is the separating conjunction and −∗ is the separating implication, usually called the magic wand. The size of a formula A, written |A|, is defined as the number of symbols required to write it. An assignment is a map f : FVAR → ℕ. The satisfaction relation ⊨ is parameterized by assignments (clauses for Boolean connectives are omitted):
– s, h ⊨_f emp iff dom(h) = ∅.
– s, h ⊨_f e = e′ iff [e] = [e′], with [x] ≝ s(x) and [u] ≝ f(u).
Theorem 2 ([6,4,9]). The satisfiability and model-checking problems for 1SL0 are PSPACE-complete, and the satisfiability problem for 1SL is undecidable, even when restricted to two variables.
– The domain of the heap has at least k elements: ¬emp ∗ ⋯ ∗ ¬emp (k times).
– The variable xᵢ is allocated in the heap: alloc(xᵢ) ≝ (xᵢ ↪ xᵢ) −∗ ⊥.
– xᵢ points to an allocated location: ∃u (xᵢ ↪ u ∧ alloc(u)).
– Variables xᵢ and xⱼ point to a shared location: conv(xᵢ, xⱼ) ≝ ∃u (xᵢ ↪ u ∧ xⱼ ↪ u).
– The location interpreted by xᵢ has exactly one predecessor can be expressed in 1SL1: ∃u (u ↪ xᵢ) ∧ ¬(∃u (u ↪ xᵢ) ∗ ∃u (u ↪ xᵢ)).
– The heap has at least 3 self-loops: ∃u (u ↪ u) ∗ ∃u (u ↪ u) ∗ ∃u (u ↪ u).
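The example formulæ above can be read as concrete checks on a memory state. A sketch of such checks, assuming the same dict-based model (all function names are mine; the ∃u quantifier becomes a search over locations):

```python
def alloc(s, h, x):
    """x is allocated: s(x) ∈ dom(h)."""
    return s[x] in h

def conv(s, h, x, y):
    """x and y point to a shared location: ∃u (x ↪ u ∧ y ↪ u)."""
    return s[x] in h and s[y] in h and h[s[x]] == h[s[y]]

def num_self_loops(h):
    """Number of locations l with h(l) = l; 'at least 3 self-loops'
    is expressed in 1SL1 by iterating the separating conjunction."""
    return sum(1 for l, v in h.items() if l == v)

def exactly_one_predecessor(s, h, x):
    """∃u (u ↪ x), but no two disjoint such cells."""
    return sum(1 for v in h.values() if v == s[x]) == 1
```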
loop(s, h). So, obviously, dom(h) = rem(s, h) ∪ pred(s, h) ∪ loop(s, h). The sets pred(s, h) and loop(s, h) are not necessarily disjoint. As a consequence of h being a partial function, the sets pred(s, h, i) and pred(s, h, j) intersect only if s(xᵢ) = s(xⱼ), in which case pred(s, h, i) = pred(s, h, j).
We introduce a second partition of dom(h) by distinguishing the locations related to a cell involving a program variable interpretation on the one hand, and the remaining locations in the domain on the other hand. So, the sets below are also implicitly parameterized by V: ref(s, h) ≝ dom(h) ∩ s(V), acc(s, h) ≝ dom(h) ∩ h(s(V)), ♥(s, h) ≝ ref(s, h) ∪ acc(s, h), ♥̄(s, h) ≝ dom(h) \ ♥(s, h). The core of the memory state, written ♥(s, h), contains the locations l in dom(h) such that either l is the
The above figure presents a memory state (s, h) with the variables x₁, …, x₄. Nodes labelled by '♥' [resp. '', 'p', 'r'] belong to ♥(s, h) [resp. loop♥(s, h), pred♥(s, h), rem♥(s, h)]. The introduction of the above sets provides a canonical way to decompose the heap domain, which will be helpful in the sequel.
Lemma 3 (Canonical decomposition). For all stores s and heaps h, the following identity holds: dom(h) = ♥(s, h) ⊎ pred♥(s, h) ⊎ loop♥(s, h) ⊎ rem♥(s, h).
Remark that both pred♥(s, h, i) = ∅ and pred♥(s, h, i) = pred♥(s, h, j) are possible. Below, we present properties of the canonical decomposition.
Proposition 5. Let s, h, h₁, h₂ be such that h = h₁ ⊎ h₂. Then ♥(s, h) ∩ dom(h₁) = ♥(s, h₁) ⊎ Δ(s, h₁, h₂), with Δ(s, h₁, h₂) ≝ (dom(h₁) ∩ h₂(s(V))) \ (s(V) ∪ h₁(s(V))).
The set Δ(s, h₁, h₂) contains the locations belonging to the core of (s, h) and to the domain of h₁, without being in the core of (s, h₁). Its expression in Proposition 5 uses only basic set-theoretical operations. From Proposition 5, we conclude that ♥(s, h₁ ⊎ h₂) can be different from ♥(s, h₁) ⊎ ♥(s, h₂).
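Proposition 5 can be sanity-checked on small heaps. The sketch below is my own code (core and delta are invented names; heaps are dicts, and the store is a dict from variable names to locations):

```python
def core(s, h):
    """♥(s,h) = dom(h) ∩ (s(V) ∪ h(s(V))): locations at a program
    variable or at the image of an allocated program variable."""
    sV = set(s.values())
    hsV = {h[l] for l in sV if l in h}
    return set(h) & (sV | hsV)

def delta(s, h1, h2):
    """Δ(s,h1,h2) = (dom(h1) ∩ h2(s(V))) \\ (s(V) ∪ h1(s(V)))."""
    sV = set(s.values())
    h1sV = {h1[l] for l in sV if l in h1}
    h2sV = {h2[l] for l in sV if l in h2}
    return (set(h1) & h2sV) - (sV | h1sV)
```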
Formula A_f is a conjunction made of literals from T_q such that a positive literal B occurs exactly when f(B) = 1 and a negative literal ¬B occurs exactly when f(B) = 0. We write A_f^pos to denote ⋀ {B | B ∈ T_q and f(B) = 1}.
Lemma 6. (I) For any k ≥ 1, there is a formula #loop♥ ≥ k s.t. for any (s, h), we have s, h ⊨ #loop♥ ≥ k iff card(loop♥(s, h)) ≥ k. (II) For any k ≥ 1 and any i ∈ [1, q], there is a formula #predᵢ♥ ≥ k s.t. for any (s, h), we have s, h ⊨ #predᵢ♥ ≥ k iff card(pred♥(s, h, i)) ≥ k. (III) For any k ≥ 1, there is a formula #rem♥ ≥ k s.t. for any (s, h), we have s, h ⊨ #rem♥ ≥ k iff card(rem♥(s, h)) ≥ k.
3 Expressive Completeness
We introduce the notion of equipotence and state a few properties about it. This will
be useful in the forthcoming developments. Let α ∈ N. We say that two finite sets X and Y are α-equipotent, and we write X ≈α Y, if either card(X) = card(Y) or both card(X) and card(Y) are greater than α. The equipotence relation is also decreasing, i.e. ≈α2 ⊆ ≈α1 holds for all α1 ≤ α2. We state below two lemmas that will be helpful in the sequel.
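In code, α-equipotence is a one-line predicate; a sketch (the function name is ours):

```python
def equipotent(x, y, alpha):
    # X ≈α Y: equal cardinalities, or both cardinalities exceed alpha
    return len(x) == len(y) or (len(x) > alpha and len(y) > alpha)
```

The decreasingness property then reads: any pair related for α2 is also related for every α1 ≤ α2, since cardinalities exceeding α2 also exceed α1.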
Test formulæ express simple properties about the memory states; this includes proper-
ties about program variables but also global properties about numbers of predecessors
or loops, following the decomposition in Section 2.3. These test formulæ allow us to
characterize the expressive power of 1SL1, similarly to what has been done in [17,18,3]
for 1SL0. Since every formula in 1SL1 is shown equivalent to a Boolean combination
of test formulæ (forthcoming Theorem 19), this process can be viewed as a means to
eliminate separating connectives in a controlled way; elimination is not total since the
test formulæ require such separating connectives. However, this is analogous to quan-
tifier elimination in Presburger arithmetic for which simple modulo constraints need to
be introduced in order to eliminate the quantifiers (of course, modulo constraints are
defined with quantifiers but in a controlled way too).
Let us introduce the test formulæ. We distinguish two types, leading to distinct sets.
There are test formulæ that state properties about the direct neighbourhood of program
variables whereas others state global properties about the memory states. The test formulæ of the form #pred_i♥ ≥ k are of these two types but they will be included in Sizeα since these are cardinality constraints.
Definition 9 (Test formulæ). Given q, α ≥ 1, we define sets of test formulæ:
Test formulæ express simple properties about the memory states, even though quite large formulæ in 1SL1 may be needed to express them, while being of memory threshold polynomial in q + α. An atom is a conjunction of test formulæ or their negations (literals) such that each formula from Testuα occurs once (saturated conjunction of literals). Any memory state satisfying an atom containing ¬alloc(x1) ∧ ¬(#pred1♥ ≥ 1) ∧ ¬(#loop♥ ≥ 1) ∧ ¬(#rem♥ ≥ 1) (with q = 1) has an empty heap.
Lemma 10. Satisfiability of conjunctions of test formulæ or their negation can be
checked in polynomial time (q and α are not fixed and the bounds k in test formulæ
from Sizeα are encoded in binary).
The tedious proof of Lemma 10 is based on a saturation algorithm. The size of a Boolean combination of test formulæ is the number of symbols needed to write it, when integers are encoded in binary (those from Sizeα). Lemma 10 entails the following complexity characterization, which contrasts with the complexity of the satisfiability problem for 1SL1 (see Theorem 28).
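One elementary ingredient of such a saturation check: a conjunction of size literals of the shape #S ≥ k and ¬(#S ≥ k′) is consistent iff the strongest lower bound does not exceed the strongest upper bound. A toy sketch of this single step (names ours; the real algorithm behind Lemma 10 also handles interactions between the different families of test formulæ, which we do not model):

```python
def size_literals_consistent(positive, negative):
    """positive: values k from literals  #S >= k
       negative: values k from literals ¬(#S >= k), i.e. card(S) <= k - 1."""
    lower = max(positive, default=0)
    upper = min((k - 1 for k in negative), default=None)
    return upper is None or lower <= upper
```

Since only comparisons of the bounds are needed, this step runs in time polynomial in the bit size of the binary-encoded constants, in line with the statement of Lemma 10.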
Theorem 11. The satisfiability problem for Boolean combinations of test formulæ in the set ⋃α≥1 Testuα (q and α are not fixed, and the bounds k are encoded in binary) is NP-complete.
Definition 12. We say that (s, h, l) and (s′, h′, l′) are basically equivalent [resp. extra equivalent, resp. α-equivalent] and we denote (s, h, l) ≡b (s′, h′, l′) [resp. (s, h, l) ≡u (s′, h′, l′), resp. (s, h, l) ≡α (s′, h′, l′)] when the condition (s, h ⊨l B iff s′, h′ ⊨l′ B) holds for every B in Basicu [resp. Extrau, resp. Testuα].
Hence (s, h, l) and (s′, h′, l′) are basically equivalent [resp. extra equivalent, resp. α-equivalent] if and only if they cannot be distinguished by the formulæ of Basicu [resp. Extrau, resp. Testuα]. Since Basicu ⊆ Extrau ⊆ Testuα, it is obvious that the inclusions ≡α ⊆ ≡u ⊆ ≡b hold.
Proposition 13. (s, h, l) ≡α (s′, h′, l′) is equivalent to (1) (s, h, l) ≡b (s′, h′, l′), and (2) pred♥(s, h, i) ≈α pred♥(s′, h′, i) for any i ∈ [1, q], and (3) loop♥(s, h) ≈α loop♥(s′, h′), and (4) rem♥(s, h) ≈α rem♥(s′, h′).
The proof is based on the identity Basicu ⊎ Sizeα = Testuα. The pseudo-core of (s, h), written p♥(s, h), is defined as p♥(s, h) def= s(V) ∪ h(s(V)), and ♥(s, h) is equal to p♥(s, h) ∩ dom(h).
Lemma 14 (Bijection between pseudo-cores). Let l0, l1 ∈ N and let (s, h) and (s′, h′) be two memory states s.t. (s, h, l0) ≡b (s′, h′, l1). Let R be the binary relation on N defined by: l R l′ iff (a) [l = l0 and l′ = l1] or (b) there is i ∈ [1, q] s.t. [l = s(xi) and l′ = s′(xi)] or [l = h(s(xi)) and l′ = h′(s′(xi))]. Then R is a bijective relation between p♥(s, h) ∪ {l0} and p♥(s′, h′) ∪ {l1}. Its restriction to ♥(s, h) is in bijection with ♥(s′, h′) too if case (a) is dropped from the definition of R.
The proof is by structural induction on A using Lemmas 15, 16 and 17. Here is one of our main results characterizing the expressive power of 1SL1.
The proof of Theorem 19 does not provide a constructive way to eliminate quanti-
fiers, which will be done in Section 4 (see Corollary 30).
the other test formulæ cannot systematically enforce constraints on the cardinality of
the set of loops outside of the core. Last but not least, we need to prove that the set
of test formulæ is expressively complete to get Theorem 19. Lemmas 15, 16 and 17
are helpful to obtain Lemma 18 taking care of the different quantifiers. It is in their
proofs that the completeness of the set Testuα is best illustrated. Nevertheless, to apply
these lemmas in the proof of Lemma 18, we designed an adequate definition for the function th(·, ·), and we arranged different thresholds in their statements. So, there is a real interplay between the definition of th(·, ·) and the lemmas used in the proof of Lemma 18.
A small model property can also be proved as a consequence of Theorem 19 and the proof of Lemma 10, for instance.
Corollary 20 (Small Model Property). Let A be a formula in 1SL1 with q program variables. If A is satisfiable, then there is a memory state (s, h) and l ∈ N such that s, h ⊨l A and max(maxval(s, h), l) ≤ 3q + 1 + (q + 3) · th(q, A).
There is no need to count beyond th(q, A) (e.g., for the loops outside the core) and the core uses at most 3q locations. Theorem 19 provides a characterization of the expressive power of 1SL1, which is now easy to differentiate from 1SL2.
Corollary 21. 1SL2 is strictly more expressive than 1SL1.
Given q, α ≥ 1, the number of abstract memory states over (q, α) is not only finite but reasonably bounded. Given (s, h), we define its abstraction abs(s, h) over (q, α) as the abstract memory state (V, E, l, r, p1, . . . , pq) such that
– l = min(card(loop♥(s, h)), α), r = min(card(rem♥(s, h)), α), and pi = min(card(pred♥(s, h, i)), α) for every i ∈ [1, q].
– P is a partition of {x1, . . . , xq} so that for all x, x′, s(x) = s(x′) iff x and x′ belong to the same set in P.
– V is made of elements from P as well as of locations from the set below:
({h(s(xi)) : s(xi) ∈ dom(h), i ∈ [1, q]} ∪ {h(h(s(xi))) : h(s(xi)) ∈ dom(h), i ∈ [1, q]}) \ {s(xi) : i ∈ [1, q]}
Equivalence between (1) and (3) is a consequence of the definition of the relation ≡α. Hence, a pointed abstract memory state represents an atom of Testuα, except that it is a bit more concise (only space in O(q · log α) is required, whereas an atom requires space polynomial in q + α).
Definition 25. Given pointed abstract memory states (a, u), (a1, u1) and (a2, u2), we write ⊎a((a, u), (a1, u1), (a2, u2)) if there exist l ∈ N, a store s and disjoint heaps h1 and h2 such that abs(s, h1 ⊎ h2, l) = (a, u), abs(s, h1, l) = (a1, u1) and abs(s, h2, l) = (a2, u2).
The ternary relation ⊎a is not difficult to check, even though it is necessary to verify that the abstract disjoint union is properly done.
Lemma 26. Given q, α ≥ 1, the ternary relation ⊎a can be decided in polynomial time in q + log α for all the pointed abstract memory states built over (q, α).
1: if B is emp then return true iff E = ∅ and all numerical values are zero;
2: if B is xi = xj then return true iff xi, xj ∈ X for some X ∈ P;
3: if B is xi = u then return true iff u ∈ X for some X ∈ P such that xi ∈ X;
4: if B is u = u then return true;
5: if B is xi ↪ xj then return true iff (X, X′) ∈ E where xi ∈ X ∈ P and xj ∈ X′ ∈ P;
6: if B is xi ↪ u then return true iff (X, u) ∈ E for some X ∈ P such that xi ∈ X;
7: if B is u ↪ xi then return true iff either u ∈ Pi or (u ∈ V and there is some X ∈ P such that xi ∈ X and (u, X) ∈ E);
8: if B is u ↪ u then return true iff either u ∈ L or (u, u) ∈ E;
Here is another by-product of our proof technique. The PSPACE bound is preserved when formulæ are encoded as DAGs instead of trees. The size of a formula is then simply its number of subformulæ. This is similar to machine encoding and provides better conciseness, but complexity upper bounds are more difficult to obtain. With this alternative notion of length, th(q, A) is only bounded by q · 2^|A| (compare with Lemma 1). Nevertheless, this still yields the PSPACE upper bound with this encoding, since the algorithm to solve the satisfiability problem runs in logarithmic space in α, as we have shown previously.
5 Conclusion
In [4], the undecidability of 1SL with a unique record field is shown. 1SL0 is also known to be PSPACE-complete [6]. In this paper, we provided an extension with a unique quantified variable and showed that the satisfiability problem for 1SL1 is PSPACE-complete by presenting an original and fine-tuned abstraction of memory states. We proved that in 1SL1 separating connectives can be eliminated in a controlled way, as well as first-order quantification over the single variable. In that way, we show a quantifier elimination property. Apart from the complexity results and the new abstraction for memory states, we also show a quite surprising result: when the number of program variables is bounded, the satisfiability problem can be solved in polynomial time. Last but not least, we have established that the satisfiability problem for Boolean combinations of test formulæ is NP-complete. This is reminiscent of decision procedures used in SMT solvers, and it is a challenging question to take advantage of these features to decide 1SL1 with an SMT solver. Finally, the design of fragments between 1SL1 and undecidable 1SL2 that can be decided with an adaptation of our method is worth further investigation.
Acknowledgments. We warmly thank the anonymous referees for their numerous and
helpful suggestions, improving significantly the quality of the paper and its extended
version [10]. Great thanks also to Morgan Deters (New York University) for feedback
and discussions about this work.
References
1. Barrett, C., Conway, C.L., Deters, M., Hadarean, L., Jovanović, D., King, T., Reynolds, A.,
Tinelli, C.: CVC4. In: Gopalakrishnan, G., Qadeer, S. (eds.) CAV 2011. LNCS, vol. 6806,
pp. 171–177. Springer, Heidelberg (2011)
2. Berdine, J., Calcagno, C., O’Hearn, P.: Smallfoot: Modular automatic assertion checking
with separation logic. In: de Boer, F.S., Bonsangue, M.M., Graf, S., de Roever, W.-P. (eds.)
FMCO 2005. LNCS, vol. 4111, pp. 115–137. Springer, Heidelberg (2006)
3. Brochenin, R., Demri, S., Lozes, E.: Reasoning about sequences of memory states.
APAL 161(3), 305–323 (2009)
4. Brochenin, R., Demri, S., Lozes, E.: On the almighty wand. IC 211, 106–137 (2012)
5. Brotherston, J., Kanovich, M.: Undecidability of propositional separation logic and its neighbours. In: LICS 2010, pp. 130–139. IEEE (2010)
6. Calcagno, C., Yang, H., O'Hearn, P.W.: Computability and complexity results for a spatial assertion language for data structures. In: Hariharan, R., Mukund, M., Vinay, V. (eds.) FSTTCS 2001. LNCS, vol. 2245, pp. 108–119. Springer, Heidelberg (2001)
7. Cook, B., Haase, C., Ouaknine, J., Parkinson, M., Worrell, J.: Tractable reasoning in a
fragment of separation logic. In: Katoen, J.-P., König, B. (eds.) CONCUR 2011. LNCS,
vol. 6901, pp. 235–249. Springer, Heidelberg (2011)
8. de Moura, L., Bjørner, N.: Z3: An efficient SMT solver. In: Ramakrishnan, C.R., Rehof, J.
(eds.) TACAS 2008. LNCS, vol. 4963, pp. 337–340. Springer, Heidelberg (2008)
9. Demri, S., Deters, M.: Two-variable separation logic and its inner circle (September 2013)
(submitted)
10. Demri, S., Galmiche, D., Larchey-Wendling, D., Méry, D.: Separation logic with one quantified variable. arXiv (2014)
11. Galmiche, D., Méry, D.: Tableaux and resource graphs for separation logic. JLC 20(1), 189–231 (2010)
12. Haase, C., Ishtiaq, S., Ouaknine, J., Parkinson, M.J.: SeLoger: A tool for graph-based reasoning in separation logic. In: Sharygina, N., Veith, H. (eds.) CAV 2013. LNCS, vol. 8044, pp. 790–795. Springer, Heidelberg (2013)
13. Iosif, R., Rogalewicz, A., Simacek, J.: The tree width of separation logic with recursive
definitions. In: Bonacina, M.P. (ed.) CADE 2013. LNCS, vol. 7898, pp. 21–38. Springer,
Heidelberg (2013)
14. Ishtiaq, S., O’Hearn, P.: BI as an assertion language for mutable data structures. In: POPL
2001, pp. 14–26 (2001)
15. Ladner, R.: The computational complexity of provability in systems of modal propositional logic. SIAM Journal on Computing 6(3), 467–480 (1977)
16. Larchey-Wendling, D., Galmiche, D.: The undecidability of Boolean BI through phase semantics. In: LICS 2010, pp. 140–149. IEEE (2010)
17. Lozes, E.: Expressivité des logiques spatiales. PhD thesis, LIP, ENS Lyon, France (2004)
18. Lozes, E.: Separation logic preserves the expressive power of classical logic. In: Workshop
SPACE 2004 (2004)
19. Piskac, R., Wies, T., Zufferey, D.: Automating separation logic using SMT. In: Sharygina,
N., Veith, H. (eds.) CAV 2013. LNCS, vol. 8044, pp. 773–789. Springer, Heidelberg (2013)
20. Reynolds, J.C.: Separation logic: a logic for shared mutable data structures. In: LICS 2002,
pp. 55–74. IEEE (2002)
QuickXsort: Efficient Sorting with
n log n − 1.399n + o(n) Comparisons on Average
1 Introduction
Sorting a sequence of n elements remains one of the most frequent tasks carried out by computers. A lower bound for sorting by only pairwise comparisons is log(n!) ≈ n log n − 1.44n + O(log n) comparisons for the worst and average case (logarithms denoted by log are always base 2; the average case refers to a uniform distribution of all input permutations, assuming all elements are different). Sorting algorithms that are optimal in the leading term are called constant-factor-optimal. Tab. 1 lists some milestones in the race for reducing the coefficient in the linear term. One of the most efficient (in terms of number of comparisons) constant-factor-optimal algorithms for solving the sorting problem is Ford and Johnson's MergeInsertion algorithm [9]. It requires n log n − 1.329n + O(log n) comparisons in the worst case [12]. MergeInsertion has a severe drawback that makes it uninteresting for practical purposes: similar to Insertionsort, the number of element moves is quadratic in n, i.e., it has quadratic running time.
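The information-theoretic bound log(n!) can be checked numerically via the log-gamma function; a small sketch (helper name ours):

```python
from math import lgamma, log, log2

def log2_factorial(n):
    # log2(n!) computed via the log-gamma function: lgamma(n+1) = ln(n!)
    return lgamma(n + 1) / log(2)

n = 1000
exact = log2_factorial(n)
approx = n * log2(n) - n / log(2)  # n log n - 1.4427... n
# by Stirling's formula, the two agree up to an O(log n) term
```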
E.A. Hirsch et al. (Eds.): CSR 2014, LNCS 8476, pp. 139–152, 2014.
c Springer International Publishing Switzerland 2014
140 S. Edelkamp and A. Weiß
With Insertionsort we mean the algorithm that inserts all elements successively into the already ordered sequence, finding the position for each element by binary search (not by linear search as frequently done). However, MergeInsertion and Insertionsort can be used to sort small subarrays such that the quadratic running time for these subarrays is small in comparison to the overall running time. Reinhardt [15] used this technique to design an internal Mergesort variant that needs in the worst case n log n − 1.329n + O(log n) comparisons. Unfortunately, implementations of this InPlaceMergesort algorithm have not been documented. Katajainen et al.'s [11,8] work inspired by Reinhardt is practical, but the number of comparisons is larger.
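The binary-search variant of Insertionsort meant here can be sketched with the standard library's bisection routine (this illustrative version sorts into a fresh list rather than in place):

```python
from bisect import insort

def binary_insertionsort(a):
    out = []
    for x in a:
        insort(out, x)  # position found by binary search: O(log n)
                        # comparisons per insertion, but linear moves
    return out
```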
Throughout the text we avoid the terms in-place or in-situ and prefer the term internal (as opposed to external). We call an algorithm internal if it needs at most O(log n) space (computer words) in addition to the array to be sorted. That means we consider Quicksort an internal algorithm, whereas standard Mergesort is external because it needs a linear amount of extra space.
Based on QuickHeapsort [2], we develop the concept of QuickXsort in this paper and apply it to Mergesort and WeakHeapsort, which yields efficient internal sorting algorithms. The idea is very simple: as in Quicksort, the array is partitioned into the elements greater and less than some pivot element. Then one part of the array is sorted by some algorithm X and the other part is sorted recursively. The advantage of this procedure is that, if X is an external algorithm, then in QuickXsort the part of the array which is not currently being sorted may be used as temporary space, which yields an internal variant of X. We give an elementary proof that under natural assumptions QuickXsort performs, up to an o(n) term, the same number of comparisons on average as X. Moreover, we introduce a trick similar to Introsort [14] which guarantees n log n + O(n) comparisons in the worst case.
The concept of QuickXsort (without naming it as such) was first applied in UltimateHeapsort by Katajainen [10]. In UltimateHeapsort, first the median of the array is determined, and then the array is partitioned into subarrays of equal size. Finding the median means significant additional effort. Cantone and Cincotti [2] weakened the requirement for the pivot and designed QuickHeapsort, which uses only a sample of smaller size to select the pivot for partitioning. UltimateHeapsort is inferior to QuickHeapsort in terms of the average case number of comparisons, although, unlike QuickHeapsort, it allows an n log n + O(n) bound for the worst case number of comparisons. Diekert and Weiß [3] analyzed QuickHeapsort more thoroughly and described some improvements requiring less than n log n − 0.99n + o(n) comparisons on average.
Edelkamp and Stiegeler [5] applied the idea of QuickXsort to WeakHeapsort (which was first described by Dutton [4]), introducing QuickWeakHeapsort. The worst case number of comparisons of WeakHeapsort is n⌈log n⌉ − 2^⌈log n⌉ + n − 1 ≤ n log n + 0.09n, and, following Edelkamp and Wegener [6], this bound is tight. In [5] an improved variant with n log n − 0.91n comparisons in the worst case and requiring extra space is presented. With ExternalWeakHeapsort we propose a further refinement with the same worst case bound, but on
QuickXsort: Efficient Sorting 141
2 QuickXsort
In this section we give a more precise description of QuickXsort and derive some results concerning the number of comparisons performed in the average and worst case. Let X be some sorting algorithm. QuickXsort works as follows: First, choose some pivot element as the median of some random sample. Next, partition the array according to this pivot element, i.e., rearrange the array such that all elements left of the pivot are less than or equal to the pivot element and all elements on the right are greater than or equal to it. (If the algorithm X outputs the sorted sequence into the extra memory, the partitioning is performed such that all elements left of the pivot are greater than or equal to and all elements on the right are less than or equal to the pivot element.) Then, choose one part of the array and sort it with algorithm X. (The preferred choice depends on the sorting algorithm X.) After one part of the array has been sorted with X, move the pivot element to its correct position (right after/before the already sorted part) and sort the other part of the array recursively with QuickXsort.
The main advantage of this procedure is that the part of the array that is not currently being sorted can be used as temporary memory for the algorithm X. This yields fast internal variants for various external sorting algorithms such as Mergesort. The idea is that whenever a data element should be moved to the external storage, it is instead swapped with the data element occupying the respective position in the part of the array which is used as temporary memory. Of course, this works only if the algorithm needs additional storage only for data elements. Furthermore, the algorithm has to be able to keep track of the positions of elements which have been swapped. As the specific method depends on the algorithm X, we give some more details when we describe the examples for QuickXsort.
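The control structure of QuickXsort can be sketched as follows. This illustrative version ignores the in-place buffer trick (it allocates lists freely), always hands the left part to X, and chooses the pivot as the median of a sample of about √n elements, as assumed below; all names are ours:

```python
import random
from math import isqrt

def quick_x_sort(a, sort_x):
    """Sort list a: one part of each partition is sorted with sort_x,
    the other part is sorted recursively."""
    if len(a) <= 1:
        return list(a)
    k = max(1, isqrt(len(a)))
    pivot = sorted(random.sample(a, k))[k // 2]  # median of ~sqrt(n) sample
    left = [x for x in a if x < pivot]
    mid = [x for x in a if x == pivot]
    right = [x for x in a if x > pivot]
    return sort_x(left) + mid + quick_x_sort(right, sort_x)
```

Plugging in any sorting routine for `sort_x` (e.g. a mergesort) recovers the corresponding QuickXsort instance at the level of comparisons, though not the internal-memory behaviour.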
For the number of comparisons we can derive some general results which hold for a wide class of algorithms X. Under natural assumptions the average number of comparisons of X and of QuickXsort differ only by an o(n)-term. For the rest of the paper, we assume that the pivot is selected as the median of approximately √n randomly chosen elements. Sample sizes of approximately √n are likely to be optimal, as the results in [3,13] suggest.
The following theorem is one of our main results. It can be proved using
Chernoff bounds and then solving the linear recurrence.
Theorem 1 (QuickXsort Average-Case). Let X be some sorting algorithm requiring at most n log n + cn + o(n) comparisons in the average case. Then, QuickXsort implemented with Θ(√n) elements as sample for pivot selection is a sorting algorithm that also needs at most n log n + cn + o(n) comparisons in the average case.
Does QuickXsort provide a good bound for the worst case? The obvious answer is "no". If always the √n smallest elements are chosen for pivot selection, Θ(n^{3/2}) comparisons are performed. However, we can prove that such a worst case is very unlikely. Let R(n) be the worst case number of comparisons of the algorithm X.
n · δ(n) off the median, we stop with QuickXsort and continue by sorting both
parts of the partitioned array with the algorithm Y. We call this QuickXYsort.
To achieve a good worst case bound, of course, we also need a good bound for
algorithm X. W. l. o. g. we assume the same worst case bounds for X as for Y.
Note that QuickXYsort only makes sense if one needs a provably good worst
case bound. Since QuickXsort is always expected to make at most as many
comparisons as QuickXYsort (under the reasonable assumption that X on
average is faster than Y – otherwise one would use simply Y), in every step of
the recursion QuickXsort is the better choice for the average case.
Theorem 2 (QuickXYsort Worst-Case). Let X be a sorting algorithm with
at most n log n + cn + o(n) comparisons in the average case and R(n) = n log n +
dn + o(n) comparisons in the worst case (d ≥ c). Let Y be a sorting algorithm
with at most R(n) comparisons in the worst case. Then, QuickXYsort is a
sorting algorithm that performs at most n log n + cn + o(n) comparisons in the
average case and n log n + (d + 1)n + o(n) comparisons in the worst case.
In order to keep the implementation of QuickXYsort simple, we propose the following algorithm Y: Find the median with some linear time algorithm (see e.g. [1]), then apply QuickXYsort with this median as the first pivot element.
Note that this algorithm is well defined because by induction the algorithm Y is
already defined for all smaller instances. The proof of Thm. 2 shows that Y, and
thus QuickXYsort, has a worst case number of comparisons in n log n + O(n).
3 QuickWeakHeapsort
In this section we consider QuickWeakHeapsort as a first example of QuickXsort. We start by introducing weak heaps and then continue by describing WeakHeapsort and a novel external version of it. This external version is a good candidate for QuickXsort and yields an efficient sorting algorithm that uses approximately n log n − 1.2n comparisons (this value is only a rough estimate and neither a lower nor an upper bound). A drawback of WeakHeapsort and its variants is that they require one extra bit per element. The exposition also serves as an intermediate step towards our implementation of MergeInsertion, where the weak-heap data structure will be used as a building block.
Conceptually, a weak heap (see Fig. 1) is a binary tree satisfying the following
conditions:
Fig. 1. A weak heap (reverse bits are set for grey nodes; above the nodes are array indices)
From the first two properties we deduce that the height of a weak heap that has n elements is ⌈log n⌉ + 1. The third property is called the weak-heap ordering or half-tree ordering. In particular, this property enforces no relation between an element in a node and those stored in its left subtree. On the other hand, it implies that any node together with its right subtree forms a weak heap on its own. In an array-based implementation, besides the element array s, an array r of reverse bits is used, i.e., ri ∈ {0, 1} for i ∈ {0, . . . , n − 1}. The root has index 0. The array index of the left child of si is 2i + ri, the array index of the right child is 2i + 1 − ri, and the array index of the parent is ⌊i/2⌋ (assuming that i ≠ 0). Using the fact that the indices of the left and right children of si are exchanged when flipping ri, subtrees can be reversed in constant time by setting ri ← 1 − ri. The distinguished ancestor (d-ancestor(j)) of sj for j ≠ 0 is recursively defined as the parent of sj if sj is a right child, and the distinguished ancestor of the parent of sj if sj is a left child. The distinguished ancestor of sj is the first element on the path from sj to the root which is known to be less than or equal to sj by (3). Moreover, any subtree rooted at sj, together with the distinguished ancestor si of sj, forms again a weak heap with root si by considering sj as the right child of si.
The basic operation for creating a weak heap is the join operation which
combines two weak heaps into one. Let i < j be two nodes in a weak heap
such that si is smaller than or equal to every element in the left subtree of sj .
Conceptually, sj and its right subtree form a weak heap, while si and the left
subtree of sj form another weak heap. (Note that si is not part of the subtree
with root sj .) The result of join is a weak heap with root at position i. If sj < si ,
the two elements are swapped and rj is flipped. As a result, the new element sj
will be smaller than or equal to every element in its right subtree, and the new
element si will be smaller than or equal to every element in the subtree rooted at
sj. To sum up, join requires constant time and involves one element comparison and possibly one element swap in order to combine two weak heaps into a new one.
The construction of a weak heap consisting of n elements requires n − 1 com-
parisons. In the standard bottom-up construction of a weak heap the nodes are
visited one by one. Starting with the last node in the array and moving to the
front, the two weak heaps rooted at a node and its distinguished ancestor are
joined. The amortized cost to get from a node to its distinguished ancestor is
O(1) [6].
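The array representation, the d-ancestor computation and the bottom-up construction just described can be sketched as follows; counting join calls confirms the n − 1 comparisons (function names are ours):

```python
def d_ancestor(j, r):
    # climb while j is a left child (index 2*parent + r[parent]), then
    # return the parent: the first ancestor known to be <= s[j]
    while j == 2 * (j // 2) + r[j // 2]:
        j //= 2
    return j // 2

def join(s, r, i, j):
    # combine the weak heaps rooted at i and j (i < j): one comparison,
    # and on a swap flip r[j] to exchange the children of node j
    if s[j] < s[i]:
        s[i], s[j] = s[j], s[i]
        r[j] = 1 - r[j]

def build_weak_heap(s):
    r = [0] * len(s)
    for j in range(len(s) - 1, 0, -1):  # n - 1 joins = n - 1 comparisons
        join(s, r, d_ancestor(j, r), j)
    return r
```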
When using weak heaps for sorting, the minimum is removed and the weak
heap condition restored until the weak heap becomes empty. After extracting an
element from the root, first the special path from the root is traversed top-down,
and then, in a bottom-up process the weak-heap property is restored using at
most log n join operations. (The special path is established by going once to the right and then to the left as far as possible.) Hence, extracting the minimum requires at most log n comparisons.
Now, we introduce a modification to the standard procedure described by Dutton [4], which has a slightly improved performance, but requires extra space. We call this modified algorithm ExternalWeakHeapsort. This is because it needs an extra output array, to which the elements extracted from the weak heap are moved. On average ExternalWeakHeapsort requires fewer comparisons than RelaxedWeakHeapsort [5]. Integrated in QuickXsort we can implement it without extra space other than the reverse bits r and a few additional bits. We introduce an additional array active and weaken the requirements of a weak heap: we also allow nodes on levels other than the last two to have fewer than two children. Nodes where the active bit is set to false are considered to have been removed. ExternalWeakHeapsort works as follows: First, a
usual weak heap is constructed using n − 1 comparisons. Then, until the weak
heap becomes empty, the root – which is the minimal element – is moved to
the output array and the resulting hole has to be filled with the minimum of
the remaining elements (so far the only difference to normal WeakHeapsort
is that there is a separate output area).
The hole is filled by searching the special path from the root to a node x which
has no left child. Note that the nodes on the special path are exactly the nodes
having the root as distinguished ancestor. Finding the special path does not need
any comparisons since one only has to follow the reverse bits. Next, the element
of the node x is moved to the root leaving a hole. If x has a right subtree (i. e.,
if x is the root of a weak heap with more than one element), this hole is filled
by applying the hole-filling algorithm recursively to the weak heap with root x.
Otherwise, the active bit of x is set to false. Now, the root of the whole weak heap
together with the subtree rooted by x forms a weak heap. However, it remains
to restore the weak heap condition for the whole weak heap. Except for the root
and x, all nodes on the special path together with their right subtrees form weak
heaps. Following the special path upwards these weak heaps are joined with their
distinguished ancestor as during the weak heap construction (i. e., successively
they are joined with the weak heap consisting of the root and the already treated
nodes on the special path together with their subtrees). Once all the weak heaps on the special path are joined, the whole array forms a weak heap again.
If n is not a power of two, the sizes of left and right parts of WeakHeapsort
are less balanced than the left and right parts of ordinary Mergesort and one
can expect a slightly higher number of comparisons. For QuickWeakHeapsort,
the half of the array which is not sorted by ExternalWeakHeapsort is used
as output area. Whenever the root is moved to the output area, the element that
occupied that place before is inserted as a dummy element at the position where
the active bit is set to false. Applying Thm. 1, we obtain the rough estimate of
n log n − 1.2n comparisons for the average case of QuickWeakHeapsort.
4 QuickMergesort
Fig. 2. First the two halves of the left part are sorted moving them from one place to
another. Then, they are merged to the original place.
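The key step of QuickMergesort is a merge that uses an unsorted region of the same array as scratch space, moving elements only by swaps so that no element is lost. A sketch (index conventions and the function name are ours; the buffer region must be disjoint from the merged range and at least as long as the left run):

```python
def merge_with_buffer(a, lo, mid, hi, buf):
    """Merge sorted runs a[lo:mid] and a[mid:hi] in place, using
    a[buf:buf + (mid - lo)] as swap buffer (its contents are preserved,
    only permuted)."""
    n1 = mid - lo
    for i in range(n1):                      # swap the left run into the buffer
        a[buf + i], a[lo + i] = a[lo + i], a[buf + i]
    i, j, k = buf, mid, lo
    while i < buf + n1 and j < hi:           # merge back by swaps
        if a[i] <= a[j]:
            a[k], a[i] = a[i], a[k]
            i += 1
        else:
            a[k], a[j] = a[j], a[k]
            j += 1
        k += 1
    while i < buf + n1:                      # drain the buffer
        a[k], a[i] = a[i], a[k]
        i += 1
        k += 1
```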
Base cases of growing size always lead to a constant factor overhead in running time if an algorithm with a quadratic number of total operations is applied. Therefore, in the experiments we also consider constant size base cases, which offer a slightly worse bound for the number of comparisons, but are faster in practice. We do not analyze them separately since the preferred choice for the size depends on the type of data to be sorted and the system on which the algorithms run.
5 MergeInsertion
MergeInsertion by Ford and Johnson [9] is one of the best sorting algorithms in terms of number of comparisons. Hence, it can be applied for sorting base cases of QuickMergesort, which yields even better results than Insertionsort. Therefore, we want to give a brief description of the algorithm and our implementation. Algorithmically, MergeInsertion(s0, . . . , sn−1) can be described as follows (an intuitive example for n = 21 can be found in [12]):
1. Arrange the input such that si ≥ si+⌈n/2⌉ for 0 ≤ i < ⌊n/2⌋ with one comparison per pair. Let ai = si and bi = si+⌈n/2⌉ for 0 ≤ i < ⌊n/2⌋, and b⌈n/2⌉ = sn−1 if n is odd.
2. Sort the values a0, . . . , a⌈n/2⌉−1 recursively with MergeInsertion.
3. Rename the solution as follows: b0 ≤ a0 ≤ a1 ≤ · · · ≤ a⌈n/2⌉−1 and insert the elements b1, . . . , b⌈n/2⌉−1 via binary insertion, following the ordering b2, b1, b4, b3, b10, b9, . . . , b5, . . . , b_{t_{k−1}}, b_{t_{k−1}−1}, . . . , b_{t_{k−2}+1}, b_{t_k}, . . . into the main chain, where tk = (2^{k+1} + (−1)^k)/3.
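The ordering in step 3 is driven by the numbers tk; a sketch that generates it, under a 0-indexed reading of the block boundaries (helper names are ours):

```python
def t(k):
    # t_k = (2^(k+1) + (-1)^k) / 3, always an integer: 1, 3, 5, 11, 21, ...
    return (2 ** (k + 1) + (-1) ** k) // 3

def insertion_order(m):
    """0-indexed order in which b_1, ..., b_{m-1} are binary-inserted:
    blocks b_{t_k - 1} down to b_{t_{k-1}} for k = 2, 3, ...; the last,
    incomplete block is clamped (the special case mentioned below)."""
    order, k = [], 2
    while t(k - 1) < m:
        order.extend(range(min(t(k) - 1, m - 1), t(k - 1) - 1, -1))
        k += 1
    return order
```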
addresses. This approach has the advantage that the relations stored in the tournament tree are preserved. The most important procedure for MergeInsertion is the organization of the calls for binary-insert. After adapting the addresses for the elements bi (w.r.t. the above description) in the second part of the array, the algorithm calls the binary insertion routine with appropriate indices. Note that we always use k comparisons for all elements of the k-th block (i.e., the elements b_{t_k}, . . . , b_{t_{k−1}+1}) even if there might be the chance to save one comparison. By introducing an additional array, which for each bi contains the current index of ai, we can exploit the observation that not always k comparisons are needed to insert an element of the k-th block. In the following we call this the improved variant. The pseudo-code of the basic variant is shown in Fig. 3. The last sequence is not complete and is thus handled as a special case.
Theorem 6 (Average Case of MergeInsertion). The sorting algorithm
MergeInsertion needs n log n − c(n) · n + O(log n) comparisons on average,
where c(n) ≥ 1.3999.
Corollary 3 (QuickMergesort with Base Case MergeInsertion). When
using MergeInsertion as base case, QuickMergesort needs at most n log n−
1.3999n + o(n) comparisons and O(n log n) other instructions on average.
6 Experiments
Our experiments consist of two parts. First, we compare the different algorithms
we use as base cases, i. e., MergeInsertion, its improved variant, and Inser-
tionsort. The results can be seen in Fig. 4. Depending on the size of the arrays
150 S. Edelkamp and A. Weiß
the displayed numbers are averages over 10–10000 runs¹. The data elements we sorted were randomly chosen 32-bit integers. The number of comparisons was measured by increasing a counter at every comparison².
The outcome in Fig. 4 shows that our improved MergeInsertion implementation achieves results for the constant κ of the linear term in the range of [−1.43, −1.41] (for some values of n even smaller than −1.43). Moreover, the standard implementation, with slightly more comparisons, is faster than Insertionsort. Due to the O(n²) work, the running times of all three implementations rise quickly, so that only moderate values of n can be handled.
[Fig. 4: comparison of the base-case algorithms; left panel: coefficient of the linear term of the comparison count (y-axis from −1.45 to −1.4), right panel: running times; x-axis: n (logarithmic scale, 2^10 to 2^16).]
The second part of our experiments (shown in Fig. 5) consists of the comparison of QuickMergesort (with base cases of constant and growing size) and QuickWeakHeapsort with state-of-the-art algorithms such as STL-Introsort (i.e., Quicksort), STL-stable-sort (BottomUpMergesort), and Quicksort with median of √n elements for pivot selection. For QuickMergesort with base cases, the improved variant of MergeInsertion is used to sort subarrays of size up to 40 log₁₀ n. For the normal QuickMergesort we used base cases of size ≤ 9. We also implemented QuickMergesort with median of three for pivot selection, which turns out to be practically efficient, although it needs slightly more comparisons than QuickMergesort with median of √n. However,
¹ Our experiments were run on one core of an Intel Core i7-3770 CPU (3.40 GHz, 8 MB cache) with 32 GB RAM; operating system: Ubuntu Linux 64-bit; compiler: GNU g++ (version 4.6.3) optimized with flag -O3.
² To rely on objects being handled, we avoided the flattening of the array structure by the compiler. Hence, for the running time experiments, and in each comparison taken, we left the counter increase operation intact.
QuickXsort: Efficient Sorting 151
[Fig. 5: comparison of QuickMergesort and QuickWeakHeapsort variants with state-of-the-art algorithms; left and right panels over n (logarithmic scale, 2^14 to 2^26).]
7 Concluding Remarks
Sorting n elements remains a fascinating topic for computer scientists both from
a theoretical and from a practical point of view. With QuickXsort we have
described a procedure for converting an external sorting algorithm into an internal one, introducing only o(n) additional comparisons on average. We presented
QuickWeakHeapsort and QuickMergesort as two examples for this con-
struction. QuickMergesort is close to the lower bound for the average number
of comparisons and at the same time is practically efficient, even when the com-
parisons are fast.
Using MergeInsertion to sort base cases of growing size for QuickMergesort, we derive an upper bound of n log n − 1.3999n + o(n) comparisons for
the average case. As far as we know, a better result has not been published before. We emphasize that the average number of comparisons of our best implementation has a proven gap of at most 0.05n + o(n) to the lower bound. The value n log n − 1.4n for n = 2^k matches one side of Reinhardt's conjecture that an optimized in-place algorithm can achieve n log n − 1.4n + O(log n) comparisons on average [15]. Moreover, our experimental results validate the theoretical considerations and indicate that the factor −1.43 can be beaten. Of course, there is still room for closing the gap to the lower bound of n log n − 1.44n + O(log n) comparisons.
References
1. Blum, M., Floyd, R.W., Pratt, V., Rivest, R.L., Tarjan, R.E.: Time bounds for
selection. J. Comput. Syst. Sci. 7(4), 448–461 (1973)
2. Cantone, D., Cincotti, G.: QuickHeapsort, an efficient mix of classical sorting algorithms. Theoretical Computer Science 285(1), 25–42 (2002)
3. Diekert, V., Weiß, A.: Quickheapsort: Modifications and improved analysis. In:
Bulatov, A.A., Shur, A.M. (eds.) CSR 2013. LNCS, vol. 7913, pp. 24–35. Springer,
Heidelberg (2013)
4. Dutton, R.D.: Weak-heap sort. BIT 33(3), 372–381 (1993)
5. Edelkamp, S., Stiegeler, P.: Implementing HEAPSORT with n log n − 0.9n and
QUICKSORT with n log n + 0.2n comparisons. ACM Journal of Experimental Al-
gorithmics 10(5) (2002)
6. Edelkamp, S., Wegener, I.: On the performance of WEAK-HEAPSORT. In:
Reichel, H., Tison, S. (eds.) STACS 2000. LNCS, vol. 1770, pp. 254–266. Springer,
Heidelberg (2000)
7. Edelkamp, S., Weiß, A.: QuickXsort: Efficient Sorting with n log n − 1.399n + o(n)
Comparisons on Average. ArXiv e-prints, abs/1307.3033 (2013)
8. Elmasry, A., Katajainen, J., Stenmark, M.: Branch mispredictions don’t affect
mergesort. In: Klasing, R. (ed.) SEA 2012. LNCS, vol. 7276, pp. 160–171. Springer,
Heidelberg (2012)
9. Ford Jr., L.R., Johnson, S.M.: A tournament problem. The American Mathematical Monthly 66(5), 387–389 (1959)
10. Katajainen, J.: The Ultimate Heapsort. In: CATS, pp. 87–96 (1998)
11. Katajainen, J., Pasanen, T., Teuhola, J.: Practical in-place mergesort. Nord. J.
Comput. 3(1), 27–40 (1996)
12. Knuth, D.E.: Sorting and Searching, 2nd edn. The Art of Computer Programming,
vol. 3. Addison Wesley Longman (1998)
13. Martı́nez, C., Roura, S.: Optimal Sampling Strategies in Quicksort and Quickselect.
SIAM J. Comput. 31(3), 683–705 (2001)
14. Musser, D.R.: Introspective sorting and selection algorithms. Software—Practice
and Experience 27(8), 983–993 (1997)
15. Reinhardt, K.: Sorting in-place with a worst case complexity of n log n − 1.3n +
o(log n) comparisons and n log n + o(1) transports. In: Ibaraki, T., Iwama, K.,
Yamashita, M., Inagaki, Y., Nishizeki, T. (eds.) ISAAC 1992. LNCS, vol. 650,
pp. 489–498. Springer, Heidelberg (1992)
16. Wegener, I.: Bottom-up-Heapsort, a new variant of Heapsort beating, on an average,
Quicksort (if n is not very small). Theoretical Computer Science 118(1), 81–98
(1993)
Notions of Metric Dimension of Corona
Products: Combinatorial and Computational
Results
Throughout this paper, we only consider undirected simple loop-free graphs and
use standard graph-theoretic terminology. Less known notions are collected at
the end of this section.
Let (X, d) be a metric space. The diameter of a point set S ⊆ X is defined
as diam(S) = sup{d(x, y) : x, y ∈ S}. A generator of (X, d) is a set S ⊆ X such
that every point of the space is uniquely determined by the distances from the
elements of S. A point v ∈ X is said to distinguish two points x and y of X if d(v, x) ≠ d(v, y). Hence, S is a generator if and only if any pair of points of X
is distinguished by some element of S.
E.A. Hirsch et al. (Eds.): CSR 2014, LNCS 8476, pp. 153–166, 2014.
© Springer International Publishing Switzerland 2014
154 H. Fernau and J.A. Rodrı́guez-Velázquez
Our main results. In this paper, we study the (local) metric dimension of corona
product graphs via the (local) adjacency dimension of a graph. We show that
the (local) metric dimension of the corona product of a graph of order n and
some non-trivial graph H equals n times the (local) adjacency metric dimension
of H. This relation is much stronger and under weaker conditions compared to
the results of Jannesari and Omoomi [21] concerning the lexicographic product
of graphs. This also enables us to infer NP-hardness results for computing the
(local) metric dimension, based on corresponding NP-hardness results for (lo-
cal) adjacency metric dimension that we also provide. To our knowledge, this
is the first time combinatorial results on this particular form of graph product
have been used to deduce computational hardness results. The obtained reduc-
tions are relatively simple and also allow us to conclude hardness results based
Fig. 1. The vertices in bold type form an adjacency basis for P_4 ⊙ P_5, but not a dominating set
Let G and H be two graphs of order n and n′, respectively. The join (graph) G + H is defined as the graph obtained from vertex-disjoint graphs G and H by taking one copy of G and one copy of H and joining by an edge each vertex of G with each vertex of H. Graph products are one of the recurring themes in graph theory, see [15]. The corona product (graph) G ⊙ H is defined as the graph obtained from G and H by taking one copy of G and n copies of H and joining by an edge each vertex from the ith copy of H with the ith vertex of G [11]. We will denote by V = {v_1, v_2, …, v_n} the set of vertices of G and by H_i = (V_i, E_i) the ith copy of H, so that N_{G⊙H}(v_i) = V_i ∪ N_G(v_i) and N_{G⊙H}(x) = {v_i} ∪ N_{H_i}(x) for every x ∈ V_i. Notice that the corona graph K_1 ⊙ H is isomorphic to the
join graph K1 + H. For our computational complexity results, it is important
but easy to observe that these graph operations can be performed in polynomial
time, given two input graphs. Some of the notions important in this paper are
illustrated in Figure 1.
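For illustration only (not code from the paper), the corona product and the brute-force computation of the (adjacency) dimension can be sketched as follows; `corona`, `distances`, and `dimension` are names of our choosing, graphs are dicts mapping a vertex to its neighbour set, and cap = 2 truncates distances to obtain the adjacency variant.

```python
from itertools import combinations

def corona(G, H):
    """Corona product G (.) H: one copy of H per vertex of G, each copy
    fully joined to its vertex of G."""
    P = {v: set(G[v]) for v in G}
    for i, v in enumerate(G):
        for x in H:
            P[(i, x)] = {(i, y) for y in H[x]} | {v}
        P[v] |= {(i, x) for x in H}
    return P

def distances(G, cap=None):
    """All-pairs BFS distances; if cap is set, distances are truncated at cap
    (cap=2 gives the metric used for the adjacency dimension)."""
    d = {}
    for s in G:
        dist, frontier = {s: 0}, [s]
        while frontier:
            nxt = []
            for u in frontier:
                for w in G[u]:
                    if w not in dist:
                        dist[w] = dist[u] + 1
                        nxt.append(w)
            frontier = nxt
        for t in G:
            dt = dist.get(t, float("inf"))   # infinite if disconnected
            d[s, t] = dt if cap is None else min(dt, cap)
    return d

def dimension(G, cap=None):
    """Smallest size of a set S whose distance vectors separate all vertices,
    found by brute force over subsets of increasing cardinality."""
    d, verts = distances(G, cap), list(G)
    for k in range(1, len(verts) + 1):
        for S in combinations(verts, k):
            vecs = {tuple(d[s, v] for s in S) for v in verts}
            if len(vecs) == len(verts):
                return k
    return len(verts)
```

On a small instance this lets one observe the relation dim(G ⊙ H) = n · dim_A(H) discussed in this section, e.g. with G = K_2 and H = P_3.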
Simple facts. By definition, the following inequalities hold for any graph G:
The proofs of results marked with an asterisk symbol (∗) can be found in the
long version of this paper that can be retrieved as a Technical Report [30].
d_{G⊙H}(x, u) = d_{v_i+H_i}(x, u) = d_{v_i+H_i}(y, u) = d_{G⊙H}(y, u).
2. x ∈ V_i and y ∈ V. If y = v_i, then for u ∈ S_j, j ≠ i, we have
d_{G⊙H}(x, u) = d_{G⊙H}(x, y) + d_{G⊙H}(y, u) > d_{G⊙H}(y, u).
d_{G⊙H}(x, u) = d_{G⊙H}(x, y) + d_{G⊙H}(y, u) > d_{G⊙H}(y, u).
dim(G ⊙ H) = |W| ≥ Σ_{i=1}^{n} |W_i| ≥ Σ_{i=1}^{n} dimA(H_i) = n · dimA(H).
A detailed analysis of the adjacency dimension of the corona product via the
adjacency dimension of the second operand. We now analyze the adjacency di-
mension of the corona product G ⊙ H in terms of the adjacency dimension of H.
Corollary 1. (∗) Let r ≥ 7 with r ≢ 1 mod 5 and r ≢ 3 mod 5. For any connected graph G of order n ≥ 2, dimA(G ⊙ C_r) = dimA(G ⊙ P_r) = n · ⌊(2r + 2)/5⌋.
It is easy to check that any adjacency basis of a star graph K1,r is composed
of r − 1 leaves, with the last leaf non-dominated. Thus, Theorem 4 implies:
Theorem 5. (∗) Let H be a non-trivial graph such that some of its adjacency bases are also dominating sets, and some are not. If there exists an adjacency basis S for H such that for every v ∈ V(H) − S it is satisfied that S ⊈ N_H(v), and for any adjacency basis S′ for H which is also a dominating set there exists some v ∈ V(H) − S′ such that S′ ⊆ N_H(v), then for any connected graph G of order n ≥ 2, dimA(G ⊙ H) = n · dimA(H) + γ(G).
(i) There exists an adjacency basis S for H, which is also a dominating set, such that for every v ∈ V(H) − S it is satisfied that S ⊈ N_H(v).
(ii) dimA(G ⊙ H) = n · dimA(H).
(iii) dimA(G ⊙ H) = dim(G ⊙ H).
3 Locality in Dimensions
First, we consider some straightforward cases. If H is an empty graph, then K_1 ⊙ H is a star graph and diml(K_1 ⊙ H) = 1. Moreover, if H is a complete graph of order n, then K_1 ⊙ H is a complete graph of order n + 1 and diml(K_1 ⊙ H) = n.
It was shown in [31] that for any connected non-trivial graph G and any empty graph H, diml(G ⊙ H) = diml(G). We are going to state results similar to the non-local situation discussed in the previous section. We omit all proofs, as they are along similar lines.
Theorem 8. (∗) For any connected graph G of order n ≥ 2 and any non-trivial graph H, diml(G ⊙ H) = n · dimA,l(H).
Based on [31], this allows us to deduce quite a number of combinatorial results for the new notion of a local adjacency dimension, as contained in [30]. Fortunately, the comparison of the local adjacency dimension of the corona product with that of the second argument is much simpler in the local version than in the previously studied non-local version.
Theorem 9. (∗) Let G be a connected graph of order n ≥ 2 and let H be a non-trivial graph. If there exists a local adjacency basis S for H such that for every v ∈ V(H) − S it is satisfied that S ⊈ N_H(v), then dimA,l(G ⊙ H) = n · dimA,l(H).
Theorem 10. (∗) Let G be a connected graph of order n ≥ 2 and let H be a non-trivial graph. If for any local adjacency basis S for H there exists some v ∈ V(H) − S which satisfies S ⊆ N_H(v), then dimA,l(G ⊙ H) = n · dimA,l(H) + γ(G).
Remark 1. As a concrete example for the previous theorem, consider H = K_n. Clearly, dimA,l(H) = n − 1, and the neighborhood of the only vertex that is not in the local adjacency basis coincides with the local adjacency basis. For any connected graph G of order n ≥ 2, we can deduce that dimA,l(G ⊙ H) = n · dimA,l(H) + γ(G).
A further concrete example of a graph H to which the above result applies is the star K_{1,r}, r ≥ 2. In this case, for any connected graph G of order n ≥ 2, we find that dimA,l(G ⊙ K_{1,r}) = n · dimA,l(K_{1,r}) + γ(G) = n + γ(G).
In this section, we not only prove NP-hardness of all dimension variants, but also show that the problems (viewed as minimization problems) cannot be solved in time O(poly(n + m) · 2^{o(n)}) on any graph of order n (and size m). Yet, it is straightforward to see that each of our computational problems can be solved in time O(poly(n + m) · 2^n), simply by cycling through all vertex subsets by increasing cardinality and then checking if the considered vertex set forms an appropriate basis. More specifically, based on our reductions we can conclude that these trivial brute-force algorithms are in a sense optimal, assuming the validity of the Exponential Time Hypothesis (ETH). A direct consequence of ETH (using the sparsification lemma) is the hypothesis that 3-SAT cannot be solved in time O(poly(n + m) · 2^{o(n+m)}) on instances with n variables and m clauses; see [19,4].
From a mathematical point of view, the most interesting fact is that most of
our computational results are based on the combinatorial results on the dimen-
sional graph parameters on corona products of graphs that are derived above.
Due to the practical motivation of the parameters, we also study their com-
putational complexity on planar graph instances.
We are going to investigate the following problems:
Dim: Given a graph G and an integer k, decide if dim(G) ≤ k or not.
LocDim: Given a graph G and an integer k, decide if diml (G) ≤ k or not.
AdjDim: Given a graph G and an integer k, decide if dimA (G) ≤ k or not.
LocAdjDim: Given a graph G and an integer k, decide if dimA,l (G) ≤ k or not.
As auxiliary problems, we will also consider:
VC: Given a graph G and an integer k, decide if β(G) ≤ k or not.
Dom: Given a graph G and an integer k, decide if γ(G) ≤ k or not.
1-LocDom: Given a graph G and an integer k, decide if there exists a 1-locating
dominating set of G with at most k vertices or not. (A dominating set D ⊆ V in
a graph G = (V, E) is called a 1-locating dominating set if for every two vertices
u, v ∈ V \ D, the symmetric difference of N (u) ∩ D and N (v) ∩ D is non-empty.)
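The definition of a 1-locating dominating set translates directly into a brute-force check; a small illustrative sketch (function names are ours, not from the paper), with graphs given as dicts mapping a vertex to its neighbour set:

```python
from itertools import combinations

def is_1_locating_dominating(G, D):
    """Check the definition above: every vertex outside D has a neighbour in D
    (domination), and no two vertices outside D see the same subset of D."""
    D = set(D)
    seen = set()
    for v in G:
        if v in D:
            continue
        code = frozenset(G[v] & D)
        if not code or code in seen:   # undominated, or clashing code
            return False
        seen.add(code)
    return True

def min_1_locating_dominating(G):
    """Smallest 1-locating dominating set, by exhaustive search."""
    verts = list(G)
    for k in range(1, len(verts) + 1):
        for D in combinations(verts, k):
            if is_1_locating_dominating(G, D):
                return k
    return len(verts)
```

For example, on the 4-cycle the minimum is 2: a single vertex either leaves some vertex undominated or gives two outside vertices the same code.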
Remark 2. In fact, we can offer a further proof for the NP-hardness of Dim
(on general graphs), based upon Theorem 1 and the following reasoning. If there
were a polynomial-time algorithm for computing dim(G), then we could compute
dimA(H) for any (non-trivial) graph H by computing dim(K_2 ⊙ H) with the
assumed polynomial-time algorithm, knowing that this is just twice as much
as dimA (H). As every NP-hardness proof adds a bit to the understanding of
the nature of the problem, this one does so, as well. It shows that Dim is NP-
complete even on the class of graphs that can be written as G ⊙ H, where G is
some connected graph of order n ≥ 2 and H is non-trivial.
Theorem 14. (∗) 1-LocDom is NP-hard, even when restricted to planar graphs. Moreover, assuming ETH, there is no O(poly(n + m) · 2^{o(n)}) algorithm solving 1-LocDom on general graphs of order n and size m.
Proof. (Sketch) Recall the textbook proof for the NP-hardness of VC (see [12]) that produces from a given 3-SAT instance I with n variables and m clauses a graph G with two adjacent vertices per variable gadget and three vertices per clause gadget forming a C_3 (and 3m more edges that interconnect these gadgets to indicate which literals occur in which clauses). So, G has 3m + 2n vertices and 3m + n + 3m = 6m + n edges. We modify G to obtain G′ as follows: each edge that occurs inside a variable gadget or a clause gadget is replaced by a triangle, so that we add 3m + n new vertices of degree two. All in all, this means that G′ has (3m + 2n) + (3m + n) = 6m + 3n vertices and 9m + 3n + 3m = 12m + 3n edges. Now, assuming (w.l.o.g.) that I contains, for each variable x, at least one clause with x as a literal and another clause with x̄ as a literal, we can show that I is satisfiable iff G has a vertex cover of size at most 2m + n iff G′ has a 1-locating dominating set of size at most 2m + n.
The general case was treated in [7], but that proof (starting out again from
3-SAT) does not preserve planarity, as the variable gadget alone already con-
tains a K2,3 subgraph that inhibits non-crossing interconnections with the clause
gadgets. However, although not explicitly mentioned, that reduction also yields the non-existence of O(poly(n + m) · 2^{o(n)}) algorithms based on ETH. In [30], we also provide a reduction that works for planar graphs, building on a variant of Lichtenstein's reduction [27] that shows NP-hardness of VC on planar graph instances.
Theorem 15. AdjDim is NP-complete, even when restricted to planar graphs. Assuming ETH, there is no O(poly(n + m) · 2^{o(n)}) algorithm solving AdjDim on graphs of order n and size m.
Proof. (Sketch) From an instance G = (V, E) and k of 1-LocDom, produce an instance (G′, k) of AdjDim by obtaining G′ from G by adding a new isolated vertex x ∉ V to G. We claim that G has a 1-locating dominating set of size at most k if and only if dimA(G′) ≤ k.
Alternatively, NP-hardness of AdjDim (and even the ETH-result) can be
deduced from the strong relation between the domination number and the ad-
jacency dimension as stated in Cor. 2, based on the NP-hardness of Dom.
Fig. 2. Illustration of the clause gadget. The square-shaped vertices do not belong to the gadget; they are the three literal vertices in the variable gadgets that correspond to the three literals of the clause.
5 Conclusions
References
1. Bailey, R.F., Meagher, K.: On the metric dimension of Grassmann graphs. Discrete
Mathematics & Theoretical Computer Science 13, 97–104 (2011)
2. Brigham, R.C., Chartrand, G., Dutton, R.D., Zhang, P.: Resolving domination in
graphs. Mathematica Bohemica 128(1), 25–36 (2003)
3. Buczkowski, P.S., Chartrand, G., Poisson, C., Zhang, P.: On k-dimensional graphs
and their bases. Periodica Mathematica Hungarica 46(1), 9–15 (2003)
4. Calabro, C., Impagliazzo, R., Paturi, R.: The complexity of satisfiability of small
depth circuits. In: Chen, J., Fomin, F.V. (eds.) IWPEC 2009. LNCS, vol. 5917, pp.
75–85. Springer, Heidelberg (2009)
5. Charon, I., Hudry, O., Lobstein, A.: Minimizing the size of an identifying or
locating-dominating code in a graph is NP-hard. Theoretical Computer Sci-
ence 290(3), 2109–2120 (2003)
6. Chartrand, G., Saenpholphat, V., Zhang, P.: The independent resolving number of
a graph. Mathematica Bohemica 128(4), 379–393 (2003)
7. Colbourn, C.J., Slater, P.J., Stewart, L.K.: Locating dominating sets in series par-
allel networks. Congressus Numerantium 56, 135–162 (1987)
8. Crowston, R., Gutin, G., Jones, M., Saurabh, S., Yeo, A.: Parameterized study
of the test cover problem. In: Rovan, B., Sassone, V., Widmayer, P. (eds.) MFCS
2012. LNCS, vol. 7464, pp. 283–295. Springer, Heidelberg (2012)
9. Dı́az, J., Pottonen, O., Serna, M.J., van Leeuwen, E.J.: On the complexity of metric
dimension. In: Epstein, L., Ferragina, P. (eds.) ESA 2012. LNCS, vol. 7501, pp.
419–430. Springer, Heidelberg (2012)
10. Feng, M., Wang, K.: On the metric dimension of bilinear forms graphs. Discrete
Mathematics 312(6), 1266–1268 (2012)
11. Frucht, R., Harary, F.: On the corona of two graphs. Aequationes Mathematicae 4,
322–325 (1970)
12. Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory
of NP-Completeness. W. H. Freeman & Co., New York (1979)
13. Guo, J., Wang, K., Li, F.: Metric dimension of some distance-regular graphs. Jour-
nal of Combinatorial Optimization, 1–8 (2012)
14. Gutin, G., Muciaccia, G., Yeo, A.: (non-)existence of polynomial kernels for the
test cover problem. Information Processing Letters 113(4), 123–126 (2013)
15. Hammack, R., Imrich, W., Klavžar, S.: Handbook of product graphs. Discrete
Mathematics and its Applications, 2nd edn. CRC Press (2011)
16. Harary, F., Melter, R.A.: On the metric dimension of a graph. Ars Combinatoria 2,
191–195 (1976)
17. Hartung, S., Nichterlein, A.: On the parameterized and approximation hardness of
metric dimension. In: Proceedings of the 28th IEEE Conference on Computational
Complexity (CCC 2013), pp. 266–276. IEEE (2013)
18. Haynes, T.W., Henning, M.A., Howard, J.: Locating and total dominating sets in
trees. Discrete Applied Mathematics 154(8), 1293–1300 (2006)
19. Impagliazzo, R., Paturi, R., Zane, F.: Which problems have strongly exponential
complexity? Journal of Computer and System Sciences 63(4), 512–530 (2001)
20. Iswadi, H., Baskoro, E.T., Simanjuntak, R.: On the metric dimension of corona
product of graphs. Far East Journal of Mathematical Sciences 52(2), 155–170
(2011)
21. Jannesari, M., Omoomi, B.: The metric dimension of the lexicographic product of
graphs. Discrete Mathematics 312(22), 3349–3356 (2012)
22. Johnson, M.: Structure-activity maps for visualizing the graph variables arising in drug design. Journal of Biopharmaceutical Statistics 3(2), 203–236 (1993). PMID: 8220404
23. Johnson, M.A.: Browsable structure-activity datasets. In: Carbó-Dorca, R., Mezey,
P. (eds.) Advances in Molecular Similarity, pp. 153–170. JAI Press Inc., Stamford
(1998)
24. Karpovsky, M.G., Chakrabarty, K., Levitin, L.B.: On a new class of codes for
identifying vertices in graphs. IEEE Transactions on Information Theory 44(2),
599–611 (1998)
25. Khuller, S., Raghavachari, B., Rosenfeld, A.: Landmarks in graphs. Discrete Ap-
plied Mathematics 70, 217–229 (1996)
26. Laifenfeld, M., Trachtenberg, A.: Identifying codes and covering problems. IEEE
Transactions on Information Theory 54(9), 3929–3950 (2008)
27. Lichtenstein, D.: Planar formulae and their uses. SIAM Journal on Computing 11,
329–343 (1982)
28. Lokshtanov, D., Marx, D., Saurabh, S.: Lower bounds based on the Exponential
Time Hypothesis. EATCS Bulletin 105, 41–72 (2011)
29. Okamoto, F., Phinezy, B., Zhang, P.: The local metric dimension of a graph. Math-
ematica Bohemica 135(3), 239–255 (2010)
30. Rodrı́guez-Velázquez, J.A., Fernau, H.: On the (adjacency) metric dimension of
corona and strong product graphs and their local variants: combinatorial and com-
putational results. Tech. Rep. arXiv:1309.2275 [math.CO], ArXiv.org, Cornell Uni-
versity (2013)
31. Rodrı́guez-Velázquez, J.A., Barragán-Ramı́rez, G.A., Gómez, C.G.: On the local
metric dimension of corona product graph (2013) (submitted)
32. Saputro, S., Simanjuntak, R., Uttunggadewa, S., Assiyatun, H., Baskoro, E.,
Salman, A., Bača, M.: The metric dimension of the lexicographic product of graphs.
Discrete Mathematics 313(9), 1045–1051 (2013)
33. Slater, P.J.: Leaves of trees. Congressus Numerantium 14, 549–559 (1975)
34. Suomela, J.: Approximability of identifying codes and locating-dominating codes.
Information Processing Letters 103(1), 28–33 (2007)
35. Yero, I.G., Kuziak, D., Rodrı́guez-Velázquez, J.A.: On the metric dimension
of corona product graphs. Computers & Mathematics with Applications 61(9),
2793–2798 (2011)
On the Complexity of Computing Two
Nonlinearity Measures
1 Introduction
E.A. Hirsch et al. (Eds.): CSR 2014, LNCS 8476, pp. 167–175, 2014.
© Springer International Publishing Switzerland 2014
168 M.G. Find
than a circuit with the smallest number of gates. Examples of this include pro-
tocols for secure multiparty computation (see e.g. [8,15]), non-interactive secure
proofs of knowledge [3], and fully homomorphic encryption (see for example
[20]).
It is a main topic in several papers (see e.g. [5,7,9]1 ) to find circuits with
few AND gates for specific functions using either exact or heuristic techniques.
Despite this and the applications mentioned above, it appears that the compu-
tational hardness has not been studied before.
The two measures have very different complexities, depending on the repre-
sentation of f .
Organization of the Paper and Results. In the following section, we introduce
the problems and necessary definitions. All our hardness results will be based
on assumptions stronger than P ≠ NP, more precisely the existence of pseudorandom function families and the “Strong Exponential Time Hypothesis”. In
Section 3 we show that if pseudorandom function families exist, the multiplicative complexity of a function represented by its truth table cannot be computed (or even approximated within a factor of (2 − ε)^{n/2}) in polynomial time. This should be contrasted with the well-known fact that the nonlinearity can be computed in almost linear time using the fast Walsh transform. In Section 4, we consider
the problems when the function is represented by a circuit. We show that in
terms of time complexity, under our assumptions, the situations differ very little
from the case where the function is represented by a truth table. However, in
terms of complexity classes, the picture looks quite different: Computing the
nonlinearity is #P-hard, and multiplicative complexity is in the second level of the polynomial hierarchy.
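The almost-linear-time nonlinearity computation via the fast Walsh transform mentioned above can be sketched as follows (an illustrative rendition, not code from the paper); it uses the standard identity NL(f) = 2^{n−1} − (1/2) · max_a |W_f(a)|, where W_f is the Walsh spectrum of (−1)^f.

```python
def nonlinearity(tt):
    """Nonlinearity of a Boolean function given by its truth table tt,
    where tt[x] is f(x) for x in 0 .. 2^n - 1 (bitmask encoding of F_2^n)."""
    n = (len(tt) - 1).bit_length()
    assert len(tt) == 1 << n
    w = [(-1) ** b for b in tt]          # the signs (-1)^f(x)
    h = 1
    while h < len(w):                    # in-place butterfly: O(n * 2^n) steps
        for i in range(0, len(w), 2 * h):
            for j in range(i, i + h):
                x, y = w[j], w[j + h]
                w[j], w[j + h] = x + y, x - y
        h *= 2
    # NL(f) = 2^(n-1) - max|W_f(a)|/2 = (2^n - max|W_f(a)|) / 2
    return (len(tt) - max(abs(v) for v in w)) // 2
```

For instance, any affine function has nonlinearity 0, while the two-variable AND has nonlinearity 1.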
2 Preliminaries
In the following, we let F_2 be the finite field of size 2 and F_2^n the n-dimensional vector space over F_2. We denote by B_n the set of Boolean functions mapping from F_2^n into F_2. We say that f ∈ B_n is affine if there exist a ∈ F_2^n, c ∈ F_2 such that f(x) = a · x + c, and linear if f is affine with f(0) = 0, with arithmetic over F_2. This gives the symbol “+” an overloaded meaning, since we also use it for
addition over the reals. It should be clear from the context, what is meant.
In the following an XOR-AND circuit is a circuit with fanin 2 over the basis
(∧, ⊕, 1) (arithmetic over GF (2)). All circuits from now on are assumed to be
XOR-AND circuits. We adopt standard terminology for circuits (see e.g. [21]).
If nothing else is specified, for a circuit C we let n be the number of inputs and
m be the number of gates, which we refer to as the size of C, denoted |C|. For a
circuit C we let fC denote the function computed by C, and c∧ (C) denote the
number of AND gates in C.
For a function f ∈ Bn , the multiplicative complexity of f , denoted c∧ (f ), is
the smallest number of AND gates necessary and sufficient in an XOR-AND
1
Here we mean concrete finite functions, as opposed to giving good (asymptotic)
upper bounds for an infinite family of functions.
Here A^H denotes that the algorithm A has oracle access to a function H, which might be f_n(k, ·) for some k ∈ F_2^n or a random g ∈ B_n; for more details see [1].
Some of our hardness results will be based on the following assumption.
Assumption 1. There exist pseudorandom function families.
It is known that pseudorandom function families exist if one-way functions
exist [11,12,1], so we consider Assumption 1 to be very plausible. We will also
use the following assumptions on the exponential complexity of SAT , due to
Impagliazzo and Paturi.
Assumption 2 (Strong Exponential Time Hypothesis [13]). For any fixed c < 1, no algorithm runs in time 2^{cn} and computes SAT correctly.
Proof. Assume for the sake of contradiction that the algorithm A violates the theorem. The algorithm breaking any pseudorandom function family works as the one in the previous proof, but instead we return 1 if the value returned by A is at least T = (n^c + 1) · (2 − ε)^{n/2}. Now arguments similar to those in the proof above show that if A returns a value larger than T, H must be random, and if H is random, h has multiplicative complexity at most (n^c + 1) · (2 − ε)^{n/2} with probability at most

2^{((n^c+1)·(2−ε)^{(10c log n)/2})^2 + 2(n^c+1)·(2−ε)^{(10c log n)/2}·10c log n + 10c log n + 1} / 2^{2^{10c log n}}.

This tends to zero, implying that, under the assumption on A, there is no pseudorandom function family.
4 Circuit as Input
From a practical point of view, Theorems 1 and 2 might seem unrealistic.
We are allowing the algorithm to be polynomial in the length of the truth table,
which is exponential in the number of variables. However most functions used for
practical purposes admit small circuits. To look at the entire truth table might
(and in some cases should) be infeasible. When working with computational
problems on circuits, it is somewhat common to consider the running time in
two parameters; the number of inputs to the circuit, denoted by n, and the size
of the circuit, denoted by m. In the following we assume that m is polynomial
in n. In this section we show that even determining whether a circuit computes
an affine function is coNP-complete. In addition, NLC can be computed in time poly(m) · 2^n and is #P-hard. Under Assumption 1, MCC cannot be computed in time poly(m) · 2^{O(n)}, and it is contained in the second level of the polynomial hierarchy. In the following, we denote by AFFINE the set of circuits computing affine functions.
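As an aside: deciding affineness is easy once the truth table is available, which is why the coNP-completeness of AFFINE is a statement about the circuit representation. A hedged sketch (function name is ours): read off the only candidate affine form from f(0) and f(e_i), then verify it on all inputs.

```python
def is_affine(tt):
    """Check whether the Boolean function given by truth table tt is affine.
    tt[x] is f(x) for x in 0 .. 2^n - 1 (bitmask encoding of F_2^n).
    If f = a.x + c, then necessarily c = f(0) and a_i = f(e_i) + f(0)."""
    n = (len(tt) - 1).bit_length()
    assert len(tt) == 1 << n
    c = tt[0]
    a = [tt[1 << i] ^ c for i in range(n)]        # a_i = f(e_i) + f(0)
    for x in range(1 << n):
        val = c
        for i in range(n):
            if (x >> i) & 1:
                val ^= a[i]
        if val != tt[x]:
            return False                           # x witnesses non-affineness
    return True
```

A violating input x is exactly the short certificate placing the complement of AFFINE in NP.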
F(e^{(i)}) = a_i + 1 = 1 = F(0),
Theorem 4. N LC is #P-hard.
inputs. To see that any other affine function approximates f_C worse than the constant 0 function, notice that any non-constant affine function is balanced and thus has to disagree
The relation between circuit size and multiplicative complexity given in the
proof above is not tight, and we do not need it to be. See [18] for a tight rela-
tionship.
References
1. Arora, S., Barak, B.: Computational Complexity - A Modern Approach, pp. 1–579.
Cambridge University Press (2009)
2. Boyar, J., Peralta, R., Pochuev, D.: On the multiplicative complexity of Boolean
functions over the basis (∧,⊕,1). Theoretical Computer Science 235(1), 43–57
(2000)
3. Boyar, J., Damgård, I., Peralta, R.: Short non-interactive cryptographic proofs. J.
Cryptology 13(4), 449–472 (2000)
4. Boyar, J., Find, M., Peralta, R.: Four measures of nonlinearity. In: Spirakis, P.G.,
Serna, M. (eds.) CIAC 2013. LNCS, vol. 7878, pp. 61–72. Springer, Heidelberg
(2013)
5. Boyar, J., Matthews, P., Peralta, R.: Logic minimization techniques with applica-
tions to cryptology. J. Cryptology 26(2), 280–312 (2013)
6. Carlet, C.: Boolean functions for cryptography and error correcting codes. In:
Crama, Y., Hammer, P.L. (eds.) Boolean Models and Methods in Mathematics,
Computer Science, and Engineering, ch. 8, pp. 257–397. Cambridge Univ. Press,
Cambridge (2010)
7. Cenk, M., Özbudak, F.: On multiplication in finite fields. J. Complexity 26(2),
172–186 (2010)
8. Chaum, D., Crépeau, C., Damgård, I.: Multiparty unconditionally secure protocols
(extended abstract). In: Simon, J. (ed.) STOC, pp. 11–19. ACM (1988)
9. Courtois, N., Bard, G.V., Hulme, D.: A new general-purpose method to multiply
3×3 matrices using only 23 multiplications. CoRR abs/1108.2830 (2011)
10. Daemen, J., Rijmen, V.: AES proposal: Rijndael (1999),
http://csrc.nist.gov/archive/aes/rijndael/Rijndael-ammended.pdf
11. Goldreich, O., Goldwasser, S., Micali, S.: How to construct random functions. J.
ACM 33(4), 792–807 (1986)
12. Håstad, J., Impagliazzo, R., Levin, L.A., Luby, M.: A pseudorandom generator
from any one-way function. SIAM J. Comput. 28(4), 1364–1396 (1999)
13. Impagliazzo, R., Paturi, R.: On the complexity of k-SAT. J. Comput. Syst.
Sci. 62(2), 367–375 (2001)
14. Kabanets, V., Cai, J.-Y.: Circuit minimization problem. In: Yao, F.F., Luks, E.M.
(eds.) STOC, pp. 73–79. ACM (2000)
15. Kolesnikov, V., Schneider, T.: Improved garbled circuit: Free XOR gates and
applications. In: Aceto, L., Damgård, I., Goldberg, L.A., Halldórsson, M.M.,
Ingólfsdóttir, A., Walukiewicz, I. (eds.) ICALP 2008, Part II. LNCS, vol. 5126,
pp. 486–498. Springer, Heidelberg (2008)
16. O’Donnell, R.: Analysis of Boolean Functions. Book draft (2012),
http://www.analysisofbooleanfunctions.org
17. Razborov, A.A., Rudich, S.: Natural proofs. J. Comput. Syst. Sci. 55(1), 24–35
(1997)
18. Sergeev, I.S.: A relation between additive and multiplicative complexity of Boolean
functions. CoRR abs/1303.4177 (2013)
19. MacWilliams, F.J., Sloane, N.J.A.: The Theory of Error-Correcting Codes. North-
Holland Math. Library 16 (1977)
20. Vaikuntanathan, V.: Computing blindfolded: New developments in fully homomor-
phic encryption. In: Ostrovsky, R. (ed.) FOCS, pp. 5–16. IEEE (2011)
21. Wegener, I.: The Complexity of Boolean Functions. Wiley-Teubner (1987)
Block Products and Nesting Negations in FO2
1 Introduction
The study of logical fragments over words has a long tradition in computer
science. The seminal Büchi-Elgot-Trakhtenbrot Theorem from the early 1960s
states that a language is regular if and only if it is definable in monadic second-
order logic [1,5,32]. A decade later, in 1971, McNaughton and Papert showed
that a language is definable in first-order logic if and only if it is star-free [17].
Combining this result with Schützenberger’s famous characterization of star-
free languages in terms of finite aperiodic monoids [21] shows that it is decidable
whether a given regular language is first-order definable. Since then, many logical
fragments have been investigated, see e.g. [3,25] for overviews.
The motivation for such results is two-fold. First, restricted fragments often
yield more efficient algorithms for computational problems such as satisfiability
or separability. Second, logical fragments give rise to a descriptive complexity:
∗ The last two authors acknowledge the support by the German Research Foundation
(DFG) under grant DI 435/5-1.
E.A. Hirsch et al. (Eds.): CSR 2014, LNCS 8476, pp. 176–189, 2014.
© Springer International Publishing Switzerland 2014
Block Products and Negation Nesting in FO2 177
The simpler the fragment to define a language, the simpler the language. This
approach can help in understanding the rich structure of regular languages.
Logical fragments are usually defined by restricting some resources in formu-
las. The three most natural restrictions are the quantifier depth (i.e., the num-
ber of nested quantifiers), the alternation depth (i.e., the number of alternations
between existential and universal quantification), and the number of variables.
With respect to decidability questions regarding definability, quantifier depth is
not very interesting since for fixed quantifier depth only finitely many languages
are definable (which immediately yields decidability), see e.g. [4]. The situation
with alternation in first-order logic is totally different: Only the very first level
(i.e., no alternation) is known to be decidable [8,23]. By a result of Thomas [31]
the alternation hierarchy in first-order logic is tightly connected with the dot-
depth hierarchy [2] or the Straubing-Thérien hierarchy [24,29], depending on the
presence or absence of the successor predicate. Some progress in the study of
the dot-depth hierarchy and the Straubing-Thérien hierarchy was achieved by
considering the half-levels. For example, the levels 1⁄2 and 3⁄2 in each of the two
hierarchies are decidable [6,18,19]. The half levels also have a counterpart in the
alternation hierarchy of first-order logic by requiring existential quantifiers in the
first block. Another point of view of the same hierarchy is to disallow universal
quantifiers and to restrict the number of nested negations.
Regarding the number of variables, Kamp showed that linear temporal logic is
expressively complete for first-order logic over words [7]. Since every modality in
linear temporal logic can be defined using three variables, first-order logic with
only three different names for the variables (denoted by FO3 ) defines the same
languages as full first-order logic. This result is often stated as FO3 = FO. Allow-
ing only two variable names yields the proper fragment FO2 of first-order logic.
Thérien and Wilke [30] showed that a language is FO2 definable if and only if
its syntactic monoid belongs to the variety DA and, since the latter is decidable,
one can effectively check whether a given regular language is FO2 -definable. For
further information on the numerous characterizations of FO2 we refer to [3,28].
Inside FO2 , the alternation depth is also a natural restriction. One difference
to full first-order logic is that one cannot rely on prenex normal forms as a
simple way of defining the alternation depth. Weil and the second author gave
an effective algebraic characterization of the mth level FO2m of this hierarchy.
More precisely, they showed that it is possible to ascend the FO2 -alternation
hierarchy using so-called Mal’cev products [15] which in this particular case
preserve decidability. There are two main ingredients in the proof. The first
one is a combinatorial tool known as rankers [33] or turtle programs [22], and
the second is a relativization property of two-variable first-order logic. These
two ingredients are then combined using a proof method introduced in [10].
Krebs and Straubing gave another decidable characterization of FO2m in terms
of identities of ω-terms using completely different techniques [9,26]; their proof
relies on so-called block products.
178 L. Fleischer, M. Kufleitner, and A. Lauser
In this paper we consider the half-levels Σ²m of the FO2-alternation hierarchy.
A language is definable in Σ²m if and only if it is definable in FO2 without
universal quantifiers and with at most m − 1 nested negations. It is easy to see
that one can avoid negations of atomic predicates. One can think of Σ²m as those
FO2-formulas which on every path of their parse tree have at most m quantifier
blocks, and the outermost block is existential. The main contributions of this
paper are ω-terms Um and Vm such that an FO2-definable language is
Σ²m-definable if and only if its ordered syntactic monoid satisfies Um ≤ Vm. For a
given regular language it is therefore decidable whether it is definable in Σ²m by
first checking whether it is FO2-definable and, if so, then verifying whether
Um ≤ Vm holds in its ordered syntactic monoid. Moreover, for every FO2-definable
language L one can compute the smallest integer m such that L is definable in Σ²m.
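The decision procedure above works on the (ordered) syntactic monoid of the language. As a rough illustration of its first step only, the following sketch (ours; the names are hypothetical, and it ignores both the order and the Um ≤ Vm check) computes the transition monoid of a complete DFA, which for a minimal DFA coincides with the syntactic monoid of its language.

```python
def transition_monoid(n_states, alphabet, delta):
    """Transition monoid of a complete DFA with states 0..n_states-1.
    delta maps (state, letter) -> state; each monoid element is the map
    Q -> Q induced by some word, encoded as the tuple of images."""
    identity = tuple(range(n_states))
    gens = {a: tuple(delta[(q, a)] for q in range(n_states)) for a in alphabet}

    def compose(f, g):  # the map of the word uv, where f = map(u), g = map(v)
        return tuple(g[f[q]] for q in range(n_states))

    elements, todo = {identity}, [identity]
    while todo:  # close the generating letters under composition
        f = todo.pop()
        for g in gens.values():
            h = compose(f, g)
            if h not in elements:
                elements.add(h)
                todo.append(h)
    return elements
```

For the two-state DFA over {a} that flips its state on every letter, the monoid has exactly two elements: the identity and the flip.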
The proof step from the identities to logic is a refinement of the approach
of Weil and the second author [15] which in turn uses a technique from [10,
Section IV]. While the proof method in [10] is quite general and can be applied for
solving various other problems [11,12,13,14], it relies on closure under negation.
A very specific modification is necessary in order to get the scheme working in
the current situation.
The proof for showing that Σ²m satisfies the identity Um ≤ Vm is an adaptation
of Straubing's proof [26] to ordered monoids. Straubing's proof relies on two-
of Straubing’s proof [26] to ordered monoids. Straubing’s proof relies on two-
sided semidirect products and the block product principle. We partially extend
both tools to ordered monoids. To the best of our knowledge, this extension does
not yet appear in the literature. The attribute partially is due to the fact that
only the first factor in two-sided semidirect products (as used in this paper) is
ordered while the second factor is an unordered monoid. As shown by Pin and
Weil in the case of one-sided semidirect products [20], one could use ordered
alphabets for further extending this approach. We refrain from this in order to
focus on the presentation of our main result.
2 Preliminaries
The free monoid A∗ is the set of finite words over A equipped with concatenation
and the empty word ε as neutral element. Let u = a1 · · · ak with ai ∈ A be a finite
word. The alphabet (also known as the content) of u is alph(u) = {a1 , . . . , ak }, its
length is |u| = k, and the positions of u are 1, . . . , k. We say that i is an a-position
of u if ai = a. The word u is a (scattered) subword of w if w ∈ A∗ a1 · · · A∗ ak A∗.
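The scattered-subword relation just defined can be checked greedily in linear time; a small sketch (ours) for illustration:

```python
def is_subword(u, w):
    """True iff u = u1...uk is a (scattered) subword of w, i.e.
    w ∈ A* u1 A* u2 ... A* uk A*.  Greedy left-to-right matching:
    'a in it' consumes the iterator up to and including the match."""
    it = iter(w)
    return all(a in it for a in u)
```

For example, "ace" is a subword of "abcde", whereas "aec" is not.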
First-Order Logic. We consider first-order logic FO = FO[<] over finite
words. The syntax of FO-formulas is
¬(x = y) ≡ (x < y) ∨ (y < x)
¬(x < y) ≡ (x = y) ∨ (y < x)
and a ∈ alph(u1bu2) and for all |u1b| < p, q ≤ |u1bu2| we have:
u, p, q |= ϕ(Yb;Xa) if and only if u2, p − |u1b|, q − |u1b| |= ϕ.
Proof. Atomic formulas and Boolean combinations are straightforward. Let the
macro Yb < x < Xa stand for ¬(∃y ≤ x : λ(y) = a) ∧ ¬(∃y ≥ x : λ(y) = b). Using
this shortcut, we set ∃x ϕ(Yb;Xa) ≡ ∃x ((Yb < x < Xa) ∧ ϕ(Yb;Xa)).
Let h : A∗ → M be a homomorphism. The L-factorization of a word u is
the unique factorization u = s0a1 · · · sℓ−1aℓsℓ with si ∈ A∗ and so-called markers
ai ∈ A such that h(sℓ) L 1 and h(si ai+1 · · · sℓ−1 aℓ sℓ) >L h(ai si · · · aℓ sℓ) L
h(si−1 ai · · · sℓ−1 aℓ sℓ) for all i. Note that ℓ < |M|. Furthermore, if M ∈ DA,
then ai ∉ alph(si). Let DL(u) consist of the positions of the markers, i.e., let
ϕ ∈ Σ²m,n.
Lemma 5. Let h : A∗ → M be a homomorphism with M ∈ DA, let m ≥ 2
and n ≥ 0 be integers, and let u, v ∈ A∗ with u ⪯fo2_{m,2|M|+n} v. There exist
factorizations u = s0a1 · · · sℓ−1aℓsℓ and v = t0a1 · · · tℓ−1aℓtℓ with ai ∈ A and
si, ti ∈ A∗ such that the following properties hold for all i ∈ {1, . . . , ℓ}:
1. si ⪯fo2_{m−1,n} ti,
2. h(s0) R 1 and h(t0a1 · · · ti−1ai) R h(t0a1 · · · ti−1aisi),
3. h(sℓ) L 1 and h(aisi · · · aℓsℓ) L h(si−1ai · · · aℓsℓ).
Proof. Note that in property 2 the suffix is si and not ti. We want to prove
the claim by induction, for which we have to slightly generalize the claim.
Apart from the words u and v from the premises of the lemma we also
consider an additional word p which serves as a prefix for v. The proof is by
induction on |DR(pv) \ DR(p)|. The assumptions are u ⪯fo2_{m,n′} v, where
n′ = n + |DR(pv) \ DR(p)| + |DL(u)| + 1. We shall construct factorizations u =
s0a1 · · · sℓ−1aℓsℓ and pv = p t0a1 · · · tℓ−1aℓtℓ such that properties 1 and 3 hold,
but instead of 2 we have h(pt0a1 · · · ti−1ai) R h(pt0a1 · · · ti−1aisi) and h(ps0) R
h(p). We thus recover the lemma using an empty prefix p.
Let u = s′0c1 · · · s′ℓ′−1cℓ′s′ℓ′ be the L-factorization (in particular ci ∉ alph(s′i))
and let v = t′0c1 · · · t′ℓ′−1cℓ′t′ℓ′ where ci ∉ alph(t′i) for all i. The factorization
of v exists because by assumption u and v agree on subwords of length ℓ′. The
dual of Lemma 3 yields s′0c1 · · · cℓ′−is′ℓ′−i ⪯fo2_{m−1,n′−i} t′0c1 · · · cℓ′−it′ℓ′−i as
well as s′i ⪯fo2_{m−1,n′} t′i for all i.
First suppose DR(p) = DR(pv). In this case h(p) R h(pv), and therefore
h(p) R h(px) for all x ∈ B∗, where B = alph(v). So in particular we have
h(pt′0c1 · · · t′i−1ci) R h(pt′0c1 · · · t′i−1cis′i) because alph(u) = B. Setting ai = ci,
si = s′i, and ti = t′i yields a factorization with the desired properties.
Suppose now DR(p) ⊊ DR(pv), and let s be the longest prefix of u such that
h(p) R h(ps) >R h(psa) for some a ∈ A. Such a prefix exists as alph(u) =
alph(v). We have a ∉ alph(s) by M ∈ DA. Let t be the longest prefix of v with
a ∉ alph(t). Using Lemma 3 we see alph(t) ⊆ alph(s). Let k and k′ be maximal
such that s′0c1 · · · s′k−1ck is a prefix of s and such that t′0c1 · · · t′k′−1ck′ is a prefix
of t. We claim k = k′. For instance, suppose k < k′. Then ack+1 · · · cℓ′ is a
subword of u but not of v (since ck+1t′k+1 · · · cℓ′t′ℓ′ is the shortest suffix of v
with the subword ck+1 · · · cℓ′ and since there is no a-position in t′0c1 · · · t′k). Let
ai = ci for i ∈ {1, . . . , k}, let si = s′i and ti = t′i for i ∈ {0, . . . , k − 1}. Let sk
and tk be such that s = s′0c1 · · · s′k−1cksk and t = t′0c1 · · · t′k−1cktk. Lemma 4
yields sk ⪯fo2_{m−1,n′} tk.
Let u = sau′ and v = tav′, and let p′ = pta. For all i ∈ {0, . . . , k} we
have h(pt0a1 · · · ti−1ai) R h(pt0a1 · · · ti−1aisi) because alph(t) ⊆ alph(s). Note
that h(ai+1si+1 · · · akskau′) L h(siai+1si+1 · · · akskau′). Since M ∈ DA we see
h(p) >R h(p′) and thus DR(p) ⊊ DR(p′). Using the formulas ϕ>Xa from
Lemma 3 yields u′ ⪯fo2_{m,n′−1} v′. As n′ ≥ n + |DR(p′v′) \ DR(p′)| + |DL(u′)| + 2
we can apply induction to obtain factorizations u′ = sk+1ak+2 · · · sℓ−1aℓsℓ and
v′ = tk+1ak+2 · · · tℓ−1aℓtℓ. Setting ak+1 = a yields the desired factorizations.
The preceding lemma enables induction on the parameter m. We start with
a homomorphism onto a monoid satisfying Um ≤ Vm and want to show that
preimages of ≤-order ideals are unions of ⪯fo2_{m,n}-order ideals for some
sufficiently large n. Intuitively, a string rewriting technique yields the largest
quotient which satisfies the identity Um−1 ≤ Vm−1. One rewriting step
corresponds to one application of the identity Um−1 ≤ Vm−1 of level m − 1. Such
rewriting steps can be lifted to the identity Um ≤ Vm in the contexts in which
they are applied.
Proposition 3. Let m ≥ 1 be an integer, and let h : A∗ → M be a surjective
homomorphism onto an ordered monoid M ∈ DA satisfying Um ≤ Vm. There
exists a positive integer n such that u ⪯fo2_{m,n} v implies h(u) ≤ h(v) for all
u, v ∈ A∗.
Observe that (p um−1 q x)ω p = p (um−1 q x p)ω = p (um−1 xm)ω. Note that alph(t′) ⊆
alph(s). Therefore, h(u) R h(us) implies h(u) R h(ut′), and symmetrically
h(v) L h(sv) implies h(v) L h(t′v). Induction yields h(ut′v) ≤ h(utv) and thus
h(usv) ≤ h(utv). This completes the proof of the claim.
Let t ∼ s if t →∗ s and s →∗ t. Let M′ be the quotient A∗/∼. The relation ∼ is
a congruence on A∗ and M′ is naturally equipped with a monoid structure. Let
h′ : A∗ → M′ be the canonical homomorphism mapping u ∈ A∗ to its equivalence
class modulo ∼. The preorder →∗ on A∗ induces a partial order on M′ by letting
h′(u) ≤ h′(v) whenever v →∗ u. Thus M′ forms an ordered monoid. Moreover, M′
Conclusion
The fragments Σ²m of FO2[<] are defined by restricting the number of nested
negations. They can be seen as the half levels of the alternation hierarchy FO2m
in two-variable first-order logic, and we have Σ²m ⊆ FO2m ⊆ Σ²m+1. It is known
that the languages definable in FO2m form a strict hierarchy, see e.g. [16]. For
every m, an FO2-definable language is definable in Σ²m if and only if its ordered
syntactic monoid is in the variety DA and satisfies the identity Um ≤ Vm. Using
this characterization one can decide whether a given regular language is definable
in Σ²m. In particular, we have shown decidability for every level of an infinite
hierarchy. Note that there is no immediate connection between the decidability
of FO2m and the decidability of Σ²m.
The block product principle is an important tool in the proof of the direction
from Σ²m to identities. In order to be able to apply this tool, we first extended
block products to the case where the left factor is an ordered monoid and then
stated the block product principle in this context. In order to further extend
the block product M □ N to the case where both M and N are ordered, one
has to consider the monotone functions in N × N → M instead of M^{N×N}. As
in the case of the wreath product principle [20] this leads to ordered alphabets
when stating the block product principle. However, one implication in the block
product principle fails for ordered alphabets as the universal property does not
hold in this setting.
References
1. Büchi, J.R.: Weak second-order arithmetic and finite automata. Z. Math. Logik
Grundlagen Math. 6, 66–92 (1960)
2. Cohen, R.S., Brzozowski, J.A.: Dot-depth of star-free events. J. Comput. Syst.
Sci. 5(1), 1–16 (1971)
3. Diekert, V., Gastin, P., Kufleitner, M.: A survey on small fragments of first-order
logic over finite words. Int. J. Found. Comput. Sci. 19(3), 513–548 (2008)
4. Ebbinghaus, H.-D., Flum, J.: Finite Model Theory. In: Perspectives in Mathemat-
ical Logic. Springer (1995)
5. Elgot, C.C.: Decision problems of finite automata design and related arithmetics.
Trans. Amer. Math. Soc. 98, 21–51 (1961)
6. Glaßer, C., Schmitz, H.: Languages of dot-depth 3/2. Theory of Computing Sys-
tems 42(2), 256–286 (2008)
7. Kamp, J.A.W.: Tense Logic and the Theory of Linear Order. PhD thesis, University
of California (1968)
8. Knast, R.: A semigroup characterization of dot-depth one languages. RAIRO, Inf.
Théor. 17(4), 321–330 (1983)
9. Krebs, A., Straubing, H.: An effective characterization of the alternation hierarchy
in two-variable logic. In: FSTTCS 2012, Proceedings. LIPIcs, vol. 18, pp. 86–98.
Dagstuhl Publishing (2012)
10. Kufleitner, M., Lauser, A.: Languages of dot-depth one over infinite words. In:
Proceedings of LICS 2011, pp. 23–32. IEEE Computer Society (2011)
11. Kufleitner, M., Lauser, A.: Around dot-depth one. Int. J. Found. Comput.
Sci. 23(6), 1323–1339 (2012)
12. Kufleitner, M., Lauser, A.: The join levels of the Trotter-Weil hierarchy are decidable.
In: Rovan, B., Sassone, V., Widmayer, P. (eds.) MFCS 2012. LNCS, vol. 7464,
pp. 603–614. Springer, Heidelberg (2012)
13. Kufleitner, M., Lauser, A.: The join of the varieties of R-trivial and L-trivial
monoids via combinatorics on words. Discrete Math. & Theor. Comput. Sci. 14(1),
141–146 (2012)
14. Kufleitner, M., Lauser, A.: Quantifier alternation in two-variable first-order logic
with successor is decidable. In: Proceedings of STACS 2013. LIPIcs, vol. 20,
pp. 305–316. Dagstuhl Publishing (2013)
15. Kufleitner, M., Weil, P.: The FO2 alternation hierarchy is decidable. In: Proceed-
ings of CSL 2012. LIPIcs, vol. 16, pp. 426–439. Dagstuhl Publishing (2012)
16. Kufleitner, M., Weil, P.: On logical hierarchies within FO2 -definable languages. Log.
Methods Comput. Sci. 8, 1–30 (2012)
17. McNaughton, R., Papert, S.: Counter-Free Automata. The MIT Press (1971)
18. Pin, J.-É.: A variety theorem without complementation. Russian Mathematics (Iz.
VUZ) 39, 80–90 (1995)
19. Pin, J.-É., Weil, P.: Polynomial closure and unambiguous product. Theory Comput.
Syst. 30(4), 383–422 (1997)
20. Pin, J.-É., Weil, P.: The wreath product principle for ordered semigroups. Commun.
Algebra 30(12), 5677–5713 (2002)
21. Schützenberger, M.P.: On finite monoids having only trivial subgroups. Inf. Con-
trol 8, 190–194 (1965)
22. Schwentick, T., Thérien, D., Vollmer, H.: Partially-ordered two-way automata: A
new characterization of DA. In: Kuich, W., Rozenberg, G., Salomaa, A. (eds.) DLT
2001. LNCS, vol. 2295, pp. 239–250. Springer, Heidelberg (2002)
23. Simon, I.: Piecewise testable events. In: Brakhage, H. (ed.) GI-Fachtagung 1975.
LNCS, vol. 33, pp. 214–222. Springer, Heidelberg (1975)
24. Straubing, H.: A generalization of the Schützenberger product of finite monoids.
Theor. Comput. Sci. 13, 137–150 (1981)
25. Straubing, H.: Finite Automata, Formal Logic, and Circuit Complexity. Birkhäuser
(1994)
26. Straubing, H.: Algebraic characterization of the alternation hierarchy in FO2[<]
on finite words. In: Proceedings CSL 2011. LIPIcs, vol. 12, pp. 525–537. Dagstuhl
Publishing (2011)
27. Straubing, H., Thérien, D.: Weakly iterated block products of finite monoids. In:
Rajsbaum, S. (ed.) LATIN 2002. LNCS, vol. 2286, pp. 91–104. Springer, Heidelberg
(2002)
28. Tesson, P., Thérien, D.: Diamonds are forever: The variety DA. In: Proceedings of
Semigroups, Algorithms, Automata and Languages, pp. 475–500. World Scientific
(2002)
29. Thérien, D.: Classification of finite monoids: The language approach. Theor. Com-
put. Sci. 14(2), 195–208 (1981)
30. Thérien, D., Wilke, Th.: Over words, two variables are as powerful as one quantifier
alternation. In: Proceedings of STOC 1998, pp. 234–240. ACM Press (1998)
31. Thomas, W.: Classifying regular events in symbolic logic. J. Comput. Syst. Sci. 25,
360–376 (1982)
32. Trakhtenbrot, B.A.: Finite automata and logic of monadic predicates (in Russian).
Dokl. Akad. Nauk. SSSR 140, 326–329 (1961)
33. Weis, P., Immerman, N.: Structure theorem and strict alternation hierarchy for
FO2 on words. Log. Methods Comput. Sci. 5, 1–23 (2009)
Model Checking for String Problems
1 Introduction
E.A. Hirsch et al. (Eds.): CSR 2014, LNCS 8476, pp. 190–203, 2014.
© Springer International Publishing Switzerland 2014
Then ρ[X → S] denotes the function that maps X to the set S and agrees with
ρ on all other arguments.
[[pi]]Tρ := {(s1, . . . , sk) | p ∈ λ(si)}
[[¬ϕ]]Tρ := S^k \ [[ϕ]]Tρ
Thus, μX.ϕ defines the least fixpoint of the function that takes a set of k-tuples
of nodes S and returns the set of all k-tuples satisfying ϕ assuming that X is
interpreted as S [14,23]. We write s |=ρ ϕ if s ∈ [[ϕ]]Tρ for a k-tuple s of nodes in
T , denoting the fact that s satisfies the property formalised by ϕ. If ϕ does not
contain any free variables we may also drop the interpretation ρ.
A prominent example of an L2μ formula is
ϕbis := νX. ⋀_{p∈P} (p1 ↔ p2) ∧ ⋀_{a∈Σ} ([a]1⟨a⟩2X ∧ [a]2⟨a⟩1X).
It expresses bisimilarity in the sense that for all pairs of nodes (s, t) we have
(s, t) |= ϕbis iff s and t are bisimilar.
closed Lkμ formula ϕ, one can compute the set [[ϕ]]T by induction on the structure
of ϕ. Fixpoint subformulas of the form μX.ψ or νX.ψ can be handled using
Knaster-Tarski fixpoint iteration: for least fixpoint formulas one binds X to the
empty set and computes the value of ψ on T . Then X is bound to this set of
k-tuples and so on until a fixpoint is reached. For greatest fixpoints one starts
the iteration with S k instead.
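The Knaster-Tarski iteration just described can be sketched generically. This is our illustration of the standard scheme, assuming a monotone operator on subsets of a finite set of k-tuples; the function names are ours.

```python
def iterate(f, start):
    """Kleene iteration from 'start' until a fixpoint of f is reached.
    Termination is guaranteed for a monotone f on a finite set."""
    x, y = start, f(start)
    while x != y:
        x, y = y, f(y)
    return x

def lfp(f):
    return iterate(f, frozenset())   # least fixpoint: start from the empty set

def gfp(f, top):
    return iterate(f, top)           # greatest fixpoint: start from S^k
```

For example, forward reachability from node 0 in a finite edge relation is the least fixpoint of the operator that adds all successors of the current set.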
The model checking problem for the polyadic μ-calculus has been investigated
before [1,16]. Essentially, there is no conceptual difference to model checking the
ordinary μ-calculus [9] which consists of all formulas of arity 1. In fact, there
is a simple reduction from model checking formulas of arity k to formulas of
arity 1 on the k-fold product of a transition system. Thus, one of the major
parameters in its complexity – besides the formula’s arity – is its alternation
depth. Intuitively, it measures the nesting depth of fixpoints of different type.
For formulas with no such nestings we set it to 1. Since alternating fixpoint
quantifiers do not play any role in tackling string problems in the next section
we omit a formal definition of alternation depth here and refer to the literature
instead [4].
The next proposition summarises the findings on the complexity of model
checking Lωμ.
we consider how the naïve fixpoint iteration algorithm computes [[ϕlcst]]TW. First
we observe that ⋀_{i=1}^{m} ⟨i⟩i satisfies exactly those m-tuples for which each i-th
component belongs to the i-th input string. Thus, fixpoint iteration only yields
tuples of positions with exactly one for each input string. Let us call these normal.
The greatest fixpoint iteration starts with the set of all tuples, and we can
restrict our attention to normal tuples only. This set can be seen as a
representation of all the positions at which the empty string ε occurs. The next fixpoint
iteration forms the union of all sets of normal tuples which represent positions
such that some a-edge is possible from all of them, and the resulting tuple repre-
sents occurrences of the substring a. Thus, it computes all positions of common
substrings of length 1. In general, the j-th fixpoint iteration computes all po-
sitions (as normal m-tuples) of a common substring of length j. Clearly, this
process is monotonically decreasing and there is some j – at most n + 1 – such
that the j-th iteration returns the empty set.
Indeed we have TW ⊭ ϕlcst for any set W of strings. Nevertheless, model
checking via fixpoint iteration computes all common substrings of W before
finding out that the formula is not satisfied. This is the basis for an algorithm
computing the longest common substring using model checking as described in
detail in the next section.
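The iteration just described can be mimicked directly: keep the set of position tuples that carry a common substring of the current length, and extend until the set becomes empty. This sketch is ours (it extends substrings to the right rather than walking the transition system TW) and returns one longest common substring:

```python
from itertools import product

def longest_common_substring(strings):
    """Fixpoint-style iteration: X holds all m-tuples of start positions
    that carry a common substring of length j; extend until X dies out.
    The last non-empty X yields a longest common substring."""
    X = set(product(*(range(len(s) + 1) for s in strings)))
    j = 0
    while True:
        # keep the tuples whose next letters all agree (length j -> j + 1)
        Y = {t for t in X
             if all(p + j < len(s) for p, s in zip(t, strings))
             and len({s[p + j] for p, s in zip(t, strings)}) == 1}
        if not Y:
            break
        X, j = Y, j + 1
    t = next(iter(X))
    return strings[0][t[0]:t[0] + j]
```

As in the naïve model-checking run, the number of iterations equals the length of the longest common substring plus one.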
(Figure: part of the transition system TW for the third input string babab, with nodes (3,0), . . . , (3,5) and b/a-labelled edges.)
A greatest fixpoint iteration for ϕlcst on TW starts with X0 as the set of all
positions. In order to compute the next iteration, note that [[⋀_{i=1}^{3} ⟨i⟩i]]TW is the
set of all tuples of the form ((1, j1), (2, j2), (3, j3)) for appropriate j1, j2, j3. Every
further iteration intersects some set obtained by evaluating the modal terms with
this set. We therefore disregard all other tuples. Under this assumption,
[[⋁_{c∈{a,b}} ⟨c⟩1⟨c⟩2⟨c⟩3 X]]TW_{[X→X0]} then evaluates to
X1 := {(1, 0), (1, 1), (1, 3)} × {(2, 0), (2, 2), (2, 3)} × {(3, 1), (3, 3)}
∪ {(1, 2), (1, 4)} × {(2, 1)} × {(3, 0), (3, 2), (3, 4)}
which is exactly the set of node tuples from which all components can do an
a-edge or all components can do a b-edge.
The next iteration for the evaluation of the greatest fixpoint is obtained by
evaluating the fixpoint body again, this time under the variable interpretation
[X → X 1 ], and it yields
X2 := {(1, 1), (1, 3)} × {(2, 0)} × {(3, 1), (3, 3)}
196 M. Hutagalung and M. Lange
which is the set of positions of ab, resp. ba, in the input strings. Note that aa,
for instance, is not a common substring; this is reflected by the fact that there
is no p such that ((1, 0), (2, 2), p) belongs to X2.
The next iteration yields X3 := {((1, 1), (2, 0), (3, 1))}, which denotes the
positions of the common substring aba. Finally, we get X4 = ∅, at which point the
fixpoint is reached, and the solution to this longest common substring instance
is obtained as the value of the last iteration beforehand, namely aba at positions
1, 0, and 1 in the three input strings.
L2μ is also capable of expressing the longest common subsequence problem in
the same sense. Let ⟨∗⟩i ψ abbreviate μY. ψ ∨ ⋁_{a∈Σ} ⟨a⟩i Y. Informally, it denotes
the set of all tuples such that the i-th component can make an arbitrary number
of steps along any edge and some resulting tuple satisfies ψ. Now consider
ϕlcsq := νX. (⋀_{i=1}^{m} ⟨i⟩i) ∧ ⋁_{a∈Σ} ⟨a⟩1⟨∗⟩1 . . . ⟨a⟩m⟨∗⟩m X. Evaluating this
formula on a transition system of the form TW will compute the longest common
subsequence of the input strings in W in the same way as above. Note that, again,
we have TW ⊭ ϕlcsq for any W but all the solutions to this instance are being
found in the last iteration of the greatest fixpoint evaluation.
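For comparison with the fixpoint formulation, the two-string case of the longest common subsequence problem has the classic dynamic-programming solution. This sketch is ours and is not the algorithm suggested by the model-checking view:

```python
def lcsq(u, v):
    """Classic DP: d[i][j] is a longest common subsequence of the
    prefixes u[:i] and v[:j]; quadratic time and space."""
    d = [[""] * (len(v) + 1) for _ in range(len(u) + 1)]
    for i in range(1, len(u) + 1):
        for j in range(1, len(v) + 1):
            if u[i - 1] == v[j - 1]:
                d[i][j] = d[i - 1][j - 1] + u[i - 1]
            else:
                d[i][j] = max(d[i - 1][j], d[i][j - 1], key=len)
    return d[-1][-1]
```

The formula ϕlcsq handles arbitrarily many strings at once, whereas the DP table above would grow exponentially in m.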
components. By the structure of TW, the iteration grows monotonically "to the
left", i.e. it only ever adds tuples with positions further left in the input words.
Eventually, after no more than (n + 1)^m iterations in the worst case, the
tuple ((1, 0), . . . , (m, 0)) is found and the least fixpoint is reached. The number
of iterations needed to achieve this equals the length of a shortest common
superstring, and this string can easily be computed by successively annotating
the tuples of positions found.
The algorithms sketched above are rather naïve and do not exploit any opti-
misation potential at all. The descriptions above are only meant to show how
model checking with fixpoint logics can in principle be used in order to solve such
computation problems. Here we focus on one particular problem, namely finding
longest common substrings, and show how partial evaluation of model checking
algorithms can be used to obtain an efficient procedure. Also note that a naïve
estimation of the worst-case time complexity of these algorithms according to
Prop. 1 yields a horrendous overapproximation: in general, model checking Lkμ is
exponential in the arity k, here equalling the number of strings m in the input.
This, however, ignores the special structure of the transition systems used here
and that of the fixed formula.
Consider the algorithm that has been described in Example 1. It basically
works on a set X of common substrings, and in each iteration it extends all
elements of X to a longer common substring by considering one more letter to
the left. For m input strings of length n, the set X is represented by a set of
m-tuples, which initially contains n^m tuples that represent the positions of the
empty string.
A straightforward optimisation changes the representation of the set X. Instead
of using a set of m-tuples, we can represent a single substring w with a
set t(w) of pairs such that (i, j) ∈ t(w) iff w occurs in wi at position j. By using
this representation, initially we have nm positions instead of n^m positions for
the empty string. Moreover, it is easy to check whether w is a common substring:
this is the case iff for every i = 1, . . . , m there is some j with (i, j) ∈ t(w).
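The pair-set representation t(w) can be written down directly. A small sketch (ours; it naively recomputes t(w) rather than maintaining it incrementally as the algorithm does):

```python
def t(w, strings):
    """The set of pairs (i, j) such that w occurs in strings[i] at
    position j (0-based), as in the optimised representation."""
    return {(i, j)
            for i, s in enumerate(strings)
            for j in range(len(s) - len(w) + 1)
            if s[j:j + len(w)] == w}

def is_common_substring(w, strings):
    """w is a common substring iff every input string contributes a pair."""
    covered = {i for (i, _) in t(w, strings)}
    return covered == set(range(len(strings)))
```

For instance, with the inputs abab and cab we get t(ab) = {(0,0), (0,2), (1,1)}, so ab is common, while ba is not.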
Applying these straightforward optimisations to the procedure described in
Ex. 1 yields Algorithm 1. It collects all non-extendable common substrings in a
set Y and returns that set.
In the following we describe further optimisations for Algorithm 1, so that it
can find longest common substrings faster and more efficiently.
for each w ∈ X that has been extended. Moreover, we should always take the
shortest w ∈ X in each iteration, to make sure that the extension of w has
already been computed in the previous iteration.
Multiple Substrings Extension. Under some conditions, extending a single
w ∈ X may imply extensions of some other substrings u ∈ X. For any w ∈ X
let S(w) = {u ∈ X | u = wv, v ∈ Σ+}. If w is extendable to aw, in general we
cannot conclude that u ∈ S(w) is also extendable to au. However, it is the case if
t(aw) is equal to {(i, j − 1) | (i, j) ∈ t(w)}, since this means that all occurrences of
w in the input strings are always preceded by a. In this case, we can extend w to
aw, and also every u ∈ S(w) to au. Likewise, if w is not extendable to any longer
common substring, then no u ∈ S(w) is extendable either. In this case we
can move w and all u ∈ S(w) to Y. Extending all u ∈ S(w) (resp. moving all
u ∈ S(w) to Y) can be done in constant time by exploiting the pointers defined
before, i.e. a pointer from w′ = wa to w, since the elements u are successively
linked to w by these pointers.
Multiple Letters Extension. It is also possible to extend w ∈ X with a
sequence of letters an an−1 . . . a1 ∈ Σ^n instead of only one single letter. Suppose
w = w′a, and the string w′ was extended to a common substring an an−1 . . . a1 w′
because of the previous extension policy, i.e. because t(a1w′) = {(i, j − 1) | (i, j) ∈
t(w′)}, . . . , t(an . . . a1w′) = {(i, j − 1) | (i, j) ∈ t(an−1 . . . a1w′)}. Then if we can
extend w to a1w, we can immediately conclude that w can be extended to
an an−1 . . . a1 w.
None of these optimisations makes the extension of a single substring w ∈ X
harder, since we store more information on each common substring to
accommodate the optimisations. The extension policies derived from these
optimisations can cut down the number of iterations needed in Algorithm 1.
(a) without optimisation
i   X                      Y
1   g, t, c, a             -
2   ag, cg, t, c, a        -
3   ag, cg, gt, c, a       -
4   ag, cg, gt, ac, a      -
5   ag, cg, gt, ac, ta     -
6   cg, gt, ac, ta         ag
7   acg, gt, ac, ta        ag
8   acg, cgt, ac, ta       ag
9   acg, cgt, ta           ag, ac
..  ..                     ..

(b) with optimisation
i   X                      Y
1   g, t, c, a             -
2   ag, cg, t, c, a        -
3   ag, cg, gt, c, a       -
4   ag, cg, gt, ac, a      -
5   ag, cg, gt, ac, gta    -
6   cg, gt, ac, gta        ag
7   acg, gt, ac, gta       ag
8   acg, cgt, ac, cgta     ag
9   acg, cgt, cgta         ag, ac
10  cgt, cgta              ag, ac, acg
11  -                      ag, ac, acg, cgt, cgta
The computation with optimisation finds the longest common substring cgta
after 11 iterations.
Note that in Fig. 1(b), we apply the optimisation in the 4th, 7th, and 10th
iterations. In the 4th iteration it is found that a can be extended to ta, and we
have t(gt) = {(i, j − 1) | (i, j) ∈ t(t)} from the previous iteration, so a can be
extended directly to gta. In the 7th iteration, by extending gt to cgt we also
extend gta to cgta since gta ∈ S(gt). In the 10th iteration cgt is not extendable,
so we conclude that cgta is not either.
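Algorithm 1 itself is not reproduced in this excerpt. As a rough, unoptimised illustration of the extension scheme described above — our own reconstruction, without the occurrence table t(·) or the pointer bookkeeping — one can maintain the sets X and Y explicitly and extend a shortest candidate by one letter to the left per iteration:

```python
def longest_common_substring_iterative(strings):
    # X: common substrings still considered extendable to the left
    # Y: common substrings that turned out to be left-maximal
    # (assumes the strings share at least one common letter)
    alphabet = sorted(set(strings[0]))

    def is_common(u):
        return all(u in w for w in strings)

    X = {a for a in alphabet if is_common(a)}
    Y = set()
    while X:
        w = min(X, key=len)                 # pick a shortest candidate
        X.remove(w)
        extensions = [a + w for a in alphabet if is_common(a + w)]
        if extensions:
            X.update(extensions)            # w is extendable: keep its extensions
        else:
            Y.add(w)                        # w is left-maximal: move it to Y
    return max(Y, key=len)
```

Since every common substring arises from its last letter by repeated left extensions, a longest common substring always ends up in Y; the optimisations discussed above reduce the number of loop iterations, not the result.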
Theorem 1. For input strings w1 , . . . , wm , each of length n, the number of
iterations needed by the optimised algorithm is at most n + n + m(n − 1).
Proof. In each iteration i, we pick the shortest common substring w ∈ X_i to be
extended, which satisfies one of the following properties:
1. w cannot be extended to the left anymore, or
2. w can be extended to aw and t(aw) = {(i, j − 1) | (i, j) ∈ t(w)}, i.e. w allows
the multiple substring extension described previously, or
3. none of these two conditions applies to w.
Let L1 , L2 , L3 be the sets of common substrings of w1 , . . . , wm , such that w ∈ Li
iff w satisfies the i-th property.
|L1 | ≤ n, since we have a one-to-one mapping from L1 to the set of prefixes of
w1 . Note that each v ∈ L1 is a substring of w1 (resp. w2 , . . . , wm ), and it can be
mapped to a prefix uv of w1 . Any two different v1 , v2 ∈ L1 are mapped to two
different prefixes of w1 , for otherwise one of them would be a suffix of the other,
and thus could be extended to the left, which would contradict v1 , v2 ∈ L1 .
|L3 | ≤ m(n − 1), since if v ∈ L3 then v occurs in all input strings, and there
exists an input string wi in which v occurs more than once. If |L3 | = k, then
|w1 | + . . . + |wm | is at least m + k. However, the total length of all input
strings is bounded by mn, so |L3 | ≤ mn − m.
200 M. Hutagalung and M. Lange
The literature describes two algorithms for solving the longest common substring
problem: dynamic programming [13] and the suffix tree algorithm [12]. However,
the suffix tree algorithm is known to be faster and more versatile than dynamic
programming. It is therefore the state-of-the-art and standard algorithm for the
longest common substring problem.
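For two strings, the dynamic programming approach [13] can be sketched as follows (a textbook formulation, not code from the paper): an entry holds the length of the longest common suffix of the prefixes s[:i] and t[:j], and the maximum entry locates a longest common substring.

```python
def longest_common_substring_dp(s, t):
    # prev[j] = length of the longest common suffix of s[:i-1] and t[:j]
    best, end = 0, 0
    prev = [0] * (len(t) + 1)
    for i in range(1, len(s) + 1):
        cur = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            if s[i - 1] == t[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best:
                    best, end = cur[j], i   # remember where the best match ends in s
        prev = cur
    return s[end - best:end]
```

This runs in O(|s|·|t|) time and O(|t|) space, which is why the linear-time suffix tree algorithm is preferred for large inputs.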
A suffix tree of W is a tree storing all suffixes of strings in W . It has many
applications in biological sequence data analysis [3], especially for searching pat-
terns in DNA or protein sequences. For a more detailed explanation of suffix trees
see [12]. We compare the optimised Algorithm 1 with the suffix tree algorithm
empirically on a biological data set, and also conceptually.
suffix tree data structure2 . The experiments have been run on a machine with
16 Intel Xeon cores running at 1.87GHz and 256GB of memory.
The results suggest that the optimised Algorithm 1 is comparable to the suffix
tree algorithm. The time needed is often even less than for the suffix tree
algorithm, except on the data set of 5.7 MB, where the optimised Algorithm 1
was 3 minutes slower. In general, however, we can conclude that the optimised
Algorithm 1 performed well compared to the suffix tree algorithm.
Conceptual Comparison. The suffix tree algorithm builds the tree first and
then searches for the deepest node that represents the longest common substring
of all input strings. The usual approaches to building the tree are incremental
with respect to the number of input strings [12]. For example, to build
a suffix tree of W = {w1 , . . . , wm }, one starts with a suffix tree of w1 , then
gradually modifies the tree to include the suffixes of w2 , and so on. This has the
disadvantage of not being able to see the common substrings of all of w1 , . . . , wm
during the tree construction. The whole tree has to be constructed before one
can search for any common substring of w1 , . . . , wm . What can be recorded during
the tree construction is only the common substrings of the input strings processed
so far.
Now suppose that the input data is so large that, despite the linear time com-
plexity, an entire run of the algorithm would still take, say, days to terminate. In
such a case it would be desirable if the algorithm were able to report long
common substrings on-the-fly, i.e. incrementally produce longer and longer com-
mon substrings. The suffix tree algorithm is not able to do this, because it needs to
process all input strings before finding even the shortest common substring. This
is not the case for Algorithm 1: we have seen that the algorithm always
maintains the currently longest common substring found in each iteration, and it
is able to report longer and longer common substrings incrementally rather than
finding them only at the very end of the entire computation.
References
1. Andersen, H.R.: A polyadic modal μ-calculus. Technical Report ID-TR: 1994-195,
Dept. of Computer Science, Technical University of Denmark, Copenhagen (1994)
2. Axelsson, R., Lange, M.: Model checking the first-order fragment of higher-order
fixpoint logic. In: Dershowitz, N., Voronkov, A. (eds.) LPAR 2007. LNCS (LNAI),
vol. 4790, pp. 62–76. Springer, Heidelberg (2007)
3. Bieganski, P., Riedl, J., Carlis, J.V., Retzel, E.F.: Generalized suffix trees for bio-
logical sequence data: applications and implementation. In: Proc. 27th Hawaii Int.
Conf. on System Sciences, vol. 5, pp. 35–44 (January 1994)
4. Bradfield, J., Stirling, C.: Modal mu-calculi. In: Blackburn, P., van Benthem, J.,
Wolter, F. (eds.) Handbook of Modal Logic: Studies in Logic and Practical Rea-
soning, vol. 3, pp. 721–756. Elsevier (2007)
5. Campos, R.A.C., Martínez, F.J.Z.: Batch source-code plagiarism detection using
an algorithm for the bounded longest common subsequence problem. In: Proc.
9th Int. IEEE Conf. on Electrical Engineering, Computing Science and Automatic
Control, CCE 2012, pp. 1–4. IEEE (2012)
6. Emerson, E.A., Clarke, E.M.: Characterizing correctness properties of parallel pro-
grams as fixpoints. In: de Bakker, J.W., van Leeuwen, J. (eds.) ICALP 1980. LNCS,
vol. 85, pp. 169–181. Springer, Heidelberg (1980)
7. Emerson, E.A., Clarke, E.M.: Using branching time temporal logic to synthesize
synchronization skeletons. Science of Computer Programming 2(3), 241–266 (1982)
8. Emerson, E.A., Halpern, J.Y.: “Sometimes” and “not never” revisited: On branch-
ing versus linear time temporal logic. Journal of the ACM 33(1), 151–178 (1986)
Model Checking for String Problems 203
9. Emerson, E.A., Jutla, C.S., Sistla, A.P.: On model checking for the μ-calculus and
its fragments. TCS 258(1-3), 491–522 (2001)
10. Fagin, R.: Generalized first-order spectra and polynomial-time recognizable sets.
Complexity and Computation 7, 43–73 (1974)
11. Gipp, B., Meuschke, N.: Citation pattern matching algorithms for citation-
based plagiarism detection: greedy citation tiling, citation chunking and longest
common citation sequence. In: Proc. 2011 ACM Symp. on Document Engineering,
pp. 249–258. ACM (2011)
12. Gusfield, D.: Algorithms on Strings, Trees, and Sequences: Computer Science and
Computational Biology. Cambridge University Press (1997)
13. Hirschberg, D.: A linear space algorithm for computing maximal common subse-
quences. Commun. ACM 18(6), 341–343 (1975)
14. Knaster, B.: Un théorème sur les fonctions d’ensembles. Ann. Soc. Polon. Math. 6,
133–134 (1928)
15. Kozen, D.: Results on the propositional μ-calculus. TCS 27, 333–354 (1983)
16. Lange, M., Lozes, E.: Model checking the higher-dimensional modal μ-calculus. In:
Proc. 8th Workshop on Fixpoints in Comp. Science, FICS 2012. Electr. Proc. in
Theor. Comp. Sc., vol. 77, pp. 39–46 (2012)
17. Oetsch, J., Pührer, J., Schwengerer, M., Tompits, H.: The system Kato: Detecting
cases of plagiarism for answer-set programs. Theory and Practice of Logic Pro-
gramming 10(4-6), 759–775 (2010)
18. Otto, M.: Bisimulation-invariant PTIME and higher-dimensional μ-calculus.
Theor. Comput. Sci. 224(1–2), 237–265 (1999)
19. Pnueli, A.: The temporal logic of programs. In: Proc. 18th Symp. on Foundations
of Comp. Science, FOCS 1977, Providence, RI, USA, pp. 46–57. IEEE (1977)
20. Queille, J.P., Sifakis, J.: Specification and verification of concurrent systems in CE-
SAR. In: Dezani-Ciancaglini, M., Montanari, U. (eds.) Programming 1982. LNCS,
vol. 137, pp. 337–371. Springer, Heidelberg (1982)
21. Storer, J.A.: Data Compression: Methods and Theory. Comp. Sci. Press (1988)
22. Sung, W.-K.: Algorithms in Bioinformatics: A Practical Approach. CRC Press
(2009)
23. Tarski, A.: A lattice-theoretical fixpoint theorem and its application. Pacific Jour-
nal of Mathematics 5, 285–309 (1955)
24. Xiao, Y., Luk, R.W.P., Wong, K.F., Kwok, K.L.: Using longest common subse-
quence matching for Chinese information retrieval. Journal of Chinese Language
and Computing 15(1), 45–51 (2010)
Semiautomatic Structures
1 Introduction
General Background. An important topic in computer science and mathemat-
ics is concerned with classifying structures that can be presented in such a way
that certain operations linked to the structures can be computed with low com-
putational complexity. Automatic functions and relations can, in some sense, be considered
S. Jain was supported in part by NUS grants C252-000-087-001, R146-000-181-112
and R252-000-420-112.
B. Khoussainov is partially supported by Marsden Fund grant of the Royal Society
of New Zealand. The paper was written while B. Khoussainov was on sabbatical
leave to the National University of Singapore.
F. Stephan was supported in part by NUS grants R146-000-181-112 and R252-000-
420-112.
E.A. Hirsch et al. (Eds.): CSR 2014, LNCS 8476, pp. 204–217, 2014.
c Springer International Publishing Switzerland 2014
Semiautomatic Structures 205
to have low complexity. The first work in this field centered on the question
which sets are regular (that is, recognised by finite automata) and how one can
transform the various descriptions of regular sets into each other. Later math-
ematicians applied the concept also to structures: Thurston automatic groups
[3] are one of the pioneering works combining automata theory with structures.
Here one has (a) a regular set of representatives A consisting of words over a
finite alphabet of generators, (b) an automatic equivalence relation representing
equality and (c) for every fixed group member y, an automatic mapping fy from
A to A such that fy (x) is a representative of the group member x ◦ y. Here a
function is automatic iff its graph can be recognised by a finite automaton or,
equivalently, iff it is computed in linear time by a one-tape Turing machine which
replaces the input by the output on the tape, starting with the same position
[1]. These concepts have been generalised to Cayley automatic groups [7,10] and
to automatic structures in general.
For automatic structures, one has to define how to represent the input to
functions that have several inputs. This is now explained in more detail. If Σ is
the alphabet used in the regular domain A ⊆ Σ ∗ of the structure, one defines
the convolution of two strings a0 a1 . . . an and b0 b1 . . . bm to consist of the
combined characters c0 c1 . . . cmax{m,n} where ck = (ak, bk) if k ≤ min{m, n},
ck = (ak, #) if m < k ≤ n, and ck = (#, bk) if n < k ≤ m.
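As a small sketch (our own illustration, with '#' as the padding symbol), the convolution can be computed as follows:

```python
def convolution(a, b, pad="#"):
    # pad the shorter string and combine the two strings character by character
    n = max(len(a), len(b))
    a = a.ljust(n, pad)
    b = b.ljust(n, pad)
    return [(a[k], b[k]) for k in range(n)]
```

For example, convolving "ab" with "abc" pairs the first two characters and pads the first string in the third position.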
Often one identifies the rationals with the set of all pairs written as a/b with a ∈ Z
and b ∈ {1, 2, . . .}; so one identifies “one half” with each of 1/2, 2/4, 3/6, . . . and
considers all of these to be equal. Similarly, in the case that the distinction is not
relevant, the represented structure is often identified with its automatic or semi-
automatic presentation; one denotes representatives in the automatic domain
by their natural representation (or vice versa) and denotes the automatic functions
realising these operations with the usual notation for the operations of the structure
represented.
Contributions of the Paper. First, the paper proposes the class of semiau-
tomatic structures that can be defined in terms of finite automata. This class
contains all automatic structures. Under the effectiveness condition put on
semiautomaticity, (1) these structures have finite presentations, (2) natural
fragments of their first-order theories are decidable and (3) the class is wide
enough to contain structures with undecidable theories. The paper provides
many examples of semiautomatic structures; see Section 2.

208 S. Jain et al.
Second, the paper provides several results of a general character. For exam-
ple, purely relational structures, countable ordinals and permutation algebras
all have semiautomatic presentations. This provides a large class of semiauto-
matic structures and showcases the power of finite automata in representing
algebraic structures. Note that for these results, no effectivity constraints on
the semiautomaticity are made. See Section 3.
Third, the paper proves semiautomaticity for many classical algebraic
structures: groups, rings and vector spaces. The main reason for this
study is that most of these structures lack automatic presentations (such as
(Q, +), (Z, +, ·) and infinite fields). Therefore, it is natural to ask which of
these structures admit semiautomaticity. Many of these structures, and in par-
ticular all concretely given examples, are also semiautomatic with the effectivity
condition. For instance, the ordered field (Q(√n); +, ·, <, =) is semiautomatic
for every natural number n. There are also several counterexamples, i.e. struc-
tures which are not semiautomatic. These examples and counterexamples are
presented in Sections 4, 5 and 6.
A full version with the omitted proofs and results is available as Research
Report 457 of the Centre for Discrete Mathematics and Theoretical Computer
Science (CDMTCS), The University of Auckland.
The first result is a simple and general decidability result about semiautomatic
structures without relational symbols. So, let A = (A, f1 , . . . , fn ) be a semiau-
tomatic structure where each fi is an operation. An algebraic polynomial is a
unary operation g of the form f (a1 , . . . , ak , x, ak+2 , . . . , an ), where f is a basic
operation of A, and a1 , . . . , ak , ak+2 , . . . , an are parameters from A. Consider the
structure A′ = (A; g0 , g1 , . . .), where g0 , g1 , . . . is a complete list of all algebraic
polynomials obtained from f1 , . . . , fn . There is a close relationship between A
and A′ in terms of congruence relations (that is, equivalence relations respected
by the basic operations):
The next examples illustrate that there are many semiautomatic structures
which are not automatic.
Example 4. Let S be the set of square numbers. Then (N, S, <, =; +) is semi-
automatic. This structure is obtained by first taking a default automatic rep-
resentation (A, +, <, =) of the additive monoid of the natural numbers and
then letting B = {conv(a, b) : a, b ∈ A ∧ b ≤ a + a} be the domain of the desired
structure. Here conv(a, b) represents a^2 + b. One now has conv(a, b) < conv(a′, b′)
iff a < a′ ∨ (a = a′ ∧ b < b′). Furthermore, conv(a, b) + 1 = conv(a′, b′) iff
(a′ = a ∧ b′ = b + 1 ≤ a + a) ∨ (a′ = a + 1 ∧ b = a + a ∧ b′ = 0). Iterated
addition of 1 defines the addition of any fixed natural number. Note that
(N, S, +, <, =) is not automatic.
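A small sketch of this representation (our own code): pairs (a, b) with 0 ≤ b ≤ 2a stand for a^2 + b, so a number is a square iff b = 0, and the successor rule is exactly the case distinction above.

```python
def value(pair):
    # the pair (a, b) represents the natural number a^2 + b
    a, b = pair
    return a * a + b

def successor(pair):
    # conv(a, b) + 1 = conv(a, b+1) if b+1 <= a+a, else conv(a+1, 0)
    a, b = pair
    return (a, b + 1) if b + 1 <= a + a else (a + 1, 0)

def is_square(pair):
    # squares are exactly the pairs with b = 0
    return pair[1] == 0
```

Iterating successor from (0, 0) enumerates 0, 1, 2, . . . and marks exactly the square numbers.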
The term algebra of a binary function f over a constant a consists of the term a
and all terms f (x, y) formed from previously constructed terms x and y; for
example f (a, a), f (a, f (a, a)) and f (f (a, a), f (a, a)) are terms. Let T denote the
set of all terms formed using the constant a and the binary function f .
represents a term y, then 01^{2k} w denotes left_k (y) and 01^{2k+1} w denotes right_k (y).
Note that each term code starts with a 0 and thus, for each w ∈ (01*)* 0, there is a
unique term represented by w.
For the above representation, the functions left_k and right_k are clearly au-
tomatic, as each of them just inserts the prefix 01^{2k} or 01^{2k+1}, respectively,
in front of the input. Thus, f is semiautomatic.
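Since the text states that left_k and right_k merely prepend the prefixes 01^{2k} and 01^{2k+1}, they are trivial string functions; a sketch with our own helper names, together with a membership test for the code language (01*)*0:

```python
import re

def left_k(k, w):
    # left_k inserts the prefix 01^(2k) in front of the representation w
    return "0" + "1" * (2 * k) + w

def right_k(k, w):
    # right_k inserts the prefix 01^(2k+1)
    return "0" + "1" * (2 * k + 1) + w

def is_term_code(w):
    # every term code lies in the regular language (01*)*0 and starts with 0
    return re.fullmatch(r"(01*)*0", w) is not None
```

Because each operation only adds a fixed prefix, each individual left_k and right_k is computable by a finite automaton, which is the sense in which the family is automatic.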
Let depth(a) = 0 and depth(f (x, y)) = 1 + max{depth(x), depth(y)}. Now each
term y has only finitely many representations, as it can only have representations
which have at most depth(y) + 1 zeros, and each left_k or right_k used in the
representation must apply to a sub-term of y. Thus, = is semiautomatic.
Now, B(n) can be decided by first computing f^n (w) and then checking whether
M (f^n (w)) accepts. This can be done in time polynomial in n and thus expo-
nential in the length of the binary representation of n. This is a contradiction,
as B was chosen not to be exponential time computable. Thus, the structure
(N, B, Succ; =) cannot be semiautomatic. Note that if the structure contains
only one of B and f , then it has to be automatic, as B is a predicate (the char-
acteristic function of a set) and f is a function with only one input variable;
the proof does not even use whether = is automatic or semiautomatic.
Question 10. Is there an automatic presentation of the integers such that ad-
dition and equality are automatic while the set of positive integers is not regular,
that is, the ordering of the integers is not semiautomatic?
Definition 11. A group (G, ◦) is a structure with a neutral element e such that
for all x, y, z ∈ G there is a u ∈ G satisfying x◦e = e◦x = x, x◦(y◦z) = (x◦y)◦z,
u ◦ x = e and x ◦ u = e. Such a structure without the last statements on the
existence of the inverse is called a monoid. An ordered group (G, ◦, <) satisfies
that < is transitive, antisymmetric and that all x, y, z ∈ G with x < y satisfy
x ◦ z < y ◦ z and z ◦ x < z ◦ y. If the preservation of the order holds only for
operations with z from one side, then one calls the corresponding group right-
ordered or left-ordered, respectively.
Proof. Note that there are now several members of the presentation G of the
group which are equal; for ease of notation, one still writes x ∈ G in this
case.
Proof. Let an automatic ordered group (G, ◦, <, =) be given. As the equality
is automatic, one can without loss of generality assume that every element of
the group is given by a unique representative in G. Nies and Thomas [12,13]
showed that, due to the automaticity, every finitely generated subgroup (F, ◦) of
G is Abelian-by-finite. In particular, any two elements v, w of F
satisfy that there is a power n with v^n ◦ w^n = w^n ◦ v^n. Now, following arguments
of Neumann [11] and Fuchs [4, page 38, Proposition 10], one argues that the
group is Abelian.
In the case that v ◦ w^n ≠ w^n ◦ v, consider the element w^n ◦ v ◦ w^{−n} ◦ v^{−1},
which is different from e; without loss of generality w^n ◦ v ◦ w^{−n} ◦ v^{−1} < e. By
multiplying from both sides inductively with w^n ◦ v ◦ w^{−n} and v^{−1}, respectively,
one gets the relation (w^n ◦ v ◦ w^{−n})^{m+1} ◦ v^{−(m+1)} < (w^n ◦ v ◦ w^{−n})^m ◦
v^{−m} < e for m = 1, 2, . . . , n, and by associativity and cancellation the relation
w^n ◦ v^n ◦ w^{−n} ◦ v^{−n} < e can be derived. This contradicts the assumption that
v^n and w^n commute, that is, that w^n ◦ v^n ◦ w^{−n} ◦ v^{−n} = e.
In the case that v ◦ w^n = w^n ◦ v, one again assumes that v ◦ w ◦ v^{−1} ◦ w^{−1} < e
and derives that v ◦ w^n ◦ v^{−1} ◦ w^{−n} < e, contradicting the assumption that v
and w^n commute. Hence any two given elements v, w in G
commute and (G, ◦) is an Abelian group.
that i is odd. Thus the group is automatic. The ordering on the pairs is the
lexicographic ordering, that is, a^i b^j < a^{i′} b^{j′} iff i < i′ ∨ (i = i′ ∧ j < j′). Using
some case distinctions, one can show that a^i b^j < a^{i′} b^{j′} iff a ◦ a^i b^j < a ◦ a^{i′} b^{j′}
iff b ◦ a^i b^j < b ◦ a^{i′} b^{j′}, and deduce from these basic relations that the group is
left-ordered.
A central motivation of Question 10 is the connection between definability and
automaticity of the order in groups. The next example shows that for some semi-
automatic groups, the order can be first-order defined from the group operation
(which is not the case with the integers). In the example one cannot have that
◦ is automatic, as the group is not commutative.
Theorem 15. There is a semiautomatic noncommutative ordered group (G, <,
=; ◦) such that the ordering is first-order definable from the group operation.
Theorem 16. The additive ordered subgroup ({n · 6^m : n, m ∈ Z}, +, <) of the
rationals has a presentation in which the addition and equality are automatic
while the ordering is not semiautomatic.
Proof. The idea is to represent group elements as conv(a, b, c), representing a +
b + c, where a ∈ Z, b = b1 b2 . . . bn ∈ {0} ∪ {0, 1}* · {1} represents b1/2 + b2/4 + . . . +
bn/2^n and c = c1 c2 . . . cm ∈ {0} ∪ {0, 1, 2}* · {1, 2} represents c1/3 + c2/9 + . . . + cm/3^m.
The representation of Z is chosen such that addition is automatic. Furthermore,
one now adds conv(a, b, c) and conv(a′, b′, c′) by choosing conv(a″, b″, c″) such
that the represented values satisfy a″ = a + a′ + (b + b′ − b″) + (c + c′ − c″)
with b″ ∈ {b + b′, b + b′ − 1}, c″ ∈ {c + c′, c + c′ − 1}, 0 ≤ b″ < 1 and
0 ≤ c″ < 1. It can easily be seen that the resulting operation is automatic.
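To make the carry behaviour concrete, the following sketch (our own code, with our own helper names) adds two such triples exactly, with b and c given as binary resp. ternary digit strings; the integer part absorbs the carries, matching b″ ∈ {b + b′, b + b′ − 1} and c″ ∈ {c + c′, c + c′ − 1} above:

```python
from fractions import Fraction

def frac_value(digits, base):
    # value of the fractional digit string d1 d2 ... dn in the given base
    return sum(Fraction(int(d), base ** (i + 1)) for i, d in enumerate(digits))

def add_frac(x, y, base):
    # digit-wise addition of two fractional strings; returns (carry, digits)
    n = max(len(x), len(y))
    x, y = x.ljust(n, "0"), y.ljust(n, "0")
    out, carry = [], 0
    for i in range(n - 1, -1, -1):
        s = int(x[i]) + int(y[i]) + carry
        out.append(str(s % base))
        carry = s // base
    return carry, "".join(reversed(out))

def add_triples(t, u):
    # add two triples (a, b, c); the fractional carries move into the integer part
    (a, b, c), (a2, b2, c2) = t, u
    carry_b, b3 = add_frac(b, b2, 2)
    carry_c, c3 = add_frac(c, c2, 3)
    return (a + a2 + carry_b + carry_c, b3, c3)
```

This is only a value-level model; in the paper the same case distinction is realised by a finite automaton reading the convolution of the representations.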
Assume now, by way of contradiction, that one could compare the fractional
parts b and c of a number with respect to the order, that is, that the relation
{(b, c) : conv(0, b, 0) < conv(0, 0, c)} is automatic. Then one could first-order
define a function f which maps every ternary string c to the length-lexicographically
shortest binary string b satisfying conv(0, 0, c1) < conv(0, b, 0) < conv(0, 0, c2).
There are 3^n · 2 ternary strings c of length n + 1 not ending with a 0, representing
different values between 0 and 1, and f maps these to 3^n · 2 different binary
strings representing values between 0 and 1; as the resulting strings are binary,
some of the values f (c) must have length at least n · log(3)/log(2). However, this
contradicts the fact that the length of f (c) is at most a constant longer than that
of c for all inputs c from the domain of f (as f is first-order defined from an
automatic relation and thus automatic). Thus the function f cannot be automatic
and therefore the ordering cannot be automatic either. It follows from
Proposition 12 that the order is not even semiautomatic.
Tsankov [14] showed that the structure (Q, +, =) is not automatic. However, one
can still get the following weaker representation.
Theorem 17. The ordered group (Q, <, =; +) of rationals is semiautomatic.
Theorem 18. Let G be a Baumslag–Solitar group, that is, a finitely generated
group with generators a, b and the defining relation b^n a = a b^m for some m, n ∈
Z − {0}. Then the group (G; ◦, =) is semiautomatic.
5 Rings
The ring of integers (Z, +, <, =; ·) is semiautomatic; the semiautomaticity of the
multiplication stems from the fact that multiplication by a fixed constant can
be implemented by repeatedly adding the input to, or subtracting it from, 0 a
fixed number of times. One can, however, augment the ring of integers with a
root of a natural number and still preserve that addition and order are automatic
and multiplication is semiautomatic.
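The family {x ↦ c·x : c ∈ Z}, realised by repeated addition and subtraction, can be sketched as follows (illustrative code with our own names; one fixed function per constant c, mirroring one automaton per ring element):

```python
def times_const(c):
    # for a fixed integer c, build the unary function x -> c*x
    # using only repeated addition to / subtraction from 0
    def multiply(x):
        acc = 0
        for _ in range(abs(c)):
            acc = acc + x if c > 0 else acc - x
        return acc
    return multiply
```

The point is that c is fixed in advance: each member of the family uses a bounded number of additions, so each is automatic, even though multiplication as a binary operation is not.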
Theorem 19. The ring (Z(√n), +, <, =; ·) has a semiautomatic presentation
for every positive natural number n.
The next result deals with rings where the multiplication is not commutative
and where a 1 need not exist.
Theorem 20. There is a ring (R, +, =, ·) such that (R, +, =) is an automatic
group and the family of functions {y → y · x : x ∈ R} is semiautomatic while
every function y → x · y with x ∈ R fixed is either constant 0 or not automatic
(independent of the automatic representation chosen for the ring).
Question 24. (a) Are the structures (Q, <, =; +, ·) or (Q, =; +, ·) semiauto-
matic? In other words, is it really needed, as done in the above default rep-
resentations, that the equality and the order are not automatic?
(b) Is the polynomial ring (Q[x]; +, ·, =) semiautomatic?
(c) Is there a transcendental field extension of the rationals which is semiauto-
matic?
The counterpart of Questions 24 (b) and (c) for finite fields has a positive answer.
Theorem 25. Let (F, +, ·) be a finite field. Then the following structures are
semiautomatic:
– Every (possibly infinite) algebraic extension (G, +, =; ·) of the field;
– The polynomial rings (F [x], +, =; ·) in one variable and (F [x, y]; +, ·, =) in
two or more variables;
– The field of fractions ({ ab : a, b ∈ F [x] ∧ b = 0}; +, ·, =) over the polynomial
ring with one variable.
7 Conclusion
The present work gives an overview of initial results on semiautomatic structures
and shows that many prominent structures (countable ordinals with addition,
the ordered fields of rationals extended perhaps by one root of an integer, al-
gebraic extensions of finite fields) are semiautomatic and investigates to which
degree one can still have that some of the involved operators and relations are
automatic. Several concrete questions are still open, in particular the following
ones: Is there an automatic presentation of the integers such that addition and
equality are automatic while the ordering of the integers is not semiautomatic?
Are the structures (Q, <, =; +, ·) or (Q, =; +, ·) semiautomatic, that is, can the
order and the equality be made automatic in the semiautomatic field of rationals?
The corresponding statement holds for the additive group of rationals.
Additional questions might relate to the question of effectivity. For example,
for a given function f in some given structure, can one effectively find from the
parameter y an automaton for x → f (x, y)? While this is impossible for the most
general results in Section 3, for the concrete structures in Sections 4, 5 and 6
one obtains the automata from the representatives by recursive functions.
The complexity of these functions might be investigated in subsequent work for
various structures.
References
1. Case, J., Jain, S., Seah, S., Stephan, F.: Automatic Functions, Linear Time and
Learning. In: Cooper, S.B., Dawar, A., Löwe, B. (eds.) CiE 2012. LNCS, vol. 7318,
pp. 96–106. Springer, Heidelberg (2012)
2. Delhommé, C.: Automaticité des ordinaux et des graphes homogènes. Comptes
Rendus Mathematique 339(1), 5–10 (2004)
3. Epstein, D.B.A., Cannon, J.W., Holt, D.F., Levy, S.V.F., Paterson, M.S., Thurston,
W.P.: Word Processing in Groups. Jones and Bartlett Publishers, Boston (1992)
4. Fuchs, L.: Partially Ordered Algebraic Systems. Pergamon Press (1963)
5. Hodgson, B.R.: Décidabilité par automate fini. Annales des Sciences
Mathématiques du Québec 7(1), 39–57 (1983)
6. Hopcroft, J.E., Motwani, R., Ullman, J.D.: Introduction to Automata Theory, Lan-
guages and Computation, 3rd edn. Addison-Wesley (2007)
7. Kharlampovich, O., Khoussainov, B., Miasnikov, A.: From automatic structures to
automatic groups. CoRR abs/1107.3645 (2011)
8. Khoussainov, B., Nerode, A.: Automatic presentations of structures. In: Leivant,
D. (ed.) LCC 1994. LNCS, vol. 960, pp. 367–392. Springer, Heidelberg (1995)
9. Khoussainov, B., Rubin, S., Stephan, F.: Definability and Regularity in Automatic
Structures. In: Diekert, V., Habib, M. (eds.) STACS 2004. LNCS, vol. 2996, pp.
440–451. Springer, Heidelberg (2004)
10. Miasnikov, A., Šunić, Z.: Cayley graph automatic groups are not necessarily Cayley
graph biautomatic. In: Dediu, A.-H., Martín-Vide, C. (eds.) LATA 2012. LNCS,
vol. 7183, pp. 401–407. Springer, Heidelberg (2012)
11. Neumann, B.H.: On ordered groups. American Journal of Mathematics 71, 1–18
(1949)
12. Nies, A.: Describing Groups. The Bulletin of Symbolic Logic 13(3), 305–339 (2007)
13. Nies, A., Thomas, R.: FA-presentable groups and rings. Journal of Algebra 320,
569–585 (2008)
14. Tsankov, T.: The additive group of the rationals does not have an automatic pre-
sentation. The Journal of Symbolic Logic 76(4), 1341–1351 (2011)
The Query Complexity of Witness Finding
1 Introduction
E.A. Hirsch et al. (Eds.): CSR 2014, LNCS 8476, pp. 218–231, 2014.
c Springer International Publishing Switzerland 2014
The present paper started out as an investigation into the question whether
O(n^2) is a tight upper bound on m(Intersection). This question arose from work of
Dell, Kabanets, van Melkebeek and Watanabe [7], who showed that the Valiant-
Vazirani Isolation Lemma is optimal among so-called black-box isolation proce-
dures:
Theorem 1 ([7]). For every distribution X on subsets of {0, 1}^n, there exists a
nonempty W ⊆ {0, 1}^n such that P[|X ∩ W| = 1] = O(1/n).
Borrowing an idea from the proof of Theorem 1 (namely, a particular distribu-
tion on subsets of {0, 1}^n), we were able to show m(Intersection) = Ω(n^2). (Note
that Theorem 1 can be derived from this lower bound, as any black-box isolation
procedure with success probability ω(1/n) would show that m(Intersection) =
o(n^2) by the argument sketched above.) As a natural next step, we consid-
ered the class of monotone queries, that is, queries Q : ℘({0, 1}^n) → {+, ⊥} such
that Q(W) = + ⇒ Q(W′) = + for all W ⊆ W′ ⊆ {0, 1}^n. Note that intersec-
tion queries are monotone, hence n ≤ m(Monotone) ≤ m(Intersection) = Θ(n^2).
Generalizing our lower bound for intersection queries, we were able to prove the
stronger result:
Theorem 2. The monotone query complexity of witness finding, m(Monotone),
is Ω(n^2).
We present the proof of Theorem 2 in §2. The proof uses an entropy argument,
which hinges on the threshold behavior of monotone queries (in particular, the
theorem of Bollobás and Thomason [4]).
1.2 NP Queries
Another motivation for studying Question 1 comes from a question concerning
search-to-decision reductions. In the context of SAT, a search-to-decision reduc-
tion is an algorithm which, given a boolean function ϕ(x1 , . . . , xn ), constructs a
satisfying assignment x ∈ {0, 1}n for ϕ (if one exists) using an oracle for the SAT
decision problem. The standard PNP search-to-decision reduction uses n adap-
tive deterministic queries. In the setting of nonadaptive randomized queries,
Ben-David, Chor, Goldreich and Luby [3] (using the Valiant-Vazirani Isolation
Lemma) gave a BPP^NP_|| search-to-decision reduction with O(n^2) queries. (Here
BPP^NP_|| denotes BPP with nonadaptive, i.e. parallel, queries to an NP oracle.)
Theorem 3 (follows from [3]). There is a BPP^NP_|| algorithm which solves the
black-box satisfiability search problem with O(n^2) queries.
Motivated by this connection to complexity theory, we next set our sights on
the question whether O(n^2) is tight in Theorem 3. To fit the question into the
framework of Question 1, we define the class of NP queries as follows.
Definition 1. Informally, an NP query is a query Q given by an NP machine
M with an oracle to W, where Q(W) = M^W(1^n) (i.e. Q(W) = + ⇔ M^W has
an accepting computation on input 1^n). Formally, an NP query is a sequence
Q = (Q1 , Q2 , . . .) of queries Qn : ℘({0, 1}^n) → {+, ⊥} such that there exists a
single NP machine M^(·) (with an unspecified oracle) where Qn (W) = M^W(1^n)
for every W ⊆ {0, 1}^n. An ensemble of NP queries is a sequence (Q1 , . . . , Qm)
of NP queries given by NP machines M1 , . . . , Mm which have a common upper
bound t(n) = n^O(1) on their running time.
The NP query complexity of witness finding, m(NP), gives a lower bound on
the query complexity of BPP^NP_|| algorithms solving the black-box satisfiability
search problem. Note that NP queries and monotone queries are incomparable:
NP queries clearly need not be monotone, while it can be shown that the mono-
tone “majority” query (defined by Qmaj (W) = + iff |W| ≥ 2^{n−1}) is not an NP
query.2 Nevertheless, we show that every NP query can be well-approximated by
a monotone query (Lemma 7). Using this result together with our lower bound
for m(Monotone), we show:
Theorem 4. The NP query complexity of witness finding, m(NP), is Ω(n^2).
Theorem 4 thus establishes the optimality of the search-to-decision reduction
of Ben-David et al. in the black-box setting. The proof is presented in §3.
The proof is presented in §4. Along the way, we show that every monotone
property defined by an intersection query has an exponentially sharp thresh-
old in the lattice of affine subspaces of {0, 1}^n (Theorem 6). This raises the
question whether all monotone properties have an exponentially sharp threshold
in the affine lattice (Question 2); we note that a positive answer would imply
m_affine(Monotone) = Ω(n^2).
We now define a particular random subset W of {0,1}^n. For all 0 ≤ k ≤ n, let
W_k be the random subset of {0,1}^n containing each x ∈ {0,1}^n independently
with probability 2^{k−n}. Let k be uniformly distributed in {1, . . . , n/2}.³ Finally,
let W := W_k. (A similar distribution was considered by Dell et al. [7] in their
study of the Valiant–Vazirani isolation lemma.)
³ For convenience, we assume n/2 is an integer (or an abbreviation for ⌊n/2⌋). For
purposes of §2, k could just as well be uniformly distributed in {1, . . . , n}. For purposes
of §3, we merely require that k be uniformly distributed in {1, . . . , n′} where
n′ ≤ n − log^{ω(1)} n.
The Query Complexity of Witness Finding 223
Using Lemma 2(3), we prove a sharp bound on the conditional entropy H(Q(W) | k)
for all monotone queries Q:

H(Q(W) | k) = \sum_{k=0}^{n/2} P[k = k] \cdot H(Q(W_k)) = \frac{2}{n} \sum_{k=1}^{n/2} H(p_k) \le \frac{2}{n} \sum_{k=1}^{n/2} \frac{|\theta - k| + 1}{2^{|\theta - k| - 1}} \le \frac{4}{n} \sum_{i=0}^{\infty} \frac{i+1}{2^{i-1}} \le \frac{24}{n}.
Lemma 4. For every random variable z on {0,1}^n (not necessarily independent
of W),

P[z \in W] \le \frac{4}{n} H(z) + \frac{1}{2^{n/4}}.

Proof. Define S ⊆ {0,1}^n by S := {x ∈ {0,1}^n : P[z = x] ≥ 2^{−n/4}}. Note that

P[z \in W] \le P[z \notin S] + P[S \cap W \ne \emptyset] \le \frac{4}{n} H(z) + P[S \cap W \ne \emptyset],

where the second inequality is Markov's inequality applied to the random variable
log(1/P[z = x]), whose expectation is H(z). (Here we used x ∉ S ⇒ P[z = x] <
2^{−n/4} ⇒ log(1/P[z = x]) > n/4.) Finally, noting that |S| ≤ 2^{n/4} and
P[x ∈ W] < 2^{−n/2} for all x ∈ {0,1}^n, we have

P[W \cap S \ne \emptyset] \le \sum_{x \in S} P[x \in W] < \frac{1}{2^{n/4}}.
where this last inequality is justified by the fact that the events
(A_i ⊆ W) ∧ ⋀_{j=1}^{i−1} (A_j ⊄ W) are mutually exclusive over i ∈ {1, . . . , s}.
Now fix i which maximizes (4). We claim that

(5)    P[ B_i ∩ W = ∅ | (A_i ⊆ W) ∧ ⋀_{j=1}^{i−1} (A_j ⊄ W) ] ≤ P[B_i ∩ W = ∅].

Stringing together (4), (5) and (6), we conclude that P[Q(W) ≠ Q⁺(W)] = 2^{−Ω(n)}.
Using this approximation of NP queries by monotone queries, we prove:
Theorem 4. (restated) The NP query complexity of witness finding, m(NP), is Ω(n²).
Proof. Let m = m(NP). By Lemma 1, there exist NP queries Q_1, . . . , Q_m and a
function f : {+, ⊥}^m → {0,1}^n such that

P[f(Q_1(W), . . . , Q_m(W)) ∈ W | W ≠ ∅] > 1/2.

Let Q_1^+, . . . , Q_m^+ be monotone queries approximating Q_1, . . . , Q_m as in Lemma 7.
We have

P[f(Q_1^+(W), . . . , Q_m^+(W)) ∈ W] ≥ P[f(Q_1(W), . . . , Q_m(W)) ∈ W] − \sum_{i=1}^{m} P[Q_i(W) ≠ Q_i^+(W)] = Ω(1) − \frac{m}{2^{Ω(n)}}.

On the other hand, by Lemma 5,

P[f(Q_1^+(W), . . . , Q_m^+(W)) ∈ W] ≤ O(m/n²) + o(1).

Combining the two bounds yields m = Ω(n²).
Remark 1. We can ask a similar question with respect to the lattice L_n of linear
subspaces of {0,1}^n (we suspect that the answer is the same). Writing P_n (resp.
P_{2^n}) for the lattice of subsets of [n] (resp. {0,1}^n), note that L_n has an ambiguous
status in relation to P_n and P_{2^n}: on the one hand, L_n is the "q-analogue" of
P_n; on the other hand, L_n is a subset (in fact, a sub-meet-semilattice) of P_{2^n}. Using
a q-analogue of the Kruskal–Katona Theorem due to Chowdhury and Patkós
[5], we can show that p_k ≤ 2^{−Ω(θ/k)} for all k < θ and 1 − p_k ≤ 2^{−Ω((n−θ)/(n−k))}
for all k > θ. This shows that the threshold behavior of monotone properties in
L_n scales at least like monotone properties in P_n. The linear version of Question 2
asks whether the threshold behavior of monotone properties in L_n in fact
scales like monotone properties in P_{2^n}.
Proof. The case where k ≤ τ follows from a simple union bound. Let a_1, . . . , a_{2^k}
enumerate the elements of A_k in any order. Then

p_k = P[A_k ∩ S ≠ ∅] ≤ \sum_{i=1}^{2^k} P[a_i ∈ S] = \frac{2^k |S|}{2^n} = 2^{−(τ−k)}.
The case k > τ requires a more careful argument. Let H be a uniform random
affine hyperplane (i.e. (n − 1)-dimensional affine subspace) in {0,1}^n. (That is,
H = A_{n−1}.)

Claim 1. For all λ > 0,   P[ |S ∩ H| ≤ (1/2 − λ)|S| ] ≤ \frac{1}{4λ²|S|}.
Proof (of Claim 1). Let Z := |S ∩ H|. We have E[Z] = |S|/2 and

E[Z²] = \sum_{x∈S} P[x ∈ H] + \sum_{x,y∈S : x≠y} P[x, y ∈ H] = \frac{|S|}{2} + |S|(|S| − 1) \cdot \frac{2^{n−1} − 1}{2(2^n − 1)} ≤ \frac{1}{4}(|S| + |S|²).

By Chebyshev's inequality,

P[Z ≤ (1/2 − λ)|S|] ≤ P[|Z − E[Z]| ≥ λ|S|] ≤ \frac{Var(Z)}{λ²|S|²} = \frac{E[Z²] − E[Z]²}{λ²|S|²} ≤ \frac{1}{4λ²|S|}.   □ (Claim 1)
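The two moment computations in the proof can be checked exactly for a small example by enumerating every affine hyperplane of {0,1}^n. The following sketch (ours, not from the paper) does this for n = 4 and an arbitrary 6-element set S.

```python
def affine_hyperplanes(n):
    """All affine hyperplanes H = {x : a·x = c (mod 2)} of {0,1}^n, with a != 0."""
    for a in range(1, 2 ** n):
        for c in (0, 1):
            yield {x for x in range(2 ** n) if bin(a & x).count('1') % 2 == c}

n, S = 4, {0, 3, 5, 6, 9, 15}             # an arbitrary subset of {0,1}^4
sizes = [len(S & H) for H in affine_hyperplanes(n)]
EZ = sum(sizes) / len(sizes)              # first moment: exactly |S|/2
EZ2 = sum(z * z for z in sizes) / len(sizes)
assert EZ == len(S) / 2
assert EZ2 <= (len(S) + len(S) ** 2) / 4  # second-moment bound from the proof
print(EZ, EZ2)
```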
Claim 2. Let S ⊆ {0,1}^n, let B = A_{n−j} be a uniform random affine subspace
of {0,1}^n of co-dimension j, and let b = 2^{−1/4}. Then

P[B ∩ S = ∅] ≤ \frac{2^{j + 4(1 + b + b² + ··· + b^j)}}{|S|}.

The argument yields

P[B ∩ S = ∅] ≤ \frac{1}{|S|} \left( 2^{(j+4)/2} + \frac{2^{j + 4(1 + b + ··· + b^{j−1})}}{1 − (b^j/2)} \right),

and

2^{(j+4)/2} + \frac{2^{j + 4(1 + b + ··· + b^{j−1})}}{1 − (b^j/2)}
  ≤ 2^{(j+4)/2} + 2^{j + 4(1 + b + ··· + b^{j−1}) + b^j}
  ≤ 2^{j + 4(1 + b + ··· + b^{j−1}) + b^j} (1 + 2^{−(j+4)/2})
  ≤ 2^{j + 4(1 + b + ··· + b^{j−1}) + b^j} \cdot e^{2^{−(j+4)/2}}
  ≤ 2^{j + 4(1 + b + ··· + b^{j−1} + b^j)}.

The proof is completed by combining the above inequalities.   □ (Claim 2)
Returning to the proof of Theorem 6, we now show the case k > τ using Claim 2
as follows:

1 − p_k = P[A_k ∩ S = ∅] ≤ \frac{2^{n−k + 4(1 + b + ··· + b^{n−k})}}{|S|} ≤ 2^{τ−k + 4\sum_{j=0}^{∞} b^j} ≤ 2^{−(k−τ)+26}.

Therefore, max{p_k, 1 − p_k} ≤ 2^{−|τ−k|+O(1)}, which completes the proof of the
theorem.
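The constant 26 absorbs the geometric series 4(1 + b + b² + ···) with b = 2^{−1/4}; this is a one-line numeric sanity check (ours), not part of the proof.

```python
b = 2 ** (-0.25)        # b = 2^(-1/4)
series = 4 / (1 - b)    # 4 * (1 + b + b^2 + ...) ≈ 25.14, hence the "+26"
assert series < 26
print(round(series, 2))
```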
As a corollary of Theorem 6, we get:
Theorem 5. (restated) The intersection query complexity of affine witness finding,
m_affine(Intersection), is Ω(n²).

Proof. We use the same information-theoretic argument as in the proof of Theorem 2
in §2, except that A plays the role of W and Theorem 6 plays the role of Lemma
2(3) (in particular, we require the bound H(p_k) ≤ (|τ − k| + O(1))/2^{|τ−k|−O(1)},
which follows from Theorem 6).
5 Conclusion
We initiated the study of the information-theoretic witness finding problem.
For three natural classes of queries (intersection queries, monotone queries, NP
queries), we established Ω(n²) lower bounds on the query complexity of witness
finding.
230 A. Kawachi, B. Rossman, and O. Watanabe
References
1. Alon, N., Spencer, J.: The Probabilistic Method, 3rd edn. Wiley (2008)
2. Bellare, M., Goldwasser, S.: The complexity of decision versus search. SIAM Journal
on Computing 23, 97–119 (1994)
3. Ben-David, S., Chor, B., Goldreich, O., Luby, M.: On the theory of average-case
complexity. Journal of Computer and System Sciences 44(2), 193–219 (1992)
4. Bollobás, B., Thomason, A.G.: Threshold functions. Combinatorica 7(1), 35–38
(1987)
5. Chowdhury, A., Patkós, B.: Shadows and intersections in vector spaces. J. of
Combinatorial Theory, Ser. A 117, 1095–1106 (2010)
6. Cover, T., Thomas, J.: Elements of Information Theory. Wiley Interscience, New
York (1991)
7. Dell, H., Kabanets, V., van Melkebeek, D., Watanabe, O.: Is the Valiant-Vazirani
isolation lemma improvable? In: Proc. 27th Conference on Computational Complex-
ity, pp. 10–20 (2012)
8. Valiant, L., Vazirani, V.: NP is as easy as detecting unique solutions. Theoretical
Computer Science 47, 85–93 (1986)
9. Yao, A.C.: Probabilistic computations: toward a unified measure of complexity. In:
Proc. of the 18th IEEE Sympos. on Foundations of Comput. Sci., pp. 222–227. IEEE
(1977)
A Proof of Lemma 1
In order to apply Yao's minimax principle [9], we express m(W, Q) in terms of
a particular matrix M. Let F be the set of functions {+, ⊥}^m → {0,1}^n. Let
A := Q^m × F (representing the set of deterministic witness finding algorithms).
Let W_0 := W \ {∅}. Finally, let M be the A × W_0 matrix defined by

M_{(Q_1,...,Q_m; f), W} := 1 if f(Q_1(W), . . . , Q_m(W)) ∈ W, and 0 otherwise.

In this context, Yao's minimax principle states that for all random variables
W on W_0 and (Q_1, . . . , Q_m; f) on A,

\min_{(Q_1,...,Q_m;f) \in A} E[M_{(Q_1,...,Q_m;f), W}] \le \max_{W \in W_0} E[M_{(Q_1,...,Q_m;f), W}]

(on the left the witness set W is random; on the right the algorithm is random).
B Proof of Lemma 2
For inequality (1), let Y_1, . . . , Y_{2^i} be independent copies of W_{θ−i}. Note that

P[x ∈ (Y_1 ∪ ··· ∪ Y_{2^i})] = 1 − (1 − 2^{θ−i−n})^{2^i} < 2^{θ−n} = P[x ∈ W_θ]

independently for all x ∈ {0,1}^n. Therefore, by monotonicity,

P[Q(Y_1) ∨ ··· ∨ Q(Y_{2^i})] ≤ P[Q(Y_1 ∪ ··· ∪ Y_{2^i})] ≤ P[Q(W_θ)].

Using the independence of Y_1, . . . , Y_{2^i}, we have

1/2 ≥ P[Q(W_θ)] ≥ P[⋁_{j=1}^{2^i} Q(Y_j)] = 1 − P[¬Q(W_{θ−i})]^{2^i} = 1 − (1 − p_{θ−i})^{2^i}.

Therefore, p_{θ−i} ≤ 1 − (1/2)^{1/2^i} < (ln 2)/2^i.

For inequality (2), let Z_1, . . . , Z_{2^i} be independent copies of W_{θ+1}. By a similar
argument, we have

p_{θ+i+1} = P[Q(W_{θ+i+1})] ≥ P[⋁_{j=1}^{2^i} Q(Z_j)] = 1 − P[¬Q(W_{θ+1})]^{2^i} > 1 − \frac{1}{2^{2^i}}.

Finally, for inequality (3), note that for all p, q ∈ [0, 1],

0 ≤ min(p, 1 − p) ≤ q ≤ 1/2 ⟹ H(p) ≤ H(q) ≤ 2q log(1/q).

By this observation, together with (1) and (2), we have

H(p_{θ−i−1}) ≤ 2 \cdot \frac{\ln 2}{2^{i+1}} \log\left(\frac{2^{i+1}}{\ln 2}\right) < \frac{i+2}{2^i},   H(p_{θ+i+1}) ≤ \frac{2}{2^{2^i}} \log(2^{2^i}) = \frac{1}{2^{2^i − i − 1}}.

From these two inequalities, it follows that H(p_k) ≤ (|θ − k| + 1)/2^{|θ−k|−1}.
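The concrete inequalities above can be checked numerically. The small script below (ours) verifies the entropy bounds for the first few values of i.

```python
from math import log2, log

def H(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

for i in range(8):
    q1 = log(2) / 2 ** (i + 1)        # bound (1): p_(θ-i-1) < ln(2)/2^(i+1)
    assert H(q1) <= (i + 2) / 2 ** i  # hence H(p_(θ-i-1)) < (i+2)/2^i
    q2 = 1 / 2 ** (2 ** i)            # bound (2): 1 - p_(θ+i+1) < 1/2^(2^i)
    assert H(q2) <= 1 / 2 ** (2 ** i - i - 1)
    assert max(H(q1), H(q2)) <= (i + 2) / 2 ** i  # combined: (|θ-k|+1)/2^(|θ-k|-1)
print("entropy bounds hold for i = 0..7")
```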
Primal Implication as Encryption
Vladimir N. Krupski
1 Introduction
Primal Infon Logic ([1], [2], [3], [4], [5]) formalizes the concept of infon, i.e.
a message as a piece of information. The corresponding derivability statement
Γ ⊢ ϕ means that the principal can get (by herself, without any communication)
the information ϕ provided she already has all infons ψ ∈ Γ.
Primal implication (→p), which is used in Primal Infon Logic to represent
conditional information, is a restricted form of intuitionistic implication defined
by the following inference rules:

(→p I)  from Γ ⊢ ψ infer Γ ⊢ ϕ →p ψ;      (→p E)  from Γ ⊢ ϕ and Γ ⊢ ϕ →p ψ infer Γ ⊢ ψ.

These rules admit a cryptographic interpretation of the primal implication ϕ →p ψ
as a kind of digital envelope: it is an infon containing the information ψ
encrypted under a symmetric key (generated from) ϕ. Indeed, the introduction
rule (→p I) allows one to encrypt any available message under any key. Similarly, the
elimination rule (→p E) allows one to extract the information from the ciphertext
provided the key is also available. So the infon logic incorporated into communication
protocols ([1], [2]) is a natural tool for manipulating commitment
schemes (see [7]) without a detailed analysis of the scheme itself.
Example. (cf. [8]). Alice and Bob live in different places and communicate via a
telephone line or by e-mail. They wish to play the following game remotely. Each
of them picks a bit, randomly or otherwise. If the bits coincide then Alice
wins; otherwise Bob wins. Both of them decide to play fair but don't believe
E.A. Hirsch et al. (Eds.): CSR 2014, LNCS 8476, pp. 232–244, 2014.
© Springer International Publishing Switzerland 2014
Primal Implication as Encryption 233
in the fairness of the opponent. To play fair means that they honestly declare
their choice of a bit, independently of what the other player said. So they use
cryptography.
We discuss the symmetric version of the coin flipping protocol from [8] in
order to make the policies of both players the same. Consider the policy of one
player, say Alice. Her initial state can be represented by the context
Γ = {A said ma , A said ka , A IsTrustedOn ma , A IsTrustedOn ka },
where infons ma and ka represent the chosen bit and the key Alice intends to
use for encryption. Her choice is recorded by infons A said ma and A said ka
where A said is the quotation modality governed by the modal logic K.1 Alice
simply says, to herself, the infons ma and ka .
The remaining two members of Γ reflect the decision to play fair. The infon
X IsTrustedOn y abbreviates (X said y) →p y. It provides the ability to obtain
the actual value of y from the declaration X said y, so Alice can deduce the
actual ma and ka she has spoken about.
The commit phase. Alice derives ma and ka →p ma from her context by rules
(→p E), (→p I) and sends the infon ka →p ma to Bob. Bob acts similarly, so
Alice will receive a message from him and her context will be extended to

Γ′ = Γ ∪ {B said (kb →p mb)}.

The reveal phase. After updating the context Alice obtains ka by rule (→p E)
and sends it to Bob. He does the same, so Alice's context will be

Γ″ = Γ′ ∪ {B said kb}.
Now by reasoning in K Alice deduces B said mb . She also has A said ma , so it
is clear to her who wins. Alice simply compares these infons with the patterns
B said 0, B said 1 and A said 0, A said 1 respectively.
The standard analysis of the protocol shows that Bob will come to the same
conclusion. Moreover, Alice can be sure that she is not cheated provided she
faithfully follows her policy through to the end.² The same holds for Bob.
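To make the flow of the protocol concrete, here is a toy simulation (our own modeling, not part of the paper): an infon is a nested tuple, quotation is a tagged pair, and a primal implication k →p m is an envelope that rule (→p E) opens only when the key matches.

```python
# Toy model (ours): ('said', X, i) stands for "X said i",
# and (k, '->p', m) stands for the envelope k ->p m.

def commit(bit, key, who):
    """Commit phase: publish  who said (key ->p bit-infon)."""
    return ('said', who, (key, '->p', ('bit', who, bit)))

def reveal(key, who):
    """Reveal phase: publish  who said key."""
    return ('said', who, key)

def open_envelope(commitment, revealed):
    """(->p E) inside the quotation (reasoning in K): from X said (k ->p m)
    and X said k, obtain X said m -- only if the keys match."""
    _, who, (k, arrow, m) = commitment
    _, who2, k2 = revealed
    assert arrow == '->p' and who == who2 and k == k2, "key does not fit"
    return ('said', who, m)

ca, cb = commit(1, 'ka', 'A'), commit(1, 'kb', 'B')   # both players pick bit 1
ma = open_envelope(ca, reveal('ka', 'A'))
mb = open_envelope(cb, reveal('kb', 'B'))
print('Alice wins' if ma[2][2] == mb[2][2] else 'Bob wins')  # prints "Alice wins"
```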
Note that infon logic is used here as a part of the protocol. It is one of the
tools that provide the correctness. But it does not prove the correctness. In order
to formalize and prove the correctness of protocols one should use much more
powerful formal systems.
In Section 2 this is done for the system P which is the {+, ∧, →p }-fragment
of infon logic. We also show that the quasi-boolean semantics for P (see [4]) is
essentially a special case of our semantics.
In Section 3 we show that ⊥ can be used to reflect some backdoor constructions.
Two variants are considered: the system P[⊥] from [4], with the usual elimination
rule for ⊥, and a new system P[⊥w] with a weak form of the elimination rule
for ⊥. The first treats ⊥ as a root password, the second as a universal key
for decryption. For almost all propositional primal infon logics the
derivability problem has linear time complexity. We prove the same complexity
bound for P[⊥w] in Section 4.
Finally we consider a system P[∨p ] which is the modal-free fragment of Basic
Propositional Primal Infon Logic PPIL from [5]. The primal disjunction ∨p in
P[∨p ] has usual introduction rules and no elimination rules. We treat it as a
group-key constructor and provide a linear time reduction of P[⊥w ] to P[∨p ]. It
thus gives another proof of linear time complexity bound for P[⊥w ].
binary strings,
l(π(x, y)) = x, r(π(x, y)) = y, (1)
and two functions enc, dec : (Σ ∗ )2 → Σ ∗ such that enc is total and
⊢ +    ϕ ⊢ ϕ    (axioms)

(Weakening)  from Γ ⊢ ϕ infer Γ, Δ ⊢ ϕ;    (Cut)  from Γ ⊢ ϕ1 and Γ, ϕ1 ⊢ ϕ2 infer Γ ⊢ ϕ2;
(∧I)  from Γ ⊢ ϕ1 and Γ ⊢ ϕ2 infer Γ ⊢ ϕ1 ∧ ϕ2;    (∧Ei)  from Γ ⊢ ϕ1 ∧ ϕ2 infer Γ ⊢ ϕi (i = 1, 2);
(→p I)  from Γ ⊢ ϕ2 infer Γ ⊢ ϕ1 →p ϕ2;    (→p E)  from Γ ⊢ ϕ1 and Γ ⊢ ϕ1 →p ϕ2 infer Γ ⊢ ϕ2.
Here ϕ, ϕ1 , ϕ2 are infons, i.e. the expressions constructed from the set At of
atomic infons by the grammar
ϕ ::= + | At | (ϕ ∧ ϕ) | (ϕ →p ϕ),
v(Γ ) = {v(ϕ) | ϕ ∈ Γ }.
A model is a pair ⟨I, M⟩ where I is an interpretation and M ⊆ Σ* is a closed
set.
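The derivation rules of P admit a simple saturation-based decision procedure: repeatedly close the set of derivable subformulas of Γ ∪ {ϕ} under the rules. The sketch below is ours (the linear-time algorithms of [4], [5] are far more refined); its completeness rests on the locality of shortest derivations.

```python
# Saturation-based derivability for P (our sketch): infons are '+', atom
# strings, ('and', a, b), or ('imp', a, b) for a ->p b.

def subformulas(phi, acc):
    acc.add(phi)
    if isinstance(phi, tuple):
        _, a, b = phi
        subformulas(a, acc)
        subformulas(b, acc)
    return acc

def derivable(gamma, phi):
    """Decide whether phi is derivable from gamma in P by a fixpoint
    over the subformulas of gamma and phi."""
    subs = set()
    for psi in list(gamma) + [phi]:
        subformulas(psi, subs)
    D = set(gamma) | {'+'}
    changed = True
    while changed:
        changed = False
        for s in subs - D:                 # introduction rules
            if isinstance(s, tuple):
                op, a, b = s
                if (op == 'and' and a in D and b in D) or (op == 'imp' and b in D):
                    D.add(s); changed = True          # (∧I) / (->p I)
        for s in list(D):                  # elimination rules
            if isinstance(s, tuple):
                op, a, b = s
                if op == 'and' and not {a, b} <= D:   # (∧E)
                    D |= {a, b}; changed = True
                elif op == 'imp' and a in D and b not in D:
                    D.add(b); changed = True          # (->p E)
    return phi in D

print(derivable({'q', ('imp', 'q', 'r')}, 'r'),      # True, by (->p E)
      derivable(set(), ('imp', 'q', 'q')))           # False: no identity in P
```

The second call shows that ψ →p ψ is not a theorem of P: primal implication lacks the identity axiom of intuitionistic implication.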
In the paper [4] it is established that P is sound and complete with respect
to quasi-boolean semantics. A quasi-boolean model is a validity relation |= that
enjoys the following properties:
– |= +,
– |= ϕ1 ∧ ϕ2 ⇔ |= ϕ1 and |= ϕ2,
– |= ϕ2 ⇒ |= ϕ1 →p ϕ2,
– |= ϕ1 →p ϕ2 ⇒ ⊭ ϕ1 or |= ϕ2.
π(x, y) = ( x ∧ y ),   enc(x, y) = ( x →p y ),   E = {+}.   (3)

Projections and the decryption function can be found from (1) and (2). Note
that for this interpretation the equality v(ϕ) = ϕ holds for every infon ϕ.
Consider a quasi-boolean model |=. Let M be the closure of the set M0 =
{ϕ : |= ϕ}, i.e. the least closed extension of M0.
Lemma 3. |= ϕ iff v(ϕ) ∈ M .
Proof. It is sufficient to prove that the set M \ M0 does not contain words of
the form v(ϕ). Any element b ∈ M \ M0 can be obtained from some elements of
M0 by a finite sequence of steps 1,2,3 that correspond to closure conditions:
Proof. The theorem states that the infon logic P is sound and complete with
respect to the class of models introduced by Definition 2. The soundness can be
proven by straightforward induction on the derivation of ϕ from Γ. The completeness
follows from Lemma 3 and the completeness result for quasi-boolean
models (see [4]).

Theorem 6. Let the interpretation I = ⟨A, v⟩ be plain. For any context Γ there
exists a model ⟨I, M⟩ with v(Γ) ⊆ M such that Γ ⊬ ϕ implies v(ϕ) ∉ M for all
infons ϕ.

Proof. Let M be the closure of the set M0 = {v(ψ) | Γ ⊢ ψ}. Then v(Γ) ⊆ M.
The set M0 is deductively closed, so M \ M0 does not contain strings of the form
v(ψ). Suppose Γ ⊬ ϕ. Then v(ϕ) ∉ M0 because the interpretation is injective.
Thus v(ϕ) ∉ M.
238 V.N. Krupski
⊥ as Universal Key
The root password provides direct access to all information in the system,
including private information of any agent that was never sent to anybody else.
It is also natural to consider a restricted form of superuser permissions that
protects the privacy of agents but provides the ability to decrypt any available
ciphertext. It can be simulated by the infon logic P[⊥w], with the constant ⊥
treated as a universal key. The corresponding inference rule is a weak form of
the (⊥E) rule:

(⊥Ew)  from Γ ⊢ ⊥ and Γ ⊢ ϕ →p ψ infer Γ ⊢ ψ,

which has an additional premise Γ ⊢ ϕ →p ψ. So the owner of ⊥ can get an infon
only if she already has the same information as a ciphertext. The rule (⊥Ew) is
strictly weaker than (⊥E) because ψ →p ψ is not derivable in P.
All definitions concerning models for P[⊥w] are similar to the case of P[⊥],
with closure condition 4 replaced by

4′. f, enc(a, b) ∈ M ⇒ b ∈ M.

Essentially, we extend the signature of infon algebras by an additional (partial)
operation crack(x, y) that satisfies the equality

crack(f, enc(a, b)) = b   (4)

and allow any agent to use it, so her local state satisfies the closure condition 4′.
Lemma 7. There exist plain interpretations for P[⊥] and for P[⊥w ].
It satisfies the condition (4), so the interpretation for P[⊥w ] is defined. One can
prove in a similar way that the interpretation is plain (w.r.t. P[⊥w ] ).
The completeness results from Section 2 hold for logics P[⊥] and P[⊥w ] too.
The proofs are essentially the same with one difference: the quasi-boolean se-
mantics from [4] does not cover the case of P[⊥w ]. Let L be one of the logics
P[⊥] or P[⊥w ].
Definition 10. (Positive atoms.) In what follows we assume that the language
of P also contains ⊥, but it is an ordinary member of At without any specific
inference rule for it. Let

At+(ϕ) = {ϕ} for ϕ ∈ At ∪ {+, ⊥},
At+(ϕ ∧ ψ) = At+(ϕ) ∪ At+(ψ),
At+(ϕ →p ψ) = At+(ψ).

For a context Γ, set At+(Γ) = ⋃_{ϕ∈Γ} At+(ϕ).
The decision algorithm for P[⊥w] consists of the following three steps:
1. Test whether Γ ⊢ ϕ in P. If yes, then Γ ⊢ ϕ in P[⊥w] too. Else go to step 2.
2. Test whether Γ ⊬ ⊥ in P. If yes, then Γ ⊬ ϕ in P[⊥w] by Lemma 12. Else
go to step 3.
3. We have Γ ⊢ ⊥ in P, so it also holds in P[⊥w]. Test the condition At+(ϕ) ⊆
At+(Γ). If it is fulfilled then Γ ⊢ ϕ in P[⊥w]; otherwise Γ ⊬ ϕ in P[⊥w]
(Lemma 11).
Linear time complexity bounds for steps 1,2 follow from the linear bound for
P. In order to prove the same bound for step 3 we use the preprocessing stage
of the linear time decision algorithm from [5]. It deals with sequents Γ / ϕ in a
language that extends the language of P[⊥w ]. The preprocessing stage is purely
syntactic, so it does not depend on the logic involved and can be used for P[⊥w ]
as well.
The algorithm constructs the parse tree for the sequent. Two nodes are
called homonyms if they represent two occurrences of the same infon. For every
homonymy class, the algorithm chooses a single element of it, the homonymy
leader, and labels all nodes with pointers that provide a constant time access
from a node to its homonymy leader. All this can be done in linear time (see
[5]).
Now it takes a single walk through the parse tree to mark by a special flag
all homonymy leaders that correspond to infons ψ ∈ At+ (Γ ). One more walk is
required to test whether all homonymy leaders that correspond to ψ ∈ At+ (ϕ)
already have this flag. Thus we have a linear time test for the inclusion At+ (ϕ) ⊆
At+ (Γ ).
Theorem 13. The derivability problem for infon logic P[⊥w ] is linear time de-
cidable.
Primal infon logic with disjunction P[∨] was studied in [4]. It is defined by all
rules of P and usual introduction and elimination rules for disjunction. P[∨]
can emulate the classical propositional logic, so the derivability problem for it is
co-NP-complete.
Here we consider the logic P[∨p ], an efficient variant of P[∨]. It was mentioned
in [4] and later was incorporated into Basic Propositional Primal Infon Logic
PPIL [5] as its purely propositional fragment without modalities. In P[∨p ] the
standard disjunction is replaced by a “primal” disjunction ∨p with introduction
rules

(∨p Ii)  from Γ ⊢ ϕi infer Γ ⊢ ϕ1 ∨p ϕ2   (i = 1, 2),

and without elimination rules. This results in a linear-time complexity bound for
P[∨p] (and for PPIL too, see [4], [5]).
When the primal implication is treated as encryption, the primal disjunction
can be used as a method to construct group keys. An infon of the form
(ϕ1 ∨p ϕ2 ) →p ψ (5)
represents a ciphertext that can be decrypted by anyone who has at least one of
the keys ϕ1 or ϕ2. In P the same effect can be produced by the infon
(ϕ1 →p ψ) ∧ (ϕ2 →p ψ).
P[⊥w ] is linear-time reducible to P[∨p ], so P[∨p ] and PPIL can emulate the
backdoor based on a universal key. The reduction also gives another proof for
Theorem 13.
Remember that in the language of P[∨p ] symbol ⊥ denotes some regular
atomic infon. Consider the following translation:
Proof. Part "only if" can be proved by straightforward induction on the derivation
of ϕ from the assumptions Γ in P[⊥w]: for any inference rule of P[⊥w], its
translation is derivable in P[∨p]. For example, for the elimination rules for
→p and ⊥ we have the derivations

from ϕ* infer ⊥ ∨p ϕ* by (∨p I2), then from ⊥ ∨p ϕ* and (⊥ ∨p ϕ*) →p ψ* infer ψ*;
from ⊥ infer ⊥ ∨p ϕ* by (∨p I1), then from ⊥ ∨p ϕ* and (⊥ ∨p ϕ*) →p ψ* infer ψ*.
Part "if". Let Γ* ⊢ ϕ* in P[∨p]. Note that P[∨p] is the modal-free fragment
of PPIL and the shortest derivation of ϕ* from the assumptions Γ* in PPIL is also
a derivation in P[∨p]. Let D be this derivation.
It is proved in [5] that any shortest derivation is local. For the case of P[∨p]
this means that all formulas from D are subformulas of Γ*, ϕ*. In particular, ∨p
occurs in D only in subformulas of the form ⊥ ∨p θ*.
Case 1. Suppose that the (∨p I1) rule is never used in D. Remove the part "⊥ ∨p"
from every subformula of the form ⊥ ∨p ψ that occurs in D. This transformation
eliminates ∨p and makes all steps corresponding to the (∨p I2) rule trivial. The result
will be a derivation of ϕ from the assumptions Γ in P. So Γ ⊢ ϕ in P[⊥w] too.
Case 2. Suppose that the (∨p I1) rule is used in D. It has the form

(7)   from ⊥ infer ⊥ ∨p θ*,

so At+(ψ) is defined for every ψ in the language of P[∨p]. Moreover, At+(ϕ*) =
At+(ϕ) and At+(Γ*) = At+(Γ). We claim that At+(ϕ*) ⊆ At+(Γ*).
Indeed, consider D as a proof tree and a node ψ of it with At+(ψ) ⊄ At+(Γ*)
whereas At+(ψ′) ⊆ At+(Γ*) holds for all predecessors ψ′. The only rule that
can produce this effect is (7), so ψ = ⊥ ∨p θ* for some θ where all occurrences
of "new" atoms q ∈ At+(ψ) \ At+(Γ*) are inside θ*.
Consider the path from the node ψ to the root node ϕ* and the trace of ψ
along it. There is no elimination rule for ∨p, so ψ cannot be broken into pieces.
All occurrences of positive atoms in θ* remain positive in all formulas along the
trace. But ∨p occurs in ϕ* only in the premise of a primal implication, so the trace
does not reach the root node. Thus, at some step the formula containing ψ is
eliminated and the "new" atoms from θ* never appear in At+(ϕ*):

from ⊥ infer ⊥ ∨p θ*;  . . . ;  from η1[⊥ ∨p θ*] and η1[⊥ ∨p θ*] →p η2 infer η2 by (→p E).
References
1. Gurevich, Y., Neeman, I.: DKAL: Distributed-Knowledge Authorization Language.
In: Proc. of CSF 2008, pp. 149–162. IEEE Computer Society (2008)
2. Gurevich, Y., Neeman, I.: DKAL 2 — A Simplified and Improved Authorization
Language. Technical Report MSR-TR-2009-11, Microsoft Research (February 2009)
3. Gurevich, Y., Neeman, I.: Logic of infons: the propositional case. ACM Transactions
on Computational Logic 12(2) (2011)
4. Beklemishev, L., Gurevich, Y.: Propositional primal logic with disjunction. J. of
Logic and Computation 22, 26 pages (2012)
5. Cotrini, C., Gurevich, Y.: Basic primal infon logic. Microsoft Research Technical
Report MSR-TR-2012-88, Microsoft Research (August 2012)
6. Troelstra, A., Schwichtenberg, H.: Basic proof theory. Cambridge Tracts in Theo-
retical Computer Science, vol. 43. Cambridge University Press, Cambridge (1996)
7. Goldreich, O.: Foundations of Cryptography: Volume 1, Basic Tools. Cambridge
University Press, Cambridge (2001)
8. Blum, M.: Coin Flipping by Telephone. In: Proceedings of CRYPTO, pp. 11–15
(1981)
Processing Succinct Matrices and Vectors
1 Introduction
Algorithms that work on a succinct representation of certain objects can nowadays be
found in many areas of computer science. A paradigmatic example is the use of OBDDs
(ordered binary decision diagrams) in hardware verification [5,21]. OBDDs are a suc-
cinct representation of Boolean functions. Consider a boolean function f (x1 , . . . , xn )
in n input variables. One can represent f by its decision tree, which is a full binary tree
of height n with {0, 1}-labelled leaves. The leaf that is reached from the root via the
path (a1 , . . . , an ) ∈ {0, 1}n (where ai = 0 means that we descend to the left child
in the i-th step, and ai = 1 means that we descend to the right child in the i-th step)
is labelled with the bit f (a1 , . . . , an ). This decision tree can be folded into a directed
acyclic graph by eliminating repeated occurrences of isomorphic subtrees. The result is
the OBDD for f with respect to the variable ordering x1 , . . . , xn .1 Bryant was the first
who realized that OBDDs are an adequate tool in order to handle the state explosion
problem in hardware verification [5].
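The folding of a decision tree into an OBDD-like dag is just hash-consing of subtrees. A minimal sketch (ours) follows; it produces the quasi-reduced diagram, i.e. without the second elimination rule mentioned in the footnote below.

```python
def fold(f, n):
    """Fold the decision tree of f : {0,1}^n -> {0,1} into a dag by sharing
    isomorphic subtrees (quasi-reduced OBDD for the ordering x1, ..., xn)."""
    table = {}                      # hash-consing: subtree key -> unique id

    def build(prefix):
        if len(prefix) == n:
            key = f(tuple(prefix))                      # a {0,1}-labelled leaf
        else:
            key = (build(prefix + [0]), build(prefix + [1]))
        return table.setdefault(key, len(table))

    build([])
    return len(table)               # number of distinct nodes, leaves included

# parity has a small OBDD: two shared nodes per level, so linear growth in n
parity = lambda bits: sum(bits) % 2
print([fold(parity, n) for n in range(1, 7)])   # → [3, 5, 7, 9, 11, 13]
```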
OBDDs can be also used for storing large graphs. A graph G with 2n nodes and ad-
jacency matrix MG can be represented by the boolean function fG (x1 , y1 , . . . , xn , yn ),
where fG (a1 , b1 , . . . , an , bn ) is the entry of MG at position (a, b); here a1 · · · an (resp.,
The first (second) author is supported by the DFG grant LO 748/8-2 (SCHM 986/9-2).
¹ Here, we are cheating a bit: in OBDDs a second elimination rule is applied that removes
nodes for which the left and right child are identical. On the other hand, it is known that
asymptotically the compression achieved by this elimination rule is negligible [31].
E.A. Hirsch et al. (Eds.): CSR 2014, LNCS 8476, pp. 245–258, 2014.
© Springer International Publishing Switzerland 2014
246 M. Lohrey and M. Schmidt-Schauß
b1 · · · bn ) is the binary representation of the index a (resp. b). Note that we use the so
called interleaved variable ordering here, where the bits of the two coordinates a and
b are bitwise interleaved. This ordering turned out to be convenient in the context of
OBDD-based graph representation, see e.g. [10].
Classical graph problems (like reachability, alternating reachability, existence of a
Hamiltonian cycle) have been studied for OBDD-represented graphs in [9,30]. It turned
out that these problems are exponentially harder for OBDD-represented graphs than for
explicitly given graphs. In [30] an upgrading theorem for OBDD-represented graphs
was shown. It roughly states that completeness of a problem A for a complexity class
C under quantifier free reductions implies completeness of the OBDD-variant of A for
the exponentially harder version of C under polynomial time reductions.
In the same way as OBDDs represent boolean mappings, functions from {0, 1}n to
any set S can be represented. One simply has to label the leaves of the decision tree
with elements from S. This yields multi-terminal decision diagrams (MTDDs) [11]. Of
particular interest is the case, where S is a semiring, e.g. N or Z. In the same way as
an adjacency matrix (i.e., a boolean matrix) of dimension 2n can be represented by an
OBDD, a matrix of dimension 2n over any semiring can be represented by an MTDD.
As for OBDDs, we assume that the bits of the two coordinates a and b are interleaved
in the order a1 , b1 , . . . , an , bn . This implies that an MTDD can be viewed as a set of
rules of the form
A \to \begin{pmatrix} A_{1,1} & A_{1,2} \\ A_{2,1} & A_{2,2} \end{pmatrix}   or   B \to a with a \in S.   (1)
where A, A1,1 , A1,2 , A2,1 , and A2,2 are variables that correspond to certain nodes of
the MTDD (namely those nodes that have even distance from the root node). Every
variable produces a matrix of dimension 2h for some h ≥ 0, which we call the height
of the variable. The variables Ai,j in (1) must have the same height h, and A has height
h + 1. The variable B has height 0. We assume that the additive monoid of the semiring
S is finitely generated, hence every a ∈ S has a finite representation.
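A minimal sketch (ours) of this representation: rules map a variable either to a scalar or to a 2 × 2 block of variables of height one less, and the entry at (i, j) is found by descending one level per pair of top bits, matching the interleaved ordering a1, b1, . . . , an, bn. It also reproduces the O(n)-size MTDD for the Walsh matrix discussed just below.

```python
def entry(rules, var, i, j, h):
    """Entry (i, j) of the 2^h x 2^h matrix produced by `var`.
    rules[var] is either a scalar or a 2x2 nested tuple of variables."""
    rhs = rules[var]
    if h == 0:
        return rhs
    top_i, top_j = (i >> (h - 1)) & 1, (j >> (h - 1)) & 1
    return entry(rules, rhs[top_i][top_j], i, j, h - 1)

def walsh_mtdd(n):
    """MTDD of size O(n) for the 2^n-dimensional Walsh matrix:
    W_{h+1} = [[W_h, W_h], [W_h, -W_h]]; M_h denotes -W_h."""
    rules = {('W', 0): 1, ('M', 0): -1}
    for h in range(n):
        rules[('W', h + 1)] = ((('W', h), ('W', h)), (('W', h), ('M', h)))
        rules[('M', h + 1)] = ((('M', h), ('M', h)), (('M', h), ('W', h)))
    return rules

rules = walsh_mtdd(3)
row = [entry(rules, ('W', 3), 5, j, 3) for j in range(8)]
print(row)   # → [1, -1, 1, -1, -1, 1, -1, 1], i.e. (-1)^popcount(5 AND j)
```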
MTDDs yield very compact representations of sparse matrices. It was shown that
an (n × n)-matrix with m nonzero entries can be represented by an MTDD of size
O(m log n) [11, Thm. 3.2], which is better than standard succinct representations for
sparse matrices. Moreover, MTDDs can also yield very compact representations of non-
sparse matrices. For instance, the Walsh matrix of dimension 2n can be represented by
an MTDD of size O(n), see [11]. In fact, the usual definition of the n-th Walsh matrix
is exactly an MTDD. Matrix algorithms for MTDDs are studied in [11] as well, but no
precise complexity analysis is carried out. In fact, the straightforward matrix multiplica-
tion algorithm for multi-terminal decision diagrams from [11] has an exponential worst
case running time, and this is unavoidable: The smallest MTDD that produces the prod-
uct of two MTDD-represented matrices may be of exponential size in the two MTDDs,
see Thm. 2. The first main contribution of this paper is a generalization of MTDDs that
overcomes this deficiency: An MTDD+ consists of rules of the form (1) together with
addition rules of the form A → B + C, where “+” refers to matrix addition over the un-
derlying semiring. Here, A, B, and C must have the same height, i.e., produce matrices
of the same dimension. We show that an MTDD+ for the product of two MTDD+ -
represented matrices can be computed in polynomial time (Thm. 3). In Sec. 4.1 we also
present efficient (polynomial time) algorithms for several other important matrix prob-
lems on MTDD+ -represented input matrices: computation of a specific matrix entry,
computation of the trace, matrix transposition, tensor and Hadamard product. Sec. 5
deals with equality checking. It turns out that equality of MTDD+ -represented matri-
ces can be checked in polynomial time, if the additive monoid is cancellative, in all
other cases equality checking is coNP-complete.
To the knowledge of the authors, complexity results similar to those from [9,30] for
OBDDs do not exist in the literature on MTDDs. Our second main contribution fills this
gap. We prove that already for MTDDs over Z it is PSPACE-complete to check whether
the determinant of the generated matrix is zero (Thm. 6). This result is shown by lifting
a classical construction of Toda [27] (showing that computing the determinant of an ex-
plicitly given integer matrix is complete for the counting class GapL) to configuration
graphs of polynomial space bounded Turing machines, which are of exponential size.
It turns out that the adjacency matrix of the configuration graph of a polynomial space
bounded Turing machine can be produced by a small MTDD. Thm. 6 sharpens a recent
result from [14] stating that it is PSPACE-complete to check whether the determinant of
a matrix that is represented by a boolean circuit (see Sec. 4.2) vanishes. We also prove
several hardness results for counting classes. For instance, computing a specific entry
of a matrix power An , where A is given by an MTDD over N is #P-complete (resp.
#PSPACE-complete) if n is given unary (resp. binary). Here, #P (resp. #PSPACE) is
the class of functions counting the number of accepting computations of a nondetermin-
istic polynomial time Turing machine [29] (resp., a nondeterministic polynomial space
Turing machine [15]). An example of a natural #PSPACE-complete counting problem
is counting the number of strings not accepted by a given NFA [15].
2 Related Work
Sparse Matrices and Quad-Trees. To the knowledge of the authors, most of the litera-
ture on matrix compression deals with sparse matrices, where most of the matrix entries
are zero. There are several succinct representations of sparse matrices. One of which are
quad-trees, used in computer graphics for the representation of large constant areas in
2-dimensional pictures, see for example [24,8]. Actually, an MTDD can be seen as a
quad-tree that is folded into a dag by merging identical subtrees.
pictures if they have the same height) and vertical composition (which is defined for
two pictures if they have the same width). This formalism does not share all the nice al-
gorithmic properties of (1-dimensional) SLPs [3]: Testing whether two 2SLPs produce
the same picture is only known to be in coRP (co-randomized polynomial time). More-
over, checking whether an explicitly given (resp., 2SLP-represented) picture appears
within a 2SLP-represented picture is NP-complete (resp., Σ2P -complete). Related hard-
ness results in this direction concern the convolution of two SLP-represented strings
of the same length (which can be seen as a picture of height 2). The convolution of
strings u = a1 · · · an and v = b1 · · · bn is the string (a1 , b1 ) · · · (an , bn ). By a result
from [4] (which is stated in terms of the related operation of literal shuffle), the size
of a shortest SLP for the convolution of two strings that are given by SLPs G and H
may be exponential in the size of G and H. Moreover, it is PSPACE-complete to check
for two SLP-represented strings u and v and an NFA T operating on strings of pairs of
symbols, whether T accepts the convolution of u and v [17].
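Concretely, the convolution just pairs equal positions of the two strings (a one-line illustration of the definition, not a compressed-string algorithm):

```python
def convolution(u, v):
    """Convolution of two strings of the same length: the sequence of
    pairs (a1, b1) ... (an, bn); it can be viewed as a picture of
    height 2 whose columns are these pairs."""
    if len(u) != len(v):
        raise ValueError("convolution needs equal-length strings")
    return list(zip(u, v))
```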
MTDDs restrict 2SLPs by forbidding unbalanced derivation trees. The derivation
tree of an MTDD results from unfolding the rules in (1); it is a tree, where every non-
leaf node has exactly four children and every root-leaf path has the same length.
Tensor Circuits. In [2,7], the authors investigated the problems of evaluating tensor
formulas and tensor circuits. Let us restrict to the latter. A tensor circuit is a circuit
where the gates evaluate to matrices over a semiring and the following operations are
used: matrix addition, matrix multiplication, and tensor product. Recall that the tensor
product of two matrices A = (a_{i,j})_{1≤i≤m, 1≤j≤n} and B is the matrix
\[
A \otimes B = \begin{pmatrix} a_{1,1}B & \cdots & a_{1,n}B \\ \vdots & \ddots & \vdots \\ a_{m,1}B & \cdots & a_{m,n}B \end{pmatrix}.
\]
It is an (mk × nl)-matrix if B is a (k × l)-matrix. In [2] it is shown among other results
that computing the output value of a scalar tensor circuit (i.e., a tensor circuit that yields
a (1 × 1)-matrix) over the natural numbers is complete for the counting class #EXP.
An MTDD+ over Z can be seen as a tensor circuit that (i) does not use matrix multi-
plication and (ii) where for every tensor product the left factor is a (2 × 2)-matrix. To
see the correspondence, note that
\[
\begin{pmatrix} A_{1,1} & A_{1,2} \\ A_{2,1} & A_{2,2} \end{pmatrix}
= \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \otimes A_{1,1}
+ \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \otimes A_{1,2}
+ \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} \otimes A_{2,1}
+ \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} \otimes A_{2,2}
\]
and
\[
\begin{pmatrix} a_{1,1} & a_{1,2} \\ a_{2,1} & a_{2,2} \end{pmatrix} \otimes B
= \begin{pmatrix} a_{1,1}B & a_{1,2}B \\ a_{2,1}B & a_{2,2}B \end{pmatrix}.
\]
Each of the matrices a_{i,j}B can be generated from B and −B using log |a_{i,j}| many
additions (here we use the fact that the underlying semiring is Z).
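The tensor product above is the familiar Kronecker product; a small pure-Python sketch (numpy.kron computes the same operation) lets one check the 2 × 2 block decomposition used in the correspondence:

```python
def kron(A, B):
    """Tensor (Kronecker) product of nested-list matrices: the block
    matrix with blocks a_ij * B.  A is (m x n), B is (k x l); the
    result is (m*k x n*l)."""
    m, n = len(A), len(A[0])
    k, l = len(B), len(B[0])
    return [[A[i // k][j // l] * B[i % k][j % l]
             for j in range(n * l)] for i in range(m * k)]

def add(A, B):
    """Entrywise matrix sum."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]
```

With E(p, q) the 2 × 2 unit matrix with a single 1 in position (p, q), the sum of the four products E(p, q) ⊗ A_{p,q} recovers the full block matrix, as in the displayed identity.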
3 Preliminaries
Processing Succinct Matrices and Vectors 249
We consider matrices over a semiring (S, +, ·), where (S, +) is a finitely generated commutative monoid with unit 0. The unit of the monoid (S, ·) is 1. We assume that 0 · a = a · 0 = 0 for all a ∈ S. Hence, if |S| > 1, then 1 ≠ 0 (since 1 = 0 implies a = 1 · a = 0 · a = 0 for all a ∈ S). With S^{n×n} we denote the set of all (n × n)-matrices over S.
All time bounds in this paper implicitly refer to the RAM model of computation with
a logarithmic cost measure for arithmetical operations on integers, where arithmetic
operations on n-bit numbers need time O(n). For a number n ∈ Z let us denote with
bin(n) its binary encoding.
We assume that the reader has some basic background in complexity theory, in par-
ticular we assume that the reader is familiar with the classes NP, coNP, and PSPACE. A
function f : {0, 1}∗ → {0, 1}∗ belongs to the class FSPACE(s(n)) (resp. FTIME(s(n)))
if f can be computed on a deterministic Turing machine in space (resp., time) s(n).2 As
usual, only the space on the working tapes is counted. Moreover, the output is written
from left to right on the output tape, i.e., in each step the machine either outputs a new
symbol on the output tape, in which case the output head moves one cell to the right,
or the machine does not output a new symbol in which case the output head does not
move. Let FP = ⋃_{k≥1} FTIME(n^k) and FPSPACE = ⋃_{k≥1} FSPACE(n^k). Note that
for a function f ∈ FPSPACE we have |f(w)| ≤ 2^{|w|^{O(1)}} for every input.
The counting class #P consists of all functions f : {0, 1}∗ → N for which there
exists a nondeterministic polynomial time Turing machine M with input alphabet Σ
such that for all x ∈ Σ ∗ , f (x) is the number of accepting computation paths of M for
input x. If we replace nondeterministic polynomial time Turing machines by nonde-
terministic polynomial space Turing machines (resp. nondeterministic logspace Turing
machines), we obtain the class #PSPACE [15] (resp. #L [1]). Note that for a mapping f ∈ #PSPACE, the number f(x) may grow doubly exponentially in |x|, whereas
for f ∈ #P, the number f(x) is bounded singly exponentially in |x|. Ladner [15] has
shown that a mapping f : Σ ∗ → N belongs to #PSPACE if and only if the map-
ping x → bin(f (x)) belongs to FPSPACE. One cannot expect a corresponding re-
sult for the class #P: If for every function f ∈ #P the mapping x → bin(f (x))
belongs to FP, then by Toda’s theorem [28] the polynomial time hierarchy collapses
down to P. For f ∈ #L, the mapping x → bin(f(x)) belongs to NC^2 and hence to
FP ∩ FSPACE(log^2(n)) [1, Thm. 4.1]. The class GapL (resp., GapP, GapPSPACE) consists of all differences of two functions in #L (resp., #P, #PSPACE). From Ladner's
result [15] it follows easily that a function f : {0, 1}∗ → Z belongs to GapPSPACE if
and only if the mapping x → bin(f (x)) belongs to FPSPACE, see also [12, Thm. 6].
Logspace reductions between functions can be defined analogously to the language
case: If f, g : {0, 1}∗ → X with X ∈ {N, Z}, then f is logspace reducible to g if there
exists a function h ∈ FSPACE(log n) such that f (x) = g(h(x)) for all x. Toda [27] has
shown that computing the determinant of a given integer matrix is GapL-complete.
Fix a semiring (S, +, ·) with (S, +) a finitely generated commutative monoid, and let
Γ ⊆ S be a finite generating set for (S, +). Thus, every element of S can be written as a
finite sum ∑_{a∈Γ} n_a · a with n_a ∈ N. A multi-terminal decision diagram G with addition
(MTDD+ ) of height h is a triple (N, P, A0 ), where N is a finite set of variables which
is partitioned into non-empty sets Ni (0 ≤ i ≤ h), Nh = {A0 } (A0 is called the start
variable), and P is a set of rules of the following three forms:
– A → \begin{pmatrix} A_{1,1} & A_{1,2} \\ A_{2,1} & A_{2,2} \end{pmatrix} with A ∈ N_i and A_{1,1}, A_{1,2}, A_{2,1}, A_{2,2} ∈ N_{i−1} for some 1 ≤ i ≤ h
– A → A1 + A2 with A, A1 , A2 ∈ Ni for some 0 ≤ i ≤ h
– A → a with A ∈ N0 and a ∈ Γ ∪ {0}
Moreover, for every variable A ∈ N there is exactly one rule with left-hand side A,
and the relation {(A, B) ∈ N × N | B occurs in the right-hand side for A} is acyclic.
If A ∈ Ni then we say that A has height i. The MTDD+ G is called an MTDD if for
every addition rule (A → A1 + A2 ) ∈ P we have A, A1 , A2 ∈ N0 . In other words,
only scalars are allowed to be added. Since we assume that (S, +) is generated by Γ, this allows us to produce arbitrary elements of S as matrix entries. For every A ∈ N_i we define a square matrix val(A) of dimension 2^i in the obvious way by unfolding the rules. Moreover, let val(G) = val(A_0) for the start variable A_0 of G. This is a (2^h × 2^h)-matrix. The size of a rule A → a with a ∈ Γ ∪ {0} is 1; all other rules have size log |N|. The size |G| of the MTDD+ G is the sum of the sizes of its rules; this is, up to constant factors, the length of the binary coding of G. An MTDD+ G of size n log n can represent a (2^n × 2^n)-matrix. Note that only square matrices whose dimension is a power of 2 can be represented. Matrices not fitting this format can be filled up appropriately, depending on the purpose.
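The semantics val(A) can be sketched as a direct evaluator that unfolds rules into an explicit matrix; the rule encoding ('leaf' / 'sum' / 'block') is our own illustrative choice, and memoization expands each dag node only once. (Of course, materializing val(A) takes exponential space in general; the whole point of MTDDs is to avoid this.)

```python
from functools import lru_cache

def make_val(rules):
    """Evaluate MTDD+ variables to concrete matrices by unfolding.

    `rules` maps a variable name to one of (hypothetical encoding):
      ('leaf', a)                    -- A -> a, a scalar
      ('sum', A1, A2)                -- A -> A1 + A2
      ('block', A11, A12, A21, A22)  -- A -> 2x2 block matrix
    Matrices are tuples of tuples (hashable, immutable)."""
    @lru_cache(maxsize=None)
    def val(A):
        rule = rules[A]
        if rule[0] == 'leaf':
            return ((rule[1],),)
        if rule[0] == 'sum':
            M, N = val(rule[1]), val(rule[2])
            return tuple(tuple(x + y for x, y in zip(r, s))
                         for r, s in zip(M, N))
        _, a11, a12, a21, a22 = rule
        top = [r + s for r, s in zip(val(a11), val(a12))]
        bot = [r + s for r, s in zip(val(a21), val(a22))]
        return tuple(tuple(r) for r in top + bot)
    return val
```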
An MTDD where all rules have the form A → a with a ∈ Γ ∪ {0} or A → B + C generates an element of the semiring S. Such an MTDD is an arithmetic circuit in which only input gates and addition gates are used, and is called a +-circuit in the following. In case the underlying semiring is Z, a +-circuit with n variables can produce a number of size 2^n, and the binary encoding of this number can be computed in time O(n^2) from the +-circuit (since we need n additions of numbers with at most n bits). In general, for a +-circuit over the semiring S, we can compute in quadratic time numbers n_a (a ∈ Γ) such that ∑_{a∈Γ} n_a · a is the semiring element to which the +-circuit evaluates.
Note that the notion of an MTDD+ makes sense for commutative monoids, since
we only used the addition of the underlying semiring. But soon, we want to multiply
matrices, for which we need a semiring. Moreover, the notion of an MTDD+ makes sense in any dimension; here we only defined the 2-dimensional case.
(the start variable is A_n here). In a similar way, one can produce the lower triangular (2^n × 2^n)-matrix, where entries on the diagonal and below are 1. To produce the (2^n × 2^n)-matrix over Z, where all entries in the k-th row are k, we need the following rules:
\[
E_0 \to 1, \qquad E_j \to \begin{pmatrix} E_{j-1}+E_{j-1} & E_{j-1}+E_{j-1} \\ E_{j-1}+E_{j-1} & E_{j-1}+E_{j-1} \end{pmatrix} \quad (1 \le j \le n)
\]
\[
C_0 \to 1, \qquad C_j \to \begin{pmatrix} C_{j-1} & C_{j-1} \\ C_{j-1}+E_{j-1} & C_{j-1}+E_{j-1} \end{pmatrix} \quad (1 \le j \le n).
\]
Here, we are a bit more liberal with respect to the format of rules, but the above rules can easily be brought into the form from the general definition of an MTDD+. Note that E_j generates the (2^j × 2^j)-matrix with all entries equal to 2^j, and that C_n generates the desired matrix.
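The two rule families can be unfolded directly to check the stated invariants for small n (an illustrative sketch of our own, not part of the formalism):

```python
def build_E(j):
    """Unfold E_j: the (2^j x 2^j)-matrix with all entries 2^j."""
    if j == 0:
        return [[1]]
    Q = [[2 * x for x in row] for row in build_E(j - 1)]  # E_{j-1}+E_{j-1}
    return [r + r for r in Q] + [r + r for r in Q]

def build_C(j):
    """Unfold C_j: the (2^j x 2^j)-matrix whose k-th row is all k."""
    if j == 0:
        return [[1]]
    C, E = build_C(j - 1), build_E(j - 1)
    top = [r + r for r in C]                         # (C_{j-1}  C_{j-1})
    S = [[c + e for c, e in zip(rc, re)] for rc, re in zip(C, E)]
    return top + [r + r for r in S]                  # bottom: C+E twice
```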
Note that the matrix from the last example cannot be produced by an MTDD of polynomial size, since it contains an exponential number of different matrix entries (for the same reason it cannot be produced by a 2SLP [3]). This holds for any non-trivial semiring.
Theorem 1. For any semiring with at least two elements, MTDD+ are exponentially
more succinct than MTDDs.
Proof. For simplicity we argue with MTDDs in dimension 1 (which generate vectors). We must have 1 ≠ 0 in S. Let m, d > 0 be such that m = 2^d. For 0 ≤ i ≤ m − 1 let A_i be such that val(A_i) has length m, the i-th entry is 1 (the first entry is the 0-th entry) and all other entries are 0. Moreover, let B_i be such that val(B_i) is the concatenation of 2^i copies of val(A_i). Let C_0 produce the 0-vector of length m = 2^d, and for 0 ≤ i ≤ m − 1 let C_{i+1} → (C_i, C_i + B_i). Then val(C_m) is of length 2^{d+m} and consists of the concatenation of all binary strings of length m. The MTDD+ for this vector is of size O(m^2 log m), whereas an equivalent MTDD must have size at least 2^m, since for every binary string of length m there must exist a nonterminal.
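The recursion in this proof is easy to replay concretely (an illustrative sketch with naming of our own):

```python
def all_strings_vector(m):
    """Vector of length m * 2^m: the concatenation of all binary strings
    of length m, built by the proof's recursion C_{i+1} = (C_i, C_i + B_i),
    where B_i is 2^i copies of the unit vector A_i (single 1 at position i)."""
    C = [0] * m                       # C_0: the zero vector of length m
    for i in range(m):
        B = ([0] * i + [1] + [0] * (m - i - 1)) * (2 ** i)
        C = C + [c + b for c, b in zip(C, B)]
    return C
```

For m = 2^d the resulting length m · 2^m equals the 2^{d+m} stated in the proof; block j of length m carries exactly the bits of j, so every binary string of length m occurs.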
The following result shows that the matrix product of two MTDD-represented matrices
may be incompressible with MTDDs.
Theorem 2. For any semiring with at least two elements there exist MTDDs G_n and H_n of the same height n and size O(n^2 log n) such that val(G_n) · val(H_n) can only be represented by an MTDD of size at least 2^n.
On the other hand, the product of two MTDD+ -represented matrices can be represented
by a polynomially sized MTDD+ :
Theorem 3. For MTDD+ G1 and G2 of the same height one can compute in time
O(|G1 | · |G2 |) an MTDD+ G of size O(|G1 | · |G2 |) with val(G) = val(G1 ) · val(G2 ).
For the proof, we compute from G_1 and G_2 a new MTDD+ G that contains for all variables A of G_1 and B of G_2 of the same height a variable (A, B) such that val_G(A, B) = val_{G_1}(A) · val_{G_2}(B).
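The idea behind this construction can be sketched on explicit matrices: multiply blockwise and memoize on the pair of factors, which mirrors the pair variables (A, B) of the proof (a sketch of our own, operating on tuples rather than grammars):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def blockmul(A, B):
    """Multiply two (2^h x 2^h)-matrices, given as nested tuples, by the
    block rule (A*B)_{pq} = A_{p1}*B_{1q} + A_{p2}*B_{2q}.  Memoizing on
    the pair (A, B) means each distinct pair is multiplied only once."""
    n = len(A)
    if n == 1:
        return ((A[0][0] * B[0][0],),)
    h = n // 2
    q = lambda M, r, c: tuple(row[c:c + h] for row in M[r:r + h])
    def addm(X, Y):
        return tuple(tuple(x + y for x, y in zip(rx, ry))
                     for rx, ry in zip(X, Y))
    blk = {}
    for p in (0, 1):
        for s in (0, 1):
            blk[p, s] = addm(blockmul(q(A, p * h, 0), q(B, 0, s * h)),
                             blockmul(q(A, p * h, h), q(B, h, s * h)))
    top = tuple(r + s for r, s in zip(blk[0, 0], blk[0, 1]))
    bot = tuple(r + s for r, s in zip(blk[1, 0], blk[1, 1]))
    return top + bot
```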
The following proposition presents several further matrix operations that can be eas-
ily implemented in polynomial time for an MTDD+ -represented input matrix.
252 M. Lohrey and M. Schmidt-Schauß
Proposition 1. Let G be an MTDD+ of size n and H an MTDD+ of size m. Then the following hold:
(1) An MTDD+ for the transposition of val(G) can be computed in time O(n).
(2) +-circuits for the sum of all entries of val(G) and the trace of val(G) can be com-
puted in time O(n).
(3) A +-circuit for the matrix entry val(G)i,j can be computed in time O(n).
(4) MTDD+ of size O(n · m) for the tensor product val(G) ⊗ val(H) (which includes
the scalar product) and the element-wise (Hadamard) product val(G) ◦ val(H)
(assuming height(G) = height(H)) can be computed in time O(n · m).
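For item (3), a single entry is obtained by descending the dag along the bits of i and j: at a block rule the top bits select the quadrant, while a sum rule forks into both summands. A sketch, reusing an illustrative rule encoding of our own ('leaf' / 'sum' / 'block'):

```python
def entry(rules, A, i, j, h):
    """Entry (i, j) (0-based) of the (2^h x 2^h)-matrix val(A).

    At a block rule, the most significant bits of i and j select one of
    the four children; at a sum rule the values of both summands are
    added.  Only one root-to-leaf path per summand is followed."""
    rule = rules[A]
    if rule[0] == 'leaf':
        return rule[1]
    if rule[0] == 'sum':
        return (entry(rules, rule[1], i, j, h)
                + entry(rules, rule[2], i, j, h))
    half = 1 << (h - 1)
    child = rule[1 + 2 * (i // half) + (j // half)]
    return entry(rules, child, i % half, j % half, h - 1)
```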
Another well-studied succinct representation is given by boolean circuits [13]. A boolean circuit with n inputs represents a binary string of length 2^n, namely the string of output values for the 2^n many input assignments (concatenated in lexicographic order). In a similar way, we can use circuits to encode large matrices. We propose two alternatives:
A boolean circuit C(x, y, z) with |x| = m and |y| = |z| = n encodes a (2^n × 2^n)-matrix M_{C,2} with integer entries bounded by 2^{2^m} that is defined as follows: For all a ∈ {0,1}^m and b, c ∈ {0,1}^n, the a-th bit (in lexicographic order) of the matrix entry at position (b, c) in M_{C,2} is 1 if and only if C(a, b, c) = 1.
Note that in contrast to MTDD+, the size of an entry in M_{C,2} can be doubly exponential in the size of the representation C (this is the reason for the index 2 in M_{C,2}). The following alternative is closer to MTDD+: A boolean circuit C(x, y) with |x| = |y| = n and m output gates encodes a (2^n × 2^n)-matrix M_{C,1} with integer entries bounded by 2^m that is defined as follows: For all a, b ∈ {0,1}^n, C(a, b) is the binary encoding of the entry at position (a, b) in M_{C,1}.
Circuit representations for matrices are at least as succinct as MTDD+. More precisely, from a given MTDD+ G one can compute in logspace a boolean circuit C such that M_{C,1} = val(G). This is a direct corollary of Proposition 1(3) (stating that a given entry of an MTDD+-represented matrix can be computed in polynomial time) and the fact that polynomial time computations can be simulated by boolean circuits. Recently, it was shown that checking whether for a given circuit C the determinant of the matrix M_{C,1} vanishes is PSPACE-complete [14]. An algebraic version of this result for the algebraic complexity class VPSPACE is shown in [20]. Thm. 6 from Sec. 6 will strengthen the result from [14] to MTDD-represented matrices.
5 Testing Equality
Step 3. If a variable A of height h occurs in the equations, and the rule for A has the
form A → A1 + A2 , then replace every occurrence of A in the equations by A1 + A2 .
Step 4. If none of steps 1–3 applies to the equations, then only rules of the form
A → \begin{pmatrix} A_{1,1} & A_{1,2} \\ A_{2,1} & A_{2,2} \end{pmatrix}    (2)
are applicable to a variable A (of height h) occurring in the equations. Applying all
possible rules of this form for the current height results in a set of equations where all
variables are (2 × 2)-matrices over variables of height h − 1 (like the right-hand side
of (2)). Hence, every equation can be decomposed into 4 equations, where all variables
are variables of height h − 1.
If the height of all variables is finally 0, then only rules of the form A → a are applicable. In this case, replace all variables by the corresponding integers, and check whether all resulting equations are valid or not. If all equations hold, then the input equation holds, i.e., val(A_1) = val(A_2). Otherwise, if at least one equation is not valid, then val(A_1) ≠ val(A_2).
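Steps 3-4 can be sketched naively as follows (a simplified rendering of our own: it omits the elided normalization steps 1-2, which are what bound the number of distinct equations, so this version carries no polynomial-time guarantee):

```python
from functools import lru_cache

def equal(rules, A1, A2):
    """Check val(A1) == val(A2) for MTDD+ variables of the same height.

    An equation is a pair of sorted tuples of variables, read as
    'sum of lhs = sum of rhs'.  Sum rules are expanded (step 3); block
    rules split one equation into four quadrant equations (step 4);
    at height 0 the scalar sums are compared.  Rule encoding
    ('leaf', a) / ('sum', A1, A2) / ('block', A11, A12, A21, A22)
    is our own illustrative choice."""
    @lru_cache(maxsize=None)
    def eq(lhs, rhs):
        for side in (lhs, rhs):
            for v in side:
                if rules[v][0] == 'sum':
                    i = side.index(v)
                    new = tuple(sorted(side[:i] + rules[v][1:] + side[i + 1:]))
                    return eq(new, rhs) if side is lhs else eq(lhs, new)
        if all(rules[v][0] == 'leaf' for v in lhs + rhs):
            return (sum(rules[v][1] for v in lhs)
                    == sum(rules[v][1] for v in rhs))
        return all(eq(tuple(sorted(rules[v][k] for v in lhs)),
                      tuple(sorted(rules[v][k] for v in rhs)))
                   for k in (1, 2, 3, 4))
    return eq((A1,), (A2,))
```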
The number of variables in the equations is bounded by the number of variables of G. An upper bound on the absolute value of the coefficients in the equations is 2^{|G|}, since only iterated addition can be performed to increase the coefficients. Lemma 1 shows that the number of equations after step 2 above is at most |G| (the bound for the number of different variables).
For the case S = Z_k the same procedure works; we only have to use Lemma 2 instead of Lemma 1.
Corollary 1. Let M be a finitely generated cancellative commutative monoid. Given
an MTDD+ G over M and two variables A1 and A2 of G, one can check val(A1 ) =
val(A2 ) in time polynomial in |G|.
Proof. A cancellative commutative monoid M embeds into its Grothendieck group A, which is the quotient of M × M by the congruence defined by (a, b) ≡ (c, d) if and only if a + d = c + b in M. This is an abelian group, which is moreover finitely generated if M is finitely generated. Hence, the result follows from Thm. 4.
Let us now consider non-cancellative commutative monoids:
Theorem 5. Let M be a non-cancellative finitely generated commutative monoid. It is
coNP-complete to check val(A1 ) = val(A2 ) for a given MTDD+ G over M and two
variables A1 and A2 of G.
Proof. We start with the upper bound. Let {a1 , . . . , ak } be a finite generating set of M .
Let G be an MTDD+ over M and let A_1 and A_2 be two variables of G. Assume that A_1 and A_2 have the same height h. It suffices to check in polynomial time for two given indices 1 ≤ i, j ≤ 2^h whether val(A_1)_{i,j} = val(A_2)_{i,j}. From i and j we can compute +-circuits for the matrix entries val(A_1)_{i,j} and val(A_2)_{i,j}. From these circuits
we can compute numbers n1 , . . . , nk , m1 , . . . , mk ∈ N in binary representation such
that val(A1 )i,j = n1 a1 +· · ·+nk ak and val(A2 )i,j = m1 a1 +· · ·+mk ak . Now we can
use the following result from [26]: There is a semilinear subset S ⊆ N^{2k} (depending only on our fixed monoid M) such that for all x_1, . . . , x_k, y_1, . . . , y_k ∈ N we have: x_1a_1 + · · · + x_ka_k = y_1a_1 + · · · + y_ka_k if and only if (x_1, . . . , x_k, y_1, . . . , y_k) ∈ S. Hence, we have to check whether v := (n_1, . . . , n_k, m_1, . . . , m_k) ∈ S. The semilinear
set S is a finite union of linear sets. Hence, we can assume that S is linear itself. Let
S = {v_0 + λ_1v_1 + · · · + λ_lv_l | λ_1, . . . , λ_l ∈ N},
where v_0, . . . , v_l ∈ N^{2k}. Hence, we have to check whether there exist λ_1, . . . , λ_l ∈ N such that v = v_0 + λ_1v_1 + · · · + λ_lv_l. This is an instance of integer programming in the fixed dimension 2k, which can be solved in polynomial time [16].
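Membership in a single linear set is exactly the search for the coefficients λ_i; the sketch below does a naive bounded search (assuming all generators are over N, as here), whereas Lenstra's algorithm [16] achieves polynomial time in fixed dimension:

```python
def in_linear_set(v, v0, gens):
    """Is v in {v0 + sum_i lam_i * v_i | lam_i in N}?

    All vectors are lists over N.  Since generators are nonnegative,
    any usable lam_i is bounded by min(target_j // g_j) over the
    nonzero coordinates of the generator, so a finite search suffices
    (exponential in general; Lenstra's algorithm would be polynomial
    in fixed dimension)."""
    target = [x - y for x, y in zip(v, v0)]
    if any(x < 0 for x in target):
        return False
    if not gens:
        return all(x == 0 for x in target)
    g, rest = gens[0], gens[1:]
    if all(c == 0 for c in g):
        lams = range(1)                      # lambda is irrelevant
    else:
        lams = range(min(t // c for t, c in zip(target, g) if c) + 1)
    return any(
        in_linear_set([t - lam * c for t, c in zip(target, g)],
                      [0] * len(v), rest)
        for lam in lams)
```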
For the lower bound we take elements x, y, z ∈ M such that x ≠ y but x + z = y + z. These elements exist since M is not cancellative. We use an encoding of 3SAT from [3]. Take a 3CNF formula C = C_1 ∧ · · · ∧ C_m over n propositional variables x_1, . . . , x_n, and let C_i = (α_{j_1} ∨ α_{j_2} ∨ α_{j_3}), where 1 ≤ j_1 < j_2 < j_3 ≤ n and every α_{j_k} is either x_{j_k} or ¬x_{j_k}. For every 1 ≤ i ≤ m we define an MTDD G_i as follows: The variables are A_0, . . . , A_n and B_0, . . . , B_{n−1}, where B_i produces the vector of length 2^i with all entries equal to 0 (which corresponds to the truth value true, whereas z ∈ M corresponds to the truth value false). For the variables A_0, . . . , A_n we add the following rules: For every 1 ≤ j ≤ n with j ∉ {j_1, j_2, j_3} we take the rule A_j → (A_{j−1}, A_{j−1}).
For every j ∈ {j_1, j_2, j_3} such that α_j = x_j (resp. α_j = ¬x_j) we take the rule A_j → (A_{j−1}, B_{j−1}) (resp. A_j → (B_{j−1}, A_{j−1})).
Finally, add the rule A_0 → z and let A_n be the start variable of G_i. Moreover, let G (resp. H) be the 1-dimensional MTDD that produces the vector consisting of 2^n many x-entries (resp. y-entries). Then val(G) + val(G_1) + · · · + val(G_m) = val(H) + val(G_1) + · · · + val(G_m) if and only if C is unsatisfiable.
It is worth noting that in the above proof for coNP-hardness, we use addition only at
the top level in a non-nested way.
Theorem 6. The following holds for every ring S ∈ {Z} ∪ {Zn | n ≥ 2}:
(1) The set {G | G is an MTDD over S, det(val(G)) = 0} is PSPACE-complete.
(2) The function G → det(val(G)) with G an MTDD over Z is GapPSPACE-complete.
To prove this result we use a reduction of Toda showing that computing the determinant
of an explicitly given integer matrix is GapL-complete [27]. We apply this reduction
to configuration graphs of polynomial space bounded Turing machines, which are of
exponential size. It turns out that the adjacency matrix of the configuration graph of a
polynomial space bounded machine can be produced by a small MTDD (with terminal
entries 0 and 1). This was also shown in [9, proof of Thm. 7] in the context of OBDDs.
Note that the determinant of a diagonal matrix is zero if and only if there is a zero-
entry on the diagonal. This can be easily checked in polynomial time for a diagonal
matrix produced by an MTDD. For MTDD+ (actually, for a sum of several MTDD-
represented matrices) we can show NP-completeness of this problem:
Our NP-hardness proof uses again the 3SAT encoding from [3] that we applied in the
proof of Thm. 5.
Let us now discuss the complexity of iterated multiplication and powering. Comput-
ing a specific entry, say at position (1, 1), of the product of n explicitly given matrices
over Z (resp., N) is known to be complete for GapL (resp., #L) [27]. Corresponding
results hold for the computation of the (1, 1)-entry of a matrix power A^n, where n is
given in unary notation. As usual, these problems become exponentially harder for ma-
trices that are encoded by boolean circuits (see Sec. 4.2). Let us briefly discuss two
scenarios (recall the matrices MC,1 and MC,2 defined from a circuit in Sec. 4.2).
Lemma 3. The function C → (M_C)_{1,1}, where every matrix M_{C_i,1} is over N (resp., Z), belongs to #P (resp., GapP).
Definition 2. A boolean circuit C(w, x, y, z) with k = |w|, m = |x|, and n = |y| = |z| encodes a sequence of 2^k many (2^n × 2^n)-matrices: For every bit vector a ∈ {0,1}^k, define the circuit C_a = C(a, x, y, z) and the matrix M_a = M_{C_a,2}. Finally, let M_C = ∏_{a∈{0,1}^k} M_a be the product of all these matrices.
Lemmas 3 and 4 yield the upper complexity bounds in the following theorem. For the
lower bounds we use again succinct versions of Toda’s techniques from [27], similar to
the proof of Thm. 6.
Theorem 8. The following holds:
(1) The function (G, n) → (val(G)^n)_{1,1} with G an MTDD over N (resp., Z) and n a unary encoded number is complete for #P (resp., GapP).
(2) The function (G, n) → (val(G)^n)_{1,1} with G an MTDD over N (resp., Z) and n a binary encoded number is #PSPACE-complete (resp., GapPSPACE-complete).
By Thm. 8, there is no polynomial time algorithm that computes for a given MTDD G and a unary number n a boolean circuit (or even an MTDD+) for the power val(G)^n, unless #P = FP.
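For a binary-encoded exponent on an explicit matrix (not a succinct one), square-and-multiply uses only O(log n) matrix products; this standard scheme is what the space-bounded upper bounds simulate:

```python
def power_entry(A, n, mod=None):
    """(A^n)_{1,1} for an explicit d x d integer matrix A, computed by
    square-and-multiply over the binary encoding of n.  An optional
    modulus keeps entries small (useful over Z_m)."""
    d = len(A)
    def mul(X, Y):
        Z = [[sum(X[i][k] * Y[k][j] for k in range(d)) for j in range(d)]
             for i in range(d)]
        return [[x % mod for x in r] for r in Z] if mod else Z
    R = [[int(i == j) for j in range(d)] for i in range(d)]  # identity
    while n:
        if n & 1:
            R = mul(R, A)
        A = mul(A, A)
        n >>= 1
    return R[0][0]
```

For the Fibonacci matrix [[1,1],[1,0]], the (1,1)-entry of the n-th power is F(n+1).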
By [27] and Thm. 8, the complexity of computing a specific entry of a matrix power A^n covers three different counting classes, depending on the representation of the matrix A and the exponent n (let us assume that A is a matrix over N):
References
1. Àlvarez, C., Jenner, B.: A very hard log-space counting class. Theor. Comput. Sci. 107, 3–30
(1993)
2. Beaudry, M., Holzer, M.: The complexity of tensor circuit evaluation. Computational Com-
plexity 16(1), 60–111 (2007)
3. Berman, P., Karpinski, M., Larmore, L.L., Plandowski, W., Rytter, W.: On the complexity
of pattern matching for highly compressed two-dimensional texts. J. Comput. Syst. Sci. 65,
332–350 (2002)
4. Bertoni, A., Choffrut, C., Radicioni, R.: Literal shuffle of compressed words. In: Ausiello,
G., Karhumäki, J., Mauri, G., Ong, L. (eds.) TCS 2008. IFIP, vol. 273, pp. 87–100. Springer,
Heidelberg (2008)
5. Bryant, R.E.: Graph-based algorithms for boolean function manipulation. IEEE Trans. Com-
puters 35(8), 677–691 (1986)
6. Caussinus, H., McKenzie, P., Thérien, D., Vollmer, H.: Nondeterministic NC1 computation.
Journal of Computer and System Sciences 57(2), 200–212 (1998)
7. Damm, C., Holzer, M., McKenzie, P.: The complexity of tensor calculus. Computational
Complexity 11(1-2), 54–89 (2002)
8. Eppstein, D., Goodrich, M.T., Sun, J.Z.: Skip quadtrees: Dynamic data structures for multi-
dimensional point sets. Int. J. Comput. Geometry Appl. 18, 131–160 (2008)
9. Feigenbaum, J., Kannan, S., Vardi, M.Y., Viswanathan, M.: The complexity of problems on
graphs represented as OBDDs. Chicago J. Theor. Comput. Sci. (1999)
10. Fujii, H., Ootomo, G., Hori, C.: Interleaving based variable ordering methods for ordered
binary decision diagrams. In: Proc. ICCAD 1993, pp. 38–41. IEEE Computer Society (1993)
11. Fujita, M., McGeer, P.C., Yang, J.C.-Y.: Multi-terminal binary decision diagrams: An effi-
cient data structure for matrix representation. Formal Methods in System Design 10(2/3),
149–169 (1997)
12. Galota, M., Vollmer, H.: Functions computable in polynomial space. Inf. Comput. 198(1),
56–70 (2005)
13. Galperin, H., Wigderson, A.: Succinct representations of graphs. Inform. and Control 56,
183–198 (1983)
14. Grenet, B., Koiran, P., Portier, N.: On the complexity of the multivariate resultant. J. Com-
plexity 29(2), 142–157 (2013)
15. Ladner, R.E.: Polynomial space counting problems. SIAM J. Comput. 18, 1087–1097 (1989)
16. Lenstra, H.: Integer programming with a fixed number of variables. Mathematics of Opera-
tions Research 8, 538–548 (1983)
17. Lohrey, M.: Leaf languages and string compression. Inf. Comput. 209, 951–965 (2011)
18. Lohrey, M.: Algorithmics on SLP-compressed strings: a survey. Groups Complex. Cryptol. 4,
241–299 (2012)
19. Lohrey, M., Schmidt-Schauß, M.: Processing Succinct Matrices and Vectors. arXiv (2014),
http://arxiv.org/abs/1402.3452
20. Malod, G.: Succinct algebraic branching programs characterizing non-uniform complexity
classes. In: Owe, O., Steffen, M., Telle, J.A. (eds.) FCT 2011. LNCS, vol. 6914, pp. 205–
216. Springer, Heidelberg (2011)
21. Meinel, C., Theobald, T.: Algorithms and Data Structures in VLSI Design: OBDD - Foun-
dations and Applications. Springer (1998)
22. Mereghetti, C., Palano, B.: Threshold circuits for iterated matrix product and powering.
ITA 34(1), 39–46 (2000)
23. Plandowski, W.: Testing equivalence of morphisms in context-free languages. In: van
Leeuwen, J. (ed.) ESA 1994. LNCS, vol. 855, pp. 460–470. Springer, Heidelberg (1994)
24. Samet, H.: The Design and Analysis of Spatial Data Structures. Addison-Wesley (1990)
25. Storjohann, A., Mulders, T.: Fast algorithms for linear algebra modulo N . In: Bilardi, G.,
Pietracaprina, A., Italiano, G.F., Pucci, G. (eds.) ESA 1998. LNCS, vol. 1461, pp. 139–150.
Springer, Heidelberg (1998)
26. Taĭclin, M.A.: Algorithmic problems for commutative semigroups. Dokl. Akad. Nauk SSSR 9(1), 201–204 (1968)
27. Toda, S.: Counting problems computationally equivalent to computing the determinant.
Technical Report CSIM 91-07, Tokyo University of Electro-Communications (1991)
28. Toda, S.: PP is as hard as the polynomial-time hierarchy. SIAM J. Comput. 20, 865–877
(1991)
29. Valiant, L.G.: Completeness classes in algebra. In: Proc. STOC 1979, pp. 249–261. ACM
(1979)
30. Veith, H.: How to encode a logical structure by an OBDD. In: Proc. 13th Annual IEEE
Conference on Computational Complexity, pp. 122–131. IEEE Computer Society (1998)
31. Wegener, I.: The size of reduced OBDD’s and optimal read-once branching programs for
almost all boolean functions. IEEE Trans. Computers 43(11), 1262–1269 (1994)
Constraint Satisfaction
with Counting Quantifiers 2
1 Introduction
The constraint satisfaction problem CSP(B), much studied in artificial intelli-
gence, is known to admit several equivalent formulations, two of the best known
of which are the query evaluation of primitive positive (pp) sentences – those
involving only existential quantification and conjunction – on B, and the homo-
morphism problem to B (see, e.g., [9]). The problem CSP(B) is NP-complete in
general, and a great deal of effort has been expended in classifying its complexity
for certain restricted cases. Notably it is conjectured [7,4] that for all fixed B,
the problem CSP(B) is in P or NP-complete. While this has not been settled
in general, a number of partial results are known – e.g. over structures of size at most three [13,3] and over smooth digraphs [8,1]. A popular generalization of the CSP involves considering the query evaluation problem for positive Horn logic – involving only the two quantifiers, ∃ and ∀, together with conjunction. The resulting quantified constraint satisfaction problems QCSP(B) allow for a broader class, used in artificial intelligence to capture non-monotonic reasoning, whose complexities rise to PSPACE-completeness.

The author was supported by EPSRC grant EP/L005654/1.
E.A. Hirsch et al. (Eds.): CSR 2014, LNCS 8476, pp. 259–272, 2014.
© Springer International Publishing Switzerland 2014
260 B. Martin and J. Stacho
In this paper, we continue the project begun in [11] to study counting quanti-
fiers of the form ∃≥j , which allow one to assert the existence of at least j elements
such that the ensuing property holds. Thus on a structure B with domain of size
n, the quantifiers ∃≥1 and ∃≥n are precisely ∃ and ∀, respectively.
We study variants of CSP(B) in which the input sentence to be evaluated
on B (of size |B|) remains positive conjunctive in its quantifier-free part, but is
quantified by various counting quantifiers.
For X ⊆ {1, . . . , |B|}, X = ∅, the X-CSP(B) takes as input a sentence given
by a conjunction of atoms quantified by quantifiers of the form ∃≥j for j ∈ X.
It then asks whether this sentence is true on B.
In [11], it was shown that X-CSP(B) exhibits trichotomy as B ranges over
undirected, irreflexive cycles, with each problem being in either L, NP-complete
or PSPACE-complete. The following classification was given for cliques.
Theorem 1. [11] For n ∈ N and X ⊆ {1, . . . , n}:
(i) X-CSP(K_n) is in L if n ≤ 2 or X ∩ {1, . . . , ⌈n/2⌉} = ∅.
(ii) X-CSP(K_n) is NP-complete if n > 2 and X = {1}.
(iii) X-CSP(K_n) is PSPACE-complete if n > 2 and either j ∈ X for some 1 < j < n/2, or {1, j} ⊆ X for some j ∈ {⌈n/2⌉, . . . , n}.
Precisely the cases {j}-CSP(K_{2j}) are left open here. Of course, {1}-CSP(K_2) is graph 2-colorability and is in L, but for j > 1 the situation was very unclear, and the referees noted specifically this lacuna.
In this paper we settle this question, and find the surprising situation that {2}-CSP(K_4) is in P while {j}-CSP(K_{2j}) is PSPACE-complete for j ≥ 3. The algorithm for the case {2}-CSP(K_4) is specialized and non-trivial, and consists in iteratively constructing a collection of forcing triples in which we then look for a contradiction.
As a second focus of the paper, we continue the study of {1, 2}-CSP(H).
In particular, we focus on finite undirected graphs for which a dichotomy was
proposed in [11]. As a fundamental step towards this, we first investigate the
complexity of {1, 2}-CSP(P∞), where P∞ denotes the infinite undirected path.
We find tractability here in describing a particular unique obstruction, which
takes the form of a special walk, whose presence or absence yields the answer to
the problem. Again the algorithm is specialized and non-trivial, and in carefully
augmenting it, we construct another polynomial-time algorithm, this time for
all finite paths. This then proves the following theorem.
Combined with the results from [8,11], this allows us to observe a dichotomy
for {1, 2}-CSP(H) as H ranges over undirected graphs, each problem being either
in P or NP-hard, in turn settling a conjecture proposed in [11].
In [11], the main preoccupation was in the distinction between P and NP-
hard. Here we concentrate our observations to show situations in which we have
sharp dichotomies between P and PSPACE-complete. In particular, for bipar-
tite graphs, we are able to strengthen the above results in the following manner.
Note that this cannot be strengthened further for non-bipartite graphs, since
there are NP-complete cases (for instance when H is the octahedron K2,2,2 )
and the situation regarding the NP-complete cases is less clear.
Taken together, our work seems to indicate a rich and largely uncharted com-
plexity landscape that these types of problems constitute. The associated com-
binatorics to this landscape appears quite complex and the absence of a simple
algebraic approach is telling. We will return to the question of algebra in the
final remarks of the paper.
The paper is structured as follows. In §2, we describe a characterization and a
polynomial time algorithm for {2}-CSP(K_4). In §3, we show PSPACE-hardness for {n}-CSP(K_{2n}) for n ≥ 3. In §4, we characterize {1, 2}-CSP for the infinite
path P∞ and describe the resulting polynomial algorithm. Then, in §5, we gen-
eralize this to finite paths and prove Theorem 2 and associated corollaries. Sub-
sequently, in §6, we discuss the P/PSPACE-complete dichotomy of bipartite
graphs, under {1, 2}-CSP. Finally in §7, we illustrate some situations in which the
intermediate NP-completeness arises by discussing cases with loops on vertices.
We conclude the paper in §8 by giving some final thoughts.
1.1 Preliminaries
Our proofs use the game characterization and structural interpretation from [11]. For completeness, we summarize it here.
Given an input Ψ for X-CSP(B), we define the following game G (Ψ, B):
The template K_4 has vertices {1, 2, 3, 4} and all possible edges between distinct vertices. Consider the instance Ψ of {2}-CSP(K_4) as a graph G = D_Ψ together with a linear ordering ≺ on V(G) (see Definition 2).
We iteratively construct the following three sets: R+ , R− , and F . The set F
will be a collection of unordered pairs of vertices of G, while R+ and R− will
consist of unordered triples of vertices. (For simplicity we write xy ∈ F in place
of {x, y} ∈ F , and write xyz ∈ R+ or R− in place of {x, y, z} ∈ R+ or R− .)
The meaning of these sets is as follows. A pair xy ∈ F where x ≺ y indicates that, in order to win, Prover must offer values so that the value f (x) chosen by Adversary for x is different from the value f (y) chosen for y. A triple xyz ∈ R+
where x ≺ y ≺ z indicates that if Adversary chose f (x) = f (y), then Prover
must offer one (or both) of f (x), f (y) for z. A triple xyz ∈ R− where x ≺ y ≺ z
tells us that Prover must offer values different from f (x), f (y) if f (x) = f (y).
With this, we describe how to iteratively compute the three sets F , R+ , R− .
We start by initializing the sets as follows: F = E(G) and R+ = R− = ∅. Then
we perform the following rules as long as possible:
(X1) If there are x, y, z ∈ V (G) such that {x, y} ≺ z where xz, yz ∈ F , then
add xyz into R− .
(X2) If there are vertices x, y, w, z ∈ V (G) such that {x, y, w} ≺ z with wz ∈ F
and xyz ∈ R− , then add xyw into R+ .
(X3) If there are x, y, w, z ∈ V (G) such that {x, y, w} ≺ z with wz ∈ F and
xyz ∈ R+ , then: if {x, y} ≺ w, add xyw into R− ;
else add xw and yw into F .
(X4) If there are vertices x, y, w, z ∈ V (G) such that {x, w} ≺ y ≺ z with
xyz ∈ R+ and wyz ∈ R− , then add xw into F , and add xwy into R+ .
(X5) If there are vertices x, y, w, z ∈ V (G) such that {x, y, w} ≺ z where either
xyz, wyz ∈ R+ , or xyz, wyz ∈ R− , then add xyw into R+ .
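As an illustration, the fixpoint computation defined by rules (X1)–(X5) can be transcribed directly into code. The sketch below is our own (the function name `closure` and the encoding of ≺ as the natural order on integers are assumptions, not from the paper): F is stored as sorted pairs, R+ and R− as sorted triples, and the rules are applied until nothing changes.

```python
from itertools import combinations

def pair(a, b):
    return tuple(sorted((a, b)))

def tri(a, b, c):
    return tuple(sorted((a, b, c)))

def closure(n, edges):
    # Vertices are 0..n-1 and the quantifier order is the natural order,
    # so a sorted triple (x, y, z) has x before y before z.  A direct,
    # unoptimized transcription of rules (X1)-(X5), iterated to a fixpoint.
    F = {pair(*e) for e in edges}
    Rp, Rm = set(), set()
    while True:
        old = (len(F), len(Rp), len(Rm))
        # (X1): x, y below z with xz, yz in F  =>  add xyz to R-.
        for z in range(n):
            below = [v for v in range(z) if pair(v, z) in F]
            for x, y in combinations(below, 2):
                Rm.add(tri(x, y, z))
        # (X2) and (X3): a triple with largest element z, plus w below z
        # with wz in F.
        for T in list(Rm) + list(Rp):
            x, y, z = T
            for w in range(z):
                if w in T or pair(w, z) not in F:
                    continue
                if T in Rm:                      # (X2)
                    Rp.add(tri(x, y, w))
                if T in Rp:                      # (X3)
                    if w > y:
                        Rm.add(tri(x, y, w))
                    else:
                        F.add(pair(x, w))
                        F.add(pair(y, w))
        # (X4): xyz in R+ and wyz in R- with x, w below y below z.
        for T in list(Rp):
            x, y, z = T
            for w in range(y):
                if w != x and tri(w, y, z) in Rm:
                    F.add(pair(x, w))
                    Rp.add(tri(x, w, y))
        # (X5): two triples in the same set sharing y and their top vertex z.
        for S in (Rp, Rm):
            for T1, T2 in combinations(sorted(S), 2):
                common = set(T1) & set(T2)
                if len(common) == 2 and T1[2] == T2[2] and T1[2] in common:
                    Rp.add(tri(*((set(T1) | set(T2)) - {T1[2]})))
        if (len(F), len(Rp), len(Rm)) == old:
            return F, Rp, Rm
```

For instance, on vertices 0 ≺ 1 ≺ 2 ≺ 3 with F initialized to {03, 13, 23}, rule (X1) puts all three triples ending in 3 into R−, and (X2) then derives {0, 1, 2} ∈ R+.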
Constraint Satisfaction with Counting Quantifiers 2 263
[Figure: illustration of rules (X1)–(X5), with ≺ drawn as the left-to-right order; the diagrams show when a pair xy is added to F and when a triple xyz is added to R+ or R− .]
The template K2n consists of vertices {1, 2, . . . , 2n} and all possible edges
between distinct vertices. We shall call these vertices colours. We describe a re-
duction from the PSPACE-complete [2] problem QCSP(Kn )={1, n}-CSP(Kn)
to {n}-CSP(K2n). Consider an instance of QCSP(Kn ), namely a formula Ψ where
Ψ = ∃≥b1 v1 ∃≥b2 v2 . . . ∃≥bN vN ψ
where each bi ∈ {1, n}. As usual (see Definition 2), let G denote the graph Dψ
with vertex set {v1 , . . . , vN } and edge set {vi vj | E(vi , vj ) appears in ψ}.
We construct an instance Φ of {n}-CSP(K2n ) with the property that Ψ is a
yes-instance of QCSP(Kn ) if and only if Φ is a yes-instance of {n}-CSP(K2n).
In short, we shall model the n-colouring using 2n − 1 colours, n − 1 of which will be treated as don’t-care colours (vertices coloured with any such colour will be ignored). We make sure that the colourings in which no vertex is assigned a don’t-care colour precisely model all colourings that we need to check to verify that Ψ is a yes-instance.
We describe Φ by giving a graph H together with a total order of its vertices
with the usual interpretation that the vertices are the variables of Φ, the total
order is the order of quantification of the variables, and the edges of H define
the conjunction of predicates E(·, ·) which forms the quantifier-free part φ of Φ.
[Figure: the edge gadget of the reduction, on vertices x, y, w, q, z, a, b, c and auxiliary vertices u1 , . . . , un .]
where xi xi+1 ∈ E(G) for all i ∈ {1, . . . , r − 1}. A walk x1 , . . . , xr is a closed walk
if x1 = xr . Write |Q| to denote the length of the walk Q (number of edges on Q).
Definition 5. For vertices u, v ∈ V (G), define δ(u, v) to be the following:
min { λ(Q) | Q = x1 , . . . , xr is a looping walk of G where x1 = u and xr = v } .
If no looping walk between u and v exists, define δ(u, v) = ∞.
In other words, δ(u, v) denotes the smallest λ-value of a looping walk between
u and v. Note that δ(u, v) = δ(v, u), since the definition of a looping walk does
not prescribe the order of the endpoints of the walk.
The main structural obstruction in our characterization is the following.
4.1 Characterization
Theorem 7. Suppose that G is a bipartite graph. Then the following statements
are equivalent.
(I) P∞ |= Ψ
(II) Prover has a winning strategy in G (Ψ, P∞ ).
(III) Prover can play G (Ψ, P∞ ) so that in every instance of the game, the result-
ing mapping f satisfies the following for all u, v ∈ V (G) with δ(u, v) < ∞:
|f (u) − f (v)| ≤ δ(u, v) ,   (∗)
f (u) + f (v) + δ(u, v) is an even number .   (∗∗)
(IV) There are no u, v ∈ V (G) where u ≺ v such that δ(u, v) ≤ β(v) − 2 .
(V) There is no bad walk in G.
[Figure: an instance on vertices v1 , . . . , v9 with quantifier prefix β given (in order) by ∃, ∃≥2 , ∃≥2 , ∃≥2 , ∃, ∃≥2 , ∃, ∃, ∃≥2 .]
Example looping walks:
Q∗ = v1 , v9 , v8 , v7 , v2 with |Q∗ | = 4 and λ(Q∗ ) = 4 − 2 · 1 = 2, noting {v1 , v2 } ≺ {v3 , . . . , v9 }.
Q = v1 , v9 , v8 , v7 , v6 , v5 , v4 , v3 , v4 , v5 , v6 , v7 , v2 with |Q| = 12 and λ(Q) = 12 − 2 · 6 = 0.
We decompose Q into looping walks, noting {v1 , v2 } ≺ v3 ≺ {v4 , . . . , v9 }:
Q1 = v1 , v9 , v8 , v7 , v6 , v5 , v4 , v3 with λ(Q1 ) = 7 − 2 · 3 = 1,
Q2 = v2 , v7 , v6 , v5 , v4 , v3 with λ(Q2 ) = 5 − 2 · 2 = 1.
Note that Q is a bad walk, while neither Q∗ nor Q1 nor Q2 is.
We conclude this section by remarking that the values δ(u, v) can easily be computed in polynomial time by dynamic programming. This allows us to test the conditions of the above theorem and thus decide {1, 2}-CSP(P∞ ) in polynomial time.
We now expand this lemma to the general case of {1, 2}-CSP(Pn) as follows.
Recall that we proved that P∞ |= Ψ if and only if Prover can play G (Ψ, P∞ ) so that in every instance of the game, the resulting mapping f satisfies (∗) and (∗∗). In fact, the proof of (III)⇒(II) from Theorem 7 shows that every winning strategy of Prover has this property. We use this fact in the subsequent text.
The following value γ(v) will allow us to keep track of the distance of f (v)
from the center of the (finite) path.
Definition 7. For each vertex v we define γ(v) recursively as follows: γ(v) = 0 if v is first in the ordering ≺; otherwise
γ(v) = β(v) − 1 + max{ 0 , max_{u≺v} ( γ(u) − δ(u, v) + β(v) − 1 ) } .
Lemma 5. Let M be a real number. Suppose that P∞ |= Ψ and that Prover plays
a winning strategy in the game G (Ψ, P∞ ). Then Adversary can play so that the
resulting mapping f satisfies |f (v) − M | ≥ γ(v) for every vertex v ∈ V (Dψ ).
Lemma 6. Let M be a real number. Suppose that P∞ |= Ψ . Then there exists a
winning strategy for Prover such that in every instance of the game the resulting
mapping f satisfies |f (v) − M | ≤ γ(v) + 1 for every v ∈ V (Dψ ).
With these tools, we can now prove a characterization of the case of even n.
Theorem 8. Let n ≥ 4 be even. Assume that P∞ |= Ψ . Then the following are equivalent.
(I) Pn |= Ψ .
(II) Prover has a winning strategy in the game G (Ψ, Pn ).
(III) There is no vertex v with γ(v) ≥ n/2.
Proof. Note first that since n is even, we may assume, without loss of generality, that the first vertex in the ordering is quantified ∃≥1 . If not, we can freely change its quantifier to ∃≥1 without affecting the satisfiability of the instance.
(I)⇔(II) is by Lemma 1. For (II)⇒(III), assume there is v with γ(v) ≥ n/2 and Prover has a winning strategy in G (Ψ, Pn ). This is also a winning strategy in G (Ψ, P∞ ). This allows us to apply Lemma 5 for M = (n + 1)/2 to conclude that Adversary can play against Prover so that |f (v) − (n + 1)/2| = |f (v) − M | ≥ γ(v) ≥ n/2. Thus either f (v) ≥ (2n + 1)/2 > n or f (v) ≤ 1/2 < 1. But then f (v) ∉ {1, . . . , n}, a contradiction.
This generalizes to odd n with a subtle twist. Define γ′(v) using the same recursion as γ(v), except set γ′(v) = β(v) − 1 if v is first in ≺. Note that γ′(v) ≥ γ(v).
Now to derive Theorem 2, it remains to observe that the values γ(v) and γ′(v) can be calculated using dynamic programming in polynomial time.
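The dynamic program for γ is short enough to sketch. The helper below is hypothetical (the names and the exact bracketing of the recursion follow our reading of Definition 7, so treat it as an assumption): `order` lists the vertices in the order ≺, `beta[v]` is the counting threshold of v's quantifier, and `delta[u][v]` is the value from Definition 5, with `float('inf')` when no looping walk exists.

```python
def gamma(order, beta, delta):
    # gamma(v) from Definition 7, computed by dynamic programming
    # along the quantifier order; earlier values g[u] feed later ones.
    g = {}
    for i, v in enumerate(order):
        if i == 0:
            g[v] = 0
        else:
            inner = max(g[u] - delta[u][v] + beta[v] - 1 for u in order[:i])
            g[v] = beta[v] - 1 + max(0, inner)
    return g
```

For instance, on two vertices a ≺ b with β(a) = 1, β(b) = 2 and δ(a, b) = 1, this gives γ(a) = 0 and γ(b) = 1.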
[Figure: a chain of 3j copies of C2j connecting x and y to the cycle vertices v0 , . . . , v2j−1 .]
Fig. 4. The gadget for the case when H contains a cycle C2j
6 Proof of Theorem 3
In this section, we prove the P / PSPACE dichotomy for {1, 2}-CSP(H) for
bipartite graphs H as stated in Theorem 3. We have already discussed the poly-
nomial cases in the previous section. It remains to discuss the hardness.
Proof. We reuse the reduction from [11] used to prove Theorem 1. We briefly
discuss the key steps. The reduction is from QCSP(Kj ). Let Ψ be an input
formula for QCSP(Kj ). We begin by considering the graph Dψ to which we add a
disjoint copy W = {w1 , . . . , w2j } of C2j . Then we replace every edge (x, y) of Dψ with the gadget shown in Figure 4, where the black vertices are identified with W .
Finally, for ∀ variables v of Ψ , we add a new path z1 , z2 , . . . , zj where zj = v.
The resulting graph defines the quantifier-free part θ of our desired formula
Θ. The quantification in Θ is as follows. The outermost quantifiers are ∃≥2 for
variables w1 , . . . , w2j . Then we move inwards through the quantifier order of Ψ ;
when we encounter an existential variable v, we apply ∃≥1 to it in Θ. When we
encounter a ∀ variable v, we apply ∃≥2 to the path z1 , z2 , . . . , zj constructed for
v, in that order. All the remaining variables are then quantified ∃≥1 .
As proved in [11], the cycle C2j models Θ if and only if Kj models Ψ . We now
adjust this to the bipartite graph H. There are three difficulties arising from
simply using the above construction as it is.
Firstly, assume the variables w1 , . . . , w2j are mapped to a fixed copy C of C2j
in H. We need to ensure that variables x, y derived from the original instance Ψ
are also mapped to C. For y variables in our gadget one can check this must be
true – the successive cycles in the edge gadget may never deviate from C, since
H contains no 4-cycle. For x variables off on the pendant this might not be true.
To fix this, we insist that Ψ contains an atom E(x, y) iff it also contains E(y, x);
QCSP(Kj ) remains PSPACE-complete on such instances [2].
Secondly, we need to check that Adversary has freedom to assign any value
from C to each ∀ variable v. Consider z1 , . . . , zj , the path associated with v. As
long as Prover offers values for z1 , . . . , zj from C, Adversary has freedom to choose
any value for v = zj . If on the other hand Prover offers for one of z1 , . . . , zj , say
for zi , a value not on C, then Adversary can choose all subsequent zi+1 , . . . , zj to
Constraint Satisfaction with Counting Quantifiers 2 271
also be mapped outside C, since H has no cycle shorter than C2j . Thus v = zj
is mapped outside C, but we already ensured that this does not happen.
Finally, we discuss how to ensure that W is mapped to a copy of C2j . Since
each vertex in W is quantified ∃≥2 , Adversary can force this by always choosing
a value not seen already when going through each of w1 , . . . , w2j in turn. If this
is not possible (both offered values have been seen), this gives rise to a cycle in
H shorter than C2j . In conclusion, if Adversary maps W to a cycle, then Prover
must play exclusively on this cycle, thus solving QCSP(Kj ). If Adversary maps
W to a subpath of C2j , then Prover can play to win (regardless of whether Ψ is a yes- or no-instance). So the situation is just like with {1, 2}-CSP(C2j ).
8 Final Remarks
In this paper we have settled the major questions left open in [11], and it might reasonably be said that we have now concluded our preliminary investigations into constraint satisfaction with counting quantifiers. Of course, there is still a wide vista of work remaining, not least of which is to improve our P / NP-hard dichotomy for {1, 2}-CSP on undirected graphs to a P / NP-complete / PSPACE-complete trichotomy (if indeed the latter exists). The absence of a
similar trichotomy for QCSP, together with our reliance on [8], suggests this
could be a challenging task. Some more approachable questions include lower
bounds for {2}-CSP(K4) and {1, 2}-CSP(P∞). For example, intuition suggests
272 B. Martin and J. Stacho
these might be NL-hard (even P-hard for the former). Another question would be to study X-CSP(P∞ ) for {1, 2} ⊈ X ⊂ N.
Since we initiated our work on constraint satisfaction with counting quanti-
fiers, a possible algebraic approach has been published in [5,6]. It is clear from our exposition that the combinatorics associated with our counting quantifiers is complex, and unfortunately the same seems to be the case on the algebraic
side (where the relevant “expanding” polymorphisms have not previously been
studied in their own right). At present, no simple algebraic method, generalizing
results from [2], is known for counting quantifiers with majority operations. This
would be significant as it might help simplify our tractability result of Theorem 2.
So far, only the Mal’tsev case shows promise in this direction.
References
1. Barto, L., Kozik, M., Niven, T.: The CSP dichotomy holds for digraphs with no
sources and no sinks (a positive answer to a conjecture of Bang-Jensen and Hell).
SIAM Journal on Computing 38(5), 1782–1802 (2009)
2. Börner, F., Bulatov, A.A., Chen, H., Jeavons, P., Krokhin, A.A.: The complexity
of constraint satisfaction games and QCSP. Inf. Comput. 207(9), 923–944 (2009)
3. Bulatov, A.: A dichotomy theorem for constraint satisfaction problems on a 3-
element set. J. ACM 53(1), 66–120 (2006)
4. Bulatov, A., Krokhin, A., Jeavons, P.G.: Classifying the complexity of constraints
using finite algebras. SIAM Journal on Computing 34, 720–742 (2005)
5. Bulatov, A.A., Hedayaty, A.: Counting predicates, subset surjective functions, and
counting CSPs. In: 42nd IEEE International Symposium on Multiple-Valued Logic,
ISMVL 2012, pp. 331–336 (2012)
6. Bulatov, A.A., Hedayaty, A.: Galois correspondence for counting quantifiers. CoRR
abs/1210.3344 (2012)
7. Feder, T., Vardi, M.: The computational structure of monotone monadic SNP and
constraint satisfaction: A study through Datalog and group theory. SIAM Journal
on Computing 28, 57–104 (1999)
8. Hell, P., Nešetřil, J.: On the complexity of H-coloring. Journal of Combinatorial
Theory, Series B 48, 92–110 (1990)
9. Kolaitis, P.G., Vardi, M.Y.: A logical Approach to Constraint Satisfaction. In:
Finite Model Theory and Its Applications. Texts in Theoretical Computer Science.
An EATCS Series. Springer-Verlag New York, Inc. (2005)
10. Madelaine, F., Martin, B.: QCSP on Partially Reflexive Cycles – The Wavy Line
of Tractability. In: Bulatov, A.A., Shur, A.M. (eds.) CSR 2013. LNCS, vol. 7913,
pp. 322–333. Springer, Heidelberg (2013)
11. Madelaine, F., Martin, B., Stacho, J.: Constraint Satisfaction with Counting Quan-
tifiers. In: Hirsch, E.A., Karhumäki, J., Lepistö, A., Prilutskii, M. (eds.) CSR 2012.
LNCS, vol. 7353, pp. 253–265. Springer, Heidelberg (2012)
12. Martin, B.: QCSP on partially reflexive forests. In: Lee, J. (ed.) CP 2011. LNCS,
vol. 6876, pp. 546–560. Springer, Heidelberg (2011)
13. Schaefer, T.J.: The complexity of satisfiability problems. In: Proceedings of STOC
1978, pp. 216–226 (1978)
Dynamic Complexity of Planar 3-Connected
Graph Isomorphism
Jenish C. Mehta
jenishc@gmail.com
1 Introduction
Consider the problem lis(A) of finding the longest increasing subsequence of a sequence (or array) A of n numbers. The “template” dynamic-programming polynomial-time solution proceeds by successively finding and storing lis(A[1:i]), the longest increasing subsequence of the first i numbers that necessarily ends with the i-th number. Given lis(A[1:1]) to lis(A[1:i]), lis(A[1:i + 1]) is found by appending A[i + 1] to the longest subsequence among lis(A[1:1]), . . . , lis(A[1:i]) whose last element is smaller than A[i + 1].
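As a concrete reference point, this template dynamic program can be written in a few lines of Python (our own sketch, with `L[i]` playing the role of lis(A[1:i+1]) restricted to its length):

```python
def lis_length(A):
    # L[i] = length of the longest increasing subsequence of A[:i+1]
    # that necessarily ends with A[i]; the answer is the best of these.
    n = len(A)
    if n == 0:
        return 0
    L = [1] * n
    for i in range(1, n):
        for j in range(i):
            if A[j] < A[i]:        # A[i] may extend the subsequence ending at j
                L[i] = max(L[i], L[j] + 1)
    return max(L)
```

For example, `lis_length([10, 22, 9, 33, 21, 50, 41, 60])` returns 5.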
This work was done while the author was interning at Chennai Mathematical Insti-
tute in May-June, 2010
E.A. Hirsch et al. (Eds.): CSR 2014, LNCS 8476, pp. 273–286, 2014.
© Springer International Publishing Switzerland 2014
1 By polynomial information, we mean information that has been generated in polynomial time and that, after the insertion of an edge, can be regenerated (in polynomial time) so as to allow the insertion of another edge, and so on ad infinitum.
Hesse [8] showed that Directed Reachability lies in DynTC0 . Also, Dong and Su [6] further showed that Directed Reachability for acyclic graphs lies in DynFO.
The Graph Isomorphism problem (of finding a bijection between the vertex sets of two graphs such that adjacencies are preserved) has so far been elusive to algorithmic efforts and has not yet yielded a static algorithm running in better than subexponential (2^o(n) ) time. The general problem is in NP, and also in SPP (Arvind
and Kurur [1]). Thus, various special cases have been considered, one important
case being restriction to planar graphs. Hopcroft and Wong [10] showed that
Planar Graph Isomorphism can be decided in linear time. In a series of works,
it was further shown that Tree Isomorphism is in L (Lindell [12]), 3-connected Planar Graph Isomorphism is in L (Datta et al. [3]) and finally, Planar Graph Isomorphism is in L (Datta et al. [4]).
Etessami considered the problem of isomorphism in the dynamic setting. It
was shown in [7] that Tree Isomorphism can be decided in DynFO.
In this work, we consider a natural extension and show that isomorphism
for Planar 3-connected graphs can be decided in DynFO (with some polynomial
precomputation). Our method of showing this is different from that in [7]. The
main technical tool we employ is that of Canonical Breadth-First Search trees
(abbreviated CBFS tree), which were used by Thierauf and Wagner [16] to show
that 3-connected Planar Graph Isomorphism lies in UL. We also introduce a
novel method for finding the canon of a 3-connected Planar graph from Canonical
Breadth-First Search trees in First-Order Logic (FOL). We finally compare the
canons of the two graphs to decide on isomorphism.
Our main results are:
1. Breadth-First Search for undirected graphs is in DynFO
2. Isomorphism for Planar 3-connected graphs is in DynFO+
DynFO+ is exactly the same as DynFO, except that it allows some polynomial precomputation, which is necessary until enough edges are inserted so that the
graph becomes 3-connected. Note that this is the best one can hope for, due to
the requirement of 3-connectivity.
In Section 3, we prove Result 1. In Section 4, we prove Result 2. In Section 5,
we introduce a novel method of canonizing a planar 3-connected graph in FOL
from Canonical Breadth-First Search trees. Finally, we conclude with open prob-
lems and scope for future work. All the proofs, diagrams, detailed preliminaries,
and First-Order queries can be found in the extended version of the paper [13].
2 Preliminaries
The reader is referred to [5] for the graph-theoretic definitions, to [11] for the
definitions on Finite-Model Theory, and to [11] or [14] for definitions on Dynamic
Complexity.
Let Ev be the set of edges incident to v. A permutation πv on Ev that has
only one cycle is called a rotation. A rotation scheme for a graph G is a set
Definition 1. For any static complexity class C, we define its dynamic version, DynC, as follows: Let ρ = ⟨R1^a1 , . . . , Rs^as , c1 , . . . , ct ⟩ be any vocabulary and S ⊆ STRUC(ρ) be any problem.
Let Rn,ρ = {ins(i, ā), del(i, ā), set(j, a) | 1 ≤ i ≤ s, ā ∈ {0, . . . , n − 1}^ai , 1 ≤ j ≤ t} be the requests to insert/delete a tuple ā into/from the relation Ri , or to set the constant cj to a.
Let evaln,ρ : R*n,ρ → STRUC(ρ) be the evaluation of a sequence or stream of requests. Define S ∈ DynC iff there exists another problem T ⊂ STRUC(τ ) (over some vocabulary τ ) such that T ∈ C and there exist maps f and g:
f : R*n,ρ → STRUC(τ ),  g : STRUC(τ ) × Rn,ρ → STRUC(τ )
Our main aim is to define the update function g (over some vocabulary τ ). If
condition (4) is relaxed, to the extent that the initializing function f may be
polynomially computable (before any insertion or deletion of tuples begin), the
resulting class is DynC+ , that is DynC with polynomial precomputation.
3 Breadth-First-Search in DynFO
In this section, we shall show that Breadth-First-Search (abbreviated BFS) for
any arbitrary undirected graph lies in DynFO. More specifically, we shall show
that there exists a set of relations, such that using those relations, finding the
minimum distance between any two points in a graph can be done through FOL,
and the set of all the points at a particular distance from a given point can
be retrieved through a FO query, in any arbitrary undirected graph. Also, the
modification of the relations can be carried out using FOL, during insertion or
deletion of edges.
The definitions and terminologies regarding BFS can be found in any standard
textbook on algorithms, like [2].
The main idea is to maintain the BFS tree from each vertex in the graph. This
idea is important, because it will be extended in the next section. To achieve
this, we shall maintain the following relations:
– Level(v, x, l), implying that the vertex x is at level l in the BFS tree of
vertex v (A vertex x is said to be at level l in the BFS tree of v if the
distance between x and v is l);
– BF SEdge(v, x, y), meaning that the edge (x, y) of the graph is in the BFS
tree rooted at v;
– P ath(v, x, y, z), meaning that vertex z is on the path from x to y in the BFS tree of v; and
– Edge(x, y), denoting all the edges present in the entire graph.
Note that it is sufficient to maintain the Level relation to query the length of the
shortest path between any two vertices. We maintain the BF SEdge and P ath
relations only if we want the actual shortest path between any two vertices.
These relations form the vocabulary τ as in Definition 1.
insert(a, b). Due to the insertion of edge {a, b}, various paths in many BFS
trees will change. We will show that many of the paths do not change, and these
can be used to update the shortest paths that do change.
We shall see how to modify the level of some vertex x in the BFS tree of some vertex v. But before we proceed, we will need the following important lemma:
Lemma 1. After the insertion of an edge {a, b}, the level of a vertex x cannot
change both in the BFS trees of a and b.
Since the level of vertex x remains invariant in at least one BFS tree, this invariant can be used to modify the level of (and subsequently even the paths to) x. This fact will be crucial in the queries that we write next.
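To make the distance part of the update concrete, here is a toy simulation of the Level relation (a hypothetical model, not the paper's actual DynFO queries): `dist[v][x]` stands for Level(v, x, ·), and the insert update is a minimum over finitely many values of the *old* relation — exactly the kind of update expressible by a first-order formula.

```python
INF = float("inf")

class DynBFS:
    # dist[v][x] models Level(v, x, l): the level of x in the BFS tree of v.
    def __init__(self, vertices):
        self.V = list(vertices)
        self.dist = {v: {x: (0 if x == v else INF) for x in self.V}
                     for v in self.V}

    def insert(self, a, b):
        # A new shortest v-x path either avoids the new edge {a, b}
        # (old distance) or crosses it exactly once; both cases read
        # only the old relation.
        old = self.dist
        self.dist = {v: {x: min(old[v][x],
                                old[v][a] + 1 + old[b][x],
                                old[v][b] + 1 + old[a][x])
                         for x in self.V}
                     for v in self.V}
```

Inserting the path edges (0,1), (1,2), (2,3), (3,4) on five vertices yields dist[0][4] = 4; a further insert(0, 4) drops it to 1 and dist[0][3] to 2.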
To update the BF SEdge and P ath relations, since we will create the new
shortest path by joining together two different paths, we need to ensure that
these paths are disjoint.
Without loss of generality, let |pathb (b, x)| ≤ |patha (a, x)|.
Lemma 2. If any vertex t is on pathb (b, x) and on pathv (v, a), then the shortest path from v to x does not change after insertion of the edge {a, b}.
The proofs of the above lemmas and the corresponding queries can be found
in the extended version of the paper.
delete(a, b). Consider now the deletion of some edge {a, b} from the graph. If it is present in the BFS tree of some vertex v, the removal of the edge splits the tree into two different trees. Let R1 = {u | v and u are connected in G \ {a, b}}, and R2 = {u | u ∉ R1 }. We find the set P R = {(p, r) | p ∈ R1 ∧ r ∈ R2 ∧ Edge(p, r)}, where P R is the set of edges in the graph that connect the trees R1 and R2 .
The new path to x will be a path from v to p in the BFS-tree of v, edge {p, r},
and path from r to x in the BFS-tree of r; and {p, r} will be chosen to yield the
shortest such path, and we will choose {p, r} to be the lexicographically smallest
amongst all such edges that yield the shortest path.
The only thing we need to address is the fact that the path from r to x in the
BFS tree of r does not pass through the edge {a, b}.
Lemma 3. When an edge {a, b} separates a set of vertices R2 from the BFS tree of v, and r and x are vertices belonging to R2 , then pathr (r, x) cannot pass through edge {a, b}.
Remark 1. An important observation is that the above lemma holds only for the
“undirected” case. It fails for the directed case, implying that the same relations
cannot be used for BFS in directed graphs. To see a simple counter-example, note
that there can be a directed edge from r to a in the directed case, and in that
case, the shortest path from r to x can pass through (a, b).
Also note that for every vertex x in R1 , the shortest path from v to x remains
the same, since removal of an edge cannot decrease the shortest distance.
Remark 2. Note that although we pick the new paths for every vertex in the set
R2 in parallel, we need to ensure that the paths picked are consistent, i.e. the
paths form a tree and no cycle is formed. This is straightforward to see, since if
a cycle is formed, it is possible to pick another path for some vertex that came
earlier in the lexicographic ordering. Hence, our queries are consistent.
The ideas and the techniques hitherto developed were for general undirected
graphs. Now onwards, our relations would no longer hold for general graphs,
and we restrict ourselves to 3-connected and planar graphs.
We shall now show how to maintain a canonical description of a 3-connected
planar graph in DynFO. To achieve this end, we shall maintain Canonical Breadth-
First Search (abbreviated CBFS) trees similar to the ones used by Thierauf and
Wagner [16].
We shall now see how to maintain the CBFS trees [v, ve ]. We maintain the
following relations:
– F ace(f, x, y, z), meaning that the vertex z is in the anti-clockwise path from
vertex x to vertex y, around the face labelled f .
Note that since the number of faces in a planar graph with n vertices can
be more than n, we should label the face with a 2-tuple instead of a single
symbol; but we do not do this since it adds unnecessary technicality without
adding any new insight. If required, all the queries can be maintained for
the faces labelled as two tuples f = (f1 , f2 ).
– Level(v, x, l), meaning that the vertex x is at level l in the BFS tree of v.
This is exactly as in the general case.
– CBF SEdges(v, ve , s, t), where (s, t) is an edge in the CBFS tree [v, ve ].
– CP ath(v, ve , x, y, z) denoting that z is on the path from x to y in [v, ve ].
insert(a, b). Any edge {a, b} that is inserted lies on a particular face, say
f . Consider the edges from vertex a. Since a lies on the face f , exactly two
edges from a will lie on the boundary of f . Let these two edges, considered anti-
clockwise be e1 and e2 , having the embedded numbering n1 and n2 , respectively.
Note that n2 = (n1 + 1) mod da , where da is the degree of a. This is because if
this was not so, there would be some other edge in the anti-clockwise direction
between e1 and e2 , which would mean either we have selected a wrong face or
the wrong edges e1 and e2 .
Hence, when we insert the new edge {a, b}, we can give {a, b} the embedded
number n2 , and all the other edges around a which have an embedded number
more than n2 , can be incremented by 1. Similarly we do this for b.
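A minimal sketch of this renumbering step (the helper name and the dictionary representation `emb[a]`, mapping each neighbour of a to its embedded number, are our assumptions; n2 would come from the face queries described above):

```python
def insert_edge_number(emb, a, b, n2):
    # Give the new edge {a, b} the embedded number n2 around a, shifting
    # every existing edge whose number is >= n2 up by one.  The same step
    # is then repeated around b.
    for x, nx in list(emb[a].items()):
        if nx >= n2:
            emb[a][x] = nx + 1
    emb[a][b] = n2
```

For example, with neighbours {1: 0, 2: 1, 3: 2} around vertex 0, inserting edge {0, 4} at n2 = 1 yields {1: 0, 4: 1, 2: 2, 3: 3}.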
delete(a, b). Note that since we expect the graph to be 3-connected and planar
once the edge (a, b) is removed, by the converse of Lemma 4 above, exactly two
faces will get merged. As such, our queries now will be the exact opposite of those for insertion.
Rotating and Flipping the Embedding. We will show how to rotate or flip
the embedding of the graph in FOL if required, as it will be necessary for further
sections.
The type of rotation that we will accomplish in this section is as follows: In
any given CBFS tree [v, ve ], for every vertex x, we rotate the embedding around
x until its parent gets the least embedding number, number 0 (that is the 0’th
number in the ordering). For the root vertex v which has no parent, we give ve
the least embedding number.
This scheme is like a “normal” form for ordering the edges around any vertex, or “normalizing” the embedding. We show in this section that this can be done in FOL. Also, flipping the ordering from anti-clockwise to clockwise is (very easily) in FOL.
We shall create the following relation: Embp (v, ve , t, x, nx ), which will mean
that in the CBFS tree [v, ve ], for some vertex t, if the parent of t is tp , and if
the edge (t, tp ) (or the vertex tp ) is given the embedded number 0, then the edge
(t, x) (or the vertex x) gets the embedded number nx .
Note that our relation Emb was independent of any particular CBFS tree,
since it depended only on the structure of the 3-connected planar graph and
not on any CBFS tree we chose. But Embp depends on the chosen CBFS tree.
Another thing to note is that we do not maintain the relation Embp in our
vocabulary τ , since it can be easily created in FOL from the rest of the relations
whenever required.
We create the relation Embp in the following manner. In every CBFS tree
[v, ve ], for every vertex t, we find the degree (dt ) and the parent (tp ) of t, and
the embedded number np of tp . Then for every vertex x in the neighbourhood
of t with embedded numbering nx , we do nx = (nx − np ) mod dt .
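The normalization step is just this modular shift; as a sketch (hypothetical helper, with `emb_t` mapping each neighbour of t to its embedded number around t):

```python
def normalize_around(emb_t, parent):
    # Rotate the cyclic numbering around t so that t's parent gets
    # number 0:  n_x -> (n_x - n_p) mod d_t, the formula in the text.
    d_t = len(emb_t)
    n_p = emb_t[parent]
    return {x: (nx - n_p) % d_t for x, nx in emb_t.items()}
```

For example, with numbers {p: 2, a: 0, b: 1, c: 3} and parent p, the normalized numbering is {p: 0, a: 2, b: 3, c: 1}.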
We also create the relation Embf which will contain the flipped or the clock-
wise embedding π c .
A convention used throughout the manuscript bears repeating here. Though the parent of v is null in [v, ve ], we allow the parent of v to be ve so as to keep the queries neater. If this convention is not required, then the special case of the parent of v can be handled easily by modifying the queries.
This shows that the embedding can be flipped and normalized in FOL. We
conclude the following:
Theorem 3. The embedding of a 3-connected planar graph can be maintained,
normalized and flipped in DynFO.
In this section, we show how to maintain the final two relations via insertions
and deletions of tuples that will help us to decide the isomorphism of two graphs.
The relations are almost identical to the ones used for Breadth-First Search in the previous section. The only difference arises from the uniqueness of the paths in Canonical Breadth-First Search trees. We do not rewrite Level(v, x, l) since it is exactly the same as in the general BFS case.
insert(a, b). The CBFS tree is unique if the path to every vertex x from the root vertex v is uniquely defined. How shall we choose the unique path? First, we consider the paths of shortest length. This is exactly as in the Breadth-First Search of the previous section. But unlike BFS, where we chose the shortest path arbitrarily (that is, by lexicographic ordering during insertion/deletion), we will very precisely choose one of the paths from the set of shortest paths, by Definition 2. Intuitively, Definition 2 chooses the path based on its orientation according to the embedding π.
An important observation from Definition 2 is the following: Distance has
preference over Orientation. This means that if there are two paths P1 and P2
from v to x in [v, ve ] (due to the insertion of an edge which created a cycle in
the tree [v, ve ]), though P2 <c P1 , the path P1 will be chosen if |P1 | < |P2 |
irrespective of the canonical ordering <c .
Consider some [v, ve ]. During insertion of {a, b}, let the old path (from v
to some x) be P1 and assume that the new path P2 passes through (a, b). If
|P1 | < |P2 | or |P1 | = |P2 | ∧ P1 <c P2 , the path to x does not change, and all
the edges and tuples to x in the old relations will belong to the new relations.
If |P2 | < |P1 | or |P1 | = |P2 | ∧ P2 <c P1 , the path to x changes. In this case, the
new path will be from v to a in [v, ve ], the edge {a, b} (from a to b), and the
path from b to x in [b, be ]. The way we choose be is as follows: We find the set of vertices C that are adjacent to b and are at levelv (b) + 1. Since a will be the parent of b in [v, ve ], we rotate the embedding around b until a gets the value 0, and then choose be to be the vertex in C that gets the least embedding number.
To check for the condition P1 <c P2 , we do the following: We create the relation Embp so that the parent of each vertex has the least embedding number. Let the path Pa denote the path from v to a, which will be a subset of P2 . We choose
the vertex which is the least common ancestor of a and x, say d = lcaa,x , and
normalize the embedding so that dp , the parent of d, gets the embedding number
0. Existence of lcaa,x is guaranteed since v lies on both P1 and Pa . Now consider
the edge e1 = (lcaa,x , d1 ) on P1 and e2 = (lcaa,x , d2 ) on P2 . Since the embedding
is normalized, we see which edge gets the smaller embedding number around the
vertex lcaa,x. The path on which that edge lies will be the lesser ordered path
according to <c . It is nice to pause here for a moment and observe that this was
possible since the embedding was ’normalized’, otherwise it would not have been
possible.
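The normalize-and-compare step can be sketched similarly (hypothetical names; the cyclic neighbour order around d = lca_{a,x} is assumed given):

```python
def smaller_path_vertex(nbrs_ccw, dp, d1, d2):
    """Normalize the embedding around d = lca(a, x) so that its parent dp
    gets number 0; d1 and d2 are the vertices following d on P1 and P2.
    Return the one lying on the <_c-smaller path."""
    i = nbrs_ccw.index(dp)
    rotated = nbrs_ccw[i:] + nbrs_ccw[:i]      # dp now has embedding number 0
    number = {v: k for k, v in enumerate(rotated)}
    return d1 if number[d1] < number[d2] else d2

print(smaller_path_vertex(['p', 'u', 'v', 'w'], 'p', 'w', 'u'))  # -> u
```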
One more thing needs to be shown. In Lemma 1, we proved that for any vertex
x, its level cannot change in both the BFS trees of a and b. In the previous
case of BFS, as per our algorithm, the level not changing implied the path not
changing. But that is not the case in CBFS trees. In CBFS trees, even when the
level does not change, the path may still change (due to the <_c ordering on
paths). Hence, it may be possible that though the level of the vertex x changes
in only one of the CBFS trees, its actual path changes in both CBFS trees.
We need to show that this is not possible. The reason this is necessary is
that (just like in the previous case) the update of the path will depend on
one specific path to vertex x in the CBFS tree of a or b which has not changed.
284 J.C. Mehta
Lemma 5. After the insertion of edge {a, b}, the path to any vertex x cannot
change in both the CBFS trees [a, a_e] and [b, b_e], for all a_e, b_e.
delete(a, b). For the deletion operation, we choose the edge from PR_min based
on the <_c relation. Note that when some edge {a, b} is deleted, the path to some
vertex x in [v, v_e] cannot change if {a, b} does not lie on the path. Everything
else remains exactly as in the general case.
Lemma 6. For any CBFS tree [v, v_e] and any two vertices x and y, x = y ⇔
Canon(x) = Canon(y).
It is now easy to canonize each of the CBFS trees in FOL. Once each vertex
has a canon, each edge is also uniquely numbered. The main idea is this: a
canon will in itself encode all the necessary properties of the vertex, and the
set of canons of all vertices becomes the signature of the graph, preserving
edges. The main advantage of Definition 3 is that the canon of the graph can be
generated in FOL. It is worthwhile to observe how this neatly beats the otherwise
inevitable computation of a transitive closure (Theorem 1) to canonize the graph.
Hence, two 3-connected planar graphs G and H are isomorphic if and only if
for some CBFS tree [g, g_e] of G, there is a CBFS tree [h, h_e] of H such that:
Dynamic Complexity of Planar 3-Connected Graph Isomorphism 285
Acknowledgments. The author sincerely thanks Samir Datta for fruitful dis-
cussions and critical comments on all topics ranging from the problem statement
to the preparation of the final manuscript.
References
1. Arvind, V., Kurur, P.P.: Graph isomorphism is in SPP. In: Proceedings of the
Forty-Third Annual IEEE Symposium on Foundations of Computer Science,
pp. 743–750. IEEE (2002)
2. Cormen, T.H., Stein, C., Rivest, R.L., Leiserson, C.E.: Introduction to Algorithms,
2nd edn. McGraw-Hill Higher Education (2001)
3. Datta, S., Limaye, N., Nimbhorkar, P.: 3-connected Planar Graph Isomorphism is
in Logspace. arXiv:0806.1041 (2008)
4. Datta, S., Limaye, N., Nimbhorkar, P., Thierauf, T., Wagner, F.: Planar Graph
Isomorphism is in Logspace. In: Proceedings of the Twenty-Fourth Annual IEEE
Conference on Computational Complexity, pp. 203–214. IEEE (2009)
5. Diestel, R.: Graph Theory. Springer (2005)
6. Dong, G.Z., Su, J.W.: Incremental and Decremental evaluation of Transitive Clo-
sure by First-Order queries. Information and Computation 120(1), 101–106 (1995)
7. Etessami, K.: Dynamic Tree Isomorphism via First-Order Updates to a Relational
Database. In: Proceedings of the Seventeenth ACM SIGACT-SIGMOD-SIGART
Symposium on Principles of Database Systems, pp. 235–243. ACM (1998)
Victor Y. Pan
Abstract. The papers [18], [9], [29], and [28] combine the techniques of
the Fast Multipole Method of [15], [8] with the transformations of matrix
structures, traced back to [19]. The resulting numerically stable algo-
rithms approximate the solutions of Toeplitz, Hankel, Toeplitz-like, and
Hankel-like linear systems of equations in nearly linear arithmetic time,
versus the classical cubic time and the quadratic time of the previous
advanced algorithms. We extend this progress to decrease the arithmetic
time of the known numerical algorithms from quadratic to nearly linear
for computations with matrices that have structure of Cauchy or Van-
dermonde type and for the evaluation and interpolation of polynomials
and rational functions. We detail and analyze the new algorithms, and
in [21] we extend them further.
1 Introduction
The numerically stable algorithms of [18], [9], [29], and [28] approximate the
solution of Toeplitz, Hankel, Toeplitz-like, and Hankel-like linear systems of
equations in nearly linear arithmetic time versus the classical cubic time and
the previous record quadratic time of [14]. All five cited papers first transform
the matrix structures of Toeplitz and Hankel types into the structure of Cauchy
type, which is a special case of the general technique proposed in [19]. Then
[14] exploits the invariance of the Cauchy matrix structure in row and column
interchange, whereas the other four papers apply numerically stable FMM to op-
erate efficiently with HSS approximation of the basic Cauchy matrix. “HSS” and
“FMM” are the acronyms for “Hierarchically Semiseparable” and “Fast Multi-
pole Method”, respectively. “Historically HSS representation is just a special
case of the representations commonly exploited in the FMM literature” [7].
Some preliminary results of this paper have been presented at CASC 2013. Our
research has been supported by the NSF Grant CC 1116736 and the PSC CUNY
Awards 64512–0042 and 65792–0043.
E.A. Hirsch et al. (Eds.): CSR 2014, LNCS 8476, pp. 287–299, 2014.
© Springer International Publishing Switzerland 2014
288 V.Y. Pan
Our present paper extends the algorithms of [18], [9], [29], and [28] to com-
putations with Cauchy and Vandermonde matrices, namely to approximation of
their products by a vector and of the solution of linear systems of equations with
these matrices, which also covers approximate multipoint polynomial and ratio-
nal evaluation and interpolation. The arithmetic time of the known numerical
approximation algorithms for all these tasks is quadratic [4], [3], and we decrease
it to nearly linear.
As in the papers [18], [9], [29], and [28], we approximate Cauchy matrices by
HSS matrices and exploit the HSS matrix structure. As in these papers our ba-
sic computational blocks are the numerically stable FFT and FMM algorithms,
which have been efficiently implemented on both serial and parallel computers
[16], [1], [6]. Unlike the cited papers, however, we treat a large subclass of
Cauchy matrices C = (1/(s_i − t_j))_{i,j=0}^{n−1} (we call them CV matrices
because they are linked to Vandermonde matrices via FFT-based unitary
transformations) rather than just the single CV matrix involved in the fast
Toeplitz solvers. For that matrix, {s_0, . . . , s_{n−1}} is the set of the nth
roots of unity and {t_0, . . . , t_{n−1}} is the set of the other (2n)th roots
of unity, but for a CV matrix C only the knots {t_0, . . . , t_{n−1}} are
assumed to be equally spaced on the unit circle, whereas {s_0, . . . , s_{n−1}}
is an unrestricted set of n knots.
approximation of CV matrices by exploiting a proper partition of the complex
plane into congruent sectors sharing the origin 0. To decrease the cost of com-
puting this approximation and of subsequent computations with HSS matrices,
we handle the harder and so far untreated case where the diagonal blocks are
rectangular and have row indices that pairwise overlap. We detail and analyze
our algorithms. In [21] we extend them to other classes of structured matrices.
We refer the reader to the papers and books [11], [13], [18], [7], [9], [25], [26],
[29], [27], [28], [2], [6], [15], [10], [8], [17], [23], and the bibliography therein on
FMM, HSS, and Matrix Compression (e.g., Nested Dissection) algorithms.
We organize our paper as follows. In the next section we recall some basic
results on computations with general matrices. In Section 3 we study polynomial
and rational evaluation and interpolation as computations with Vandermonde
and Cauchy matrices. In Sections 4 and 5 we extend the known results on HSS
matrix computations. In Section 6 we apply these results to treat CV matrices. In
Section 7 we discuss extensions and implementation. In Section 8 we summarize
our study. Because of the space limitation we leave colored figures, demonstrations
by examples, proofs, details, and comments to [22].
solutions of Problems 1 and 2 by using O(n log(n)) ops, and this solution can
be extended to Problems 3 and 4 (cf. [5, Sections 1.2 and 3.4], [20, Sections
2.2, 2.3, and Problem 2.4.2]). Now write t = (f ω^j)_{j=0}^{n−1},
V_t = √n Ω diag(f^j)_{j=0}^{n−1}, V_t^{−1} = (1/√n) diag(f^{−j})_{j=0}^{n−1} Ω^H,
and C_{s,f} = C_{s,t} = (1/(s_i − f ω^j))_{i,j=0}^{n−1}, and obtain from
[20, equations (3.6.5)–(3.6.7)] that

C_{s,f} = √n diag(f^{n−1}/(s_i^n − f^n))_{i=0}^{m−1} V_s diag(f^{−j})_{j=0}^{n−1} Ω^H diag(ω^{−j})_{j=0}^{n−1},   (3)

V_s = (f^{1−n}/√n) diag(s_i^n − f^n)_{i=0}^{m−1} C_{s,f} diag(ω^j)_{j=0}^{n−1} Ω diag(f^j)_{j=0}^{n−1},   (4)

V_s^{−1} = √n diag(f^{−j})_{j=0}^{n−1} Ω^H diag(ω^{−j})_{j=0}^{n−1} C_{s,f}^{−1} diag(f^{n−1}/(s_i^n − f^n))_{i=0}^{n−1} for m = n.   (5)
These equations link Vandermonde matrices V_s and their inverses to the
Cauchy matrices with the knot set T = {t_i = f ω^i, i = 0, . . . , n − 1} (for
f ≠ 0), which we call CV matrices and denote C_{s,f}. The equations also link
Problems 1 and 2 to Problems 3 and 4.
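Equation (3) is easy to check numerically. The sketch below is illustrative code, not from the paper; it assumes Ω denotes the unitary DFT matrix (ω^{ij}/√n), the normalization under which V_t = √n Ω diag(f^j)_{j=0}^{n−1} holds:

```python
import numpy as np

n = 8
rng = np.random.default_rng(1)
f = np.exp(0.3j)                                   # |f| = 1
w = np.exp(2j * np.pi / n)                         # primitive n-th root of unity
# knots s_i kept away from the circle {f w^j} to avoid tiny denominators
s = rng.uniform(1.5, 2.5, n) * np.exp(2j * np.pi * rng.uniform(size=n))

idx = np.arange(n)
Omega = w ** np.outer(idx, idx) / np.sqrt(n)       # unitary DFT matrix
Vs = s[:, None] ** idx[None, :]                    # Vandermonde matrix (s_i^j)
C = 1.0 / (s[:, None] - f * w ** idx[None, :])     # CV matrix C_{s,f}

# right-hand side of equation (3)
rhs = (np.sqrt(n) * np.diag(f ** (n - 1) / (s ** n - f ** n))
       @ Vs @ np.diag(f ** (-idx)) @ Omega.conj().T @ np.diag(w ** (-idx)))
print(np.allclose(C, rhs))                         # -> True
```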
By means of the transposition of these equations and the substitution of the
vectors t → s, we link the transposed Vandermonde matrices V_t^T to the matrices
C_{f,t} = (1/(f ω^i − t_j))_{i,j=0}^{n−1}, for f ≠ 0, which we call the CV^T
matrices.
4 HSS Matrices
Definition 1. (Cf. Figure 1.) Let M = (M_0 | . . . | M_{k−1}) be a 1 × k block
matrix with k block columns M_q, each partitioned into a diagonal block Σ_q and
a basic neutered block column N_q, q = 0, . . . , k − 1 (cf. [18, Section 1]).
A matrix given with its diagonal blocks is basically ρ-neutered (resp. basically
(ε, ρ)-neutered) if all its basic neutered block columns have ranks (resp.
ε-ranks) at most ρ.
Definition 2. (Cf. Figure 1.) Fix two positive integers l and q such that
l + q ≤ k and merge the l basic block columns M_q, . . . , M_{q+l−1}, the l
diagonal blocks Σ_q, . . . , Σ_{q+l−1}, and the l basic neutered block columns
N_q, . . . , N_{q+l−1} into their union M_{q,l} = M(·, ∪_{j=0}^{l−1} C(Σ_{q+j})),
their diagonal union Σ_{q,l}, and their neutered union N_{q,l}, respectively,
such that R(Σ_{q,l}) = ∪_{j=0}^{l−1} R(Σ_{q+j}) and the block column M_{q,l}
is partitioned into the diagonal union Σ_{q,l} and the neutered union N_{q,l}.
Define recursive merging of all diagonal blocks Σ_0, . . . , Σ_{k−1} by a binary
tree whose leaves are associated to these blocks and whose every internal vertex
is the union of its two children. For every vertex v define the sets L(v) and
R(v) of its left and right descendants, respectively. A binary tree is balanced
if 0 ≤ |L(v)| − |R(v)| ≤ 1 for all its vertices v. Such a tree identifies balanced
merging of its leaves, in our case the diagonal blocks. We can uniquely define a
balanced tree with n leaves by removing the 2^{l(n)} − n rightmost leaves of the
complete binary tree that has 2^{l(n)} leaves for l(n) = ⌈log_2(n)⌉. All leaves
of the resulting heap structure with n leaves lie in its two lowest levels.
Lemma 1. (See [24] and [9, equation (2.8)].) Suppose two complex values s and
t are (θ, c)-separated from one another for 0 ≤ θ < 1 and a complex center c,
and write q = (t − c)/(s − c), |q| ≤ θ. Then for every positive integer ρ we have

1/(s − t) = Σ_{h=0}^{ρ−1} (t − c)^h/(s − c)^{h+1} + q_ρ/(s − c), where |q_ρ| ≤ |q|^ρ/(1 − |q|) ≤ θ^ρ/(1 − θ).   (8)

Proof. 1/(s − t) = (1/(s − c)) · 1/(1 − q), and 1/(1 − q) = Σ_{h=0}^{∞} q^h =
Σ_{h=0}^{ρ−1} q^h + Σ_{h=ρ}^{∞} q^h = Σ_{h=0}^{ρ−1} q^h + q^ρ/(1 − q).
Theorem 5. (Cf. [9, Section 2.2] and [2].) Suppose two sets of m + n distinct
complex numbers S = {s_0, . . . , s_{m−1}} and T = {t_0, . . . , t_{n−1}} are
(θ, c)-separated from one another for 0 < θ < 1 and a global complex center c.
Define the Cauchy matrix C = (1/(s_i − t_j))_{i,j=0}^{m−1,n−1} and write
δ = δ_{c,S} = min_{i=0}^{m−1} |s_i − c| (cf. Definition 4). Fix a positive
integer ρ and define the m × ρ matrix F = (1/(s_i − c)^{ν+1})_{i,ν=0}^{m−1,ρ−1}
and the n × ρ matrix G = ((t_j − c)^ν)_{j,ν=0}^{n−1,ρ−1}. (We can compute these
matrices by using (m + n)ρ + m arithmetic operations.) Then

C = F G^T + E, |E| ≤ θ^ρ/((1 − θ)δ).   (9)

Proof. Apply (8) for s = s_i, t = t_j, and all pairs (i, j) to deduce (9).
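Theorem 5 can be illustrated numerically as follows (a sketch with hypothetical test data: center c = 0, T inside the disc of radius θ = 0.4, S outside the unit disc, so that δ = 1):

```python
import numpy as np

m, n, rho, theta, delta, c = 60, 50, 12, 0.4, 1.0, 0.0
rng = np.random.default_rng(0)
# T inside the disc of radius theta around c, S outside the unit disc
t = theta * rng.uniform(0, 1, n) * np.exp(2j * np.pi * rng.uniform(size=n))
s = (delta + rng.uniform(0, 1, m)) * np.exp(2j * np.pi * rng.uniform(size=m))

C = 1.0 / (s[:, None] - t[None, :])                # Cauchy matrix
nu = np.arange(rho)
F = 1.0 / (s[:, None] - c) ** (nu[None, :] + 1)    # m x rho factor
G = (t[:, None] - c) ** nu[None, :]                # n x rho factor

E = C - F @ G.T                                    # rank-rho approximation error
bound = theta ** rho / ((1 - theta) * delta)       # entrywise bound of (9)
print(np.abs(E).max() <= bound)                    # -> True
```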
Generally, neither a CV matrix nor its submatrices of large size have global
separation centers. So we approximate a CV matrix by an extended balanced
ρ-HSS matrix for a bounded integer ρ rather than by a low-rank matrix. We
first fix a reasonably large integer k and then partition the complex plane into
k congruent sectors sharing the origin 0 to induce a uniform k-partition of the
knot sets S and T and thus a block partition of the associated Cauchy matrix.
In the next subsection we specialize such partitions to the case of a CV matrix.
Definition 5. A(φ, φ′) = {z = exp(ψ√−1) : 0 ≤ φ ≤ ψ < φ′ ≤ 2π} is the
semi-open arc of the unit circle {z : |z| = 1} having length φ′ − φ and the
endpoints τ = exp(φ√−1) and τ′ = exp(φ′√−1). Γ(φ, φ′) = {z = r exp(ψ√−1) :
r ≥ 0, 0 ≤ φ ≤ ψ < φ′ ≤ 2π} is the semi-open sector bounded by the two rays
from the origin to the two endpoints of the arc. Γ̄(φ, φ′) denotes the exterior
(that is, the complement) of this sector.
Fix a positive integer l_+, write k = 2^{l_+}, φ_q = 2qπ/k, and
φ′_q = φ_{(q+1) mod k}; partition the unit circle {z : |z| = 1} by the k equally
spaced points exp(φ_q√−1), q = 0, . . . , k − 1, into k semi-open arcs
A_q = A(φ_q, φ′_q), each of length 2π/k, and define the semi-open sectors
Γ_q = Γ(φ_q, φ′_q) for q = 0, . . . , k − 1. Now assume the polar representations
s_i = |s_i| exp(μ_i√−1) and t_j = |t_j| exp(ν_j√−1), and reenumerate the knots
in the counter-clockwise order of the angles μ_i and ν_j, beginning with the
sector Γ(φ_0, φ′_0) and breaking ties arbitrarily. Induce the block partition of
a Cauchy matrix C = (C_{p,q})_{p,q=0}^{k−1} with the blocks
C_{p,q} = (1/(s_i − t_j))_{s_i∈Γ_p, t_j∈Γ_q} and its partition into the basic
block columns C_q = (1/(s_i − t_j))_{i∈{0,...,n−1}, t_j∈Γ_q} for
p, q = 0, . . . , k − 1. Now for every q define the diagonal block Σ_q = C_{q,q},
its two neighboring blocks C_{(q−1) mod k, q} and C_{(q+1) mod k, q},
Fast Approximate Computations 295
the tridiagonal block Σ_q^{(c)} (made up of the block Σ_q and its two neighbors),
and the admissible block N_q^{(c)}, which complements the tridiagonal block
Σ_q^{(c)} in its basic block column C_q.
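A minimal sketch of the sector assignment (illustrative code, not the paper's implementation; angles are reduced to [0, 2π)):

```python
import numpy as np

def sector_index(z, k):
    """Index q of the semi-open sector Gamma_q = Gamma(2*q*pi/k, 2*(q+1)*pi/k)
    containing the knot z."""
    psi = np.angle(z) % (2 * np.pi)            # polar angle in [0, 2*pi)
    return int(psi // (2 * np.pi / k))

def in_tridiagonal(p, q, k):
    """Block (p, q) lies in the tridiagonal part of basic block column q
    iff p is q or one of its two cyclic neighbours; the remaining blocks
    of the column form the admissible block."""
    return p in {(q - 1) % k, q, (q + 1) % k}

k = 8
print(sector_index(np.exp(1j), k), in_tridiagonal(7, 0, k))  # -> 1 True
```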
Combine Corollary 2 with this theorem applied for k = 2^{l_+} of order
n/log(n), for ρ and log(1/ε) of order log(n), and for l < l_+ such that the
integer l_+ − l is reasonably large (verify that the assumptions of the
corollary are satisfied), and obtain the following result.
Because of the dual role of the rows and columns in our constructions we can
readily extend all our results from CV matrices C to CV^T matrices C^T.
Theorem 9. Suppose that we are given two positive integers m and n, a positive
ε, and a vector s = (s_i)_{i=0}^{m−1} defining an m × n Vandermonde matrix
V = V_s. Write s_+ = max_{i=0}^{m−1} |s_i| and let
log(1/ε) = O(log(m + n) + n log(s_+)).
(i) Then α_ε(V) + α_ε(V^T) = O((m + n)(log^2(m + n) + n log(s_+))).
(ii) Suppose that in addition m = n and that for some complex f, |f| = 1, the
matrix C_{s,f} of equation (3) is approximated by an extended balanced
(ε, ρ)-HSS matrix satisfying certain nondegeneration assumptions. Then
β_ε(V) + β_ε(V^T) = O(n log^3(n)).
(iii) The latter bounds on α_ε(V) and β_ε(V) can be applied also to the solution
of Problems 1 and 2 of Section 3, respectively.
The term n log(s_+) is dominated and can be removed from the bounds on
log(1/ε) and α_ε(V) + α_ε(V^T) when s_+ = 1 + O(log^2(m + n)/n).
Various extensions to computations with the more general class of Cauchy-like
matrices and with rational functions are covered in [21] and [22], whereas the
recipes in [9], [26], [29], and [28] simplify the implementation of the proposed
algorithms.
8 Conclusions
The papers [18], [9], [29], and [28] combine the advanced FMM/HSS techniques
with a transformation of matrix structures (traced back to [19]) to devise numer-
ically stable algorithms that compute approximate solution of Toeplitz, Hankel,
Toeplitz-like, and Hankel-like linear systems of equations in nearly linear arith-
metic time (versus cubic time of the classical numerical algorithms). We yield
similar results for multiplication of Vandermonde and Cauchy matrices by a
vector and the solution of linear systems of equations with these matrices (with
the extensions to polynomial and rational evaluation and interpolation). The
resulting decrease of the running time of the known approximation algorithms is
by an order of magnitude, from quadratic to nearly linear. Our study provides new
insight into the subject and the background for further advances in [21], which
include the extension of our results to Cauchy-like matrices and further accel-
eration of the known approximation algorithms in the case of Toeplitz inputs.
The FMM can help decrease similarly our cost bound (6) (cf. [2]).
In Figures 1 and 2 we mark by black color the diagonal blocks and by dark grey
color the basic neutered block columns.
In Figure 1 the pairs of smaller diagonal blocks (marked by grey color) are
merged into their diagonal unions, each made up of four smaller blocks, marked
by grey and black colors.
In Figure 2 admissible blocks are shown by grey color, each grey diagonal
block has two black neighboring blocks, and the triples of grey and black blocks
form tridiagonal blocks.
In Figure 3 we show an arc of the unit circle {z : |z| = 1} and the five line
intervals [0, τ], [0, c], [0, τ′], [τ, c], and [c, τ′]. We also show the two line
intervals bounding the intersection of the sector Γ(ψ, ψ′) and the unit disc
D(0, 1) as well as the two perpendiculars from the center c onto these two
bounding line intervals.
References
1. Bracewell, R.: The Fourier Transform and Its Applications, 3rd edn. McGraw-Hill,
New York (1999)
2. Börm, S.: Efficient Numerical Methods for Non-local Operators: H2 -Matrix Com-
pression, Algorithms and Analysis. European Math. Society (2010)
3. Bella, T., Eidelman, Y., Gohberg, I., Olshevsky, V.: Computations with Quasisep-
arable Polynomials and Matrices. Theoretical Computer Science 409(2), 158–179
(2008)
4. Bini, D.A., Fiorentino, G.: Design, Analysis, and Implementation of a Multipreci-
sion Polynomial Rootfinder. Numer. Algs. 23, 127–173 (2000)
5. Bini, D., Pan, V.Y.: Polynomial and Matrix Computations, Volume 1: Fundamental
Algorithms. Birkhäuser, Boston (1994)
6. Barba, L.A., Yokota, R.: How Will the Fast Multipole Method Fare in Exascale
Era? SIAM News 46(6), 1–3 (2013)
7. Chandrasekaran, S., Dewilde, P., Gu, M., Lyons, W., Pals, T.: A Fast Solver for
HSS Representations via Sparse Matrices. SIAM J. Matrix Anal. Appl. 29(1), 67–81
(2006)
8. Carrier, J., Greengard, L., Rokhlin, V.: A Fast Adaptive Algorithm for Particle
Simulation. SIAM J. Scientific Computing 9, 669–686 (1988)
9. Chandrasekaran, S., Gu, M., Sun, X., Xia, J., Zhu, J.: A Superfast Algorithm for
Toeplitz Systems of Linear Equations. SIAM J. Matrix Anal. Appl. 29, 1247–1266
(2007)
10. Dutt, A., Gu, M., Rokhlin, V.: Fast Algorithms for Polynomial Interpolation, Inte-
gration, and Differentiation. SIAM Journal on Numerical Analysis 33(5), 1689–1711
(1996)
11. Dewilde, P., van der Veen, A.: Time-Varying Systems and Computations. Kluwer
Academic Publishers, Dordrecht (1998)
12. Eidelman, Y., Gohberg, I.: A Modification of the Dewilde–van der Veen Method for
Inversion of Finite Structured Matrices. Linear Algebra and Its Applications 343,
419–450 (2002)
13. Eidelman, Y., Gohberg, I., Haimovici, I.: Separable Type Representations of Ma-
trices and Fast Algorithms. Birkhäuser (2013)
14. Gohberg, I., Kailath, T., Olshevsky, V.: Fast Gaussian Elimination with Partial
Pivoting for Matrices with Displacement Structure. Mathematics of Computa-
tion 64, 1557–1576 (1995)
15. Greengard, L., Rokhlin, V.: A Fast Algorithm for Particle Simulation. Journal of
Computational Physics 73, 325–348 (1987)
16. Gentleman, W., Sande, G.: Fast Fourier Transforms – for Fun and Profit. Fall Joint
Comput. Conference 29, 563–578 (1966)
17. Lipton, R.J., Rose, D., Tarjan, R.E.: Generalized Nested Dissection. SIAM J. on
Numerical Analysis 16(2), 346–358 (1979)
18. Martinsson, P.G., Rokhlin, V., Tygert, M.: A Fast Algorithm for the Inversion of
Toeplitz Matrices. Comput. Math. Appl. 50, 741–752 (2005)
19. Pan, V.Y.: On Computations with Dense Structured Matrices, Math. of Computa-
tion, 55(191), 179–190 (1990); Also in Proc. Intern. Symposium on Symbolic and
Algebraic Computation (ISSAC 1989), 34–42. ACM Press, New York (1989)
20. Pan, V.Y.: Structured Matrices and Polynomials: Unified Superfast Algorithms.
Birkhäuser/Springer, Boston/New York (2001)
21. Pan, V.Y.: Transformations of Matrix Structures Work Again, accepted by Linear
Algebra and Its Applications and available at arXiv:1311.3729 [math.NA]
22. Pan, V.Y.: Fast Approximation Algorithms for Computations with Cauchy Matri-
ces and Extensions, in Tech. Report TR 2014005, PhD Program in Comp. Sci.,
Graduate Center, CUNY (2014),
http://tr.cs.gc.cuny.edu/tr/techreport.php?id=469
23. Pan, V.Y., Reif, J.: Fast and Efficient Parallel Solution of Sparse Linear Systems.
SIAM J. on Computing 22(6), 1227–1250 (1993)
24. Rokhlin, V.: Rapid Solution of Integral Equations of Classical Potential Theory.
Journal of Computational Physics 60, 187–207 (1985)
25. Vandebril, R., Van Barel, M., Mastronardi, N.: Matrix Computations and Semisep-
arable Matrices: Linear Systems, vol. 1. The Johns Hopkins University Press, Bal-
timore (2007)
26. Xia, J.: On the Complexity of Some Hierarchical Structured Matrix Algorithms.
SIAM J. Matrix Anal. Appl. 33, 388–410 (2012)
27. Xia, J.: Randomized Sparse Direct Solvers. SIAM J. Matrix Anal. Appl. 34, 197–
227 (2013)
28. Xia, J., Xi, Y., Cauley, S., Balakrishnan, V.: Superfast and Stable Structured
Solvers for Toeplitz Least Squares via Randomized Sampling. SIAM J. Matrix
Anal. and Applications 35, 44–72 (2014)
29. Xia, J., Xi, Y., Gu, M.: A Superfast Structured Solver for Toeplitz Linear Systems
via Randomized Sampling. SIAM J. Matrix Anal. Appl. 33, 837–858 (2012)
First-Order Logic on CPDA Graphs
Pawel Parys
1 Introduction
Already in the 70’s, Maslov [1, 2] generalized the concept of pushdown automata
to higher-order pushdown automata (n-PDA) by allowing the stack to contain
other stacks rather than just atomic elements. In the last decade, renewed in-
terest in these automata has arisen. They are now studied not only as acceptors
of string languages, but also as generators of graphs and trees. Knapik et al. [3]
showed that trees generated by deterministic n-PDA coincide with trees gener-
ated by safe order-n recursion schemes (safety is a syntactic restriction on the
recursion scheme). Driven by the question of whether safety implies a semantic
restriction on recursion schemes (which was recently proven [4, 5]), Hague et
al. [6] extended the model of n-PDA to order-n collapsible pushdown automata
(n-CPDA) by introducing a new stack operation called collapse (earlier, panic
automata [7] were introduced for order 2), and proved that trees generated by
n-CPDA coincide with trees generated by all order-n recursion schemes.
In this paper we concentrate on configuration graphs of these automata. In
particular we consider their ε-closures, whose edges consist of an unbounded
number of transitions rather than just single steps. The ε-closures of n-PDA
graphs form precisely the Caucal hierarchy [8–10], which is defined independently
in terms of MSO-interpretations and graph unfoldings. These results imply that
the graphs have a decidable MSO theory, and invite the question about the
decidability of logics on ε-closures of n-CPDA graphs.
The author holds a post-doctoral position supported by Warsaw Center of Math-
ematics and Computer Science. Work supported by the National Science Center
(decision DEC-2012/07/D/ST6/02443).
E.A. Hirsch et al. (Eds.): CSR 2014, LNCS 8476, pp. 300–313, 2014.
c Springer International Publishing Switzerland 2014
Unfortunately there is even a 2-CPDA graph that has an undecidable MSO theory
[6]. Kartzow showed that the ε-closures of 2-CPDA graphs are tree-automatic
[11], thus they have a decidable first-order theory. This topic was further
investigated by Broadbent [12–15]. He proved that for order 3 (and higher) the
FO theory starts to be undecidable. This can be made more precise. Let
n_m-CPDA denote an n-CPDA in which we allow collapse links of only one order m.
The first-order theory is undecidable already on:
– n_m-CPDA graphs restricted to reachable configurations,¹ when n ≥ 3, and
3 ≤ m ≤ n, and the formula is Σ_2, and
– n_m-CPDA graphs restricted to reachable configurations,¹ when n ≥ 4, and
2 ≤ m ≤ n − 2, and the formula is Σ_1, and
– ε-closures² of 3_2-CPDA graphs, when the formula is Σ_2, and
– 3-CPDA graphs not restricted to reachable configurations (nor to stacks
which are constructible from the empty one by a sequence of stack operations).
On the other side, Broadbent gives some partial decidability results:
– for n = 2, FO is decidable even when extended by transitive closures of
quantifier-free formulae;
– FO is decidable on 3_2-CPDA graphs restricted to reachable configurations;
– Σ_1 formulae are decidable on ε-closures of n_n-CPDA graphs (for each n),
and of 3_2-CPDA graphs.
In the current paper we complement this picture by three new results (answering
questions stated by Broadbent). First, we prove that the existential (Σ_1) FO
sentences are decidable on ε-closures of 3-CPDA graphs. This is almost proved in
[15]: it holds under the assumption that the 3-CPDA is luminous, which means
that after removing all order-3 collapse links from two different reachable
configurations, they remain different (that is, the targets of such links are
uniquely determined by the structure of the stack). We prove that each 3-CPDA
can be turned into an equivalent luminous one. The question whether Σ_1 formulae
are decidable for n_{n−1}-CPDA and n_{n,n−1}-CPDA (allowing links of orders n
and n − 1) where n ≥ 4, both with and without ε-closure, remains open.
Second, we prove (contrary to Broadbent's conjecture) that first-order logic is
undecidable on 4-CPDA graphs not restricted to reachable configurations, but
restricted to stacks constructible from the empty one by a sequence of stack
operations (although not necessarily ever constructed by the particular CPDA in
question). Our reduction is similar to the one showing undecidability on 3-CPDA
graphs restricted neither to reachable configurations nor to constructible stacks.
Third, we prove that first-order logic is decidable (for each n) on n-CPDA
graphs not restricted to reachable configurations nor to constructible stacks,
when stacks are represented as annotated stacks. This is an alternative repre-
sentation of stacks of n-CPDA (defined independently in [16] and [17]), where
¹ Thus for their ε-closures as well.
² For ε-closures, it does not change anything whether we restrict to reachable
configurations or not.
2 Preliminaries
We give a standard definition of an n-CPDA, using the “annotated stack” repre-
sentation of stacks. We choose this representation because of Section 5, in which
we talk about all configurations with such stacks. For Sections 3 and 4 we could
choose the standard representation (with links as numbers) as well.
Given a number n (the order of the CPDA) and a stack alphabet Γ , we define
the set of stacks as the smallest set satisfying the following. If 1 ≤ k ≤ n and
s1 , s2 , . . . , sm for m ≥ 1 are (k − 1, n)-stacks, then the sequence [s1 , s2 , . . . , sm ]
is a (k, n)-stack. If a ∈ Γ , and 1 ≤ k ≤ n, and s is a (k, n)-stack or s = [] (the
“empty stack”, which, according to our definition, is not a stack), then (a, k, s)
is a (0, n)-stack. We sometimes use “k-stack” instead of “(k, n)-stack” when n is
clear from the context or meaningless.
A 0-stack (a, l, t) is also called an atom; it has label lb((a, l, t)) := a and
a link t of order l. In a k-stack s = [s_1, s_2, . . . , s_m], the top of the
stack is on the right. We define |s| := m, called the height of s, and
pop(s) := [s_1, . . . , s_{m−1}] (which is equal to [] if m = 1). For 0 ≤ i ≤ k,
top_i(s) denotes the topmost i-stack of s.
An n-CPDA has the following operations on an (n, n)-stack s:
– pop_k, where 1 ≤ k ≤ n, removes the topmost (k − 1)-stack (undefined when
|top_k(s)| = 1);
– push1_{a,l}, where 1 ≤ l ≤ n and a ∈ Γ, pushes onto the top of the topmost
1-stack the atom (a, l, pop(top_l(s)));
– push_k, where 2 ≤ k ≤ n, duplicates the topmost (k − 1)-stack inside the
topmost k-stack;
– collapse, when top_0(s) = (a, l, t), replaces the topmost l-stack by t
(undefined when t = []);
– rew_a, where a ∈ Γ, replaces the topmost atom (b, l, t) by (a, l, t).
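These operations can be prototyped directly on a nested-list representation of annotated stacks (a simplified sketch, not the paper's formalism; links are stored as deep-copied snapshots and undefined cases are guarded by assertions):

```python
from copy import deepcopy

def top(s, i, n):
    """Topmost i-stack of the n-stack s (the top of each stack is on the right)."""
    t = s
    for _ in range(n - i):
        t = t[-1]
    return t

def pop_k(s, k, n):
    s = deepcopy(s)
    tk = top(s, k, n)
    assert len(tk) > 1, "pop_k undefined when |top_k(s)| = 1"
    tk.pop()
    return s

def push1(s, a, l, n):
    s = deepcopy(s)
    link = deepcopy(top(s, l, n)[:-1])     # pop(top_l(s)); may be the empty stack []
    top(s, 1, n).append((a, l, link))
    return s

def push_k(s, k, n):
    s = deepcopy(s)
    tk = top(s, k, n)
    tk.append(deepcopy(tk[-1]))            # duplicate the topmost (k-1)-stack
    return s

def collapse(s, n):
    a, l, t = top(s, 0, n)                 # topmost atom (a, l, t)
    assert t != [], "collapse undefined for an empty link"
    if l == n:                             # the link replaces the whole n-stack
        return deepcopy(t)
    s = deepcopy(s)
    top(s, l + 1, n)[-1] = deepcopy(t)     # replace the topmost l-stack
    return s

def rew(s, a, n):
    s = deepcopy(s)
    t1 = top(s, 1, n)
    _, l, t = t1[-1]
    t1[-1] = (a, l, t)                     # rewrite the label, keep the link
    return s

# order-2 example: push an atom with an order-2 link, duplicate, then collapse
n, s = 2, [[('a', 1, [])]]
s = push1(s, 'b', 2, n)                    # link = pop(top_2(s)) = []
s = push_k(s, 2, n)
s = push1(s, 'c', 2, n)                    # link points below the topmost 1-stack
s = collapse(s, n)
print(s)                                   # -> [[('a', 1, []), ('b', 2, [])]]
```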
Denote the set of all these operations as Θ_n(Γ). The operation rew_a is not
always present in definitions of CPDA, but we add it following [15].
In [15] (Theorem 5, and the comment below it) this is proven under the
restriction to 3-CPDA A which are luminous. It remains to show that each 3-CPDA
A can be turned into a luminous 3-CPDA A′ for which G/ε(A) = G/ε(A′).
Let us recall the definition of luminosity. For an (n, n)-stack s, we write
stripln(s) to denote the (n, n)-stack that results from deleting all order-n
links from s (that is, changing atoms (a, n, p) into (a, n, []); of course we
perform this stripping also inside all links). An n-CPDA A is luminous whenever
for every two configurations (q, s), (q′, s′) in the ε-closure with
stripln(s) = stripln(s′) it holds that s = s′.
For example, the two 2-stacks
[[(a, 1, []), (b, 1, [])], [(a, 1, []), (b, 2, s1 )], [(a, 1, []), (b, 2, s1 )]] and
[[(a, 1, []), (b, 1, [])], [(a, 1, []), (b, 2, s1 )], [(a, 1, []), (b, 2, s2 )]]
with s1 = [[(a, 1, []), (b, 1, [])]] and s2 = [[(a, 1, []), (b, 1, [])], [(a, 1, []), (b, 2, s1)]]
become identical if the links are removed. One has to add extra annotations to
the stack to tell them apart without links.
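A sketch of stripln on a nested-list representation of the stacks (a hypothetical helper; here n = 2, so the order-n links being deleted are the order-2 ones) confirms that the two stacks above collide once links are removed:

```python
def stripln(s, n):
    """Delete all order-n links from s, also inside the remaining links."""
    if isinstance(s, tuple):               # an atom (a, l, t)
        a, l, t = s
        return (a, l, [] if l == n else stripln(t, n))
    return [stripln(x, n) for x in s]      # a k-stack or the empty stack []

s1 = [[('a', 1, []), ('b', 1, [])]]
s2 = [[('a', 1, []), ('b', 1, [])], [('a', 1, []), ('b', 2, s1)]]
u = [[('a', 1, []), ('b', 1, [])], [('a', 1, []), ('b', 2, s1)], [('a', 1, []), ('b', 2, s1)]]
v = [[('a', 1, []), ('b', 1, [])], [('a', 1, []), ('b', 2, s1)], [('a', 1, []), ('b', 2, s2)]]
print(u != v, stripln(u, 2) == stripln(v, 2))   # -> True True
```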
We explain briefly why luminosity is needed in the decidability proof in [15].
The proof reduces the order of the CPDA by one (a configuration of an n-CPDA
is represented as a sequence of configurations of an (n − 1)-CPDA), at the cost
of creating a more complicated formula. This reduction allows us to deal with
the operational aspect of links (that is, with the collapse operation). However,
there is also the problem of preserving identities, to which first-order logic
is sensitive. For this reason, the reduction would be incorrect if, by removing
links, two different configurations suddenly became equal.
Let us emphasize that we are not trying to simulate the operational behavior
of links in a 3-CPDA after removing them. We only want to construct another
3-CPDA with the same G/ε, which still uses links of order 3, but such that
stripln(s) = stripln(s′) implies s = s′.
Our construction is quite similar to that from [15] (which works for such
n-CPDA which only have links of order n). The key idea which allows us to extend
it to 3-CPDA which also have links of order 2 is to properly assign the value of
"generation" (see below) to atoms with links of order 2.
Fix a 3-CPDA A with a stack alphabet Γ. W.l.o.g. we assume that A "knows"
what the link order in each atom is, and that it does not perform collapse on
links of order 1. We will construct a luminous 3-CPDA A′ with stack alphabet
Γ′ = Γ × {1>, 1=, 1<} × {2>, 2=, 2<, ¬2} × {3≥, 3<, ¬3}.
To obtain luminosity, it would be enough to mark for each atom (in particular
for atoms with links of order 3), whether it was created at its position, or copied
from the 1-stack below, or copied from the 2-stack below. Of course we cannot do
this for each atom independently, since when a whole stack is copied, we cannot
change markers in all its atoms; thus some markers are needed also on top of
1-stacks and 2-stacks.
There is an additional difficulty: all markers should be placed as a function
of the stack, not depending on how the stack was constructed (otherwise one node
First-Order Logic on CPDA Graphs 305
in G/ε(A) would be transformed into several nodes in G/ε(A′)). Thus when an
atom is created by push1a,l we cannot just mark it as created here, since equally
well an identical atom could be copied from a stack below. However, an atom
with a link pointing to the 3-stack containing all the 2-stacks below cannot
be a copy from the previous 2-stack. We can also be sure about this for some
atoms with links of order 2, namely those whose link target already contains an
atom with such “fresh” link of order 3. For these reasons, for each k-stack s (for
0 ≤ k ≤ 2), including s = [], we define gn(s), the generation of s:
gn([]) := 0,
gn([s1, . . . , sm]) := max(0, max_{1≤i≤m} gn(si)),
gn((a, k, t)) := |t| + 1 if k = 3;  gn(t) if k = 2;  −1 if k = 1.
Intuitively, gn(s) is a lower bound for the height of the 3-stack of the CPDA
at the moment when s was last modified (or created). For convenience, the
generation of an atom with a link of order 1 is smaller than the generation of
any k-stack for k > 0, and the generation of any atom with a link of order 3 is
greater than the generation of the empty stack.
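The definition of gn can be transcribed directly; a sketch, using the same illustrative encoding as before (atoms as tuples (symbol, link order, link target), k-stacks as nested lists):

```python
def gn(s):
    """Generation of a stack or atom, following the definition above."""
    if isinstance(s, tuple):              # an atom (a, k, t)
        a, k, t = s
        if k == 3:
            return len(t) + 1             # |t| + 1: height of the target 3-stack, plus one
        if k == 2:
            return gn(t)                  # generation of the order-2 link target
        return -1                         # k == 1
    return max([0] + [gn(x) for x in s])  # a k-stack: max(0, max_i gn(s_i))

t3 = [[[("x", 1, [])]], [[("x", 1, [])]]]  # a 3-stack of height 2
assert gn(("y", 3, t3)) == 3               # order-3 link: |t| + 1
assert gn(("z", 1, [])) == -1              # order-1 link
assert gn([("z", 1, []), ("y", 3, t3)]) == 3
assert gn([]) == 0
```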
For each constructible 3-stack s over Γ we define its marked variant mar(s),
which is obtained by adding markers at each position x of s as follows.
– Let i ∈ {1, 2} and r ∈ {>, =, <}, or i = 3 and r ∈ {≥, <}. If x is the topmost
position in its (i − 1)-stack (always true for i = 1), we put marker ir at x if
– Assume that x is not topmost in its 1-stack, and that the position directly above it
has been assigned marker 1<. Let t be the atom just above x, and let y be the
highest position in s≤x (in the lexicographic order) such that gn(top2(s≤y)) <
gn(t). We put marker 2r at x if
y again becomes the topmost position. Necessarily, the marker from y will
also be present at positions x which are copies of y. Notice, however, that when we
remove an atom at position x using pop1, and then we reproduce an identical
atom using push1a,k, the 2r marker has to be written there again (mar should be
a function of the stack). For this reason the x containing the 2r marker from y
is not necessarily a copy of y: we store the marker in the highest atom below an
atom from the higher generation. See Figure 1 for an example.
Fig. 1. An example 2-stack (one out of many in a 3-stack). It grows from left to right.
We indicate all 1r and 2r markers, as well as the generation of atoms (bold; no number
for generation −1). To calculate the 2< marker at positions (1, 3), (3, 3), and (4, 2)
we have used position (1, 3) as y. Observe the atom of generation 2 above an atom of
generation 3; this is possible for an atom with a link of order 2.
The key property is that the markers can be updated by a CPDA. We will
say that a CPDA defines a path if from each configuration there is at most one
transition available.
Proof (sketch). This is a tedious case analysis. In most cases we just have to
apply a local change of markers. For a push, we update markers in the previously
topmost atom (depending on markers which were previously there), then we
perform the push, and then we update markers in the new topmost atom. For
popk or collapse, we perform this operation, and then we update markers in the
atom which became topmost, depending on markers in this atom, and in the
atom which was topmost previously.
There is one exception to this scheme, during a push1a,k operation which
increases the generation of the topmost 1-stack, but not of the topmost 2-stack.
In this situation, in the previously topmost atom we should place a 2r marker,
the same as in the atom just below the bottommost atom having the highest
generation in the 2-stack. This information is not available locally; to find this
atom (and the marker in it), we copy the topmost 2-stack (push3 ), we destruc-
tively search for this atom (which is easy using the markers), and then we remove
the garbage using pop3 .
where eq_k(s, t) states that s and t differ at most in their topmost k-stacks, that
is, eq_4(s, t) := true, and for 1 ≤ k ≤ 3,
eq_k(s, t) := ∃u.(s −pop_{k+1}→ u ∧ t −pop_{k+1}→ u) ∨ (eq_{k+1}(s, t) ∧ ¬∃u.(s −pop_{k+1}→ u)).
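A sketch of eq_k over 4-stacks encoded as nested lists (an illustrative encoding; here pop_j is taken to be undefined when the topmost j-stack holds a single entry, which is one common convention):

```python
def pop(s, j, n=4):
    """pop_j on an n-stack as nested lists; None when undefined."""
    if n == j:
        return s[:-1] if len(s) >= 2 else None
    inner = pop(s[-1], j, n - 1)
    return None if inner is None else s[:-1] + [inner]

def eq(k, s, t):
    """eq_k(s, t): s and t differ at most in their topmost k-stacks."""
    if k == 4:
        return True
    ps, pt = pop(s, k + 1), pop(t, k + 1)
    if ps is not None and pt is not None and ps == pt:
        return True                        # first disjunct of the formula
    return eq(k + 1, s, t) and ps is None  # second disjunct

# Two 4-stacks differing only in their topmost 1-stack:
assert eq(1, [[[["a", "b"]]]], [[[["a", "c"]]]])
# Differing below the topmost 1-stack, but within the topmost 2-stack:
assert not eq(1, [[[["a"], ["b"]]]], [[[["x"], ["b"]]]])
assert eq(2, [[[["a"], ["b"]]]], [[[["x"], ["b"]]]])
```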
Next, we define two sets of substacks of a 4-stack s which can be easily accessed
in FO. The set vis4 (s) contains s and the stacks t for which in top3 (s) there is
the atom ( , 4, t). The set vis3 (s) contains s and the stacks t for which pop(s) =
pop(t) and in top2 (s) there is the atom ( , 3, top3 (t)). When s is constructible,
the property that t ∈ visk (s) (for k ∈ {3, 4}) can be expressed by the FO formula
vis_k(s, t) := ∃u.(u −pop_k→ s ∧ link_k(u) ∧ u −collapse→ t).
To every constructible 4-stack s we assign a finite graph G(s) as follows.
Its nodes are V := vis4 (s). Two nodes t, u ∈ V are connected by an edge when
top0 (v) = ( , 4, u) for some v ∈ vis3 (t), or top0 (v) = ( , 4, t) for some v ∈ vis3 (u).
Lemma 6. For each non-empty finite graph G there exists a constructible (4, 4)-
stack sG (in the domain of G con (A)) such that G is isomorphic to G(sG ).
Proof. Suppose that G = (V, E) where V = {1, 2, . . . , k}. The proof is by in-
duction on k. If k = 1, as sG we just take the (constructible) 4-stack consisting
of one atom ( , 1, []). Assume that k ≥ 2. For 1 ≤ i < k, let Gi be the sub-
graph of G induced by the subset of nodes {1, 2, . . . , i}, and let si := sGi be the
stack corresponding to Gi obtained by the induction assumption. We will have
pop4 (sG ) = sk−1 , and top3 (sG ) = tk , where 3-stacks ti for 0 ≤ i ≤ k are defined
by induction as follows. We take t0 = []. For i > 0 we take pop(ti ) = ti−1 ,
and the topmost 2-stack of ti consists of one or two 1-stacks. Its first 1-stack is
[( , 1, []), ( , 4, s1 ), ( , 4, s2 ), . . . , ( , 4, sk−1 ), ( , 3, t0 ), ( , 3, t1 ), . . . , ( , 3, ti−1 )].
If (i, k) ∉ E we only have this 1-stack; if (i, k) ∈ E, in top2 (ti ) we also have the
1-stack
[( , 1, []), ( , 4, s1 ), ( , 4, s2 ), . . . , ( , 4, si )].
We notice that vis4 (sG ) contains stacks s1 , s2 , . . . , sk−1 , sG , and vis3 (sG ) con-
tains all stacks obtained from sG by replacing its topmost 3-stack by ti for some
i ≥ 1. It follows that G(sG ) is isomorphic to G.
It is also easy to see that sG is constructible. We create it out of sk−1 by
performing push4 and appropriately changing the topmost 3-stack. Notice that
the bottommost 1-stack of top3 (sk−1 ) starts with ( , 1, []), ( , 4, s1), ( , 4, s2 ), . . . ,
( , 4, sk−2 ). We uncover this prefix using a sequence of popi operations. We
append ( , 4, sk−1 ) and ( , 3, t0 ) by push1,4 and push1,3 . If (1, k) ∈ E, we create
the second 1-stack using push2 and a sequence of pop1 . This already gives the
first 2-stack. To append each next (i-th) 2-stack, we perform push3 ; we remove
the second 1-stack if it exists using pop2 ; we append ( , 3, ti−1 ) using push1,3 ; if
necessary we create the second 1-stack using push2 and a sequence of pop1 .
For the rest of the section fix a CPDA A of order n, with stack alphabet
Γ . The key idea of the proof is that an FO formula can inspect only a small
topmost part of the stack, and check equality of the parts below. Thus instead
of valuating variables into stacks, it is enough to describe what the top of the
stack looks like, and which stacks below are equal. When the size of the described
top part of the stack is fixed, there are only finitely many such descriptions. For
each quantifier in the FO sentence we will be checking all possible descriptions
of fixed size (of course the size of the described part has to decrease with each
next variable). To formalize this we define generalized stacks.
310 P. Parys
app2(first2(app1(app1(first1(cons(a, 1, [])), cons(b, 1, [])), cons(c, 1, []))),
app1(app1(first1(cons(a, 1, [])), cons(b, 1, [])), cons(c, 1, []))).
It holds that
(app2(first2(app1(x1, cons(c, 1, []))), app1(x1, cons(c, 1, [])))) ↦2 (s),
where the valuation maps x1 to app1(first1(cons(a, 1, [])), cons(b, 1, [])). On the
other hand, it does not hold that (app2(first2(y1), app1(x1, cons(c, 1, [])))) ↦2 (s);
the problem is that the two 1-stacks of s were equal, while they are different
in this generalized 2-stack. This shows that we cannot just cut our stack at
one, fixed depth, and place constants in all places at this depth. In fact, for
some stacks, we need to place some constants exponentially deeper than other
constants. As a consequence, our algorithm will be nonelementary (we have to
increase d exponentially with each quantifier).
When a formula (having already some generalized stacks assigned to its free
variables) starts with a quantifier, as a value of the quantified variable we want
to try all possible generalized stacks which are of a special form, as described by
the following definition. Let S1, . . . , Sm, Sm+1 (for m ≥ 0) be generalized stacks,
let d ∈ ℕ, and d′ := d + 2^{d+1}. We say that Sm+1 is d-normalized with respect
to (S1, . . . , Sm) if
The key point is that for fixed S1 , . . . , Sm there are only finitely many d-
normalized generalized stacks Sm+1 (up to renaming of fresh constants), so we
can try all of them. The next two lemmas say that to consider d-normalized
generalized stacks is exactly what we need.
Proof (sketch). It is enough to map the constants appearing in Sm+1 but not in
Si for i ≤ m into “fresh” stacks, such that none of them is a substack of any
other nor of any si for i ≤ m (the latter is easy to obtain by taking these stacks
to be bigger than all si ).
We also easily see that the atomic FO formulae can be evaluated on the level of
generalized stacks related by ↦_{n+1} to the actual stacks.
Using the last three lemmas we can check whether an FO sentence holds
in G ano (A). Indeed, for each quantifier we check all possible generalized stacks
which are d-normalized with respect to the previously fixed variables, for big
enough d (depending on the quantifier rank of the formula, so that the induction
works fine), and we deal with atomic formulae using Lemma 11.
References
1. Maslov, A.N.: The hierarchy of indexed languages of an arbitrary level. Soviet
Math. Dokl. 15, 1170–1174 (1974)
2. Maslov, A.N.: Multilevel stack automata. Problems of Information Transmis-
sion 12, 38–43 (1976)
3. Knapik, T., Niwiński, D., Urzyczyn, P.: Higher-order pushdown trees are easy. In:
Nielsen, M., Engberg, U. (eds.) FOSSACS 2002. LNCS, vol. 2303, pp. 205–222.
Springer, Heidelberg (2002)
4. Parys, P.: Collapse operation increases expressive power of deterministic higher
order pushdown automata. In: Schwentick, T., Dürr, C. (eds.) STACS. LIPIcs,
vol. 9, pp. 603–614. Schloss Dagstuhl, Leibniz-Zentrum fuer Informatik (2011)
5. Parys, P.: On the significance of the collapse operation. In: LICS, pp. 521–530.
IEEE (2012)
6. Hague, M., Murawski, A.S., Ong, C.-H.L., Serre, O.: Collapsible pushdown au-
tomata and recursion schemes. In: LICS, pp. 452–461. IEEE Computer Society
(2008)
7. Knapik, T., Niwiński, D., Urzyczyn, P., Walukiewicz, I.: Unsafe grammars and
panic automata. In: Caires, L., Italiano, G.F., Monteiro, L., Palamidessi, C., Yung,
M. (eds.) ICALP 2005. LNCS, vol. 3580, pp. 1450–1461. Springer, Heidelberg
(2005)
8. Caucal, D.: On infinite terms having a decidable monadic theory. In: Diks, K.,
Rytter, W. (eds.) MFCS 2002. LNCS, vol. 2420, pp. 165–176. Springer, Heidelberg
(2002)
9. Cachat, T.: Higher order pushdown automata, the Caucal hierarchy of graphs and
parity games. In: Baeten, J.C.M., Lenstra, J.K., Parrow, J., Woeginger, G.J. (eds.)
ICALP 2003. LNCS, vol. 2719, pp. 556–569. Springer, Heidelberg (2003)
10. Carayol, A., Wöhrle, S.: The Caucal hierarchy of infinite graphs in terms of logic
and higher-order pushdown automata. In: Pandya, P.K., Radhakrishnan, J. (eds.)
FSTTCS 2003. LNCS, vol. 2914, pp. 112–123. Springer, Heidelberg (2003)
11. Kartzow, A.: Collapsible pushdown graphs of level 2 are tree-automatic. Logical
Methods in Computer Science 9(1) (2013)
12. Broadbent, C.H.: On collapsible pushdown automata, their graphs and the power
of links. PhD thesis, University of Oxford (2011)
13. Broadbent, C.H.: Prefix rewriting for nested-words and collapsible pushdown au-
tomata. In: Czumaj, A., Mehlhorn, K., Pitts, A., Wattenhofer, R. (eds.) ICALP
2012, Part II. LNCS, vol. 7392, pp. 153–164. Springer, Heidelberg (2012)
14. Broadbent, C.H.: The limits of decidability for first order logic on CPDA graphs. In:
Dürr, C., Wilke, T. (eds.) STACS. LIPIcs, vol. 14, pp. 589–600. Schloss Dagstuhl,
Leibniz-Zentrum fuer Informatik (2012)
15. Broadbent, C.H.: On first-order logic and CPDA graphs. Accepted to Theory of
Computing Systems
16. Broadbent, C.H., Carayol, A., Hague, M., Serre, O.: A saturation method for col-
lapsible pushdown systems. In: Czumaj, A., Mehlhorn, K., Pitts, A., Wattenhofer,
R. (eds.) ICALP 2012, Part II. LNCS, vol. 7392, pp. 165–176. Springer, Heidelberg
(2012)
17. Kartzow, A., Parys, P.: Strictness of the collapsible pushdown hierarchy. In: Rovan,
B., Sassone, V., Widmayer, P. (eds.) MFCS 2012. LNCS, vol. 7464, pp. 566–577.
Springer, Heidelberg (2012)
18. Trachtenbrot, B.: Impossibility of an algorithm for the decision problem in finite
classes. Doklady Akad. Nauk. 70, 569–572 (1950)
Recognizing Two-Sided Contexts in Cubic Time
Max Rabkin
Saarland University
max.rabkin@gmail.com
1 Introduction
Barash and Okhotin (2012) introduced grammars with one-sided contexts, ex-
tending context-free and conjunctive grammars with quantifiers allowing pro-
ductions to depend on their context on the left-hand side. They later extended
this to allow context on both sides (Barash and Okhotin, 2013).
This notion of context is purely syntactic, and therefore quite different from
formulations of context in terms of rewriting systems, such as the context-sensitive
grammars. To determine whether a substring is matched by a non-terminal in a
grammar with two-sided contexts, one need only examine the string. In a
classical context-sensitive grammar, this question does not make sense: one must
consider not only the string itself, but also the appropriate step of a particular
derivation.
Derivations in grammars with two-sided contexts, like in context-free grammars,
do not require any notion of time-steps. This means one can draw meaningful
parse trees (actually directed acyclic graphs) for derivations.
In the first paper, Barash and Okhotin gave a cubic-time algorithm for
recognizing the languages generated by these grammars, which is similar to the
Cocke–Kasami–Younger (CKY) algorithm for context-free languages. In the second,
they gave a recognition algorithm in the style of Valiant (1975) which takes
O(|G| · n²) space and O(|G|² · n^{ω+1}) time, where O(n^ω) is the complexity of
multiplying n × n boolean matrices. The best known bound is 2 ≤ ω < 2.3727
(Williams, 2012). Their algorithm works on grammars in binary normal form; it
is not known whether this form can be achieved with sub-exponential blow-up.
We give an algorithm which takes only O(|G| · n³) time. This algorithm is
derived from the definition of two-sided contexts using deduction systems: we
give a normal form (separated normal form) for the grammars such that the
corresponding deduction can be efficiently computed using standard techniques.
However, some specialization is required to keep the space usage quadratic. Our
E.A. Hirsch et al. (Eds.): CSR 2014, LNCS 8476, pp. 314–324, 2014.
© Springer International Publishing Switzerland 2014
Recognizing Two-Sided Contexts in Cubic Time 315
normal form allows ε-rules and unit rules, and so can be achieved with only
linear increase in the size of the grammar.
The speed-up can be seen as arising from the improved resolution of dependen-
cies between sub-problems. In the context-free case, the syntactic properties of
a string (i.e., the non-terminals which match it) depend only on its substrings.
With left contexts, the properties of a string-in-context uvw depend on the
substrings of uv. In either case, we can resolve the dependencies by comput-
ing the properties of substrings in a fixed order. As Barash and Okhotin (2013)
noted, the dependencies between substrings are more complicated in the case
of grammars with two-sided contexts; and therefore their algorithm requires
O(|N | · n) passes to ensure all properties have been recognized, where N is the
set of non-terminals in the grammar. In our algorithm the dependencies are
resolved implicitly, so only a single pass is required.
2 Definitions
Informally, a string in context is simply a string considered as a part of a larger
string. Formally, a string-in-context is a triple of strings u⟨v⟩w, where u and w
are the left and right context of v. That is, u⟨v⟩w is v seen as a substring of uvw.
Concatenation of strings-in-context must respect contexts: the concatenation of
u⟨v⟩v′w and uv⟨v′⟩w is u⟨vv′⟩w, but concatenation is not defined for pairs that are not of
this form. In particular, our concept of contexts covers the whole string: the left
context includes everything before the substring, and the right context includes
everything after.
For example, if x = ⟨hello⟩ world and y = hello⟨ world⟩, then we have
xy = ⟨hello world⟩. On the other hand, ⟨hello⟩ world and hello⟨ earth⟩ cannot
be concatenated. A string with empty contexts, such as ε⟨hello⟩ε, cannot be
a strict substring of any string.
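Encoding a string-in-context u⟨v⟩w as a plain triple (u, v, w), concatenation becomes a partial function; a minimal sketch:

```python
def concat(x, y):
    """Concatenation of strings-in-context as triples (u, v, w):
    u<v>v'w with uv<v'>w gives u<vv'>w; None when the contexts disagree."""
    (u1, v1, w1), (u2, v2, w2) = x, y
    if u2 == u1 + v1 and w1 == v2 + w2:
        return (u1, v1 + v2, w2)
    return None

x = ("", "hello", " world")   # <hello> world
y = ("hello", " world", "")   # hello< world>
assert concat(x, y) == ("", "hello world", "")
assert concat(x, ("hello", " earth", "")) is None  # contexts disagree
```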
We give here an informal description of grammars with two-sided contexts;
we will give a formal definition below. These grammars are similar to context-
free grammars but allow rules with conjunction and context quantifiers. Non-
terminals and sentential forms of the grammar should be considered as properties
of terminal strings-in-context; we will not use rewriting systems. For example,
the rule A → BC should be read as meaning that if x and y have properties B
and C respectively, then xy has property A.
A string-in-context has the property α & β if it has properties α and β.
There are four context quantifiers: ◁α denotes the property of strings which are
preceded by a string with property α, i.e., u⟨v⟩w has property ◁α if ε⟨u⟩vw has
property α; ⊴α denotes a property of a string including its left context, i.e., u⟨v⟩w
has property ⊴α if ε⟨uv⟩w has property α; the right context quantifiers, ▷ and
⊵, have symmetrical interpretations. For example, a rule A → BC & ▷D means
that a string has property A if it is of the form BC and is followed by a string
with property D.
Definition 1. A grammar with two-sided contexts is a tuple G = (Σ, N, R, S)
where Σ is the terminal alphabet, N is the non-terminal alphabet, S ∈ N is the
start symbol, and R is a finite set of rules, each of the form
316 M. Rabkin
A → α1 & · · · & αm ,
We retain the term non-terminal due to its familiarity, but in the absence of
rewriting systems perhaps atomic property would be more descriptive.
We define the semantics of grammars with two-sided contexts by means of
deduction systems. We will use atoms of the form [α, u⟨v⟩w] where u, v, w ∈ Σ∗
and α has the form of a right-hand side of a rule.
Definition 2. Let G = (Σ, N, R, S) be a grammar with two-sided contexts. Then
we create a deduction system ⊢G with the axiom schemes:
⊢G [ε, u⟨ε⟩w]
⊢G [a, u⟨a⟩w]   for a ∈ Σ
3 Examples
S → $
S → LSA
J → ε
J → AJ
and, for each σ ∈ {a, b}:
A → σ
L → σ & ▷Cσ
Cσ → $Jσ
Cσ → ACσA
If we ignore the context condition on the rule for L, this grammar simply matches
{x$y : |x| = |y| and x, y ∈ {a, b}∗ }. However, Cσ matches strings of the form
x$yσz where |x| = |z|, so the context condition on the rule for L ensures that,
for every σ in the left-hand part, a σ also appears in the corresponding position
(counting from the end) of the right-hand part. Thus L(G) = {x$x : x ∈ {a, b}∗}.
The following example derivation shows that a$a ∈ L(G):
⊢G [a, ε⟨a⟩$a]
⊢G [$, a⟨$⟩a]
⊢G [a, a$⟨a⟩ε]
⊢G [ε, a$⟨ε⟩a]
[ε, a$⟨ε⟩a] ⊢G [J, a$⟨ε⟩a]
[$, a⟨$⟩a], [J, a$⟨ε⟩a] ⊢G [$J, a⟨$⟩a]
[$J, a⟨$⟩a], [a, a$⟨a⟩ε] ⊢G [$Ja, a⟨$a⟩ε]
[$Ja, a⟨$a⟩ε] ⊢G [Ca, a⟨$a⟩ε]
[Ca, a⟨$a⟩ε] ⊢G [▷Ca, ε⟨a⟩$a]
[a, ε⟨a⟩$a], [▷Ca, ε⟨a⟩$a] ⊢G [a & ▷Ca, ε⟨a⟩$a]
[a & ▷Ca, ε⟨a⟩$a] ⊢G [L, ε⟨a⟩$a]
[$, a⟨$⟩a] ⊢G [S, a⟨$⟩a]
[L, ε⟨a⟩$a], [S, a⟨$⟩a] ⊢G [LS, ε⟨a$⟩a]
[a, a$⟨a⟩ε] ⊢G [A, a$⟨a⟩ε]
[LS, ε⟨a$⟩a], [A, a$⟨a⟩ε] ⊢G [LSA, ε⟨a$a⟩ε]
[LSA, ε⟨a$a⟩ε] ⊢G [S, ε⟨a$a⟩ε].
Fig. 1. A parse tree for a$a
Example 4. Let G = ({a}, {A, S}, R, S) where R contains the following rules:
S→a
S → AS
A → a & ▷S.
This grammar is in both binary and separated normal forms, and matches the
language aa∗ .
In a run of Barash and Okhotin’s Algorithm 1, each pass examines the sub-
strings of w from left to right (by ending position). However, to deduce anything
in G, we must work from right to left. Therefore, the algorithm can only make
one deduction in each pass, but Θ(n) deductions are required.
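This behaviour can be illustrated with a small simulation (a simplified stand-in for Barash and Okhotin's Algorithm 1, not their actual code): atoms are spans (i, j) of aⁿ, a pass sweeps the substrings by ending position, and we count passes until the deductive fixpoint is reached.

```python
def passes(n):
    """Sweeps needed to reach the deductive fixpoint for a^n under
    S -> a, S -> AS, A -> a & |>S.  Each sweep visits substrings by
    ending position, left to right (an illustrative simplification)."""
    S = {(i, i + 1) for i in range(n)}   # S -> a: every single a has S
    A = set()
    count = 0
    while True:
        count += 1
        changed = False
        for j in range(1, n + 1):        # substrings by end position
            for i in range(j):
                if j == i + 1 and (i, j) not in A and (j, n) in S:
                    A.add((i, j)); changed = True    # A -> a & |>S
                if (i, j) not in S and any(
                        (i, k) in A and (k, j) in S for k in range(i + 1, j)):
                    S.add((i, j)); changed = True    # S -> AS
        if not changed:
            return count

assert [passes(n) for n in (2, 3, 4)] == [2, 3, 4]   # linear in n
```

The deduction for A at a position needs S of a suffix lying to its right, so each sweep unlocks only a bounded amount of new progress, and the number of passes grows linearly with n.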
4 Normal Form
The recognition algorithm of Barash and Okhotin (2013) requires grammars to
be in a normal form similar to Chomsky normal form, called binary normal form.
Our algorithm will not be adversely affected by ε-productions, but cannot
efficiently handle productions with more than one concatenation (e.g. A → BC &
DE or A → BCD), so we will use a normal form which excludes these cases.
Definition 5. A grammar with two-sided contexts G = (Σ, N, R, S) is said to
be in separated normal form (SNF) if each rule in R is in one of the following
forms:
A→ε
A→a
A → QB, where Q is one of the four context quantifiers
A → B1 B2
A → B1 & · · · & Bm
In fact, the system ⊢′G is exactly the system of Barash and Okhotin (2013,
Definition 2) restricted to SNF; they give a single scheme for deduction rules,
with a complex side-condition, which entails all of ours.
Lemma 8. If G = (Σ, N, R, S) is a grammar in SNF, A ∈ N, and x is a
string-in-context over Σ, then ⊢′G [A, x] if and only if ⊢G [A, x].
Proof. Completeness of ⊢′G with respect to ⊢G can be shown by a straightforward
induction on the length of proofs in ⊢G.
Soundness can be seen from the fact that every axiom of ⊢′G is obtained from
an axiom of ⊢G followed by a single deduction step, and every deduction rule is
obtained by combining rules of ⊢G.
5 Algorithm
To recognize a string x of length n, we will construct a set of Horn clauses
of size O(|G| · n³) from a grammar G in SNF. We can then apply a linear-time
algorithm for Horn satisfiability (Dowling and Gallier, 1984). The Horn clauses
are derived from the deduction rules used to define the semantics of a grammar
with two-sided contexts, interpreted as logical implications, except that we only
include those which relate to substrings(-in-context) of x. The restriction to SNF
ensures the size of this set is at most cubic in n.
The idea of parsing by interpreting grammar rules as logical implications is
due to Colmerauer and Kowalski (Kowalski, 1979, Chapter 3). Shieber et al.
(1995) noted that when this interpretation is applied to context-free grammars
in Chomsky normal form, one can obtain an efficient algorithm similar to the
CKY algorithm.
We define a set of axioms from which we can prove that a string belongs to
the language of a given grammar with two-sided contexts.
We use the symbols ⇒ and ∧ for logical implication and conjunction, respec-
tively. In a logical formula of the form φ1 ∧ · · · ∧ φm ⇒ ψ, we call φ1 , . . . , φm
the antecedents and ψ the consequent. We will sometimes consider a formula
consisting of a single propositional atom ψ to be an implication with consequent
ψ and no antecedents.
Definition 9. Let G = (Σ, N, R, S) be a grammar with two-sided contexts in
SNF, and let x ∈ Σ∗ have length n. We will use only atoms of the form [A, u⟨v⟩w]
where A ∈ N and uvw = x.
Construct Axioms(G, x) = ⋃_{r∈R} f(r), where
f(A → ε) = {[A, u⟨ε⟩w] : uw = x}
f(A → a) = {[A, u⟨a⟩w] : uaw = x}
f(A → B1B2) = {[B1, u⟨v1⟩v2w] ∧ [B2, uv1⟨v2⟩w] ⇒ [A, u⟨v1v2⟩w]
: uv1v2w = x}
f(A → B1 & · · · & Bm) = {[B1, u⟨v⟩w] ∧ · · · ∧ [Bm, u⟨v⟩w] ⇒ [A, u⟨v⟩w]
: uvw = x}
f(A → ◁B) = {[B, ε⟨u⟩vw] ⇒ [A, u⟨v⟩w] : uvw = x}
f(A → ⊴B) = {[B, ε⟨uv⟩w] ⇒ [A, u⟨v⟩w] : uvw = x}
f(A → ▷B) = {[B, uv⟨w⟩ε] ⇒ [A, u⟨v⟩w] : uvw = x}
f(A → ⊵B) = {[B, u⟨vw⟩ε] ⇒ [A, u⟨v⟩w] : uvw = x}
and Ax(G, x) = ⋃_{r∈R′} f(r), where R′ ⊂ R is the set of rules not of the form
A → B1B2.
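Since u and w are determined by x and the span of v, an atom [A, u⟨v⟩w] can be encoded as (A, i, j) with v = x[i:j]. A sketch of the clause construction for a few of the rule forms (the rule encoding is illustrative, not from the paper):

```python
def axioms(rules, x):
    """Horn clauses (antecedents, consequent) over atoms (A, i, j), where
    (A, i, j) stands for [A, u<v>w] with u = x[:i], v = x[i:j], w = x[j:].
    Only some SNF rule forms are handled, as an illustration."""
    n = len(x)
    spans = [(i, j) for i in range(n + 1) for j in range(i, n + 1)]
    out = []
    for r in rules:
        if r[0] == "char":                       # A -> a
            _, A, a = r
            out += [([], (A, i, i + 1)) for i in range(n) if x[i] == a]
        elif r[0] == "cat":                      # A -> B1 B2
            _, A, B1, B2 = r
            out += [([(B1, i, k), (B2, k, j)], (A, i, j))
                    for i, j in spans for k in range(i, j + 1)]
        elif r[0] == "rctx":                     # A -> |>B (proper right context)
            _, A, B = r
            out += [([(B, j, n)], (A, i, j)) for i, j in spans]
    return out

cls = axioms([("char", "S", "a"), ("cat", "S", "A", "S")], "aaa")
facts = [c for c in cls if not c[0]]
assert len(facts) == 3               # one fact per occurrence of a
assert len(cls) - len(facts) == 20   # Theta(n^3) clauses for the concatenation rule
```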
Axioms(G, x) is essentially a restatement of the deduction rules. Thus, by
Lemma 8, u⟨v⟩w ∈ LG(A) if and only if Axioms(G, uvw) ⊢ [A, u⟨v⟩w].
From each rule of G we constructed a set of O(n³) axioms, so the total size of
Axioms(G, x) is O(|G| · n³) atoms. The only rules which lead to a cubic number
of axioms are those of the form A → B1B2, so Ax(G, x) has size O(|G| · n²).
Each axiom is a Horn clause, and the set of atoms which can be deduced
from a set of Horn clauses can be computed in linear time (Dowling and Gallier,
1984, Algorithm 2). Combining such a procedure with the SNF transformation
and Axioms(G, x) yields Algorithm 1 for recognizing the language of a grammar
with two-sided contexts.
Algorithm 1.
function HornConsequences(Γ )
Requires: Γ is a set of Horn clauses
Returns: {ψ : Γ ⊢ ψ, ψ is an atom}
Q ← {ψ : ψ ∈ Γ, ψ is an atom} — queue of atoms to resolve
P ←Q — set of atoms that have been deduced
while Q ≠ ∅ do
remove any atom φ from Q
mark every occurrence of φ in Γ
for φ1 ∧ · · · ∧ φm ⇒ ψ ∈ Γ with φ1 , . . . , φm all marked do
remove φ1 ∧ · · · ∧ φm ⇒ ψ from Γ
if ψ ∉ P then
add ψ to Q
add ψ to P
return P
function Recognize(G, x)
(Σ, N, R, S) ← SNF(G)
Γ ← Axioms((Σ, N, R, S), x)
return [S, ε⟨x⟩ε] ∈ HornConsequences(Γ)
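Algorithm 1 can be sketched in Python using the counter technique of Dowling and Gallier: each clause keeps a count of unsatisfied antecedents, and proving an atom decrements the counters of the clauses mentioning it.

```python
from collections import defaultdict, deque

def horn_consequences(clauses):
    """Atoms derivable from Horn clauses given as (antecedents, consequent)
    pairs.  Each antecedent occurrence is processed once, so the run time
    is linear in the total size of the clauses."""
    remaining = [len(ants) for ants, _ in clauses]  # unmarked antecedents
    watch = defaultdict(list)                       # atom -> clauses using it
    for idx, (ants, _) in enumerate(clauses):
        for a in ants:
            watch[a].append(idx)
    proved = {c for ants, c in clauses if not ants} # the facts
    queue = deque(proved)
    while queue:
        atom = queue.popleft()
        for idx in watch.pop(atom, []):             # mark every occurrence
            remaining[idx] -= 1
            if remaining[idx] == 0:                 # all antecedents proved
                c = clauses[idx][1]
                if c not in proved:
                    proved.add(c)
                    queue.append(c)
    return proved

clauses = [([], "p"), ([], "q"), (["p", "q"], "r"), (["r"], "s"), (["r", "s"], "t")]
assert horn_consequences(clauses) == {"p", "q", "r", "s", "t"}
```

Run on the output of the axiom construction, checking whether the atom for [S, ε⟨x⟩ε] is among the consequences decides membership, as in Recognize above.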
6 Evaluation
Lemma 11. ParseSnf(G, x) has running time O(|G| · |x|³).
Proof. All the initialisation clearly takes O(|G| · |x|²) time.
Lines 19, 25, 30 and 33 are in each of the innermost loops.
Line 19 and line 25 are each run at most once for each appearance of an atom
as an antecedent in Ax(G, x), i.e. O(|G| · |x|²) times.
Lines 30 and 33 are run at most once for each tuple (r, u, v, v′, w′) where r is
a rule of the form A → BC and uvv′w′ = x. There are O(|G| · |x|³) such tuples.
Lemma 12. ParseSnf(G, x) requires O(|G| · |x|²) space.
Proof. The algorithm treats O(|G| · |x|²) atoms; each of these can appear at
most once in each of P, Q, starts and ends.
There is one entry in appearances[φ] for each appearance of φ as an antecedent
in Ax(G, x), so the total size of the appearances array is at most the size of
Ax(G, x). There is one entry in each of antecedents and consequent for each
axiom in Ax(G, x).
Since |Ax(G, x)| ∈ O(|G| · |x|²), and the variables not mentioned above use
only constant space, the total space usage is O(|G| · |x|²).
Algorithm 2.
1: function ParseSnf(G, x)
2: Requires: G = (Σ, N, R, S) in SNF, x ∈ Σ ∗
3: Returns: {ψ : Axioms(G, x) ⊢ ψ, ψ is an atom}
4: Q←∅ — queue of atoms to resolve
5: P ←∅ — atoms that have been deduced
6: starts[A][i] ← ∅ for A ∈ N and i ∈ {0, . . . , |x|}
7: ends[A][i] ← ∅ for A ∈ N and i ∈ {0, . . . , |x|}
8: appearances[φ] ← ∅ for each atom φ — axioms where φ is an antecedent
9: procedure Proved(ψ = [A, u⟨v⟩w])
10: if ψ ∉ P then
11: add ψ to Q and P
12: add ψ to starts[A][|u|] and ends[A][|uv|]
22: while Q ≠ ∅ do
23: remove any atom φ = [B, u⟨v⟩w] from Q
24: for i ∈ appearances[φ] do
25: antecedents[i] ← antecedents[i] − 1
26: if antecedents[i] = 0 then
27: Proved(consequent[i])
28: for A → BC ∈ R do
29: for [C, uv⟨v′⟩w′] ∈ starts[C][|uv|] do
30: Proved([A, u⟨vv′⟩w′])
31: for A → CB ∈ R do
32: for [C, u′⟨v′⟩vw] ∈ ends[C][|u|] do
33: Proved([A, u′⟨v′v⟩w])
34: return P
35: function Recognize(G, x)
36: G′ = (Σ, N, R, S) ← SNF(G)
37: return [S, ε⟨x⟩ε] ∈ ParseSnf(G′, x)
Proof. By Lemmas 10, 11 and 12, the function Recognize from Algorithm 2
recognizes such languages with the desired complexity.
Parse trees correspond to proofs of [S, ε⟨x⟩ε], so a parse forest can be obtained
by modifying ParseSnf to record all the ways in which each atom is proved.
References
Barash, M., Okhotin, A.: Defining contexts in context-free grammars. In: Dediu, A.-
H., Martín-Vide, C. (eds.) LATA 2012. LNCS, vol. 7183, pp. 106–118. Springer,
Heidelberg (2012)
Barash, M., Okhotin, A.: Grammars with two-sided contexts. Tech. Rep. 1090, Turku
Centre for Computer Science (2013),
http://tucs.fi/publications/view/?pub_id=tBaOk13b
Dowling, W.F., Gallier, J.H.: Linear-time algorithms for testing the satisfiability of
propositional Horn formulae. The Journal of Logic Programming 1(3), 267–284
(1984)
Kowalski, R.: Logic for problem-solving. North-Holland Publishing Co. (1979),
http://www.doc.ic.ac.uk/~rak/
Okhotin, A.: Conjunctive and boolean grammars: the true general case of the context-
free grammars. Computer Science Review 9, 27–59 (2013)
Shieber, S.M., Schabes, Y., Pereira, F.C.N.: Principles and implementation of deductive
parsing. The Journal of Logic Programming 24(1-2), 3–36 (1995)
Valiant, L.G.: General context-free recognition in less than cubic time. Journal of Com-
puter and System Sciences 10(2), 308–315 (1975)
Williams, V.V.: Multiplying matrices faster than Coppersmith-Winograd. In: Proceed-
ings of the 44th Symposium on Theory of Computing, STOC 2012, pp. 887–898.
ACM (2012)
A Parameterized Algorithm for Packing
Overlapping Subgraphs
1 Introduction
Discovering communities in large and complex networks such as social, citation,
or biological networks has been of interest in the last decades. A community
is a part of the network in which the nodes are more highly interconnected to
each other than to the rest. For example, a community can represent a group of
friends in social networks or a protein complex in biological networks. Naturally,
one person can have different groups of friends, and one protein can belong to
more than one protein complex. Therefore, in realistic scenarios, communities
can share members. The problem of finding communities with possible overlap
was formalized as the k-H-Packing with t-Overlap problem in [10].
In the k-H-Packing with t-Overlap problem, the goal is to find at least k
induced subgraphs (the communities) in a graph G (the network) such that each
subgraph is isomorphic to a graph H (a community model) and each pair of
subgraphs overlap in at most t vertices (the shared members) [10].¹
The k-H-Packing with t-Overlap problem is NP-complete [10]. Therefore, we
are interested in the design of algorithms that provide a solution in f(k) · n^{O(1)}
running time, i.e., fixed-parameter algorithms or FPT-algorithms. In other words,
the running time of a fixed-parameter algorithm is polynomial in the input size
n but possibly exponential or worse in a specified parameter k, usually the
size of the solution. Thus, our fundamental goal is to explore how the overlap
¹ To follow standard notation with packing and isomorphism problems, the meanings
of the graphs G and H have been exchanged with respect to their meaning in [10].
E.A. Hirsch et al. (Eds.): CSR 2014, LNCS 8476, pp. 325–336, 2014.
© Springer International Publishing Switzerland 2014
326 J. Romero and A. López-Ortiz
complexity of the problem. Even though the k-H-packing problem (the vertex-
disjoint version) is well studied, our search tree algorithm is the first one to
consider variable overlap between subgraphs.
The classical k-Set Packing problem asks for at least k pairwise disjoint sets
from a given collection of sets. We introduce a variant of this problem that
allows overlap between the sets. We show how our search tree algorithm works
for this variant as well. To the best of our knowledge, this variant of the
k-set packing problem has not been studied before.
This paper is organized as follows. In Section 2, we introduce the terminology
and notation used in the paper. Section 3 describes the details of our search
tree algorithm. In Section 4, we explain how the search tree algorithm can be
applied for a variant of the k-Set Packing problem. Finally, Section 5 states the
conclusion of this work.
The left side of Figure 1 shows an example of this intersection for the 4-K4 -
Packing with 1-Overlap problem (k = 4, t = 1). The two K4 ’s of the maximal
solution are indicated by solid lines while the four K4 ’s of the k-solution are
indicated with solid and dashed lines. Edges of the graph that do not belong to
any of these solutions are indicated in light gray. The seeds in the example have
size t + 1 = 2. A collection of four feasible seeds is {{1, 3}, {3, 4}, {5, 6}, {6, 8}};
the vertices of these seeds are filled in the figure.
We now proceed to describe the algorithm for the k-H-Packing with t-Overlap
problem. First, we obtain a maximal solution M of G. If the number of H-
subgraphs in M is at least k, then M is already a k-H-Packing with t-Overlap
and the algorithm stops.
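This first step, computing a maximal solution M, can be done greedily: scan candidate H-subgraphs and keep each one that overlaps every previously kept subgraph in at most t vertices. A minimal sketch for the clique case H = K_r (a toy illustration; the greedy scan order, function name, and adjacency-dict encoding are our assumptions):

```python
from itertools import combinations

def greedy_maximal_packing(adj, r, t):
    """Greedily build a maximal K_r-packing with t-overlap: scan all r-subsets
    that induce a clique and keep each one whose overlap with every subgraph
    chosen so far is at most t.  The paper's M is any maximal solution, not
    necessarily this particular greedy one."""
    M = []
    for cand in combinations(sorted(adj), r):
        if all(v in adj[u] for u, v in combinations(cand, 2)):  # induces K_r
            if all(len(set(cand) & set(S)) <= t for S in M):
                M.append(set(cand))
    return M

# Two K4's sharing two vertices: with t = 1 only one of them fits into M.
adj = {v: set() for v in range(1, 7)}
for u, v in combinations([1, 2, 3, 4], 2): adj[u].add(v); adj[v].add(u)
for u, v in combinations([3, 4, 5, 6], 2): adj[u].add(v); adj[v].add(u)
M = greedy_maximal_packing(adj, r=4, t=1)
print(M)  # [{1, 2, 3, 4}]
```

Since the two K4's overlap in two vertices, the second one cannot be added when t = 1, so the resulting maximal solution has fewer than k = 2 subgraphs and the search-tree phase of the algorithm would take over.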
A Parameterized Algorithm for Packing Overlapping Subgraphs 329
Fig. 1. On the left side, an intersection of a k-solution with a maximal solution of the
4-K4 -Packing with 1-Overlap problem (k = 4 and t = 1). The seeds in this example
are of size t + 1 = 2. To the right, part of the search tree corresponding to the instance
to the left.
Lemma 2. If for a seed sij ∈ Qi all its sponsors are ineligible then Qi cannot
be completed into a k-solution.
with t-Overlap. In the example of Figure 1, one leaf of the search tree would com-
plete the collection into the solution K = {{1, 3} · {9, 10}, {3, 4} · {11, 12}, {5, 6} ·
{4, 7}, {6, 8} · {13, 14}}.
3.1 Correctness
The next basic lemma will help us to prove that the algorithm is correct.
Lemma 4. The Feasible Path. If the graph G has a k-solution then there is at
least one path P on the subtree rooted at a feasible child where each node in P
is feasible.
Proof. The lemma states that there is a path P = ⟨i_1, i_2, ..., i_m⟩ such that
each node i_l in P has the collection Q_{i_l} = {s_1^{i_l}, ..., s_k^{i_l}} where s_j^{i_l} ⊆ V(Q*_j) for
1 ≤ j ≤ k and 1 ≤ l ≤ m. Note that i_1 = i.
We prove this claim by induction on the number of levels. At level 1, the first
node of the path is a feasible child of the root and the claim follows.
By Observation 1, each feasible seed in Q_i has at least one feasible sponsor.
Let {A*_1, ..., A*_k} be the set of feasible sponsors, where A*_j is the feasible sponsor
of s_j^i, i.e., K = {s_1^i · A*_1, ..., s_k^i · A*_k}.
Next we show that for the remaining nodes of P , the seeds are updated only
with vertices from the feasible sponsors.
Let us suppose that the greedy algorithm failed to complete a seed s_j^{i_1} at
level 1. The seed s_j^{i_1} has at least one feasible sponsor A*_j. Since A*_j is feasible,
it is eligible. Greedy failed to complete s_j^{i_1} if the H-subgraph formed with
s_j^{i_1} and each eligible sponsor of s_j^{i_1} (including the feasible one A*_j) overlaps in
more than t vertices with an H-subgraph completed by greedy (i.e., in the set
Q_gr). Therefore at level 2, there is at least one child of the node i_1 where
the seed s_j^{i_1} is updated with one vertex from the feasible sponsor A*_j. That is,
Q_{i_2} = {s_1^{i_1}, ..., s_j^{i_1} ∪ {v*}, ..., s_k^{i_1}} where v* ∈ A*_j, and the lemma follows.
Now, let us assume that the lemma is true up to level h − 1. We next show
that the lemma holds for level h. Let ⟨i_2, ..., i_{h−1}⟩ be the subpath of P where
one seed in each node is updated with one vertex from the feasible sponsors.
By contradiction, suppose that at level h − 1 greedy could not complete the
seed s_j^{i_{h−1}} but there is no child of the node i_{h−1} such that s_j^{i_{h−1}} is updated with
a vertex from the feasible sponsor.
Let us suppose that U* is the set of vertices that has been added to s_j^{i_1} during
the levels 1, ..., h − 1. By our assumption, the seed s_j^{i_1} is feasible, and it has
been updated only with vertices from the feasible sponsor. Therefore, U* ⊂ A*_j.
By Lemma 3, A*_j \ U* is a sponsor of s_j^{i_{h−1}}. Therefore, the only way that none
of the children of i_{h−1} would update s_j^{i_{h−1}} with a vertex from A*_j \ U* is if A*_j \ U*
is not an eligible sponsor of s_j^{i_{h−1}}.
However, this would imply that the H-subgraph s_j^{i_{h−1}} · (A*_j \ U*) = s_j^{i_1} · A*_j
overlaps in more than t vertices with some feasible seed s_l^{i_1} ∈ Q_{i_1}. This contra-
dicts our assumption that the collection Q_{i_1} can be completed into a solution
(feasible child).
Lemma 5. The collection of seeds of the last node l (a leaf) of a feasible path
is a k-solution.
Proof. By Lemma 4, we know that the collection of seeds of every node in a
feasible path is updated only with vertices from the feasible sponsors. Therefore,
eventually a seed is completed into an H-subgraph of the k-solution. Let us
suppose that k − m seeds, where m ≥ 1, were completed in this way.
We claim now that greedy finds an eligible sponsor for the remaining m seeds.
Assume by contradiction that greedy failed to complete one of these m seeds and
suppose it is s_j^l. Since the collection of a feasible child can be completed into a
solution, and the node l is in a feasible path, s_j^l should have at least one
eligible sponsor. In that case, the next step would be to create children of the
node l updating the seed s_j^l, contradicting the assumption that the node l is a
leaf.
Theorem 1. The search tree algorithm finds one k-H-Packing with t-Overlap,
if the graph G has at least one.
Proof. Since we are creating a child of the root for each possible selection of k
seeds from the universe of seeds, at least one child is feasible. By Lemma 4, there
is at least one feasible path that starts at this child. By Lemma 5, a k-solution
is given in the collection of the last node of this path.
3.2 Analysis
Lemma 6. The root has at most (k − 1)^k · (er/(t + 1))^{k(t+1)} children.
Proof. There are C(r, t+1) distinct sets of t + 1 vertices (seeds) from the set of vertices
of an H-subgraph. Since |M| ≤ k − 1, there are at most (k − 1) · C(r, t+1) ≤ (k − 1) · (er/(t + 1))^{t+1}
The classical k-Set Packing problem asks for at least k pairwise disjoint sets.
The k-Set Packing with t-Overlap problem therefore generalizes the disjointness
condition of that problem.
We now apply the search tree algorithm of Section 3 to this problem. First,
Lemma 1 can be restated as follows.
Lemma 9. Let M and K be a maximal Set Packing with t-Overlap and a k-Set
Packing with t-Overlap, respectively. We claim that any Q* ∈ K overlaps with
some Q ∈ M in at least t + 1 elements, i.e., |Q* ∩ Q| ≥ t + 1. Furthermore, there
is no pair Q*_i, Q*_j ∈ K with i ≠ j that overlaps in the same set of elements with
Q ∈ M, i.e., Q*_i ∩ Q ≠ Q*_j ∩ Q.
Since the size of each set in S is at least t + 1 and at most r, and there
is no pair Qi , Qj in S such that Qi ⊂ Qj , then this lemma follows by similar
arguments as Lemma 1.
Once again, we have that sets from a k-Set Packing with t-Overlap K share
some elements with sets from a maximal k-Set Packing with t-Overlap M. Thus,
we can use the notion of feasible seeds to find a k-Set Packing with t-Overlap K.
For this problem, a seed is a subset of size t + 1 from a set in M. In this way,
the universe of seeds is the set of all possible seeds for each set in M.
Now, given a collection Q of k seeds we want to complete it into a k-Set
Packing with t-Overlap. That is, we want to add elements from U to each seed
in Q such that each updated seed is a set of S, and the overlap between any
pair of updated seeds is at most t. In this sense, the term sponsor of a seed s is
redefined as a set of elements from U that updates s as a set of S. Specifically,
we say that A is a sponsor of s, if |s ∩ A| = 0 and s ∪ A ∈ S.
The set Sponsors(s) can be computed as follows. For each set Q ∈ S, if s ⊂ Q
then a sponsor of s is Q\s. Once the set of sponsors for each seed is computed,
the bounded search tree algorithm follows with minor differences. For example,
now Qgr would be a collection of sets each of size at most r instead of H-
completed subgraphs. In the same way, the rule to discard ineligible sponsors
can be applied.
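The seed universe and the Sponsors(s) computation described here can be sketched directly (an illustrative Python fragment; the encoding of S and M as lists of sets, and the function names, are our assumptions):

```python
from itertools import combinations

def seed_universe(M, t):
    """Universe of seeds: every (t+1)-element subset of each set in the
    maximal solution M."""
    return [set(c) for Q in M for c in combinations(sorted(Q), t + 1)]

def sponsors(s, S):
    """Sponsors of a seed s: for each set Q in the collection S with s a
    proper subset of Q, the elements Q \\ s complete s into the set Q."""
    return [set(Q) - s for Q in S if s < set(Q)]

S = [{1, 2, 3}, {2, 3, 4}, {4, 5, 6}]   # the collection of sets
M = [{1, 2, 3}, {4, 5, 6}]              # a maximal packing with 1-overlap
seeds = seed_universe(M, t=1)           # 2-element subsets of each set in M
print(seeds)          # [{1, 2}, {1, 3}, {2, 3}, {4, 5}, {4, 6}, {5, 6}]
print(sponsors({2, 3}, S))              # [{1}, {4}]
```

The seed {2, 3} has two sponsors because both {1, 2, 3} and {2, 3, 4} contain it; either one completes the seed into a set of S.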
Theorem 3. The search tree algorithm finds one k-Set Packing with t-Overlap
assuming there is at least one.
Since the sets in S have size at most r and |U| = n, then O(nr ) is still an
upper bound for the number of sponsors of the seeds. Thus, the running time of
the algorithm follows as well by Theorem 2.
5 Conclusion
We have introduced the first fixed-parameter algorithm for packing subgraphs
with arbitrary overlap (the k-H-Packing with t-Overlap problem). We have also
provided insight into the difficulty of packing overlapping subgraphs rather than
vertex-disjoint subgraphs. As discussed in Section 3.3, even overlap of at most
one vertex substantially complicates the analysis of the algorithm. On the other
hand, we show that the algorithm is applicable for a generalized version of the
k-Set Packing problem.
Many leads arise from the results presented here. Naturally, the first one is
to improve the running time of the algorithm. The second is the design of
data reduction rules to decrease the number of children of the root or to bound
the number of sponsors of the seeds. Finally, alternative parameters besides the
number of the subgraphs could be considered as well.
References
1. Adamcsek, B., Palla, G., Farkas, I., Derényi, I., Vicsek, T.: CFinder: locating cliques
and overlapping modules in biological networks. Bioinformatics 22(8), 1021–1023
(2006)
2. Damaschke, P.: Fixed-parameter tractable generalizations of cluster editing.
In: The 6th International Conference on Algorithms and Complexity (CIAC),
pp. 344–355 (January 2006)
3. Fellows, M., Guo, J., Komusiewicz, C., Niedermeier, R., Uhlmann, J.: Graph-based
data clustering with overlaps. Discrete Optimization 8(1), 2–17 (2011)
4. Fellows, M., Heggernes, P., Rosamond, F., Sloper, C., Telle, J.: Finding k dis-
joint triangles in an arbitrary graph. In: The 30th Workshop on Graph-Theoretic
Concepts in Computer Science (WG), pp. 235–244 (2004)
5. Fellows, M., Knauer, C., Nishimura, N., Ragde, P., Rosamond, F., Stege, U.,
Thilikos, D., Whitesides, S.: Faster fixed-parameter tractable algorithms for match-
ing and packing problems. Algorithmica 52(2), 167–176 (2008)
6. Hartung, S., Komusiewicz, C., Nichterlein, A.: On structural parameterizations
for the 2-club problem. In: van Emde Boas, P., Groen, F.C.A., Italiano, G.F.,
Nawrocki, J., Sack, H. (eds.) SOFSEM 2013. LNCS, vol. 7741, pp. 233–243.
Springer, Heidelberg (2013)
7. Komusiewicz, C., Sorge, M.: Finding dense subgraphs of sparse graphs. In: 7th
International Symposium on Parameterized and Exact Computation (IPEC),
pp. 242–251 (2012)
8. Moser, H.: A problem kernelization for graph packing. In: Nielsen, M., Kučera,
A., Miltersen, P.B., Palamidessi, C., Tůma, P., Valencia, F. (eds.) SOFSEM 2009.
LNCS, vol. 5404, pp. 401–412. Springer, Heidelberg (2009)
9. Prieto, E., Sloper, C.: Looking at the stars. Theoretical Computer Science 351(3),
437–445 (2006)
10. Romero, J., López-Ortiz, A.: The G-packing with t-overlap problem. In: Pal, S.P.,
Sadakane, K. (eds.) WALCOM 2014. LNCS, vol. 8344, pp. 114–124. Springer,
Heidelberg (2014)
11. Schäfer, A., Komusiewicz, C., Moser, H., Niedermeier, R.: Parameterized computa-
tional complexity of finding small-diameter subgraphs. Optimization Letters 6(5),
883–891 (2012)
Crossing-Free Spanning Trees in Visibility
Graphs of Points between Monotone
Polygonal Obstacles
1 Introduction
A geometric graph is a graph whose vertices are points in the plane. Two distinct
edges {u1 , v1 } and {u2 , v2 } in such a graph cross if the straight line segments
u1 v1 and u2 v2 have a point in common that is not an endpoint of both edges. A
subgraph of a geometric graph is crossing-free if it does not contain any crossing
edges (cf. Fig. 1(a)). Rivera-Campo [21] gave the following sufficient condition
for the existence of a crossing-free spanning tree in a geometric graph G = (V, E)
with n ≥ 5 vertices:
(I5 ) For every 5-element subset U ⊆ V , the induced subgraph G[U ] has a
crossing-free spanning tree.
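The crossing predicate in this definition reduces, under the general-position assumption made later in the paper (no three points collinear), to the standard orientation test: two edges without a common endpoint cross exactly when each segment's endpoints lie on opposite sides of the other. A hypothetical Python sketch (names are ours):

```python
def orientation(p, q, r):
    """Sign of the cross product (q - p) x (r - p): >0 left turn, <0 right turn."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)

def edges_cross(u1, v1, u2, v2):
    """Do segments u1v1 and u2v2 cross?  Under general position, a point
    shared by two edges that is not an endpoint of both would force three
    collinear points, so edges sharing an endpoint never cross."""
    if {u1, v1} & {u2, v2}:
        return False  # shared endpoint: not a crossing
    return (orientation(u1, v1, u2) != orientation(u1, v1, v2)
            and orientation(u2, v2, u1) != orientation(u2, v2, v1))

print(edges_cross((0, 0), (2, 2), (0, 2), (2, 0)))  # True: the diagonals cross
print(edges_cross((0, 0), (2, 2), (0, 0), (0, 2)))  # False: shared endpoint
```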
He conjectured that the constant 5 in condition (I5 ) can be replaced by n − 1
which, in turn, would imply that it can be replaced by any k ∈ {2, 3, . . . , n},
yielding a family of conditions (Ik ). Moreover, he showed that condition (Ik )
is indeed sufficient for the existence of a crossing-free spanning tree for all
k ∈ {2, 3, . . . , n} if the vertex set of the geometric graph is in convex position.
In this paper, we present, for every n ≥ 16, a geometric graph that satisfies
condition (In−1 ) but does not have a crossing-free spanning tree. We obtained
these counterexamples as a byproduct when exploring computationally tractable
E.A. Hirsch et al. (Eds.): CSR 2014, LNCS 8476, pp. 337–350, 2014.
c Springer International Publishing Switzerland 2014
338 J. Schüler and A. Spillner
Fig. 1. (a) A geometric graph G. The bold edges yield a crossing-free subgraph of G,
but G has no crossing-free spanning tree. (b) The geometric graph (V, E(O)) induced
on a set V of points by a collection O of three polygonal obstacles (drawn shaded). (c)
A geometric graph G = (V, E({C})) induced by a single monotone obstacle C below V .
condition (I3 ) can be relaxed: It suffices that there are at most n − 3 subsets
U ⊆ V with |U | = 3 for which (i) the interior of the convex hull of U does not
contain a vertex v ∈ V and (ii) the induced subgraph G[U ] has no crossing-free
spanning tree. A well-known result established by Károlyi et al. [11] states that
there is no geometric graph G = (V, E) such that neither G nor its complement
G^c = (V, C(V, 2) − E) has a crossing-free spanning tree. A complete characterization
of the minimal geometric graphs G such that G^c has no crossing-free spanning
tree is given by Keller et al. [12].
Obstacle Representations: Given an abstract graph G′ = (V′, E′), an obstacle
representation of G′ is a geometric graph G = (V, E(O)) induced by a collection
O of obstacles on a set V of points such that G is isomorphic to G′. Alpert
et al. [1] showed that every outerplanar graph has an obstacle representation
(V, E(O)) with |O| = 1 and they present a family of graphs that require an
arbitrarily large number of obstacles to represent them. Subsequently, Pach and
Sarıöz [20] showed that there are even bipartite graphs that require an arbitrarily
large number of obstacles, Mukkamala et al. [19] presented a lower bound of
Ω(n/ log n) on the number of obstacles needed in the worst case for representing
a graph with n vertices and Koch et al. [16] gave a characterization of the
biconnected graphs that have a representation where all obstacles lie in the
unbounded region of the plane outside of the induced geometric graph. Given an
obstacle representation G = (V, E(O)) such that O consists of a single polygon P
that has precisely one hole and all points in V are contained in this hole, Cheng
et al. [3] presented an algorithm that, for any two vertices s, t ∈ V , decides
whether or not G contains a crossing-free path from s to t. The run time of
the algorithm is O(n2 m2 ), where m is the number of corners of P and n = |V |.
Later, Daescu and Luo [5] improved the run time to O(n3 log m + nm).
Monotone Obstacles: A related problem on monotone obstacles that has
received considerable attention over the last years is that of optimal guarding.
In this problem, we want to compute a minimum size set W of points, also
called watchmen, on the upper boundary U of a monotone obstacle C such that
for every point p ∈ U there exists a watchman w ∈ W that sees p, that is,
the straight line segment wp does not intersect C. There was a series of papers
[2,4,7,13] presenting constant-factor approximation algorithms for this problem,
which was only recently shown to be NP-hard by King and Krohn [14] (see also
[17,18] for hardness results on closely related problems). Our inspiration for re-
stricting the problem CFST to geometric graphs induced by monotone obstacles
came from the work by Gibson et al. [8] who showed that there is a polynomial
time approximation scheme for optimal guarding monotone obstacles while more
general versions of this problem are APX-hard [6].
2 A Combinatorial Characterization
Throughout this paper, we assume that the vertices of geometric graphs and
the corners of obstacles are in general position, that is, no three of them are
collinear and no two of them have the same x- or y-coordinate. Moreover, for
any two points p and q in the plane, we will say that p lies to the left/right of q if
p has a smaller/larger x-coordinate than q.
Often, when the intended meaning is clear from the context, we will identify
an edge {u, v} in a geometric graph with the straight line segment uv. We
denote, for any finite non-empty set V of points in the plane, by ch_top(V) the
upper boundary of the convex hull of V, that is, the set of all those points
that we meet when moving in clockwise direction along the boundary of the
convex hull of V from its leftmost to its rightmost point. In addition, for any
geometric graph G = (V, E), we denote by V_top the set of those vertices in V
that are contained in ch_top(V) and, similarly, by E_top the set of those edges
in E that are contained in ch_top(V) (cf. Fig. 2). We will use the following
observation from [21].
Fig. 2. A geometric graph G = (V, E) with the vertices in V_top drawn as empty
circles and the edges in E_top drawn bold
Observation 1. Let G = (V, E) be a geometric graph with n ≥ 3 vertices that
are in convex position. Then every crossing-free spanning tree of G must contain
at least two edges that lie on the boundary of the convex hull of V .
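For very small instances, statements like Observation 1 are easy to verify experimentally by brute force over all (n − 1)-edge subsets. The exponential-time Python sketch below is an illustration only (all names are ours); it reuses the standard orientation test for the crossing check:

```python
from itertools import combinations

def orientation(p, q, r):
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)

def cross(e, f):
    (a, b), (c, d) = e, f
    if {a, b} & {c, d}:
        return False  # shared endpoint: no crossing under general position
    return (orientation(a, b, c) != orientation(a, b, d)
            and orientation(c, d, a) != orientation(c, d, b))

def has_cfst(points, edges):
    """Brute force: does the geometric graph (points, edges) have a
    crossing-free spanning tree?  Tries every (n-1)-edge subset."""
    n = len(points)
    for T in combinations(edges, n - 1):
        if any(cross(e, f) for e, f in combinations(T, 2)):
            continue
        parent = {p: p for p in points}       # union-find for acyclicity
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x
        acyclic = True
        for a, b in T:
            ra, rb = find(a), find(b)
            if ra == rb:
                acyclic = False               # a cycle among n-1 edges
                break
            parent[ra] = rb
        if acyclic:
            return True  # n-1 acyclic edges necessarily span all n points
    return False

# Convex quadrilateral: both diagonals plus one hull edge give no crossing-free
# spanning tree (Observation 1 requires at least two hull edges); adding a
# second hull edge makes a crossing-free star possible.
pts = [(0, 0), (2, 0), (2, 2), (0, 2)]
E1 = [((0, 0), (2, 2)), ((2, 0), (0, 2)), ((0, 0), (2, 0))]
E2 = E1 + [((2, 0), (2, 2))]
print(has_cfst(pts, E1), has_cfst(pts, E2))  # False True
```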
In addition, we also rely on the following fact that can be viewed as a dual
version of Observation 1 and that can easily be verified by induction on the
number of vertices of the geometric graph (it clearly holds for graphs with n =
3 vertices and, for n ≥ 4 vertices, any edge naturally partitions the graph into
two smaller subgraphs to which the induction hypothesis can be applied).
Observation 2. Let G = (V, E) be a geometric graph with n ≥ 3 vertices that
are in convex position. If G contains no edges that lie on the boundary of the
convex hull of V then every crossing-free spanning subgraph of G has at least
three connected components.
Next, we explore some properties that are very similar in spirit to Rivera-
Campo’s conditions (Ik ) mentioned in the introduction. To formally describe
these properties, we first introduce some more notation. Let G = (V, E) be a
geometric graph with n vertices and let k ∈ {0, 1, 2, . . . , n}. We say that G is
k-Steiner if for every k-element subset K ⊆ V there exists a crossing-free subtree
T = (V′, E′) of G with K ⊆ V′. Note that such a tree T can be viewed as a
crossing-free Steiner tree in G for the terminal vertices in K.
Our original motivation for looking into crossing-free Steiner trees was to
identify interesting families F of geometric graphs for which there exists a small
constant k ∗ such that a graph G ∈ F has a crossing-free spanning tree if and
only if G is k ∗ -Steiner. Then, assuming that there exists a polynomial time al-
gorithm A that decides for every G = (V, E) ∈ F and every k ∗ -element subset
K ⊆ V whether or not there exists a crossing-free Steiner tree for K in G,
we would immediately obtain a polynomial time algorithm for CFST when re-
stricted to the family F . As mentioned in the introduction, at least for k ∗ = 2,
Fig. 3. (a) The geometric graph Gn considered in the proof of Theorem 4 for n = 8.
(b) A crossing-free Steiner tree in G8 for a set of n − 3 = 5 terminal vertices (non-
terminal vertices marked by empty circles).
that is, crossing-free paths between two specified vertices, there exists such an
algorithm A for geometric graphs induced by certain polygonal obstacles [3,5].
Unfortunately, as we will see below, this overall approach does not even work for
geometric graphs induced above a single monotone obstacle because there exist
such graphs G that are k-Steiner for arbitrarily large values of k but have no
crossing-free spanning tree. First, we present a technical lemma (proof omitted).
Based on this lemma, we obtain the above-mentioned characterization.
Proof: Clearly, if G has a crossing-free spanning tree then it is k-Steiner for any
k ∈ {0, 1, 2, . . . , n}. For the converse direction, assume that G is (n − 2)-Steiner.
Since n ≥ 4, this implies that G is connected. Moreover, since |Vtop | ≥ 2, it
also implies that G is (n − |Vtop |)-Steiner. Hence, by Lemma 3, G must have a
crossing-free spanning tree.
To see that n − 2 is the smallest possible value, fix an arbitrary n ≥ 4. We
consider a graph resulting from a complete geometric graph with n vertices in
convex position by removing all except one of the edges that lie on the boundary
of the convex hull of its vertex set. Such a geometric graph can easily be obtained
as Gn = (V, E({C})) for some suitably chosen set V of n points above some
monotone obstacle C (cf. Fig. 3(a)).
First note that, by Observation 1, the graph Gn cannot have a crossing-free
spanning tree. Thus, by the equivalence already established, Gn cannot be k-
Steiner for any k ∈ {n − 2, n − 1, n}. Hence, it remains to show that Gn is
(n − 3)-Steiner: Consider an arbitrary set K ⊆ V of n − 3 terminal vertices.
Let v1 , v2 and v3 be the non-terminal vertices in V − K numbered in clockwise
order around the convex hull of V (the terms between/before/after used in the
following always refer to this order). Since n ≥ 4, we can assume without loss
of generality that there is at least one vertex in K between v3 and v1 on the
boundary of the convex hull of V . To obtain a crossing-free Steiner tree for the
vertices in K, we first connect v2 with all vertices in K between v3 and v1 . Next,
we connect all vertices in K between v1 and v2 (if any) with the vertex in K
immediately before v1 and, finally, we connect all vertices between v2 and v3 (if
any) with the vertex in K immediately after v3 . An example of the resulting
crossing-free Steiner tree is depicted in Fig. 3(b).
When exploring the situation for general geometric graphs, we found that
there are such graphs with n vertices that have no crossing-free spanning tree
even though they are (n − 1)-Steiner. Since a crossing-free Steiner tree for an
(n − 1)-element set K of terminal vertices in such a graph must actually be
a crossing-free spanning tree for the subgraph induced by K, this immediately
gives counterexamples to Rivera-Campo’s conjecture.
Lemma 5. For all n ≥ 16, there exists a geometric graph G = (V, E) with n
vertices that satisfies condition (In−1 ) but has no crossing-free spanning tree.
Proof: Fix an arbitrary n ≥ 16. To construct a suitable geometric graph G =
(V, E) with n vertices, we first arrange a set V1 of n − 8 points in convex position
such that v1 , v2 , . . . , vn−8 is the order of these points along the boundary of the
convex hull of V1 in clockwise direction. We arrange V1 in such a way that v1 is the
point with largest y-coordinate and v8 is the point with smallest y-coordinate.
Similarly, we arrange a set V2 of 8 points in convex position. Let u1 , u2 , . . . , u8
be the order of the points in V2 in counterclockwise direction along the boundary
of the convex hull of V2 . The points are arranged in such a way that u1 is the
point with largest y-coordinate and u8 is the point with smallest y-coordinate.
Now, for i ∈ {1, 2}, let Gi = (Vi , Ei ) denote the geometric graph that we
obtain from the complete geometric graph on vertex set Vi by removing all
edges that lie on the boundary of the convex hull of Vi . Putting
E ∗ = {{vj , uj+1 } : j ∈ {1, 3, 5, 7}} ∪ {{vj , uj−1 } : j ∈ {2, 4, 6, 8}},
we obtain our final geometric graph G with vertex set V = V1 ∪ V2 and edge set
E = E1 ∪ E2 ∪ E ∗ . By placing G2 far enough away to the right of G1 we ensure
that no edge in E ∗ crosses an edge in E1 ∪ E2 (cf. Fig. 4(a)). Note that the edges
in E ∗ form four disjoint pairs of crossing edges.
We first argue that G has no crossing-free spanning tree: By Observation 2,
the restriction of any crossing-free spanning subgraph of G to Vi , i ∈ {1, 2}, has
at least three connected components. Hence, a crossing-free spanning tree of G
would need to use at least five edges that are neither in E1 nor in E2 . But, by
construction of G, a crossing-free subgraph of G can use at most four edges from
E − (E1 ∪ E2 ) = E ∗ .
It remains to show that for every (n − 1)-element subset V′ = V − {w}, w ∈ V,
there exists a crossing-free spanning tree for the subgraph of G induced by V′.
In view of the high degree of symmetry of G, it suffices to consider the case that
w ∈ V1 and, as indicated in Fig. 4(b)-(f), there are only five different types of
(n − 1)-element subsets V′ that result from removing a vertex w from V1. It is
easy to check that, for each of them, a crossing-free spanning tree exists.
Fig. 4. (a) The geometric graph G constructed in the proof of Lemma 5 for n = 22.
The eight edges in E ∗ between the two subgraphs G1 and G2 are drawn dashed.
(b)-(f) The empty circle marks the vertex w ∈ V1 that is removed to obtain an (n − 1)-
element subset V′. There is always a crossing-free spanning tree for the subgraph of
G induced by V′ but its structure depends on the position of w: (b) w = vj for some
j ∈ {9, 10, . . . , n − 8}, (c)-(f) w = vj for some j ∈ {1, 2, 3, 4}.
Fig. 5. (a) A geometric graph induced on a point set V between two monotone obstacles
Ca and Cb. (b) The vertical rays in R(e) and R̄(e) emanating from e downwards and
upwards, respectively.
Fig. 8. (a) A subproblem of Type (1). (b) A subproblem of Type (2). (c) A subproblem
of Type (3).
Fig. 9. (a) Processing a subproblem of Type (1), Case 1. (b)-(c) There are two different
ways in which the subproblem may be partitioned into two smaller subproblems, but it
will always be into a subproblem of Type (1) and a subproblem of Type (2).
Theorem 11. There is a dynamic programming algorithm that, for any geomet-
ric graph G = (V, E({Ca, Cb})) induced on a set V of n points between two
monotone obstacles Ca and Cb, decides in O(n^5) time whether or not G has
a crossing-free spanning tree.
4 Concluding Remarks
In this paper, we have started to explore properties of crossing-free spanning
trees in geometric graphs that are induced on a point set in the plane by special
types of polygonal obstacles. To illustrate that this may indeed lead to interest-
ing tractable instances of the problem CFST, we showed that for graphs induced
between two monotone obstacles it can be solved in polynomial time. In view of
the fact that the existence of a crossing-free path between two specified vertices
can be decided in polynomial time even for geometric graphs that are induced
by a single non-monotone polygonal obstacle, it would be interesting to know
whether CFST can also be solved in polynomial time on these more general
instances. Moreover, our counterexamples to Rivera-Campo's conjecture im-
mediately raise the following question: What is the largest number k* ∈ N − {0, 1}
such that, for all geometric graphs G = (V, E) with n ≥ k ∗ vertices, condition
(Ik∗ ) implies the existence of a crossing-free spanning tree in G? Combining the
results in this paper with those in [21], we obtain the bounds 5 ≤ k ∗ ≤ 14.
References
1. Alpert, H., Koch, C., Laison, J.: Obstacle numbers of graphs. Discrete & Compu-
tational Geometry 44, 223–244 (2010)
2. Ben-Moshe, B., Katz, M., Mitchell, J.: A constant-factor approximation algorithm
for optimal 1.5D terrain guarding. SIAM Journal on Computing 36, 1631–1647
(2007)
3. Cheng, Q., Chrobak, M., Sundaram, G.: Computing simple paths among obstacles.
Computational Geometry 16, 223–233 (2000)
4. Clarkson, K., Varadarajan, K.: Improved approximation algorithms for geometric
set cover. Discrete & Computational Geometry 37, 43–58 (2007)
5. Daescu, O., Luo, J.: Computing simple paths on points in simple polygons.
In: Ito, H., Kano, M., Katoh, N., Uno, Y. (eds.) KyotoCGGT 2007. LNCS,
vol. 4535, pp. 41–55. Springer, Heidelberg (2008)
6. Eidenbenz, S.: Inapproximability results for guarding polygons without holes. In:
Chwa, K.-Y., Ibarra, O.H. (eds.) ISAAC 1998. LNCS, vol. 1533, pp. 427–436.
Springer, Heidelberg (1998)
7. Elbassioni, K., Krohn, E., Matijević, D., Mestre, J., Ševerdija, D.: Improved
approximations for guarding 1.5-dimensional terrains. Algorithmica 60, 451–463
(2011)
8. Gibson, M., Kanade, G., Krohn, E., Varadarajan, K.: An approximation scheme
for terrain guarding. In: Dinur, I., Jansen, K., Naor, J., Rolim, J. (eds.) APPROX
and RANDOM 2009. LNCS, vol. 5687, pp. 140–148. Springer, Heidelberg (2009)
9. Halldórsson, M., Knauer, C., Spillner, A., Tokuyama, T.: Fixed-parameter
tractability for non-crossing spanning trees. In: Dehne, F., Sack, J.-R., Zeh, N.
(eds.) WADS 2007. LNCS, vol. 4619, pp. 410–421. Springer, Heidelberg (2007)
10. Jansen, K., Woeginger, G.: The complexity of detecting crossing-free configurations
in the plane. BIT 33, 580–595 (1993)
11. Károlyi, G., Pach, J., Tóth, G.: Ramsey-type results for geometric graphs I. Dis-
crete & Computational Geometry 18, 247–255 (1997)
12. Keller, C., Perles, M., Rivera-Campo, E., Urrutia-Galicia, V.: Blockers for non-
crossing spanning trees in complete geometric graphs. In: Pach, J. (ed.) Thirty
Essays on Geometric Graph Theory, pp. 383–397. Springer (2013)
13. King, J.: A 4-approximation algorithm for guarding 1.5-dimensional terrains. In:
Correa, J.R., Hevia, A., Kiwi, M. (eds.) LATIN 2006. LNCS, vol. 3887, pp. 629–640.
Springer, Heidelberg (2006)
14. King, J., Krohn, E.: Terrain guarding is NP-hard. SIAM Journal on Computing 40,
1316–1339 (2011)
15. Knauer, C., Schramm, É., Spillner, A., Wolff, A.: Configurations with few crossings
in topological graphs. Computational Geometry 37, 104–114 (2007)
16. Koch, A., Krug, M., Rutter, I.: Graphs with plane outside-obstacle representations
(2013) available online: arXiv:1306.2978
17. Krohn, E., Nilsson, B.: The complexity of guarding monotone polygons. In: Proc.
Canadian Conference on Computational Geometry, pp. 167–172 (2012)
18. Krohn, E., Nilsson, B.: Approximate guarding of monotone and rectilinear poly-
gons. Algorithmica 66, 564–594 (2013)
19. Mukkamala, P., Pach, J., Pálvölgyi, D.: Lower bounds on the obstacle number of
graphs. The Electronic Journal of Combinatorics 19 (2012)
20. Pach, J., Sarıöz, D.: On the structure of graphs with low obstacle number. Graphs
and Combinatorics 27, 465–473 (2011)
21. Rivera-Campo, E.: A note on the existence of plane spanning trees of geometric
graphs. In: Akiyama, J., Kano, M., Urabe, M. (eds.) JCDCG 1998. LNCS, vol. 1763,
pp. 274–277. Springer, Heidelberg (2000)
The Connectivity of Boolean Satisfiability:
Dichotomies for Formulas and Circuits
Konrad Schwerdtfeger
1 Introduction
The Boolean satisfiability problem, as well as many related questions like equiv-
alence, counting, enumeration, and numerous versions of optimization, are of
great importance in both theory and applications of computer science.
Common to all these problems is that one asks questions about a Boolean
relation given by some short description, e.g. a propositional formula, Boolean
circuit, binary decision diagram, or Boolean neural network. For the usual for-
mulas with the connectives ∧, ∨ and ¬, several generalizations and restrictions
have been considered. Most widely studied are Boolean constraint satisfaction
problems (CSPs), which can be seen as a generalization of formulas in CNF
(conjunctive normal form); see Definition 2. Another generalization, which we
consider here, is formulas with connectives from an arbitrary fixed set of Boolean
functions B, known as B-formulas. This concept also applies to circuits, where
the allowed gates implement the functions from B, called B-circuits. A further
extension that allows for shorter representations, and in turn makes many prob-
lems harder, are quantifiers, which we will look at in Section 5.
E.A. Hirsch et al. (Eds.): CSR 2014, LNCS 8476, pp. 351–364, 2014.
c Springer International Publishing Switzerland 2014
352 K. Schwerdtfeger
Here we will investigate the structure of the solution space, which is of obvi-
ous relevance to these satisfiability related problems. Indeed, the solution space
connectivity is strongly correlated to the performance of standard satisfiability
algorithms like WalkSAT and DPLL on random instances: As one approaches
the satisfiability threshold (the ratio of constraints to variables at which ran-
dom k-CNF-formulas become unsatisfiable for k ≥ 3 ) from below, the solution
space fractures, and the performance of the algorithms breaks down [9,8]. These
insights mainly came from statistical physics, and led to the development of
the survey propagation algorithm, which has much better performance on ran-
dom instances [8]. This research was a motivation for Gopalan et al. to study
connectivity properties of the solution space of Boolean CSPs [3].
While the most efficient satisfiability solvers take CNF-formulas as input, one
of the most important applications of satisfiability testing is verification and op-
timization in Electronic Design Automation (EDA), where the instances derive
mostly from digital circuit descriptions [18]. Though many such instances can
easily be encoded in CNF, the original structural information, such as signal
ordering, gate orientation and logic paths, is lost, or at least obscured. Since
exactly this information can be very helpful for solving these instances, consid-
erable effort has been made recently to develop satisfiability solvers that work
with the circuit description directly [18], which have far superior performance in
EDA applications, or to restore the circuit structure from CNF [2]. This is one
major motivation for our study.
A direct application of st-connectivity is reconfiguration problems, which arise
when we wish to find a step-by-step transformation between two feasible solutions
of a problem such that all intermediate results are also feasible. Recently, the re-
configuration versions of many problems such as Independent-Set, Vertex-
Cover, Set-Cover, Graph-k-Coloring, Shortest-Path have been studied
[4,5], and many complexity results were obtained, in some cases making use of
Gopalan et al.’s results.
Since many of the satisfiability related problems are hard to solve in gen-
eral (they are NP- or even PSPACE-complete), one has tried to identify easier
fragments and to classify restrictions in terms of their complexity. Possibly the
best known result is Schaefer’s 1978 dichotomy theorem for CSPs, which states
that for certain classes of allowed constraints the satisfiability of a CSP is in P,
while it is NP-complete for all other classes [13]. Analogously, Gopalan et al. in
2006 classified the complexity of connectivity questions for CSPs in Schaefer’s
framework. In this paper, we consider the same connectivity issues as Gopalan
et al., but for problems defined by Boolean circuits and propositional formulas
that use gates, resp. connectives, from a fixed set of Boolean functions.
Definition 1. An n-ary Boolean relation is a subset of {0, 1}^n (n ≥ 1). The set
of solutions of a propositional formula φ with n variables defines in a natural way
an n-ary Boolean relation R, where the variables are taken in lexicographic order.
The solution graph G(φ) of φ is the subgraph of the n-dimensional hypercube
graph induced by the vectors in R, i.e., the vertices of G(φ) are the vectors in R,
and there is an edge between two vectors precisely if they differ in exactly one
position.
We use a, b, . . . to denote vectors of Boolean values and x, y, . . . to denote vec-
tors of variables, a = (a1 , a2 , . . .) and x = (x1 , x2 , . . .). The Hamming distance
|a − b| of two Boolean vectors a and b is the number of positions in which they
differ. If a and b are solutions of φ and lie in the same connected component
of G(φ), we write dφ (a, b) to denote the shortest-path distance between a and b.
The diameter of a connected component is the maximal shortest-path distance
between any two vectors in that component. The diameter of G(φ) is the maximal
diameter of any of its connected components.
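For small n, these definitions can be checked by brute force. The following sketch (all helper names are ours, not from the paper) builds the solution graph of a formula given as a Python predicate and computes the number of connected components and the diameter:

```python
from itertools import product

def solution_graph(f, n):
    """Vertices: satisfying assignments of f; edges: pairs at Hamming distance 1."""
    sols = [a for a in product((0, 1), repeat=n) if f(a)]
    return {a: [b for b in sols
                if sum(x != y for x, y in zip(a, b)) == 1] for a in sols}

def components_and_diameter(adj):
    """Number of connected components and maximal diameter of a component."""
    seen, comps, diam = set(), 0, 0
    for s in adj:
        if s in seen:
            continue
        comps += 1
        comp, stack = {s}, [s]
        while stack:                       # collect the component of s
            for w in adj[stack.pop()]:
                if w not in comp:
                    comp.add(w)
                    stack.append(w)
        seen |= comp
        for u in comp:                     # BFS from every vertex: eccentricity
            dist, frontier = {u: 0}, [u]
            while frontier:
                nxt = []
                for v in frontier:
                    for w in adj[v]:
                        if w not in dist:
                            dist[w] = dist[v] + 1
                            nxt.append(w)
                frontier = nxt
            diam = max(diam, max(dist.values()))
    return comps, diam

# φ = x1 ∨ x2: solutions 01, 11, 10 form a path, so G(φ) is connected, diameter 2
print(components_and_diameter(solution_graph(lambda a: a[0] or a[1], 2)))  # (1, 2)
```

Enumerating all 2^n assignments is of course only feasible for tiny n; the point of the dichotomy results is to classify when these questions can be answered without such enumeration.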
In our proofs for B-formulas and B-circuits, we will use Gopalan et al.’s results
for 3-CNF-formulas, so we also need to introduce some terminology for constraint
satisfaction problems.
Definition 2. A CNF-formula is a Boolean formula of the form C1 ∧ · · · ∧ Cm
(1 ≤ m < ∞), where each Ci is a clause, that is, a finite disjunction of literals
(variables or negated variables). A k-CNF-formula (k ≥ 1) is a CNF-formula
where each Ci has at most k literals.
For a finite set of Boolean relations S, a CNF(S)-formula (with constants)
over a set of variables V is a finite conjunction C1 ∧ · · · ∧ Cm , where each Ci is
a constraint application ( constraint for short), i.e., an expression of the form
R(ξ1 , . . . , ξk ), with a k-ary relation R ∈ S, and each ξj is a variable in V or one
of the constants 0, 1.
A k-clause is a disjunction of k variables or negated variables. For 0 ≤ i ≤ k,
let Di be the set of all satisfying truth assignments of the k-clause whose first i
literals are negated, and let Sk = {D0, . . . , Dk}. Thus, CNF(Sk) is the collection
of k-CNF-formulas.
Gopalan et al. studied the following two decision problems for CNF(S)-formulas:
– the connectivity problem Conn(S): given a CNF(S)-formula φ, is G(φ) con-
nected? (if φ is unsatisfiable, then G(φ) is considered connected)
– the st-connectivity problem st-Conn(S): given a CNF(S)-formula φ and
two solutions s and t, is there a path from s to t in G(φ)?
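For illustration, st-Conn can be decided by a breadth-first search of the solution graph, flipping one variable at a time; the clause encoding and helper names below are ours, not from the paper:

```python
from collections import deque

def sat(clauses, a):
    """clauses: list of clauses; a literal (i, s) is satisfied when a[i] == s."""
    return all(any(a[i] == s for i, s in cl) for cl in clauses)

def st_conn(clauses, n, s, t):
    """Is there a path from s to t inside the solution graph G(φ)?"""
    if not (sat(clauses, s) and sat(clauses, t)):
        return False
    seen, queue = {s}, deque([s])
    while queue:
        v = queue.popleft()
        if v == t:
            return True
        for i in range(n):                     # flip one coordinate
            w = v[:i] + (1 - v[i],) + v[i + 1:]
            if w not in seen and sat(clauses, w):
                seen.add(w)
                queue.append(w)
    return False

# (x1 ∨ x2) ∧ (¬x1 ∨ ¬x2): the two solutions 01 and 10 are not connected
xor = [[(0, 1), (1, 1)], [(0, 0), (1, 0)]]
print(st_conn(xor, 2, (0, 1), (1, 0)))                  # False
print(st_conn([[(0, 1), (1, 1)]], 2, (0, 1), (1, 0)))   # True
```

The BFS may store all 2^n solutions; guessing the path vertex by vertex instead of storing it is what places the problem in PSPACE.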
Lemma 1 ([3, Lemma 3.6]). st-Conn(S3) and Conn(S3) are PSPACE-complete.
Proof. st-Conn(S3 ) and Conn(S3 ) are in PSPACE: Given a CNF(S3 )-formula
φ and two solutions s and t, we can guess a path of length at most 2^n between
them and verify that each vertex along the path is indeed a solution. Hence
st-Conn(S3 ) is in NPSPACE=PSPACE. For Conn(S3 ), by reusing space we
can check for all pairs of vectors whether they are satisfying and, if they both
are, whether they are connected in G(φ).
We cannot give the full proof of the PSPACE-hardness here. It consists of
a direct reduction from the computation of a space-bounded Turing machine M.
[Figure 1: Post's lattice of all closed classes of Boolean functions (BF at the top; R0, R1, R2; M; the S-classes; D; L; V; E; N; I at the bottom), with each class annotated by our results. Legend: classes for which st-BF-Conn, BF-Conn, st-Circ-Conn and Circ-Conn are PSPACE-complete with exponential diameter; classes for which st-QBF-Conn and QBF-Conn (quantified formulas) are PSPACE-complete; and classes for which all these problems are in P with linear diameter.]
Fig. 1. Post's lattice with our results for the connectivity problems and the diameter.
For comparison, the satisfiability problem (without quantifiers) is NP-complete for the
bold circled classes, and in P for the other ones.
2. Otherwise,
(a) st-Circ-Conn(B) and Circ-Conn(B) are PSPACE-complete,
(b) st-BF-Conn(B) and BF-Conn(B) are PSPACE-complete,
(c) there are functions f ∈ [B] whose diameter is exponential in the
number of variables of f .
The proof follows from the Lemmas in the next subsections. By the following
Proposition, we can relate the complexity of B-formulas and B-circuits.
Proposition 1. Every B-formula can be transformed into an equivalent B-
circuit in polynomial time.
Proof. The table of all closed classes of Boolean functions shows that f is mono-
tonic in this case. Thus, either f = 0, or (1, . . . , 1) must be a solution, and every
other solution a is connected to (1, . . . , 1) in G(φ) since (1, . . . , 1) can be reached
by flipping the variables assigned 0 in a one at a time to 1. Further, if a and
b are solutions, b can be reached from a in |a − b| steps by first flipping all
variables that are assigned 0 in a and 1 in b, and then flipping all variables that
are assigned 1 in a and 0 in b.
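The path construction in this proof is effective: raise a to the join a ∨ b, then lower it to b. The sketch below (helper names are ours) replays the walk and checks that every intermediate vector stays a solution of a sample monotone function:

```python
def path_via_join(f, a, b):
    """Connect solutions a and b of a monotone f in |a - b| steps:
    first flip 0 -> 1 (up to a OR b), then flip 1 -> 0 (down to b)."""
    path, cur = [a], list(a)
    for i in range(len(a)):
        if cur[i] == 0 and b[i] == 1:
            cur[i] = 1
            path.append(tuple(cur))
    for i in range(len(a)):
        if cur[i] == 1 and b[i] == 0:
            cur[i] = 0
            path.append(tuple(cur))
    assert all(f(v) for v in path), "walk left the solution graph"
    return path

f = lambda a: a[0] and (a[1] or a[2])          # a monotone function
print(path_via_join(f, (1, 1, 0), (1, 0, 1)))  # [(1, 1, 0), (1, 1, 1), (1, 0, 1)]
```

Monotonicity is what keeps the walk inside the solution graph: raising coordinates cannot destroy a solution, and every vector of the descending phase dominates the solution b.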
Lemma 5. If B ⊆ L,
1. st-Circ-Conn(B) and Circ-Conn(B) are in P,
2. st-BF-Conn(B) and BF-Conn(B) are in P,
3. for any function f ∈ [B], df (a, b) = |a − b| for any two solutions a and b
that lie in the same connected component of G(f ).
Proof. Since every function f ∈ L is linear, f (x1, . . . , xn) = xi1 ⊕ · · · ⊕ xim ⊕ c, and
any two solutions s and t are connected iff they differ only in fictional variables:
If s and t differ in at least one non-fictional variable (i.e., an xi ∈ {xi1 , . . . , xim }),
to reach t from s, xi must be flipped eventually, but for every solution a, any
Proof. We chose the variables in the proof of Lemma 1 such that the accepting
configuration of the Turing machine corresponds to the (1, . . . , 1) vector.
Tψ = ψ ∧ y,
Algorithm Tr(ψ1 , . . . , ψm )
1. if m = 1 return ψ1
2. else if m is even, return
Tr(T∧∗ [x1 /ψ1 , x2 /ψ2 ] , T∧∗ [x1 /ψ3 , x2 /ψ4 ] , . . . , T∧∗ [x1 /ψm−1 , x2 /ψm ])
3. else return
Tr(T∧∗ [x1 /ψ1 , x2 /ψ2 ] , T∧∗ [x1 /ψ3 , x2 /ψ4 ] , . . . , T∧∗ [x1 /ψm−2 , x2 /ψm−1 ] , ψm ).
Here ψ[xi /ξ] denotes the formula obtained by substituting the formula ξ for the
variable xi in the formula ψ. Note that in every Tψ∗ we have the same variable y.
Since the recursion terminates after a number of steps logarithmic in the
number of clauses of φ, and every step increases the total formula size by
only a constant factor, the algorithm runs in polynomial time. We show
Tr(ψ1, . . . , ψm) ≡ Tφ by induction. The basis is clear. Since Tψ ≡ Tψ∗, it suffices to show that
T∧[x1/Tψ1, x2/Tψ2] ≡ T_{ψ1∧ψ2}:
T∧[x1/Tψ1, x2/Tψ2] = Tψ1 ∧ Tψ2 ∧ y = (ψ1 ∧ y) ∧ (ψ2 ∧ y) ∧ y ≡ ψ1 ∧ ψ2 ∧ y = T_{ψ1∧ψ2}.
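Since T∧∗ is introduced on a page not reproduced in this excerpt, the sketch below only assumes the identity just shown, T∧[x1/a, x2/b] ≡ a ∧ b ∧ y, and tracks the shape of the formula the recursion builds; the string encoding and names are ours:

```python
def T_and(a: str, b: str) -> str:
    """Stand-in for T∧* with x1, x2 substituted: equivalent to a ∧ b ∧ y.
    (T∧* is itself a B-formula; here we only track the formula's shape.)"""
    return f"({a} & {b} & y)"

def Tr(psis):
    """The recursion from the text: pair up the formulas until one is left.
    Terminates after ceil(log2 m) rounds, each growing the size by O(1) factor."""
    if len(psis) == 1:
        return psis[0]
    paired = [T_and(psis[i], psis[i + 1]) for i in range(0, len(psis) - 1, 2)]
    if len(psis) % 2:                      # odd m: carry the last formula over
        paired.append(psis[-1])
    return Tr(paired)

print(Tr(["c1", "c2", "c3"]))  # ((c1 & c2 & y) & c3 & y)
```

The balanced pairing is the point: combining the clauses left-to-right instead would nest m copies of y and blow up the size.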
Lemma 7. If [B] ⊇ D1 ,
1. st-BF-Conn(B) and BF-Conn(B) are PSPACE-complete,
2. st-Circ-Conn(B) and Circ-Conn(B) are PSPACE-complete,
3. for n ≥ 5, there is an n-ary function f ∈ [B] with diameter of at least
2^{(n−3)/2}.
Proof. 1. This proof is similar to the previous one, but the construction is more
intricate; for every 1-reproducing 3-CNF formula we have to construct a self-dual
function s.t. the connectivity is retained. For clarity, we do the construction in
two steps.
For a 1-reproducing formula ψ over the n variables x1 , . . . , xn , we construct
a formula Tψ∼ ∈ D1 with three new variables (y1 , y2 , y3 ) = y,
Tψ∼ = (ψ(x) ∧ y) ∨ (¬ψ(¬x) ∧ ¬y) ∨ (y ∈ {100, 010, 001}),
where y and ¬y abbreviate y1 ∧ y2 ∧ y3 and ¬y1 ∧ ¬y2 ∧ ¬y3, respectively.
Observe that Tψ∼ (x, y) is self-dual: for any solution ending with 111, the in-
verse vector (that ends with 000) is no solution; all vectors ending with 100,
010, or 001 are solutions and their inverses are no solutions. Also, Tψ∼ is still
1-reproducing, and it is 0-reproducing since Tψ∼(0 · · · 0) = ¬ψ(1 · · · 1) = 0.
Further, for any two solutions s and t of ψ(x), s′ = s · 111 and t′ = t · 111
are solutions of Tψ∼(x, y), and s′ and t′ are connected in G(Tψ∼) iff s and t are connected
in G(ψ): Every solution a of ψ corresponds to a solution a · 111 of Tψ∼ , and the
connectivity does not change by padding the vectors with 111, and since there
are no solutions of Tψ∼ ending with 110, 101, or 011, every other solution of Tψ∼
differs in at least two variables from the solutions a · 111 that correspond to
solutions of ψ.
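The printed formula for Tψ∼ lost its negation overlines in typesetting; under one plausible reading (x · 111 is a solution iff ψ(x) = 1, x · 000 iff ψ(¬x) = 0, and any y of weight one yields a solution), self-duality can be machine-checked. All names below are ours and the reconstruction is an assumption:

```python
from itertools import product

def T_sd(psi, n):
    """One plausible reading of the self-dual construction Tψ∼,
    reconstructed from the properties stated in the text."""
    def f(v):
        x, y = v[:n], v[n:]
        if y == (1, 1, 1):
            return bool(psi(x))
        if y == (0, 0, 0):
            return not psi(tuple(1 - b for b in x))
        return sum(y) == 1                     # y ∈ {100, 010, 001}
    return f

def is_self_dual(f, n):
    """f is self-dual iff f always differs from f on the complemented vector."""
    return all(f(v) != f(tuple(1 - b for b in v))
               for v in product((0, 1), repeat=n))

psi = lambda x: x[0] and x[1]                  # a 1-reproducing formula
print(is_self_dual(T_sd(psi, 2), 5))           # True
```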
Note that exactly one connected component is added in G(Tψ∼ ) to the com-
ponents corresponding to those of G(ψ): It consists of all vectors ending with
000, 100, 010, or 001 (any two vectors ending with 000 are connected e.g. via
those ending with 001). It follows that G(Tψ∼ ) is always unconnected. To fix
Proof. 1. This proof is analogous to the previous one. For a 1-reproducing for-
mula ψ over the n variables x1, . . . , xn, we construct the formula Tψ∼ ∈ S_{02}^k with
the additional variables y and (z1, . . . , zk+1) = z,
Tψ∼ = (ψ ∧ y ∧ ¬z) ∨ (z ∉ {0 · · · 0, 10 · · · 0, 010 · · · 0, . . . , 0 · · · 01}),
where ¬z abbreviates ¬z1 ∧ · · · ∧ ¬zk+1.
Tψ∼(x, y, z) is 0-separating of degree k, since all vectors that are not solutions
of Tψ∼ end with a vector b ∈ {0 · · · 0, 10 · · · 0, 010 · · · 0, . . . , 0 · · · 01} ⊂ {0, 1}^{k+1},
and thus any k of them have at least one common variable assigned 0. Also, Tψ∼
is 0-reproducing and still 1-reproducing.
Further, for any two solutions s and t of ψ(x), s′ = s · 1 · 0 · · · 0 and t′ =
t · 1 · 0 · · · 0 are solutions of Tψ∼(x, y, z), and s′ and t′ are connected in G(Tψ∼) iff s and t
are connected in G(ψ).
But again, we have produced an additional connected component (consisting
of all vectors not ending with 10 · · · 0, 010 · · · 0, . . . , 0 · · · 01, or 0 · · · 0). To connect
it to a component corresponding to one of ψ, we add 1 · · · 1·1·10 · · · 0 as a solution,
Tψ = (ψ ∧ y ∧ ¬z) ∨ (z ∉ {0 · · · 0, 10 · · · 0, 010 · · · 0, . . . , 0 · · · 01})
∨ (x ∧ y ∧ (z = 10 · · · 0)),
where x abbreviates x1 ∧ · · · ∧ xn.
which is equivalent to
Q1 y1 · · · Qm ym ϕ(y1 , . . . , ym , x1 , . . . xn ),
6 Conclusions
While the classification for CSPs required an essential enhancement of Schae-
fer’s framework and the introduction of new classes of CNF(S)-formulas, for
B-formulas and B-circuits the connectivity issues fit entirely into Post’s frame-
work, although the proofs were quite novel, and made substantial use of Gopalan
et al.’s results for 3-CNF-formulas.
As Gopalan et al. stated, we also believe that “connectivity properties of
Boolean satisfiability merit study in their own right”, which is substantiated by
the recent interest in reconfiguration problems. Moreover, we imagine our results
could aid the advancement of circuit based SAT solvers.
References
1. Böhler, E., Creignou, N., Reith, S., Vollmer, H.: Playing with Boolean blocks, part
I: Post's lattice with applications to complexity theory. SIGACT News (2003)
2. Fu, Z., Malik, S.: Extracting logic circuit structure from conjunctive normal form
descriptions. In: 20th International Conference on VLSI Design, Held Jointly with
6th International Conference on Embedded Systems, pp. 37–42. IEEE (2007)
3. Gopalan, P., Kolaitis, P.G., Maneva, E., Papadimitriou, C.H.: The connectivity of
boolean satisfiability: Computational and structural dichotomies. SIAM J. Com-
put. 38(6), 2330–2355 (2009), http://dx.doi.org/10.1137/07070440X
4. Ito, T., Demaine, E.D., Harvey, N.J.A., Papadimitriou, C.H., Sideri, M., Uehara,
R., Uno, Y.: On the complexity of reconfiguration problems. Theor. Comput.
Sci. 412(12-14), 1054–1065 (2011),
http://dx.doi.org/10.1016/j.tcs.2010.12.005
5. Kamiński, M., Medvedev, P., Milanič, M.: Shortest paths between shortest paths
and independent sets. In: Iliopoulos, C.S., Smyth, W.F. (eds.) IWOCA 2010. LNCS,
vol. 6460, pp. 56–67. Springer, Heidelberg (2011)
6. Lewis, H.R.: Satisfiability problems for propositional calculi. Mathematical Sys-
tems Theory 13(1), 45–53 (1979)
7. Makino, K., Tamaki, S., Yamamoto, M.: On the Boolean connectivity problem
for Horn relations. In: Marques-Silva, J., Sakallah, K.A. (eds.) SAT 2007. LNCS,
vol. 4501, pp. 187–200. Springer, Heidelberg (2007)
8. Maneva, E., Mossel, E., Wainwright, M.J.: A new look at survey propagation and
its generalizations. Journal of the ACM (JACM) 54(4), 17 (2007)
9. Mézard, M., Mora, T., Zecchina, R.: Clustering of solutions in the random satisfi-
ability problem. Physical Review Letters 94(19), 197205 (2005)
10. Thomas, M.: On the applicability of Post's lattice. Information Processing Let-
ters 112(10), 386–391 (2012)
11. Post, E.L.: The Two-Valued Iterative Systems of Mathematical Logic (AM-5),
vol. 5. Princeton University Press (1941)
12. Reith, S., Wagner, K.W.: The complexity of problems defined by Boolean circuits
(2000)
13. Schaefer, T.J.: The complexity of satisfiability problems. In: STOC 1978,
pp. 216–226 (1978)
14. Schnoor, H.: Algebraic techniques for satisfiability problems. Ph.D. thesis, Univer-
sität Hannover (2007)
15. Schwerdtfeger, K.W.: A computational trichotomy for connectivity of boolean satis-
fiability. ArXiv CoRR abs/1312.4524 (2013), extended version of a paper submitted
to the JSAT Journal, http://arxiv.org/abs/1312.4524
16. Schwerdtfeger, K.W.: The connectivity of boolean satisfiability: Dichotomies for
formulas and circuits. ArXiv CoRR abs/1312.6679 (2013), extended version of this
paper, http://arxiv.org/abs/1312.6679
17. Vollmer, H.: Introduction to Circuit Complexity: A Uniform Approach. Springer-
Verlag New York, Inc. (1999)
18. Wu, C.A., Lin, T.H., Lee, C.C., Huang, C.Y.R.: QuteSAT: a robust circuit-based
SAT solver for complex circuit structure. In: Proceedings of the Conference on Design,
Automation and Test in Europe, EDA Consortium, pp. 1313–1318 (2007)
19. Zverovich, I.E.: Characterizations of closed classes of boolean functions in terms of
forbidden subfunctions and post classes. Discrete Appl. Math. 149(1-3), 200–218
(2005), http://dx.doi.org/10.1016/j.dam.2004.06.028
Randomized Communication Complexity
of Approximating Kolmogorov Complexity
Nikolay Vereshchagin
1 Introduction
E.A. Hirsch et al. (Eds.): CSR 2014, LNCS 8476, pp. 365–374, 2014.
c Springer International Publishing Switzerland 2014
366 N. Vereshchagin
As Bob can find C(y) privately¹ and transmit it to Alice in log n bits, approxi-
mating C(x, y) and C(x|y) with more than logarithmic additive error are almost
¹ Although C(y) is not computable, Bob can do that, as we are using a non-uniform
model of computation where the parties can just hard-wire a table containing C(x)
for all x of length up to 2n.
2 Preliminaries
All logarithms in this paper are to the base 2.
which is almost obvious: a short program to print the pair (x, y) can be obtained
by concatenating the shortest program to print x, encoded in a prefix-free way
(the easiest prefix-free encoding doubles the length), with the shortest program to
print y. Finally, we will implicitly use the fact that algorithmic transformations
do not increase complexity: C(A(x)) ≤ C(x) + O(1) for every algorithm A and
all x (the constant O(1) depends on A but not on x).
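The length-doubling prefix-free code mentioned above can be made concrete: write every bit twice and terminate with the pair 01, so a decoder always knows where the first program ends. The encoding and helper names are illustrative, not from the paper:

```python
def prefix_free(x: str) -> str:
    """Double every bit, then mark the end with '01'; no valid codeword
    is a proper prefix of another."""
    return "".join(2 * b for b in x) + "01"

def decode_pair(stream: str):
    """Split an encoded x concatenated with a raw y back into (x, y)."""
    i, x = 0, []
    while stream[i:i + 2] != "01":   # read aligned pairs until the terminator
        x.append(stream[i])
        i += 2
    return "".join(x), stream[i + 2:]

enc = prefix_free("101") + "0011"    # encoded x followed by raw y
print(decode_pair(enc))              # ('101', '0011')
```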
The depth of the protocol tree is the worst-case length of communication
according to the protocol.
We will consider also randomized communication protocols. A randomized
communication protocol of depth d with public randomness is a probability
distribution P over deterministic communication protocols of depth d. We say
that a randomized protocol P computes a function f with success probability p
if for all input pairs (x, y) the protocol P drawn at random with respect to P
computes f (x, y) with probability at least p.
3 Results
3.1 Deterministic Protocols
Theorem 1. If a deterministic protocol P computes C(x|y) with additive error
α then its depth d is at least n − 2α − O(1).
Proof. Indeed, let P(x, y) denote the output of P for the input pair (x, y). The
protocol P defines a partition of the set {0, 1}^n × {0, 1}^n into at most 2^d rect-
angles³ such that P(x, y) is constant on every rectangle from the partition [5].
Let (y, y) be a diagonal input pair, A × B the rectangle in the partition
containing it, and k the value of P on that rectangle. As C(y|y) = O(1), we have
k ≤ α + O(1). Since the rectangle A × B includes A × {y}, we have C(x|y) ≤
2α + O(1) for all x ∈ A, which implies that |A| ≤ 2^{2α+O(1)}. Hence the number
of diagonal pairs (y′, y′) in A × B is at most 2^{2α+O(1)}. As the total number of
diagonal pairs is 2^n, it follows that the partition should have at least 2^{n−2α−O(1)}
rectangles, hence d ≥ n − 2α − O(1).
Theorem 2 ([4]). Assume that a randomized r-round protocol with shared ran-
domness for every (x, y) ∈ {0, 1}^n × {0, 1}^n communicates at most d bits and with
probability at least p > 1/2 produces a number k such that k ≤ C(x|y) < k + α.
Then d ≥ Ω((n/α)^{1/r}). The constant in the Ω-notation depends on r and p.
³ A rectangle is a set of the form A × B.
where k stands for the value of P(x, y) on the rectangle. By the assumption this
contribution is at most ε|R| + δ. Summing up the contributions of all rectangles
we obtain the upper bound ε·2^{2n} + δ·2^d.
On the top level the construction of μ is the following. For some integer l ≤ n,
we construct a family of l distributions μi, where i = 2n − l + 1, . . . , 2n, with the
following properties:
(1) |C(x, y|n) − i| = O(1) for all pairs (x, y) in the support of μi;
(2) μi(R) ≤ ε′|R| + δ′ for every rectangle R ⊆ {0, 1}^n × {0, 1}^n.
Then we will let μ be the arithmetic mean of the μi. Properties (1) and (2)
imply that the assumptions of Lemma 1 are fulfilled for
ε = O(ε′/l) and δ = O(δ′/l)
(for the function f(x, y) = C(x, y|n)). Indeed, for any k and for any rectangle R,
only O(1) of the distributions μi put any weight on pairs with C(x, y|n) = k, so
the μ-probability of the set of input pairs (x, y) ∈ R with C(x, y|n) = k is
O((ε′|R| + δ′)/l).
It suffices to construct a large family of distributions such that properties (1)
and (2) hold for small ε′, δ′. To this end we will need the following combinatorial
lemma.
Lemma 2. For every n ≥ 1 and every 3 < i ≤ 2n there is a bipartite graph
Gn,i whose left and right nodes are all binary strings of length n, that has at
least 2^{i−1} and at most 2^{i+1} edges and for every left set A and right set B with
edge can be identified by its (i + 1)-bit index). Remove from the graph all edges
of complexity less than i − 2. The number of removed edges is less than 2^{i−2} and
hence the resulting graph has more than 2^{i−1} − 2^{i−2} = 2^{i−2} edges.
Let μi be the uniform probability distribution over the edges of Gn,i. The first
property holds by construction. Let us show that the second property holds for
some small ε′, δ′ for every rectangle A × B. Assume first that both log |A| and
log |B| are larger than 2n − i + log n + 4 (this bound comes from Lemma 2). The
probability that a random edge from Gn,i falls into A × B is at most the number
of edges in A × B divided by the total number of edges in Gn,i. By Lemma 2
the number of edges in A × B is at most |A × B| · 2^{i−2n+1}, and the number of
edges of Gn,i is at least 2^{i−2}. Hence
μi(A × B) = O(|A × B|/2^{2n}).
Otherwise either |A| or |B| is less than 2^{2n−i+log n+4}, and we use the trivial
upper bound |A × B| ≤ 2^n · 2^{2n−i+log n+4} for the number of edges of Gn,i in
A × B and the inequality i > 2n − l. We have
p ≤ O((1 + 2^{2l+d+log n−n})/l). (2)
By Yao’s principle Equation (2) also holds for the success probability of every
depth d randomized protocol to compute C(x, y|n). Now we have to choose l so
that this inequality yields the best lower bound for d. A simple analysis reveals
that an almost optimal choice of l is such that the exponent in the power of 2
in the right hand side of (2) is 0, that is l = (n − d − log n)/2 (notice that if this
l is negative then there is nothing to prove). Plugging such l in (2), we obtain
p ≤ O(1)/(n − log n − d).
The statement of the theorem easily follows.
⁴ The reader could wonder why we did not let μi be the uniform distribution over
all pairs of Kolmogorov complexity about i. This distribution certainly satisfies the
first property. However, the second property is fulfilled only for ε′ = 2^{−i}, which is
much larger than 2^{−2n}. Indeed, let R = A × A where A is the set of all extensions
of a fixed string of length 2n − i and complexity close to 2n − i. Then the complexity of
almost all pairs in R is close to (2n − i) + (i − n) + (i − n) = i. Hence μi(R) is close
to |R| · 2^{−i}.
as i ≥ 4.
The second requirement states that for all A, B of cardinality at least
2^{2n−i+log n+4} the number of edges in A × B does not exceed twice its expec-
tation. Fix a and b greater than 2^{2n−i+log n+4} ≥ 32. Fix A and B of sizes a, b
respectively. The expected number of edges that connect A and B is ab·2^{i−2n}.
Thus the probability that the number of edges between A and B exceeds its av-
erage two times is at most 2^{−ab·2^{i−2n−2}}. The number of possible A's of size a is at
most 2^{na}. Similarly, the number of possible B's of size b is at most 2^{nb}. By the union
bound, the probability that there are A and B of sizes a, b respectively that
violate the statement of the theorem is at most 2^{nb+na−ba·2^{i−2n−2}}. The exponent
in this formula can be written as the sum of b(n − a·2^{i−2n−3}) and a(n − b·2^{i−2n−3}).
The lower bound for |A|, |B| was chosen so that both terms n − a·2^{i−2n−3} and
n − b·2^{i−2n−3} are less than −n. By the union bound, the probability that there are A
and B that violate the statement of the theorem is at most
Σ_{b,a=32}^{2^n} 2^{−bn−an} = Σ_{b=32}^{2^n} 2^{−bn} · Σ_{a=32}^{2^n} 2^{−an} < 1/2.
⁵ We could use here a weaker bound on large deviations.
References
1. Alon, N., Spencer, J.: The probabilistic method, 2nd edn. John Wiley & Sons (2000)
2. Ambainis, A., Buhrman, H., Gasarch, W.I., Kalyanasundaram, B., Torenvliet, L.:
The communication complexity of enumeration, elimination and selection. Journal
of Computer and System Sciences 63, 148–185 (2001)
3. Buhrman, H., Klauck, H., Vereshchagin, N.K., Vitányi, P.M.B.: Individual commu-
nication complexity. In: Diekert, V., Habib, M. (eds.) STACS 2004. LNCS, vol. 2996,
pp. 19–30. Springer, Heidelberg (2004)
4. Buhrman, H., Koucký, M., Vereshchagin, N.: Randomized Individual Communica-
tion Complexity. In: IEEE Conference on Computational Complexity, pp. 321–331
(2008)
5. Kushilevitz, E., Nisan, N.: Communication Complexity. Cambridge University Press
(1997)
6. Li, M., Vitányi, P.: An Introduction to Kolmogorov Complexity and its Applications.
Springer (1997)
7. Yao, A.C.-C.: Probabilistic computations: Toward a unified measure of complexity.
In: 18th Annual IEEE Symposium on Foundation of Computer Science, pp. 222–227
(1977)
Space Saving by Dynamic Algebraization
1 Introduction
Exact solutions to NP-hard problems typically adopt a branch-and-bound, in-
clusion/exclusion or dynamic programming framework. While algorithms based
on branch-and-bound or inclusion/exclusion techniques [20] have been shown to be
both time and space efficient, one problem with dynamic programming is that for
many NP-hard problems, it requires exponential space to store the computation
table. As in practice programs usually run out of space before they run out of
time [27], an exponential-space algorithm is considered not scalable. Lokshtanov
and Nederlof [18] have recently shown that algebraic tools like the zeta trans-
form and Möbius inversion [22,23] can be used to obtain space efficient dynamic
programming under some circumstances. The idea is sometimes referred to as
the coefficient extraction technique which also appears in [15,16].
The principle of space saving is best illustrated with the better known
Fourier transform. Assume we want to compute a sequence of polynomial addi-
tions and multiplications modulo xn − 1. We can either use a linear amount of
storage and do many complicated convolution operations throughout, or we can
start and end with the Fourier transforms and do the simpler component-wise
Research supported in part by NSF Grant CCF-0964655 and CCF-1320814.
E.A. Hirsch et al. (Eds.): CSR 2014, LNCS 8476, pp. 375–388, 2014.
c Springer International Publishing Switzerland 2014
376 M. Fürer and H. Yu
¹ O∗ notation hides the polynomial factors of the expression.
2 Preliminaries
Lokshtanov and Nederlof [18] introduce algebraic techniques to solve three types
of problems. The first technique is using discrete Fourier transforms (DFT) on
problems of very large domains, e.g., for the subset sum problem. The second
one is using Möbius and zeta transforms when recurrences used in dynamic pro-
gramming can be formulated as subset convolutions, e.g., for the unweighted
Steiner tree problem. The third one is to solve the minimization version of the
second type of problems by combining the above transforms, e.g., for the trav-
eling salesman problem. To the interest of this paper, we explain the techniques
used in the second type of problems.
Given a universe V, let R be a ring and consider functions from 2^V to R.
Denote the collection of such functions by R[2^V]. A singleton f_A[X] is an element
of R[2^V] which is zero unless X = A. The operator ⊕ is pointwise addition
and the operator ⊙ is pointwise multiplication. We first define some useful
algebraic transforms.
The zeta transform of a function f ∈ R[2^V] is defined to be
ζf[Y] = Σ_{X⊆Y} f[X]. (1)
The Möbius transform is the inverse of the zeta transform; the two satisfy the
following relation [22,23]:
(μζf)[Y] = Σ_{X⊆Y} (−1)^{|Y\X|} (ζf)[X] = f[Y]. (2)
The high level idea of [18] is that, rather than directly computing f [V ] by
storing exponentially many intermediate results {f [S]}S⊆V , they compute the
zeta transform of f[S] using only polynomial space. f[V] can be obtained by
Möbius inversion (2) as f[V] = Σ_{X⊆V} (−1)^{|V\X|} (ζf)[X]. Problems which can
be solved in this manner have a common nature: they have recurrences which
can be formulated by subset convolutions. The subset convolution [3] is defined
to be
(f ∗_R g)[X] = Σ_{X′⊆X} f(X′) g(X \ X′). (4)
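The transforms are easy to state in code. A direct (exponential-time) sketch with sets represented as frozensets — helper names are ours, not from [18] — verifies that Möbius inversion recovers f from ζf:

```python
from itertools import combinations

def subsets(V):
    return [frozenset(c) for r in range(len(V) + 1)
            for c in combinations(V, r)]

def zeta(f, V):
    """ζf[Y] = sum of f[X] over all X ⊆ Y."""
    return {Y: sum(v for X, v in f.items() if X <= Y) for Y in subsets(V)}

def mobius(g, V):
    """Inverse transform: (μg)[Y] = sum over X ⊆ Y of (-1)^{|Y \\ X|} g[X]."""
    return {Y: sum((-1) ** len(Y - X) * v for X, v in g.items() if X <= Y)
            for Y in subsets(V)}

V = ("a", "b")
f = {X: 3 * len(X) + 1 for X in subsets(V)}   # arbitrary deterministic values
assert mobius(zeta(f, V), V) == f             # Möbius inversion recovers f
print(zeta(f, V)[frozenset("ab")])            # 1 + 4 + 4 + 7 = 16
```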
To apply the zeta transform to f ∗_R g, we need the union product [3], which
is defined as
(f ∗_u g)[X] = Σ_{X1∪X2=X} f(X1) g(X2). (5)
The relation between the union product and the zeta transform is as follows
[3]:
ζ(f ∗_u g)[X] = ((ζf) ⊙ (ζg))[X]. (6)
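Identity (6) is what makes the union product tractable after transforming; a brute-force check for a two-element universe (helper names are ours):

```python
from itertools import combinations

def subsets(V):
    return [frozenset(c) for r in range(len(V) + 1)
            for c in combinations(V, r)]

def zeta(f, V):
    return {Y: sum(v for X, v in f.items() if X <= Y) for Y in subsets(V)}

def union_product(f, g, V):
    """(f *u g)[X] = sum of f[X1]·g[X2] over all X1 ∪ X2 = X."""
    h = {X: 0 for X in subsets(V)}
    for X1, v1 in f.items():
        for X2, v2 in g.items():
            h[X1 | X2] += v1 * v2
    return h

V = ("a", "b")
f = {X: 3 * len(X) + 1 for X in subsets(V)}
g = {X: len(X) ** 2 + 2 for X in subsets(V)}
lhs = zeta(union_product(f, g, V), V)
rhs = {Y: zeta(f, V)[Y] * zeta(g, V)[Y] for Y in subsets(V)}
assert lhs == rhs        # ζ(f *u g) = ζf ⊙ ζg, pointwise on every Y
```

The identity holds because summing (f ∗u g) over all subsets of Y frees X1 and X2 to range over subsets of Y independently, which factors the double sum into ζf[Y] · ζg[Y].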
In [18], functions over (R[2^V]; ⊕, ∗_R) are modeled by arithmetic circuits. Such
a circuit is a directed acyclic graph where every node is either a singleton (con-
stant gate), a ⊕ gate or a ∗_R gate. Given any circuit C over (R[2^V]; ⊕, ∗_R)
which outputs f, every gate in C computing an output a from its inputs b, c is
replaced by small circuits computing a relaxation {a^i}_{i=0}^{|V|} of a from relaxations
{b^i}_{i=0}^{|V|} and {c^i}_{i=0}^{|V|} of b and c respectively. (A relaxation of a function f ∈ R[2^V]
is a sequence of functions {f^i : f^i ∈ R[2^V], 0 ≤ i ≤ |V|}, such that ∀i, X ⊆ V,
f^i[X] = f[X] if i = |X|, f^i[X] = 0 if i < |X|, and f^i[X] is an arbitrary value if
i > |X|.) For a ⊕ gate, replace a = b ⊕ c by a^i = b^i ⊕ c^i, for 0 ≤ i ≤ |V|. For a
∗_R gate, replace a = b ∗_R c by a^i = Σ_{j=0}^{i} b^j ∗_u c^{i−j}, for 0 ≤ i ≤ |V|. This new
circuit C1 over (R[2^V]; ⊕, ∗_u) is of size O(|C|·|V|) and outputs f^{|V|}[V]. The next
step is to replace every ∗_u gate by a ⊙ gate and every constant gate a by ζa. It
turns C1 into a circuit C2 over (R[2^V]; ⊕, ⊙), such that for every gate a ∈ C1, the
corresponding gate in C2 outputs ζa. Since additions and multiplications in C2
are pointwise, C2 can be viewed as 2^{|V|} disjoint circuits C^Y over (R; +, ·),
one for every subset Y ⊆ V. The circuit C^Y outputs (ζf)[Y]. It is easy to see that
the construction of every C^Y takes polynomial time.
As all problems of interest in this paper work on the integer domain Z, we
consider R = Z and replace ∗_R by ∗ for simplicity. Assuming 0 ≤ f[V] < m for
some integer m, we can view the computation as taking place in the finite ring Z_m.
Additions and multiplications can be implemented efficiently in Z_m (e.g., using
the fast algorithm in [10] for multiplication).
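Putting the pieces together, the relaxation trick computes a subset convolution through ranked zeta transforms and pointwise products. The sketch below (exponential-time, helper names ours) checks the result against the definition (4):

```python
from itertools import combinations

def subsets(V):
    return [frozenset(c) for r in range(len(V) + 1)
            for c in combinations(V, r)]

def zeta(f, V):
    return {Y: sum(v for X, v in f.items() if X <= Y) for Y in subsets(V)}

def mobius(g, V):
    return {Y: sum((-1) ** len(Y - X) * v for X, v in g.items() if X <= Y)
            for Y in subsets(V)}

def subset_convolution(f, g, V):
    """(f * g)[Y] via ranked zeta transforms: restrict f, g to each set size
    (the relaxation), multiply the transforms pointwise, invert, and read off
    the rank-|Y| entry."""
    n = len(V)
    zf = [zeta({X: (v if len(X) == i else 0) for X, v in f.items()}, V)
          for i in range(n + 1)]
    zg = [zeta({X: (v if len(X) == i else 0) for X, v in g.items()}, V)
          for i in range(n + 1)]
    out = {}
    for Y in subsets(V):
        i = len(Y)
        zh = {X: sum(zf[j][X] * zg[i - j][X] for j in range(i + 1))
              for X in subsets(V)}
        out[Y] = mobius(zh, V)[Y]
    return out

V = ("a", "b")
f = {X: len(X) + 1 for X in subsets(V)}
g = {X: 2 * len(X) for X in subsets(V)}
naive = {X: sum(f[X1] * g[X - X1] for X1 in subsets(V) if X1 <= X)
         for X in subsets(V)}
assert subset_convolution(f, g, V) == naive  # matches definition (4)
```

Restricting to ranks that sum to |Y| forces X1 and X2 to be disjoint, which is exactly why the union product can replace the disjoint-union sum of (4).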
is given in [7]. The result has been further improved to O(log k) in [8]. There
are also a series of works studying constant approximation of treewidth k with
running time exponential in k, see [5] and references therein.
To simplify the presentation of dynamic programming based on tree decom-
position, an arbitrary tree decomposition is usually transformed into a nice tree
decomposition which has the following additional properties. A node in a nice
tree decomposition has at most 2 children. Let c be the only child of x or let
c1 , c2 be the two children of x. Any node x in a nice tree decomposition is of one
of the following five types:
1. An introduce vertex node (introduce vertex v), where Bx = Bc ∪ {v}.
2. An introduce edge node (introduce edge e = {u, v}), where u, v ∈ Bx and
Bx = Bc . We say that e is associated with x.
3. A forget vertex node (forget vertex v), where Bx = Bc \ {v}.
4. A join node, where x has two children and Bx = Bc1 = Bc2 .
5. A leaf node, a leaf of T .
For any tree decomposition, a nice tree decomposition with the same treewidth
can be constructed in polynomial time [13]. Notice that an introduce edge node
is not a node type in the standard definition of a nice tree decomposition. We
create an introduce edge node after both endpoints of the edge have been introduced. We
further transform every leaf node and the root into a node with an empty bag by
adding a series of introduce nodes or forget nodes, respectively.
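The five node types admit a compact encoding. The following Python sketch (class and field names are our own, purely illustrative) represents a nice tree decomposition and checks the bag conditions listed above.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One node of a nice tree decomposition; kind is one of 'leaf',
    'introduce_vertex', 'introduce_edge', 'forget', 'join'."""
    kind: str
    bag: frozenset
    children: list = field(default_factory=list)
    vertex: int = None   # for introduce_vertex / forget nodes
    edge: tuple = None   # for introduce_edge nodes

def check_nice(x):
    """Recursively verify the bag conditions of the five node types."""
    if x.kind == "leaf":
        return len(x.children) == 0
    if x.kind == "introduce_vertex":
        (c,) = x.children
        return x.bag == c.bag | {x.vertex} and check_nice(c)
    if x.kind == "introduce_edge":
        (c,) = x.children
        u, v = x.edge
        return x.bag == c.bag and u in x.bag and v in x.bag and check_nice(c)
    if x.kind == "forget":
        (c,) = x.children
        return x.bag == c.bag - {x.vertex} and check_nice(c)
    if x.kind == "join":
        c1, c2 = x.children
        return x.bag == c1.bag == c2.bag and check_nice(c1) and check_nice(c2)
    return False
```

For instance, the decomposition of a single edge {1, 2} (empty leaf, introduce 1, introduce 2, introduce the edge, forget 1, forget 2) passes this check.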
3 Algorithmic Framework
We explain the algorithmic framework using the problem of counting perfect
matchings on a tree decomposition as a running example to make the
recurrences concrete. The results can easily be applied to other problems. A perfect
matching in a graph G = (V, E) is a collection of |V|/2 edges such that every vertex
of G belongs to exactly one of these edges.
Consider a connected graph G and a nice tree decomposition T of treewidth k
on G. Consider a function f ∈ Z[2V ]. Assume that the recurrence for computing
f on a join node can be formulated as a subset convolution, while on other types
of tree nodes it is an addition or subtraction. We explain how to efficiently evaluate f[V] on a nice tree decomposition by dynamic programming in polynomial
space. Let Tx be the subtree rooted at x, and let Vx be the set of vertices contained in bags
associated with nodes of Tx but not in Bx. For any X ⊆ Bx, let YX be the
union of X and Vx, and let fx[X] be the number of perfect matchings in the subgraph
induced by YX using only edges introduced in Tx. As in the construction of
Theorem 1, we first replace fx by a relaxation {fx^i}_{0≤i≤k+1} of fx, where k is the
treewidth. We then compute the zeta transform of fx^i, for 0 ≤ i ≤ k + 1. In what
follows, we present the recurrences for fx itself for all node types
except the join node, where the relaxations are needed. The recurrences for
fx in terms of fc carry over directly to their relaxations with the same index,
as in Theorem 1.
380 M. Fürer and H. Yu
For any leaf node x, (ζfx )[∅] = fx [∅] is a problem-dependent constant. In the
case of the number of perfect matchings, fx[∅] = 1. For the root x, (ζfx)[∅] =
fx[∅] = f[V], which is the value of interest. For the other cases, consider an
arbitrary subset X ⊆ Bx.
1. x is an introduce vertex node. If the introduced vertex v is not in X,
fx[X] = fc[X]. If v ∈ X, then in the case of the number of perfect matchings, v has no
adjacent edges, hence fx[X] = 0 (for other problems, fx[X] may equal fc[X],
which implies a similar recurrence). By the definition of the zeta transform, if v ∈ X,
we have

(ζfx)[X] = Σ_{X′⊆X, v∈X′} fx[X′] + Σ_{X′⊆X, v∉X′} fx[X′] = Σ_{X′⊆X, v∉X′} fc[X′] = (ζfc)[X \ {v}].

Therefore,

(ζfx)[X] = { (ζfc)[X]          if v ∉ X
           { (ζfc)[X \ {v}]    if v ∈ X                            (7)
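Recurrence (7) translates directly into code on the zeta-transform side. A hypothetical Python sketch (our own naming; bags are bitmasks, and `zfc` maps each subset X of the child's bag to (ζfc)[X]), assuming, as for counting perfect matchings, that fx[X] = 0 whenever v ∈ X:

```python
def submasks(mask):
    """All submasks of mask, including 0 and mask itself."""
    S = mask
    while True:
        yield S
        if S == 0:
            return
        S = (S - 1) & mask

def zeta_introduce_vertex(zfc, bag_bits, v_bit):
    """Recurrence (7) at an introduce vertex node: (zeta fx)[X] equals
    (zeta fc)[X] if v is not in X, and (zeta fc)[X \ {v}] if v is in X.
    Both cases are captured by clearing v's bit."""
    return {X: zfc[X & ~v_bit] for X in submasks(bag_bits)}
```

Note that no arithmetic is performed at all; the transform is merely re-indexed.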
For the problem of counting perfect matchings, it is easy to verify that fx[X]
can be computed using (9). Let fx^i = Σ_{j=0..i} fc1^j ∗u fc2^{i−j}. We can transform the
computation to

(ζfx^i)[X] = Σ_{j=0..i} (ζfc1^j)[X] · (ζfc2^{i−j})[X], for 0 ≤ i ≤ k + 1.          (10)
Hence,

(ζfx)[X] = { (ζfc)[X]                          if e ⊄ X
           { (ζfc)[X] + (ζfc)[X \ {u, v}]      if e ⊆ X            (11)
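Recurrence (10) says that a join node combines the ranked zeta transforms of its two children by a pointwise product, convolved over the rank index. A small Python sketch under our own conventions (`zfc1[i]` maps each bag subset X, encoded as a bitmask, to (ζfc1^i)[X]):

```python
def zeta_join(zfc1, zfc2, k):
    """Recurrence (10) at a join node: for each rank 0 <= i <= k+1,
    (zeta fx^i)[X] = sum over j of (zeta fc1^j)[X] * (zeta fc2^{i-j})[X].
    zfc1 and zfc2 are lists of dicts over the same bag subsets."""
    return [
        {X: sum(zfc1[j][X] * zfc2[i - j][X] for j in range(i + 1))
         for X in zfc1[0]}
        for i in range(k + 2)   # ranks 0 .. k+1
    ]
```

This is the only node type where the relaxations are genuinely needed; all other node types act on each rank independently.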
In cases 2 and 4, we see that the value of (ζfx )[X] depends on the values
of ζfc on two different subsets. We can visualize the computation along a path
from a leaf to the root as a computation tree. This computation tree branches
on introduce edge nodes and forget vertex nodes. Suppose along any path from
the root to a leaf in T , the maximum number of introduce edge nodes is m and
the maximum number of forget vertex nodes is h. To avoid exponentially large
storage for keeping partial results in this computation tree, we compute along
every path from a leaf to the root in this tree. This leads to an increase of the
running time by a factor of O(2^{m+h}), but the computation is in polynomial space
(explained in detail later). As m could be Ω(n), this could contribute a factor
of 2^{Ω(n)} to the time complexity. To reduce the running time, we eliminate the
branching introduced by introduce edge nodes. On the other hand, the branching
introduced by forget vertex nodes seems inevitable.
For any introduce edge node x which introduces an edge e and has a child c
in the original nice tree decomposition T, we add an auxiliary child c′ of x such
that Bc′ = Bx and introduce the edge e at c′. The node c′ is a special leaf whose bag is not
empty. We assume the evaluation of ζf on c′ takes only polynomial time. For
the counting perfect matchings problem, fc′[X] = 1 only when X = e or X = ∅,
and otherwise it equals 0. Then (ζfc′)[X] = 2 if e ⊆ X, and otherwise (ζfc′)[X] = 1.
We will verify that this assumption is valid for the other problems considered in the
following sections. We call x a modified introduce edge node and c′ an auxiliary
leaf. As the computation on x is the same as that on a join node, we do not discuss
the computation on modified introduce edge nodes separately.
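For counting perfect matchings, the zeta transform at an auxiliary leaf needs no computation at all; it can be written down in closed form. A minimal sketch (bitmask encoding, our own naming):

```python
def submasks(mask):
    """All submasks of mask, including 0 and mask itself."""
    S = mask
    while True:
        yield S
        if S == 0:
            return

def zeta_aux_leaf(bag_bits, e_bits):
    """At an auxiliary leaf c' introducing edge e: f_{c'}[X] = 1 iff
    X = e or X is empty, hence (zeta f_{c'})[X] = 2 if e is a subset
    of X, and 1 otherwise."""
    return {X: 2 if X & e_bits == e_bits else 1 for X in submasks(bag_bits)}
```

This is exactly the polynomial-time evaluability assumed of auxiliary leaves.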
In cases 1 and 2, we observe that the addition operation is not a strictly
pointwise addition as in Theorem 1. This is because in a tree decomposition,
the set of vertices on every tree node might not be the same. However, there
is a one-to-one correspondence between a set X in node x and a set X′ in its child
c. We call this operation a relaxed pointwise addition and denote it by ⊕′. Hence, f can
be evaluated by a circuit C over (Z[2^V]; ⊕′, ∗). We transform C into a circuit C1
over (Z[2^V]; ⊕′, ∗u), and then into C2 over (Z[2^V]; ⊕′, ⊙), following the constructions in
Theorem 1.
In Theorem 1, C2 can be viewed as 2^{|V|} disjoint circuits. In the case of tree
decomposition, the computation branches on forget nodes. Therefore, we
cannot view C2 as O(2^k) disjoint circuits. Consider a subtree Tx of T whose
root x is the only join node in the subtree. Take an arbitrary path from x to a
leaf l and assume there are h forget nodes along this path. We compute along
every path of the computation tree expanded by the path from x to l, and sum
up the results at the top. There are 2^h computation paths, which are independent.
Hence we can view the computation as 2^h disjoint circuits over (Z; +, ·). Assuming
the maximum number of forget nodes along any path from the root x to a leaf
in Tx is h and there are n_l leaves, the total computation takes at most n_l · 2^h
time and polynomial space.
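To make the path-by-path evaluation concrete, here is a hypothetical Python sketch that evaluates (ζfx)[X] top-down for the perfect-matching recurrences on a decomposition without join nodes. Only the current recursion stack is stored (polynomial space); each forget node, and each unmodified introduce edge node with e ⊆ X, doubles the number of paths explored. The class, field names, and the forget rule (ζfx)[X] = (ζfc)[X ∪ {v}] − (ζfc)[X] are our own reconstruction from the definition of ζ.

```python
class N:
    """A tree node: kind in {'leaf', 'introduce_vertex', 'introduce_edge',
    'forget'}; vbit/ebits are bitmask encodings of the vertex/edge."""
    def __init__(self, kind, children=(), vbit=0, ebits=0):
        self.kind, self.children = kind, list(children)
        self.vbit, self.ebits = vbit, ebits

def evaluate(x, X):
    """(zeta f_x)[X], evaluated along computation paths."""
    if x.kind == "leaf":
        return 1                                   # (zeta f_x)[{}] = 1
    c = x.children[0]
    if x.kind == "introduce_vertex":               # recurrence (7)
        return evaluate(c, X & ~x.vbit)
    if x.kind == "introduce_edge":                 # recurrence (11)
        if X & x.ebits == x.ebits:
            return evaluate(c, X) + evaluate(c, X & ~x.ebits)
        return evaluate(c, X)
    if x.kind == "forget":                         # two subsets: branches
        return evaluate(c, X | x.vbit) - evaluate(c, X)
    raise ValueError("join nodes require the ranked transforms of (10)")
```

On the decomposition of a single edge {u, v} (leaf, introduce u, introduce v, introduce the edge, forget u, forget v), evaluating the root at X = ∅ yields 1, the number of perfect matchings.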
To apply Algorithm 1 to the problem of counting perfect matchings, we verify
that f[S] ≤ (|E| choose |V|/2) ≤ |E|^{|V|/2} and that all constants are singletons.
To the best of our knowledge, there is no rigorous time complexity analysis in the
literature for counting perfect matchings in grids. To demonstrate
the efficiency of Algorithm 1, we compare it to three other natural algorithms.
1. Dynamic programming based on path decomposition. A path decomposition is a special tree decomposition where the underlying tree is a path. A
path decomposition of width 2n^{d−1} is obtained by putting all vertices with
x1-coordinate equal to j and j + 1 into the bag of node j, for j = 0, 1, ..., n − 1.
A path decomposition with a smaller pathwidth of n^{d−1} can be obtained as follows. Construct n nodes p0, p1, ..., pn−1, where pj is associated with the bag of vertices with
x1-coordinate equal to j, for j = 0, 1, ..., n − 1. For any pair pj, pj+1, starting from pj,
add a sequence of nodes by alternating between adding a vertex with x1 = j + 1
and deleting its neighbor with x1 = j. The number of nodes increases by a factor
of n^{d−1} compared to the first path decomposition. We run the standard dynamic
programming on the second path decomposition. This algorithm runs in time
O*(2^{n^{d−1}}); however, the space complexity is also O*(2^{n^{d−1}}). It is no surprise that
it has a better running time than Algorithm 1, due to the extra space usage. We
remark that van Rooij et al. [25] give a dynamic programming algorithm for the
counting perfect matchings problem on any tree decomposition of treewidth k
with running time O*(2^k) and space exponential in k.
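For d = 2, the second construction is easy to write down explicitly. The following Python sketch (our own code, purely illustrative) generates the bags of the pathwidth-n decomposition of the n × n grid by alternating additions and deletions between consecutive columns:

```python
def grid_path_decomposition(n):
    """Bags of the width-n path decomposition of the n x n grid described
    above: start from column 0, then alternately introduce a vertex of
    column j+1 and forget its neighbour in column j. Vertices are (column,
    row) pairs; every bag has at most n + 1 vertices."""
    bags = []
    bag = [(0, r) for r in range(n)]
    bags.append(list(bag))
    for j in range(n - 1):
        for r in range(n):
            bag.append((j + 1, r))   # introduce vertex of column j+1
            bags.append(list(bag))
            bag.remove((j, r))       # forget its neighbour in column j
            bags.append(list(bag))
    return bags
```

Consecutive bags differ in exactly one vertex, so the sequence corresponds to a chain of introduce and forget nodes.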
2. Dynamic programming based on path decomposition on a subgrid.
One way to obtain a polynomial space dynamic programming is to construct a
low pathwidth decomposition on a sufficiently large subgraph. One can then run
dynamic programming on this path decomposition and do an exhaustive enumeration on the remaining graph in a similar way as in [2]. To extract from Gd(n)
a subgrid of pathwidth O(log n) (notice that this is the maximum pathwidth for
a polynomial-space dynamic programming algorithm), we can delete a portion
of the vertices of Gd(n) to turn the "cube"-shaped grid into a long "stripe" with
O(log n) cross-section area. It is sufficient to remove O(n^d / (log n)^{1/(d−1)}) vertices.
This leads to a polynomial-space algorithm with running time 2^{O(n^d / (log n)^{1/(d−1)})},
which is worse than Algorithm 1.
3. Branching algorithm. A naive branching algorithm starting from an arbitrary
vertex of the grid could have time complexity 2^{O(n^d)} in the worst case. We analyze
a branching algorithm with a careful selection of the starting point. The branching
algorithm works by first finding a balanced separator S and partitioning the
graph into A ∪ S ∪ B. The algorithm enumerates every subset X ⊆ S. A vertex in
X is either matched to vertices in A or to vertices in B, while the vertices in S \ X are
matched within S. Then the algorithm recurses on A and B. Let Td(n) be the
running time of this branching algorithm on Gd(n). We use the same balanced
separator as in Algorithm 2. We obtain the upper bound on the running time
Td(n) ≤ 2 Td((n − |S|)/2) Σ_{X⊆S} 2^{|X|} T_{d−1}(|S \ X|). We can use any polynomial-space
algorithm to count perfect matchings on S \ X. For example, using Algorithm 1,
since the separator is of size O(n^{d−1}), we have T_{d−1}(|S \ X|) = 2^{O(n^{d−2})}.
Therefore, Td(n) ≤ 2 Td(n/2) · 2^{o(n^{d−1})} Σ_{i=0..|S|} (|S| choose i) 2^i = 2 Td(n/2) · 2^{o(n^{d−1})} 3^{|S|}. We
get Td(n) = O*(3^h), i.e., O*(3^{(2^{d−1}/(2^{d−1}−1)) n^{d−1}}), which is worse than Algorithm 1. We
remark that this branching algorithm can be viewed as a divide-and-conquer
algorithm on a balanced tree decomposition, similar to [17].
4.2 Extensions
References
1. Björklund, A.: Counting perfect matchings as fast as Ryser. In: SODA, pp. 914–921
(2012)
2. Björklund, A., Husfeldt, T.: Exact algorithms for exact satisfiability and number
of perfect matchings. Algorithmica 52(2), 226–249 (2008)
3. Björklund, A., Husfeldt, T., Kaski, P., Koivisto, M.: Fourier meets Möbius: fast
subset convolution. In: STOC, pp. 67–74 (2007)
4. Bodlaender, H.L.: Dynamic programming on graphs with bounded treewidth. In:
Lepistö, T., Salomaa, A. (eds.) ICALP 1988. LNCS, vol. 317, pp. 105–118. Springer,
Heidelberg (1988)
5. Bodlaender, H.L.: A linear time algorithm for finding tree-decompositions of small
treewidth. In: STOC, pp. 226–234 (1993)
6. Bodlaender, H.L.: Discovering treewidth. In: Vojtáš, P., Bieliková, M., Charron-
Bost, B., Sýkora, O. (eds.) SOFSEM 2005. LNCS, vol. 3381, pp. 1–16. Springer,
Heidelberg (2005)
7. Bodlaender, H.L., Gilbert, J.R., Kloks, T., Hafsteinsson, H.: Approximating
treewidth, pathwidth, and minimum elimination tree height. In: Schmidt, G.,
Berghammer, R. (eds.) WG 1991. LNCS, vol. 570, pp. 1–12. Springer, Heidelberg
(1992)
8. Bouchitté, V., Kratsch, D., Müller, H., Todinca, I.: On treewidth approximations.
Discrete Appl. Math. 136(2-3), 183–196 (2004)
9. Fomin, F.V., Gaspers, S., Saurabh, S., Stepanov, A.A.: On two techniques of com-
bining branching and treewidth. Algorithmica 54(2), 181–207 (2009)
10. Fürer, M.: Faster integer multiplication. SIAM J. Comput. 39(3), 979–1005 (2009)
11. Gottlob, G., Leone, N., Scarcello, F.: Hypertree decompositions: A survey. In: Sgall,
J., Pultr, A., Kolman, P. (eds.) MFCS 2001. LNCS, vol. 2136, pp. 37–57. Springer,
Heidelberg (2001)
12. Kenyon, C., Randall, D., Sinclair, A.: Approximating the number of monomer-
dimer coverings of a lattice. J. Stat. Phys. 83 (1996)
13. Kloks, T. (ed.): Treewidth. LNCS, vol. 842. Springer, Heidelberg (1994)
14. Kneis, J., Mölle, D., Richter, S., Rossmanith, P.: A bound on the pathwidth of
sparse graphs with applications to exact algorithms. SIAM J. Discret. Math. 23(1),
407–427 (2009)
15. Koutis, I.: Faster algebraic algorithms for path and packing problems. In: ICALP,
pp. 575–586 (2008)
16. Koutis, I., Williams, R.: Limits and applications of group algebras for parameter-
ized problems. In: ICALP, pp. 653–664 (2009)
17. Lokshtanov, D., Mnich, M., Saurabh, S.: Planar k-path in subexponential time and
polynomial space. In: Kolman, P., Kratochvı́l, J. (eds.) WG 2011. LNCS, vol. 6986,
pp. 262–270. Springer, Heidelberg (2011)
18. Lokshtanov, D., Nederlof, J.: Saving space by algebraization. In: STOC, pp. 321–
330 (2010)
19. Miller, G.L., Teng, S.-H., Thurston, W., Vavasis, S.A.: Separators for sphere-
packings and nearest neighbor graphs. J. ACM 44(1), 1–29 (1997)
20. Nederlof, J.: Fast polynomial-space algorithms using inclusion-exclusion. Algorith-
mica 65(4), 868–884 (2013)
21. Nešetřil, J., de Mendez, P.O.: Tree-depth, subgraph coloring and homomorphism
bounds. Eur. J. Comb. 27(6), 1022–1041 (2006)
22. Rota, G.-C.: On the foundations of combinatorial theory I. Theory of Möbius functions. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 2(4), 340–368
(1964)
23. Stanley, R.P., Rota, G.C.: Enumerative Combinatorics, vol. 1. Cambridge Univer-
sity Press (2000)
24. Temperley, H.N.V., Fisher, M.: Dimer problem in statistical mechanics - an exact
result. Philosophical Magazine 6, 1061–1063 (1961)
25. van Rooij, J.M.M., Bodlaender, H.L., Rossmanith, P.: Dynamic programming on
tree decompositions using generalised fast subset convolution. In: Fiat, A., Sanders,
P. (eds.) ESA 2009. LNCS, vol. 5757, pp. 566–577. Springer, Heidelberg (2009)
26. van Rooij, J.M.M., Nederlof, J., van Dijk, T.C.: Inclusion/Exclusion meets measure
and conquer. In: Fiat, A., Sanders, P. (eds.) ESA 2009. LNCS, vol. 5757, pp. 554–
565. Springer, Heidelberg (2009)
27. Woeginger, G.J.: Space and time complexity of exact algorithms: Some open prob-
lems (invited talk). In: 1st International Workshop on Parameterized and Exact
Computation, pp. 281–290 (2004)