Lecture Notes in Artificial Intelligence 12236
Series Editors
Randy Goebel
University of Alberta, Edmonton, Canada
Yuzuru Tanaka
Hokkaido University, Sapporo, Japan
Wolfgang Wahlster
DFKI and Saarland University, Saarbrücken, Germany
Founding Editor
Jörg Siekmann
DFKI and Saarland University, Saarbrücken, Germany
More information about this series at http://www.springer.com/series/1244
Christoph Benzmüller · Bruce Miller (Eds.)
Intelligent Computer Mathematics
13th International Conference, CICM 2020
Bertinoro, Italy, July 26–31, 2020
Proceedings
Editors

Christoph Benzmüller
Department of Mathematics and Computer Science
Freie Universität Berlin
Berlin, Germany

Bruce Miller
National Institute of Standards and Technology
Gaithersburg, MD, USA
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Program Committee
Akiko Aizawa University of Tokyo, Japan
David Aspinall University of Edinburgh, UK
Frédéric Blanqui INRIA, France
Jacques Carette McMaster University, Canada
James H. Davenport University of Bath, UK
William Farmer McMaster University, Canada
Jacques Fleuriot University of Edinburgh, UK
Osman Hasan NUST, Pakistan
Jan Jakubuv Czech Technical University, Czech Republic
Mateja Jamnik University of Cambridge, UK
Cezary Kaliszyk University of Innsbruck, Austria
Fairouz Kamareddine Heriot-Watt University, UK
Manfred Kerber University of Birmingham, UK
Andrea Kohlhase University of Applied Sciences Neu-Ulm, Germany
Michael Kohlhase FAU Erlangen-Nürnberg, Germany
Laura Kovacs TU Vienna, Austria
Temur Kutsia JKU Linz, Austria
Adam Naumowicz University of Bialystok, Poland
Karol Pak University of Bialystok, Poland
Florian Rabe FAU Erlangen-Nürnberg, Germany, and LRI Paris, France
Moritz Schubotz FIZ Karlsruhe, Germany
Volker Sorge University of Birmingham, UK
Geoff Sutcliffe University of Miami, USA
Olaf Teschke FIZ Karlsruhe, Germany
Josef Urban Czech Technical University, Czech Republic
Makarius Wenzel sketis.net, Germany
Abdou Youssef George Washington University, USA
Invited Talks
A Promising Path Towards Autoformalization and General AI

Christian Szegedy
1 Introduction
Today, AI systems are able to learn to solve tasks that until recently were thought to require uniquely human capabilities: computer vision [46], generating artistic images [13] and music [21], mastering the game of go [43], discovering novel drugs [15], and performing symbolic integration [31], to name just a few. These and many other domains seemed to require uniquely human intuition and insight, but were transformed by deep learning in the past few years. While progress has been extremely impressive in those areas, each particular solution addresses a relatively narrow use case. On the other hand, general reasoning still seems a uniquely human feat, and many [20] would argue that creating AI agents with general reasoning capabilities equal to those of humans would take decades, maybe centuries, if it is possible at all.
This invited paper argues that in the coming years we will see automated systems that rival humans in general reasoning, and that the fastest path to achieve this is to create automated mathematical reasoning systems via autoformalization. Here, I give an overview of the hurdles involved, a realistic path ahead, and indications of the feasibility of that path.
Mathematics is the discipline of pure reasoning. Mathematical reasoning is not about mathematics per se; it is about reasoning in general. Whether to verify
© Springer Nature Switzerland AG 2020
C. Benzmüller and B. Miller (Eds.): CICM 2020, LNAI 12236, pp. 3–20, 2020.
https://doi.org/10.1007/978-3-030-53518-6_1
2 What is (Auto-)formalization?
The task of formalization is to turn informal descriptions into some formally
correct and automatically checkable format. Examples of mathematical formal-
ization include the formal proofs of the Kepler conjecture [22], the Four-Color
theorem [16] and the Feit-Thompson theorem [17]. These formalization works
required a lot of human effort. For example, the formalization of the Kepler conjecture took over 20 man-years of work. The aim of autoformalization would be
to automate such efforts and scale them up to process large chunks of existing
mathematics in a fully automated manner.
More generally, “formalization” can refer to any process that takes an informal description as input and produces machine-executable code. By this definition, formalization covers both programming and mathematical formalization. This generalized notion is also justified because computer-verifiable proofs are in fact programs fed to some minimalistic verification kernel. For example, most proof assistants are complete programming languages that allow for running arbitrary programs while guaranteeing the correctness of the produced proofs.
Complex mathematics is especially time-consuming for humans to formalize. Therefore, it is highly unlikely that a significant portion of mathematics will be formalized manually in the coming decades. Could formalization ever be automated completely? The ideal solution could process natural language text fully automatically, with minimal intervention from the user.
We call an automated system that is capable of automatically formalizing
significant portions of mathematics from a natural language input and verifying
it automatically an autoformalization system.
domain level knowledge, the system could learn to produce code from natural language input. Such reasoning systems should be able to create the formal specification of the task, the executable code, and the correctness proof of the newly designed algorithm, all at the same time.
Furthermore, this would give rise to strong and flexible general purpose rea-
soning engines that could be integrated into AI applications, combining reasoning
with perception. This could be used to infuse strong reasoning capabilities into
other AI systems and serve as a basis for a wide range of such applications (for
example semantic search, software synthesis and verification, computer aided
design, etc.).
5 Hurdles of Autoformalization
1. The seed formalization system is too weak to initiate a feedback loop that
can open-endedly improve itself.
2. The system might start to generate mistranslations for further training of the
translation model, entering a feedback loop of increasingly worse translations.
3. Translation gets stuck: the system generates a lot of incorrect statements that are never verified, and it stops improving.
[Figure: the system's feedback loop, involving the Formalized Corpus D and exploration.]
The first technical issue concerns the input format of informal mathematical content (i.e. mathematical papers and textbooks). A textual representation could work well for use cases that do not require the understanding of formulas, diagrams, and graphs. However, mathematical content often uses a lot of formulas and diagrams. Geometric illustrations also play a role in informing the reader. The safest path seems to be to rely on images instead of a textual representation. While this puts more burden on the machine learning part of the system, it can reduce the engineering effort significantly.
Let S denote the set of syntactically correct formal mathematical statements in some formalization environment (e.g. HOL Light). By S′, we denote those statements with a formal proof already present in our database. C is the set
For processing the natural language input, our computer vision model aξ
predicts aξ (p) = eθ (t(p)), where t(p) stands for the hypothesized formalization
of page p. Since eθ is assumed to be an embedding model that reflects semantic
similarity, t can be multi-valued, reflecting that there are several correct formal
translations of the same informal statement, the embedding vectors of which are
expected to cluster in Rn .
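As a concrete illustration of this setup, here is a minimal Python sketch of a loss that pulls the page embedding aξ(p) towards the statement embedding eθ(t(p)) on a supervised set of aligned pairs. All names and the training-set layout are our own assumptions; the paper prescribes no implementation.

import numpy as np

def alignment_loss(a_xi, e_theta, aligned_pairs):
    # a_xi: model mapping a page image p to a vector in R^n (being trained).
    # e_theta: embedding model mapping a formal statement to R^n (held fixed).
    # aligned_pairs: a small supervised set of (page, formalization) pairs.
    total = 0.0
    for page, formal_stmt in aligned_pairs:
        predicted = a_xi(page)         # a_xi(p)
        target = e_theta(formal_stmt)  # e_theta(t(p))
        total += float(np.sum((predicted - target) ** 2))
    return total / len(aligned_pairs)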
In order to create a feedback loop between training θ and ξ, we maintain a set of proved theorems, a large set of informal statements P, and a set Tξ = {aξ(p) | p ∈ P} of approximate translations of those statements. To generate the training data, we run guided exploration by sampling forward reasoning steps using another deep neural network gη, starting from our already proved theorems, with the goal of getting close to as many of the approximate translated embeddings in Tξ as possible. For this purpose, η is trained via reinforcement learning in which the reward is based on the negative minimum distance to the closest target embedding vector. The guidance model gη samples both conversions and conversion parameters (“premises” used for the conversion). Note that gη can be trained while circumventing the sparse reward problem: even if we do not get close to any of our original targets, we can pretend that the embedding of the statement we arrived at was our original goal from the start. This idea is known as hindsight experience replay [2].
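The following minimal Python sketch illustrates the two ingredients just described under our own hypothetical data layout: the exploration reward as the negative minimum distance to the target embeddings in Tξ, and hindsight relabeling of a finished episode.

import numpy as np

def exploration_reward(state_embedding, target_embeddings):
    # Reward for guided forward exploration: the negative distance to the
    # closest approximate-translation embedding in T_xi.
    # target_embeddings is an (m, n) array, state_embedding an n-vector.
    dists = np.linalg.norm(target_embeddings - state_embedding, axis=1)
    return -float(dists.min())

def hindsight_relabel(episode):
    # Hindsight experience replay: pretend the embedding we actually reached
    # was the goal from the start, so even a failed episode yields a useful
    # training signal. Each step is a dict carrying its goal embedding.
    reached = episode[-1]["state_embedding"]
    return [dict(step, goal_embedding=reached) for step in episode]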
Once our guided search finds enough statements that match some of the prescribed embeddings in Tξ, we check that they are non-trivially true and use them as verified translations for retraining aξ. As we proceed, we can incrementally train eθ and gη as well. For example, eθ could be trained by analyzing the dependency structure of the explored statements (the tactic parameters that led to the new statement), while gη is trained using reinforcement learning, utilizing the rewards collected during exploration.
The main advantage is that this system is expected to be more robust to errors and incomplete inputs: if exploration is powerful enough, then it can work even if we fail to translate some of the statements properly. Also, if formalization gets stuck, the system can just relax the distance threshold within which it accepts formalization attempts in the embedding space, still producing valid theories that might not exactly correspond to the informal corpus.
Also, the system should be able to generalize to completely new domains more easily, as exploration is more likely to be efficient in the early stages. This can bootstrap the easy parts of the system and prime the translation model and later exploration so that bootstrapping can continue successfully.
For the neural representation of formal content, the network architecture has a significant effect on the performance of the reasoning system. Currently, deep graph embedding networks [40,52] with node-sharing do best; however, transformer networks [51] have recently yielded a breakthrough on symbolic integration [31].
Our main approach is based on forward exploration. Aligning the result of forward exploration with the target statement might require reverse (goal-oriented) proof search, however. As most research is done on reverse proof search, e.g. [4,26,55], integrating with such methods is likely a useful idea and a fruitful engineering direction.
As described in Sect. 6, we need to filter out translation candidates that are incorrect, trivial, or uninteresting. The first criterion is clear: we do not expect wrong statements to be correct formalization candidates. It is harder to discard candidate translations that are trivially true (e.g. due to too general assumptions or other translation errors). These could be identified by observing how hard the statements are to prove. Also, if a statement is overly long, or has a lot of redundant subtrees, then it is highly unlikely to come from formalizing human content. The usefulness of the produced statements should give another strong indication of good translations.
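For illustration only, a heuristic filter along these lines might look as follows in Python. The paper names the criteria but prescribes no thresholds or interfaces, so the statement API and all constants below are assumptions.

def plausible_translation(stmt, proof_effort, max_size=500,
                          max_redundancy=0.5, min_effort=10.0):
    # stmt: a statement syntax tree with size() and subtrees() methods
    # (hypothetical interface); proof_effort: search effort spent proving it.
    if stmt.size() > max_size:
        return False  # overly long: unlikely to come from human content
    subtrees = [repr(t) for t in stmt.subtrees()]
    if subtrees:
        redundancy = 1.0 - len(set(subtrees)) / len(subtrees)
        if redundancy > max_redundancy:
            return False  # many repeated subtrees: likely a translation artifact
    if proof_effort < min_effort:
        return False  # suspiciously easy to prove: probably trivially true
    return True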
Curriculum learning is a promising way of learning to find longer proofs. A remarkable result demonstrating the power of a strong curriculum is [61], in which a reinforcement learning system was trained to find proofs consisting of several thousand elementary proof steps without any search, just by letting the policy network predict them in a single run.
Tactics in proof assistants are subroutines that perform complicated algorithms in order to produce long chains of arguments about the correctness of certain formulas. Examples of existing tactics include the application of SAT solvers or first-order automated provers to prove statements that require simple logical reasoning, but they can be as complex as using Gröbner bases or ILP solvers to reason about polynomial equations or linear systems of Diophantine inequalities. Given the complexity of such algorithms, it is unlikely that one could synthesize a general-purpose computer algebra system from scratch initially. However, the vast majority of sophisticated human mathematics was discovered without the aid of computer programs, so we can hope that matching the performance of human mathematicians could be achieved without synthesizing complicated tactics.
For refutation and counterexample generation, it might be important to find substitutions into statements that provide a refutation of those statements. In general, it is a promising research direction to use deep learning based models to embed not just the syntactic form of formulas, but also some experience stream associated with experimentation with the statements.
One difference between theorem proving and game playing engines is the
much wider breadth of mathematics. For neural network based systems, this
might mean that it could require very large neural networks to distill all the
skills required to cope with all areas of mathematics at once. One could try to cope with that by utilizing mixture-of-experts models [58]. However, their fixed gating mechanism and rigid model architectures are relatively hard to extend.
More flexible are multi-agent architectures using artificial market mechanisms that allow arbitrary agents to bet on the status of mathematical conjectures, while the agents are rewarded for correct predictions, for proving theorems formally, and for introducing interesting new conjectures. This direction opens a large box of interesting mechanism design [39] questions. [12] proposes that a betting market based multi-agent system under resource constraints is useful for assigning consistent probability values to mathematical statements. This could give some theoretical backing and guidance towards such solutions.
The idea of autoformalization was first presented in 1961 by John McCarthy [36].
Another early attempt was the 1990 doctoral thesis of Donald Simon [44]. A
first thorough study was performed in the 2004 doctoral thesis of Claus Zinn [60].
These works did not result in even partially practical solutions.
Josef Urban started to work on the topic in the early 2000s. He devised a
first large scale benchmark for reasoning in large theories [48], motivated by the
insight that reasoning in the presence of a large knowledge base of mathemat-
ical facts is a critical component in any autoformalization system. In 2007, he
published the pioneering MaLARea [50] system for reasoning in large theories.
Since then, he and Cezary Kaliszyk have been spearheading the research on reasoning in large theories and autoformalization [28,29].
9 Indications of Feasibility
Given the great complexity and breadth of the problem, it is justified to ask why autoformalization is even considered a realistic goal in the short term – that is, within years. This section tries to give heuristic arguments for the feasibility of this task by methods that are either known or are on a clear improvement trajectory.
The success of autoformalization hinges on solving two difficult-looking tasks:
The thesis of this paper is that deep learning will enable the advancement of both of those areas to the extent necessary for human-level formalization and reasoning performance in the coming years. Let us review their recent progress separately, with a focus on exploring how they could enable autoformalization.
networks [51] and large scale self-supervised training on vast corpora [9,41,56]. This has spurred fast advances in machine translation and language understanding. On some of the benchmarks, this has resulted in human or close-to-human performance, for example on SQuAD 1.0 [57]. However, this has led to the development of improved benchmarks that target the common weak points of those algorithms. Progress is still strong in this domain: improved model architectures and better tasks on larger corpora have yielded significant gains at a steady pace. By analogy with computer vision, one can also foresee that neural architecture search will give rise to further advances in this field as well. Autoformalization systems can leverage all those advances for stronger translation models from natural language to the embedding space of formal statements.
9.3 Overview
Here is a short overview of the factors that support the potential success of
autoformalization in the coming years:
1. The success of deep learning infused search in two person games, especially
AlphaZero [42] style Monte Carlo tree search [30].
2. The demonstrations of the usefulness of deep learning in automated reasoning: premise selection [1] and proof guidance [4,35,40].
3. The demonstration that automated proof search can be learned without
imitation [3].
4. The fast progress and success of neural architectures for formal and natural
language content, especially graph neural networks [40,52,54] and transform-
ers [51] for symbolic mathematics [31].
5. The success of imposing cyclic translation consistency [59] in image generation and unsupervised translation [32] gives strong indications that autoformalization could be bootstrapped using a very limited set of labeled pairs of formalized theorems.
6. The success of hindsight experience replay [2] to address the sparse reward
problem for robotics applications.
7. The quick pace of progress in natural language processing via large, deep net-
work models, and large scale self-supervised pretraining. Impressive results
in several translation and natural language understanding benchmarks [34].
8. Generative neural models improve at a fast pace and yield impressive results in a wide range of domains, from image generation to drug discovery.
9. Multi-agent systems with agents specialized in different domains [12] could give rise to open-ended self-improvement.
10. Automated optimization of neural architectures via neural architecture
search [47,62] and other automated methods [19].
11. Computational resources available for deep learning purposes are still expanding quickly and getting cheaper. For example, as of July 2019, Google's TPUv3-based pods can deliver over 100 petaFLOPS of performance for deep learning purposes [18].
References
1. Alemi, A.A., Chollet, F., Eén, N., Irving, G., Szegedy, C., Urban, J.: Deepmath -
deep sequence models for premise selection. In: Lee, D.D., Sugiyama, M., von
Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Pro-
cessing Systems 29: Annual Conference on Neural Information Processing Systems
2016, Barcelona, Spain, 5–10 December 2016, pp. 2235–2243 (2016)
2. Andrychowicz, M., et al.: Hindsight experience replay. In: Advances in Neural
Information Processing Systems 30 (NIPS 2017), pp. 5048–5058 (2017)
3. Bansal, K., Loos, S.M., Rabe, M.N., Szegedy, C.: Learning to reason in large the-
ories without imitation. arXiv preprint arXiv:1905.10501 (2019)
4. Bansal, K., Loos, S.M., Rabe, M.N., Szegedy, C., Wilcox, S.: HOList: an envi-
ronment for machine learning of higher-order theorem proving. In: Chaudhuri,
K., Salakhutdinov, R. (eds.) Proceedings of the 36th International Conference on
Machine Learning, ICML 2019, Proceedings of Machine Learning Research, Long
Beach, California, USA, 9–15 June 2019, vol. 97, pp. 454–463. PMLR (2019)
5. Biere, A., Cimatti, A., Clarke, E., Zhu, Y.: Symbolic model checking without
BDDs. In: Cleaveland, W.R. (ed.) TACAS 1999. LNCS, vol. 1579, pp. 193–207.
Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-49059-0_14
6. Blanchette, J.C., Kaliszyk, C., Paulson, L.C., Urban, J.: Hammering towards QED.
J. Formalized Reasoning 9(1), 101–148 (2016)
7. The Coq Proof Assistant. http://coq.inria.fr
8. de Moura, L., Kong, S., Avigad, J., van Doorn, F., von Raumer, J.: The lean
theorem prover (System Description). In: Felty, A.P., Middeldorp, A. (eds.) CADE
2015. LNCS (LNAI), vol. 9195, pp. 378–388. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-21401-6_26
9. Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: BERT: pre-training of deep
bidirectional transformers for language understanding. In: Proceedings of the 2019
Conference of the North American Chapter of the Association for Computational
Linguistics: Human Language Technologies (Long and Short Papers), vol. 1, pp.
4171–4186 (2019)
10. Eén, N., Sörensson, N.: An extensible SAT-solver. In: Giunchiglia, E., Tacchella,
A. (eds.) SAT 2003. LNCS, vol. 2919, pp. 502–518. Springer, Heidelberg (2004).
https://doi.org/10.1007/978-3-540-24605-3_37
11. Fitting, M.: First-order Logic and Automated Theorem Proving. Springer, New
York (2012). https://doi.org/10.1007/978-1-4612-2360-3
12. Garrabrant, S., Benson-Tilsen, T., Critch, A., Soares, N., Taylor, J.: Logical induc-
tion. arXiv preprint arXiv:1609.03543 (2016)
13. Gatys, L.A., Ecker, A.S., Bethge, M.: Image style transfer using convolutional neu-
ral networks. In: 2016 IEEE Conference on Computer Vision and Pattern Recog-
nition, CVPR 2016, Las Vegas, NV, USA, 27–30 June 2016, pp. 2414–2423. IEEE
Computer Society (2016)
14. Gauthier, T., Kaliszyk, C., Urban, J.: TacticToe: learning to reason with HOL4
tactics. In: Eiter, T., Sands, D. (eds.) LPAR-21, 21st International Conference on
Logic for Programming, Artificial Intelligence and Reasoning, Maun, Botswana, 7–
12 May 2017, EPiC Series in Computing, vol. 46, pp. 125–143. EasyChair (2017)
15. Gawehn, E., Hiss, J.A., Schneider, G.: Deep learning in drug discovery. Mol. Inform.
35(1), 3–14 (2016)
16. Gonthier, G.: Formal proof-the four-color theorem. Not. AMS 55(11), 1382–1393
(2008)
18 C. Szegedy
17. Gonthier, G., et al.: A machine-checked proof of the odd order theorem. In: Blazy,
S., Paulin-Mohring, C., Pichardie, D. (eds.) ITP 2013. LNCS, vol. 7998, pp. 163–
179. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-39634-2_14
18. Google’s scalable supercomputers for machine learning, Cloud TPU Pods, are now
publicly available in beta. https://bit.ly/2YkZh3i
19. Gordon, A., et al.: MorphNet: Fast & simple resource-constrained structure learn-
ing of deep networks. In: 2018 IEEE Conference on Computer Vision and Pattern
Recognition, CVPR 2018, Salt Lake City, UT, USA, 18–22 June 2018, pp. 1586–
1595. IEEE Computer Society (2018)
20. Grace, K., Salvatier, J., Dafoe, A., Zhang, B., Evans, O.: When will AI exceed
human performance? evidence from AI experts. J. Artif. Intell. Res. 62, 729–754
(2018)
21. Hadjeres, G., Pachet, F., Nielsen, F.: DeepBach: a steerable model for Bach chorales
generation. In: Proceedings of the 34th International Conference on Machine Learn-
ing, vol. 70, pp. 1362–1371. JMLR (2017)
22. Hales, T., et al.: A formal proof of the Kepler conjecture. In: Forum of Mathematics,
Pi, vol. 5. Cambridge University Press (2017)
23. Harrison, J.: HOL light: a tutorial introduction. In: Srivas, M., Camilleri, A. (eds.)
FMCAD 1996. LNCS, vol. 1166, pp. 265–269. Springer, Heidelberg (1996). https://doi.org/10.1007/BFb0031814
24. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition.
In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR
2016, Las Vegas, NV, USA, 27–30 June 2016, pp. 770–778. IEEE Computer Society
(2016)
25. Heule, M.J.H., Kullmann, O., Marek, V.W.: Solving and verifying the Boolean
Pythagorean triples problem via cube-and-conquer. In: Creignou, N., Le Berre, D.
(eds.) SAT 2016. LNCS, vol. 9710, pp. 228–245. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-40970-2_15
26. Huang, D., Dhariwal, P., Song, D., Sutskever, I.: GamePad: a learning environment
for theorem proving. In: 7th International Conference on Learning Representations,
ICLR 2019, New Orleans, LA, USA, 6–9 May 2019. OpenReview.net (2019)
27. Kaliszyk, C., Urban, J.: HOL (y) hammer: online ATP service for HOL light. Math.
Comput. Sci. 9(1), 5–22 (2015)
28. Kaliszyk, C., Urban, J., Vyskočil, J.: Learning to parse on aligned corpora (Rough
Diamond). In: Urban, C., Zhang, X. (eds.) ITP 2015. LNCS, vol. 9236, pp. 227–233.
Springer, Cham (2015). https://doi.org/10.1007/978-3-319-22102-1_15
29. Kaliszyk, C., Urban, J., Vyskocil, J.: System description: statistical parsing of
informalized Mizar formulas. In: Jebelean, T., Negru, V., Petcu, D., Zaharie, D.,
Ida, T., Watt, S.M., (eds.) 19th International Symposium on Symbolic and Numeric
Algorithms for Scientific Computing, SYNASC 2017, Timisoara, Romania, 21–24
September 2017, pp. 169–172. IEEE Computer Society (2017)
30. Kocsis, L., Szepesvári, C.: Bandit based Monte-Carlo planning. In: Fürnkranz, J.,
Scheffer, T., Spiliopoulou, M. (eds.) ECML 2006. LNCS (LNAI), vol. 4212, pp.
282–293. Springer, Heidelberg (2006). https://doi.org/10.1007/11871842_29
31. Lample, G., Charton, F.: Deep learning for symbolic mathematics. In: 8th Inter-
national Conference on Learning Representations, ICLR 2020, Addis Ababa,
Ethiopia, 26–30 April 2020. OpenReview.net (2020)
32. Lample, G., Conneau, A., Denoyer, L., Ranzato, M.: Unsupervised machine trans-
lation using monolingual corpora only. In: 6th International Conference on Learn-
ing Representations, ICLR 2018, Vancouver, BC, Canada, 30 April–3 May 2018,
Conference Track Proceedings. OpenReview.net (2018)
33. Lee, D., Szegedy, C., Rabe, M.N., Loos, S.M., Bansal, K.: Mathematical reasoning
in latent space. In: 8th International Conference on Learning Representations,
ICLR 2020, Addis Ababa, Ethiopia, 26–30 April 2020. OpenReview.net (2020)
34. Liu, Y., et al.: RoBERTa: a robustly optimized BERT pretraining approach. arXiv
preprint arXiv:1907.11692 (2019)
35. Loos, S., Irving, G., Szegedy, C., Kaliszyk, C.: Deep network guided proof search.
In: Eiter, T., Sands, D. (eds.) LPAR-21, 21st International Conference on Logic for
Programming, Artificial Intelligence and Reasoning, Maun, Botswana, 7–12 May
2017, EPiC Series in Computing, vol. 46, pp. 85–105. EasyChair (2017)
36. McCarthy, J.: Computer programs for checking mathematical proofs. In: A Paper
Presented at the Symposium on Recursive Function Theory, New York, April 1961
37. Megill, N.: Metamath. In: Wiedijk, F. (ed.) The Seventeen Provers of the World.
LNCS (LNAI), vol. 3600, pp. 88–95. Springer, Heidelberg (2006). https://doi.org/10.1007/11542384_13
38. The Mizar Mathematical Library. http://mizar.org
39. Nisan, N., et al.: Introduction to mechanism design (for computer scientists). Algo-
rithmic Game Theor. 9, 209–242 (2007)
40. Paliwal, A., Loos, S., Rabe, M., Bansal, K., Szegedy, C.: Graph representations for
higher-order logic and theorem proving. In: The Thirty-Fourth AAAI Conference
on Artificial Intelligence, AAAI 2020, New York, NY, USA, 7–12 February 2020.
AAAI Press (2020)
41. Peters, M.E., et al.: Deep contextualized word representations. In: Walker, M.A.,
Ji, H., Stent, A. (eds.) Proceedings of the 2018 Conference of the North Ameri-
can Chapter of the Association for Computational Linguistics: Human Language
Technologies, NAACL-HLT 2018, New Orleans, Louisiana, USA, 1–6 June 2018,
(Long Papers), vol. 1, pp. 2227–2237. Association for Computational Linguistics
(2018)
42. Silver, D., et al.: A general reinforcement learning algorithm that masters chess,
shogi, and go through self-play. Science 362(6419), 1140–1144 (2018)
43. Silver, D., et al.: Mastering the game of go without human knowledge. Nature
550(7676), 354 (2017)
44. Simon, D.L.: Checking number theory proofs in natural language. Ph.D thesis
(1990)
45. Slind, K., Norrish, M.: A brief overview of HOL4. In: Mohamed, O.A., Muñoz, C.,
Tahar, S. (eds.) TPHOLs 2008. LNCS, vol. 5170, pp. 28–32. Springer, Heidelberg
(2008). https://doi.org/10.1007/978-3-540-71067-7_6
46. Szegedy, C., et al.: Going deeper with convolutions. In: IEEE Conference on Com-
puter Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, 7–12 June
2015, pp. 1–9. IEEE Computer Society (2015)
47. Tan, M., Le, Q.V.: EfficientNet: rethinking model scaling for convolutional neu-
ral networks. In: Chaudhuri, K., Salakhutdinov, R. (eds.) Proceedings of the 36th
International Conference on Machine Learning, ICML 2019, Long Beach, Califor-
nia, USA, 9–15 June 2019, Proceedings of Machine Learning Research, vol. 97, pp.
6105–6114. PMLR (2019)
48. Urban, J.: Translating Mizar for first order theorem provers. In: Asperti, A., Buch-
berger, B., Davenport, J.H. (eds.) MKM 2003. LNCS, vol. 2594, pp. 203–215.
Springer, Heidelberg (2003). https://doi.org/10.1007/3-540-36469-2_16
49. Urban, J.: MPTP 0.2: design, implementation, and initial experiments. J. Autom.
Reasoning 37(1–2), 21–43 (2006)
50. Urban, J.: MaLARea: a metasystem for automated reasoning in large theories.
In: Sutcliffe, G., Urban, J., Schulz, S. (eds.) Proceedings of the CADE-21 Work-
shop on Empirically Successful Automated Reasoning in Large Theories, Bremen,
Germany, 17th July 2007, CEUR Workshop Proceedings, vol. 257. CEUR-WS.org
(2007)
51. Vaswani, A., et al.: Attention is all you need. In: Guyon, I., et al. (eds.) Advances
in Neural Information Processing Systems 30: Annual Conference on Neural Infor-
mation Processing Systems 2017, Long Beach, CA, USA, 4–9 December 2017, pp.
5998–6008 (2017)
52. Wang, M., Tang, Y., Wang, J., Deng, J.: Premise selection for theorem proving by
deep graph embedding. In: Advances in Neural Information Processing Systems 30
(NIPS 2017), pp. 2786–2796 (2017)
53. Wenzel, M., Paulson, L.C., Nipkow, T.: The Isabelle framework. In: Mohamed,
O.A., Muñoz, C., Tahar, S. (eds.) TPHOLs 2008. LNCS, vol. 5170, pp. 33–38.
Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-71067-7_7
54. Wu, Z., Pan, S., Chen, F., Long, G., Zhang, C., Philip, S.Y.: A comprehensive
survey on graph neural networks. In: IEEE Transactions on Neural Networks and
Learning Systems, pp. 1–21 (2020)
55. Yang, K., Deng, J.: Learning to prove theorems via interacting with proof assis-
tants. In: Chaudhuri, K., Salakhutdinov, R. (eds.) Proceedings of the 36th Inter-
national Conference on Machine Learning, ICML 2019, Long Beach, California,
USA, 9–15 June 2019, Proceedings of Machine Learning Research, vol. 97, pp.
6984–6994. PMLR (2019)
56. Yang, Z., Dai, Z., Yang, Y., Carbonell, J.G., Salakhutdinov, R., Le, Q.V.: XLNet:
generalized autoregressive pretraining for language understanding. In: Wallach,
H.M., et al. (eds.) Advances in Neural Information Processing Systems 32:
Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019,
Canada, Vancouver, BC, 8–14 December 2019, pp. 5754–5764 (2019)
57. Yu, A.W., et al.: QANet: combining local convolution with global self-attention
for reading comprehension. In: 6th International Conference on Learning Repre-
sentations, ICLR 2018, Vancouver, BC, Canada, 30 April–3 May 2018, Conference
Track Proceedings. OpenReview.net (2018)
58. Yuksel, S.E., Wilson, J.N., Gader, P.D.: Twenty years of mixture of experts. IEEE
Trans. Neural Networks Learn. Syst. 23(8), 1177–1193 (2012)
59. Zhu, J.-Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation
using cycle-consistent adversarial networks. In: 2017 IEEE Conference on Com-
puter Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, 21–26
July 2017, pp. 2223–2232. IEEE Computer Society (2017)
60. Zinn, C.: Understanding informal mathematical discourse. Ph.D thesis, Institut für
Informatik, Universität Erlangen-Nürnberg (2004)
61. Zombori, Z., Csiszárik, A., Michalewski, H., Kaliszyk, C., Urban, J.: Towards find-
ing longer proofs. arXiv preprint arXiv:1905.13100 (2019)
62. Zoph, B., Le, Q.V.: Neural architecture search with reinforcement learning. In:
5th International Conference on Learning Representations, ICLR 2017, Toulon,
France, 24–26 April 2017, Conference Track Proceedings. OpenReview.net (2017)
Full Papers
Formal Adventures in Convex and Conical Spaces

Reynald Affeldt, Jacques Garrigue, and Takafumi Saikawa
1 Introduction
The notion of convex sets appears in various mathematical theories. A subset X
of a real vector space is called a convex set if, for any x, y ∈ X and p ∈ [0, 1],
their convex combination px + (1 − p)y is again in X. One basic use of it is to
define the convexity of functions. A function f is said to be convex if f (px +
(1 − p)y) ≤ pf (x) + (1 − p)f (y) for any convex combination px + (1 − p)y. Thus,
convex sets are natural domains for convex functions to be defined on. Good
examples of these notions can be found in information theory, where convexity is
a fundamental property of important functions such as logarithm, entropy, and
mutual information. Our InfoTheo library [17] developed in the Coq proof
assistant [29] has a formalization of textbook proofs [12] of such results.
In the course of formalizing such convexity results, we find that axiomatizing convex sets is a useful step which brings clarity and organization to the results. We abstract the usual treatment of convex sets as subsets of some vector
space and employ an algebraic theory of convex spaces, which was introduced
by Stone [27]. The formalization uses the packed class construction [15,24], so
as to obtain generic notations and lemmas, and more importantly, to be able
to combine structures. Binary convex spaces are formalized in Sect. 2, and their
multiary versions are formalized in Sect. 3, along with proofs of equivalence.
We also formalize an embedding of convex spaces into conical spaces (a.k.a.
cones or real cones [31]), which we find an indispensable tool to formalize convex
© Springer Nature Switzerland AG 2020
C. Benzmüller and B. Miller (Eds.): CICM 2020, LNAI 12236, pp. 23–38, 2020.
https://doi.org/10.1007/978-3-030-53518-6_2
2 Convex Spaces
Let us begin with the definition of convex spaces. As mentioned in the introduc-
tion, convex spaces are an axiomatization of the usual notion of convex sets in
vector spaces. The notion has a long history of repeated reintroduction by many authors, often with minor differences and under different names: barycentric algebra [27], semiconvex algebra [28], or just convex sets [19].
We define convex spaces following Fritz [14, Definition 3.1].
Definition 1 (Module ConvexSpace in [18]). A convex space is a structure for
the following signature:
– Carrier set X.
– Convex combination operations (◁p▷) : X × X → X indexed by p ∈ [0, 1].
– Unit law: x ◁1▷ y = x.
– Idempotence law: x ◁p▷ x = x.
– Skewed commutativity law: x ◁1−p▷ y = y ◁p▷ x.
– Quasi-associativity law: x ◁p▷ (y ◁q▷ z) = (x ◁r▷ y) ◁s▷ z,
  where s = 1 − (1 − p)(1 − q), and r = p/s if s ≠ 0, and r = 0 otherwise.
  (Note that r is irrelevant to the value of (x ◁r▷ y) ◁s▷ z if s = 0.)
We can translate this definition to Coq as a packed class [15] with the fol-
lowing mixin interface:
1 Record mixin_of (T : choiceType) : Type := Mixin {
2 conv : prob -> T -> T -> T where "a <| p |> b" := (conv p a b);
3 _ : forall a b, a <| 1%:pr |> b = a ;
4 _ : forall p a, a <| p |> a = a ;
5 _ : forall p a b, a <| p |> b = b <| p.~%:pr |> a;
6 _ : forall (p q : prob) (a b c : T),
7 a <| p |> (b <| q |> c) = (a <|[r_of p, q]|> b) <| [s_of p, q] |> c }.
Formal Adventures in Convex and Conical Spaces 25
There are some notations and definitions to be explained. The type prob in the above Coq code denotes the closed unit interval [0, 1]. The notation r%:pr denotes a real number r equipped with a canonical proof that 0 ≤ r ≤ 1. The notation p.~ is for 1 − p. The notation [s_of p, q] is for 1 − (1 − p)(1 − q), and [r_of p, q] for p/[s_of p, q].
Intuitively, one can regard the convex combination as a probabilistic choice
between two points. At line 3, the left argument is chosen with probability 1.
The lines that follow correspond to idempotence, skewed commutativity, and
quasi-associativity.
An easy example of a convex space is the real line R, whose convex combination is expressed by ordinary addition and multiplication as pa + (1 − p)b. Probability
distributions also form a convex space. In the formalization, the type fdist
A of distributions over any finite type A (borrowed from previous work [6]) is
equipped with a convex space structure, where the convex combination of two
distributions d1 , d2 is defined pointwise as x → pd1 (x) + (1 − p)d2 (x).
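To make the binary axioms concrete, here is a small Python sketch (ours, not part of the Coq development) that implements the convex combination on the real line and checks the quasi-associativity law of Definition 1 numerically:

import random

def conv(p, x, y):
    # Binary convex combination on the real line: x <|p|> y = p*x + (1-p)*y.
    return p * x + (1 - p) * y

# Check x <|p|> (y <|q|> z) = (x <|r|> y) <|s|> z, where
# s = 1 - (1-p)(1-q) and r = p/s (r arbitrary when s = 0).
for _ in range(1000):
    p, q = random.random(), random.random()
    x, y, z = (random.uniform(-10, 10) for _ in range(3))
    s = 1 - (1 - p) * (1 - q)
    r = p / s if s != 0 else 0
    assert abs(conv(p, x, conv(q, y, z)) - conv(s, conv(r, x, y), z)) < 1e-9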
As a result of the packed class construction, we obtain the type convType of
all types which implicitly carry the above axioms. Then, each example of convex
space is declared to be canonically a member of convType, enabling the implicit
inference of the appropriate convex space structure. These two implicit inference
mechanisms combined make the statement of generic lemmas on convex spaces
simple and applications easy.
3.1 Axiomatization
A definition of convex spaces based on multiary operations is given as follows
(see for example [10, Definition 5] and [16, Sect. 2.1]).
Definition 2 (Convex space, multiary version). A convex space based on
multiary operations is a structure for the following signature:
– Carrier set X.
– Multiary convex combination operations ⊕_d : X^n → X, indexed by an arity n and a distribution d over I_n:
  (x_i)_{i<n} ↦ ⊕_{i<n} d_i x_i.
– Barycenter law: ⊕_{i<n} d_i (⊕_{j<m} e_{i,j} x_j) = ⊕_{j<m} (∑_{i<n} d_i e_{i,j}) x_j. (ax_bary in [18])
We also have to separately show that (δ_{j,K̄(k)} λρ^k_j)_{k<n} and (ρ_j)_{j<m} form probability distributions. As an exceptional case, (δ_{j,K̄(k)} λρ^k_j)_{k<n} is replaced by a uniform distribution if ρ_j = 0.
[Fig. 1. The three axiomatizations and the functors relating them: convType (binary operator <| |>, laws from Def. 1) is related to the Standard multiary axiomatization (operator <&>_, projection and barycenter laws) by BinToNary and NaryToBin, and Standard is related to the Beaulieu axiomatization (operator <&>_, partition and idempotence laws) by StandardToBeaulieu and BeaulieuToStandard, via the map, partition-barycenter, and injective map laws.]
The partition-barycenter law can be derived from the Beaulieu-style axioms, and is in turn used to prove the injective map law. Together they make it possible to prove the barycenter law.
The equivalence between binary and multiary axiomatizations requires first
to define their operators in terms of each other.
¹ The support of a probability distribution d is the set {i | d_i > 0}.
x_0 ◁p▷ x_1 = ⊕_{i<2} d_i x_i  where d_0 = p and d_1 = 1 − p.
The first direction, functor BinToNary in [18], must prove that the first defi-
nition satisfies the multiary axioms, and indeed amounts to proving a variant of
Stone’s lemma. We will see in the next section that the original proof by Stone
is better formalized by transporting the argument to conical spaces.
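For intuition, the usual recursive construction behind this direction can be sketched in Python as follows. This is a sketch under our own encoding of distributions as weight lists; the actual Coq definition Convn differs in detail.

def convn(d, xs, conv):
    # Multiary combination from the binary operator, by recursion on the
    # number of points: peel off x0 with weight d0, renormalize the tail.
    if len(xs) == 1:
        return xs[0]
    d0 = d[0]
    if d0 == 1:
        return xs[0]  # the tail has total weight 0
    tail = [w / (1 - d0) for w in d[1:]]
    return conv(d0, xs[0], convn(tail, xs[1:], conv))

# On the real line this recovers the weighted sum:
# convn([0.5, 0.25, 0.25], [1.0, 2.0, 3.0], conv) == 1.75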
The opposite direction, functor NaryToBin, must prove the binary axioms
from the multiary ones. While we start from the standard version, the idem-
potence law proved to be instrumental in this task, together with the following
unrestricted map law.
– Map law: ⊕_{i<m} d_i g_{u(i)} = ⊕_{j<n} (∑_{i<m, u(i)=j} d_i) g_j for any map u. (ax_map in [18])
Finally, one also needs to prove that the definitions we used for each operation
in both directions are coherent.
Lemma 2 (equiv_conv and equiv_convn in [18]). The constructions in Definition 4 (Convn and binconv) cancel each other. That is,
– If ∗ is the operator induced by Definition 4(a), and † the one induced from it by Definition 4(b), we can derive a ◁p▷† b = a ◁p▷ b from the binary axioms.
– If ∗ is the operator induced by Definition 4(b), and † the one induced from it by Definition 4(a), we can derive ⊕†_{i<n} d_i x_i = ⊕_{i<n} d_i x_i from the multiary axioms.
⊕_{i<n} d_i x_i = d_0 x_0 + · · · + d_{n−1} x_{n−1}.
The additions on the right-hand side are of vectors, and thus are associative and
commutative. This means that the multiary combination on the left-hand side
is invariant under permutations or partitions on indices. We want to show that
these invariance properties are also satisfied generally in any convex space.
However, the search for the proofs is painful if done naively. This is because binary convex combination operations satisfy associativity and commutativity only through cumbersome parameter computations. For example, a direct proof of the permutation case involves manipulations on the set I_n of indices and on symmetric groups, which require fairly long combinatorics [27, Lemma 2].
We present a solution to this complexity by transporting the arguments
on convex spaces to a closely related construction of conical spaces. Conical
spaces are an abstraction of cones in real vector spaces just like convex spaces
are an abstraction of convex sets. Like convex spaces, the definition of conical
spaces appears in many articles. We refer to the ones by Flood (called semicone
there) [13] and by Varacca and Winskel (called real cone there) [31]:
Definition 5 (Conical space). A conical space is a semimodule over the semir-
ing of non-negative reals. That is, it is a structure for the following signature:
– Carrier set X.
– Zero 0 : X.
– Addition operation + : X × X → X.
– Scaling operations c : X → X indexed by c ∈ R≥0 .
– Associativity law for addition: x + (y + z) = (x + y) + z.
– Commutativity law for addition: x + y = y + x.
– Associativity law for scaling: c(dx) = (cd)x.
– Left-distributivity law: (c + d)x = cx + dx.
– Right-distributivity law: c(x + y) = cx + cy.
– Zero law for addition: 0 + x = x.
– Left zero law for scaling: 0x = 0.
– Right zero law for scaling: c0 = 0.
– One law for scaling: 1x = x.
We display this definition only to show that conical spaces have straightforward associativity and commutativity. In fact, the formalization builds on the embedding of convex spaces into canonically constructed conical spaces, which appeared in the article by Flood [13]. We build, on top of each convex space X, the conical space SX of its “scaled points”:
Definition 6 (scaled pt, addpt, and scalept in [18]). Let X be a convex
space. We define a set SX which becomes a conical space with the following
addition and scaling operations.
SX := (R>0 × X) ∪ {0}.
That is, the points of SX are either a pair p ∗ x of p ∈ R>0 and x ∈ X, or a new
additive unit 0. Addition of points a, b ∈ SX is defined by cases to deal with 0:
a + b := (r + q) ∗ (x ◁r/(r+q)▷ y)   if a = r ∗ x and b = q ∗ y,
         a                           if b = 0,
         b                           if a = 0.
We omit here the proofs that SX with these addition and scaling operations satisfies the conical laws. They are proved formally in [18] (see the lemmas addptC, addptA, scalept_addpt, etc.).
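The case analysis of Definition 6 can be mimicked concretely. In the following Python sketch the encoding is ours (None for the zero, pairs (r, x) otherwise, with the binary operator conv passed as a parameter); only the names addpt and scalept come from [18].

def addpt(conv, a, b):
    # Addition on S_X, defined by cases to deal with the added zero.
    if a is None:
        return b
    if b is None:
        return a
    (r, x), (q, y) = a, b
    return (r + q, conv(r / (r + q), x, y))

def scalept(c, a):
    # Scaling on S_X: c * (r ∗ x) = (c·r) ∗ x, with the zero absorbed.
    if c == 0 or a is None:
        return None
    r, x = a
    return (c * r, x)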
Properties of the underlying convex spaces are transported into and back
from this conical space, through an embedding:
Definition 7 (S1 in [18])

ι : X → SX,  x ↦ 1 ∗ x.

Convex combinations in X are mapped by ι to additions in SX.

Lemma 3 (S1_convn in [18])

ι(⊕_{i<n} d_i x_i) = ∑_{i<n} d_i ι(x_i).
The right-hand side of the lemma is a conical sum (Fig. 2), which behaves like
an ordinary linear sum thanks to the conical laws, and enjoys good support from
MathComp’s big operator library [9].
[Fig. 2. Example of S1_convn: 1 ∗ w = 1/2 ∗ x + 1/4 ∗ y + 1/4 ∗ z]
⊕_{i<n} d_i x_i = ⊕_{i<n} (d ◦ s)_i (x ◦ s)_i,
The proof of the barycenter property [27, Lemma 4] from Sect. 3 is based on the same technique (see Convn_convnfdist in [18]).
A way to understand this conical approach is to start from Stone’s definition
of convex spaces [27]. He uses a quaternary convex operator (x, y; α, β) where x
and y are points of the space, and α and β are non-negative coefficients such that
α + β > 0. Its values are quotiented by an axiom to be invariant under scaling,
removing the need to normalize coefficients for associativity. This amounts to
regarding a convex space as the projective space of some conical space.
The definition of SX is a concrete reconstruction of such a conical space from
a given convex space X. The benefit of this method over Stone’s is the removal of
quotients by moving the coefficients from operations to values. We can then use
the linear-algebraic properties of conical sums such as the neutrality of zeroes,
which had to be specially handled in Stone’s proofs (e.g., [27, Lemma 2]).
Examples. We illustrate how ι is used in practice with the proof of the entropic
identity. Let T be a convType; we want to show that
(a ◁q▷ b) ◁p▷ (c ◁q▷ d) = (a ◁p▷ c) ◁q▷ (b ◁p▷ d). (1)
We could use the properties of convex spaces, but this would result in cumbersome computations, in particular because of quasi-associativity. Instead, we proceed by an embedding into the set of scaled points over T using ι. First, we observe that these scaled points form a convex space for the operator (p, a, b) ↦ pa + p̄b, and that ι(a ◁p▷ b) = ι(a) ◁p▷ ι(b). As a consequence, when we apply ι to Equation (1), its left-hand side becomes
The main difference with Eq. (1) is that + (Coq notation: addpt) enjoys (uncon-
ditional) associativity, making the rest of the proof easier. In the proof script below, line 4 performs the embedding by first using the injectivity of ι (lemma S1_inj), then using the fact that ι is a morphism w.r.t. ◁p▷ (lemma S1_conv), and last by revealing the definition of the operator of the convex space formed by scaled points (lemma convptE). The proof can be completed by rewriting with properties of addpt and scalept until the left-hand side matches the right-hand side.
1 Lemma convACA (a b c d : T) p q :
2 (a <|q|> b) <|p|> (c <|q|> d) = (a <|p|> c) <|q|> (b <|p|> d).
3 Proof.
4 apply S1_inj; rewrite ![in LHS]S1_conv !convptE.
5 rewrite !scalept_addpt ?scalept_comp //.
6 rewrite !(mulRC p) !(mulRC p.~) addptA addptC (addptC (scalept (q * p) _)).
7 rewrite !addptA -addptA -!scalept_comp -?scalept_addpt //.
8 by rewrite !(addptC (scalept _.~ _)) !S1_conv.
9 Qed.
We conclude this section with an example that provides a closed formula for the multiary convex combination ⊕_{i<n} e_i g_i (Coq notation: <|>_e g) in the case of the real line (seen as a convex space):
⊕_{i<n} e_i g_i = scaleR(ι(⊕_{i<n} e_i g_i))   by Scaled1RK
                = scaleR(∑_{i<n} e_i ι(g_i))    by S1_convn
                = ∑_{i<n} scaleR(e_i ι(g_i))    by big_scaleR
                = ∑_{i<n} e_i scaleR(ι(g_i))    by scaleR_scalept
                = ∑_{i<n} e_i g_i               by Scaled1RK
We use the predicate is_convex_set to define the type {convex_set T} of convex sets over T.
We can turn any set of points in a convex space into a convex set, namely,
by taking convex hulls.
Example. The following example illustrates the usefulness of conical spaces when
reasoning about convex hulls.
Our goal is to prove that for any z ∈ hull(X ∪ Y) (X ≠ ∅, Y ≠ ∅), there exist x ∈ X and y ∈ Y such that z = x ◁p▷ y for some p (see the formal statement at line 1 below).
We first introduce two notations. Let scaled_set X be the set {p ∗ x | x ∈ X}. For any a ≠ 0, let [point of a0] (where a0 is the proof that a ≠ 0) be the x such that a = p ∗ x for some p.
To prove our goal, it is sufficient to prove that there exist a ∈ scaled_set X and b ∈ scaled_set Y such that ι(z) = a + b (this reasoning step is the purpose of line 6). When a = 0 or b = 0, we omit the easy proofs at lines 8 and 9. Otherwise, we can take x to be [point of a0] and y to be [point of b0], as performed by the four lines from line 10.
We now establish the sufficient condition (from line 14). Since z is in the hull, we have a distribution d and n points g_i such that z = ⊕_{i<n} d_i g_i. We then decompose ι(z) as follows:

ι(z) = ∑_{i<n} d_i ι(g_i) = ∑_{i<n, g_i ∈ X} d_i ι(g_i) + ∑_{i<n, g_i ∉ X} d_i ι(g_i) = b + c,

writing b for the sum over g_i ∈ X and c for the sum over g_i ∉ X.
In this section, we first (Sect. 6.1) formalize a generic definition of convex func-
tions based on convex spaces; for that purpose, we introduce in particular ordered
convex spaces. To demonstrate this formalization, we then apply it to the proof of
the concavity of the logarithm function and to an information-theoretic function
(Sect. 6.2).
7 Related Work
Conical spaces have been known in the literature to work as a well-behaved replacement for convex spaces when constructing models of nondeterministic
computations. Varacca and Winskel [31] used convexity when building a cat-
egorical monad combining probability and nondeterminism, but they chose to
avoid the problem of equational laws in convex spaces by instead working with
conical spaces. There is a similar preference, in the study of domain-theoretic semantics of nondeterminism, for a conical structure (d-cones [23]) over the corresponding convex structure (abstract probabilistic domain [20]). The problem
is the same in this case: the difficulty in working with the equational laws of
convex spaces [22,30].
Flood [13] proposed to use conical spaces to investigate the properties of convex spaces. He showed that for any convex space, there is an enveloping conical space and the convex space is embedded in it. (A version of the embedding of convex sets into cones in vector spaces was already present in Semadeni's book [26].) Keimel and Plotkin [21] extended the idea to their version of ordered convex spaces and applied it in the proof of their key lemma [21, Lemma 2.8], which is an ordered version of the one proved by Neumann [25, Lemma 2].
Another aspect of convex spaces is the relationship to probabilistic distri-
butions. From any set, one can freely generate a convex space by formally tak-
ing all finite convex combinations of elements of this set. The resulting convex
space can be seen as a set of distributions over the original set, since the for-
mal convex combinations are equivalent to distributions over the given points.
By this construction, convex spaces serve as a foundation for the algebraic and
category-theoretic treatments of probability. This allows for another application
of our work to the semantics of probabilistic and nondeterministic program-
ming [16,19]. We have also been investigating this topic [3,7]. Our most recent
result [2] is based on the properties of convex sets and convex hulls, and deals
with derived notions such as convex powersets. Its purpose is the formal study of program semantics from a category-theoretic point of view, rather than the formal study of the mathematical structure of convex spaces itself, which is the purpose of this paper.
8 Conclusion
In this paper, we formalized convex and conical spaces and developed their
theories. In particular, we formally studied the various presentations of the con-
vex combination operator, be it binary or multiary (Sect. 3). We provide formal
proofs of the equivalence between several axiomatizations of both operators, where “proofs” in the literature were often mere references to Stone's foundational paper [27], which itself only contains a reduction of the multiary case to the binary one. Based on convex and conical spaces, we also developed a theory
of convex functions and of convex hulls. We illustrated these developments with
detailed examples from real analysis and information theory.
References
1. Affeldt, R., Cohen, C., Rouhling, D.: Formalization techniques for asymptotic rea-
soning in classical analysis. J. Formaliz. Reason. 11(1), 43–76 (2018)
2. Affeldt, R., Garrigue, J., Nowak, D., Saikawa, T.: A trustful monad for axiomatic reasoning with probability and nondeterminism, March 2020. https://arxiv.org/abs/2003.09993
3. Affeldt, R., et al.: Monadic equational reasoning in Coq (2019). https://github.com/affeldt-aist/monae/, Coq scripts
4. Affeldt, R., Garrigue, J., Saikawa, T.: Examples of formal proofs about data com-
pression. In: International Symposium on Information Theory and Its Applications
(ISITA 2018), Singapore, 28–31 October 2018, pp. 665–669. IEICE, IEEE Xplore,
October 2018
5. Affeldt, R., Garrigue, J., Saikawa, T.: Reasoning with conditional probabilities
and joint distributions in Coq. Computer Software (2020, to appear). Japan Soci-
ety for Software Science and Technology. https://staff.aist.go.jp/reynald.affeldt/
documents/cproba preprint.pdf
6. Affeldt, R., Hagiwara, M., Sénizergues, J.: Formalization of Shannon’s theorems.
J. Autom. Reason. 53(1), 63–103 (2014)
7. Affeldt, R., Nowak, D., Saikawa, T.: A hierarchy of monadic effects for program
verification using equational reasoning. In: Hutton, G. (ed.) MPC 2019. LNCS,
vol. 11825, pp. 226–254. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-33636-3_9
8. Beaulieu, G.: Probabilistic completion of nondeterministic models. Ph.D. thesis,
University of Ottawa (2008)
9. Bertot, Y., Gonthier, G., Ould Biha, S., Pasca, I.: Canonical big operators. In:
Mohamed, O.A., Muñoz, C., Tahar, S. (eds.) TPHOLs 2008. LNCS, vol. 5170, pp.
86–101. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-71067-7_11
10. Bonchi, F., Silva, A., Sokolova, A.: The power of convex algebras. In: Meyer, R.,
Nestmann, U. (eds.) 28th International Conference on Concurrency Theory (CON-
CUR 2017). Leibniz International Proceedings in Informatics (LIPIcs), vol. 85, pp.
23:1–23:18. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik (2017). https://doi.org/10.4230/LIPIcs.CONCUR.2017.23
11. Cheung, K.H.: Distributive interaction of algebraic effects. Ph.D. thesis, University
of Oxford (2017)
12. Cover, T.M., Thomas, J.A.: Elements of Information Theory, 2nd edn. Wiley,
Hoboken (2006)
13. Flood, J.: Semiconvex geometry. J. Aust. Math. Soc. 30(4), 496–510 (1981).
https://doi.org/10.1017/S1446788700017973
14. Fritz, T.: Convex spaces I: Definition and examples (2015). https://arxiv.org/abs/0903.5522, First version: 2009
15. Garillot, F., Gonthier, G., Mahboubi, A., Rideau, L.: Packaging mathematical
structures. In: Berghofer, S., Nipkow, T., Urban, C., Wenzel, M. (eds.) TPHOLs
2009. LNCS, vol. 5674, pp. 327–342. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-03359-9_23
16. van Heerdt, G., Hsu, J., Ouaknine, J., Silva, A.: Convex language semantics for
nondeterministic probabilistic automata. In: Fischer, B., Uustalu, T. (eds.) ICTAC
2018. LNCS, vol. 11187, pp. 472–492. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-02508-3_25
Towards a Heterogeneous Query Language for Mathematical Knowledge

Katja Berčič, Michael Kohlhase, and Florian Rabe
Overview. In the next section we will show that mathematical resources and information needs have more aspects than the formulas and words used in MIR so far. In Sect. 3 we present an architecture for a generic multi-aspect representation and search system, in Sect. 4 we discuss indexing concrete mathematical data, and in Sect. 5 we specify a cross-aspect query language for MIR. Section 6 concludes the paper.
The OEIS portal features a simple boolean search engine which allows searching for sequences by OEIS ID, name, keywords, and subsequence (the latter can contain anonymous wildcards for integers and subsequences). Additionally, atomic queries can be qualified by prefixes that restrict keywords to the various
classes of items or change the sequence matching algorithm (e.g. from signed to unsigned equality on components). The results of a query are a sequence of complete presentations of the sequence information, ordered by “relevance”, which combines match quality, sequence popularity, and number. There is a variant called superseeker (an e-mail server) that “tries hard to find an explanation for a number sequence”, combining information from the OEIS and other sources.
1. symbolic knowledge: the formulae, even though in this case they are informal
ASCII art; there is also computer code,
physically occur in the fragment but logically belong to it. We call these induced
objects. There are many instances of induced objects, e.g.:
– In an organizational library, we can take the transitive closure of a relation.
– In a symbolic language that uses some kind of inheritance or logical imports,
a fragment F might be a class/module etc., and we can index also objects
logically occurring in F through inheritance. That is already done routinely
in many documentation generation tools, especially for object-oriented pro-
gramming languages. We also built a symbolic index like that for Mmt in
[IKP14].
– Sometimes, especially in deduction systems, the most feasible way to imple-
ment the harvester is to instrument the kernel. But the kernel may perform
extensive normalization, in which case the index will contain the objects
induced by normalization. Unfortunately, that also means it might not con-
tain some objects originally in the library (because they are normalized away),
which is a known problem with indexing deductive libraries.
In the sequel, we follow our design from the previous section and specify how
a relational database can be used to build an index of concrete objects. Note that
in this design, the entire database serves as the index, and that use of the word
“index” must be distinguished from any internal indexes kept by the database
implementation.
An Index Design. We use a relational database with one table for each type in
our standardized language of concrete objects. Each table has a column “value”
holding the object using a chosen standard encoding.
For each type we define a set of operations that are precomputed and stored
with the objects (e.g., the factorization of an integer or the roots of a polynomial),
and their results are stored in additional columns. However, these columns do
not hold the actual result objects; instead, the results are concrete objects that
are themselves stored in the index, and the columns just hold references to them.
(A recursion threshold is used in case this process does not terminate.)
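To make the design tangible, here is a minimal sketch in Python/SQLite, assuming one table per type with a "value" column and precomputed operations stored as references back into the index; all table and column names here are illustrative assumptions, not the actual MathDataHub schema.

import sqlite3

con = sqlite3.connect(":memory:")
# One table per type; "value" holds the standard encoding of the object.
con.execute("CREATE TABLE int_lists (id INTEGER PRIMARY KEY, value TEXT)")
# Precomputed operations (here: factorization) are extra columns holding
# references to other concrete objects in the index, not result values.
con.execute("""CREATE TABLE ints (
    id INTEGER PRIMARY KEY,
    value TEXT,
    factorization INTEGER REFERENCES int_lists(id))""")

def factor(n):
    fs, d = [], 2
    while d * d <= n:
        while n % d == 0:
            fs.append(d)
            n //= d
        d += 1
    if n > 1:
        fs.append(n)
    return fs

def store_int(n):
    # The factor list is itself a concrete object stored in the index,
    # and the "factorization" column only holds a reference to it.
    lst = con.execute("INSERT INTO int_lists (value) VALUES (?)",
                      (",".join(map(str, factor(n))),)).lastrowid
    return con.execute("INSERT INTO ints (value, factorization) VALUES (?, ?)",
                       (str(n), lst)).lastrowid

store_int(60)  # stores "60" and, as a separate indexed object, the list "2,2,3,5"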
Concrete Queries. A concrete query over this index is of the form SELECT X1 :
T1, ..., Xn : Tn WHERE P(X1, ..., Xn). Here the Ti are types and P is a
computable MDDL-expression of boolean type. The Xi represent objects in the
index of type Ti and are bound in P. The intended semantics is that it returns
all substitutions to the Xi for which P is true.
It is straightforward to develop more complex query languages, but even
this simple form is quite difficult to implement. Most critically, even if P is
computable, it may not be efficiently computable. And even if it is, it may not
be practical to program the computation inside an SQL database.
On the other hand, many simple forms of P can be directly translated to SQL
queries. For example, if f is one of the precomputed values for T , then SELECT X :
T WHERE f (X) = 5 becomes the SQL query SELECT value FROM T WHERE f = 5.
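Continuing the SQLite sketch above (names still hypothetical), this simple form of translation is little more than parameterized string assembly over the precomputed columns:

def translate(type_name, column, constant):
    # SELECT X : T WHERE f(X) = c  ~~>  SELECT value FROM T WHERE f = c
    return (f"SELECT value FROM {type_name} WHERE {column} = ?", (constant,))

sql, params = translate("ints", "value", "60")
rows = con.execute(sql, params).fetchall()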
Open Problems. While we are convinced in general of the utility of this design,
several open problems remain, for which further research is needed. We discuss
these in the remainder.
In some cases, our design will explode. For example, storing all subsequences
of an OEIS sequence may become infeasible quickly even if attention is restricted
to fixed-length prefixes of sequences. Thus, special indexing techniques must be
developed for individual types and operations.
Another issue is the choice of codec in the index. For each type, we can choose
a standard codec and use it to represent the objects in that type’s table. Then
harvesters that find encoded objects in different encodings must transcode them
into the standard encoding. However, in some cases this will be inefficient—the
most common example is the trade-off between sparse and dense encodings of
lists.
But even in the seemingly trivial case of integers, this can become an issue:
For example, in [WKR17], we encountered multiple different encodings of
unlimited precision integers, transcoding between which was not always trivial. This
is aggravated in connection with the next issue discussed below: different codecs
may commute more easily with different mathematical operations. Therefore,
it may be necessary to use multiple tables for the same type—one per codec.
This will make retrieval harder as results from all tables have to be considered;
moreover, the same object might exist in multiple tables.
Figure 3 shows the search architecture: a number of libraries are harvested
into four aspect-specific indexes as described above. A user query Q is
expressed in a cross-aspect query language described below. It is passed to a
query engine that separates Q into a set of aspect-specific atomic queries Qi,
for which the respective database returns the result Ri. These are then
aggregated into the overall result R that is returned to the user. Note that
our drawing uses exactly one query Qi per aspect—that is just an example, and
there can be any number of queries (including zero) per index. It is also
straightforward to extend the design with additional indexes if new kinds of
indexes are conceived.
Fig. 3. The search architecture (libraries lib1 ... libn are harvested into
aspect-specific indexes such as narr:ElasticSearch, conc:SQLDB, and org:GraphDB;
the atomic queries Q1, ..., Q4 are dispatched to them and their results
R1, ..., R4 are aggregated into R)
In this paper, we focus on a relatively simple format for the queries: Every
query Q consists of
– a list of query variables X1 , . . . , Xn , we use upper case letters for them,
– a list of atomic queries Qi (X1 , . . . , Xn ).
Each atomic query is aspect-specific and resolved by lookup in the respective
index. The intuition of the overall result R is to return the intersection of the
atomic queries Qi . More formally, the results Ri and R of the queries are sub-
stitutions for the query variables. The atomic queries are evaluated sequentially;
each time some query variables may already have been instantiated by previous
atomic queries, and the results are substitutions for the remaining ones.
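This sequential-evaluation semantics is easy to prototype; the following sketch (my own illustrative Python, not the authors' implementation) treats each atomic query as a function from the substitution accumulated so far to the list of substitutions extending it, and previews the arc-transitive-graph example discussed below with two toy indexes.

def evaluate(atomic_queries, variables):
    # Each atomic query maps a partial substitution to the list of its
    # extensions (possibly instantiating further query variables).
    results = [{}]
    for q in atomic_queries:
        results = [ext for subst in results for ext in q(subst)]
    # Keep only substitutions that instantiate all declared query variables.
    return [s for s in results if all(v in s for v in variables)]

# Toy aspect-specific indexes: a concrete index of graphs and an
# organizational index mapping graph names to containing papers.
graphs = {"G1": {"arc_transitive": True}, "G2": {"arc_transitive": False}}
papers = {"G1": "paper-17"}

def conc_arc_transitive(subst):
    if "G" in subst:  # variable already bound by an earlier atom: just filter
        return [subst] if graphs[subst["G"]]["arc_transitive"] else []
    return [dict(subst, G=g) for g in graphs if graphs[g]["arc_transitive"]]

def org_containing_paper(subst):
    g = subst.get("G")
    return [dict(subst, P=papers[g])] if g in papers else []

print(evaluate([conc_arc_transitive, org_containing_paper], ["G", "P"]))
# -> [{'G': 'G1', 'P': 'paper-17'}]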
More complex queries can easily be conceived, but this simple fragment cap-
tures not only the most practically relevant cases but also one of the biggest
difficulties of heterogeneous queries: how can queries of different aspects
meaningfully share query variables? This question is what we discuss in the remainder.
Atomic Queries with Shared Variables. To specify our query language in detail,
we have to spell out the structure of the atomic queries. Here, we are mostly
bound by the capabilities of the existing aspect-specific indexes except for occa-
sionally deriving improvement suggestions for them.
All atomic queries are relative to a set of query variables ranging over formal
objects. All query variables may be typed with MDDL types. The results are
substitutions of the query variables with formal objects. Here the set of formal
objects should be large enough to subsume content MathML but should also
allow any URI as an identifier even if it is not declared in some content dictionary
(e.g., any identifier of a paper, author, etc.) as well as sufficient literals as needed
to build concrete objects.
Concretely, we assume the following:
Both SPARQL and MDDL queries naturally use a SELECT-WHERE form with the
WHERE clause containing a conjunction of atoms. This inspires our overall syntax
for heterogeneous queries: SELECT V ∗ WHERE A∗ where each V declares a query
variable X as X : T , and each A is one of the four atoms. For convenience,
we also allow undeclared query variables—these are simply dropped from the
returned substitutions.
Notably, stand-alone symbolic query engines only use S as the query (rather
than F ∈ Symb(S)) and return pairs of fragment identifiers and substitutions.
Similarly, stand-alone narrative query engines usually only use the bag of words
as the query. But in heterogeneous queries, we may want to use the fragment
identifier in other atoms of the query. Therefore, we have extended the syntax
for symbolic and narrative atoms with an explicit query variable referring to
the fragment. The corresponding extension is not needed for organizational and
concrete atoms.
The first atom in the WHERE-clause returns all arc-transitive graphs G in the
concrete index.
The second atom retrieves the names of these graphs and runs a narrative
query for them. This includes evaluating the expression Name(G) into a string
by retrieving the corresponding value from the concrete index. To avoid false-
positives, we include the word “graph” in the narrative atom. It instantiates F
with the identifier of the matching fragment, presumably a part of a paper.
The next three atoms are organizational atoms that perform a SPARQL
query retrieving first the identifier P of the paper containing F , the identifiers
J of the journal it appeared in, and its h-index H. H is a concrete value that is
reused in the final concrete query on the size of H.
Finally, we throw away all variables from the obtained substitutions except
for the graphs G. Alternatively, we could include P in the SELECT-clause to also
return the paper.
In the above example, we see how a query compiler should consider merging
consecutive organizational atoms into a single SPARQL query. In that case, the
last concrete atom of the example could, because it is so simple, alternatively
and more efficiently be included in that SPARQL query as well. Moreover, the
atoms in the WHERE-clause were ordered in a way that previous queries restrict
the scope of the subsequent ones. More generally, the query compiler should
reorder the atoms automatically.
While mathematical libraries often have a primary aspect, they usually also
contain or reference material of other aspects. Our cross-aspect search architecture
proposes to harvest all objects into aspect-specific indexes. Correspondingly, the
proposed query language combines atomic queries from existing aspect-specific
query languages and a query compiler distributes them to the respective indices.
The query language is more than just the sum of its four parts, as it allows
sharing variables between the aspect-specific sub-queries and computing non-trivial
joins.
We have conducted a requirement analysis on the respective basis technolo-
gies and have confirmed the principal adequacy of the query language on paradig-
matic, cross-aspect information needs. This shows that existing search/indexing
technologies are essentially sufficient for cross-aspect search except for the con-
crete aspect, where our previous work in MathDataHub provides a good first
step.
The obvious next step is an implementation of a distributed cross-aspect
search engine, as sketched above, as part of the MathHub system. MathHub has
already collected most of the largest theorem prover libraries (symbolic), the
1.5M preprints of the arXiv, and several large collections of concrete mathemat-
ical objects in a common representation format and assigned uniform identifiers
to their fragments. MathHub already integrates symbolic and narrative indices,
and the MMT system which MathHub employs for knowledge management –
while not a dedicated index – can already answer complex symbolic and organi-
zational queries [Rab12].
References
[AD] Apache Drill - Schema-free SQL Query Engine for Hadoop, NoSQL and
Cloud Storage. https://drill.apache.org. Accessed 03 Feb 2020
[Aiz+16] Aizawa, A., et al.: NTCIR-12 MathIR task overview. In: Kando, N., Sakai,
T., Sanderson, M. (eds.) Proceedings of the 12th NTCIR Conference on
Evaluation of Information Access Technologies, Tokyo, Japan: NII, Tokyo,
pp. 299–308 (2016). https://tinyurl.com/sofcxjs
[BKR20] Berčič, K., Kohlhase, M., Rabe, F.: Towards a Heterogeneous Query Lan-
guage for Mathematical Knowledge - Extended Report (2020). http://
kwarc.info/kohlhase/papers/tetrasearch.pdf. Accessed 27 Mar 2020
[Car+20a] Carette, J., et al.: Big math and the one-brain barrier - the tetrapod model
of mathematical knowledge. In: Mathematical Intelligencer (2020, in press).
https://arxiv.org/abs/1904.10405
[Car+20b] Carette, J., et al.: The space of mathematical software systems - a survey
of paradigmatic systems. preprint; http://arxiv.org/abs/2002.04955 (2020)
[Cho+05] Chong, E.I., et al.: An efficient SQL-based RDF querying scheme. In: Pro-
ceedings of the 31st VLDB Conference (2005)
[DHI12] Doan, A.H., Halevy, A., Ives, Z.: Principles of Data Integration. Elsevier,
Amsterdam (2012)
[DMH] Datasets on MathHub.info. https://data.mathhub.info. Accessed 24 Sept
2019
[EBO] Eick, B., Besche, H.U., O’Brien, E.: SmallGrp - The GAP Small
Groups Library. https://www.gap-system.org/Manuals/pkg/SmallGrp-1.
3/doc/chap1.html. Accessed 13 Oct 2018
[EET] Wilson, S., Potočnik, P.: A Census of edge-transitive tetravalent graphs.
https://jan.ucc.nau.edu/∼swilson/C4FullSite/index.html. Accessed 23 Jan
2019
[Eso] Elastic Search, 20 February 2014. http://www.elasticsearch.org/. Accessed
20 Feb 2014
[FL] Kohonen, J.: Lists of finite lattices (modular, semimodular, graded and geo-
metric). https://www.shsu.edu/mem037/Lattices.html. Accessed 25 Jan
2019
[GSC15] Guidi, F., Sacerdoti Coen, C.: A survey on retrieval of mathematical knowl-
edge. In: Kerber, M., Carette, J., Kaliszyk, C., Rabe, F., Sorge, V. (eds.)
CICM 2015. LNCS (LNAI), vol. 9150, pp. 296–315. Springer, Cham (2015).
https://doi.org/10.1007/978-3-319-20615-8 20
[HKP14] Hambasan, R., Kohlhase, M., Prodescu, C.: MathWeb-search at NTCIR-11.
In: Kando, N., Joho, H., Kishida, K. (eds.) NTCIR 11 Conference, Tokyo,
Japan: NII, Tokyo, pp. 114–119 (2014). https://tinyurl.com/wzj7mcg
[IKP14] Iancu, M., Kohlhase, M., Prodescu, C.: Representing, archiving, and search-
ing the space of mathematical knowledge. In: Hong, H., Yap, C. (eds.) ICMS
2014. LNCS, vol. 8592, pp. 26–30. Springer, Heidelberg (2014). https://doi.
org/10.1007/978-3-662-44199-2 5
[Lmf] The L-functions and Modular Forms Database. http://www.lmfdb.org.
Accessed 27 Aug 2016
[McF] McKay, B.: Description of graph6, sparse6 and digraph6 encodings. http://
users.cecs.anu.edu.au/∼bdm/data/formats.txt. Accessed 22 Mar 2019
[MDT] Berčič, K.: Math Databases Table. https://mathdb.mathhub.info/.
Accessed 15 Jan 2020
[OEIS] The On-Line Encyclopedia of Integer Sequences. http://oeis.org. Accessed
28 May 2017
[Rab12] Rabe, F.: A query language for formal mathematical libraries. In: Jeuring,
J., et al. (eds.) CICM 2012. LNCS (LNAI), vol. 7362, pp. 143–158. Springer,
Heidelberg (2012). https://doi.org/10.1007/978-3-642-31374-5 10
[SK08] Stamerjohanns, H., Kohlhase, M.: Transforming the arXiv to XML. In:
Autexier, S., et al. (eds.) CICM 2008. LNCS (LNAI), vol. 5144, pp.
574–582. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-
85110-3 46
[Slo03] Sloane, N.J.A.: The on-line encyclopedia of integer sequences. In: Notices
of the AMS, vol. 50, no. 8, p. 912 (2003)
[ST16] Stathopoulos, Y., Teufel, S.: Mathematical information retrieval based on
type embeddings and query expansion. In: Proceedings of COLING 2016,
ACL, pp. 2344–2355 (2016). https://www.aclweb.org/anthology/C16-1221
[Sta+18] Stathopoulos, Y., et al.: Variable typing: assigning meaning to variables
in mathematical text. In: NAACL 2018 Proceedings, ACL, pp. 303–312
(2018). https://doi.org/10.18653/v1/N18-1028
[WKR17] Wiesing, T., Kohlhase, M., Rabe, F.: Virtual theories – a uniform
interface to mathematical knowledge bases. In: Blömer, J., et al. (eds.)
Mathematical Aspects of Computer and Information Sciences, pp. 243–257.
Springer, Cham (2017)
Leveraging the Information Contained in
Theory Presentations
1 Introduction
A theorem prover on its own is not nearly as useful for end-users as one equipped
with extensive libraries. Most users have tasks to perform that are not related to
new ideas in theorem proving. The larger the library of standard material, the
faster users can just get to work. However, building large libraries is currently
very labor intensive. Although some provers provide considerable automation for
proof development, they do not provide the same for theory development.
This is the problem we continue [1,6,8,9] to tackle here, and that others [11]
have started to look at as well. It is worthwhile noting that some programming
languages already provide interesting features in this direction. For example,
Haskell [22] provides the deriving mechanism that lets one get instances for some
classes “for free”; recently, the Deriving Via mechanism [2] has been introduced,
that greatly amplifies these features. Some libraries, such as the one for Lens [24],
use Template Haskell [33] for the same purpose.
Libraries of algebra define algebraic structures, constructions on these, and
properties satisfied by the structures and constructions. While structures like
Semigroup, Monoid, AbelianGroup, Ring and Field readily come to mind, a
look at compendiums [21,23] reveals a much larger zoo of hundreds of structures.
Fig. 1. The definition of Monoid in Haskell, Agda, Coq, MMT, and MathScheme:

Haskell:
class Semigroup a => Monoid a where
  mempty :: a
  mappend :: a -> a -> a
  mappend = (<>)
  mconcat :: [a] -> a
  mconcat = foldr mappend mempty

Agda:
record Monoid c ℓ : Set (suc (c ⊔ ℓ)) where
  infixl 7 _∙_
  infix 4 _≈_
  field
    Carrier : Set c
    _≈_ : Rel Carrier ℓ
    _∙_ : Op₂ Carrier
    ε : Carrier
    isMonoid : IsMonoid _≈_ _∙_ ε
where IsMonoid is defined as
record IsMonoid (∙ : Op₂ A) (ε : A) : Set (a ⊔ ℓ) where
  field
    isSemigroup : IsSemigroup ∙
    identity : Identity ε ∙
  identityˡ : LeftIdentity ε ∙
  identityˡ = proj₁ identity
  identityʳ : RightIdentity ε ∙
  identityʳ = proj₂ identity

Coq:
Class Monoid {A : type} (dot : A → A → A) (one : A) : Prop := {
  dot_assoc : forall x y z : A, dot x (dot y z) = dot (dot x y) z;
  unit_left : forall x, dot one x = x;
  unit_right : forall x, dot x one = x }.
Alternative definition:
Record monoid := {
  dom : Type;
  op : dom -> dom -> dom where "x * y" := (op x y);
  id : dom where "1" := id;
  assoc : forall x y z, x * (y * z) = (x * y) * z;
  left_neutral : forall x, 1 * x = x;
  right_neutral : forall x, x * 1 = x }

MMT:
theory Semigroup : ?NatDed =
  u : sort
  comp : tm u → tm u → tm u # 1 * 2 prec 40
  assoc : ∀ [x, y, z] (x * y) * z = x * (y * z)
  assocLeftToRight : {x, y, z} (x * y) * z = x * (y * z)
    = [x, y, z] allE (allE (allE assoc x) y) z
  assocRightToLeft : {x, y, z} x * (y * z) = (x * y) * z
    = [x, y, z] sym assocLR
theory Monoid : ?NatDed =
  includes ?Semigroup
  unit : tm u # e
  unit_axiom : ∀ [x] x * e = x

MathScheme:
Monoid := Theory {
  U : type;
  * : (U, U) → U;
  e : U;
  axiom right_identity_*_e : forall x : U · (x * e) = x;
  axiom left_identity_*_e : forall x : U · (e * x) = x;
  axiom associativity_* : forall x, y, z : U · (x * y) * z = x * (y * z) }
Different systems implement Monoid in different ways (see Fig. 1). Other than
layout and vocabulary, different libraries also make more substantial choices.
One can see that this definition can be "derived" from that of Monoid, and that,
in fact, this derivation is uniform in the "shape" of the definition of Monoid,
so that this construction applies to any single-sorted equational theory. This
observation is one of the cornerstones of Universal Algebra [35].
There are other classical constructions that can also be generated; this poses
a number of questions.
Theories written in equational logic that describe algebraic structures are rich
in implicit information that can be extracted automatically.
There are obstacles to this automation. For example, definitional and
“bundling” choices can make reuse of definitions from one project in another
with different aims difficult. Thus users resort to redefining constructs that have
already been formalized. We then end up with multiple libraries for the same
topic in the same system. For example, there are at least four algebra libraries in
Coq [17,18,30,34], and even more for Category Theory [19]. In [17], the authors
mention, referring to other libraries:
“In spite of this body of prior work, however, we have found it difficult to
make practical use of the algebraic hierarchy in our project to formalize
the Feit-Thompson Theorem in the Coq system.”
2.1 Homomorphism
1 The implementation is available at https://github.com/ysharoda/tog.
2 We do not have enough room to give an introduction to each system; hopefully each
system's syntax is clear enough for the main ideas to come through.
private
  module F = Monoid From
  module T = Monoid To
(∀x ∈ carrier G · ∀y ∈ carrier G ·
  h (x ⊕G y) = h x ⊕H h y)}"
The reader might notice a discrepancy in the above: unit preservation is missing.
The Isabelle library does not provide this version. There is, however, a proof that
such a multiplication-preserving homomorphism necessarily maps the source unit
to a unit of the image (sub)monoid, but that unit is not necessarily that of the
full image. The above definition is also used to define group homomorphism and
other structures. We consider this to be missing information in the library.
Lean’s definition of monoid homomorphism is the one that most resembles the
one found in textbooks.
structure monoid_hom (M : Type*) (N : Type*)
  [monoid M] [monoid N] :=
(to_fun : M → N)
(map_one' : to_fun 1 = 1)
(map_mul' : ∀ x y, to_fun (x * y) = to_fun x * to_fun y)
However, in the same file, there is another definition, add_monoid_hom, that
looks "the same" up to renaming. This points to a weakness of Lean: there is
no renaming operation on structures, and for a Ring to contain two "monoids",
one is forced to duplicate definitions. This redundancy is unpleasant.
The “term language” of a theory is the (inductive) data type that represents the
syntax of well-formed terms of that theory, along with an interpretation function
from expressions to the carrier of the (implicitly single-sorted) given theory, i.e.
its denotational semantics.
In Agda, the definition of Monoid term language is straightforward:
data Expr (n : ℕ) : Set where
  var : Fin n → Expr n
  id : Expr n
  _⊕_ : Expr n → Expr n → Expr n
In Agda, these definitions are not found with the definitions of the algebraic
structures themselves, but rather as part of the Solver for equations over that
theory. Here, we find more duplication, as the above definitions are repeated
for the following three highly related structures: Monoid, CommutativeMonoid and
IdempotentCommutativeMonoid.
Despite its usefulness, we were not able to find the definition of the term
language of a theory in Isabelle/HOL or Lean.
2.3 Product
Until recently, there was no definition of the product of algebraic structures in
the Agda library. A recent pull request has suggested adding these, along with
other constructions. The following hand-written definition has now been added:
rawMonoid : RawMonoid c ℓc → RawMonoid d ℓd →
            RawMonoid (c ⊔ d) (ℓc ⊔ ℓd)
rawMonoid M N = record
  { Carrier = M.Carrier × N.Carrier
  ; _≈_ = Pointwise M._≈_ N._≈_
  ; _∙_ = zip M._∙_ N._∙_
  ; ε = M.ε , N.ε
  }
  where
  module M = RawMonoid M
  module N = RawMonoid N
These could have been mechanically generated from the definition of Monoid.
Both Isabelle/HOL and Lean provide definitions of product algebras for
monoids, which we omit for space. It is worth mentioning that the Lean library
has 15 definitions for products of structures that look very similar and could be
generated.
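The mechanical nature of this construction is easy to see in a sketch: given only the arities of the function symbols, the product operations are componentwise pairings (illustrative Python, not the Tog implementation):

def product_instance(sig, A, B):
    # sig maps each operation name to its arity; A and B are instances,
    # i.e., dicts mapping operation names to constants and callables.
    inst = {}
    for op, arity in sig.items():
        if arity == 0:
            inst[op] = (A[op], B[op])  # constants pair up
        else:
            # operations act componentwise on pairs
            inst[op] = (lambda f, g: lambda *xs:
                        (f(*(x[0] for x in xs)),
                         g(*(x[1] for x in xs))))(A[op], B[op])
    return inst

monoid_sig = {"e": 0, "op": 2}
nat_add = {"e": 0, "op": lambda x, y: x + y}
str_cat = {"e": "", "op": lambda x, y: x + y}
pair = product_instance(monoid_sig, nat_add, str_cat)
assert pair["op"]((1, "a"), (2, "b")) == (3, "ab")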
One can easily proceed to show that this predicate on a monoid induces a new
(sub)monoid. In fact, we do not need associativity for this; in other words, a
unital magma already induces a trivial monoid.
Monoid actions are extremely useful for expressing ideas in group theory, and in
automata theory. They are only defined in the presence of a monoid structure,
which can be easily checked at the meta level.
Subsets Action. The fourth example constructs, from a Monoid M, the monoid
on the subsets of M. Note that the following is pseudo-code written in an imagined
Set-theoretic extension of dependent type theory.
record SubsetsAction {A : Set} (M : Monoid A) : Set where
  constructor subsetsAction
  field
    S : (powerset A)
    e' : S
    op' : S → S → S
    e'def : e' == { M.e }
    op'def : {x y : S} → (op' x y) == { (M.op a b) | a ∈ x and b ∈ y }
The subsets monoid is used extensively in automata theory and group theory.
The above can also be written as a construction of a new monoid, in depen-
dent type theory, where the carrier is the set of unary relations on A.
Monoid Cosets. The next example constructs, from a Monoid M , the cosets
of M . This is also pseudo-code, as above.
record MonoidCosets {A : Set} (M : Monoid A) : Set where
  constructor monoidCosets
  field
    S : (powerset A)
    e' : S
    op' : A → S → S
    e'def : e' == { M.e }
    op'def : {a : A} → {x : S} → (op' a x) == { (M.op a b) | b ∈ x }
Monoid cosets are extensively used in group theory.
• A parameter to the record has the type Binding. It can be hidden using
HBind [Arg] Expr, or explicit using Bind [Arg] Expr.
• A declaration within the record has the type Constr Name Expr.
In Universal Algebra, an algebraic theory consists of sorts, function symbols
(with their arities) and a list of axioms, often denoted as a theory T having three
components (S,F,E). We assume a single sort. This can be internalized, in the
Haskell implementation of Tog, as
data EqTheory = EqTheory
  { thryName :: Name_
  , sort :: Constr
  , funcTypes :: [Constr]
  , axioms :: [Constr]
  , waist :: Int }
where:
– sort, funcTypes, and axioms are treated as elements of a telescope [13]. There-
fore, the order in which they are defined matters.
– The waist is a number referring to how many of the declarations within the
telescope are parameters. The notation is taken from [1]. This information is
needed in generating some constructions, like homomorphism.
Given a Tog record type that exhibits an equational theory structure, like
that of Monoid in Sect. 1, we convert it into an instance of EqTheory. We, then,
proceed with generating useful information from the theory. Finally, we convert
this information into Tog records and data types, so they can be type checked
by Tog, i.e. our approach builds on Tog, without changing its syntax or type
checker. In the sequel of this section, we describe the constructions we generate.
3.1 Signature
Given a theory T = (S,F,E), the signature of the theory is Sig(T) = (S,F). A
signature is obtained from an EqTheory as follows:
signature_ :: Eq.EqTheory -> Eq.EqTheory
signature_ =
  over Eq.thyName (++ "Sig") . set Eq.axioms [] . gmap ren
For a theory with name X, the signature is an EqTheory with the name XSig and
an empty axioms list. The theory and its signature exist in the same module.
Tog requires that they have different field names; we use gmap ren to apply this
renaming. We discuss this in more detail in Sect. 3.5.
Constructors are generated by substituting the name of the language type for a
sort A. Term languages are realized as Tog data declarations using the constructor
Data.
Generating the closed term language is a first step to generating an open
term language (i.e. a term language parametrized by a type of variables), and
an interpreter.
For some kinds of axioms, namely those that can be oriented, we can turn
these into simplification rules, i.e. into (unconditional) rewrite rules. The result-
ing simplifier can be shown to be meaning preserving. These two pieces, the
evaluator and simplifier, can be attached to each other to form a partial evalua-
tor, using the “finally tagless” [7] method. Eventually, we would like to be able
to automate the majority of the hand-written code for a generative geometry
library [4], which is indeed quite amenable to such techniques. Unfortunately,
the details will have to wait for a future paper.
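To fix intuitions in the meantime, here is a minimal sketch of the evaluator/simplifier pairing described above, for a tiny closed monoid term language with the unit axioms oriented left-to-right (illustrative Python; the actual work targets Tog and the finally-tagless style):

from dataclasses import dataclass

class Term: pass

@dataclass
class E(Term): pass          # the unit

@dataclass
class Lit(Term):
    value: int               # injected carrier elements

@dataclass
class Op(Term):
    left: Term
    right: Term

def interp(t, unit, op):
    # Denotational semantics: interpret a term in a concrete monoid.
    if isinstance(t, E):
        return unit
    if isinstance(t, Lit):
        return t.value
    return op(interp(t.left, unit, op), interp(t.right, unit, op))

def simplify(t):
    # Rewrite with the oriented unit laws  e*x -> x  and  x*e -> x;
    # this is meaning preserving with respect to interp.
    if isinstance(t, Op):
        l, r = simplify(t.left), simplify(t.right)
        if isinstance(l, E):
            return r
        if isinstance(r, E):
            return l
        return Op(l, r)
    return t

t = Op(E(), Op(Lit(2), E()))
assert interp(simplify(t), 0, lambda a, b: a + b) == 2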
3.4 Homomorphism
For a theory T = (S,F,E), with instances T1 and T2 , the homomorphism of T
consists of
1. a function mapping the carrier of T1 to that of T2 ,
2. a set of axioms asserting that operations (i.e. elements of F) are preserved.
Our definition of homomorphism is parameterized by the instances T1 and T2 .
The parameters of T, if waist > 0, are lifted out as parameters to the resulting
homomorphism, and used to define the instances of the theory.
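The preservation axioms themselves can be read off the signature: for each n-ary operation f, generate h (f x1 ... xn) == f (h x1) ... (h xn). A small sketch (illustrative Python with hypothetical naming, not the Tog/Haskell code; subscripts 1 and 2 distinguish the two instances):

def hom_axioms(sig):
    # For each operation symbol, produce its preservation axiom as text;
    # op1/op2 denote the operation in the source and target instance.
    axioms = []
    for op, arity in sig.items():
        xs = [f"x{i}" for i in range(1, arity + 1)]
        if arity == 0:
            axioms.append(f"pres-{op} : h {op}1 == {op}2")
        else:
            lhs = f"h ({op}1 {' '.join(xs)})"
            rhs = f"{op}2 {' '.join(f'(h {x})' for x in xs)}"
            axioms.append(f"pres-{op} : {lhs} == {rhs}")
    return axioms

print(hom_axioms({"e": 0, "op": 2}))
# ['pres-e : h e1 == e2', 'pres-op : h (op1 x1 x2) == op2 (h x1) (h x2)']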
3.5 Discussion
The above are a small sample of what can be done. We’ve found at least 30
constructions that should be amenable to such a treatment and are currently
implementing them, including quotient algebras and induction axioms. Figure 2
shows the generated constructions. The input is the theory of Monoid represented
as a Tog record type (illustrated on the left with the blue background). For
this, we generate the four constructions discussed above (illustrated with pink
background). The names of carriers A1 and A2 , names of instances Mo1 and Mo2
are machine generated based on the names used by the input theory, which are
given by the user. A somewhat unpleasant restriction is that all field names need
to be distinct, even if the fields belong to different records. That is the reason
we have names like eL in MonoidLang and eS in MonoidSig. This is still a minor
inconvenience, given that we are working on an abstract level, from which more
readable and usable code will be generated.
4 Related Work
Many algebraic hierarchies have been developed before. [18] documents the devel-
opment of the algebra needed for proving the fundamental theorem of algebra.
[17] formalizes the same knowledge in Coq, but suggests a packaging struc-
ture alternative to telescopes, to support multiple inheritance. [11] addresses
the important problem of library maintainability, especially when dealing with
changes to the hierarchy. We have proposed an alternate solution in [9], based
on the categorical structures already present in dependent type theories.
Fig. 2. The generated constructions from Monoid theory (Color figure online)
References
1. Al-hassy, M., Carette, J., Kahl, W.: A language feature to unbundle data at will
(short paper). In: Proceedings of the 18th ACM SIGPLAN International Confer-
ence on Generative Programming: Concepts and Experiences, GPCE 2019, pp.
14–19. ACM, New York (2019)
2. Blöndal, B., Löh, A., Scott, R.: Deriving via: or, how to turn hand-written
instances into an anti-pattern. In: Proceedings of the 11th ACM SIGPLAN Interna-
tional Symposium on Haskell, Haskell 2018, pp. 55–67. Association for Computing
Machinery, New York (2018)
3. Capretta, V.: Universal algebra in type theory. In: Bertot, Y., Dowek, G., Théry,
L., Hirschowitz, A., Paulin, C. (eds.) TPHOLs 1999. LNCS, vol. 1690, pp. 131–148.
Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-48256-3 10
4. Carette, J., Elsheikh, M., Smith, S.: A generative geometric kernel. In: Proceed-
ings of the 20th ACM SIGPLAN Workshop on Partial Evaluation and Program
Manipulation, pp. 53–62. ACM (2011)
5. Carette, J., et al.: The MathScheme library: Some preliminary experiments. arXiv
preprint arXiv:1106.1862, June 2011
6. Carette, J., Farmer, W.M., Kohlhase, M., Rabe, F.: Big math and the one-brain
barrier: a position paper and architecture proposal. arXiv preprint arXiv:1904.10405
(2019)
7. Carette, J., Kiselyov, O., Shan, C.: Finally tagless, partially evaluated: tagless
staged interpreters for simpler typed languages. J. Funct. Program. 19(5), 509–
543 (2009)
8. Carette, J., O’Connor, R.: Theory presentation combinators. In: Jeuring, J., Camp-
bell, J.A., Carette, J., Dos Reis, G., Sojka, P., Wenzel, M., Sorge, V. (eds.) CICM
2012. LNCS (LNAI), vol. 7362, pp. 202–215. Springer, Heidelberg (2012). https://
doi.org/10.1007/978-3-642-31374-5 14
9. Carette, J., O’Connor, R., Sharoda, Y.: Building on the diamonds between theories:
theory presentation combinators. arXiv preprint arXiv:1812.08079 (2018)
10. Clavel, M., Eker, S., Lincoln, P., Meseguer, J.: Principles of Maude. In: Meseguer,
J. (ed.) Proceedings of the First International Workshop on Rewriting Logic, vol.
4, pp. 65–89 (1996)
11. Cohen, C., Sakaguchi, K., Tassi, E.: Hierarchy builder: algebraic hierarchies made
easy in Coq with Elpi. https://hal.inria.fr/hal-02478907 (2020). working paper or
preprint
12. The mathlib Community: The Lean mathematical library. arXiv preprint
arXiv:1910.09336 (2019)
13. de Bruijn, N.G.: Telescopic mappings in typed lambda calculus. Inf. Comput.
91(2), 189–204 (1991)
14. Denecke, K., Wismath, S.L.: Universal Algebra and Applications in Theoretical
Computer Science. Taylor & Francis, New York (2002)
15. Ehrig, H., Mahr, B.: Fundamentals of Algebraic Specification 1: Equations and
Initial Semantics. Monographs in Theoretical Computer Science. An EATCS Series.
Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-69962-7
16. Farmer, W.M., Guttman, J.D., Javier Thayer, F.: Little theories. In: Kapur, D.
(ed.) CADE 1992. LNCS, vol. 607, pp. 567–581. Springer, Heidelberg (1992).
https://doi.org/10.1007/3-540-55602-8 192
17. Garillot, F., Gonthier, G., Mahboubi, A., Rideau, L.: Packaging mathematical
structures. In: Berghofer, S., Nipkow, T., Urban, C., Wenzel, M. (eds.) TPHOLs
2009. LNCS, vol. 5674, pp. 327–342. Springer, Heidelberg (2009). https://doi.org/
10.1007/978-3-642-03359-9 23
18. Geuvers, H., Pollack, R., Wiedijk, F., Zwanenburg, J.: A constructive algebraic
hierarchy in Coq. J. Symb. Comput. 34(4), 271–286 (2002)
19. Gross, J., Chlipala, A., Spivak, D.I.: Experience implementing a performant
category-theory library in Coq. In: Klein, G., Gamboa, R. (eds.) ITP 2014. LNCS,
vol. 8558, pp. 275–291. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-
08970-6 18
20. Gunther, E., Gadea, A., Pagano, M.: Formalization of universal algebra in Agda.
Electron. Not. Theor. Comput. Sci. 338, 147–166 (2018). The 12th Workshop on
Logical and Semantic Frameworks, with Applications (LSFA 2017)
Mario Carneiro
1 Introduction
The idea of using computers to check mathematical statements has been around
almost as long as computers themselves, but the scope of formalizations have
grown in recent times, both in pure mathematics and software verification, and
it now seems that there is nothing that is really beyond our reach if we aim
for it. But at the same time, software faces a crisis of correctness, where more
powerful systems lead to more reliance on computers and higher stakes for failure.
Software verification stands poised to solve this problem, providing a high level
of certainty in correctness for critical components.
But software verification systems are themselves critical components, partic-
ularly the popular and effective ones. A proof in such a system is only as good
as the software that checks it. How can we bootstrap trust in our systems?
This paper presents a formal system, called Metamath Zero (MM0), which
aims to fill this gap, having both a simple extensible logical theory and a straight-
forward yet efficient proof format. Work to prove the correctness theorem is
ongoing, but this paper explains the design of the system and how it relates to
other theorem provers, as well as general considerations for any bootstrapping
theorem prover.
1. It should be proven correct down to the lowest possible level. Some options
for the lowest level include:
(a) a logical rendering of the code;
(b) the code itself, inside a logical rendering of the language;
(c) the machine code, specified relative to an ISA (instruction set architecture);
(d) the computer, down to the logic gates that make it up;
(e) the fabrication process relative to some electrical or physical model.
2. It should permit the user to prove any theorem they like (including specifying
the axiom system of interest).
3. It should permit the user to write any program they like and prove any
theorem about that program.
4. There should be no practical upper limit on the complexity of the target
programs, and they should be able to run as fast as the machine is capable.
5. It should be fast.
6. It should be easy to use.
While there is no theoretical reason not to push (1) all the way to level (e), the
drawback of the most aggressive levels (d) and (e) is that it limits redistribution
capabilities. If you prove a system correct at level (e), the proof only holds if
you use the given fabrication process, and similarly (d) only holds if you use the
given chip design. A proof relative to (c) holds as long as the “reader” has the
Metamath Zero 73
same ISA as the author, so if we pick a relatively popular ISA then we can write
proofs that are independently verifiable by a large segment of the population.
For the MM0 project, we target (c) with the Intel x86-64 architecture on Linux.
(This does put the OS in the trusted base, but it is not possible to do otherwise
for a regular, user-mode application, which is again important for distribution.
We can at least keep interaction with the OS to a minimum and formally specify
what we expect from the OS, such that a “bare-metal” version of the verifier is
largely the same.)
To satisfy (2), MM0 is a logical framework, with “pluggable axioms” to sup-
port any desired mathematical foundation. To satisfy (3) and (4), we imple-
mented, in MM0, a specification for x86-64, so that users can write any program
they like. To satisfy (5), proofs will be laid out such that checking them is as
straightforward as possible, roughly linear time, while keeping the de Bruijn
factor down.
(6) is a subjective criterion, and also one we are willing to compromise on in
favor of the others. Nevertheless some degree of ease of use is needed in order
to get enough done to achieve the other goals. To that end, the MM0 verifier
has a front end, MM1, which provides some ITP (interactive theorem prover)
features in an unverified setting. This is an extension of the LCF-style prover
architecture to completely separate and isolate the verifier from the prover.
In particular, a compiler for producing verified programs would sit at this
unverified prover level, producing machine code from high level code so that
users don’t have to write machine code. This compromises (4) to some extent if
the compiler produces poor code, but that has an obvious mitigation.
The trusted components in this architecture are the verifier, and the .mm0
file containing the statements of the theorems. Additionally one has to trust
that the text file is faithfully shown to the reader, and the reader understands
the content of the file. As such, the .mm0 file format balances human-readability
with simplicity of the verifier implementation required to parse it and validate
it against the data in the proof file.
The remainder of the paper discusses the various components of this process.
Section 2 describes the logical framework in which theorems are proved, Sect. 2.1
describes the specification format, Sect. 3 describes the proof format, and Sect. 4
discusses how MM0 proof objects can be generated. Section 5 shows work done
to connect MM0 to other proof languages. Section 6 discusses progress towards
proving the verifier implementation correctness theorem.
Sorts. An MM0 file declares a (finite) collection of sorts. Every expression has
a unique sort, and an expression can only be substituted for a variable of the
same sort. There are no type constructors or function types, so the type system
is finite. (Higher order functions are mimicked using open terms, see Sect. 2.)
Variables. MM0 distinguishes between two different kinds of variables. One may
variously be called names, first order variables or bound/binding variables. These
play the role of “variable variables” from Metamath, and will be denoted in this
paper with letters x, y, z, . . . . They are essentially names that may be bound
by quantifiers internal to the logic. “Substitution” of names is α-conversion;
expressions cannot be substituted directly for names, although axioms may be
used to implement this action indirectly. The other kind of variable may be called
76 M. Carneiro
Fig. 2. MM0 syntax and well formedness judgments. ē denotes iteration or lists, and
ei denotes the ith element of ē. The Γ ctx, Γ ⊢ e : s, Γ ⊢ ē :: Γ′, and δ ok judgments
are parameterized over a fixed global environment E. (f(Γ) : s x̄) ∈ E means there is
a term or def in E with this signature. See Fig. 3 for the definition of Γ; A ⊢ B.
a (schematic) metavariable or second order variable, and these may not be bound
by quantifiers; they are always implicitly universally quantified and held fixed
within a single theorem, but unlike names, they may be directly substituted for
an expression. We use ϕ, ψ, χ, . . . to denote schematic metavariables.
In FOL, notations like ϕ(x̄) are often used to indicate that a metavariable is
explicitly permitted to depend on the variables x̄, and sometimes but not always
additional "parameter" variables not under consideration. In MM0, we use a
binder ϕ : s x̄, where s is the sort and x̄ are the dependencies of ϕ, to indicate
that ϕ represents an open term that may reference the variables x̄ declared in
the context. Such a variable may also be glossed as a pre-applied higher order
variable; for example a variable of type ϕ : wff x can be interpreted as a predicate
P : U → bool where every occurrence of ϕ in the statement is replaced with P x.
and V(all y ψ) = {y} ∪ V(ψ). It is easy to see that FV(e) ⊆ V(e) generally;
that is, every free variable in an expression e is present in e. Metamath, and
Metamath Zero, take the somewhat unorthodox approach of using V instead of
FV in the definition of an admissible substitution (the side condition
∀i j x, (Γi = x ∧ x ∉ VΓ(Γj)) → ei ∉ VΓ(ej)
in the theorem application rule in Fig. 3, which says
in words that if Γj is a variable in the context that is not declared to depend on
x, then the substitution for Γj cannot contain the name that is being substituted
for x), but this is sound because if e does not contain any occurrence of x then it
clearly does not contain a free occurrence of x. This is done because V is faster
to compute than FV, and α-conversion in the logic can make up the difference.
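To make the V/FV distinction concrete, here is a toy sketch (illustrative Python, not verifier code) for a little term language with one binder, together with the V-based admissibility test paraphrased from the side condition above; the term encoding is my own assumption:

def V(e):
    # All variables occurring in e, bound or free.
    if e[0] == "var":                    # ("var", x)
        return {e[1]}
    if e[0] == "all":                    # ("all", x, body)
        return {e[1]} | V(e[2])
    return set().union(*map(V, e[2]))    # ("app", f, [args])

def FV(e):
    # Only the free variables of e.
    if e[0] == "var":
        return {e[1]}
    if e[0] == "all":
        return FV(e[2]) - {e[1]}
    return set().union(*map(FV, e[2]))

phi = ("all", "x", ("app", "<", [("var", "x"), ("var", "y")]))
assert V(phi) == {"x", "y"}
assert FV(phi) == {"y"}                  # FV(e) is a subset of V(e), as noted

def admissible(x, e_i_name, deps_j, e_j):
    # If Γj is not declared to depend on x, then the name e_i substituted
    # for x must not occur (anywhere, per V) in the expression e_j
    # substituted for Γj.
    return x in deps_j or e_i_name not in V(e_j)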
ϕ ψ : wff; · ⊢ ϕ → ψ → ϕ
ϕ ψ χ : wff; · ⊢ (ϕ → ψ → χ) → (ϕ → ψ) → (ϕ → χ)
ϕ ψ : wff; · ⊢ (¬ϕ → ¬ψ) → (ψ → ϕ)
ϕ ψ : wff; ϕ → ψ, ϕ ⊢ ψ
x : var, ϕ ψ : wff x; · ⊢ ∀x (ϕ → ψ) → (∀x ϕ → ∀x ψ)
x : var, ϕ : wff; · ⊢ ϕ → ∀x ϕ
Notice that ϕ has type wff x in the first theorem and wff in the second, even
though x appears in both statements. This indicates that in the first theorem
ϕ may be substituted with an open term such as x < 2, while in the second
theorem ϕ must not contain an occurrence of x (not even a bound occurrence).
Proofs and Convertibility. Metamath has only the first and third rules of
Fig. 3: the hypothesis rule, and the application of a theorem after (direct) admis-
sible substitution. Metamath Zero adds the second rule, which consists only of
definition unfolding and compatibility rules.
The rule for thm (Γ; A ⊢ B) ok allows additional dummy variables y : s to
be used in the proof, as long as they do not appear in the statement (A and
B must not mention y). This in particular implies that all sorts are nonempty.
(The free sort modifier allows us to relax this constraint; see [5].)
assuming the sorts var, wff, the terms imp and all, and notations -> and A.
for them have been previously declared.
As its name implies, the .mm0 specification file is only about specifying axioms
and theorems, so it does not contain any proofs. Axioms and theorems look
exactly the same except for the keyword used to introduce them. This is an
unusual choice for a theorem prover, although some systems like Mizar and
Isabelle support exporting an “abstract” of the development, with proofs omit-
ted. We do this so that there is a clean separation between the trusted part (the
statements of the theorems) and the verified part (the proofs of the theorems).
We can do something similar with definitions. A definition requires a definiens
in Fig. 2, but we can instead write a definition with no definiens, so that it looks
just like a term declaration. This allows us to assert the existence of a term
constructor which satisfies any theorems that follow, which gives us a kind of
abstraction. Sometimes it is easier to write down characteristic equations for
a function rather than an explicit definition, especially in the case of recursive
functions.
Once one is committed to not proving theorems in the specification file, it
is able to shrink dramatically, because theorems never reference each other, and
only reference terms and definitions involved in their statements. So if focus is
given to one theorem, then almost everything else goes away, and even in extreme
cases it becomes quite feasible to write down everything up to and including
the axiomatic framework. For example, if we specify Fermat’s last theorem, we
must define the natural numbers and exponentiation in the specification file, but
certainly not modular forms, which are properly the domain of the proof file.
Having a precise language for specifying formal statements is nice, but it is most
powerful when coupled with a method for proving those formal statements. We
have indicated several times now design decisions that were made for efficiency
reasons. By spoon-feeding the verifier a very explicit proof, we end up doing
a lot less computation, and by deduplicating and working directly with dag-
like expressions at all stages, we can avoid all the exponential blowups that
happen in unification. (As we will see in Sect. 4, the user does not have to write
these proofs directly. It is expected that they are compiled from a more human-
friendly input.) Using these techniques, we managed to translate set.mm into
MM0 (see Sect. 5) and verify the resulting binary proof file in 195 ± 5 ms (Intel
i7 3.9 GHz, single threaded). While set.mm is formidable, at 34 MB/590 kLOC,
we are planning to scale up to larger or less optimized formal libraries to see if
it is competitive even on more adversarial inputs.
The proof file is designed to be manipulated in situ; it does not need to be
processed into memory structures, as it is already organized like one. It contains
a header, the term and theorem tables, and the declaration stream, followed by
debugging data.
The term table and theorem table contain the statements of all theorems and
the types of all term constructors. These tables are consulted during typecheck-
ing, and the verifier uses a counter as a sliding window into the table to mark
what part of the table has been verified (and thus is usable). This means that a
term lookup is generally a single indexed memory access, usually in cache, which
makes type checking for expressions extremely fast in practice.
2 https://github.com/digama0/mm0/tree/master/mm0-c.
After the term and theorem tables is the declaration stream, which validates
each declaration in the .mm0 file, possibly interspersed with additional definitions
and theorems. This data is processed in a single pass, and contains in particular
proofs of theorems. A proof stream is a sequence of opcodes (see [5] for the
full grammar) with associated data. Each instruction changes the state of the
verifier, roughly in one-to-one correspondence with the proof rules in Fig. 3, and
at the end the verifier should have a state indicating that the desired theorem
has been proven.
During a proof, the verifier state consists of a store (a write-once memory
arena that is cleared after each proof) which builds up pointer data structures
for constructed expressions, a heap H, and a stack S. A stack element can be
either an expression e or a proof A, both of which are simply pointers into
the store where the relevant expression is stored. (There are also stack elements
corresponding to convertibility proofs, which we will not discuss.)
At the beginning of a proof, the heap is initialized with expressions for all the
variables. An opcode like Term f will pop n elements ē from the stack, and push
f ē, while Ref i will push H[i] to the stack. The verifier is arranged such that an
expression is always accessed via backreference if it is required more than once,
so equality testing is always O(1).
The opcode Thm T pops ē from the stack (the number of variables in the
theorem), pops B′ from the stack (the substituted conclusion of the theorem),
then calls a unifier for T, stored in the theorem table for T, which is another
sequence of opcodes. This will pop some number of additional assumptions A
from the stack, and then B′ is pushed on the stack.
The unifier is responsible for deconstructing B and proving that B[Γ ↦ ē] =
B′, where B and Γ are fixed from the definition of T, and ē and B′ are
provided by the theorem application. It has its own stack K and heap U; the
unify heap is the incoming substitution, and the unify stack is the list of
unification obligations. For example URef i pops e from the stack and checks
that U[i] = e, while UTerm f pops an expression e from the unify stack, checks
that e = f ē′, and then pushes ē′ on the stack (in reverse order). The appropriate
list of opcodes can be easily constructed for a given expression by reading the
term in prefix order, with UTerm at each term constructor and URef for variables.
The UHyp instruction pops A from the main stack S and pushes A to the
unify stack K; this is how the theorem signals that it needs a hypothesis.
The handling of memory is interesting in that all allocations are controlled
by the compiler in the sense that they happen only on Term f and Dummy s
steps (Dummy s puts a new variable on the heap and stack). There is no “auto-
allocation” during substitution because unification only deconstructs expres-
sions, it does not create new ones. This means that the compiler can preprocess
the proof to ensure that every equality test is a pointer equality, by only con-
structing the term on first use and referring back to it on subsequent uses. So the
verifier can assume that the compiler has already done so and reject files that
aren’t prepared in this way, achieving the aforementioned O(1) comparison.
Verification is not quite linear time, because each Thm T instruction causes
the verifier to read the unifier for T , which may be large if T has a long statement.
It is O(mn) where n is the length of the proof and m is the length of the longest
theorem statement. In practice this is essentially linear time, because it is rare to
have theorems with long statements, and even rarer to use them so many times
in a single proof.
One may think that the compilation process for such an intricately prepared
proof would be difficult, but assuming that proof trees are stored as tree data
structures in the usual way, the process is essentially hash-consing to deduplicate
the tree, followed by a postorder traversal of the proof to produce the proof
stream and a preorder traversal of the statement to produce the unify stream
for the theorem. (See [5] for an example.)
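A sketch of the proof-stream side of that compilation (illustrative Python, continuing the toy opcodes above): deduplication makes sharing explicit, and the postorder walk emits each subterm once, with back-references thereafter.

def compile_expr(e, emitted, out):
    # Postorder emit: subterms first; anything already emitted becomes a Ref.
    if e in emitted:
        out.append(("Ref", emitted[e]))
        return
    f, args = e
    for a in args:
        compile_expr(a, emitted, out)
    out.append(("Term", (f, len(args))))
    emitted[e] = len(emitted)  # its index in the verifier's heap

out, emitted = [], {}
x = ("x", ())
compile_expr(("f", (x, x)), emitted, out)
print(out)  # [('Term', ('x', 0)), ('Ref', 0), ('Term', ('f', 2))]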
.mmb directly. (Note that tactics can make this number much less favorable,
depending on how complex and expensive they are. Our aim is to get a compiler
roughly comparable to a simple unoptimizing C compiler, so that execution time
is reasonable even with proof production.)
Here we see an important reason for speed: the faster the server can read and
execute the file, the faster the response time to live features like diagnostics that
the user is relying on for making progress through the proof. The MM1 language
also contains a Turing-complete meta-programming language based on Scheme.
It is intended for writing small “tactics” that construct proofs. Besides a few
small quality-of-life improvements, we used it to implement a general algorithm
for proving congruence lemmas (theorems of the form A = B → f (A) = f (B))
for all new definitions.
Support for multi-file developments is as yet nascent, but it is worth men-
tioning that besides other .mm1 files, an .mm1 file can import “compiled” .mmb
files (from an .mm1 source or even generated from another source, such as a large
scale tactic), which provides a way to isolate components and only compile as
needed. It is possible to do much more in this direction, but the need is not
pressing as end-to-end compiles are fast enough for interactive use.
While MM1 has a long way to go to compete with heavyweights in the theo-
rem proving world like Coq, Isabelle, or Lean, we believe this to be an effective
demonstration that even a parsimonious language like Metamath or MM0 can
be used as the backend to a theorem prover, and “all” that is necessary is a bit
of UI support to add features like a type system, a tactic language, unification,
and inference.
6 Bootstrapping
5 https://github.com/digama0/mm0/blob/master/mm0-lean/mm0/set/post.lean.
6 https://github.com/digama0/mm0/blob/master/examples/mm0.mm0.
7 https://github.com/digama0/mm0/blob/master/examples/x86.mm0.
8 https://github.com/digama0/mm0/blob/master/examples/x86_mm0.mm0.
7 Related Work
The idea of a bootstrapping theorem prover is not new. There are a number of
notable projects in this space, many of which have influenced the design of MM0.
However, none of these projects seem to have recognized (in words or actions)
the value of parsimony, specifically as it relates to bootstrapping.
At its heart, a theorem prover that proves itself correct is a type of circular
proof. While a proof of correctness can significantly amplify our confidence that
we haven’t missed any bugs, we must eventually turn to other methods to ground
the argument, and direct inspection is always the fallback. But the effectiveness
of direct inspection is inversely proportional to the size of the artifact, so the
only way to make a bootstrap argument more airtight is to make it smaller.
The most closely related projects, in terms of bootstrapping a theorem prover
down to machine code, are CakeML and Milawa.
– CakeML [15] is a compiler for ML that is written in the logic of HOL4 [23],
and HOL4 is a theorem prover written in ML. Unfortunately, the ML that
CakeML supports is not sufficient for HOL4, and while a simpler kernel,
called Candle, has been implemented in CakeML, it supports a variant of
HOL Light, not HOL4.
– Milawa [8] is a theorem prover based on ACL2, which has a sequence of
verifiers Ai ⊢ Ai+1, with A12 ⊢ ‘A0 is correct’. This project was later extended
by Magnus Myreen to Jitawa [20], a Lisp runtime that was verified in HOL4
down to the machine code and can run Milawa.
There are a few other projects that have done bootstraps at the logic level:
– “Coq in Coq” (1996) [2] is a formalization of the Calculus of Constructions and
a typechecker thereof in Coq. Unfortunately, this lacks inductive types, so it
fails to “close the loop” of the bootstrap.
– “Towards self-verification of HOL Light” (2006) [11] writes down a transla-
tion of the HOL Light kernel (written in OCaml) in HOL Light, and proves
soundness given additional axioms. This leaves off verification of OCaml (in
fact OCaml is known to break soundness), and the translation from OCaml
code to HOL Light definitions is unverified and slightly nontrivial in places.
The MM0 project draws from ideas in a number of fields, most of which have
long histories and many contributors.
8 Conclusion
Metamath Zero is a theorem prover built to solve the problem of bootstrapping
trust into a system. It is general purpose, so it can support all common formal
systems (ZFC, HOL, DTT, PA, really anything recursively enumerable). It is
extremely fast, at least on hand-written inputs like set.mm, and is built to handle
computer-science-sized problems.
Although the proof of the correctness theorem for MM0 is still ongoing, we believe there
is value added in clearly delineating the necessary components for a system that
pushes the boundaries of formal verification to cover as much as possible, so that
we can have programs that are both fast and correct.
We hope to see a future where all the major theorem provers are either
proven correct or can export their proofs to systems that are proven correct, so
that when we verify our most important software, we bequeath the highest level
of confidence we are capable of providing. It’s not an impossible dream—the
technology is in our hands; we need only define the problem, and solve it.
Acknowledgments. I would like to thank Norman Megill for writing Metamath, and
André Bacci, Wolf Lammen, David A. Wheeler, Giovanni Mascellani, Seul Baek, and
Jeremy Avigad for their input and suggestions during the design phase of MM0. I
thank Jeremy Avigad, Jesse Han, Benoı̂t Jubin, and the anonymous reviewers for their
reviews of early versions of this work.
This work was supported in part by AFOSR grant FA9550-18-1-0120 and a grant
from the Sloan Foundation.
References
1. Armstrong, A., et al.: ISA semantics for ARMv8-A, RISC-V, and CHERI-MIPS.
In: Proceedings of 46th ACM SIGPLAN Symposium on Principles of Programming
Languages, January 2019. https://doi.org/10.1145/3290384. Proc. ACM Program.
Lang. 3(POPL), Article 71
2. Barras, B.: Coq en Coq (1996)
3. Berghofer, S., Nipkow, T.: Proof terms for simply typed higher order logic. In:
Aagaard, M., Harrison, J. (eds.) TPHOLs 2000. LNCS, vol. 1869, pp. 38–52.
Springer, Heidelberg (2000). https://doi.org/10.1007/3-540-44659-1 3
4. Cao, Q., Beringer, L., Gruetter, S., Dodds, J., Appel, A.W.: VST-Floyd: a separation logic tool to verify correctness of C programs. J. Autom. Reason. 61(1–4), 367–422 (2018)
5. Carneiro, M.: Metamath Zero: The Cartesian Theorem Prover (2019, preprint)
6. Carneiro, M.: Specifying verified x86 software from scratch. In: Workshop on
Instruction Set Architecture Specification (SpISA 2019) (2019). https://www.cl.cam.ac.uk/~jrh13/spisa19/paper_07.pdf
7. Dasgupta, S., Park, D., Kasampalis, T., Adve, V.S., Roşu, G.: A complete for-
mal semantics of x86-64 user-level instruction set architecture. In: Proceedings
of the 40th ACM SIGPLAN Conference on Programming Language Design and
Implementation (PLDI 2019), pp. 1133–1148. ACM, June 2019. https://doi.org/
10.1145/3314221.3314601
8. Davis, J.C., Moore, J.S.: A self-verifying theorem prover. Ph.D. thesis, University
of Texas (2009)
9. Goel, S., Slobodova, A., Sumners, R., Swords, S.: Verifying x86 instruction imple-
mentations. In: Proceedings of the 9th ACM SIGPLAN International Conference
on Certified Programs and Proofs, pp. 47–60 (2020)
10. Haftmann, F.: Code generation from Isabelle/HOL theories
11. Harrison, J.: Towards self-verification of HOL light. In: Furbach, U., Shankar, N.
(eds.) IJCAR 2006. LNCS (LNAI), vol. 4130, pp. 177–191. Springer, Heidelberg
(2006). https://doi.org/10.1007/11814771_17
12. Jung, R., Jourdan, J.H., Krebbers, R., Dreyer, D.: RustBelt: securing the foundations of the Rust programming language. Proc. ACM Program. Lang. 2(POPL), 1–34 (2017)
13. Jung, R., Krebbers, R., Jourdan, J.H., Bizjak, A., Birkedal, L., Dreyer, D.: Iris
from the ground up: a modular foundation for higher-order concurrent separation
logic. J. Funct. Program. 28 (2018)
14. Kumar, R., Mullen, E., Tatlock, Z., Myreen, M.O.: Software verification with ITPs
should use binary code extraction to reduce the TCB. In: Avigad, J., Mahboubi, A.
(eds.) ITP 2018. LNCS, vol. 10895, pp. 362–369. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-94821-8_21
15. Kumar, R., Myreen, M.O., Norrish, M., Owens, S.: CakeML: a verified implemen-
tation of ML. SIGPLAN Not. 49(1), 179–191 (2014). https://doi.org/10.1145/
2578855.2535841
16. Leroy, X., et al.: The CompCert verified compiler. Documentation and user's manual. INRIA Paris-Rocquencourt 53 (2012)
17. Letouzey, P.: Extraction in Coq: an overview. In: Beckmann, A., Dimitracopoulos,
C., Löwe, B. (eds.) CiE 2008. LNCS, vol. 5028, pp. 359–369. Springer, Heidelberg
(2008). https://doi.org/10.1007/978-3-540-69407-6 39
18. Megill, N., Wheeler, D.A.: Metamath: A Computer Language for Mathematical
Proofs. Lulu Press, Morrisville (2019)
19. Myreen, M.O.: Formal verification of machine-code programs. Technical report,
University of Cambridge, Computer Laboratory (2009)
20. Myreen, M.O., Davis, J.: A verified runtime for a verified theorem prover. In:
van Eekelen, M., Geuvers, H., Schmaltz, J., Wiedijk, F. (eds.) ITP 2011. LNCS,
vol. 6898, pp. 265–280. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-22863-6_20
21. Roşu, G., Şerbănuţă, T.F.: An overview of the K semantic framework. J. Logic
Algebraic Program. 79(6), 397–434 (2010). https://doi.org/10.1016/j.jlap.2010.03.012
22. Sacerdoti Coen, C.: A plugin to export Coq libraries to XML. In: Kaliszyk, C.,
Brady, E., Kohlhase, A., Sacerdoti Coen, C. (eds.) CICM 2019. LNCS (LNAI),
vol. 11617, pp. 243–257. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-23250-4_17
23. Slind, K., Norrish, M.: A brief overview of HOL4. In: Mohamed, O.A., Muñoz, C.,
Tahar, S. (eds.) TPHOLs 2008. LNCS, vol. 5170, pp. 28–32. Springer, Heidelberg
(2008). https://doi.org/10.1007/978-3-540-71067-7_6
24. Sozeau, M., Forster, Y., Winterhalter, T.: Coq Coq Correct!
Adding an Abstraction Barrier to ZF Set Theory
1 Introduction
1.1 Background: Set Theory and Type Theory as Foundations
A large portion of the mathematical literature is based on set theory, explicitly or
implicitly, directly or indirectly. Set theory is pervasive in mathematical culture.
University mathematics programmes have introductory courses on set theory and
many other courses that rely heavily on set-theoretic concepts (sets, classes, etc.),
notation (comprehensions a.k.a. set-builders, power set, etc.), and reasoning.
Formal foundations for mathematics have been developed since the early 20th
century, with both set-theoretic and type-theoretic approaches being considered.
Although there are a number of set-theoretic foundations, for this paper it is sufficient to consider Zermelo-Fraenkel set theory (ZF), which seems to be broadly accepted and reasonably representative of the strengths and weaknesses
of set theory in actual practice. The core concept of ZF is the set membership
relation ∈, which acts on a domain of objects called sets. The theory is a col-
lection of formulas (known as axioms) of first-order logic which characterise the
membership relation. Logical deduction from these axioms yields a rich theory
of sets. Moreover, mathematical objects such as ordered pairs, functions, and
numbers can be represented as sets in ZF.
At roughly the same time as Zermelo was formulating his axiomatic set the-
ory, Russell introduced the first type theory. Both Zermelo and Russell had the
goal of rigorous, formal, logical reasoning free from the paradoxes that plagued
the earlier systems of Cantor and Frege. Most modern type theories are descen-
dants of Church’s typed λ-calculus [9]. Many of the methods of modern type
theory have been developed by computer scientists to solve problems in program-
ming languages and formal verification. Types add layers of reasoning that help
with soundness and representation independence. Some type theories have been
used to formulate foundations of mathematics in which mathematical objects
(e.g., groups, rings, etc.) are represented by terms and types of what is essen-
tially a very fancy typed λ-calculus.
Formalizing mathematics that has been developed in a set-theoretic culture
using a type-theoretic foundation can lead to dilemmas and frustration [6]. Sub-
typing may not work smoothly when formalising chains of structures such as
the number systems and those belonging to universal algebra. There are also
design choices in how to model predicates which can make proving some things
easier but other things much harder. The rules of powerful type systems are also
very complicated, so users require machine assistance to follow the typing rules,
and even with machine support it can be quite challenging. In contrast, ZF-like
set theories typically have very few ‘types’, e.g., there might be a type of sets
and a type of logical formulas or perhaps a type of classes. When nearly every
mathematical object you need is of ‘type set’ it is easy to obey the typing rules.
There are also problems formalizing mathematics in pure ZF set theory.
When everything is of ‘type set’, a computer proof system has no easy way to
know that it would be wasting its time to try to prove a theorem about ordinal
numbers using lemmas and tactics for groups or rings, so automated support
is more challenging. When representing mathematical objects (e.g., numbers)
as sets, the bookkeeping of the intended ‘type’ of these objects is not avoided,
but must be managed by the user outside the realm of a type system. In many
not-too-tricky cases, a type inference algorithm can automatically infer type
information that represents necessary preconditions for successful use of theo-
rems and lemmas, but in pure set theory such automated inference is not very
useful when the only type is ‘set’.
1.2 The Issue of Representation and the Case of the Ordered Pair
Recall ZF's axiom of Extensionality:
∀x, y : (∀a : a ∈ x ↔ a ∈ y) → x = y
which asserts that any two objects are equal if they have exactly the same set members. Because non-set objects of course have no set members, this ZF axiom forces them all to equal the empty set, meaning there cannot be any.
Existing set theories with urelements generally (except see GST below) do
not consider urelements with ‘internal’ structure that might include sets. The
ordered pair is a simple and important example of a mathematical object with
‘internal’ structure which is not usually intended to be viewed as a set. Ordered
pairs have been of enormous value in building theories of relations, functions, and
spaces. The most widely used set-theoretical definition, by Kuratowski, defines
the ordered pair a, b to be the set {{a}, {a, b}}. Because a is in all sets in
a, b and b is only in one, a first-order logic formula using only the membership
relation can check if an object is the first (or second) projection of an ordered
pair. Kuratowski pairs satisfy the characteristic property of ordered pairs:
a, b = c, d ↔ (a = c ∧ b = d)
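Purely as an illustration of this observation (the paper's own machine-checking is in Isabelle/ZF, not Python), the projections of a Kuratowski pair can be recovered from its membership structure alone. The following sketch uses our own frozenset encoding and function names:

```python
def kpair(a, b):
    """Kuratowski ordered pair <a, b> = {{a}, {a, b}}, as a frozenset."""
    return frozenset({frozenset({a}), frozenset({a, b})})

def proj1(q):
    """First projection: the unique element lying in *every* set in q."""
    common = set.intersection(*(set(m) for m in q))
    (a,) = common
    return a

def proj2(q):
    """Second projection: the element in exactly one set of q; for the
    degenerate pair <a, a> = {{a}} it coincides with the first."""
    a = proj1(q)
    rest = set.union(*(set(m) for m in q)) - {a}
    return rest.pop() if rest else a

p = kpair("a", "b")
assert proj1(p) == "a" and proj2(p) == "b"
assert kpair(1, 2) != kpair(2, 1)                   # order matters
assert kpair(3, 3) == frozenset({frozenset({3})})   # <a, a> collapses to {{a}}
```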
which may only return true when their second argument is of the correct ‘type’.
This proof has been machine-checked in Isabelle/ZF.1
Although our model for ZFP is built purely of sets and implements ordered
pairs as sets, another model could use other methods (e.g., type-theoretic) and
implement ordered pairs differently. Hence, we have put an ‘abstraction barrier’
between the user of ZFP and the implementation of ordered pairs.
1.5 Outline
Section 2 presents and discusses the first-order logic we use and definitions and
axioms of ZF. Section 3 presents and discusses ZFP in the form of definitions
and two collections of axioms, one for sets, and one for ordered pairs. Section 4
proves the existence in ZF of a model for the axioms of ZFP (which implies that
ZFP is consistent if ZF is). Section 5 discusses the significance of these results,
and how they will be used in further investigation.
2 Formal Machinery
Let X := Y be meta-level notation meaning that X stands for Y .
¹ See http://www.macs.hw.ac.uk/~cmd1/cicm2020/ZFP.thy for the source, and http://www.macs.hw.ac.uk/~cmd1/cicm2020/ZFPDoc/index.html for the HTML.
Definition 2.1. The axioms of ZF are all the instances of the following formulas for every formula ϕ with free variables at most a, b, c1 and c2.
1. Extensionality: ∀x, y : (∀a : a ∈ x ↔ a ∈ y) → x = y
2. Union: ∀x : ∃y : ∀a : a ∈ y ↔ (∃z ∈ x : a ∈ z)
3. Power Set: ∀x : ∃y : ∀z : z ∈ y ↔ z ⊆ x
4. Infinity (ugly version; pretty version below): ∃y : (∃z ∈ y : ∀b : b ∉ z) ∧ (∀x ∈ y : ∃s ∈ y : ∀c : c ∈ s ↔ (c ∈ x ∨ c = x))
5. Replacement: ∀c1, c2, x : (∀a ∈ x : ∃!b : ϕ) → (∃y : ∀b : b ∈ y ↔ ∃a ∈ x : ϕ)
6. Foundation: ∀x : x = ∅ ∨ (∃y ∈ x : ¬∃b ∈ x : b ∈ y)
The axioms are due to Zermelo, except for Replacement which is due to
Fraenkel and Skolem [3] and Foundation which is due to Von Neumann. Exten-
sionality asserts that sets are equal iff they contain the same members. Union
and Power Set state that ∪ X and P(X) are defined if X is defined; this implies
the domain of discourse is closed under ∪ and P. Infinity states that there exists
a set containing ∅ which is closed under the ordinal successor operation; from
this we can extract the Von Neumann natural numbers N. Here is a prettier
presentation of Infinity that we do not use as the axiom, to avoid bootstrap confusion:
∃y : ∅ ∈ y ∧ (∀x ∈ y : x+ ∈ y)
The powerful infinite axiom schema Replacement asserts the existence of the
range of a function determined by any formula ϕ where the values of the variables
a and b that make ϕ true have a functional dependency of b on a and where the
domain of the function exists as a set. Foundation enforces the policy that there
are no infinite descending chains of the form X0 ∋ X1 ∋ · · · .
Lemma 2.2. The following theorems of ZF are often presented as axioms. For
every formula ϕ such that any free variable must be a, the following hold in ZF:
1. Empty Set: ∃x : ∀b : b ∉ x
2. Pairing: ∀a, b : ∃x : ∀c : (c ∈ x ↔ (c = a ∨ c = b))
3. Specification: ∀x : ∃y : ∀a : (a ∈ y ↔ (a ∈ x ∧ ϕ))
We call a and b the first and second projections of ⟨a, b⟩ respectively. The first projection of an ordered pair q is in all sets in q, whereas the second is only in one.⁴ The projection relations π1 and π2 only give meaningful results when the set Q on the right side of the relation is an ordered pair.
Kuratowski ordered pairs are sets and have set members that are distinct from their projections. In fact, no matter which representation we use, there will always exist some x such that x ∈ ⟨a, b⟩ (for all but at most one ordered pair, which can be represented by ∅). If A and B are defined, we can show the cartesian product A × B is defined using Replacement nested inside Replacement⁵:
A × B = ∪ { z | ∃c ∈ A : z = { p | ∃d ∈ B : p = ⟨c, d⟩ } }
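A small Python sketch of this construction over finite sets, reading the inner set-builder as one application of Replacement, the outer as another, and ∪ as Union (the names and the Kuratowski encoding are ours, carried over from the sketch above):

```python
from itertools import chain

def kpair(a, b):  # Kuratowski pair, as in the previous sketch
    return frozenset({frozenset({a}), frozenset({a, b})})

def cartesian_product(A, B):
    # Inner Replacement: for each c in A, the set { <c, d> | d in B }.
    inner = [frozenset(kpair(c, d) for d in B) for c in A]
    # Outer Replacement collects these sets; Union flattens them into A x B.
    return frozenset(chain.from_iterable(inner))

assert len(cartesian_product({1, 2}, {3, 4})) == 4
```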
3 Extending ZF to ZFP
This section introduces Zermelo-Fraenkel Set Theory with Ordered Pairs (ZFP),
a set theory with primitive non-set ordered pairs. ZFP axiomatises the member-
ship predicate symbol ∈ similarly to ZF. The ordered pair projection predicate
symbols π1 and π2 are axiomatised in ZFP instead of being abbreviations that
use ∈ as in ZF. Ordered pairs in ZFP qualify as urelements because they contain
no members via the set membership relation ∈, but they are unusual urelements
because they can contain arbitrary sets via the π1 and π2 relations.
⁴ This holds even in the case of ⟨a, a⟩ = {{a}, {a, a}} = {{a}}.
⁵ The traditional construction of A × B as { p ∈ P(P(A ∪ B)) | ∃c ∈ A, d ∈ B : p = ⟨c, d⟩ } is only needed if the weaker Specification is preferred over Replacement. We avoid the traditional construction because it depends on a set representation of ordered pairs and thus will not work for ZFP.
We reuse the text of the abbreviation definitions for ZF for {A, B}, X ∪ Y, {A}, and {A1, . . . , An} where n ≥ 3. We redefine these abbreviations a bit differently for ZFP, where a, b, c, p, x, y, and z are not free in A, B, X and Y.
These abbreviations are defined if their arguments are defined due to the axioms.
Definition 3.1. The axioms of ZFP are all the instances of the following for-
mulas for every formula ϕ with free variables at most a, b, c1 , c2 .
– Sets
S1. Set Extensionality: ∀Set x, y : (∀a : a ∈ x ↔ a ∈ y) → x = y
S2. Union: ∀Set x : ∃y : ∀a : a ∈ y ↔ (∃z ∈ x : a ∈ z)
S3. Power Set: ∀Set x : ∃y : ∀z : z ∈ y ↔ z ⊆ x
S4. Infinity (ugly version): ∃y : (∃Set z ∈ y : ∀b : b ∉ z) ∧ (∀x ∈ y : ∃s ∈ y : ∀c : c ∈ s ↔ (c ∈ x ∨ c = x)).
S5. Replacement: ∀c1 , c2 , x : (∀a ∈ x : ∃!b : ϕ) → (∃Set y : ∀b : b ∈ y ↔ ∃ a ∈ x : ϕ)
S6. Foundation: ∀Set x : x = ∅ ∨ (∃a ∈ x : ¬∃b ∈ x : b π1 a ∨ b π2 a ∨ b ∈ a)
– Ordered Pairs
P1. Ordered Pair Emptiness: ∀Pair p : ∀a : a ∉ p
P2. Ordered Pair Formation: ∀a, b : ∃p : a π1 p ∧ b π2 p
P3. Projection Both-Or-Neither: ∀p : (∃a : a π1 p) ↔ (∃b : b π2 p)
P4. Projection Uniqueness: ∀Pair p : (∃!a : a π1 p) ∧ (∃!b : b π2 p)
P5. Ordered Pair Extensionality:
∀Pair p, q : (∀a : (a π1 p ↔ a π1 q) ∧ (a π2 p ↔ a π2 q)) → p = q
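To make the intent of axioms P1–P5 concrete, here is a small Python sketch of a two-sorted domain in which set objects carry members and pair objects carry exactly one first and one second projection; the class and function names are our own illustration, not part of ZFP:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ZSet:                # a ZFP set: carries members, has no projections
    elems: frozenset

@dataclass(frozen=True)
class ZPair:               # a ZFP primitive pair: projections, no members
    fst: object
    snd: object

def mem(a, x):             # the membership relation: a "in" x
    return isinstance(x, ZSet) and a in x.elems

def pi1(a, p):             # the projection relation: a "pi1" p
    return isinstance(p, ZPair) and p.fst == a

def pi2(b, p):             # the projection relation: b "pi2" p
    return isinstance(p, ZPair) and p.snd == b

# P1 holds because pairs never satisfy mem; P2-P4 hold because a ZPair
# stores exactly one first and one second projection; P5 is equality of
# dataclasses. Projections may themselves be sets or pairs:
empty = ZSet(frozenset())
p = ZPair(empty, ZPair(empty, empty))
assert not mem(empty, p)                              # P1: pairs are member-free
assert pi1(empty, p) and pi2(ZPair(empty, empty), p)  # P2/P4
```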
Lemma 3.2. For every formula ϕ such that any free variable must be a, the
following hold in ZFP:
For Lemma 3.2 (3), note that the cartesian product A×B can be built in ZFP
using the same construction given for ZF in Sect. 2.3, which does not depend on
any set representation of ordered pairs.
3.2 Discussion
Axioms for Sets. Each ZF axiom was transformed to make a ZFP axiom.
First, because we use abbreviations for more readable axioms, those used in
axioms needed to be modified for ZFP. The definition of ⊆ (used in Power Set)
was changed to ensure an ordered pair is neither a subset nor has a subset. The
definition of ∅ (used in Foundation) was changed to ensure a defined result.
Second, some occurrences of (∀ b : ψ) and (∃ b : ψ) needed to enforce that
ψ can be true only when b stands for a set. Where needed, such occurrences
were changed to (∀Set b : ψ) or (∃Set b : ψ), respectively. Each quantifier needed individual consideration. If the sethood of b was already enforced by ψ only being true when b has at least one set member, there was no need for a change, but a change might also clarify the axiom. If the truth of ψ was unaffected by any
set members of b, there was no need for a change and this generally indicated
that a change would go against the axiom’s intention. We needed to understand
the axiom’s intention and expected usage because it was not written to specify
where it is expected that ‘X is a set’ (because this always holds in ZF).
Finally, Foundation was extended to enforce a policy of no infinite descending
chains through not just ∈ but also π1 and π2 , so that ZF proofs using Kuratowski
ordered pairs (having no such chains) would continue to work in ZFP.
Consider the example of Power Set which states that for any set X there
exists a set Y containing all of the subsets of X and nothing else, i.e., P(X):
∀Set x : ∃y : ∀z : (z ∈ y ↔ z ⊆ x)
We could have left ∀Set x as ∀x, because when x is an ordered pair it would
act like ∅ and this would only add another reason that P(∅) exists. However,
we thought this would be obscure. It would not hurt to change ∃y to ∃Set y but
there is no need to do so because the body forces y to contain a set member and
hence rejects y being an ordered pair. We did not change ∀z to ∀Set z because
this would allow y to contain extra junk ordered pairs that proofs expecting to
get P(x) would have to do extra work using Replacement to filter out.
Axioms for Ordered Pairs. The ZFP axioms for ordered pairs specify the
abstract properties of ordered pairs via the relations π1 and π2 . These ordered
pairs have no ‘type’ restrictions, i.e., each pair projection can be either a set
or an ordered pair. Ordered Pair Emptiness (P1) ensures that no object has
both a projection (ordered pairs only) and a set member (sets only). Ordered
Pair Formation (P2) ensures that for every two objects b and c there exists an
ordered pair with b as first projection and c as second. Projection Both-Or-
Neither (P3) ensures that every object either has no projections (sets) or both
projections (ordered pairs). Projection Uniqueness (P4) ensures each ordered
pair has exactly one first projection and one second projection. Ordered Pair
Extensionality (P5) ensures that for every choice of first and second projections,
there is exactly one ordered pair.
Together, these axioms yield the characteristic property of ordered pairs:
∀a, b, c, d : (a, b) = (c, d) → (a = c ∧ b = d)
4 A Model of ZFP
We define within ZF a model for ZFP, i.e., an interpretation of the domain and
predicate symbols of ZFP. A translation from a ZFP formula ψ to a ZF formula
ψ ∗ is defined to interpret ZFP formulas in the model. Terms and formulas in
this section belong to ZF except for the arguments of ( · )∗ . All axioms of ZFP
hold under this translation, which implies that if ZF is consistent, so is ZFP [4].
That each axiom’s translation holds has been checked in Isabelle/ZF.
Like the Von Neumann universe V used as the domain of a model of ZF, our
domain W is a set hierarchy indexed by ordinal numbers.
An ordinal is a transitive set that is totally ordered by ∈, which we specify
formally by Ord(x) := (∀ y ∈ x : y ⊆ x) ∧ (∀ y, z ∈ x : y = z ∨ y ∈ z ∨ z ∈ y). Let
α and β range over ordinals. Let 0 := ∅, 1 := 0+ , 2 := 1+ , and so on. Ordinal
β is a successor ordinal iff β = α+ for some α. Ordinal β is a limit ordinal
iff β is neither 0 nor a successor ordinal. Let λ range over limit ordinals. Let
(x < y) := (x ∈ y ∧ Ord(y)) and define related symbols (e.g., ≤) as usual.
Any model of ZFP must have some way of distinguishing between the objects
in its domain representing ZFP sets, and those that represent ZFP pairs, i.e.,
ZFP needs a domain split into two disjoint subdomains. We model this in ZF
using Kuratowski ordered pairs and cartesian products to tag all domain objects
with 0 (‘set’) or 1 (‘ordered pair’).
Definition 4.1. For ordinal α, define the set Wα via transfinite recursion thus:
W0 = ∅,   Wβ+ = ({0} × P(Wβ)) ∪ ({1} × (Wβ)²),   Wλ = ⋃β∈λ Wβ
Starting from ∅, each successor tier Wβ + is built by taking the disjoint union
of the power set and cartesian square of the previous tier. Each limit tier Wλ is
the union of all preceding tiers. The use of disjoint union to build each successor
tier Wβ + gives a set-theoretic universe split into two. Although our disjoint
union uses Kuratowski pairs with 0 and 1 tags, we could use instead any two
definable injective operators from a large enough class (e.g., the universe) to
disjoint classes that raise rank by at most a constant.
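The first few tiers of this hierarchy are finite and easy to compute. The following Python sketch builds W0 through W3, substituting plain tuple tags 'set' and 'pair' for the Kuratowski 0/1 tags (harmless, by the remark above that any two definable injective tagging operators would do):

```python
from itertools import combinations, product

def powerset(s):
    return [frozenset(c) for r in range(len(s) + 1)
                         for c in combinations(s, r)]

def next_tier(W):
    """W_{beta+} = ({0} x P(W_beta)) U ({1} x (W_beta)^2), with tuple tags."""
    sets  = {("set", x) for x in powerset(W)}
    pairs = {("pair", a, b) for a, b in product(W, repeat=2)}
    return sets | pairs

tiers = [frozenset()]                 # W_0 = empty
for _ in range(3):
    tiers.append(next_tier(tiers[-1]))

print([len(t) for t in tiers])        # [0, 1, 3, 17]
```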
Let W be the proper class such that x ∈ W iff x ∈ Wα for some α. We use a bold upright serif font to emphasize that W is not a ZF set. By the transfinite recursion theorem, given x there is a definite description W(x) that evaluates to Wα when x evaluates to α. We write H(X) for the property that X belongs to W.
Let an m-object be any member of W (i.e., a ZF set x such that H(x) holds), an m-set be any m-object of the form ⟨0, x⟩, and an m-pair be any m-object of the form ⟨1, x⟩. The following result says every m-object x is either an m-set or an m-pair, and tells where in the hierarchy the contents of x are.
Lemma 4.3. Suppose H(x), so that x ∈ Wα. Then for some β < α, either x = ⟨0, y⟩ with y ⊆ Wβ, or x = ⟨1, ⟨y, z⟩⟩ with y, z ∈ Wβ.
To interpret a ZFP formula ϕ in ZF, we must show the formula holds when
quantification is restricted to the domain W, and the predicate symbols are
replaced by the interpretations defined above.
(X ∈ Y)* := (X*) ∈̂ (Y*)
(X π1 Y)* := (X*) π̂1 (Y*)
(X π2 Y)* := (X*) π̂2 (Y*)
x* := x
(ϕ → ψ)* := (ϕ*) → (ψ*)
(¬ϕ)* := ¬(ϕ*)
(∀x : ϕ)* := (∀x : H(x) → (ϕ*))
(ιx : ϕ)* := (ιx : H(x) ∧ (ϕ*))
Thanks to the 0/1 tagging of the domain, no problems arise from coincidences where a ZFP set x and a ZFP primitive ordered pair p would be represented by the same ZF set y.
Observe that the ZFP abbreviations Set and Pair from Sect. 3.1 that act like
unary predicates are interpreted in ZF as follows:
Pair(x)* := (∃a : H(a) ∧ a π̂1 x)        Set(x)* := ¬(Pair(x)*)
These predicates are clearly meaningful within the model because:
Lemma 4.9. Suppose that H(x); then we have that:
Pair(x)* ↔ (∃a, b : x = ⟨1, ⟨a, b⟩⟩)        Set(x)* ↔ (∃y : x = ⟨0, y⟩)
Now we reach our main result, which implies ZFP is consistent if ZF is [4]:
Theorem 4.10. For each ZFP axiom ϕ, the translation ϕ∗ holds in ZF.
The proof of this theorem simply combines a number of lemmas, each of which shows for a ZFP axiom φ that φ∗ holds in ZF. Most of
these lemmas are straightforward. Here we show a representative example:
Lemma 4.11. The translation of ZFP’s Power Set axiom holds in ZF.
Proof. First, we find the translation using Definition 4.7 and Lemma 4.8:
∀x : H(x) → (Set(x)* → (∃y : H(y) ∧ ∀z : H(z) → (z ∈̂ y ↔ ((z ⊆ x)*))))
Let x be such that H(x), and suppose Set(x)*. By Lemma 4.9, x = ⟨0, x′⟩ for some set x′. Let y = ⟨0, y′⟩ where y′ = {0} × P(x′) be our candidate for the power set. We must show that y has the property ∀z : H(z) → (z ∈̂ y ↔ (z ⊆ x)*), and also that y is indeed a member of W. Fix z and assume H(z); then:
z ∈̂ y ↔ z ∈ y′                                     by def. of y and ∈̂
      ↔ z ∈ {0} × P(x′)                            by def. of y′
      ↔ ∃z′ : z = ⟨0, z′⟩ ∧ z′ ⊆ x′                by def. of × and P
      ↔ Set(z)* ∧ (∀a : a ∈̂ z → a ∈̂ x)             since z = ⟨0, z′⟩, z′ ⊆ x′
      ↔ Set(z)* ∧ Set(x)* ∧ (∀a : a ∈̂ z → a ∈̂ x)   since H(x), x = ⟨0, x′⟩
      ↔ (z ⊆ x)*                                   because H(z)
It now remains to show that H(y). From H(x), we have that x ∈ Wα for some ordinal α. By Lemma 4.4, x ∈ Wα+, and by Lemma 4.3, x′ ⊆ Wα. Then:
x′ ⊆ Wα → P(x′) ⊆ P(Wα)
        → {0} × P(x′) ⊆ {0} × P(Wα)
        → y′ ⊆ {0} × P(Wα)            by def. of y′
        → y′ ⊆ Wα+                    because {0} × P(Wα) ⊆ Wα+
        → y ∈ Wα++                    by def. of y = ⟨0, y′⟩
        → H(y)                        by def. of H
5 Conclusion
5.1 Summary of Contributions
References
1. Aczel, P.: Generalised set theory. In: Logic, Language and Computation. CSLI
Lecture Notes, vol. 1, pp. 1–17 (1996)
2. Barwise, J.: Admissible Sets and Structures. Cambridge University Press, Cam-
bridge (2017). Originally published by Springer in 1976
3. Ebbinghaus, H.-D.: Ernst Zermelo. Springer, Heidelberg (2007)
4. Enderton, H.B.: A Mathematical Introduction to Logic, 2nd edn. Elsevier, Ams-
terdam (2001)
5. Farmer, W.M.: Chiron: a multi-paradigm logic. Stud. Logic Gramm. Rhetor.
10(23), 1–19 (2007)
6. Harrison, J.: Let's make set theory great again! (2018). http://aitp-conference.org/2018/slides/JH.pdf. Accessed 27 May 2020
7. Holmes, M.R.: Alternative axiomatic set theories. In: The Stanford Encyclopedia
of Philosophy. Stanford University (2017)
8. Kanamori, A.: The empty set, the singleton, and the ordered pair. Bull. Symb.
Logic 9(3), 273–298 (2003)
9. Kubota, K.: Foundations of mathematics. Genealogy and overview (2018). https://owlofminerva.net/files/fom_2018.pdf. Accessed 27 May 2020
10. Megill, N., Wheeler, D.A.: Metamath: A Computer Language for Mathematical
Proofs. Lulu Press, Morrisville (2019)
11. Paulson, L.C.: Set theory for verification: I. From foundations to functions. J.
Autom. Reason. 11(3), 353–389 (1993)
12. Quinlan, D., Wells, J.B., Kamareddine, F.: BNF-style notation as it is actually
used. In: Kaliszyk, C., Brady, E., Kohlhase, A., Sacerdoti Coen, C. (eds.) CICM
2019. LNCS (LNAI), vol. 11617, pp. 187–204. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-23250-4_13
13. Wiedijk, F.: Is ZF a hack?: Comparing the complexity of some (formalist inter-
pretations of) foundational systems for mathematics. J. Appl. Logic 4(4), 622–645
(2006)
A Framework for Formal Dynamic Dependability Analysis Using HOL Theorem Proving
1 Introduction
The rest of the paper is structured as follows: Sect. 2 presents our proposed
framework for the formal dynamic dependability analysis. Section 3 provides a
brief description of our formalization of DFTs. In Sect. 4, we present DRBDs
and our developed DRBD algebra. Section 5 presents the required mathematical
foundations of the CTMC dependability analysis. We report the current status
of the project and the remaining milestones in Sect. 6. Finally, Sect. 7 concludes
the paper.
2 Proposed Framework
Figure 1 shows an overview of our proposed framework for formal dynamic
dependability analysis. This framework provides verified generic expressions of
dependability in the HOL4 theorem prover using DFTs, DRBDs and CTMCs.
The analysis starts by having a system description with some dependability
requirements, such as a certain expression of reliability. The dependability of
this system can be modeled using a DFT, DRBD or CTMC model according to
its description. For the case of the DFTs and DRBDs, we need, respectively, a
library of formalized DFT gates and DRBD constructs besides their simplifica-
tion theorems and verified probabilistic behavior. For the CTMC formal analysis,
it is required to have both formal transient and steady state analyses. The formal
DFT and DRBD models can be analyzed qualitatively or quantitatively. In the
former, the sources of vulnerabilities of the system are verified by identifying the
cut sets and cut sequences. In the latter, we prove generic failure and reliability
expressions of DFT and DRBD based systems, respectively. It is worth men-
tioning that unlike PMC approaches, the formally verified generic expressions of
DFT and DRBD are independent of the probability distributions of the system
components. For CTMC based models, the proposed framework formally ana-
lyzes availability and reliability metrics by proving generic expressions of these
dependability metrics that are independent of the failure rates of system com-
ponents. We choose HOL4 in the development of the formalization of dynamic
dependability models as this would facilitate using some of the available theories,
such as the probabilistic PIE [18], Lebesgue integral [19] and probability [20].
Our ultimate goal in this project is to develop a tool that accepts the depend-
ability model in either a graphical or simple textual format. Then, using a parser,
the tool creates the HOL formal models of these formats that can be used in
the formal analysis using a HOL theorem prover. The aim of this tool is to
reduce the user interaction with the theorem proving environment, which would
facilitate the usage of this framework by users who are not familiar with HOL
theorem proving or the underlying mathematical foundations of the dependabil-
ity models. This requires invoking several techniques, such as machine learning,
to automatically (to a certain extent) verify the required expressions. Therefore,
this proposed framework will allow conducting the dynamic dependability anal-
ysis of many real-world systems to provide generic expressions. We highlight the
details of the proposed framework in the following sections including the current
status of the formalization and provide some insights about the remaining steps.
Dynamic fault trees (DFTs) [3] model the failure dependencies among system
components that cannot be captured using traditional SFTs. A DFT is a graphi-
cal representation of the sources of failure of a given system. The modeling starts
by an undesired top event that represents the failure of a system or a subsystem.
DFT inputs represent basic events that contribute to the occurrence (failure) of
the top event. The relationships and dependencies among these basic events
are modeled using DFT gates (Fig. 2). For example, the output event of the
Priority-AND (PAND) gate occurs when both input events occur in sequence.
Fault tree analysis (FTA) can be generally carried out qualitatively and quan-
titatively [3]. In the qualitative analysis, the combinations and sequences of basic
events that contribute to the occurrence of the top event (failure of the system)
are identified. These combinations and sequences represent the cut sets and cut
sequences [21], respectively. In the quantitative analysis, attributes, such as the
mean-time-to-failure (MTTF) and the probability of failure, can be evaluated
based on the failure distribution of the basic events and their relationships.
Dynamic FTA has been commonly conducted using some sort of a DFT algebra
(e.g., [22]) or by analyzing the corresponding CTMC of the given DFT [3]. In
the former method, an algebra similar to the ordinary Boolean algebra is defined
with some temporal operators and simplification properties that allow reducing
the structure function of the top event. Based on this function, both the qualita-
tive and quantitative analyses can be carried out, where the probability of failure
of the DFT’s top event can be expressed based on the failure distribution of the
basic events. On the other hand, the given DFT can be converted into its equiv-
alent CTMC, which can then be analyzed to find the probability of failure of the
top event [3]. Complex systems can generate CTMCs with a large state space
that can be handled by applying a modularization approach, where the DFT is
divided into static and dynamic parts. The static part can be analyzed using
one of the conventional methods, such as binary decision diagrams (BDDs) [21].
The dynamic part can then be analyzed by converting it to its corresponding
CTMC. This kind of modularization is implemented in the Galileo tool [23].
The arithmetic foundations of the algebraic approach of [22] were not for-
mally verified, which puts a question mark on the soundness of the reported
results. In [24], we proposed to formalize this DFT algebra in higher-order logic
theorem proving and developed an integrated methodology to conduct DFT’s
qualitative analysis using the HOL4 theorem prover and quantitative analysis
using the STORM model checker. However, generic expressions of probability of
failure cannot be obtained based on this methodology as a PMC is involved in
the quantitative analysis. Moreover, our definitions in [24] could not cater for
the DFT probabilistic analysis. Therefore, in [14,25], we improved our definitions
of DFT gates to conduct both the DFT qualitative and quantitative analyses
in the form of generic expressions in a theorem prover. Next, we provide the
description of the DFT algebra and its formalization in order to have a better
understanding of the first part of our proposed framework of Fig. 1.
fails when the first input occurs before or at the same time as the second input.
We formally defined these elements and operators in HOL4 as extended-real
functions of time [14]. The purpose of choosing extended-real numbers, which
are real numbers besides ±∞, is to be able to model the NEVER event that
returns +∞ as its time of failure. Several simplification properties are introduced
in the algebraic approach [22] to simplify the structure function of DFTs (the
function of the top event). This reduced structure function can then be used in
the probabilistic analysis. We verified over 80 simplification theorems [24] that
vary from simple theorems to more complex ones. This enables having formally
verified reduced cut sets and cut sequences, i.e., formal qualitative DFT analysis.
DFTs use the ordinary FT gates, i.e., AND and OR gates, besides the dynamic
gates (Fig. 2). AND (·) and OR (+) are used in the algebraic approach as oper-
ators as well as FT gates. The output of the AND gate fails when both inputs
fail. This means that the time of occurrence of the output event of the AND gate
is the maximum time of occurrence of both input events. The output of the OR
gate fails when at least one of the input events occurs. Therefore, the time of
occurrence equals the minimum time of occurrence of its inputs. The Priority-
AND (PAND) gate is similar to the AND gate, where the output fails when
both inputs fail. However, the input events should occur in a certain sequence,
conventionally, from left to right. The Functional DEPendency (FDEP) gate is
used to model a situation when the failure of one system component triggers the
failure of another. For the FDEP gate of Fig. 2, the occurrence of T triggers the
occurrence of X. Finally, the spare gate models spare parts in systems, where
the main part is replaced by the spare part after failure. In [14], we formally
defined these gates as functions of time to enable the verification of their failure
probabilistic expressions, as will be explained in the following section.
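As a rough illustration of this failure-time reading of the gates (the actual formalization is over extended reals in HOL4 [14]; the Python encoding below is our own), each gate maps the failure times of its inputs to the failure time of its output, with math.inf standing for the NEVER event:

```python
import math

NEVER = math.inf            # failure time of an event that never occurs

def AND(x, y):              # output fails once both inputs have failed
    return max(x, y)

def OR(x, y):               # output fails as soon as one input fails
    return min(x, y)

def PAND(x, y):             # both fail, and the left input fails first
    return y if x <= y else NEVER

def FDEP(t, x):             # the trigger T forces the failure of X
    return min(t, x)

assert AND(2.0, 5.0) == 5.0 and OR(2.0, 5.0) == 2.0
assert PAND(2.0, 5.0) == 5.0        # correct order: output fails with the right input
assert PAND(5.0, 2.0) == NEVER      # wrong order: the output never fails
assert FDEP(1.0, 4.0) == 1.0        # trigger at time 1 fails X early
```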
Additional constructs are used to model the dynamic dependencies among sys-
tem blocks. The main dynamic constructs are: spare, state-dependencies and
load sharing. The last two constructs enable modeling more realistic scenar-
ios in system reliability that include the effect of activation/deactivation of one
component on the rest of the components. This behavior cannot be captured
using DFTs [28] as they can only capture the failure without considering the
activation/deactivation effect.
Due to the dynamic nature of DRBDs, they can be analyzed by converting
them into a state-space model, i.e., a Markov chain. Then, the resultant Markov
chain can be analyzed using one of the traditional techniques, including analyt-
ical methods or simulation. Some tools, such as BlockSim [29], enable DRBD
analysis by providing a graphical user interface to model DRBDs and conduct
the analysis either analytically or using discrete event simulation. As mentioned
previously, complex systems can generate Markov chains with a large number
of states, which hinders the analysis process. Decomposition can be applied to
divide the DRBD into a dynamic part that can be solved using Markov chains
and a static part that can be analyzed using static RBD analysis techniques [30].
Although this decomposition would reduce the state space, such simulation-based analysis cannot provide accurate and complete results.
The formal semantics of DRBDs were introduced in [31] using the Object-Z
formalism [32]. Then, this DRBD is converted into a Colored Petri net (CPN),
where it can be analyzed using existing Petri net tools. However, since the given
DRBD is converted into a CPN, only state-based properties can be analyzed. In
addition, generic expressions of reliability cannot be obtained, which represents
our target in the proposed framework. HOL theorem proving has been only used
for the analysis of traditional SRBDs [12], which cannot support the scope of the
proposed framework, i.e., dynamic dependability. To the best of our knowledge,
there is no support of DRBD analysis using a HOL theorem prover that can cater
for the analysis of real-world systems that exhibit dynamic behavior. The main
challenge towards this direction is the absence of a formal DRBD algebra that
can provide similar analysis like DFTs. Therefore, we developed a novel algebra
that allows conducting both the qualitative and quantitative analyses based on
the structure function of DRBDs with spare constructs [15]. The formalization
of this algebra in HOL facilitates the analysis using a theorem prover. Below,
we provide an overview of DRBD constructs and structures.
The main dynamic DRBD constructs are shown in Fig. 3 [33]. The spare
construct is used to model spare parts in systems, similar to the DFT
spare gate. The state dependencies are used to model the effect of activa-
tion(A)/deactivation(D)/failure(F) among system components. In Fig. 3(b), the
A/D/F of the trigger will cause the state dependency controller (SDEP) to signal
the A/D/F of components X1 ...Xn . Finally, the load sharing (LSH) construct
is used to model the effect of sharing a load on the overall system failure. For
example, the LSH in Fig. 3 models a load that is shared among n components. It
is required that at least k out of these n components to be working in order for
the successful functionality of the overall system. Therefore, the D/F of some of
these components may cause the D/F of the rest of the components.
Besides the dynamic DRBD constructs, system components are represented
as blocks that can be connected in series, parallel, series-parallel and parallel-
series fashion, as shown in Fig. 4 [34]. Each block in Fig. 4 represents either a
simple system component or one of the DRBD dynamic constructs.
captures the situation where one system component is required to continue work-
ing after the failure of a second one; 4) Simultaneous operator (Δ) which is sim-
ilar to the DFT simultaneous operator; and 5) Inclusive after () that combines
the behavior of both the after and simultaneous operators. In [15] we provided
mathematical expressions for these operators, and expressed the DRBD struc-
tures and spare construct based on their mathematical expressions.
The DRBD blocks can be connected in several ways depending on the success
paths of the modeled system. The definitions and reliability expressions of the
structures of Fig. 4 are listed in Table 1 [34]. In the series structure, it is required
that all blocks are working for the system to work. Therefore, the series structure
can be modeled as the intersection of the individual block events, as listed in
Table 1, where Xi represents the DRBD event of the ith block. This structure
can be also modeled by ANDing the functions of these blocks. The reliability of
this structure equals the multiplication of the reliability of the individual blocks.
The parallel structure requires at least one of the blocks to be working for a
successful system behavior. Hence, it is modeled as the union of the events of
the individual blocks and it can be also modeled by ORing these functions. The
series-parallel structure (Fig. 4(c)) represents a series structure of blocks each
of which is a parallel structure. Therefore, it is modeled as the intersection of
unions. The parallel-series structure (Fig. 4(d)), is a parallel structure of several
series structures. It is modeled as the union of intersection of the individual
block events. In [15], we formally verified these expressions besides the reliability
of the spare construct. We plan to extend the DRBD algebra to model the
remaining dynamic constructs, i.e., load sharing and state dependency. This
requires modeling the deactivation state of system components and may include
introducing new DRBD operators to capture such behavior.
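Assuming independent blocks, the reliability expressions summarized in Table 1 can be sketched in a few lines of Python (the function names are ours; series reliability is the product of block reliabilities, and the remaining structures follow the intersection/union reading above):

```python
from math import prod

def series(rs):                   # all blocks must work
    return prod(rs)

def parallel(rs):                 # at least one block must work
    return 1 - prod(1 - r for r in rs)

def series_parallel(stages):      # series of parallel stages, Fig. 4(c)
    return prod(parallel(stage) for stage in stages)

def parallel_series(paths):       # parallel composition of series paths, Fig. 4(d)
    return 1 - prod(1 - series(path) for path in paths)

rs = [0.9, 0.95, 0.99]
print(series(rs))                               # 0.84645
print(parallel(rs))                             # 0.99995
print(series_parallel([[0.9, 0.9], [0.95]]))    # 0.99 * 0.95 = 0.9405
print(parallel_series([[0.9, 0.95], [0.99]]))   # 1 - 0.145 * 0.01 = 0.99855
```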
If the transition can happen at any time, i.e., the time is continuous, then
the MC is called a Continuous Time Markov Chain (CTMC). In the proposed
framework, we are interested in CTMCs as they can capture the dynamic behav-
ior at any instance of time. Once the process is in a certain state, i, the time it
spends in this state is exponentially distributed with rate λi .
The probabilistic behavior of the CTMC is described by the initial state probability vector πk(t0) [2], which is defined as Pr{X(t0) = k}, k = 0, 1, 2, ..., and the transition probabilities pij [2], where
pij(v, t) = Pr{X(t) = j | X(v) = i},  0 ≤ v ≤ t and i, j = 0, 1, 2, ... (4)
If we substitute v with 0, then, only the transition probabilities and the initial
state probability vector are enough to describe the probabilistic behavior of the
CTMC [2]. The state probability vector, π(t), is a vector with an entry for each
unconditional state probability. The sum of the entries in the state probability
vector at any time is equal to 1, as the MC should be in a certain state.
∑j∈Ω πj(t) = 1 (8)
The failure of a certain component in the system can lead to the failure of the whole system. So
the failure of components in the system will cause the transition from one state
in the CTMC to another. The transition rate depends on the failure rate of the
component that failed. A fail state is used to model the fail state of the system.
The CTMC quantitative analysis can be conducted using either the transient
analysis or steady-state analysis depending on the dependability metric that we
are interested in. These include the instantaneous availability, reliability and
steady state availability, as will be described below:
Transient Analysis. The transition probabilities and the transition rates are related using Kolmogorov's forward or backward equations [2]. The backward Kolmogorov equations are defined as:
p′ij(t) = ∑k≠i λik pkj(t) − λi pij(t) (10)
Starting from a CTMC that models the failure behavior of a given system,
we can find the probability of being available at a certain moment of time, i.e.,
instantaneous availability or the system reliability using this transient analysis.
This is achieved by finding the probability of being in a fail or a working state.
Steady-State Analysis. The stationary distribution π of the CTMC satisfies:
π = πP(t) (13)
This means that if the CTMC starts with this initial stationary distribu-
tion, the unconditional state probabilities vector at any time will stay the same.
The stationary distribution can be found by solving the following set of linear equations with the condition that ∑j∈Ω πj(t) = 1 [36]:
πG = 0 (14)
Using this stationary distribution, we can find the overall probability of sys-
tem availability by finding the probability of being in a working state. This
means that we can find the fraction of time where the system is available during
its life cycle, which represents the steady state availability.
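For a concrete instance of equation (14), the following sketch (assuming NumPy is available) computes the steady-state availability of a hypothetical two-state working/failed CTMC with made-up failure and repair rates, replacing one equation of πG = 0 by the normalization constraint:

```python
import numpy as np

lam, mu = 0.01, 0.5                 # hypothetical failure and repair rates
G = np.array([[-lam,  lam],         # infinitesimal generator: state 0 = working,
              [  mu,  -mu]])        # state 1 = failed

# Solve pi G = 0 together with sum(pi) = 1 (equation (14) plus normalization).
A = np.vstack([G.T, np.ones(2)])
b = np.array([0.0, 0.0, 1.0])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

print(pi)                                     # [0.98039..., 0.01960...]
print("steady-state availability:", pi[0])    # = mu / (lam + mu)
```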
Table 2. Roadmap
ML models. Finally, we have to program the core of the tool that connects the
pieces of the framework together to enable the automatic dynamic dependability
analysis. As the development of this tool is an incremental process, which can
be improved with time, we plan to conduct some tutorials for end-users that
are not familiar with HOL to train them and consider their feedback. This step
is also important for verification and reliability engineers that are interested in
enriching the underlying theories of the proposed framework. This helps in the
sustainability of the proposed framework by engaging many users with differ-
ent goals and backgrounds in the development of the framework and its tool. A
summary of this roadmap is provided in Table 2.
7 Conclusions
In this paper, we proposed a comprehensive framework to conduct the formal
dynamic dependability analysis using HOL theorem proving. We provided the
details of the mathematical foundations of each part of the proposed framework.
The main contributions of this work are the development of the proposed framework in the HOL4 theorem prover, which includes the formalization of DFTs and CTMCs, besides the development of the DRBD algebra. These formalized models allow the dependability analysis of many real-world systems that exhibit dynamic
behavior. We described the future milestones to complete the proposed project
including the final tool that enables the (semi) automation of the analysis.
References
1. Avizienis, A., Laprie, J.C., Randell, B., Landwehr, C.: Basic concepts and taxon-
omy of dependable and secure computing. IEEE Trans. Dependable Secure Com-
put. 1(1), 11–33 (2004)
2. Trivedi, K.S.: Probability and Statistics with Reliability, Queuing and Computer
Science Applications. Wiley, Hoboken (2002)
3. Stamatelatos, M., Vesely, W., Dugan, J., Fragola, J., Minarick, J., Railsback, J.:
Fault Tree Handbook with Aerospace Applications. NASA Office of Safety and
Mission Assurance (2002)
4. Distefano, S., Xing, L.: A new approach to modeling the system reliability: dynamic
reliability block diagrams. In: Reliability and Maintainability Symposium, pp. 189–
195. IEEE (2006)
5. Baier, C., Katoen, J.: Principles of Model Checking. MIT Press, Cambridge (2008)
6. Gordon, M.J., Melham, T.F.: Introduction to HOL: A Theorem Proving Environ-
ment for Higher-Order Logic. Cambridge University Press, Cambridge (1993)
7. Dehnert, C., Junges, S., Katoen, J.-P., Volk, M.: A storm is coming: a modern prob-
abilistic model checker. In: Majumdar, R., Kunčak, V. (eds.) CAV 2017. LNCS,
vol. 10427, pp. 592–600. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-63390-9_31
8. Ghadhab, M., Junges, S., Katoen, J.-P., Kuntz, M., Volk, M.: Model-based safety
analysis for vehicle guidance systems. In: Tonetta, S., Schoitsch, E., Bitsch, F.
(eds.) SAFECOMP 2017. LNCS, vol. 10488, pp. 3–19. Springer, Cham (2017).
https://doi.org/10.1007/978-3-319-66266-4_1
9. Elderhalli, Y., Volk, M., Hasan, O., Katoen, J.-P., Tahar, S.: Formal verification of
rewriting rules for dynamic fault trees. In: Ölveczky, P.C., Salaün, G. (eds.) SEFM
2019. LNCS, vol. 11724, pp. 513–531. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-30446-1_27
10. Kwiatkowska, M., Norman, G., Parker, D.: Quantitative analysis with the proba-
bilistic model checker PRISM. Electron. Notes Theor. Comput. Sci. 153(2), 5–31
(2006)
11. Ahmed, W., Hasan, O.: Formalization of fault trees in higher-order logic: a deep
embedding approach. In: Fränzle, M., Kapur, D., Zhan, N. (eds.) SETTA 2016.
LNCS, vol. 9984, pp. 264–279. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-47677-3_17
12. Ahmed, W., Hasan, O., Tahar, S.: Formalization of reliability block diagrams in
higher-order logic. J. Appl. Logic 18, 19–41 (2016)
13. HOL4 (2020). https://hol-theorem-prover.org/
14. Elderhalli, Y., Ahmad, W., Hasan, O., Tahar, S.: Probabilistic analysis of dynamic
fault trees using HOL theorem proving. J. Appl. Logics 6, 467–509 (2019)
15. Elderhalli, Y., Hasan, O., Tahar, S.: A formally verified algebraic approach for
dynamic reliability block diagrams. In: Ait-Ameur, Y., Qin, S. (eds.) ICFEM 2019.
LNCS, vol. 11852, pp. 253–269. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-32409-4_16
16. Hölzl, J.: Markov processes in Isabelle/HOL. In: ACM SIGPLAN Conference on
Certified Programs and Proofs, pp. 100–111 (2017)
17. Isabelle (2020). https://isabelle.in.tum.de/
18. Ahmed, W., Hasan, O.: Towards formal fault tree analysis using theorem proving.
In: Kerber, M., Carette, J., Kaliszyk, C., Rabe, F., Sorge, V. (eds.) CICM 2015.
LNCS (LNAI), vol. 9150, pp. 39–54. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-20615-8_3
19. Mhamdi, T., Hasan, O., Tahar, S.: On the formalization of the Lebesgue integration theory in HOL. In: Kaufmann, M., Paulson, L.C. (eds.) ITP 2010. LNCS, vol. 6172, pp. 387–402. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14052-5_27
20. Mhamdi, T., Hasan, O., Tahar, S.: Formalization of entropy measures in HOL. In:
van Eekelen, M., Geuvers, H., Schmaltz, J., Wiedijk, F. (eds.) ITP 2011. LNCS,
vol. 6898, pp. 233–248. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-22863-6_18
21. Ruijters, E., Stoelinga, M.: Fault tree analysis: a survey of the state-of-the-art in modeling, analysis and tools. Comput. Sci. Rev. 15–16, 29–62 (2015)
22. Merle, G.: Algebraic modelling of dynamic fault trees, contribution to qualitative
and quantitative analysis. Ph.D. thesis, ENS, France (2010)
23. Sullivan, K.J., Dugan, J.B., Coppit, D.: The Galileo fault tree analysis tool. In:
IEEE Symposium on Fault-Tolerant Computing, pp. 232–235 (1999)
24. Elderhalli, Y., Hasan, O., Ahmad, W., Tahar, S.: Formal dynamic fault trees
analysis using an integration of theorem proving and model checking. In: Dutle,
A., Muñoz, C., Narkawicz, A. (eds.) NFM 2018. LNCS, vol. 10811, pp. 139–156.
Springer, Cham (2018). https://doi.org/10.1007/978-3-319-77935-5_10
25. Elderhalli, Y., Hasan, O., Tahar, S.: A methodology for the formal verification of
dynamic fault trees using HOL theorem proving. IEEE Access 7, 136176–136192
(2019)
26. Boudali, H., Crouzen, P., Stoelinga, M.: A rigorous, compositional, and extensi-
ble framework for dynamic fault tree analysis. IEEE Trans. Dependable Secure
Comput. 7, 128–143 (2010)
27. Altby, A., Majdandzic, D.: Design and implementation of a fault-tolerant drive-by-
wire system. Master’s thesis, Chalmers University of Technology, Sweden (2014)
28. Distefano, S., Puliafito, A.: Dynamic reliability block diagrams vs dynamic fault
trees. In: Reliability and Maintainability Symposium, pp. 71–76. IEEE (2007)
29. BlockSim (2020). https://www.reliasoft.com/products/reliability-analysis/blocksim
30. Distefano, S.: System dependability and performances: techniques, methodologies
and tools. Ph.D. thesis, University of Messina, Italy (2005)
31. Xu, H., Xing, L.: Formal semantics and verification of dynamic reliability block
diagrams for system reliability modeling. In: International Conference on Software
Engineering and Applications, pp. 155–162 (2007)
32. Smith, G.: The Object-Z Specification Language, vol. 1. Springer, Boston (2012).
https://doi.org/10.1007/978-1-4615-5265-9
33. Xu, H., Xing, L., Robidoux, R.: DRBD: dynamic reliability block diagrams for system reliability modelling. Int. J. Comput. Appl. 31(2), 132–141 (2009)
34. Hasan, O., Ahmed, W., Tahar, S., Hamdi, M.S.: Reliability block diagrams based
analysis: a survey. In: International Conference of Numerical Analysis and Applied
Mathematics, vol. 1648, p. 850129.1-4. AIP (2015)
35. Liu, L., Hasan, O., Tahar, S.: Formal reasoning about finite-state discrete-time
Markov chains in HOL. J. Comput. Sci. Technol. 28(2), 217–231 (2013)
36. Grimmett, G., Stirzaker, D., et al.: Probability and Random Processes. Oxford
University Press, Oxford (2001)
37. Elderhalli, Y., Hasan, O., Tahar, S.: Using machine learning to minimize user
intervention in theorem proving based dynamic fault tree analysis. In: Conference
on Artificial Intelligence and Theorem Proving, pp. 36–37 (2019)
38. Li, Y., Lee, P.P.C., Lui, J.C.S.: Stochastic analysis on RAID reliability for solid-
state drives. In: IEEE International Symposium on Reliable Distributed Systems,
pp. 71–80 (2013)
Induction with Generalization in Superposition Reasoning
1 Introduction
Automating inductive reasoning opens up new possibilities for generating
and proving inductive properties, for example properties with inductive data
types [4,21] or inductive invariants in program analysis and verification [13,14].
Recent advances related to automating inductive reasoning, such as first-order
reasoning with inductively defined data types [16], the Avatar architecture [26],
inductive strengthening of SMT properties [22], structural induction in superpo-
sition [10] and general induction rules within saturation [19], make it possible to
re-consider the grand challenge of mechanizing mathematical induction [5]. In
this paper, we contribute to these advances by generalizing inductive reasoning
within the saturation-based proof search of first-order theorem provers using the
superposition calculus.
It is common in inductive theorem proving that, given a formula/goal F, one tries to prove a more general goal instead [5]. This makes no sense in saturation-based theorem proving, which is not based on a goal-subgoal architecture. As
we aim to automate and generalize inductive reasoning within saturation-based
proof search, our work follows a different approach than the one used in induc-
tive theorem provers. Namely, our methodology in Sect. 4 picks up a formula F
(not necessarily the goal) in the search space and adds to the search space new
induction axioms with generalization, that is, instances of generalized induction
schemata, aiming at proving both ¬F and a more general formula than ¬F .
2 Preliminaries
Fig. 1. Term algebras of N and L, together with additional symbols and axioms.
Specifically, we will deal with + and ≤ for N having their standard meaning
and ++ and prefix for L, denoting the list concatenation and the prefix relation,
respectively. These additional symbols are axiomatized by first-order formulas
corresponding to their recursive definitions, shown in Fig. 1.
While we use N and L for illustration, we note, however, that our approach can be used for proving properties over any other theories with various forms of induction.
Theorem proving of first-order properties of inductively defined data types
needs to handle the domain closure, injectivity, distinctness and acyclicity axioms
of term algebras – a detailed definition of these axioms can be found in [16,23].
The challenge we address in [16] is how to automate proving term algebras
properties given the fact that the acyclicity axiom is not finitely axiomatizable.
Throughout this paper, we will be using the structural induction axiom and
rule for N, introduced in [19], for illustrating our approach. Given a literal ¬L[t],
where t is chosen as an induction term, a structural induction axiom for N is:
L[0] ∧ ∀x.(L[x] → L[s(x)]) → ∀y.(L[y]). (1)
Informally, the axiom expresses that if the base case holds, and if the induction
step holds, then the literal holds for all possible values. The structural induction
rule for N, given a clause ¬L[t] ∨ C, adds a clausified form of this axiom to the
search space:
¬L[t] ∨ C
--------------------------------------------------- (2)
(¬L[0] ∨ L[σ] ∨ L[y]) ∧ (¬L[0] ∨ ¬L[s(σ)] ∨ L[y])
After using the rule, the L[y] in both resulting clauses can be resolved against
the ¬L[t] in the premise clause.
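As a toy illustration of rule (2) (plain string manipulation; Vampire of course manipulates real clause objects, not strings), the two conclusion clauses can be generated from an induction literal L in which the chosen induction term is marked by a placeholder:

```python
def induction_clauses(L, hole="X"):
    """Instances of rule (2): L is a literal with the induction term
    written as `hole`; sigma stands for the fresh Skolem constant."""
    def inst(t):
        return L.replace(hole, t)
    base = f"~{inst('0')} | {inst('sigma')} | {inst('y')}"
    step = f"~{inst('0')} | ~{inst('s(sigma)')} | {inst('y')}"
    return base, step

# For the (made-up) literal X + s(0) = s(X) with induction term X:
for clause in induction_clauses("X + s(0) = s(X)"):
    print(clause)
```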
3 Motivating Example
Let us now motivate our approach to induction with generalization by considering the following formula expressing the associativity of addition over N:
∀x, y, z. x + (y + z) = (x + y) + z (3)
The induction approach introduced in [19] is able to prove this problem. The
main steps of such a proof are shown in Fig. 2 and discussed next. First, the
negation of formula (3) is skolemized, yielding the (unit) clause C1 of Fig. 2.
As already mentioned, the σi denote fresh Skolem constants introduced during
clausification. Next, the structural induction axiom (1) is instantiated so that its
conclusion can resolve against C1 using the constant σ1 as the induction term,
resulting in the formula:
0 + (σ2 + σ3 ) = (0 + σ2 ) + σ3 ∧
∀x.(x + (σ2 + σ3 ) = (x + σ2 ) + σ3 → s(x) + (σ2 + σ3 ) = (s(x) + σ2 ) + σ3 ) (4)
→ ∀y.(y + (σ2 + σ3 ) = (y + σ2 ) + σ3 ).
Then, the CNF of the induction axiom (4) is added to the search space using
the following instance of the structural induction rule (2):
σ1 + (σ2 + σ3) ≠ (σ1 + σ2) + σ3
------------------------------------------------------------ (5)
(0 + (σ2 + σ3) ≠ (0 + σ2) + σ3 ∨ σ + (σ2 + σ3) = (σ + σ2) + σ3 ∨
 y + (σ2 + σ3) = (y + σ2) + σ3)
∧
(0 + (σ2 + σ3) ≠ (0 + σ2) + σ3 ∨ s(σ) + (σ2 + σ3) ≠ (s(σ) + σ2) + σ3 ∨
 y + (σ2 + σ3) = (y + σ2) + σ3)
The clauses from the inference conclusion are resolved against C1 , yielding
clauses C2 , C3 of Fig. 2. Clause C4 originates by repeated demodulation into C3
using the second axiom of Fig. 1 over N. Further, C5 is derived from C4 by using
the injectivity property of term algebras and C6 is a resolvent of C2 and C5 .
Clause C7 is then derived by repeated demodulation into C6 , using the first
axiom of Fig. 1 over N. By removing the trivial inequality from C7 , we finally
derive the empty clause C8 .
which is different from (7). When we add this formula, we can derive the empty
clause in the same way as in Fig. 2.
Saturation with Induction with Generalization. The main questions to
answer when applying induction with generalization is which occurrences of the
induction term in the induction literal we should choose.
Generally, if the subterm t occurs n times in the premise, there are 2^n − 1
ways of applying the rule, all potentially resulting in formulas that do not imply
each other. Thus, the obvious heuristic of using all non-empty subsets may result
in too many formulas. For example, σ1 + (σ1 + σ1) = (σ1 + σ1) + σ1 would result
in adding 63 induction formulas.
Another simple heuristic is to restrict the number of occurrences selected as
the induction term to a fixed number. This strategy reduces the number of appli-
cations of induction at the cost of losing proofs that would need subsets of
cardinality larger than the limit. Finding heuristics for selecting specific subsets
for common cases of literals is a subject for future work, and is especially
interesting for proof assistants in mathematics.
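To make the combinatorics concrete, here is a hypothetical Python sketch (our own, not the Vampire implementation) that enumerates the candidate occurrence subsets, optionally capped at a fixed cardinality as in the heuristic just described:

```python
from itertools import combinations

def occurrence_subsets(n, max_size=None):
    """Yield every non-empty subset of the n occurrence positions of
    the induction term, optionally capped at max_size (cf. the
    indgenss option of Sect. 5)."""
    top = n if max_size is None else min(n, max_size)
    for k in range(1, top + 1):
        yield from combinations(range(n), k)

# sigma1 + (sigma1 + sigma1) = (sigma1 + sigma1) + sigma1 has n = 6
# occurrences, giving 2^6 - 1 = 63 candidate induction formulas:
assert sum(1 for _ in occurrence_subsets(6)) == 63
# Capping the subset size at 3 leaves only 6 + 15 + 20 = 41 of them:
assert sum(1 for _ in occurrence_subsets(6, max_size=3)) == 41
```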
Note that some of the conclusions of (11) can, in turn, have many children
obtained by induction with generalization. Our experiments in Sect. 5 show that,
even when we generate all possible children, Vampire can still solve large exam-
ples with more than 10 occurrences of the same induction variable, again thanks
to the effect that, for each application of induction, only a small number of
ground clauses turn out to be added to the search space.
We therefore believe that our work can potentially also be useful for larger
examples, and even in cases when the inductive property to be proved is embed-
ded in a larger context.
5 Experiments
Implementation. We implemented induction with generalization in Vampire,
with two new options:
– a Boolean-valued option indgen, which turns on/off the application of induction
with generalization, with the default value being off, and
– an integer-valued option indgenss, which sets the maximum size of the subset
of occurrences used for induction, with the default value 3. This option is
ignored if indgen is off.
Our implementation of induction with generalization is available at:
https://github.com/vprover/vampire.
In the experiments described here, if indgen is off, Vampire performs induction
on all occurrences of a term in a literal, as in [19]. In this section:
– Vampire refers to the (default) version of Vampire with induction rule (10)
(i.e., the option -ind struct);
– Vampire* additionally uses the IndGen rule of induction with generaliza-
tion (11) (i.e., the options -ind struct -indgen on);
– Vampire** uses the same options as Vampire* plus the option -indoct on,
which applies induction to arbitrary ground terms, not just to constants as
in Vampire or in Vampire*.
A schematic sketch of how these configurations combine is given below.
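The following Python sketch is our own reconstruction; the flags are those quoted above, while assembling them into option lists this way is an assumption, not the authors' exact invocation:

```python
def vampire_options(variant, indgenss=None):
    """Option list for one configuration; flags as quoted in the text."""
    opts = ["-ind", "struct"]                     # baseline Vampire
    if variant in ("Vampire*", "Vampire**"):
        opts += ["-indgen", "on"]                 # IndGen rule (11)
        if indgenss is not None:
            opts += ["-indgenss", str(indgenss)]  # subset-size limit
    if variant == "Vampire**":
        opts += ["-indoct", "on"]                 # induction on complex terms
    return opts

# Baseline plus indgenss in {2, 3, 4, unlimited} for Vampire* and
# Vampire**; each configuration runs in default and portfolio mode.
configs = [vampire_options("Vampire")]
for variant in ("Vampire*", "Vampire**"):
    for size in (2, 3, 4, None):
        configs.append(vampire_options(variant, size))
assert len(configs) * 2 == 18  # the 18 instances described below
```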
2. Is the new rule useful at all for this kind of benchmark? While the new rule
can be used in principle, should it (or can it) be used in program analysis
and verification?
Our results show that the overhead is relatively small, but we could not solve
any problems that were not solvable without the use of the new rule.
Induction (10) in Vampire was already evaluated in [19] against other
solvers on these examples. Hence, we only compare how Vampire*/Vampire**
performs against Vampire, using both the default and the portfolio modes.
(In the default mode, Vampire/Vampire*/Vampire** uses default values for
all parameters except the ones specified by the user; in the portfolio mode,
Vampire/Vampire*/Vampire** sequentially tries different configurations for
parameters not specified by the user.) Together, we ran 18 instances: Vampire,
Vampire* with indgenss set to 2, 3, 4 and unlimited, and Vampire** with
the same four variants of indgenss; each of them in both default and portfolio
mode. We ran our experiments on the StarExec cluster [25].
The best Vampire*/Vampire** solved 5 problems in the portfolio mode
and 1 problem in the default mode not solved by Vampire. However, the proofs
found by them did not use induction with generalization. This is a common
problem in experiments with saturation theorem proving: new rules change the
direction of the proof search and may result in new simplifications that also
drastically affect the search space. As a result, new proofs may be found, yet
these proofs do not actually use the new rule. There were no problems solved by
Vampire that were not solved by any Vampire*/Vampire**.
The maximum number of IndGen applications in proofs was 3 and the max-
imum depth of induction was 4. Vampire*/Vampire** used generalized induc-
tion in proofs of 10 problems. However, these problems are also solvable by
Vampire (without generalized induction). Thus, we conclude that SMT-LIB
problems (probably as well as other typical program analysis and verification
benchmarks) typically do not gain from using generalization.
Experiments with Mathematical Problems. We handcrafted a number of
natural problems over natural numbers and lists and tested the new rule on
these problems. Our benchmarks are available at:
https://github.com/vprover/inductive_benchmarks.
Table 1 lists 16 such examples using the functions defined in Fig. 1. Some
examples were taken from or inspired by the TIP benchmark library [9]: e.g., the
seventh benchmark in Table 1 is adapted from the TIP library and the second
problem is inspired by a symmetric problem from the TIP library, ∀x.(s(x)+x =
s(x+x)). While they are handcrafted, we believe they are representative, since no
attempt was made to exclude problems not solvable by Vampire using induction
with generalization.
We evaluated and compared several state-of-the-art reasoners supporting
standard input formats and, due to the nature of our work, either superposition-
based approaches or approaches to generalization. It was not easy to carry out
these experiments, since the provers use different input syntaxes (see Table 2).
As a result,
we also had to design translations of our benchmarks.
Table 1. Results on the 16 handcrafted examples, using the functions defined in
Fig. 1. The columns are the solvers Vampire, Vampire*, Vampire**, ACL2, Cvc4,
Cvc4-Gen, Imandra, Zeno, and ZipRewrite; a dash marks a problem the solver
did not prove. The rows, grouped by theory, are:

over N:
∀x, y.(x + y = y + x)
∀x.(x + s(x) = s(x + x))
∀x, y, z.(x + (y + z) = (x + y) + z)
∀x.(x + (x + x) = (x + x) + x)
∀x.((x + x) + ((x + x) + x) = x + (x + ((x + x) + x)))
∀x, y.(y + (x + x) = (x + y) + x)
∀x.(x ≤ x)
∀x, y.(x ≤ x + y)
∀x.(x ≤ x + x)
∀x.(x + x ≤ (x + x) + x)

over L:
∀l, k, j.(l ++ (k ++ j) = (l ++ k) ++ j)
∀l.(l ++ (l ++ l) = (l ++ l) ++ l)
∀l, k.(l ++ (k ++ (l ++ l)) = (l ++ k) ++ (l ++ l))
∀l, k.prefix(l, l ++ k)
∀l.prefix(l, l ++ l)

over N and L:
∀l : L, x : N.(cons(x + s(x), l) ++ (l ++ l) = (cons(s(x) + x, l) ++ l) ++ l)
this equality has the same number of occurrences in t1 and t2 . For example, the
following equality is valid over natural numbers:
To prove such problems over N, one needs both induction and generalization.
Without the successor function, they can be easily proved using associativity and
commutativity of +, but associativity and commutativity are not included in the
axioms of N. When the terms are large, the problems become highly challenging.
Table 2. Configurations and input format of solvers for the mathematical problems.
We generated a set of instances of these problems (with and without the suc-
cessor function, and also other functions and predicates) by increasing term sizes.
We also generated similar problems for lists using the concatenation and reverse
functions and the prefix predicate. Some of the terms were, e.g., variations of (6)
with 20 occurrences of x. Our entire dataset, containing over 3,300 examples, is
available at the previously mentioned URL.
We were again interested in evaluating and comparing various reasoners and
approaches on these problems. The interesting feature of these problems is that
they are natural yet we can generate problems of almost arbitrary complexity.
We evaluated and compared Vampire*, Vampire**, Cvc4-Gen, Zeno and
ZipRewrite, that is, the best-performing solvers for inductive reasoning with
generalization according to Table 1, using the same experimental setting as
already described for Table 1. Table 3 lists a partial summary of our experi-
ments, displaying results for 2,007 large instances of four simple properties with
one variable, corresponding to the fourth, ninth, twelfth and fifteenth problem
from Table 1. (Due to space constraints, we chose these problems as a represen-
tative subset of our large benchmarks, since the solvers’ performance was very
similar for the whole benchmark set.)
In Table 3, we use the following notation. By nx = nx we denote formulas
of the form x ◦ ... ◦ x = x ◦ ... ◦ x with n occurrences of x on both sides of the
equality, and parentheses in various places in the expressions, with ◦ being +
(over N) or ++ (over L).
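A hypothetical generator for this problem family (our own sketch, illustrating how instances of almost arbitrary size can be produced; the output syntax is schematic, not one of the solvers' input formats):

```python
import random

def random_assoc(n, op="+"):
    """A random parenthesization of n occurrences of x joined by op."""
    if n == 1:
        return "x"
    k = random.randint(1, n - 1)
    return f"({random_assoc(k, op)} {op} {random_assoc(n - k, op)})"

def nx_eq_nx(n, op="+"):
    """One instance of the nx = nx family: n occurrences per side."""
    return f"forall x. {random_assoc(n, op)} = {random_assoc(n, op)}"

print(nx_eq_nx(5))         # a problem over N
print(nx_eq_nx(3, "++"))   # the analogous problem over L
```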
6 Related Work
Research into automating induction has a long history with a number of tech-
niques developed, including for example approaches based on semi-automatic
inductive theorem proving [5,7,8,18], specialized rewriting procedures [12], SMT
reasoning [22] and superposition reasoning [10,11,15,19].
Previous works on automating induction mainly focus on inductive theorem
proving [7,8,24]: deciding when induction should be applied and what induction
axiom should be used. Further restrictions are made on the logical expressive-
ness, for example induction only over universal properties [5,24] and without
uninterpreted symbols [18], or only over term algebras [11,15]. Inductive proofs
usually rely on auxiliary lemmas to help proving an inductive property. In [8]
heuristics for finding such lemmas are introduced, for example by randomly
generating equational formulas over random inputs and using these formulas if
they hold reasonably often. The use of [8] is therefore limited to the underly-
ing heuristics. Other approaches to automating induction circumvent the need
for auxiliary lemmas by using uncommon cut-free proof systems for inductive
reasoning, such as a restricted ω-rule [1], or cyclic reasoning [6].
The work presented in this paper automates induction by integrating it
directly in superposition-based proof search, without relying on rewrite rules
and external heuristics for generating auxiliary inductive lemmas/subgoals as
in [5,7,8,18,24]. Our new inference rule IndGen for induction with generaliza-
tion adds new formulas to the search space and can replace lemma discovery
heuristics used in [7,8,22]. Our work also extends [19] by using and instanti-
ating induction axioms with logically stronger versions of the property being
proved. Unlike [10], our methods do not necessarily depend on Avatar [26], can
be used with any (inductive) data type, and target induction rules different from
structural induction. Contrary to [11], we are not limited to induction over
term algebras with the subterm ordering, and we stay in a standard saturation
framework. Moreover, compared to [5,7,8,22], one of the main advantages of our
approach is that it does not use a goal-subgoal architecture and can, as a result,
combine superposition-based equational reasoning with inductive reasoning.
7 Conclusions
References
1. Baker, S., Ireland, A., Smaill, A.: On the use of the constructive omega-rule within
automated deduction. In: Voronkov, A. (ed.) LPAR 1992. LNCS, vol. 624, pp.
214–225. Springer, Heidelberg (1992). https://doi.org/10.1007/BFb0013063
2. Barrett, C., et al.: CVC4. In: Gopalakrishnan, G., Qadeer, S. (eds.) CAV 2011.
LNCS, vol. 6806, pp. 171–177. Springer, Heidelberg (2011). https://doi.org/10.
1007/978-3-642-22110-1_14
3. Barrett, C., Fontaine, P., Tinelli, C.: The Satisfiability Modulo Theories Library
(SMT-LIB) (2016). www.SMT-LIB.org
4. Blanchette, J.C., Peltier, N., Robillard, S.: Superposition with datatypes and
codatatypes. In: Galmiche, D., Schulz, S., Sebastiani, R. (eds.) IJCAR 2018. LNCS
(LNAI), vol. 10900, pp. 370–387. Springer, Cham (2018). https://doi.org/10.1007/
978-3-319-94205-6_25
23. Rybina, T., Voronkov, A.: A decision procedure for term algebras with queues.
ACM Trans. Comput. Log. 2(2), 155–181 (2001)
24. Sonnex, W., Drossopoulou, S., Eisenbach, S.: Zeno: an automated prover for prop-
erties of recursive data structures. In: Flanagan, C., König, B. (eds.) TACAS 2012.
LNCS, vol. 7214, pp. 407–421. Springer, Heidelberg (2012). https://doi.org/10.
1007/978-3-642-28756-5_28
25. Stump, A., Sutcliffe, G., Tinelli, C.: StarExec: a cross-community infrastructure
for logic solving. In: Demri, S., Kapur, D., Weidenbach, C. (eds.) IJCAR 2014.
LNCS (LNAI), vol. 8562, pp. 367–373. Springer, Cham (2014). https://doi.org/10.
1007/978-3-319-08587-6_28
26. Voronkov, A.: AVATAR: the architecture for first-order theorem provers. In: Biere,
A., Bloem, R. (eds.) CAV 2014. LNCS, vol. 8559, pp. 696–710. Springer, Cham
(2014). https://doi.org/10.1007/978-3-319-08867-9_46
A Survey of Languages for Formalizing Mathematics
1 Introduction
effective for humans in a way that formal languages have so far not been able
to capture. In fact, in 2007, Wiedijk claimed [Wie07], citing four representative
statements, that no existing formal system was sufficient to naturally express
basic mathematical content. Despite the progress made since then, his critique
still applies.
We give an introduction to the objectives and main approaches in Sect. 2.
Then Sects. 3 and 4 describe the main approaches, formal systems and inter-
mediate languages, in more detail. Sections 5 and 6 describe closely related orthogonal
aspects: language frameworks and interchange libraries. We evaluate our findings
and conclude in Sect. 7.
2 Overview
2.1 Objectives
Thus, a big picture goal of the field is a tighter integration of (i) natural lan-
guage mathematical content such as textbooks or software specifications, and
(ii) formalization of such content in logics and theorem proving systems. We can
identify the following overarching objectives:
Practical Workflows that Integrate Natural and Formal Languages. Such a lan-
guage and standard library would enable substantially better tool support for
working researchers in the mathematical sciences: being structurally similar to both
natural and formal languages, they could serve as an interface language for tools
of either kind. This would allow enriching existing workflows such as LaTeX-based
2.2 Approaches
The most successful formal languages for mathematical content have been devel-
oped in the areas of formal logic where they occur most prominently as the input
languages of proof assistants as well as in computer algebra where they occur as
programming languages fitted to mathematical algorithms. These combine formal
foundations with complex structuring mechanisms, especially type, module, proof,
and computation systems, which have proved critical to achieving efficient large-scale
tool support. Importantly, these fix not only the syntax but also the semantics of,
e.g., proofs and computations. By contrast, in natural language, these are not
spelled out at all, let alone explicated as a primitive feature; instead, they are
emergent features driven by conventions and flexibly adaptable to different con-
texts. Consequently, formalization is usually a non-structure-preserving transfor-
mation that is often prohibitively expensive and creates an entry barrier for casual
users. For example, the mathematician Kevin Buzzard admonishes computer sci-
entists to build more human-near languages “so users can at least read sentences
they understand and try to learn to write these sentences”.
Between the extremes of natural and formal languages, a variety of inter-
mediate languages make different trade-offs aiming at combining the universal
applicability of natural language with the advantages of formal semantics. A
central observation is that
– existing intermediate languages apply only the formal syntax and (to varying
degrees) semantics of formal languages but not their complex structuring
mechanisms, and
– this limitation is not necessarily an inherent feature of the approach but rather
a frontier of research.
The following table summarizes the resulting trichotomy and shows how each
kind of language satisfies only two out of three essential requirements:
In the subsequent sections, we discuss the state of the art for these languages
in more detail.
3 Formal Languages
Many formal systems use what we call hard type systems, which assign a unique
type to each object and are thus easiest to automate. Systems derived from
Martin-Löf type theory [ML74] or the calculus of constructions [CH88] usually
use the proofs-as-programs correspondence (Curry-Howard [CF58,How80]) that
represents mathematical properties as types and proofs as data. These include
Agda [Nor05], Coq [Coq15], Lean [dKA+15], Matita [ASTZ06] as well as Nuprl
[CAB+86]. Systems derived from Church’s higher-order logic [Chu40] usually
use the LCF architecture [Mil72] that uses an abstract type of proved theorems.
These include HOL4 [HOL], ProofPower [Art], Isabelle [NPW02], and HOL Light
[Har96].
Hard type systems are at odds with natural language, as the unique-type
property precludes representing mathematical sets and subsets as types and
subtypes. In particular, the lack of expressive subtyping in hard type systems is
fundamentally at odds with everyday mathematics, where sets and subsets are
used throughout: hard type systems preclude a direct representation of sets as
types because they cannot represent the rich (even undecidable) subset relation
using subtyping.
Multiple systems have explored compromises. We speak of semi-soft type
systems if a hard type system is extended with variants of subtyping. For exam-
ple, PVS [ORS92] uses predicate subtypes, Lean [dKA+15] and Nuprl [CAB+86]
support predicate subtypes and quotient types, and IMPS [FGT93] uses refine-
ment types.
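As a concrete illustration of predicate subtypes, here is a minimal Lean 4 sketch of our own (the named systems differ in the details):

```lean
-- A predicate carves a subtype out of Nat: inhabitants pair a number
-- with a proof that the predicate holds.
def Even (n : Nat) : Prop := ∃ k, n = 2 * k

def EvenNat := { n : Nat // Even n }

-- An inhabitant: the number 4 together with a witness that 4 = 2 * 2.
def four : EvenNat := ⟨4, ⟨2, rfl⟩⟩
```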
Both hard and semi-soft type systems force users to choose between rep-
resenting information using the type system (e.g., ∀x : N.P (x)) or the logical
system (e.g., ∀x.x ∈ N ⇒ P (x)). Problematically, this choice usually has far-
reaching consequences, e.g., the type system may be decidable but the logic
system undecidable. But from the perspective of mathematics this distinction
is artificial, and the fact that the two resulting representations may be entirely
incompatible down the road is very awkward.
These problems are avoided in untyped languages. ACL2 [KMM00] is a first-
order logic on top of the untyped λ-calculus of Lisp that strongly emphasizes
computation. Untyped set theory is used in Isabelle/ZF [PC93], Metamath
[Meg07], and the B method [Abr96]. Untyped languages are also common in
module systems differ greatly from natural language where no two-layer lan-
guage is fixed.
Secondly, internal module systems use record types to mimic modular struc-
ture inside the type system. This is possible in all systems that support record
types (e.g., Agda, Coq, Isabelle, Lean, PVS); Mizar’s structures behave simi-
larly. Soft modules are more flexible and thus similar to natural language, but
the lack of a concise module system makes modular reasoning like inheritance
and refinement more difficult. For example, soft module systems must manu-
ally employ extra-logical conventions (e.g., [GGMR09]), and combining modules
built with different conventions quickly becomes impractical. This is even worse
in the common case where both hard and soft module systems are present in
parallel (we have initiated work in this direction in [MRK18]).
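For concreteness, the record-based pattern looks roughly as follows (a minimal Lean 4 sketch of our own; Agda, Coq, Isabelle, Lean, and PVS use analogous constructs):

```lean
-- An internal "module" as a record type: a monoid over a carrier M.
structure Monoid' (M : Type) where
  unit      : M
  mul       : M → M → M
  mul_unit  : ∀ a, mul a unit = a
  unit_mul  : ∀ a, mul unit a = a
  mul_assoc : ∀ a b c, mul (mul a b) c = mul a (mul b c)

-- Instances are ordinary values of the record type.
def natAddMonoid : Monoid' Nat :=
  { unit := 0, mul := (· + ·),
    mul_unit := Nat.add_zero, unit_mul := Nat.zero_add,
    mul_assoc := Nat.add_assoc }
```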
Both of the above can be seen as hard module systems in the sense that a
module encapsulates a fixed set of declarations that induce selectors that can
be applied to the module’s instances. A third group, which we call soft module
systems is somewhat hypothetical as it is used much less widely. Here, in analogy
to soft tying, modules are treated as unary predicates that range over objects.
Inheritance then becomes a special case of implication. This idea is used in
the GAP module system, whose soft types (called properties) and soft modules
(called categories) are treated very similarly: both are filters, and the
run-time system tracks which object satisfies which filters. The main difference
between them is that categories can have constructors and thus allow for filters
that are satisfied by construction.
Finally, since module systems have mostly been designed as extensions of
existing logical languages, both hard and soft module systems fail to capture a
number of essential features of natural mathematical language: the identification
of isomorphic instances of the same module; the seamless extension of operations
across substructures and quotient structures (e.g., + is first defined on N, then
extended to Z); the flexibility of presence and order of fields in a structure (e.g.,
(Z, +, ∗) and (Z, +, 0, −, ∗, 1) should be the same ring); the context-sensitive
meaning of structures (e.g., Z should be a ring or a total order, depending on the
context); and in many systems also the implicit application of forgetful functors
(e.g., a group is not automatically also a monoid).
The second major application of formal languages stems from computer algebra
systems, which use mathematics-customized variants of general purpose pro-
gramming languages for efficient computation.
Even though mathematics uses mostly pure functions, most systems are
based on Turing-complete imperative programming, mostly to reuse existing user
knowledge and fast implementations. It is common to use the same language for
pure mathematical algorithms and interspersed imperative meta-operations like
I/O, logging, memoization, or calling external tools (in particular in SageMath).
Proof assistants take a much more restricted approach to integrating pure com-
putations with a logic. Three main approaches exist. Firstly, normalization in
the type theory, in particular β-reduction, is a primitive form of computation. It
becomes much stronger when combined with (co)inductive types and recursion,
and these are primitive features in most complex type theories like Coq. Systems
then usually include heuristic termination criteria to check the soundness of the
functions, which leads to a trade-off between logical and Turing-completeness.
Secondly, certain theorems such as Horn formulas about equality can be inter-
preted as conditional rewrite rules. Typically, systems require the user to choose
which theorems to use and then exhaustively rewrite expressions with them.
This is much slower but allows for a much simpler language as computation is
relegated to the meta-level. This is the main method used in systems without
primitive (co)inductive types such as Isabelle. Thirdly, computation can be sup-
plied by external tools or special kernel modules. This computation can be part
of the, consequently rather large, trusted code base, such as in PVS decision
procedures or the usage of SAT solvers in Mizar [Nau14]. This is also the case
in Theorema: as the proof assistant is written in the Mathematica computer
algebra system, it is in principle possible to use most of Mathematica’s algorithms
inside Theorema [Win14]. In some cases, a trade-off is possible where computa-
tions are run externally and their results are efficiently verified by the prover.
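The first two mechanisms can be contrasted in a few lines (a Lean 4 sketch of our own; the named systems differ considerably in the details):

```lean
-- (1) Computation by normalization: a recursive definition evaluates
-- by reduction inside the type theory.
def double : Nat → Nat
  | 0     => 0
  | n + 1 => double n + 2

#eval double 21  -- 42, computed by reduction

-- (2) Computation by rewriting: an equational theorem, used as a
-- rewrite rule, relegates computation to the meta-level.
theorem double_eq (n : Nat) : double n = 2 * n := by
  induction n with
  | zero => rfl
  | succ n ih => simp only [double, ih]; omega

example : double 3 = 6 := by rw [double_eq]
```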
4 Intermediate Languages
Intermediate languages try to capture the advantages of natural languages in
a formal language. There is a rather diverse set of such approaches, which we
describe in groups below. However, we can identify some general effects that
motivate the design of many intermediate languages.

Fig. 1. Functionality for intermediate languages (left); market gap for stepwise
formalization support (right)
Firstly, an intermediate language can already provide sufficient automation
support for some tasks. Thus, it can serve as a more natural and easier-to-use
target language for (partial) formalization if the task at hand is supported. For
example, search, interactive documents, or dependency management can be real-
ized well in some intermediate languages and even benefit from structural similar-
ity to the human-near natural language formulation. The main counter-examples
are verification and computation, which require a lot more formalization. This
is indicated in Fig. 1 (left).
Secondly, an intermediate language can serve as an interface between human-
near natural language and a verification- or computation-oriented formal lan-
guage. This enables stepwise formalization and thus a smoother transition from
the informal to the formal realm. It may also allow for a separation of concerns
where a domain experts transform content from informal to intermediate in a
first step and a formalization transforms from intermediate to formal in a second
step. The relative lack of highly successful approaches in this style is indicated
in Fig. 1 (right).
Thirdly, the intermediate representation is often not or only barely com-
mitted to a particular formal language (e.g., a particular type system, module
system, proof system or computation system). During stepwise formalization,
this means that the first step only needs to be done once and can then be reused
for different second steps targeting different formal languages. Expanding on
this, we see that an intermediate language can provide an interoperability layer
between formal languages. That can help with the notorious lack of interoper-
ability between formal systems (see also Sect. 6).
systems that would be critical to capture the complexity of large scale formal
libraries. A partial exception is the second author’s Mmt system, which combines
aspects of standard languages [DIK+16,KMP+17] and prover interchange lan-
guages [BRS08,HR15,KRS16] with hard type and module systems [RK13]. The
OAF project [KR16,KR20] used Mmt to represent large libraries of proof assis-
tants in a standard representation language, including those of Mizar in [IKR11],
HOL Light in [KR14], PVS in [KMOR17] (including the NASA library), Coq in
[MRS19] (including all available libraries), and Isabelle in [KRW20] (including
the Archive of Formal Proofs).
5 Language Frameworks
Language frameworks are formal languages in which the syntax and semantics
of other languages can be represented. They are superficially related to parser
frameworks but much stronger because they (i) allow specifying not only the
syntax but also the semantics of a language, and (ii) often offer strong support for
context-sensitivity, which is critical in mathematics.
Logical frameworks are language frameworks for building formal languages.
Examples are Isabelle [Pau94], Dedukti [BCH12], λProlog [MN86], or the LF
[HHP93] family including Twelf [PS99] and others. Frameworks also exist for
building controlled natural languages such as GF [Ran11].
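To make the idea concrete, here is a toy sketch of our own, far simpler than what LF or GF support: the syntax of a miniature object language defined inside a host language, together with a semantics.

```lean
-- A toy object language defined inside a framework-like host:
-- its syntax as an inductive type, its semantics as an evaluator.
inductive Form where
  | top  : Form
  | conj : Form → Form → Form
  | neg  : Form → Form

def eval : Form → Bool
  | .top      => true
  | .conj a b => eval a && eval b
  | .neg a    => !(eval a)

#eval eval (.neg (.conj .top (.neg .top)))  -- true
```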
Contrary to the approaches discussed above, these frameworks do not in
themselves provide languages for formalizing mathematics. But they are worth
discussing in this context for two reasons: Firstly, they allow the rapid proto-
typing of implementations, which speeds up the feedback loop between language
design and applications. Thus, users can experiment with new languages and
conduct large case studies in parallel. Secondly, they allow developing scalable
applications language-independently such that they are immediately applicable
for any language defined in the framework. That is important because evaluat-
ing formal languages often requires building (or trying to build) large libraries in
them. Such applications include at least parsing and type-checking but can also
include meta-reasoning (e.g., Twelf), interactive theorem proving (e.g., Isabelle),
or language translation (e.g., GF).
Despite many successes in representing logical languages in logical frame-
works (e.g., [CHK+11,KR14,KMOR17,MRS19]), current frameworks cover only
unrealistically simple languages compared to the needs for mathematically struc-
tured content and do not have good support for, e.g., soft type systems and soft
module systems and practical proof systems. Thus, even the representation of
the already insufficient languages discussed above is often very difficult or not
possible.
Therefore, more flexible logical frameworks were developed recently. Both
ELPI [DGST15] and Mmt [Rab17,Rab18] allow users to flexibly change critical
algorithms whenever a concrete language definition needs it. That makes them
more promising for representing languages designed for mathematical content
(and can even allow sharing some functionality across incompatible foundations).
6 Interchange Libraries
The quest for the best formal language for mathematics is likely never-ending.
Therefore, it is important to investigate how to combine the existing libraries of
formalized content. Due to major incompatibilities between the various formal
systems, this is an extremely difficult problem, and it would go beyond the scope
of this paper to discuss approaches in detail. But we want to mention the idea
of interchange libraries because we consider it to be one of the most promising
ideas.
An interchange library I is a formalization of mathematics written in an inter-
mediate language with the goal of serving as an interoperability layer between
formal systems. The main idea is that all translations from source system S to
target system T are split into two steps S → I and I → T .
Both steps have characteristic difficulties. The step S → I is usually a partial
translation because every formal system uses idiosyncratic features that cannot
be represented in I and optimizations for verification/computation that need
not be represented in I. The step I → T tends to be easier, but there is a tricky
trade-off in the design of I: the less I commits to a particular formal system, the
more systems T can be handled but the more difficult the individual translations
I → T become. In practice, a further major logistic problem is that I and the
translations via it need to be built and maintained, which is even harder to
organize and fund than for the systems S and T themselves.
The standard content dictionaries written in OpenMath [BCC+04] were the
first concerted effort to build an interchange library. 214 dictionaries (including
contributed ones) declaring 1578 symbols are maintained by the OpenMath Soci-
ety. These focus on declaring names for mathematical symbols and describing
their semantics verbally and with formal axioms. However, the approach was
not widely adopted, as little tool support existed for OpenMath itself and for
OpenMath-based interoperability, and individual formal systems were often
barely able to export/import their objects at all.
Recently, the idea was picked up again in the OpenDreamKit project. It uses
Mmt (whose language of theories and expressions essentially subsumes Open-
Math CDs and objects) to write a formal interchange library (dubbed MitM for
Math-in-the-middle) [DIK+16]. MitM is more formal than the OpenMath CDs,
in particular employing a hard type and module system. It was used as an inter-
operability layer for computer algebra systems [KMP+17] and mathematical
databases [WKR17,BKR19].
A complementary approach is SMGloM [GIJ+16], a multi-lingual glossary of
mathematical concepts. It retains the untyped nature of OpenMath CDs but
uses sTeX to obtain tool support for writing the library.
SMGloM and MitM serve similar purposes with different methods that recall
the distinctions described in Sect. 2: SMGloM uses mostly natural language,
and MitM uses formal language with hard type and module system. The short-
comings of these efforts seem to indicate that soft types and modules may be
the best trade-off for building an interchange library.
In order to streamline the process of building the translations S → I and
I → T , the concept of alignments was developed [KKMR16]. An alignment
between two symbols c and c′ in different libraries captures that translations
should try to translate objects with head c to objects with head c′. Both exact manual
efforts [MRLR17] and machine learning–based heuristic approaches were used
to find alignments across formal libraries. The latter includes alignments between six
proof assistants [GK19], showing that such alignments allow both conjecturing
and more powerful automation [GK15]. The same approach has been used to
obtain alignments between informal and formal libraries, which can be used to
automatically formalize parts of mathematical texts, both statistically [KUV17]
and using deep learning techniques [WKU18]. Similarly, [GC14] automatically
obtains alignments between informal libraries.
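A minimal Python sketch of the alignment-based translation mechanism (the symbol names and the alignment table below are invented for illustration):

```python
# Terms as (head, args) pairs; an alignment table maps symbol heads
# of the source library to those of the target library.
ALIGNMENTS = {"HOL.plus": "Coq.add", "HOL.zero": "Coq.O"}  # invented

def translate(term):
    head, args = term
    return (ALIGNMENTS.get(head, head), [translate(a) for a in args])

print(translate(("HOL.plus", [("HOL.zero", []), ("x", [])])))
# -> ('Coq.add', [('Coq.O', []), ('x', [])])
```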
7 Conclusion
References
[ABC+10] Ausbrooks, R., et al.: Mathematical Markup Language (MathML) Version
3.0. Technical report, World Wide Web Consortium (2010). http://www.
w3.org/TR/MathML3
[Abr96] Abrial, J.: The B-Book: Assigning Programs to Meanings. Cambridge Uni-
versity Press, Cambridge (1996)
[Art] Arthan, R.: ProofPower. http://www.lemma-one.com/ProofPower/
[ASTZ06] Asperti, A., Coen, C.S., Tassi, E., Zacchiroli, S.: Crafting a proof assis-
tant. In: Altenkirch, T., McBride, C. (eds.) TYPES 2006. LNCS, vol.
4502, pp. 18–32. Springer, Heidelberg (2007). https://doi.org/10.1007/
978-3-540-74464-1_2
[BCC+04] Buswell, S., Caprotti, O., Carlisle, D., Dewar, M., Gaetano, M., Kohlhase,
M.: The Open Math Standard, Version 2.0. Technical report, The Open
Math Society (2004). http://www.openmath.org/standard/om20
[BCH12] Boespflug, M., Carbonneaux, Q., Hermant, O.: The λΠ-calculus modulo as
a universal proof language. In: Pichardie, D., Weber, T. (eds.) Proceedings
of PxTP2012: Proof Exchange for Theorem Proving, pp. 28–43 (2012)
[BKP19] Brown, C., Kaliszyk, C., Pak, K.: Higher-order Tarski Grothendieck as
a foundation for formal proof. In: Harrison, J., O’Leary, J., Tolmach, A.
(eds.) Interactive Theorem Proving. LIPIcs, vol. 141, pp. 9:1–9:16 (2019)
[BKR19] Berčič, K., Kohlhase, M., Rabe, F.: Towards a unified mathematical data
infrastructure: database and interface generation. In: Kaliszyk, C., Brady,
E., Kohlhase, A., Sacerdoti Coen, C. (eds.) CICM 2019. LNCS (LNAI),
vol. 11617, pp. 28–43. Springer, Cham (2019). https://doi.org/10.1007/
978-3-030-23250-4_3
[BRS08] Benzmüller, C., Rabe, F., Sutcliffe, G.: THF0 – the core of the TPTP
language for higher-order logic. In: Armando, A., Baumgartner, P., Dowek,
G. (eds.) IJCAR 2008. LNCS (LNAI), vol. 5195, pp. 491–506. Springer,
Heidelberg (2008). https://doi.org/10.1007/978-3-540-71070-7_41
[CAB+86] Constable, R., et al.: Implementing Mathematics with the Nuprl Develop-
ment System. Prentice-Hall, Upper Saddle River (1986)
[CF58] Curry, H., Feys, R.: Combinatory Logic. North-Holland, Amsterdam (1958)
[CFK+09] Cramer, M., Fisseni, B., Koepke, P., Kühlwein, D., Schröder, B., Veldman,
J.: The Naproche project: controlled natural language proof checking of
mathematical texts. In: Fuchs, N.E. (ed.) CNL 2009. LNCS (LNAI), vol.
5972, pp. 170–186. Springer, Heidelberg (2010). https://doi.org/10.1007/
978-3-642-14418-9_11
[CH88] Coquand, T., Huet, G.: The calculus of constructions. Inf. Comput.
76(2/3), 95–120 (1988)
[CHK+11] Codescu, M., Horozal, F., Kohlhase, M., Mossakowski, T., Rabe, F.:
Project abstract: logic atlas and integrator (LATIN). In: Davenport, J.H.,
Farmer, W.M., Urban, J., Rabe, F. (eds.) CICM 2011. LNCS (LNAI), vol.
6824, pp. 289–291. Springer, Heidelberg (2011). https://doi.org/10.1007/
978-3-642-22673-1_24
[Chu40] Church, A.: A formulation of the simple theory of types. J. Symb. Log.
5(1), 56–68 (1940)
[Coq15] Coq Development Team: The Coq proof assistant: reference manual. Tech-
nical report, INRIA (2015)
[DGST15] Dunchev, C., Guidi, F., Sacerdoti Coen, C., Tassi, E.: ELPI: fast,
embeddable, λProlog interpreter. In: Davis, M., Fehnker, A., McIver, A.,
Voronkov, A. (eds.) LPAR 2015. LNCS, vol. 9450, pp. 460–468. Springer,
Heidelberg (2015). https://doi.org/10.1007/978-3-662-48899-7_32
[DIK+16] Dehaye, P.-O., et al.: Interoperability in the OpenDreamKit project: the
math-in-the-middle approach. In: Kohlhase, M., Johansson, M., Miller, B.,
de Moura, L., Tompa, F. (eds.) CICM 2016. LNCS (LNAI), vol. 9791,
pp. 117–131. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-
42547-4_9
[dKA+15] de Moura, L., Kong, S., Avigad, J., van Doorn, F., von Raumer, J.: The
Lean theorem prover (system description). In: Felty, A.P., Middeldorp, A.
(eds.) CADE 2015. LNCS (LNAI), vol. 9195, pp. 378–388. Springer, Cham
(2015). https://doi.org/10.1007/978-3-319-21401-6_26
[edi07] Common Logic editors: Common Logic (CL) – A framework for a family
of logic-based languages. Technical Report 24707, ISO/IEC (2007)
[FGT93] Farmer, W., Guttman, J., Thayer, F.: IMPS: an interactive mathematical
proof system. J. Autom. Reason. 11(2), 213–248 (1993)
[GAA+13] Gonthier, G., et al.: A machine-checked proof of the odd order theorem. In:
Blazy, S., Paulin-Mohring, C., Pichardie, D. (eds.) ITP 2013. LNCS, vol.
7998, pp. 163–179. Springer, Heidelberg (2013). https://doi.org/10.1007/
978-3-642-39634-2_14
[Gan13] Ganesalingam, M.: The language of mathematics. In: Ganesalingam, M.
(ed.) The Language of Mathematics. LNCS, vol. 7805, pp. 17–38. Springer,
Heidelberg (2013). https://doi.org/10.1007/978-3-642-37012-0_2
[GC14] Ginev, D., Corneli, J.: NNexus reloaded. In: Watt, S.M., Davenport, J.H.,
Sexton, A.P., Sojka, P., Urban, J. (eds.) CICM 2014. LNCS (LNAI), vol.
8543, pp. 423–426. Springer, Cham (2014). https://doi.org/10.1007/978-
3-319-08434-3_31
[GGMR09] Garillot, F., Gonthier, G., Mahboubi, A., Rideau, L.: Packaging mathe-
matical structures. In: Berghofer, S., Nipkow, T., Urban, C., Wenzel, M.
(eds.) TPHOLs 2009. LNCS, vol. 5674, pp. 327–342. Springer, Heidelberg
(2009). https://doi.org/10.1007/978-3-642-03359-9_23
[GIJ+16] Ginev, D., et al.: The SMGloM project and system: towards a terminology
and ontology for mathematics. In: Greuel, G.-M., Koch, T., Paule, P.,
Sommese, A. (eds.) ICMS 2016. LNCS, vol. 9725, pp. 451–457. Springer,
Cham (2016). https://doi.org/10.1007/978-3-319-42432-3_58
[GK15] Gauthier, T., Kaliszyk, C.: Sharing HOL4 and HOL Light proof knowledge.
In: Davis, M., Fehnker, A., McIver, A., Voronkov, A. (eds.) LPAR 2015.
LNCS, vol. 9450, pp. 372–386. Springer, Heidelberg (2015). https://doi.
org/10.1007/978-3-662-48899-7_26
[GK19] Gauthier, T., Kaliszyk, C.: Aligning concepts across proof assistant
libraries. J. Symb. Comput. 90, 89–123 (2019)
[Hal12] Hales, T.: Dense Sphere Packings: A Blueprint for Formal Proofs. London
Mathematical Society Lecture Note Series, vol. 400. Cambridge University
Press (2012)
[Hal14] Hales, T.: Developments in formal proofs. Séminaire Bourbaki, 1086, 2013–
2014. arxiv.org/abs/1408.6474
[Meg07] Megill, N.: Metamath: A Computer Language for Pure Mathematics. Lulu
Press, Morrisville (2007)
[Mil72] Milner, R.: Logic for computable functions: descriptions of a machine
implementation. ACM SIGPLAN Not. 7, 1–6 (1972)
[ML74] Martin-Löf, P.: An intuitionistic theory of types: predicative part. In: Pro-
ceedings of the ’73 Logic Colloquium, pp. 73–118. North-Holland (1974)
[MN86] Miller, D.A., Nadathur, G.: Higher-order logic programming. In: Shapiro,
E. (ed.) ICLP 1986. LNCS, vol. 225, pp. 448–462. Springer, Heidelberg
(1986). https://doi.org/10.1007/3-540-16492-8_94
[MR19] Müller, D., Rabe, F.: Rapid prototyping formal systems in MMT: case
studies. In: Miller, D., Scagnetto, I. (eds.) Logical Frameworks and Meta-
languages: Theory and Practice, pp. 40–54 (2019)
[MRK18] Müller, D., Rabe, F., Kohlhase, M.: Theories as types. In: Galmiche, D.,
Schulz, S., Sebastiani, R. (eds.) IJCAR 2018. LNCS (LNAI), vol. 10900,
pp. 575–590. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-
94205-6_38
[MRLR17] Müller, D., Rothgang, C., Liu, Y., Rabe, F.: Alignment-based translations
across formal systems using interface theories. In: Dubois, C., Woltzenlogel
Paleo, B. (eds.) Proof eXchange for Theorem Proving, pp. 77–93. Open
Publishing Association (2017)
[MRS19] Müller, D., Rabe, F., Sacerdoti Coen, C.: The Coq library as a theory
graph. In: Kaliszyk, C., Brady, E., Kohlhase, A., Sacerdoti Coen, C. (eds.)
CICM 2019. LNCS (LNAI), vol. 11617, pp. 171–186. Springer, Cham
(2019). https://doi.org/10.1007/978-3-030-23250-4_12
[Nau14] Naumowicz, A.: SAT-enhanced Mizar proof checking. In: Watt, S.M., Dav-
enport, J.H., Sexton, A.P., Sojka, P., Urban, J. (eds.) CICM 2014. LNCS
(LNAI), vol. 8543, pp. 449–452. Springer, Cham (2014). https://doi.org/
10.1007/978-3-319-08434-3_37
[Nor05] Norell, U.: The Agda WiKi (2005). http://wiki.portal.chalmers.se/agda
[NPW02] Nipkow, T., Wenzel, M., Paulson, L.C. (eds.): Isabelle/HOL—A Proof
Assistant for Higher-Order Logic. LNCS, vol. 2283. Springer, Heidelberg
(2002). https://doi.org/10.1007/3-540-45949-9
[ORS92] Owre, S., Rushby, J.M., Shankar, N.: PVS: a prototype verification system.
In: Kapur, D. (ed.) CADE 1992. LNCS, vol. 607, pp. 748–752. Springer,
Heidelberg (1992). https://doi.org/10.1007/3-540-55602-8_217
[Pau94] Paulson, L.: Isabelle: A Generic Theorem Prover. LNCS, vol. 828. Springer,
Heidelberg (1994). https://doi.org/10.1007/BFb0030541
[PC93] Paulson, L., Coen, M.: Zermelo-Fraenkel Set Theory. Isabelle distribution,
ZF/ZF.thy (1993)
[PS99] Pfenning, F., Schürmann, C.: System description: Twelf - a meta-logical
framework for deductive systems. In: Ganzinger, H. (ed.) Automated
Deduction, pp. 202–206 (1999)
[Rab13] Rabe, F.: A logical framework combining model and proof theory. Math.
Struct. Comput. Sci. 23(5), 945–1001 (2013)
[Rab17] Rabe, F.: How to identify, translate, and combine logics? J. Log. Comput.
27(6), 1753–1798 (2017)
[Rab18] Rabe, F.: A Modular type reconstruction algorithm. ACM Trans. Comput.
Log. 19(4), 1–43 (2018)
[Ran11] Ranta, A.: Grammatical Framework: Programming with Multilingual
Grammars. CSLI Publications, Stanford (2011)
[RK13] Rabe, F., Kohlhase, M.: A scalable module system. Inf. Comput. 230(1),
1–54 (2013)
[RS09] Rabe, F., Schürmann, C.: A practical module system for LF. In: Cheney,
J., Felty, A. (eds.) Proceedings of the Workshop on Logical Frameworks:
Meta-Theory and Practice (LFMTP), pp. 40–48. ACM Press (2009)
[S+13] Stein, W., et al.: Sage Mathematics Software. The Sage Development Team
(2013). http://www.sagemath.org
[SSCB12] Sutcliffe, G., Schulz, S., Claessen, K., Baumgartner, P.: The TPTP typed
first-order form with arithmetic. In: Bjørner, N., Voronkov, A. (eds.) LPAR
2012. LNCS, vol. 7180, pp. 406–419. Springer, Heidelberg (2012). https://
doi.org/10.1007/978-3-642-28717-6_32
[Sut09] Sutcliffe, G.: The TPTP problem library and associated infrastructure: the
FOF and CNF Parts, v3.5.0. J. Autom. Reason. 43(4), 337–362 (2009)
[TB85] Trybulec, A., Blair, H.: Computer assisted reasoning with MIZAR. In:
Joshi, A. (ed.) Proceedings of the 9th International Joint Conference on
Artificial Intelligence, pp. 26–28. Morgan Kaufmann (1985)
[Wen11] Wenzel, M.: Isabelle as document-oriented proof assistant. In: Davenport,
J.H., Farmer, W.M., Urban, J., Rabe, F. (eds.) CICM 2011. LNCS (LNAI),
vol. 6824, pp. 244–259. Springer, Heidelberg (2011). https://doi.org/10.
1007/978-3-642-22673-1_17
[Wie07] Wiedijk, F.: The QED manifesto revisited. In: From Insight to Proof,
Festschrift in Honour of Andrzej Trybulec, pp. 121–133 (2007)
[Win14] Windsteiger, W.: Theorema 2.0: a system for mathematical theory explo-
ration. In: Hong, H., Yap, C. (eds.) ICMS 2014. LNCS, vol. 8592, pp. 49–52.
Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-44199-2 9
[WKR17] Wiesing, T., Kohlhase, M., Rabe, F.: Virtual theories – a uniform inter-
face to mathematical knowledge bases. In: Blömer, J., Kotsireas, I.S., Kut-
sia, T., Simos, D.E. (eds.) MACIS 2017. LNCS, vol. 10693, pp. 243–257.
Springer, Cham (2017). https://doi.org/10.1007/978-3-319-72453-9 17
[WKU18] Wang, Q., Kaliszyk, C., Urban, J.: First experiments with neural trans-
lation of informal to formal mathematics. In: Rabe, F., Farmer, W.M.,
Passmore, G.O., Youssef, A. (eds.) CICM 2018. LNCS (LNAI), vol. 11006,
pp. 255–270. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-
96812-4_22
[Wol12] Wolfram: Mathematica (2012)
OntoMathEdu: A Linguistically Grounded Educational Mathematical Ontology
1 Introduction
We present the first release of OntoMathEdu, a new educational mathematical
ontology. This ontology is intended to be:
– A Linked Open Data hub for mathematical education. In this respect, the
ontology lies at the intersection of two long-established trends of using LOD
for educational purposes [1–4] and for mathematical knowledge manage-
ment [5,6].
– A linguistic resource for common mathematical language processing. In this
respect, the ontology can complement mathematical linguistic resources, such
as SMGloM [7,8], and serve as an interface between raw natural language
texts and mathematical knowledge management applications.
– An end-user reference educational database, playing the same role in sec-
ondary school math that PlanetMath or MathWorld play in professional
mathematics.
This ontology is a central component of the digital educational platform
under development, which is intended for solving such tasks as: (1) automatic
questions generation; (2) automatic recommendation of educational materials
according to an individual study plan; (3) semantic annotation of educational
materials.
In the development of OntoMathEdu we rely on our experience from the
development of OntoMathPRO (http://ontomathpro.org/) [9], an ontology of
professional mathematics. This ontology underlies a semantic publishing plat-
form [10,11], which takes as input a collection of mathematical papers in LaTeX
format and builds their ontology-based Linked Open Data representation. The
semantic publishing platform, in turn, is a central component of OntoMath digi-
tal ecosystem [12,13], an ecosystem of ontologies, text analytics tools, and appli-
cations for mathematical knowledge management, including semantic search
for mathematical formulas [14] and a recommender system for mathematical
papers [15].
Despite the fact that OntoMathPRO has proved to be effective in several
educational applications, such as assessment of the competence of students [9]
and recommendation of educational materials in Virtual Learning Communities
[16–19], its focus on professional mathematics rather than on education prevents
it from being a strong foundation for the digital educational platform. The main
differences between OntoMathPRO and a required educational ontology are the
following:
– Conceptualization. The OntoMathPRO ontology specifies a conceptualization
of professional mathematics, whilst the required educational ontology must
specify a conceptualization of school mathematics. These conceptualizations are
noticeably different, for example, in school conceptualization, Number is a
primitive notion, while in professional conceptualization it is defined as a
subclass of Set.
– Selection of concepts. The required educational ontology must contain con-
cepts from a school mathematics curriculum.
– Terminology. Concepts of the OntoMathPRO ontology are denoted by profes-
sional terms, whilst concepts of the required educational ontology must be
denoted by school math terms. There isn’t so much difference between pro-
fessional and educational terminology in English, but this difference is more
salient in such languages as Russian or Tatar. For example, the term ‘mno-
gochlen’ (the native word for ‘polynom’) should be used instead of the pro-
fessional term ‘polinom’ (the Greek loan word with the same meaning) in
educational environment.
– Prerequisite relations. In the required educational ontology, logical relations
between concepts must be complemented with prerequisite ones. The concept
A is called a prerequisite for the concept B, if a learner must study the
concept A before approaching the concept B. For example, comprehension of
2 Ontology Structure
According to the project, the OntoMathEdu ontology is organized in three layers:
1. Foundational ontology layer, where the chosen foundational ontology is
UFO [22].
2. Domain ontology layer, which contains language-independent math con-
cepts from the secondary school mathematics curriculum. The concepts are
grouped into several modules, including the general concepts module and
modules for disciplines of mathematics, e.g. Arithmetic, Algebra and Plane
Geometry. The concepts will be interlinked with external LOD resources, such
as DBpedia [23], ScienceWISE [24] and OntoMathPRO. Additionally, relying
on the MMT URI scheme [25], the concepts can be aligned with the MitM
ontology [26], and through it with the concepts of several computer algebra
systems.
3. Linguistic layer, containing multilingual lexicons that provide linguistic
grounding of the concepts from the domain ontology layer. The lexicons will
be interlinked with the external lexical resources from the Linguistic Linked
Open Data (LLOD) cloud [27,28], first of all in English [29,30], Russian [31]
and Tatar [32] (Fig. 1).
Each concept contains its name in English, Russian and Tatar, axioms, relations with other
concepts, and links to external resources of the LOD cloud and educational ref-
erence databases.
The concepts are organized in two main hierarchies: the hierarchy of objects
and the hierarchy of reified relationships.
The top level of the hierarchy of objects consists of the following classes:
1. Plane Figure, with subclasses such as Line, Polygon, Ellipse, Angle, Median
of a Triangle or Circumscribed Circle.
2. Plane Geometry Statement, with subclasses such as Axiom of construction of
a circle with a given center and radius or Pythagorean Theorem.
3. Plane Geometry Problem with subclasses such as Problem of straightedge and
compass construction or Heron’s problem.
4. Plane Geometry Method with subclasses such as Constructing an additional
line for solving plane geometry problem.
5. Unit of Measurement, with subclasses such as Centimeter, Radian, or Square
meter.
6. Measurement and Construction Tool, with subclasses such as Protractor,
Astrolabe, T-square, Sliding T-bevel, or Marking gauge.
3. Has part relation. For example, any Vertex of a Triangle is a part of a Tri-
angle.
4. Aboutness relation that holds between a Statement and the subject matter
of this statement. For example, Heron’s formula is related to the Area of a
polygon concept.
5. Prerequisite relation. The concept A is called a prerequisite for the concept
B, if a learner must study the concept A before approaching the concept B.
In the first release of the ontology, these relations are introduced only indi-
rectly, in a coarse-grained manner, by arrangement of the concepts by successive
educational levels.
6. Belongs to educational level, which binds a concept and an educational level
(such as the age of learning) at which the concept is first introduced.
7. External resource, which interlinks a concept and an external Linked Open
Data or reference educational resource describing this concept. (A schematic
encoding of these relations is sketched below.)
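As a rough illustration of how such concepts and relations could look as Linked Open Data (a sketch of our own: the IRIs and property names below are placeholders, not the ontology's published vocabulary):

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import OWL, RDF, RDFS

ONTO = Namespace("http://example.org/ontomathedu#")  # placeholder IRI
g = Graph()

g.add((ONTO.Triangle, RDF.type, OWL.Class))
g.add((ONTO.Triangle, RDFS.label, Literal("triangle", lang="en")))
g.add((ONTO.Triangle, RDFS.label, Literal("треугольник", lang="ru")))
g.add((ONTO.Angle, RDF.type, OWL.Class))
# The prerequisite and educational-level relations described above
# (hasPrerequisite, belongsToEducationalLevel are hypothetical names):
g.add((ONTO.Triangle, ONTO.hasPrerequisite, ONTO.Angle))
g.add((ONTO.Triangle, ONTO.belongsToEducationalLevel, ONTO.Grade7))

print(g.serialize(format="turtle"))
```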
4 Linguistic Layer
The linguistic layer contains multilingual lexicons that provide linguistic ground-
ing of the concepts from the domain ontology layer.
Currently we are developing the Russian and English lexicons and are going to
develop a lexicon for Tatar.
A lexicon consists of:
– Lexical entries, denoting mathematical concepts. Examples of lexical entries
are “triangle”, “right triangle”, “side of a polygon”, “Riemann integral of f
over x from a to b”, “to intersect”, “to touch”, etc.
– Forms of lexical entries (in different numbers, cases, tenses, etc.).
– Syntactic trees of multi-word lexical entries.
– Syntactic frames of lexical entries. A syntactic frame represents the syntactic
behavior of a predicate, defining the set of syntactic arguments this predicate
requires and their mappings to ontological entities. For example, a syntactic
frame of the “to touch” verb determines that in “X touches Y at Z” phrase,
subject X represents a tangent line to a curve, direct object Y represents the
curve, and prepositional adjunct Z represents the point of tangency (a schematic
encoding of such an entry is sketched below).
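A schematic encoding of such an entry in the OntoLex-Lemon vocabulary mentioned below (a sketch of our own: only core ontolex properties are used, and the entry IRIs are placeholders):

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

ONTOLEX = Namespace("http://www.w3.org/ns/lemon/ontolex#")
LEX = Namespace("http://example.org/lexicon/en#")  # placeholder IRI

g = Graph()
g.add((LEX.touch, RDF.type, ONTOLEX.LexicalEntry))
g.add((LEX.touch, ONTOLEX.canonicalForm, LEX.touch_form))
g.add((LEX.touch_form, ONTOLEX.writtenRep, Literal("touch", lang="en")))
# The lexical sense points at the reified tangency relationship:
g.add((LEX.touch, ONTOLEX.sense, LEX.touch_sense))
g.add((LEX.touch_sense, ONTOLEX.reference, LEX.TangencyRelationship))
```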
The lexicons are expressed in terms of the Lemon [43,44], LexInfo, OLiA [45]
and PreMOn [46] ontologies.
Figure 6 represents an example of the “to touch” verb, its canonical form,
syntactic frame and lexical sense. The syntactic frame defines three arguments
of this verb: a subject, a direct object and an optional prepositional adjunct,
marked by the “at” preposition. The lexical sense defines a mapping of the verb
and its syntactic arguments to the corresponding ontological concepts. According
to the mapping, the verb denotes the reified relationship between a tangent
line and a curve, while the syntactic arguments express the participants of this
relationship: the subject expresses a tangent line to a curve, the direct object
expresses the curve, and the prepositional adjunct expresses the tangent point.
5 Conclusions
In this paper, we present the first release of OntoMathEdu, a new educational
mathematical ontology.
While there are many educational ontologies on the one hand, and several
mathematical ontologies on the other, to our knowledge, OntoMathEdu is the
first general-purpose educational mathematical ontology. Additionally, it is the
first Linked Open Data mathematical ontology, intended to: (1) respect ontologi-
cal distinctions provided by a foundational ontology; (2) represent mathematical
relationships as first-order entities; and (3) provide strong linguistic grounding
for the represented mathematical concepts.
Currently, our first priority is to release the linguistic layer of the ontology
that is still under development and hasn’t been published yet. After that, we will
extend the ontology to other fields of the secondary school mathematics curriculum,
such as Arithmetic, Algebra and Trigonometry.
Finally, we are going to apply the modeling principles drafted in this project
in the development of the new revised version of the ontology of professional
mathematics, OntoMathPRO.
Acknowledgements. The first part of the work, the development of the domain
ontology layer, was partially funded by RFBR, project no. 19-29-14084. The second
part of the work, the development of the linguistic layer, was funded by the Russian
Science Foundation, research project no. 19-71-10056.
References
1. Pereira, C.K., Matsui Siqueira, S.W., Nunes, B.P., Dietze, S.: Linked data in edu-
cation: a survey and a synthesis of actual research and future challenges. IEEE
Trans. Learn. Technol. 11(3), 400–412 (2018). https://doi.org/10.1109/TLT.2017.2787659
2. d’Aquin, M.: On the use of linked open data in education: current and future
practices. In: Mouromtsev, D., d’Aquin, M. (eds.) Open Data for Education. LNCS,
vol. 9500, pp. 3–15. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-30493-9_1
3. Taibi, D., Fulantelli, G., Dietze, S., Fetahu, B.: Educational linked data on the web
- exploring and analysing the scope and coverage. In: Mouromtsev, D., d’Aquin,
M. (eds.) Open Data for Education. LNCS, vol. 9500, pp. 16–37. Springer, Cham
(2016). https://doi.org/10.1007/978-3-319-30493-9_2
4. Nahhas, S., Bamasag, O., Khemakhem, M., Bajnaid, N.: Added values of linked
data in education: a survey and roadmap. Computers 7(3) (2018). https://doi.org/10.3390/computers7030045
5. Lange, C.: Ontologies and languages for representing mathematical knowledge on
the Semantic Web. Semant. Web 4(2), 119–158 (2013). https://doi.org/10.3233/SW-2012-0059
6. Elizarov, A.M., Kirillovich, A.V., Lipachev, E.K., Nevzorova, O.A., Solovyev, V.D.,
Zhiltsov, N.G.: Mathematical knowledge representation: semantic models and for-
malisms. Lobachevskii J. Math. 35(4), 348–354 (2014). https://doi.org/10.1134/S1995080214040143
7. Ginev, D., et al.: The SMGloM project and system: towards a terminology and
ontology for mathematics. In: Greuel, G.-M., Koch, T., Paule, P., Sommese, A.
(eds.) ICMS 2016. LNCS, vol. 9725, pp. 451–457. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-42432-3_58
8. Kohlhase, M.: A data model and encoding for a semantic, multilingual terminology
of mathematics. In: Watt, S.M., Davenport, J.H., Sexton, A.P., Sojka, P., Urban, J.
(eds.) CICM 2014. LNCS (LNAI), vol. 8543, pp. 169–183. Springer, Cham (2014).
https://doi.org/10.1007/978-3-319-08434-3_13
9. Nevzorova, O.A., Zhiltsov, N., Kirillovich, A., Lipachev, E.: OntoMathPRO ontol-
ogy: a linked data hub for mathematics. In: Klinov, P., Mouromtsev, D. (eds.)
KESW 2014. CCIS, vol. 468, pp. 105–119. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11716-4_9
10. Nevzorova, O., et al.: Bringing math to LOD: a semantic publishing platform
prototype for scientific collections in mathematics. In: Alani, H., et al. (eds.) ISWC
2013. LNCS, vol. 8218, pp. 379–394. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-41335-3_24
11. Elizarov, A.M., Lipachev, E.K., Nevzorova, O.A., Solov’ev, V.D.: Methods and
means for semantic structuring of electronic mathematical documents. Dokl. Math.
90(1), 521–524 (2014). https://doi.org/10.1134/S1064562414050275
12. Elizarov, A., Kirillovich, A., Lipachev, E., Nevzorova, O.: Digital ecosystem
OntoMath: mathematical knowledge analytics and management. In: Kalinichenko,
L., Kuznetsov, S.O., Manolopoulos, Y. (eds.) DAMDID/RCDL 2016. CCIS, vol. 706,
pp. 33–46. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-57135-5 3
13. Elizarov, A.M., Zhiltsov, N.G., Kirillovich, A.V., Lipachev, E.K., Nevzorova, O.A.,
Solovyev, V.D.: The OntoMath ecosystem: ontologies and applications for math
knowledge management. In: Semantic Representation of Mathematical Knowledge
Workshop, 5 February 2016. http://www.fields.utoronto.ca/video-archive/2016/
02/2053-14698
14. Elizarov, A., Kirillovich, A., Lipachev, E., Nevzorova, O.: Semantic formula
search in digital mathematical libraries. In: Proceedings of the 2nd Russia and
Pacific Conference on Computer Technology and Applications (RPC 2017), pp.
39–43. IEEE (2017). https://doi.org/10.1109/RPC.2017.8168063
15. Elizarov, A.M., Zhizhchenko, A.B., Zhil’tsov, N.G., Kirillovich, A.V., Lipachev,
E.K.: Mathematical knowledge ontologies and recommender systems for collections
of documents in physics and mathematics. Dokl. Math. 93(2), 231–233 (2016).
https://doi.org/10.1134/S1064562416020174
16. Barana, A., Di Caro, L., Fioravera, M., Marchisio, M., Rabellino, S.: Ontology
development for competence assessment in virtual communities of practice. In:
Penstein Rosé, C., et al. (eds.) AIED 2018, Part II. LNCS (LNAI), vol. 10948, pp.
94–98. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-93846-2 18
17. Barana, A., Di Caro, L., Fioravera, M., Floris, F., Marchisio, M., Rabellino, S.:
Sharing system of learning resources for adaptive strategies of scholastic remedial
intervention. In: Proceedings of the 4th International Conference on Higher Edu-
cation Advances (HEAd 2018), pp. 1495–1503. Editorial Universitat Politècnica de
València (2018). https://doi.org/10.4995/HEAd18.2018.8232
18. Marchisio, M., Di Caro, L., Fioravera, M., Rabellino, S.: Towards adaptive sys-
tems for automatic formative assessment in virtual learning communities. In: Sorel
Reisman, et al. (eds.) Proceedings of the 42nd IEEE Annual Computer Software
and Applications Conference (COMPSAC 2018), pp. 1000–1005. IEEE (2018).
https://doi.org/10.1109/COMPSAC.2018.00176
19. Barana, A., Di Caro, L., Fioravera, M., Floris, F., Marchisio, M., Rabellino,
S.: Developing competence assessment systems in e-learning communities. In:
Volun-geviciene, A., Szücs, A. (eds.) Proceedings of the European Distance and E-
Learning Network 2018 Annual Conference: Exploring the Micro, Meso and Macro
(EDEN 2018), pp. 879–888. EDEN (2018)
20. Kirillovich, A., Nevzorova, O., Falileeva, M., Lipachev, E., Shakirova, L.:
OntoMathEdu: towards an educational mathematical ontology. In: Kaliszyk, C.,
et al. (eds.) Workshop Papers at 12th Conference on Intelligent Computer Math-
ematics (CICM-WS 2019). CEUR Workshop Proceedings (2019, forthcoming)
21. Ranta, A.: Syntactic categories in the language of mathematics. In: Dybjer, P.,
Nordström, B., Smith, J. (eds.) TYPES 1994. LNCS, vol. 996, pp. 162–182.
Springer, Heidelberg (1995). https://doi.org/10.1007/3-540-60579-7 9
22. Guizzardi, G.: Ontological Foundations for Structural Conceptual Models. CTIT,
Enschede (2005)
23. Lehmann, J., et al.: DBpedia: a large-scale, multilingual knowledge base extracted
from Wikipedia. Semant. Web J. 6(2), 167–195 (2015). https://doi.org/10.3233/
SW-140134
24. Astafiev, A., Prokofyev, R., Guéret, C., Boyarsky, A., Ruchayskiy, O.: Science-
WISE: a web-based interactive semantic platform for paper annotation and ontol-
ogy editing. In: Simperl, E., et al. (eds.) ESWC 2012. LNCS, vol. 7540, pp. 392–396.
Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-46641-4 33
25. Müller, D., Gauthier, T., Kaliszyk, C., Kohlhase, M., Rabe, F.: Classification of
alignments between concepts of formal mathematical systems. In: Geuvers, H.,
England, M., Hasan, O., Rabe, F., Teschke, O. (eds.) CICM 2017. LNCS (LNAI),
vol. 10383, pp. 83–98. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-
62075-6 7
26. Dehaye, P.-O., et al.: Interoperability in the OpenDreamKit project: the math-in-
the-middle approach. In: Kohlhase, M., Johansson, M., Miller, B., de de Moura,
L., Tompa, F. (eds.) CICM 2016. LNCS (LNAI), vol. 9791, pp. 117–131. Springer,
Cham (2016). https://doi.org/10.1007/978-3-319-42547-4 9
27. McCrae, J.P., et al.: The open linguistics working group: developing the linguistic
linked open data cloud. In: Calzolari N., et al. (eds.) Proceedings of the 10th
International Conference on Language Resources and Evaluation (LREC 2016),
pp. 2435–2441. ELRA (2016)
28. Cimiano, P., Chiarcos, C., McCrae, J.P., Gracia, J.: Linguistic linked open data
cloud. Linguistic Linked Data, pp. 29–41. Springer, Cham (2020). https://doi.org/
10.1007/978-3-030-30225-2 3
29. McCrae, J.P., Fellbaum, C., Cimiano, P.: Publishing and linking WordNet using
lemon and RDF. In: Chiarcos, C., et al. (eds.) Proceedings of the 3rd Workshop
on Linked Data in Linguistics (LDL-2014), pp. 13–16. ELRA (2014)
30. Ehrmann, M., Cecconi, F., Vannella, D., McCrae, J., Cimiano, P., Navigli, R.:
Representing multilingual data as linked data: the case of BabelNet 2.0. In: Calzo-
lari N., et al. (eds.) Proceedings of the 9th International Conference on Language
Resources and Evaluation (LREC 2014), pp. 401–408. ELRA (2014)
31. Kirillovich, A., Nevzorova, O., Gimadiev, E., Loukachevitch, N.: RuThes Cloud:
towards a multilevel linguistic linked open data resource for Russian. In: Różewski,
P., Lange, C. (eds.) KESW 2017. CCIS, vol. 786, pp. 38–52. Springer, Cham (2017).
https://doi.org/10.1007/978-3-319-69548-8 4
32. Galieva, A., Kirillovich, A., Khakimov, B., Loukachevitch, N., Nevzorova, O.,
Suleymanov, D.: Toward domain-specific Russian-tatar thesaurus construction. In:
Proceedings of the International Conference IMS-2017, pp. 120–124. ACM (2017).
https://doi.org/10.1145/3143699.3143716
33. Ganesalingam, M.: The Language of Mathematics, vol. 7805. Springer, Heidelberg
(2013). https://doi.org/10.1007/978-3-642-37012-0
34. Guarino, N., Welty, C.A.: A formal ontology of properties. In: Dieng, R., Corby,
O. (eds.) EKAW 2000. LNCS (LNAI), vol. 1937, pp. 97–112. Springer, Heidelberg
(2000). https://doi.org/10.1007/3-540-39967-4 8
35. Noy, N., Rector, A.: Defining N-ary Relations on the Semantic Web. W3C Working
Group Note, 12 April 2006. https://www.w3.org/TR/swbp-n-aryRelations/
36. Borgo, S., Masolo, C.: Ontological foundations of DOLCE. In: Poli, R., Healy,
M., Kameas, A. (eds.) Theory and Applications of Ontology: Computer Applica-
tions, pp. 279–295. Springer, Dordrecht (2010). https://doi.org/10.1007/978-90-
481-8847-5 13
37. Borgo, S., Masolo, C.: Foundational choices in DOLCE. In: Staab, S., Studer, R.
(eds.) Handbook on Ontologies. IHIS, pp. 361–381. Springer, Heidelberg (2009).
https://doi.org/10.1007/978-3-540-92673-3 16
38. Gangemi, A., Mika, P.: Understanding the semantic web through descriptions and
situations. In: Meersman, R., Tari, Z., Schmidt, D.C. (eds.) OTM 2003. LNCS,
vol. 2888, pp. 689–706. Springer, Heidelberg (2003). https://doi.org/10.1007/978-
3-540-39964-3 44
39. Brasileiro, F., Almeida, J.P.A., Carvalho, V.A., Guizzardi, G.: Expressive multi-
level modeling for the semantic web. In: Groth, P., et al. (eds.) ISWC 2016, Part I.
LNCS, vol. 9981, pp. 53–69. Springer, Cham (2016). https://doi.org/10.1007/978-
3-319-46523-4 4
40. Carvalho, V.A., Almeida, J.P.A., Fonseca, C.M., Guizzardi, G.: Multi-level
ontology-based conceptual modeling. In: Data & Knowledge Engineering, vol. 109,
pp. 3–24, May 2017. https://doi.org/10.1016/j.datak.2017.03.002
41. Kirillovich, A., Nevzorova, O.: Ontological analysis of the Wikipedia category sys-
tem. In: Aveiro, D., et al. (eds.) Proceedings of the 10th International Joint Con-
ference on Knowledge Discovery, Knowledge Engineering and Knowledge Man-
agement (IC3K 2018), Seville, Spain, 18–20 September 2018. KEOD, vol. 2, pp.
358–366. SCITEPRESS (2018)
42. Guarino, N., Welty, C.A.: An overview of OntoClean. In: Staab, S., Studer, R.
(eds.) Handbook on Ontologies. IHIS, pp. 201–220. Springer, Heidelberg (2009).
https://doi.org/10.1007/978-3-540-92673-3 9
43. Cimiano, P., McCrae, J.P., Buitelaar, P.: Lexicon model for ontologies. Final Com-
munity Group Report, 10 May 2016. https://www.w3.org/2016/05/ontolex/
44. McCrae, J.P., Bosque-Gil, J., Gracia, J., Buitelaar, P., Cimiano, P.: The OntoLex-
Lemon model: development and applications. In: Kosem I., et al. (eds.) Proceedings
of the 5th biennial conference on Electronic Lexicography (eLex 2017), pp. 587–597.
Lexical Computing CZ (2017)
45. Chiarcos, C.: OLiA - ontologies of linguistic annotation. Semant. Web 6(4), 379–
386 (2015). https://doi.org/10.3233/SW-140167
46. Rospocher, M., Corcoglioniti, F., Palmero Aprosio, A.: PreMOn: LODifing linguis-
tic predicate models. Lang. Resour. Eval. 53(3), 499–524 (2018). https://doi.org/
10.1007/s10579-018-9437-8
Frame IT: Detangling Knowledge
Management from Game Design
in Serious Games
1 Introduction
Serious games could be a solution to the often-diagnosed problem that tradi-
tional education via personal instruction and educational documents has seri-
ous scalability, subject specificity, and motivation limitations. A serious game
is “a mental contest, played with a computer in accordance with specific rules,
that uses entertainment to further government or corporate training, education,
health, public policy, and strategic communication objectives” [Zyd05]. Beyond
educational games for students, the term “Serious Game” is used for games that
help to acquire skills in general. This includes training professionals of basically
all industry sectors.
Serious games have the power to effectively supplement technical documents
and online courses and thereby allow students to learn how to apply their knowl-
edge to real world scenarios. Moreover, serious games very elegantly solve the
motivation problem many people experience when studying technical subjects.
Through gamification [Det+11] a serious game can be very entertaining while
at the same time providing educational value to the user.
Unfortunately, serious games for complex subjects like science, technology,
engineering, and mathematics (STEM) are currently very complex, domain-
specific, and expensive even though their motivational effects could be disruptive
right in these areas. Even more seriously, developers of such games need to com-
bine the skill sets of game development, pedagogy, and domain expertise, a rare
combination indeed.
To alleviate this, we propose the Frame IT Method, which – instead of
using ad-hoc methods for dealing with the underlying STEM domain knowledge
in the game – uses established mathematical knowledge management (MKM)
techniques and implementations. It loosely couples a game engine for interact-
ing with virtual worlds with the Mmt system, which performs knowledge rep-
resentation and management services, thus detangling the domain knowledge
integration from the game development process. The main mechanism involved
is the maintenance of a mapping between objects of the virtual world and their
properties (“facts”), which are formally represented in OMDoc/Mmt. On this
basis, learning objects in the form of represented theorem statements can i)
be visualized in the game world (“scrolls”) for the player to understand, ii) be
instantiated by the player by assigning a game object to every required assump-
tion, and iii) together with their instantiations, be represented in Mmt as
OMDoc/Mmt views. The latter enables validity checking and computation of
results which can then be transferred back into the game world bringing things
full circle.
The Frame IT Method is supposed to increase a player’s understanding of
formulae by making them apply such abstract formulae in concrete settings within a game world. To fulfill a formula's assumptions, the player has to
perform a combination of selecting, moving, and generating game objects. With
the help of OMDoc/Mmt in the background and back-and-forth synchroniza-
tion, concrete outcomes of formula applications can immediately be visualized
for the user in the game world, too.
The Tree Example. At this point, we would like to introduce a running example
of an in-game word problem for a serious game. We use this problem in our seri-
ous game prototype as well throughout this document to progressively explain
the Frame IT Method.
Concretely, the player is presented a tree in a forested 3D world and is asked to determine its height using a limited set of gadgets, each of which provides facts about the world, e.g. acquirable angles and lengths from the player's perspective (cf. Fig. 1). The intended solution is to frame this problem in the language of trigonometry as finding the length of the opposite side given an angle and the adjacent side. Other solutions are also possible, e.g. choosing an isosceles 45°-45°-90° triangle, for which both legs of the triangle then have the same length.

Fig. 1. Example problem
Didactically, the game world is rigged so that the gadgets produce only facts
acquirable from the player’s perspective. For instance, they cannot climb the
tree, and hence the provided measuring tape gadget disallows measuring the
tree’s height. Instead, the user is expected to use scrolls to discover new facts
about the world in alternative ways. In the problem at hand, such a scroll on
trigonometry could provide the length of the opposite side of a right-angled
triangle given an angle and the length of an adjacent side – both of which are
acquirable from the player’s perspective.
(The name “scroll” is meant to evoke the fact that the knowledge contained in it is a valuable commodity in the game.)
2 Preliminaries
For the concept and implementation of the Frame IT Method, we require an
MKM system capable of storing, relating, and combining knowledge items in a
structured knowledge graph. To the best of our knowledge, besides the Mmt
system, the only other systems supporting this sufficiently are Hets [MML07]
and Specware [SPEC].
In the second step, the user explores the virtual world and experiments with
the given facts and scrolls. In some serious games, this happens off-band by the
player with pen and paper. By contrast, the Frame IT Method actively encourages
in-game exploration and even requires it to solve puzzles. World exploration can
involve marking new points and lines within the world, possibly guided by scrolls
like the OppositeLen scroll our player has been presented with. Concretely, we imagine
they use the pointer gadget in the game UI to mark a point E on the ground
and the line gadget to mark a triangle through E and the tree’s endpoints.
Moreover, they measure ∠GEF = 45° and ∠EFG = 90° using some protractor
gadget. On the side of the MKM system in Fig. 5, we see that the collected facts
are communicated to the MKM system as soon as they are created: the situation
theory grows.
In the third step, the player frames the in-game word problem in terms of
the OppositeLen scroll by mapping every scroll input to a game world object.
Here, the inputs for the point facts a, b, c, the enclosed angle ∠cab, and the right angle ∠abc are mapped to the facts E, F, G, ∠GEF = 45°, and ∠EFG = 90°,
respectively. This assignment is communicated to the MKM system which estab-
lishes that it constitutes a view – we call it the application view (cf. Fig. 6).
Critically for our serious game use case, it establishes the precondition that abc
is a right-angled triangle which justifies the application of the OppositeLen scroll.
If the player frames the game problem with an assignment that does not lead
to a view – e.g. if the ground the tree stands on is sloped and thus the angle
∠EFG is different from 90° – the MKM system will reject the framing and can
pinpoint exactly where the error lies.
In the final step (cf. Fig. 7), the MKM system computes the pushout of the
application view over the inclusion of the problem into the solution theory. More-
over, it simplifies terms, computes values, and reports to the game engine that
the user has solved the puzzle. Concretely, success was determined by checking
whether the fact |FG| simplifies to a numeric value in the context of the pushout
theory. This formal notion corresponds to the intuitive puzzle objective of finding
that length.
Having solved the puzzle, the player can now proceed to choose a new puzzle
to play. Importantly, the knowledge gained so far is not thrown away, but kept
for future use by the player. For example, in subsequent puzzles the player can
use the tree’s height as input for other scrolls. This effect is easily achieved by
updating the pointer to the situation theory to the computed pushout theory in
the course of the last step.
Common to all ways of obtaining facts is that upon acquisition they are syn-
chronized with the MKM system. Namely, it is supposed to serve as a single
source of truth for all knowledge items. We will discuss the implementation of
an appropriate framework next.
Gadgets and Facts. On a technical level, gadgets consist of the following parts:
– To identify tools within the game, they need graphical representations.
Currently, we only use a planar icon for the UI, but in the future, we plan to
have 3D objects to show the gadgets in the virtual world.
– The activation of a gadget triggers gadget events that initialize or update
its internal state. These events are used for communication between the player
and the gadget.
– Gadgets give feedback to the player via gadget visual effects, e.g. for show-
ing assisting previews during fact creation.
– Finally, gadgets trigger fact events to initiate the creation of the appropriate
facts.
Facts are managed by Mmt but, just as gadgets, they have graphical compo-
nents: a Unity GameObject for interaction in the virtual world and an icon for
interaction in the UI.
In order to develop a new gadget, there are three main modules which have to
be extended: FactManager, FactSpawner and VisualEffectsManager. These mod-
ules cope with the different gadget parts described above. The FactManager is
aware of the currently active gadget and handles the gadget-specific inputs made
by the player. If necessary, it delegates work to the other modules. For instance,
when a gadget was used successfully, it updates the global fact list (by addi-
tion or removal) and triggers the FactSpawner to arrange for the facts’ in-game
visualization. Moreover, for visualizing assisting previews in the course of using
a gadget, the FactManager delegates and transmits the necessary data to the
VisualEffectsManager. All of the modules assume that suitable fact types and
gadgets producing instances of them have previously been established. Addi-
tionally, every fact type needs to be given a formalized counterpart on the Mmt
side. Hence, if a new gadget exceeds the current range of functionality, these
parts may also need adaptation.
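The following Scala sketch (our illustration, not the actual Unity/C# implementation; Fact, Gadget, and GadgetEvent are assumed placeholder types) shows how such a module split can look:

trait Fact
trait GadgetEvent
trait Gadget { def handle(e: GadgetEvent): Option[Fact] }

// FactManager knows the currently active gadget, forwards preview data
// (standing in for the VisualEffectsManager), and, when the gadget
// successfully produces a fact, updates the global fact list and lets
// the FactSpawner arrange the in-game visualization.
class FactManager(spawn: Fact => Unit, preview: GadgetEvent => Unit) {
  private var facts = List.empty[Fact]  // global fact list
  var active: Option[Gadget] = None     // currently active gadget
  def onInput(e: GadgetEvent): Unit = {
    preview(e)                          // assisting previews during fact creation
    active.flatMap(_.handle(e)).foreach { fact =>
      facts ::= fact                    // update the global fact list
      spawn(fact)                       // trigger in-game visualization
    }
  }
}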
Framing UI. On the lower edge of the screen, players can find the Gadget Tool-
bar, which allows access and activation of the respective gadgets. To interact
with the measured facts, the user can activate an overlay that freezes the under-
lying game and gives access to framing (cf. Fig. 8). Facts are depicted as small
tiles and are collected in the fact inventory on the top left. Complementarily,
available scrolls are shown on the right edge, of which the currently active scroll
is shown beneath the fact inventory. Players can then fill the scrolls with facts via
drag & drop. When the player clicks the “Magic” button, UFrameIT constructs
and transfers the application view to Mmt, which computes the pushout and,
after successful verification, hands back the resulting facts.
4.2 Communication
To allow Mmt to process information and give feedback according to the
Frame IT Method, we use a very fine-grained communication approach. The back-
end server provides a RESTful-interface with endpoints to add facts (one end-
point per fact type), generate views, request pushout computations, and to list
available scrolls. The corresponding payloads are transmitted in the JSON data
format. There are three different types of events that trigger communication
with the server:
– Game World Triggers: These automatically send requests during interac-
tion with the game world but are not used for our simple example.
– Fact List Modification: We report all changes to the fact list to the server.
Most prominently, these changes are triggered by gadgets. Each gadget-generated fact entails sending an HTTP request including the fact details to the server (see the sketch after this list). On the Mmt side, the putative fact is first checked for validity,
then, upon success, stored as corresponding declaration(s) in the situation
theory, and lastly, its generated declaration identifier is sent back to Unity.
– Attempt of Scroll Application: When the player tries to apply a scroll,
a test for applicability is started: The mappings of the filled scroll are sent
to the server and packaged into a putative view by Mmt. The latter is then
run through the type checker, whose outcome is reported back to the game
engine. Upon success, the game engine requests the pushout computation wrt.
the Problem/Solution theory pair representing the current scroll and updates
the UI with the results.
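As an illustration, a gadget-generated distance fact could be reported with a request like the following Scala sketch using the JDK's built-in java.net.http client. This is our own example, not the actual UFrameIT API: the endpoint path, port, and JSON field names are hypothetical.

import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

object FactClient {
  def main(args: Array[String]): Unit = {
    // Hypothetical payload for a distance fact between two named points.
    val payload = """{"pointA": "E", "pointB": "F", "value": 5.0}"""
    val request = HttpRequest.newBuilder()
      .uri(URI.create("http://localhost:8085/fact/distance")) // illustrative endpoint
      .header("Content-Type", "application/json")
      .POST(HttpRequest.BodyPublishers.ofString(payload))
      .build()
    // The server validates the putative fact, stores it in the situation
    // theory, and answers with the generated declaration identifier.
    val response = HttpClient.newHttpClient()
      .send(request, HttpResponse.BodyHandlers.ofString())
    println(response.body())
  }
}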
Gadgets and Facts in FrameWorld-1. Gadgets are the core way of interact-
ing with the world; for FrameWorld-1 we created three gadgets. The pointer
gadget marks a point in the game world and produces a new fact that
declares a labeled point. Upon gadget activation, objects in the environment
that shouldn’t be markable, e.g. the sky or existing points, are set to be
ignored. Moreover, snap zones are activated. Placing a point within these
zones positions it exactly at the center of the zone, which is necessary to
accurately mark the root and the top of the tree. The user can then relate
two or three different points by measuring the distance between them with a
measuring tape gadget or the angle between them with the laser angle
finder. An angle is defined by the selection of three existing point objects.
Every single selection triggers an event that updates the internal state of the gadget. After the second point is selected, we preview the angle by following the mouse pointer until the third point has been fixed (cf. Fig. 9). Distance measuring is implemented analogously, in this case with a preview line following the cursor. Importantly, we let the line only follow the cursor up to the height of the player and prevent connection with points which are higher than that. Even though these three gadgets were developed for FrameWorld-1, it is clear that they are generally useful for problems based on 3D geometry and can thus be shared with subsequent games.

Fig. 9. Measuring angles
Formalization in Detail. Recall that scrolls may represent theorems and in those
cases they should only be applicable on situations fulfilling the theorem’s precon-
ditions. Fortunately, we can leverage Mmt as an MKM system to enforce such
conditions. For example, our background theory provides us with a separate distance type for every pair of points and every real distance value. Using such a distance type
for distBC in the problem theory allows us to enforce that only correct distance
facts get mapped to it. For instance, a putative view mapping a distance fact for
|AC| or even |CB| to distBC would lead to a typing error. We follow a similar
approach for angle facts (cf. angleABC and angleBCA). Note that for angleBCA
we fix the only correct angle of 90.0◦ directly in the type. In contrast, for the
previous distance fact distBC and for angleABC, we used extra (unconstrained)
declarations that make the actual value being mapped as the distance and the
angle freely selectable. After all, these values are universally quantified over in
the theorem statement.
Taking a step back from these practical experiences, we return to a conceptual
level in the next section and evaluate the Frame IT Method.
6 Conceptual Evaluation
In the introduction, we have already given an account on how our approach fits
on the spectrum of knowledge management in serious games. Now we evaluate
it relative to two aspects in which we deviate from the other approaches.
First, we employ a dedicated mechanism for knowledge management instead
of handling knowledge within source code. This is similar to GeoGebra, and in
Further advantages stem from using a very modular and expressive sys-
tem like an MKM system. Again, GeoGebra, which uses a computer algebra system, heads in a similar direction to ours. On the other hand, approaches
reimplementing such business logic in source code, such as PhET simulations, are
arguably more flexible, but not necessarily modularly so. The following features
can profitably be imported from an MKM system:
– Dependency Handling: The MKM system can be used to track formalized
dependencies of game world objects that have been given a suitable coun-
terpart on that side. Thus, after knowledge integration, developers can often avoid reimplementing this kind of relation handling.
– Feedback: The MKM system can detect at which point a player’s solution
fails and to some extent also why. This allows giving feedback that helps players
to spot and rectify problems while solving puzzles.
– Multiple Solutions: With careful implementation of puzzle objectives in
the MKM system, the game can be made agnostic to solution paths. Thus,
if there are multiple ways to complete the game, the user is free to do so by
default.
– Compound Problems/Solutions: By treating facts and puzzle objectives
in a uniform way, we can naturally construct compound problems asking for
facts to be obtained by subproblems. We have presented a simple example,
but it is not difficult to think of more advanced examples that require multiple
scroll applications.
Nonetheless, employing a separate MKM system also introduces potential
issues. In more complex games the sheer number of communication requests
might impact game performance. Additionally, explicit modeling of background
knowledge entails accounting for edge cases, which can be worked around in
traditional (code-the-behavior) approaches.
7 Conclusion
We have presented a novel application of MKM technology: knowledge manage-
ment in serious games, which we call the Frame IT Method. This principle eases the creation of games which, for instance, teach the application of simple
mathematical models in geometry by instantiating them in virtual worlds. To
realize the Frame IT Method, we have created an interface between the Mmt
system and Unity. This prototype implementation shows that combining a game
engine with an MKM system is not only possible but indeed useful: The explicit
representation of the underlying domain knowledge and the game world’s sit-
uation in the MKM system allow for checking the applicability of the model
on the MKM side. Consequently, our approach creates separated workflows and
encourages reuse of content.
We have instantiated the UFrameIT framework to obtain FrameWorld-1, a
simple serious game which challenges players to solve basic geometric problems
using “scrolls” derived from 3D geometry and trigonometry. In accordance with
our goals, our framework allowed us to formalize knowledge in Mmt largely inde-
pendently from the remaining game development. Dually, we were also able to
implement the game itself generically by building user-interface “gadgets”, with-
out necessitating domain expertise in geometry.
References
[Det+11] Deterding, S., et al.: From game design elements to gamefulness: defin-
ing “Gamification”. In: Proceedings of the 15th International Academic
MindTrek Conference. MindTrek 2011, pp. 9–15. ACM, New York (2011).
https://doi.org/10.1145/2181037.2181040
[GG] International GeoGebra Institute. Graphing Calculator - GeoGebra, 27 May
2020. https://www.geogebra.org
[MitM] MitM/core, 18 Jan 2020. https://gl.mathhub.info/MitM/core
[MML07] Mossakowski, T., Maeder, C., Lüttich, K.: The heterogeneous tool set, Hets.
In: Grumberg, O., Huth, M. (eds.) TACAS 2007. LNCS, vol. 4424, pp. 519–
522. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-71209-
1 40
[MR19] Müller, D., Rabe, F.: Rapid prototyping formal systems in MMT: 5 case
studies. In: LFMTP 2019. Electronic Proceedings in Theoretical Computer
Science (EPTCS) (2019). https://kwarc.info/people/frabe/Research/MR
prototyping 19.pdf
[PhET] University of Colorado. PhET Interactive Simulations, 27 May 2020. https://
phet.colorado.edu
[Rab13] Rabe, F.: The MMT API: a generic MKM system. In: Carette, J., et al. (eds.)
Intelligent Computer Mathematics. Lecture Notes in Computer Science, vol.
7961, pp. 339–343. Springer, Heidelberg (2013). https://doi.org/10.1007/978-
3-642-39320-4
[RK13] Rabe, F., Kohlhase, M.: A scalable module system. Inf. Comput. 230, 1–54 (2013). https://kwarc.info/frabe/Research/mmt.
pdf
[RKM16] Rochau, D., Kohlhase, M., Müller, D.: FrameIT reloaded: serious math
games from modular math ontologies. In: Kohlhase, M., et al. (ed.) Intelli-
gent Computer Mathematics - Work in Progress Papers (2016). http://ceur-
ws.org/Vol-1785/W50.pdf
[SPEC] Kestrel Institute. The Specware System, 27 May 2020. https://www.kestrel.
edu/home/projects/specware/index.html
[UFM] Formalizations for UFrameIT FrameWorld, 19 March 2020. https://gl.
mathhub.info/FrameIT/FrameWorld
[Uni] Unity Technologies. Unity Realtime Development Platform. Version
2019.3.6, 19 March 2020. https://unity.com/
[Zyd05] Zyda, M.: From visual simulation to virtual reality to games. Computer
38(9), 25–32 (2005). https://doi.org/10.1109/MC.2005.297
Formalizing Graph Trail Properties in
Isabelle/HOL
1 Introduction
The problem of finding a longest trail with strictly increasing or strictly decreas-
ing weights in an edge-weighted graph is an interesting graph theoretic prob-
lem [3,7,8,14], with potential applications to scheduling and cost distribution
in traffic planning and routing [5]. In this paper, we formalize and automate
the reasoning about strictly increasing and strictly decreasing trail properties by
developing an extendable flexible library in the proof assistant Isabelle/HOL [11].
As a motivating example consider the following (undirected) graph K4, where each edge is annotated with a different integer-valued weight ranging from 1 to 6:

Fig. 1. The complete graph K4 on vertices v1, v2, v3, v4 with distinct edge weights 1, . . . , 6
existing works in this area. In particular, the first to raise the question of the
minimum length of strictly increasing trails of arbitrary graphs were Chvátal
and Komlós [4]. Subsequently, Graham and Kleitman [8] proved that the lower bound on the length of increasing trails is given by 2 · ⌊q/n⌋, as also mentioned above. In our work, we formalize and verify such results in Isabelle/HOL. Yet,
our work is not a straightforward adaptation and formalization of Graham and Kleitman’s proof [8]. Rather, we focus on decreasing trails instead of increasing
trails and give an algorithm computing longest decreasing trails of a given graph
(Algorithm 1). By formalizing Algorithm 1 in Isabelle/HOL, we also formally
verify the correctness of the trails computed by our approach. Moreover, we prove
that any strictly decreasing trail is also a strictly increasing one, allowing this way to use our formalization in Isabelle/HOL also to formalize results of Graham and Kleitman [8].
Contributions. This paper brings the following contributions.
(1) We formalize strictly increasing trails and provide basic lemmas about their
properties. We improve results of [8] by giving a precise bound on the
increase of trail length.
(2) We formalize strictly decreasing trails, in addition to the increasing trail
setting of [8]. We prove the duality between strictly increasing and strictly
decreasing trails, that is, any such decreasing trail is an increasing one, and
vice versa. Thanks to these extensions, unlike [8], we give a constructive
proof of the existence of strictly ordered trails (Lemma 1).
(3) We design an algorithm computing longest ordered trails (Algorithm 1), and
formally verify its correctness in Isabelle/HOL. We extract our algorithm
to Haskell program code using Isabelle’s program extraction tool. Thus, we
obtain a fully verified algorithm to compute the length of strictly-ordered
trails in any given graph and weight distribution.
(4) We verify the lower bound on the minimum length of strictly decreasing
trails of arbitrary graphs, and of complete graphs in particular.
(5) We build upon the Graph-Theory library by Noschinski [12], which is part of the Archive of Formal Proofs (AFP) and already includes many results on walks and general properties of graphs. We introduce the digital dataset formalizing properties of graph trails. Our dataset consists of ∼2000 lines of
Isabelle code and it took about one month for one person to finish. As far as
we know this is the first formalization of ordered trails in a proof assistant.
This paper was generated from Isabelle/HOL source code using Isabelle’s
document preparation tool and is therefore fully verified. The source code is
available online at https://github.com/Lachnitt/Ordered Trail. The rest of the
paper is organized as follows. Section 2 recalls basic terminology and properties
from graph theory. We prove lower bounds on strictly increasing/decreasing
trails in Sect. 3. We describe our formalization in Isabelle/HOL in Sect. 4. We discuss further directions in Sect. 5 and conclude our paper with
Sect. 6.
2 Preliminaries
We briefly recapitulate the basic notions of graph theory. A graph G = (V, E)
consists of a set V of vertices and a set E ⊆ V ×V of edges. A graph is undirected
if (v1 , v2 ) ∈ E implies that also (v2 , v1 ) ∈ E. A graph is complete if every pair
of vertices is connected by an edge. A graph is loopfree or simple if there are no
edges (x, x) ∈ E and finite if the number of vertices |V | is finite. Finally, we call
a graph G′ = (V′, E′) a subgraph of G = (V, E) if V′ ⊆ V and E′ ⊆ E.
If a graph is equipped with a weight function w : E → R that maps edges
to real numbers, it is called an edge-weighted graph. In the following, whenever
a graph is mentioned it is implicitly assumed that this graph comes equipped
with a weight function. A vertex labelling is a function L : V → N.
A trail of length k in a graph G = (V, E) is a sequence (e1, . . . , ek), ei ∈ E, of distinct edges such that there exists a corresponding sequence of vertices (v0, . . . , vk) where ei = (vi−1, vi). A strictly decreasing trail in an edge-weighted graph G = (V, E) with weight function w is a trail such that w(ei) > w(ei+1). Likewise, a strictly increasing trail is a trail such that w(ei) < w(ei+1). A trail
is strictly-ordered if it is strictly increasing or strictly decreasing.
We will denote the length of a longest strictly increasing trail with Pi (w, G).
Likewise we will denote the length of a longest strictly decreasing trail with
Pd (w, G). In any undirected graph, it holds that Pi (w, G) = Pd (w, G), a result
that we will formally verify in Sect. 4.2.
Let fi(n) = minw Pi(w, Kn) denote the minimum length of a strictly increasing trail that must exist in the complete graph with n vertices. Likewise, fd(n) = minw Pd(w, Kn) in the case that we consider strictly decreasing trails.
In Fig. 2 the example graph from Fig. 1 is revisited to illustrate these definitions.

Fig. 2. Decreasing trails from v3 in the example graph are: v3 − v4; v3 − v1 − v2; v3 − v2 − v1; v3 − v2 − v4 − v3. Therefore, L^5(v3) = 3.

We need to prove the following property.
Lemma 1. If i < q, then ∑_{v∈V} L^{i+1}(v) ≥ ∑_{v∈V} L^i(v) + 2.
Proof. Let e be the edge labelled with i + 1 and denote its endpoints with u1 and u2. It holds that E^i ∪ {e} = E^{i+1}, therefore the graph G^{i+1} is G^i with the additional edge e. As w(e′) < w(e) for all e′ ∈ E^i, we have L^{i+1}(v) = L^i(v) for all v ∈ V with u1 ≠ v, u2 ≠ v. It also holds that L^{i+1}(u1) = max(L^i(u2) + 1, L^i(u1)), because either the longest trail from u2 can be prolonged with edge e into a trail starting at u1 (i + 1 is greater than the weight of the first edge of that trail by construction of L^{i+1}), or there is already a longer trail starting from u1 not using e. We derive L^{i+1}(u2) = max(L^i(u1) + 1, L^i(u2)) based on a similar reasoning. See Fig. 3 for an illustration.

Note that L^{i+1}(v) = L^i(v) for v ∈ V \ {u1, u2}, because no edge incident to these vertices was added and a trail starting from them cannot be prolonged since the new edge has bigger weight than any edge in such a trail.

If L^i(u1) = L^i(u2), then L^{i+1}(u1) = L^i(u1) + 1 and L^{i+1}(u2) = L^i(u2) + 1, and thus the sum increases exactly by 2. If L^i(u1) > L^i(u2), then L^{i+1}(u2) = L^i(u1) + 1 ≥ L^i(u2) + 2; otherwise L^{i+1}(u1) = L^i(u2) + 1 ≥ L^i(u1) + 2. Thus,

∑_{v∈V} L^{i+1}(v) = ∑_{v∈V∖{u1,u2}} L^{i+1}(v) + L^{i+1}(u1) + L^{i+1}(u2)
                  ≥ ∑_{v∈V∖{u1,u2}} L^{i+1}(v) + L^i(u1) + L^i(u2) + 2
                  = ∑_{v∈V} L^i(v) + 2.
Fig. 3. Situation before adding edge e, with labels L^i(u1) and L^i(u2)
Note that the proof of Lemma 1 is constructive, yielding Algorithm 1 for computing longest strictly decreasing trails. Function findEndpoints searches for an edge in a graph G by its weight i and returns both endpoints. Function findMax returns the maximum value of the array L. We give an executable sketch of this procedure below Algorithm 1.
Algorithm 1: Find Longest Strictly Decreasing Trail
for v ∈ V do
    L(v) := 0
end
for i := 1; i ≤ |E|; i++ do
    (u, v) := findEndpoints(G, i);
    temp := L(u);
    L(u) := max(L(v) + 1, L(u));
    L(v) := max(temp + 1, L(v));
end
return findMax(L);
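To make the procedure concrete, here is a minimal executable sketch of Algorithm 1 in Scala (our illustration; the names and the edge representation are assumptions, not part of the paper's Isabelle development):

// Vertices are 0 until n; edges(i) holds the endpoints of the edge with
// weight i + 1, so iterating over edges processes weights in increasing order.
def longestDecreasingTrail(n: Int, edges: Seq[(Int, Int)]): Int = {
  val L = Array.fill(n)(0)            // L(v): longest decreasing trail starting at v
  for ((u, v) <- edges) {             // add the next-heaviest edge (u, v)
    val lu = L(u)
    L(u) = math.max(L(v) + 1, L(u))   // prepend the new edge to the trail from v
    L(v) = math.max(lu + 1, L(v))     // and symmetrically for u
  }
  L.max                               // findMax(L)
}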
Lemma 2. ∑_{v∈V} L^q(v) ≥ 2q.
Proof. We proceed by induction, using the property ∑_{v∈V} L^{i+1}(v) ≥ ∑_{v∈V} L^i(v) + 2 from Lemma 1. For the induction base, note that ∑_{v∈V} L^0(v) = 0 because G^0 does not contain any edges and thus no vertex has a strictly decreasing trail of length greater than 0.
We next prove the lower bound on the length of longest strictly decreasing trails.
Proof. Assume that no vertex is a starting point of a trail of length at least 2 · ⌊q/n⌋, that is, L^q(v) < 2 · ⌊q/n⌋ for all v ∈ V. Then ∑_{v∈V} L^q(v) < 2 · ⌊q/n⌋ · n ≤ 2 · q. But this is a contradiction to Lemma 2, which postulates that the sum of the lengths of all longest strictly decreasing trails ∑_{v∈V} L^q(v) is at least 2 · q. Hence, there has to be at least one vertex with a strictly decreasing trail of length at least 2 · ⌊q/n⌋ in G^q. This trail contains a subtrail of length exactly 2 · ⌊q/n⌋. Since E^q = E it follows that G^q = G, which concludes the proof.
To increase the reusability of our library we build upon the Graph-Theory library
by Noschinski [12]. Graphs are represented as records consisting of vertices and
edges that can be accessed using the selectors pverts and parcs. We recall the
definition of the type pair-pre-digraph:
Now restrictions upon the two sets and new features can be introduced using
locales. Locales are Isabelle’s way to deal with parameterized theories [1]. Con-
sider for example pair-wf-digraph. The endpoints of an edge can be accessed
using the functions fst and snd. Therefore, conditions arc-fst-in-verts and arc-
snd-in-verts assert that both endpoints of an edge are vertices. Using so-called
sublocales a variety of other graphs are defined.
fun decTrail :: 'a pair-pre-digraph ⇒ ('a × 'a) weight-fun ⇒ ('a × 'a) list ⇒ bool
where
  decTrail g w [] = True |
Defining trails as lists in Isabelle has many advantages including using pre-
defined list operators, e.g., drop. Thus, we can show one result that will be
constantly needed in the following, that is, that any subtrail of an ordered trail
is an ordered trail itself.
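For intuition, the list-based definition can be paraphrased in Scala as follows (our sketch; it checks only the chaining and weight conditions and omits edge distinctness and graph membership for brevity):

// A strictly decreasing trail: consecutive edges share an endpoint and
// carry strictly decreasing weights. Any suffix es.drop(k) trivially
// satisfies the same property, mirroring the subtrail result above.
def isDecTrail[V](w: ((V, V)) => Double, es: List[(V, V)]): Boolean =
  es.zip(es.drop(1)).forall { case (e1, e2) =>
    e1._2 == e2._1 && w(e1) > w(e2)
  }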
In Isabelle we then show the equivalence between the two definitions decTrail
and decTrail2 of strictly decreasing trails. Similarly, we also show the equivalence
between the definition incTrail and incTrail2 of strictly increasing trails.
Any strictly decreasing trail (e1 , . . . , en ) can also be seen as a strictly increas-
ing trail (en , ..., e1 ) if the graph considered is undirected. To this end, we make
use of the locale pair-sym-digraph that captures the idea of symmetric arcs.
However, it is also necessary to assume that the weight function assigns the
same weight to edge (vi , vj ) as to (vj , vi ). This assumption is therefore added to
decTrail-eq-rev-incTrail and incTrail-eq-rev-decTrail.
not restrict the types of vertices and edges but impose the condition that they
have to be a linear order.
Furthermore, all weights have to be integers between 0 and 2q where 0 is
used as a special value to indicate that there is no edge at that position. Since the
range of the weight function is in the reals, the set of natural numbers {1,..,card (parcs G) div 2} has to be cast into a set of reals. This is realized by taking the image of the function real that casts a natural number to a real.
locale weighted-pair-graph = pair-graph G for G :: ('a::linorder) pair-pre-digraph +
  fixes w :: ('a × 'a) weight-fun
  assumes dom: e ∈ parcs G −→ w e ∈ real ‘ {1..card (parcs G) div 2}
  and vert-ge: card (pverts G) ≥ 1
One important step in our formalization is to show that the weight function
is surjective. However, having two elements of the domain (edges) being mapped
to the same element of the codomain (weight) makes the proof complicated. We
therefore first prove that the weight function is surjective on a restricted set
of edges. Here we use the fact that there is a linear order on vertices by only considering edges where the first endpoint is bigger than the second.
Then, the surjectivity of w is relatively simple to show. Note that we could
also have assumed surjectivity in distinct-weighted-pair-graph and shown that
distinctiveness follows from it. However, distinctiveness is the more natural
assumption that is more likely to appear in any application of ordered trails.
fun findEdge :: ('a × 'a) weight-fun ⇒ ('a × 'a) list ⇒ real ⇒ ('a × 'a) where
  findEdge f [] k = undefined |
  findEdge f (e#es) k = (if f e = k then e else findEdge f es k)
To add all edges to the graph, set i = |E|. Recall that card (parcs g) = 2∗|E|,
as every edge appears twice. Then, iterate over all vertices and give back the
maximum length which is found by using getL G w. Since getL G w can also be
used to get a longest strictly increasing trail ending at vertex v the algorithm is
not restricted to strictly decreasing trails.
Exporting the algorithm into Haskell code results in a fully verified program
to find a longest strictly decreasing or strictly increasing trail.
The algorithm introduced in Sect. 4.4 is already useful on its own. Additionally,
it can be used to verify the lower bound on the minimum length of a strictly decreasing trail, Pd(w, G) ≥ 2 · ⌊q/n⌋.
To this end, Lemma 1 from Sect. 3 is translated into Isabelle as the lemma
minimal-increase-one-step. The proof is similar to its counterpart, also using a
case distinction. Lemma 2 is subsequently proved, here named minimal-increase-
total.
lemma (in distinct-weighted-pair-graph) minimal-increase-one-step:
  assumes k + 1 ∈ W
  shows (∑ v ∈ pverts G. getL G w (k+1) v) ≥ (∑ v ∈ pverts G. getL G w k v) + 2

lemma (in distinct-weighted-pair-graph) minimal-increase-total:
  shows (∑ v ∈ pverts G. getL G w (q div 2) v) ≥ q
From minimal-increase-total we have that the sum of all labels after q div
2 steps is greater than q. Now assume that all labels are smaller than q div
n. Because we have n vertices, this leads to a contradiction, which proves
algo-result-min.
We return to the example graph from Fig. 1 and show that our results from
Sects. 4.2–4.5 can be used to prove existence of trails of length k, in particular
k = 3 in K4 . Defining the graph and the weight function separately, we use
natural numbers as vertices.
We show that the graph K4 of Fig. 1 satisfies the conditions that were imposed
in distinct-weighted-pair-graph and its parent locale, including for example no
self loops and distinctiveness. Of course there is still some effort required for this. However, it is not necessary to manually construct trails or to list all possible weight distributions. Additionally, instead of q! statements, at most 3q/2 statements are needed.
interpretation example:
  distinct-weighted-pair-graph ExampleGraph ExampleGraphWeightFunction
Now it is an easy task to prove that there is a trail of length 3. We only
add the fact that ExampleGraph is a distinct-weighted-pair-graph and lemma
dec-trail-exists.
lemma ExampleGraph-decTrail:
  ∃ xs. decTrail ExampleGraph ExampleGraphWeightFunction xs ∧ length xs = 3
6 Conclusion
In this work we formalized strictly increasing and strictly decreasing trails in
the proof assistant Isabelle/HOL. Furthermore, we showed correctness of an
algorithm to find such trails. We provided a verified algorithm and program to
compute monotone trails. We used this algorithm to prove the result that every
graph with n vertices and q edges has a strictly decreasing trail of length at least 2 · ⌊q/n⌋. For further work we plan to show that this is a tight bound for every n except for n = 3 and n = 5.
Our results are built on the already existing Isabelle Graph-Theory library from the Archive of Formal Proofs. Thus, our results can be used by any theory
using graphs that are specified as in this library. Therefore, our theory is highly
reusable and might be the basis for further work in this field.
References
1. Ballarin, C.: Tutorial to locales and locale interpretation. In: Contribuciones
cientı́ficas en honor de Mirian Andrés Gómez, pp. 123–140. Universidad de La
Rioja (2010)
2. Bucic, M., Kwan, M., Pokrovskiy, A., Sudakov, B., Tran, T., Wagner, A.Z.: Nearly-
linear monotone paths in edge-ordered graphs. arXiv preprint arXiv:1809.01468
(2018)
3. Calderbank, A.R., Chung, F.R., Sturtevant, D.G.: Increasing sequences with
nonzero block sums and increasing paths in edge-ordered graphs. Discret. Math.
50, 15–28 (1984)
4. Chvátal, V., Komlós, J.: Some combinatorial theorems on monotonicity. In: Notices
of the American Mathematical Society, vol. 17, p. 943. American Mathematical
Society, 201 Charles St, Providence, RI 02940-2213 (1970)
5. Cook, B., Kovács, L., Lachnitt, H.: Personal Communications on Automated Rea-
soning at AWS (2019)
6. De Moura, L., Bjørner, N.: Z3: an efficient SMT solver. In: Ramakrishnan, C.R.,
Rehof, J. (eds.) TACAS 2008. LNCS, vol. 4963, pp. 337–340. Springer, Heidelberg
(2008). https://doi.org/10.1007/978-3-540-78800-3 24
7. De Silva, J., Molla, T., Pfender, F., Retter, T., Tait, M.: Increasing paths
in edge-ordered graphs: the hypercube and random graphs. arXiv preprint
arXiv:1502.03146 (2015)
8. Graham, R., Kleitman, D.: Increasing paths in edge ordered graphs. Periodica
Math. Hung. 3(1–2), 141–148 (1973)
9. Kovács, L., Voronkov, A.: First-order theorem proving and Vampire. In: Proceed-
ings of CAV, pp. 1–35 (2013)
10. Milans, K.G.: Monotone paths in dense edge-ordered graphs (2015)
11. Nipkow, T., Wenzel, M., Paulson, L.C. (eds.): Isabelle/HOL - A Proof Assistant
for Higher-Order Logic. LNCS, vol. 2283. Springer, Heidelberg (2002). https://doi.
org/10.1007/3-540-45949-9
12. Noschinski, L.: Graph theory. Archive of Formal Proofs, April 2013. http://isa-
afp.org/entries/Graph Theory.html. Formal proof development
13. Roditty, Y., Shoham, B., Yuster, R.: Monotone paths in edge-ordered sparse
graphs. Discret. Math. 226(1–3), 411–417 (2001)
14. Yuster, R.: Large monotone paths in graphs with bounded degree. Graphs Comb.
17(3), 579–587 (2001). https://doi.org/10.1007/s003730170031
Representing Structural Language
Features in Formal Meta-languages
2 Preliminaries
OMDoc is a rich representation language for mathematical knowledge with a
large set of primitives motivated by expressivity and user familiarity. The Mmt
[RK13] language is a complete redesign of the formal core of OMDoc focusing
on foundation-independence, scalability, modularity and minimality.
In Fig. 1, we show a fragment of the Mmt grammar that we need in the
remainder of this paper. Meta-symbols of the BNF format are given in color.
The central notion in Mmt is that of a diagram consisting of a list of modules. For our purposes, theories are the only modules we need. Mmt theories are named sets of statements and are used to represent formal constructs such as logical frameworks, logics, and theories. At the declaration level Mmt has includes and constants. Constants are meant to represent a variety of OMDoc declarations and are simply named symbols with an optional type and definition. The types and definitions are Mmt expressions, which are based on OpenMath. Expressions are references to bound variables x and constants c, bound variable declarations x ∶ E, and complex expressions c(E∗) (which include variable binding by using x ∶ E as a child).

Fig. 1. Mmt grammar
The semantics of Mmt provides an inference system that includes in particular two judgments for typing and equality of expressions. Via Curry-Howard,
the former includes provability, e.g., a theorem F is represented as a constant
with type F , whose definiens is the proof. We have to omit the details here
for brevity. We only emphasize that Mmt is foundation-independent: The syn-
tax does not assume any special constants (e.g., λ), and the semantics does
not assume any special typing rules (e.g., functional extensionality). Instead,
any such foundation-specific aspects are supplied by special Mmt theories
called foundations. For example, the foundation for the logical framework LF
[HHP93] declares constants for type, λ, Π, and @ (for application) as well as
the necessary typing rules. Thus, the Mmt module system governs, e.g., which
typing rules are available in which theory. The details can be found in [Rab17].
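As a rough illustration of this expression fragment, a toy AST could look as follows in Scala (our own sketch, not the actual Mmt-API classes):

sealed trait Expr
final case class Var(name: String)                      extends Expr // bound variable x
final case class Const(path: String)                    extends Expr // constant c
final case class VarDecl(name: String, tp: Expr)        extends Expr // declaration x : E
final case class Complex(head: Const, args: List[Expr]) extends Expr // c(E*), children may bind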
3 Structural Features
Before we come to a formal definition, let us consider record types as an
example for a structural feature.
A record type R is a collection of typed (in our case, optionally additionally
defined) fields x ∶ T . A record term r is a collection of assignments x ∶= d for
D ∶∶= d ∶ f(F1 . . . Fn) = {S1 . . . Sm}
The structural feature itself is written in Scala using the Mmt-API, which
provides dedicated abstractions, and acts as a rule similarly to the typing rules
mentioned in Sect. 2. Just like typing rules, structural features (or rather, their
derived declarations) can thus be made available in precisely those theories where
they are deemed valid.
For the rest of this paper, we will assume various extensions of LF as our
foundations. If our external declarations contain axioms, we assume some fixed
logic declared in the foundation, providing a type prop, an operator ⊢ of type
prop → type, a typed equality operator ≐ ∶ ∏_{A∶type} A → A → prop, and the usual
logical connectives.
However, it should be noted that the structural features presented herein can
be easily adapted to any logic sufficiently strong to allow for defining (equivalents
to) their external declarations.
4 Examples
We will now show the practical utility of these relatively abstract definitions in
some paradigmatic cases at the declaration and module levels.
4.1 Datatypes
Inductive Types. Structural features can provide a convenient syntax for
declaring inductive types. Consider for example a (parametric) type List(A) of lists over a type A, which can be defined as the inductive type generated by the
two constructors nil ∶ List(A) and cons ∶ A → List(A) → List(A). We devise
two structural features with names induct and indef, allowing us to declare inductive types and inductive definitions respectively, as in Fig. 2. Naturally, parametric inductive types require a logic with (at least) shallow polymorphism.
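For comparison, the same inductive type written as an ordinary Scala ADT (purely illustrative; it plays no role in the Mmt elaboration):

sealed trait MyList[+A]
case object MyNil extends MyList[Nothing]                            // nil : List(A)
final case class Cons[A](head: A, tail: MyList[A]) extends MyList[A] // cons : A → List(A) → List(A)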
In the absence of such a typing feature, we can instead elaborate into the
corresponding constructors and axioms (expressed in some logic declared in our
foundation) asserting their collective injectivity and surjectivity, in the manner
which we will describe shortly. Importantly, we can use the same validity predi-
cate and feature name for both variants, preserving the syntax of the structural
features across logics. This ensures that whenever L′ extends L by an inductive
typing feature, any L-theory using induct and indef remains a valid L′ -theory,
but the elaboration in the stronger logic will consist of defined constants.
Elaborating induct. A derived declaration is elaborated as follows:
1. Any type-level declaration Si ∶ type is elaborated into
   Dind/Si ∶ ∏_{t1∶T1,…,tn∶Tn} T → Ii
1. The first declaration S1 has to have function type TI → A for some type A.
2. For every constructor con ∶ T1′ → . . . → Tk′ → TI , there has to be an Si with
the same name, being a defined constant
elaborates to
R ∶ ∏_{t1∶T1,…,tn∶Tn} type ∶= λ t1 ∶ T1, …, tn ∶ Tn. ⟦ s1 ∶ T1′ [∶= d1] … sm ∶ Tm′ [∶= dm] ⟧.
r ∶ ∏_{t1∶T1,…,tn∶Tn} R ∶= λ t1 ∶ T1, …, tn ∶ Tn. s1 ∶= d1 … sm ∶= dm.
One big advantage of this approach in Mmt surface syntax is that each field
in a rectm-declaration can be checked separately against the corresponding field
in the record type, whereas the elaborated expression s1 ∶= d1 . . . sm ∶= dm
is treated as a single term and checked as a whole. While this does not make a
difference semantically, it allows for much better localization of errors and more
helpful error messages.
In a logic L without a notion of record types, the structural feature rectp
can instead elaborate a derived declaration in the manner described in Sect. 3.
Fig. 3. Theories for Monoids, Abelian groups and rings using structures
Figure 4 shows an example of a Coq section. A and f are declared as variables and used like constants in the remainder of the section. The defined constant ltof within the section hence takes two arguments a, b. After the section is closed however, ltof is used as a quaternary function, with its arguments being the type A, the function f and the two arguments a, b.

Fig. 4. A Coq section and its MMT counterpart

The right side of Fig. 4 shows the same section, but expressed in MMT syntax using a new structural feature Section. Variables are marked with the role Variable flag. The validity predicate accepts any theory. A derived declaration D = Sec ∶ Section() = {S1 . . . Sn} is elaborated as follows:
1. Any constant with the role Variable flag is not elaborated,
2. for any other constant Si = c ∶ T [∶= d], let v1 ∶ T1, …, vn ∶ Tn be all variables declared in D prior to Si. Then extend the elaboration of D by
   Sec/c ∶ ∏_{v1∶T1,…,vn∶Tn} T [∶= λ v1 ∶ T1, …, vn ∶ Tn. d]
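For the ltof example, this rule yields schematically (our illustration: the Coq types are written informally, assuming ltof is defined in the section as f a < f b):

Sec/ltof ∶ ∏_{A∶type, f∶A→nat} A → A → prop ∶= λ A ∶ type, f ∶ A→nat. λ a b ∶ A. f a < f b

so that the closed ltof indeed becomes the quaternary function described above, with the section variables A and f as its first two arguments.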
5 Module-Level Features
Definition. To simplify the presentation, we have so far only considered struc-
tural features that extend the syntax inside theories. But it is natural to also
Module-level structural features are defined and used in the same way as
above except for two subtleties. Firstly, the elaboration of a derived module may only produce other modules. This makes sense, as toplevel declarations must elaborate to other toplevel declarations.
Secondly, it is difficult to specify where a module-level structural feature may be used. For derived declarations, which occur inside a theory, this is easy:
the declaration may be used if the respective structural feature rule is visible
to the containing theory. But derived modules, which may occur on toplevel,
do not have a containing theory. It is not desirable to introduce a global scope
that would define which module-level features are in scope as that would pre-
clude restricting a module-level feature to specific object-languages. We are still
experimenting with different designs for this issue. For now we use the containing
file as the scope.
6 Conclusion
We have presented a meta-language-based infrastructure of structural features
in the Mmt system and some paradigmatic examples that show its power. Struc-
tural features allow flexibly extending formal mathematical languages with new
kinds of declarations without having to enlarge the trusted core of the system. In
a meta-logical system, structural features are especially interesting because we
need them to represent object languages and because the module system itself
can restrict their availability to particular object logics.
The work presented here was to a large extent motivated by and developed
for building exports of theorem prover libraries. In these, structural features
allowed defining derived language features of theorem prover languages so that
exports could stay shallow, i.e., structure-preserving, while also capturing the
deep elaboration into kernel features that is needed for verification. Without
the infrastructure presented in this paper, only deep implementations would
have been possible and we would have been restricted to much less structured—
and thus less searchable and reusable—exports. Moreover, it will prove critical
for interoperability and library translations between theorem provers: even if
target and source system have the exact same structural feature, a translation is
practically very difficult if the intermediate representation is based on only the
elaborated declarations.
In future work, we plan to represent more advanced features of theorem
prover languages, starting with Isabelle and Coq. An open theoretical question
is how to translate derived declarations along views in such a way that translation
commutes with elaboration—this does not hold for every structural feature, and
establishing sufficient criteria would be very valuable for modular reasoning in
large libraries. Finally, we will improve Mmt’s abilities to represent the concrete
syntax of derived declarations in order to mimic even more closely arbitrary
object language syntax; this will allow for prototyping domain-specific languages
in a way that entirely hides the logical framework from the user.
References
[BHL+14] Blanchette, J.C., Hölzl, J., Lochbihler, A., Panny, L., Popescu, A., Traytel,
D.: Truly modular (co)datatypes for Isabelle/HOL. In: Klein, G., Gamboa,
R. (eds.) ITP 2014. LNCS, vol. 8558, pp. 93–110. Springer, Cham (2014).
https://doi.org/10.1007/978-3-319-08970-6 7
[CB16] Christiansen, D., Brady, E.: Elaborator reflection: extending idris in idris.
In: Garrigue, J., Keller, G., Sumii, E. (eds.) International Conference on
Functional Programming, pp. 284–297. ACM (2016)
[Coq15] Coq Development Team: The Coq Proof Assistant: Reference Manual.
Technical report, INRIA (2015)
[EUR+17] Ebner, G., Ullrich, S., Roesch, J., Avigad, J., de Moura, L.: A metapro-
gramming framework for formal verification. In: Proceedings of the ACM
on Programming Languages, vol. 1, no. ICFP, pp. 34:1–34:29 (2017)
[Gor88] Gordon, M.: HOL: a proof generating system for higher-order logic. In:
Birtwistle, G., Subrahmanyam, P. (eds.) VLSI Specification, Verification
and Synthesis, pp. 73–128. Kluwer-Academic Publishers (1988)
[Har96] Harrison, J.: HOL light: a tutorial introduction. In: Srivas, M., Camilleri,
A. (eds.) FMCAD 1996. LNCS, vol. 1166, pp. 265–269. Springer, Heidel-
berg (1996). https://doi.org/10.1007/BFb0031814
[HHP93] Harper, R., Honsell, F., Plotkin, G.: A framework for defining logics. J.
Assoc. Comput. Mach. 40(1), 143–184 (1993)
[HKR12] Horozal, F., Kohlhase, M., Rabe, F.: Extending MKM formats at the state-
ment level. In: Jeuring, J., et al. (eds.) CICM 2012. LNCS (LNAI), vol.
7362, pp. 65–80. Springer, Heidelberg (2012). https://doi.org/10.1007/978-
3-642-31374-5 5
[Hor14] Horozal, F.: A framework for defining declarative languages. Ph.D. thesis,
Jacobs University Bremen (2014)
[HR15] Horozal, F., Rabe, F.: Formal logic definitions for interchange languages.
In: Kerber, M., Carette, J., Kaliszyk, C., Rabe, F., Sorge, V. (eds.) CICM
2015. LNCS (LNAI), vol. 9150, pp. 171–186. Springer, Cham (2015).
https://doi.org/10.1007/978-3-319-20615-8 11
[IKRU13] Iancu, M., Kohlhase, M., Rabe, F., Urban, J.: The Mizar mathematical
library in OMDoc: translation and applications. J. Autom. Reason. 50(2),
191–202 (2013)
[KMOR17] Kohlhase, M., Müller, D., Owre, S., Rabe, F.: Making PVS accessible to
generic services by interpretation in a universal format. In: Ayala-Rincón,
M., Muñoz, C.A. (eds.) ITP 2017. LNCS, vol. 10499, pp. 319–335. Springer,
Cham (2017). https://doi.org/10.1007/978-3-319-66107-0 21
[Koh06] Kohlhase, M.: OMDoc: An Open Markup Format for Mathematical Doc-
uments (Version.12). Lecture Notes in Artificial Intelligence, vol. 4180.
Springer, Heidelberg (2006). https://doi.org/10.1007/11826095
[KR14] Kaliszyk, C., Rabe, F.: Towards knowledge management for HOL light.
In: Watt, S.M., Davenport, J.H., Sexton, A.P., Sojka, P., Urban, J. (eds.)
CICM 2014. LNCS (LNAI), vol. 8543, pp. 357–372. Springer, Cham (2014).
https://doi.org/10.1007/978-3-319-08434-3 26
[KR20] Kohlhase, M., Rabe, F.: Experiences from exporting major proof assistant
libraries (2020, submitted). https://kwarc.info/people/frabe/Research/KR_oafexp_20.pdf
[MRK18] Müller, D., Rabe, F., Kohlhase, M.: Theories as types. In: Galmiche, D.,
Schulz, S., Sebastiani, R. (eds.) IJCAR 2018. LNCS (LNAI), vol. 10900,
pp. 575–590. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-
94205-6 38
[MRSC19] Müller, D., Rabe, F., Sacerdoti Coen, C.: The Coq library as a theory
graph. In: Kaliszyk, C., Brady, E., Kohlhase, A., Sacerdoti Coen, C. (eds.)
CICM 2019. LNCS (LNAI), vol. 11617, pp. 171–186. Springer, Cham
(2019). https://doi.org/10.1007/978-3-030-23250-4 12
[Mül19] Müller, D.: Mathematical knowledge management across formal libraries.
Ph.D. thesis, Informatics, FAU Erlangen-Nürnberg, October 2019. https://
kwarc.info/people/dmueller/pubs/thesis.pdf
[Pau94] Paulson, L.: Isabelle: A Generic Theorem Prover. Lecture Notes in Com-
puter Science, vol. 828. Springer, Heidelberg (1994). https://doi.org/10.
1007/BFb0030541
[Rab17] Rabe, F.: How to identify, translate, and combine logics? J. Log. Comput.
27(6), 1753–1798 (2017)
[RK13] Rabe, F., Kohlhase, M.: A scalable module system. Inf. Comput. (230),
1–54 (2013). http://kwarc.info/frabe/Research/mmt.pdf
[RM18] Rabe, F., Müller, D.: Structuring theories with implicit morphisms (2018).
http://wadt18.cs.rhul.ac.uk/submissions/WADT18A43.pdf
[Rot20] Rothgang, C.: Theories as inductive types. B.Sc. thesis, expected May 2020
[RS13] Rabe, F., Sojakova, K.: Logical relations for a logical framework.
ACM Trans. Comput. Log. (2013). http://kwarc.info/frabe/Research/RS_logrels_12.pdf
[SR19] Rabe, F., Sharoda, Y.: Diagram combinators in MMT. In: Kaliszyk, C.,
Brady, E., Kohlhase, A., Sacerdoti Coen, C. (eds.) CICM 2019. LNCS
(LNAI), vol. 11617, pp. 211–226. Springer, Cham (2019). https://doi.org/
10.1007/978-3-030-23250-4 15
Formally Verifying Proofs for Algebraic
Identities of Matrices
1 Introduction
Supported by DFG TRR 195 “Symbolic Tools in Mathematics and their applications”.
We show that solving ideal membership problems with methods from associative ring theory is feasible. For this, we formulate a new algebraic framework with elementary proofs
to handle rectangular matrices of various sizes. Furthermore, we illustrate how
elimination orderings can be used for establishing new identities. We propose
easy-to-handle black box tools which do not require a full understanding of the
underlying algebraic framework. These tools are then applied to prove a num-
ber of relevant matrix identities. The verifications are traditional pen-and-paper
proofs that use the presented automation approach as the central proof step.
We employ the non-commutative extension Letterplace of the open source
computer algebra system Singular.
2 General Design
Every matrix with entries in some fixed field K has a size, that is two natural
numbers encoding the number of rows and columns of this matrix. Equality of
matrices and the arithmetic operations addition, subtraction and multiplication
are all partial due to those sizes. Instead of starting with a tedious description of a
syntax for valid matrix expressions, we shorten this process and encode matrices
as polynomials with non-commuting variables. Every reasonable semantics of
matrix expressions should respect the algebraic structure of polynomials. For
matrices M1 , . . . , Mn and M we consider the problem
M1 = 0 ∧ · · · ∧ Mn = 0 =⇒ M = 0
which can be reduced to well-known questions about ideals in non-commutative
algebras. Before introducing a formal setting, we begin the discussion with an
illustrative model problem.
Lemma 1. For arbitrary matrices A ∈ K^{n×n}, U ∈ K^{n×ℓ} and V ∈ K^{ℓ×n}, AUV = A^3 implies V A(UV)^2 = V A^2 UV A.
To address this sort of problem, we interpret matrices as symbols in the free associative algebra K⟨X⟩ and use the tools from non-commutative Gröbner basis theory, e.g. [1,12,14]. The process begins with the choice of assumptions, encoded as a set F of non-commutative polynomials, all of whose elements vanish as matrices. The following black box tool provides, for a suitable choice of F, a proof for q being equal to a as a matrix.
Question: q ∈ K⟨X⟩
Answer: a = NF(q, G) (with non-commutative division algorithm)
⟹ q is equal to a as a matrix
Let us discuss each step in detail. To prove that two matrices are equal, we encode them as non-commutative polynomials q1, q2 ∈ K⟨X⟩ and choose a set of assumptions F ⊆ K⟨X⟩. We use the proof machine and show that q1 and q2 have the same normal form modulo the two-sided ideal ⟨F⟩. One way of doing this is to compute a Gröbner basis G of ⟨F⟩ and to apply the division algorithm with divisors G, input q1 − q2 and output zero. With this we obtain a proof certificate
q1 − q2 = Σ_{f∈F} Σ_{t=1}^{k(f)} ℓ_{f,t} · f · r_{f,t}
Proof (of Lemma 1). In the notation of the proof machine let X = {A, U, V} denote a set of symbols and f := AUV − A^3 ∈ K⟨X⟩ a polynomial such that every element in F = {f} vanishes as a matrix. We run Buchberger's algorithm on F with respect to the degree lexicographic ordering and linear preorder U > V > A. It turns out that G = F is a Gröbner basis of the two-sided ideal ⟨F⟩. With the division algorithm, we successively subtract multiples of f until we reach zero. In the end, we find an expression via f as follows:
q := V A(UV)^2 − V A^2 UV A = V · f · UV − V A · f · A + V A^2 · f.
Evaluating the right hand side of the above expression implies that q vanishes
as a matrix.
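This computation can be replayed mechanically in Singular. The following sketch uses the same Letterplace commands as the verbatim code given below for Lemma 3; the degree bound 7 and the ordering flag Dp (our reading of the degree lexicographic ordering with precedence U > V > A) are assumptions on our part.
LIB "freegb.lib";
ring r = 0,(U,V,A),Dp;       // field of char 0; precedence U > V > A
ring R = freeAlgebra(r,7);   // free algebra K<U,V,A> up to word length 7
ideal F = A*U*V - A*A*A;     // the assumption AUV - A^3
ideal G = twostd(F);         // two-sided Groebner basis; here G = F
poly q = V*A*U*V*U*V - V*A*A*U*V*A;  // V A (UV)^2 - V A^2 UV A
NF(q, G);                    // normal form zero certifies the identity
> 0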
Remark 1. Generally in the free associative algebra, the ideal membership prob-
lem is semi-decidable, e.g. [13,15], meaning that there is a suitable procedure
which terminates if and only if the ideal membership takes place, or runs forever
otherwise.
3 Technical Details
We introduce an algebraic framework and show that solving ideal membership
problems with methods originating from associative ring theory is feasible. In
particular, this involves case studies and an evaluation to handle rectangular
matrices of various sizes.
Every M ∈ X stands for a matrix and the values row(M) and col(M) correspond to the number of rows and columns of this matrix. Let K⟨X⟩ denote the free associative algebra over the free monoid ⟨X⟩ with string concatenation as its operation and the empty word (identified with 1 ∈ K) as its neutral element. As a preimage of matrix products with i rows and j columns, for fixed natural numbers i and j we define a subset of ⟨X⟩ by
U_{i,j} := { M_1 ⋯ M_ℓ | M_k ∈ X, ℓ ≥ 1, row(M_1) = i, col(M_ℓ) = j, col(M_k) = row(M_{k+1}) }.
Let Span_K(U_{i,j}) denote the K-vector subspace of K⟨X⟩ which is spanned by the elements of U_{i,j} from above. For arbitrary matrices we define the set of valid relations U⟨X⟩ by the set-theoretic union of these subspaces.
q = a + Σ_{f∈F} Σ_{t=1}^{k(f)} ℓ_{f,t} · f · r_{f,t} ∈ Span_K(U_{i,j})
g_u = Σ_{f∈F} Σ_{t=1}^{k(f,u)} ℓ_{f,t,u} · f · r_{f,t,u} ∈ Span_K(U_{i(u),j(u)}),
where the second equality results from substituting g_u into the representation from above. This implies that a ∈ Span_K(U_{i,j}) is valid.
Question: q ∈ U⟨X⟩
Answer: a = NF(q, G)
⟹ a ∈ K⟨X \ X_r⟩
Let us discuss each step in detail. For a given two-sided ideal ⟨F⟩ we compute a Gröbner basis G with respect to a monomial ordering which eliminates X_e. We compute the subset X_r ⊂ X_e containing all symbols in X_e whose normal form contains no symbols in X_e. Together with the proof machine, this leads to the polynomials q and a being equal as matrices. That is, the matrix encoded by q has a representation in terms of matrices encoded by X \ X_r exclusively. For every symbol M ∈ X_r and monomial ℓ · M · r ∈ ⟨X⟩ in q there exists a specific chain of reductions such that ℓ · M · r is reduced to ℓ · NF(M, G) · r ∈ K⟨X⟩ before further reductions result in the normal form, hence a ∈ K⟨X \ X_r⟩. We will work with elimination orderings in the proofs of Lemma 3 and Theorem 5, where more details are provided.
Remark 2. Unlike the situation described in Remark 1, we are not aware of an
algorithm performing elimination. It is generally impossible to disprove that an
element is contained in Xr since normal forms are not computable in all cases.
The major reason is that a Gröbner basis with respect to an elimination ordering is often infinite, since such an ordering is not positively graded. Nevertheless, we have observed in
practice that with some luck in the choice of ordering one gets finite Gröbner
bases and thus solves the elimination problem completely. In other cases, where
only truncated Gröbner bases are available, one obtains only a subset of the
removable matrices.
We employ the open source computer algebra system Singular [4], more pre-
cisely its non-commutative extension Letterplace [11]. Singular is used as a
backend by systems like SAGE [17] and OSCAR [9], supports various standards
and has numerous tools for prospective integration with other systems.
The proof of Lemma 3 consists of two phases and illustrates automated explo-
ration with elimination orderings and a straightforward verification by an ideal
membership problem. Putting this together, we show invertibility of a matrix by
first searching for its inverse and by verifying the defining axiomatics afterwards.
(i) If AB + I_m ∈ GL_m(K) and BA + I_ℓ ∈ GL_ℓ(K), then there exists a representation of (AB + I_m)^{−1} in terms of A, B, I_m, I_ℓ and (BA + I_ℓ)^{−1}. Via symmetry the same holds for (BA + I_ℓ)^{−1} by exchanging A and B.
(ii) The existence of one inverse implies the other, that is, AB + I_m ∈ GL_m(K) if and only if BA + I_ℓ ∈ GL_ℓ(K).
Proof. For (i) we have to construct the required representation. In the notation of the elimination machine let
X = {(AB + I_m)^{−1}, (BA + I_ℓ)^{−1}, A, B, I_ℓ, I_m}
denote a set of symbols with subset of eliminated matrices X_e = {(AB + I_m)^{−1}}. To encode the invertibility of BA + I_ℓ and to describe the identity matrices I_m and I_ℓ, we define the sets of assumptions F1 ⊆ F2, which are listed explicitly in the source code below.
Executing the computation with the code below we obtain a Gröbner basis G2 of the two-sided ideal ⟨F2⟩ with respect to an ordering which eliminates X_e. The corresponding normal form of (AB + I_m)^{−1} is given by
a = NF((AB + I_m)^{−1}, G2) = −A(BA + I_ℓ)^{−1}B + I_m.
(ii) We show that BA + I_ℓ ∈ GL_ℓ(K) implies AB + I_m ∈ GL_m(K). The other implication is analogous. In F1 we have postulated the invertibility of BA + I_ℓ, so the same applies to the Gröbner basis G1. The expression for a obtained in (i) is the same as the inverse of AB + I_m modulo G2. Hence, it suffices to show that a is a left and a right inverse of AB + I_m modulo G1, that is
NF(a(AB + I_m) − I_m, G1) = 0 = NF((AB + I_m)a − I_m, G1).
The following source code for Singular realizes all computations and gives
explanations for each command being used.
LIB "freegb.lib";
ring r = 0,(ABpIi,BApIi,A,B,Il,Im),lp; // field of char 0,
//names of variables (like ABpIi) and monomial ordering (lp)
ring R = freeAlgebra(r,11); // free algebra up to length 11
option(redSB); option(redTail); // min reduced GBs option
poly ABpI = A*B + Im; poly BApI = B*A + Il;
ideal F1 = Im*Im - Im, Il*Il - Il, A*Il - A, Im*A - A,
B*Im - B, Il*B - B, BApI*BApIi - Il, BApIi*BApI - Il,
BApIi*Il - BApIi, Il*BApIi - BApIi;
ideal F2 = F1, ABpI*ABpIi - Im, ABpIi*ABpI - Im,
ABpIi*Im - ABpIi, Im*ABpIi - ABpIi;
ideal G1 = twostd(F1); ideal G2 = twostd(F2); // truncated GBs
poly a = NF(ABpIi, G2); // division by G2 with remainder a
a; // synonymous to print(a);
> -A*BApIi*B+Im // output of the previous command
NF(a*(A*B+Im)-Im, G1);
> 0
NF((A*B+Im)*a-Im, G1);
> 0
Remark 3. Both Gröbner bases G1 and G2 in the proof are finite. Notably, we
can prove that the restriction to the field of characteristic zero is not essential for
this theorem. Also, for each of the three constructive statements in the Lemma
we can provide a symbolic proof in terms of the original assumptions Fi by
hiding the Gröbner component Gi. For instance, let G2[11] denote the eleventh element of G2 considered in the code. In (i) the reduction of (AB + I_m)^{−1} with divisors G2 requires subtraction by the element
4 Applications
We present several illustrations of concrete mathematical investigations where
the tools from above have been applied successfully. These reach from various
practically relevant identities concerning Moore–Penrose pseudoinverses to an
automated derivation of feedback loops in the Youla controller parametrization.
Lemma 4. For a matrix A ∈ K^{m×ℓ} over a field K there exists B ∈ K^{ℓ×m} with ABA = A and BAB = B. Every such B is called a generalized inverse of A.
Proof. Let us consider the non-trivial case rank_K(A) = r < min{m, ℓ}, such that there are matrices
[C; D], [F G] = [C; D]^{−1} ∈ K^{m×m} and [P Q], [R; T] = [P Q]^{−1} ∈ K^{ℓ×ℓ},
where [C; D] denotes the block C stacked on top of the block D.
Every polynomial in
F = {I_m I_m − I_m, F C + GD − I_m, I_m G − G, G I_{m−r} − G, I_m F − F,
F I_r − F, I_m A − A, A I_ℓ − A, I_{m−r} D − D, D I_m − D, T Q − I_{ℓ−r},
DG − I_{m−r}, DF, DAP, DAQ, I_r C − C, C I_m − C, CG, I_r I_r − I_r,
CF − I_r, RP − I_r, CAP − I_r, I_r R − R, R I_ℓ − R, CAQ, RQ,
I_ℓ P − P, P I_r − P, I_ℓ I_ℓ − I_ℓ, P R + QT − I_ℓ, I_ℓ Q − Q, Q I_{ℓ−r} − Q,
T P, I_{ℓ−r} T − T, T I_ℓ − T, I_{ℓ−r} I_{ℓ−r} − I_{ℓ−r}, I_{m−r} I_{m−r} − I_{m−r}}
vanishes as a matrix. A Gröbner basis of the two-sided ideal ⟨F⟩ with respect to the degree reverse lexicographic ordering is finite and given by
The Sherman–Morrison–Woodbury formula provides a cheap way to com-
pute the inverse of a matrix numerically. It has found application in Broyden’s
method. Recently, this formula has been generalized by [5] with all inverse matri-
ces replaced by Moore-Penrose pseudoinverses. A computer-supported proof fol-
lows in a straightforward fashion and without additional insight.
Theorem 4. Let A ∈ C^{m×ℓ}, U ∈ C^{m×k}, C ∈ C^{k×o} and V ∈ C^{o×ℓ} be matrices such that V = V A^+ A, U = U S^+ S, V = C^+ C V, U = U C C^+, V = S S^+ V and U = A A^+ U, where S := C^+ + V A^+ U. This implies the Sherman–Morrison–Woodbury formula
(A + U C V)^+ = A^+ − A^+ U (C^+ + V A^+ U)^+ V A^+
for Moore–Penrose pseudoinverses.
and
(s · I_n − A − BF)^{−1}, (s · I_n − A − LC)^{−1} ∈ R^{n×n}.
Let M, U_0, V_0 and N be blocks in the expression
[M U_0; N V_0] := [F; C + DF] · (s · I_n − A − BF)^{−1} · [B −L] + [I_m 0; D I_p].
(i) One has V_0 ∈ GL_p(K) and
Ṽ_0 := −F(s · I_n − A − LC)^{−1}(B + LD) + I_m ∈ GL_m(K).
(ii) Let Q_y ∈ K^{m×p} such that V_0 + N Q_y ∈ GL_p(K). The choice
K := (U_0 + M Q_y)(V_0 + N Q_y)^{−1} ∈ K^{m×p}
implies that
[I_m −K; −P I_p] ∈ GL_{m+p}(K),
with its inverse given by
H = [H_{11} H_{12}; H_{21} H_{22}] ∈ K^{(m+p)×(m+p)},
where the H_{ij} have explicit representations in terms of A, B, C, F, L, D, Q_y, I_m, I_p, I_n, (s · I_n − A − BF)^{−1} and (s · I_n − A − LC)^{−1}.
(iii) If the entries of Q_y are in R, then the same holds for the entries of H.
Proof. (i) We show the invertibility of V_0 and Ṽ_0. Since K = K(s), the matrix
s · I_n − A − BF − LC − LDF ∈ K^{n×n}
is invertible over the field K. Adding this fact to the other assumptions which describe invertibility, the elimination machine finds the representations
V_0^{−1} = (C + DF)(s · I_n − A − BF − LC − LDF)^{−1} L + I_p
and Ṽ_0^{−1} = −U_0 V_0^{−1} N + M.
(ii) By choosing the elimination ordering to be lp (see the Singular [4] user's manual for its description), the elimination machine produces the representation
H_{11} = F(s·I_n − A − BF)^{−1} B Q_y C (s·I_n − A − LC)^{−1} B
  + F(s·I_n − A − BF)^{−1} B Q_y C (s·I_n − A − LC)^{−1} L D
  + Q_y C (s·I_n − A − LC)^{−1} B + F(s·I_n − A − BF)^{−1} B Q_y D
  + Q_y C (s·I_n − A − LC)^{−1} L D + Q_y D
  − F(s·I_n − A − BF)^{−1} L C (s·I_n − A − LC)^{−1} B
  − F(s·I_n − A − BF)^{−1} L C (s·I_n − A − LC)^{−1} L D
  − F(s·I_n − A − BF)^{−1} L D + I_m.
The representations for H_{12}, H_{21} and H_{22} follow in an analogous way, look similar, and are therefore omitted.
(iii) Follows from the explicit presentations of the H_{ij} in (ii), since all involved matrices have entries in R.
Remark 4. Note that the additional assumption in the proof of (i) does not
follow from the setup in an obvious way, and has to be inserted manually. On
the other hand, this assumption is not required in the proof of (ii) and (iii).
5 Conclusion
We have supplied not only theoretic results but concrete tools for formal verifi-
cation of algebraic identities. These tools provide an error-free and fast base for
mathematicians, engineers and all others who need computer-supported inves-
tigation with matrices in their daily work. A collection of known identities can
be put in form of an online database, similar to the well-known DLMF, OEIS
and DDMF. Our computational tools can be integrated into other projects, like
theorem provers.
References
1. Bergman, G.: The diamond lemma for ring theory. Adv. Math. 29, 178–218 (1978)
2. Chenavier, C., Hofstadler, C., Raab, C.G., Regensburger, G.: Compatible rewriting
of noncommutative polynomials for proving operator identities. https://arxiv.org/
abs/2002.03626 (2020)
3. Damm, T., Wimmer, H.K.: A cancellation property of the Moore-Penrose inverse
of triple products. J. Aust. Math. Soc. 86(1), 33–44 (2009)
4. Decker, W., Greuel, G.M., Pfister, G., Schönemann, H.: Singular 4-1-3 – A com-
puter algebra system for polynomial computations (2020). http://www.singular.
uni-kl.de
5. Deng, C.Y.: A generalization of the Sherman-Morrison-Woodbury formula. Appl.
Math. Lett. 24(9), 1561–1564 (2011)
6. Grégoire, B., Pottier, L., Théry, L.: Proof certificates for algebra and their applica-
tion to automatic geometry theorem proving. In: Sturm, T., Zengler, C. (eds.) ADG
2008. LNCS (LNAI), vol. 6301, pp. 42–59. Springer, Heidelberg (2011). https://
doi.org/10.1007/978-3-642-21046-4 3
7. Helton, J., Kronewitter, F.: Computer algebra in the control of singularly per-
turbed dynamical systems (1999). http://math.ucsd.edu/∼ncalg/DELL/SingPert/
singpertcdc99.pdf
8. Hofstadler, C., Raab, C.G., Regensburger, G.: Certifying operator identities via
noncommutative Gröbner bases. ACM Commun. Comput. Algebra 53, 49–52
(2019)
9. Joswig, M., Fieker, C., Horn, M., et al.: The oscar project (2020). https://oscar.
computeralgebra.de
10. Kronewitter, F.D.: Using noncommutative Gröbner bases in solving partially pre-
scribed matrix inverse completion problems. Linear Algebra Appl. 338(1–3), 171–
199 (2001)
11. Levandovskyy, V., Abou Zeid, K., Schönemann, H.: Singular: Letterplace – A
singular 4-1-3 subsystem for non-commutative finitely presented algebras (2020).
http://www.singular.uni-kl.de
12. Mora, T.: Groebner bases in non-commutative algebras. In: Gianni, P. (ed.) ISSAC
1988. LNCS, vol. 358, pp. 150–161. Springer, Heidelberg (1989). https://doi.org/
10.1007/3-540-51084-2 14
13. Mora, T.: An introduction to commutative and non-commutative Gröbner bases.
Theor. Comput. Sci. 134, 131–173 (1994)
14. Mora, T.: Solving Polynomial Equation Systems IV, vol. 4: Buchberger Theory and Beyond. Cambridge University Press, Cambridge (2016)
15. Pritchard, F.L.: The ideal membership problem in non-commutative polynomial
rings. J. Symb. Comput. 22(1), 27–48 (1996)
16. Raab, C.G., Regensburger, G., Poor, J.H.: Formal proofs of operator identities by
a single formal computation. https://arxiv.org/abs/1910.06165 (2019)
17. Stein, W., et al.: Sage Mathematics Software. The Sage Development Team (2020)
18. Wavrik, J.J.: Rewrite rules and simplification of matrix expressions. Comput. Sci.
J. Moldova 4(3), 360–398 (1996)
19. Youla, D.C., Jabr, H.A., Bongiorno, J.J.: Modern Wiener-Hopf design of optimal
controllers. II: the multivariable case. IEEE Trans. Autom. Control 21, 319–338
(1976)
20. Zhou, K., Doyle, J.C., Glover, K.: Robust and Optimal Control. Prentice Hall,
Upper Saddle River (1996)
AutoMSC: Automatic Assignment of
Mathematics Subject Classification Labels
1 Introduction
zbMATH1 has classified more than 135k articles in 2019 using the Mathematics
Subject Classification (MSC) scheme [6]. With more than 6,600 MSC codes, this
classification task requires significant in-depth knowledge of various sub-fields of
1 https://zbmath.org/.
mathematics to determine the fitting MSC codes for each article. In summary,
the classification procedure of zbMATH and MR is two-fold. First, all articles
are pre-classified into one of 63 primary subjects spanning from general topics
in mathematics (00), to integral equations (45), to mathematics education (97).
In a second step, subject editors assign fine-grained MSC codes in their area of expertise, among other things with the aim of matching potential reviewers.
The automated assignment of MSC labels was analyzed by Rehurek and Sojka [9] in 2008 on the DML-CZ [13] and NUMDAM [3] full-text corpora. They report a micro-averaged F1 score of 81% for their public corpus. In
2013 Barthel, Tönnies, and Balke performed automated subject classification for
parts of the zbMATH corpus [2]. They criticized the micro-averaged F1 measure, especially when the average is computed only over the best performing classes. Nevertheless, they report a micro-averaged F1 score of 67.1% for the zbMATH corpus. They suggested training classifiers for a precision of 95% and assigning MSC class labels in a semi-automated recommendation setup. Moreover, they suggested measuring the human baseline (inter-annotator agreement) for the classification task. They also found that the combination of mathematical expressions
and textual features improves the F1 score for certain MSC classes substantially.
In 2014, Schöneberg and Sperber [11] implemented a method that combined formulae and text using an adapted part-of-speech tagging approach. Their paper reported a sufficient precision of >0.75 but did not state the recall. The
proposed method was implemented and is currently being used especially to
pre-classify general journals [7] with additional information, like references. For
a majority of journals, coarse- and fine-grained codes can be found by statisti-
cally analyzing the MSC codes from referenced documents matched within the
zbMATH corpus. The editor of zbMATH hypothesizes that the reference method
outperforms the algorithm developed by Schöneberg and Sperber. To confirm or
reject this hypothesis was one motivation for this project.
The positive effect of mathematical features is confirmed by Suzuki and
Fujii [15], who measured the classification performance based on an arXiv and
mathoverflow dataset. In contrast, Scharpf et al. [10] could not measure a sig-
nificant improvement of classification accuracy for the arxiv dataset when incor-
porating mathematical identifiers. In their experiments Scharpf et al. evaluated
numerous machine learning methods, which extended [4,14] in terms of accuracy
and run-time performance, and found that complex compute-intensive neural
networks do not significantly improve the classification performance.
In this paper, we focus on the coarse-grained classification of the pri-
mary MSC subject number (pMSCn) and explore how current machine learning
approaches can be employed to automate this process. In particular, we compare
the current state of the art technology [10] with a part of speech (POS) prepro-
cessing based system customized for the application in zbMATH from 2014 [11].
2 Method
To investigate the given set of problems, we first created test and training
datasets. We then investigated the different pMSCn encodings, trained our mod-
els and evaluated the results, cf. Fig. 1.
Filter Current High Quality Articles: The zbMATH database has assigned MSC
codes to more than 3.6 M articles. However, the way in which mathematical
articles are written has changed over the last century, and the classification of
historic articles is not something we aim to investigate in this article. The first
MSC was created in 1990, and has since been updated every ten years (2000,
2010, and 2020) [5]. With each update, automated rewrite rules are applied to
map the codes from the old MSC to the next MSC version, which is connected
with a loss of accuracy of the class labels. To obtain a coherent and high quality
dataset for training and testing, we focused on the more recent articles from 2000 to 2019, which were classified using MSC version 2010, and we only included articles from a selected list of journals.2
Splitting to Test and Training Set: After applying the filter criteria as mentioned
above, we split the resulting list of 442,382 articles into test and training sets.
For the test set, we aimed to measure the bias of our zbMATH classification labels. Therefore, we used the articles for which we knew the classification labels assigned by the MR service from a previous research project [1] as the test set. The resulting test set consisted of n = 32,230 articles, and the training set contained
410,152 articles. To ensure that this selection did not introduce additional bias,
we also computed the standard ten-fold cross validation, cf. Sect. 3.
These 5 fields were provided as CSV files to the algorithms. The mscs field
was generated as follows: For each reference in the document, we looked up the
MSC codes of the reference. For example, if a certain document contained the references A, B, C that are also in the zbMATH corpus, and the MSC codes of A, B, C are a1 and a2, b1, and c1–c3, respectively, then the field mscs will read a1 a2, b1, c1 c2 c3.
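A minimal sketch of this field construction (the helper names are hypothetical, not part of the zbMATH pipeline):
def build_mscs_field(references, msc_lookup):
    # msc_lookup maps a matched reference to its list of MSC codes;
    # references not matched within the corpus are skipped
    groups = [" ".join(msc_lookup[ref]) for ref in references if ref in msc_lookup]
    return ", ".join(groups)

# the example from the text: references A, B, C
print(build_mscs_field(
    ["A", "B", "C"],
    {"A": ["a1", "a2"], "B": ["b1"], "C": ["c1", "c2", "c3"]}))
# prints: a1 a2, b1, c1 c2 c3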
After training, we required each of our tested algorithms to return the fol-
lowing fields in CSV format for the test sets:
2 The list of selected journals is available from https://zbmath.org/?q=dt%3Aj+st%3Aj+py%3A2000-2019.
3 The fields de and labels must not be used as input to the classification algorithm.
We ensured that the fields de, method and pos form a primary key, i.e., no two
entries in the result can have the same combination of values. Note that for the
current multi-class classification problem, pos is always 1, since only the primary
MSC subject number is considered.
While the assignment of all MSC codes to each article is a multi-label classification task, the assignment of the primary MSC subject, which we investigate in this paper, is only a multi-class classification problem. With k = 63 classes, the probability of randomly choosing the correct class of size c_i is rather low, P_i = c_i/n. Moreover, the dataset is not balanced. In particular, the entropy
H = −Σ_{i=1}^{k} P_i log P_i
can be used to measure the imbalance by normalizing it to the maximum entropy log k: H̄ = H/log k.
To take into account the imbalance of the dataset, we used weighted versions of precision p, recall r, and the F1 measure f. In particular, the precision is
p = (Σ_{i=1}^{k} c_i p_i)/n
with the class precision p_i; r and f are defined analogously.
In the test set, no entries for the pMSCn 97 (Mathematics education) were included, thus
H̄ = H/log k = 3.44/log 62 = .83.
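The following sketch reproduces these quantities with scikit-learn [8] on toy data; the labels are hypothetical and serve only to illustrate the weighted averaging:
import math
from collections import Counter
from sklearn.metrics import precision_recall_fscore_support

y_true = ["05", "68", "68", "81", "83", "81"]  # hypothetical gold pMSCn labels
y_pred = ["05", "05", "68", "81", "81", "81"]  # hypothetical predictions

# normalized entropy H/log k as the imbalance measure defined above
counts = Counter(y_true)
n, k = len(y_true), len(counts)
H = -sum((c / n) * math.log(c / n) for c in counts.values())
print("normalized entropy:", H / math.log(k))

# average="weighted" weights each class metric by its size c_i, matching
# the weighted p, r, and f defined above
p, r, f, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0)
print("p=%.3f r=%.3f f=%.3f" % (p, r, f))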
Moreover, we eliminate the effect of classes with only few samples by disregarding all classes with fewer than 200 entries. While pMSCn with few samples have little effect on the average metrics, the individual values are distracting in plots and data tables. Choosing 200 as the minimum evaluation class size reduces the number of effective classes to k = 37, which has only a minor effect on the normalized entropy, raising it to H̄ = .85. The chosen value of 200 can be interactively adjusted in the dynamic result figures we made available online4. Additionally, the individual values for P_i that were used to calculate H̄ are given in the column p in the table on that page. As one can see in the online version of the figures, the impact of the choice of the minimum class size is insignificant.
Table 1. Precision p, recall r and F1-measure f with regard to the baseline zb1 (left) and mr1 (right).

        vs. zb1                  vs. mr1
        p     r     f            p     r     f
zb1     1     1     1            0.817 0.807 0.81
mr1     0.814 0.814 0.812        1     1     1
titer   0.772 0.778 0.773        0.776 0.775 0.772
refs    0.748 0.753 0.746        0.743 0.743 0.737
titls   0.637 0.627 0.623        0.644 0.632 0.627
texts   0.699 0.709 0.699        0.704 0.709 0.699
ref1    0.693 0.648 0.652        0.693 0.646 0.652
uT1     0.656 0.642 0.645        0.653 0.636 0.639
uM1     0.655 0.639 0.644        0.652 0.632 0.636
tiref   0.76  0.764 0.76         0.762 0.761 0.758
teref   0.769 0.774 0.77         0.771 0.77  0.767
tite    0.713 0.722 0.713        0.72  0.724 0.715
Fig. 2. Mathematical symbols in title and abstract text do not improve the classifi-
cation quality. Method uT1 = left bar; method uM1 = right bar
Fig. 3. Part-of-speech tagging for mathematics does not improve the classification qual-
ity. Method uM1 = left bar, method tite = right bar.
Effect of Features and Human Baseline: The newly developed method [10] works best in a combined approach that uses title, abstract text, and references (titer, f_titer = 77.3%). This method performs significantly better than methods that omit any one of these features. The best performing single-feature method was refs (f_refs = 74.6%), followed by texts (f_texts = 69.9%) and titls (f_titls = 62.3%). Thus, automatically generating the MSC subject while including the references appears to be a very valuable strategy. This becomes evident also when comparing the scores of approaches that only considered two features. For the approaches that excluded the title (teref, f_teref = 77%) or the abstract text (tiref, f_tiref = 76%), the performance remained notably higher than when the approach excluded the reference mscs (tite, f_tite = 71.3%). However, it is also worth pointing out that the naive reference-based method ref1 (f_ref1 = 65.2%), which is currently being used in production, still performs more poorly than tite, despite the latter ignoring references. In conclusion, we can say that training a machine learning algorithm that weights all information from the fine-grained MSC codes is clearly better than the majority vote of the references, cf. Fig. 4.
Even the best performing machine learning algorithm, titer with f_titer = 77.3%, is worse than the classification by human experts from MR, the other mathematics publication reviewing service, which resulted in a baseline of f_mr1 = 81.2%. However, there is no ground truth that could allow us to determine which of the primary MSC subjects, either from MR or zbMATH, are truly correct. Assigning a two-digit label to mathematical research papers – which often cover overlapping themes and topics within mathematics – remains a challenge even for humans, who struggle to conclusively label publications as belonging
Fig. 4. The machine learning method (refs, left) clearly outperforms the current production method (ref1, right) using references as the only source for classification.
Fig. 5. For many pMSCn the best automatic method (titer, right) gets close to the performance of the human baseline (mr1, left).
to only a single class. While for some classes, expert agreement is very high,
e.g. for class 20 agreement is 89.1%, for other classes, such as 82, agreement is
only at 47.6% regarding the F1 score, cf., Fig. 5. These discrepancies reflect the
intrinsic problem that mathematics cannot be fully reflected by a hierarchical
system. The differences in classifications made among the two reviewing services
are likely also a reflection of emphasizing different facets of evolving research,
which often derive from differences in the reviewing culture.
We also investigated the bias introduced by the non-random selection of the
training set. Performing ten fold cross validation on the entire dataset yielded
an accuracy of ftiter,10 = .776 with a standard deviation σtiter,10 = .002. Thus,
test set selection does not introduce a significant bias.
After having discussed the strengths and weaknesses of the individual meth-
ods tested, we now discuss how the currently best-performing method, titer,
can be improved. One standard tool for analyzing misclassifications is a confusion matrix, cf. Fig. 6. In this matrix, off-diagonal elements indicate that two sets of classes are often confused by the classification algorithm. The x axis shows the true labels, while the y axis shows the predicted labels. The most frequent error of titer was that 68 (Computer science) was classified as 05 (Combinatorics). Moreover, 81 (Quantum theory) and 83 (Relativity and gravitational theory) were often mixed up.
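A hedged sketch of such a confusion-matrix analysis with scikit-learn [8], using toy, hypothetical data:
from sklearn.metrics import confusion_matrix

y_true = ["68", "68", "81", "83", "05"]  # hypothetical true pMSCn labels
y_pred = ["05", "68", "83", "81", "05"]  # hypothetical predictions
# in scikit-learn's convention, rows are true labels, columns predictions;
# off-diagonal entries expose systematic mix-ups such as 68 vs. 05
print(confusion_matrix(y_true, y_pred, labels=["05", "68", "81", "83"]))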
However, in general the number of misclassifications was small, and there was no immediate action not involving a human expert that one could take to avoid special cases of misclassification.
Since titer outperforms both the text-based and the reference-based methods currently used in zbMATH, we decided to develop a RESTful API that wraps our trained model into a service. We use Python's FastAPI running under Uvicorn to handle higher loads. Our system is available as a Docker container and can thus be scaled on demand. To simplify development and testing, we provide a static HTML page as a micro UI, which we call AutoMSC. This UI displays not only the most likely primary MSC subjects but also the less likely MSC subjects. We expect that our UI can support human experts, especially whenever the most likely MSC subject seems unsuitable. The result is displayed as a pie chart, cf. Fig. 8, from https://automscbackend.formulasearchengine.com. To
use the system in practice, an interface to the citation matching component of
zbMATH would be desired to paste the actual references rather than the MSC
subjects extracted from the references. Moreover, the precision-recall curve (Fig. 7) for titer suggests that one can also select a threshold for falling back to manual classification. For instance, if one requires a precision as high as that of the human classifications by MR, one would need to only consider suggestions with a score >0.5. This would automatically
classify 86.2% of the 135k articles being annually classified by subject experts at
zbMATH/MR and thus significantly reduce the number of articles that humans
must manually examine without a loss of classification quality. This is something
we might develop in the future.
References
1. Bannister, A., et al.: Editorial: on the road to MSC 2020. EMS Newslett. 2018–
6(108), 3–4 (2018). https://doi.org/10.4171/news/108/1
2. Barthel, S., Tönnies, S., Balke, W.-T.: Large-scale experiments for mathematical
document classification. In: Urs, S.R., Na, J.-C., Buchanan, G. (eds.) ICADL 2013.
LNCS, vol. 8279, pp. 83–92. Springer, Cham (2013). https://doi.org/10.1007/978-
3-319-03599-4 10
3. Bouche, T., Labbe, O.: The new numdam platform. In: Geuvers, H., England, M.,
Hasan, O., Rabe, F., Teschke, O. (eds.) CICM 2017. LNCS (LNAI), vol. 10383,
pp. 70–82. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-62075-6 6
4. Evans, I.: Semi-supervised topic models applied to mathematical document classi-
fication. Ph.D. thesis, University of Bath, Somerset, UK (2017)
5. Ion, P., Sperber, W.: MSC 2010 in SKOS – the transition of the MSC to the
semantic web. Eur. Math. Soc. Newsl. 84(2012), 55–57 (2010)
6. Kühnemund, A.: The role of applications within the reviewing service zbMATH.
PAMM 16(1), 961–962 (2016). https://doi.org/10.1002/pamm.201610459
7. Mihaljević-Brandt, H., Teschke, O.: Journal profiles and beyond: what makes a mathematics journal "general"? Eur. Math. Soc. Newsl. 91, 55–56 (2014)
8. Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
9. Řehůřek, R., Sojka, P.: Automated classification and categorization of mathemat-
ical knowledge. In: Autexier, S., Campbell, J., Rubio, J., Sorge, V., Suzuki, M.,
Wiedijk, F. (eds.) CICM 2008. LNCS (LNAI), vol. 5144, pp. 543–557. Springer,
Heidelberg (2008). https://doi.org/10.1007/978-3-540-85110-3 44
10. Scharpf, P., et al.: Classification and clustering of arXiv documents, sections, and
abstracts comparing encodings of natural and mathematical language. In: Proceed-
ings of ACM/IEEE JCDL (2020)
11. Schöneberg, U., Sperber, W.: POS tagging and its applications for mathematics
-Text Analysis in Mathematics. In: Watt, S.M., Davenport, J.H., Sexton, A.P.,
Sojka, P., Urban, J. (eds.) CICM 2014. LNCS (LNAI), vol. 8543, pp. 213–223.
Springer, Cham (2014). https://doi.org/10.1007/978-3-319-08434-3 16
12. Schubotz, M., Teschke, O.: Four decades of TeX at zbMATH. Eur. Math. Soc. Newslett. 112, 50–52 (2019)
13. Sojka, P., Rehurek, R.: Classification of multilingual mathematical papers in DML-
CZ. In: Proceedings of the 1st Workshop on Recent Advances in Slavonic Natural
Languages Processing, RASLAN 2007, pp. 89–96. Masaryk University (2007)
14. Sojka, P., et al.: Quo vadis, math information retrieval. In: Horák, A., Rychlý, P.,
Rambousek, A., Tribun, E.U. (eds.) The 13th Workshop on Recent Advances in
Slavonic Natural Languages Processing, RASLAN 2019, Karlova Studanka, Czech
Republic, 6–8 December 2019, pp. 117–128 (2019)
15. Suzuki, T., Fujii, A.: Mathematical document categorization with structure of
mathematical expressions. In: 2017 ACM/IEEE Joint Conference on Digital
Libraries, JCDL 2017, Toronto, ON, Canada, 19-23 June 2017, pp. 119–128. IEEE
Computer Society (2017). https://doi.org/10.1109/JCDL.2017.7991566
Maintaining a Library of Formal
Mathematics
1 Introduction
As a tool for managing mathematical knowledge, a proof assistant offers many
assurances. Once a result has been formalized, readers can confidently believe
that the relevant definitions are fully specified, the theorem is stated correctly,
and there are no logical gaps in the proof. A body of mathematical knowledge,
represented by formal definitions and proofs in a single theorem proving envi-
ronment, can be trusted to be coherent.
Logical coherence, however, is only one of many properties that one could
wish of a mathematical corpus. The ideal corpus can be modified, extended, and
queried by users who do not have expert knowledge of the entire corpus or the
underlying system. Proof assistant libraries do not always fare so well in this
respect. Most of the large mathematical libraries in existence are maintained
by expert users with a significant time cost. While external contributions are
easily checked for logical consistency, it typically takes manual review to check
that contributions cohere with the system in other ways—e.g., that lemmas
are correctly marked for use with a simplification tactic. It can be difficult or
impossible for outsiders to understand the library well enough to contribute
themselves.
The first author is supported by the Sloan Foundation (grant G-2018-10067). The
second and third authors receive support from the European Research Council (ERC)
under the European Union’s Horizon 2020 research and innovation program (grant
agreement No. 713999, Matryoshka) and from the Dutch Research Council (NWO)
under the Vidi program (project No. 016.Vidi.189.037, Lean Forward).
The mathlib library and community are growing at a fast pace. As of May 15,
2020, the library contains over 170,000 lines of non-whitespace, non-comment
code, representing a 25% increase over five months, and 42,000 declarations,
excluding internal and automatically generated ones, a 23% increase. Contribu-
tions have been made by 85 people, a 16% increase over the same time period.
264 commits were made to the mathlib git repository in April 2020; while a
small number were automatically generated, each commit typically corresponds
to a single approved pull request. We display more statistics about the project’s
growth on the community website.1 The library covers a wide range of subject
matter, enough to serve as a base for numerous projects that have formalized
complex and recent mathematical topics [5,7,11].
3 Semantic Linting
Static program analysis, the act of analyzing computer code without running
the code, is widely used in many programming languages. An example of this is
linting, where source code is analyzed to flag faulty or suspicious code. Linters
warn the user about various issues, such as syntax errors, the use of undeclared
variables, calls to deprecated functions, spacing and formatting conventions, and
dangerous language features.
In typed languages like Lean, some of these errors are caught by the elabo-
rator or type checker. The system will raise an error if a proof or program has a
different type than the declared type or if a variable is used that has not been
introduced. However, other problems can still be present in developments that
have been accepted by Lean. It is also possible that there are problems with
the metadata of a declaration, such as its attributes or documentation. These
mistakes are often not obvious at the time of writing a declaration, but will
manifest at a later time. For example, an instance might be declared that will
never fire, or is likely to cause the type class inference procedure to loop.
We have implemented a package of semantic linters in mathlib to flag these
kinds of mistakes. These linters are semantic in the sense that they take as input
a fully elaborated declaration and its metadata. This is in contrast to a syntactic
linter, which takes as input the source code as plain text. The use of semantic
linters allows us to automatically check for many commonly made mistakes,
using the abstract syntax tree (the elaborated term in Lean’s type theory) for
the type or value of a declaration. Syntactic linters would allow for testing of e.g.
the formatting of the source code, but would not help with many of the tests we
want to perform.
The linters can be used to check one particular file or all files in mathlib.
Running the command #lint at any point in a file prints all the linter errors
up to that line. The command #lint_mathlib tests all imported declarations
in mathlib. Occasionally a declaration may be permitted to fail a lint test, for
example, if it takes an unused argument to satisfy a more general interface. Such
lemmas are tagged with the attribute @[nolint], which takes a list of tests that
1 https://leanprover-community.github.io/mathlib_stats.html.
/-- Reports definitions and constants that are missing doc strings -/
meta def doc_blame_report_defn : declaration → tactic (option string)
| (declaration.defn n _ _ _ _ _) :=
  doc_string n >> return none <|> return "def missing doc string"
| (declaration.cnst n _ _ _) :=
  doc_string n >> return none <|> return "constant missing doc string"
| _ := return none
A first selection of mathlib linters checks for simple mistakes commonly made when declaring definitions and theorems.
Definitions vs. Theorems. Lean has separate declaration kinds for definitions
and theorems. The subtle differences relate to byte code generation and parallel
elaboration. It is nearly always the case that a declaration should be declared
as a theorem if and only if its type is a proposition. Because there are rare
exceptions to this, the system does not enforce it. The def_lemma linter checks
for this correspondence, so that the user must explicitly approve any exceptions.
Illegal Constants. The Lean core library defines a > b to be b < a, and similarly
for a ≥ b. These statements are convertible, but some automation, including the
simplifier, operates only with respect to syntactic equality. For this reason, it is
convenient to pick a normal form for equivalent expressions. In mathlib, we prefer
theorems to be stated in terms of < instead of >. The ge_or_gt linter checks that
the disfavored constants do not appear in the types of declarations.
Lean and mathlib make extensive use of type classes [21] for polymorphic dec-
larations. Of the 42,000 declarations in mathlib, 465 are type classes and 4600
are type class instances. In particular, type classes are used to manage the hier-
archy of mathematical structures. Their use allows definitions and theorems to
be stated at high levels of generality and then applied in specific cases with-
out extra effort. Arguments to a declaration are marked as instance implicit by
surrounding them with square brackets. When this declaration is applied, Lean
runs a depth-first backward search through its database of instances to satisfy
the argument. Type classes are a powerful tool, but users often find the under-
lying algorithms opaque, and their misuse can lead to performance issues [20].
A collection of linters aims to warn users about this misuse.
To avoid this, in mathlib the type of module actually takes as arguments the
ring structure on R and the group structure on M. The declaration of module
looks more like this:
class module (R : Type u) (M : Type v) [ring R] [add_comm_group M] :=
(to_has_scalar : has_scalar R M)
/- some propositional fields omitted -/
Using this definition, there is no instance from modules to rings. Instead, the
ring structure of R is carried as an argument to the module structure on M. The dangerous_instance linter raises a warning whenever an instance causes a new type class problem that has a metavariable argument.
However, continuous f is not a type class, and this argument does not appear
in the codomain is_ring_hom (completion.map f). There is no way for the type
class resolution mechanism to infer this argument and thus this instance will
never be applied. The impossible_instance linter checks declarations for this
pattern, warning if a non-type class argument does not appear elsewhere in the
type of the declaration.
A dual mistake to the one above is to mark an argument as instance
implicit even though its type is not a type class. Since there will be no type
class instances of this type, such an argument will never be inferable. The
incorrect_type_class_argument linter checks for this. While the linter is very
simple, it checks for a mistake that is difficult to catch in manual review, since
it requires complete knowledge of the mathlib instance database.
example (x : ℕ) : 0 + (0 + x) = x := by simp
Certain inhabited instances can lead to non-commuting diamonds in the type class hierarchy. To avoid this, math-
lib defines a weaker type class, nonempty, which is Prop-valued. Lean propositions
are proof-irrelevant, meaning that any two terms of the same Prop-valued type
are indistinguishable. Thus nonempty does not lead to non-commuting diamonds,
and is safe to use in situations where inhabited instances would cause trouble.
The inhabited_nonempty linter checks for declarations with inhabited argu-
ments that can be weakened to nonempty. Suppose that a Prop-valued declaration
takes an argument h : inhabited T. Since Lean uses dependent types, h may
appear elsewhere in the type of the declaration. If it doesn’t, it can be weakened
to nonempty T, since the elimination principles are equivalent for Prop-valued tar-
gets. Weakening this argument makes the declaration more widely applicable.
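A small Lean sketch of such a weakening (a toy lemma of our own, not from mathlib):
-- the target is Prop-valued, so `nonempty T` suffices where
-- `inhabited T` would be an unnecessarily strong requirement
lemma exists_self_eq {T : Type*} (h : nonempty T) : ∃ x : T, x = x :=
h.elim (λ x, ⟨x, rfl⟩)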
Lean contains a simp tactic for (conditional) term rewriting. Similar tactics,
such as Isabelle’s simp [17], are found in other proof assistants. Users can tag
theorems using the @[simp] attribute. The theorems tagged with this attribute
are collectively called the simp set. The simp tactic uses lemmas from the simp
set, optionally with extra user-provided lemmas, to rewrite until it can no longer
progress. We say that such a fully simplified expression is in simp-normal form
with respect to the given simp set.
The simplifier is used widely: mathlib contains over 7000 simp lemmas, and
the string by simp occurs almost 5000 times, counting only a small fraction of its
invocations. However, care needs to be taken when formulating simp lemmas. For
example, if both a = b and b = a are added as simp lemmas, then the simplifier
will loop. Other mistakes are more subtle. We have integrated several linters
that aid in declaring effective simp lemmas.
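As an illustration (a toy sketch with made-up constants, not from mathlib), tagging both orientations of an equation creates such a loop:
constants (foo bar : ℕ) (h : foo = bar)

@[simp] lemma foo_eq_bar : foo = bar := h
@[simp] lemma bar_eq_foo : bar = foo := h.symm
-- any `simp` call touching `foo` now rewrites foo → bar → foo → ...;
-- the simp linters described below are designed to catch such mistakes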
Both of these issues are checked by the simp_nf linter. It runs the simplifier
on the left-hand side of the simp lemma, and examines the proof term returned
by the simplifier. If the proof of the simplification of the left-hand side uses the
simp lemma itself, then the simp lemma is not redundant. In addition, we also
assume that the simp lemma is not redundant if the left-hand side does not
simplify at all, as is the case for conditional simp lemmas. Otherwise the linter
outputs a warning including the list of the simp lemmas that were used.
The simp_comm linter checks that the simp set contains no commutativity lemmas.
For a lemma stated as
∀ f, is_homomorphism f → f (x + y) = f x + f y
the left-hand side has head symbol f, which is a bound variable, and therefore the simplifier will not rewrite with this lemma. The simp_var_head linter ensures that no such lemmas are accidentally added to the simp set.
4 Documentation
The majority of the documentation is oriented around modules. For each Lean
source file in mathlib, we create a single HTML page displaying the module
documentation and information for each declaration in that file. Declarations
appear in the same order as in the source, with an alphabetical index in a side
panel. For each declaration, we print various pieces of information (Fig. 3).
The declaration name is printed including its full namespace prefix. Lean
declarations have four possible kinds: theorem, definition, axiom, and constant.
We print the declaration kind and use it to color the border of the entry for a
visual cue. The type of the declaration is printed with implicit arguments hidden
by default. This gives an easy reference as to how the declaration can be applied.
Each type can be expanded to display all arguments. When a declaration has a
doc string, it is displayed beneath the type.
Lean represents the type former and constructors of an inductive type as
separate constants. We display them together, mirroring the Lean syntax for an
2 https://leanprover-community.github.io/mathlib_docs/.
Fig. 3. The generated documentation entry for the normed_space type class. The
implicit arguments can be expanded by clicking on {. . .}.
Lean proofs are often developed using tactics. Custom tactics can be written
in the language of Lean as metaprograms, and mathlib includes many such tac-
tics [15, Sect. 6]. It is essential for us to provide an index of the available tools
explaining when and how to use them. Tactic explanations are an example of
decentralized documentation. Their implementations appear in many different
files, interspersed with many other declarations, but users must see a single uni-
fied list. These same concerns apply to the commands defined in mathlib, as well
as to attributes and hole commands, which we do not discuss in this paper.
It is inconvenient to maintain a database of tactics separate from the library.
Since mathlib changes rapidly, such a database would likely diverge from the library before long.
structure tactic_doc_entry :=
(entry_name : string)
(category : doc_category)
(decl_names : list name)
(tags : list string := [])
(description : string := "")
(inherit_description_from : option name := none)
add_tactic_doc
{ entry_name := "linarith",
  category := doc_category.tactic,
  tags := ["arithmetic", "decision procedure"],
  decl_names := [`tactic.interactive.linarith] }
Fig. 4. The information stored in a tactic documentation entry, and the standard way
to register an entry. The text associated with this entry will be the declaration doc
string of tactic.interactive.linarith.
In addition, the doc strings for tactics, which appear as tooltips in supported editors, often contain the same text as a tactic database entry. To avoid these issues, we provide a command add_tactic_doc that registers
a new tactic documentation entry. Another command retrieves all tactic doc
entries that exist in the current environment.
A tactic doc entry (Fig. 4) contains six fields. The command add_tactic_doc
takes this information as input. To avoid duplicating information, the
description field is optional, as this string has often already been written as
a declaration doc string. When description is empty, the command will source
it from the declaration named in inherit_description_from (if provided) or the
declaration named in decl_names (if this list has exactly one element). The HTML
generation tool links each description to its associated declarations.
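For comparison, a hypothetical entry that supplies its description explicitly instead of inheriting a doc string (my_tac and the description text are ours, following the structure shown in Fig. 4):

add_tactic_doc
{ entry_name := "my_tac",
  category := doc_category.tactic,
  decl_names := [`tactic.interactive.my_tac],
  tags := ["example"],
  description := "An illustrative entry with an explicit description." }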
The entry_name field titles the entry. This is typically the name of the tactic or
command, and is used as the header of the doc entry. The category field is either
tactic, command, hole_command, or attribute. These categories are displayed on
separate pages. The decl_names field lists the declarations associated with this
doc entry. Many entries document only a single tactic, in which case this list will
contain one entry, the implementation of this tactic.
The tags field contains an optional list of tags. They can be used to filter
entries in the generated display. The command can be called at any point in any
Lean file, but is typically used immediately after a new tactic is defined, to keep
the documentation close to the implementation in the source code. The HTML
display allows the user to filter declarations by tags—e.g. to view only tactics
related to arithmetic.
The interface surrounding a definition is often developed in the same file as that
definition. We typically explain the design decisions of a given module in the module documentation.
Fig. 5. Library notes can be declared, referenced, and collected anywhere in mathlib.
5 Conclusion
Although there are a growing number of large libraries of formal proofs, both
mathematical and otherwise, little has been written about best practices for
maintaining and documenting these libraries. Ringer et al. [18] note the gap
between proof engineering and software engineering in this respect. Andronick [1]
describes the large-scale deployment of the seL4 verified microkernel, focusing on
the social factors that have led to its success; Bourke et al. [4] describe technical
aspects of maintaining this project. Other discussions of large libraries [3,10]
touch on similar topics. Wenzel [22] explains the infrastructure underlying the
Isabelle Archive of Formal Proofs (AFP), including progress toward building the
AFP with semantic document markup.
Maintaining a Library of Formal Mathematics 265
Sakaguchi [19] describes a tool for checking and validating the hierarchy of
mathematical structures in the Coq Mathematical Components library [13], a
task in the same spirit as our type class linters. Cohen et al. [6] implement a
related tool which greatly simplifies changing this hierarchy.
It is hard to quantify the effect that our linters and documentation have had
on the mathlib community. Fixing issues identified by the instance_priority and
dangerous_instance linters led to performance boosts in the library. Removing
unusable instances and simplification lemmas has also improved performance
and decluttered trace output. More noticeable is the effect on the workload of
maintainers, who can now spend more review time on the deeper parts of library
submissions. Similarly, inexperienced contributors worry less about introducing
subtle mistakes into the library. Users at all levels report frequent use of the
HTML documentation, especially to find information that is not easily available
in an interactive Lean session, such as the list of instances of a given type class.
So far we have only implemented the very basic sanity checks on simp lemmas
described in Sect. 3.4. There are other properties of term rewriting systems that we would like the simp set to have, such as confluence and termination. Kaliszyk and
Sternagel [12] have used completion of term rewriting systems to automatically
derive a simp set for the HOL Light standard library. We plan to implement a
more manual approach, where a linter detects the lack of local confluence and
prints a list of equations for the non-joinable critical pairs. It is then up to the
user to decide how to name, orient, and generalize these new equations.
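A minimal sketch of a non-joinable critical pair (Lean 3 syntax; a, b, and c are throwaway definitions of ours):

def a : ℕ := 0
def b : ℕ := 0
def c : ℕ := 0

@[simp] lemma a_eq_b : a = b := rfl
@[simp] lemma a_eq_c : a = c := rfl

-- a rewrites to both b and c, which are distinct simp-normal forms,
-- so the pair (b, c) is not joinable. The envisioned linter would
-- print the equation b = c, and the user restores local confluence
-- by naming and orienting it, e.g. @[simp] lemma b_eq_c : b = c := rfl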
The current linter framework considers each declaration locally, but we antic-
ipate the need for global tests. The simp_nf linter already goes beyond strictly
local checking: it considers the entire simp set. Another global linter could check
the termination of the simp set. This is a much harder challenge, since checking
termination is undecidable in general. We plan to investigate the integration of
external termination checkers such as AProVE [9].
While many of the features we present are specific to Lean, we believe that
the general considerations apply more broadly: automated validation and docu-
mentation seem essential for a sustainable and scalable library of formal proofs.
Especially in regard to documentation, there is a definite path for coordination
between libraries and systems, possibly aided by tools from the mathematical
knowledge management community.
References
1. Andronick, J.: Successes in deployed verified software (and insights on key social
factors). In: ter Beek, M.H., McIver, A., Oliveira, J.N. (eds.) FM 2019. LNCS,
vol. 11800, pp. 11–17. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-30942-8_2
2. Avigad, J., de Moura, L., Kong, S.: Theorem Proving in Lean. Carnegie Mellon
University (2014)
3. Bancerek, G., et al.: The role of the Mizar Mathematical Library for interactive
proof development in Mizar. J. Autom. Reasoning 61(1–4), 9–32 (2018). https://
doi.org/10.1007/s10817-017-9440-6
4. Bourke, T., Daum, M., Klein, G., Kolanski, R.: Challenges and experiences in
managing large-scale proofs. In: Jeuring, J., et al. (eds.) CICM 2012. LNCS (LNAI),
vol. 7362, pp. 32–48. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-31374-5_3
5. Buzzard, K., Commelin, J., Massot, P.: Formalising perfectoid spaces. In: Proceed-
ings of the 9th ACM SIGPLAN International Conference on Certified Programs
and Proofs, CPP 2020, pp. 299–312. Association for Computing Machinery, New
York (2020). https://doi.org/10.1145/3372885.3373830
6. Cohen, C., Sakaguchi, K., Tassi, E.: Hierarchy Builder: algebraic hierarchies made
easy in Coq with Elpi, February 2020. https://hal.inria.fr/hal-02478907
7. Dahmen, S.R., Hölzl, J., Lewis, R.Y.: Formalizing the solution to the cap set prob-
lem. In: Harrison, J., O’Leary, J., Tolmach, A. (eds.) 10th International Conference
on Interactive Theorem Proving (ITP 2019). Leibniz International Proceedings in
Informatics (LIPIcs), vol. 141, pp. 15:1–15:19. Schloss Dagstuhl-Leibniz-Zentrum
fuer Informatik, Dagstuhl, Germany (2019). https://doi.org/10.4230/LIPIcs.ITP.
2019.15
8. Ebner, G., Ullrich, S., Roesch, J., Avigad, J., de Moura, L.: A metaprogramming
framework for formal verification. PACMPL 1(ICFP), 34:1–34:29 (2017). https://
doi.org/10.1145/3110278
9. Giesl, J., et al.: Analyzing program termination and complexity automatically
with AProVE. J. Autom. Reasoning 58(1), 3–31 (2017). https://doi.org/10.1007/
s10817-016-9388-y
10. Gonthier, G., et al.: A machine-checked proof of the odd order theorem. In: ITP
2013, pp. 163–179 (2013). https://doi.org/10.1007/978-3-642-39634-2_14
11. Han, J.M., van Doorn, F.: A formal proof of the independence of the continuum
hypothesis. In: Proceedings of the 9th ACM SIGPLAN International Conference on
Certified Programs and Proofs, CPP 2020, pp. 353–366. Association for Computing
Machinery, New York (2020). https://doi.org/10.1145/3372885.3373826
12. Kaliszyk, C., Sternagel, T.: Initial experiments on deriving a complete HOL sim-
plification set. In: Blanchette, J.C., Urban, J. (eds.) PxTP 2013. EPiC Series in
Computing, vol. 14, pp. 77–86. EasyChair (2013)
13. Mahboubi, A., Tassi, E.: Mathematical Components (2017)
14. Marlow, S., Peyton-Jones, S.: The Glasgow Haskell Compiler. In: Brown, A., Wil-
son, G. (eds.) The Architecture of Open Source Applications, Volume II (2012)
15. The mathlib Community: The Lean mathematical library. In: CPP, pp. 367–381.
ACM, New York (2020). https://doi.org/10.1145/3372885.3373824
16. de Moura, L., Kong, S., Avigad, J., van Doorn, F., von Raumer, J.: The Lean
theorem prover (system description). In: Felty, A.P., Middeldorp, A. (eds.) CADE
2015. LNCS (LNAI), vol. 9195, pp. 378–388. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-21401-6_26
17. Nipkow, T., Wenzel, M., Paulson, L.C. (eds.): Isabelle/HOL - A Proof Assistant
for Higher-Order Logic. LNCS, vol. 2283. Springer, Heidelberg (2002). https://doi.
org/10.1007/3-540-45949-9
18. Ringer, T., Palmskog, K., Sergey, I., Gligoric, M., Tatlock, Z.: QED at large: a sur-
vey of engineering of formally verified software. Found. Trends Program. Lang.
5(2–3), 102–281 (2019). https://doi.org/10.1561/2500000045
1 Introduction
The Coq Proof Assistant [3] is an Interactive Theorem Prover in which one
proves lemmas using tactic scripts. Individual tactics in these scripts represent
actions that transform the proof state of the lemma currently being proved. A
wide range of tactics exist, with a wide range of sophistication, from simple
inference steps to entire decision procedures and heuristic search procedures.
When proving a lemma, the user’s challenge is to observe the current proof
state and select the appropriate tactic and its arguments to be used. Often the
user makes this decision based on experience with previous proofs. If the current
proof state is similar to a previously encountered situation, then one can expect
that an effective tactic in that situation might also be effective now. Hence, the
user is continuously matching patterns of proof states in their mind and selecting the correct tactic based on these matches.
That is not the only task the user performs, however. When working on a
mathematical development, the user generally has two roles: (1) As a strategist,
the user comes up with appropriate lemmas and sometimes decides on the main
structure of complicated proofs. (2) As a tactician, the user performs the long
and somewhat mindless process of mental pattern matching on proof states,
applying corresponding tactics until the lemma is proved. Many of the steps
in the tactician’s role will be considered “obvious” by a mathematician. Our
This work was supported by the European Regional Development Fund under
the project AI&Reasoning (reg. no. CZ.02.1.01/0.0/0.0/15_003/0000466) and by the
AI4REASON ERC Consolidator grant nr. 649043.
system is meant to replicate the pattern matching process performed in this role,
alleviating the user from this burden. Hence, we have aptly named it Tactician.
To perform its job, Tactician can learn from existing proofs, by looking at
how tactics modify the proof state. Then, when proving a new lemma, the user
can ask the system to recommend previously used tactics based on the current
proof state and even to complete the whole proof using a search procedure based
on these tactic recommendations.
In our previous publication [1], we described the technical details of the machine learning techniques employed by Tactician and measured its automation performance on Coq's standard library. This paper instead gives a quick introduction to Tactician from the user perspective. Details on installation and usage of Tactician can
be found on the project’s website http://coq-tactician.github.io. There, we also
explain how Tactician can be used on large projects with complex dependencies.
2 Design Principles
For our system, we start with the principal goal of learning from previous proofs
to aid the user with proving new lemmas. In Coq, there are essentially two
notions of proof: (1) proof terms expressed in the Gallina language (Coq’s version
of CIC [9]); (2) tactic proof scripts written by the user that can then generate
a Gallina term. Although it is possible to employ machine learning on both
notions, we choose to learn from tactic scripts for two reasons. (1) Tactic scripts
are more high-level and forgiving, which is more suitable for machine learning.
(2) Working on the tactic level allows the user to introduce domain-specific
information to aid the system by writing new tactics. One can teach Tactician
about such tactics merely by using them in hand-written proofs a couple of
times, after which the system will automatically start to use them.
Apart from the principal goal described above, Tactician’s most important
objective is to be usable and remain usable by actual Coq users. To achieve
this usability, Tactician needs to be pleasant to all parties involved, which we
express in four basic “friendliness” tenets: user-friendly, installation-friendly,
integration-friendly, and maintenance-friendly. More concretely, it should be
usable in any editor, with minimal configuration and no time spent training an ML model. Instead, the system should learn on the fly. Tactician should be
tightly integrated with Coq, implemented as a plugin in OCaml without requiring
external toolkits. To ensure ease of installation and to prevent it from becoming
abandonware, it should be entered into the Coq Package Index [2].
The tight integration with Coq makes Tactician function in both Coq's interactive mode and compilation mode. In the next two sections, we describe how
the system is integrated with these modes.
[Figure: overview of Tactician's operation. For each lemma in a development, every executed tactic is recorded together with the proof states before and after it; these (proof state, tactic) pairs form a per-file tactic database. In interactive mode, the suggest and search commands match the current proof state against the database and list or apply candidate tactics.]
When development X.v is then Required by another development file Y.v, the tactic database of X.v is automatically inherited.
5 A Concrete Example
We now give a simple example use-case based on lists. Starting with an empty
file, Tactician is immediately ready for action. We proceed as usual by giving a
standard inductive definition of lists of numbers with their corresponding nota-
tion and a function for concatenation.
Inductive list :=
| nil : list
| cons : nat -> list -> list.

Notation "[]" := nil.
Notation "x :: ls" := (cons x ls).

Fixpoint concat ls1 ls2 :=
  match ls1 with
  | [] => ls2
  | x :: ls1' => x :: (ls1' ++ ls2)
  end
where "ls1 ++ ls2" := (concat ls1 ls2).
We wish to prove some standard properties of concatenation. The first is a
lemma stating that the empty list [] is the right identity of concatenation (the
left identity is trivial).
Lemma concat_nil_r ls : ls ++ [] = ls.
With Tactician installed, we immediately have access to the new tactics suggest
and search. Neither tactic will produce a result when used now since the system
has not had a chance to learn from proofs yet. Therefore, we will have to prove
this lemma by hand.
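One possible hand-written proof, given the definitions above (a sketch: the tactic names are standard Coq, and the hypothesis names are our choice):

Proof.
  induction ls as [| x ls' IH].
  - reflexivity.                    (* [] ++ [] computes to [] *)
  - simpl. rewrite IH. reflexivity. (* x :: (ls' ++ []) = x :: ls' *)
Qed.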
The system has immediately learned from this proof (it was even learning during
the proof) and is now ready to help us with a proof of the associativity of
concatenation.
Lemma concat_assoc ls1 ls2 ls3 : (ls1 ++ ls2 ) ++ ls3 = ls1 ++ (ls2 ++ ls3 ).
Now, if we execute suggest, it outputs the ordered list induction ls1, simpl, f_equal, ... Indeed, using induction as our next tactic is not unreasonable. We
can repeatedly ask suggest for a recommendation after every tactic we input,
which sometimes gives us good tactics and sometimes bad tactics. However,
we can also eliminate the middle-man and execute the search tactic, which
immediately finds a proof.
To cache the proof that is found for the future, we can copy-paste the recon-
struction tactic that Tactician prints into the source file. This example shows
how the system can quickly learn from very little data and with minimal effort
from the user. Of course, this also scales to much bigger developments.
6 Related Work
Tactician takes its main inspiration from the TacticToe [5] system for HOL4.
Our work is similar to TacticToe in principle, but diverges significantly in the
implementation details due to the large differences between HOL4 and Coq, in both their logical systems and practical implementations; see [1].
The most significant factor distinguishing Tactician from other systems for
Coq is its user-friendliness. There are many other interesting ML systems for
Coq, such as ML4PG [8], SEPIA [6], GamePad [7], CoqGym [11], and Prover-
Bot9001 [10]. However, all of these systems are either difficult to install, can only
be used with one editor, need a long time to train their models or do not have
an end-user interface at all. Many such systems are geared towards the AI com-
munity rather than towards the Theorem Proving community. CoqHammer [4]
is the only other system we know of for Coq that has tight integration with Coq
and is directly usable for end-users. For more detailed related work, see [1].
References
1. Blaauwbroek, L., Urban, J., Geuvers, H.: Tactic learning and proving for the Coq
proof assistant. In: LPAR23. EPiC Series in Computing, vol. 73, pp. 138–150.
EasyChair (2020)
2. Coq Development Team: Coq package index. https://coq.inria.fr/opam/www
3. Coq Development Team: The Coq proof assistant, version 8.11.0, October 2019
4. Czajka, L., Kaliszyk, C.: Hammer for Coq: automation for dependent type theory.
J. Autom. Reasoning 61(1–4), 423–453 (2018)
5. Gauthier, T., Kaliszyk, C., Urban, J.: TacticToe: Learning to reason with HOL4
tactics. In: LPAR. EPiC Series in Computing, vol. 46, pp. 125–143. EasyChair
(2017)
6. Gransden, T., Walkinshaw, N., Raman, R.: SEPIA: search for proofs using inferred
automata. In: Felty, A.P., Middeldorp, A. (eds.) CADE 2015. LNCS (LNAI), vol.
9195, pp. 246–255. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-21401-6_16
7. Huang, D., Dhariwal, P., Song, D., Sutskever, I.: GamePad: a learning environment
for theorem proving. In: ICLR (Poster). OpenReview.net (2019)
8. Komendantskaya, E., Heras, J., Grov, G.: Machine learning in Proof General: interfacing interfaces. In: UITP. EPTCS 118, 15–41 (2012)
9. Paulin-Mohring, C.: Inductive definitions in the system Coq rules and properties.
In: Bezem, M., Groote, J.F. (eds.) TLCA 1993. LNCS, vol. 664, pp. 328–345.
Springer, Heidelberg (1993). https://doi.org/10.1007/BFb0037116
10. Sanchez-Stern, A., Alhessi, Y., Saul, L.K., Lerner, S.: Generating correctness proofs
with neural networks. CoRR abs/1907.07794 (2019)
11. Yang, K., Deng, J.: Learning to prove theorems via interacting with proof assis-
tants. In: ICML, Proceedings of Machine Learning Research, vol. 97, pp. 6984–6994
(2019)
Tree Neural Networks in HOL4
Thibault Gauthier(B)
1 Introduction
Applying machine learning to improve proof automation has been an essential
topic in the theorem proving community and contributed to the rise of powerful
automation such as hammers [2]. In these systems, the current machine learning
predictors learn the premise selection task with relative success. However, these
predictors typically rely on a set of syntactic features, and thus, they can hardly
discover semantic patterns. To solve this issue, we propose in this work to rely
on deep learning models to automatically infer appropriate features that better
approximate object semantics. The success of this approach depends heavily
on how the design of the neural network architecture encodes and processes the
input objects. For example, the spatial invariance of convolutional neural networks
makes them successful at interpreting images. Moreover, recurrent networks can
process arbitrarily long sequences of tokens, which is necessary for learning text-
based tasks. In the case of formulas, tree neural networks (TNNs) [8] capture the
compositional nature of the underlying functions as their structure dynamically
imitates the tree structure of the formula considered.
That is why we implement TNNs in HOL4 [9] and evaluate their pattern
recognition abilities on two tasks related to theorem proving. The first task is
to estimate the value of an expression. It is an example of evaluating a formula
in a Tarski-style model, which can in general be useful for conjecturing and
approximate reasoning. The second task is to estimate the truth of a formula.
Acquiring this ability is important for discarding false conjectures and flawed
derivations. These two tasks are only a sample of the many theorem proving
tasks that could be learned. We believe that deep learning models such as TNNs
could be useful to guide automated theorem provers. In practice, the existence
This work has been supported by the European Research Council (ERC) grant
AI4REASON no. 649043 under the EU-H2020 programme. We would like to thank
Josef Urban for his contributions to the final version of this paper.
Fig. 1. Computation flow of a tree neural network on the arithmetical expression 0 × 0 + s(0). The operator s stands for the successor function. Rectangles represent embeddings (in ℝ^d) and rounded squares represent neural networks.
In both experiments, the TNNs have neural network operators (including the head network) with one hidden layer and an embedding dimension d = 12. We follow a training schedule over 200 epochs with a fixed learning rate of 0.02, doubling the batch size after every 50 epochs from 8 up to 64 (i.e., batch sizes 8, 16, 32, and 64 for the four 50-epoch blocks).
1 https://github.com/HOL/examples/AI_TNN/README.md
2 https://github.com/HOL-Theorem-Prover/HOL
3 c679f0c69b397bede9fefef82197f33ec495dd8a
5 https://github.com/deepmind/logical-entailment-dataset
5 Usage
Our deep learning modules allow HOL4 users to train a TNN on a chosen super-
vised learning task with little development overhead. The function train_tnn from the module mlTreeNeuralNetwork is available for this purpose. Its three
arguments are a schedule, an initial TNN, and a couple consisting of training
examples and testing examples.
Initial TNN. To create an initial TNN, the user first needs to gather all operators
appearing in the examples. Then, given an embedding dimension d, for each operator f with arity a, the list of dimensions of N_f is defined as
[a × d, u1, ..., uk, d]
The natural numbers u1, ..., uk are the sizes of the intermediate layers and can be freely chosen by the user; for instance, with d = 12, a binary operator (a = 2) and a single hidden layer of size u1 = 24 would get the dimension list [24, 24, 12]. In the case of a head operator h_i, the input dimension is d and the output dimension is the length of the list l_i. From
the operators (including heads) and the associated dimensions, the user can
randomly initialize the weights of the TNN by calling random_tnn.
6 Conclusion
In this paper, we presented an implementation of tree neural networks (TNNs) in HOL4 that can be used to learn a function on HOL4 formulas from examples. Compared to other machine learning predictors, it excels on the arithmetical evaluation task, as the TNN architecture perfectly reflects the implied bottom-up computation. It also exhibits excellent performance on propositional formulas. It yields better accuracy than an existing implementation of TNNs but falls short of more involved architectures tailored to this particular task. As a way forward, we would like to see whether the observed pattern recognition abilities (understanding) of TNNs transfer to other tasks such as premise selection or higher-order unification, which could have a more direct benefit for proof automation.
References
1. Vaswani, A., et al.: Attention is all you need. In: Guyon, I., et al. (eds.) Advances
in Neural Information Processing Systems 30: Annual Conference on Neural Infor-
mation Processing Systems 2017, 4–9 December 2017, Long Beach, CA, USA, pp.
5998–6008 (2017). http://papers.nips.cc/paper/7181-attention-is-all-you-need
2. Blanchette, J.C., Kaliszyk, C., Paulson, L.C., Urban, J.: Hammering towards QED.
J. Formalized Reasoning 9(1), 101–148 (2016). https://doi.org/10.6092/issn.1972-
5787/4593
3. Chen, T., Guestrin, C.: XGBoost: a scalable tree boosting system. In: Proceedings
of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and
Data Mining, 13–17 August 2016, San Francisco, CA, USA, pp. 785–794 (2016).
https://doi.org/10.1145/2939672.2939785
4. Chvalovský, K.: Top-down neural model for formulae. In: 7th International Con-
ference on Learning Representations, ICLR 2019, 6–9 May 2019, New Orleans, LA,
USA (2019). https://openreview.net/forum?id=Byg5QhR5FQ
5. Cover, T.M., Hart, P.E.: Nearest neighbor pattern classification. IEEE Trans. Inf.
Theory 13(1), 21–27 (1967). https://doi.org/10.1109/TIT.1967.1053964
6. Evans, R., Saxton, D., Amos, D., Kohli, P., Grefenstette, E.: Can neural networks
understand logical entailment? In: 6th International Conference on Learning Repre-
sentations, ICLR 2018, Vancouver, BC, Canada, 30 April–3 May 2018, Conference
Track Proceedings (2018). https://openreview.net/forum?id=SkZxCk-0Z
7. Fan, R., Chang, K., Hsieh, C., Wang, X., Lin, C.: LIBLINEAR: a library for large
linear classification. J. Mach. Learn. Res. 9, 1871–1874 (2008). https://dl.acm.org/
citation.cfm?id=1442794
8. Kiperwasser, E., Goldberg, Y.: Easy-first dependency parsing with hierarchical tree
LSTMs. TACL 4, 445–461 (2016)
9. Slind, K., Norrish, M.: A brief overview of HOL4. In: Mohamed, O.A., Muñoz, C.,
Tahar, S. (eds.) TPHOLs 2008. LNCS, vol. 5170, pp. 28–32. Springer, Heidelberg
(2008). https://doi.org/10.1007/978-3-540-71067-7_6
10. Wang, Q., Kaliszyk, C., Urban, J.: First experiments with neural translation of
informal to formal mathematics. In: Rabe, F., Farmer, W.M., Passmore, G.O.,
Youssef, A. (eds.) CICM 2018. LNCS (LNAI), vol. 11006, pp. 255–270. Springer,
Cham (2018). https://doi.org/10.1007/978-3-319-96812-4_22
Interpreting Mathematical Texts
in Naproche-SAD
In principle, these points were already addressed in the early years of inter-
active theorem proving, e.g., in the Mizar project [7], see also [4]. Other proof
assistants have implemented Mizar-like proof languages with declarative proof
structures [14]. Note, however, that the Mizar language is a restricted formal
language that is not part of commonly used mathematical English.
To reach an even higher degree of naturality, a small number of experimental
proof assistants accept proofs in controlled natural languages (CNL) which are
fully formal subsets of common natural English (with symbolic mathematical
terms). Moreover, input texts may be structured just like proof texts in the
published mathematical literature. This development should eventually lead to
systems that one may term natural proof assistants. In this paper we highlight
some aspects of points 1–3 in view of recent improvements [12] to the previous
Naproche-SAD release [3]. More technical details are contained in an informal
system description that we are also submitting to this conference.
The Evidence Algorithm project (EA), started by V. Glushkov, was inspired by the idea of a system to assist actual mathematical work [13]. It was
centered around the development of a controlled natural language for mathemat-
ics called ForTheL (Formula Theory Language). The project culminated in the
implementation of the proof assistant SAD (System for Automated Deduction)
in the PhD work of Andrei Paskevich [11].
Independently, the Naproche (Natural Proof Checking) initiative [10] devel-
oped a controlled natural language on top of classical first-order logic, with
an emphasis on techniques from formal linguistics. The PhD thesis of Marcos
Cramer demonstrated that formal grammars and discourse representation theory
could deal adequately and efficiently with mathematical proof texts [1].
A few years ago, Naproche adopted and extended the ideas and algorithms of SAD (see [2,5]) because of SAD's superior logical setup and performance.
Naproche-SAD accepts and proof-checks texts like
Definition 1. A natural number p is prime iff p ≠ 0, 1 and for every k such that k | p, we have k = p or k = 1.
Such phrases are based on patterns, without further analysis of their constituent tokens. On the other
other hand ForTheL gives a lot of freedom for the creation of patterns, allowing
rather arbitrary ASCII sequences as tokens.
For the above sample text, the pattern “natural number (−)” with a slot (−)
for some other term can be introduced by a language extension of the form
\begin{definition}
A natural number $p$ is prime iff $p \neq 0, 1$
and for every $k$ such that $k \divides p$,
we have $k = p$ or $k = 1$.
\end{definition}
\begin{theorem}[Euclid’s lemma]
If $p$ is prime and $p \divides m\mul n$
then $p \divides m$ or $p \divides n$.
\end{theorem}
operations). We address this by maintaining two operator tables, along with the
list of relators, and parsing expressions in three steps.
Introduction of Grammatical Number. Naproche-SAD used to have no
concept of grammatical number, treating singular and plural forms completely
synonymously. This can lead to ambiguities. For example, treating “is”/“are”
synonymously in “the maximum of x and y is/are smaller than z” creates an
ambiguity; with the first interpretation being “(the maximum of x and y) is
smaller than z” and the second interpretation being “(the maximum of x) and
y are smaller than z”, where the maximum is understood as an operation on a
list or set. This ambiguity can be resolved with grammatical number.
5 Future Work
The naturalness of an interactive system is the result of a large number of small
natural features. We shall continue to enhance Naproche-SAD in this direction.
References
1. Cramer, M.: Proof-checking mathematical texts in controlled natural language.
Ph.D. thesis, University of Bonn (2013)
2. Frerix, S., Koepke, P.: Automatic proof-checking of ordinary mathematical texts.
In: CICM Informal Proceedings (2018). http://ceur-ws.org/Vol-2307/paper13.pdf
3. Frerix, S., Wenzel, M., Koepke, P.: Isabelle/Naproche (2019). https://sketis.net/
2019/isabelle-naproche-for-automatic-proof-checking-of-ordinary-mathematical-
texts
4. Harrison, J., Urban, J., Wiedijk, F.: Interactive theorem proving. In: Gabbay, D.M.,
Siekmann, J., Woods, J. (eds.) Computational Logic. Handbook of the History of Logic, vol. 9, pp. 266–290. Elsevier, Amsterdam (2014)
5. Koepke, P.: Textbook Mathematics in the Naproche-SAD System. In: CICM Infor-
mal Proceedings (2019). http://cl-informatik.uibk.ac.at/cek/cicm-wip-tentative/
FMM4.pdf
6. Lean community: The Lean mathematical library. https://github.com/leanprover-
community/mathlib
7. Mizar. http://mizar.org/
8. de Moura, L., Kong, S., Avigad, J., van Doorn, F., von Raumer, J.: The Lean
theorem prover. In: Automated Deduction - CADE-25 (2015)
9. Naproche community: A ForTheL Library. https://github.com/naproche-
community/FLib
10. Naproche. https://korpora-exp.zim.uni-duisburg-essen.de/naproche/
11. Paskevich, A.: Méthodes de formalisation des connaissances et des raisonnements
mathématiques: aspects appliqués et théoriques. Ph.D. thesis, Université Paris 12
(2007)
12. Prototype CNL. https://github.com/adelon/nave
13. Glushkov, V.M.: Some problems in the theories of automata and artificial intelli-
gence. Cybern. Syst. Anal. 6, 17–27 (1970). https://doi.org/10.1007/BF01070496
14. Wenzel, M.: Isabelle/Isar - a versatile environment for human-readable formal proof
documents. Ph.D. thesis, TU Munich (2002)
15. Wiedijk, F.: The QED manifesto revisited. In: From Insight to Proof, Festschrift
in Honour of Andrzej Trybulec, pp. 121–133 (2007)
TGView3D: A System for 3-Dimensional
Visualization of Theory Graphs
1 Introduction
Digital libraries of both informal and formal mathematics have reached enormous
sizes. For instance, at least half a dozen theorem prover libraries exceed 10⁵ statements. Thus, it is getting more and more difficult to organize this knowledge
in a way that humans can understand and access it. While library sources,
generated presentations, and IDEs such as PIDE [Wen19] give good access to
local knowledge structures, global properties of the induced knowledge spaces
are very difficult to assess.
Theory graphs provide a good representation for these global properties: the
nodes are theories and the edges are theory morphisms that define interrela-
tions between theories. Concretely, we use OMDoc/MMT [Koh06,RK13], which
distinguishes multiple kinds of morphisms for theory graphs: Most importantly,
inclusions represent the inheritance relation, and views represent translations
and interpretations.
However, standard graph visualization techniques are not ideal for theory
graphs. Inclusions are highly prevalent and induce a directed acyclic subgraph,
which captures the primary structure of the graph, in particular the inheritance
hierarchy; therefore, they must be prioritized in the layout. Views may introduce
cycles or connect very distant theories; therefore, they must be laid out with
care to avoid intersecting edges, which can lead to a messy layout, especially in
The authors were supported by DFG grant RA-1872/3-1, KO 2428/13-1 OAF and EU
grant Horizon 2020 ERI 676541 OpenDreamKit. They are also grateful for hardware
support from and very helpful discussions about layout algorithms with Roberto Grosso
and Marc Stamminger as well as Jonas Müller.
the 2-dimensional case. For example, we have never been satisfied with the visu-
alization and user interaction features that state-of-the-art tools could provide
for our own graph of logic formalizations (LATIN; see [Cod+11]), containing
(only) a few hundred nodes and many includes representing the modular design
of logics and views representing logic translations. We will use this LATIN theory
graph as a running example.
Superseding our previous two-dimensional theory graph viewer [RKM17],
TGView3D is a three-dimensional theory graph visualization tool that adapts
traditional force-directed layout algorithms to make use of hierarchies and clus-
ters in theory graphs, using an approach similar to [DKM06] except extended to
three dimensions.
TGView3D is based on the Unity game engine [UGE]. While there are dedi-
cated tools for interactive 3D graph visualization such as Gephi [BHJ09] or web
applications and frameworks (e.g., based on WebGL), we opted for Unity as it
allows fast implementation of typical 3D interactions and flexible platform sup-
port as well as efficient rendering of large graphs. Unity also allows building two
versions of TGView3D: a WebGL version that we can embed into browser-based
interfaces for casual users, and executables for VR hardware that offer better
performance for power users.
While Unity is proprietary, all of our code is licensed under GPLv3 and
is available at https://github.com/UniFormal/TGView3D. The web application
runs at https://tgview3d.mathhub.info and a demo video for the VR executable
is available at https://youtube.com/watch?v=Mx7HSWD5dwg.
of moving through the graph. To allow crawling through the graph and focus-
ing on the local neighborhood of nodes, we give users the option to hide all
edges except those of selected nodes. Last, to bridge the gap between local and
global exploration, TGView3D can also compute node bicones, which show the
transitive inclusions of a node, i.e., the two trees of nodes that can be reached
by following the inclusion relation forwards and backwards. This gives the user
information about the role of an individual node in relation to the full graph.
Hierarchical Clustering. In Mmt theory graphs, all nodes and edges are labeled in two orthogonal ways: with their logical URI, which follows the namespace structure chosen by the user, and their physical source URL, which follows the project and folder structure of the source files. TGView3D uses this information to define clusters, which are visualized by using the same color for the respective nodes and adding a cluster label. Beyond that, TGView3D permits collapsing these clusters into a single bigger node to reduce graph complexity and enable step-wise graph exploration. In that case, all edges of the original nodes are propagated to the cluster node. This also allows for nested clusters, which is important to efficiently reduce the graph to a size where humans can recognize clear structures and computers can handle the computational load better. With this method, we can compress the graph shown in Fig. 2 drastically (cf. Fig. 4) and still show all edge types at the same time.
Fig. 4. LATIN graph: hierarchic clustering
Indeed, mathematical libraries often yield large theory graphs with a single connected component, and theory graph visualizations should not always be self-contained. As a complementary approach to clustering, TGView3D can also be opened with a subgraph built for a particular theory, containing some neighborhood of that theory. The key difference is that instead of collapsing nodes into clusters, the user preselects a certain cluster to reduce the size of the loaded graph. In that case, TGView3D reveals the origin of external nodes and gives users the option to load the respective subgraphs to add them to the current one, thus gradually increasing the size of the visible subgraph.
TGView3D can be called with URL parameters that govern which graph to load. It can call other systems by opening URLs attached to the nodes and edges, e.g., in response
to user interaction. Thus, every library, namespace, and theory viewed in Math-
Hub allows opening a corresponding subgraph in TGView3D in a new page. Vice
versa, the Mmt URI of every node or edge in TGView3D can be used to view
the sources of the respective object in MathHub (cf. Fig. 5). It is also straight-
forward to add the functionality of opening nodes and edges in a locally running
version of Mmt’s source editor instead.
References
[BHJ09] Bastian, M., Heymann, S., Jacomy, M.: Gephi: an open source software
for exploring and manipulating networks. In: Third International AAAI
Conference on Weblogs and Social Media (2009)
[Cod+11] Codescu, M., Horozal, F., Kohlhase, M., Mossakowski, T., Rabe, F.:
Project abstract: logic atlas and integrator (LATIN). In: Davenport, J.H.,
Farmer, W.M., Urban, J., Rabe, F. (eds.) CICM 2011. LNCS, vol. 6824, pp.
289–291. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-22673-1_24. https://kwarc.info/people/frabe/Research/CHKMR_latinabs_11.pdf
[DKM06] Dwyer, T., Koren, Y., Marriott, K.: Drawing directed graphs using
quadratic programming. IEEE Trans. Visual Comput. Graph. 12(4), 536–
548 (2006)
[Koh06] Kohlhase, M.: OMDoc - An Open Markup Format for Mathematical Documents [version 1.2]. LNAI, vol. 4180. Springer, Heidelberg (2006). https://doi.org/10.1007/11826095. http://omdoc.org/pubs/omdoc1.2.pdf
[RK13] Rabe, F., Kohlhase, M.: A scalable module system. Inf. Comput. 230, 1–54
(2013). http://kwarc.info/frabe/Research/mmt.pdf
[RKM17] Rupprecht, M., Kohlhase, M., Müller, D.: A flexible, interactive theory-
graph viewer. In: Kohlhase, A., Pollanen, M. (eds.) MathUI 2017: The
12th Workshop on Mathematical User Interfaces (2017). http://kwarc.
info/kohlhase/papers/mathui17-tgview.pdf
Simple Dataset for Proof Method Recommendation in Isabelle/HOL
Yutaka Nagashima1,2(B)
1 Czech Technical University in Prague, Prague, Czech Republic
Yutaka.Nagashima@cvut.cz
2 University of Innsbruck, Innsbruck, Austria
1 Introduction
This work was supported by the European Regional Development Fund under the
project AI & Reasoning (reg. no.CZ.02.1.01/0.0/0.0/15 003/0000466) and by NII under
NII-Internship Program 2019-2nd call.
which proof method to recommend to what kind of proof goal from proof documents in Isabelle's standard library and the Archive of Formal Proofs [10].
The key component of PaMpeR is its elaborate feature extractor. Instead of
applying machine learning algorithms to Isabelle’s proof documents directly,
PaMpeR first applies 113 assertions to the pair of a proof goal and its underlying
context. Each assertion checks a certain property about the pair and returns a
boolean value. Some assertions check if a proof goal involves certain constants or
types defined in the standard library. Others check the meta-data of constants
and types appearing in a goal. For example, one assertion checks if the goal has
a term of a type defined with the codatatype keyword.
When developing PaMpeR, we applied these 113 assertions to the proof
method invocations appearing in the proof documents and constructed a dataset
consisting of 425,334 unique data points.
Note that this number is strictly smaller than the total number of proof method invocations available in Isabelle2020 and the Archive of Formal Proofs in May 2020, in which we can find more than 900k proof method invocations. One obvious rea-
son for this gap is the ever growing size of the available proof documents. The
other reason is that we are intentionally ignoring compound proof methods while
producing data points. We decided to ignore them because they may pollute the
database by introducing proof method invocations that are eventually back-
tracked by Isabelle. Such backtracking compound methods may reduce the size
of proof documents at the cost of introducing backtracked proof steps, which
are not necessary to complete proofs. Since we are trying to recommend proof
methods appropriate to complete a proof search, we should not include data
points produced by such backtracked steps.
We trained PaMpeR by constructing regression trees [3] from this dataset.
Even though our tree construction is based on a fixed height and we did not
take advantage of modern development of machine learning research, our cross
evaluation showed PaMpeR can correctly predict experts’ choice of proof methods
for many cases. However, decision tree construction based on a fixed height is an
old technique that tends to cause overfitting and underfitting. We expect that
one can achieve better performance by applying other algorithms to this dataset.
In the following we present the simple dataset we used to train PaMpeR.
Our aim is to provide a dataset that is publicly available at Zenodo [15] and
easily usable for machine learning practitioners without backgrounds in theorem
proving, so that they can exploit the latest development of machine learning
research without being hampered by technicalities of theorem proving.
This is a valid proof script, with which Isabelle can check the correctness of the
conjecture; however, the application of the rule method is hardly appropriate
since the subsequent application of the auto method can discharge the proof
without the preceding rule. For these reasons we take the proof methods chosen
by human proof authors as the correct choice while ignoring other possibilities.
1. assertions that check terms and types appearing in the first sub-goal, and
2. assertions that check how such terms and types are defined in the underlying
proof context.
The first kind of assertions directly check the presence of constructs defined
in the standard library. For example, the 56th assertion checks if the first sub-
goal contains Filter.eventually, which is a constant defined in the standard
library, since the presence of this constant may be a good indicator to recommend the special purpose proof method called eventually_elim. A possible limitation of these assertions is that they cannot directly check the presence of
user-defined constructs because such constructs may not even exist when we
develop the feature extractor.
The second kind of assertions address this issue by checking how constructs
appearing in the first sub-goal are defined in the proof context. For example, the
13th assertion checks if the first sub-goal involves a constant that has one of the
following related rules: the code rule, the ctr rule, and the sel rule.
These related rules are derived by Isabelle when human engineers define
new constants using the primcorec keyword, which is used to define primitively
corecursive functions. Since this assertion checks how constants are defined in
the background context, it can tell that the proof goal at hand is a coinductive
problem. Therefore, if this assertion returns true, maybe the special purpose
method called coinduct would be useful, since it is developed for coinductive
problems. The advantage of this assertions is that it can guess if a problem is a
coinductive problem or not, even though we did not have that problem at hand
when developing the assertion.
Due to the page limit, we expound the further details of the 113 assertions
in our accompanying Appendix [14].
The task for machine learning algorithms is to predict the name of a promising
proof method from the corresponding array of boolean values. Since we often
have multiple equivalently suitable methods for a given proof goal, this learning
task should be seen as a multi-output problem: given an array of boolean values
machine learning algorithms should return multiple candidate proof methods
rather than only one method. Furthermore, this problem should be treated as
a regression problem rather than a classification problem, so that users can see
numerical estimates about how likely each method is suitable for a given goal.
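As a minimal illustration of this setup in Python (a sketch under assumptions of ours: the file name pamper_dataset.csv and its layout of 113 boolean columns followed by a method name are hypothetical, and the fixed-depth regression tree merely mirrors the baseline described above):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

data = np.loadtxt("pamper_dataset.csv", delimiter=",", dtype=str)
X = data[:, :113].astype(int)        # 113 boolean features per goal
methods = sorted(set(data[:, 113]))  # names of the proof methods
# Multi-output targets: y[i, j] = 1 iff method j was used on goal i.
y = np.zeros((len(X), len(methods)))
for i, m in enumerate(data[:, 113]):
    y[i, methods.index(m)] = 1.0

model = DecisionTreeRegressor(max_depth=10).fit(X, y)
scores = model.predict(X[:1])[0]     # numerical suitability estimates
top3 = sorted(zip(scores, methods), reverse=True)[:3]
print(top3)                          # three most promising methods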
References
1. Bansal, K., Loos, S.M., Rabe, M.N., Szegedy, C., Wilcox, S.: HOList: an environ-
ment for machine learning of higher order logic theorem proving. In: Proceedings of
the 36th International Conference on Machine Learning, ICML 2019, Long Beach,
California, USA (2019). http://proceedings.mlr.press/v97/bansal19a.html
2. Blanchette, J.C., Haslbeck, M.W., Matichuk, D., Nipkow, T.: Mining the archive of
formal proofs. In: Kerber, M., Carette, J., Kaliszyk, C., Rabe, F., Sorge, V. (eds.)
CICM 2015. LNCS, vol. 9150, pp. 3–17. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-319-20615-8_1
3. Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and Regres-
sion Trees. Wadsworth (1984)
4. Gauthier, T., Kaliszyk, C., Urban, J.: TacticToe: learning to reason with HOL4 tac-
tics. In: LPAR-21, 21st International Conference on Logic for Programming, Arti-
ficial Intelligence and Reasoning, Maun, Botswana (2017). http://www.easychair.
org/publications/paper/340355
5. Gransden, T., Walkinshaw, N., Raman, R.: SEPIA: search for proofs using inferred
automata. In: Felty, A., Middeldorp, A. (eds.) CADE 2015. LNCS, vol. 9195, pp.
246–255. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-21401-6_16
6. Hales, T.C., et al.: A formal proof of the Kepler conjecture. CoRR abs/1501.02155
(2015). http://arxiv.org/abs/1501.02155
7. Harrison, J.: HOL light: a tutorial introduction. In: Srivas, M., Camilleri, A. (eds.)
FMCAD 1996. LNCS, vol. 1166, pp. 265–289. Springer, Heidelberg (1996). https://
doi.org/10.1007/BFb0031814
8. Harrison, J.: The HOL light theory of euclidean space. J. Autom. Reason. 50(2),
173–190 (2013). https://doi.org/10.1007/s10817-012-9250-9
9. Kaliszyk, C., Chollet, F., Szegedy, C.: HolStep: A machine learning dataset for
higher-order logic theorem proving. In: 5th International Conference on Learning
Representations, ICLR 2017, Toulon, France, Conference Track Proceedings (2017)
10. Klein, G., Nipkow, T., Paulson, L., Thiemann, R.: The archive of formal proofs
(2004). https://www.isa-afp.org/
11. Komendantskaya, E., Heras, J.: Proof mining with dependent types. In: Geuvers,
H., England, M., Hasan, O., Rabe, F., Teschke, O. (eds.) CICM 2017. LNCS, vol.
10383, pp. 303–318. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-62075-6_21
12. Matichuk, D., Murray, T.C., Andronick, J., Jeffery, D.R., Klein, G., Staples, M.:
Empirical study towards a leading indicator for cost of formal software verifica-
tion. In: 37th IEEE/ACM International Conference on Software Engineering, ICSE
2015, Florence, Italy, vol. 1 (2015). https://doi.org/10.1109/ICSE.2015.85
13. Nagashima, Y.: LiFtEr: language to encode induction heuristics for Isabelle/HOL.
In: Lin, A. (ed.) APLAS 2019. LNCS, vol. 11893, pp. 266–287. Springer, Cham
(2019). https://doi.org/10.1007/978-3-030-34175-6_14
14. Nagashima, Y.: Appendix to “simple dataset for proof method recommendation in
Isabelle/HOL (dataset description)”, May 2020. https://doi.org/10.5281/zenodo.
3839417
15. Nagashima, Y.: Simple dataset for proof method recommendation in Isabelle/HOL,
May 2020. https://doi.org/10.5281/zenodo.3819026
16. Nagashima, Y.: Smart induction for Isabelle/HOL (tool paper). CoRR
abs/2001.10834 (2020). https://arxiv.org/abs/2001.10834
17. Nagashima, Y., He, Y.: PaMpeR: proof method recommendation system for
Isabelle/HOL. In: Proceedings of the 33rd ACM/IEEE International Conference on
Automated Software Engineering, ASE 2018, Montpellier, France, 3–7 September
2018, pp. 362–372 (2018). https://doi.org/10.1145/3238147.3238210
18. Nagashima, Y., Kumar, R.: A proof strategy language and proof script generation
for Isabelle/HOL. In: de Moura, L. (ed.) CADE 2017. LNCS, vol. 10395, pp. 528–
545. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-63046-5_32
19. Nagashima, Y., Parsert, J.: Goal-oriented conjecturing for Isabelle/HOL. In: Rabe,
F., Farmer, W., Passmore, G., Youssef, A. (eds.) CICM 2018. LNCS, vol. 11006, pp.
225–231. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-96812-4_19
20. Nipkow, T., Paulson, L.C., Wenzel, M.: Isabelle/HOL - A Proof Assistant for
Higher-Order Logic. Lecture Notes in Computer Science, vol. 2283. Springer, Hei-
delberg (2002). https://doi.org/10.1007/3-540-45949-9
Dataset Description: Formalization
of Elementary Number Theory in Mizar
Adam Naumowicz(B)
1 Introduction
The centrally maintained library of formalizations developed using Mizar [3], the Mizar Mathematical Library (MML [2]), contains over 60,000 theorems and 12,000 definitions. The data is organized into more than 1,300 files representing
articles on various topics. As such, the huge and somewhat eclectic library does
not appear to be the best resource for introducing the Mizar way of formalizing
mathematics to new users or facilitating introductory Mizar-based courses for
math students. For this reason we have started developing a set of easy to com-
prehend Mizar data files which can provide a better starting point for educational
activities. The set is based on examples from elementary number theory which
has an initially relatively steep learning curve, few prerequisites, and provides a
great selection of self-contained proofs. Number theory proofs very often carry
an extra recreational component – statements can amuse the audience by the simplicity and elegance of their form, by references to specific occasions, years or dates, and so on. Such tasks are in line with the educational entertainment approach
to learning which helps perceive the formalization as a challenging but reward-
ing activity. We believe that thanks to mastering the elementary techniques and
familiarizing with the theory’s basic methods one can be prepared to approach
the study and/or formalization of further, more advanced problems.
The Mizar processing has been performed using the infrastructure of the University of
Bialystok High Performance Computing Center.
3 Dataset Characteristics
Similarly to the informal original, the dataset comprises Mizar proofs of prob-
lems on several levels of difficulty. They are available in the form of full proofs,
proof sketches, as well as bare statements equipped with suitable environments
[12] importing necessary notions from the MML. Some ideas were drawn from
F. Wiedijk’s notion of formal proof sketches [18] and J. Alama’s mizar-items [1]
(developed as a means to do reverse mathematics over MML). The generation
of respective files was achieved by means of standard Mizar tools for extracting
article abstracts [6], and optimizing formalization environments [11]. Building
a suitable formalization environment of notions to be imported from the MML
is sometimes a non-trivial task itself, since many number theory facts are scattered around the current library and users need to take care of their various dependencies – apart from proper number theory files, users also need to look for relevant
information in formalizations concerning cryptography (e.g. article PEPIN) or
set theory (article ABIAN), etc. Although the material contained in the prob-
lems is elementary, the collected proofs allow learning also more advanced Mizar
proof methods, like schemes or using Mizar flexary logical connectives [8] (see
references in Table 1).
MML ver. 5.57.1355 (04 June 2019)². The underlying Mizar article is now also
available in the MML as article NUMBER01 [13].
The data is located in directories nump001–nump010, corresponding to the ten initial problems from Sierpinski's book. Each directory contains subdirectories
with:
Beginner Mizar users working with this dataset should be aware of some technical
issues.
E.g., a simple glitch can be seen in the statement of the very first problem:
Sierpinski refers to positive integers, whereas a preferred Mizar way is to use
natural numbers with their built-in automation. The literal encoding of state-
ments with positive integers is of course possible, but requires a technical and
superfluous (as far as the integrity of the MML is concerned) registration (see
e.g. the nump001t file).
We also face numerous differences in the writing style if we intend to mimic
natural language reasoning. E.g., in Mizar we use a lot of then linking to previ-
ous statements instead of using handy congruence chains ubiquitous in informal
² http://mizar.uwb.edu.pl/system/index.html#download
which employs 0-based finite sequences to represent the ellipses available in tra-
ditional mathematics. However, one may note here that this Mizar encoding is
slightly more general than the original statement, because it also covers the triv-
ial case of n = 0 (so n does not have to be strictly positive), since 0 divides 0 according to the definition of the reflexive ‘divides’ predicate.
Table 1 shows more information about the data corresponding to the particular problems, an estimation of their size, and the number of references extracted to form the sketches. One can also see, e.g., which problem can be used to illustrate the use of natural induction, or which proof is based on Fermat's little theorem. Moreover, some proofs make use of either 0- or 1-based finite sequences to encode informal ellipses; depending on the already available MML theories, one or the other approach can be preferable.
References
1. Alama, J.: mizar-items: exploring fine-grained dependencies in the Mizar mathematical library. In: Davenport, J.H., Farmer, W.M., Urban, J., Rabe, F. (eds.) CICM 2011. LNCS, vol. 6824, pp. 276–277. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-22673-1_19
2. Bancerek, G., et al.: The role of the Mizar Mathematical Library for interactive proof development in Mizar. J. Autom. Reason. 61(1–4), 9–32 (2018)
3. Bancerek, G., et al.: Mizar: state-of-the-art and beyond. In: Kerber, M., Carette, J., Kaliszyk, C., Rabe, F., Sorge, V. (eds.) CICM 2015. LNCS (LNAI), vol. 9150, pp. 261–279. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-20615-8_17
4. Bancerek, G., Rudnicki, P.: A compendium of continuous lattices in MIZAR. J. Autom. Reason. 29(3–4), 189–224 (2002)
5. Grabowski, A.: Tarski's geometry modelled in Mizar computerized proof assistant. In: Ganzha, M., Maciaszek, L.A., Paprzycki, M. (eds.) Proceedings of the 2016 Federated Conference on Computer Science and Information Systems, FedCSIS 2016, Gdańsk, Poland, 11–14 September 2016. Annals of Computer Science and Information Systems, vol. 8, pp. 373–381. IEEE (2016)
6. Grabowski, A., Kornilowicz, A., Naumowicz, A.: Mizar in a nutshell. J. Formalized Reason. 3(2), 153–245 (2010)
7. Kaliszyk, C., Urban, J.: MizAR 40 for Mizar 40. J. Autom. Reason. 55(3), 245–256 (2015)
8. Kornilowicz, A.: Flexary connectives in Mizar. Comput. Lang. Syst. Struct. 44, 238–250 (2015)
9. Kornilowicz, A., Naumowicz, A.: Niven's theorem. Formalized Math. 24(4), 301–308 (2016)
Guiding Inferences in Connection Tableau by Recurrent Neural Networks

Bartosz Piotrowski and Josef Urban

1 Introduction
There is a class of machine learning sequence-to-sequence architectures based on recurrent neural networks (RNNs) which are successfully used in the domain of natural language processing, in particular for translation between languages [2]. Recently, such architectures have also proved useful in various tasks in the domain of symbolic computation [6,10,14,16]. The models encode the source sequence into a hidden vector state and decode the target sequence from it.
In this work, we employ such neural methods to choose among the non-deterministic steps in connection-style theorem proving. In more detail, we want to learn the hidden proving states that correspond to the evolving proof trees and condition the next prover steps on them. I.e., from a set of connection tableau proofs we create a dataset (Sect. 2) of source-target training examples of the form (partial proof state, decision) that we then use to train the neural models (Sect. 3). The results are reported in Sect. 4. Section 5 shows an additional experiment with predicting (conjecturing) tableau goals.
B. Piotrowski—Supported by the grant 2018/29/N/ST6/02903 of the National Science Center, Poland.
J. Urban—Supported by the AI4REASON ERC Consolidator grant nr. 649043 and by the Czech project AI&Reasoning CZ.02.1.01/0.0/0.0/15_003/0000466 and the European Regional Development Fund.
The connection tableau setting seems suitable for such methods. The connection proofs grow as branches of a tree rooted in a starting clause. The number of options (clauses) to choose from is relatively small compared to saturation-style provers, where the number of clauses quickly grows to millions during the search. The tableau branches representing the proof states can serve as the sequential input to the RNNs, which can then decode one or more decisions, i.e., choices of clauses.
4 Results
The average results for the above metric are shown in Table 1. We can see that predicting the next clause is much more precise than predicting multiple clauses. The accuracy of predicting the next clause(s) from a sequence of clauses is lower than predicting the next clause(s) from a sequence of literals, which means the literals give more precise information to the neural model.

Table 1. Predictive accuracy of the NMT system trained on two types of source paths (literals or clauses), decoding 1–3 consecutive clauses. 1 or 10 best outputs were decoded and assessed.

The results of the neural model trained on the paths of literals as the input are shown in the second row of Table 2. As expected, the longer the input sequence, the better the prediction. The neural model was capable of taking advantage of a more complex context. This differs significantly from the path-characterization methods using manual features (as in [9]), which just average (possibly with some decay factor) over the features of all literals on the path.
To compare with such methods, we trained a classifier based on gradient boosted trees for this task using the XGBoost system [1], which was used for learning feature-based guidance in [9]. To make the task comparable to the neural methods, we trained XGBoost in a multilabel setting, i.e., for each partial proof state (a path of literals) it learns to score all the available clauses, treated as labels. Due to limited resources, we restrict this comparison to the MPTP2078 subset of the MML, which has 1383 different labels (the clause names).
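As a rough illustration of this setup (not the authors' exact code: the feature extraction, hyperparameters, and data below are placeholders), a multiclass XGBoost model scoring all 1383 clause labels could be trained as follows:

import numpy as np
import xgboost as xgb

# Placeholder featurized data: each row stands for the manual features of
# the literals on one path; each label is the index of the chosen clause.
X_train = np.random.rand(1000, 512)
y_train = np.random.randint(0, 1383, size=1000)

params = {
    "objective": "multi:softprob",  # produce a score for every label
    "num_class": 1383,              # clause names in the MPTP2078 subset
    "max_depth": 9,
    "eta": 0.2,
}
booster = xgb.train(params, xgb.DMatrix(X_train, label=y_train),
                    num_boost_round=200)

# Rank all clauses for a new partial proof state; pick the best-scored one.
scores = booster.predict(xgb.DMatrix(np.random.rand(1, 512)))[0]
print("best clause index:", int(np.argmax(scores)))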
The average performance of XGBoost on predicting the next clause from the (featurized) path of literals was 0.43. This is lower than the performance of the neural model, also using the literals on the path as the input (0.64). The XGBoost performance conditioned on the length of the input path is shown in the third row of Table 2. XGBoost outperforms NMT on shorter input sequences of literals, but on longer paths it gets significantly worse. The performance of the recurrent neural model grows with the length of the input sequence, reaching 0.85 for input length 8. This means that providing more context significantly helps the recurrent neural methods, where the hidden state represents (encodes) the whole path much more precisely. The feature-based representation used by XGBoost cannot reach such precision, which is likely the main reason for its performance flattening early and reaching at most 0.51.
It turns out that this more difficult task is to some extent feasible with NMT. Table 3 shows that NMT could propose the right next literal on the path in a significant number of cases. Again, there is a positive dependence between the length of the input sequence and the predictive performance. Most of the time the correct predictions involve short literals, whereas predicting longer literals is harder. The proposed longer literals often not only fail to match the right ones but also have an improper structure (see Table 4 for examples of the NMT outputs).
Table 4. Literals conjectured by NMT vs. the correct ones. (1) is an example of a
correctly predicted output; in (2) NMT was wrong but proposed a literal which is
similar to the proper one; (3) shows a syntactically incorrect literal produced by NMT.
References
1. Chen, T., Guestrin, C.: XGBoost: a scalable tree boosting system. ACM SIGKDD 2016, 785–794 (2016)
2. Cho, K., van Merrienboer, B., Gülçehre, Ç., Bahdanau, D., Bougares, F., Schwenk,
H., Bengio, Y.: Learning phrase representations using RNN encoder-decoder for
statistical machine translation. EMNLP 2014, 1724–1734 (2014)
3. Chvalovský, K., Jakubuv, J., Suda, M., Urban, J.: ENIGMA-NG: efficient neural
and gradient-boosted inference guidance for E. CADE 27, 197–215 (2019)
4. Evans, R., Saxton, D., Amos, D., Kohli, P., Grefenstette, E.: Can neural networks
understand logical entailment? In: ICLR 2018 (2018)
5. Freitag, M., Al-Onaizan, Y.: Beam search strategies for neural machine translation.
In: NMT@ACL 2017, pp. 56–60 (2017)
First Neural Conjecturing Datasets and Experiments

Josef Urban and Jan Jakubův

Gauthier has been working on term synthesis using Monte-Carlo Tree Search and reinforcement learning with semantic feedback [1,5].
2 Datasets
The datasets for neural conjecturing are available from our web page³. We have so far experimented with the following data:
1. All Mizar articles (MML version 1147), stripped of comments and concatenated together⁴. This is 78 MB of uncompressed text.
2. Text version of the HTML export [14] of the MML articles⁵. This unpacks to 156 MB. It additionally contains disambiguation features such as full types of variables and full names of theorems, and the thesis is printed after every natural deduction step. This seems useful for neural conjecturing because the context is repeated more often.
3. Tokenized TPTP proofs⁶ of 28271 Mizar theorems translated by the MPTP system [15]. The proofs are produced by the E prover [13] equipped with recent ENIGMA guidance [2]. This unpacks to 658 MB.
4. A subselection of the used Mizar premises from the 28271 proofs, printed in prefix notation⁷. These files always start with the conjecture, and the premises are printed in the order in which E used them in its proof. This unpacks to 53 MB.
Below we show short examples of the four kinds of data, all for the theorem
ZMODUL01:103:
theorem
for W being strict Submodule of V holds W /\ W = W
proof
let W be strict Submodule of V;
the carrier of W = (the carrier of W) /\ (the carrier of W);
hence thesis by Def15;
end;
theorem :: ZMODUL01:103
for V being Z_Module
for W being strict Submodule of V holds W /\ W = W
proof
let V be Z_Module; ::_thesis: for W being strict Submodule of V holds W /\ W = W
let W be strict Submodule of V; ::_thesis: W /\ W = W
the carrier of W = the carrier of W /\ the carrier of W ;
hence W /\ W = W by Def15; ::_thesis: verum
end;
³ http://grid01.ciirc.cvut.cz/~mptp/nn_conj20/.
⁴ http://grid01.ciirc.cvut.cz/~mptp/nn_conj20/datasets/mmlall.txt2.
⁵ http://grid01.ciirc.cvut.cz/~mptp/nn_conj20/datasets/html2.tar.gz.
⁶ http://grid01.ciirc.cvut.cz/~mptp/nn_conj20/datasets/prf2.tar.gz.
⁷ http://grid01.ciirc.cvut.cz/~mptp/nn_conj20/datasets/prf7.tar.gz.
c! b0 c=> c& c~ cv2_struct_0 b0 c& cv13_algstr_0 b0 c& cv2_rlvect_1 b0 c& cv3_rlvect_1 ...
c! b0 c=> c& c~ cv2_struct_0 b0 c& cv13_algstr_0 b0 c& cv2_rlvect_1 b0 c& cv3_rlvect_1 ...
c! b0 c! b1 c= ck3_xboole_0 b0 b0 b0
3 Experiments
The basic experiment for each dataset consists of training the smallest (117M parameters) version of GPT-2 on an NVIDIA GeForce GTX 1080 GPU with 12 GB RAM, producing random unconditioned samples during the training. The produced samples and the most recent trained models are available from our web page⁸. The published models can be used for conditional and unconditional generation of Mizar-like texts, proofs and premise completion. The samples contain megabytes of examples of what can be generated and how the generated texts improve during the training. The training on the third dataset was stopped early: the large number of redundant tokens such as brackets and commas led us to produce the fourth dataset, which uses the punctuation-free prefix notation and a much shorter summary of the E proof (just the premises in their order). The training for datasets 1, 2 and 4 has been running for several weeks, with the performance still slowly improving. See Fig. 1 in Appendix A for a training sample and the loss on dataset 2. There are many interesting conjectures generated during the unconditioned sampling. The trained models can be directly used by Mizar users for auto-completion of their texts. Some examples compared to real theorems are shown below. More semantic evaluation on the textual datasets (1 and 2) could be done in various ways. We imagine that a proper Mizar environment will have to be guessed, some assumptions may be automatically added, etc.
# real MML theorem
theorem :: YELLOW10:61
for S, T being non empty up-complete Poset
for X being Subset of S
for Y being Subset of T st X is property(S) & Y is property(S) holds
[:X,Y:] is property(S)
⁸ http://grid01.ciirc.cvut.cz/~mptp/nn_conj20/samples/, http://grid01.ciirc.cvut.cz/~mptp/nn_conj20/models/.
proof
let X, Y be finite set ; ::_thesis: ( not X is empty & X c= Y & card X = card Y implies X = Y )
assume that
A1: not X is empty and A2: X c= Y and A3: card X = card Y ; ::_thesis: X = Y
card (Y \ X) = (card Y) - (card X) by A1, A3, CARD_2:44;
then A4: card (Y \ X) = ((card Y) - 1) - (card X) by CARD_1:30;
X = Y \ X by A2, A3, Th22;
hence X = Y by A4, XBOOLE_0:def_10; ::_thesis: verum
end;
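The kind of training and unconditioned sampling described above could be reproduced along the following lines. This is a minimal sketch using the Hugging Face transformers library, which is an assumption about tooling (the paper does not name its GPT-2 training code), and the prompt is illustrative:

from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")      # the smallest, 117M model
model = GPT2LMHeadModel.from_pretrained("gpt2")  # fine-tune on the Mizar text
                                                 # (training loop omitted here)

# Unconditioned sampling of the kind used to monitor training progress:
prompt = tok.encode("theorem", return_tensors="pt")  # Mizar-like seed token
sample = model.generate(prompt, do_sample=True, max_length=128,
                        temperature=1.0, pad_token_id=tok.eos_token_id)
print(tok.decode(sample[0]))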
We have also done two initial experiments with proof and formula completion. The data and results for them are available from our web page⁹. In the first experiment, we use a model trained on Dataset 4 (premises) and ask the model to auto-complete 369 theorems from the CARD series of Mizar. For each conjecture we produce 10 premise selections using beam search, with different temperatures and beam search parameters. An interesting phenomenon is that with low temperatures, practically all conjectured premises are known Mizar theorems, i.e., the task reduces to standard premise selection. With higher temperatures, GPT-2 starts producing premises (lemmas) that are not among the existing Mizar theorems, but are still well-typed. Even higher temperatures lead to non-well-typed or even unparsable lemmas. The next section provides a more involved ATP evaluation done on a larger dataset.
The second experiment was done over Dataset 2 and a set of 462 partial formulas from the CARD articles. The model trained on Dataset 2 is then (again using beam search) asked to auto-complete these formulas. Mizar users can also play with such autocompletion via a web server¹⁰ using this model. An example completion is available online¹¹.
The first larger ATP (semantic) evaluation uses the fourth dataset, following the setting introduced for such evaluations in [6]. After training GPT-2 on the 28271 ENIGMA proofs, we produce (using beam search) 12 GPT-2 premise predictions for a set of 31792 theorems, of which 6639 are not among the training ones.
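A corresponding generation call producing several premise predictions for one conjecture might look as follows. This is again a hedged sketch with hypothetical parameter values (the experiments vary both the temperature and the beam search parameters), reusing the model and tokenizer from the previous sketch fine-tuned on Dataset 4; the input string is the prefix-notation sample shown earlier:

# `model` and `tok` as in the previous sketch, fine-tuned on Dataset 4.
conjecture = tok.encode("c! b0 c! b1 c= ck3_xboole_0 b0 b0 b0",
                        return_tensors="pt")       # prefix-notation input
preds = model.generate(conjecture, num_beams=12, num_return_sequences=12,
                       max_length=256, early_stopping=True,
                       pad_token_id=tok.eos_token_id)
for p in preds:                                    # 12 premise predictions
    print(tok.decode(p))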
⁹ http://grid01.ciirc.cvut.cz/~mptp/nn_conj20/samples/premises/, http://grid01.ciirc.cvut.cz/~mptp/nn_conj20/samples/html2/.
¹⁰ http://grid01.ciirc.cvut.cz:8000/.
¹¹ http://grid01.ciirc.cvut.cz/~mptp/nn_conj20/samples/html2/00cardmizout1_t1.
¹² http://grid01.ciirc.cvut.cz/~mptp/nn_conj20/results/preds3.tar.gz.
¹³ http://grid01.ciirc.cvut.cz/~mptp/nn_conj20/results/preds5.tar.gz.
¹⁴ http://grid01.ciirc.cvut.cz/~mptp/nn_conj20/results/preds6.tar.gz.
¹⁵ We used E with a 6 s time limit and its auto-schedule mode for this initial check.
¹⁶ http://grid01.ciirc.cvut.cz/~mptp/7.13.01_4.181.1147/html/xxreal_1.html#T48.
¹⁷ http://grid01.ciirc.cvut.cz/~mptp/nn_conj20/results/t48_xxreal_1_5.
¹⁸ http://grid01.ciirc.cvut.cz/~mptp/nn_conj20/results/t48_xxreal_1_5.out.
¹⁹ http://grid01.ciirc.cvut.cz/~mptp/nn_conj20/results/preddatagpt1.out.tar.gz.
²⁰ http://grid01.ciirc.cvut.cz/~mptp/nn_conj20/results/preddatagpt1.tar.gz.
²¹ http://grid01.ciirc.cvut.cz/~mptp/7.13.01_4.181.1147/html/groupp_1.html#T10.
²² http://grid01.ciirc.cvut.cz/~mptp/nn_conj20/results/out4.tar.gz.
²³ http://grid01.ciirc.cvut.cz/~mptp/7.13.01_4.181.1147/html/sincos10.html#T17.
²⁴ http://grid01.ciirc.cvut.cz/~mptp/nn_conj20/results/t17_sincos10_1.
²⁵ http://grid01.ciirc.cvut.cz/~mptp/7.13.01_4.181.1147/html/functor1.html#T9.
E.g., for the theorem²⁵ stating that the composition of full functors is full, GPT-2 proposes to reduce fullness to faithfulness, likely because a previous theorem²⁶ says that faithfulness is preserved under composition. See Appendix A for details.
Finally, we use standard premise selection (although we could recurse and use GPT-2) and E with the ENIGMA guidance to try to prove the 52515 new formulas.²⁷ This yields 9000–10000 proofs,²⁸ depending on how we run premise selection and E. While some proofs are long, it seems that we are not yet capable of proving the more interesting conjectures, and we still need more ATP strength. E.g., the longest ATP proof shows that -infty is non empty, where -infty is defined as [0,REAL]. A slightly more useful conjecture, which is also hard to prove,²⁹ is the strengthening of the symmetry of the are_homeomorphic predicate³⁰ from non-empty to arbitrary spaces.
Funding. Funded by the AI4REASON ERC Consolidator grant nr. 649043 and by the Czech project AI&Reasoning CZ.02.1.01/0.0/0.0/15_003/0000466 and the European Regional Development Fund. We thank K. Chvalovský and T. Gauthier for discussions.
²⁶ http://grid01.ciirc.cvut.cz/~mptp/7.13.01_4.181.1147/html/functor1.html#T7.
²⁷ http://grid01.ciirc.cvut.cz/~mptp/nn_conj20/results/preddata128.tar.gz.
²⁸ http://grid01.ciirc.cvut.cz/~mptp/nn_conj20/results/preddata128.out.tar.gz.
²⁹ http://grid01.ciirc.cvut.cz/~mptp/nn_conj20/results/t20_borsuk_3_7_1.
³⁰ http://grid01.ciirc.cvut.cz/~mptp/7.13.01_4.181.1147/html/borsuk_3.html#R2.
The following are the Mizar premises in the order proposed by GPT-2. The fifth and sixth were not needed for the ATP proof.
let X be ext-real-membered set; let Y be set;
pred X c= Y means :Def8: :: MEMBERED:def 8
for e being ext-real number st e in X holds e in Y;

let r, s be ext-real number;
cluster [.r,s.[ -> ext-real-membered;

theorem :: SUBSET:1
for a, b being set st a in b holds a is Element of b;
References
1. Brown, C.E., Gauthier, T.: Self-learned formula synthesis in set theory. CoRR,
abs/1912.01525 (2019)
2. Chvalovský, K., Jakubův, J., Suda, M., Urban, J.: ENIGMA-NG: efficient neural
and gradient-boosted inference guidance for E. In: Fontaine, P. (ed.) CADE 2019.
LNCS (LNAI), vol. 11716, pp. 197–215. Springer, Cham (2019). https://doi.org/
10.1007/978-3-030-29436-6_12
3. Colton, S.: Automated Theory Formation in Pure Mathematics. Distinguished Dis-
sertations. Springer, London (2012). https://doi.org/10.1007/978-1-4471-0147-5
4. Fajtlowicz, S.: On conjectures of Graffiti. Ann. Discrete Math. 72(1–3), 113–118
(1988)
5. Gauthier, T.: Deep reinforcement learning in HOL4. CoRR, abs/1910.11797 (2019)
6. Gauthier, T., Kaliszyk, C., Urban, J.: Initial experiments with statistical conjec-
turing over large formal corpora. In: CICM 2016 WiP Proceedings, pp. 219–228
(2016)
7. Johansson, M., Rosén, D., Smallbone, N., Claessen, K.: Hipster: integrating theory
exploration in a proof assistant. In: Watt, S.M., Davenport, J.H., Sexton, A.P.,
Sojka, P., Urban, J. (eds.) CICM 2014. LNCS (LNAI), vol. 8543, pp. 108–122.
Springer, Cham (2014). https://doi.org/10.1007/978-3-319-08434-3_9
8. Kaliszyk, C., Urban, J., Vyskočil, J.: Automating formalization by statistical and
semantic parsing of mathematics. In: Ayala-Rincón, M., Muñoz, C.A. (eds.) ITP
2017. LNCS, vol. 10499, pp. 12–27. Springer, Cham (2017). https://doi.org/10.
1007/978-3-319-66107-0_2
9. Kaliszyk, C., Urban, J., Vyskočil, J.: Learning to parse on aligned corpora (Rough
Diamond). In: Urban, C., Zhang, X. (eds.) ITP 2015. LNCS, vol. 9236, pp. 227–233.
Springer, Cham (2015). https://doi.org/10.1007/978-3-319-22102-1_15
10. Lenat, D.B.: AM: an artificial intelligence approach to discovery in mathematics
as heuristic search. Ph.D. thesis, Stanford (1976)
11. Piotrowski, B., Urban, J.: Stateful Premise Selection by Recurrent Neural Networks
(2020)
12. Radford, A., et al.: Language models are unsupervised multitask learners. OpenAI
Blog 1(8), 9 (2019)
13. Schulz, S.: System description: E 1.8. In: McMillan, K., Middeldorp, A., Voronkov,
A. (eds.) LPAR 2013. LNCS, vol. 8312, pp. 735–743. Springer, Heidelberg (2013).
https://doi.org/10.1007/978-3-642-45221-5_49
14. Urban, J.: XML-izing Mizar: making semantic processing and presentation of MML
easy. In: Kohlhase, M. (ed.) MKM 2005. LNCS (LNAI), vol. 3863, pp. 346–360.
Springer, Heidelberg (2006). https://doi.org/10.1007/11618027_23
15. Urban, J.: MPTP 0.2: design, implementation, and initial experiments. J. Autom.
Reasoning 37(1–2), 21–43 (2006)
16. Wang, Q., Brown, C.E., Kaliszyk, C., Urban, J.: Exploration of neural machine
translation in autoformalization of mathematics in Mizar. In: CPP, pp. 85–98
(2020)
17. Wang, Q., Kaliszyk, C., Urban, J.: First experiments with neural translation of
informal to formal mathematics. In: Rabe, F., Farmer, W.M., Passmore, G.O.,
Youssef, A. (eds.) CICM 2018. LNCS (LNAI), vol. 11006, pp. 255–270. Springer,
Cham (2018). https://doi.org/10.1007/978-3-319-96812-4 22
A Contextual and Labeled Math-Dataset
Derived from NIST's DLMF

Abdou Youssef and Bruce R. Miller
1 Introduction
Machine Learning (ML) and ML-based Natural Language Processing (NLP) have started to be applied to math language processing (MLP), math knowledge discovery (MKD), and document processing in STEM fields [1,3,5–7]. This holds great promise for advancement in those areas, but to accomplish that, we need labeled math-datasets to train and test ML models, such as classifiers, part-of-math taggers, summarizers, translators, question-answering systems, and word embedding models. Unlike in traditional ML-NLP applications, there is a dearth of labeled datasets for MLP and MKD. Ginev and Miller recently introduced a large dataset labeled at a coarse granularity [2], but no math dataset labeled at a fine granularity is available at this time.
In this paper, we present a new dataset¹ that we have derived from the widely used Digital Library of Mathematical Functions (DLMF) of NIST [4]. For reasons stated in Sect. 3, the dataset consists of two twin datasets: the per-expression dataset and the Simple-XML dataset.
¹ For now, the dataset is at https://github.com/abdouyoussef/math-dlmf-dataset/.
The full context of each equation or expression is easily and quickly derivable
from the twin datasets, which enables users to identify and fully extract the
sentence containing a given equation/expression, as well as neighboring sentences
or full paragraphs, for contextualized processing needed in many MLP tasks.
Table 2. Names, values and explanations of the context fields of equation records.
We produced the datasets by writing Java software to process the DLMF XML
source files and to extract from them the twin datasets, including the JSON
version of the per-expression dataset. This section describes the details and sizes
of the datasets.
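The authors' tooling is in Java; purely as an illustration of this processing step (with hypothetical file, element, and field names, since the DLMF XML schema is not reproduced here), an analogous extraction could look like this in Python:

import json
import xml.etree.ElementTree as ET

records = []
tree = ET.parse("dlmf_chapter.xml")              # hypothetical input file
for i, eq in enumerate(tree.iter("equation")):   # hypothetical element name
    records.append({
        "id": eq.get("id", "eq-%d" % i),
        "tex": (eq.findtext("tex") or "").strip(),       # hypothetical fields
        "context": (eq.findtext("context") or "").strip(),
    })

with open("per_expression.json", "w") as f:
    json.dump(records, f, indent=2)              # JSON version of the dataset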
A wide variety of applications, from part-of-math tagging to question answering, can make use of the datasets.
References
1. Gao, L., et al.: Preliminary exploration of formula embedding for mathematical
information retrieval: can mathematical formulae be embedded like a natural lan-
guage? arXiv:1707.05154 (2017)
Kevin Buzzard(B)
Formally Verified Constraints Solvers: A Guided Tour

Catherine Dubois(B)
¹ The work presented in [5] concerns ternary constraints but has recently been extended to n-ary constraints.
Thanks. The work presented here is joint work with A. Butant, M. Carlier,
V. Clément, S. Elloumi, A. Gotlieb, A. Ledein and H. Mlodecki. I am very
grateful to them.
References
1. Abdulaziz, M., Mehlhorn, K., Nipkow, T.: Trustworthy graph algorithms (invited
talk). In: Rossmanith, P., Heggernes, P., Katoen, J. (eds.) 44th International Sym-
posium on Mathematical Foundations of Computer Science, MFCS 2019, Aachen,
Germany, 26–30 August 2019, volume 138 of LIPIcs, pp. 1:1–1:22. Schloss Dagstuhl
- Leibniz-Zentrum für Informatik (2019)
2. Bessiere, C.: Constraint propagation. In: Handbook of Constraint Programming,
chapter 3. Elsevier, Amsterdam (2006)
3. Carlier, M., Dubois, C., Gotlieb, A.: A certified constraint solver over finite
domains. In: Giannakopoulou, D., Méry, D. (eds.) FM 2012. LNCS, vol. 7436, pp.
116–131. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-32759-9_12
4. Bessière, C., Régin, J.-C., Yap, R.H.C., Zhang, Y.: An optimal coarse-grained arc consistency algorithm. Artif. Intell. 165, 165–185 (2005)
5. Dubois, C.: Formally verified decomposition of non-binary constraints into equiv-
alent binary constraints. In: Journées Francophones des Langages Applicatifs, Les
Rousses, France (2019)
6. Klein, G., et al.: seL4: formal verification of an operating-system kernel. Commun.
ACM 53(6), 107–115 (2010)
7. Ledein, A., Dubois, C.: Facile en Coq : vérification formelle des listes d'intervalles [Facile in Coq: formal verification of interval lists]. In: Journées Francophones des Langages Applicatifs, Gruissan, France (2020)
8. Leroy, X.: Formal verification of a realistic compiler. Commun. ACM 52, 107–115
(2009)
9. Mackworth, A.: Consistency in networks of relations. Artif. Intell. 8(1), 99–118 (1977)
10. Régin, J.-C.: A filtering algorithm for constraints of difference in CSPs. In: 12th
National Conference on Artificial Intelligence (AAAI 1994), pp. 362–367 (1994)
11. Wetzler, N., Heule, M.J.H., Hunt, W.A.: DRAT-trim: efficient checking and trim-
ming using expressive clausal proofs. In: Sinz, C., Egly, U. (eds.) SAT 2014. LNCS,
vol. 8561, pp. 422–429. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-09284-3_31
Author Index