Faithful Logical Reasoning Via Symbolic Chain of Thought
Jundong Xu1 , Hao Fei1∗, Liangming Pan2 , Qian Liu3 , Mong-Li Lee1 , Wynne Hsu1
1 National University of Singapore, Singapore
2 University of California, Santa Barbara, USA
3 University of Auckland, New Zealand
∗ Corresponding author
jundong.xu@u.nus.edu; haofei37@nus.edu.sg; liangmingpan@ucsb.edu
liu.qian@auckland.ac.nz; dcsleeml@nus.edu.sg; whsu@comp.nus.edu.sg
Abstract

While the recent Chain-of-Thought (CoT) technique enhances the reasoning ability of large language models (LLMs) with the theory of mind, it might still struggle in handling logical reasoning that relies much on symbolic expressions and rigid deducing rules. To strengthen the logical deduction capability of LLMs, we propose a novel Symbolic Chain-of-Thought, namely SymbCoT, a fully LLM-based framework that integrates symbolic expressions and logic rules with CoT prompting. Technically, building upon an LLM, SymbCoT 1) first translates the natural language context into the symbolic format, and then 2) derives a step-by-step plan to solve the problem with symbolic logical rules, 3) followed by a verifier that checks the translation and reasoning chain. Via thorough evaluations on five standard datasets with both First-Order Logic and Constraint Optimization symbolic expressions, SymbCoT shows striking improvements over the CoT method consistently, meanwhile refreshing the current state-of-the-art performances. We further demonstrate that our system advances in more faithful, flexible, and explainable logical reasoning. To our knowledge, this is the first to combine symbolic expressions and rules into CoT for logical reasoning with LLMs. Code is open at https://github.com/Aiden0526/SymbCoT.

Question
Premises: There was one six-way tie on the leaderboard and one person in the six-way tie was from Belgium. Descampe is from Belgium and is on the leaderboard of the 1992 du Maurier Classic.
Is the following statement true, false, or uncertain? Descampe is in the six-way tie in the leaderboard of the 1992 du Maurier Classic.

Figure 1: An illustrative example of logical reasoning via Chain-of-Thought and our proposed Symbolic CoT (SymbCoT).
1 Introduction

Achieving human-like logical reasoning capabilities is crucial for realizing AGI, which plays a pivotal role in enabling intelligent systems to engage in problem-solving, decision-making, and critical thinking. Recently, LLMs (Patel et al., 2023; Hahn et al., 2022) have demonstrated unprecedented capabilities in semantic understanding, casting a beacon of hope toward achieving AGI. Further enhancing LLMs to achieve human-level reasoning abilities, particularly in logical reasoning, is of paramount importance. Logical reasoning (Cummins et al., 1991) stands out as a quintessential form of reasoning that, unlike other types, is crucial and challenging. It epitomizes a cognitive process characterized by rigorous evidence evaluation, argument construction, and logical deduction (Markovits and Vachon, 1989).

The latest trend is integrating LLMs with symbolic solvers to enhance their performance (Olausson et al., 2023; Pan et al., 2023). Unfortunately, these efforts have been limited to using LLMs merely as text-to-symbolic translators, with the core reasoning still reliant on traditional external reasoners (Robinson, 1965). Such an approach, first, does not intrinsically strengthen LLMs' capability in logical reasoning. Besides, over-reliance on external symbolic solvers often results in inflexibility, information omission, and unexplainability.

On another note, the concept of CoT (Wei et al., 2022) has been introduced to mimic human thinking processes by encouraging LLMs to explicitly consider intermediate steps during problem-solving and to provide rationales for decisions, thereby enhancing the reliability of the reasoning process. CoT has been successfully integrated into a wide array of tasks (Zhang et al., 2023b; Fei et al., 2023; Inaba et al., 2023), significantly improving LLMs' reasoning capabilities, sometimes even matching human performance in certain scenarios (Dunivin, 2024). There is growing interest in applying CoT for logical reasoning (Huang and Chang, 2023), and developing advanced strategies such as self-consistency (Wang et al., 2023b) and Tree-of-Thought (Yao et al., 2023) for enhancement. However, applying basic CoT directly to logical reasoning is inherently limited, due to the abstractive nature of language expression. Logical reasoning demands rigorous logical calculations, heavily relying on both symbolic expressions and rigid deducing rules to represent the internal structure of problems. Plain texts often fall short of supporting such precise logic, especially in scenarios that demand strict logical representation. For instance, as shown in Fig. 1, when tackling a logical reasoning problem, utilizing symbolic representations like First-Order Logic (FOL) is more representative and precise than fully natural language rationales in CoT, enabling strict logical reasoning through clear inference rules.

To address these challenges, we introduce a novel Symbolic CoT (namely SymbCoT) for logical reasoning. Unlike existing state-of-the-art (SoTA) LLM-based symbolic reasoning systems (Olausson et al., 2023; Pan et al., 2023), SymbCoT is entirely facilitated by LLMs without relying on any external reasoners/tools, i.e., encompassing both the initial translation and subsequent reasoning phases. Fig. 2 provides a high-level illustration of the overall system workflow. Technically, SymbCoT comprises four main modules: Translator, Planner, Solver, and Verifier. Notably, SymbCoT is characterized by the following three core aspects:

1) SymbCoT integrates symbolic expressions into CoT to describe intermediate reasoning processes, facilitating more precise logical calculations. However, relying solely on symbolic representation still has its limitations, as it often fails to capture certain content, such as implicit intentions or crucial contextual information embedded within questions. Yet LLMs excel at interpreting such nuanced information and contexts. Thus, we consider a combination of symbolic and natural language expressions to leverage the mutual strengths of both: freely expressed implicit intents and contextual information in natural language and rigorous expression in symbolic forms.

2) Unlike the straightforward prompting of "thinking step by step" in vanilla CoT, SymbCoT considers a plan-then-solve architecture. This involves decomposing the original complex problem into a series of smaller, more manageable sub-problems, which are then addressed one by one. This way, the entire reasoning process becomes more trackable, enabling a clearer and more structured approach to problem-solving.

3) Furthermore, we devise a retrospective verification mechanism. At both the translation and subsequent problem-solving stages, we retrospectively validate the correctness of each step's outcome, by tracing back to the original given condition. This verification process ensures the accuracy and reliability of the operations performed during the reasoning process.

In experiments, we test SymbCoT with symbolic expressions of FOL and Constraint Optimization (CO) on five logical reasoning datasets using both GPT-3.5 and GPT-4. Results demonstrate that SymbCoT significantly enhances the reasoning capabilities of vanilla CoT, outperforming current SoTA solutions clearly. We further demonstrate that the more complex the logical reasoning task, the more pronounced the improvement of SymbCoT over vanilla CoT, further with the verification mechanism ensuring the faithfulness of the reasoning process. Our in-depth analysis reveals that fully LLM-based logical reasoning can offer better symbolic syntax robustness, human-readable explanations, and fuller utilization of information.

In summary, our technical contributions are:

• proposing a fully LLM-based logical reasoning framework based on CoT, demonstrating that LLMs can achieve robust logical reasoning capabilities without external reasoning tools. Compared to existing SoTA solutions relying on external resolvers, SymbCoT offers better robustness against translation errors and more human-understandable explanations.

• innovatively integrating the strengths of symbolic forms and natural language expressions, enabling precise reasoning calculations while fully interpreting implicit information and capturing rich contexts.
• introducing a plan-then-solve architecture for CoT reasoning, along with a retrospective verification mechanism, enhancing the faithfulness of the reasoning process.

Figure 2: The overall workflow of SymbCoT. Built entirely upon an LLM, the Translator converts the input premises and question into symbolic premises, the Planner derives a step-by-step plan, the Solver selects premises and produces derived insights and a conclusion, and the Verifier checks the symbolic premises and derived insights to output the verified answer as the final answer.

2 Related work

Recent achievements in reasoning research powered by LLMs have shown promising results (Huang and Chang, 2023; Dunivin, 2024), bringing LLMs closer to human-level reasoning capabilities due to their profound semantic understanding (Wu et al., 2023; Fei et al., 2022). Among these, the CoT series methodology (Wei et al., 2022) has garnered increasing attention for its emulation of human discrete chain reasoning. By considering more intermediate steps and the rationales behind decision-making, CoT has significantly enhanced overall reasoning performance on many downstream applications (Fei et al., 2023, 2024). Subsequent technologies have introduced more advanced reasoning frameworks, incorporating mechanisms such as self-consistency and non-linear, multidimensional topological structures, e.g., Tree-of-Thought (Yao et al., 2023), Graph-of-Thought (Besta et al., 2023; Zheng et al., 2024), and other variants (Zhang et al., 2023b; Wang et al., 2023a).

However, research has also highlighted limitations within CoT due to its reliance on natural language rationales, which may not always be advantageous in certain scenarios. Studies have found that representing CoT's intermediate steps in a structured manner, reflecting the task's intrinsic structure, can bolster reasoning capabilities for specific tasks (Hu et al., 2023; Imani et al., 2023). For instance, using pseudo-code to describe intermediate reasoning processes has been shown to enhance outcomes in code generation tasks (Li et al., 2023), while adopting mathematical equations for CoT's steps has proven beneficial in solving mathematical problems (Imani et al., 2023). Focusing on logical reasoning, it becomes evident that solely using natural language formats for intermediate reasoning steps inevitably leads to significant information loss, especially when tackling complex logical reasoning jobs. This paper, therefore, proposes a symbolic-oriented CoT approach tailored for logical reasoning.

Logical reasoning (Cummins et al., 1991), a paramount aspect of the reasoning domain, demands models that can precisely grasp and manipulate complex logical structures. Previous works have explored rule-based (Robinson, 1965) and neural-based solving (Amayuelas et al., 2022; Gerasimova et al., 2023) methods for interpreting symbolic representations. The latest trend involves integrating LLMs into the symbolic reasoning process (Ye et al., 2023; Gaur and Saunshi, 2023). For example, Logic-LM (Pan et al., 2023) and LINC (Olausson et al., 2023) consider using LLMs as translators to convert natural language into symbolic syntax such as FOL, which is then processed by external reasoning tools to enhance reasoning performance. These approaches maintain that LLMs cannot parse symbolic expressions as reliably as external rule-based reasoners.

Nonetheless, merely utilizing LLMs as translators does not inherently enhance their logical reasoning capabilities. This work pioneers the development of the first symbolic CoT specifically designed for logical reasoning, fully utilizing LLMs. In contrast to approaches like Logic-LM and LINC, our method demonstrates several advancements:
First, external reasoners require strict formatting, where any translation error by LLMs can lead to failure in parsing and reasoning. Our reasoning steps, facilitated by the LLM, exhibit greater robustness against syntax errors. Second, the entire reasoning process is conducted by the LLM, providing rationales that ensure a more human-friendly explanation throughout. Third, we propose a blend of symbolic forms and natural language expressions within the logical reasoning process, achieving precise reasoning calculations while fully interpreting implicit information inherent in natural language. Finally, we introduce a plan-then-solve CoT reasoning architecture and a verification mechanism, ensuring the faithfulness of the reasoning process.

3 SymbCoT for Symbolic Reasoning

3.1 Task Definition

The logical reasoning task is formally defined as follows: given a set of premises P = {p1, p2, . . . , pn}, where each pi represents a logical statement, we aim to derive a conclusion regarding a given statement S. The objective is to determine whether S is true (T), false (F), or unknown (U) based on the logical inferences drawn from the premises.

▶ Example:
<Premises> (P)
A hawk never lands. Some birds are hawks.
<Statement> (S)
All birds land.
<Answer>
False.
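For illustration, this example can be written in FOL (the predicate names Hawk, Bird, and Lands below are our own choice):

P1: ∀x (Hawk(x) → ¬Lands(x)) ::: A hawk never lands.
P2: ∃x (Bird(x) ∧ Hawk(x)) ::: Some birds are hawks.
S: ∀x (Bird(x) → Lands(x)) ::: All birds land.

By P2, some individual b satisfies Bird(b) ∧ Hawk(b); P1 then gives ¬Lands(b), which contradicts S, so S is false.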
3.2 Modules

As shown in Fig. 2, our SymbCoT system is fully supported by LLMs and comprises four distinct modules: Translator, Planner, Solver, and Verifier, whose roles are elaborated as follows.¹

Translator converts the premises and a question statement from natural language to a symbolic format. This process prepares the input in a way that aligns with the structured requirements of subsequent reasoning processes, ensuring that the reasoning problems are represented in a format conducive to logical analysis.

Planner breaks down the raw problem into smaller sub-problems and develops a detailed, step-by-step plan that connects the given premises to the question statement in both natural language and symbolic form. This blueprint guides the structured reasoning process by outlining a logical sequence of steps.

Solver is tasked with deriving the answer through a sequential logical inference process, given the premises and question statement along with the meticulously crafted plan. The Solver module emphasizes the importance of basing logical deductions on symbolic inference rules. For instance, the LLM is explicitly instructed to apply logical deduction techniques adhering to FOL, such as Modus Tollens (Enderton, 1972), e.g., if ¬B and (A → B), then ¬A. This ensures that each step in the reasoning process is grounded in established logical frameworks, to draw accurate and reliable conclusions.

Verifier serves two functions in our framework. First, it validates the correctness of symbolic translations by prompting the LLM to ascertain their semantic equivalence to the corresponding natural language. If discrepancies are identified, the language model refines the symbolic translation based on the original natural language to ensure semantic consistency. Second, it verifies the output of the Solver, containing the step-by-step logical reasoning with the final answer. During this verification process, the Verifier scrutinizes each logic deduction made, to ensure it strictly complies with the principles of formal logic. If invalid logic is found, the Verifier refines the reasoning steps according to the correct logic to arrive at a new answer. These checks ensure both the fidelity of the symbolic translations and the soundness of the logical reasoning.

¹ We use few-shot demonstrations for each module.
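The four modules are realized purely through prompting the same LLM. The following is a minimal sketch of how the translate, plan, solve, and verify calls can be chained; the helper llm() and the abbreviated prompt strings are placeholders rather than the exact prompts of our system (the full prompts are listed in Appendix I).

```python
# Minimal sketch of the SymbCoT translate -> plan -> solve -> verify pipeline.
# `llm` is a placeholder for a chat-completion call (e.g., GPT-4); the prompt
# strings below are abbreviated stand-ins for the full prompts in Appendix I.

def llm(prompt: str) -> str:
    """Placeholder for a call to a large language model."""
    raise NotImplementedError

def symbcot(premises: str, statement: str) -> str:
    # Translator: natural language -> symbolic format (e.g., FOL).
    symbolic = llm(
        "Please parse the context and question into First-Order Logic formulas.\n"
        f"<Premises>\n{premises}\n<Statement>\n{statement}"
    )
    # Keep a hybrid context of natural language plus symbolic expressions.
    context = f"{premises}\n{statement}\n{symbolic}"

    # Planner: decompose the problem into a step-by-step plan.
    plan = llm(
        "Derive a step-by-step plan that connects the premises to the question "
        f"in both natural language and symbolic form.\n{context}"
    )

    # Solver: execute the plan using explicit FOL inference rules.
    solution = llm(
        "Follow the plan and solve the question step by step, indicating which "
        f"First-Order Logic inference rules you use.\n{context}\nPlan:\n{plan}"
    )

    # Verifier: check the translation and every deduction step, refining if invalid.
    verified = llm(
        "Verify whether the execution correctly determines the conclusion; check the "
        "symbolic translation and each inference step, and refine them if necessary.\n"
        f"{context}\nOriginal Execution:\n{solution}"
    )
    return verified  # contains the final true/false/unknown answer
```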
3.3 Reasoning Steps

We now walk through the SymbCoT reasoning steps with detailed examples, each corresponding to the process exemplified in Fig. 2.

Step 1: Translating natural language context into symbolic format. Given the original set of premises P = {p1, p2, . . . , pn} and the question statement S, the Translator first interprets them into the corresponding symbolic format, denoted as P′ = {p′1, p′2, . . . , p′n} and S′. We concatenate the natural language and symbolic expressions, P and P′ as Pc, and S and S′ as Sc, i.e., hybrid-format expressions for the subsequent steps of reasoning.

▶ Input:
Please parse the context and question into First-Order Logic formulas.
<Premises> (P)
If a cartoon character is yellow, it is from the Simpsons.
If a cartoon character is from the Simpsons, then it is loved by children.
(... More premises ...)
<Statement> (S)
Ben is ugly or yellow.

▶ Output:
<Premises> (P′)
∀x (Yellow(x) → Simpsons(x))
∀x (Simpsons(x) → Loved(x))
(... More premises ...)
<Statement> (S′)
(Yellow(ben) ∨ Ugly(ben))

(... Excerpt from the later solving step ...)
Step 1: (...) from premise 1.
Step 2: Simpsons(ben) → Loved(ben) by Modus Ponens from premise 2.
(... More steps ...)
<Conclusion> (C)
Thus, we can conclude that (Yellow(ben) ∨ Ugly(ben)) is false by contradiction.
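For intuition, the kind of rule-based deduction the Solver is instructed to perform can be imitated with a tiny forward-chaining loop over ground facts. This sketch is purely illustrative: the facts ¬Ugly(ben) and ¬Loved(ben) are assumed here to stand in for the elided premises, and in SymbCoT the deduction itself is carried out by the LLM rather than by such a procedure.

```python
# Illustrative application of Modus Ponens / Modus Tollens on ground facts.
# "~" marks negation; the facts ~Loved(ben) and ~Ugly(ben) are assumed premises
# standing in for the elided "(... More premises ...)" above.

rules = [("Yellow(ben)", "Simpsons(ben)"),   # from: forall x (Yellow(x) -> Simpsons(x))
         ("Simpsons(ben)", "Loved(ben)")]    # from: forall x (Simpsons(x) -> Loved(x))
facts = {"~Loved(ben)", "~Ugly(ben)"}

changed = True
while changed:                               # derive new facts until a fixed point
    changed = False
    for antecedent, consequent in rules:
        if antecedent in facts and consequent not in facts:
            facts.add(consequent)            # Modus Ponens:  A, A -> B  |-  B
            changed = True
        if "~" + consequent in facts and "~" + antecedent not in facts:
            facts.add("~" + antecedent)      # Modus Tollens: ~B, A -> B |-  ~A
            changed = True

# ~Loved(ben) yields ~Simpsons(ben) and then ~Yellow(ben); together with ~Ugly(ben),
# the statement (Yellow(ben) v Ugly(ben)) cannot hold, i.e., it is false.
print(sorted(facts))
```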
              PrOntoQA   ProofWriter   FOLIO      Avg
Naive          77.40       52.67       69.11     66.39
CoT            98.79       68.11       70.58     79.16
CoT-SC           -         69.33       68.14       -
ToT              -         70.33       69.12       -
CR               -         71.67       69.11       -
DetermLR         -         79.17       75.45       -
Logic-LM       83.20       79.66       78.92     80.59
SymbCoT        99.60       82.50       83.33     88.47
              (+0.81)     (+2.84)     (+4.41)   (+7.88)

Table 1: Performance (using GPT-4) on symbolic reasoning with First-Order Logic representation. The second-best score is underlined and the best is in bold; in brackets are the improvements of SymbCoT over the second-best result.

Figure 3: Ablation study (accuracy %), comparing Ours, w/o Verifier, w/o Planner&Solver&Verifier, and w/o Translator&Planner&Solver&Verifier, including results on AR-LSAT and on average. Since the Solver is dependent on the Planner, they have to be ablated simultaneously.
              LogicalDeduction   AR-LSAT     Avg
Naive               71.33         33.33     52.33
CoT                 75.25         35.06     55.14
CoT-SC              74.67           -         -
ToT                 76.83           -         -
CR                  78.33           -         -
DetermLR            85.00           -         -
Logic-LM            87.63         43.04     65.34
SymbCoT             93.00         43.91     68.46
                   (+5.37)       (+0.87)   (+3.12)

Table 2: Results (using GPT-4) on symbolic reasoning with Constraint Optimization representation.

4 Experiments

4.1 Settings

Model. Following prior work of Logic-LM (Pan et al., 2023), we assess the baselines and our method using GPT-3.5 (Ouyang et al., 2022) and GPT-4 (OpenAI, 2023)², ensuring a fair comparison with identical model parameters.

² Specifically, we use gpt-3.5-turbo-0613 and gpt-4-0613.

Dataset. Five standard datasets are employed, including PrOntoQA (Saparov and He, 2023), ProofWriter (Tafjord et al., 2021), FOLIO (Han et al., 2022), LogicalDeduction (Srivastava et al., 2022), and AR-LSAT (Zhong et al., 2022). Each of them takes different symbolic representations and introduces its own set of challenges in the topic of logical reasoning. The primary metric for evaluation is accuracy, measuring the multiple-choice correctness of the questions.

Symbolic Structure. In the datasets PrOntoQA, ProofWriter, and FOLIO, we use FOL as the symbolic structure. To test the generalizability of our framework among different symbolic structures, we further consider the CO symbolic expression in the datasets LogicalDeduction and AR-LSAT.

Baseline. We compare with a range of established baselines. Those based on GPT-3.5 are: 1) Naive Prompting; 2) CoT (Wei et al., 2022); 3) Logic-LM (Pan et al., 2023). On GPT-4, apart from the above baselines, we further include more systems: 4) CoT-SC (Wang et al., 2023b); 5) ToT (Yao et al., 2023); 6) Cumulative Reasoning (CR; Zhang et al., 2023a); 7) DetermLR (Sun et al., 2023).

The Appendix extends more details of the symbolic structures (§B), the datasets (§C), and the baselines (§D), as well as the full prompt configurations (§I).

4.2 Main Result

Table 1 shows that our method significantly outperforms the Naive, CoT, and Logic-LM baselines, with gains of 21.56%, 6.11%, and 3.53% on GPT-3.5, and 22.08%, 9.31%, and 7.88% on GPT-4, respectively. We notice the only exception is on the FOLIO dataset with GPT-3.5, failing to surpass Logic-LM. The underperformance points to challenges in non-linear reasoning, reflecting an inherent difficulty for LLMs. However, our approach notably surpasses all baselines across all datasets with GPT-4, especially outperforming Logic-LM by an average of 7.88%, which demonstrates significant improvements in complex reasoning tasks. In Table 2, we show the results with the CO symbolic expression on two datasets. As seen, our approach surpasses both CoT and Logic-LM, by 13.32% and 3.12% respectively, again demonstrating its general versatility across different symbolic reasoning expressions.

4.3 Model Ablation

To ascertain the individual impact of each module within our framework, we perform an ablation study.
Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, Ameet Rahane, Anantharaman S. Iyer, Anders Andreassen, Andrea Santilli, Andreas Stuhlmüller, Andrew M. Dai, Andrew La, Andrew K. Lampinen, Andy Zou, Angela Jiang, Angelica Chen, Anh Vuong, Animesh Gupta, Anna Gottardi, Antonio Norelli, Anu Venkatesh, Arash Gholamidavoodi, Arfa Tabassum, Arul Menezes, Arun Kirubarajan, Asher Mullokandov, Ashish Sabharwal, Austin Herrick, Avia Efrat, Aykut Erdem, Ayla Karakas, et al. 2022. Beyond the imitation game: Quantifying and extrapolating the capabilities of language models. CoRR, abs/2206.04615.

Hongda Sun, Weikai Xu, Wei Liu, Jian Luan, Bin Wang, Shuo Shang, Ji-Rong Wen, and Rui Yan. 2023. From indeterminacy to determinacy: Augmenting logical reasoning capabilities with large language models. CoRR, abs/2310.18659.

Oyvind Tafjord, Bhavana Dalvi, and Peter Clark. 2021. ProofWriter: Generating implications, proofs, and abductive statements over natural language. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, pages 3621–3634.

Jiacheng Ye, Chengzu Li, Lingpeng Kong, and Tao Yu. 2023. Generating data for symbolic language with large language models. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 8418–8443.

Yifan Zhang, Jingqin Yang, Yang Yuan, and Andrew Chi-Chih Yao. 2023a. Cumulative reasoning with large language models. CoRR, abs/2308.04371.

Zhuosheng Zhang, Aston Zhang, Mu Li, and Alex Smola. 2023b. Automatic chain of thought prompting in large language models. In Proceedings of the International Conference on Learning Representations.

Li Zheng, Hao Fei, Fei Li, Bobo Li, Lizi Liao, Donghong Ji, and Chong Teng. 2024. Reverse multi-choice dialogue commonsense inference with graph-of-thought. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 19688–19696.

Wanjun Zhong, Siyuan Wang, Duyu Tang, Zenan Xu, Daya Guo, Yining Chen, Jiahai Wang, Jian Yin, Ming Zhou, and Nan Duan. 2022. Analytical reasoning of text. In Proceedings of the North American Chapter of the Association for Computational Linguistics, pages 2306–2319.
B Symbolic Representation
B.1 First-Order Logic
First-Order Logic, also known as predicate logic or first-order predicate calculus, is a formal system used
in mathematics, philosophy, linguistics, and computer science to express statements about objects and their
relationships. It extends propositional logic by introducing quantified variables that can represent objects
in a domain of discourse and predicates that can express properties of objects and relations between them.
The syntax of First-Order Logic (FOL) provides a formal structure for constructing expressions that
can represent statements about objects, their properties, and the relationships between them. It defines the
rules for forming well-formed formulas (WFFs), which are expressions in FOL that can be evaluated as
true or false. Below, we introduce the key components of FOL syntax:
Syntax. The syntax of FOL consists of a set of symbols divided into several categories:
• Variables: Symbols used to stand for objects in the domain of discourse (e.g., x, y, z).
• Constant symbols: Names that refer to specific objects in the domain (e.g., a, b, c).
• Predicate symbols: Symbols that represent properties of objects or relations between objects, with an
associated arity (e.g., P, Q, R).
• Function symbols: Symbols representing functions from objects to objects, also with arity (e.g.,
f, g, h).
• Quantifiers: Symbols expressing quantification over objects, including the universal quantifier (∀)
and the existential quantifier (∃).
Terms Terms are the simplest elements in FOL that refer to objects in the domain. Examples include:
• A variable: x
• A constant: a
Atomic Formulas (or Atoms) Atomic formulas are formed by applying a predicate symbol to the
appropriate number of terms. An example of an atomic formula is:
P(x, a)
Quantified Formulas Formulas can be combined with the quantifiers introduced above:
• Universal quantification: ∀x P(x), stating that P holds for every x.
• Existential quantification: ∃x P(x), stating that there exists an x for which P holds.
Well-formed Formulas (WFFs) Expressions that adhere to the syntactic rules of FOL are considered
well-formed. These expressions include terms, atomic formulas, and complex formulas constructed
according to the rules of formation. The syntax ensures that the expressions are meaningful within the
logic.
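For instance, combining the components above, the natural-language sentence "every rabbit is cute, and some rabbit is still" (an example of our own, reusing predicates that appear in Appendix H) corresponds to the well-formed formula:

∀x (Rabbit(x) → Cute(x)) ∧ ∃y (Rabbit(y) ∧ Still(y))

Both conjuncts are built from the atomic formulas Rabbit(x), Cute(x), Rabbit(y), and Still(y) using quantifiers and connectives, so the whole expression is a WFF.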
B.2 Constraint Optimization
A constraint optimization (constraint satisfaction) problem is specified by:
• A set of variables, each of which can take on values from a given domain.
• A set of constraints that specify allowable combinations of values for subsets of variables.
Its key components are:
• Variables: The elements that represent the aspects of the problem being optimized (e.g., x, y, z).
• Domains: The possible values that each variable can take (e.g., D(x), D(y), D(z)).
• Constraints: Conditions or relations that must be satisfied by the variables (e.g., x + y ≤ z).
• Objective Function: A mathematical expression that defines the goal of the optimization, which can be either maximization or minimization (e.g., f(x, y, z) = x² + y² + z²).
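To make these components concrete, the following small sketch of our own encodes the vehicle-ordering example later used in Appendix I.1.4 (domain {1, 2, 3}, with 1 = oldest) and solves it by plain enumeration, without any external solver.

```python
# Brute-force sketch of a constraint satisfaction problem over the components above,
# using the vehicle-age example from Appendix I.1.4 (1 = oldest, 3 = newest).
from itertools import permutations

variables = ("station_wagon", "convertible", "minivan")

solutions = []
for values in permutations((1, 2, 3)):      # permutations enforce the AllDifferent constraint
    v = dict(zip(variables, values))
    if v["station_wagon"] == 1 and v["minivan"] > v["convertible"]:
        solutions.append(v)

print(solutions)  # [{'station_wagon': 1, 'convertible': 2, 'minivan': 3}]
```

The single satisfying assignment makes the convertible the second-newest vehicle, matching the answer derived in Appendix I.1.4.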
C Dataset Statistics
In alignment with the established norms of prior research, notably, the practices adopted by Logic-LM
(Pan et al., 2023), our study employs identical records for the sake of ensuring consistency and facilitating
comparative analysis. Table 3 shows the statistics of each dataset.
D Baseline Specification
Here are the details of each baseline:
Naive Prompting. This approach involves instructing the LLMs to directly answer questions with the
aid of a few in-context demonstrations, prioritizing straightforward response generation without explicit
reasoning processes;
Dataset            Symbolic Structure   Test Size   Opts Num
PrOntoQA           FOL                  500         2
ProofWriter        FOL                  600         3
FOLIO              FOL                  204         3
LogicalDeduction   CSP                  300         3, 5, 7
AR-LSAT            CSP                  230         5

Table 3: Statistics of the datasets.
Chain-of-Thought (CoT). CoT encourages the LLM to articulate a step-by-step reasoning process
leading to the final answer, supported by in-context demonstrations (Wei et al., 2022);
Logic-LM Distinguished by its utilization of the LLM to translate natural language contexts and
questions into a symbolic representation, this baseline leverages external solvers to deduce answers,
offering a hybrid approach to problem-solving (Pan et al., 2023).
CoT-SC An advanced version of CoT, which employs self-consistency by an ensemble strategy through
majority-voting among multiple CoTs (Wang et al., 2023b);
Tree-of-thought (ToT) This approach enables LLMs to explore multiple reasoning pathways through a
decision-making tree structure, incorporating backtracking via a search algorithm to identify the globally
optimal reasoning path (Yao et al., 2023);
Cumulative Reasoning (CR) CR utilizes the LLMs in a cumulative and iterative manner, mimicking
the human thought process by building upon previous reasoning steps to gradually approach the solution
(Zhang et al., 2023a);
DetermLR This method constructs a logical reasoning process, transitioning from indeterminate
premises to a determinate conclusion iteratively, and integrates external modules such as memory storage
to support this process (Sun et al., 2023).
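Of these baselines, the self-consistency ensemble of CoT-SC is simple enough to sketch; sample_chain below is a placeholder for one sampled chain-of-thought run that returns a final answer.

```python
# Minimal sketch of the CoT-SC ensemble: sample several chains of thought and
# majority-vote over their final answers. `sample_chain` is a placeholder.
from collections import Counter

def cot_sc(question: str, sample_chain, n_samples: int = 5) -> str:
    answers = [sample_chain(question) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]  # most frequent answer wins
```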
To ensure efficient resource management, we adopt the experimental data from Logic-LM (Pan et al.,
2023) and DetermLR (Sun et al., 2023) for comparative analysis, as the datasets and records used in our experiments are exactly the same. The primary metric for evaluation is accuracy, reflecting the
multiple-choice format of the questions across the datasets.
E Prediction Behavior
We present the predicting behavior of Chain-of-Thought (CoT), Logic-LM, and our method by showing
the percentage of predicting each option in Fig. 9. Please note that for Logic-LM, to see the behavior of
the external solver (Prover9), we turn off their backup strategy which uses the result of CoT when the
records containing syntax errors cannot be executed by the external solver. We found that Logic-LM, using an external solver, tends to predict "Unknown" more often. This could be attributed to the loss of information during the translation process, which results in insufficient data for accurately deducing the correct answers. In such instances, these tools opt to predict the outcome as "Unknown" for answers that are actually "True" or "False", as a consequence of this information loss.

Figure 10: The Precision, Recall, and F1 score for CoT, Logic-LM and Ours on FOLIO.
F Evaluation Metrics
In Fig. 10, the data encapsulate a comparative analysis of performance metrics (precision, recall, and F1 score) across three models: CoT (Chain of Thought), Logic-LM, and our proposed method. Our method surpasses the others on F1 score, by 6% over CoT and 3% over Logic-LM.
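Whether Fig. 10 uses macro-averaging or another aggregation is not specified here; the sketch below simply illustrates how precision, recall, and F1 can be computed over the three labels (true/false/unknown), with toy predictions rather than the actual FOLIO outputs.

```python
# Macro-averaged precision/recall/F1 over the three labels; the gold/pred lists are toy data.
LABELS = ("true", "false", "unknown")
gold = ["true", "false", "unknown", "true", "false"]
pred = ["true", "false", "false", "true", "unknown"]

def macro_scores(gold, pred):
    ps, rs, fs = [], [], []
    for label in LABELS:
        tp = sum(g == p == label for g, p in zip(gold, pred))
        fp = sum(p == label and g != label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        ps.append(precision); rs.append(recall); fs.append(f1)
    n = len(LABELS)
    return sum(ps) / n, sum(rs) / n, sum(fs) / n

print(macro_scores(gold, pred))
```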
H Output examples
To have a better understanding of how SymbCoT solves logical questions, we select a typical solving
example below that showcases the step-by-step logical deduction.
Input:
Predicates:
• Rabbit(x) ::: x is a rabbit.
• Cute(x) ::: x is cute.
• Turtle(x) ::: x is a turtle.
• Animal(x) ::: x is an animal.
• Squirrel(x) ::: x is a squirrel.
• Skittish(x) ::: x is skittish.
• Still(x) ::: x is still.
Premises:
• ∀x(Rabbit(x) → Cute(x)) ::: All rabbits are cute.
Conclusion:
• (Cute(rock) ∧ Still(rock)) → (Turtle(rock) ∧ Skittish(rock)) ::: If Rock is cute and still, then Rock is a turtle and
skittish.
Output:
I Full Prompting
Below is the detailed prompt of our method across all 5 datasets:
I.1 Ours
I.1.1 ProntoQA
Step-1: Translating natural language context into symbolic
Task Description:
You are given a problem description and a question. The task is to:
1. Define all the predicates in the problem.
2. Parse the problem into logic rules based on the defined predicates.
Problem:
• Each jompus is fruity. Every jompus is a wumpus. Every wumpus is not transparent. Wumpuses are tumpuses. Tumpuses
are mean. Tumpuses are vumpuses. Every vumpus is cold. Each vumpus is a yumpus. Yumpuses are orange. Yumpuses
are numpuses. Numpuses are dull. Each numpus is a dumpus. Every dumpus is not shy. Impuses are shy. Dumpuses are
rompuses. Each rompus is liquid. Rompuses are zumpuses. Alex is a tumpus.
Question:
• True or false: Alex is not shy.
Predicates:
• Jompus(x): Is x a jompus?
• Fruity(x): Is x fruity?
• (... More predicates ...)
Facts:
• Tumpuses(Alex): Alex is a tumpus.
Step-2: Deriving plan based on the natural and symbolic context
Predicates:
• Fruity(x): Is x fruity?
• (... More predicates ...)
Facts:
• Tumpuses(Alex): Alex is a tumpus.
Plan:
To determine whether the statement "Shy(Alex, False)" is true, false, or unknown based on the given premises, we will use a
systematic approach applying First-Order Logic inference rules. Here’s the step-by-step plan:
• 1. Identify the Goal: Our goal is to establish whether Alex is not shy (Shy(Alex, False))
• 2. Utilize Facts about Alex: We have Tumpuses(Alex, True) which states Alex belongs to Tumpuses.
• By following the plan, you can determine whether "Shy(Alex, False)" is true or false.
Step-3: Solving the Problem given the context and plan
Predicates:
• Fruity(x): Is x fruity?
• (... More predicates ...)
Plan:
To determine whether the statement "Shy(Alex, False)" is true, false, or unknown based on the given premises, we will use a
systematic approach applying First-Order Logic inference rules. Here’s the step-by-step plan:
• 1. Identify the Goal: Our goal is to establish whether Alex is not shy (Shy(Alex, False))
• 2. Utilize Facts about Alex: We have Tumpuses(Alex, True) which states Alex belongs to Tumpuses.
• By following the plan, you can determine whether "Shy(Alex, False)" is true or false.
Execution:
Let’s execute the plan step by step, applying First-Order Logic inference rules.
• Utilize Facts about Alex: - Fact: Tumpuses(Alex, True). - Inference Rule: Given Fact. - Logical Process:
Tumpuses(Alex, True) is given.
Step-4: Verifying the translation and solving process
Original Execution:
Let’s execute the plan step by step, applying First-Order Logic inference rules.
• Utilize Facts about Alex: - Fact: Tumpuses(Alex, True). - Inference Rule: Given Fact. - Logical Process:
Tumpuses(Alex, True) is given.
Verification:
To verify the logical validity of the original execution concerning the statement "Shy(Alex, False)" based on the given
premises, let’s assess each step against the requirements of First-Order Logic, assumptions, consistency, and semantic
equivalence.
Context verification:
• The context and query correctly correspond to the information in the original context and are semantically consistent.
• Therefore, after verifying the translation between the original context and symbolic format, and the logical process, the
original conclusion "Shy(Alex, Flase) is true" is valid and remains unchanged.
I.1.2 ProofWriter
Step-1: Translating natural language context into symbolic
Task Description:
You are given a problem description and a question. The task is to:
1. Define all the predicates in the problem.
2. Parse the problem into logic rules based on the defined predicates.
Problem:
• Anne is quiet. Erin is furry. Erin is green. Fiona is furry. Fiona is quiet. Fiona is red. Fiona is rough. Fiona is white. Harry
is furry. Harry is quiet. Harry is white. Young people are furry. If Anne is quiet then Anne is red. Young, green people are
rough. If someone is green then they are white. If someone is furry and quiet then they are white. If someone is young and
white then they are rough. All red people are young.
Question:
• Is the following statement true, false, or unknown? Anne is white.
Translation:
Predicates:
• Quiet(x): Is x quiet?
• Furry(x): Is x furry?
Facts:
• Quite(Anne, True): Anne is quiet.
Step-2: Deriving plan based on the natural and symbolic context
Predicates:
• Furry(x): Is x furry?
• (... More predicates ...)
Facts:
• Quite(Anne, True): Anne is quiet.
Plan:
• 2. Define relevant predicates and facts: We need to consider predicates related to being quiet, red, green, white and
condition related to youth and roughness since these are directly or indirectly related to the query about Anne being whte.
• This plan systematically breaks down the steps necessary to assess the truth value of the statement concerning Anne’s color
based on the predicates, facts, and rules provided, applying First-Order Logic inference rules throughout the process.
Step-3: Solving the Problem given the context and plan
Task:
Based on the premises, question and plan, execute each step by selecting relevant premises and making inferences based
on First-Order Logic inference rules to solve the question. Ensure to follow the plan and indicate what First-Order Logic
inference rules you used.
First-Order Logic Inference Rules:
The First-Order Logic inference rules include but are not limited to: Modus Ponens, Modus Tollens, (... More FOL rules ...)
Predicates:
• Quiet(x): Is x quiet?
• Furry(x): Is x furry?
Facts:
• Quite(Anne, True): Anne is quiet.
Plan:
• 2. Define relevant predicates and facts: We need to consider predicates related to being quiet, red, green, white and
condition related to youth and roughness since these are directly or indirectly related to the query about Anne being whte.
• This plan systematically breaks down the steps necessary to assess the truth value of the statement concerning Anne’s color
based on the predicates, facts, and rules provided, applying First-Order Logic inference rules throughout the process.
Execution:
Let’s execute the plan step by step, applying First-Order Logic inference rules.
• Given the information and the rules, we cannot directly infer "White(Anne, True)" based on the First-Order Logic
inference rules provided. Thus, White(Anne, True) is unknown.
Step-4: Verifying the translation and solving process
Task:
The task is to verify whether the original execution correctly determines the value of the given conclusion based on the
context. You should abide by the First-Order Logic rules when checking the original execution and the conclusion.
When verifying the logical process and the final answer, make sure you fully consider the following aspects before verification:
1. Whether the logical process abides by the First-Order Logic inference rules.
2. Assumption and Consistency Verification. Please check whether the relevant facts or rules used in a logical step are indeed
from the context or inferred from the previous steps. If you think it contains an assumption in the logical steps, you need to
state clearly what assumption it is and verify whether the assumption is from the context and whether it is already inferred from
the previous steps. Please note that an assumption that does not exist from the context or was not inferred from the previous
step will be considered an invalid assumption. Please note that if an assumption is semantically equivalent to context or
information inferred from the context, it should be seen as a valid assumption. For example, "X sees X" can be interpreted as
"X sees itself" since they are semantically equivalent.
3. Whether the translation of First-Order Logic is semantically consistent with the natural language. Make sure that for
compound predicates that are separated by a comma, you should treat the comma as "or". For instance, "All red, rough things
are big." should be interpreted as Both red things and rough things are big. Thus, it should be translated to translated to
"Red($x, True) ∨ Rough($x, True) → Big($x, True)" because being red or rough will make the conditional statement being
true.
4. Refinement of Logical Process: If the original execution violates any of the above steps, refine the logical process. Use the
relevant premises and information correctly derived from previous steps to create a more accurate logical deduction.
5. Handling of Conditional Statements in Conclusions: For conclusions containing conditional statements (e.g., "If..."),
remember that in First-Order Logic, a conditional statement is considered false only when the antecedent is true and the
consequent is false. In all other cases, the conditional statement is considered true.
6. If applicable, please provide a detailed analysis of each step in the logical process and a refined answer at the end, using
the format true/false/unknown. The answer should be one of these three options. 7. Make sure the conclusion aligns with the
information inferred from the logical steps.
Predicates:
• Quiet(x): Is x quiet?
• Furry(x): Is x furry?
• (... More predicates ...)
Facts:
• Quite(Anne, True): Anne is quiet.
• (... More facts ...)
Rules:
• Young($x, True) ⇒ Furry($x, True): Young people are furry.
• (... More rules ...)
Conditional Rules:
• Quite(Anne, True) ⇒ Red($x, True): If Anne is quiet then Anne is red.
• (... More conditional rules ...)
Rules with Compound Predicates
• Young($x, True) ∨ Green($x, True): Young, green people are rough.
• (... More conditional rules ...)
Query:
• White(Anne, True) ::: Anne is white.
Original Execution:
Let’s execute the plan step by step, applying First-Order Logic inference rules.
• 2. The rule "Quite(Anne, True) → Red($x, True)" also contains a typo and should be "Quiet(Anne, True) → Red($x,
True)." This rule incorrectly uses $x instead of specifically stating "Anne," which might lead to confusion. The correct
interpretation for the context should directly indicate that if Anne is quiet, then Anne is red, implying "Quiet(Anne, True)
→ Red(Anne, True)."
• (... More logical verification ...)
• The conclusion that "White(Anne, True)" remains unknown is consistent with the logical assessment of the provided
context, considering the application of First-Order Logic inference rules and the available premises.
I.1.3 FOLIO
Step-1: Translating natural language context into symbolic
Task Description:
Given a problem description and a question, the task is to parse the problem and the question into First-Order Logic formulas.
The grammar of the First-Order Logic formula is defined as follows:
Problem:
• Miroslav Venhoda was a Czech choral conductor who specialized in the performance of Renaissance and Baroque music.
Any choral conductor is a musician. Some musicians love music. Miroslav Venhoda published a book in 1946 called
Method of Studying Gregorian Chant.
Question:
• Is the following statement true, false, or uncertain? Miroslav Venhoda loved music.
Translation:
Predicates:
• Czech(x): x is a Czech person.
• Musician(x): x is a musician.
• Book(x): x is a book.
Premises:
• Czech(miroslav) ∧ ChoralConductor(miroslav) ∧ Specialize(miroslav, renaissance) ∧ Specialize(miroslav, baroque):
Miroslav Venhoda was a Czech choral conductor who specialized in the performance of Renaissance and Baroque music.
Query:
• Love(miroslav, music) ::: Miroslav Venhoda loved music.
Step-2: Deriving plan based on the natural and symbolic context
Predicates:
• Musician(x): x is a musician.
• Book(x): x is a book.
Premises:
• Czech(miroslav) ∧ ChoralConductor(miroslav) ∧ Specialize(miroslav, renaissance) ∧ Specialize(miroslav, baroque):
Miroslav Venhoda was a Czech choral conductor who specialized in the performance of Renaissance and Baroque music.
Query:
• Love(miroslav, music) ::: Miroslav Venhoda loved music.
Plan:
To determine whether the conclusion "Love(miroslav, music) ::: Miroslav Venhoda loved music" is true, false, or unknown
based on the given premises, we can follow these steps:
• 1. Identify the Goal: Our goal is to assess whether it can be logically concluded that Miroslav Venhoda loved music based
on the available premises.
• 2. Define relevant predicates and facts: Premise 1 introduces Miroslav as a Czech choral conductor specializing in
Renaissance and Baroque music. This establishes his professional background and interests but does not directly address
his personal feelings toward music. Premise 2 establishes that all choral conductors are musicians. This is a classification
premise that links the profession of being a choral conductor with being a musician.
• This plan outlines a systematic approach to assessing the provided conclusion against the premises using First-Order Logic
inference rules.
Step-3: Solving the Problem given the context and plan
Task:
Based on the premises, question, and plan, execute each step by selecting relevant premises and making inferences based
on First-Order Logic inference rules to solve the question. Ensure to follow the plan and indicate what First-Order Logic
inference rules you used.
First-Order Logic Inference Rules:
The First-Order Logic inference rules include but are not limited to: Modus Ponens, Modus Tollens, (... More FOL rules ...)
Predicates:
• Czech(x): x is a Czech person.
• Musician(x): x is a musician.
• Book(x): x is a book.
Premises:
• Czech(miroslav) ∧ ChoralConductor(miroslav) ∧ Specialize(miroslav, renaissance) ∧ Specialize(miroslav, baroque):
Miroslav Venhoda was a Czech choral conductor who specialized in the performance of Renaissance and Baroque music.
Query:
• Love(miroslav, music) ::: Miroslav Venhoda loved music.
Plan:
To determine whether the conclusion "Love(miroslav, music) ::: Miroslav Venhoda loved music" is true, false, or unknown
based on the given premises, we can follow these steps:
• 1. Identify the Goal: Our goal is to assess whether it can be logically concluded that Miroslav Venhoda loved music based
on the available premises.
• 2. Define relevant predicates and facts: Premise 1 introduces Miroslav as a Czech choral conductor specializing in
Renaissance and Baroque music. This establishes his professional background and interests but does not directly address
his personal feelings toward music. Premise 2 establishes that all choral conductors are musicians. This is a classification
premise that links the profession of being a choral conductor with being a musician.
• This plan outlines a systematic approach to assessing the provided conclusion against the premises using First-Order Logic
inference rules.
Execution:
To solve the question about whether "Miroslav Venhoda loved music" based on the given premises, let’s proceed systemati-
cally:
• Apply Universal Instantiation to Premise 2: From Premise 2, since Miroslav is a choral conductor, we can infer he is also
a musician based on the rule that all choral conductors are musicians. Using Universal Instantiation and Modus Ponens,
we deduce Musician(miroslav).
• (... More solving steps ...)
• The conclusion that "Miroslav Venhoda loved music" cannot be definitively proven true solely based on the provided
premises and the strict application of First-Order Logic rules. Therefore, the answer is unknown.
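The first inference in this execution can be written out symbolically; premise 2 is shown here under the assumption that it was translated as a universally quantified implication (its symbolic form is elided in the excerpt above):

Premise 2 (assumed form): ∀x (ChoralConductor(x) → Musician(x)) ::: Any choral conductor is a musician.
Universal Instantiation: ChoralConductor(miroslav) → Musician(miroslav)
From Premise 1: ChoralConductor(miroslav)
Modus Ponens: Musician(miroslav)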
Step-4: Verifying the translation and solving process
Task:
The task is to verify whether the original execution correctly determines the value of the given conclusion based on the
context. You should abide by the First-Order Logic rules when checking the original execution and the conclusion.
When verifying the logical process and the final answer, make sure you fully consider the following aspects before verification:
1. Whether the logical process abides by the First-Order Logic inference rules.
2. Assumption and Consistency Verification. Please check whether the relevant facts or rules used in a logical step are indeed
from the context or inferred from the previous steps. If you think it contains an assumption in the logical steps, you need to
state clearly what assumption it is and verify whether the assumption is from the context and whether it is already inferred from
the previous steps. Please note that an assumption that does not exist from the context or was not inferred from the previous
step will be considered an invalid assumption. Please note that if an assumption is semantically equivalent to context or
information inferred from the context, it should be seen as a valid assumption. For example, "X sees X" can be interpreted as
"X sees itself" since they are semantically equivalent.
3. Whether the translation of First-Order Logic is semantically consistent with the natural language. Make sure that for
compound predicates that are separated by a comma, you should treat the comma as "or". For instance, "All red, rough things
are big." should be interpreted as "both red things and rough things are big". Thus, it should be translated to
"Red($x, True) ∨ Rough($x, True) → Big($x, True)", because being red or rough makes the conditional statement
true.
4. Refinement of Logical Process: If the original execution violates any of the above steps, refine the logical process. Use the
relevant premises and information correctly derived from previous steps to create a more accurate logical deduction.
5. Handling of Conditional Statements in Conclusions: For conclusions containing conditional statements (e.g., "If..."),
remember that in First-Order Logic, a conditional statement is considered false only when the antecedent is true and the
consequent is false. In all other cases, the conditional statement is considered true.
6. If applicable, please provide a detailed analysis of each step in the logical process and a refined answer at the end, using
the format true/false/unknown. The answer should be one of these three options.
7. Make sure the conclusion aligns with the information inferred from the logical steps.
Predicates:
• Czech(x): x is a Czech person.
• Musician(x): x is a musician.
• Book(x): x is a book.
Premises:
• Czech(miroslav) ∧ ChoralConductor(miroslav) ∧ Specialize(miroslav, renaissance) ∧ Specialize(miroslav, baroque):
Miroslav Venhoda was a Czech choral conductor who specialized in the performance of Renaissance and Baroque music.
Query:
• Love(miroslav, music) ::: Miroslav Venhoda loved music.
Original Execution:
To solve the question about whether "Miroslav Venhoda loved music" based on the given premises, let’s proceed systemati-
cally:
• Apply Universal Instantiation to Premise 2: From Premise 2, since Miroslav is a choral conductor, we can infer he is also
a musician based on the rule that all choral conductors are musicians. Using Universal Instantiation and Modus Ponens,
we deduce Musician(miroslav).
• (... More solving steps ...)
• The conclusion that "Miroslav Venhoda loved music" cannot be definitively proven true solely based on the provided
premises and the strict application of First-Order Logic rules. Therefore, the answer is unknown.
Verification:
Verification of the original execution involves evaluating the logical process used to determine whether Miroslav Venhoda
loved music, based on the provided context and premises.
Context verification:
• The context and query correctly correspond to the information in the original context and are semantically consistent.
• The original execution’s conclusion that the statement "Love(miroslav, music)" is unknown is correct based on the
available premises. The verification process confirms that the original logical process and conclusion are accurate,
adhering to the principles of First-Order Logic given the information provided.
I.1.4 LogicalDeduction
Step-1: Translating natural language context into symbolic
Question:
• Which of the following is true?
Options:
• A) The station wagon is the second-newest.
Translation:
Variables:
• station_wagon ∈ {1, 2, 3}
• convertible ∈ {1, 2, 3}
• minivan ∈ {1, 2, 3}
Constraints:
• station_wagon == 1: The station wagon is the oldest.
• (... More constraints ...)
Query:
• (... Queries for the options ...)
Step-2: Deriving plan based on the natural and symbolic context
Variables:
• convertible ∈ {1, 2, 3}
• minivan ∈ {1, 2, 3}
Constraints:
• station_wagon == 1: The station wagon is the oldest.
Query:
Plan: To approach the constraint optimization problem described, involving variables like station_wagon, convertible, and
minivan with their respective constraints, you can follow a systematic plan.
• 1. Identify Variables and their domain: List all the variables (station_wagon, convertible, minivan) and note their possible
values (1, 2, 3), where 1 represents the oldest and 3 represents the newest.
• 2. Apply the constraints to narrow down the possibility: The station_wagon is the oldest.
The minivan is newer than the convertible.
All vehicles (station_wagon, convertible, minivan) must have different ages.
• This plan provides a systematic approach to solving the problem without directly computing the solution. It emphasizes
understanding constraints, logical deduction, and iterative refinement to navigate toward a solution that satisfies all given
conditions.
Step-3: Solving the Problem given the context and plan
Variables:
• convertible ∈ {1, 2, 3}
• minivan ∈ {1, 2, 3}
Constraints:
• station_wagon == 1: The station wagon is the oldest.
Query:
Plan: To approach the constraint optimization problem described, involving variables like station_wagon, convertible, and
minivan with their respective constraints, you can follow a systematic plan.
• 1. Identify Variables and their domain: List all the variables (station_wagon, convertible, minivan) and note their possible
values (1, 2, 3), where 1 represents the oldest and 3 represents the newest.
• 2. Apply the constraints to narrow down the possibility: The station_wagon is the oldest.
The minivan is newer than the convertible.
All vehicles (station_wagon, convertible, minivan) must have different ages.
• This plan provides a systematic approach to solving the problem without directly computing the solution. It emphasizes
understanding constraints, logical deduction, and iterative refinement to navigate toward a solution that satisfies all given
conditions.
Execution:
Let’s solve the given constraint optimization problem following the outlined plan:
• Identify variables and their domains: Variables: station_wagon, convertible, minivan. Domain for each: 1 (oldest), 2, 3
(newest)
• station_wagon == 1: Station wagon is the oldest.
minivan > convertible: Minivan is newer than the convertible.
All vehicles have different ages.
• Apply Direct Constraints: Assign station_wagon the value 1 since it’s the oldest.
• Based on the configurations that satisfy all constraints, the only query that holds true is convertible == 2, making it the
second-newest vehicle. Therefore, the final answer is B.
Step-4: Verifying the translation and solving process
Variables:
• convertible ∈ {1, 2, 3}
• minivan ∈ {1, 2, 3}
Constraints:
• station_wagon == 1: The station wagon is the oldest.
• minivan > convertible: The minivan is newer than the convertible.
• AllDifferentConstraint([station_wagon, convertible,
minivan]): All vehicles have different values.
Query:
Original Execution:
Let’s solve the given constraint optimization problem following the outlined plan:
• Identify variables and their domains: Variables: station_wagon, convertible, minivan. Domain for each: 1 (oldest), 2, 3
(newest)
• station_wagon == 1: Station wagon is the oldest.
minivan > convertible: Minivan is newer than the convertible.
All vehicles have different ages.
• Apply Direct Constraints: Assign station_wagon the value 1 since it’s the oldest.
• (... More solving steps ...)
• Based on the configurations that satisfy all constraints, the only query that holds true is convertible == 2, making it the
second-newest vehicle. Therefore, the final answer is B.
Verification:
Verification of the translation and the Original Execution:
Context verification:
• The context and query correctly correspond to the information in the original context and are semantically consistent.
Logical Process Verification:
• Given the constraints and the domain’s interpretation, let’s verify the original execution:
• 1. Station Wagon’s Age: As the oldest, its correct assignment is 1. This matches both the symbolic and natural language
descriptions.
• 2. Convertible and Minivan Relationship: The constraint "minivan > convertible" suggests the minivan is newer (higher
number) than the convertible. Given the domain and the direct constraint on the station wagon, the convertible cannot
be the oldest (1), and the minivan cannot be the oldest due to its relationship with the convertible. This necessitates a
careful review of the possible configurations.
• (... More logical verification ...)
• The original solution’s interpretation of the constraints and the final answer is consistent with the domain’s direction and
the natural language description of the constraints. The logic applied matches the expected outcomes, and no changes
are necessary to the symbolic format or the conclusion based on the provided context and constraints. Therefore, the
original answer B remains unchanged.
I.1.5 AR-LSAT
Step-1: Translating natural language context into symbolic
Task Description:
You are given a problem description. The task is to parse the problem as a constraint satisfaction problem,
defining the domain, variables, and constraints.
Problem:
• Four boys—Fred, Juan, Marc, and Paul—and three girls—Nita, Rachel, and Trisha—will be assigned to a row of five
adjacent lockers, numbered consecutively 1 through 5, arranged along a straight wall. The following conditions govern the
assignment of lockers to the seven children: Each locker must be assigned to either one or two children, and each child
must be assigned to exactly one locker. Each shared locker must be assigned to one girl and one boy. Juan must share a
locker, but Rachel cannot share a locker. Nita’s locker cannot be adjacent to Trisha’s locker. Fred must be assigned to
locker 3.
Question:
• If the first three lockers are assigned to girls, which one of the following must be true?
Options:
• A) Juan is assigned to locker 1.
• B) Nita is assigned to locker 3.
• C) Trisha is assigned to locker 1.
• D) Juan is assigned to the same locker as Trisha.
• E) Paul is assigned to the same locker as Trisha.
Translation:
Domain
• Lockers: 1 to 5
Variables
• Fred_locker ∈ {1, 2, 3, 4, 5}
• Juan_locker ∈ {1, 2, 3, 4, 5}
• Marc_locker ∈ {1, 2, 3, 4, 5}
• Paul_locker ∈ {1, 2, 3, 4, 5}
• Nita_locker ∈ {1, 2, 3, 4, 5}
• Rachel_locker ∈ {1, 2, 3, 4, 5}
• Trisha_locker ∈ {1, 2, 3, 4, 5}
Constraints
1. Each Locker One or Two Children: Each locker can have either one or two children assigned to it.
2. Each Child One Locker: Each child is assigned to exactly one locker.
3. Shared Locker Boy and Girl: If a locker is shared, it must be shared between one boy and one girl.
4. Juan Must Share: Juan_locker must be the same as one girl’s locker.
5. Rachel Cannot Share: Rachel_locker must be unique and not shared with any other child.
6. Nita Not Adjacent to Trisha: |Nita_locker − Trisha_locker| ≠ 1
7. Fred Assigned to Locker 3: Fred_locker = 3
8. First Three Lockers to Girls: At least one girl must be assigned to each of the lockers 1, 2, and 3.
Queries for Options (If the first three lockers are assigned to girls, which one of the following must be true?)
• A) Juan is assigned to locker 1. Query: Juan_locker = 1 and Juan_locker is shared with a girl
• B) Nita is assigned to locker 3. Query: Nita_locker = 3
• C) Trisha is assigned to locker 1. Query: Trisha_locker = 1
• D) Juan is assigned to the same locker as Trisha. Query: Juan_locker = Trisha_locker
• E) Paul is assigned to the same locker as Trisha. Query: Paul_locker = Trisha_locker
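As a sanity check of the formalization above (a script of our own, not part of the released prompts), the puzzle and its option queries can be brute-forced directly:

```python
# Brute-force check of the locker puzzle as formalized above. This is our own
# illustrative verification script, not part of the SymbCoT prompting pipeline.
from itertools import product

boys = ("Fred", "Juan", "Marc", "Paul")
girls = ("Nita", "Rachel", "Trisha")
children = boys + girls

def valid(a):
    # a maps each child to a locker in 1..5
    occupants = {l: [c for c in children if a[c] == l] for l in range(1, 6)}
    if any(len(o) not in (1, 2) for o in occupants.values()):
        return False                      # each locker gets one or two children
    if any(len(o) == 2 and sum(c in boys for c in o) != 1 for o in occupants.values()):
        return False                      # a shared locker pairs one boy with one girl
    if len(occupants[a["Juan"]]) != 2:
        return False                      # Juan must share
    if len(occupants[a["Rachel"]]) != 1:
        return False                      # Rachel cannot share
    if abs(a["Nita"] - a["Trisha"]) == 1:
        return False                      # Nita's locker is not adjacent to Trisha's
    if a["Fred"] != 3:
        return False                      # Fred is assigned to locker 3
    return all(any(c in girls for c in occupants[l]) for l in (1, 2, 3))  # lockers 1-3 each get a girl

solutions = []
for lockers in product(range(1, 6), repeat=len(children)):
    assignment = dict(zip(children, lockers))
    if valid(assignment):
        solutions.append(assignment)

for name, query in [("A", lambda a: a["Juan"] == 1),
                    ("B", lambda a: a["Nita"] == 3),
                    ("C", lambda a: a["Trisha"] == 1),
                    ("D", lambda a: a["Juan"] == a["Trisha"]),
                    ("E", lambda a: a["Paul"] == a["Trisha"])]:
    print(name, "must be true:", all(query(a) for a in solutions))
```

With only three girls available for lockers 1 through 3, Rachel must take locker 2 alone and Juan can only share locker 1, so the enumeration reports option A as the only statement that must be true.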
Step-2: Deriving plan based on the natural and symbolic context
Task:
Can you derive a step-by-step plan that uses the domain, variables, and constraints, together with relevant inference rules, to choose the correct option satisfying all the constraints?
Domain
• Lockers: 1 to 5
Variables
• Fred_locker ∈ {1, 2, 3, 4, 5}
• Juan_locker ∈ {1, 2, 3, 4, 5}
• Marc_locker ∈ {1, 2, 3, 4, 5}
• Paul_locker ∈ {1, 2, 3, 4, 5}
• Nita_locker ∈ {1, 2, 3, 4, 5}
• Rachel_locker ∈ {1, 2, 3, 4, 5}
• Trisha_locker ∈ {1, 2, 3, 4, 5}
Constraints
1. Each Locker One or Two Children: Each locker can have either one or two children assigned to it.
2. Each Child One Locker: Each child is assigned to exactly one locker.
3. Shared Locker Boy and Girl: If a locker is shared, it must be shared between one boy and one girl.
4. Juan Must Share: Juan_locker must be the same as one girl’s locker.
5. Rachel Cannot Share: Rachel_locker must be unique and not shared with any other child.
8. First Three Lockers to Girls: At least one girl must be assigned to each of the lockers 1, 2, and 3.
Queries for Options (If the first three lockers are assigned to girls, which one of the following must be true?)
• A) Juan is assigned to locker 1. Query: Juan_locker = 1 and Juan_locker is shared with a girl
Plan: To approach this constraint optimization problem regarding locker assignments for children, follow a systematic plan
that accounts for all given constraints and variables. This plan will guide you through understanding the problem, applying
logical reasoning, and deducing possible configurations that satisfy all conditions.
• 1. Understand the Domain and Variables: Acknowledge the domain of lockers numbered 1 to 5. Note each child’s possible
locker assignments (Fred, Juan, Marc, Paul, Nita, Rachel, Trisha) within this domain.
• 2. Comprehend Constraints:
Each locker can have either one or two children assigned, but no more.
Each child is assigned exactly one locker.
(... More constraints ...)
• (... More plan ...)
• This plan emphasizes a logical, step-by-step approach to deducing valid locker assignments for each child, ensuring
compliance with all constraints. By methodically analyzing each query in the context of these constraints, you can identify
which statements about locker assignments must be true.
Step-3: Solving the Problem given the context and plan
Task:
The task is a constraint optimization problem, and you need to select the most appropriate option that is true from the given
options. The domain, variables, constraints, and relevant plan will be given to you to solve the question.
There will be only one answer, so choose the one you think is the most likely.
Here is a workflow you need to follow to solve the constraint optimization problem:
1. Domain and Variables: Identify the possible values (domain) and variables involved.
2. Apply Constraints: Read and apply the specific constraints to the variables.
3. List Possibilities: Enumerate all potential combinations or orders that could satisfy the constraints.
4. Determine Order/Values: Establish the correct order or specific values for each variable as per the constraints.
5. Demonstrate Satisfaction of Constraints: Provide a specific example that shows all constraints are met.
6. Evaluate Queries: Analyze each query against the established order or values.
7. Identify Correct Answer: Choose the option that aligns with the determined order/values.
8. Present Final Answer: Clearly state the solution that satisfies the constraints and query.
Domain
• Lockers: 1 to 5
Variables
• Fred_locker ∈ {1, 2, 3, 4, 5}
• Juan_locker ∈ {1, 2, 3, 4, 5}
• Marc_locker ∈ {1, 2, 3, 4, 5}
• Paul_locker ∈ {1, 2, 3, 4, 5}
• Nita_locker ∈ {1, 2, 3, 4, 5}
• Rachel_locker ∈ {1, 2, 3, 4, 5}
• Trisha_locker ∈ {1, 2, 3, 4, 5}
Constraints
1. Each Locker One or Two Children: Each locker can have either one or two children assigned to it.
2. Each Child One Locker: Each child is assigned to exactly one locker.
3. Shared Locker Boy and Girl: If a locker is shared, it must be shared between one boy and one girl.
4. Juan Must Share: Juan_locker must be the same as one girl’s locker.
5. Rachel Cannot Share: Rachel_locker must be unique and not shared with any other child.
8. First Three Lockers to Girls: At least one girl must be assigned to each of the lockers 1, 2, and 3.
Queries for Options (If the first three lockers are assigned to girls, which one of the following must be true?)
• A) Juan is assigned to locker 1. Query: Juan_locker = 1 and Juan_locker is shared with a girl
Plan: To approach this constraint optimization problem regarding locker assignments for children, follow a systematic plan
that accounts for all given constraints and variables. This plan will guide you through understanding the problem, applying
logical reasoning, and deducing possible configurations that satisfy all conditions.
• 1. Understand the Domain and Variables: Acknowledge the domain of lockers numbered 1 to 5. Note each child’s possible
locker assignments (Fred, Juan, Marc, Paul, Nita, Rachel, Trisha) within this domain.
• 2. Comprehend Constraints:
Each locker can have either one or two children assigned, but no more.
Each child is assigned exactly one locker.
(... More constraints ...)
• 3. Apply all the constraints to each option.
• This plan emphasizes a logical, step-by-step approach to deducing valid locker assignments for each child, ensuring
compliance with all constraints. By methodically analyzing each query in the context of these constraints, you can identify
which statements about locker assignments must be true.
Execution:
Let’s solve the given constraint optimization problem following the outlined plan:
Option A: Juan is assigned to locker 1.
• Negation of Condition for A: Juan is not assigned to locker 1.
• Test Against Constraints: If Juan is not in locker 1, he could be in locker 2, 3, 4, or 5. Since locker 3 is taken by Fred (a boy), Juan cannot share with him due to constraint 3. If Juan shares locker 4 or 5 with a girl, this would still satisfy all constraints. Therefore, violating the condition in Option A can still lead to a valid order satisfying all the constraints. Thus, Option A does not have to be true.
Option B: Nita is assigned to locker 3.
• Negation of Condition for B: Nita is not assigned to locker 3.
• Test Against Constraints: Since Fred is assigned to locker 3, Nita cannot be assigned to locker 3. This does not violate any constraints, as Nita can be assigned to any other locker that is not adjacent to Trisha's. Therefore, Option B cannot be true because it directly conflicts with constraint 7.
(... More solving steps ...)
Conclusion:
• Upon evaluation, we notice that only option A must be true to satisfy all the constraints. Therefore, the final answer is A.
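As a sanity check on this execution (an illustration on our part, not part of the prompt), steps 3-6 of the workflow above can be run by brute force, reusing the hypothetical satisfies_constraints and first_three_to_girls helpers from the sketch given after Step-1:

from itertools import product

def example_configuration():
    # Workflow steps 3-5: enumerate all candidate assignments and return one
    # that satisfies every constraint together with the question's condition.
    for combo in product(LOCKERS, repeat=len(CHILDREN)):
        assign = dict(zip(CHILDREN, combo))
        if satisfies_constraints(assign) and first_three_to_girls(assign):
            return assign
    return None

print(example_configuration())
# e.g. Juan shares locker 1 with Nita or Trisha, Rachel is alone in locker 2,
# Fred shares locker 3 with the remaining girl, and Marc and Paul take lockers 4 and 5.

Enumerating all 5^7 assignments in this way also makes the "must be true" test in the verification step below mechanical.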
Step-4: Verifying the translation and the solving process
Task:
• Your task is to apply this method to verify the problem, and if necessary, re-solve it to find a valid solution.
Domain
• Lockers: 1 to 5
Variables
• Fred_locker ∈ {1, 2, 3, 4, 5}
• Juan_locker ∈ {1, 2, 3, 4, 5}
• Marc_locker ∈ {1, 2, 3, 4, 5}
• Paul_locker ∈ {1, 2, 3, 4, 5}
• Nita_locker ∈ {1, 2, 3, 4, 5}
• Rachel_locker ∈ {1, 2, 3, 4, 5}
• Trisha_locker ∈ {1, 2, 3, 4, 5}
Constraints
1. Each Locker One or Two Children: Each locker can have either one or two children assigned to it.
2. Each Child One Locker: Each child is assigned to exactly one locker.
3. Shared Locker Boy and Girl: If a locker is shared, it must be shared between one boy and one girl.
4. Juan Must Share: Juan_locker must be the same as one girl’s locker.
5. Rachel Cannot Share: Rachel_locker must be unique and not shared with any other child.
Queries for Options (If the first three lockers are assigned to girls, which one of the following must be true?)
• A) Juan is assigned to locker 1. Query: Juan_locker = 1 and Juan_locker is shared with a girl
• B) Nita is assigned to locker 3. Query: Nita_locker = 3
• C) Trisha is assigned to locker 1. Query: Trisha_locker = 1
• D) Juan is assigned to the same locker as Trisha. Query: Juan_locker = Trisha_locker
• E) Paul is assigned to the same locker as Trisha. Query: Paul_locker = Trisha_locker
Original Execution:
Let’s solve the given constraint optimization problem following the outlined plan:
Option A: Juan is assigned to locker 1.
• Negation of Condition for A: Juan is not assigned to locker 1.
• Test Against Constraints: If Juan is not in locker 1, he could be in locker 2, 3, 4, or 5. Since locker 3 is taken by Fred (a
boy), Juan cannot share with him due to constraint 3. If Juan shares locker 4 or 5 with a girl, this would still satisfy all
constraints. Therefore, violating the condition in Option A can still lead to a valid order satisfying all the constraints.
Thus, Option A does not have to be true.
Option B: Nita is assigned to locker 3.
• Negation of Condition for B: Nita is not assigned to locker 3.
• Test Against Constraints: Since Fred is assigned to locker 3, Nita cannot be assigned to locker 3. This does not violate
any constraints, as Nita can be assigned to any other locker that is not adjacent to Trisha’s. Therefore, Option B cannot
be true because it directly conflicts with constraint 7.
(... More solving steps ...)
Conclusion:
• Upon evaluation, we notice that only option A must be true to satisfy all the constraints. Therefore, the final answer is A.
Verification:
Verification of the translation and the Original Execution:
Context verification:
• The context and query correctly correspond to the information in the original context and are semantically consistent.
Logical Process Verification:
• Given the constraints and the domain’s interpretation, let’s verify the original execution and other options:
• Option A: Verification of Constraints: If Juan is assigned to locker 1, he must share with a girl due to constraint 4. Since
locker 3 is taken by Fred (a boy), and Rachel cannot share a locker (constraint 5), Rachel must be in locker 2. This leaves
locker 1 for Juan to share with a girl other than Rachel. This satisfies all constraints, making Option A a possibility.
• Option B: Nita is assigned to locker 3. Verification of Constraints: Since Fred is assigned to locker 3 (constraint 7), Nita
cannot be assigned to locker 3. This directly conflicts with constraint 7, making Option B impossible.
• Given that the first three lockers are assigned to girls, Fred is in locker 3, and Rachel cannot share a locker, Rachel must
be in locker 1 or 2. If Rachel is in locker 1, then Trisha cannot be in locker 1, which means Trisha must be in locker 2.
This leaves locker 1 available for Juan to share with a girl, which must be the case since Juan must share a locker with a
girl, and locker 3 is occupied by Fred, a boy. Therefore, Juan must be assigned to locker 1 to satisfy the constraints,
making Option A the one that must be true. Upon verification, we found that Option A indeed is the only possible answer.
Thus, the answer A should remain unchanged.
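For completeness, the verified conclusion can also be reproduced mechanically. The following self-contained sketch (our illustration; SymbCoT performs this verification in natural and symbolic language rather than with an external solver) enumerates every assignment that satisfies the constraints and the question's condition, and confirms that Juan occupies locker 1 in all of them:

from itertools import product

BOYS, GIRLS = ["Fred", "Juan", "Marc", "Paul"], ["Nita", "Rachel", "Trisha"]
CHILDREN = BOYS + GIRLS  # constraint 2 holds by construction: one locker per child

def valid(assign):
    rooms = {k: [c for c in CHILDREN if assign[c] == k] for k in range(1, 6)}
    return (all(len(v) in (1, 2) for v in rooms.values())                    # constraint 1
            and all(len(v) == 1 or {c in BOYS for c in v} == {True, False}
                    for v in rooms.values())                                 # constraint 3
            and len(rooms[assign["Juan"]]) == 2                              # constraint 4
            and len(rooms[assign["Rachel"]]) == 1                            # constraint 5
            and abs(assign["Nita"] - assign["Trisha"]) != 1                  # constraint 6
            and assign["Fred"] == 3                                          # constraint 7
            and all(any(assign[g] == k for g in GIRLS) for k in (1, 2, 3)))  # question condition

solutions = [dict(zip(CHILDREN, combo))
             for combo in product(range(1, 6), repeat=len(CHILDREN))
             if valid(dict(zip(CHILDREN, combo)))]
# Option A "must be true" exactly when Juan is in locker 1 in every solution.
print(bool(solutions) and all(s["Juan"] == 1 for s in solutions))  # prints True
# The other options can be tested the same way by asking whether they hold in every solution.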
I.2.1 ProntoQA
Given a problem statement as contexts, the task is to answer a logical reasoning question.
(... Context, question, options, and earlier reasoning steps omitted ...)
• Max is a yumpus.
Conclusion:
• The correct option is: B)
I.2.2 ProofWriter
Given a problem statement as contexts, the task is to answer a logical reasoning question.
Context:
• The cow is blue.
• The cow is round.
• The cow likes the lion.
• The cow visits the tiger.
• If something likes the squirrel and it likes the cow then it visits the tiger.
• If the cow is cold and the cow visits the lion then the lion sees the squirrel.
Question: Based on the above information, is the following statement true, false, or unknown? The tiger is not young.
Options:
A) True
B) False
C) Unknown
Reasoning:
• The tiger likes the cow.
• If something likes the squirrel and it likes the cow, then it visits the tiger. So the tiger visits the tiger.
• If something is nice and it sees the tiger then it is young. So the tiger is young.
Conclusion:
• The correct option is: B)
I.2.3 FOLIO
Given a problem statement as contexts, the task is to answer a logical reasoning question.
Context:
• The Blake McFall Company Building is a commercial warehouse listed on the National Register of Historic Places.
• The Blake McFall Company Building was added to the National Register of Historic Places in 1990.
• The Emmet Building is another name for the Blake McFall Company Building.
Question: Based on the above information, is the following statement true, false, or uncertain? The Blake McFall Company
Building is located in Portland, Oregon.
Options:
A) True
B) False
C) Uncertain
Reasoning:
• The Blake McFall Company Building is another name for the Emmet Building.
Conclusion:
• The correct option is: A)
I.2.4 LogicalDeduction
Given a problem statement as contexts, the task is to answer a logical reasoning question.
Context:
• On a branch, there are five birds: a quail, an owl, a raven, a falcon, and a robin.
• The owl is the leftmost.
• The robin is to the left of the raven.
• The quail is the rightmost.
• The raven is the third from the left.
Question: Which of the following is true?
Options:
A) The quail is the rightmost.
B) The owl is the rightmost.
C) The raven is the rightmost.
D) The falcon is the rightmost.
E) The robin is the rightmost.
Reasoning:
• The owl is the leftmost. This means the owl is not the rightmost.
• The robin is to the left of the raven. This means neither the robin nor the raven is the rightmost.
• The quail is the rightmost. This directly answers the question, so option A) The quail is the rightmost is true.
• The raven is the third from the left. This means the raven is also not the rightmost.
Conclusion:
• The correct option is: A)
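This chain of reasoning is small enough to check exhaustively. The sketch below (our illustration, not part of the prompt) enumerates all orderings of the five birds that satisfy the stated context and confirms that the quail is rightmost in every one:

from itertools import permutations

BIRDS = ["quail", "owl", "raven", "falcon", "robin"]

def consistent(order):  # order[0] is the leftmost bird, order[4] the rightmost
    return (order[0] == "owl"                                # the owl is the leftmost
            and order.index("robin") < order.index("raven")  # the robin is left of the raven
            and order[4] == "quail"                          # the quail is the rightmost
            and order[2] == "raven")                         # the raven is third from the left

valid_orders = [p for p in permutations(BIRDS) if consistent(p)]
print(valid_orders)                                 # [('owl', 'robin', 'raven', 'falcon', 'quail')]
print(all(o[4] == "quail" for o in valid_orders))   # True, so option A is correct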
I.2.5 AR-LSAT
Given a problem statement as contexts, the task is to answer a logical reasoning question.
Context:
• During a single week, from Monday through Friday, tours will be conducted of a company’s three divisions—Operations,
Production, and Sales.
• Exactly five tours will be conducted that week, one each day.
• The schedule of tours for the week must conform to the following restrictions:
– Each division is toured at least once.
– The Operations division is not toured on Monday.
– The Production division is not toured on Wednesday.
– The Sales division is toured on two consecutive days, and on no other days.
– If the Operations division is toured on Thursday, then the Production division is toured on Friday.
Question: Which one of the following CANNOT be true of the week’s tour schedule?
Options:
A) The division that is toured on Monday is also toured on Tuesday.
Reasoning:
• Since Thursday and Friday already have tours planned, only Monday, Tuesday, and Wednesday tours need to be
determined.
• On Tuesday, the Sales division is toured since that division is toured on only two consecutive days, i.e., either Monday
and Tuesday or Tuesday and Wednesday. A different division is toured on Thursday.