Skip to content

Conversation

Michael137
Copy link
Member

@Michael137 Michael137 commented Sep 3, 2025

There are some languages for which the ClangExpressionParser currently switches the language type behind a user's back. Specifically, C gets turned into C++ and ObjC into ObjC++. That's because the Clang expression evaluator depends on C++ features. These languages have different semantics, so if, e.g., a user forcefully wants to evaluate an expression in C, but we switch it to C++, they will most likely wonder why the expression failed (we get reports like this every once in a while...#152113 is a variation of this).

This patch rejects languages specified with expression --language that we won't be able to explicitly evaluate.

rdar://159669244

…expression evaluation

There are some languages for which the `ClangExpressionParser` currently
switches the language type behind a user's back. Specifically, `C` gets
turned into `C++` and `ObjC` into `ObjC++`. That's because the Clang
expression evaluator depends on C++ features. These languages have
different semantics, so if, e.g., a user forcefully wants to evaluate an
expression in `C`, but we switch it to `C++`, we get reports from users
confused why the expression failed.

This patch rejects languages specified with `expression --language` that
we won't be able to explicitly evaluate.

rdar://159669244
Copy link
Contributor

@felipepiovezan felipepiovezan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was initially hesitant to make this a hard error, but the fact that we already have a list of supported languages and that C is not in there makes me more confident in erroring here. Would be curious to hear other's thoughts.

StreamString sstr;
sstr.Printf("unknown language type: '%s' for expression. "
sstr.Printf("invalid language '%s' for expression. "
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any way to clarify what invalid means in this context?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in latest commit

Copy link
Collaborator

@labath labath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this makes sense. Does this also mean we can delete some language auto-upgrading code somewhere?

@jimingham
Copy link
Collaborator

jimingham commented Sep 3, 2025

For C in particular we have a "half-way between" mode where we allowed some keywords (like class) even though we still used C++ reference semantics. So for instance:

(lldb) run
Process 79429 launched: '/tmp/foo' (arm64)
Process 79429 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x0000000100000480 foo`main at foo.c:7
   4   	main()
   5   	{
   6   	  int class = 100;
-> 7   	  printf("class is: %d.\n", class);
    	                            ^
   8   	  return 0;
   9   	}
Target 0: (foo) stopped.
(lldb) expr -l C -- class = 110
(int) $0 = 110
(lldb) n
class is: 110.
Process 79429 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = step over
    frame #0: 0x000000010000049c foo`main at foo.c:8
   5   	{
   6   	  int class = 100;
   7   	  printf("class is: %d.\n", class);
-> 8   	  return 0;
    	  ^
   9   	}
Target 0: (foo) stopped.
(lldb) expr -l c++ -- class = 120
                      ^
                      error: declaration of anonymous class must be a definition
                      warning: declaration does not declare anything

That seems useful, does this patch break that feature?

@jimingham
Copy link
Collaborator

BTW, if this patch breaks that feature w/o some test catching it, we should add a test for these allowed keywords.

@Michael137
Copy link
Member Author

I think this makes sense. Does this also mean we can delete some language auto-upgrading code somewhere?

Not yet, that code still is in place so that we can evaluate expressions in C frames. But it gets us a tiny step closer i guess

@Michael137
Copy link
Member Author

Michael137 commented Sep 3, 2025

BTW, if this patch breaks that feature w/o some test catching it, we should add a test for these allowed keywords.

This patch doesn't break said feature. It only breaks if you explicitly pass expression --language C, and expect it to work. If it's a C frame, then expression class = 123 will still work.

This is what your example now would look like:

(lldb) f
frame #0: 0x0000000100000490 a.out`main at main.c:5:1
   1    int main() {
   2      int class = 100;
   3      __builtin_printf("class is: %d.\n", class);
   4      __builtin_debugtrap();
-> 5    }
(lldb) expr class
(int) $1 = 100
(lldb) expr -l c -- class
error: invalid language 'c' for expression. List of supported languages:
  c++
  objective-c++
  c++03
  c++11
  c++14
  c++17
  c++20
  objc++
(lldb) expr -l c++ -- class
                      ˄
                      ╰─ error: declaration of anonymous class must be a definition
                      ╰─ warning: declaration does not declare anything

lldb/test/API/lang/c/cpp_keyword_identifiers/TestCppKeywordsAsCIdentifiers.py tests this (for a large subset of the c++ keywords, maybe even all of them, haven't cross-referenced). The test doesn't specify -l c, so it wasn't affect by my change

@llvmbot llvmbot added the lldb label Sep 3, 2025
@llvmbot
Copy link
Member

llvmbot commented Sep 3, 2025

@llvm/pr-subscribers-lldb

Author: Michael Buch (Michael137)

Changes

There are some languages for which the ClangExpressionParser currently switches the language type behind a user's back. Specifically, C gets turned into C++ and ObjC into ObjC++. That's because the Clang expression evaluator depends on C++ features. These languages have different semantics, so if, e.g., a user forcefully wants to evaluate an expression in C, but we switch it to C++, they will most likely wonder why the expression failed (we get reports like this every once in a while...#152113 is a variation of this).

This patch rejects languages specified with expression --language that we won't be able to explicitly evaluate.

rdar://159669244


Full diff: https://github.com/llvm/llvm-project/pull/156648.diff

3 Files Affected:

  • (modified) lldb/source/Commands/CommandObjectExpression.cpp (+17-9)
  • (modified) lldb/test/API/commands/expression/calculator_mode/TestCalculatorMode.py (+1-1)
  • (modified) lldb/test/API/commands/expression/invalid-args/TestInvalidArgsExpression.py (+14-2)
diff --git a/lldb/source/Commands/CommandObjectExpression.cpp b/lldb/source/Commands/CommandObjectExpression.cpp
index 197bffe9c982f..1b951603563ac 100644
--- a/lldb/source/Commands/CommandObjectExpression.cpp
+++ b/lldb/source/Commands/CommandObjectExpression.cpp
@@ -44,18 +44,26 @@ Status CommandObjectExpression::CommandOptions::SetOptionValue(
   const int short_option = GetDefinitions()[option_idx].short_option;
 
   switch (short_option) {
-  case 'l':
+  case 'l': {
     language = Language::GetLanguageTypeFromString(option_arg);
-    if (language == eLanguageTypeUnknown) {
-      StreamString sstr;
-      sstr.Printf("unknown language type: '%s' for expression. "
-                  "List of supported languages:\n",
+    if (const LanguageSet supported_languages =
+            Language::GetLanguagesSupportingTypeSystemsForExpressions();
+        supported_languages[language])
+      break;
+
+    StreamString sstr;
+    if (language == eLanguageTypeUnknown)
+      sstr.Printf("unknown language '%s' for expression. ",
+                  option_arg.str().c_str());
+    else
+      sstr.Printf("language '%s' is currently not supported for expression "
+                  "evaluation. ",
                   option_arg.str().c_str());
 
-      Language::PrintSupportedLanguagesForExpressions(sstr, "  ", "\n");
-      error = Status(sstr.GetString().str());
-    }
-    break;
+    sstr.PutCString("List of supported languages for expressions:\n");
+    Language::PrintSupportedLanguagesForExpressions(sstr, "  ", "\n");
+    error = Status(sstr.GetString().str());
+  } break;
 
   case 'a': {
     bool success;
diff --git a/lldb/test/API/commands/expression/calculator_mode/TestCalculatorMode.py b/lldb/test/API/commands/expression/calculator_mode/TestCalculatorMode.py
index 138027507c7a7..f12b5b0a12814 100644
--- a/lldb/test/API/commands/expression/calculator_mode/TestCalculatorMode.py
+++ b/lldb/test/API/commands/expression/calculator_mode/TestCalculatorMode.py
@@ -21,7 +21,7 @@ def test__calculator_mode(self):
         )
         # Now try it with a specific language:
         self.expect(
-            "expression -l c -- 11 + 22",
+            "expression -l c++ -- 11 + 22",
             "11 + 22 didn't get the expected result",
             substrs=["33"],
         )
diff --git a/lldb/test/API/commands/expression/invalid-args/TestInvalidArgsExpression.py b/lldb/test/API/commands/expression/invalid-args/TestInvalidArgsExpression.py
index 344aef318d783..f0d1542d783b1 100644
--- a/lldb/test/API/commands/expression/invalid-args/TestInvalidArgsExpression.py
+++ b/lldb/test/API/commands/expression/invalid-args/TestInvalidArgsExpression.py
@@ -10,8 +10,20 @@ def test_invalid_lang(self):
             "expression -l foo --",
             error=True,
             substrs=[
-                "error: unknown language type: 'foo' for expression",
-                "List of supported languages:",
+                "error: unknown language 'foo' for expression.",
+                "List of supported languages for expressions:",
+                "c++",
+                "c++11",
+                "c++14",
+            ],
+        )
+
+        self.expect(
+            "expression -l c --",
+            error=True,
+            substrs=[
+                "error: language 'c' is currently not supported for expression evaluation.",
+                "List of supported languages for expressions:",
                 "c++",
                 "c++11",
                 "c++14",

@jimingham
Copy link
Collaborator

That seems weird. I would expect that the frame choosing the language and the user choosing a language would have identical effects.

@jimingham
Copy link
Collaborator

How do we explain that for my global variable class,

(lldb) expr class = 110

works when I happen to be stopped in a C frame, but if I can't find a C frame around there's no way I can replicate that behavior?

@Michael137
Copy link
Member Author

Michael137 commented Sep 4, 2025

How do we explain that for my global variable class,

(lldb) expr class = 110

works when I happen to be stopped in a C frame, but if I can't find a C frame around there's no way I can replicate that behavior?

How do we explain that for my global variable class,

(lldb) expr class = 110

works when I happen to be stopped in a C frame, but if I can't find a C frame around there's no way I can replicate that behavior?

Good question. The reason it works for your example when stopped in a C frame is that the Clang expression evaluator has a workaround in it to allow C++ keywords in pure C/ObjC expressions. If the frame (or before this patch the --language flag was C/ObjC), we explicitly "unmark" the reserved C++ keywords here:

switch (expr.Language().AsLanguageType()) {
case lldb::eLanguageTypeC:
case lldb::eLanguageTypeC89:
case lldb::eLanguageTypeC99:
case lldb::eLanguageTypeC11:
case lldb::eLanguageTypeObjC:
// This is not a C++ expression but we enabled C++ as explained above.
// Remove all C++ keywords from the PP so that the user can still use
// variables that have C++ keywords as names (e.g. 'int template;').
RemoveAllCppKeywords(m_compiler->getPreprocessor().getIdentifierTable());
break;
default:
break;
}

But that in itself still has a workaround to make C-expressions work as C++ (because we rely on using to capture local variables)

static void RemoveCppKeyword(IdentifierTable &idents, llvm::StringRef token) {
  // FIXME: 'using' is used by LLDB for local variables, so we can't remove
  // this keyword without breaking this functionality.
  if (token == "using")
    return;

But we're still lying to the user about this being pure C expression evaluation. All-in-all, in my opinion, it would be best to keep this workaround contained to just expression evaluation from C/ObjC-frames. If we're stopped in a context where these identifiers were used as variable names, presumably that frame must be a language where this was allowed in. So we would still support the majority of use-cases.

The only scenario I can think of that this breaks is that if we're stopped in a C-frame, then declare a top-level variable in the expression context with a reserved identifier in the name, and then switch to a C++ frame and try to use it:

frame #3: 0x000000010000046c a.out`func at lib.c:2:14                                                     
   1     #include <stdlib.h>             
-> 2    int func() { abort(); }                                                                           
(lldb) expr --top-level -- int namespace = 10;                                                            
(lldb) expr namespace                                
(int) $0 = 10                                                                                             
(lldb) up                                                                                                 
frame #4: 0x0000000100000454 a.out`main at main.cpp:6:5
   1    extern "C" {                                 
   2    int func();       
   3    }                                                                                                 
   4                
   5    int main() {
-> 6        func();
   7    }
(lldb) expr namespace
            ˄      
            ╰─ error: expected identifier or '{'
(lldb) down          

Is that a scenario we want to support?

@jimingham
Copy link
Collaborator

The other scenario is accessing an extant C global variable (maybe even one you don't have debug info for) called "class" or "namespace". But more importantly, having this sort of secret out that is applied inconsistently will just make lldb more confusing. It's fine to say "the support for the 'C' language is incomplete" but to treat it inconsistently just makes it harder to understand how lldb is working for you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants