Skip to content

Conversation

DeinAlptraum
Copy link
Contributor

Add tests to ensure that all C-enum variants are defined on Python side.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:as-a-library libclang and C++ API labels Jun 7, 2025
@llvmbot
Copy link
Member

llvmbot commented Jun 7, 2025

@llvm/pr-subscribers-clang

Author: Jannick Kremer (DeinAlptraum)

Changes

Add tests to ensure that all C-enum variants are defined on Python side.


Full diff: https://github.com/llvm/llvm-project/pull/143264.diff

1 Files Affected:

  • (modified) clang/bindings/python/tests/cindex/test_enums.py (+48-1)
diff --git a/clang/bindings/python/tests/cindex/test_enums.py b/clang/bindings/python/tests/cindex/test_enums.py
index 9e7f44fcf7867..54c0c77e4df29 100644
--- a/clang/bindings/python/tests/cindex/test_enums.py
+++ b/clang/bindings/python/tests/cindex/test_enums.py
@@ -1,4 +1,5 @@
 import unittest
+from pathlib import Path
 
 from clang.cindex import (
     AccessSpecifier,
@@ -12,6 +13,7 @@
     TemplateArgumentKind,
     TLSKind,
     TokenKind,
+    TranslationUnit,
     TypeKind,
 )
 
@@ -44,8 +46,53 @@ def test_from_id(self):
 
     def test_duplicate_ids(self):
         """Check that no two kinds have the same id"""
-        # for enum in self.enums:
         for enum in self.enums:
             num_declared_variants = len(enum._member_map_.keys())
             num_unique_variants = len(list(enum))
             self.assertEqual(num_declared_variants, num_unique_variants)
+
+    def test_all_variants(self):
+        """Check that all libclang enum values are also defined in cindex"""
+        cenum_to_pythonenum = {
+            "CX_CXXAccessSpecifier": AccessSpecifier,
+            "CXAvailabilityKind": AvailabilityKind,
+            "CXBinaryOperatorKind": BinaryOperator,
+            "CXCursorKind": CursorKind,
+            "CXCursor_ExceptionSpecificationKind": ExceptionSpecificationKind,
+            "CXLinkageKind": LinkageKind,
+            "CXRefQualifierKind": RefQualifierKind,
+            "CX_StorageClass": StorageClass,
+            "CXTemplateArgumentKind": TemplateArgumentKind,
+            "CXTLSKind": TLSKind,
+            "CXTokenKind": TokenKind,
+            "CXTypeKind": TypeKind,
+        }
+
+        indexheader = (
+            Path(__file__).parent.parent.parent.parent.parent
+            / "include/clang-c/Index.h"
+        )
+        tu = TranslationUnit.from_source(indexheader, ["-x", "c++"])
+
+        enum_variant_map = {}
+        # For all enums in self.enums, extract all enum variants defined in Index.h
+        for cursor in tu.cursor.walk_preorder():
+            type_class = cenum_to_pythonenum.get(cursor.type.spelling)
+            if (
+                cursor.kind == CursorKind.ENUM_CONSTANT_DECL
+                and type_class in self.enums
+            ):
+                if type_class not in enum_variant_map:
+                    enum_variant_map[type_class] = []
+                enum_variant_map[type_class].append(cursor.enum_value)
+
+        for enum in self.enums:
+            with self.subTest(enum):
+                python_kinds = set([kind.value for kind in enum])
+                c_kinds = set(enum_variant_map[enum])
+                missing_python_kinds = c_kinds - python_kinds
+                self.assertEqual(
+                    missing_python_kinds,
+                    set(),
+                    f"Please ensure these variants are defined inside {enum} in cindex.py.",
+                )

@DeinAlptraum DeinAlptraum requested a review from Endilll June 7, 2025 14:29
@DeinAlptraum
Copy link
Contributor Author

DeinAlptraum commented Jun 7, 2025

@Endilll this adds the missing enum variants, and also adds a test using libclang, parsing Index.h for all our used enums to check for missing variants. You can see what this looks like in case of failure on the first CI run for this PR.

This uses only the numeric enum values for comparisons since this is easier to implement (especially because some variants have aliases on the C-side)

Sidenote: I've noticed that a few enum variants have inconsistent namings, e.g. all Python enum variants are fully uppercase, execpt for StmtExpr. There's also CXCursor_CompoundAssignOperator which corresponds to COMPOUND_ASSIGNMENT_OPERATOR, i.e. slightly different names. I wonder if we should adapt these, but it might not be worth the breaking change.

This is probably by far our slowest test, responsible for about 20% of the entire test suite's runtime, but since all tests together take like 500ms that's acceptable imo

Add tests to ensure that all C-enum variants are defined on Python side.
Copy link
Contributor

@Endilll Endilll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry it took me so long to get to this

for enum in self.enums:
num_declared_variants = len(enum._member_map_.keys())
num_unique_variants = len(list(enum))
self.assertEqual(num_declared_variants, num_unique_variants)

def test_all_variants(self):
"""Check that all libclang enum values are also defined in cindex"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"""Check that all libclang enum values are also defined in cindex"""
"""Check that all libclang enum values are also defined in cindex.py"""

USHORTACCUM = 36
UACCUM = 37
ULONGACCUM = 38
BFLOAT16 = 39
IBM128 = 40
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see constants that correspond to CXType_FirstBuiltin and CXType_LastBuiltin. Is this intentional?

Path(__file__).parent.parent.parent.parent.parent
/ "include/clang-c/Index.h"
)
tu = TranslationUnit.from_source(indexheader, ["-x", "c++"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it would make any practical difference, but maybe we still should parse C header in C language mode to make it less surprising

"CXCursor_ExceptionSpecificationKind": ExceptionSpecificationKind,
"CXLinkageKind": LinkageKind,
"CXRefQualifierKind": RefQualifierKind,
"CX_StorageClass": StorageClass,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keys in this dictionary seem to be sorted lexicographically, except for this one. I'd expect it to go second instead of being put in the middle of the list. The underscore after CX is another example of consistent naming...

enum_variant_map = {}
# For all enums in self.enums, extract all enum variants defined in Index.h
for cursor in tu.cursor.walk_preorder():
type_class = cenum_to_pythonenum.get(cursor.type.spelling)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
type_class = cenum_to_pythonenum.get(cursor.type.spelling)
python_enum = cenum_to_pythonenum.get(cursor.type.spelling)

Otherwise it's not clear why cenum_to_pythonenum gives you type class, which for Clang contributors usually mean something else (the kind of AST Type node).

type_class = cenum_to_pythonenum.get(cursor.type.spelling)
if (
cursor.kind == CursorKind.ENUM_CONSTANT_DECL
and type_class in self.enums
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that (I think) all enums inherit from BaseEnumeration, can we replace the registry of enums that contributors and reviewers are prone to forget about, and inspect the module instead?

This code ignores enums that are present in Index.h but not in self.enums. I'd prefer to have a test that fails if libclang and Python wrapper are out of sync on declared enums.

with self.subTest(enum):
python_kinds = set([kind.value for kind in enum])
c_kinds = set(enum_variant_map[enum])
missing_python_kinds = c_kinds - python_kinds
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you resort to comparing the number of enumerators, because we can't establish mapping between individual enumerators by name? Having names of missing enumerators would make test failure even more actionable.

@Endilll
Copy link
Contributor

Endilll commented Sep 3, 2025

Sidenote: I've noticed that a few enum variants have inconsistent namings, e.g. all Python enum variants are fully uppercase, execpt for StmtExpr. There's also CXCursor_CompoundAssignOperator which corresponds to COMPOUND_ASSIGNMENT_OPERATOR, i.e. slightly different names. I wonder if we should adapt these, but it might not be worth the breaking change.

Ideally I'd have enumerators with both correct and incorrect spelling, mark incorrect as deprecated, and phase them out, but warning.deprecated() is very helpfully a function decorator, so the typical way of doing that seems unavailable to us. Moreover, I see you judge completeness of enums by the number of enumerators, so we'll have to be careful about that, too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:as-a-library libclang and C++ API clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants