gh-111495: improve test coverage of codecs C API #126030

picnixz · 2024-10-27T10:35:42Z

I found out that not all handlers were tested and that there is a way to check which exceptions currently crash. Note that I must determine directly in the Python code whether an exception will make the handler crash, because if I access the attributes, I'll be using a buggy version (see #123378).

I'd like to first update the tests and then fix the handlers one by one. As I said in #123378 (comment) and #123378 (comment), just fixing the getters is not sufficient. I haven't looked sufficiently into the handlers themselves, but one assertion makes the replace errors handler crash, so just fixing the getters wouldn't help.

We really need to decide how to handle the start and end values of Unicode errors in general, but let's discuss it on #123378.

Issue: Add more C API tests #111495

encukou · 2024-10-28T13:54:58Z

Edit: please disregard this comment, I tested the wrong code!

Click to expand the original

This passes for me when I remove the may_crash machinery:

diff --git a/Lib/test/test_capi/test_codecs.py b/Lib/test/test_capi/test_codecs.py
index b764981cca6..d3af4d576f9 100644
--- a/Lib/test/test_capi/test_codecs.py
+++ b/Lib/test/test_capi/test_codecs.py
@@ -745,28 +745,6 @@ def test_codec_stream_writer(self):
                 codec_stream_writer(NULL, stream, 'strict')
 
 
-class UnsafeUnicodeEncodeError(UnicodeEncodeError):
-    def __init__(self, encoding, message, start, end, reason):
-        self.may_crash = (end - start) < 0 or (end - start) >= len(message)
-        super().__init__(encoding, message, start, end, reason)
-
-
-class UnsafeUnicodeDecodeError(UnicodeDecodeError):
-    def __init__(self, encoding, message, start, end, reason):
-        # the case end - start >= len(message) does not crash
-        self.may_crash = (end - start) < 0
-        super().__init__(encoding, message, start, end, reason)
-
-
-class UnsafeUnicodeTranslateError(UnicodeTranslateError):
-    def __init__(self, message, start, end, reason):
-        # <= 0 because PyCodec_ReplaceErrors tries to check the Unicode kind
-        # of a 0-length result (which is by convention PyUnicode_1BYTE_KIND
-        # and not PyUnicode_2BYTE_KIND as it currently expects)
-        self.may_crash = (end - start) <= 0 or (end - start) >= len(message)
-        super().__init__(message, start, end, reason)
-
-
 class CAPICodecErrors(unittest.TestCase):
     @classmethod
     def _generate_exceptions(cls, atomic_literal, factory, objlens):
@@ -780,19 +758,19 @@ def _generate_exceptions(cls, atomic_literal, factory, objlens):
     @classmethod
     def generate_encode_errors(cls, objlen, *objlens):
         def factory(obj, start, end):
-            return UnsafeUnicodeEncodeError('utf-8', obj, start, end, 'reason')
+            return UnicodeEncodeError('utf-8', obj, start, end, 'reason')
         return tuple(cls._generate_exceptions('0', factory, [objlen, *objlens]))
 
     @classmethod
     def generate_decode_errors(cls, objlen, *objlens):
         def factory(obj, start, end):
-            return UnsafeUnicodeDecodeError('utf-8', obj, start, end, 'reason')
+            return UnicodeDecodeError('utf-8', obj, start, end, 'reason')
         return tuple(cls._generate_exceptions(b'0', factory, [objlen, *objlens]))
 
     @classmethod
     def generate_translate_errors(cls, objlen, *objlens):
         def factory(obj, start, end):
-            return UnsafeUnicodeTranslateError(obj, start, end, 'reason')
+            return UnicodeTranslateError(obj, start, end, 'reason')
         return tuple(cls._generate_exceptions('0', factory, [objlen, *objlens]))
 
     @classmethod
@@ -889,20 +867,11 @@ def test_codec_namereplace_errors_handler(self):
         self.do_test_codec_errors_handler(handler, exceptions, bad_exceptions)
 
     def do_test_codec_errors_handler(self, handler, exceptions, bad_exceptions):
-        at_least_one = False
         for exc in exceptions:
-            # See https://github.com/python/cpython/issues/123378 and related
-            # discussion and issues for details.
-            if exc.may_crash:
-                continue
-
-            at_least_one = True
             with self.subTest(handler=handler, exc=exc):
                 # test that the handler does not crash
                 self.assertIsInstance(handler(exc), tuple)
 
-        self.assertTrue(at_least_one, "all exceptions are crashing")
-
         for bad_exc in bad_exceptions:
             with self.subTest('bad type', handler=handler, exc=bad_exc):
                 self.assertRaises(TypeError, handler, bad_exc)

What am I missing?

picnixz · 2024-10-28T13:58:17Z

Huh... I don't really know now. Are the PoCs:

./python -c "import codecs; codecs.xmlcharrefreplace_errors(UnicodeEncodeError('bad', '', 0, 1, 'reason'))"
./python -c "import codecs; codecs.replace_errors(UnicodeTranslateError('000', 1, -7, 'reason'))"

crashing for you? I'm no longer on my dev env, so I'll have a look at it tomorrow.

EDIT 1: Ah! I actually did not test the translate errors in replace_errors.
EDIT 2: I'll also cover a larger range of errors (because not all PoCs are generated).
EDIT 3:

./python -c "import codecs; codecs.backslashreplace_errors(UnicodeDecodeError('utf-8', b'00000', 9, 2, 'reason'))"

does not crash; it simply raises a SystemError due to a negative size.
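As a worked example, the `may_crash` conditions from the helper classes in the diff above can be applied to the `(object, start, end)` triples of these PoCs. This is only a sketch: the function name `may_crash` and its `kind` parameter are made up for illustration, and the flag is conservative (it marks inputs that may crash some handler, not inputs that crash every handler, as the third PoC shows).

```python
def may_crash(kind, obj, start, end):
    """Hypothetical helper mirroring the conditions of the Unsafe* classes."""
    width = end - start
    if kind == "encode":
        return width < 0 or width >= len(obj)
    if kind == "decode":
        # Per the diff, the case width >= len(obj) does not crash here.
        return width < 0
    if kind == "translate":
        # <= 0 because a 0-length result also trips PyCodec_ReplaceErrors.
        return width <= 0 or width >= len(obj)
    raise ValueError(f"unknown kind: {kind!r}")


# Arguments taken from the three PoCs above.
print(may_crash("encode", "", 0, 1))         # True: 1 >= len('')
print(may_crash("translate", "000", 1, -7))  # True: -8 <= 0
print(may_crash("decode", b"00000", 9, 2))   # True: -7 < 0
# A well-formed range is not flagged.
print(may_crash("encode", "000", 0, 1))      # False
```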

picnixz · 2024-10-29T08:11:39Z

@encukou I think I know: did you configure Python using --with-pydebug? The crashes are assertion-only, so I think that's why you didn't catch them. In release mode, you only get a SystemError due to negative sizes.
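A minimal sketch of how one might check this locally, assuming the usual heuristic that `sys.gettotalrefcount` is only present in debug builds. The helper names `is_debug_build` and `run_snippet` are hypothetical; running a PoC in a child interpreter keeps a failed C assertion from taking down the current process.

```python
import subprocess
import sys


def is_debug_build():
    # Heuristic: sys.gettotalrefcount normally exists only in builds
    # configured with --with-pydebug.
    return hasattr(sys, "gettotalrefcount")


def run_snippet(code, timeout=30):
    """Run code in a fresh interpreter and return (exit status, stderr)."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    # On POSIX, a negative status means the child was killed by a signal,
    # which is what a failed C assertion (abort) looks like.
    return proc.returncode, proc.stderr


print("debug build:", is_debug_build())
status, _ = run_snippet("raise SystemExit(3)")
print(status)  # 3
```

Passing one of the PoC one-liners above as `code` should then show either a traceback in `stderr` (release build) or a signal-style exit status (debug build), depending on how the interpreter was configured.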

encukou · 2024-10-29T13:08:53Z

Ah, misconfiguration on my side. Sorry for the noise!

encukou

These look good, but I think some of the code could be made easier to read.
It's all subjective, of course. Do you find your version easier (or have a particular reason to write them that way)?

Lib/test/test_capi/test_codecs.py

picnixz · 2024-10-29T13:36:37Z

It's all subjective, of course. Do you find your version easier (or have a particular reason to write them that way)?

Not really, that was the first approach I had in mind :D but I'll include some of your suggestions, thanks!

encukou

Thanks!

Lib/test/test_capi/test_codecs.py

picnixz · 2024-10-31T15:34:10Z

Thanks Petr! (you just reviewed when I left my dev session...)

Lib/test/test_capi/test_codecs.py

Co-authored-by: Petr Viktorin <encukou@gmail.com>

For now, skip some crashers (tracked in pythongh-123378).

improve test coverage of codecs C API

8e3fdfe

picnixz added the skip news label Oct 27, 2024

bedevere-app bot added tests Tests in the Lib/test dir awaiting review labels Oct 27, 2024

bedevere-app bot mentioned this pull request Oct 27, 2024

Add more C API tests #111495

Closed

10 tasks

picnixz requested review from vstinner and encukou October 27, 2024 10:36

encukou reviewed Oct 29, 2024

picnixz added 3 commits October 29, 2024 14:46

address Petr's review

03959c4

Simplify tests

b993a2e

Simplify tests

69d0643

encukou approved these changes Oct 31, 2024

bedevere-app bot added awaiting merge and removed awaiting review labels Oct 31, 2024

picnixz commented Oct 31, 2024

Update Lib/test/test_capi/test_codecs.py

05f75b6

encukou reviewed Nov 1, 2024

Update Lib/test/test_capi/test_codecs.py

c36232a

Co-authored-by: Petr Viktorin <encukou@gmail.com>

encukou enabled auto-merge (squash) November 1, 2024 13:14

encukou merged commit 32e07fd into python:main Nov 1, 2024
34 checks passed

bedevere-app bot removed the awaiting merge label Nov 1, 2024

picnixz deleted the fix/c-api-codecs-test-111495 branch November 1, 2024 13:32

picnixz added a commit to picnixz/cpython that referenced this pull request Dec 8, 2024

pythongh-111495: improve test coverage of codecs C API (pythonGH-126030)

629634a

For now, skip some crashers (tracked in pythongh-123378).

ebonnal pushed a commit to ebonnal/cpython that referenced this pull request Jan 12, 2025

pythongh-111495: improve test coverage of codecs C API (pythonGH-126030)

2ea6baa

For now, skip some crashers (tracked in pythongh-123378).
