Skip to content

gh-111495: improve test coverage of codecs C API #126030

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Nov 1, 2024

Conversation

picnixz
Copy link
Member

@picnixz picnixz commented Oct 27, 2024

I found out that not all handlers were tested and that there was a way to check which exceptions are currently crashing. Note that I must determine whether the exception will make the handler crash directly in the Python code because if I access the attributes, I'll be using a buggy version (see #123378).

I'd like to first update the tests and then fix the handlers one by one. As I said in #123378 (comment) and #123378 (comment), just fixing the getters is not sufficient. I haven't sufficiently in the handlers themselves, but one assertion makes the replace errors handler crash so it wouldn't help just fixing the getters.

We really need to decide how to handle the start and end values of unicode errors in general but let's discuss it on #123378.

@bedevere-app bedevere-app bot added tests Tests in the Lib/test dir awaiting review labels Oct 27, 2024
@bedevere-app bedevere-app bot mentioned this pull request Oct 27, 2024
10 tasks
@picnixz picnixz requested review from vstinner and encukou October 27, 2024 10:36
@encukou
Copy link
Member

encukou commented Oct 28, 2024

Edit: please disregard this comment, I tested the wrong code!

Click to expand the original

This passes for me when I remove the may_crash machinery:

diff --git a/Lib/test/test_capi/test_codecs.py b/Lib/test/test_capi/test_codecs.py
index b764981cca6..d3af4d576f9 100644
--- a/Lib/test/test_capi/test_codecs.py
+++ b/Lib/test/test_capi/test_codecs.py
@@ -745,28 +745,6 @@ def test_codec_stream_writer(self):
                 codec_stream_writer(NULL, stream, 'strict')
 
 
-class UnsafeUnicodeEncodeError(UnicodeEncodeError):
-    def __init__(self, encoding, message, start, end, reason):
-        self.may_crash = (end - start) < 0 or (end - start) >= len(message)
-        super().__init__(encoding, message, start, end, reason)
-
-
-class UnsafeUnicodeDecodeError(UnicodeDecodeError):
-    def __init__(self, encoding, message, start, end, reason):
-        # the case end - start >= len(message) does not crash
-        self.may_crash = (end - start) < 0
-        super().__init__(encoding, message, start, end, reason)
-
-
-class UnsafeUnicodeTranslateError(UnicodeTranslateError):
-    def __init__(self, message, start, end, reason):
-        # <= 0 because PyCodec_ReplaceErrors tries to check the Unicode kind
-        # of a 0-length result (which is by convention PyUnicode_1BYTE_KIND
-        # and not PyUnicode_2BYTE_KIND as it currently expects)
-        self.may_crash = (end - start) <= 0 or (end - start) >= len(message)
-        super().__init__(message, start, end, reason)
-
-
 class CAPICodecErrors(unittest.TestCase):
     @classmethod
     def _generate_exceptions(cls, atomic_literal, factory, objlens):
@@ -780,19 +758,19 @@ def _generate_exceptions(cls, atomic_literal, factory, objlens):
     @classmethod
     def generate_encode_errors(cls, objlen, *objlens):
         def factory(obj, start, end):
-            return UnsafeUnicodeEncodeError('utf-8', obj, start, end, 'reason')
+            return UnicodeEncodeError('utf-8', obj, start, end, 'reason')
         return tuple(cls._generate_exceptions('0', factory, [objlen, *objlens]))
 
     @classmethod
     def generate_decode_errors(cls, objlen, *objlens):
         def factory(obj, start, end):
-            return UnsafeUnicodeDecodeError('utf-8', obj, start, end, 'reason')
+            return UnicodeDecodeError('utf-8', obj, start, end, 'reason')
         return tuple(cls._generate_exceptions(b'0', factory, [objlen, *objlens]))
 
     @classmethod
     def generate_translate_errors(cls, objlen, *objlens):
         def factory(obj, start, end):
-            return UnsafeUnicodeTranslateError(obj, start, end, 'reason')
+            return UnicodeTranslateError(obj, start, end, 'reason')
         return tuple(cls._generate_exceptions('0', factory, [objlen, *objlens]))
 
     @classmethod
@@ -889,20 +867,11 @@ def test_codec_namereplace_errors_handler(self):
         self.do_test_codec_errors_handler(handler, exceptions, bad_exceptions)
 
     def do_test_codec_errors_handler(self, handler, exceptions, bad_exceptions):
-        at_least_one = False
         for exc in exceptions:
-            # See https://github.com/python/cpython/issues/123378 and related
-            # discussion and issues for details.
-            if exc.may_crash:
-                continue
-
-            at_least_one = True
             with self.subTest(handler=handler, exc=exc):
                 # test that the handler does not crash
                 self.assertIsInstance(handler(exc), tuple)
 
-        self.assertTrue(at_least_one, "all exceptions are crashing")
-
         for bad_exc in bad_exceptions:
             with self.subTest('bad type', handler=handler, exc=bad_exc):
                 self.assertRaises(TypeError, handler, bad_exc)

What am I missing?

@picnixz
Copy link
Member Author

picnixz commented Oct 28, 2024

Huh... I don't really know now. Are the PoCs:

./python -c "import codecs; codecs.xmlcharrefreplace_errors(UnicodeEncodeError('bad', '', 0, 1, 'reason'))"
./python -c "import codecs; codecs.replace_errors(UnicodeTranslateError('000', 1, -7, 'reason'))"

crashing for you? I am no more on my dev env so I'll have a look at it tomorrow.

EDIT 1: Ah! I actually did not test the translate errors in replace_error.
EDIT 2: I'll also cover a larger range of errors (because not all PoCs are generated).
EDIT 3:

./python -c "import codecs; codecs.backslashreplace_errors(UnicodeDecodeError('utf-8', b'00000', 9, 2, 'reason'))"

does not crash, it simply raises a SystemError due to negative size.

@picnixz
Copy link
Member Author

picnixz commented Oct 29, 2024

@encukou I think I know: did you configure Python using --with-pydebug? the crashes are assertion-only so that's why you didn't catch them I think. You only have SystemError due to negative sizes though in release mode.

@encukou
Copy link
Member

encukou commented Oct 29, 2024

Ah, misconfiguratoin on my side. Sorry for the noise!

Copy link
Member

@encukou encukou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These look good, but I think some of the code could be made easier to read.
It's all subjective, of course. Do you find your version easier (or have particular reason to write them that way)?

@picnixz
Copy link
Member Author

picnixz commented Oct 29, 2024

It's all subjective, of course. Do you find your version easier (or have particular reason to write them that way)?

Not really, that was the first approach I had in mind :D but I'll include some of your suggestions, thanks!

Copy link
Member

@encukou encukou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@picnixz
Copy link
Member Author

picnixz commented Oct 31, 2024

Thanks Petr! (you just reviewed when I left my dev session...)

Co-authored-by: Petr Viktorin <encukou@gmail.com>
@encukou encukou enabled auto-merge (squash) November 1, 2024 13:14
@encukou encukou merged commit 32e07fd into python:main Nov 1, 2024
34 checks passed
@picnixz picnixz deleted the fix/c-api-codecs-test-111495 branch November 1, 2024 13:32
picnixz added a commit to picnixz/cpython that referenced this pull request Dec 8, 2024
ebonnal pushed a commit to ebonnal/cpython that referenced this pull request Jan 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
skip news tests Tests in the Lib/test dir
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants