-
-
Notifications
You must be signed in to change notification settings - Fork 31.9k
gh-111495: improve test coverage of codecs C API #126030
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Edit: please disregard this comment, I tested the wrong code! Click to expand the originalThis passes for me when I remove the diff --git a/Lib/test/test_capi/test_codecs.py b/Lib/test/test_capi/test_codecs.py
index b764981cca6..d3af4d576f9 100644
--- a/Lib/test/test_capi/test_codecs.py
+++ b/Lib/test/test_capi/test_codecs.py
@@ -745,28 +745,6 @@ def test_codec_stream_writer(self):
codec_stream_writer(NULL, stream, 'strict')
-class UnsafeUnicodeEncodeError(UnicodeEncodeError):
- def __init__(self, encoding, message, start, end, reason):
- self.may_crash = (end - start) < 0 or (end - start) >= len(message)
- super().__init__(encoding, message, start, end, reason)
-
-
-class UnsafeUnicodeDecodeError(UnicodeDecodeError):
- def __init__(self, encoding, message, start, end, reason):
- # the case end - start >= len(message) does not crash
- self.may_crash = (end - start) < 0
- super().__init__(encoding, message, start, end, reason)
-
-
-class UnsafeUnicodeTranslateError(UnicodeTranslateError):
- def __init__(self, message, start, end, reason):
- # <= 0 because PyCodec_ReplaceErrors tries to check the Unicode kind
- # of a 0-length result (which is by convention PyUnicode_1BYTE_KIND
- # and not PyUnicode_2BYTE_KIND as it currently expects)
- self.may_crash = (end - start) <= 0 or (end - start) >= len(message)
- super().__init__(message, start, end, reason)
-
-
class CAPICodecErrors(unittest.TestCase):
@classmethod
def _generate_exceptions(cls, atomic_literal, factory, objlens):
@@ -780,19 +758,19 @@ def _generate_exceptions(cls, atomic_literal, factory, objlens):
@classmethod
def generate_encode_errors(cls, objlen, *objlens):
def factory(obj, start, end):
- return UnsafeUnicodeEncodeError('utf-8', obj, start, end, 'reason')
+ return UnicodeEncodeError('utf-8', obj, start, end, 'reason')
return tuple(cls._generate_exceptions('0', factory, [objlen, *objlens]))
@classmethod
def generate_decode_errors(cls, objlen, *objlens):
def factory(obj, start, end):
- return UnsafeUnicodeDecodeError('utf-8', obj, start, end, 'reason')
+ return UnicodeDecodeError('utf-8', obj, start, end, 'reason')
return tuple(cls._generate_exceptions(b'0', factory, [objlen, *objlens]))
@classmethod
def generate_translate_errors(cls, objlen, *objlens):
def factory(obj, start, end):
- return UnsafeUnicodeTranslateError(obj, start, end, 'reason')
+ return UnicodeTranslateError(obj, start, end, 'reason')
return tuple(cls._generate_exceptions('0', factory, [objlen, *objlens]))
@classmethod
@@ -889,20 +867,11 @@ def test_codec_namereplace_errors_handler(self):
self.do_test_codec_errors_handler(handler, exceptions, bad_exceptions)
def do_test_codec_errors_handler(self, handler, exceptions, bad_exceptions):
- at_least_one = False
for exc in exceptions:
- # See https://github.com/python/cpython/issues/123378 and related
- # discussion and issues for details.
- if exc.may_crash:
- continue
-
- at_least_one = True
with self.subTest(handler=handler, exc=exc):
# test that the handler does not crash
self.assertIsInstance(handler(exc), tuple)
- self.assertTrue(at_least_one, "all exceptions are crashing")
-
for bad_exc in bad_exceptions:
with self.subTest('bad type', handler=handler, exc=bad_exc):
self.assertRaises(TypeError, handler, bad_exc) What am I missing? |
Huh... I don't really know now. Are the PoCs: ./python -c "import codecs; codecs.xmlcharrefreplace_errors(UnicodeEncodeError('bad', '', 0, 1, 'reason'))"
./python -c "import codecs; codecs.replace_errors(UnicodeTranslateError('000', 1, -7, 'reason'))" crashing for you? I am no more on my dev env so I'll have a look at it tomorrow. EDIT 1: Ah! I actually did not test the translate errors in ./python -c "import codecs; codecs.backslashreplace_errors(UnicodeDecodeError('utf-8', b'00000', 9, 2, 'reason'))" does not crash, it simply raises a SystemError due to negative size. |
@encukou I think I know: did you configure Python using |
Ah, misconfiguratoin on my side. Sorry for the noise! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These look good, but I think some of the code could be made easier to read.
It's all subjective, of course. Do you find your version easier (or have particular reason to write them that way)?
Not really, that was the first approach I had in mind :D but I'll include some of your suggestions, thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Thanks Petr! (you just reviewed when I left my dev session...) |
Co-authored-by: Petr Viktorin <encukou@gmail.com>
For now, skip some crashers (tracked in pythongh-123378).
For now, skip some crashers (tracked in pythongh-123378).
I found out that not all handlers were tested and that there was a way to check which exceptions are currently crashing. Note that I must determine whether the exception will make the handler crash directly in the Python code because if I access the attributes, I'll be using a buggy version (see #123378).
I'd like to first update the tests and then fix the handlers one by one. As I said in #123378 (comment) and #123378 (comment), just fixing the getters is not sufficient. I haven't sufficiently in the handlers themselves, but one assertion makes the replace errors handler crash so it wouldn't help just fixing the getters.
We really need to decide how to handle the start and end values of unicode errors in general but let's discuss it on #123378.