-
-
Notifications
You must be signed in to change notification settings - Fork 32.1k
bpo-37476: Adding a unit test of unicode in test_unicode.py #14531
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Lib/test/test_unicode.py
Outdated
|
||
self.assertEqual(unicode_asutf8('abc'), 'abc') | ||
self.assertEqual(unicode_asutf8('abc\0'), 'abc') | ||
self.assertEqual(unicode_asutf8('abc\0abc'), 'abc') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's more like you are testing decodeUTF8
. I'd suggest add more encode samples, BMP, non-BMP, just take from other tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
np, xiang core.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you add tests, please test also PyUnicode_AsUTF8AndSize() which would allow embedded NUL characters/bytes.
Lib/test/test_unicode.py
Outdated
def test_asutf8(self): | ||
from _testcapi import unicode_asutf8 | ||
|
||
self.assertEqual(unicode_asutf8('abc'), 'abc') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function encodes to UTF-8, so the result should be a bytes string, not Unicode. Use PyString_FromString() rather than PyUnicode_FromString() in _testcapi.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The others similar C API tests also return unicodes other than bytes which in some sense also tests the counter part decoding function, though I don't very understand why designed like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either fix other tests, or leave them unchanged. But I would prefer to not add new buggy tests :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So returning the decoded bytes string is right way? I am not sure I understand it clearly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Return bytes, not unicodes. Test decode functionality separately.
np, I would continue to add test. |
FYI, I cancelled the Travis CI run to free some resources for upcoming 3.8b2 and 3.7.4rc2 releases. |
@vstinner hi, victor. Pls help me review this patch again, thanks ;) |
https://bugs.python.org/issue37476