gh-66428 Stop including all bidirectional "B" characters in line breakers #132369
This is not a direct fix to the issue (as it is not clear how/if it should be fixed at this point), but to this comment:
And the answer by @malemburg
It turns out that `LineBreak.txt` is already used to gather the line break categories, but that the bidirectional class `B` was still kept as indicating a line breaker. Looking at the current state of Unicode (16.0), the `B` characters that are non-tailorable line breakers are in the BK, CR, LF or NL categories, so they were already captured. The remaining `B` characters (U+001C, U+001D, U+001E) are all in the combining mark (CM) category and shouldn't break lines.

So I just removed the `bidirectional == "B"` condition in the code generating the list of line breakers, and that should be it.

I think from an API perspective, this only affects `str.splitlines()`, which at this point is only tested for behaviour against CR, LF and CR+LF and no other line breaker, so I didn't add any test, but I can if it seems useful.

In general, I don't expect this to be a huge compatibility-breaking change given the conversation in #66428, but I don't really know how to check for that apart from searching for the code points (in U+ and `\x` forms) on GitHub, which didn't return any Python code that would be broken.

This is my first PR, so it's very likely I missed something, please let me know!
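To see which characters are involved, here is a minimal sketch that is not part of the PR's diff. It lists every code point whose bidirectional class is `B` according to the interpreter's own `unicodedata` tables and reports whether `str.splitlines()` currently treats it as a line boundary. The helper name `splits_lines` is made up for illustration, and the output depends on the interpreter build, so no expected output is shown.

```python
import sys
import unicodedata

# All code points whose Unicode bidirectional class is "B"
# (paragraph separator), per this interpreter's Unicode database.
bidi_b = [
    cp for cp in range(sys.maxunicode + 1)
    if unicodedata.bidirectional(chr(cp)) == "B"
]


def splits_lines(cp):
    """Hypothetical helper: True if str.splitlines() breaks on chr(cp)."""
    return len(f"a{chr(cp)}b".splitlines()) == 2


for cp in bidi_b:
    # Control characters have no Unicode name, so fall back to a label.
    name = unicodedata.name(chr(cp), "<control>")
    print(f"U+{cp:04X} {name:<22} splits={splits_lines(cp)}")
```

On an interpreter without this change, U+001C, U+001D and U+001E should report `splits=True`, matching the documented `str.splitlines()` boundaries. With the `bidirectional == "B"` condition removed, they would be expected to report `splits=False`, while LF, CR, U+0085 and U+2029 keep splitting.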
📚 Documentation preview 📚: https://cpython-previews--132369.org.readthedocs.build/