str: proper titlecase support #832

Merged
merged 3 commits into from
Apr 20, 2019
7 changes: 7 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default.

5 changes: 5 additions & 0 deletions vm/Cargo.toml
@@ -27,3 +27,8 @@ unicode-segmentation = "1.2.1"
 lazy_static = "^1.0.1"
 lexical = "2.0.0"
 itertools = "^0.8.0"
+
+# TODO: release and publish to crates.io
+[dependencies.unicode-casing]
+git = "https://github.com/OddCoincidence/unicode-casing"
+rev = "90d6d1f02b9cc04ffb55a5f1c3fa1455a84231fb"
9 changes: 6 additions & 3 deletions vm/src/obj/objstr.rs
@@ -5,6 +5,7 @@ use std::str::FromStr;
 use std::string::ToString;

 use num_traits::ToPrimitive;
+use unicode_casing::CharExt;
 use unicode_segmentation::UnicodeSegmentation;

 use crate::format::{FormatParseError, FormatPart, FormatString};
@@ -413,12 +414,12 @@ impl PyString {
         for c in self.value.chars() {
             if c.is_lowercase() {
                 if !previous_is_cased {
-                    title.extend(c.to_uppercase());
+                    title.extend(c.to_titlecase());
Contributor

How can a single character be titlecased? I guess something other than Python's title function is meant here?

Contributor Author

This surprised me too, but single-codepoint titlecase characters do indeed exist (I realized this from reading the CPython tests for title / istitle). Some examples.

Contributor

"Latin Capital Letter D with Small Letter Z with Caron". Unicode is never boring! :)

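For illustration, here is a minimal sketch of the point made in the thread above, using only the Rust standard library. The helpers `titlecase_sketch` and `is_titlecase_sketch` are hypothetical names backed by a hand-picked table of the four Latin digraph families; they are not the `unicode-casing` API and cover only a tiny subset of what a real titlecase mapping would need.

```rust
/// Hypothetical helper, for illustration only: maps a character to its
/// single-codepoint titlecase form using a hand-picked table.
/// Assumption: only the four Latin digraph families are covered here.
fn titlecase_sketch(c: char) -> Option<char> {
    match c {
        // U+01C4 (upper), U+01C5 (title), U+01C6 (lower): D with Z with caron
        '\u{01C4}' | '\u{01C5}' | '\u{01C6}' => Some('\u{01C5}'),
        // U+01C7, U+01C8, U+01C9: L with J
        '\u{01C7}' | '\u{01C8}' | '\u{01C9}' => Some('\u{01C8}'),
        // U+01CA, U+01CB, U+01CC: N with J
        '\u{01CA}' | '\u{01CB}' | '\u{01CC}' => Some('\u{01CB}'),
        // U+01F1, U+01F2, U+01F3: D with Z
        '\u{01F1}' | '\u{01F2}' | '\u{01F3}' => Some('\u{01F2}'),
        // Everything else is outside this sketch's table.
        _ => None,
    }
}

/// True if `c` is one of the titlecase forms listed in the table above.
fn is_titlecase_sketch(c: char) -> bool {
    titlecase_sketch(c) == Some(c)
}

/// Collects the standard library's uppercase mapping of `c` into a String.
fn std_upper(c: char) -> String {
    c.to_uppercase().collect()
}
```

With these helpers, `std_upper('\u{01C6}')` yields the all-caps digraph U+01C4, while `titlecase_sketch('\u{01C6}')` yields the titlecase digraph U+01C5. The standard library reports U+01C5 as neither uppercase nor lowercase, which is presumably why the diff adds `|| c.is_titlecase()` next to `c.is_uppercase()`.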
                 } else {
                     title.push(c);
                 }
                 previous_is_cased = true;
-            } else if c.is_uppercase() {
+            } else if c.is_uppercase() || c.is_titlecase() {
                 if previous_is_cased {
                     title.extend(c.to_lowercase());
                 } else {
@@ -652,7 +653,7 @@ impl PyString {
         let mut cased = false;
         let mut previous_is_cased = false;
         for c in self.value.chars() {
-            if c.is_uppercase() {
+            if c.is_uppercase() || c.is_titlecase() {
                 if previous_is_cased {
                     return false;
                 }
@@ -1050,6 +1051,7 @@ mod tests {
             ("Format,This-As*Title;String", "fOrMaT,thIs-aS*titLe;String"),
             ("Getint", "getInt"),
             ("Greek Ωppercases ...", "greek ωppercases ..."),
+            ("Greek ῼitlecases ...", "greek ῳitlecases ..."),
         ];
         for (title, input) in tests {
             assert_eq!(PyString::from(input).title(&vm).as_str(), title);
@@ -1066,6 +1068,7 @@ mod tests {
             "A\nTitlecased Line",
             "A Titlecased, Line",
             "Greek Ωppercases ...",
+            "Greek ῼitlecases ...",
         ];

         for s in pos {