Apply NFC normalization during :string key comparison #905

Merged
merged 2 commits into from
Oct 14, 2024
18 changes: 7 additions & 11 deletions spec/registry.md
@@ -55,23 +55,14 @@ where `resolvedSelector` is the _resolved value_ of a _selector_
and `keys` is a list of strings,
the `:string` selector function performs as described below.

-1. Let `compare` be the string value of `resolvedSelector`.
+1. Let `compare` be the string value of `resolvedSelector`
+   in Unicode Normalization Form C (NFC) [\[UAX#15\]](https://www.unicode.org/reports/tr15)
1. Let `result` be a new empty list of strings.
1. For each string `key` in `keys`:
1. If `key` and `compare` consist of the same sequence of Unicode code points, then
1. Append `key` as the last element of the list `result`.
1. Return `result`.
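
The revised steps above can be sketched in Python as follows. This is only an illustration, not part of the change: the function name `string_select` and the sample strings are hypothetical, the selector is assumed to already be a string, and keys are compared exactly as given, so a key that is not itself in NFC would not match in this sketch.

```python
import unicodedata


def string_select(resolved_selector: str, keys: list[str]) -> list[str]:
    """Hypothetical helper mirroring the revised `:string` selection steps."""
    # Let `compare` be the string value of the selector in NFC.
    compare = unicodedata.normalize("NFC", resolved_selector)
    # Let `result` be a new empty list of strings.
    result: list[str] = []
    for key in keys:
        # Python compares str values code point by code point.
        if key == compare:
            result.append(key)
    return result


# "é" as one precomposed code point (U+00E9) ...
composed = "caf\u00e9"
# ... and as "e" followed by a combining acute accent (U+0065 U+0301).
decomposed = "cafe\u0301"

matches = string_select(decomposed, [composed, "tea"])
```

Under the old wording the decomposed selector would not have matched the precomposed key; with `compare` normalized to NFC, `matches` contains the precomposed key.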

-> [!NOTE]
-> Matching of `key` and `compare` values is sensitive to the sequence of code points
-> in each string.
-> As a result, variations in how text can be encoded can affect the performance of matching.
-> The function `:string` does not perform case folding or Unicode Normalization of string values.
-> Users SHOULD encode _messages_ and their parts (such as _keys_ and _operands_),
-> in Unicode Normalization Form C (NFC) unless there is a very good reason
-> not to.
-> See also: [String Matching](https://www.w3.org/TR/charmod-norm)
-
> [!NOTE]
> Unquoted string literals in a _variant_ do not include spaces.
> If users wish to match strings that include whitespace
@@ -90,6 +81,11 @@ the `:string` selector function performs as described below.

The `:string` function returns the string value of the _resolved value_ of the _operand_.

+> [!NOTE]
+> The function `:string` does not perform Unicode Normalization of its formatted output.
Collaborator

Should this say "case folding or Unicode Normalization", since the original text does?

Member

No, since case folding would be a surprising operation for :string to perform.

Member

Should we add "... or its resolved value"?

Collaborator

The resolved value isn't necessarily a string, though (if the non-normative example in the spec is followed, it would be an object with methods, one of which produces the formatted output).

+> Users SHOULD encode _messages_ and their parts in Unicode Normalization Form C (NFC)
+> unless there is a very good reason not to.
+
## Numeric Value Selection and Formatting

### The `:number` function