
Add use case for source expression attribute #772

Merged: 3 commits, Apr 22, 2024

@eemeli (Collaborator) commented Apr 21, 2024

While working on moz.l10n, a new Python localization library that uses the MF2 message and resource data model to represent messages from a number of different current syntaxes, I've come across at least the following use cases for expression attributes:

  • In addition to supporting a limited set of HTML elements, Android String Resources may use <xliff:g> to wrap nontranslatable content. This is best represented in MF2 with a @translate=no attribute.
  • Web extension messages.json files allow for named placeholders that are mapped to indexed arguments. These may include an example, which is best represented in MF2 as an @example=... attribute.
  • Apple's Xcode supports localization of plural messages via .stringsdict XML files, which encode the plural variable's name in an NSStringLocalizedFormatKey value, where it appears as e.g. %#@countOfFoo@ or similar. To display only the relevant "countOfFoo" name of this variable to localizers as context, it's best to use a @source=... attribute on the selector.
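As a rough illustration of the second use case, here is a minimal Python sketch that converts one messages.json entry into an MF2 pattern, carrying each placeholder's example over as an @example attribute. It assumes the usual messages.json shape (message, placeholders, content, example); the function names are hypothetical and not moz.l10n's API, and returning each placeholder's content separately is only one assumed way of keeping the $1-style index mapping around for round-tripping.

```python
from __future__ import annotations

import re

_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*\Z")
# "$$" is an escaped dollar sign; "$name$" references a named placeholder.
_PLACEHOLDER = re.compile(r"\$\$|\$([A-Za-z0-9_]+)\$")


def literal(value: str) -> str:
    """Quote an MF2 literal, leaving plain names unquoted."""
    if _NAME.match(value):
        return value
    return "|" + value.replace("\\", "\\\\").replace("|", "\\|") + "|"


def escape_text(text: str) -> str:
    """Escape the characters that are special in MF2 pattern text."""
    return re.sub(r"([\\{}])", r"\\\1", text)


def convert_message(entry: dict) -> tuple[str, dict[str, str]]:
    """Convert one messages.json entry to an MF2 pattern (hypothetical helper).

    Also returns each placeholder's original "content" (e.g. "$1"), which a
    serializer would need to write the message back out.
    """
    placeholders = {k.lower(): v for k, v in entry.get("placeholders", {}).items()}
    message = entry["message"]
    parts: list[str] = []
    contents: dict[str, str] = {}
    pos = 0
    for m in _PLACEHOLDER.finditer(message):
        parts.append(escape_text(message[pos:m.start()]))
        pos = m.end()
        name = (m.group(1) or "").lower()  # placeholder names are case-insensitive
        if name not in placeholders:
            # "$$" or an undeclared placeholder: keep it as plain text.
            parts.append("$" if m.group(0) == "$$" else escape_text(m.group(0)))
            continue
        ph = placeholders[name]
        contents[name] = ph["content"]
        expr = "{$" + name
        if "example" in ph:
            expr += " @example=" + literal(ph["example"])
        parts.append(expr + "}")
    parts.append(escape_text(message[pos:]))
    return "".join(parts), contents
```

For an entry like {"message": "Hello, $USER$!", "placeholders": {"user": {"content": "$1", "example": "Cira"}}} this yields "Hello, {$user @example=Cira}!" plus {"user": "$1"}; the example value is display context only, so it sits in an attribute rather than an option.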

The first two use cases are already documented, but the last one is not; it's added by this PR.

The overall use case of the underlying work is to make use of the MF2 data model to provide a unified representation of messages in many different syntaxes, so that e.g. validation and a UI for plural message editing can be applied to all formats, rather than needing separate parsing and handling for each.

@aphillips (Member) commented

> Android String Resources may use <xliff:g> to wrap nontranslatable content. This is best represented in MF2 with a @translate=no attribute.

An argument can be made that this is a job for markup, since, after all, an XLIFF processor might want to directly consume the <g>.

> Apple's Xcode supports localization of plural messages

Side thought: we probably want to chat with the Apple, Android, and MSFT folks about adopting MF2 into some of their resource/API syntaxes.

> it's best to use a @source=... attribute on the selector

I'm not sure I understand the @source annotation you're proposing. Why wouldn't the caller just assign the value to a named argument to MF2 in the setup to calling the formatter? Why does the translator need to know the original name?

Apple's doc furnishes this example of the format you're talking about:

<plist version="1.0">
    <dict>
        <key>%d home(s) found</key>
        <dict>
            <key>NSStringLocalizedFormatKey</key>
            <string>%#@homes@</string>
            <key>homes</key>
            <dict>
                <key>NSStringFormatSpecTypeKey</key>
                <string>NSStringPluralRuleType</string>
                <key>NSStringFormatValueTypeKey</key>
                <string>d</string>
                <key>zero</key>
                <string>No homes found</string>
                <key>one</key>
                <string>%d home found</string>
                <key>other</key>
                <string>%d homes found</string>
            </dict>
        </dict>
    </dict>
</plist>

Isn't this represented in MF2 as:

.input {$homes :integer}
.match {$homes}
0 {{No homes found}}
one {{{$homes} home found}}
* {{{$homes} homes found}}

The %#@homes@ is needed to bind homes to the sprintf-style positional arguments (%d in the example). Presumably MF2 already does this by name.
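For the single-variable case in Apple's example, that mapping can be sketched in Python with the standard library's plistlib. This is an illustrative sketch, not moz.l10n code: it assumes each entry holds exactly one NSStringPluralRuleType variable, maps zero to the exact key 0 as in the MF2 above, and substitutes only the spec named by NSStringFormatValueTypeKey. Note that it discards the NSStringLocalizedFormatKey string entirely, which is exactly the information a round trip back to .stringsdict would need.

```python
from __future__ import annotations

import plistlib
import re

EXAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
  <key>%d home(s) found</key>
  <dict>
    <key>NSStringLocalizedFormatKey</key>
    <string>%#@homes@</string>
    <key>homes</key>
    <dict>
      <key>NSStringFormatSpecTypeKey</key>
      <string>NSStringPluralRuleType</string>
      <key>NSStringFormatValueTypeKey</key>
      <string>d</string>
      <key>zero</key>
      <string>No homes found</string>
      <key>one</key>
      <string>%d home found</string>
      <key>other</key>
      <string>%d homes found</string>
    </dict>
  </dict>
</dict>
</plist>
"""


def escape_text(text: str) -> str:
    """Escape the characters that are special in MF2 pattern text."""
    return re.sub(r"([\\{}])", r"\\\1", text)


def plural_to_mf2(name: str, rule: dict) -> str:
    """Convert one NSStringPluralRuleType dictionary to MF2 syntax."""
    spec = "%" + rule["NSStringFormatValueTypeKey"]  # e.g. "%d"
    placeholder = "{$" + name + "}"

    def variant(key: str, text: str) -> str:
        # Replace each printf-style spec with a reference to the MF2 variable.
        pattern = placeholder.join(escape_text(part) for part in text.split(spec))
        return key + " {{" + pattern + "}}"

    lines = [".input {$" + name + " :integer}", ".match {$" + name + "}"]
    if "zero" in rule:
        lines.append(variant("0", rule["zero"]))  # exact match, as above
    for category in ("one", "two", "few", "many"):
        if category in rule:
            lines.append(variant(category, rule[category]))
    lines.append(variant("*", rule["other"]))  # MF2 requires a catch-all
    return "\n".join(lines)


def stringsdict_to_mf2(data: bytes) -> dict[str, str]:
    """Map each .stringsdict message id to an MF2 message."""
    messages = {}
    for msg_id, entry in plistlib.loads(data).items():
        # Assumes exactly one variable besides the format key.
        (name,) = [k for k in entry if k != "NSStringLocalizedFormatKey"]
        rule = entry[name]
        if rule.get("NSStringFormatSpecTypeKey") != "NSStringPluralRuleType":
            raise ValueError(f"unsupported rule type in {msg_id!r}")
        messages[msg_id] = plural_to_mf2(name, rule)
    return messages


print(stringsdict_to_mf2(EXAMPLE)["%d home(s) found"])
```

Running it prints the same five-line MF2 message given above.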

@aphillips added the syntax, design, and LDML46 labels Apr 21, 2024
@eemeli (Collaborator, Author) commented Apr 21, 2024

>> Android String Resources may use <xliff:g> to wrap nontranslatable content. This is best represented in MF2 with a @translate=no attribute.

> An argument can be made that this is a job for markup, since, after all, an XLIFF processor might want to directly consume the <g>.

I'm building a workflow where the source content can be parsed into an MF2 data model, modified, and then reserialised in the original format. So there isn't necessarily any XLIFF processor involved here, and even if there were, the use of <xliff:g> is completely custom in the Android format, and does not match the "generic group placeholder" meaning that the XLIFF spec places on it. Hence I represent the intent of the original syntax with an attribute, rather than modelling the input exactly.

>> it's best to use a @source=... attribute on the selector

> I'm not sure I understand the @source annotation you're proposing. Why wouldn't the caller just assign the value to a named argument to MF2 in the setup to calling the formatter? Why does the translator need to know the original name?

In this case, there is no formatter involved in the workflow, so the source needs to be retained to allow for a later serialisation in the format that the iOS or macOS formatter will be able to process. For the translator, the name of the variable can be an informative part of the message's context, and it's much clearer when lifted out of its syntax trappings.

> Isn't this represented in MF2 as:
>
> .input {$homes :integer}
> .match {$homes}
> 0 {{No homes found}}
> one {{{$homes} home found}}
> * {{{$homes} homes found}}
>
> The %#@homes@ is needed to bind homes to the sprintf-style positional arguments (%d in the example). Presumably MF2 already does this by name.

Yes, and in the MF2 representation the %#@homes@ string is needed to reliably transform the MF2 back into the corresponding stringsdict value. Sometimes it also carries a positional indicator, and other content; it's not always a %#@ prefix and @ suffix to the variable name.
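A minimal Python sketch of that split, assuming a variable reference takes the form %, an optional positional index such as 1$, then #@name@ (both the positional form and the identifier-only name pattern are assumptions, given how sparsely the syntax is documented). The name becomes the MF2 operand, while the untouched format key rides along in @source so it can be written back verbatim. Here the attribute sits on the .input declaration used in the MF2 example above; the function names are hypothetical.

```python
from __future__ import annotations

import re

# Assumed shape of a variable reference inside NSStringLocalizedFormatKey:
# "%", an optional positional index such as "1$", then "#@", the name, "@".
_VAR_REF = re.compile(r"%(?:\d+\$)?#@([A-Za-z_][A-Za-z0-9_]*)@")


def quoted(value: str) -> str:
    """Wrap a value as a quoted MF2 literal."""
    return "|" + value.replace("\\", "\\\\").replace("|", "\\|") + "|"


def selector_declaration(format_key: str) -> str:
    """Lift the variable name out of the format key, keeping the key in @source."""
    m = _VAR_REF.search(format_key)
    if m is None:
        raise ValueError(f"no plural variable in {format_key!r}")
    name = m.group(1)
    return ".input {$" + name + " :integer @source=" + quoted(format_key) + "}"


def format_key_for(name: str, source: str | None) -> str:
    """Choose the format key to write when serializing back to .stringsdict."""
    # Prefer the preserved original; fall back to the bare "%#@name@" form.
    return source if source is not None else "%#@" + name + "@"
```

Because the whole key is kept, any extra content around the variable reference survives the round trip even though the regex only needs to find the name.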

@aphillips (Member) commented

> For the translator, the name of the variable can be an informative part of the message's context, and it's much clearer when lifted out of its syntax trappings.

Agreed, but one could extract (and/or decorate) the name to generate the expression operand. I understand that the NSStringLocalizedFormatKey is actually a construct for enumerating what we'd call operands and aligning them with classical "placeholders". You have to parse that string in your implementation, IIUC (not having worked with it, only having glanced at the documentation).

> Yes, and in the MF2 representation the %#@homes@ string is needed to reliably transform the MF2 back into the corresponding stringsdict value. Sometimes it also carries a positional indicator, and other content; it's not always a %#@ prefix and @ suffix to the variable name.

👍

> So there isn't necessarily any XLIFF processor involved here, and even if there were, the use of <xliff:g> is completely custom in the Android format, and does not match the "generic group placeholder" meaning that the XLIFF spec places on it. Hence I represent the intent of the original syntax with an attribute, rather than modelling the input exactly.

Understood, but there is Android's processor and this does still look like markup in that context. FWIW, XLIFF elements are implemented in many different ways by different tools. So there are many dialects already.

Overall, what you're doing can obviously work. I'm just curious whether we already provide the necessary constructs.

Thought: does this suggest the need for namespaced or custom attributes? @source is fine, but maybe @moz:source would avoid conflicts with other interpretations in tooling downstream?

@eemeli (Collaborator, Author) commented Apr 22, 2024

> Agreed, but one could extract (and/or decorate) the name to generate the expression operand. I understand that the NSStringLocalizedFormatKey is actually a construct for enumerating what we'd call operands and aligning them with classical "placeholders". You have to parse that string in your implementation, IIUC (not having worked with it, only having glanced at the documentation).

Eh, or I can just extract the relevant-to-translators bit out of it (the variable name), and leave the rest as line noise that I hide away. The "IIUC" bit that you mention is hard here, because this syntax isn't well documented, and I'm not myself 100% confident I've understood all of it.

> Understood, but there is Android's processor and this does still look like markup in that context.

Yes, and in some cases like

<xliff:g><b>foo</b></xliff:g>

I do need to leave it in as markup like

{#xliff:g @translate=no}{#b}foo{/b}{/xliff:g @translate=no}

but that's less useful and less friendly to a translator or tooling than e.g. representing

<xliff:g id="user" example="Bob">%1$s</xliff:g>

as

{$user :xliff:g example=Bob @translate=no @source=|%1$s|}
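The two cases above can be sketched in Python with xml.etree.ElementTree: a text-only <xliff:g> carrying an id becomes a single placeholder expression, and anything with nested elements falls back to markup. This is a hypothetical sketch, not moz.l10n's implementation; using the id as the variable name, :xliff:g as the function name, and ignoring other attributes on markup elements are assumptions drawn from the examples in this comment, and Android's own backslash escapes are not handled.

```python
from __future__ import annotations

import re
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*\Z")


def literal(value: str) -> str:
    """Quote an MF2 literal, leaving plain names unquoted."""
    if _NAME.match(value):
        return value
    return "|" + value.replace("\\", "\\\\").replace("|", "\\|") + "|"


def escape_text(text: str) -> str:
    """Escape the characters that are special in MF2 pattern text."""
    return re.sub(r"([\\{}])", r"\\\1", text)


def tag_name(tag: str) -> str:
    """Map ElementTree's "{namespace}local" tags back to "xliff:local"."""
    prefix = "{" + XLIFF_NS + "}"
    return "xliff:" + tag[len(prefix):] if tag.startswith(prefix) else tag


def convert(elem: ET.Element) -> str:
    name = tag_name(elem.tag)
    if name == "xliff:g" and len(elem) == 0 and "id" in elem.attrib:
        # Text-only placeholder: lift it into a single expression.
        expr = "{$" + elem.get("id") + " :xliff:g"
        if "example" in elem.attrib:
            expr += " example=" + literal(elem.get("example"))
        expr += " @translate=no @source=" + literal(elem.text or "") + "}"
        return expr
    # Anything else stays as markup; xliff:g keeps @translate=no on both ends.
    attrs = " @translate=no" if name == "xliff:g" else ""
    parts = ["{#" + name + attrs + "}", escape_text(elem.text or "")]
    for child in elem:
        parts += [convert(child), escape_text(child.tail or "")]
    return "".join(parts) + "{/" + name + attrs + "}"


def android_string_to_mf2(xml: str) -> str:
    """Convert the content of one Android <string> resource to an MF2 pattern."""
    root = ET.fromstring(xml)
    parts = [escape_text(root.text or "")]
    for child in root:
        parts += [convert(child), escape_text(child.tail or "")]
    return "".join(parts)
```

With the xliff namespace declared, "Hello, <xliff:g id="user" example="Bob">%1$s</xliff:g>!" comes out as "Hello, {$user :xliff:g example=Bob @translate=no @source=|%1$s|}!", while "<xliff:g><b>foo</b></xliff:g>" stays as markup.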

> Thought: does this suggest the need for namespaced or custom attributes? @source is fine, but maybe @moz:source would avoid conflicts with other interpretations in tooling downstream?

That's actually a big part of why I opened this PR. If we find agreement on what a @source attribute is supposed to mean, then I don't need to use a namespaced one.

@aphillips merged commit a037ba7 into main on Apr 22, 2024 (1 check passed)
@aphillips deleted the source-attribute branch on Apr 22, 2024, 16:41
@mihnita (Collaborator) commented Apr 22, 2024

Note that the way <g> is used in the Android files is bad.

It is meant to declare the text between <g>...</g> as non-localizable.
But in XLIFF the content between the tags is very much localizable.
The <g> element is intended to be used for things like <b>, <i>, and so on.


I thought that "do not translate" is already representable in MF2 as "...{|don't translate this|}..."
