Skip to content

Define @attributes on expressions #450

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from

Conversation

eemeli
Copy link
Collaborator

@eemeli eemeli commented Aug 14, 2023

Fixes #426

As discussed during today's call, we have use cases for attributes that pertain to an expression, rather than acting as function options. The examples presented in #426 refer just to the locale of an expression, but other valuable attributes include the translatability of an expression, and attributes such as canCopy, canDelete, canOverlap, and others defined in the XLIFF 2.1 spec. Many of these do not have an impact on message formatting, but can be very important for translation work.

This PR proposes the following syntax for this, using the examples included in the PR:

A message in English including a French literal that should not be translated:

{In French, "{|bonjour| @locale=fr @translate=no}" is a greeting}

A message with a $date variable formatted with the :datetime function using the root "und" locale:

{Today is {$date :datetime weekday=long @locale=und}.}

A message with a $start variable that should in translations be kept as the first element of the message.

{{$start @can-reorder=no} is the beginning.}

Effectively, after the function options, expression attributes may be defined with the same option syntax but with @ as a prefix for each name. The following attributes are explicitly defined:

  • @locale: a sequence of IETF BCP 47 language tags
  • @translate: a literal value of either yes or no.

All other custom attributes must include a hyphen - in their name (such as @can-reorder above); everything else is ignored. Mismatching names can't be treated as a syntax error, because that would limit our later ability in a 2.1 spec version to define more attributes.

This PR also unreserves the @ character and updates the data model descriptions as necessary.

@eemeli eemeli added the syntax Issues related with syntax or ABNF label Aug 14, 2023
@eemeli eemeli requested review from aphillips, stasm and mihnita August 14, 2023 21:43
@macchiati
Copy link
Member

The @can-reorder=no doesn't look very useful. Not a blocker for this PR, but it should have a better example in the future.

It isn't clear what it would mean, but I'm guessing:

in the translation,

  • all placeholders that appear before this placeholder appear before this placeholder in the original.
  • all placeholders that appear after this placeholder appear after this placeholder in the original.

So if I had some paired placeholders like the following (pseudocode, cr = can-reorder), the translator couldn't reorder the text as needed for their language

.x. {bold-start @cr=no} .y. {bold-end @cr=no} .z.  {italic-start @cr=no} .w. {italic-end @cr=no} .u.

The translator couldn't reorder the parts, eg

.X. {italic-start @cr=no} .W. {italic-end @cr=no} .U. {bold-start @cr=no} .Y. {bold-end @cr=no} .Z.  

Copy link
Member

@aphillips aphillips left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good start.

Be careful of whether it is @language or @Locale.

I think this is too opinionated about usage and needs to be slightly genericized.

See comments about namespacing. I think checking the interior of the name is too much work? I also think that we should explicitly define what implementations are allowed to define for themselves.

One normative thing is that unrecognized items are ignored, but does that mean that they are not passed downstream? Are they deleted from the data model, for example? We know how to process them, just not what they mean...

@@ -105,6 +105,9 @@ one of the following is used to resolve the value of the _expression_:
- Else, the _expression_ has a _reserved_ _annotation_,
an Unsupported Expression error is emitted and a fallback value is used as its value.

_Expression attribute resolution_ defines the handling of _expression attributes_
that have an impact on _message_ formatting.
Copy link
Member

@aphillips aphillips Aug 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this part of the sentence doesn't belong. The attributes need to be resolved if they are present. Whether they have an impact on formatting is up to the functions or processing involved in the formatting itself. Also, it should be noted that the attributes almost certainly "pass down" to parts during formatting (so that the caller can access them, if needed).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not so sure about attributes getting passed as such to formatting or selection functions. If that makes sense, then why not pass them as function options?

Locale is special because

  1. It's already handled as special by existing formatters, and
  2. it may get a separate field in the formatted-parts output.

I can see how e.g. timezone could be handled similarly specially by setting up some aspect of the environment before a formatter call rather than considering it as a function option, but for that I'd expect it to be a thing that the implementation does, and which affects the formatter only indirectly.

The attributes need to be resolved if they are present.

That's not really in line with what we're saying about other ignored code. For example, we allow an implementation to ignore errors in expressions that aren't needed for the current formatting. Similarly, we should ignore errors in attributes that don't impact the formatting.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The attributes need to be resolved if they are present.

That's not really in line with what we're saying about other ignored code. For example, we allow an implementation to ignore errors in expressions that aren't needed for the current formatting. Similarly, we should ignore errors in attributes that don't impact the formatting.

The attributes need to be parsed, even if they are not recognized. The parser does not necessarily know what is supported downstream or in an implementation.

Similarly, we should ignore errors in attributes that don't impact the formatting.

Depends on what "error" means. If there is a syntax error, we can't ignore that, no? (some of this is about what "we" means here: It is up to the function to decide if an attribute is "bad" or just to ignore it. "Our" job is to get the options and variables to the function.)

bind the _name_ of the _option_ to the resolved value in the mapping.
- Otherwise, do not bind the _name_ of the _option_ to any value in the mapping.
4. Determine the _expression_ _locale_.
If the _expression_ has a valid `@locale` _expression attribute_,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Define "valid" here. Is @locale=|hello world| valid? What about @locale=dead-beef (sorta BCP47-like)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's defined in the expression attribute resolution section.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem here is that "valid" isn't clearly attached to @Locale or expression attribute.

If this were written upside down, it might be clearer, e.g.

Let expressionLocale be the message locale from the formatting context.
If the expression contains a well-formed @locale expression attribute, let expressionLocale be that value.

Otherwise, look up the _message_ _locale_ from the _formatting context_.
5. Call the function implementation with the following arguments:

- The _expression_ _locale_.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expose the expression attributes here also. Locale may not be the only one--I certainly have timezone in mind and other attributes may be provided by the implementation or our specification that affect the formatter.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least for the JS Intl.DateTimeFormat, timeZone is a regular option. Should we not handle it as such?

For other attributes, there is this below:

Implementations MAY assign meaning during formatting to other expression attributes

Should the text here make a reference to that as well, or is that implicit permission sufficiently clear?

Copy link
Member

@aphillips aphillips Aug 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For other attributes, there is this below:

Implementations MAY assign meaning during formatting to other expression attributes

Should the text here make a reference to that as well, or is that implicit permission sufficiently clear?

You're permitting implementations to assign meaning, but here I'm suggesting that functions can assign meaning to attribute values, including those unknown to the implementation. They can only do this if they have access to the values, of course.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would such a value be defined in the syntax as an expression attribute rather than as a function option?

Comment on lines +564 to +565
To ensure future extensibility of this specification,
all other _expression attributes_ with a _name_ not containing `-` MUST be ignored.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about just allowing unrecognized attributes to be ignored?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because then we could not later assign a meaning to them without potentially breaking user code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reserving a namespace is a separate thing from ignoring unrecognized values. These can work in tandem. An MF2.0 implementation would ignore the unrecognized value @foo that MF2.1 unreserves, as well as @_bar that some other proprietary implementation is using.

Alternatively (and as I suggested elsewhere), as long as the values are syntactically correct, the implementation can just parse the values without imputing meaning to them and pass them. It only has to apply validation rules to attributes it "knows" (and do we define an error for that???)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider the following hypothetical, in a world where we allow implementations to assign their own meaning to any attributes not explicitly defined in the spec like @locale and @translate:

Let's say that an implementation defines @fallback to provide a value that should be used for a placeholder, should its formatting fail for whatever reason. This doesn't really make sense as a function option, because it's covering the behaviour of what happens if the function throws an error. Other implementations could just ignore this attribute, and use the spec-defined fallback representation.

Then let's say that in a 2.1 version of the MF spec we'd like to define @fallback to mean a fallback locale to use, if the function doesn't support the message locale, or the one set by @locale. This hypothetical attribute would have the same semantics as @locale, but it would append values to the language priority list rather than replacing it.

But in this world we could not do that without potentially breaking userspace, because an implementation as defined above could exist, along with a pre-existing body of messages using @fallback with a completely different meaning. In this case, the meanings of "fallback" are quite different, but even should they align, it's very likely that the detailed implementation and requirements could differ for the attributes. In other words, we would not be able to define any future attributes in the spec without a danger of breaking userspace.

@aphillips aphillips added the formatting Issue pertains to the formatting section of the spec label Aug 14, 2023
Co-authored-by: Addison Phillips <addisonI18N@gmail.com>
Comment on lines +108 to +109
The `attributes` of an `Expression` primarily represent translator hints,
but could include ones such as `locale` that do have an impact on _message_ formatting.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remarked on this before: this is too opinionated. Just because you are thinking of translator hints at this moment does not make them them the sole or even primary reason for this feature.

I suspect we should take a moment to agree on requirements/design before adding a feature like this: we should work as efficiently as possible, but not headlong.

bind the _name_ of the _option_ to the resolved value in the mapping.
- Otherwise, do not bind the _name_ of the _option_ to any value in the mapping.
4. Determine the _expression_ _locale_.
If the _expression_ has a valid `@locale` _expression attribute_,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem here is that "valid" isn't clearly attached to @Locale or expression attribute.

If this were written upside down, it might be clearer, e.g.

Let expressionLocale be the message locale from the formatting context.
If the expression contains a well-formed @locale expression attribute, let expressionLocale be that value.

Comment on lines +256 to +257
If a `@locale` has an invalid value,
emit a Resolution error and ignore the _expression attribute_ in further processing.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also need to be careful here. Does "invalid" include the language tag's validity? That is, are we requiring that the implementation check the tag's contents? I think that's a job for e.g. Intl.Locale.

If you made the change I suggested on the previous paragraph, some of this paragraph would become irrelevant (we'd have already said that the value was a sequence of language tags). Then this paragraph could say:

Suggested change
If a `@locale` has an invalid value,
emit a Resolution error and ignore the _expression attribute_ in further processing.
When `@locale` is assigned an empty value or the resolved value consists of anything other than a list of well-formed language tags, emit a Resolution error and omit the _expression attribute_ from further processing.

Then we probably need to add somewhere:

To resolve the value of an @locale...
If the value is empty, emit an error (I think?)
If a literal, split on comma, trim, check for well-formed tag
If a variable:
- if an array...
- else if a literal...

If a `@locale` has an invalid value,
emit a Resolution error and ignore the _expression attribute_ in further processing.

Implementations MAY ignore all other _expression attributes_ during formatting.
Copy link
Member

@aphillips aphillips Aug 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect this is wrong. Implementations should not be required to support attributes that they don't recognize or impute meaning. So they are "ignoring" them from that perspective. But they have already parsed them. If I implement @_foo in my functions and impute meaning to whatever I assign to @_foo at the expression level, then I'd like the implementation I'm running on to pass the value to me (pretty please), even though the impl doesn't know what it means. Otherwise "private use" expression attributes can't work (as anything other than translator instructions)

Co-authored-by: Addison Phillips <addisonI18N@gmail.com>
@mihnita
Copy link
Collaborator

mihnita commented Aug 21, 2023

{{$start @can-reorder=no} is the beginning.}

This is not the intended XLIFF usage.
The use case for reorder is not allowing this kind of translation:

src: ...{ph1} .... {ph2}....
trg: ...{ph2} .... {ph1}....

The reason this makes sense in XLIFF is that it is an interchange format.
It is used to extract strings from other formats and represent them in a unified manner.
So "no reorder" would be used if the string in the original format was something like ".... %d ... %s ...".
(yes, bad i18n, but xliff should be able to represent that)

But I don't think it has a use case for MF2.

@eemeli eemeli marked this pull request as draft August 21, 2023 17:14
@eemeli
Copy link
Collaborator Author

eemeli commented Aug 21, 2023

Based on discussion during today's call, I'll be submitting a design document that this PR may be solving.

@eemeli
Copy link
Collaborator Author

eemeli commented Jan 12, 2024

Closing in favour of #592.

@eemeli eemeli closed this Jan 12, 2024
@eemeli eemeli deleted the expression-attributes branch January 12, 2024 20:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
formatting Issue pertains to the formatting section of the spec syntax Issues related with syntax or ABNF
Projects
None yet
Development

Successfully merging this pull request may close these issues.

How to express expression locale?
4 participants