formatToParts-like Iterator #41
Comments
+100 |
@stasm How would the "before variables are resolved to their values" bit work? I definitely agree with having parts including the value before it's stringified, but what would be the benefit of not already determining the part's original value? This matters when you consider format-to-parts output together with post-resolution transformations that might enable solutions to #16, #31, #34, and #160. If the parts are emitted before variable resolution, such a transformation could not be applied to them. |
Here's a concrete proposal for the interfaces of the formatted message parts, where the return type of a

interface FormattedDynamic<T> {
  type: 'dynamic';
  value: T;
  meta?: FormattedMeta;
  toString(): string;
  valueOf(): T;
}
interface FormattedFallback {
  type: 'fallback';
  value: string;
  meta?: FormattedMeta;
  toString(): string;
  valueOf(): string;
}
interface FormattedLiteral {
  type: 'literal';
  value: string | number;
  meta?: FormattedMeta;
  toString(): string;
  valueOf(): string | number;
}
interface FormattedMessage<T> {
  type: 'message';
  value: FormattedPart<T>[];
  meta?: FormattedMeta;
  toString(): string;
  valueOf(): string;
}
type FormattedMeta = Record<string, string | number | boolean | null>;
type FormattedPart<T = unknown> =
  | FormattedDynamic<T>
  | FormattedFallback
  | FormattedMessage<T>
  | (T extends string | number ? FormattedLiteral : never);

In other words, I'm proposing that we have four different formatted parts:
The fields of these parts are shared by all, and each has an important role:
In the execution model of the EZ model, these formatted parts may also be used to wrap the arguments of a formatting function, which would allow e.g. for |
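To make the role of `toString()` and `valueOf()` in the proposed interfaces a bit more tangible, here is a minimal sketch. It is not the proposal's implementation: the `Formatted` interface is a simplified stand-in for the interfaces above, and the helper names `formattedLiteral`, `formattedDynamic` and `formattedMessage` are hypothetical, introduced only for illustration.

```typescript
// Simplified stand-in for the proposed part interfaces (meta omitted).
interface Formatted<V> {
  type: string;
  value: V;
  toString(): string;
  valueOf(): V;
}

// Simplified stand-in for FormattedMessage: valueOf() returns a string.
interface FormattedMessageSketch {
  type: "message";
  value: Formatted<unknown>[];
  toString(): string;
  valueOf(): string;
}

// Hypothetical helper: a literal part stringifies its own value.
function formattedLiteral(value: string | number): Formatted<string | number> {
  return { type: "literal", value, toString: () => String(value), valueOf: () => value };
}

// Hypothetical helper: a dynamic part keeps the original value and
// only formats it when toString() is actually called.
function formattedDynamic<T>(value: T, stringify: (value: T) => string): Formatted<T> {
  return { type: "dynamic", value, toString: () => stringify(value), valueOf: () => value };
}

// Hypothetical helper: a message part concatenates the strings of its parts.
function formattedMessage(parts: Formatted<unknown>[]): FormattedMessageSketch {
  const text = () => parts.map((part) => part.toString()).join("");
  return { type: "message", value: parts, toString: text, valueOf: text };
}

const transfer = formattedMessage([
  formattedLiteral("Sent "),
  formattedDynamic(42, (n: number) => n.toFixed(1)),
  formattedLiteral(" MB"),
]);
```

Here `transfer.toString()` produces the full string, while each entry of `transfer.value` still exposes the unformatted value through `valueOf()`.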
I'd prefer subclassing |
@longlho Your reference link and at least my understanding of your concerns would indicate that you might be talking about the representation of the source message, rather than this formatted output, where all the selectors, functions etc. have been resolved into a single sequence of formatted parts. Is this so, or have I misunderstood? The MF2 data model representation of source messages is separate from this, and its allowance of selectors only at the top level should make it significantly easier to e.g. count selector cases directly. |
Ah I misunderstood this then. In that case looks like |
Sorry for missing this question back when. Looking at it today, I think I got this wrong. We should resolve the variables references to a runtime value (like the one you proposed in #41 (comment)) and stop there, i.e. yield those runtime values without formatting them to strings. |
A naming detail which I think may impact the understanding of the proposed interfaces:
Would |
Actually, let me take a step back. I was under the impression that we'd want to yield unformatted values, but after thinking about this this morning, I'm not so sure anymore. Given the message:
|
@stasm Really good point. And I think it made me change my mind on a few things. I actually had a decently long reply to this written, but then I realised that my approach to this is premised on
If instead we skip all of that and require eager resolution for formatting functions args, we really ought to consider alignment with the existing prior art on this as a relatively high priority, i.e. follow what ECMA-402 does. And that to me answers your question: We should go with option 3, formatted & flattened parts, adding something like the

[
  { type: 'literal', value: 'Transferred ' },
  { type: 'integer', value: '1', source: 1.23 },
  { type: 'decimal', value: '.', source: 1.23 },
  { type: 'fraction', value: '23', source: 1.23 },
  { type: 'literal', value: ' ', source: 1.23 },
  { type: 'unit', value: 'MB', source: 1.23 }
]

Not sure about the exact shape of the |
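As a rough illustration of the "formatted & flattened parts" idea above, the following sketch splices the output of `Intl.NumberFormat.prototype.formatToParts` into a flat list and tags each number part with the value it came from. The `FlatPart` shape and the `formatTransfer` function are hypothetical, and the sketch is simplified compared to the example above: it uses default decimal formatting and treats " MB" as a plain literal rather than a `unit` part.

```typescript
// Hypothetical flat part shape: ECMA-402 style type/value plus an optional source.
interface FlatPart {
  type: string;
  value: string;
  source?: number;
}

// Formats "Transferred {size} MB" into one flat list of parts.
function formatTransfer(locale: string, size: number): FlatPart[] {
  // Each number part remembers the value it was derived from.
  const numberParts: FlatPart[] = new Intl.NumberFormat(locale)
    .formatToParts(size)
    .map((part) => ({ type: part.type, value: part.value, source: size }));
  return [
    { type: "literal", value: "Transferred " },
    ...numberParts,
    { type: "literal", value: " MB" },
  ];
}

const transferParts = formatTransfer("en", 1.23);
// Joining the values reproduces the plain formatted string.
const transferText = transferParts.map((part) => part.value).join("");
```

A consumer that wants to wrap the whole number in markup can group adjacent parts that share the same `source`.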
Can you explain how the eager vs. lazy resolution for function arguments ties into this? In my mind in both approaches, the parts yielded by |
Message formatting is unique enough that we could justify the nested approach too, kind of like (2) in #41 (comment).
If instead we go for a completely flat output, then I like your idea to use |
Late night revelation that I wouldn't want to forget: flat output scales better when we're talking about messages referencing other messages, possibly more than one level deep. |
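A small sketch of why flat output composes well with message references: however deep the nesting goes, the consumer only ever sees one list. The `NestedPart` and `FlatLiteral` shapes and the `flatten` function are hypothetical; they assume that a referenced message resolves to its own list of parts and that the flat parts record which message they came from in `source`.

```typescript
// Hypothetical nested representation: a message reference contains its own parts.
type NestedPart =
  | { type: "literal"; value: string }
  | { type: "message"; value: NestedPart[]; source: string };

// Hypothetical flat representation: literals only, tagged with their origin.
interface FlatLiteral {
  type: "literal";
  value: string;
  source?: string;
}

// Recursively flattens nested message references into a single sequence.
function* flatten(parts: NestedPart[], source?: string): Generator<FlatLiteral> {
  for (const part of parts) {
    if (part.type === "message") {
      // Descend into the referenced message, whatever its depth.
      yield* flatten(part.value, part.source);
    } else {
      const flat: FlatLiteral = { type: "literal", value: part.value };
      if (source !== undefined) flat.source = source;
      yield flat;
    }
  }
}

const nested: NestedPart[] = [
  { type: "literal", value: "Welcome to " },
  {
    type: "message",
    source: "brand",
    value: [
      { type: "literal", value: "Acme " },
      { type: "message", source: "product", value: [{ type: "literal", value: "Mail" }] },
    ],
  },
  { type: "literal", value: "!" },
];
const flatParts = Array.from(flatten(nested));
```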
Okay, updated proposal based on comments from @longlho and @stasm. I think the parts should be a flat list

type MessageFormatPart = { source?: string } & (
  | { type: 'literal'; value: string }
  | { type: 'dynamic'; value: string | symbol | function | object; source: string }
  | { type: 'fallback'; value: string; source: string }
  | { type: 'meta'; value: ''; meta: Record<string, string>; source: string | undefined }
  | Intl.DateTimeFormatPart
  | Intl.ListFormatPart
  | Intl.NumberFormatPart
  | Intl.RelativeTimeFormatPart
)

The added formatted part types are the same as before, except for
The fields are also much the same as before, though
|
Thought it might be good to note here that my current thinking on formatting a message to parts is represented in the In brief, I now think that the most appropriate part-like representation of a resolved message in JavaScript is a list of I do not think that this representation necessarily makes sense in all environments, as it ends up relying on specific implementation choices and deeply interacting with its JS |
@aphillips Replying here to #396 (comment), as this seems like a more appropriate place for the conversation; see above for some prior related discussion.
Ah, ok. So do I understand right that you're advocating for us to define a "format to string parts" API, and that if an implementation were to want to represent non-string-y values in expressions, then the implementation would need to provide a separate API for that? Thus far, I have been working from the presumption that an "expression part" in a "format to parts" API would have at least the following qualities:
Would you agree with the above, or do you think that e.g. 2. should be left out?
In principle, it seems to make sense if we only care about string output, but I'd leave out the locale & direction from all but the message and expression elements. Literals at least should inherit the message's properties. |
@eemeli wrote:
Actually, no. I don't think we are required to mandate any specific APIs. What I'm trying to lay out is an approach to organizing the formatting spec. The The If the value of a variable is a literal, it still might be formatted through a function and not just returned verbatim. And we've discussed elsewhere that a function can return a sequence of "parts" for decoration.
I think each "part" would have properties and the list of properties would be the same for each part--
I don't think the former is true: we care about specifying how a message is resolved. "to-string" is only one of the ways a string can be resolved (even if it is by far the most common). I made a point about including the locale and direction because I want each part to have the same set of properties. While some programming languages/environments can differentiate using (for example) class or reflection, others don't make this easy. I don't think it is good to have to write code that has to differentiate:

for (const messagePart of mf.formatToParts(someArgs)) {
  someNode.lang = (messagePart.type === 'text') ? mf.getLocale() : messagePart.lang;
  someNode.dir = (messagePart.type === 'text') ? mf.getDirection() : messagePart.dir;
  // etc.
}

It's also the case that not all literal nodes will inherit direction or language (the text nodes would have to inherit it). I should say more, but don't have the time today to work on it, but wanted to get some thoughts down quickly... |
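For contrast with the branching loop above, here is a sketch of what consuming code could look like if every part carried the same set of properties. The `RawPart` and `UniformPart` shapes and the `toUniformParts` function are hypothetical, and the sketch assumes that a part without its own `lang` or `dir` inherits the message's values, which, as noted, may not hold for every literal.

```typescript
type Direction = "ltr" | "rtl" | "auto";

// Hypothetical input: parts may or may not carry their own lang/dir.
interface RawPart {
  type: string;
  value: string;
  lang?: string;
  dir?: Direction;
}

// Hypothetical output: every part has the same set of properties.
interface UniformPart {
  type: string;
  value: string;
  lang: string;
  dir: Direction;
}

// Fills in the message-level locale and direction where a part has none.
function toUniformParts(
  parts: RawPart[],
  messageLang: string,
  messageDir: Direction,
): UniformPart[] {
  return parts.map((part) => ({
    type: part.type,
    value: part.value,
    lang: part.lang ?? messageLang,
    dir: part.dir ?? messageDir,
  }));
}

const uniform = toUniformParts(
  [
    { type: "text", value: "Written by " },
    { type: "string", value: "فاطمة", lang: "ar", dir: "rtl" },
  ],
  "en",
  "ltr",
);

// The consumer no longer needs to branch on part.type.
const spans = uniform.map((part) => ({ lang: part.lang, dir: part.dir, text: part.value }));
```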
Very similar issue: |
It sounds like all parties agree a firm API is not part of the specification, but there is good discussion on defining a structural definition of a @aphillips you mention you're "trying to lay out ... an approach to organizing the formatting spec". Forgive me, I'm not tracking how the rest of your message correlates to that end. I observe commentary on the parts schema design. Can you help set me straight, in simple terms? |
The main trouble with deciding in the spec for a certain structure is that it will have a big friction with existing implementations. Yes, there is no MF2-like in MF1. For example ICU formats to "something that implements the Having the same kind of result from MF2 (a And it is not an ICU problem. Android has Spannable, macOS has AttributedString. These are all structures that are "format-to-parts" like, but very hard (impossible?) to unify. We can try to say what one might expect to find in such a structured result, but not the fields, or methods. |
I think we'd like to be a bit more conservative and agnostic, still. Rather than defining specific structural types, we can provide guidance to implementers about how to design formatted parts, and list a number of requirements that they should meet. For example, based on #160 (comment):
|
Adding to Stas' bulleted list:
Example:
There is one placeholder ( Might also have "annotations" that result in the final text being rendered differently depending on context. |
this may have been addressed in the F2F proposal for F2P (format to parts). As mentioned in today's telecon (2023-09-18), closing old requirements issues. |
Is your feature request related to a problem? Please describe.
Rather than format a message to a string, it's sometimes useful to work with an iterable of formatted parts. This is conceptually similar to NumberFormat.prototype.formatToParts and others. This approach allows greater flexibility on the side of the calling code, and opens up opportunities to design tighter integrations with other libraries.

For instance, in the message "Hello, {$userName}!", the userName variable could be a UI widget, e.g. a React component. The code integrating the localization into the React app could then call formatToParts, which would yield the React component unstringified, ready to be used in render().

Another example: "It is {$currentTime}", where $currentTime is a Date object formatted as 18:03. If formatToParts yields it unformatted, the calling code can then call DateTimeFormat.prototype.formatToParts on it and add custom markup to achieve effects like 18:03 (hours in bold).

Describe the solution you'd like
The formatToParts iterator would yield parts after all selection logic runs (in MF terms, that's after plural and select choose one of the supplied variants) and after the variables are resolved to their runtime values, but before they are formatted to strings and interpolated into the message.

Describe why your solution should shape the standard
It's an API offering a lot of flexibility to its consumers. The regular format API returning a string can be implemented on top of it, too.

Additional context or examples
Fluent is now considering formatToParts in projectfluent/fluent.js#383 and projectfluent/fluent#273. I expect it to be ready by the end of H1. We see it as a great way of allowing interesting use-cases like component interpolation mentioned above, as well as an alternative approach to handle BiDi isolation (see #28) and to support custom transformation functions for text (great for implementing pseudolocalizations).
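The request above can be sketched as a generator that yields variables resolved to their runtime values but not yet stringified, with the string-returning format built on top. This is only an illustration under stated assumptions: the `Part` and `PatternElement` shapes, the `Map`-based arguments, the fallback rendering and the function bodies are all hypothetical, and the pattern is assumed to be the variant already chosen by selection.

```typescript
// Hypothetical part shapes, loosely following the types discussed in the thread.
type Part<T> =
  | { type: "literal"; value: string }
  | { type: "dynamic"; value: T; source: string }
  | { type: "fallback"; value: string; source: string };

// A pattern after selection: literal text interleaved with variable references.
type PatternElement = string | { variable: string };

function* formatToParts<T>(
  pattern: PatternElement[],
  args: Map<string, T>,
): Generator<Part<T>> {
  for (const element of pattern) {
    if (typeof element === "string") {
      yield { type: "literal", value: element };
      continue;
    }
    const source = "$" + element.variable;
    const value = args.get(element.variable);
    if (value === undefined) {
      // Missing argument: yield a fallback instead of throwing.
      yield { type: "fallback", value: "{" + source + "}", source };
    } else {
      // Resolved to the runtime value, but deliberately not stringified.
      yield { type: "dynamic", value, source };
    }
  }
}

// The regular string-returning API, implemented on top of formatToParts.
function format<T>(pattern: PatternElement[], args: Map<string, T>): string {
  let result = "";
  for (const part of formatToParts(pattern, args)) {
    result += String(part.value);
  }
  return result;
}

const greeting: PatternElement[] = ["Hello, ", { variable: "userName" }, "!"];
// Stand-in for a UI widget such as a React component.
const widget = { kind: "widget", label: "Ada" };
const greetingParts = Array.from(formatToParts(greeting, new Map([["userName", widget]])));
```

The second entry of `greetingParts` holds the `widget` object itself, so UI code can place it into a rendered tree rather than a string.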