Why `MessageFormat` needs a successor #84

mihnita · 2020-05-14T23:58:52Z

In preparation for the group meeting.

mihnita · 2020-05-15T00:05:20Z

@zibi: thank you for offering to help.
I had you in mind since I started writing this :-)

If you look at the history of the file, I started with a list of bullets that were the same as the list of issues we already collected in GitHub.
But "filtered" through my opinionated view of things.
And that would not be very useful.


romulocintra · 2020-05-18T16:43:29Z

doc/why_mf_next.md

+
+**Mandatory xkcd:**
+
+[<img src='https://imgs.xkcd.com/comics/standards.png'>](https://xkcd.com/927/)


😂😂😂😂😂😂😂😂😂😂😂

zbraniecki · 2020-05-18T20:45:03Z

Going high-level first.

I like the "5 points" - it simplifies the logic and makes it easy to read "what they're addressing" and "what should I expect out of the outcome".

I don't like personal sentences in WG documents. Sentences like "I've started with..." read more like a personal take than a formal position of the WG.

Here's my summary of your outline:

Intro
- MessageFormat is the Unicode API for software localization
- It's a 20-year-old, well-designed, proven solution
- Its design is optimized for the software development model of 20 years ago, and its shortcomings result in mixed reception and adoption by the industry.
- Given the current wave of software development coming from dynamic languages, modern UI frameworks, and new forms of user interaction (voice, VR, etc.), combined with the lessons learned from MessageFormat, we aim to design the next iteration of MessageFormat, suitable for the current generation of software and for adoption by Web Standards.
Core problems
- Non-extensible syntax
- Every "feature" encoded in the data model
- Limited specification and tooling around semantic features
- Designed for imperative APIs only

I removed pt (2) from your listing because it doesn't feel like a problem that requires a new iteration to fix - we could fix it just by adding a test suite. Let me know if you agree.

On the other hand, I see a number of MF shortcomings that are not captured here:

- MF syntax makes it hard to recover from resolution errors, which makes the system less resilient and limits fallbacking.
- MF doesn't provide any meta-data (semantic comments, localizer comments, string versioning etc.)
- Limited multi-variant messages (I'm not sure how to phrase it, but it seems that we aim to provide more generic ability to provide variants of the same message that depend on some information, while MF is designed to allow for variants that are tied to basically PLURAL since custom formatters are deprecated and never got any adoption)
- Multiline strings are not well supported (not sure if the handling of whitespace in multilines is actually specified well, for example for HTML fragments)
- Modern formatters enable much richer/better integration that MF wasn't designed for, and with mixed formatting control
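
To illustrate the multi-variant point, here is a minimal sketch using the JDK's `java.text.MessageFormat` with a `choice` sub-format. This is a stand-in chosen because it runs without extra dependencies; ICU's `MessageFormat` uses `plural` and `select` rather than `choice`, and the class name `ChoiceVariantsDemo` is made up for this example. What it shows is that the variant selection is written directly into the pattern syntax, keyed on a single number.

```java
import java.text.MessageFormat;
import java.util.Locale;

public class ChoiceVariantsDemo {
    // The selection logic lives inside the pattern string itself:
    // one numeric argument picks one of three hard-wired variants.
    static final String PATTERN =
        "There {0,choice,0#are no files|1#is one file|1<are {0,number,integer} files}.";

    public static String describe(int count) {
        // Locale.US is an arbitrary choice for this sketch.
        return new MessageFormat(PATTERN, Locale.US).format(new Object[] {count});
    }

    public static void main(String[] args) {
        System.out.println(describe(0)); // There are no files.
        System.out.println(describe(1)); // There is one file.
        System.out.println(describe(2)); // There are 2 files.
    }
}
```

Selecting on anything other than a number (for example grammatical gender or a custom category) has no equivalent slot in this pattern, which is the limitation the bullet above describes.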

I trust you on selection of those, so feel free to incorporate or reject any and all of the above.

Finally, I'm wondering if this is the right place to extract the conclusion from the shortcoming listed by you about "hard to add/remove features" into some form of "L10n systems are hard to change, they stay for decades as software changes. Encoding every feature deep into the data model and syntax does not stand the test of time. In conclusion we believe that designing a more generic data model on top of which we can develop features that can be extended and even deprecated as the time goes without affecting low-level foundational tooling is critical for the longevity of the system".
Do you think it's the consensus or am I projecting my personal bias here? :)

mihnita · 2020-05-20T00:42:19Z

> I don't like personal sentences in WG documents. Sentences like "I've started with..." read more like a personal take than a formal position of the WG.

Thanks, I'll do that.
(For a long time I've been told that I am "too forceful", and I learned to "sprinkle" in lots of "I think", "in my opinion", and so on :-)

I'll go through your list and refactor, but I can't take your full suggestion "as is".

There are things that are not addressed because they are not MessageFormat issues, or are already included, or can be fixed in the existing form, no need for a new version:

> MF syntax makes it hard to recover from resolution errors, which makes the system less resilient and limits fallbacking.

There is no fallback or resolution because MessageFormat is "Designed to be API only", so this is already captured.

The loading of the string (including resolution / fallback) is the responsibility of some other component. You load the string (using the Android ResourceManager, Java Resources, Windows / MacOS APIs, C/C++ gettext), then pass it to the MessageFormat class.

This allows one to use ICU without being forced to migrate the whole resource resolution out of the OS (making the resolution results inconsistent in the process).

Add this to the "detailed notes" of the bullet?
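
The two-step flow described above (load the string elsewhere, then hand it to the formatter) can be sketched as follows. This is only an illustration under stated assumptions: it uses the JDK's `java.text.MessageFormat` and `ListResourceBundle` instead of ICU so that it runs without dependencies, and the `ApiOnlyDemo` class, the `Messages` bundle, and the `greeting` key are hypothetical.

```java
import java.text.MessageFormat;
import java.util.ListResourceBundle;
import java.util.Locale;
import java.util.ResourceBundle;

public class ApiOnlyDemo {
    // Hypothetical in-memory bundle standing in for the platform's native string store.
    public static class Messages extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {
                {"greeting", "Hello, {0}!"}
            };
        }
    }

    public static String greet(String name) {
        // Step 1: lookup (and, in a real app, locale fallback) is done by the resource mechanism.
        ResourceBundle bundle = new Messages();
        String pattern = bundle.getString("greeting");
        // Step 2: MessageFormat only ever sees the already-resolved pattern string.
        return new MessageFormat(pattern, Locale.ROOT).format(new Object[] {name});
    }

    public static void main(String[] args) {
        System.out.println(greet("World")); // Hello, World!
    }
}
```

Because the formatter never takes part in step 1, it has no way to express fallback or resolution on its own.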

> MF doesn't provide any meta-data (semantic comments, localizer comments, string versioning etc.)
>
> Multiline strings are not well supported (not sure if the handling of whitespace in multilines is actually specified well, for example for HTML fragments)

Same for both items: addressed by "Designed to be API only", not a complete solution.

You can use whatever metadata the native string storage mechanism you use supports.
MessageFormat is agnostic in that respect and treats this as "out of scope".
If I have a way to, say, add metadata to non-ICU messages (comments, examples, length limits) in the "strings storage", then I am free to use it for ICU messages too.
Same for multiline.
One can say that "Java properties are not multiline friendly".

Whitespace and multiline can't be specified well in HTML fragments to begin with; MessageFormat is no better or worse in this respect.
If my fragment is used inside a <pre> or a div with white-space set to nowrap / pre / pre-line / pre-wrap then I get different behavior. If my fragment is used in 2 different contexts, it might end up rendered differently.

So MessageFormat is again agnostic here. Handling multiline whitespace is the problem of the string loading mechanism, or of the rendering part.

And all 3 of the issues above can be solved without a new version of MessageFormat; we "only" need to add a "MessageFormat store", with resolution + fallback.

But the price is loss of flexibility: now as an iOS / Android / Qt developer I need to either move ALL my strings to a "MF store", with its own resolution / fallback / metadata, or keep non MF strings in a "native store" and MF strings in a "MF store" (with the huge risk that the resolution is different)

> Limited multi-variant messages (I'm not sure how to phrase it, but it seems that we aim to provide more generic ability to provide variants of the same message that depend on some information, while MF is designed to allow for variants that are tied to basically PLURAL since custom formatters are deprecated and never got any adoption)
>
> Modern formatters enable much richer/better integration that MF wasn't designed for, and with mixed formatting control

Agree with both.
But I've considered that to be captured in "Does not have any “extension points”"
We can split it in a separate point, or add this clarification in the explanation of the bullet?

> Finally, I'm wondering if this is the right place to extract the conclusion from the shortcoming listed by you about "hard to add/remove features" into some form of "L10n systems are hard to change, they stay for decades as software changes. Encoding every feature deep into the data model and syntax does not stand the test of time. In conclusion we believe that designing a more generic data model on top of which we can develop features that can be extended and even deprecated as the time goes without affecting low-level foundational tooling is critical for the longevity of the system".

I agree with this, and I'll see how to incorporate it.
Most likely not as a bullet, but probably in the explanation section of "Hard to map to the existing localization core structures".

> Do you think it's the consensus or am I projecting my personal bias here? :)

As you might have noticed in my answers above, there is some bias (probably yours and mine :-)
Let me "sleep on it" a bit and I'll see what we get.

stasm · 2020-05-20T14:30:41Z

Would it be a good idea to enumerate in more detail the design mistakes which you briefly touch on in point 3? Things like positional and named parameters, escaping, etc. I feel like each would benefit from its own bullet point with an example and the rationale about why we think they didn't work out.
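
As one possible example of the kind stasm asks for, the sketch below shows positional parameters and apostrophe escaping. It uses the JDK's `java.text.MessageFormat`, not ICU's class, purely so that it runs without dependencies; ICU's quoting rules differ depending on version and apostrophe mode, so treat this as an illustration of the category of problem rather than of ICU's exact behavior. The `QuotingDemo` class and its `format` helper are made up for this example.

```java
import java.text.MessageFormat;
import java.util.Locale;

public class QuotingDemo {
    // Small helper: build a formatter for the pattern and apply the arguments.
    public static String format(String pattern, Object... args) {
        return new MessageFormat(pattern, Locale.ROOT).format(args);
    }

    public static void main(String[] args) {
        // Positional parameters: translators can reorder them, but the
        // numbers carry no meaning about what each argument is.
        System.out.println(format("{1} {0}", "a", "b"));   // b a

        // A lone apostrophe opens a quoted section, so the placeholder
        // is swallowed and printed literally.
        System.out.println(format("It's {0}", "Bob"));     // Its {0}

        // The apostrophe has to be doubled to get the intended text.
        System.out.println(format("It''s {0}", "Bob"));    // It's Bob
    }
}
```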


mihnita · 2020-06-11T21:10:34Z

> Would it be a good idea to enumerate in more detail the design mistakes which you briefly touch on in point 3?

I'm afraid that would make this document too big.

There is an older Google Doc (public, shared) where I listed what I like / dislike about MessageFormat. I can link to it. Or I can convert it to a markdown file and post it somewhere (where?)
That document is "kind of opinionated", it is not necessarily the position of the working group.

But if others agree to include all that here I am also OK to include some of that doc here.

mihnita · 2020-06-11T21:12:01Z

Changed things quite a bit. I've tried to capture / address all of your points, even if that didn't change the bullets too much.

nbouvrette · 2020-05-24T20:23:39Z

doc/why_mf_next.md

+“de facto reference-implementations”, and the ports to other languages
+(JavaScript, Go, Dart, etc.) are at risk for being “slightly incompatible”
+
+### 3. Can't remove anything, even if we know know better


Not sure if this is related to ICU itself or more to its process? But it's a good reminder that we should avoid falling back into the same situation (e.g. by supporting versioning).

nbouvrette · 2020-05-24T20:24:54Z

doc/why_mf_next.md

+
+### 4. Hard to map to the existing localization core structures
+
+The format is not supported by any major localization system that I know of. \


Do you mean TMS here? And can you elaborate a bit more on what you would expect in terms of support?

mihnita · 2020-06-15

The term was "intentionally vague", since I didn't want to get into debates about TMS vs GMS vs CAT vs whatever-else-abbreviations-people-use.

Support means most of the stuff people do with other formats: extract (with unescape), segment, leverage, translate protecting codes, validate, merge (with proper escape).

I don't count "take the file as is, with curly brackets and all, and let translators edit it raw" as "support".

zbraniecki

This looks good. I agree with all points and I think it captures my mental model really well. Thank you!

zbraniecki · 2020-06-15T16:48:35Z

doc/why_mf_next.md

+are doing it in ICU itself.
+It also means most tools used to process these messages are built rigidly,
+and are unprepared to handle changes
+(think localization tools, liners, friendly UIs, etc.).


I'd maybe add that, from the experience with Fluent, the fact that almost every feature in MF is a syntax feature means that at the very lowest level (AST, parser) every extension basically breaks existing functionality, because the file cannot be parsed and its AST cannot be operated on.

Fluent's AST is more generic, which means that more of the functionality falls on higher levels of abstraction. In fact, over the last 3 years, all accrued "wants" can either be implemented on the higher level (Semantic Comments, Dynamic References, etc.) or require only additive syntax relaxations (flatten selectors, rich overlays).
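
A rough sketch of what "more generic" could mean in practice follows. It is not Fluent's AST and not a proposal from this thread: the `GenericModelSketch` class, its `Part`, `Text`, and `Placeholder` types, and the `upper` function are all hypothetical. The point it tries to show is that when a placeholder only names a function, new formatters can be registered at a higher level without touching the data model or the parser.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

public class GenericModelSketch {
    // A message is a flat list of parts: literal text or a placeholder.
    interface Part {}

    static final class Text implements Part {
        final String value;
        Text(String value) { this.value = value; }
    }

    static final class Placeholder implements Part {
        final String argName;   // which argument to read
        final String function;  // which registered function formats it
        Placeholder(String argName, String function) {
            this.argName = argName;
            this.function = function;
        }
    }

    // Functions are looked up by name at format time, so adding one
    // changes neither the Part types nor any parser built on them.
    private final Map<String, Function<Object, String>> registry = new HashMap<>();

    public void register(String name, Function<Object, String> fn) {
        registry.put(name, fn);
    }

    public String format(List<Part> message, Map<String, Object> args) {
        StringBuilder out = new StringBuilder();
        for (Part part : message) {
            if (part instanceof Text) {
                out.append(((Text) part).value);
            } else {
                Placeholder ph = (Placeholder) part;
                Function<Object, String> fn = registry.get(ph.function);
                // Unknown function: emit a visible marker instead of failing.
                out.append(fn != null ? fn.apply(args.get(ph.argName))
                                      : "{" + ph.argName + "}");
            }
        }
        return out.toString();
    }

    public static String demo() {
        GenericModelSketch mf = new GenericModelSketch();
        mf.register("upper", v -> String.valueOf(v).toUpperCase(Locale.ROOT));
        List<Part> message = Arrays.asList(
            new Text("Hello, "), new Placeholder("name", "upper"), new Text("!"));
        Map<String, Object> args = new HashMap<>();
        args.put("name", "world");
        return mf.format(message, args);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // Hello, WORLD!
    }
}
```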

romulocintra · 2020-07-20T09:19:41Z

doc/why_mf_next.md

@@ -0,0 +1,146 @@
+# Why `MessageFormat` needs a successor ([issue #49](https://github.com/unicode-org/message-format-wg/issues/49))


In my opinion, the issue number should be placed in a "more info", "links", or "discussions" section alongside the article, or in the intro while giving context.

mihnita and others added 3 commits May 8, 2020 15:34

First version comitted

7a6bd2e

Trying to go to the 'root causes'

0949b0f

Another cause, removing the old style (list of issue)

db83135

mihnita requested a review from zbraniecki May 15, 2020 00:01

rxaviers reviewed May 18, 2020


romulocintra reviewed May 18, 2020


nbouvrette reviewed May 24, 2020


Implemented some of the feedback, clarified some areas

5eff768

nbouvrette approved these changes Jun 12, 2020


zbraniecki approved these changes Jun 15, 2020


zbraniecki mentioned this pull request Jun 17, 2020

Initial AST dump zbraniecki/message-format-2.0-rs#2

Open

Implemented feedback from the June 15th meeting

73f3bdb

romulocintra reviewed Jul 20, 2020


romulocintra self-requested a review July 20, 2020 20:58

romulocintra merged commit 01f97ff into unicode-org:master Jul 20, 2020

romulocintra mentioned this pull request Jul 20, 2020

Document why MessageFormat needs a successor #49

Closed

DavidFatDavidF mentioned this pull request Jul 27, 2020

Design Principle: Compatible vs. Breaking #88

Closed
