
Create and Collect Use Cases #2


Closed
romulocintra opened this issue Nov 27, 2019 · 10 comments
Labels
requirements Issues related with MF requirements list

Comments

@romulocintra
Collaborator

romulocintra commented Nov 27, 2019

We need to define "Scope" and "Pipelines" so we can decide whether we are designing for developers, translators, or runtime efficiency.

@romulocintra
Collaborator Author

romulocintra commented Jan 6, 2020

IMHO, the future MF API should focus on providing a low-level set of APIs that extend the built-in Intl with reusable, pluggable formatters, etc.
The focus or the target should be:
1 - Developers/Translators
2 - Tooling/Efficiency

In other words, MF should be designed with developers in mind, but in a way that lets i18n library authors converge their work on the future MF and smoothly build their tools on top of it. As listed above, developers and translators should be treated as the main stakeholders.

@jamuhl

jamuhl commented Jan 6, 2020

Agree. Regarding tooling, having a parser that parses a message into an AST would already be awesome, but I guess the community will come up with one anyway. So from my perspective, the message syntax, the API, and (if we go beyond single messages and support referencing other messages) a file format would be a good start.

@longlho

longlho commented Jan 6, 2020

I strongly vote for defining an AST because:

  1. A lot of tooling (linters, debuggers, string collectors) relies on an AST.
  2. Distribution pipelines can ship ASTs instead of strings to save parsing time at runtime (and, right now, the weight of the parser code).

Leaving the AST to the community creates a lot of inconsistency in parsing, especially when it comes to escaping syntax characters and enforcing placeholders.
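To make the two points above concrete, here is a minimal TypeScript sketch of what a shared AST could look like, together with a formatter that consumes a precompiled AST (so no parser has to ship at runtime) and a small string-collector-style tool that walks the same tree. The node shapes (`literal`, `argument`, `plural`) and the functions `format` and `collectArguments` are hypothetical illustrations, not any existing library's schema; only `Intl.PluralRules` is a real built-in.

```typescript
// Hypothetical AST node shapes; not a standardized or library-defined schema.
type LiteralNode = { type: "literal"; value: string };
type ArgumentNode = { type: "argument"; name: string };
type PluralNode = {
  type: "plural";
  name: string;
  // Branches keyed by plural category ("one", "other", ...).
  options: Record<string, MessageAst>;
};
type MessageNode = LiteralNode | ArgumentNode | PluralNode;
type MessageAst = MessageNode[];

// What a build step might ship instead of the source string
// "You have {count, plural, one {# file} other {# files}}."
// ("#" is simplified here to a plain reference to the `count` argument.)
const shipped: MessageAst = [
  { type: "literal", value: "You have " },
  {
    type: "plural",
    name: "count",
    options: {
      one: [{ type: "argument", name: "count" }, { type: "literal", value: " file" }],
      other: [{ type: "argument", name: "count" }, { type: "literal", value: " files" }],
    },
  },
  { type: "literal", value: "." },
];

// Formats a precompiled AST at runtime; no parser code is involved.
function format(ast: MessageAst, args: Record<string, string | number>, locale: string): string {
  const rules = new Intl.PluralRules(locale);
  let out = "";
  for (const node of ast) {
    switch (node.type) {
      case "literal":
        out += node.value;
        break;
      case "argument":
        out += String(args[node.name]);
        break;
      case "plural": {
        const category = rules.select(Number(args[node.name]));
        // Fall back to "other" when a category has no branch of its own.
        const branch = node.options[category] ?? node.options["other"] ?? [];
        out += format(branch, args, locale);
        break;
      }
    }
  }
  return out;
}

// Tooling example: collect every argument name a message uses.
function collectArguments(ast: MessageAst, found: Set<string> = new Set<string>()): Set<string> {
  for (const node of ast) {
    if (node.type === "argument") {
      found.add(node.name);
    } else if (node.type === "plural") {
      found.add(node.name);
      for (const branch of Object.values(node.options)) {
        collectArguments(branch, found);
      }
    }
  }
  return found;
}
```

With English plural rules, `format(shipped, { count: 1 }, "en")` yields "You have 1 file." and `{ count: 3 }` yields "You have 3 files.", while `collectArguments(shipped)` returns a set containing only `count`. Both functions rely on the same node definitions, which is the consistency argument for defining the AST once.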

@zbraniecki
Member

@eemeli
Collaborator

eemeli commented Jan 6, 2020

For reference, here's the EBNF-ish Peg.js parser that messageformat uses when compiling messages into functions.

Its output is an array of AST nodes.
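The messageformat grammar itself isn't reproduced here. As a rough illustration of the parse-to-array-of-nodes step, the following hand-written TypeScript sketch (not Peg.js, and not messageformat's actual output format) handles only a tiny subset of the syntax: plain text plus `{name}` placeholders. The `MsgNode` shapes and the `parse` function are hypothetical. ICU apostrophe escaping is deliberately left out, since that is exactly the kind of detail where independently written parsers tend to diverge.

```typescript
// Hypothetical node shapes for this sketch only.
type MsgNode =
  | { type: "literal"; value: string }
  | { type: "argument"; name: string };

const ARG_NAME = /^[A-Za-z_][A-Za-z0-9_]*$/;

// Parses plain text with {name} placeholders into an array of nodes.
// Apostrophe escaping, plurals, and selects are intentionally unsupported.
function parse(source: string): MsgNode[] {
  const nodes: MsgNode[] = [];
  let text = "";
  let i = 0;
  while (i < source.length) {
    const ch = source[i];
    if (ch === "{") {
      const end = source.indexOf("}", i + 1);
      if (end === -1) throw new SyntaxError(`Unclosed placeholder at offset ${i}`);
      const name = source.slice(i + 1, end).trim();
      if (!ARG_NAME.test(name)) throw new SyntaxError(`Invalid argument name "${name}" at offset ${i}`);
      // Flush any pending literal text before the placeholder node.
      if (text !== "") {
        nodes.push({ type: "literal", value: text });
        text = "";
      }
      nodes.push({ type: "argument", name });
      i = end + 1;
    } else if (ch === "}") {
      throw new SyntaxError(`Unmatched "}" at offset ${i}`);
    } else {
      text += ch;
      i += 1;
    }
  }
  if (text !== "") nodes.push({ type: "literal", value: text });
  return nodes;
}
```

For example, `parse("Hello, {name}!")` produces a literal node, an argument node for `name`, and a second literal node, while `parse("Hello, {name")` throws a `SyntaxError`.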

@longlho

longlho commented Jan 7, 2020

Similarly, formatjs uses a PEG.js parser to generate our TypeScript AST. It powers our linter and allows us to deal with translation vendors that impose extra limitations on ICU syntax.

@MarcusJohnson91

MarcusJohnson91 commented Jan 16, 2020

In reply to @romulocintra's comment above (Jan 6, 2020):

As someone who has implemented my own Unicode API from scratch: don't couple this format specifier syntax too closely to ICU. You'll just make it harder to implement and therefore less likely to be used.

Which brings me to my main point.

Why not just extend POSIX's positional format specifiers? For example: `printf("%1$Gs", "Male");`
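POSIX `printf` supports positional conversions of the form `%n$`, which let a translated template reorder arguments without changing the calling code. The TypeScript sketch below shows only that reordering capability; `formatPositional` is a hypothetical helper that supports just `%N$s` and `%%`, not a real printf. Note that in standard printf `G` is a floating-point conversion, so the `%1$Gs` example above would need new, extended semantics; positional specifiers on their own do not provide plural or gender selection.

```typescript
// Hypothetical helper: supports only "%N$s" (1-based positional string argument)
// and "%%" (a literal percent sign). This is not a full printf implementation.
function formatPositional(template: string, ...args: string[]): string {
  return template.replace(/%(?:(\d+)\$s|%)/g, (match: string, index: string | undefined): string => {
    // No captured index means the match was "%%".
    if (index === undefined) return "%";
    const value = args[Number(index) - 1];
    if (value === undefined) throw new RangeError(`No argument for ${match}`);
    return value;
  });
}

// A translation can reorder arguments without touching the calling code.
const original = formatPositional("%1$s invited %2$s", "Alice", "Bob");
const reordered = formatPositional("%2$s was invited by %1$s", "Alice", "Bob");
```

Here `original` is "Alice invited Bob" and `reordered` is "Bob was invited by Alice", with the arguments passed in the same order both times.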

@echeran
Collaborator

echeran commented Jan 24, 2020

If I understand this thread correctly, the discussion of defining the AST produced by parsing a file (e.g., Fluent, ICU MessageFormat) is very similar to my comment in the other thread about defining a data model. If so, that's good. In some cases, the parser's output AST looks a lot like the input data structure from an earlier proof-of-concept I wrote to show what the data-oriented approach might look like in a dynamic language like JS.

Of course, the difference in terminology comes from the fact that ASTs are generated by parsers of a file/string syntax, while a data model is just a specification of data without regard to its syntax or source.

For parsing files with a specific syntax, the pros mentioned include the ability to reuse the grammar definition for both parsing and validation. The cons are that the syntax may be relatively difficult to get right (e.g., ICU MessageFormat) and that the syntax may permit problems we must guard against.

For starting from a data model, the pros are that we are effectively "defining the AST" (the data model) while allowing alternate syntaxes (e.g., Fluent, ICU MF) to coexist. We also allow certain target language implementations (e.g., JS?) to idiomatically accept data literals as input instead of string-only input. The cons are that we need to write code for data validation, and possibly that there is no standard serialization format.

And it sounds like there is support for decoupling the concepts of authoring format / syntax from the runtime format / structure of input data.

Feel free to correct me on the above summary.
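The data-model approach described in this comment can be sketched in TypeScript: a message is accepted as a plain data literal and validated at runtime (the extra validation code listed as a con), regardless of whether it came from a parser for some syntax or was written directly in code. The `Part` and `MessageData` shapes and the `validateMessage` and `formatMessage` functions are hypothetical, chosen only for illustration.

```typescript
// Hypothetical data model: a message is an id plus a list of parts.
type Part = { literal: string } | { arg: string };
type MessageData = { id: string; parts: Part[] };

function isPart(value: unknown): value is Part {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  // Exactly one key, and it must be a string-valued "literal" or "arg".
  return Object.keys(v).length === 1 && (typeof v["literal"] === "string" || typeof v["arg"] === "string");
}

// Runtime validation: the extra code a data-model approach has to carry.
function validateMessage(value: unknown): MessageData {
  if (typeof value !== "object" || value === null) throw new TypeError("Message must be an object");
  const v = value as Record<string, unknown>;
  const id = v["id"];
  const parts = v["parts"];
  if (typeof id !== "string") throw new TypeError("Message id must be a string");
  if (!Array.isArray(parts) || !parts.every(isPart)) {
    throw new TypeError(`Message "${id}" has invalid parts`);
  }
  return { id, parts };
}

// Unknown arguments are rendered as "{name}" so the gap stays visible.
function formatMessage(message: MessageData, args: Record<string, string>): string {
  return message.parts
    .map((p) => ("literal" in p ? p.literal : (args[p.arg] ?? `{${p.arg}}`)))
    .join("");
}

// The same data could come from JSON.parse, from a parser for some syntax,
// or, as here, from a literal written directly in code.
const greeting = validateMessage({
  id: "greeting",
  parts: [{ literal: "Hello, " }, { arg: "name" }, { literal: "!" }],
});
```

With this sketch, `formatMessage(greeting, { name: "Ada" })` returns "Hello, Ada!", while a malformed literal such as `{ id: "bad", parts: [{ literal: 42 }] }` is rejected with a `TypeError`.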

@nbouvrette
Collaborator

This is my first post on this issue, but I'm having a hard time seeing the difference between this thread and the wish list (issue #3). The two conversations seem to overlap, and I think it would help to define the common integration use cases that a syntax would need to support.

I'll try to include this in my presentation next Monday, because I've seen acronyms such as TMS, CAT, and AST used in many different ways, and they can represent different concepts for different audiences.

Maybe having a definition for all the terms would help?

For example, there is a lot of discussion around file types, but different stages could use different file types (one for developers and one for the TMS/linguists).

@echeran, regarding your point:

"And it sounds like there is support for decoupling the concepts of authoring format / syntax from the runtime format / structure of input data."

I'm not 100% convinced of this yet, because I've been able to pilot MessageFormat using different file formats while exposing the raw syntax to linguists at large scale. This makes for a much simpler solution, provided you supply the right tools alongside the syntax.

If we think there is a need for decoupling, then we should clearly explain why, keeping in mind that most TMSs expect symmetric file input (the same number of keys in the input file and the output file). On top of this, a translation project can go from one source language to multiple target languages, which means all of them would need to keep the same number of keys. The alternative is to break projects up per language pair, but that can affect existing processes, costing, and reporting in enterprise-scale localization.

Of course, you could probably work around this by creating a new file format or by leveraging existing ones that offer more flexibility (e.g., XLIFF). But from experience, we know that TMSs provide varying levels of XLIFF support, which might hurt adoption. The same would probably be true if a new file type were created: getting broad TMS support can take quite a while.
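The symmetry constraint described above can be illustrated with a small key check. If plural variants were exploded into separate keys, an English file would have two keys (`one`, `other`) while an Arabic file would need six, one per CLDR plural category (zero, one, two, few, many, other), so the files would no longer be symmetric. Keeping the whole plural inside a single message key avoids this. The `checkSymmetry` function and the `files.*` key naming are hypothetical; real TMSs compare files in their own ways.

```typescript
type ResourceFile = Record<string, string>;

// Hypothetical check: reports keys missing from, or extra in, a target file.
function checkSymmetry(source: ResourceFile, target: ResourceFile): { missing: string[]; extra: string[] } {
  const sourceKeys = Object.keys(source);
  const targetKeys = Object.keys(target);
  const sourceSet = new Set(sourceKeys);
  const targetSet = new Set(targetKeys);
  return {
    missing: sourceKeys.filter((k) => !targetSet.has(k)),
    extra: targetKeys.filter((k) => !sourceSet.has(k)),
  };
}

// Plural variants exploded into separate keys: English needs two...
const enExploded: ResourceFile = { "files.one": "{count} file", "files.other": "{count} files" };
// ...but Arabic needs one key per CLDR plural category (placeholder values).
const arExploded: ResourceFile = {
  "files.zero": "...", "files.one": "...", "files.two": "...",
  "files.few": "...", "files.many": "...", "files.other": "...",
};
const exploded = checkSymmetry(enExploded, arExploded);

// One key holding a whole MessageFormat message keeps the files symmetric.
const enSingle: ResourceFile = { files: "{count, plural, one {# file} other {# files}}" };
const arSingle: ResourceFile = { files: "..." };
const single = checkSymmetry(enSingle, arSingle);
```

Here `exploded.extra` lists the four Arabic-only keys (`files.zero`, `files.two`, `files.few`, `files.many`), while `single` reports no differences.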

This was referenced Jan 25, 2020
@mihnita mihnita added the requirements Issues related with MF requirements list label Sep 24, 2020
@romulocintra
Collaborator Author

Closing in favour of #119

macchiati added a commit that referenced this issue Jan 19, 2024
deleted #2 because of ambiguities, as in comment.
aphillips added a commit that referenced this issue Jan 22, 2024
* Update README.md

Updates the repo README to use only `LDML45` functions/features. It doesn't list everything possible with a message.

Also update the copyright date.

* Update goals.md

* Update goals.md

Patch of Addison's version

* Update docs/goals.md

* Update goals.md

deleted #2 because of ambiguities, as in comment.

* Update docs/goals.md

Co-authored-by: Addison Phillips <addison@unicode.org>

* Update docs/goals.md

Co-authored-by: Eemeli Aro <eemeli@mozilla.com>

* Expand plurals example

Address @eemeli's comment

---------

Co-authored-by: Addison Phillips <addison@unicode.org>
Co-authored-by: Eemeli Aro <eemeli@mozilla.com>
Projects
None yet
Development

No branches or pull requests

9 participants