When do we evaluate the local variables? #299
I'd prefer failing as late as possible when formatting, as this allows for usage as in the […]. On the other hand, I would presume that validation/linting could well present the error much earlier, so that, e.g., attempting to assign an invalid literal option value could be noted. |
There is a lot to unpack here. I can write more later, but use of this kind
of evaluation requires a lot more underlying support (e.g. glossary lookups,
etc.). The messages have to be usable by implementations that don't have
that support, or don't have it for all their languages. Many products are
now supporting near or over 100 languages, and even requiring the same
level of support in every language is a challenge.
So while a sophisticated implementation might want to have 'lazy'
evaluation, directed by dependencies, the message format has to be usable
by implementations that don't need that. In particular, we can enforce that
a variable must be declared before it is referenced, even if a
sophisticated algorithm would delay the full evaluation.
Mark
* Moreover, I suspect that this scenario is actually more complicated than
it would appear. As just one example, when the input $amount is 0, it will
probably need to be quite different: a switch between different variant
messages. And in that case, the case of the noun may need to be different,
depending on the language, etc.
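The declare-before-reference rule can be checked statically, independently of when (or whether lazily) expressions are evaluated. A minimal Python sketch, assuming a hypothetical representation where each declaration is a (name, expression text) pair and references are found with a simple `$name` regex; a real implementation would walk the parsed data model instead:

```python
import re
from typing import Iterable, List, Tuple

# Matches $name-style variable references inside an expression's source text.
VAR_REF = re.compile(r"\$[A-Za-z_][A-Za-z0-9_]*")


def check_declared_before_use(
    declarations: List[Tuple[str, str]], arguments: Iterable[str]
) -> List[str]:
    """Report references to locals that are not yet declared.

    A declaration may reference message arguments or locals declared above it.
    Nothing is evaluated: this is a purely static check.
    """
    known = set(arguments)
    errors = []
    for name, expr in declarations:
        for ref in VAR_REF.findall(expr):
            if ref not in known:
                errors.append(f"{name} references {ref} before it is declared")
        known.add(name)
    return errors


declarations = [
    ("$myCount", "{$amount :number maximumFractionDigits=0}"),
    ("$myName", "{$id :noun case=accusative number=$myCount}"),
]
print(check_declared_before_use(declarations, ["$amount", "$id", "$color"]))
print(check_declared_before_use(declarations[::-1], ["$amount", "$id", "$color"]))
```

Because nothing is evaluated, the same check works for eager and lazy implementations alike.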
On Fri, Sep 23, 2022 at 1:41 AM Stanisław Małolepszy wrote:
I think that in principle the evaluation should be lazy. Specifically:
- Expressions should be evaluated right before they're formatted (to
string or to parts).
- When referred to inside other expressions, they should be passed
into the function without being evaluated. The function may then choose to
evaluate if needed.
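A minimal Python sketch of these two rules, assuming hypothetical names (`LazyExpr`, `number_fn`, `describe_fn`) that are not part of any MF2 API: defining a local only records its intent, formatting evaluates it, and a function that receives it as an option can inspect it without evaluating it.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

calls: List[str] = []  # log of function invocations, for illustration only


@dataclass
class LazyExpr:
    """An expression kept as intent (operand, function, options), not as a value."""

    operand: str  # name of a message argument, e.g. "$amount"
    function: Callable[[Any, Dict[str, Any]], str]
    options: Dict[str, Any] = field(default_factory=dict)

    def evaluate(self, args: Dict[str, Any]) -> str:
        # The function runs only here, when the value is actually needed.
        return self.function(args[self.operand], self.options)


def number_fn(value: float, options: Dict[str, Any]) -> str:
    """Hypothetical stand-in for :number."""
    calls.append("number")
    digits = options.get("maximumFractionDigits", 0)
    return f"{value:.{digits}f}"


def describe_fn(value: str, options: Dict[str, Any]) -> str:
    """Hypothetical function that inspects another expression without evaluating it."""
    calls.append("describe")
    count = options["count"]  # still an unevaluated LazyExpr
    return f"{value} (count uses {count.function.__name__} with {count.options})"


args = {"$amount": 3.0, "$id": "crayon"}
my_count = LazyExpr("$amount", number_fn, {"maximumFractionDigits": 0})  # no call yet
label = LazyExpr("$id", describe_fn, {"count": my_count})
print(label.evaluate(args))  # number_fn is not called here
print(my_count.evaluate(args))  # first call to number_fn
```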
------------------------------
I'll try to illustrate with an example message. Please bear with me, this
is going to be long and complex.
The goal of the message is to be able to construct sentences like *You
own 3 red crayons.* and *You own 5 blue pencils.* (I'm ignoring the
singular number for the sake of the example.) In Polish, the noun (crayons,
pencils) must appear in plural accusative; the adjective (red, blue) must
agree with the noun in gender, number, and case. In other words, in *You
own 3 red crayons*, red must be in the feminine plural accusative.
The message takes three arguments:
- $id: "OBJECT_CRAYON" | "OBJECT_PENCIL"
- $color: "COLOR_RED" | "COLOR_BLUE"
- $amount: float
The message:
$myCount = {$amount :number maximumFractionDigits=0}
$myName = {$id :noun case=accusative number=$myCount}
{You own {$myCount} {$color :adjective accord=$myName} {$myName}.}
Going over this step by step:
1. $myCount = {$amount :number maximumFractionDigits=0}
A local variable definition. No evaluation happens at this stage.
2. $myName = {$id :noun case=accusative number=$myCount}
Another definition of a local variable. Again, $myName doesn't have an
evaluated value just yet. There's also a reference to $myCount here; it
doesn't cause the evaluation of $myCount. Instead, the local variable
definition is passed, to capture the *intent* rather than the
evaluated value.
In other words, I'd imagine the :noun function receiving one of the
following dictionaries as its options bag; the implementation should be
able to choose which one it is:
a) $myCount is passed as-is, without looking up the value of $amount.
It's up to :noun to look it up (when :noun is called in step 5). This
implies that :noun has access to the arguments passed to the message,
but also simplifies the signature of :noun's implementation because
the type of the number option is known upfront.
{
"case": "accusative",
"number": {
argument: Identifier {"name": "$amount"},
function: ":number",
options: {"maximumFractionDigits": 0}
}
}
b) $myCount is passed with the $amount variable already resolved.
{
"case": "accusative",
"number": {
argument: 3.0f,
function: ":number",
options: {"maximumFractionDigits": 0}
}
}
Thanks to $myCount's being passed lazily, unevaluated, the :noun
function will be able to inspect the value of $amount, despite its
being passed wrapped inside the $myCount definition. Thus, :noun can
choose the correct grammatical number for the object's name.
3. ... You own {$myCount} ...
Here, we're about to *format* $myCount, which means it must be
evaluated. This is the first time that the :number function from
$myCount's definition is being called.
4. ... {$color :adjective accord=$myName} ...
Here, $myName is passed into :adjective without being evaluated,
similar to $myCount above. This allows :adjective to do 2 things:
a) Look up the grammatical gender of the object. The gender is an
inherent property of the word, so it cannot be given as an option. Instead,
it must be defined in a glossary together with the translation.
b) Inspect the case and number options of $myName's definition.
These two steps allow :adjective to make the translation of the color
agree with the gender, number and case of the object.
5. ... {$myName}. ...
Here, we finally get to format $myName. This will call :noun, which
will in turn inspect $myCount as described in step 2, and look up the
case-proper and number-proper form of the translation.
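The walk-through can be sketched in Python under loudly hypothetical assumptions: the glossary contents, the gender values, and the `lemma[features]` output format are invented for illustration (real inflection would return actual word forms), grammatical number is reduced to a singular/plural split, and `noun_fn`/`adjective_fn` are stand-ins, not an existing registry. The `number` and `accord` options receive unevaluated definitions, and functions can read the message arguments, as in option (a) of step 2.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

# Hypothetical glossary: ids map to a lemma and, for nouns, an inherent gender.
GLOSSARY: Dict[str, Dict[str, str]] = {
    "OBJECT_CRAYON": {"lemma": "crayon", "gender": "feminine"},
    "OBJECT_PENCIL": {"lemma": "pencil", "gender": "masculine"},
    "COLOR_RED": {"lemma": "red"},
    "COLOR_BLUE": {"lemma": "blue"},
}

Args = Dict[str, Any]


@dataclass
class LazyExpr:
    operand: str  # message argument name, e.g. "$id"
    function: Callable[[Any, Dict[str, Any], Args], str]
    options: Dict[str, Any] = field(default_factory=dict)

    def evaluate(self, args: Args) -> str:
        return self.function(args[self.operand], self.options, args)


def grammatical_number(count: LazyExpr, args: Args) -> str:
    # Inspect the unevaluated $myCount definition: read $amount and the
    # rounding it *will* use, without formatting it.
    digits = count.options.get("maximumFractionDigits", 0)
    return "singular" if round(args[count.operand], digits) == 1 else "plural"


def number_fn(value: float, options: Dict[str, Any], args: Args) -> str:
    digits = options.get("maximumFractionDigits", 0)
    return f"{value:.{digits}f}"


def noun_fn(value: str, options: Dict[str, Any], args: Args) -> str:
    number = grammatical_number(options["number"], args)
    return f"{GLOSSARY[value]['lemma']}[{options['case']}.{number}]"


def adjective_fn(value: str, options: Dict[str, Any], args: Args) -> str:
    noun: LazyExpr = options["accord"]  # $myName, still unevaluated
    gender = GLOSSARY[args[noun.operand]]["gender"]  # inherent, from the glossary
    case = noun.options["case"]
    number = grammatical_number(noun.options["number"], args)
    return f"{GLOSSARY[value]['lemma']}[{gender}.{case}.{number}]"


def format_message(args: Args) -> str:
    my_count = LazyExpr("$amount", number_fn, {"maximumFractionDigits": 0})
    my_name = LazyExpr("$id", noun_fn, {"case": "accusative", "number": my_count})
    color = LazyExpr("$color", adjective_fn, {"accord": my_name})
    return f"You own {my_count.evaluate(args)} {color.evaluate(args)} {my_name.evaluate(args)}."


print(format_message({"$amount": 3.0, "$id": "OBJECT_CRAYON", "$color": "COLOR_RED"}))
```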
|
The point I'm trying to make is this.
The framework needs to be usable by:
1. very high-end implementations that handle reinflection for some
languages
2. high end implementations that can inflect various words in
some languages
3. lower end implementations that can't do either of those
Any given implementation might have a mix of these, where different
languages have different levels of support. Or they might have all
supported languages at the same level.
To cover this spread, we need to allow for *all of these* to be conformant,
though clearly the messages constructed for #3 won't include all the
features that #2 does, and #2 won't include all those that #1 does.
That means that we can allow for lazily-evaluated binding, but we cannot
require it.
Mark
On Fri, Sep 23, 2022 at 9:37 AM Stanisław Małolepszy wrote:
Thanks for your thoughts, @macchiati.
> The messages have to be usable by implementations that don't have that
> support, or don't have it for all their languages.

My goal here is to allow messages to express complex grammatical
relationships. The tool I'm choosing to achieve this goal is lazy
evaluation. I'm open to discussing other solutions if you think that lazy
evaluation is not feasible.
My operating assumption is that for grammatical agreement to be possible
we need to be able to inspect the options used in an expression inside
other expressions. The two ways discussed by the group were:
1. passing the entire lazily-evaluated binding, e.g. {$color :adjective
accord=$myName}, so that the outer expression can inspect the
configuration of the inner one (here, :adjective can inspect the
options passed to :noun),
2. passing specific parts of the configuration from one expression to
another one by one, either directly: {$color :adjective
case=accusative ...} or {$color :adjective case=$myName.case ...} (but
MF2 doesn't have member expressions), or indirectly: {$color
:adjective case=$myName ...}.
My opinion is that (2) is more verbose and more error-prone. Furthermore,
it still doesn't solve the question of how to pass the gender of the object
to format the color. The gender isn't a configuration knob; it's an
inherent property of the object name, and as such there needs to be a
lookup involved and it should happen inside :adjective.
> Moreover, I suspect that this scenario is actually more complicated than
> it would appear.

Absolutely, this was just an example, and it's already longer than I wish
it was, so I attempted to simplify :)
|
I agree with your conclusion, but not the example.
That is, the plural categorization *must* be performed on the formatted
number, otherwise the incorrect value will result. Here's a simple example
with value = 1.0d:
formatted value: "1.0"
correct selected variant message: You have {$appleCount} apples
formatted value: "1"
correct selected variant message: You have {$appleCount} apple
This happens with many languages, where the visible trailing fraction
digits make a difference in the plural category.
Now, when I say "performed on the formatted number", internal to the
implementation that is typically done by looking at an intermediate result
produced during the formatting. That intermediate result has all of the
digits of the final form and the decimal position. It is sufficient for the
plural categorization, without extra string cruft. That is, the *string*
resulting from the formatting is not appropriate for the plural
categorization, because the categorization should *not* have to parse the
string resulting from the formatting, as that can be complicated.
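A minimal Python sketch of categorizing from an intermediate result rather than from the output string, assuming `decimal.Decimal` stands in for the intermediate representation (digits plus decimal position). The rule shown is the CLDR English cardinal rule (one: i = 1 and v = 0); the names `FormattedNumber`, `format_number`, and `plural_category_en` are hypothetical.

```python
from dataclasses import dataclass
from decimal import ROUND_HALF_EVEN, Decimal


@dataclass(frozen=True)
class FormattedNumber:
    """Intermediate result: exactly the digits and decimal position to be shown."""

    value: Decimal

    @property
    def i(self) -> int:
        # CLDR operand i: the integer digits.
        return int(abs(self.value))

    @property
    def v(self) -> int:
        # CLDR operand v: the number of visible fraction digits.
        exponent = self.value.as_tuple().exponent
        return -exponent if exponent < 0 else 0

    def to_string(self) -> str:
        return format(self.value, "f")


def format_number(
    amount: float, min_fraction_digits: int = 0, max_fraction_digits: int = 3
) -> FormattedNumber:
    # Round to the maximum number of fraction digits...
    d = Decimal(str(amount)).quantize(
        Decimal(1).scaleb(-max_fraction_digits), rounding=ROUND_HALF_EVEN
    )
    # ...then drop trailing zeros, but never below the minimum.
    while -d.as_tuple().exponent > min_fraction_digits and d.as_tuple().digits[-1] == 0:
        d = d.quantize(Decimal(1).scaleb(d.as_tuple().exponent + 1))
    return FormattedNumber(d)


def plural_category_en(n: FormattedNumber) -> str:
    # CLDR English cardinal rule: "one" if i = 1 and v = 0, otherwise "other".
    return "one" if n.i == 1 and n.v == 0 else "other"


for amount, min_digits in [(1.0, 1), (1.0, 0)]:
    n = format_number(amount, min_fraction_digits=min_digits)
    print(n.to_string(), plural_category_en(n))
```

With value 1.0 and one minimum fraction digit, v = 1, so the category is other and the "apples" variant is selected; no string parsing is involved.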
On Sun, Sep 25, 2022 at 8:50 AM Eemeli Aro wrote:
Would this example be perhaps easier to reason about?
let $appleCount = {$amount :number minimumFractionDigits=1}
match {$appleCount :plural}
when one {You have {$appleCount} apple}
when * {You have {$appleCount} apples}
My expectation would be that in English, the evaluation of that message
would always select the * option. Do others share this expectation, or
should e.g. the :plural fail because it's getting a "formatted number"
rather than an actual number as its input?
Given that we're not intending to define :number or :plural in the MF2
spec, I would think that the spec language should not explicitly mandate
how either approach should be implemented. For plurals in particular, it
would even be possible to re-parse a formatted-number string argument to
its component parts for the CLDR rule calculations, though that does of
course depend on the locale.
In the JS implementation, I've solved this by considering the evaluated
value of $appleCount to be a "MessageNumber", an object that holds its
resolved argument and options values. It's not completely lazy, as its
resolution has looked up the $amount value and is holding that directly
rather than a getter for it -- but that's an implementation detail. This
allows for the selection to be done based on the value + options needed for
plural rule selection, while the actual formatting (either to a string or
parts) has similar access to the value and the relevant options.
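A minimal Python sketch of such a value object, assuming the hypothetical name `ResolvedNumber` (not the JS implementation's actual API) and a simplification where minimumFractionDigits alone fixes the number of displayed fraction digits. The $amount lookup happens once, at resolution; selection and formatting then read the same value and options:

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ResolvedNumber:
    """Hypothetical value object: resolved argument plus its annotation options."""

    value: float
    options: Dict[str, Any] = field(default_factory=dict)

    def fraction_digits(self) -> int:
        # Simplifying assumption: minimumFractionDigits fixes the shown digits.
        return int(self.options.get("minimumFractionDigits", 0))

    def format(self) -> str:
        return f"{self.value:.{self.fraction_digits()}f}"

    def select_plural_en(self) -> str:
        # Selection reads value + options, never the formatted string.
        # English: "one" only for a displayed 1 with no visible fraction digits.
        v = self.fraction_digits()
        return "one" if v == 0 and round(self.value) == 1 else "other"


def resolve_local(args: Dict[str, Any], name: str, options: Dict[str, Any]) -> ResolvedNumber:
    # Not completely lazy: the argument is looked up once, here.
    return ResolvedNumber(float(args[name]), dict(options))


def format_apples(args: Dict[str, Any]) -> str:
    # let $appleCount = {$amount :number minimumFractionDigits=1}
    apple_count = resolve_local(args, "$amount", {"minimumFractionDigits": 1})
    variants = {"one": "You have {} apple", "other": "You have {} apples"}
    return variants[apple_count.select_plural_en()].format(apple_count.format())


print(format_apples({"$amount": 1}))
```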
Given the wide scale of potential users, I think I agree with @macchiati
that we shouldn't mandate either
behaviour, and support an ecosystem where e.g. an ICU function registry and
a JS function registry would behave differently with the above message.
|
#3 should not even be mentioned in passing. Parsing a formatted value
is simply asking for trouble.
On Sun, Sep 25, 2022 at 10:28 PM Stanisław Małolepszy wrote:
Perhaps we're all trying to achieve the same thing, but had imagined
different ways to do it? In the You have X apples example, we want :plural
to be able to determine that $amount has been or will be formatted as 1.0.
This can be achieved by:
1. Lazy evaluation (@mihnita and @stasm).
2. Intermediate result (@eemeli and @macchiati).
3. Reparsing the formatted string result.
No one is advocating for (3), but I'm putting it in the list as a potential
implementation choice, as long as the end result is You have 1.0 apples.
That said, reparsing wouldn't be a good choice in the case of the example I
gave, where it's unlikely it would be able to reverse engineer grammatical
properties of a noun from a formatted string result.
|
I don't think 1 and 2 are orthogonal. And I don't think that the @eemeli and @macchiati options are the same. In fact, I am not even sure the @eemeli option answers the question. You can only make a decision after you invoke the formatting function: We should not think […] I would really like to see the JS implementation without any knowledge of plural / number / date and so on. That would really show that the "engine" does not hard-code any special knowledge about the functions. |
It seems to me that we're mixing here concerns from multiple different layers of the implementation. To get at something like a root of this, could we see if we could agree at least on the following statement?
A couple of clarifications are perhaps in order here:
This is available via the |
When I expressed my support for some sort of lazy evaluation strategy, my goal was to allow function implementations to access information about the preceding transformations that a variable went through. An example use case for this is: in a […] I saw lazy evaluation as one way to achieve this, but when I implemented https://github.com/stasm/message2 I realized that it can also be done by passing resolved values in wrappers carrying the formatting configuration at runtime. Here's how the |
Option 2. Give implementations that need maximum performance the ability to achieve it. |
@cdaringe Re: Option 2: you might be interested in reading the comments on #413 , where I explained how lazy evaluation with memoization (i.e. call-by-need) affects the meaning of variable resolution. You might wish to chime in there if you think the added complexity is worth it to allow implementations to maximize performance. (Note: memoization itself involves overhead, so I personally don't have a good analysis at this time of which of the three possible evaluation strategies would lead to the best performance, but maybe you have more thoughts on that.) |
Note: this might be partly addressed by #469 (which doesn't directly address when evaluation occurs, but does address mutability and does at least ensure that |
Addressed by #476 |
* Design document for variable mutability and namespacing
* style: Apply Prettier
* Partly address #299
* style: Apply Prettier
* Address comments, fix sigil choice
  - change `@` to `#` because we want to use `@` for annotations such as `@locale`
  - Provide text that considers not making ugly local variables
  - Provide use cases for static analysis
  - Call out the perfidy of the author in stealing ill-baked requirements
* style: Apply Prettier
* Add @eemelie's `input` proposal as an option considered
* Update exploration/variable-mutability.md
  Co-authored-by: Eemeli Aro <eemeli@mozilla.com>
* Add new proposed design
* Update exploration/variable-mutability.md
  Co-authored-by: Addison Phillips <addison@unicode.org>
* Address @eemeli's comments
  Specifically the one about forward references
* style: Apply Prettier
* Update exploration/variable-mutability.md
  Co-authored-by: Eemeli Aro <eemeli@mozilla.com>
* Update exploration/variable-mutability.md
  Co-authored-by: Eemeli Aro <eemeli@mozilla.com>
* Update exploration/variable-mutability.md
  Co-authored-by: Eemeli Aro <eemeli@mozilla.com>

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Eemeli Aro <eemeli@mozilla.com>
Co-authored-by: Eemeli Aro <eemeli@gmail.com>
It does not matter whether we do the evaluation in one go or in two steps: at some point one needs to go from the arguments to the final result (parts, for format to parts, or a string).
And part of that is evaluating the local variables.
When do we do it?
It is not simply an implementation detail, as the evaluation might trigger errors / exceptions.
So this decision would affect the timing of these kinds of errors.
Example 1:
This would fail, because the skeleton contains invalid characters.
But only the function knows what's valid and what's not.
So you will get an exception when you invoke the function.
One might say: you should validate earlier, using some regex from the registry, without invoking the function.
But we also support this:
And we only know `$realSkeleton` at evaluation time (it might even be another local variable?).
Option 1: when it is "defined"
Option 2: first time it is referred to (lazy eval + memoization)
Option 3: implementation dependent.
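A minimal Python sketch contrasting option 1 and option 2, assuming a hypothetical `datetime_fn` that rejects skeleton characters outside an invented allowed set. With eager evaluation the error surfaces when the variable is defined; with lazy evaluation plus memoization it surfaces on first reference, and a successful result is computed only once:

```python
from typing import Any, Callable, Dict, List

log: List[str] = []  # records each function call, for illustration

# Hypothetical set of valid skeleton characters (not a real registry definition).
ALLOWED_SKELETON_CHARS = set("yMdEHhms")


class SkeletonError(ValueError):
    pass


def datetime_fn(value: str, skeleton: str) -> str:
    """Hypothetical stand-in for :datetime that validates its skeleton."""
    log.append(f"datetime({skeleton})")
    if not set(skeleton) <= ALLOWED_SKELETON_CHARS:
        raise SkeletonError(f"invalid skeleton: {skeleton!r}")
    return f"{value} as {skeleton}"


class Locals:
    """Local variables, evaluated eagerly (option 1) or lazily + memoized (option 2)."""

    def __init__(self, eager: bool) -> None:
        self.eager = eager
        self.thunks: Dict[str, Callable[[], Any]] = {}
        self.cache: Dict[str, Any] = {}

    def define(self, name: str, thunk: Callable[[], Any]) -> None:
        self.thunks[name] = thunk
        if self.eager:
            self.cache[name] = thunk()  # option 1: errors surface at definition

    def get(self, name: str) -> Any:
        if name not in self.cache:
            # option 2: evaluate on first reference, then reuse the result
            self.cache[name] = self.thunks[name]()
        return self.cache[name]


lazy = Locals(eager=False)
lazy.define("$foo", lambda: datetime_fn("2022-09-22", "hh:mm"))  # no call, no error yet
```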
In the ICU4J implementation I went with 2.
Some benefits for 2 (why I chose it):
Evaluating every local variable when it is defined (option 1) might waste time, because not all of them may be used by the selected pattern (if we have selectors).
It might also prevent supporting some use cases.
For example:
As you can see, `$exp` is only used on the `true` branch.
Even more, the evaluation of `$exp` might fail if `expirationDate` is undefined (not passed as an argument, or null).
And the condition protects us from this.
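A minimal Python sketch of this scenario, under stated assumptions: the original message is not reproduced here, so the selector argument `hasExpiration`, the variant texts, and `datetime_stub` are all hypothetical. It shows that lazy evaluation skips `$exp` when the `*` variant is chosen, while eager evaluation fails on the missing `expirationDate`:

```python
from datetime import date
from typing import Any, Callable, Dict, List, Optional, Tuple


class Lazy:
    """Call-by-need wrapper: runs the thunk on first get(), then memoizes."""

    def __init__(self, thunk: Callable[[], str]) -> None:
        self._thunk = thunk
        self._done = False
        self._value = ""

    def get(self) -> str:
        if not self._done:
            self._value = self._thunk()
            self._done = True
        return self._value


def datetime_stub(d: Optional[date]) -> str:
    """Hypothetical :datetime stand-in; fails when the argument is missing."""
    if d is None:
        raise ValueError("expirationDate is undefined")
    return d.isoformat()


def format_message(args: Dict[str, Any], eager: bool) -> Tuple[str, int]:
    calls: List[str] = []

    def exp_thunk() -> str:
        calls.append("$exp")
        return datetime_stub(args.get("expirationDate"))

    exp = Lazy(exp_thunk)  # the local $exp
    if eager:
        exp.get()  # option 1: evaluate every local up front
    if args.get("hasExpiration") is True:  # the `true` variant
        return f"Expires on {exp.get()}.", len(calls)
    return "Does not expire.", len(calls)  # the `*` variant


print(format_message({"hasExpiration": False}, eager=False))
```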
Even if we require that functions never fail (is that a good idea? It puts more restrictions on how one writes the functions), option 1 means that we will evaluate `$exp` even if the `*` branch is selected. These are wasted cycles.
Side effect: naming
Late evaluation makes "variable" a bit confusing.
In most programming languages, variables are evaluated when assigned (the right-hand side is evaluated before assignment).
That is why in my proposals this was a "macro" or an "alias".