Clarify that Reserved may also represent private-use in data model #444

Merged
merged 6 commits into from
Aug 8, 2023

Conversation

eemeli
Collaborator

@eemeli eemeli commented Jul 29, 2023

The data model Reserved interface may also be used for private-use annotations that are not supported by the implementation. This PR adds text clarifying that to be the case, and explicitly mentions that private-use syntax that is supported by the implementation may use a different interface.

The source definition is also corrected to not include the starting sigil. This was the only point in the data model where a part of the syntax was showing up twice, without a really good reason why.

@eemeli eemeli added the data model Issues related to the Interchange Data Model label Jul 29, 2023
Comment on lines 148 to 149
A `Reserved` represents an _expression_ with a _reserved_ or _private-use_ _annotation_.
The `sigil` corresponds to the starting sigil of the _annotation_.
Member

I think this is incorrect.

Reserved represents a portion of the syntax that is wholly invalid--not permitted to be used--but which might become part of a future incarnation. If a part of reserved were used in a future incarnation, existing (pre the unreserving) implementations would continue with whatever behavior reserved provides. Probably this means, as you have it here, passing reserved gunk through the data model without processing.

Private use is different. Private use is a portion of the syntax that is valid, but which may not be functional in a given implementation.

What I would do is define private use and reserved separately and use the same language where appropriate.

Also: note that private-use is a feature of our specification, which is why I put it before reserved in the text.

Collaborator Author

Could you give an example of a data model use case where it would be useful for the private-use and reserved syntax rules to be represented by different data structures, rather than this single Reserved?

Given that supported private-use syntax would end up using a different interface altogether, I'm failing to come up with a realistic scenario where I'd like unsupported private-use and reserved to be handled differently. I suppose a linter error for a message including one or the other could use a slightly different text, but that can be detected from the sigil.
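
To illustrate the point about telling the two cases apart from the sigil alone, here is a minimal sketch of such a linter check. The `Reserved` shape is a simplified stand-in for the interface discussed in this PR, `lintReserved` is a hypothetical helper, and the choice of `^` and `&` as the private-use sigils is an assumption that is not stated anywhere in this thread.

```typescript
// Simplified stand-in for the data model interface discussed in this PR.
// `sigil` is typed as a plain string because the set of sigils may change.
interface Reserved {
  type: "reserved";
  sigil: string;
  source: string;
}

// ASSUMPTION: "^" and "&" introduce private-use annotations.
// This set is illustrative only and is not taken from this thread.
const PRIVATE_USE_SIGILS: ReadonlySet<string> = new Set(["^", "&"]);

// Hypothetical linter helper: one data structure, two message texts.
function lintReserved(node: Reserved): string {
  return PRIVATE_USE_SIGILS.has(node.sigil)
    ? `Unsupported private-use annotation starting with "${node.sigil}"`
    : `Reserved annotation starting with "${node.sigil}"`;
}
```

Under these assumptions a single `Reserved` node type is enough for the linter, since the only difference between the two cases is the message text chosen from the sigil.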


Implementations MUST NOT rely on the set of `sigil` values remaining constant,
as future versions of this specification MAY assign other meanings to such sigils.

If the _expression_ includes a _literal_ or _variable_ before the _annotation_,
it is included as the `operand`.

When parsing the syntax of a _message_ that includes a _private-use_ _annotation_
supported by the implementation,
the implementation MAY represent it in the data model using a different interface
Member

I think we might need some stronger guidance here. I agree with the "MAY" you have, but would suggest something along the lines of:

Private-use annotations are specific to a particular "private agreement"
between the various implementations that support a given form of private-use.

When parsing the syntax of a message that includes a private-use annotation
unrecognized by the implementation, the annotation MUST be processed
identically to a reserved annotation.

An implementation that supports a given private-use annotation MUST
define the specific interface to support the semantics, structure, and meaning
that it provides. Use of existing data model interfaces is RECOMMENDED, although
an implementation MAY use any interface appropriate for its needs.
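
One way to read the suggested wording is as a dispatch on the sigil: a supported private-use annotation is mapped to an implementation-defined interface, and anything else falls through to `Reserved`. The sketch below assumes exactly that reading. `CustomAnnotation`, `PrivateUseParser`, `supportedPrivateUse`, and `toDataModel` are hypothetical names, and treating `^` as a supported private-use sigil is an assumption made only for illustration.

```typescript
// Simplified stand-in for the data model interface discussed in this PR.
interface Reserved {
  type: "reserved";
  sigil: string;
  source: string;
}

// Hypothetical implementation-defined interface for a supported
// private-use annotation. Its shape is invented for this example.
interface CustomAnnotation {
  type: "custom";
  name: string;
}

type Annotation = Reserved | CustomAnnotation;
type PrivateUseParser = (source: string) => CustomAnnotation;

// ASSUMPTION: this implementation has a private agreement covering "^".
const supportedPrivateUse = new Map<string, PrivateUseParser>([
  ["^", (source) => ({ type: "custom", name: source.trim() })],
]);

// Supported private-use gets its own interface; unrecognized private-use
// and reserved annotations are both passed through as Reserved.
function toDataModel(sigil: string, source: string): Annotation {
  const parse = supportedPrivateUse.get(sigil);
  return parse ? parse(source) : { type: "reserved", sigil, source };
}
```

With this shape, an implementation that does not know a given sigil never needs to decide whether it was private-use or reserved; both take the same fallback path.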

Collaborator Author

I don't think the "parsing the syntax" sentence belongs in the data model spec, but in the syntax. It's defining a MUST about what the syntax means, and if included here that would get watered down by the qualifier at the top of this doc:

Implementations are not required to use this data model for their internal representation of messages.

On review, I now also note that the Extensions section will need to be updated to account for private-use. Something like your third paragraph is probably needed there.

  The `source` is the "raw" value (i.e. escape sequences are not processed)
- and includes the starting `sigil`.
+ and does not include the starting `sigil`.
Member

Also, by removing the starting sigil, you remove the ability for an implementation to tell how the reserved (or private-use) sequence was introduced. If the data model were serialized and sent to an implementation that supported one or another sigil, the receiver would have no way of knowing what the sigil was. Is there a reason you removed the sigil?

Collaborator Author

It's because it's already in the separate sigil field.
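
As a rough sketch of why no information is lost: with `sigil` and `source` stored separately, the original annotation text can be rebuilt by concatenating them, and both fields survive serialization. The `Reserved` shape below is a simplified stand-in for the interface in this PR, and `annotationText` and `roundTrip` are hypothetical helpers.

```typescript
// Simplified stand-in for the data model interface discussed in this PR.
interface Reserved {
  type: "reserved";
  sigil: string;
  source: string; // raw text, NOT including the starting sigil
}

// Hypothetical helper: rebuild the annotation text as it appeared in the
// message by putting the sigil back in front of the raw source.
function annotationText(node: Reserved): string {
  return node.sigil + node.source;
}

// Hypothetical helper: send the node through JSON, as a receiver of a
// serialized data model would see it.
function roundTrip(node: Reserved): Reserved {
  return JSON.parse(JSON.stringify(node)) as Reserved;
}
```

A receiver of the serialized node can still read `sigil` directly, so storing it a second time at the start of `source` would only duplicate it.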

@macchiati
Member

macchiati commented Jul 30, 2023 via email

@eemeli eemeli requested a review from aphillips August 8, 2023 13:31
Member

@aphillips aphillips left a comment

Mostly editorial suggestions. Have a look.

eemeli and others added 2 commits August 8, 2023 18:16
Co-authored-by: Addison Phillips <addisonI18N@gmail.com>
Co-authored-by: Addison Phillips <addisonI18N@gmail.com>
@eemeli eemeli requested a review from aphillips August 8, 2023 15:20
@aphillips aphillips merged commit 08685d1 into unicode-org:main Aug 8, 2023
@eemeli eemeli deleted the refresh-reserved branch August 8, 2023 15:48
Labels
data model Issues related to the Interchange Data Model
3 participants