Skip to content

Should line breaks be width 0 or 1? #60

Closed
@Manishearth

Description

@Manishearth

See #55 (comment)

From Unicode:

control characters are not given the Default_Ignorable_Code_Point property. To avoid security problems, such characters [...], when not interpreted and not displayable by normal rendering, should be displayed in fallback rendering with a fallback glyph, so that there is a visible indication of their presence in the text.

Newline control characters (and other break characters) are typically "displayable by normal rendering", however they do not have a rendering that has anything that could be termed as a "width", so there's no single answer as to what this crate should provide here according to the spec.

Applications using unicode-width ought to be using a higher level protocol to split text on newlines (and other similar things) before feeding it to this crate (and perhaps we should document that). It isn't really possible for this crate to provide such a protocol itself since there are nuances to how different contexts handle newlines (Windows vs Unix being the most obvious one, but wrapping behavior also participates).

However, we should pick some consistent, sensible default for what newlines are treated as. My guess is that we should treat them as width 0 (for both CR and LF, and any other line breaks that exist).

h/t/ @decathorpe for noticing

cc @Jules-Bertholet

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions