Left-to-right mark: Difference between revisions
Added Persian as another example RTL language. |
→top: other software is available |
||
(47 intermediate revisions by 41 users not shown) | |||
Line 1: | Line 1: | ||
{{Short description|Control character in bidirectional text}} |
|||
⚫ | The '''left-to-right mark''' (LRM) is a [[control character]] |
||
{{Merge|Right-to-left mark|Arabic letter mark|date=June 2024|discuss=Talk:Arabic letter mark}} |
|||
{{More citations needed|date=January 2019}} |
|||
⚫ | The '''left-to-right mark''' ('''LRM''') is a [[control character]] (an invisible formatting character) used in computerized [[typesetting]] of text containing a mix of left-to-right scripts (such as [[Latin script|Latin]] and [[Cyrillic script|Cyrillic]]) and right-to-left scripts (such as [[Arabic script|Arabic]], [[Syriac alphabet|Syriac]], and [[Hebrew alphabet|Hebrew]]). It is used to set the way adjacent characters are grouped with respect to text direction. |
||
==Unicode== |
==Unicode== |
||
⚫ | In [[Unicode]], the LRM character is encoded at {{unichar|200E|left-to-right mark|html=}}. In [[UTF-8]] it is <code>E2 80 8E</code>. Usage is prescribed in the Unicode Bidi (bidirectional) algorithm.<ref>Unicode 12.0 standard, http://www.unicode.org/versions/Unicode12.0.0/UnicodeStandard-12.0.pdf, p. 880</ref> |
||
⚫ | |||
==Example of use in HTML== |
==Example of use in HTML== |
||
Suppose the writer wishes to |
Suppose the writer wishes to insert some English text (a left-to-right script) into a paragraph written in Arabic or Hebrew (a right-to-left script), with non-alphabetic characters to the right of the English text. For example, the writer wants to translate "The language C++ is a programming language used..." into Arabic. Without an LRM control character, the result looks like this: |
||
<span dir="rtl">لغة C<span style="color:red">++</span> هي لغة برمجة تستخدم...</span> |
|||
With an LRM |
With an LRM entered in the HTML after the ++, it looks like this, as the writer intends: |
||
<span dir="rtl">لغة C<span style="color:red">++</span>‎ هي لغة برمجة تستخدم...</span> |
|||
In the first example, without an LRM control character, a [[web browser]] will render the ++ to the left of the "C" because the browser recognizes that the paragraph is in a right-to-left script ([[Arabic script|Arabic]]) and places punctuation, which is neutral as to its direction, according to the direction of the adjacent text. The LRM control character causes the punctuation to be adjacent only to left-to-right text – the "C" and the LRM – and so to be positioned as if it were in left-to-right text, i.e., to the right of the preceding text. |
|||
Some software requires using the [[HTML]] code <code>&#8206;</code> or <code>&lrm;</code> instead of the invisible Unicode control character itself.{{citation needed|date=April 2019}} Using the invisible control character directly could also make copy editing difficult. |
|||
==See also== |
==See also== |
||
* [[Right-to-left mark]] |
* [[Right-to-left mark]] |
||
* [[ |
* [[Bidirectional text]] |
||
==References== |
|||
{{reflist}} |
|||
==External links== |
==External links== |
||
* [http://unicode.org/reports/tr9/ Unicode standard annex #9: The bidirectional algorithm] |
* [http://unicode.org/reports/tr9/ Unicode standard annex #9: The bidirectional algorithm] |
||
* [ |
* [https://www.fileformat.info/info/unicode/char/200e/index.htm Unicode character (U+200E)] |
||
{{Unicode navigation}} |
{{Unicode navigation}} |
||
Line 29: | Line 36: | ||
[[Category:Digital typography]] |
[[Category:Digital typography]] |
||
[[Category:Unicode formatting code points]] |
[[Category:Unicode formatting code points]] |
||
{{typ-stub}} |
|||
[[ar:علامة يسار-إلى-يمين]] |
|||
[[fr:Marque gauche-à-droite]] |
|||
[[zh:左至右符號]] |
Latest revision as of 09:01, 21 July 2024
The left-to-right mark (LRM) is a control character (an invisible formatting character) used in computerized typesetting of text containing a mix of left-to-right scripts (such as Latin and Cyrillic) and right-to-left scripts (such as Arabic, Syriac, and Hebrew). It is used to set the way adjacent characters are grouped with respect to text direction.
Unicode
In Unicode, the LRM character is encoded at U+200E LEFT-TO-RIGHT MARK (&lrm;). In UTF-8 it is E2 80 8E. Usage is prescribed in the Unicode Bidi (bidirectional) algorithm.[1]
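The code point and its UTF-8 bytes can be checked with a short Python sketch. The helper name utf8_hex is an illustrative assumption, not part of any standard; only standard-library calls are used.

```python
import unicodedata

LRM = "\u200e"  # U+200E LEFT-TO-RIGHT MARK


def utf8_hex(ch: str) -> str:
    """Return the UTF-8 encoding of ch as space-separated hex bytes.

    Hypothetical helper, named here only for illustration.
    """
    return " ".join(f"{byte:02X}" for byte in ch.encode("utf-8"))


print(unicodedata.name(LRM))      # LEFT-TO-RIGHT MARK
print(unicodedata.category(LRM))  # Cf (format character)
print(utf8_hex(LRM))              # E2 80 8E
```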
Example of use in HTML
Suppose the writer wishes to insert some English text (a left-to-right script) into a paragraph written in Arabic or Hebrew (a right-to-left script), with non-alphabetic characters to the right of the English text. For example, the writer wants to translate "The language C++ is a programming language used..." into Arabic. Without an LRM control character, the result looks like this:
لغة C++ هي لغة برمجة تستخدم...
With an LRM entered in the HTML after the ++, it looks like this, as the writer intends:
لغة C++ هي لغة برمجة تستخدم...
In the first example, without an LRM control character, a web browser will render the ++ to the left of the "C" because the browser recognizes that the paragraph is in a right-to-left script (Arabic) and places punctuation, which is neutral as to its direction, according to the direction of the adjacent text. The LRM control character causes the punctuation to be adjacent only to left-to-right text – the "C" and the LRM – and so to be positioned as if it were in left-to-right text, i.e., to the right of the preceding text.
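The grouping described above can be inspected, though not fully reproduced, with Python's standard library, which reports the bidirectional class of each character but does not implement the bidirectional algorithm itself. The following is a minimal sketch: the function name bidi_classes and the shortened sample strings are assumptions made for illustration. Note that Python reports "+" as ES (European separator) and not as a plain neutral; as far as the algorithm's rules go, such a separator is handled like a neutral when no digits are adjacent.

```python
import unicodedata

LRM = "\u200e"  # U+200E LEFT-TO-RIGHT MARK


def bidi_classes(text: str) -> list[str]:
    """Return the Unicode bidirectional class of each character in text.

    Hypothetical helper; it only looks up per-character classes and does
    not run the bidirectional algorithm.
    """
    return [unicodedata.bidirectional(ch) for ch in text]


arabic = "\u0647\u064a"  # the Arabic word that follows "C++" in the example
without_lrm = "C++ " + arabic
with_lrm = "C++" + LRM + " " + arabic

# The "++" sits between a strong left-to-right "C" and Arabic letters.
print(bidi_classes(without_lrm))  # ['L', 'ES', 'ES', 'WS', 'AL', 'AL']

# With the LRM, the "++" has strong left-to-right characters on both sides.
print(bidi_classes(with_lrm))     # ['L', 'ES', 'ES', 'L', 'WS', 'AL', 'AL']
```

In the second list, the extra "L" immediately after the two separators is the LRM, which is what lets the "++" be treated as part of the left-to-right run.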
Some software requires using the HTML code &#8206; or &lrm; instead of the invisible Unicode control character itself.[citation needed] Using the invisible control character directly could also make copy editing difficult.
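One way to keep the mark visible during copy editing is to convert between the raw character and its HTML entity. The sketch below assumes two hypothetical helper names, show_lrm and restore_lrm; html.unescape is the standard-library call, and it decodes every entity in the text, not only the LRM.

```python
import html

LRM = "\u200e"  # U+200E LEFT-TO-RIGHT MARK


def show_lrm(text: str) -> str:
    """Replace each invisible LRM with the visible entity &lrm; (hypothetical helper)."""
    return text.replace(LRM, "&lrm;")


def restore_lrm(text: str) -> str:
    """Decode HTML entities, including &lrm; and &#8206;, back to characters.

    Hypothetical helper; note that html.unescape also decodes any other
    entities present, such as &amp;.
    """
    return html.unescape(text)


visible = show_lrm("C++" + LRM)
print(visible)                                 # C++&lrm;
print(restore_lrm("C++&#8206;") == "C++" + LRM)  # True
```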
See also
- Right-to-left mark
- Bidirectional text
References
[1] Unicode 12.0 standard, http://www.unicode.org/versions/Unicode12.0.0/UnicodeStandard-12.0.pdf, p. 880