Commit 9209c25

add vowels
AhMohsen46 authored and ronaldtse committed
1 parent 41112b0, commit 9209c25

maps/odni-pus-Arab-Latn-2011.yaml

Lines changed: 83 additions & 136 deletions
@@ -33,140 +33,57 @@ description: |
  this document.

  notes:
- - 1. Alif ( ‫ا‬ ) should be romanized as follows
- a. Initially,it indicates that the word begins with a vowel or
- diphthong; the alif itself is not romanized, but rather the
- short vowel it “carries” is romanized; e.g., Aslam Zhrandah
- ‫ه‬ َ‫د‬ ‫ن‬ ‫ژر‬ ‫سلَم‬ َ‫أ‬ ‫ميړ‬ → b. When it carries a
- maddah (‫)آ‬ (see vowel table, row 3), it represents ā;
- e.g., Band. Mīṟ ‫د‬ ‫ن‬ ‫ب‬ َ ‫آب‬ → Āb c. Medially and
- finally it represents ā (see table 2, row 2); e.g., ‫ۍ‬
- ‫ماڼ‬ → Māṉêy d. Medially and finally in words of Arabic
- origin, alif may serve as the bearer of hamzah, e.g.
- ‫رأس‬ → ra’s. See also note 4.
-
- - 2. The characters tsē ( ‫څ‬ ) and dzē ( ‫ځ‬ ) may be
- romanized t͡ s and d͡ z (the combining double breve (
- Unicode 0361) appearing over the digraph) when for special
- reasons it is desired that confusion be avoided between
- ‫ت‬ (t) plus ‫س‬ (s) and between ‫د‬ (d) plus ‫ز‬ (z),
- respectively.
-
- - 3. Occasionally the character sequences ‫ه‬ ‫ك‬ , ‫ه‬ ‫ز‬ ,
- ‫ه‬ ‫س‬ , and ‫ه‬ ‫گ‬ occur . They may be romanized k·h, z·
- h, s·h, and g·h in order to differentiate these
- romanizations from the digraphs kh, zh, sh, and gh, which
- are used to represent the characters ‫خ‬ , ‫ژ‬, ‫ش‬ , and
- ‫غ‬ respectively .
-
- - 4. Hamzah ( ‫ء‬ ) should be romanized as follows a. In
- word-initial position, where it will appear either above or
- below alif ( indicates a short vowel and should not itself
- be romanized. romanized by an apostrophe, e.g. ‫أ‬ or
- ‫إ‬ ), it In other positions it should be ‫جُزء‬ → juz’. b.
- Yeh with hamzah ( ‫ئ‬ ) should be romanized êy, unless it
- represents the compound (iẕāfah) morpheme, in which case it
- is romanized according to note 9 below.
-
- - 5. The division of words utilized in Pashto writing is
- followed in romanization, except that the elements –ābād, -
- khwā, -shahr, -zādah, -zay and -ullāh are always romanized
- as part of the preceding word, e.g. ‫آباد‬ ‫ت‬ ‫م‬ َ ْ‫ح‬
- ‫ر‬ َ → Raḩmatābād and ‫الله‬ ‫ت‬ ‫م‬ َ ْ‫ح‬ ‫ر‬ َ →
- Raḩmatullāh. However, when the word for God ( ‫الله‬ )
- appears as a standalone word it should be written Allāh.
- Note also the “dagger alif” ( ٙ) above the second ‫ل‬ (lām)
- in the word ‫الله‬ ; this, like the short vowels, is not
- written in Pashto but should be romanized ā, like a full-
- size alif. Persian derivational endings such as –vand and
- endings of Turkish origin such as –lar, -lī, -lū, -i, -u, -
- si, and –su, should be written together with the preceding
- word.
-
- - 6. The Pashto preposition ‫د‬ should be romanized dê in
- agreement with its pronunciation, despite the fact that
- it is sometimes pointed with kasrah ( ٙ ).
-
- - 7. In names of Arabic origin, the l of the definite article
- al/ul is assimilated before the ‘sun letters’ t, s̄ , d,
- z̄ , r, z, s, sh, ş, ẕ, ţ, z̧ , l and n. In romanization,
- the article will be written al or its assimilated
- equivalent in name-initial position but ul or its
- assimilated equivalent elsewhere; the article should be
- separated from the name it precedes and should not be
- capitalized, except at the beginning of a name, e.g. جَبَل
- السَرَاج → Jabal us Sarāj
-
- - 8. In Arabic names, a shaddah, ٙ is used to denote the
- doubling of a particular consonant character, e.g. ‫مَّد‬
- َ‫ح‬ ‫م‬ ُ → Muḩammad. However, in Pashto this ‘doubling’
- is frequently omitted in both Perso-Arabic script and the
- resulting romanization. Guidance on doubling may be taken
- from an authoritative names source, such as an Afghan
- government source or Pashto dictionary; for example, it is
- usual to see Ḩājī without and ‘Abbās with the doubled
- consonant. The doubled y consonant is almost always
- retained, as in Sayyid or Qayyūm
-
- - 9. The iẕāfah morpheme is not a grammatical feature of
- Pashto and, if encountered in a linguistically hybrid
- geographical name (i.e. combining features of both Pashto
- and Dari), it should be treated according to the BGN/PCGN
- national system of romanization for Afghanistan, 2007, as –
- e, unless the preceding word ends with a silent heh (‫)ه‬
- or a vowel when it should be shown – ye, e.g. 10. The
- character sequence ‫خو‬ , ‫صار‬ ‫ح‬ ِ ‫غر‬ → Ghar-e Ḩişār;
- ‫و‬ ‫ن‬ َ ‫ه‬ ٔ ‫لع‬ َ ‫ق‬ َ → when followed by ‫ا‬ or
- ‫ی‬ , Qal‘ah-ye Now.
-
-
- - 10. The character sequence خو when followed by ‫ا‬ or
- ‫ی‬ ,should be romanized khw, although the w is either not
- pronounced, or only weakly pronounced; e.g. ‫خواجه‬ →
- khwājah.
-
- - 11. An inventory of letter-diacritic combinations in addition to the unmodified letters of the
- basic Roman script is
- ‘ (U+2018)
- ʼ (U+2019)
- Ā (U+0100)
- ā (U+0101)
- Á (U+00C1)
- á (U+00E1)
- Ḏ (U+0044+0031)
- ḏ (U+0064+00031)
- Ē (U+0112)
- ē (U+0113)
- Ê (U+00CA)
- ê (U+00EA)
- Ḩ (U+1E28)
- ḩ (U+1E29)
- Ī (U+012A)
- ī (U+012B)
- N̄ (U+004E+0304)
- n̄ (U+004E+0304)
- Ō (U+014C)
- ō (U+014D)
- Ṟ (U+0052+0031)
- ṟ (U+0072+0031)
- Ş (U+015E)
- ş (U+015F)
- S̄ (U+0053+0304)
- s̄ (U+0073+0304)
- Ṯ (U+0054+0031)
- ṯ (U+0074+0031)
- Ţ (U+0162)
- ţ (U+0163)
- Ū (U+016A)
- ū (U+016B)
- Z̧ (U+005A+0327)
- z̧ (U+007A+0327)
- Z̄ (U+005A+0304)
- z̄ (U+007A+0304)
- Ẕ (U+005A+0331)
- ẕ (U+007A+0331)
- Z͟ H (U+005A+0048+035F)
- z͟ h (U+007A+0068+035F)
+ # Special rules: Consonants
+ - Pashto letter ge (‫ږ‬) The two common renderings for this
+ letter are 'zh' and 'g.' 1 The preferred option will be
+ 'zh' (consistent with the choice of southern 'sh' for ‫)ښ‬.
+ However, when referring to communities that consistently
+ render the name with a 'g' as opposed to a 'zh,' then 'g'
+ will be the preferred option. In these cases, the inclusion
+ of a variant spelling with 'zh' is strongly encouraged.
+
+ - Double consonants Double consonants represented by the
+ tashdid (shaddah) are shown in most cases regardless of
+ whether they are clearly enunciated in speech. Examples
+ Muhammad Hassan, Izzatullah. However, consonants
+ represented by digraphs are not doubled. Example Mubashir (
+ not Mubashshir). Special care should be taken when
+ possible to discriminate between doubled and
+ non-doubled letters in names that are otherwise
+ indistinguishable in their transliterated forms
+ Hasan (حسن (vs. Hassan (حسان(
+
+ - Digraphs No distinction is made between digraphs such as 'sh'
+ and single contiguous letters such as 's' followed by 'h.'
+
+ # Special rules: Vowels
+ - Short vowels zair and pesh The preferred options for the
+ short vowels represented by the zair and pesh will be 'i'
+ and 'u.' However, in cases where there is a mixed Dari and
+ Pashto environment, then the use of 'e' and 'o' is accepted
+ in consideration of Dari norms.
+
+ - Long/short vowels Long and short vowels are not
+ distinguished in the system (with the exception of
+ certain spellings driven by Dari influence as discussed
+ above). In this and other systems, the borrowed
+ Arabic name Salim could represent two distinct names, one
+ with a long /a/ (Saalim - ‫)سالم and one with a
+ long /i/ (Saliim - ‫)سلیم‬. This is known as a collision.
+ This and many other prevailing standardization
+ systems do not distinguish between these types of
+ collisions. However, in cases like these, it is
+ recommended that a vigorous effort be made to include
+ variant spellings in order to eliminate ambiguity
+ as to which name is intended, as in the following examples
+ Hamid (var. Hameed) – ‫حمید‬
+ Hamid (var. Hamed) – ‫حامد‬
+
+ - Izafat The linking vowel of Persian origin known as the
+ izafat will be written with a hyphen and then 'e'
+ and then a following space. Example Koh-e Nur ("mountain
+ of light"). There will be no special
+ accommodation for when the initial word ends in a vowel.

  tests:

@@ -270,10 +187,40 @@ map:
  '\u064f': # ُ damma
  - 'u'
  - 'o'
- '[\u0622|u0627]' : 'a' # آ/ا
+ '[\u0622|\u0627]' : 'a' # آ/ا
  '\u0648' : 'o' # و
  '\u064e\u0648\u0652' : 'aw' # ـَوْ
  '\u064f\u0648' : 'u' # ـُو
  '\u064e\u064a' : 'ai' # ـي
  '\u06D0' : 'e' # ې
- '\u06CD' : 'ey' # ‫ۍ‬
+ '\u06CD' : 'ey' # ‫ۍ
+ '\u06CC'‬ : 'a' # ‫ی‬
+ '\u064e\u06CC[\u0647|\u0627]' : 'aya' # َيا / َيه
+ '\u0650\u06CC[\u0647|\u0627]' : 'ia' # # َيا / َيه
+ '\u0652\u06CC\u0627' : 'ya' # ْيا
+ '[\u06D0|\u06D2]\b' : 'ey' # ‫ے‬ / ‫ې‬
+ '\u0648\u064A\b' : 'oy' # ‫وي‬
+ '\u064f\u0648\u064A\b' : 'uy' # ُوي
+
+ # Double consonants
+ '\u0628\u0651' : 'b' # ب
+ '\u067E\u0651' : 'p' # پ
+ '[\u062a|\u067C|\u0637]\u0651' : 't' # ت/ټ/ط
+ '\u062c\u0651' : 'j' # ج
+ '[\u062d|\u0647]\u0651' : 'h' # ح/ه
+ '[\u062f|\u0689]\u0651' : 'd' # د/ډ‬
+ '[\u0631|\u0693]\u0651' : 'r' # ر/ړ
+ '[\u0630|\u0632|\u0636|\u0638]\u0651' : 'z' # ذ/ز/ض/ظ
+ '[\u062B|\u0633|\u0635]\u0651' : 's' # س/ث/ص
+ '\u0641\u0651' : 'f' # ف
+ '\u0642\u0651' : 'q' # ق
+ '\u0643\u0651' : 'k' # ك
+ '\u06A9\u0651' : 'k' # ک
+ '[\u06AF|\u06AB]\u0651' : 'g' # ‫گ‬/ګ
+ '\u0644\u0651' : 'l' # ل
+ '\u0645\u0651' : 'm' # م
+ '[\u0646|\u06BC]\u0651' : 'n' # ن/ڼ
+ '\u0648\u0651' : 'w' # و
+ '\u064a\u0651' : 'y' # ي
+ '\u0649\u0651' : 'y' # ي
+
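
The one-character change in the map hunk (`u0627` to `\u0627` inside the alif character class) alters which characters the key matches. The following is a minimal sketch in Python, assuming the map keys are compiled as regular expressions by an engine that handles `\uXXXX` escapes and bracket expressions the way Python's `re` module does; the commit does not state which engine consumes the map, and the names `OLD`, `NEW`, `TIGHT`, and `hits` are illustrative only.

```python
import re

# Key as it stood before the commit: the second escape lacks its backslash,
# so the class contains U+0622, '|', and the literal characters u, 0, 6, 2, 7.
OLD = re.compile(r'[\u0622|u0627]')

# Key after the commit: the class contains U+0622, '|', and U+0627.
NEW = re.compile(r'[\u0622|\u0627]')

# Hypothetical tighter spelling (not in the commit): no '|' inside the class,
# because '|' is a literal character there, not an alternation operator.
TIGHT = re.compile(r'[\u0622\u0627]')


def hits(pattern: re.Pattern, ch: str) -> bool:
    """Return True if the single character ch is matched by pattern."""
    return pattern.fullmatch(ch) is not None


ALIF_MADDA = '\u0622'  # آ
ALIF = '\u0627'        # ا

for name, pat in (('old', OLD), ('new', NEW), ('tight', TIGHT)):
    print(name,
          'alif-madda:', hits(pat, ALIF_MADDA),
          'alif:', hits(pat, ALIF),
          'pipe:', hits(pat, '|'),
          'letter u:', hits(pat, 'u'))
```

Under those assumptions, the old key matches alif with maddah but not a bare alif, and it also matches the Latin letter `u` and the digits `0`, `6`, `2`, `7`. The corrected key matches both alif forms. Because `|` is literal inside a bracket expression, the same consideration applies to the other `[...|...]` keys added in this commit.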
