@@ -33,140 +33,57 @@ description: |
 this document.
 
 notes:
- - 1. Alif (ا) should be romanized as follows:
- a. Initially, it indicates that the word begins with a vowel or
- diphthong; the alif itself is not romanized, but rather the
- short vowel it “carries” is romanized; e.g., أَسلَم ژرندَه →
- Aslam Zhrandah. b. When it carries a
- maddah (آ) (see vowel table, row 3), it represents ā;
- e.g., آب بَند → Āb Band. c. Medially and
- finally it represents ā (see table 2, row 2); e.g.,
- ميړ ماڼۍ → Mīṟ Māṉêy. d. Medially and finally in words of Arabic
- origin, alif may serve as the bearer of hamzah, e.g.
- رأس → ra’s. See also note 4.
-
- - 2. The characters tsē (څ) and dzē (ځ) may be
- romanized t͡s and d͡z (with the combining double inverted
- breve, U+0361, appearing over the digraph) when, for special
- reasons, it is desired that confusion be avoided between
- ت (t) plus س (s) and between د (d) plus ز (z),
- respectively.
-
- - 3. Occasionally the character sequences كه, زه,
- سه, and گه occur. They may be romanized k·h, z·h,
- s·h, and g·h in order to differentiate these
- romanizations from the digraphs kh, zh, sh, and gh, which
- are used to represent the characters خ, ژ, ش, and
- غ respectively.
-
- - 4. Hamzah (ء) should be romanized as follows: a. In
- word-initial position, where it will appear either above or
- below alif (e.g. أ or إ), it indicates a short vowel and should
- not itself be romanized. In other positions it should be
- romanized by an apostrophe, e.g. جُزء → juz’. b.
- Yeh with hamzah (ئ) should be romanized êy, unless it
- represents the compound (iẕāfah) morpheme, in which case it
- is romanized according to note 9 below.
-
- - 5. The division of words used in Pashto writing is
- followed in romanization, except that the elements -ābād,
- -khwā, -shahr, -zādah, -zay and -ullāh are always romanized
- as part of the preceding word, e.g. رَحْمَت آباد
- → Raḩmatābād and رَحْمَت الله →
- Raḩmatullāh. However, when the word for God (الله)
- appears as a standalone word it should be written Allāh.
- Note also the “dagger alif” ( ٰ ) above the second ل (lām)
- in the word الله; this, like the short vowels, is not
- written in Pashto but should be romanized ā, like a
- full-size alif. Persian derivational endings such as -vand and
- endings of Turkish origin such as -lar, -lī, -lū, -i, -u,
- -si, and -su should be written together with the preceding
- word.
-
- - 6. The Pashto preposition د should be romanized dê in
- agreement with its pronunciation, even though
- it is sometimes pointed with kasrah ( ِ ).
-
- - 7. In names of Arabic origin, the l of the definite article
- al/ul is assimilated before the ‘sun letters’ t, s̄, d,
- z̄, r, z, s, sh, ş, ẕ, ţ, z̧, l and n. In romanization,
- the article will be written al or its assimilated
- equivalent in name-initial position but ul or its
- assimilated equivalent elsewhere; the article should be
- separated from the name it precedes and should not be
- capitalized, except at the beginning of a name, e.g. جَبَل
- السَرَاج → Jabal us Sarāj.
-
- - 8. In Arabic names, a shaddah ( ّ ) is used to denote the
- doubling of a particular consonant character, e.g. مُحَمَّد
- → Muḩammad. However, in Pashto this ‘doubling’
- is frequently omitted in both Perso-Arabic script and the
- resulting romanization. Guidance on doubling may be taken
- from an authoritative names source, such as an Afghan
- government source or Pashto dictionary; for example, it is
- usual to see Ḩājī without the doubled consonant and ‘Abbās
- with it. The doubled y consonant is almost always
- retained, as in Sayyid or Qayyūm.
-
- - 9. The iẕāfah morpheme is not a grammatical feature of
- Pashto and, if encountered in a linguistically hybrid
- geographical name (i.e. combining features of both Pashto
- and Dari), it should be treated according to the BGN/PCGN
- national system of romanization for Afghanistan, 2007, as
- -e, unless the preceding word ends with a silent heh (ه)
- or a vowel, when it should be shown as -ye; e.g.
- غرِ حصار → Ghar-e Ḩişār;
- قَلعَهٔ نَو → Qal‘ah-ye Now.
-
-
- - 10. The character sequence خو, when followed by ا or
- ی, should be romanized khw, although the w is either not
- pronounced or only weakly pronounced; e.g. خواجه →
- khwājah.
-
- - 11. An inventory of letter-diacritic combinations, in addition to the unmodified letters of the
- basic Roman script, is:
- ‘ (U+2018)
- ’ (U+2019)
- Ā (U+0100)
- ā (U+0101)
- Á (U+00C1)
- á (U+00E1)
- Ḏ (U+0044+0331)
- ḏ (U+0064+0331)
- Ē (U+0112)
- ē (U+0113)
- Ê (U+00CA)
- ê (U+00EA)
- Ḩ (U+1E28)
- ḩ (U+1E29)
- Ī (U+012A)
- ī (U+012B)
- N̄ (U+004E+0304)
- n̄ (U+006E+0304)
- Ō (U+014C)
- ō (U+014D)
- Ṟ (U+0052+0331)
- ṟ (U+0072+0331)
- Ş (U+015E)
- ş (U+015F)
- S̄ (U+0053+0304)
- s̄ (U+0073+0304)
- Ṯ (U+0054+0331)
- ṯ (U+0074+0331)
- Ţ (U+0162)
- ţ (U+0163)
- Ū (U+016A)
- ū (U+016B)
- Z̧ (U+005A+0327)
- z̧ (U+007A+0327)
- Z̄ (U+005A+0304)
- z̄ (U+007A+0304)
- Ẕ (U+005A+0331)
- ẕ (U+007A+0331)
- Z͟H (U+005A+035F+0048)
- z͟h (U+007A+035F+0068)
+ # Special rules: Consonants
+ - Pashto letter ge (ږ): The two common renderings for this
+ letter are 'zh' and 'g'. The preferred option will be
+ 'zh' (consistent with the choice of southern 'sh' for ښ).
+ However, when referring to communities that consistently
+ render the name with a 'g' as opposed to a 'zh', then 'g'
+ will be the preferred option. In these cases, the inclusion
+ of a variant spelling with 'zh' is strongly encouraged.
+
+ - Double consonants: Double consonants represented by the
+ tashdid (shaddah) are shown in most cases, regardless of
+ whether they are clearly enunciated in speech. Examples:
+ Muhammad Hassan, Izzatullah. However, consonants
+ represented by digraphs are not doubled. Example: Mubashir
+ (not Mubashshir). Special care should be taken, when
+ possible, to discriminate between doubled and
+ non-doubled letters in names that are otherwise
+ indistinguishable in their transliterated forms, e.g.
+ Hasan (حسن) vs. Hassan (حسان).
+
+ - Digraphs: No distinction is made between digraphs such as 'sh'
+ and single contiguous letters such as 's' followed by 'h'.
+
+ # Special rules: Vowels
+ - Short vowels zair and pesh: The preferred options for the
+ short vowels represented by the zair and pesh will be 'i'
+ and 'u'. However, in a mixed Dari and
+ Pashto environment, the use of 'e' and 'o' is accepted
+ in consideration of Dari norms.
+
+ - Long/short vowels: Long and short vowels are not
+ distinguished in the system (with the exception of
+ certain spellings driven by Dari influence, as discussed
+ above). In this and other systems, the borrowed
+ Arabic name Salim could represent two distinct names, one
+ with a long /a/ (Saalim, سالم) and one with a
+ long /i/ (Saliim, سلیم). This is known as a collision.
+ This and many other prevailing standardization
+ systems do not resolve such
+ collisions. However, in cases like these, it is
+ recommended that a vigorous effort be made to include
+ variant spellings in order to eliminate ambiguity
+ as to which name is intended, as in the following examples:
+ Hamid (var. Hameed) – حمید
+ Hamid (var. Hamed) – حامد
+
+ - Izafat: The linking vowel of Persian origin known as the
+ izafat will be written as a hyphen followed by 'e'
+ and then a space. Example: Koh-e Nur ("mountain
+ of light"). No special form is used
+ when the first word ends in a vowel.
 
 tests:
 
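Note 2 relies on the tie bar U+0361, and the note 11 inventory mixes precomposed letters (such as Ḩ, U+1E28) with base letters followed by combining marks (such as Z̧, U+005A+0327). Because Unicode NFC normalization composes some of those sequences but not others, a tool that checks romanized output against the inventory should normalize before comparing. The following is a minimal Python sketch using only the standard-library `unicodedata` module; the `describe` helper and the `SAMPLES` table are illustrative and are not part of this repository.

```python
import unicodedata

# Illustrative sample of sequences from notes 2 and 11, spelled out as
# explicit code points: a base letter followed by combining mark(s).
SAMPLES = {
    "D + combining macron below": "\u0044\u0331",
    "Z + combining cedilla": "\u005A\u0327",
    "Z + combining double macron below + H": "\u005A\u035F\u0048",
    "t + combining double inverted breve + s": "\u0074\u0361\u0073",
}


def describe(seq: str) -> str:
    """Return the NFC form of seq and its code points in the inventory's U+XXXX+XXXX style."""
    nfc = unicodedata.normalize("NFC", seq)
    points = "U+" + "+".join(f"{ord(ch):04X}" for ch in nfc)
    return f"{nfc} ({points})"


if __name__ == "__main__":
    for label, seq in SAMPLES.items():
        print(f"{label}: {describe(seq)}")
```

Run as a script, it shows that only D plus combining macron below changes under NFC (it becomes the single code point U+1E0E); Z with cedilla, Z͟H and t͡s have no precomposed forms and keep their multi-code-point spelling.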
@@ -270,10 +187,40 @@ map:
 '\u064f': # ُ damma
 - 'u'
 - 'o'
- '[\u0622|u0627]': 'a' # آ/ا
+ '[\u0622\u0627]': 'a' # آ/ا
 '\u0648': 'o' # و
 '\u064e\u0648\u0652': 'aw' # ـَوْ
 '\u064f\u0648': 'u' # ـُو
 '\u064e\u064a': 'ai' # ـي
 '\u06D0': 'e' # ې
- '\u06CD': 'ey' # ۍ
+ '\u06CD': 'ey' # ۍ
+ '\u06CC': 'a' # ی
+ '\u064e\u06CC[\u0647\u0627]': 'aya' # َيا / َيه
+ '\u0650\u06CC[\u0647\u0627]': 'ia' # ِيا / ِيه
+ '\u0652\u06CC\u0627': 'ya' # ْيا
+ '[\u06D0\u06D2]\b': 'ey' # ے / ې
+ '\u0648\u064A\b': 'oy' # وي
+ '\u064f\u0648\u064A\b': 'uy' # ُوي
+
+ # Double consonants
+ '\u0628\u0651': 'b' # ب
+ '\u067E\u0651': 'p' # پ
+ '[\u062a\u067C\u0637]\u0651': 't' # ت/ټ/ط
+ '\u062c\u0651': 'j' # ج
+ '[\u062d\u0647]\u0651': 'h' # ح/ه
+ '[\u062f\u0689]\u0651': 'd' # د/ډ
+ '[\u0631\u0693]\u0651': 'r' # ر/ړ
+ '[\u0630\u0632\u0636\u0638]\u0651': 'z' # ذ/ز/ض/ظ
+ '[\u062B\u0633\u0635]\u0651': 's' # ث/س/ص
+ '\u0641\u0651': 'f' # ف
+ '\u0642\u0651': 'q' # ق
+ '\u0643\u0651': 'k' # ك
+ '\u06A9\u0651': 'k' # ک
+ '[\u06AF\u06AB]\u0651': 'g' # گ/ګ
+ '\u0644\u0651': 'l' # ل
+ '\u0645\u0651': 'm' # م
+ '[\u0646\u06BC]\u0651': 'n' # ن/ڼ
+ '\u0648\u0651': 'w' # و
+ '\u064a\u0651': 'y' # ي
+ '\u0649\u0651': 'y' # ى
+
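The keys in this map appear to be regular-expression sources (they use character classes and `\b`), although the diff does not name the engine that reads them. Assuming ordinary regex semantics, this Python sketch with the standard `re` module shows what the alif key removed in the second hunk actually matched, and that a `|` inside `[...]` is a literal member of the class rather than an alternation.

```python
import re

# Key removed in the second hunk: the second escape lacks its backslash,
# so the class holds U+0622, '|', 'u', '0', '6', '2', '7' but not U+0627.
OLD = r"[\u0622|u0627]"
# Both escapes present, but '|' is still a literal member of the class.
WITH_PIPE = r"[\u0622|\u0627]"
# The same two letters with no stray pipe.
CLEAN = r"[\u0622\u0627]"


def matches(pattern: str, ch: str) -> bool:
    """True if the whole one-character string ch matches pattern."""
    return re.fullmatch(pattern, ch) is not None


if __name__ == "__main__":
    for ch in ["\u0627", "\u0622", "u", "|"]:
        print(f"U+{ord(ch):04X}", matches(OLD, ch), matches(WITH_PIPE, ch), matches(CLEAN, ch))
```

Under these assumptions the old key never matches U+0627 (alif) but does match a bare `u`, and the piped class additionally accepts a literal `|`; listing the code points without separators avoids that.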
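The double-consonant keys expect the shaddah (U+0651) to follow its consonant directly. Unicode canonical ordering places fatha (combining class 30) before shaddah (class 33), so input that has been NFC- or NFD-normalized can carry the vowel mark between the letter and the shaddah. Whether the romanizer normalizes its input is not stated in this diff; the sketch below, in Python with `re` and `unicodedata`, only illustrates the mismatch for a doubled meem. `TOLERANT` is a hypothetical alternative pattern, not an entry in this map.

```python
import re
import unicodedata

MEEM, FATHA, SHADDA = "\u0645", "\u064e", "\u0651"

# Same shape as the map's doubled-meem key: the letter immediately followed by shaddah.
STRICT = re.compile(MEEM + SHADDA)
# Hypothetical variant: allow one tanwin or short-vowel mark (U+064B..U+0650)
# between the letter and the shaddah.
TOLERANT = re.compile(MEEM + "[\u064b-\u0650]?" + SHADDA)

typed = MEEM + SHADDA + FATHA                     # shaddah keyed before fatha
normalized = unicodedata.normalize("NFC", typed)  # canonical ordering moves fatha first


def code_points(s: str) -> list:
    """List the code points of s in U+XXXX notation."""
    return [f"U+{ord(c):04X}" for c in s]


if __name__ == "__main__":
    print(code_points(typed), code_points(normalized))
    print(bool(STRICT.match(typed)), bool(STRICT.match(normalized)))
    print(bool(TOLERANT.match(typed)), bool(TOLERANT.match(normalized)))
```

If normalized input is expected, either reorder the marks before matching or widen each double-consonant key along the lines of `TOLERANT`.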