Arabic Diacritics Wiki
Arabic diacritics
Arabic script has numerous diacritics, which include consonant pointing known as iʻjām (إِعْجَام) and supplementary diacritics known as
tashkīl (تَشْكِيل). The latter include the vowel marks termed ḥarakāt (حَرَكَات; singular: حَرَكَة, ḥarakah).
The literal meaning of تَشْكِيل tashkīl is 'forming'. Normal Arabic text does not provide enough information about the correct
pronunciation. The main purpose of tashkīl (and ḥarakāt) is therefore to provide a phonetic guide or phonetic aid, i.e. to show the correct pronunciation.
It serves the same purpose as furigana (also called "ruby") in Japanese, or pinyin or zhuyin in Mandarin Chinese. In all these cases the aid is meant for children who are learning
to read or for foreign learners.
The bulk of Arabic script is written without ḥarakāt (or short vowels). However, they are commonly used in texts that demand strict
adherence to exact pronunciation. This is true, primarily, of the Qur'an ⟨ٱلْقُرْآن⟩ (al-Qurʾān) and poetry. It is also quite common to add ḥarakāt
to hadiths ⟨ٱلْحَدِيث⟩ (al-ḥadīth; plural: al-aḥādīth) and the Bible. Another use is in children's literature. Moreover, ḥarakāt are used in ordinary
texts on individual words when an ambiguity of pronunciation cannot easily be resolved from context alone. Arabic dictionaries with vowel
marks provide information about the correct pronunciation to both native and foreign Arabic speakers. In art and calligraphy, ḥarakāt might
be used simply because their writing is considered aesthetically pleasing.
بِسْمِ ٱللَّٰهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ
bismi -llāhi r-raḥmāni r-raḥīmi
Some Arabic textbooks for foreigners now use ḥarakāt as a phonetic guide to make learning to read Arabic easier. The other method used in
textbooks is phonetic romanisation of unvocalised texts. Fully vocalised Arabic texts (i.e. Arabic texts with ḥarakāt/diacritics) are sought
after by learners of Arabic. Some online bilingual dictionaries also provide ḥarakāt as a phonetic guide, much as English dictionaries
provide phonetic transcriptions.
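Most text omits these marks, so software often has to move between vocalised and unvocalised forms. The Python sketch below strips tashkīl from a string. It rests on two assumptions. First, the marks of interest are the Unicode combining characters U+064B to U+0652 (fatḥatan through sukūn) plus U+0670 (superscript alif). Second, hamza-bearing letters such as ⟨أ⟩ are stored precomposed and should be kept. The name `strip_tashkil` is illustrative.

```python
# Sketch: remove tashkil marks from Arabic text.
# Assumption: the marks are U+064B..U+0652 (fathatan, dammatan, kasratan,
# fathah, dammah, kasrah, shaddah, sukun) plus U+0670 (superscript alif).
# Precomposed letters such as U+0623 (alif with hamza above) are untouched.

TASHKIL = {chr(cp) for cp in range(0x064B, 0x0653)} | {"\u0670"}


def strip_tashkil(text: str) -> str:
    """Return text with every character in TASHKIL removed."""
    return "".join(ch for ch in text if ch not in TASHKIL)


if __name__ == "__main__":
    # "bismi": ba' + kasrah, sin + sukun, mim + kasrah
    vocalised = "\u0628\u0650\u0633\u0652\u0645\u0650"
    print([f"U+{ord(c):04X}" for c in strip_tashkil(vocalised)])
    # ['U+0628', 'U+0633', 'U+0645']
```

Stripping is lossy. One bare string can correspond to several vocalised words, which is why the marks are kept in texts where pronunciation must be exact.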
The ḥarakāt (حَرَكَات), a word literally meaning 'motions', are the short vowel marks. There is some ambiguity as to which tashkīl are also ḥarakāt;
the tanwīn, for example, are markers for both vowels and consonants.
Fatḥah
The fatḥah ⟨فَتْحَة⟩ is a small diagonal line placed above a letter (ـَ), and represents a short /a/ (like the /a/ sound in the English word "cat").
The word fatḥah itself (فَتْحَة) means opening and refers to the opening of the mouth when producing an /a/. For example, with dāl
(henceforth, the base consonant in the following examples): ⟨دَ⟩ /da/.
When a fatḥah is placed before a plain letter ⟨ا⟩ (alif) (i.e. one having no hamza or vowel of its own), it represents a long /aː/. This is close to the
sound of "a" in the English word "dad", with an open front vowel /æː/, not back /ɑː/ as in "father". For example: ⟨دَا⟩ /daː/. The fatḥah is not
usually written in such cases. When a fatḥah is placed before the letter ⟨ي⟩ (yā’), it creates an /aj/ (as in "lie"); and when placed before the letter
⟨و⟩ (wāw), it creates an /aw/ (as in "cow").
A fatḥah paired with a plain letter creates an open front vowel (/a/), often realized as near-open (/æ/). The standard also allows for
variation, especially under certain surrounding conditions. The more central (/ä/) or back (/ɑ/) pronunciation usually occurs when the
word has a nearby back consonant, such as one of the emphatics, qāf, or rā’. Other vowels take on a similar "back" quality
near such consonants, though not as markedly as fatḥah.[1][2][3]
Kasrah
A similar diagonal line below a letter (ـِ) is called a kasrah ⟨كَسْرَة⟩ and designates a short /i/ (as in "me", "be") and its allophones [i, ɪ, e, e̞, ɛ]
(as in "Tim", "sit"). For example: ⟨دِ⟩ /di/.[4]
When a kasrah is placed before a plain letter ⟨ي⟩ (yā’), it represents a long /iː/ (as in the English word "steed"). For example: ⟨دِي⟩ /diː/. The
kasrah is usually not written in such cases. However, if yā’ is pronounced as a diphthong /aj/, fatḥah should be written on the preceding consonant
to avoid mispronunciation. The word kasrah means 'breaking'.[1]
Ḍammah
The ḍammah ⟨ضَمَّة⟩ is a small curl-like diacritic placed above a letter (ـُ) to represent a short /u/ (as in "duke", or a shorter "you"). It also
represents its allophones [u, ʊ, o, o̞, ɔ] (as in "put", or "bull"). For example: ⟨دُ⟩ /du/.[4]
When a ḍammah is placed before a plain letter ⟨و⟩ (wāw), it represents a long /uː/ (like the 'oo' sound in the English word "swoop"). For
example: ⟨دُو⟩ /duː/. The ḍammah is usually not written in such cases. However, if wāw is pronounced as a diphthong /aw/, fatḥah should be
written on the preceding consonant to avoid mispronunciation.[1]
The word ḍammah (ضَمَّة) in this context means rounding, since it is the only rounded vowel in the vowel inventory of Arabic.
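The vowel rules in the three sections above can be condensed into a small lookup:

- A short-vowel mark alone gives a short vowel.
- The matching plain letter after it (alif, yā’, wāw) lengthens that vowel.
- Fatḥah before yā’ or wāw gives a diphthong.

The Python sketch below applies these rules to one syllable on the base consonant dāl. `read_syllable` is a hypothetical helper, not a full transcriber, and it assumes the following:

- Only dāl is mapped.
- Each input is one consonant, one vowel mark, and at most one following letter.
- Unmarked long vowels are not handled.

```python
# Hypothetical sketch, not a full transcriber: read one syllable on dal.
# Assumed input: a base consonant, one short-vowel mark, then at most one
# plain letter (optionally with sukun). Only dal is mapped, as in the
# article's examples; unmarked long vowels are not handled.

CONSONANTS = {"\u062F": "d"}                                  # dal
SHORT_VOWELS = {"\u064E": "a", "\u0650": "i", "\u064F": "u"}  # fathah, kasrah, dammah
ALIF, YA, WAW, SUKUN = "\u0627", "\u064A", "\u0648", "\u0652"
LENGTHENER = {"a": ALIF, "i": YA, "u": WAW}  # plain letter that lengthens each vowel
DIPHTHONG = {YA: "aj", WAW: "aw"}            # fathah + ya'/waw


def read_syllable(s: str) -> str:
    consonant = CONSONANTS[s[0]]
    vowel = SHORT_VOWELS[s[1]]
    rest = s[2:].removesuffix(SUKUN)  # a sukun only confirms the letter is vowelless
    if not rest:
        return consonant + vowel
    if rest == LENGTHENER[vowel]:
        return consonant + vowel + "\u02D0"  # IPA length mark
    if vowel == "a" and rest in DIPHTHONG:
        return consonant + DIPHTHONG[rest]
    raise ValueError(f"unsupported syllable: {s!r}")


for syllable in ["\u062F\u064E", "\u062F\u064E\u0627", "\u062F\u0650\u064A",
                 "\u062F\u064F\u0648", "\u062F\u064E\u064A\u0652"]:
    print(read_syllable(syllable))  # da, daː, diː, duː, daj (one per line)
```

The sukūn is accepted but ignored here, because it only confirms that the following letter carries no vowel of its own.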
Alif Khanjariyah
The superscript (or dagger) alif ⟨أَلِف خَنْجَرِيَّة⟩ (alif khanjarīyah) is written as a short vertical stroke on top of a consonant (ـٰ). It indicates a
long /aː/ sound for which alif is normally not written. For example: ⟨هَٰذَا⟩ (hādhā) or ⟨رَحْمَٰن⟩ (raḥmān).
The dagger alif occurs in only a few words, but they include some common ones; it is seldom written, however, even in fully vocalised texts.
Most keyboards do not have a dagger alif key. The word Allah ⟨الله⟩ (Allāh) is usually produced automatically by entering alif lām lām hāʾ. The word
consists of alif plus a ligature of doubled lām with a shaddah and a dagger alif above lām.
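In Unicode the dagger alif is a combining mark (U+0670). The Allah ligature described above also exists as a single presentation-form character (U+FDF2), which is kept for compatibility. The short Python check below shows both, using only the standard `unicodedata` module and the Unicode Character Database that ships with Python.

```python
import unicodedata

DAGGER_ALIF = "\u0670"
ALLAH_LIGATURE = "\uFDF2"  # ARABIC LIGATURE ALLAH ISOLATED FORM

# The dagger alif is encoded as a non-spacing combining mark (category Mn).
print(unicodedata.name(DAGGER_ALIF))      # ARABIC LETTER SUPERSCRIPT ALEF
print(unicodedata.category(DAGGER_ALIF))  # Mn

# Compatibility normalisation (NFKC) unfolds the ligature into the four
# letters alif, lam, lam, ha' that are normally typed. Shaddah and dagger
# alif are not part of this decomposition.
letters = unicodedata.normalize("NFKC", ALLAH_LIGATURE)
print([f"U+{ord(c):04X}" for c in letters])  # ['U+0627', 'U+0644', 'U+0644', 'U+0647']
```

When only the four letters are stored, the shaddah and dagger alif visible in the ligature come from the font's rendering, not from the encoded text.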
Maddah
Not to be confused with Tilde.
The maddah ⟨مَدَّة⟩ is a tilde-shaped diacritic (ـٓ), which can only appear on top of an alif (آ) and indicates a glottal stop /ʔ/ followed by a
long /aː/.
In theory, the same sequence /ʔaː/ could also be represented by two alifs, as in *⟨أا⟩. Here a hamza above the first alif would represent
the /ʔ/ and the second alif the /aː/. However, consecutive alifs are never used in Arabic orthography. Instead, this
sequence must always be written as a single alif with a maddah above it, the combination known as an alif maddah. For example: ⟨قُرْآن⟩ /qurˈʔaːn/.
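Since *⟨أا⟩ is never valid, a text-cleaning step can rewrite it as ⟨آ⟩. The Python sketch below does this with a regular expression and allows an optional fatḥah on the hamzated alif. The helper `fix_double_alif` is hypothetical.

The sketch also shows a related point. Unicode already treats alif followed by the combining maddah (U+0653) as canonically equivalent to the precomposed alif maddah (U+0622), so NFC normalisation merges that pair.

```python
import re
import unicodedata

ALIF_MADDAH = "\u0622"
# alif with hamza above, an optional fathah, then a plain alif
DOUBLE_ALIF = re.compile("\u0623\u064E?\u0627")


def fix_double_alif(text: str) -> str:
    """Replace the disallowed sequence *hamza-alif + alif with alif maddah."""
    return DOUBLE_ALIF.sub(ALIF_MADDAH, text)


# qaf, ra', alif-hamza, fathah, alif, nun  ->  qaf, ra', alif maddah, nun
print(fix_double_alif("\u0642\u0631\u0623\u064E\u0627\u0646") == "\u0642\u0631\u0622\u0646")  # True

# Plain alif + combining maddah above composes to U+0622 under NFC.
print(unicodedata.normalize("NFC", "\u0627\u0653") == ALIF_MADDAH)  # True
```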
Alif waslah
Main article: Wasla (diacritic)
The waṣlah ⟨وَصْلَة⟩, alif waṣlah ⟨أَلِف وَصْلَة⟩ or hamzat waṣl ⟨هَمْزَة وَصْل⟩ looks like a small letter ṣād on top of an alif ⟨ٱ⟩. It is also
indicated by an alif ⟨ا⟩ without a hamzah. It means that the alif is not pronounced when its word does not begin a sentence. For example: ⟨بِٱسْمِ⟩
(bismi), but ⟨ٱمْشُوا۟⟩ (imshū, not mshū). This is because no Arabic word can start with a vowelless consonant (unlike the English
school or skateboard). When such a word would begin an utterance, an alif is added to obtain a vowel or a vowelled consonant at the beginning of one's speech. In
English that would result in *ischool or *iskateboard.
It occurs only at the beginning of words, but it can occur after prepositions and the definite article. It is commonly found in imperative verbs,
the perfective aspect of verb stems VII to X and their verbal nouns (maṣdar). The alif of the definite article is considered a waṣlah.
It is also used to replace an elided hamza whose alif-seat has assimilated to the previous vowel. For example: فِي ٱلْيَمَن or في اليمن (fi l-Yaman) ‘in Yemen’.
It likewise appears in hamza-initial imperative forms following a vowel, especially following the conjunction ⟨وَ⟩ (wa-) ‘and’. For example: قُمْ وَٱشْرَبِ ٱلْمَاءَ (qum
wa-shrab-i l-mā’) ‘rise and then drink the water’.
Like the superscript alif, it is usually not written, even in fully vocalized texts, except in sacred texts such as the Quran and Arabic translations of the Bible.
Sukūn
The sukūn ⟨سُكُوْن⟩ is a circle-shaped diacritic placed above a letter (ـْ). It indicates that the consonant to which it is attached is not
followed by a vowel, i.e., zero-vowel.
It is a necessary symbol for writing consonant-vowel-consonant syllables, which are very common in Arabic. For example: ⟨دَدْ⟩ (dad).
The sukūn may also be used to help represent a diphthong. A fatḥah followed by the letter ⟨ي⟩ (yā’) with a sukūn over it (ـَيْ) indicates the
diphthong ay (IPA /aj/). A fatḥah followed by the letter ⟨و⟩ (wāw) with a sukūn (ـَوْ) indicates /aw/.
The sukūn may also take the alternative form of a small high head of ḥāʾ (U+06E1, ـۡ), particularly in some Qurans. Other shapes
may exist as well (for example, a small comma above ⟨ʼ⟩ or a circumflex ⟨ˆ⟩ in nastaʿlīq).[5]
Tanwin (final postnasalized or long vowels)
Main article: Nunation
ـٌ ـٍ ـً
The three vowel diacritics may be doubled at the end of a word to indicate that the vowel is followed by the consonant n. They
may or may not be considered ḥarakāt and are known as tanwīn ⟨تَنْوِين⟩, or nunation. The signs indicate, from left to right, -un, -in,
-an.
These endings are used as non-pausal grammatical indefinite case endings in Literary Arabic or classical Arabic (triptotes only). In a
vocalised text, they may be written even if they are not pronounced (see pausa). See i‘rāb for more details. In many spoken Arabic dialects,
the endings are absent. Many Arabic textbooks introduce standard Arabic without these endings. The grammatical endings may not be
written in some vocalized Arabic texts, as knowledge of i‘rāb varies from country to country, and there is a trend towards simplifying Arabic
grammar.
The sign ⟨ـً⟩ is most commonly written in combination with ⟨ـًا⟩ (alif), ⟨ـةً⟩ (tā’ marbūṭah), ⟨أً⟩ (alif hamzah) or stand-alone ⟨ءً⟩ (hamzah). The alif
should always be written even if -an is not pronounced. The exceptions are words ending in tā’ marbūṭah or hamzah, and diptotes. Grammatical cases and tanwīn
endings in indefinite triptote forms:
nominative: -un ⟨ـٌ⟩
accusative: -an ⟨ـًا⟩
genitive: -in ⟨ـٍ⟩
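The spelling rule for the accusative tanwīn can be sketched as a small function. `add_tanwin` is a hypothetical helper that follows the simplified rule stated above; the real orthography has more cases than this covers. It makes the following assumptions:

- The fatḥatan is typed before its supporting alif.
- The stem carries no other marks.
- The caller flags diptotes, since spelling alone cannot identify them.

```python
# Hypothetical helper: add an indefinite (tanwin) case ending to a
# triptote stem written without other marks. Follows the article's
# simplified rule; real orthography has more cases than this covers.
# Assumptions: fathatan is typed before its supporting alif, and the
# caller flags diptotes, since spelling alone cannot identify them.

TANWIN = {"nom": "\u064C", "gen": "\u064D", "acc": "\u064B"}  # -un, -in, -an
ALIF = "\u0627"
# Stem endings that take the accusative sign without a supporting alif:
# ta' marbutah, alif with hamza above, stand-alone hamzah.
NO_ALIF_AFTER = {"\u0629", "\u0623", "\u0621"}


def add_tanwin(stem: str, case: str, diptote: bool = False) -> str:
    if diptote:
        raise ValueError("diptotes do not take tanwin")
    ending = TANWIN[case]
    if case == "acc" and stem[-1] not in NO_ALIF_AFTER:
        ending += ALIF  # written even when -an itself is not pronounced
    return stem + ending


kitab = "\u0643\u062A\u0627\u0628"  # kitab, 'book', without vowel marks
for case in ("nom", "acc", "gen"):
    print(case, [f"U+{ord(c):04X}" for c in add_tanwin(kitab, case)[-2:]])
```

Raising an error for diptotes is a deliberate choice. Those nouns have their own endings, and silently returning a tanwīn form would hide the mistake.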
Shaddah
The shadda or shaddah ⟨شَدَّة⟩ (shaddah), or tashdid ⟨تَشْدِيد⟩ (tashdīd), is a diacritic shaped like a small written Latin "w" (ـّ).
It is used to indicate gemination (consonant doubling or extra length), which is phonemic in Arabic. It is written above the consonant
which is to be doubled. It is the only ḥarakah that is commonly used in ordinary spelling to avoid ambiguity. For example: ⟨دّ⟩ /dd/;
madrasah ⟨مَدْرَسَة⟩ ('school') vs. mudarrisah ⟨مُدَرِّسَة⟩ ('teacher', female).
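The shaddah stands for a doubled consonant, so tools that count segments or syllables often spell the doubling out. The hypothetical Python helper below rewrites each consonant carrying a shaddah as two copies: the first with a sukūn, the second keeping the vowel mark. It assumes the marks are U+064B to U+0652 and U+0670.

The helper accepts the shaddah either before or after the vowel mark. This matters because Unicode canonical ordering places fatḥah, ḍammah and kasrah (combining classes 30 to 32) before shaddah (class 33), while many keyboards type the shaddah first.

```python
SHADDAH, SUKUN = "\u0651", "\u0652"
MARKS = {chr(cp) for cp in range(0x064B, 0x0653)} | {"\u0670"}


def expand_shaddah(text: str) -> str:
    """Rewrite consonant+shaddah as consonant+sukun followed by the consonant again."""
    out = []
    i = 0
    while i < len(text):
        base = text[i]
        j = i + 1
        while j < len(text) and text[j] in MARKS:  # collect the marks on this letter
            j += 1
        marks = text[i + 1 : j]
        if SHADDAH in marks:
            # first copy closes a syllable, second copy keeps the vowel mark
            out.append(base + SUKUN + base + marks.replace(SHADDAH, ""))
        else:
            out.append(base + marks)
        i = j
    return "".join(out)


# mudarrisah: mim-u, dal-a, ra'-i with shaddah, sin-a, ta' marbutah
mudarrisah = "\u0645\u064F\u062F\u064E\u0631\u0650\u0651\u0633\u064E\u0629"
print(expand_shaddah(mudarrisah) == "\u0645\u064F\u062F\u064E\u0631\u0652\u0631\u0650\u0633\u064E\u0629")  # True
```

Applied to madrasah ⟨مَدْرَسَة⟩, which has no shaddah, the function returns the input unchanged. That is exactly the contrast the example pair relies on.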
Iʻjām
The i‘jām ⟨إِعْجَام⟩ (sometimes also called nuqaṭ)[6] are the diacritic points that distinguish various
consonants that have the same form (rasm), such as ⟨ـبـ⟩ /b/ ب, ⟨ـتـ⟩ /t/ ت, ⟨ـثـ⟩ /θ/ ث, ⟨ـنـ⟩ /n/ ن, and ⟨ـيـ⟩ /j/ ي.
Typically i‘jām are not considered diacritics but part of the letter.
Early manuscripts of the Qur’ān did not use diacritics either for vowels or to distinguish the different
values of the rasm. Vowel pointing was introduced first, as a red dot placed above, below, or beside the
rasm. Consonant pointing was introduced later, as thin, short black single or multiple dashes placed
above or below the rasm. These i‘jām became black dots about the same time as the ḥarakāt
became small black letters or strokes.
(Figure: 7th-century Kufic script without any ḥarakāt or i‘jām.)
Typically, Egyptians do not use dots under final yā’ ⟨ي⟩, which then looks exactly like alif maqṣūrah ⟨ى⟩ in
handwriting and in print. This practice is also used in copies of the muṣḥaf (Qurʾān) scribed by ‘Uthman Ṭāhā. The same unification of yā’ and
alif maqṣūrah has happened in Persian. The result is what the Unicode Standard calls "Arabic Letter Farsi Yeh". It looks exactly like
yā’ in its initial and medial forms, but exactly like alif maqṣūrah in its final and isolated forms ⟨یـ ـیـ ـی ی⟩.
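For search, these spelling habits mean that one word can appear with a dotted yā’, an alif maqṣūrah, or a Farsi yeh in final position. The Python sketch below is a hypothetical search-side normaliser. It folds U+0649 (alif maqṣūrah) and U+06CC (Farsi yeh) onto U+064A (yā’) so that the variants compare equal. It assumes the folded form is used only for matching and never for display.

```python
# Sketch for search matching only (assumption: display text is left alone).
# Folds alif maqsurah (U+0649) and Farsi yeh (U+06CC) onto Arabic ya' (U+064A),
# so dotted and dotless spellings of a final ya' compare equal.
FOLD_YA = str.maketrans({"\u0649": "\u064A", "\u06CC": "\u064A"})


def fold_ya(text: str) -> str:
    return text.translate(FOLD_YA)


# 'Ali written with dotted final ya', with alif maqsurah, and with Farsi yeh
spellings = ["\u0639\u0644\u064A", "\u0639\u0644\u0649", "\u0639\u0644\u06CC"]
print({fold_ya(s) for s in spellings} == {"\u0639\u0644\u064A"})  # True
```

Folding also merges genuinely distinct words. For example, the name ʻAlī and the preposition ʻalā ⟨على⟩ become identical, which is acceptable for fuzzy matching but not for rendering text.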
known as jarrah, resembling a long fatḥah, was used for a contracted (assimilated) sīn. Thus ⟨ڛ⟩, ⟨سۣ⟩, ⟨سۡ⟩ and ⟨سٚ⟩
were all used to indicate that the letter in question was truly ⟨س⟩ and not ⟨ش⟩.[7] These signs, collectively known as ‘alāmātu-l-ihmāl, are still
occasionally used in modern Arabic calligraphy. They serve either their original purpose (i.e. marking letters without i‘jām) or, more often, as purely
decorative space-fillers. The small ک above the kāf in its final and isolated forms ⟨ـك ك⟩ was originally an ‘alāmatu-l-ihmāl that became a
permanent part of the letter. Previously this sign could also appear above the medial form of kāf, when that letter was written without the
stroke on its ascender. When kāf was written without that stroke, it could be mistaken for lām. Kāf was therefore distinguished with a superscript
kāf or a small superscript hamza (nabrah), and lām with a superscript l-a-m (lām-alif-mīm).[8]
Hamza
A diacritic is not normally considered a letter of the alphabet. The hamza (هَمْزة hamzah, glottal stop), however,
often stands as a separate letter in writing; it is written even in unpointed texts and is not considered a tashkīl. It may appear as a
letter by itself or as a diacritic over or under an alif, wāw, or yā: ⟨ء أ إ ؤ ئ⟩.
Which letter is used to support the hamzah depends on the quality of the adjacent vowels:
If the glottal stop occurs at the beginning of the word, it is always indicated by hamza on an alif: above if the following vowel is /a/ or /u/
and below if it is /i/.
If the glottal stop occurs in the middle of the word, hamzah above alif is used only if it is not preceded or followed by /i/ or /u/:
If /i/ is before or after the glottal stop, a yāʾ with a hamzah is used (the two dots which are usually beneath the yāʾ disappear in this
case): ⟨ئ⟩.
Otherwise, if /u/ is before or after the glottal stop, a wāw with a hamzah is used: ⟨ؤ⟩.
If the glottal stop occurs at the end of the word (ignoring any grammatical suffixes) and follows a short vowel, it is written above alif, wāw,
or yā as in the medial case. If it follows a long vowel, diphthong or consonant, it is written on the line.
Two alifs in succession are never allowed: /ʔaː/ is written with alif maddah ⟨آ⟩ and /aːʔ/ is written with a free hamzah on the line ⟨اء⟩.
Consider the following words: ⟨أَخ⟩ /ʔax/ ("brother"), ⟨إسْماعِيل⟩ /ʔismaːʕiːl/ ("Ismael"), ⟨أُمّ⟩ /ʔumm/ ("mother"). All three of the above words "begin"
with a vowel opening the syllable, and in each case, alif is used to designate the initial glottal stop (the actual beginning). The situation is
different for middle syllables that "begin" with a vowel, as noted above. Examples are ⟨نَشْأة⟩ /naʃʔa/ ("origin") and ⟨أَفْئِدة⟩ /ʔafʔida/ ("hearts"; notice the /ʔi/ syllable; singular ⟨فُؤاد⟩ /fuʔaːd/).
Another is ⟨رُؤُوس⟩ /ruʔuːs/ ("heads", singular ⟨رَأْس⟩ /raʔs/). See the comprehensive article on hamzah for more
details.
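The seat rules above are close to a decision procedure. The Python sketch below encodes them and checks them against the example words. `hamza_seat` and its input convention are hypothetical. Each neighbouring sound is passed as "a", "i" or "u" for a short vowel, "aa", "ii" or "uu" for a long vowel, or `None` for a consonant, a diphthong or nothing. Real orthography has further exceptions that this sketch does not model.

```python
# Hypothetical sketch of the seat rules above, not a complete speller.
# Each neighbouring sound is "a", "i", "u" (short), "aa", "ii", "uu" (long),
# or None for a consonant, a diphthong, or nothing.

HAMZA_ON_ALIF, HAMZA_UNDER_ALIF = "\u0623", "\u0625"  # hamza above / below alif
HAMZA_ON_YA, HAMZA_ON_WAW = "\u0626", "\u0624"        # dotless ya' / waw seat
HAMZA_ON_LINE, ALIF_MADDAH = "\u0621", "\u0622"


def quality(sound):
    """Vowel quality 'a', 'i' or 'u', ignoring length; None for no vowel."""
    return sound[0] if sound else None


def hamza_seat(position, before=None, after=None):
    """position is 'initial', 'medial' or 'final'; returns the letter to write."""
    if position == "initial":
        if after == "aa":
            return ALIF_MADDAH  # /ʔaː/: two alifs in a row are never written
        return HAMZA_UNDER_ALIF if quality(after) == "i" else HAMZA_ON_ALIF
    if position == "final":
        if before is None or len(before) > 1:
            return HAMZA_ON_LINE  # after a long vowel, diphthong or consonant
        after = None  # grammatical suffixes are ignored
    near = {quality(before), quality(after)}
    if "i" in near:
        return HAMZA_ON_YA
    if "u" in near:
        return HAMZA_ON_WAW
    return ALIF_MADDAH if after == "aa" else HAMZA_ON_ALIF


# The medial examples from the text
print(hamza_seat("medial", None, "a") == "\u0623")   # nash'ah -> True
print(hamza_seat("medial", None, "i") == "\u0626")   # af'idah -> True
print(hamza_seat("medial", "u", "aa") == "\u0624")   # fu'ad   -> True
```

The /i/ check comes before the /u/ check, mirroring the order of the rules: yāʾ wins whenever /i/ is adjacent.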
History
According to tradition, the first to commission a system of ḥarakāt was Ali, who appointed Abu al-Aswad al-Du'ali for the task. Abu al-Aswad
devised a system of dots to signal the three short vowels (along with their respective allophones) of Arabic. This system of dots predates
the i‘jām, the dots used to distinguish between different consonants.
(Figure: Early Basmala Kufic, Middle Kufic and Modern Kufic in Qur'an manuscripts. Evolution of early Arabic calligraphy (9th–11th century), with the Basmala taken as an example from Kufic Qur’ān manuscripts. (1) Early 9th century: script with no dots or diacritic marks. (2) and (3) 9th–10th century, under the Abbasid dynasty: Abu al-Aswad's system established red dots, with each arrangement or position indicating a different short vowel; later, a second black-dot system was used to differentiate between letters like fā’ and qāf.)
Abu al-Aswad's system
Abu al-Aswad's system of ḥarakāt was different from the system we know today. The system used red dots, with each arrangement or position
indicating a different short vowel.
A dot above a letter indicated the vowel a, a dot below indicated the vowel i, a dot on the side of a letter stood for the vowel u, and two
dots stood for the tanwīn.
However, the early manuscripts of the Qur'an did not use the vowel signs for every letter requiring
Accordingly, he replaced the ḥarakāt with small superscript letters: small alif, yā’, and wāw for the short vowels corresponding to the long
vowels written with those letters, a small s(h)īn for shaddah (geminate), a small khā’ for khafīf (short consonant; no longer used). His system
is essentially the one we know today.[9]
Automatic diacritization
The process of automatically restoring diacritical marks is called diacritization or diacritic restoration. It is useful for avoiding ambiguity in
applications such as Arabic machine translation, text-to-speech, and information retrieval. Several automatic diacritization algorithms have been
developed.[10][11] For Modern Standard Arabic, a state-of-the-art algorithm reported in 2021 has a word error rate (WER) of 4.79%. Its most common
errors involve proper nouns and case endings.[12] Similar algorithms exist for other varieties of Arabic.[13]
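A common way to compute the WER for diacritization counts a word as wrong if any of its marks differ from the reference. The minimal Python sketch below rests on these assumptions:

- The reference and the hypothesis share the same base letters and whitespace tokenisation, since diacritization only adds marks.
- Marks are U+064B to U+0652 plus U+0670.
- The optional `ignore_case_endings` flag is a hypothetical parameter. It drops the marks written after each word's final letter as a rough proxy for the case ending, which the text names as a frequent source of errors.

Published evaluations may define the metric differently.

```python
MARKS = {chr(cp) for cp in range(0x064B, 0x0653)} | {"\u0670"}


def _drop_final_marks(word: str) -> str:
    """Remove the marks after the word's last letter (rough case-ending proxy)."""
    end = len(word)
    while end > 0 and word[end - 1] in MARKS:
        end -= 1
    return word[:end]


def diacritization_wer(reference: str, hypothesis: str,
                       ignore_case_endings: bool = False) -> float:
    """Share of words whose marks differ between reference and hypothesis."""
    ref_words, hyp_words = reference.split(), hypothesis.split()
    if len(ref_words) != len(hyp_words):
        raise ValueError("reference and hypothesis must have the same words")
    if not ref_words:
        return 0.0
    errors = 0
    for ref, hyp in zip(ref_words, hyp_words):
        if ignore_case_endings:
            ref, hyp = _drop_final_marks(ref), _drop_final_marks(hyp)
        errors += ref != hyp  # a word counts once, however many marks are wrong
    return errors / len(ref_words)


# Four dal syllables; the second word has the wrong vowel in the hypothesis.
ref = "\u062F\u064E \u062F\u0650 \u062F\u064F \u062F\u064E\u0627"
hyp = "\u062F\u064E \u062F\u064F \u062F\u064F \u062F\u064E\u0627"
print(diacritization_wer(ref, hyp))  # 0.25
```

In this toy example the only error is a final mark, so the score falls to 0.0 when case endings are ignored. This mirrors why results are often reported both with and without case endings.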
See also
Arabic alphabet:
I‘rāb (إِعْرَاب), the case system of Arabic
Hebrew:
Hebrew diacritics, the Hebrew equivalent
References
1. ^ a b c Karin C. Ryding, "A Reference Grammar of Modern Standard Arabic", Cambridge University Press, 2005, pp. 25–34, specifically "Chapter 2, Section 4: Vowels".
2. ^ Anatole Lyovin, Brett Kessler, William Ronald Leben, "An Introduction to the Languages of the World", "5.6 Sketch of Modern Standard Arabic", Oxford University Press, 2nd edition, 2017, p. 255, specifically "5.6.2.2 Vowels".
3. ^ Amine Bouchentouf, "Arabic For Dummies", John Wiley & Sons, 3rd edition, 2018, specifically section "All About Vowels".
4. ^ a b "Introduction to Written Arabic". University of Victoria, Canada.
5. ^ "Arabic character notes". r12a.
6. ^ Ibn Warraq (2002). Ibn Warraq (ed.). What the Koran Really Says: Language, Text & Commentary. Translated by Ibn Warraq. New York: Prometheus. p. 64. ISBN 1-57392-945-X. Archived from the original on 11 April 2019. Retrieved 9 April 2019.
7. ^ Gacek, Adam (2009). "Unpointed letters". Arabic Manuscripts: A Vademecum for Readers. BRILL. p. 286. ISBN 978-90-04-17036-0.
9. ^ Versteegh, C. H. M. (1997). The Arabic Language. Columbia University Press. pp. 56ff. ISBN 978-0-231-11152-2.
10. ^ Azmi, Aqil M.; Almajed, Reham S. (2013-10-10). "A survey of automatic Arabic diacritization techniques". Natural Language Engineering. 21 (3): 477–495. doi:10.1017/S1351324913000284. ISSN 1351-3249. S2CID 31560671.
11. ^ Almanea, Manar (2021). "Automatic Methods and Neural Networks in Arabic Texts Diacritization: A Comprehensive Survey". IEEE Access. 9: 145012–145032. doi:10.1109/ACCESS.2021.3122977. ISSN 2169-3536. S2CID 240011970.
12. ^ Thompson, Brian; Alshehri, Ali (2021-09-28). "Improving Arabic Diacritization by Learning to Diacritize and Translate". arXiv:2109.14150 [cs.CL].
13. ^ Masmoudi, Abir; Aloulou, Chafik; Abdellahi, Abdel Ghader Sidi; Belguith, Lamia Hadrich (2021-08-08). "Automatic diacritization of Tunisian dialect text using SMT model". International Journal of Speech Technology. 25: 89–104. doi:10.1007/s10772-021-09864-6. ISSN 1572-8110. S2CID 238782966.
Alexis Neme and Sébastien Paumier (2019), "Restoring Arabic vowels through omission-tolerant dictionary lookup", Lang Resources & Evaluation, Vol. 53, pp. 1–65.