Hanyu Pinyin Pronunciation Guide: The Structure of Syllables
Stephen M. Hou
Version: 11/19/2010
This guide is intended to teach native English speakers how to pronounce words written in Hanyu Pinyin,
the official Romanization system for Standard Mandarin Chinese used in mainland China, Taiwan, and
Singapore. Keep in mind that Romanization is not intended to approximate English pronunciation; it is
simply a mapping from Latin letters to the sounds of Mandarin. This is also true for other languages that
use the Latin alphabet; for example, the letter “j” is pronounced differently in English, French, Spanish, and
Unless otherwise stated, the English example words are pronounced with a Standard American accent.
Some finals contain one of three glides: -i-, -u-, and -ü-. These are vowels in the middle of a syllable that are
followed by more vowels in the same syllable. It is important to note that such syllables are written as one
syllable and thus are to be pronounced as one syllable. For example, English speakers are tempted to
pronounce “liang” as the two-syllable combination “lee-ang”. However, the “-i-” sound is of very short
duration and “glides” into the rest of the final.
When the glide sounds -i-, -u-, and -ü- have no initial (null initial), they are written as y-, w-, and yu-,
respectively. For example, when initials are removed from “liang”, “tuan” and “lü”, they become “yang”,
“wan” and “yu”, respectively.
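The respelling rule above can be sketched in Python. This is only an illustration: the function name `null_initial_form` and both lookup tables are assumptions made for this example. The Group u exceptions follow the "Form with Null Initial" column of this guide, while the Group i exceptions are standard Pinyin spellings that this section does not state.

```python
# Finals whose null-initial spelling is not a plain letter swap.
# Group u entries follow the "Form with Null Initial" column of this guide.
U_GROUP_SPECIAL = {"u": "wu", "ui": "wei", "un": "wen", "ong": "weng"}
# Assumption: standard Pinyin spellings, not stated in this section.
I_GROUP_SPECIAL = {"i": "yi", "in": "yin", "ing": "ying", "iu": "you"}


def null_initial_form(final: str) -> str:
    """Spell a final (written without its leading hyphen) with no initial."""
    if final in U_GROUP_SPECIAL:
        return U_GROUP_SPECIAL[final]
    if final in I_GROUP_SPECIAL:
        return I_GROUP_SPECIAL[final]
    if final.startswith("ü"):
        return "yu" + final[1:]
    if final.startswith("u"):
        return "w" + final[1:]
    if final.startswith("i"):
        return "y" + final[1:]
    return final  # glideless finals such as "a" or "ang" are unchanged


# The three examples from the text: liang, tuan, and lü without their initials.
for final in ("iang", "uan", "ü"):
    print(final, "->", null_initial_form(final))
```

The sketch works on the final alone, so the caller is expected to strip the initial first.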
Mandarin has four tones, plus a neutral (“fifth”) tone:
Tones are written as diacritical marks above a vowel in the syllable. For example, the four tones of the
syllable “ma” are: (1) mā, (2) má, (3) mǎ, (4) mà. The neutral tone is written without any marks at all (ma).
Alternatively, numbers can be written after the Roman letters: ma1, ma2, ma3, ma4, and ma5.
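The two notations can be converted mechanically. The sketch below turns the numbered form into the diacritic form. The function name `numbered_to_marked` is hypothetical, and the rule for choosing which vowel carries the mark (a or e first, then the o in "ou", otherwise the last vowel) is the usual Pinyin convention rather than something this guide spells out, so treat it as an assumption.

```python
# Marked forms of each vowel for tones 1-4, in order.
TONE_MARKS = {
    "a": "āáǎà",
    "e": "ēéěè",
    "i": "īíǐì",
    "o": "ōóǒò",
    "u": "ūúǔù",
    "ü": "ǖǘǚǜ",
}


def numbered_to_marked(syllable: str) -> str:
    """Convert e.g. "ma3" to "mǎ"; tone 5 (neutral) gets no mark."""
    base, tone = syllable[:-1], int(syllable[-1])
    if tone not in (1, 2, 3, 4, 5):
        raise ValueError(f"tone must be 1-5, got {tone}")
    if tone == 5:
        return base
    # Assumed placement convention: a or e wins, then the o of "ou",
    # otherwise the last vowel in the syllable.
    if "a" in base:
        index = base.index("a")
    elif "e" in base:
        index = base.index("e")
    elif "ou" in base:
        index = base.index("o")
    else:
        positions = [i for i, ch in enumerate(base) if ch in TONE_MARKS]
        if not positions:
            raise ValueError(f"no vowel found in {base!r}")
        index = positions[-1]
    marked = TONE_MARKS[base[index]][tone - 1]
    return base[:index] + marked + base[index + 1:]


print([numbered_to_marked(s) for s in ("ma1", "ma2", "ma3", "ma4", "ma5")])
```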
Mandarin has one general tone sandhi rule: When there are two 3rd tones in a row, the first one becomes 2nd tone.
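As a rough sketch, the sandhi rule can be applied to a list of syllables in the numbered notation. The function name `apply_third_tone_sandhi` is hypothetical. The sketch checks each adjacent pair against the original tones, which is a simplifying assumption: in runs of three or more 3rd tones, actual speech depends on how the words are grouped.

```python
def apply_third_tone_sandhi(syllables):
    """Return a copy in which a 3rd tone before another 3rd tone becomes 2nd."""
    result = list(syllables)
    for i in range(len(syllables) - 1):
        # Compare against the original tones, not the already-changed ones.
        if syllables[i].endswith("3") and syllables[i + 1].endswith("3"):
            result[i] = syllables[i][:-1] + "2"
    return result


print(apply_third_tone_sandhi(["ma3", "ma3"]))
print(apply_third_tone_sandhi(["ma3", "ma1"]))
```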
Since no specific Chinese words are used in this guide, tones are not indicated in any syllable in the
remainder of this document. The focus is on learning how to pronounce the initials and finals.
Initial Consonants
Zhuyin Pinyin Pronunciation
ㄅ b- Like b as in “boy”, but unvoiced. Like p as in “spin”.
ㄆ p- Like p as in “pin”, but with more aspiration.
ㄇ m- Like m as in “mom”.
ㄈ f- Like f as in “for”.
ㄉ d- Like d as in “door”, but unvoiced. Like t as in “stick”.
ㄊ t- Like t as in “Tom”, but with more aspiration.
ㄋ n- Like n as in “nun”.
ㄌ l- Like l as in “light”.
ㄍ g- Like g as in “gas”, but unvoiced. Like k as in “skin”.
ㄎ k- Like k as in “kite”, but with more aspiration.
ㄏ h- Like h as in “hall”. Northern speakers tend to have a rasp, like the ch in “Chanukah”
and “Bach”.
ㄐ j- Similar to j as in “jeans”, but with wide lips (smile!), tongue behind lower front teeth,
and unvoiced. Like the Korean ㅈ. Definitely not like the s in “pleasure” or “Asia”.
“Beijing” is commonly mispronounced in the US media.
ㄑ q- Similar to ch as in “cheese”, but with wide lips (smile!), tongue behind lower front
teeth, and with more aspiration.
ㄒ x- Similar to sh as in “sheep”, but with wide lips (smile!) and tongue behind lower front
teeth. Like the Polish ś.
ㄓ zh- Retroflexed version of pinyin z-. Position the tongue and lips as if you were going to
say “err…”, but try to say the English ds as in “beds” (and without voicing) instead.
The resulting sound is the pinyin zh-. Sounds vaguely like the d in “drive”, but
unvoiced. Like the Polish cz.
ㄔ ch- Retroflexed version of pinyin c-. Position the tongue and lips as if you were going to
say “err…”, but try to say the English ts as in “bits” instead. The resulting sound is
the pinyin ch-. This is different from the pinyin q-, which is actually more similar to
the English ch than the pinyin ch- is.
ㄕ sh- Retroflexed version of s-. Position the tongue and lips as if you were going to say
“err…”, but try to say the English s instead. The resulting sound is the pinyin sh-.
This is different from the pinyin x-, which is actually more similar to the English sh
than the pinyin sh- is. “Shanghai” is commonly mispronounced in the US media.
Like the Polish sz and the Swedish and Norwegian rs.
ㄖ r- Position the tongue and mouth like you are going to say “err…”, but try to say the
English l as in “light” instead. Unlike l, however, the tip of the tongue does not make
contact with anything, but the sides of the tongue touch the roof of the mouth. The
resulting sound is the pinyin r-. This is probably the most difficult sound in Standard
Mandarin for native English speakers to pronounce correctly. Like the Polish ż and
ㄗ z- Like ds as in “beds”, but unvoiced. Like the German z.
ㄘ c- Like ts as in “bits”, but with more aspiration. Like the Polish c.
ㄙ s- Like s as in “sand”.
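Because every initial in the table is either a single letter or a letter plus "h", a written syllable can be split into initial and final with a simple prefix check. The following is a minimal sketch; `INITIALS` and `split_syllable` are names invented for this example, and syllables written with y- or w- are treated as having a null initial, as described earlier in this guide.

```python
# The 21 initials from the table; two-letter initials come first so that
# "zh" is not mistaken for "z", and so on.
INITIALS = (
    "zh", "ch", "sh",
    "b", "p", "m", "f", "d", "t", "n", "l",
    "g", "k", "h", "j", "q", "x", "r", "z", "c", "s",
)


def split_syllable(syllable: str):
    """Return (initial, final); the initial is "" for a null initial."""
    for initial in INITIALS:
        if syllable.startswith(initial):
            return initial, syllable[len(initial):]
    return "", syllable


for s in ("zhang", "liang", "yang"):
    print(s, "->", split_syllable(s))
```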
More on Initial Consonants
The last ten consonants can be organized into a table:
Palatals    Dental sibilants    Retroflexes
ㄐ / j-     ㄗ / z-             ㄓ / zh-
ㄑ / q-     ㄘ / c-             ㄔ / ch-
ㄒ / x-     ㄙ / s-             ㄕ / sh-
                                ㄖ / r-
• Palatals: The tip of the tongue is dropped to a place behind the lower front teeth and the blade of the
tongue is brought up to contact the palate (roof of mouth).
• Dental sibilants: The tip of the tongue is held against the back of the gum of the lower front teeth
(alveolar ridge).
• Retroflexes: Position the tongue and mouth like you are going to say “err…”, but try to say the dental
sibilants instead.
Generally, the lips are wider (smile!) than they are when similar English sounds are pronounced.
Note: Many people outside Beijing (especially in Southern China and Taiwan) cannot pronounce the
retroflex sounds correctly. They will frequently pronounce zh-, ch-, and sh- as z-, c-, and s-, respectively;
r- tends to be pronounced in a variety of ways, including as pinyin l- and English z. For example, in
Taiwan, the city of Shanghai is frequently pronounced as “Sanghai”.
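The zh-/ch-/sh- substitution described in this note is regular enough to sketch in a few lines. The name `merge_retroflex` is hypothetical, and the sketch deliberately leaves r- alone, since the note says its pronunciation varies.

```python
# Retroflex initials and the dental sibilants they tend to merge into.
RETROFLEX_TO_DENTAL = {"zh": "z", "ch": "c", "sh": "s"}


def merge_retroflex(syllable: str) -> str:
    """Replace a leading zh-, ch-, or sh- with z-, c-, or s-."""
    for retroflex, dental in RETROFLEX_TO_DENTAL.items():
        if syllable.startswith(retroflex):
            return dental + syllable[len(retroflex):]
    return syllable  # other initials (including r-) are left unchanged


print("".join(merge_retroflex(s) for s in ("shang", "hai")))
```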
Note: The use of the letter combinations q, x, zh, ch, sh, z, and c may seem quite unnatural for English
speakers. After all, why not use “ts” and “ds” instead of “c” and “z”, respectively? The Hanyu Pinyin
system is widely praised for following two simple principles for initial consonants:
• Adding an “h” after z, c, and s makes them retroflexed.
• Each initial consonant is represented by exactly one letter, or by a combination of a letter with the
retroflex indicator “h”.
Thus, the pinyin assignment of Latin letters to Mandarin initial consonants is quite optimal, given those
constraints. Other Romanization systems, such as the Wade-Giles system commonly used in the West before
the 1980s and still used in Taiwan for personal and place names, use the same letter combinations to
represent different sounds; the next letter determines which sound is meant. For example, in the Wade-
Giles system, “ch-” (without an apostrophe) represents the pinyin j- when it is followed by -i- or -ü- (Wade-
Giles “chiang” = Hanyu Pinyin “jiang”), and represents the pinyin zh- otherwise (Wade-Giles “chou” =
Hanyu Pinyin “zhou”).
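The context dependence of Wade-Giles “ch-” can be made concrete with a small sketch that returns only the corresponding pinyin initial. The function name is hypothetical, and the sketch does not convert the rest of the syllable, since finals are also spelled differently in the two systems. The special case for “chih” is an assumption drawn from general knowledge of Wade-Giles, not from this guide.

```python
def pinyin_initial_for_wade_giles_ch(syllable: str) -> str:
    """Return "j" or "zh" for a Wade-Giles syllable with unaspirated ch-."""
    if not syllable.startswith("ch") or syllable[2:3] in ("'", "’"):
        raise ValueError("expected a syllable starting with ch- (no apostrophe)")
    rest = syllable[2:]
    # Assumption (not from this guide): in "chih" the "ih" is a final of its
    # own rather than the -i- glide, so it pairs with pinyin zh-.
    if rest != "ih" and rest[:1] in ("i", "ü"):
        return "j"
    return "zh"


for wg in ("chiang", "chou"):
    print(wg, "->", pinyin_initial_for_wade_giles_ch(wg))
```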
Note: Finally, Mandarin (and all other modern Chinese dialects) lacks consonant clusters that are frequent
in English and other European languages, such as tr-, fr-, pl-, sn-, sm-, sp-, sc-, st-, etc. However, linguists
believe such clusters were present in Old Chinese, the language of Confucius (ca. 500 BC).
Glideless Finals
Zhuyin  Pinyin  Form with Null Initial  Pronunciation
N/A -i N/A Buzzed continuation of the initials zh-, ch-, sh-, r-, z-, c-, and s-. For all
other initials, see the “Group i Finals” table below.
ㄚ -a a Like a as in “father”.
ㄛ -o o Like “awe” or “all”.
ㄜ -e e Like “uh” or the oo in “look”, but with wide lips (smile!).
ㄝ -e ê Like e as in “wet”. Without a glide and an initial consonant, this sound
exists only as an interjection. Otherwise, it requires a glide.
ㄞ -ai ai Like “eye”.
ㄟ -ei ei Like ay as in “day”.
ㄠ -ao ao Similar to ow as in “now”, but the starting vowel sounds more like the a
in “father”.
ㄡ -ou ou Like o as in “no”.
ㄢ -an an Like the an in “pan” spoken with a British accent. In other words, the a
is pronounced like the a in “ax”. It is not “ahn”. “Mulan” is commonly
mispronounced in the US media.
ㄣ -en en Like en as in “ten”.
ㄤ -ang ang Like ang as in “angst”, or “ah-ng”. It is not like ang as in “sang”.
ㄥ -eng eng Northerners tend to pronounce it like ung as in “rung”. Southerners and
Taiwanese tend to pronounce it as pinyin -en, or as “eh-ng” (similar to
pinyin -en, but ending with a nasal -ng sound instead of -n), and
pronounce the syllable “feng” like “fong” (long o). It is not like ang as
in “sang”.
ㄦ -er er Like er as in “better”. Northerners tend to pronounce it like the ar in
“car” (as if they were pirates or something). Southerners and
Taiwanese tend to pronounce it similar to pinyin -e (ㄜ).
Finals (cont.)
Group u (w-) Finals
Zhuyin  Pinyin  Form with Null Initial  Pronunciation
ㄨ -u wu Like oo as in “moose”. However, when the initial is j-, q-, or x-, see the
table below for Group ü Finals.
ㄨㄚ -ua wa u + a. Like wa as in “swan”.
ㄨㄛ -uo wo u + o. Like “wall”.
ㄨㄞ -uai wai u + ai. Like “why”.
ㄨㄟ -ui wei Contraction of u + ei. Like “way”.
ㄨㄢ -uan wan u + an. Like “wax”, but with “x” replaced by “n”. However, when the
initial is j-, q-, or x-, see the table below for Group ü Finals.
ㄨㄣ -un wen Contraction of u + en. Like “when”. However, when the initial is j-, q-,
or x-, see the table below for Group ü Finals.
ㄨㄤ -uang wang u + ang. The a is like the a as in “father”.
ㄨㄥ -ong weng u + eng. Like “song” in British English (long “o”).
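The two spelling columns of the Group u table can be captured as a lookup, which makes the contractions (ui, un, ong) easy to see. This is a sketch only: `GROUP_U` and `spell_group_u` are names made up for the example, and the function does not check whether a given initial and final actually combine in Standard Mandarin.

```python
# Zhuyin final -> (pinyin after an initial, pinyin with a null initial),
# copied from the Group u table.
GROUP_U = {
    "ㄨ": ("u", "wu"),
    "ㄨㄚ": ("ua", "wa"),
    "ㄨㄛ": ("uo", "wo"),
    "ㄨㄞ": ("uai", "wai"),
    "ㄨㄟ": ("ui", "wei"),
    "ㄨㄢ": ("uan", "wan"),
    "ㄨㄣ": ("un", "wen"),
    "ㄨㄤ": ("uang", "wang"),
    "ㄨㄥ": ("ong", "weng"),
}


def spell_group_u(zhuyin_final: str, initial: str = "") -> str:
    """Spell a Group u final in pinyin, with or without an initial."""
    if initial in ("j", "q", "x"):
        # After j-, q-, and x- a written u stands for ü (see the table notes).
        raise ValueError("j-, q-, and x- take Group ü finals, not Group u")
    with_initial, alone = GROUP_U[zhuyin_final]
    return initial + with_initial if initial else alone


print(spell_group_u("ㄨㄟ", "g"), spell_group_u("ㄨㄟ"))
print(spell_group_u("ㄨㄥ", "d"), spell_group_u("ㄨㄥ"))
```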
The diphthongs are much more fused in Chinese than in English. For example, the -ai final in the “hai”
syllable of “Shanghai” is said with far less transition from the “a” to “i”, as compared to the similar sound
in English, the “ye” in “bye” or the “ie” in “lie”. Thus, when an English-speaker says “Shanghai”, the “ai”
sounds exaggerated to a native Mandarin speaker.
Non-Ambiguity of Finals
e: Two different sounds (ㄜ and ㄝ) are represented by the pinyin letter “e”. How do we know which
sound is meant? (Recall: ㄜ is like “uh” or the oo in “look”, but with wide lips; ㄝ is like e as in “wet”.)
• ㄜ: Occurs only immediately after an initial consonant or by itself (i.e., with no glide).
• ㄝ: Almost always requires a glide vowel immediately before it. The only exception is when this final
appears by itself (and it does so only as an interjection), in which case it is written as “ê” to distinguish
it from the syllable “e”, which is “ㄜ”.
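The two bullets above amount to a check on the letter just before the final “e”. Here is a minimal sketch, assuming the syllable is written in pinyin and ends in a bare “e” (so finals such as -ei, -en, -eng, and -er are out of scope). The function name `e_sound` is hypothetical. The letter “y” is included because it is the written form of the -i- glide when there is no initial.

```python
def e_sound(syllable: str) -> str:
    """Return the Zhuyin symbol for the e in a syllable ending in a bare e."""
    if syllable == "ê":
        return "ㄝ"  # the interjection, written with a circumflex
    if not syllable.endswith("e"):
        raise ValueError("this sketch only handles syllables ending in 'e'")
    before = syllable[-2:-1]  # "" when the syllable is just "e"
    # A glide letter right before the e signals ㄝ; anything else is ㄜ.
    if before in ("i", "u", "ü", "y"):
        return "ㄝ"
    return "ㄜ"


for s in ("e", "de", "lie", "ye", "xue"):
    print(s, "->", e_sound(s))
```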
i: When -i follows zh-, ch-, sh-, r-, z-, c-, or s-, it is simply a buzzed continuation of the initial consonant.
For all other initial consonants, it is pronounced like ee in “see”. Pronouncing the -i as “ee” after the seven
aforementioned consonants produces syllables that do not exist in Standard Mandarin. For example, the
syllable “see” does not exist, even though both “s” and “ee” sounds exist.
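The rule for -i depends only on the initial, so it can be sketched as a set lookup. The names `BUZZED_INITIALS` and `i_sound` are illustrative, and the returned strings are just labels for the two pronunciations described above.

```python
# Initials after which -i is a buzzed continuation of the consonant.
BUZZED_INITIALS = {"zh", "ch", "sh", "r", "z", "c", "s"}


def i_sound(initial: str) -> str:
    """Describe how the final -i is pronounced after the given initial."""
    if initial in BUZZED_INITIALS:
        return "buzzed continuation"
    return "ee"


for initial in ("s", "sh", "x", "l"):
    print(initial + "i", "->", i_sound(initial))
```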
ü: We can usually leave out the umlaut dots over the -ü because most initials can be followed by either -u-
or -ü-, but not both (see the Hanyu Pinyin Syllable Table), so it is unambiguous as to which sound is
represented. The only syllables in Standard Mandarin for which both -u- and -ü- versions exist are:
• nü vs. nu
• lü vs. lu
As the umlaut dots are inconvenient or impossible to type on English keyboards, you may sometimes see
“lü” written as “luu” or “lv” (the letter v is unused in Hanyu Pinyin). The syllable “nü” is similarly
sometimes written as “nuu” or “nv”.
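Since neither “v” nor a doubled “uu” occurs in regular Hanyu Pinyin spelling, the keyboard-friendly forms can be mapped back to “ü” with plain string replacement. A minimal sketch follows; the function name `restore_umlaut` is hypothetical, and it assumes the input is a single lowercase pinyin syllable.

```python
def restore_umlaut(syllable: str) -> str:
    """Turn typed forms such as "lv" or "nuu" back into "lü" or "nü"."""
    # Replace the doubled form first, then the single-letter stand-in.
    return syllable.replace("uu", "ü").replace("v", "ü")


for typed in ("lv", "luu", "nv", "nuu", "lu"):
    print(typed, "->", restore_umlaut(typed))
```

Plain “lu” and “nu” pass through unchanged, which preserves the distinction between the -u- and -ü- syllables.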