Chinese is often described as a "monosyllabic" language. However, this is only partially correct. It is largely accurate when describing Classical Chinese and Middle Chinese; in Classical Chinese, for example, perhaps 90% of words correspond to a single syllable and a single character. In the modern varieties, it is still usually the case that a morpheme (unit of meaning) is a single syllable; contrast English, with plenty of multi-syllable morphemes, both bound and free, such as "seven", "elephant", "para-" and "-able". Some of the conservative southern varieties of modern Chinese still have largely monosyllabic words, especially among the more basic vocabulary.
In modern Mandarin, however, most nouns, adjectives and verbs are disyllabic. A significant cause of this is phonological attrition: sound change over time has steadily reduced the number of possible syllables. Modern Mandarin now has only about 1,200 possible syllables, including tonal distinctions, compared with about 5,000 in Vietnamese (still largely monosyllabic) and over 8,000 in English.[b]
This phonological collapse has led to a corresponding increase in the number of homophones. As an example, the small Langenscheidt Pocket Chinese Dictionary lists six common words pronounced shí (tone 2): 十 "ten"; 实 "real, actual"; 识 "know (a person), recognize"; 石 "stone"; 时 "time"; 食 "food". These were all pronounced differently in Early Middle Chinese; in William H. Baxter's transcription they were dzyip, zyit, syik, dzyek, dzyi and zyik respectively. They are still pronounced differently in today's Cantonese; in Jyutping they are sap9, sat9, sik7, sek9, si4, sik9. In modern spoken Mandarin, however, tremendous ambiguity would result if all of these words could be used as-is; Yuen Ren Chao's modern poem Lion-Eating Poet in the Stone Den exploits this, consisting of 92 characters all pronounced shi. As such, most of these words have been replaced (in speech, if not in writing) with a longer, less-ambiguous compound. Only the first one, 十 "ten", normally appears as such when spoken; the rest are normally replaced with, respectively, 实际 shíjì (lit. "actual-connection"); 认识 rènshi (lit. "recognize-know"); 石头 shítou (lit. "stone-head"); 时间 shíjiān (lit. "time-interval"); 食物 shíwù (lit. "food-thing"). In each case, the homophone was disambiguated by adding another morpheme, typically either a synonym or a generic word of some sort (for example, "head", "thing"), whose purpose is simply to indicate which of the possible meanings of the other, homophonic syllable should be selected.
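The collapse described above can be made concrete with a small sketch. It groups the six words from the example by their reading in each system, using only the transcriptions quoted in the text; the table layout and the helper name `group_by_reading` are illustrative assumptions, not part of any standard tool.

```python
from collections import defaultdict

# Columns: character, gloss, Mandarin pinyin, Cantonese Jyutping,
# Early Middle Chinese (Baxter's transcription), usual spoken Mandarin form.
WORDS = [
    ("十", "ten",             "shí", "sap9", "dzyip", "shí"),
    ("实", "real, actual",    "shí", "sat9", "zyit",  "shíjì"),
    ("识", "know, recognize", "shí", "sik7", "syik",  "rènshi"),
    ("石", "stone",           "shí", "sek9", "dzyek", "shítou"),
    ("时", "time",            "shí", "si4",  "dzyi",  "shíjiān"),
    ("食", "food",            "shí", "sik9", "zyik",  "shíwù"),
]

def group_by_reading(words, column):
    """Map each reading in the given column to the characters that share it."""
    groups = defaultdict(list)
    for row in words:
        groups[row[column]].append(row[0])
    return dict(groups)

mandarin = group_by_reading(WORDS, 2)        # bare monosyllables in Mandarin
cantonese = group_by_reading(WORDS, 3)
middle_chinese = group_by_reading(WORDS, 4)
spoken = group_by_reading(WORDS, 5)          # forms normally used in speech

for label, groups in [("Mandarin", mandarin), ("Cantonese", cantonese),
                      ("Middle Chinese", middle_chinese),
                      ("Spoken Mandarin", spoken)]:
    print(f"{label}: {len(groups)} distinct reading(s) for {len(WORDS)} words")
```

The bare Mandarin syllables fall into a single group, while the Cantonese, Middle Chinese and spoken disyllabic forms each keep all six words apart, which is the disambiguating effect the compounds provide.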
However, when one of the above words forms part of a compound, the disambiguating syllable is generally dropped and the resulting word is still disyllabic. For example, 石 shí alone, not 石头 shítou, appears in compounds meaning "stone-", for example, 石膏 shígāo "plaster" (lit. "stone cream"), 石灰 shíhuī "lime" (lit. "stone dust"), 石窟 shíkū "grotto" (lit. "stone cave"), 石英 shíyīng "quartz" (lit. "stone flower"), 石油 shíyóu "petroleum" (lit. "stone oil").
Most modern varieties of Chinese tend to form new words through disyllabic, trisyllabic and four-character compounds. In some cases, monosyllabic words have become disyllabic without compounding, as in 窟窿 kūlong from 孔 kǒng; this is especially common in Jin.
Chinese morphology is built on a fixed inventory of syllables with a fairly rigid structure; these syllables are the morphemes, the smallest meaningful units of the language. While many of these single-syllable morphemes (字, zì) can stand alone as individual words, they more often than not form multi-syllabic compounds, known as cí (词／詞), which more closely resemble the traditional Western notion of a word. A Chinese cí (“word”) can consist of more than one character-morpheme, usually two, but there can be three or more.
- yún 云/雲 – "cloud"
- hànbǎobāo, hànbǎo 汉堡包／漢堡包, 汉堡／漢堡 – "hamburger"
- wǒ 我 – "I, me"
- rén 人 – "people"
- dìqiú 地球 – "earth"
- shǎndiàn 闪电/閃電 – "lightning"
- mèng 梦/夢 – "dream"
All varieties of modern Chinese are analytic languages, in that they depend on syntax (word order and sentence structure) rather than morphology (changes in the form of a word) to indicate a word's function in a sentence. In other words, Chinese has very few grammatical inflections: it has no tenses, no voices, no grammatical number (singular, plural; though there are plural markers, for example for personal pronouns), and only a few articles (i.e., equivalents to "the, a, an" in English). There is, however, a gender distinction in the written language (他 as "he" and 她 as "she"), but this was introduced only in the twentieth century, and both characters are pronounced in exactly the same way.
The varieties of Chinese make heavy use of grammatical particles to indicate aspect and mood. In Mandarin Chinese, this involves the use of particles like le 了 (perfective), hái 还／還 (still), yǐjīng 已经／已經 (already), and so on.
Chinese features a subject–verb–object word order, and like many other languages in East Asia, makes frequent use of the topic–comment construction to form sentences. Chinese also has an extensive system of classifiers and measure words, another trait shared with neighbouring languages like Japanese and Korean. Other notable grammatical features common to all the spoken varieties of Chinese include the use of serial verb construction, pronoun dropping and the related subject dropping.
Although the grammars of the spoken varieties share many traits, they do possess differences.
The entire Chinese character corpus since antiquity comprises well over 20,000 characters, of which only roughly 10,000 are now commonly in use. However, Chinese characters should not be confused with Chinese words; since most Chinese words are made up of two or more different characters, there are many times more Chinese words than there are characters.
Estimates of the total number of Chinese words and phrases vary greatly. The Hanyu Da Zidian, a compendium of Chinese characters, includes 54,678 head entries for characters, including oracle bone versions. The Zhonghua Zihai (1994) contains 85,568 head entries for character definitions, and is the largest reference work based purely on characters and their literary variants. The CC-CEDICT project (2010) contains 97,404 contemporary entries including idioms, technology terms and names of political figures, businesses and products. The 2009 version of the Webster's Digital Chinese Dictionary (WDCD), based on CC-CEDICT, contains over 84,000 entries.
The most comprehensive purely linguistic Chinese-language dictionary, the 12-volume Hanyu Da Cidian, records more than 23,000 head Chinese characters and gives over 370,000 definitions. The 1999 revised Cihai, a multi-volume encyclopedic dictionary, gives 122,836 vocabulary entry definitions under 19,485 Chinese characters, including proper names, phrases and common zoological, geographical, sociological, scientific and technical terms.
The 6th edition (2012) of Xiandai Hanyu Cidian, an authoritative one-volume dictionary of the modern standard Chinese language as used in mainland China, has 69,000 entries and defines 13,000 head characters.
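The claim that words far outnumber characters can be checked roughly against the dictionary figures quoted above. The following sketch divides each dictionary's entry count by its head-character count. The figures are taken from the text, and those for the Hanyu Da Cidian are lower bounds ("more than" and "over") that count definitions rather than entries, so the results are indicative only.

```python
# (entries or definitions, head characters), as quoted in the text.
DICTIONARIES = {
    "Hanyu Da Cidian": (370_000, 23_000),        # lower bounds; definitions
    "Cihai (1999)": (122_836, 19_485),
    "Xiandai Hanyu Cidian (2012)": (69_000, 13_000),
}

def entries_per_character(entries, characters):
    """Average number of entries listed under one head character."""
    return entries / characters

ratios = {
    name: round(entries_per_character(entries, characters), 1)
    for name, (entries, characters) in DICTIONARIES.items()
}

for name, ratio in ratios.items():
    print(f"{name}: about {ratio} entries per head character")
```

On these figures each head character heads roughly 5 to 16 entries, depending on the dictionary's scope.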
Like any other language, Chinese has absorbed a sizable number of loanwords from other cultures. Most Chinese words are formed out of native Chinese morphemes, including words describing imported objects and ideas. However, direct phonetic borrowing of foreign words has gone on since ancient times.
Some early Indo-European loanwords in Chinese have been proposed, notably 蜜 mì "honey", 獅 shī "lion," and perhaps also 馬 mǎ "horse", 豬 zhū "pig", 犬 quǎn "dog", and 鵝 é "goose".[c] Ancient words borrowed from along the Silk Road since Old Chinese include 葡萄 pútáo "grape", 石榴 shíliú "pomegranate" and 狮子／獅子 shīzi "lion". Some words were borrowed from Buddhist scriptures, including 佛 Fó "Buddha" and 菩萨／菩薩 Púsà "bodhisattva." Other words came from nomadic peoples to the north, such as 胡同 hútóng "hutong". Words borrowed from the peoples along the Silk Road, such as 葡萄 "grape," generally have Persian etymologies. Buddhist terminology is generally derived from Sanskrit or Pāli, the liturgical languages of North India. Words borrowed from the nomadic tribes of the Gobi, Mongolian or northeast regions generally have Altaic etymologies, such as 琵琶 pípa, the Chinese lute, or 酪 lào/luò "cheese" or "yoghurt", but from exactly which source is not always clear.
Modern borrowings and loanwords
Modern neologisms are primarily translated into Chinese in one of three ways: free translation (calque, or by meaning), phonetic translation (by sound), or a combination of the two. Today, it is much more common to use existing Chinese morphemes to coin new words for imported concepts, such as technical expressions and international scientific vocabulary. Any Latin or Greek elements are dropped and replaced with the corresponding Chinese morphemes (for example, anti- typically becomes 反, literally "opposite"), which makes the terms more comprehensible to Chinese speakers but makes foreign texts harder to understand. For example, the word telephone was loaned phonetically as 德律风／德律風 (Shanghainese: télífon [təlɪfoŋ], Mandarin: délǜfēng) during the 1920s and widely used in Shanghai, but later 电话／電話 diànhuà (lit. "electric speech"), built out of native Chinese morphemes, became prevalent (電話 is in fact from the Japanese 電話 denwa; see below for more Japanese loans). Other examples include 电视／電視 diànshì (lit. "electric vision") for television, 电脑／電腦 diànnǎo (lit. "electric brain") for computer, 手机／手機 shǒujī (lit. "hand machine") for mobile phone, 蓝牙／藍牙 lányá (lit. "blue tooth") for Bluetooth, and 网志／網誌 wǎngzhì (lit. "internet logbook") for blog in Hong Kong and Macau Cantonese. Occasionally half-transliteration, half-translation compromises (phono-semantic matching) are accepted, such as 汉堡包／漢堡包 hànbǎobāo (漢堡 hànbǎo "Hamburg" + 包 bāo "bun") for "hamburger". Sometimes translations are designed so that they sound like the original while incorporating Chinese morphemes, such as 拖拉机／拖拉機 tuōlājī "tractor" (lit. "dragging-pulling machine"), or 马利奥／馬利奧 Mǎlì'ào for the video game character Mario. This is often done for commercial purposes, for example 奔腾／奔騰 bēnténg (lit. "dashing-leaping") for Pentium and 赛百味／賽百味 Sàibǎiwèi (lit. "better-than hundred tastes") for Subway restaurants.
Foreign words, mainly proper nouns, continue to enter the Chinese language by transcription according to their pronunciations. This is done by employing Chinese characters with similar pronunciations. For example, "Israel" becomes 以色列 Yǐsèliè and "Paris" becomes 巴黎 Bālí. A rather small number of direct transliterations have survived as common words, including 沙发／沙發 shāfā "sofa", 马达／馬達 mǎdá "motor", 幽默 yōumò "humor", 逻辑／邏輯 luójí "logic", 时髦／時髦 shímáo "smart, fashionable", and 歇斯底里 xiēsīdǐlǐ "hysterics". The bulk of these words were originally coined in the Shanghai dialect during the early 20th century and were later loaned into Mandarin, so their Mandarin pronunciations may differ considerably from the English. For example, 沙发／沙發 "sofa" and 马达／馬達 "motor" sound more like their English counterparts in Shanghainese. Cantonese differs from Mandarin in some transliterations, such as 梳化 so1 faa3*2 "sofa" and 摩打 mo1 daa2 "motor".
Western words representing Western concepts have influenced Chinese since the 20th century through transcription. From French came 芭蕾 bālěi "ballet" and 香槟 xiāngbīn "champagne"; from Italian, 咖啡 kāfēi "caffè". English influence is particularly pronounced. Many English words were borrowed through early 20th century Shanghainese, such as 高尔夫／高爾夫 gāo'ěrfū "golf" and the above-mentioned 沙发／沙發 shāfā "sofa". Later, the soft influence of the United States gave rise to 迪斯科 dísīkē "disco", 可乐／可樂 kělè "cola", and 迷你 mínǐ "mini [skirt]". Contemporary colloquial Cantonese has distinct loanwords from English, such as 卡通 kaa1 tung1 "cartoon", 基佬 gei1 lou2 "gay people", 的士 dik1 si6*2 "taxi", and 巴士 baa1 si6*2 "bus". With the rising popularity of the Internet, there is a current vogue in China for coining English transliterations, for example, 粉丝／粉絲 fěnsī "fans", 黑客 hēikè "hacker" (lit. "black guest"), and 博客 bókè "blog". In Taiwan, some of these transliterations are different, such as 駭客 hàikè for "hacker" and 部落格 bùluògé for "blog" (lit. "interconnected tribes").
Another result of the English influence on Chinese is the appearance in Modern Chinese texts of so-called 字母词／字母詞 zìmǔcí (lit. "lettered words") spelled with letters from the English alphabet. These have appeared in magazines, newspapers, on web sites, and on TV: 三G手机／三G手機 "3rd generation cell phones" (三 sān "three" + G "generation" + 手机／手機 shǒujī "mobile phones"), IT界 "IT circles" (IT "information technology" + 界 jiè "industry"), HSK (Hànyǔ Shuǐpíng Kǎoshì, 汉语水平考试／漢語水平考試), GB (Guóbiāo, 国标／國標), CIF价／CIF價 (CIF "Cost, Insurance, Freight" + 价／價 jià "price"), e家庭 "e-home" (e "electronic" + 家庭 jiātíng "home"), W时代／W時代 "wireless era" (W "wireless" + 时代／時代 shídài "era"), TV族 "TV watchers" (TV "television" + 族 zú "social group; clan"), 后PC时代／後PC時代 "post-PC era" (后／後 hòu "after/post-" + PC "personal computer" + 时代／時代), and so on.
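The structure of such lettered words can be illustrated by splitting them into runs of Latin letters and runs of Chinese characters. The sketch below is a simplification under stated assumptions: it recognises only ASCII letters and the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), and the function names `segment` and `is_zimuci` are hypothetical.

```python
import re

# One alternative per script: a run of ASCII letters, or a run of
# characters from the basic CJK Unified Ideographs block.
TOKEN = re.compile(r"(?P<latin>[A-Za-z]+)|(?P<han>[\u4e00-\u9fff]+)")

def segment(word):
    """Split a word into (script, text) runs, skipping anything else."""
    return [(m.lastgroup, m.group()) for m in TOKEN.finditer(word)]

def is_zimuci(word):
    """True if the word contains at least one run of Latin letters."""
    return any(script == "latin" for script, _ in segment(word))

for word in ["三G手机", "IT界", "HSK", "e家庭", "石头"]:
    print(word, segment(word), is_zimuci(word))
```

Of the five sample words, the first four (taken from the list above) are classified as lettered words, while 石头 "stone" is not.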
Since the 20th century, another source of words has been Japanese, using existing kanji (Chinese characters used in Japanese). Japanese re-molded European concepts and inventions into wasei-kango (和製漢語, lit. "Japanese-made Chinese"), and many of these words have been re-loaned into modern Chinese. Other terms were coined by the Japanese by giving new senses to existing Chinese terms or by referring to expressions used in classical Chinese literature. For example, jīngjì (经济／經濟; 経済 keizai in Japanese), which in the original Chinese meant "the workings of the state", was narrowed to "economy" in Japanese; this narrowed definition was then re-imported into Chinese. Such terms are virtually indistinguishable from native Chinese words: indeed, for some of them there is dispute over whether the Japanese or the Chinese coined them first. As a result of this borrowing, Chinese, Korean, Japanese, and Vietnamese share a corpus of terms for modern concepts, paralleling the similar corpus of terms built from Greco-Latin and shared among European languages.