Research Brief: Corpus Analysis of Nominal Phrases with Numeral Classifiers in the Chinese System

March 03, 2021
by Yamei Wang

Classifier is a general term in linguistics for noun classification devices, including noun class markers, possessive classifiers, locative classifiers, and numeral classifiers. Numeral classifiers are an “areal feature” of East and Southeast Asian languages such as Mandarin Chinese. There is a dispute over whether numeral classifiers in Chinese are primarily syntactic or semantic in nature. Previous psycholinguistic experiments suggest that numeral classifiers differ in how much semantic and syntactic information L1 and L2 Mandarin speakers can use to predict an upcoming noun, so the dispute can only be resolved by examining specific classifiers or phrases. To investigate how a numeral classifier works with other parts of speech in nominal phrases to achieve a communicative function, a large-scale corpus analysis of English and Mandarin has been conducted. Nominal phrases have been extracted from EWT (English Web Treebank), and nominal phrases with classifiers from GSDSimp (Simplified Chinese Universal Dependencies Dataset).
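The brief does not spell out the extraction step. Universal Dependencies treebanks such as GSDSimp are distributed in the CoNLL-U format, so a minimal sketch of one way to pull classifier-noun pairs out of such a file might look like the following. It assumes that each classifier is attached to its head noun with the UD clf relation; the function names and the two-sentence sample are hypothetical and hand-constructed, not taken from the corpus.

```python
from collections import Counter

# Hand-constructed toy sample in CoNLL-U format (hypothetical, not GSDSimp data).
TOY_CONLLU = """# text = 我有三本书
1\t我\t我\tPRON\t_\t_\t2\tnsubj\t_\t_
2\t有\t有\tVERB\t_\t_\t0\troot\t_\t_
3\t三\t三\tNUM\t_\t_\t5\tnummod\t_\t_
4\t本\t本\tNOUN\t_\t_\t5\tclf\t_\t_
5\t书\t书\tNOUN\t_\t_\t2\tobj\t_\t_

# text = 他是一个学生
1\t他\t他\tPRON\t_\t_\t5\tnsubj\t_\t_
2\t是\t是\tAUX\t_\t_\t5\tcop\t_\t_
3\t一\t一\tNUM\t_\t_\t5\tnummod\t_\t_
4\t个\t个\tNOUN\t_\t_\t5\tclf\t_\t_
5\t学生\t学生\tNOUN\t_\t_\t0\troot\t_\t_
"""


def parse_conllu(text):
    """Yield each sentence as a list of token dicts.

    Comment lines are skipped, as are multiword-token ranges ("3-4") and
    empty nodes ("5.1"), so every kept token has integer ID and HEAD values.
    """
    sentence = []
    for line in text.splitlines():
        if not line.strip():  # a blank line ends a sentence
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue
        sentence.append({"id": int(cols[0]), "form": cols[1],
                         "head": int(cols[6]), "deprel": cols[7]})
    if sentence:  # the text may not end with a blank line
        yield sentence


def classifier_noun_pairs(sentences):
    """Count (classifier, noun) pairs, assuming each classifier attaches to
    its noun with the UD clf relation (subtypes such as clf:x included)."""
    pairs = Counter()
    for sent in sentences:
        by_id = {tok["id"]: tok for tok in sent}
        for tok in sent:
            head = by_id.get(tok["head"])
            if tok["deprel"].split(":")[0] == "clf" and head is not None:
                pairs[(tok["form"], head["form"])] += 1
    return pairs


pairs = classifier_noun_pairs(parse_conllu(TOY_CONLLU))
print(pairs)
```

Counting pairs rather than storing phrases keeps the output directly usable for frequency or entropy measures over classifiers and nouns.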

Human language is a natural and efficient communication system that humans use to transmit information. Not all languages share the same set of parts of speech, and it remains unclear how numeral classifiers help a language system maintain a functional equilibrium. In particular, this study asks how the presence of numeral classifiers affects other parts of speech in the language system. By comparing classifiers in Chinese with prenominal adjectives in English through corpus analysis, this study would also bear on the aforementioned dispute over whether numeral classifiers are more syntactically or semantically based. One ambitious goal of linguistic study is to identify the invariant “universal” properties that might underpin the fundamental human capacity for language amid remarkable diversity. This study might shed light on the existence of strong cross-linguistic word order preferences for nominal modifiers.
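The comparison with English prenominal adjectives could start from the same kind of dependency annotation. A minimal sketch, assuming the EWT sentences have already been read into (ID, FORM, UPOS, HEAD, DEPREL) tuples and that adjectival modifiers carry the UD amod relation, separates adjective-noun pairs by whether the adjective precedes or follows its noun. The function name and the toy sentences are hypothetical.

```python
from collections import Counter

# Heads counted as nouns for this sketch (an assumption; PRON could be added).
NOMINAL_UPOS = {"NOUN", "PROPN"}


def adjective_noun_pairs(sentences):
    """Split amod adjective-noun pairs by word order.

    Each sentence is a list of (id, form, upos, head, deprel) tuples.
    Returns (prenominal, postnominal) Counters keyed by (adjective, noun).
    """
    prenominal, postnominal = Counter(), Counter()
    for sent in sentences:
        by_id = {tok[0]: tok for tok in sent}
        for tok_id, form, upos, head, deprel in sent:
            if upos != "ADJ" or deprel.split(":")[0] != "amod":
                continue
            noun = by_id.get(head)
            if noun is None or noun[2] not in NOMINAL_UPOS:
                continue
            key = (form.lower(), noun[1].lower())
            if tok_id < head:
                prenominal[key] += 1   # adjective precedes its noun
            else:
                postnominal[key] += 1  # adjective follows its noun
    return prenominal, postnominal


# Hand-constructed toy sentences (hypothetical, not EWT data):
# "the big red dog" and "a room full of people".
TOY = [
    [(1, "the", "DET", 4, "det"), (2, "big", "ADJ", 4, "amod"),
     (3, "red", "ADJ", 4, "amod"), (4, "dog", "NOUN", 0, "root")],
    [(1, "a", "DET", 2, "det"), (2, "room", "NOUN", 0, "root"),
     (3, "full", "ADJ", 2, "amod"), (4, "of", "ADP", 5, "case"),
     (5, "people", "NOUN", 3, "obl")],
]
pre, post = adjective_noun_pairs(TOY)
print(pre, post)
```

Keeping the postnominal counts alongside the prenominal ones makes it possible to measure how strong the prenominal preference actually is in the corpus rather than assuming it.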

The work presented above constitutes a preliminary report from my dissertation proposal. A specific calculation method still needs to be adopted to quantify the entropy of the various parts of speech; in information theory, entropy is a measure of uncertainty, here the uncertainty within phrases. Other dialects of Chinese, such as Cantonese, should also be taken into consideration to get a complete view of the classifier system.
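The brief leaves the entropy measure open. One possible choice, offered here only as an assumption, is the conditional entropy of a noun given the word that precedes it in the phrase (a classifier in Mandarin, a prenominal adjective in English): H(N | C) = sum over c of p(c) H(N | C = c), where H(X) = -sum over x of p(x) log2 p(x). The difference H(N) - H(N | C) then measures, in bits, how much the modifier reduces uncertainty about the upcoming noun. The sketch below computes both quantities from (context, noun) pair counts; the function names and the counts in the example are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def entropy(counts):
    """Shannon entropy in bits: H(X) = -sum_x p(x) * log2 p(x)."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values() if c > 0)


def conditional_entropy(pair_counts):
    """H(N | C) in bits from (context, noun) counts:
    H(N | C) = sum_c p(c) * H(N | C = c)."""
    by_context = defaultdict(Counter)
    for (context, noun), c in pair_counts.items():
        by_context[context][noun] += c
    total = sum(pair_counts.values())
    return sum((sum(nouns.values()) / total) * entropy(nouns)
               for nouns in by_context.values())


# Hypothetical toy counts of (classifier, noun) pairs, not corpus data.
toy = Counter({("本", "书"): 2, ("本", "杂志"): 2, ("个", "学生"): 4})

# Marginal noun counts, ignoring the classifier.
noun_counts = Counter()
for (_, noun), c in toy.items():
    noun_counts[noun] += c

print(entropy(noun_counts))                              # 1.5
print(conditional_entropy(toy))                          # 0.5
print(entropy(noun_counts) - conditional_entropy(toy))   # 1.0
```

In this toy case, knowing the classifier removes one of the 1.5 bits of uncertainty about the noun: 个 fully determines 学生, while 本 still leaves a choice between two nouns.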