Semantic Patterns of Chinese Post-modified V+N Phrases
Likun Qiu
Institute of Computational Linguistics
Peking University
Beijing, China
Wenxian Zhang
International College for Chinese Language Studies
Peking University
Beijing, China
Abstract—Noun phrase understanding is important for many subfields of natural language processing and information retrieval. This paper proposes a classification framework for Chinese post-modified V+N phrases. The basic idea is that most noun phrases can be mapped to corresponding clauses, so case, tense, aspect and modality can be encoded in noun phrases just as they are in verb phrases. All of these factors are included in the proposed framework.
Keywords-Semantic Pattern; Noun Phrases; Case; Tense; Modality
I. Introduction
With the development of science and technology, people tend to classify things in ever finer detail. In Chinese, one important consequence is the large-scale use of base noun phrases [1]. Corpus statistics show that post-modified V+N phrases (abbreviated as VN-NP) occur 10,047 times in the People’s Daily of January 1998 (about 1.12 million words). At the same time, the study of noun phrases has attracted increasing attention [2-5].
It is generally believed that the majority of VN-NPs can be converted to corresponding clauses (for example, 武器生产人员 "weapons production staff" and 人员生产武器 "staff producing weapons"). The two forms obviously have much in common and also differ in many ways. However, their similarities and differences have been discussed mostly at the grammatical level and rarely at the semantic level. In fact, the most fundamental difference is that a small number of elements that are overt in the clause are hidden in the corresponding noun phrase, so that some lexical and grammatical meanings are expressed only implicitly in the VN-NP. For example, in a clause, the tense of the verb is usually denoted by a function word such as 了 "le", 着 "zhe" or 过 "guo". In the corresponding VN-NP, no element denotes tense explicitly, yet a native speaker may still perceive some kind of tense, aspect or modality. For instance, 移动电话 "mobile phone" denotes a kind of phone on which an action can be performed, whereas 接地回路 "ground loop" denotes a kind of loop that is in an ongoing state. In semantic derivation, 移动电话 "mobile phone" cannot be interpreted as 正在移动的电话 "a phone that is moving", and 接地回路 "ground loop" cannot be interpreted as 可以接地的回路 "a loop that can touch the ground".
Based on the above analysis, this paper proposes a semantic pattern framework for Chinese VN-NPs.
The rest of this paper is organized as follows. Section II gives the distribution of VN-NPs in a corpus. Section III presents the semantic pattern framework for Chinese VN-NPs. The final section concludes the paper.
II. Distribution of Chinese VN-NP
The Contemporary Chinese Corpus (abbreviated as PFR corpus), which is segmented and POS-tagged, contains all the news articles (about 1.12 million words) of the People’s Daily newspaper published in China in January 1998 [6]. In the PFR corpus we found 10,047 VN-NP tokens (5,110 distinct phrases after removing duplicates). In total, 1,502 nouns occur in the head position of these noun phrases, of which 362 occur more than 3 times, and 1,738 verbs occur in the subordinate position, of which 279 occur more than 4 times. Some of the high-frequency nouns and verbs are listed below.
Instances of high-frequency nouns:
能力 80, 人员 77, 问题 61, 方式 59, 制度 55, 过程 51, 情况 46, 企业 42, 机制 41, 单位 40, 水平 39, 项目 39, 计划 37, 力度 37, 作用 37, 任务 36, 措施 35, 方面 34, 时间 32, 工程 31, 阶段 31, 行为 31, 仪式 31, 方法 30, 技术 30, 部门 29, 市场 29, 条件 29, 方案 27, 结果 27, 政策 27, 资金 26, 标准 25, 设备 25, 系统 25, 形式 25, 程度 24, 队伍 24, 对象 24
Instances of high-frequency verbs:
工作 93, 有关 79, 发展 76, 管理 67, 生产 65, 经营 61, 服务 59, 生活 58, 教育 44, 扶贫 39, 研究 37, 投资 36, 创作 33, 改革 32, 销售 32, 合作 31, 贸易 31, 出口 30, 开发 30, 劳动 30, 革命 28, 旅游 28, 建设 27, 运输 24, 救灾 23
These high-frequency words are derived from news text, so they reflect only the language of news. In statistics drawn from another corpus, from the information science field, the most frequent nouns are 系统 "system", 程序 "procedure" and so on, which differs considerably from the two lists above. This illustrates a limitation: high-frequency vocabulary of this kind depends on the domain of the corpus.
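The counts reported in this section (tokens, distinct phrases, frequent head nouns and frequent modifier verbs) could be computed along the following lines. This is only a minimal sketch: it assumes the VN-NP instances have already been extracted from the tagged corpus as (verb, noun) pairs, and the pairs shown are toy data assembled from words in the lists above, not real corpus counts. The function name `summarize` and the `min_count` threshold are illustrative.

```python
from collections import Counter

# Hypothetical input: VN-NP tokens already extracted as (verb, noun) pairs.
# Toy data only; these are not counts from the PFR corpus.
vn_np_tokens = [
    ("生产", "人员"),
    ("管理", "人员"),
    ("生产", "人员"),
    ("管理", "制度"),
    ("服务", "能力"),
]


def summarize(tokens, min_count=2):
    """Count tokens, distinct phrases, and frequent head nouns / modifier verbs."""
    head_nouns = Counter(noun for _, noun in tokens)
    modifier_verbs = Counter(verb for verb, _ in tokens)
    return {
        "tokens": len(tokens),        # every occurrence counts
        "types": len(set(tokens)),    # duplicates removed
        "frequent_nouns": {n: c for n, c in head_nouns.items() if c >= min_count},
        "frequent_verbs": {v: c for v, c in modifier_verbs.items() if c >= min_count},
    }


summary = summarize(vn_np_tokens)
print(summary["tokens"], summary["types"])
```

Running the same function on corpora from different domains would make the domain dependence noted above directly visible in the `frequent_nouns` and `frequent_verbs` entries.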
III. Semantic Pattern Framework for VN-NP
All VN-NP instances share the same part-of-speech sequence. Therefore, to classify them from a semantic perspective, we need a theoretical basis beyond part of speech. The first basis is the diathesis alternation between a VN-NP and its corresponding clause. VN-NPs can be divided into two kinds according to whether they have a corresponding clause. Those that do can be classified further according to the tense, modality, aspect and case of that clause.
In contemporary Chinese, the grammatical relation within a V+N phrase may be either post-modified or predicate-object. Many phrases, such as 下载软件 "download software", 复印资料 "copy information", 学习文件 "study documents", 讨论报告 "discuss reports" and 出租汽车 "taxis for rent", may be either, depending on context. The effect of context is not addressed in this paper; we focus on the internal structure of VN-NPs and try to present a semantic pattern framework for them.
Fillmore used the case markers found in some languages as the standard for differentiating semantic cases [7-8]. Since the number of case markers varies greatly across languages, he had to adopt meaning as the criterion when a language lacks the corresponding case marker.
Likewise, we cannot find the markers needed to judge the semantic patterns of VN-NPs. Chinese clauses contain markers such as 了 "le", 着 "zhe" and 过 "guo", but no comparable set of markers appears in Chinese VN-NPs. As mentioned above, however, some VN-NPs can be converted to corresponding clauses. Therefore, the semantic patterns of VN-NPs can be differentiated by referring to the semantic relations in their corresponding clauses.
Table 1. List of Semantic Patterns
Semantic Pattern / Instances
Nominative-Panchronic / 纺织职工、抢救小组
Nominative-Infinitive / 回迁居民、出游旅客
Nominative-Progressive / 驻外人员、驻华大使
Nominative-Perfect / 获奖作品、下岗女工
Nominative-Able / 当家品种、代表人士
Negative / 不锈钢、不倒翁
Objective-Infinitive / 赞助对象、购并企业
Objective-Progressive / 持有资产、合营公司
Objective-Possible / 自选动作、共享空间
Objective-Perfect / 禁用药物、淘汰设备
Location / 建筑工地、健身场地
Instrument / 救灾药品、勘探装备
Causative / 放心岗、放心肉
Parametric / 出发日期、播出时间
Implication-N / 发行总量、进口总额
Appositional / 牺牲精神、上升趋势
Case, tense, aspect and modality are all involved in our framework, which has three categories at the first level and sixteen categories at the second level (see Table 1). The interpretation of these categories and the corresponding criteria are given in the following subsections.
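For annotation work, Table 1 can be held as a small lookup structure. The sketch below transcribes the sixteen patterns and their instances and groups them under the three first-level categories, following the grouping used in the interpretation of the patterns in Section III.B. The names `SEMANTIC_PATTERNS` and `first_level` are illustrative assumptions, not part of the framework itself.

```python
# Table 1 transcribed as data, grouped by first-level category.
# "Objective-Possible" is the label used in Table 1; the running text
# calls the same subtype "objective-able".
SEMANTIC_PATTERNS = {
    "implicit case": {
        "Nominative-Panchronic": ["纺织职工", "抢救小组"],
        "Nominative-Infinitive": ["回迁居民", "出游旅客"],
        "Nominative-Progressive": ["驻外人员", "驻华大使"],
        "Nominative-Perfect": ["获奖作品", "下岗女工"],
        "Nominative-Able": ["当家品种", "代表人士"],
        "Negative": ["不锈钢", "不倒翁"],
        "Objective-Infinitive": ["赞助对象", "购并企业"],
        "Objective-Progressive": ["持有资产", "合营公司"],
        "Objective-Possible": ["自选动作", "共享空间"],
        "Objective-Perfect": ["禁用药物", "淘汰设备"],
        "Location": ["建筑工地", "健身场地"],
        "Instrument": ["救灾药品", "勘探装备"],
        "Causative": ["放心岗", "放心肉"],
    },
    "indirect case": {
        "Parametric": ["出发日期", "播出时间"],
        "Implication-N": ["发行总量", "进口总额"],
    },
    "non-case": {
        "Appositional": ["牺牲精神", "上升趋势"],
    },
}


def first_level(pattern):
    """Return the first-level category that a second-level pattern belongs to."""
    for case_type, patterns in SEMANTIC_PATTERNS.items():
        if pattern in patterns:
            return case_type
    raise KeyError(pattern)


total = sum(len(patterns) for patterns in SEMANTIC_PATTERNS.values())
print(len(SEMANTIC_PATTERNS), total)  # prints "3 16"
```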
A. Explicit Case, Implicit Case and Indirect Case
The first level is differentiated by the type of case. We distinguish three types of case: explicit case, implicit case and indirect case.
[1] Explicit Case: For instance, in the clause 小王吃苹果 "Xiaowang ate an apple", the case relations between 小王 "Xiaowang", 苹果 "apple" and the verb 吃 "eat" are explicit. An explicit case relation usually holds between the predicate and its subject and object.
[2] Implicit Case: In the phrase 小王吃的苹果 "the apple eaten by Xiaowang", 吃 "eat" is the predicate verb of the attributive clause 小王吃 "Xiaowang eats", and the relation between 小王 "Xiaowang" and 吃 "eat" is an implicit relation. In VN-NPs such as 物业管理 "property management" and 管理人员 "manager", the case relations between 物业 "property" and 管理 "management", and between 人员 "personnel" and 管理 "manage", are also implicit. An implicit case relation usually holds between the head word and its attributive constituent.
[3] Indirect Case: For instance, in the phrases 石油公司 "petroleum company" and 手机工厂 "mobile phone factory", the case relation between the two nouns of each phrase is indirect: the predicate verbs 生产 "produce" and 制造 "manufacture" are hidden. In the phrases 销售数量 "quantity of sales" and 出口总额 "total value of exports", the case relations between the verb and the head noun are also indirect: a word that relates the verb and the noun has been hidden.
If a VN-NP cannot be converted to a corresponding clause, no case relation of any kind exists between the verb and the noun. This is referred to as a non-case relation.
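The first-level distinction for VN-NPs can be summarized as a two-question decision. The sketch below is only an illustration of that logic: both inputs are judgments an annotator would have to supply, since the paper gives no automatic test for them, and the names `CaseRelation` and `first_level_relation` are hypothetical. Explicit case is left out because it holds in clauses rather than inside a VN-NP.

```python
from enum import Enum


class CaseRelation(Enum):
    IMPLICIT = "implicit case"
    INDIRECT = "indirect case"
    NON_CASE = "non-case"


def first_level_relation(has_corresponding_clause, needs_hidden_word):
    """Map two annotator judgments to a first-level case relation.

    has_corresponding_clause: can the VN-NP be converted to a clause?
    needs_hidden_word: must a hidden word be restored to relate V and N?
    """
    if not has_corresponding_clause:
        return CaseRelation.NON_CASE
    if needs_hidden_word:
        return CaseRelation.INDIRECT
    return CaseRelation.IMPLICIT


# 管理人员: convertible to a clause, no hidden word needed.
print(first_level_relation(True, False).value)
# 出口总额: a word relating the verb and the noun is hidden.
print(first_level_relation(True, True).value)
```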
B. Interpretation of Semantic Patterns
In the proposed framework, semantic patterns are first classified into three types: implicit case, indirect case and non-case. The three types can be further classified according to the specific case type, tense, aspect and modality. The specific case types are nominative, objective, location and instrument. The subtypes of tense, aspect and modality are panchronic, infinitive, progressive, perfect, able and negative.
The interpretations of all the patterns are given below.
a) Patterns with Implicit Case
[1] Panchronic
In a VN-NP, if V denotes the function or utility of N, the phrase is considered panchronic, since function and utility are properties of N that hold beyond any specific time. Three cases, namely nominative, location and instrument, can combine with the panchronic type. It is worth noting that both location and instrument can be regarded as special nominatives in a broad sense.
Nominative-Panchronic: In this type of VN-NP, N is usually animate and V denotes the function of N.
Location-Panchronic: In this type of VN-NP, N denotes a specific location and V denotes the action that happens at that location. Accordingly, V becomes the function of N, and the nominative of V does not appear.
Instrument-Panchronic: In this type of VN-NP, N, which denotes a kind of instrument, is usually inanimate, and V denotes the utility of N.
[2] Infinitive
In a VN-NP, if the action denoted by V might have happened, might be ongoing or might happen in the future, the tense and aspect of this action are referred to as infinitive. Nominative and objective can combine with the infinitive type, forming nominative-infinitive and objective-infinitive VN-NPs.
[3] Perfect
In a VN-NP, if the action has already happened, it is referred to as the perfect type. Nominative and objective can combine with the perfect type, forming nominative-perfect and objective-perfect VN-NPs.
[4] Progressive
In a VN-NP, if the action is ongoing and will continue for some time, it is referred to as the progressive type. Nominative and objective can combine with the progressive type, forming nominative-progressive and objective-progressive VN-NPs.
[5] Able
In a VN-NP, if the thing denoted by N has the ability to perform the action denoted by V, it is referred to as the able type. Nominative and objective can combine with the able type, forming nominative-able and objective-able VN-NPs.
Nominative-Able: In this type of VN-NP, V denotes a kind of ability and N denotes somebody or something that has the ability.
Objective-Able: In this type of VN-NP, N denotes an object on which a certain behavior can usually be imposed. The construction meaning of this type of VN-NP is "a kind of N that might be Ved".
[6] Negative
This kind of VN-NP usually contains a negative word such as 不 "not".
[7] Causative
The construction meaning of this kind of VN-NP is "N causes somebody to V", in which N is usually the agent of V.
b) Patterns with Indirect Case
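The combinations described above, that is, which specific cases may co-occur with which tense, aspect or modality subtype, can be written down as a small compatibility table. The sketch below is an assumption-laden illustration: the "Case-Subtype" label format and the names `COMPATIBLE_CASES` and `pattern_label` are hypothetical, and the negative and causative types are omitted because no case pairing is stated for them.

```python
# Specific cases that may combine with each subtype, as stated in the
# descriptions of the implicit-case patterns.
COMPATIBLE_CASES = {
    "Panchronic": {"Nominative", "Location", "Instrument"},
    "Infinitive": {"Nominative", "Objective"},
    "Perfect": {"Nominative", "Objective"},
    "Progressive": {"Nominative", "Objective"},
    "Able": {"Nominative", "Objective"},
}


def pattern_label(case, subtype):
    """Build a 'Case-Subtype' label, rejecting combinations the text does not list."""
    allowed = COMPATIBLE_CASES.get(subtype, set())
    if case not in allowed:
        raise ValueError(f"{case} is not described as combining with {subtype}")
    return f"{case}-{subtype}"


print(pattern_label("Nominative", "Perfect"))
print(pattern_label("Instrument", "Panchronic"))
```

A check of this kind could be used to catch inconsistent labels during manual annotation, for example a location noun tagged with the perfect subtype.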
[1] Parametric
In some VN-NPs with an indirect case relation, it is difficult to find an appropriate verb to restore the event structure. However, we can take the N in the VN-NP as a parameter of the event denoted by V and suppose that there exists a hidden verb relating V and N. This type of VN-NP is referred to as the parametric type.
In the corresponding clause, the N of the VN-NP, which denotes the parameter, usually appears as a prepositional object. For instance, the VN-NP 连通时间 "connection time" has the corresponding clause 用了许多时间连通 "spent a lot of time connecting".
[2] Implication-N
In some VN-NPs with an indirect case relation, V and N may also be connected by a hidden noun, which usually occurs in the context of the NP. For instance, in the VN-NP 发行总量 "total circulation", a noun such as 报纸 "newspaper" or 图书 "book" may be hidden, connecting the V 发行 and the N 总量. This type of VN-NP is referred to as the Implication-N type.
Only a few nouns commonly occur in this type of VN-NP. Some of them are listed as follows: 总量、总额、数量、总值、总数、限额、数额. Most of them denote quantity.
c) Non-case Patterns
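Because the head nouns of this type form a small set, a simple lexical check can flag candidate Implication-N phrases for an annotator to confirm. The sketch below assumes the VN-NP has already been split into verb and head noun. The set contains only the nouns listed above, which the text presents as a partial list, so a negative result does not rule the pattern out. The names `IMPLICATION_N_HEADS` and `may_be_implication_n` are illustrative.

```python
# Head nouns listed in the text as typical of the Implication-N type.
# The list is partial, so this is a candidate filter, not a classifier.
IMPLICATION_N_HEADS = {"总量", "总额", "数量", "总值", "总数", "限额", "数额"}


def may_be_implication_n(head_noun):
    """Return True if the head noun is one of the listed quantity nouns."""
    return head_noun in IMPLICATION_N_HEADS


print(may_be_implication_n("总量"))  # head of 发行总量
print(may_be_implication_n("时间"))  # head of 连通时间, a parametric example
```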
[1] Appositional
This type of VN-NP is special: there is no case relation of any kind between V and N. Instead, N usually denotes the category of V. Therefore, the construction meaning of this type of VN-NP can be stated as "V is a kind of N".
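The construction meanings stated in this section can be treated as paraphrase templates. The sketch below covers only the three patterns for which the text gives an explicit construction meaning (objective-able, causative and appositional) and returns nothing for the rest. The names `CONSTRUCTION_MEANINGS` and `gloss` are hypothetical, and the output simply fills the Chinese V and N into the English template.

```python
# Construction meanings given in the text, as fill-in templates.
CONSTRUCTION_MEANINGS = {
    "Objective-Able": "a kind of {n} that might be {v}-ed",
    "Causative": "{n} causes somebody to {v}",
    "Appositional": "{v} is a kind of {n}",
}


def gloss(pattern, v, n):
    """Fill V and N into the pattern's template, or return None if none is given."""
    template = CONSTRUCTION_MEANINGS.get(pattern)
    if template is None:
        return None
    return template.format(v=v, n=n)


# 牺牲精神 and 放心肉 are instances from Table 1.
print(gloss("Appositional", "牺牲", "精神"))
print(gloss("Causative", "放心", "肉"))
```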
IV. Conclusion and Future Work
Semantic analysis is a challenging area. This paper presents a semantic pattern framework for interpreting Chinese VN-NPs, which we expect to be useful for natural language processing, especially the analysis of noun phrases.
Next, based on the proposed framework, we plan to carry out analysis and annotation of VN-NPs and hope to develop models for automatic semantic pattern classification.
Acknowledgements
This work was supported by the Open Project Program of the National Laboratory of Pattern Recognition (NLPR).
References
[1] Chengxing Jin, Guozheng Hu. 2002. Study on Nominalization Tendency in Science English. Journal of Wuyi University (Social Science Version), 4(2).
[2] Wenji Le, Ming Zhou, et al. 1995. Corpus-based Maximal-length Chinese Noun Phrase Extraction. In: Chen Liwei, Yuan Qi (Eds.), Advances and Applications on Computational Linguistics. Beijing: Tsinghua University Press, pages 119-124.
[3] Qiang Zhou, Maosong Sun, Changning Huang. 2000. Automatic Identification of Chinese Maximal Noun Phrases. Journal of Software, 11(2): 195-201.
[4] Jinxia Li. 2002. Study on Ambiguity of "V+N" and Related Structure in Modern Chinese (Thesis in Chinese). Beijing: Graduate School of Chinese Academy of Social Sciences.
[5] Chunyang Song. 2005. A Logic Semantics Study on "noun+noun" of Modern Chinese for Chinese Information Processing. Shanghai: Xuelin Press.
[6] Shiwen Yu, Huiming Duan, Xuefeng Zhu, Bin Swen, Baobao Chang. 2003. Specification for Corpus Processing at Peking University: Word Segmentation, POS Tagging and Phonetic Notation. Journal of Chinese Language and Computing, 13(2): 121-158.
[7] Charles J. Fillmore. 1968. The Case for Case. In: Bach and Harms (Eds.), Universals in Linguistic Theory. New York: Holt, Rinehart, and Winston, pages 1-88.
[8] Charles J. Fillmore. Frame Semantics. In: Linguistics in the Morning Calm. Seoul, South Korea: Hanshin Publishing Company, pages 111-137.