Announcement on the Update of HowNet
Canada Keentime Inc. will upgrade HowNet to the 2012 version for all its clients.
The update will start on December 1, 2012. All clients are requested to contact
us for the update.
Outline of the Update of HowNet 2012
This is the most important upgrade of HowNet since its release. With this upgrade we provide some of our most recently developed meaning-computation tools as well as new basic data resources, including the dictionary and the rule bank.
1. HowNet_Browser
2. Concept_Relevance_Calculater, CRC
3. Concept_Similarity_Measure, CSM
4. HowNet_Inference_Machine, IM
5. Snese_Colony_Tester, SCT
6. Chinese_Word_Processor, CWP
7. Chinese_VXY_Structure_Determiner, VXY
8. Chinese_VN_Structure_Determiner, VN
A brief summary of the update is presented as follows.
1. HowNet_Browser
a. A large amount of basic data has been added and modified.
a-1 13,000 rarely used Chinese characters have been added to meet the likely demands of big-data processing, especially the handling of out-of-vocabulary (OOV) words.
a-2 Detailed English syntactic features have been added to all records of the basic data in the HowNet Dictionary. For example, for “ask”:
Previous version: W_E=ask
G_E=verb
Present version: W_E=ask
G_E=verb [2 ask verb -0 vt,dobj,sobj,whobj,ofnpa 22 ]
The detailed English syntactic features can be used in English parsing.
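As a minimal sketch of how such a record might be read before parsing, the Python below splits a G_E value into its part of speech and the bracketed tokens. The function name `parse_g_e` and the returned dictionary layout are assumptions made for illustration. The meaning of the numeric fields (“2”, “-0”, “22”) is not explained here, so they are kept as raw tokens, and only the comma-separated token is treated as the feature list.

```python
import re

# Part of speech, optionally followed by a bracketed feature block.
G_E_RE = re.compile(r"^\s*(\w+)\s*(?:\[(.*)\])?\s*$")


def parse_g_e(value):
    """Split a G_E value into part of speech, raw tokens and features.

    Hypothetical helper: the field layout is inferred from the single
    "ask" example, not from any HowNet specification.
    """
    match = G_E_RE.match(value)
    if match is None:
        raise ValueError(f"unrecognised G_E value: {value!r}")
    pos, body = match.group(1), match.group(2)
    tokens = body.split() if body else []
    features = []
    for token in tokens:
        # Assumption: the comma-separated token carries the syntactic features.
        if "," in token:
            features = token.split(",")
            break
    return {"pos": pos, "tokens": tokens, "features": features}


old_record = parse_g_e("verb")
new_record = parse_g_e("verb [2 ask verb -0 vt,dobj,sobj,whobj,ofnpa 22 ]")
print(new_record["features"])
```

A record in the previous format simply yields an empty feature list, so the same reader can handle both versions.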
b. All functions of the Browser have been released, for example:
b-1. The basic management tool for adding, modifying and deleting records. Users can handle the dictionary data according to the needs of their applications. However, users are requested to follow the guidelines for these functions strictly, to avoid any damage to the dictionary.
b-2. Full access to all right-click menu items, including Export, Expand, Collapse…
2. Concept_Relevance_Calculater, CRC
The major modification to CRC is the recognition of the cognate relationship between a CoEvent and its corresponding event. For example, “marriage” and “marry”, or “battle” and “fight”, should be highly relevant. Let’s have a look at their definitions in HowNet.
Marriage: {fact|事情:CoEvent={GetMarried|结婚}}
Marry: {GetMarried|结婚:RelateTo={human|人:belong={family|家庭},
modifier={female|女}{spouse|配偶}}{human|人:belong={family|家庭},
modifier={male|男}{spouse|配偶}}}
Battle (战争): {fact|事情:CoEvent={fight|争斗},domain={military|军}}
Fight (打仗): {fight|争斗:domain={military|军}}
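The cognate relationship visible in these definitions can be sketched as follows: one concept names an event in its CoEvent slot, and the other concept has that same event as its head sememe. This is only an illustration of the idea, not the CRC implementation. The function names and the regular expressions are assumptions that cover just the definition strings shown above.

```python
import re

# Head sememe: the first "english|中文" pair after the opening brace.
HEAD_RE = re.compile(r"^\{([^:{}]+)")
# CoEvent value: the sememe inside CoEvent={...}.
COEVENT_RE = re.compile(r"CoEvent=\{([^:{}]+)")


def head_sememe(definition):
    match = HEAD_RE.match(definition.strip())
    return match.group(1) if match else None


def co_event(definition):
    match = COEVENT_RE.search(definition)
    return match.group(1) if match else None


def is_cognate(def_a, def_b):
    """True if one definition's CoEvent is the other's head sememe."""
    ce_a, ce_b = co_event(def_a), co_event(def_b)
    return (ce_a is not None and ce_a == head_sememe(def_b)) or (
        ce_b is not None and ce_b == head_sememe(def_a)
    )


MARRIAGE = "{fact|事情:CoEvent={GetMarried|结婚}}"
MARRY = (
    "{GetMarried|结婚:RelateTo={human|人:belong={family|家庭},"
    "modifier={female|女}{spouse|配偶}}{human|人:belong={family|家庭},"
    "modifier={male|男}{spouse|配偶}}}"
)
BATTLE = "{fact|事情:CoEvent={fight|争斗},domain={military|军}}"
FIGHT = "{fight|争斗:domain={military|军}}"

print(is_cognate(MARRIAGE, MARRY))   # True
print(is_cognate(BATTLE, FIGHT))     # True
print(is_cognate(MARRIAGE, FIGHT))   # False
```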
3. Concept_Similarity_Measure, CSM
a. Note that HowNet computes over concepts, not words. The similarity measurement by HowNet is therefore independent of any specific language.
b. Similarity is different from relevance, and the two should not be confused. Concepts that are highly relevant to each other may have very low similarity, for instance “doctor” and “patient”. These two are highly related because they share the event “to treat”, but in terms of similarity they are far apart, because “doctor” is the agent of the event “to treat” while “patient” is its target.
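The distinction can be made concrete with a deliberately crude toy model. The `Concept` class and the two predicates below are assumptions introduced only for illustration, and they bear no relation to the actual CSM or CRC computation. Relevance is modelled as sharing an event, and similarity as additionally playing the same role in it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    name: str
    event: str  # the event the concept takes part in
    role: str   # the role it plays in that event


def is_relevant(a, b):
    # Toy criterion: two concepts are relevant if they share an event.
    return a.event == b.event


def is_similar(a, b):
    # Toy criterion: similar concepts also play the same role in the event.
    return a.event == b.event and a.role == b.role


doctor = Concept("doctor", event="treat", role="agent")
patient = Concept("patient", event="treat", role="target")

print(is_relevant(doctor, patient))  # True
print(is_similar(doctor, patient))   # False
```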
4. HowNet_Inference_Machine, IM
This is a newly developed tool. To date, it can be used to construct a concept relevance field, also called a “bag of concepts”, by means of concept relevance rules. The concept relevance field built by HowNet_Inference_Machine (IM) differs from the one built by Concept_Relevance_Calculater (CRC). The IM relevance field is constructed from rules specified by the user, and the strength of the rules helps the user build powerful tracks of association. Take “bank” (in the sense of river bank only) as an example: the IM association track goes bank – waters – land – fish – fishing, whereas the CRC relevance field is confined to the track bank – waters – land. Similarly, with “buy” as the keyword, the CRC field gives only the track buy – purchase location – buyer – buying manner – sell, while the IM field covers buy – purchase location – buyer – buying manner – sell – select – pay – price – cheap/expensive – money – other concepts of the commercial domain. It almost forms a script with an event at its centre.
HowNet_Inference_Machine will be further developed into other types of meaning-computation tools, such as an “inference device for event relationships”. Users are expected to study the guidelines for the IM rule base carefully and learn to modify and write rules to meet their own needs.
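The way user-specified rules extend an association track, as in the “bank” example, can be sketched as a breadth-first expansion over a rule table. The rule format (a plain mapping from a concept to its associated concepts), the function name `build_relevance_field` and the `max_depth` parameter are all hypothetical. The actual format of the IM rule base is defined by its own guidelines.

```python
from collections import deque


def build_relevance_field(seed, rules, max_depth):
    """Collect concepts reachable from `seed` within `max_depth` rule steps."""
    field = [seed]
    seen = {seed}
    queue = deque([(seed, 0)])
    while queue:
        concept, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the allowed number of steps
        for associated in rules.get(concept, []):
            if associated not in seen:
                seen.add(associated)
                field.append(associated)
                queue.append((associated, depth + 1))
    return field


# Hypothetical rules, written only to reproduce the track in the text.
RULES = {
    "bank": ["waters", "land"],
    "waters": ["fish"],
    "fish": ["fishing"],
}

print(build_relevance_field("bank", RULES, max_depth=3))
# ['bank', 'waters', 'land', 'fish', 'fishing']
```

In this sketch, adding or removing entries in `RULES` changes how far the association track reaches, which is the kind of control that user-written rules are meant to give.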
5. Snese_Colony_Tester, SCT
This is also a newly developed tool, and HowNet_Inference_Machine is its indispensable basis. After an English or Chinese text is input into the Snese_Colony_Tester, the Tester first lists all the senses of each word and expression in the text, and then shows a value for each sense, computed from the contribution that the other senses in the text make to it. The sense contribution is calculated on the basis of the relevance field given by the HowNet_Inference_Machine.
Let’s take the following text for an experiment:
“There was a very serious accident on the Beijing-Tangjin highway on Tuesday
morning, resulting in a brutal jam. Victims say the traffic started to slow down
around 4 o'clock Monday morning.
Vehicles stuck on the highway only moved about 20 kilometers over the next 20
hours. Their biggest challenge, besides boredom, was the heat, and lack of food
and drink. Around 5 o'clock in the afternoon, police started sending food and drink
supplies to the stuck travelers.”
The sense testing values given by the Tester for the word “jam” are 0.0270616606 (traffic jam) and 0.0002705628 (food jam).
Let’s compare how two machine translation systems handle the word “jam”.
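The idea of scoring a sense by the contribution of the other senses in the text can be sketched as below. This is not the Tester's formula, which is not given here, and the scores it produces do not correspond to the values reported above. The sense labels, the relevance fields and the scoring rule (the share of context words having at least one sense inside the candidate's relevance field) are all assumptions made for illustration.

```python
def score_senses(target_senses, context_senses, relevance_field):
    """Score each candidate sense by the support it gets from the context.

    target_senses:   candidate senses of the word being tested
    context_senses:  one list of senses per other word in the text
    relevance_field: hypothetical mapping from a sense to its related senses
    """
    scores = {}
    for sense in target_senses:
        field = relevance_field.get(sense, set())
        supporting = sum(
            1 for senses in context_senses if any(s in field for s in senses)
        )
        scores[sense] = supporting / len(context_senses) if context_senses else 0.0
    return scores


# Hypothetical data loosely modelled on the test text.
FIELDS = {
    "jam/traffic": {"highway", "vehicle", "traffic"},
    "jam/food": {"food"},
}
CONTEXT = [["highway"], ["vehicle"], ["traffic"], ["food"]]

scores = score_senses(["jam/traffic", "jam/food"], CONTEXT, FIELDS)
print(max(scores, key=scores.get))  # jam/traffic
```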
MT 1
有一个非常严重的事故,在北京唐津高速公路上周二上午,在残酷的果酱。受
害人说,交通开始放缓星期一早上4点钟左右。
在未来20小时内,滞留在高速公路上的车辆只移动了约20公里。他们最大
的挑战,除了无聊,热,缺乏食物和饮料。在下午5点左右,警方开始卡住
的旅客送食物和饮用品。
MT 2
有一个非常严重的事故,在beijing-tangjin公路上星期二上午,造成一个
残酷的果酱。受害者说,交通开始减慢,在星期一早上四点。
汽车陷在高速公路上只有约20公里,在接下来的20个小时。他们最大
的挑战,除了无聊,是热,和缺乏食物和饮料。下午五点左右,警方开始
将食品和饮料供应的滞留旅客。
Both systems give the wrong Chinese translation, 果酱 (“food jam”). Discourse-based word sense disambiguation (WSD) is still a very hard nut to crack for state-of-the-art human language technology.
6. Chinese_Word_Processor, CWP
The Chinese_Word_Processor is based entirely on the HowNet dictionary. It provides the basic function of Chinese word segmentation and is now used in Snese_Colony_Tester. Users are expected to develop their own word processor in combination with their own word segmentation tool.
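For readers building their own dictionary-based segmenter, forward maximum matching is one common starting point. The sketch below illustrates that general technique. It is not stated which method Chinese_Word_Processor itself uses, and the small word set stands in for a real dictionary.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Segment `text` greedily, preferring the longest dictionary word."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


# Stand-in dictionary; a real one would be loaded from dictionary data.
WORDS = {"增加", "产品", "花色", "品种", "花色品种"}

print(forward_max_match("增加产品花色品种", WORDS))
# ['增加', '产品', '花色品种']
```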
7. Chinese_VXY_Structure_Determiner, VXY
By VXY structure we mean an ambiguous structure that occurs frequently in Chinese, for example “削苹果的刀” (literally “peel apple’s knife”) or “削苹果的皮” (literally “peel apple’s peel”). The ambiguity lies in what the verb “peel” governs: it may govern either the noun “apple” or the noun “knife”/“peel”. In most cases, disambiguating this type of structure depends mainly on semantics rather than syntax. In these structures “V” indicates a verb, while “X” and “Y” are mostly “N”, but may also be “A” or “V”.
When testing the tool, the user puts V, X and Y into three separate boxes, for example 增加 (V) 产品 (X) 花色品种 (Y), “increase the variety of products”. The result “TYPE 1” is then shown, which indicates that “V” governs “Y”. The result “TYPE 2” indicates that “V” governs “X”, as with the input 增加 (V) 花色品种 (X) 途径 (Y), “ways of increasing variety”. Finally, the result “TYPE 未知类型” (unknown type) indicates that the government of “V” remains ambiguous, as in “骂老师的孩子”, which can mean either “scold the teacher’s child” or “the child who scolds the teacher”. In the “rule box” of the GUI, the rule or rules applied are displayed.
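The input and output behaviour described above (three inputs, a TYPE result and the applied rule) can be sketched with a small ordered rule list. The two rules and the knowledge tables below are illustrative assumptions, not rules taken from the tool. A real determiner would consult HowNet's semantic definitions rather than hand-written pairs.

```python
# Hypothetical knowledge tables standing in for HowNet lookups.
PART_OR_ATTRIBUTE_OF = {("花色品种", "产品"), ("皮", "苹果")}  # (Y, X)
MEANS_OF_EVENT = {("途径", "增加"), ("刀", "削")}              # (Y, V)

UNKNOWN = "TYPE 未知类型"


def rule_y_belongs_to_x(v, x, y):
    """If Y is a part or attribute of X, assume V governs Y (TYPE 1)."""
    return "TYPE 1" if (y, x) in PART_OR_ATTRIBUTE_OF else None


def rule_y_is_means_of_v(v, x, y):
    """If Y is a means or instrument of V, assume V governs X (TYPE 2)."""
    return "TYPE 2" if (y, v) in MEANS_OF_EVENT else None


RULES = [rule_y_belongs_to_x, rule_y_is_means_of_v]


def determine_vxy(v, x, y):
    """Return (result type, name of the rule applied or None)."""
    for rule in RULES:
        result = rule(v, x, y)
        if result is not None:
            return result, rule.__name__
    return UNKNOWN, None


print(determine_vxy("增加", "产品", "花色品种"))  # ('TYPE 1', 'rule_y_belongs_to_x')
print(determine_vxy("增加", "花色品种", "途径"))  # ('TYPE 2', 'rule_y_is_means_of_v')
print(determine_vxy("骂", "老师", "孩子"))        # ('TYPE 未知类型', None)
```

Returning the rule name alongside the type mirrors the “rule box” in the GUI, which makes it easier to see why a given input was classified as it was.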
8. Chinese_VN_Structure_Determiner, VN
By VN structure we mean a semantic relationship between an event and a thing, rather than a syntactic relationship between a verb and a noun, for example 运输-旅客 (transport-passenger), 运输-工具 (transport-vehicle), 海洋-运输 (sea-transport), 话剧-演员 (drama-actor), 汽车-制造商 (car-manufacturer). From HowNet’s viewpoint, “drama-actor” should be interpreted as “an actor who plays drama” rather than as the syntactic attributive-headword relationship. We believe this semantic representation will be more useful for machine reading and understanding. So far, Chinese_VN_Structure_Determiner is only a prototype. We hope our users will develop and enhance it further.
In addition, a HowNet-based English-to-Chinese machine translation system has been developed. It is a rule-based system. One purpose of developing the system is to test and tune HowNet and to compare MT systems with and without the support of HowNet. The HowNet MT system is a self-contained practical product that will be sold separately from HowNet.