IndoWordNet Database Design
Status: Draft
This document specifies a database design for maintaining the IndoWordNet data, and invites discussion and suggestions for improvement.
Abstract:
IndoWordNet is a multilingual WordNet which links the WordNets of different Indian languages on a common identification number (id). A WordNet is a crucial resource for a language and aids NLP tasks such as Machine Translation and Information Retrieval. It is designed to capture the vocabulary of a language and can be considered a dictionary cum thesaurus and much more: a WordNet maintains the concepts in a language, the relations between those concepts, and their ontological details. A concept in a language is captured as a synset, and each synset represents a unique concept in the language. A synset is composed of (a) a gloss describing the concept, (b) example sentences, and (c) a set of synonymous words that are used for the concept. Besides synset data, a WordNet maintains many lexical and semantic relations. Lexical relations such as antonymy hold between the words of a language, whereas semantic relations hold between the concepts of a language, i.e. synsets. Ontology details for a synset are also maintained in a WordNet.
The proposed database design can be used by all language groups that are part of IndoWordNet. The design maintains the data for a WordNet in three databases. Data common to all languages, such as semantic relations and ontology details, is maintained in a common database called wordnet_master. The synset data for each language is maintained in a separate database called wordnet_respective_language, where respective_language is replaced by the actual language name, as in wordnet_konkani, wordnet_hindi, wordnet_marathi etc. A third database, wordnet_admin, keeps website related tables.
Detailed Design:
The WordNet data is maintained in multiple databases mentioned below:
Database 1
Name: wordnet_master
Purpose: To maintain the data shared by all the languages. This database will keep tables which borrow the relations from the source WordNet (HindiWordNet). It will include all ontology related tables and tables for semantic relations.
Database 2
Name: wordnet_respective_language*
Purpose: To maintain the data which is not shared by all the languages. This database will keep tables which will have information related to the target language. It will include tables to keep synset details, words in the language, examples etc.
NOTE:
*respective_language is to be replaced by one of Bengali, Gujarati, Kashmiri, Konkani, Oriya, Punjabi, Urdu as applicable.
Database 3
Name: wordnet_admin
In addition to the above mentioned databases, another database can be created to keep website related tables such as a feedback table, an FAQ table, website administration tables and so on. The details of this database are beyond the scope of the current document and are therefore not included here.
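The naming convention above can be illustrated with a minimal sketch. The helper names below (language_database_name, databases_for) are hypothetical and not part of the design, and lower-casing the language name is an assumption based on the examples wordnet_konkani and wordnet_marathi.

```python
# Hypothetical helpers: function and constant names are illustrative only.
MASTER_DB = "wordnet_master"
ADMIN_DB = "wordnet_admin"


def language_database_name(language: str) -> str:
    """Build the per-language database name, e.g. 'wordnet_konkani'."""
    cleaned = language.strip().lower()  # assumption: names are lower-cased
    if not cleaned.isalpha():
        raise ValueError(f"unexpected language name: {language!r}")
    return f"wordnet_{cleaned}"


def databases_for(language: str) -> list[str]:
    """List the three databases a tool working on one language would use."""
    return [MASTER_DB, language_database_name(language), ADMIN_DB]


print(databases_for("Konkani"))
```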
Fig 1: Some of the important tables which are part of the WordNet with colour coding to show common data shared by all languages and data different for each language.
Tables for maintaining synset data for the respective Language:
Database Name: wordnet_respective_language
1)Table Name: wn_synset
Purpose: To maintain the details of a synset (concept in a language). A synset (or concept) has a gloss, example sentences and a synonym word set.
Sr. No. / Field Name / Data Type / Purpose
1 / synset_id / Bigint(20) / Primary key: Uniquely identifies a concept/synset in the language
2 / concept_definition / Text / The gloss / concept definition in a synset
3 / category_id / Decimal(4,0) / Foreign key from category table. Specifies whether the concept is a noun, verb, adjective or adverb
4 / source_id / Decimal(4,0) / Foreign key from source table. Specifies the source from where the concept is taken
2)Table Name: wn_word
Purpose: To maintain the unique words of the language. Holds the vocabulary of the language.
Sr. No. / Field Name / Data Type / Purpose
1 / word_id / Bigint(20) / Primary key: Uniquely identifies a word of the language.
2 / Word / Text / The word of the language.
3)Table Name: wn_synset_words
Purpose: To maintain the synonymous words in a synset which are used to describe a concept in a language, following the principles of coverage and minimality.
Sr. No. / Field Name / Data Type / Purpose
1 / synset_id / Bigint(20) / Foreign key from wn_synset table.
2 / word_id / Bigint(20) / Foreign key from wn_word table.
3 / word_priority / Decimal(4,0) / Gives the priority of the synonymous words. The highest priority is given to the most commonly used word for a concept; 1 denotes the highest priority.
4)Table Name: wn_synset_example
Purpose: To maintain the example sentences for a concept / synset. A synset may have more than one example sentence. Here we assume that an example belongs to exactly one synset.
Sr. No. / Field Name / Data Type / Purpose
1 / synset_id / Bigint(20) / Foreign key from wn_synset table.
2 / example_content / Text / The example sentence text.
3 / example_priority / Decimal(4,0) / Gives the priority of the example sentences, i.e. the order in which the examples are to be displayed / used; 1 denotes the highest priority.
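The four tables above can be sketched as follows, using an in-memory SQLite database as a stand-in for wordnet_respective_language. The sample rows, the helper synset_words and the lower-case column name word are illustrative assumptions; the declared types are copied from the field tables and are only loosely enforced by SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for wordnet_<language>
conn.executescript("""
CREATE TABLE wn_synset (
    synset_id          BIGINT PRIMARY KEY,
    concept_definition TEXT,
    category_id        DECIMAL(4,0),
    source_id          DECIMAL(4,0)
);
CREATE TABLE wn_word (
    word_id BIGINT PRIMARY KEY,
    word    TEXT
);
CREATE TABLE wn_synset_words (
    synset_id     BIGINT REFERENCES wn_synset(synset_id),
    word_id       BIGINT REFERENCES wn_word(word_id),
    word_priority DECIMAL(4,0)
);
CREATE TABLE wn_synset_example (
    synset_id        BIGINT REFERENCES wn_synset(synset_id),
    example_content  TEXT,
    example_priority DECIMAL(4,0)
);
""")

# Made-up sample data; category_id and source_id are placeholder values.
conn.execute(
    "INSERT INTO wn_synset VALUES (1, 'a motor vehicle with four wheels', 1, 1)"
)
conn.executemany(
    "INSERT INTO wn_word VALUES (?, ?)", [(10, "automobile"), (11, "car")]
)
conn.executemany(
    "INSERT INTO wn_synset_words VALUES (?, ?, ?)", [(1, 10, 2), (1, 11, 1)]
)


def synset_words(conn, synset_id):
    """Return the synonyms of a synset, most commonly used word first."""
    rows = conn.execute(
        """SELECT w.word
           FROM wn_synset_words AS sw
           JOIN wn_word AS w ON w.word_id = sw.word_id
           WHERE sw.synset_id = ?
           ORDER BY sw.word_priority""",
        (synset_id,),
    ).fetchall()
    return [row[0] for row in rows]


print(synset_words(conn, 1))
```

Ordering by word_priority puts the word with priority 1 first, which matches the rule that 1 is the highest priority.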
5)Table Name: wn_fileholder
Purpose: To maintain the different types of files like a pdf file, image file or similar files corresponding to the synset.
Sr. No. / Field Name / Data Type / Purpose
1 / fileholder_id / Bigint(20) / Primary key: Used to uniquely identify the values of the file holder
2 / file_name / Text / Name of a file
3 / file_content / Mediumblob / Binary field to store the content of a file
4 / file_type / Text / Type of a file e.g. pdf, jpg, doc etc
5 / file_size / Decimal(4,0) / Size of a file
6)Table Name: wn_domain
Purpose: To maintain the class (domain) to which the concept/synset belongs like medical concept, marine concept, technology concept, mythological concept, language specific concept etc
Sr. No. / Field Name / Data Type / Purpose
1 / domain_id / Decimal(4,0) / Primary key: Uniquely identifies the different classes or domains.
2 / domain_value / Text / The name for the class (domain)
7)Table Name: wn_synset_domain
Purpose: To maintain the relation between a synset and a class (domain) to which the concept/synset belongs.
Sr. No. / Field Name / Data Type / Purpose
1 / synset_id / Bigint(20) / Foreign key from wn_synset table
2 / domain_id / Decimal(4,0) / Foreign key from wn_domain table.
8) Table Name: wn_synset_source
Purpose: To maintain the source from which a concept/synset has been taken
Sr. No. / Field Name / Data Type / Purpose
1 / source_id / Bigint(20) / Primary key: Used to uniquely identify the source of a concept.
2 / source_value / Text / The name of the source.
9)Table Name: wn_ontology_nodes
Purpose: To maintain the description of the different ontology types or positions in the target language
Sr. No. / Field Name / Data Type / Purpose
1 / onto_id / Decimal(4,0) / Primary key: Uniquely identifies an ontology type.
2 / onto_data / Text / The name for the ontology type like noun, verb, inanimate object etc.
3 / onto_desc / Text / A description of the ontology type in target language
4 / transliterated_onto_data / Text / onto_data transliterated into English
5 / transliterated_onto_desc / Text / onto_desc transliterated into English
Tables for maintaining lexical relations such as antonymy and gradation, which are specific to the particular language under consideration:
Database Name: wordnet_respective_language
10) Table Name: wn_rel_antonymy
Purpose: To maintain the antonym relation between a pair of words
Sr. No. / Field Name / Data Type / Purpose
1 / word_id / Bigint(20) / Foreign key from the word table. Points to the word for which the antonym is being set.
2 / synset_id / Bigint(20) / Foreign key from the synset table. Points to the synset corresponding to the word sense for which antonym is set.
3 / anto_word_id / Bigint(20) / Foreign key from the word table. Points to the antonym word
4 / anto_synset_id / Bigint(20) / Foreign key from the synset table. Points to the synset corresponding to the antonym word for the proper sense.
5 / anto_grad_property_id / Decimal(4,0) / Foreign key from wn_property_antonymy_gradation table. Points to the property name based on which the antonym is chosen. Example colour, time, gender etc
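The following sketch shows how an antonym lookup might join the language-specific tables with the shared property table. It uses SQLite's ATTACH DATABASE to imitate two separate databases; the sample rows, the synset ids and the helper antonyms are illustrative assumptions. The sketch stores the relation in one direction only, since the design does not state whether both directions are recorded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # plays the role of wordnet_<language>
conn.execute("ATTACH DATABASE ':memory:' AS wordnet_master")
conn.executescript("""
CREATE TABLE wn_word (word_id BIGINT PRIMARY KEY, word TEXT);
CREATE TABLE wn_rel_antonymy (
    word_id               BIGINT,
    synset_id             BIGINT,
    anto_word_id          BIGINT,
    anto_synset_id        BIGINT,
    anto_grad_property_id DECIMAL(4,0)
);
CREATE TABLE wordnet_master.wn_property_antonymy_gradation (
    anto_grad_property_id    DECIMAL(4,0) PRIMARY KEY,
    anto_grad_property_value TEXT
);
""")

# Made-up sample data: 'boy' (synset 100) is the antonym of 'girl' (synset 200).
conn.executemany("INSERT INTO wn_word VALUES (?, ?)", [(1, "boy"), (2, "girl")])
conn.execute(
    "INSERT INTO wordnet_master.wn_property_antonymy_gradation VALUES (1, 'gender')"
)
conn.execute("INSERT INTO wn_rel_antonymy VALUES (1, 100, 2, 200, 1)")


def antonyms(conn, word_id, synset_id):
    """Return (antonym word, property name) pairs for one word sense."""
    return conn.execute(
        """SELECT w.word, p.anto_grad_property_value
           FROM wn_rel_antonymy AS a
           JOIN wn_word AS w ON w.word_id = a.anto_word_id
           JOIN wordnet_master.wn_property_antonymy_gradation AS p
             ON p.anto_grad_property_id = a.anto_grad_property_id
           WHERE a.word_id = ? AND a.synset_id = ?""",
        (word_id, synset_id),
    ).fetchall()


print(antonyms(conn, 1, 100))
```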
11) Table Name: wn_rel_gradation
Purpose: To maintain the gradation relation between three words.
Sr. No. / Field Name / Data Type / Purpose
1 / first_word_id / Bigint(20) / Foreign key from the word table. Points to the first word for which the gradation is being set.
2 / first_synset_id / Bigint(20) / Foreign key from the synset table. Points to the synset corresponding to the word sense for the first word
3 / mid_word_id / Bigint(20) / Foreign key from the word table. Points to the mid word for which the gradation is being set.
4 / mid_synset_id / Bigint(20) / Foreign key from the synset table. Points to the synset corresponding to the word sense for the mid word
5 / last_word_id / Bigint(20) / Foreign key from the word table. Points to the last word for which the gradation is being set.
6 / last_synset_id / Bigint(20) / Foreign key from the synset table. Points to the synset corresponding to the word sense for the last word
7 / anto_grad_property_id / Decimal(4,0) / Foreign key from wn_property_antonymy_gradation table. Points to the property name based on which the gradation is chosen. Example colour, time, gender etc
12) Table Name: wn_rel_compounding
Purpose: To maintain the compound words of the language.
Sr. No. / Field Name / Data Type / Purpose
1 / compound_word_id / Bigint(20) / Foreign key from the word table. Points to a compound word which is formed of two or more words.
2 / compound_synset_id / Bigint(20) / Foreign key from the synset table. Points to the synset corresponding to the compound word
3 / part_word_id / Bigint(20) / Foreign key from the word table. Points to a part of the compound word.
4 / part_synset_id / Bigint(20) / Foreign key from the synset table. Points to the synset corresponding to the part of the compound word
13) Table Name: wn_rel_conjunction
Purpose: To maintain the words formed by conjunction of words in the language.
Sr. No. / Field Name / Data Type / Purpose
1 / conjunction_word_id / Bigint(20) / Foreign key from the word table. Points to a conjunction word which is formed of two or more words.
2 / conjunction_synset_id / Bigint(20) / Foreign key from the synset table. Points to the synset corresponding to the conjunction word
3 / part_word_id / Bigint(20) / Foreign key from the word table. Points to a part of the conjunct word.
4 / part_synset_id / Bigint(20) / Foreign key from the synset table. Points to the synset corresponding to the part of the conjunct word
14) Table Name: wn_lexical_relations
Purpose: To maintain the lexical relations w.r.t. the synsets.
Sr. No. / Field Name / Data Type / Purpose
1 / synset_id / Bigint(20) / Foreign Key from synset table. Points to a synset.
2 / word_id / Bigint(20) / Foreign Key from word table. Points to a word.
3 / relation_id / Decimal(4,0) / Foreign Key from the relation types table in wordnet_master. Points to the relation to which the synset belongs
15)Table Name: wn_rel_adverb_derived_from_verb
Purpose: To maintain the semantic relation between synsets, namely an adverb synset and the corresponding verb synset from which it is derived.
Sr. No. / Field Name / Data Type / Purpose
1 / adverb_synset_id / Bigint(20) / Foreign key from the synset table. Points to an adverb synset/concept.
2 / adverb_word_id / Bigint(20) / Foreign Key from word table. Points to a word.
3 / verb_synset_id / Bigint(20) / Foreign key from the synset table. Points to a verb synset/concept from which the adverb is derived.
4 / verb_word_id / Bigint(20) / Foreign Key from word table. Points to a word.
16)Table Name: wn_rel_verb_derived_from_noun
Purpose: To maintain the verbs derived from noun.
Sr. No. / Field Name / Data Type / Purpose
1 / verb_synset_id / Bigint(20) / Foreign key from the synset table. Points to a verb synset/concept.
2 / verb_word_id / Bigint(20) / Foreign Key from word table. Points to a word.
3 / noun_synset_id / Bigint(20) / Foreign key from the synset table. Points to a noun synset/concept from which the verb is derived.
4 / noun_word_id / Bigint(20) / Foreign Key from word table. Points to a word.
Database Name: wordnet_master
17)Table Name: wn_master_category
Purpose: To maintain the different grammatical categories such as noun, verb etc.
Sr. No. / Field Name / Data Type / Purpose
1 / category_id / Decimal(4,0) / Primary key: Uniquely identifies a part of speech category.
2 / category_pid / Decimal(4,0) / Parent category id, in case of subcategories such as common noun, proper noun etc.
3 / category_value / Text / The name for the category such as noun, verb etc
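The parent/child structure expressed by category_pid can be sketched as below. The sample rows and the helper subcategories are illustrative, and representing a top-level category by a NULL category_pid is an assumption, since the design does not specify the value used for categories without a parent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for wordnet_master
conn.execute("""
    CREATE TABLE wn_master_category (
        category_id    DECIMAL(4,0) PRIMARY KEY,
        category_pid   DECIMAL(4,0),  -- assumption: NULL for top-level categories
        category_value TEXT
    )
""")
conn.executemany(
    "INSERT INTO wn_master_category VALUES (?, ?, ?)",
    [
        (1, None, "noun"),
        (2, 1, "common noun"),
        (3, 1, "proper noun"),
        (4, None, "verb"),
    ],
)


def subcategories(conn, category_id):
    """Return the names of the direct subcategories of a category."""
    rows = conn.execute(
        """SELECT category_value
           FROM wn_master_category
           WHERE category_pid = ?
           ORDER BY category_id""",
        (category_id,),
    ).fetchall()
    return [row[0] for row in rows]


print(subcategories(conn, 1))
```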
18)Table Name: wn_master_language
Purpose: To maintain the language information in a database
Sr. No. / Field Name / Data Type / Purpose
1 / language_id / Decimal(4,0) / Primary key: Uniquely identifies a language.
2 / language_name / Text / Name of a language
3 / language_desc / Text / Description of a language
4 / language_script / Text / Script of a language e.g. Devanagari, Roman etc.
5 / iso_code_char2 / Text / 2 character ISO code of a language
6 / iso_code_char3 / Text / 3 character ISO code of a language
7 / database_name / Text / Name of the database to which the language belongs.
19)Table Name: wn_master_language_lss_range
Purpose: To maintain language specific synset range w.r.t. the given language.
Sr. No. / Field Name / Data Type / Purpose
1 / language_id / Decimal(4,0) / Foreign key of wn_master_language table
2 / start_range_id / Bigint(20) / Start range id of lss w.r.t the given language
3 / end_range_id / Bigint(20) / End range id of lss w.r.t the given language
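One possible use of the range table is to find out which language database owns a given language specific synset (lss) id. The sketch below assumes that both ends of a range are inclusive and that ranges do not overlap; neither is stated in the design. The range values, the reduced column set of wn_master_language and the helper database_for_lss are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for wordnet_master
conn.executescript("""
CREATE TABLE wn_master_language (
    language_id   DECIMAL(4,0) PRIMARY KEY,
    language_name TEXT,
    database_name TEXT
);
CREATE TABLE wn_master_language_lss_range (
    language_id    DECIMAL(4,0) REFERENCES wn_master_language(language_id),
    start_range_id BIGINT,
    end_range_id   BIGINT
);
""")

# Hypothetical languages and ranges, for illustration only.
conn.executemany(
    "INSERT INTO wn_master_language VALUES (?, ?, ?)",
    [(1, "Konkani", "wordnet_konkani"), (2, "Marathi", "wordnet_marathi")],
)
conn.executemany(
    "INSERT INTO wn_master_language_lss_range VALUES (?, ?, ?)",
    [(1, 1000000, 1999999), (2, 2000000, 2999999)],
)


def database_for_lss(conn, synset_id):
    """Return the database name whose lss range contains synset_id, or None."""
    row = conn.execute(
        """SELECT l.database_name
           FROM wn_master_language_lss_range AS r
           JOIN wn_master_language AS l ON l.language_id = r.language_id
           WHERE ? BETWEEN r.start_range_id AND r.end_range_id""",
        (synset_id,),
    ).fetchone()
    return row[0] if row else None


print(database_for_lss(conn, 1500000))
```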
20)Table Name: wn_master_synset_file
Purpose: To associate a file (except picture file) with a synset
Sr. No. / Field Name / Data Type / Purpose
1 / synset_id / Bigint(20) / Foreign key from synset table
2 / fileholder_id / Bigint(20) / Foreign key from wn_fileholder table
3 / language_id / Decimal(4,0) / Foreign key for wn_master_language
Tables for maintaining semantic relation and ontology data which is common to all languages of IndoWordNet:
Database Name: wordnet_master
21)Table Name: wn_rel_hypernymy_hyponymy
Purpose: To maintain the hypernymy and hyponymy relation, which is an IS-A-KIND-OF type of semantic relationship between synsets. For example, a rose is a kind of flower, so rose is the child (hyponym) and flower is the parent (hypernym).
Sr. No. / Field Name / Data Type / Purpose
1 / parent_synset_id / Bigint(20) / Foreign key from the synset table. Points to the parent concept, which is the hypernym in the IS-A-KIND-OF relationship.
2 / child_synset_id / Bigint(20) / Foreign key from the synset table. Points to the child concept, which is the hyponym in the IS-A-KIND-OF relationship.
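Because each row stores one parent/child pair, walking up the IS-A-KIND-OF hierarchy needs a repeated lookup. A minimal sketch using a recursive query in SQLite follows; the synset ids and the helper names are made up, and the sketch assumes the stored relation contains no cycles.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for wordnet_master
conn.execute(
    """CREATE TABLE wn_rel_hypernymy_hyponymy (
           parent_synset_id BIGINT,
           child_synset_id  BIGINT
       )"""
)
# Made-up ids: 1 = plant, 2 = flower, 3 = rose.
conn.executemany(
    "INSERT INTO wn_rel_hypernymy_hyponymy VALUES (?, ?)", [(1, 2), (2, 3)]
)


def hypernym_chain(conn, synset_id):
    """Return all ancestors of a synset, nearest parent first."""
    rows = conn.execute(
        """WITH RECURSIVE up(synset_id, depth) AS (
               SELECT parent_synset_id, 1
               FROM wn_rel_hypernymy_hyponymy
               WHERE child_synset_id = ?
               UNION ALL
               SELECT h.parent_synset_id, up.depth + 1
               FROM wn_rel_hypernymy_hyponymy AS h
               JOIN up ON h.child_synset_id = up.synset_id
           )
           SELECT synset_id FROM up ORDER BY depth""",
        (synset_id,),
    ).fetchall()
    return [row[0] for row in rows]


def hyponyms(conn, synset_id):
    """Return the direct children (hyponyms) of a synset."""
    rows = conn.execute(
        """SELECT child_synset_id
           FROM wn_rel_hypernymy_hyponymy
           WHERE parent_synset_id = ?
           ORDER BY child_synset_id""",
        (synset_id,),
    ).fetchall()
    return [row[0] for row in rows]


print(hypernym_chain(conn, 3))  # ancestors of 'rose'
print(hyponyms(conn, 1))        # direct children of 'plant'
```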
22) Table Name: wn_rel_meronymy_holonymy
Purpose: To maintain the meronymy and holonymy relation, which is a PART-WHOLE type of semantic relationship between synsets. For example, a leaf is part of a tree; here tree is the whole (holonym) and leaf is the part (meronym).
Sr. No. / Field Name / Data Type / Purpose
1 / whole_synset_id / Bigint(20) / Foreign key from the synset table. Points to the whole concept, that is, the holonym in the PART-WHOLE relationship.
2 / part_synset_id / Bigint(20) / Foreign key from the synset table. Points to the part concept, that is, the meronym in the PART-WHOLE relationship.
3 / mero_holo_property_id / Decimal(4,0) / Foreign key from the wn_property_meronymy_holonymy table. Points to the additional description about the relation.
23)Table Name: wn_rel_troponymy
Purpose: To maintain the troponymy type of a semantic relationship between synsets
Sr. No. / Field Name / Data Type / Purpose
1 / synset_id / Bigint(20) / Foreign key from the synset table.
2 / troponym_synset_id / Bigint(20) / Foreign key from the synset table. Points to the troponym synset.
24)Table Name: wn_rel_entailment
Purpose: To maintain the entailment type of a semantic relationship between synsets
Sr. No. / Field Name / Data Type / Purpose
1 / synset_id / Bigint(20) / Foreign key from the synset table.
2 / entailed_synset_id / Bigint(20) / Foreign key from the synset table. Points to the entailed synset.
25)Table Name: wn_rel_similar
Purpose: To maintain the relation between similar types of synsets.
Sr. No. / Field Name / Data Type / Purpose
1 / synset_id / Bigint(20) / Foreign key from the synset table. Points to a synset/concept.
2 / similar_synset_id / Bigint(20) / Foreign key from the synset table. Points to a similar synset/concept.
26)Table Name: wn_rel_also_see
Purpose: To maintain the relation between synsets which may be related in some way other than the regular semantic relations defined on a WordNet.
Sr. No. / Field Name / Data Type / Purpose
1 / synset_id / Bigint(20) / Foreign key from the synset table. Points to a synset/concept.
2 / also_see_synset_id / Bigint(20) / Foreign key from the synset table. Points to an additionally related synset/concept.
27)Table Name: wn_rel_noun_verb_link
Purpose: To maintain the semantic relation between synsets namely a noun synset and associated verb synset.
Sr. No. / Field Name / Data Type / Purpose
1 / noun_synset_id / Bigint(20) / Foreign key from the synset table. Points to a noun synset/concept.
2 / verb_synset_id / Bigint(20) / Foreign key from the synset table. Points to a verb synset/concept.
3 / link_type / Enum / The type of link between the synsets: ability link, capability link or function link.
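The Enum restriction on link_type can be approximated as in the sketch below. SQLite has no Enum type, so a CHECK constraint is swapped in here; the literal values 'ability', 'capability' and 'function', the sample ids and the helper add_link are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for wordnet_master
conn.execute("""
    CREATE TABLE wn_rel_noun_verb_link (
        noun_synset_id BIGINT,
        verb_synset_id BIGINT,
        -- assumption: these three literals stand for the Enum values
        link_type TEXT CHECK (link_type IN ('ability', 'capability', 'function'))
    )
""")


def add_link(conn, noun_synset_id, verb_synset_id, link_type):
    """Insert a noun-verb link; return False if link_type is not allowed."""
    try:
        conn.execute(
            "INSERT INTO wn_rel_noun_verb_link VALUES (?, ?, ?)",
            (noun_synset_id, verb_synset_id, link_type),
        )
        return True
    except sqlite3.IntegrityError:
        # Raised by SQLite when the CHECK constraint rejects the value.
        return False


print(add_link(conn, 10, 20, "function"))
print(add_link(conn, 10, 20, "unknown"))
```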
28)Table Name: wn_rel_noun_adjective_attribute_link
Purpose: To maintain the semantic relation between synsets namely a noun synset and associated adjective attribute that go together.
Sr. No. / Field Name / Data Type / Purpose
1 / noun_synset_id / Bigint(20) / Foreign key from the synset table. Points to a noun synset/concept.
2 / adjective_synset_id / Bigint(20) / Foreign key from the synset table. Points to an adjective synset/concept.
29)Table Name: wn_rel_adjective_modifies_noun
Purpose: To maintain the semantic relation between synsets, namely an adjective synset and the corresponding noun synset which it modifies.
Sr. No. / Field Name / Data Type / Purpose
1 / adjective_synset_id / Bigint(20) / Foreign key from the synset table. Points to an adjective synset/concept.
2 / noun_synset_id / Bigint(20) / Foreign key from the synset table. Points to a noun synset/concept.
30)Table Name: wn_rel_adverb_modifies_verb
Purpose: To maintain the semantic relation between synsets namely an adverb synset and the corresponding verb synset which it modifies.
Sr. No. / Field Name / Data Type / Purpose
1 / adverb_synset_id / Bigint(20) / Foreign key from the synset table. Points to an adverb synset/concept.
2 / verb_synset_id / Bigint(20) / Foreign key from the synset table. Points to a verb synset/concept which the adverb modifies.
31)Table Name: wn_rel_causative
Purpose: To maintain the causative semantic relation between synsets.
Sr. No. / Field Name / Data Type / Purpose
1 / synset_id / Bigint(20) / Foreign key from the synset table. Points to a synset/concept.
2 / causes_synset_id / Bigint(20) / Foreign key from the synset table. Points to a cause synset/concept.
32)Table Name: wn_rel_near_synsets
Purpose: To maintain the near synsets relation between synsets.
Sr. No. / Field Name / Data Type / Purpose
1 / synset_id / Bigint(20) / Foreign key from the synset table. Points to a synset/concept.
2 / near_synset_id / Bigint(20) / Foreign key from the synset table. Points to a near synset/concept of a given synset.
33)Table Name: wn_property_antonymy_gradation
Purpose: To maintain the properties on which relations such as antonymy and gradation are based, like colour, gender etc
Sr. No. / Field Name / Data Type / Purpose
1 / anto_grad_property_id / Decimal(4,0) / Primary key: Uniquely identifies a property type.
2 / anto_grad_property_value / Text / The name of the property like colour, gender etc.
34)Table Name: wn_property_meronymy_holonymy
Purpose: To maintain the properties that qualify the meronymy and holonymy relations, like component-object, feature-activity etc
Sr. No. / Field Name / Data Type / Purpose
1 / mero_holo_property_id / Decimal(4,0) / Primary key: Uniquely identifies a property type.
2 / mero_holo_property_value / Text / The name of the property like component-object, feature-activity etc
35)Table Name: wn_relation_types