IndoWordNet Database Design

Status: Draft

This document specifies a database design for maintaining the IndoWordNet data and invites discussion and suggestions for improvement.

Abstract:

IndoWordNet is a multilingual WordNet that links the WordNets of different Indian languages through a common identification number (id). A WordNet is a crucial resource for a language and aids NLP tasks such as Machine Translation and Information Retrieval. It is designed to capture the vocabulary of a language and can be thought of as a combined dictionary and thesaurus, and more: it maintains the concepts of a language, the relations between those concepts, and their ontological details.

A concept in a language is captured as a synset, and each synset represents a unique concept in the language. A synset is composed of:

a. a gloss describing the concept
b. example sentences
c. a set of synonymous words that are used for the concept

Besides synset data, a WordNet maintains many lexical and semantic relations. Lexical relations such as antonymy hold between the words of a language, whereas semantic relations hold between its concepts, i.e. synsets. Ontology details for each synset are also maintained in a WordNet.

The proposed database design can be used by all language groups that are part of IndoWordNet. The design maintains the data for a WordNet in three databases. Data common to all languages, such as semantic relations and ontology details, is maintained in a common database called wordnet_master. The synset data for each language is maintained in a separate database called wordnet_respective_language, where respective_language is replaced by the actual language name, for example wordnet_konkani, wordnet_hindi or wordnet_marathi. The third database, wordnet_admin, holds website-related tables.
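The naming convention above can be sketched as a small helper. This is only an illustration: the function name language_database_name and the validation rule (lower-case ASCII letters only) are assumptions, not part of the design.

```python
import re

# Name of the shared database, as given in this document.
MASTER_DATABASE = "wordnet_master"


def language_database_name(language: str) -> str:
    """Return the per-language database name, e.g. 'wordnet_konkani'.

    Hypothetical helper: the design only fixes the 'wordnet_' prefix;
    the lower-casing and validation here are assumptions.
    """
    name = language.strip().lower()
    if not re.fullmatch(r"[a-z]+", name):
        raise ValueError(f"unexpected language name: {language!r}")
    return f"wordnet_{name}"
```

For example, language_database_name("Konkani") yields the name wordnet_konkani used in the text.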

Detailed Design:

The WordNet data is maintained in multiple databases mentioned below:

Database 1

Name: wordnet_master

Purpose: To maintain the data shared by all the languages. This database will keep tables which borrow the relations from the source WordNet (HindiWordNet). It will include all ontology-related tables and the tables for semantic relations.

Database 2

Name: wordnet_respective_language*

Purpose: To maintain the data which is not shared by all the languages. This database will keep tables which will have information related to the target language. It will include tables to keep synset details, words in the language, examples etc.

NOTE:

*respective_language is to be replaced by one of Bengali, Gujarati, Kashmiri, Konkani, Oriya, Punjabi, Urdu as applicable.

Database 3

Name: wordnet_admin

In addition to the databases mentioned above, another database can be created to keep website-related tables such as a feedback table, a FAQ table, website administration tables and so on. The details of this database are beyond the scope of the current document and are therefore not included here.

Fig 1: Some of the important tables which are part of the WordNet with colour coding to show common data shared by all languages and data different for each language.

Tables for maintaining synset data for the respective Language:

Database Name: wordnet_respective_language

1)Table Name: wn_synset

Purpose: To maintain the details of a synset (concept in a language). A synset (or concept) has a gloss, example sentences and a set of synonymous words.

Sr. No. / Field Name / Data Type / Purpose
1 / synset_id / Bigint(20) / Primary key: Uniquely identifies a concept/synset in the language
2 / concept_definition / Text / The gloss / concept definition in a synset
3 / category_id / Decimal(4,0) / Foreign key from category table. Specifying if the concept is a noun, verb, adjective or adverb
4 / source_id / Decimal(4,0) / Foreign key from source table. Specifies the source from where the concept is taken

2)Table Name: wn_word

Purpose: To maintain the unique words of the language. Holds the vocabulary of the language.

Sr. No. / Field Name / Data Type / Purpose
1 / word_id / Bigint(20) / Primary key: Uniquely identifies a word of the language.
2 / Word / Text / The word of the language.

3)Table Name: wn_synset_words

Purpose: To maintain the synonymous words of a synset, which together describe a concept in the language while following the principles of coverage and minimality.

Sr. No. / Field Name / Data Type / Purpose
1 / synset_id / Bigint(20) / Foreign key from wn_synset table.
2 / word_id / Bigint(20) / Foreign key from wn_word table.
3 / word_priority / Decimal(4,0) / Gives the priority of the synonymous word within the synset. The most commonly used word for the concept gets the highest priority, and 1 denotes the highest priority.
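The three tables above can be tried out with a minimal sketch. It uses Python's built-in sqlite3 module purely as a stand-in so that the example is self-contained; the target database engine, the sample rows and the helper name synset_words are assumptions.

```python
import sqlite3

# In-memory SQLite database standing in for wordnet_respective_language.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE wn_synset (
    synset_id          INTEGER PRIMARY KEY,
    concept_definition TEXT,
    category_id        DECIMAL(4,0),
    source_id          DECIMAL(4,0)
);
CREATE TABLE wn_word (
    word_id INTEGER PRIMARY KEY,
    word    TEXT
);
CREATE TABLE wn_synset_words (
    synset_id     INTEGER REFERENCES wn_synset(synset_id),
    word_id       INTEGER REFERENCES wn_word(word_id),
    word_priority DECIMAL(4,0)
);
-- Illustrative sample rows (English placeholders, not real IndoWordNet data).
INSERT INTO wn_synset VALUES (1, 'a motor vehicle with four wheels', 1, 1);
INSERT INTO wn_word VALUES (1, 'car'), (2, 'automobile'), (3, 'motorcar');
INSERT INTO wn_synset_words VALUES (1, 2, 2), (1, 1, 1), (1, 3, 3);
""")


def synset_words(conn, synset_id):
    """Return the words of a synset, most commonly used word first."""
    rows = conn.execute(
        """
        SELECT w.word
        FROM wn_synset_words AS sw
        JOIN wn_word AS w ON w.word_id = sw.word_id
        WHERE sw.synset_id = ?
        ORDER BY sw.word_priority
        """,
        (synset_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

Keeping words in their own table means a word that occurs in several synsets is stored once and referenced by word_id from each of them.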

4)Table Name: wn_synset_example

Purpose: To maintain the example sentences for a concept / synset. A synset may have more than one example sentence. Here we assume that an example belongs to exactly one synset.

Sr. No. / Field Name / Data Type / Purpose
1 / synset_id / Bigint(20) / Foreign key from wn_synset table.
2 / example_content / Text / The example sentence text.
3 / example_priority / Decimal(4,0) / Gives the priority of the example sentence, i.e. the order in which the examples are to be displayed or used. Here 1 denotes the highest priority.

5)Table Name: wn_fileholder

Purpose: To maintain the different types of files like a pdf file, image file or similar files corresponding to the synset.

Sr. No. / Field Name / Data Type / Purpose
1 / fileholder_id / Bigint(20) / Primary key: Used to uniquely identify the values of the file holder
2 / file_name / Text / Name of a file
3 / file_content / Mediumblob / Binary field to store the content of a file
4 / file_type / Text / Type of a file e.g. pdf, jpg, doc etc
5 / file_size / Decimal(4,0) / Size of a file

6)Table Name: wn_domain

Purpose: To maintain the class (domain) to which the concept/synset belongs like medical concept, marine concept, technology concept, mythological concept, language specific concept etc

Sr. No. / Field Name / Data Type / Purpose
1 / domain_id / Decimal(4,0) / Primary key: Uniquely identifies the different classes or domains.
2 / domain_value / Text / The name for the class (domain)

7)Table Name: wn_synset_domain

Purpose: To maintain the relation between a synset and a class (domain) to which the concept/synset belongs.

Sr. No. / Field Name / Data Type / Purpose
1 / synset_id / Bigint(20) / Foreign key from wn_synset table
2 / domain_id / Decimal(4,0) / Foreign key from wn_domain table.

8) Table Name: wn_synset_source

Purpose: To maintain the source from which a concept/synset has been taken

Sr. No. / Field Name / Data Type / Purpose
1 / source_id / Bigint(20) / Primary key: Used to uniquely identify the source of a concept.
2 / source_value / Text / The name of the source.

9)Table Name: wn_ontology_nodes

Purpose: To maintain the descriptions of the different ontology types or positions in the target language

Sr. No. / Field Name / Data Type / Purpose
1 / onto_id / Decimal(4,0) / Primary key: Uniquely identifies an ontology type.
2 / onto_data / Text / The name for the ontology type like noun, verb, inanimate object etc.
3 / onto_desc / Text / A description of the ontology type in target language
4 / transliterated_onto_data / Text / onto_data transliterated into English
5 / transliterated_onto_desc / Text / onto_desc transliterated into English

Tables for maintaining lexical relations, such as antonymy and gradation, which are specific to the particular language under consideration:

Database Name: wordnet_respective_language

10) Table Name: wn_rel_antonymy

Purpose: To maintain the antonym relation between a pair of words

Sr. No. / Field Name / Data Type / Purpose
1 / word_id / Bigint(20) / Foreign key from the word table. Points to the word for which the antonym is being set.
2 / synset_id / Bigint(20) / Foreign key from the synset table. Points to the synset corresponding to the word sense for which antonym is set.
3 / anto_word_id / Bigint(20) / Foreign key from the word table. Points to the antonym word
4 / anto_synset_id / Bigint(20) / Foreign key from the synset table. Points to the synset corresponding to the antonym word for the proper sense.
5 / anto_grad_property_id / Decimal(4,0) / Foreign key from wn_property_antonymy_gradation table. Points to the property name based on which the antonym is chosen. Example colour, time, gender etc
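Because an antonym is recorded between word senses, each side of the relation is a (word, synset) pair. A minimal sketch of such a lookup follows; sqlite3 is used only as a self-contained stand-in, and the sample rows, the synset ids and the helper name antonyms are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE wn_word (
    word_id INTEGER PRIMARY KEY,
    word    TEXT
);
CREATE TABLE wn_rel_antonymy (
    word_id               INTEGER REFERENCES wn_word(word_id),
    synset_id             INTEGER,
    anto_word_id          INTEGER REFERENCES wn_word(word_id),
    anto_synset_id        INTEGER,
    anto_grad_property_id DECIMAL(4,0)
);
-- Illustrative rows: 'hot' in synset 100 has the antonym 'cold' in synset 200.
INSERT INTO wn_word VALUES (1, 'hot'), (2, 'cold');
INSERT INTO wn_rel_antonymy VALUES (1, 100, 2, 200, 1);
""")


def antonyms(conn, word_id, synset_id):
    """Return (antonym word, antonym synset id) pairs for one word sense."""
    return conn.execute(
        """
        SELECT w.word, r.anto_synset_id
        FROM wn_rel_antonymy AS r
        JOIN wn_word AS w ON w.word_id = r.anto_word_id
        WHERE r.word_id = ? AND r.synset_id = ?
        ORDER BY w.word
        """,
        (word_id, synset_id),
    ).fetchall()
```

The sketch reads the relation in one direction only. Whether the reverse pair is stored as a second row or derived at query time is not stated in this document.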

11)Table Name: wn_rel_gradation

Purpose: To maintain the gradation relation between three words.

Sr. No. / Field Name / Data Type / Purpose
1 / first_word_id / Bigint(20) / Foreign key from the word table. Points to the first word for which the gradation is being set.
2 / first_synset_id / Bigint(20) / Foreign key from the synset table. Points to the synset corresponding to the word sense for the first word
3 / mid_word_id / Bigint(20) / Foreign key from the word table. Points to the mid word for which the gradation is being set.
4 / mid_synset_id / Bigint(20) / Foreign key from the synset table. Points to the synset corresponding to the word sense for the mid word
5 / last_word_id / Bigint(20) / Foreign key from the word table. Points to the last word for which the gradation is being set.
6 / last_synset_id / Bigint(20) / Foreign key from the synset table. Points to the synset corresponding to the word sense for the last word
7 / anto_grad_property_id / Decimal(4,0) / Foreign key from wn_property_antonymy_gradation table. Points to the property name on which the gradation is based. Examples: colour, time, gender etc.

12)Table Name: wn_rel_compounding

Purpose: To maintain the compound words of the language.

Sr. No. / Field Name / Data Type / Purpose
1 / compound_word_id / Bigint(20) / Foreign key from the word table. Points to a compound word which is formed of two or more words.
2 / compound_synset_id / Bigint(20) / Foreign key from the synset table. Points to the synset corresponding to the compound word.
3 / part_word_id / Bigint(20) / Foreign key from the word table. Points to a part of the compound word.
4 / part_synset_id / Bigint(20) / Foreign key from the synset table. Points to the synset corresponding to the part of the compound word

13)Table Name: wn_rel_conjunction

Purpose: To maintain the words formed by conjunction of words in the language.

Sr. No. / Field Name / Data Type / Purpose
1 / conjunction_word_id / Bigint(20) / Foreign key from the word table. Points to a conjunction word which is formed of two or more words.
2 / conjunction_synset_id / Bigint(20) / Foreign key from the synset table. Points to the synset corresponding to the conjunction word.
3 / part_word_id / Bigint(20) / Foreign key from the word table. Points to a part of the conjunct word.
4 / part_synset_id / Bigint(20) / Foreign key from the synset table. Points to the synset corresponding to the part of the conjunct word

14)Table Name: wn_lexical_relations

Purpose: To maintain the lexical relations w.r.t. the synsets.

Sr. No. / Field Name / Data Type / Purpose
1 / synset_id / Bigint(20) / Foreign Key from synset table. Points to a synset.
2 / word_id / Bigint(20) / Foreign Key from word table. Points to a word.
3 / relation_id / Decimal(4,0) / Foreign key from the relation types table in wordnet_master. Points to the relation in which the synset participates.

15)Table Name: wn_rel_adverb_derived_from_verb

Purpose: To maintain the semantic relation between synsets, namely an adverb synset and the corresponding verb synset from which it is derived.

Sr. No. / Field Name / Data Type / Purpose
1 / adverb_synset_id / Bigint(20) / Foreign key from the synset table. Points to an adverb synset/concept.
2 / adverb_word_id / Bigint(20) / Foreign Key from word table. Points to a word.
3 / verb_synset_id / Bigint(20) / Foreign key from the synset table. Points to a verb synset/concept from which the adverb is derived.
4 / verb_word_id / Bigint(20) / Foreign Key from word table. Points to a word.

16)Table Name: wn_rel_verb_derived_from_noun

Purpose: To maintain the verbs derived from noun.

Sr. No. / Field Name / Data Type / Purpose
1 / verb_synset_id / Bigint(20) / Foreign key from the synset table. Points to a verb synset/concept.
2 / verb_word_id / Bigint(20) / Foreign Key from word table. Points to a word.
3 / noun_synset_id / Bigint(20) / Foreign key from the synset table. Points to a noun synset/concept from which the verb is derived.
4 / noun_word_id / Bigint(20) / Foreign Key from word table. Points to a word.

Database Name: wordnet_master

17)Table Name: wn_master_category

Purpose: To maintain the different grammatical categories such as noun, verb etc.

Sr. No. / Field Name / Data Type / Purpose
1 / category_id / Decimal(4,0) / Primary key: Uniquely identifies a part of speech category.
2 / category_pid / Decimal(4,0) / Parent category id, used for subcategories such as common noun, proper noun etc.
3 / category_value / Text / The name for the category such as noun, verb etc
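The category_pid field makes the category table a tree. The sketch below walks from a subcategory up to its top-level category. It assumes that a top-level category stores NULL in category_pid, which this document does not specify; the sample rows and the helper name category_path are also assumptions, and sqlite3 is used only as a self-contained stand-in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE wn_master_category (
    category_id    DECIMAL(4,0) PRIMARY KEY,
    category_pid   DECIMAL(4,0),
    category_value TEXT
);
-- Illustrative rows; a NULL parent marks a top-level category (assumption).
INSERT INTO wn_master_category VALUES
    (1, NULL, 'noun'),
    (2, NULL, 'verb'),
    (3, 1, 'common noun'),
    (4, 1, 'proper noun');
""")


def category_path(conn, category_id):
    """Return category names from the top-level category down to category_id."""
    path = []
    seen = set()
    current = category_id
    # Follow category_pid upwards; 'seen' guards against accidental cycles.
    while current is not None and current not in seen:
        seen.add(current)
        row = conn.execute(
            "SELECT category_pid, category_value "
            "FROM wn_master_category WHERE category_id = ?",
            (current,),
        ).fetchone()
        if row is None:
            break
        current, value = row
        path.append(value)
    return list(reversed(path))
```

With the sample rows, category_path(conn, 4) gives the path from noun to proper noun.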

18)Table Name: wn_master_language

Purpose: To maintain the language information in a database

Sr. No. / Field Name / Data Type / Purpose
1 / language_id / Decimal(4,0) / Primary key: Uniquely identifies a language.
2 / language_name / Text / Name of a language
3 / language_desc / Text / Description of a language
4 / language_script / Text / Script of a language e.g. devnagari, roman etc.
5 / iso_code_char2 / Text / 2 character ISO code of a language
6 / iso_code_char3 / Text / 3 character ISO code of a language
7 / database_name / Text / Name of the database to which the language belongs.
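The database_name field lets an application find the per-language database from a language record. A minimal sketch follows, with sqlite3 as a self-contained stand-in: an in-memory database plays the role of wordnet_master, and a second in-memory database is attached under the looked-up name. The sample rows and the helper names database_for and attach_language are assumptions.

```python
import re
import sqlite3

master = sqlite3.connect(":memory:")  # stands in for wordnet_master
master.execute("""
    CREATE TABLE wn_master_language (
        language_id     DECIMAL(4,0) PRIMARY KEY,
        language_name   TEXT,
        language_desc   TEXT,
        language_script TEXT,
        iso_code_char2  TEXT,
        iso_code_char3  TEXT,
        database_name   TEXT
    )
""")
master.executemany(
    "INSERT INTO wn_master_language VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "Hindi", "sample row", "devnagari", "hi", "hin", "wordnet_hindi"),
        (2, "Marathi", "sample row", "devnagari", "mr", "mar", "wordnet_marathi"),
    ],
)
master.commit()  # SQLite refuses to ATTACH inside an open transaction


def database_for(conn, iso3):
    """Return the database name registered for a 3-character ISO code."""
    row = conn.execute(
        "SELECT database_name FROM wn_master_language WHERE iso_code_char3 = ?",
        (iso3,),
    ).fetchone()
    return None if row is None else row[0]


def attach_language(conn, iso3):
    """Attach an (empty, in-memory) language database under its registered name."""
    name = database_for(conn, iso3)
    # Schema names cannot be bound as parameters, so validate before formatting.
    if name is None or not re.fullmatch(r"wordnet_[a-z]+", name):
        raise ValueError(f"no usable database registered for {iso3!r}")
    conn.execute(f"ATTACH DATABASE ':memory:' AS {name}")
    return name
```

After attach_language(master, "hin"), tables of the language database can be addressed with a qualified name such as wordnet_hindi.wn_synset on the same connection.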

19)Table Name: wn_master_language_lss_range

Purpose: To maintain language specific synset range w.r.t. the given language.

Sr. No. / Field Name / Data Type / Purpose
1 / language_id / Decimal(4,0) / Foreign key of wn_master_language table
2 / start_range_id / Bigint(20) / Start range id of lss w.r.t the given language
3 / end_range_id / Bigint(20) / End range id of lss w.r.t the given language
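A range table like this allows a synset id to be mapped back to the language whose language specific synset (lss) range contains it. The sketch below assumes that both range bounds are inclusive and that ranges do not overlap; neither is stated in this document. The sample ranges and the helper name lss_language are illustrative, and sqlite3 is used only as a self-contained stand-in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE wn_master_language_lss_range (
    language_id    DECIMAL(4,0),
    start_range_id INTEGER,
    end_range_id   INTEGER
);
-- Illustrative ranges only; real ranges are assigned by the project.
INSERT INTO wn_master_language_lss_range VALUES
    (1, 1000000, 1999999),
    (2, 2000000, 2999999);
""")


def lss_language(conn, synset_id):
    """Return the language_id whose lss range contains synset_id, else None."""
    row = conn.execute(
        """
        SELECT language_id
        FROM wn_master_language_lss_range
        WHERE ? BETWEEN start_range_id AND end_range_id
        """,
        (synset_id,),
    ).fetchone()
    return None if row is None else row[0]
```

A result of None means the id lies outside every registered range, i.e. it is not a language specific synset under these assumptions.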

20)Table Name: wn_master_synset_file

Purpose: To associate a file (except picture file) with a synset

Sr. No. / Field Name / Data Type / Purpose
1 / synset_id / Bigint(20) / Foreign key from synset table
2 / fileholder_id / Bigint(20) / Foreign key from wn_fileholder table
3 / language_id / Decimal(4,0) / Foreign key for wn_master_language

Tables for maintaining semantic relation and ontology data which is common to all languages of IndoWordNet:

Database Name: wordnet_master

21)Table Name: wn_rel_hypernymy_hyponymy

Purpose: To maintain the hypernymy and hyponymy relation, which is an IS-A-KIND-OF type of semantic relationship between synsets. For example, a rose is a kind of flower, so rose is the child (hyponym) and flower is the parent (hypernym).

Sr. No. / Field Name / Data Type / Purpose
1 / parent_synset_id / Bigint(20) / Foreign key from the synset table. Points to the parent concept, i.e. the hypernym of the IS-A-KIND-OF relationship.
2 / child_synset_id / Bigint(20) / Foreign key from the synset table. Points to the child concept, i.e. the hyponym of the IS-A-KIND-OF relationship.
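Since each row stores only one parent-child step, finding all ancestors of a concept means following the rows repeatedly. The sketch below does this with a recursive query; sqlite3 is used only as a self-contained stand-in, and the sample ids, the depth limit of 50 and the helper name ancestors_of are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE wn_rel_hypernymy_hyponymy (
    parent_synset_id INTEGER,
    child_synset_id  INTEGER
);
-- Illustrative ids: 1 = plant, 2 = flower, 3 = rose.
INSERT INTO wn_rel_hypernymy_hyponymy VALUES (2, 3), (1, 2);
""")


def ancestors_of(conn, synset_id):
    """Return (ancestor synset id, distance) pairs, nearest parent first."""
    return conn.execute(
        """
        WITH RECURSIVE ancestors(synset_id, depth) AS (
            SELECT parent_synset_id, 1
            FROM wn_rel_hypernymy_hyponymy
            WHERE child_synset_id = ?
            UNION
            SELECT r.parent_synset_id, a.depth + 1
            FROM wn_rel_hypernymy_hyponymy AS r
            JOIN ancestors AS a ON r.child_synset_id = a.synset_id
            WHERE a.depth < 50  -- safety limit in case the data contains a cycle
        )
        SELECT synset_id, MIN(depth) AS distance
        FROM ancestors
        GROUP BY synset_id
        ORDER BY distance, synset_id
        """,
        (synset_id,),
    ).fetchall()
```

Because the relation refers only to synset ids, the same rows in wordnet_master can serve every language; the glosses and words for each id come from the respective language database.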

22)Table Name: wn_rel_meronymy_holonymy

Purpose: To maintain the meronymy and holonymy relation, which is a PART-WHOLE type of semantic relationship between synsets. For example, a leaf is part of a tree, so tree is the whole (holonym) and leaf is the part (meronym).

Sr. No. / Field Name / Data Type / Purpose
1 / whole_synset_id / Bigint(20) / Foreign key from the synset table. Points to the whole concept, i.e. the holonym of the PART-WHOLE relationship.
2 / part_synset_id / Bigint(20) / Foreign key from the synset table. Points to the part concept, i.e. the meronym of the PART-WHOLE relationship.
3 / mero_holo_property_id / Decimal(4,0) / Foreign key from the wn_property_meronymy_holonymy table. Points to the additional description about the relation.

23)Table Name: wn_rel_troponymy

Purpose: To maintain the troponymy type of a semantic relationship between synsets

Sr. No. / Field Name / Data Type / Purpose
1 / synset_id / Bigint(20) / Foreign key from the synset table.
2 / troponym_synset_id / Bigint(20) / Foreign key from the synset table. Points to the troponym synset.

24)Table Name: wn_rel_entailment

Purpose: To maintain the entailment type of a semantic relationship between synsets

Sr. No. / Field Name / Data Type / Purpose
1 / synset_id / Bigint(20) / Foreign key from the synset table.
2 / entailed_synset_id / Bigint(20) / Foreign key from the synset table. Points to the entailed synset.

25)Table Name: wn_rel_similar

Purpose: To maintain the relation between similar types of synsets.

Sr. No. / Field Name / Data Type / Purpose
1 / synset_id / Bigint(20) / Foreign key from the synset table. Points to a synset/concept.
2 / similar_synset_id / Bigint(20) / Foreign key from the synset table. Points to a similar synset/concept.

26)Table Name: wn_rel_also_see

Purpose: To maintain the relation between synsets which may be related in some way other than the regular semantic relations defined on a WordNet.

Sr. No. / Field Name / Data Type / Purpose
1 / synset_id / Bigint(20) / Foreign key from the synset table. Points to a synset/concept.
2 / also_see_synset_id / Bigint(20) / Foreign key from the synset table. Points to an additionally related synset/concept.

27)Table Name: wn_rel_noun_verb_link

Purpose: To maintain the semantic relation between synsets namely a noun synset and associated verb synset.

Sr. No. / Field Name / Data Type / Purpose
1 / noun_synset_id / Bigint(20) / Foreign key from the synset table. Points to a noun synset/concept.
2 / verb_synset_id / Bigint(20) / Foreign key from the synset table. Points to a verb synset/concept.
3 / link_type / Enum / The type of link between the synsets: an ability link, a capability link or a function link.
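One way to keep link_type restricted to the three link types is sketched below. The stored labels ('ability', 'capability', 'function') are assumptions, since this document names the link types but not their exact values. sqlite3 has no Enum type, so a CHECK constraint stands in for it here; the class LinkType and the helper add_link are hypothetical.

```python
import sqlite3
from enum import Enum


class LinkType(Enum):
    # Assumed labels for the three link types named in the table above.
    ABILITY = "ability"
    CAPABILITY = "capability"
    FUNCTION = "function"


# Build the list of permitted values from the enum itself.
allowed = ", ".join(f"'{t.value}'" for t in LinkType)

conn = sqlite3.connect(":memory:")
conn.execute(f"""
    CREATE TABLE wn_rel_noun_verb_link (
        noun_synset_id INTEGER,
        verb_synset_id INTEGER,
        link_type      TEXT CHECK (link_type IN ({allowed}))
    )
""")


def add_link(conn, noun_synset_id, verb_synset_id, link_type):
    """Insert a noun-verb link; link_type must be a LinkType member."""
    conn.execute(
        "INSERT INTO wn_rel_noun_verb_link VALUES (?, ?, ?)",
        (noun_synset_id, verb_synset_id, link_type.value),
    )
```

An attempt to insert any other label directly is rejected by the constraint with sqlite3.IntegrityError.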

28)Table Name: wn_rel_noun_adjective_attribute_link

Purpose: To maintain the semantic relation between synsets, namely a noun synset and the associated adjective (attribute) synset that go together.

Sr. No. / Field Name / Data Type / Purpose
1 / noun_synset_id / Bigint(20) / Foreign key from the synset table. Points to a noun synset/concept.
2 / adjective_synset_id / Bigint(20) / Foreign key from the synset table. Points to an adjective synset/concept.

29)Table Name: wn_rel_adjective_modifies_noun

Purpose: To maintain the semantic relation between synsets, namely an adjective synset and the noun synset which it modifies.

Sr. No. / Field Name / Data Type / Purpose
1 / adjective_synset_id / Bigint(20) / Foreign key from the synset table. Points to an adjective synset/concept.
2 / noun_synset_id / Bigint(20) / Foreign key from the synset table. Points to a noun synset/concept.

30)Table Name: wn_rel_adverb_modifies_verb

Purpose: To maintain the semantic relation between synsets, namely an adverb synset and the corresponding verb synset which it modifies.

Sr. No. / Field Name / Data Type / Purpose
1 / adverb_synset_id / Bigint(20) / Foreign key from the synset table. Points to an adverb synset/concept.
2 / verb_synset_id / Bigint(20) / Foreign key from the synset table. Points to a verb synset/concept which the adverb modifies.

31)Table Name: wn_rel_causative

Purpose: To maintain the causative semantic relation between synsets.

Sr. No. / Field Name / Data Type / Purpose
1 / synset_id / Bigint(20) / Foreign key from the synset table. Points to a synset/concept.
2 / causes_synset_id / Bigint(20) / Foreign key from the synset table. Points to a cause synset/concept.

32)Table Name: wn_rel_near_synsets

Purpose: To maintain the near synsets relation between synsets.

Sr. No. / Field Name / Data Type / Purpose
1 / synset_id / Bigint(20) / Foreign key from the synset table. Points to a synset/concept.
2 / near_synset_id / Bigint(20) / Foreign key from the synset table. Points to a near synset/concept of a given synset.

33)Table Name: wn_property_antonymy_gradation

Purpose: To maintain the properties on which relations such as antonymy and gradation are based, for example colour, gender etc.

Sr. No. / Field Name / Data Type / Purpose
1 / anto_grad_property_id / Decimal(4,0) / Primary key: Uniquely identifies a property type.
2 / anto_grad_property_value / Text / The name of the property like colour, gender etc.

34)Table Name: wn_property_meronymy_holonymy

Purpose: To maintain the properties that qualify meronymy-holonymy relations, for example component-object, feature-activity etc.

Sr. No. / Field Name / Data Type / Purpose
1 / mero_holo_property_id / Decimal(4,0) / Primary key: Uniquely identifies a property type.
2 / mero_holo_property_value / Text / The name of the property like component-object, feature-activity etc
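The property table is meant to be joined to the PART-WHOLE relation table through mero_holo_property_id. A minimal sketch of that join follows; sqlite3 is used only as a self-contained stand-in, and the sample ids and the helper name parts_of are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE wn_property_meronymy_holonymy (
    mero_holo_property_id    DECIMAL(4,0) PRIMARY KEY,
    mero_holo_property_value TEXT
);
CREATE TABLE wn_rel_meronymy_holonymy (
    whole_synset_id       INTEGER,
    part_synset_id        INTEGER,
    mero_holo_property_id DECIMAL(4,0)
        REFERENCES wn_property_meronymy_holonymy(mero_holo_property_id)
);
-- Illustrative rows: synset 10 = tree (whole), synset 11 = leaf (part).
INSERT INTO wn_property_meronymy_holonymy VALUES (1, 'component-object');
INSERT INTO wn_rel_meronymy_holonymy VALUES (10, 11, 1);
""")


def parts_of(conn, whole_synset_id):
    """Return (part synset id, property name) pairs for a whole synset."""
    return conn.execute(
        """
        SELECT r.part_synset_id, p.mero_holo_property_value
        FROM wn_rel_meronymy_holonymy AS r
        JOIN wn_property_meronymy_holonymy AS p
          ON p.mero_holo_property_id = r.mero_holo_property_id
        WHERE r.whole_synset_id = ?
        ORDER BY r.part_synset_id
        """,
        (whole_synset_id,),
    ).fetchall()
```

Storing the property name once and referencing it by id keeps the relation rows small and makes the set of permitted properties easy to list.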

35)Table Name: wn_relation_types