Naming Conventions for CV and Ontologies, Draft v7, Metabolomics Standards Initiative

13.10.2006, Ontology Working Group

Naming Conventions for Controlled Vocabularies (CVs) and Ontologies

1 Rationale for this document

2 (Meta-) Reference Terminology

3 General principles for creating representational artifacts

3.1 Univocity

3.2 Positivity

3.3 Objectivity

3.4 Try to avoid multiple parenthood and multiple inheritance

4 Naming Classes

4.1 Class name precision

4.2 Synonyms

4.2.1 Different sorts of Synonyms?

4.2.2 Property synonyms

4.3 Lexical Properties of class names

4.3.1 Capitalisation

4.3.2 Character set

4.3.2.1 Character set formatting

4.3.3 Word separators

4.3.3.1 Hyphens, dash and slash

4.3.4 Singular nouns

4.3.5 Use present tense for representational units

4.3.6 Plurals and sets

4.3.7 Avoid linguistic ellipses

4.3.8 Acronyms and Abbreviations

4.3.9 Registered Product- and Company-names

4.3.10 Word compositions and length

4.3.10.1 Compound vs. atomic names for representational units

4.3.10.2 Splitting and merging classes

4.3.11 Affixes (prefix, suffix, infix and circumfix)

4.3.12 Logical connectives

4.3.13 "Taboo" words and Characters

4.3.14 Specific language requirements

5 Depicting representational units within text

6 Class definitions

6.1 General rules for creating sound normalized definitions

7 Unique identifiers

7.1 Capturing the class name and ID using the autoID plugin in Protégé-owl

7.2 Life science Identifier, (LSID:

8 Namespace

9 Ontology Imports in Protégé-owl

9.1 The “lang” attribute issue

9.1.1 Import

9.1.1.1 Importing from repositories (extracted from the Protégé wiki)

9.1.1.2 Changing the imported ontology to be the newest updated version

10 Properties (Attributes and Relations)

10.1 Assigning "key-properties" to top level classes

11 Naming of Ontology files and Ontology Versions

12 References

1 Rationale for this document

This document defines naming conventions for controlled vocabularies (CVs) and ontologies. Metadata annotation elements are not covered here; these are addressed in the <Metadata Annotations for Representational Units and Representational Artefacts> document [1].

These recommendations have been developed to guide the activities of the Metabolomics Standards Initiative (MSI) [2] Ontology Working Group (OWG) [3].

The MSI OWG seeks to facilitate the consistent description of metabolomics experiment components by reaching a consensus on a core set of CVs and then developing an ontology. The CVs are developed in close collaboration with the HUPO Proteomics Standards Initiative (PSI) [4] and structured as taxonomies in OWL and OBO format. The ontology is developed as part of the Ontology for Biomedical Investigations (OBI, previously ‘FuGO’) [5], a larger, multi-domain collaborative effort.

These naming conventions are also used in the context of the OBI, developed in OWL.

2 (Meta-) Reference Terminology

Knowledge representations (KR, also called representational models) are referred to with the term ‘representational artefact’ (RA). A representational artefact is made of related ‘representational units’ (RU, also known as KR-idioms), in most cases classes and properties. We recommend using the term ‘class’ to refer to the representational unit that models a ‘universal’ in an ontological representational artefact. Each class has a ‘class name’, a term (string) that designates the class. An ‘instance’ is the representation of a ‘particular’ in reality. A particular instantiates a universal and an instance instantiates a class. Properties of universals are represented through representational units called ‘properties’. Properties which have fillers of simple datatypes (e.g. integer, string, boolean, ...) are called ‘attributes’ or ‘datatype properties’. Properties which have classes or instances as their fillers (also called ‘range’) are called ‘relations’ or ‘object properties’. Confusingly, other formats use the word "property" for restrictions. The word ‘domain’ can mean the group of classes to which a property is asserted (in OWL), but it can also describe the area of interest of a representational artefact.

For a detailed recommendation have a look at the full paper:

The following key words “MUST,” “MUST NOT,” “REQUIRED,” “SHALL,” “SHALL NOT,” “SHOULD,” “SHOULD NOT,” “RECOMMENDED,” “MAY,” and “OPTIONAL” are to be interpreted as described in RFC-2119, see S. Bradner, Key words for use in RFCs to Indicate Requirement Levels, Internet Engineering Task Force, RFC 2119, March 1997.

Sections in Brackets [...] are comments for the editor. Please ignore these.

3 General principles for creating representational artifacts

Become acquainted with the capabilities and limitations of both the representation formalism and its implementation (an ontology engineering tool) of your choice.

Don’t get into 'analysis paralysis'! You will not get it right the first time! Sometimes one has to throw things away and start again. Do not fall into ‘naïve euphoria’ either. Not every freshly built piece of representation is an ontology worth bothering others with.

Save often! Always save to a new version number including the date. Protégé-OWL is not yet completely stable. Undo is difficult and bugs occasionally corrupt ontologies beyond retrieval.

General Ontology Engineering Axioms:

Every class has at least one instance

Distinct classes on the same level and leaf classes never share instances

3.1 Univocity

Names of RUs (including the ones for relations) should have the same meaning on every occasion of use and refer to the same universals and kinds of entities in reality. Each name should refer to exactly one RU, and each RU should represent exactly one entity in reality (a universal in the case of a class). In effect, a name should unambiguously refer to the same entity in reality. Note that this principle of univocity excludes homonyms, i.e. terms that are used as names of more than one RU. For example, if you use the term ‘cell’ as the name of the class representing (the type of) cells as found in all organisms, the same term should not be used as the name of a more specialized class representing (the type of) cells as found only in plants. Likewise, the term ‘part of’ should not be used to name more than one relation, e.g. partonomy, set membership, etc.

Furthermore:

Don’t confuse universals with ways of getting to know types

Don’t confuse universals with ways of talking about types

Don’t confuse universals with data about types
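The univocity requirement that each name designates exactly one RU can be checked mechanically. The following is a minimal sketch, assuming the names are available as a simple mapping from unit identifier to name; the function name `find_homonyms` and the identifiers `EX_001` etc. are hypothetical and used for illustration only.

```python
from collections import defaultdict


def find_homonyms(name_by_unit):
    """Return every name that designates more than one representational unit.

    name_by_unit maps a (hypothetical) unit identifier to its name.
    """
    units_by_name = defaultdict(list)
    for unit_id, name in name_by_unit.items():
        units_by_name[name].append(unit_id)
    # A name shared by two or more units is a homonym and violates univocity.
    return {name: sorted(ids) for name, ids in units_by_name.items() if len(ids) > 1}


# Illustrative data only; the identifiers are invented for this sketch.
units = {
    "EX_001": "cell",     # cells as found in all organisms
    "EX_002": "cell",     # cells as found only in plants
    "EX_003": "part_of",
}
homonyms = find_homonyms(units)
print(homonyms)
```

A check of this kind only finds identical strings; deciding whether two differently named units actually represent the same universal still requires a domain expert.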

3.2 Positivity

Complements of classes such as ‘non-mammal’ or ‘non-membrane’ are not necessarily themselves classes and don’t designate genuine universals. Similarly, do not represent the absence of a wing as the presence of the non-existence of a wing, e.g.: 'wing' has_status "absent". The positivity recommendation may need to be weakened; sometimes it can make sense to have e.g. an "ex-vivo" role or a “non-living_organism”.

3.3 Objectivity

No distinction without a difference. A child class must differ from its parent class in a distinctive way. A child class must share all the properties of its parent classes (inheritance principle) and have additional ones that the parents do not have. Each class must be defined by a formula which states the necessary and sufficient conditions for being an instance of the corresponding universal. The sibling classes under a given parent class should have differentiae which are really distinct. This means that the universals of these classes at least have distinct (ideally non-overlapping, i.e. single inheritance) extensions. The distinction between each pair of siblings must be explicitly represented (opposition principle).

Which universals exist is not a function of our biological knowledge. Be aware that terms such as ‘unknown’, ‘untypified’ or ‘unlocalized’ do not designate genuine universals. To characterize classes, formulate intrinsic properties (properties that are inherent to the universal represented by the RU) rather than extrinsic ones (properties that are asserted from outside, e.g. accession numbers). ‘Intrinsic’ describes a characteristic or property of some thing or action which is essential and specific to that thing or action, and which is wholly independent of any other object, action or consequence. A characteristic which is not essential or inherent is extrinsic (from

3.4 Try to avoid multiple parenthood and multiple inheritance

No class in the hierarchy should have more than one superclass. Multiple inheritance can generate subtle but systematic ambiguity in the meaning of formal relations like is_a and part_of within the ontology. One should not press the "is_a" relation into service to mean a variety of different things (see the univocity principle). Domain experts should build single-parenthood taxonomies of their views of reality. Other domain experts build the same for theirs, and only later will all these taxonomies be ‘multidimensionally’ aligned within OBO, resulting in secure common nodes that make consistent (!) multiple inheritance possible.

There are however many opinions on this issue and we might discuss this matter further, when we feel there is a real need for multiple parenthood.
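Single parenthood is easy to audit once the is_a hierarchy is available as a list of (child, parent) pairs. The sketch below assumes such a list has already been exported from the ontology; the function name and the example class names are hypothetical.

```python
def classes_with_multiple_parents(is_a_edges):
    """Return the classes that have more than one asserted superclass.

    is_a_edges is an iterable of (child, parent) pairs.
    """
    parents = {}
    for child, parent in is_a_edges:
        parents.setdefault(child, set()).add(parent)
    return {child: sorted(ps) for child, ps in parents.items() if len(ps) > 1}


# Illustrative edges only; these class names are invented for this sketch.
edges = [
    ("plant_cell", "cell"),
    ("cell", "anatomical_entity"),
    ("plant_cell", "plant_part"),   # second parent: flagged by the check
]
offenders = classes_with_multiple_parents(edges)
print(offenders)
```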

4 Naming Classes

Each class representing a universal in a representational artefact is labelled with a human-readable class name. Class names should be short, easy to remember and as self-explanatory as the pragmatic compromise allows. This class name should be used as the default browser key when navigating through the class hierarchy and should therefore be as intuitive as possible to the ontology engineer building the ontological structure. However, this class name will not necessarily be used as the main search attribute by end-users when they are searching for classes. For this purpose, a short and intuitive name should be captured as the preferred synonym: the term with the highest usage frequency in the literature of that domain, i.e. the term with the highest user acceptance. Use the name that is most widely accepted in the user domain. The class should represent and be named after the intrinsic, underlying nature of the universal to be represented, not according to extrinsic properties or roles a class can play in a particular context. Embodying the whole meaning of the class, with all its relationships to other classes, in its name is in most cases neither possible nor recommended. Keep semantics in the definitions and formalize them explicitly as properties and axioms. For example, a class “distinct_identifiable_physical_part” should just be called “physical_part”. For the preferred synonym, readability should have higher priority than constraining interpretation through the class name. For the class name that is used for ontology engineering (OE), it is the other way round.

Epistemological statements don't belong in class names, so avoid calling the class “instrument” “instrument_class” or the relation “has_part” “has_part_relation”.

4.1 Class name precision

Class names should be precise, concise and linguistically correct (i.e. they should conform to the rules of the language in question). Often terms for RUs are not precise, i.e. they do not capture the intended meaning. Imprecise terms are especially problematic in the absence of good definitions. For example, the term “anatomic_structure, system or substance” does not give us any clue as to whether the scope of the adjective “anatomic” is restricted to structure or extends also to system and substance. This ambiguity can lead to problems like the following: if “anatomic” is restricted to “structure” only, then “drug” and “chemical” would be classified under this class, since these are clearly substances. If it is not restricted, “drug” and “chemical” could not be classified under this class.

4.2 Synonyms

A strict definition of synonymy, as proposed e.g. by ISO 1087-1:2000, is: “… relation between or among terms in a given language representing the same concept, with a note to the effect that terms which are interchangeable in all contexts are called synonyms; if they are interchangeable only in some contexts, they are called quasi-synonyms.”

The number of synonyms for a class is not limited, and the same text string can be used as a synonym for more than one class. Add synonyms if you edit or delete a class name, but the old name is still a valid synonym, e.g. if you change "respiration" to "cellular_respiration", keep "respiration" as a synonym. This helps other users to find familiar classes. Add synonyms if the class name has (or contains) a commonly used abbreviation. Acronyms are synonymous with the full name as long as the acronym is not used in any other sense elsewhere. 'Jargon' type phrases are synonymous with the full name as long as the phrase is not used in any other sense elsewhere.
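The rule that a replaced class name is kept as a synonym can be sketched as follows. This is an illustration only: it assumes a class is held as a plain dictionary with a `name` and a list of `synonyms`, and that the editor has already decided the old name is still a valid synonym. The function name `rename_class` is hypothetical.

```python
def rename_class(entry, new_name):
    """Return a copy of entry under new_name, keeping the old name as a synonym.

    Assumes the caller has confirmed that the old name is still a valid synonym.
    """
    synonyms = list(entry["synonyms"])
    old_name = entry["name"]
    if old_name != new_name and old_name not in synonyms:
        synonyms.append(old_name)
    # The new class name itself should not also be listed as a synonym.
    synonyms = [s for s in synonyms if s != new_name]
    return {"name": new_name, "synonyms": synonyms}


renamed = rename_class({"name": "respiration", "synonyms": []}, "cellular_respiration")
print(renamed)
```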

To capture synonyms in OWL, one can use the rdfs:comment field and add a comma-separated list of synonyms after a “synonym: ” marker. Another way would be to create a new metaclass with a new string datatype property “has_synonyme” and derive all new classes from this new metaclass (see also ). This has the disadvantage that the whole ontology becomes OWL Full. Capturing synonyms in further rdfs:label fields has the disadvantage that, when several labels are present, it is not possible to know which one is the preferred class name (the human-readable class name to display as the browser key) and which is another kind of synonym. Usually the alphabetically first rdfs:label would be displayed.
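A tool reading the ontology has to recover the synonym list from the comment text. The following is a minimal sketch, assuming the “synonym: ” marker and its comma-separated list sit on a single line of the comment and that synonyms themselves contain no commas; neither assumption is stated in this document, and the function name is hypothetical.

```python
import re

# Matches the marker and captures the rest of the line (assumed layout).
SYNONYM_MARKER = re.compile(r"synonym:\s*(.*)", re.IGNORECASE)


def extract_synonyms(comment):
    """Return the synonyms listed after a 'synonym: ' marker in a comment string."""
    synonyms = []
    for line in comment.splitlines():
        match = SYNONYM_MARKER.search(line)
        if match:
            parts = match.group(1).split(",")
            synonyms.extend(p.strip() for p in parts if p.strip())
    return synonyms


comment_text = "Free-text remark about the class.\nsynonym: ornithine_cycle, urea cycle"
found = extract_synonyms(comment_text)
print(found)
```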

4.2.1 Different sorts of Synonyms?

As we saw above, synonyms are not always 'synonymous' in the strictest sense of the word, as they do not always mean exactly the same as the class they are attached to. A synonym may be broader or narrower in meaning than the class name; it may be a related phrase or an alternative wording or spelling, or it may use a different system of nomenclature. Having a single, broad relationship between a class and its synonyms is adequate for most search purposes, but for applications such as semantic matching, the inclusion of a more formal relationship set is valuable. For this reason, one could record a relationship type for each synonym, as GO does. Such relationships can be stored in the OBO format flat file.

Synonym types:

Some synonym relationship types are:

* the term is an exact synonym to the class name, “ornithine_cycle” is an exact synonym of “urea_cycle”

* the term is related to the class name, “cytochrome_bc1_complex” is a related synonym of “ubiquinol-cytochrome-c_reductase_activity”

* the synonym is broader than the class name, “cell division” is a broad synonym of “cytokinesis”

* the synonym is narrower or more precise than the class name, “pyrimidine-dimer_repair_by_photolyase” is a narrow synonym of “photoreactive_repair”

* the synonym is related to the class name, but is not exact, broader or narrower, “virulence” has the synonym type ‘other related’ with respect to the class name “pathogenesis”

However, we do not recommend capturing such ‘synonym types’ as the GO style guide suggests. Capture only exact synonyms.

For the OWL format one could use the W3C standard for thesauri, the ‘Simple Knowledge Organisation System’ (SKOS), to encode synonym types through relations like “narrower than” and “broader than”. It also provides “preferred label” and "related to" elements for terminological mapping:

The SKOS Core Vocabulary includes the following properties for asserting semantic relationships between concepts: skos:semanticRelation, skos:broader, skos:narrower and skos:related. In a property hierarchy semanticRelation is the top semantic relationship and others are children relationships. To assert that one concept is broader in meaning (i.e. more general) than another, where the scope (meaning) of one falls completely within the scope of the other, use the skos:broader property. To assert the inverse, that one concept is narrower in meaning (i.e. more specific) than another, use the skos:narrower property.

<skos:Concept rdf:about="

<skos:broader rdf:resource="

</skos:Concept>

For example:

<rdf:RDF

xmlns:rdf="

xmlns:skos="

<skos:Concept rdf:about="

<skos:prefLabel>mammals</skos:prefLabel>

<skos:broader rdf:resource="

</skos:Concept>

<skos:Concept rdf:about="

<skos:prefLabel>animals</skos:prefLabel>

<skos:narrower rdf:resource="

</skos:Concept>

</rdf:RDF>

When you add a synonym in OBO-format using OBO-Edit, choose a type from the pull-down selector (see the DAG-Edit user guide for more information). DAG-Edit will incorporate the synonym type into the OBO format flat file when you save. The default synonym type is the broadest, 'synonym' (equivalent to 'related' above).

4.2.2 Property synonyms

One can also create object property synonyms (see section 4.1 of e.g:

<owl:ObjectProperty rdf:ID="has_child">

<owl:equivalentProperty>

<owl:ObjectProperty rdf:ID="has_kid"/>

</owl:equivalentProperty>

</owl:ObjectProperty>

4.3 Lexical Properties of class names

4.3.1 Capitalisation

Names should be in lower case letters throughout, except for acronyms, which are capitalised (if their use in class names can't be avoided), and proprietary names, which are written as such. Proper names and brand names can break the convention rules unless rdf-field restrictions prevent this. E.g. there can be a "CBS_station" (starting with a capital letter) and there can be a CamelCase brand name. This is the recommendation of the OBO-Consortium. The other KR domains (semantic web / OWL, Protégé group) start class names with a capital letter, while proprietary names and properties start with lower case letters.

Internal capitalization is, however, enforced by some computer systems and mandated by the coding standards of many programming languages; e.g. Java coding style dictates that UpperCamelCase be used for classes and lowerCamelCase for instances and members. So unless you plan to use auto-generated Java classes or any MDA approach to convert the ontology into software code, avoid CamelCase.
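The capitalisation rule can be turned into a simple lint check. The sketch below assumes that words in a name are separated by underscores, treats an all-upper-case word as an acronym, and takes the set of permitted proprietary names from the caller. The function name and the brand name `ExampleBrand` are hypothetical.

```python
def capitalisation_issues(name, proprietary=frozenset()):
    """Return the words of name that break the lower-case convention.

    A word passes if it is all lower case, all upper case (treated as an
    acronym), or listed in the caller-supplied set of proprietary names.
    """
    issues = []
    for word in name.split("_"):
        if word in proprietary:
            continue
        letters = [ch for ch in word if ch.isalpha()]
        if not letters:
            continue  # purely numeric or empty fragments carry no case
        all_lower = all(ch.islower() for ch in letters)
        all_upper = all(ch.isupper() for ch in letters)
        if not (all_lower or all_upper):
            issues.append(word)
    return issues


print(capitalisation_issues("cellular_respiration"))
print(capitalisation_issues("MassSpectrometer"))
print(capitalisation_issues("ExampleBrand_column", proprietary={"ExampleBrand"}))
```

The check cannot tell a genuine acronym from an arbitrary upper-case word, so flagged and unflagged names still need a human review.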

4.3.2 Character set

Terms designating RUs should consist mainly of alphabetic characters, numerals and underscores. Whether you will be allowed to use the space as a word delimiter depends on the way the implementation handles the strings for the representational unit in question. Avoid special characters where possible. Avoid accents, sub- or superscripts, and characters and character combinations that may have a special meaning in regular expressions, programming languages or XML. These recommendations are largely dependent on what the parsers for the implementation format of the specific RU can handle, e.g. OWL identifiers (values of the rdf:ID / :NAME property) must begin with a letter or underscore and contain only letters, numerals, and the underscore character (‘_’). Spaces are not allowed here. For the full less restrictive specification see

NCNameStartChar ::= Letter | '_'

NCNameChar ::= NameChar - ':'

( NameChar ::= Letter | Digit | '.' | '-' | '_' | ':' | CombiningChar | Extender )
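The stricter identifier rule stated in the paragraph above (start with a letter or underscore, then only letters, numerals and underscores) can be checked with a regular expression. This is a minimal sketch that assumes ASCII letters only; it deliberately does not implement the full, less restrictive NCName grammar, which also admits '.', '-', combining characters and extenders. The function name is hypothetical.

```python
import re

# Strict rule from the text, restricted to ASCII letters (an assumption).
STRICT_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_identifier(name):
    """Return True if name satisfies the strict identifier rule."""
    return STRICT_IDENTIFIER.fullmatch(name) is not None


for candidate in ["physical_part", "_tmp1", "2D_gel", "cell division",
                  "ubiquinol-cytochrome-c_reductase_activity"]:
    print(candidate, is_valid_identifier(candidate))
```

Names containing hyphens, such as the last candidate, are rejected by this strict rule even though the NCName grammar would accept them.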