User's Guide to Teknowledge Ontologies

Deborah Nichols and Allan Terry

Teknowledge Corp.

December 3, 2003

This work was sponsored by the Air Force Research Laboratory under contract F30602-02-C-0045.

Table of Contents

I. What Is an Ontology?

II. Teknowledge Ontologies

A. SUMO

B. MILO

C. Domain Ontologies

D. Knowledge Interchange Format

III. Elements of an Ontology: Terms

A. Classes

B. Predicates

C. Other Types of Terms

IV. Ontologies and Logic: Sentences

A. Logical Operators

B. Predicate Logic

C. Quantification

D. Inference Rules

E. First-Order vs. Higher-Order Logic

F. Sentences as Terms: Metaknowledge

V. How to Read a Teknowledge Ontology

A. Ontology File Layouts

B. Strategies for Understanding Ontologies

VI. Using Ontologies

A. From Ontology to Knowledge Base

B. Uses of Ontologies

Appendix A: SUO-KIF Ontology Quick Reference

Appendix B: The Plane Shapes Ontology

References

I. What Is an Ontology?

There are many definitions of ontology. Here we are concerned with ontologies used in intelligent computer applications.[1] In this document, an ontology is a systematic formalization of concepts, definitions, relationships, and rules that captures the semantic content of a domain in a machine-readable format.

Ontologies created for computer applications are written in a formal language that is machine-readable. Formalized ontologies are instruments for capturing the meanings of concepts so that they may be used for improved, automated management of information. Ontologies may cover very general concepts or (more often) represent specific and restricted domains. The selection of concepts and their level of detail will depend on the characteristics of the domains to be covered and the operations needed. For example, a map program may use a simple representation for bridges, while a traffic-planning program needs a more complex concept in which bridges have lanes, access ramps, etc. A program whose purpose is to aid civil engineers in the design of bridges will require an even more in-depth representation of bridges, their parts, and even the forces that affect them. The purpose of an ontology influences both its scope and its degree of formal complexity.

A notable exception to the domain-specific design of ontologies is the field of Upper Ontology. Upper ontologies identify and define general concepts. Their purpose is to serve as:

(a) a foundation for more specialized ontologies;

(b) a framework for integrating domain-oriented ontologies; and/or

(c) [...] domain(s) but use a different vocabulary.

Some notable upper ontologies offering coverage of very general terms include the Suggested Upper Merged Ontology (SUMO) discussed below, the Descriptive Ontology for Linguistic and Cognitive Engineering (DOLCE), and the Cyc Upper Ontology.[2] Ontology applications by Teknowledge use the SUMO upper ontology, for all three of the reasons listed above.

General-purpose mid-level ontologies reach further toward the domain level in specificity but are still general enough to support the functions listed above. The following diagrams illustrate some examples of how upper and mid-level ontologies may be used in relation to specialized domain ontologies. Figure 1 shows a specialized ontology linked to appropriate parts of an upper-level ontology. The upper-level ontology provides a foundation into which more specific concepts are set. Definitions for the terms “Vehicle,” “Region,” etc., do not have to be created from scratch because they already exist in the upper ontology. Figure 2 shows several specialized ontologies being consolidated by using upper and mid-level ontologies as an integration framework. Such systematic integration enables the reuse of multiple, readily available, standard classification schemes to create a comprehensive ontology of transportation devices.

Figure 1. Upper Ontology as Foundation

Figure 2. Upper (and Mid-level) Ontology as Integration Framework

II. Teknowledge Ontologies

Teknowledge creates upper, mid-level, and domain ontologies. Our aim is to create ontologies that are relatively easy for people to understand and practical for computational use. This section describes our knowledge engineering products at all three levels.

A. SUMO

The Suggested Upper Merged Ontology (SUMO) contains the broadest and most abstract concepts needed for ontologies produced by Teknowledge's Knowledge Systems group. SUMO’s purpose is to promote data interoperability, information search and retrieval, automated inference, and natural language processing. SUMO provides an upper-level starting point for further ontology work at Teknowledge and elsewhere. It is freely available under GNU licensing.

SUMO was created, and is maintained, by Ian Niles.[3] It was developed as an effort of the IEEE Standard Upper Ontology (SUO) Working Group, and its development was guided by feedback from group members. SUMO is one of only three officially approved starter documents of the SUO Working Group. It has also been proposed to the IEEE as a general standard.

The charter of the SUO Working Group was to create a compact set of concepts that would be easy to understand and use. In line with this mandate, SUMO contains terms chosen to cover the most general concepts needed to represent the world. Approximately 4,000 assertions – including over 800 rules – define its 1,000 concepts. SUMO terms are mapped to all 100,000 WordNet synonym sets. Some of the general topics covered in SUMO include:

Structural concepts such as “instance” and “subclass”
General types of objects and processes
Abstractions including set theory, attributes, and relations
Numbers and measures
Temporal concepts, such as “duration”
Parts and wholes
Agency and intentionality.

The structural relations defined in SUMO, in particular “subclass,” are basic for defining any ontology, including SUMO itself.

SUMO is available for downloading from the Teknowledge ontology site. A detailed “Overview of the SUMO” is available under the Supporting Material section. It outlines and discusses the topics and terms included in SUMO. Like other Teknowledge ontologies, SUMO is written in the SUO-KIF format discussed below in section II.D.

B. MILO

The Mid-level Ontology (MILO) is Teknowledge's current effort for creating a bridge between the high-level abstractions of SUMO and the domain ontologies. MILO’s coverage is governed by a pragmatic standard. It contains all concepts mentioned at least three times in the Brown Corpus. (The Brown Corpus is a well-known compilation of one million words drawn from a variety of printed sources like books and newspapers.[4]) This criterion guarantees that MILO covers the concepts people actually use, rather than some prescriptive set. MILO concepts are intermediate in that they are mapped directly to language used in the everyday world. For example, SUMO defines change of possession in an abstract sense, while MILO contains specializations for such familiar concepts as renting and charging a fee. The Mid-level Ontology is divided into three modules: Objects, Processes, and Abstract (depending upon which SUMO concept the module inherits from). MILO will have on the order of 2,500 concepts. Estimated completion of MILO is in early 2004.

C. Domain Ontologies

Using the foundation of its upper and mid-level ontologies, Teknowledge creates specialized ontologies called domain ontologies that cover topical bodies of knowledge. Domain ontologies may be subdivided into domain theory ontologies and domain data ontologies.

Domain theory ontologies represent subject areas, such as geography, government, transportation, and weapons of mass destruction (WMD). Domain theories add the concepts needed to cover their subject matters. For example, while the upper-level SUMO includes the basic term “Nation,” the geography domain ontology incorporates terms representing bodies of water, climate zones, and landforms. They also contain formalized rules for deep coverage of the concepts and knowledge germane to each topic.

Domain data ontologies contain mainly instance-level information. For example, data ontologies that cover geography may represent the geographic locations of all the United Nations members and their salient geographic features, including mountains and rivers. Another example is Teknowledge’s terrorism ontology, which represents factual information about individual terrorists, terrorist groups, and their actions. Data ontologies resemble databases in their content, but they are expressed in vocabulary that links them to the semantic framework provided by domain theories plus mid-level and upper-level ontologies.

Teknowledge takes a modular approach to knowledge engineering, with upper-level, mid-level, domain theories, and domain data ontologies encoded in separate KIF files. Such modular design allows ontologies to be utilized as needed in an application. This facilitates a flexible and efficient construction of knowledge bases (KBs) tailored to the application at hand. It eliminates the distraction of having concepts irrelevant to the task in a KB, as well as the overhead of using extra memory and extra search space when querying. A laborsaving benefit of modularity is the ability to reuse existing ontologies (including SUMO, MILO, and relevant domain ontologies) when building a new application.

D. Knowledge Interchange Format

Teknowledge's ontologies are encoded in a formal language called SUO-KIF. (“SUO” stands for Standard Upper Ontology, and “KIF” abbreviates Knowledge Interchange Format.) SUO-KIF is a simplification of the full Knowledge Interchange Format proposed as an American National Standard.[5] Readers having some background in logic will recognize SUO-KIF as a First Order Logic (FOL) language using prefix notation.

The logic sentences of an ontology may also be called assertions. In this document, “logic sentence,” “sentence,” and “assertion” will be used interchangeably when describing the discrete logical expressions that make up ontologies. Logic sentences having the form of “if-then” expressions constitute a very important subclass of assertions. “If-then” assertions are referred to as “rules,” “implication rules,” or simply “implications.”
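To make the idea of prefix notation concrete, the following minimal Python sketch reads a parenthesized sentence into a nested list and checks whether it is a rule. It is an illustration only, not a SUO-KIF parser: the helper names (tokenize, parse, is_rule) are hypothetical, the two sentences are invented examples rather than lines from a Teknowledge file, and the sketch assumes well-formed input with “=>” as the implication operator.

```python
def tokenize(text):
    """Split a prefix-notation sentence into parentheses and symbols."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Build a nested list from the token list, consuming it from the front.

    Assumes balanced parentheses; no error handling is attempted.
    """
    token = tokens.pop(0)
    if token == "(":
        items = []
        while tokens[0] != ")":
            items.append(parse(tokens))
        tokens.pop(0)  # discard the closing parenthesis
        return items
    return token


def is_rule(sentence):
    """Treat an assertion as a rule when its leading operator is "=>"."""
    return isinstance(sentence, list) and bool(sentence) and sentence[0] == "=>"


# Illustrative sentences (assumptions, not quoted from any ontology file).
fact = parse(tokenize("(subclass Square Rectangle)"))
rule = parse(tokenize("(=> (instance ?X Square) (instance ?X Rectangle))"))

print(fact, is_rule(fact))
print(rule, is_rule(rule))
```

In both sentences the operator or predicate comes first and its arguments follow, which is all that “prefix notation” means here.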

Teknowledge ontologies may be exported to the OWL Web Ontology Language being designed by the W3C Ontology Working Group.[6] It should be noted that OWL expresses only a subset of what can be expressed in SUO-KIF. In particular, OWL does not yet express rules. While SUO-KIF is suited for both human and machine consumption, OWL is mainly oriented towards use by computers.

The following sections explain in detail the form and contents of Teknowledge ontology files. Examples of SUO-KIF syntax are included throughout the discussion.

III. Elements of an Ontology: Terms

The elements of an ontology are called terms. Terms represent the basic entities covered in an ontology. Terms are the conceptual building blocks of the ontology – its vocabulary – while logic provides its structure (see section IV). Identifying the terms for an ontology is the first step in covering a domain. For any particular ontology, the aim is to create exactly the terms that are needed to capture the theory or source data, neither more nor less. The terms of an ontology, together with their definitions and other assertions about them, formally encode its domain of knowledge.

In Teknowledge’s SUO-KIF ontologies, terms belong to one of five basic types. A term is a class, relation, function, attribute, or individual. Each of these types is discussed below. The two most important kinds of terms in an ontology are the classes into which individuals can be categorized and the predicate relations that are used to create links between classes (as well as between other kinds of terms).

The different types of terms play different syntactic roles in SUO-KIF assertions, and they may be distinguished – and recognized – on that basis. In addition, we adopt some naming conventions to differentiate types of terms at sight. These conventions are not part of the formal syntax of SUO-KIF, but they are reliable conventions in Teknowledge ontologies. The most salient ones are based on capitalization. The names of classes, attributes, functions, and individuals are capitalized, as in these examples: “PassengerShip” (a class), “Titanic” (an individual), “Unsinkable” (an attribute), and “TripFn” (a function). All predicates, and only predicates, begin with lower-case letters, e.g., “subclass” and “instance.” (Predicates are one subtype of KIF relations, functions are another; the former are not capitalized, while the latter are.) Note that white space is not allowed within the names of terms (e.g., “PassengerShip,” “US_President”). Class names typically use the singular noun (e.g., “Ship” rather than “Ships”).
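The capitalization and white-space conventions just described can be checked mechanically. The short Python sketch below is one way to do so; the function names are hypothetical, and the check covers only the two conventions stated above (no white space in a name, and a lower-case first letter for predicates and only for predicates).

```python
def is_valid_term_name(name):
    """Term names must be non-empty and may not contain white space."""
    return bool(name) and not any(ch.isspace() for ch in name)


def looks_like_predicate(name):
    """Predicates, and only predicates, begin with a lower-case letter."""
    return is_valid_term_name(name) and name[0].islower()


# Example names taken from the paragraph above.
names = ["PassengerShip", "Titanic", "Unsinkable", "TripFn", "subclass", "instance"]
for name in names:
    if looks_like_predicate(name):
        kind = "predicate"
    else:
        kind = "class, attribute, function, or individual"
    print(f"{name}: {kind}")
```

Capitalization alone separates predicates from everything else; telling a class from an individual or an attribute still requires reading the assertions in which the term appears.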

In what follows, each type of term is discussed. In part B, the important relations called Ontology Linking Predicates (OLPs) receive special attention, because they are used to create the logical structure of an ontology. An example ontology of two-dimensional shapes is developed throughout for purposes of illustration.

A. Classes

The conceptual scope of an ontology comes mainly from its classification scheme. Classes are used to summarize general knowledge about a kind of thing, because classes can efficiently represent characteristics that may apply to many individuals. Concepts that are good candidates for representation with classes are those that (a) define natural categories or (b) are salient, essential, and/or permanent qualities that individuals may have. Characteristics that define natural kinds, such as being human or being a dog, are represented as classes (e.g., “Human”) whose instances have that quality. Classes are like generic nouns that are applied to distinct, named or nameable, individuals (e.g., human, dog, company). An individual member of a class has the general character of the class, but, as an individual, it may have other characteristics and relationships as well. Individual members of a class can be described in some detail, including in some cases their location in time and space, their ages, their other characteristics, and their relations to other individuals in the same or other classes.

Classes are used to relate concepts within a domain theory of knowledge. The most basic relationship is the subordination of one class to another. For example, within the domain of living things, we can state which classes are subtypes of others, e.g., the class DomesticDog is a subtype of the class Canine. Subtype relations can be used to create a hierarchical tree of subclasses, called a “taxonomy.”

Data-level facts are represented by creating named individuals that belong to a class and relating them to other individuals or classes. Some examples of such information (in this case, fictional) are: Lassie is a DomesticDog; Lassie is a Collie; Timmy is a Human; Lassie belongs to Timmy; and, Timmy loves Lassie.
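As a rough illustration, the facts above can be written as simple tuples in which the predicate comes first. This Python sketch is not how a Teknowledge knowledge base is stored; the predicate names “belongsTo” and “loves” are hypothetical stand-ins chosen for the example, and the helper functions are assumptions made for illustration.

```python
# Each fact is a (predicate, argument, argument) tuple.
facts = {
    ("subclass", "DomesticDog", "Canine"),
    ("instance", "Lassie", "DomesticDog"),
    ("instance", "Lassie", "Collie"),
    ("instance", "Timmy", "Human"),
    ("belongsTo", "Lassie", "Timmy"),  # hypothetical predicate name
    ("loves", "Timmy", "Lassie"),      # hypothetical predicate name
}


def classes_of(individual):
    """Return the classes an individual is directly asserted to belong to."""
    return sorted(c for (p, x, c) in facts if p == "instance" and x == individual)


def relations_of(individual):
    """Return the non-classification facts that mention the individual."""
    structural = ("instance", "subclass")
    return sorted(f for f in facts if f[0] not in structural and individual in f[1:])


print(classes_of("Lassie"))
print(relations_of("Lassie"))
```

The same individual can appear in several facts: some place it in classes, while others relate it to other individuals.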

Each class needs a clear definition that captures its meaning. Human users will often interpret the meaning of a term based on its name or documentation, but an inference engine can only use what has been formally encoded. So it is preferable that a class definition be formalized in SUO-KIF, using at least the subclass relation introduced in the next section. In most of Teknowledge’s ontologies, fuller definitions are formalized by adding inference rules (which are introduced in section IV.D).

(1) Scope

Class terms represent the basic categories of any ontology. These are the types into which individuals in the domain can be classified. An important objective when constructing an ontology is to cover all the terms needed and to distinguish the terms from one another clearly.

The example to be developed, an ontology of plane geometric figures, will include the following terms in its scope:

PlaneShapes

Circles

EquilateralTriangles

Trapezoids

IsoscelesTriangles

Rhombuses

ScaleneTriangles

Quadrilaterals

Parallelograms

Ellipses

Triangles

Rectangles

Squares

(2) Hierarchy

Once the basic terms of an ontology are identified, they must be properly related to one another. Determining and stating the hierarchical relationships between classes is a fundamental step in building an ontology. This encodes domain knowledge about which concepts are more general and which are specializations. An ontology may contain one or more root terms and thus may contain one or more hierarchies, which may be distinct or overlapping. In a formalized ontology, the hierarchy of subclass relations is stated in the formal language (see section B, below).

For human users, a graphical presentation of hierarchy may be easier to grasp. So initially we present the structure of our sample ontology graphically in an indented list. The arrangement below shows relationships of more general to more specific classes in the two-dimensional shapes ontology. There is a single root concept, PlaneShapes.

PlaneShapes
    Circles
    Ellipses
    Triangles
        EquilateralTriangles
        IsoscelesTriangles
        ScaleneTriangles
    Quadrilaterals
        Parallelograms
            Rectangles
                Squares
            Rhombuses
        Trapezoids
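The same hierarchy can be held as data and the indented list generated from it. The Python sketch below assumes a particular nesting: Circles and Ellipses are placed directly under PlaneShapes, Squares under Rectangles, and Rhombuses beside Rectangles under Parallelograms. The dictionary layout and the function name are illustrative choices, not part of any Teknowledge tool.

```python
# Map each class to its direct subclasses (nesting is an assumption,
# chosen to match the order of the list above).
children = {
    "PlaneShapes": ["Circles", "Ellipses", "Triangles", "Quadrilaterals"],
    "Triangles": ["EquilateralTriangles", "IsoscelesTriangles", "ScaleneTriangles"],
    "Quadrilaterals": ["Parallelograms", "Trapezoids"],
    "Parallelograms": ["Rectangles", "Rhombuses"],
    "Rectangles": ["Squares"],
}


def indented(root, depth=0):
    """Return the hierarchy below root as indented lines, parents first."""
    lines = ["    " * depth + root]
    for child in children.get(root, []):
        lines.extend(indented(child, depth + 1))
    return lines


print("\n".join(indented("PlaneShapes")))
```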

Some ontologies are no more than trees of terms linked by subclass relations. (These may be called ‘classification ontologies’.) Additional information about classes may include identifying disjointness, or exclusivity, of classes that may not share members, such as biological species. The PlaneShapes ontology contains some disjoint subclasses of figures, based upon their number of sides (e.g., Triangles, Quadrilaterals). There are also disjoint subclasses of Triangles, based on the relative lengths of their sides (i.e., having none, two, or three that are equal in length).
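Disjointness lends itself to a simple consistency check: no individual may belong to two classes that have been declared disjoint. The Python sketch below shows one way to test this. The two disjoint groups follow the paragraph above, while the individuals “ShapeA” and “ShapeB” and the function name are hypothetical.

```python
# Groups of classes that may not share members.
disjoint_groups = [
    {"Triangles", "Quadrilaterals"},
    {"EquilateralTriangles", "IsoscelesTriangles", "ScaleneTriangles"},
]

# Hypothetical individuals and the classes asserted for each.
memberships = {
    "ShapeA": {"Triangles", "ScaleneTriangles"},
    "ShapeB": {"Triangles", "Quadrilaterals"},  # deliberately inconsistent
}


def disjointness_violations(memberships, disjoint_groups):
    """List (individual, classes) pairs where disjoint classes share a member."""
    problems = []
    for individual in sorted(memberships):
        for group in disjoint_groups:
            overlap = memberships[individual] & group
            if len(overlap) > 1:
                problems.append((individual, sorted(overlap)))
    return problems


print(disjointness_violations(memberships, disjoint_groups))
```

ShapeA passes because it belongs to only one class from each group; ShapeB is flagged because Triangles and Quadrilaterals are disjoint.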

(3) Inheritance

One feature of hierarchies is that items that are lower in a hierarchy (that is, more specific) inherit characteristics from the items above them. Inheritance has two aspects. First, conceptually, the definition of a term lower in the hierarchy includes (or implies) the definitions of the terms directly above. For example, Squares, like the parent concept Rectangles, is by definition a class of PlaneShapes whose sides meet in right angles (plus its additional condition of having all sides equal). The second aspect of inheritance is that individuals belonging to a lower class have all the properties of the subsuming higher classes as well. For example, in the PlaneShapes example, all individuals that fall under any subclass of Quadrilaterals have four sides.
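Both aspects of inheritance can be sketched in a few lines of Python. The example below walks up the hierarchy from a class and collects the characteristics stated for each class along the way. It is a simplification under stated assumptions: each class has a single parent, the property strings paraphrase the definitions given above, and the names parent, own_properties, superclasses, and inherited_properties are hypothetical.

```python
# Direct parent of each class (single-parent simplification).
parent = {
    "Squares": "Rectangles",
    "Rectangles": "Parallelograms",
    "Rhombuses": "Parallelograms",
    "Parallelograms": "Quadrilaterals",
    "Trapezoids": "Quadrilaterals",
    "Quadrilaterals": "PlaneShapes",
}

# Characteristics stated directly for a class (paraphrased from the text).
own_properties = {
    "Quadrilaterals": {"has four sides"},
    "Rectangles": {"sides meet in right angles"},
    "Squares": {"all sides equal"},
}


def superclasses(cls):
    """Return all classes above cls, nearest first."""
    chain = []
    while cls in parent:
        cls = parent[cls]
        chain.append(cls)
    return chain


def inherited_properties(cls):
    """Combine a class's own properties with those of every superclass."""
    props = set(own_properties.get(cls, set()))
    for sup in superclasses(cls):
        props |= own_properties.get(sup, set())
    return props


print(superclasses("Squares"))
print(sorted(inherited_properties("Squares")))
print(sorted(inherited_properties("Trapezoids")))
```

Squares picks up the right-angle condition from Rectangles and the four-sides condition from Quadrilaterals, while Trapezoids inherits only the latter.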