Introduction to Schemas

The schema defines the elements that can appear within the document and the attributes that can be associated with an element. It also defines the structure of the document: which elements are child elements of others, the sequence in which the child elements can appear, and the number of child elements. It defines whether an element is empty or can include text. The schema can also define default values for attributes.

Consider a simple XML document that contains only three elements, "PGROUP," "PERSONA," and "GRPDESCR," as in the following example:

<?xml version="1.0" ?>

<PGROUP>

<PERSONA>MACBETH</PERSONA>

<PERSONA>BANQUO</PERSONA>

<GRPDESCR>generals of the king's army.</GRPDESCR>

</PGROUP>

In all documents of this type, the document element is "PGROUP." In these examples, "PGROUP" does not contain any text, but contains only one or more child elements named "PERSONA," and one child element named "GRPDESCR." The "PERSONA" and "GRPDESCR" elements contain only text and do not contain other elements.

One method for describing this schema is the Document Type Definition (DTD), a grammar that can be used to formally describe a particular schema. The DTD for the example document appears as follows:

<!DOCTYPE PGROUP [

<!ELEMENT PGROUP (PERSONA+, GRPDESCR) >

<!ELEMENT PERSONA (#PCDATA) >

<!ELEMENT GRPDESCR (#PCDATA) >

]>

For a formal description of the DTD syntax, see the Extensible Markup Language (XML) 1.0 specification.
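
To make the content model concrete, here is a sketch of a document that a validating parser would be expected to reject against this DTD. The internal DTD subset is copied from the sample above; the instance itself is an illustrative assumption and is not part of the original sample.

```xml
<?xml version="1.0"?>
<!DOCTYPE PGROUP [
  <!ELEMENT PGROUP (PERSONA+, GRPDESCR)>
  <!ELEMENT PERSONA (#PCDATA)>
  <!ELEMENT GRPDESCR (#PCDATA)>
]>
<!-- Well formed, but not valid: GRPDESCR comes before PERSONA,
     while the content model (PERSONA+, GRPDESCR) requires one or
     more PERSONA elements followed by exactly one GRPDESCR. -->
<PGROUP>
  <GRPDESCR>generals of the king's army.</GRPDESCR>
  <PERSONA>MACBETH</PERSONA>
</PGROUP>
```

Moving the GRPDESCR element after the PERSONA element would make the document valid again.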

Introduction to XML Schemas

A second, and more interesting, method is available to users of Internet Explorer 5. XML Schema, like DTD, can be used to specify the schema of a particular class of documents. Unlike DTDs, however, XML Schema uses XML syntax. This is convenient since you are not required to learn a completely new syntax just to describe your grammar—although you do need to learn how to declare elements and attributes using XML Schema. In addition, XML Schemas offer a number of other significant advantages over using DTDs, which are described in the following topics.

  • Content Model
  • Data Types
  • Extensibility
  • XML Schemas and the DOM

The XML Schema implementation that ships with Internet Explorer 5 is based primarily upon the XML-Data Note, posted by the W3C in January 1998, and the Document Content Description (DCD) note. XML Schemas in Internet Explorer 5 provide support for the subset of XML-Data that coincides directly with the functionality expressed in DCD, albeit in a slightly different XML grammar.

The XML Schema implementation provided with Internet Explorer 5 focuses on syntactic schemas, without support for inheritance or other object-oriented design features. The implementation provided with Internet Explorer 5 is described in the XML Schema Reference.

The following sample demonstrates how to use XML Schema to specify the schema for the sample document.

<?xml version="1.0"?>

<Schema name="schema_sample_1"

xmlns="urn:schemas-microsoft-com:xml-data"

xmlns:dt="urn:schemas-microsoft-com:datatypes">

<ElementType name="PERSONA" content="textOnly" model="closed"/>

<ElementType name="GRPDESCR" content="textOnly" model="closed"/>

<ElementType name="PGROUP" content="eltOnly" model="closed">

<element type="PERSONA" minOccurs="1" maxOccurs="*"/>

<element type="GRPDESCR" minOccurs="1" maxOccurs="1"/>

</ElementType>

</Schema>
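
A document instance has to point at the schema before the parser can validate against it. The sketch below uses the x-schema: namespace convention that appears in the book samples later in this topic; the file name pgroup-schema.xml is a hypothetical location for the schema above.

```xml
<?xml version="1.0"?>
<!-- Assumes the schema above is saved as pgroup-schema.xml next to this file -->
<PGROUP xmlns="x-schema:pgroup-schema.xml">
  <PERSONA>MACBETH</PERSONA>
  <PERSONA>BANQUO</PERSONA>
  <GRPDESCR>generals of the king's army.</GRPDESCR>
</PGROUP>
```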

Defining Elements and Attributes

The most important part of any schema is the formal specification of what elements and attributes are allowed within a particular class of document, and how those elements and attributes are related to each other. Elements and attributes are defined in an XML Schema document by specifying an <ElementType ...> and <AttributeType ...>, respectively. These provide the definition and type of the element or attribute. An instance of an element or an attribute is declared using <element ...> or <attribute ...> tags. This can be thought of as being analogous to declaring a typedef in the C programming language and then declaring a variable of that type.

<?xml version="1.0"?>

<Schema xmlns="urn:schemas-microsoft-com:xml-data">

<ElementType name="title" />

<ElementType name="author" />

<ElementType name="pages" />

<ElementType name="book" model="closed">

<element type="title" />

<element type="author" />

<element type="pages" />

<AttributeType name="copyright" />

<attribute type="copyright" />

</ElementType>

</Schema>

In this example, there are four <ElementType> elements: "title," "author," "pages," and "book." These are simply definitions of the elements. Within the ElementType for "book," however, you declare the content model for a book: each book contains "title," "author," and "pages" elements, each declared with an <element> whose type attribute references the corresponding ElementType. The example also defines an <AttributeType> for the copyright attribute and then declares its usage with an <attribute> element whose type attribute references that definition.

Note also that the definition of the copyright attribute is contained within the ElementType for "book." Attribute definitions are distinct from ElementType definitions in that they can be declared within the scope of an ElementType, allowing different element types to declare attributes of the same name but with (potentially) different meaning. <AttributeType> elements can be declared globally by placing them outside the context of an ElementType. In this way, multiple elements can share the definition of a common attribute without having to redeclare the AttributeType within each ElementType.
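
As a sketch of the global form, the fragment below moves the copyright AttributeType to the top level so that two element types can share it. The "magazine" element type is a hypothetical addition used only for illustration.

```xml
<?xml version="1.0"?>
<Schema xmlns="urn:schemas-microsoft-com:xml-data">
  <!-- Declared outside any ElementType, so the definition is global -->
  <AttributeType name="copyright" />

  <ElementType name="title" />

  <ElementType name="book" model="closed">
    <element type="title" />
    <attribute type="copyright" />
  </ElementType>

  <!-- Hypothetical second element type reusing the same definition -->
  <ElementType name="magazine" model="closed">
    <element type="title" />
    <attribute type="copyright" />
  </ElementType>
</Schema>
```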

Content Model

You can specify the content model for an ElementType by using <element> to reference other ElementTypes. For example:

<Schema xmlns="urn:schemas-microsoft-com:xml-data">

<ElementType name="title" content="textOnly"/>

<ElementType name="authors" content="textOnly"/>

<ElementType name="pages" content="textOnly"/>

<ElementType name="book" order="seq" content="eltOnly">

<element type="title" />

<element type="authors" />

<element type="pages" />

</ElementType>

</Schema>

In the above example, the content of a "book" element is defined to be the sequence of "title," "authors," and "pages" elements. This allows you to write the following valid XML document:

<book xmlns="x-schema:book-schema.xml">

<title>Applied XML: A Toolkit for Programmers</title>

<authors>Alex Ceponkus and Faraz Hoodbhoy</authors>

<pages>474</pages>

</book>

This is a very simple case. There are a number of mechanisms in XML Schema that allow you to build more complex content models, including being able to group elements and define cardinality.

Open Content Models

The most important innovation for content models in XML Schema is that content models are "open" by default. An open content model enables additional elements and/or attributes to be present within an element without having to declare each and every element in the XML Schema. This provides an extensibility mechanism not present when using Document Type Definitions (DTDs). For example:

<book xmlns="x-schema:book-schema.xml" xmlns:x="urn:some-new-namespace">

<title x:id="123">Applied XML: A Toolkit for Programmers</title>

<authors>Alex Ceponkus and Faraz Hoodbhoy</authors>

<pages>474</pages>

<x:publisher>Wiley Computer Publishing</x:publisher>

</book>

The following constraints apply on what is allowed in an open content model:

  1. You cannot add/remove content that will break the existing content model. For example, since <book> is defined as a sequence, the valid data must provide that exact sequence first, before adding any "open" content. So removing the <pages> element or providing two <title> elements next to each other would cause validation to fail.
  2. You can add undeclared elements so long as they are defined in a different namespace.
  3. You can add other elements declared in the same schema. For example, a second <title> element after the <pages> element will validate.
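
The third rule can be illustrated with a variation on the document above. This is a sketch that assumes the book schema shown earlier is available as book-schema.xml and keeps its default open content model.

```xml
<book xmlns="x-schema:book-schema.xml">
  <!-- The declared sequence must come first and be complete -->
  <title>Applied XML: A Toolkit for Programmers</title>
  <authors>Alex Ceponkus and Faraz Hoodbhoy</authors>
  <pages>474</pages>
  <!-- Extra content after the sequence: allowed because the model is open
       and "title" is declared in the same schema -->
  <title>A Toolkit for Programmers</title>
</book>
```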

You can change this default behavior and specify a closed content model in instances where an open content model is not desired. In this case the model attribute can be used on the ElementType. For example:

<ElementType name="book" model="closed">

This indicates that a "book" element can contain only the content specified: "title," "authors," and "pages" elements. In this case the <book> example above with the extended elements will not validate.
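
For context, a complete closed version of the book schema might look like the following sketch. It simply combines the earlier Content Model sample with model="closed" and adds nothing else.

```xml
<Schema xmlns="urn:schemas-microsoft-com:xml-data">
  <ElementType name="title" content="textOnly"/>
  <ElementType name="authors" content="textOnly"/>
  <ElementType name="pages" content="textOnly"/>
  <!-- model="closed" rejects undeclared child content such as x:publisher -->
  <ElementType name="book" order="seq" content="eltOnly" model="closed">
    <element type="title" />
    <element type="authors" />
    <element type="pages" />
  </ElementType>
</Schema>
```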

Content

An element can contain text, other elements, a mixture of text and elements, or nothing at all. Use the content attribute to specify what the element will contain. Possible values are "textOnly", "eltOnly", "empty", and "mixed". For example, to indicate that an element contains text but no sub-elements, write:

<ElementType name="street" content="textOnly" />

Content for an element can also be limited to contain only sub-elements. For example:

<ElementType name="PurchaseOrder" content="eltOnly">

The value "empty" for content indicates that text and sub-elements are not allowed. The value "mixed" allows both text and sub-elements.

If an ElementType specifies a data type, such as date or number, then "textOnly" is implied and does not need to be stated.
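
The four content values can be compared side by side in one schema. This is a minimal sketch; the element names "separator," "address," and "note" are hypothetical and chosen only to illustrate each value.

```xml
<Schema xmlns="urn:schemas-microsoft-com:xml-data">
  <!-- Text only: <street>1 Main St</street> -->
  <ElementType name="street" content="textOnly" />

  <!-- Neither text nor sub-elements: <separator/> -->
  <ElementType name="separator" content="empty" />

  <!-- Sub-elements only -->
  <ElementType name="address" content="eltOnly">
    <element type="street" />
  </ElementType>

  <!-- Text interleaved with sub-elements:
       <note>Deliver to <street>1 Main St</street> after noon</note> -->
  <ElementType name="note" content="mixed">
    <element type="street" />
  </ElementType>
</Schema>
```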

MinOccurs and MaxOccurs

The minOccurs and maxOccurs attributes specify how many times an element can appear within another element.

<element type="Item" maxOccurs="*" />

The maxOccurs attribute is a constraint rule, specifying the maximum number of times that a sub-element may appear. Valid values for maxOccurs include integers and "*", which indicates that an unrestricted number of elements may appear. The default value for maxOccurs is "1"; however, when content="mixed", the default value is "*".

Similarly, you can specify a minimum number of times a sub-element may appear with minOccurs. For example, to make a sub-element optional, set minOccurs to "0". The default value for minOccurs is 1.

These attributes can be used for both element and group declarations.
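
As a sketch of how the two attributes combine, the following schema requires at least one "Item" and allows at most one "comment." The "comment" element type is a hypothetical addition; the resulting content model corresponds roughly to the DTD content model (Item+, comment?).

```xml
<Schema xmlns="urn:schemas-microsoft-com:xml-data">
  <ElementType name="Item" content="textOnly" />
  <ElementType name="comment" content="textOnly" />

  <ElementType name="PurchaseOrder" content="eltOnly">
    <!-- One or more Item elements -->
    <element type="Item" minOccurs="1" maxOccurs="*" />
    <!-- Optional: zero or one comment element -->
    <element type="comment" minOccurs="0" maxOccurs="1" />
  </ElementType>
</Schema>
```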

Order

The order attribute specifies whether sub-elements are required to appear in a certain order, and whether only one sub-element of a set can appear.

The "seq" value indicates that sub-elements must appear in the order listed in the schema. For example:

<ElementType name="PurchaseOrder" order="seq">

The "one" value specifies that only one sub-element can be used from a list of sub-elements. For example, to specify that an "Item" element may contain either a "product" element or a "backOrderedProduct" element, but not both:

<ElementType name="Item" order="one">

<element type="product" />

<element type="backOrderedProduct" />

</ElementType>

The "many" value specifies that the sub-elements may appear in any order, and in any quantity.

The default value for order is "seq" when the content attribute is set to "eltOnly", and the default is "many" when content is set to "mixed".
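
The "many" value has no sample above, so here is a minimal sketch. The "Inventory" element type is a hypothetical name; the two sub-elements are reused from the "one" example.

```xml
<Schema xmlns="urn:schemas-microsoft-com:xml-data">
  <ElementType name="product" content="textOnly" />
  <ElementType name="backOrderedProduct" content="textOnly" />

  <!-- order="many": the sub-elements may appear in any order and any quantity -->
  <ElementType name="Inventory" content="eltOnly" order="many">
    <element type="product" />
    <element type="backOrderedProduct" />
  </ElementType>
</Schema>
```

Under this declaration, an "Inventory" element holding two "backOrderedProduct" elements followed by one "product" element would be expected to validate, as would one holding them in the opposite order.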

Group

The group element enables you to specify constraints on a specific set of sub-elements. For example, to indicate that the "Item" element has either a "product" or a "backOrderedProduct" element, and then a "quantity" and "price", you can use the following XML:

<ElementType name="Item">

<group order="one">

<element type="product" />

<element type="backOrderedProduct" />

</group>

<element type="quantity"/>

<element type="price"/>

</ElementType>

The group element accepts the order, minOccurs, and maxOccurs attributes.
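
Because a group accepts minOccurs and maxOccurs, a whole set of sub-elements can repeat as a unit. The following is a sketch under that assumption; the "Order" element type is hypothetical.

```xml
<Schema xmlns="urn:schemas-microsoft-com:xml-data">
  <ElementType name="product" content="textOnly" />
  <ElementType name="quantity" content="textOnly" />

  <ElementType name="Order" content="eltOnly" order="seq">
    <!-- The product/quantity pair, in that order, may repeat one or more times -->
    <group order="seq" minOccurs="1" maxOccurs="*">
      <element type="product" />
      <element type="quantity" />
    </group>
  </ElementType>
</Schema>
```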

Attributes

Attributes are more limited in some ways than elements. For instance, attributes cannot contain sub-elements, and you cannot require attributes to appear in any particular order; nor can you pose alternatives, such as a "product" or a "backOrderedProduct". You can specify whether an attribute is required or optional, but an attribute may only appear once per element.

However, attributes have some capabilities that elements do not: attributes may limit their legal values to a small set of strings, and may indicate a value to be inferred if the attribute is omitted from an element. Most importantly, different element types may have attributes with the same name. These attributes are considered to be independent and unrelated.

The following example specifies that the attribute is required:

<AttributeType name="shipTo" dt:type="idref" required="yes"/>

The following limits the value of an attribute to words from a small list:

<AttributeType name="priority" dt:type="enumeration" dt:values="high medium low" />

To give a default value to an attribute (that is, a value that an application should infer if the attribute is not present in the document instance) use the default attribute, as follows:

<AttributeType name="quantity" dt:type="int"/>

<attribute type="quantity" default="1"/>
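
The enumeration and default features can be combined. The sketch below reuses the "priority" attribute type from above and attaches it to an "Item" element type; the choice of "medium" as the default is an assumption made for illustration.

```xml
<Schema xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <AttributeType name="priority" dt:type="enumeration"
                 dt:values="high medium low" />

  <ElementType name="Item" content="textOnly">
    <!-- Optional attribute; "medium" is inferred when it is omitted -->
    <attribute type="priority" default="medium" />
  </ElementType>
</Schema>
```

With this schema, <Item priority="high"> should validate, an <Item> with no priority attribute should be treated as having the value "medium", and <Item priority="urgent"> should fail because "urgent" is not in the list.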

Data Types

Unlike a Document Type Definition (DTD), XML Schema allows you to specify a data type for an element or attribute. Data types indicate the format of the data, provide for validation of the type by the XML parser, and enable processing specific to the data type in the XML Document Object Model (DOM).

Data type support includes primitive data types common to programming languages as well as the special attribute types included in the XML Language Specification (for example, ID, IDREF, and NMTOKEN). A complete list of the data types can be found in the XML Data Types Reference.

To use the data type support included with Internet Explorer 5, your XML Schema must include the datatypes namespace. The top-level <Schema> element declaration would look like this:

<Schema name="myschema"

xmlns="urn:schemas-microsoft-com:xml-data"

xmlns:dt="urn:schemas-microsoft-com:datatypes">

<!-- ... -->

</Schema>

Within the <Schema> element, data types can be specified on an <ElementType> or <AttributeType> basis using one of two forms:

  • dt:type attribute
  • <datatype> element

Both forms are shown below in a declaration of the "pages" element; the two samples are equivalent.

<ElementType name="pages" dt:type="int"/>

<ElementType name="pages">

<datatype dt:type="int"/>

</ElementType>
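
Putting the namespace declaration and the dt:type form together, a typed version of the book schema might look like this sketch. The "published" element type is a hypothetical addition, and "string" and "date" are assumed to be among the types listed in the XML Data Types Reference.

```xml
<Schema name="myschema"
        xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <ElementType name="title" dt:type="string" />
  <ElementType name="pages" dt:type="int" />
  <!-- Hypothetical element holding a date such as 1999-03-18 -->
  <ElementType name="published" dt:type="date" />

  <ElementType name="book" content="eltOnly">
    <element type="title" />
    <element type="pages" />
    <element type="published" />
  </ElementType>
</Schema>
```

Against this schema, <pages>474</pages> should pass type validation, while <pages>about 500</pages> should be rejected because its content is not an integer.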

Note that although XML Schema support in Internet Explorer 5 allows data types to be specified within attributes, only the following data types are supported within attributes by the parser and DOM: string, id, idref, idrefs, nmtoken, nmtokens, entity, entities, enumeration, and notation.

In the following schema, the shipTo attribute has a data type of "idref."

<AttributeType name="shipTo" dt:type="idref"/>

<attribute type="shipTo"/>

An attribute whose data type is "idref" acts as a reference to the element whose "id" attribute has the matching value.

Two other related data types are "id" and "idrefs." An attribute of type "id" holds an identifying value in the document instance, a value that does not appear on any other id attribute in the document. The data type "idrefs" is similar to "idref," but it holds a list of ids, separated by spaces. For example,

<PurchaseOrder items="Item-1 Item-2">
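
A schema that supports the instance fragment above could be sketched as follows. The "Item" element type and the attribute name "id" are assumptions; only the "items" attribute comes from the fragment.

```xml
<Schema xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <!-- Unique identifier carried by each Item -->
  <AttributeType name="id" dt:type="id" required="yes" />
  <!-- Space-separated list of references to id values -->
  <AttributeType name="items" dt:type="idrefs" />

  <ElementType name="Item" content="textOnly">
    <attribute type="id" />
  </ElementType>

  <ElementType name="PurchaseOrder">
    <attribute type="items" />
  </ElementType>
</Schema>
```

With this schema, <PurchaseOrder items="Item-1 Item-2"> is expected to validate only if the document also contains elements whose id attributes are "Item-1" and "Item-2".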

Declaring Attribute Data Types on Elements Through Schema

In Internet Explorer 5.01, elements support the id attribute data type. You can declare it on an element node through schema just as you would declare it on an attribute node. The two ways of doing this are with the dt:type attribute on the <ElementType> element, or with the <datatype> element within an <ElementType>.

The following sample uses the dt:type attribute on the <ElementType> element:

<ElementType name="Element1" dt:type="id"/>

or with the <datatype> element within an <ElementType> element:

<ElementType name="Element2">

<datatype dt:type="id"/>

</ElementType>

Using other attribute data types on elements is not currently supported; however, their use does not cause a validation error to occur.

Declaring Simple Data Types on Attributes Through Schema

In Internet Explorer 5.01, you can use simple data types (for example, int, boolean, float, and so on) when you declare an attribute. Use either the dt:type attribute on the <AttributeType> element, or a <datatype> element within an <AttributeType> element.

The following sample uses the dt:type attribute on the <AttributeType> element:

<AttributeType name="att1" dt:type="int"/>

<ElementType name="Element1">

<attribute type="att1"/>

</ElementType>

or with a <datatype> element within an <AttributeType> element:

<AttributeType name="y">

<datatype dt:type="int"/>

</AttributeType>

<ElementType name="x">

<attribute type="y"/>

</ElementType>
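
The fragments above omit the enclosing <Schema> element. A self-contained sketch, with the datatypes namespace declared so that the dt: prefix resolves, might look like this; the attribute names "count" and "weight" and the element name "Item" are hypothetical.

```xml
<Schema xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <!-- Form 1: dt:type attribute on the AttributeType -->
  <AttributeType name="count" dt:type="int" />

  <!-- Form 2: datatype element inside the AttributeType -->
  <AttributeType name="weight">
    <datatype dt:type="float" />
  </AttributeType>

  <ElementType name="Item" content="textOnly">
    <attribute type="count" />
    <attribute type="weight" />
  </ElementType>
</Schema>
```

An instance such as <Item count="3" weight="2.5"> would be expected to validate, while count="three" would not.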

Extensibility

Unlike Document Type Definitions (DTDs), XML Schemas are extensible. That is, they are built on an open content model. Schema authors are free to add their own elements and attributes to XML Schema documents. For example, you could add additional constraints to a declaration for a "pages" element. The following sample code declares the "pages" element and assigns it an "int" data type. Extended tags from the "myExt" namespace are used to augment this information with an added limitation that books must have a minimum of 50 pages and a maximum of 100 pages.

<ElementType name="pages" xmlns:myExt="urn:myschema-extensions">

<datatype dt:type="int" />

<myExt:min>50</myExt:min>

<myExt:max>100</myExt:max>

</ElementType>

While validation will only check that the value of a particular "pages" element is an integer, your application can use the information provided by the added elements in the "myExt" namespace to perform additional validation.

It's worth noting that the elements added to the schema in this sample are qualified by a namespace. Use of a namespace is a requirement for adding elements and attributes to a schema. For more information on namespaces, see Introduction to Namespaces.
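
Since attributes can be added as well as elements, the same limits could also be carried as namespace-qualified attributes. The following is only a sketch: the attribute names myExt:min and myExt:max are hypothetical, and interpreting them remains the application's job, not the parser's.

```xml
<Schema xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes"
        xmlns:myExt="urn:myschema-extensions">
  <!-- myExt:min and myExt:max are application-defined extensions;
       validation itself only checks the int data type -->
  <ElementType name="pages" dt:type="int"
               myExt:min="50" myExt:max="100" />
</Schema>
```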