XML, Validation, and Extra Cheese

Charles Heinemann

August 19, 1998

The whole thing began when I ran up to the fourth floor to make a quick pizza delivery. Now, the delivery went fine. It wasn’t until I got down to the parking lot -- and realized that I hadn’t got my parking ticket stamped -- that things went awry.

I got involved in the pizza business about four weeks ago, right about the last time I wrote to you. My uncle Edd recently started an online pizza business, a franchise of the All-American Pizza Co. Well, every month, the All-American Pizza Co. posts a menu update. This update is marked up in XML, so that each franchisee can gain access to the menu in whatever way he or she sees fit. Along with this monthly update, the mother company also posts a Document Type Definition (DTD), describing the content model for these monthly updates. The point of posting the DTD is that the applications Uncle Edd and his fellow pizza brokers write to display the menu can rely on the logical structure of the XML document containing the update. With the DTD, Uncle Edd can both prepare for the data he will receive each month and validate the updates.

There was one small problem, however. Uncle Edd had no idea what to do with the said DTD. He, of course, immediately got me on the horn, asking for my assistance.

Over the phone, I briefly explained that XML can be both well formed and valid. To be well formed, the XML need simply adhere to the syntax rules as laid out in the XML specification. To be valid, however, the XML document must also adhere to the logical structure described in the DTD. After a short pause, which I was sure signaled an intense lack of understanding, I told Uncle Edd that I’d be over in a minute to explain the DTD.
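
To make the distinction concrete, here is a minimal sketch of just one well-formedness rule: every start tag must be closed by a matching end tag, in the right order. The function name isWellFormed is hypothetical, and the check is far from complete. A real parser also enforces a single root element, legal names, quoted attribute values, and more, and this sketch does not understand comments or CDATA sections.

```javascript
// Hypothetical helper: checks tag nesting only, which is one of
// several well-formedness rules in the XML specification.
function isWellFormed(xml) {
  var stack = [];
  // Groups: 1 = "/" for an end tag, 2 = element name, 3 = "/" for an empty-element tag.
  var tag = /<(\/?)([A-Za-z_][\w.-]*)[^<>]*?(\/?)>/g;
  var match;
  while ((match = tag.exec(xml)) !== null) {
    if (match[3] === "/") {
      continue;                       // <name/> opens and closes itself
    }
    if (match[1] === "/") {
      if (stack.pop() !== match[2]) { // end tag must match the most recent start tag
        return false;
      }
    } else {
      stack.push(match[2]);           // remember the open element
    }
  }
  return stack.length === 0;          // nothing may be left open
}

var nested = isWellFormed("<pizza><name>The Texan</name></pizza>");
var mismatched = isWellFormed("<topping>mozzarella cheese</toppings>");
```

Here nested is true and mismatched is false. Note that a pizza with nothing but a name is still well formed; whether it is valid is a separate question that only a DTD can answer.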

The DTD itself, a file called pizzas.dtd, is pretty simple:

<!ELEMENT pizzas (pizza)*>

<!ELEMENT pizza (name, toppings, description, price)>

<!ELEMENT name (#PCDATA)>

<!ELEMENT toppings (topping)+>

<!ELEMENT topping (#PCDATA)>

<!ELEMENT description (#PCDATA)>

<!ELEMENT price (#PCDATA)>

It basically describes an XML document such as the following (the example being a subset of the actual menu):

<pizzas>

<pizza>

<name>The Nebraskan</name>

<toppings>

<topping>corn nibblets</topping>

<topping>mozzarella cheese</topping>

<topping>tomato sauce</topping>

</toppings>

<description> With every corn-laden slice, the memories of those

wild Omaha nights become more and more vivid.</description>

<price>7.99</price>

</pizza>

</pizzas>

<!ELEMENT pizzas (pizza)*> describes a single “pizzas” element that contains zero or more “pizza” elements. <!ELEMENT pizza (name, toppings, description, price)> describes a single “pizza” element that contains exactly one “name”, “toppings”, “description”, and “price” element. <!ELEMENT toppings (topping)+> describes a single “toppings” element that contains one or more “topping” elements. And <!ELEMENT name (#PCDATA)> describes a single “name” element that contains only text.
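
These declarations read a lot like regular expressions over child element names, and that analogy can be sketched in script. The helper names contentModelToRegExp and childrenMatch are made up for illustration. The sketch handles element content only (names, commas, |, *, +, and ?), not #PCDATA or mixed content, and it is no substitute for a validating parser.

```javascript
// Hypothetical helper: turns a simple DTD content model such as
// "(name, toppings, description, price)" or "(topping)+" into a RegExp
// that matches a list of child element names, each followed by one space.
function contentModelToRegExp(model) {
  var pattern = model
    .replace(/\s+/g, "")                     // whitespace in the model is insignificant
    .replace(/[A-Za-z_][\w.-]*/g, "(?:$& )") // a name matches itself plus the separator
    .replace(/,/g, "");                      // a sequence is plain concatenation
  return new RegExp("^" + pattern + "$");
}

// Hypothetical helper: tests an array of child element names against a model.
function childrenMatch(model, childNames) {
  var sequence = childNames.map(function (name) {
    return name + " ";
  }).join("");
  return contentModelToRegExp(model).test(sequence);
}

var pizzaModel = "(name, toppings, description, price)";
var complete = childrenMatch(pizzaModel, ["name", "toppings", "description", "price"]);
var missingPrice = childrenMatch(pizzaModel, ["name", "toppings", "description"]);
var noToppings = childrenMatch("(topping)+", []);
var emptyMenu = childrenMatch("(pizza)*", []);
```

In this sketch complete and emptyMenu are true, while missingPrice and noToppings are false, which mirrors the "exactly one", "one or more", and "zero or more" readings given above.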

To validate the menu against this DTD, all you need to do is place the following DOCTYPE declaration at the top of your XML document:

<!DOCTYPE pizzas SYSTEM "pizzas.dtd">

Now, when the parser loads the XML, it will check the validity of each node within the XML document and fail to load the document if it is invalid.
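
If the menu arrives as a string, adding the declaration can be scripted. The following is a minimal sketch: addDoctype is a hypothetical helper, and it assumes the document does not already carry a DOCTYPE declaration. The one rule it encodes is that the DOCTYPE must come after the XML declaration, if there is one, and before the root element.

```javascript
// Hypothetical helper: inserts a DOCTYPE declaration with a SYSTEM
// identifier into an XML string that does not have one yet.
function addDoctype(xml, rootName, systemId) {
  var doctype = '<!DOCTYPE ' + rootName + ' SYSTEM "' + systemId + '">\n';
  // An XML declaration, when present, must stay at the very start.
  var match = /^<\?xml\s[^?]*\?>/.exec(xml);
  if (match === null) {
    return doctype + xml;
  }
  var declaration = match[0];
  var rest = xml.slice(declaration.length).replace(/^\s+/, "");
  return declaration + "\n" + doctype + rest;
}

var withDoctype = addDoctype("<pizzas></pizzas>", "pizzas", "pizzas.dtd");
```

The name given after DOCTYPE has to match the root element of the document, which is why the helper takes it as a parameter rather than hard-coding it.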

Uncle Edd, having taken all this in, turned to me and inquired, “What if I want to add my own pizzas to the list?”

“What exactly do you have in mind?”

“I dunno. What if I got a whole list of pizzas -- monthly specials, let’s say -- and I want Junior, here (pointing at my cousin) to add them to the XML file they give me each month? What’ll I do with this DTD thing then?”

“As long as the new entries conform to the DTD, it won’t matter,” I assured him.

“Well, how do I know if they will or won’t?”

“I’ll show you.”

And with that, I wrote a little helper function that would validate Junior’s work:

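function validateJunior(){

  // Load Junior's file into the "xmlid" data island.
  xmlid.load("specialtyPizzas.xml");

  var pizzaList = xmlid.documentNode.childNodes;

  // Walk the list backward so that removing a node does not
  // shift the nodes that have not been checked yet.
  for (var i=pizzaList.length-1;i>=0;i--){

    // Wrap one pizza in its own data island, preceded by a DOCTYPE
    // declaration, so that the parser validates it on load.
    islandLocale.innerHTML = "<XML ID='pizzaIsland'>" +

      "<!DOCTYPE pizza SYSTEM 'pizza.dtd'>" +

      xmlid.saveNode(pizzaList.item(i)) +

      "</XML>";

    // A non-empty error reason means this pizza failed validation.
    if (pizzaIsland.lastError.reason != "")

      xmlid.documentNode.removeNode(pizzaList.item(i));

  }

  return xmlid;

}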

I also added the following data island and <DIV> element to the page:

<XML ID="xmlid"></XML>

<DIV ID="islandLocale"></DIV>

The above code takes the following XML authored by Junior. (Notice his carelessness in not supplying a price for the “Texan”.)

<pizzas>

<pizza>

<name>The Washingtonian</name>

<toppings>

<topping>apple slices</topping>

<topping>salmon</topping>

<topping>mozzarella cheese</topping>

<topping>tomato sauce</topping>

</toppings>

<description>Who says you can't mix seafood with America's number one fruit

pie filling?</description>

<price>7.99</price>

</pizza>

<pizza>

<name>The Texan</name>

<toppings>

<topping>barbeque brisket</topping>

<topping>dill pickles</topping>

<topping>onions</topping>

<topping>mozzarella cheese</topping>

<topping>tomato sauce</topping>

</toppings>

<description>Put the lone in lone star state! Ask for extra

onions</description>

</pizza>

<pizza>

<name>The Mississippian</name>

<toppings>

<topping>fried catfish</topping>

<topping>greens</topping>

<topping>mozzarella cheese</topping>

<topping>tomato sauce</topping>

</toppings>

<description>This Southern treat will have you dreaming of pine

trees, Faulkner, and floating casinos.</description>

<price>7.99</price>

</pizza>

</pizzas>

The function iterates through the “pizza” elements and validates each one against the following DTD, pizza.dtd:

<!ELEMENT pizza (name, toppings, description, price)>

<!ELEMENT name (#PCDATA)>

<!ELEMENT toppings (topping)+>

<!ELEMENT topping (#PCDATA)>

<!ELEMENT description (#PCDATA)>

<!ELEMENT price (#PCDATA)>

The parser validates on load, according to the DTD named in the DOCTYPE declaration. This means I can get the parser to validate one particular “pizza” element. How? By creating an XML data island that contains a DOCTYPE declaration pointing at “pizza.dtd”, followed by the XML for a single “pizza” element, and inserting that data island into the page.

If a “pizza” element is not valid (the second in the above case), that node is removed from the tree. Consequently, regardless of Junior’s carelessness, the function always returns a valid XML document for the application to process and display.
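
The same filtering idea can be sketched without a parser or a browser. Below, each pizza is represented by a plain object listing the names of its child elements, an assumption made purely for illustration. The names isValidPizza and removeInvalid are hypothetical, and the check covers only the “pizza” content model, not what is inside “toppings”.

```javascript
// Child elements a "pizza" must contain, in this order, per the DTD.
var REQUIRED_CHILDREN = ["name", "toppings", "description", "price"];

// Hypothetical helper: true when the pizza has exactly the required children.
function isValidPizza(pizza) {
  if (pizza.children.length !== REQUIRED_CHILDREN.length) {
    return false;
  }
  for (var i = 0; i < REQUIRED_CHILDREN.length; i++) {
    if (pizza.children[i] !== REQUIRED_CHILDREN[i]) {
      return false;
    }
  }
  return true;
}

// Hypothetical helper: removes invalid pizzas from the array in place.
function removeInvalid(pizzas) {
  // Iterate backward so that splice() never shifts an unchecked entry.
  for (var i = pizzas.length - 1; i >= 0; i--) {
    if (!isValidPizza(pizzas[i])) {
      pizzas.splice(i, 1);
    }
  }
  return pizzas;
}

var juniorsMenu = [
  { name: "The Washingtonian", children: ["name", "toppings", "description", "price"] },
  { name: "The Texan", children: ["name", "toppings", "description"] },
  { name: "The Mississippian", children: ["name", "toppings", "description", "price"] }
];
var validMenu = removeInvalid(juniorsMenu);
```

After the call, validMenu holds the Washingtonian and the Mississippian; the Texan, missing its price, has been dropped.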

“The darn thing’s Junior proof!” yelled Uncle Edd.

“That’s the idea.”

I sat there for a moment while Uncle Edd looked back over the DTD and my code. He still looked a little puzzled. After a couple of minutes he spoke again, expressing concern over Junior’s ability to learn a new syntax. “It was tough enough,” he claimed, “teaching him the XML thing.”

This gave me a perfect opportunity to tell him about the XML Data submission to the World Wide Web Consortium (W3C). The submission outlines a way to describe XML documents using an XML-based syntax, so Junior would be free from having to learn another syntax and could simply validate his documents against another XML document, a schema. These schemas, I continued, will also allow more precise description, because they will incorporate data typing and inheritance.

“And I can access them using the XML object model,” announced Uncle Edd.

“Sure can.”

The problem now solved, Uncle Edd asked me if I could do one more thing for him. “Of course,” I said, always willing to answer one more question about XML and related technologies.

“Great,” said Uncle Edd, and with that he went into the next room. A second or two later, he came back with an armload of pizzas and a list of addresses.