Building Business-to-Business Solutions with XML

The challenges of moving data between the parts of a distributed application are not new. In this article, I'll provide you with an example of how XML technologies can be used to handle data in a distributed system. The example that I'll use is an application that my company has delivered to a client and that is now in production. It's also the ultimate example of a distributed application: not only are the parts of the application running on different servers, the servers are owned by completely different companies. It's an ideal example of XML in action.

XML overview
XML is an open standard published by the World Wide Web Consortium (W3C). It addresses problems that developers face in building distributed systems, from client/server applications up to business-to-business systems. Two features make XML an attractive solution in these areas: it's text-based and it's self-describing. XML is used to create text documents that contain both data and a description of that data (the term "document" is misleading, since the XML structures that you create may be held entirely in memory). Since XML data carries its own description and every computer system can handle text, XML provides a vendor-neutral, cross-platform way to move data around. Virtually every business application is about data, so XML is relevant to a wide range of business problems.

But XML is more than a data structure. XML is accompanied by the Document Object Model (DOM), an open, standard programming interface for processing XML data. So, in addition to being self-describing, XML comes with a vendor-independent way to process data. XML also separates data from presentation, which directly supports n-tier applications that separate presentation from business processing. In an XML world, data is provided in a format that contains only the data structure, while presentation is handled by a separate process. This allows a server to provide the same data to a variety of clients and have each client apply the appropriate presentation style without consuming server resources. This feature alone makes it possible to build rich user interfaces with improved scalability. XML also has an associated technology, the eXtensible Stylesheet Language (XSL), that provides a way of manipulating XML declaratively.

Another important part of the XML feature set is validation. The W3C currently recommends one standard for defining the structure of XML documents: the Document Type Definition (DTD). A DTD declares which elements and attributes a document may contain and how they must be nested, so you can use it to establish how your data should be organized and to enforce some structural business rules. But XML technologies are evolving rapidly, and an alternative, known as XML Schemas, will further extend your ability to validate XML automatically.

This overview suggests the tremendous flexibility of XML: It's more than just a data transfer technology. Your data, the presentation applied to the data, and even the rules that you impose on this data are all derived from a single technology, XML. While this article concentrates on using XML to move data, that's only the beginning of how XML will be used.

A case study
Our application demonstrates a wide range of the current XML technologies. My client asked me to put together an architecture that would let them sell a variety of products over the Internet. The system had some stringent requirements. It had to:

• Be a generic application that could be used for many different types of data.

• Minimize the impact of adding new products.

• Communicate with suppliers that used different hardware and software configurations.

• Be completely browser-independent.

Our architecture proposal defined a system that involved two types of suppliers working with our client to create the complete application. The roles of the business partners were:

• Data provider: This partner takes care of the site's content. For example, one data provider was responsible for providing a list of books that were available for purchase. That data provider was responsible for making sure that the data feeds from the various publishers were synchronized, up to date, and "clean" (no duplicates, no errors).

• Order fulfiller: The order fulfiller ensured that goods were shipped to customers, based on information from my client. This meant determining whether the items being ordered were in stock, managing a shipping process, and getting the product into the hands of the customer.

This design meant that our client didn't need to worry about maintaining the system's content or incurring the overhead of dispatching orders. Instead, those activities were passed to business partners who specialized in those activities. As my client rolled out the application to sell other products, they just needed to find a provider capable of supplying the new content and a fulfiller able to ship the product. The new providers and fulfillers needed only to comply with our already established rules for doing business with my client. Defining those rules, then, was critical.

For this architecture to work, there were two important requirements that had to be met:

• A common vocabulary needed to be agreed upon among the client, the data providers, and the order-fulfillment agencies.

• We needed to be able to communicate with our business partners regardless of what hardware or software they used.

XML was a perfect choice for this architecture. Even in our initial implementation, both the data provider and the order fulfiller were using different hardware and software from my client. XML bridged that gap effortlessly. All the communication among our business partners is handled by exchanging XML documents.

Defining data
The first step in designing our system was to determine which XML documents we'd need. We decided that we needed the following documents (see Figure 1 for a schematic of the system):

Figure 1.

• Request for content, sent to the data provider ("Show me all books on XML.")

• The content, sent from the data provider ("Here's a list of all of the books on XML.")

• Request to dispatch an order, sent to the order fulfiller ("Send Mr. Smith the following books.")

• The order acknowledgement, sent from the order fulfiller ("Mr. Smith will be able to get two of the three books he ordered.")

With our documents defined, the next step was to agree on a definition of the information being exchanged. Defining the contents of your XML documents is essential to ensure that each partner can read what the other partners send. DTDs provide a mechanism for recording those document definitions. DTDs aren't passive documents, however. If a validating XML parser is given a DTD along with an XML document, the parser can automatically confirm whether the document matches the DTD, eliminating much of the validation code you would otherwise have to write.
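As a minimal sketch of what that automatic check looks like, here is Java code using the JDK's standard JAXP parser (javax.xml.parsers) with validation switched on. The DTD, element names, and sample data are hypothetical stand-ins, not the DTD shown in Figure 3, and the DTD is embedded in the document (an internal subset) only to keep the example self-contained; a real exchange would more likely reference a shared external DTD file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class DtdValidation {

    // Hypothetical DTD (internal subset) for a data provider's book list;
    // it is NOT the DTD from Figure 3.
    static final String DOCTYPE =
          "<!DOCTYPE books [\n"
        + "  <!ELEMENT books (book*)>\n"
        + "  <!ELEMENT book (title, author+)>\n"
        + "  <!ELEMENT title (#PCDATA)>\n"
        + "  <!ELEMENT author (#PCDATA)>\n"
        + "]>\n";

    /** Parses xml with DTD validation switched on and returns any validity errors. */
    static List<String> validate(String xml) {
        List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // ask the parser to check the document against its DTD
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                // Validity errors (content that doesn't match the DTD) arrive here.
                public void error(SAXParseException e) { errors.add(e.getMessage()); }
                // Well-formedness errors are fatal: stop parsing.
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
        return errors;
    }

    public static void main(String[] args) {
        String valid = DOCTYPE
            + "<books><book><title>XML Basics</title><author>A. Writer</author></book></books>";
        String missingTitle = DOCTYPE
            + "<books><book><author>A. Writer</author></book></books>";
        System.out.println(validate(valid).isEmpty());        // true
        System.out.println(validate(missingTitle).isEmpty()); // false: <book> lacks <title>
    }
}
```

A validating parser reports a DTD violation to the error handler as a recoverable error rather than aborting, which is why the sketch collects those messages instead of relying on an exception.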

Our system uses Document Type Definitions to define all of our data-exchange documents. We wrestled with this decision because one emerging technology, XML Schemas, was quite attractive to us. XML Schemas have many benefits, including strong data typing and increased extensibility over DTDs. However, one of the fundamental design decisions that we made was for our system to comply as much as possible with the XML standards, as published by the W3C. We decided that reliance on published standards was more important to us than using the latest technology and, as a result, I decided to use DTDs (for more information on Schemas, see Peter Vogel's article "Your First Schema" in this issue). I'm planning to convert the system to use XML Schemas as soon as they're approved as a standard.

When you start to develop the DTD that will describe your documents, you need to decide whether you're going to create a new DTD or use one that's already published and available in the public domain. There are a number of DTD and schema repositories, including www.oasis-open.org and www.BizTalk.org. I decided to write our own DTD, as all of the relevant published DTDs were far too complex for our requirements.

From experience, I can tell you that designing a DTD or Schema is not an exercise to be taken lightly. This is where you have to get all of the stakeholders in the application to agree on the rules of the game. If some of the suppliers are unfamiliar with XML, your immediate problem will be how to communicate the structure of your XML documents to them.

When dealing with naïve XML users, I was frequently asked to supply them with a sample of the XML documents that they were to use. The problem is that XML documents represent only a snapshot of the data exchange. A DTD, on the other hand, describes all of the rules that control any XML document that we might produce. Providing a sample XML document as a way of understanding the rules is like asking for some sample data to understand a database. The only way to understand an application's data structure is to look at the database design. While some sample data can be helpful, it never describes the complete data model. The same is true of using XML documents as examples of what needs to be exchanged.

The problem for naïve users is that DTDs are difficult to understand, particularly if they describe a large system with a lot of flexibility. DTDs, unfortunately, use a notation inherited from SGML, XML's parent technology. This format is very different from the XML documents themselves. One of the benefits of XML Schemas is that they use the same XML syntax as the documents they define, making them easier to read. This was another reason for developing our own DTD: Given the problems I was having with getting our business partners on board with XML, a simple DTD was a necessity.

In Figure 2 you can see a sample of the document returned from our data provider.

Figure 2.

The XML tags (for example, <books>, <book>, <title>) effectively describe the data contained in the document. In Figure 3 you can see a portion of the DTD that describes the document.

Figure 3.

Processing XML
Once you finish the design phase, you have to start thinking about processing your documents. There are two major technologies for processing XML documents: the Document Object Model (DOM) and the eXtensible Stylesheet Language (XSL). A third technology, the Simple API for XML (SAX), appears to be settling into a niche role.
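For contrast with the tree-based DOM, here is a minimal Java sketch of the SAX style of processing, using the JDK's javax.xml.parsers.SAXParser. SAX never builds a tree: the parser calls your handler as it meets each start tag, run of text, and end tag, so the parser itself never needs to hold the whole document in memory. The <title> element name and the sample titles are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxTitles {

    /** Streams through the document and collects the text of every <title> element. */
    static List<String> titles(String xml) {
        List<String> result = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder text; // non-null only while inside a <title>

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals("title")) text = new StringBuilder();
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // The parser may deliver one run of text in several chunks, so append.
                if (text != null) text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals("title")) {
                    result.add(text.toString());
                    text = null;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<books><book><title>XML Basics</title></book>"
                   + "<book><title>XSL in Practice</title></book></books>";
        System.out.println(titles(xml)); // [XML Basics, XSL in Practice]
    }
}
```

The trade-off is that your handler must keep track of its own state (here, whether it is currently inside a <title>), which the DOM would otherwise give you for free.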

Despite its name, XSL is more than just a way of controlling the style or appearance of XML documents. XSL is a full-blown language with conditional and looping statements whose commands consist of XML-compliant tags. In addition, XSL comes with commands that allow you to create attributes, elements, and other XML components.

In addition to support for procedural activities like looping, XSL is also a declarative language. In XSL you can specify what you want to have happen rather than specifying all of the detail of how to get it. In a way, XSL is similar to SQL where you specify the result you want rather than how to execute it. Using XSL this way requires you to think in terms of patterns and templates. Patterns describe the elements of your XML document that you're interested in processing. Templates are the code containers which associate commands with selected XML elements. For example, you can write an XSL document to request all Customer elements in an XML document, then run a set of templates that will convert the Customer elements into a new format. In future XML Developer articles, you'll be exposed to the complete power of XSL.
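A minimal sketch of the Customer example, assuming a hypothetical input in which each Customer element carries an id attribute and a Name child. Each xsl:template's match attribute is the pattern, and the template body is what gets produced for every node that matches; the Java wrapper simply runs the stylesheet with the JDK's built-in XSLT processor (javax.xml.transform).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class CustomerTemplates {

    // Hypothetical stylesheet: the output format (<clients>/<client>) is invented for illustration.
    static final String STYLESHEET =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        // Pattern '/': build a new root, then hand every Customer element to the templates.
        + "<xsl:template match='/'>"
        + "<clients><xsl:apply-templates select='//Customer'/></clients>"
        + "</xsl:template>"
        // Pattern 'Customer': rebuild each customer in the new format.
        + "<xsl:template match='Customer'>"
        + "<client id='{@id}'><xsl:value-of select='Name'/></client>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies an XSLT stylesheet to an XML string and returns the serialized result. */
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<customers><Customer id='7'><Name>Smith</Name></Customer></customers>";
        System.out.println(transform(STYLESHEET, xml));
    }
}
```

Nothing in the stylesheet says how to walk the tree: xsl:apply-templates selects the Customer nodes and the processor finds the matching template for each one, which is the declarative style described above.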

To support our data exchanges, the application makes extensive use of XSL. The system uses XSL for two purposes. First, XSL is used to change the structure of one XML document to another. Adding a provider or fulfiller who doesn't use our XML standard consists of creating the appropriate XSL conversion document. Second, we use XSL to convert our XML documents into HTML to be displayed to our users. Changing the site's appearance consists of swapping in a new XSL document to convert XML to HTML.
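To illustrate the second use, here is a hedged sketch in which the same hypothetical book list is rendered with two interchangeable presentation stylesheets, again through the JDK's javax.xml.transform. The element names and both stylesheets are assumptions for illustration; the point is that the Java code never changes, only the stylesheet passed to it.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class BookListHtml {

    // Hypothetical presentation 1: a bulleted list of titles.
    static final String LIST_STYLE =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='html'/>"
        + "<xsl:template match='/books'>"
        + "<ul><xsl:for-each select='book'><li><xsl:value-of select='title'/></li></xsl:for-each></ul>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Hypothetical presentation 2: the same data as a table. Swapping styles is just
    // a matter of passing a different stylesheet string to render().
    static final String TABLE_STYLE =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='html'/>"
        + "<xsl:template match='/books'>"
        + "<table><xsl:for-each select='book'>"
        + "<tr><td><xsl:value-of select='title'/></td></tr>"
        + "</xsl:for-each></table>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Renders XML data as HTML using whichever stylesheet is supplied. */
    static String render(String stylesheet, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("rendering failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<books><book><title>XML Basics</title></book></books>";
        System.out.println(render(LIST_STYLE, xml));
        System.out.println(render(TABLE_STYLE, xml));
    }
}
```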

More typically, XML documents are processed by loading them into a parser, which generates a Document Object Model (DOM) tree of objects. Once an XML document is loaded into a DOM, you can process the document as a collection of objects. This lets you extract values from the document's elements and attributes, create XML documents from scratch, or modify existing documents.
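A minimal Java sketch of the extract and modify cases, using the W3C DOM interfaces (org.w3c.dom) as implemented in the JDK. The book, title, and price elements and the isbn attribute are a hypothetical book-list format, not the one in Figure 2, and setPrice assumes every book has a price child.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomBooks {

    /** Loads an XML string into a DOM tree. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    /** Extract: the text of every <title> element, in document order. */
    static List<String> titles(Document doc) {
        List<String> result = new ArrayList<>();
        NodeList nodes = doc.getElementsByTagName("title");
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }

    /** Modify: set the <price> of the book with the given isbn; false if no book matches. */
    static boolean setPrice(Document doc, String isbn, String price) {
        NodeList books = doc.getElementsByTagName("book");
        for (int i = 0; i < books.getLength(); i++) {
            Element book = (Element) books.item(i);
            if (book.getAttribute("isbn").equals(isbn)) {
                // Assumes each <book> has exactly one <price> child.
                book.getElementsByTagName("price").item(0).setTextContent(price);
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Document doc = parse("<books>"
            + "<book isbn='111'><title>XML Basics</title><price>30</price></book>"
            + "<book isbn='222'><title>XSL in Practice</title><price>35</price></book>"
            + "</books>");
        System.out.println(titles(doc)); // [XML Basics, XSL in Practice]
        setPrice(doc, "222", "32");
        System.out.println(doc.getElementsByTagName("price").item(1).getTextContent()); // 32
    }
}
```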

It's important to note that the DOM is an open, standard API. This means that the definition of the DOM's programming interface (the objects, their relationships, and their methods and properties) is a W3C responsibility, while the code that implements this interface is vendor-dependent. In other words, the createElement method looks the same whether you call it from Visual Basic or Java, but the code that makes the method work will differ from one implementation to another.
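As a sketch of the "create from scratch" case, and of createElement in particular, the following Java code builds a hypothetical dispatch request along the lines of the "Send Mr. Smith the following books" document described earlier. The dispatch and item element names and the customer and isbn attributes are invented for illustration. The createElement, setAttribute, and appendChild calls are standard DOM methods; turning the tree into text is done here with a JAXP identity transform, which is a JDK facility rather than one of the DOM Core methods.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DispatchRequest {

    /** Builds a hypothetical <dispatch> request for one customer and a list of ISBNs. */
    static Document build(String customer, String... isbns) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException("could not create document", e);
        }
        Element root = doc.createElement("dispatch"); // same DOM call in any language binding
        root.setAttribute("customer", customer);
        doc.appendChild(root);
        for (String isbn : isbns) {
            Element item = doc.createElement("item");
            item.setAttribute("isbn", isbn);
            root.appendChild(item);
        }
        return doc;
    }

    /** Serializes a DOM tree to a string using an identity transform. */
    static String serialize(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not serialize document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize(build("Smith", "111", "222", "333")));
    }
}
```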

Given these two powerful tools, you may be wondering whether you should use the DOM or XSL in your processing. To make matters more complicated, if you're using Microsoft's MSXML parser, you can perform DOM processing from within your XSL stylesheets. Upcoming XML Developer articles will show you how to use both of these technologies (for an introduction, see Michael Corning's article "A Tour of XSL" in this issue).

In our application, a number of documents arrive in batches. These documents are used to synchronize the state of orders between my client and the order fulfillers. I use DOM-based code to extract data from the documents and check it against, or use it to update, our database. The system also uses the DOM to process the order fulfillers' responses to our order requests.
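To make the acknowledgement processing concrete, here is a minimal DOM sketch that reads a hypothetical order acknowledgement (the acknowledgement and item element names and the isbn and status attributes are assumptions) and reports how many items the fulfiller could ship, in the spirit of the "two of the three books" acknowledgement described earlier. The database check or update that the real system performs is left out; the sketch only marks where it would go.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class AckProcessor {

    /** Parses a hypothetical acknowledgement and maps each item's isbn to its status. */
    static Map<String, String> statuses(String ackXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(ackXml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse acknowledgement", e);
        }
        Map<String, String> result = new LinkedHashMap<>(); // preserves document order
        NodeList items = doc.getElementsByTagName("item");
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            result.put(item.getAttribute("isbn"), item.getAttribute("status"));
        }
        return result;
    }

    /** Counts the items the fulfiller reported as shipped. */
    static long countShipped(Map<String, String> statuses) {
        return statuses.values().stream().filter("shipped"::equals).count();
    }

    public static void main(String[] args) {
        String ack = "<acknowledgement order='1001'>"
                + "<item isbn='111' status='shipped'/>"
                + "<item isbn='222' status='shipped'/>"
                + "<item isbn='333' status='backordered'/>"
                + "</acknowledgement>";
        Map<String, String> s = statuses(ack);
        // Hypothetical hook: the real system would check or update its database here.
        System.out.println(countShipped(s) + " of " + s.size() + " items shipped"); // 2 of 3 items shipped
    }
}
```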