Correctly Resolving Schema Namespace URLs in Commercial Java Applications (DRAFT)

Andrew Jacobs

CTO, HandCoded Ltd.

Introduction

The process of resolving schema namespaces through the JAXP interface is awkward and difficult to use in real commercial applications.

This paper illustrates the problem through a number of small example files processed by a simple JAXP-based application. The application creates a schema-validating DOM parser and applies it to a number of documents, displaying the arguments passed to the entity resolver and any errors discovered during parsing.

As we are concerned only with the resolving process, only a single element (‘<root>’) is defined in the supporting schemas and DTD.

Why is resolution important?

The link between an XML document and its schema is its target namespace. The ‘schemaLocation’ attribute is provided to hold a ‘hint’ for the location of the schema; however, the paths and filenames it contains are controlled by the writer of the document. A later document processor may keep its copies of the schemas in different locations in its filing system from those used by the writer, so it must be able to override the ‘schemaLocation’ attribute.

It is also useful to check that the referenced namespaces are ones known to the application. An XML document can quite legitimately make references to schemas hosted on the Internet, but the document processor is unlikely to be able (or willing) to process files that reference such schemas, as they may significantly change the interpretation of the parts of the document that it does recognise.

The SAX component of the XML parser provides an interface (‘org.xml.sax.EntityResolver’) that allows requests for ‘entities’ to be intercepted and processed.

The Apache Xerces parser supports an ‘external schema location’ property that allows target namespace URLs to be mapped to alternate locations, but this property is not standard.

Resolution in DTD based documents

To test the DTD resolution process, the following documents were used with the test application.

<!ELEMENT root EMPTY>

Figure 1 test.dtd

<?xml version="1.0"?>
<!DOCTYPE root PUBLIC "-//TEST" "test.dtd">
<root/>

Figure 2 dtd.xml

<?xml version="1.0"?>
<!DOCTYPE root PUBLIC "-//TEST" "rubbish/test.dtd">
<root/>

Figure 3 dtd-invalid-location.xml

The output from the test application for the dtd.xml and dtd-invalid-location.xml files shows that the resolver is called with both the public and system names from the <!DOCTYPE> line. Note that the system name has been converted from a relative path to an absolute location relative to the application's working directory.

> Processing: dtd.xml
Resolver called with publidId='-//TEST' and systemId='file:C:/Documents%20and%20Settings/Andrew/workspace/JAXP%20Resolver%20Test/test.dtd'
Parse Worked!
> Processing: dtd-invalid-location.xml
Resolver called with publidId='-//TEST' and systemId='file:C:/Documents%20and%20Settings/Andrew/workspace/JAXP%20Resolver%20Test/rubbish/test.dtd'
Caught IOException: C:\Documents and Settings\Andrew\workspace\JAXP Resolver Test\rubbish\test.dtd (The system cannot find the path specified)

As can be seen from the output, the EntityResolver interface receives sufficient information to allow the application to redirect the parser to a grammar file based on the public name, regardless of the system name supplied by the document writer.
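The redirection described above could be sketched roughly as follows. This is a minimal illustration and not part of the test application: the class name PublicIdResolver, its register method and the idea of a simple in-memory table are assumptions made for the example. Only the org.xml.sax.EntityResolver and InputSource types come from SAX itself.

```java
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

/**
 * Hypothetical resolver that redirects the parser to a local DTD copy
 * chosen by public identifier, ignoring the system identifier written
 * into the document.
 */
final class PublicIdResolver implements EntityResolver
{
    // Illustrative table: public identifier -> location of the local copy
    private final Map<String, String> locations = new HashMap<String, String> ();

    void register (String publicId, String localSystemId)
    {
        locations.put (publicId, localSystemId);
    }

    public InputSource resolveEntity (String publicId, String systemId)
    {
        String local = (publicId != null) ? locations.get (publicId) : null;

        // Unknown public identifier: returning null lets the parser fall
        // back to opening the system identifier itself.
        if (local == null) return (null);

        InputSource source = new InputSource (local);
        source.setPublicId (publicId);
        return (source);
    }
}
```

With ‘-//TEST’ registered against ‘test.dtd’, a resolver like this would be expected to let dtd-invalid-location.xml be parsed against the local copy even though its system name points at a non-existent directory.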

Resolution in XML Schema Based Documents

The namespace for a schema can be expressed using any valid URI. The test cases put through the test application use namespaces expressed as both URLs (e.g. ‘ and URNs (e.g. ‘urn:…’).

Simple URL processing

The first set of schema examples uses a URL as the target namespace (like FpML). One file uses schemaLocation to refer to a local schema; the other refers to a non-existent schema file.

<?xml version = "1.0" encoding = "UTF-8"?>
<!--Generated by Turbo XML 2.4.1.100. Conforms to w3c
<xsd:schema xmlns =
targetNamespace =
xmlns:xsd = "
<xsd:element name = "root">
<xsd:complexType/>
</xsd:element>
</xsd:schema>

Figure 4 test-url-namespace.xsd

<?xml version="1.0"?>
<root xmlns=" xmlns:xsi=" xsi:schemaLocation=" test-url-namespace.xsd"/>

Figure 5 url-namespace.xml

<?xml version="1.0"?>
<root xmlns=" xmlns:xsi=" xsi:schemaLocation=" rubbish/test-url-namespace.xsd"/>

Figure 6 url-namespace-invalid-location.xml

The output from the test application shows that the resolver is called in both cases but is only passed the path to the schema file derived from the schemaLocation attribute. As the path in one file was deliberately made incorrect, errors are generated in that case.

> Processing: url-namespace.xml
Resolver called with publidId='null' and systemId='file:C:/Documents%20and%20Settings/Andrew/workspace/JAXP%20Resolver%20Test/test-url-namespace.xsd'
Parse Worked!
> Processing: url-namespace-invalid-location.xml
Resolver called with publidId='null' and systemId='file:C:/Documents%20and%20Settings/Andrew/workspace/JAXP%20Resolver%20Test/rubbish/test-url-namespace.xsd'
Warning: schema_reference.4: Failed to read schema document 'rubbish/test-url-namespace.xsd', because 1) could not find the document; 2) the document could not be read; 3) the root element of the document is not <xsd:schema>.
Error: cvc-elt.1: Cannot find the declaration of element 'root'.
Parse Worked!

Simple URL to an Internet hosted schema

In the next two cases a copy of the schema was made available on the Internet. One test case file provides a schemaLocation attribute while the second does not.

<?xml version="1.0"?>
<root xmlns=" xmlns:xsi=" xsi:schemaLocation=" livexsd"/>

Figure 7 live-url-namespace.xml

<?xml version="1.0"?>
<root xmlns=" xmlns:xsi="

Figure 8 live-url-namespace-no-schema-location.xml

The output from the test application shows that when the schemaLocation attribute is present it is used to find the schema. When no schemaLocation is present the namespace URL is accessed WITHOUT anything being offered to the entity resolver.

> Processing: live-url-namespace.xml
Resolver called with publidId='null' and systemId='file:C:/Documents%20and%20Settings/Andrew/workspace/JAXP%20Resolver%20Test/livexsd'
Parse Worked!
> Processing: live-url-namespace-no-schema-location.xml
Error: cvc-elt.1: Cannot find the declaration of element 'root'.
Parse Worked!

AJ: I think this last error is related to not setting the MIME type for the schema on the server

Simple URN processing

The test files for URN based namespaces use ‘urn:handcoded:test’ as the schema identifier.

<?xml version = "1.0" encoding = "UTF-8"?>
<!--Generated by Turbo XML 2.4.1.100. Conforms to w3c
<xsd:schema xmlns = "urn:handcoded:test"
targetNamespace = "urn:handcoded:test"
xmlns:xsd = "
<xsd:element name = "root">
<xsd:complexType/>
</xsd:element>
</xsd:schema>

Figure 9 test-urn-namespace.xsd

<?xml version="1.0"?>
<root xmlns="urn:handcoded:test" xmlns:xsi=" xsi:schemaLocation="urn:handcoded:test test-urn-namespace.xsd"/>

Figure 10 urn-namespace.xml

<?xml version="1.0"?>
<root xmlns="urn:handcoded:test" xmlns:xsi=" xsi:schemaLocation="urn:handcoded:test rubbish/test-urn-namespace.xsd"/>

Figure 11 urn-namespace-invalid-location.xml

The test application output shows that the resolution process is exactly the same as for URL based namespace references. The URN is never passed to the entity resolver; only the derived schema location is.

> Processing: urn-namespace.xml
Resolver called with publidId='null' and systemId='file:C:/Documents%20and%20Settings/Andrew/workspace/JAXP%20Resolver%20Test/test-urn-namespace.xsd'
Parse Worked!
> Processing: urn-namespace-invalid-location.xml
Resolver called with publidId='null' and systemId='file:C:/Documents%20and%20Settings/Andrew/workspace/JAXP%20Resolver%20Test/rubbish/test-urn-namespace.xsd'
Warning: schema_reference.4: Failed to read schema document 'rubbish/test-urn-namespace.xsd', because 1) could not find the document; 2) the document could not be read; 3) the root element of the document is not <xsd:schema>.
Error: cvc-elt.1: Cannot find the declaration of element 'root'.
Parse Worked!

Using Xerces External schema mapping

The test application configures the XML parser to map the target namespace to urn:handcoded:mapped. The test cases only contain the ‘ URI.

<?xml version = "1.0" encoding = "UTF-8"?>
<!--Generated by Turbo XML 2.4.1.100. Conforms to w3c
<xsd:schema xmlns =
targetNamespace =
xmlns:xsd = "
<xsd:element name = "root">
<xsd:complexType/>
</xsd:element>
</xsd:schema>

Figure 12 test-url-mapped.xsd

<?xml version="1.0"?>
<root xmlns=" xmlns:xsi=" xsi:schemaLocation=" mapped.xsd"/>

Figure 13 url-mapped.xml

<?xml version="1.0"?>
<root xmlns=" xmlns:xsi=" xsi:schemaLocation=" rubbish/mapped.xsd"/>

Figure 14 url-mapped-invalid-location.xml

The output from the test application shows that the schemaLocation attribute is ignored and the entity resolver is passed the value registered with Xerces through the external schema location property.

> Processing: url-mapped.xml
Resolver called with publidId='null' and systemId='urn:handcoded:mapped'
Parse Worked!
> Processing: url-mapped-invalid-location.xml
Resolver called with publidId='null' and systemId='urn:handcoded:mapped'
Parse Worked!

We chose to map the namespace URI to a URN (which is then mapped to a file in the entity resolver), but we could have mapped directly to the schema filename.
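A rough sketch of how such a mapping might be registered is given below. It assumes the Xerces property name ‘http://apache.org/xml/properties/schema/external-schemaLocation’, whose value is a whitespace separated list of namespace and location pairs in the same style as the schemaLocation attribute. The class and method names are illustrative only, and any namespace URL passed in is the caller's own; the one used by the test cases is not reproduced here.

```java
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;

/**
 * Hypothetical helper that registers namespace to location mappings
 * with a Xerces based JAXP implementation.
 */
final class ExternalSchemaLocations
{
    // Parser specific property (assumed Xerces name, not part of JAXP)
    static final String PROPERTY
        = "http://apache.org/xml/properties/schema/external-schemaLocation";

    // Builds "namespace location namespace location ..." from the table
    static String toPropertyValue (Map<String, String> mappings)
    {
        StringBuilder buffer = new StringBuilder ();

        for (Map.Entry<String, String> entry : mappings.entrySet ()) {
            if (buffer.length () > 0) buffer.append (' ');
            buffer.append (entry.getKey ()).append (' ').append (entry.getValue ());
        }
        return (buffer.toString ());
    }

    // Returns false if the JAXP implementation rejects the property
    static boolean configure (DocumentBuilderFactory factory, Map<String, String> mappings)
    {
        try {
            factory.setAttribute (PROPERTY, toPropertyValue (mappings));
            return (true);
        }
        catch (IllegalArgumentException error) {
            return (false);
        }
    }
}
```

Returning a flag rather than aborting lets an application decide for itself what to do when it finds it is not running on a Xerces compatible parser.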

Conclusions

The entity resolver mechanism was designed for DTD based documents and provides the resolver with all the information needed to map to a grammar. Although the same interface is used during schema processing, it does not provide enough information to perform resolution correctly; in particular, the namespace URI is not made available to the resolver. In addition, the systemId file path cannot be relied upon, as it is derived from the schemaLocation and may reflect the document writer's system environment.

This means that the EntityResolver interface as provided by JAXP is completely useless for processing XML schema based documents. The only examples I could find where resolution did seem possible were those in which the schemaLocation mapped from one address to another (e.g. schemaLocation=”…). To resolve namespaces to local files cleanly you MUST use features like the Xerces external schema location property that are not standard and may not be supported by all parsers.

AJ: If I’ve missed an obvious way around this then I’d love to hear it. I find it pretty hard to believe that I’m the only person that wants tight(er) control over namespace processing.

In the HandCoded FpML toolkit we map FpML version URIs to dummy URNs that are then mapped to local files via a catalog (to prevent hard coding of file locations in application code). Whilst this works, we still end up hard coding the mapping from FpML URI to dummy URN!
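The two stage lookup described above might look something like the following sketch. It is not the toolkit's actual code: the class name SchemaLocator, the example namespace URI and URN, and the use of a plain map to stand in for the catalog are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical two stage lookup: namespace URI -> dummy URN -> local file.
 */
final class SchemaLocator
{
    // Stage 1: hard coded in the application (the part we would like to avoid)
    private final Map<String, String> namespaceToUrn = new HashMap<String, String> ();

    // Stage 2: URN -> file, supplied from outside (stands in for a catalog)
    private final Map<String, String> catalog;

    SchemaLocator (Map<String, String> catalog)
    {
        this.catalog = catalog;

        // Placeholder namespace and URN, not real FpML identifiers
        namespaceToUrn.put ("http://example.com/schema-v1", "urn:example:schema-v1");
    }

    // Returns the local file for a namespace, or null if it is not recognised
    String locate (String namespaceUri)
    {
        String urn = namespaceToUrn.get (namespaceUri);

        return ((urn != null) ? catalog.get (urn) : null);
    }
}
```

Only the second table can be changed without recompiling, which is the limitation noted above.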

The simplest and best solution would be for the target namespace URI to be accessible in the entity resolver (as the publicId).

Comparison to .Net Framework

The XML parser in the Microsoft .Net framework is much easier to control. First build a ‘SchemaCollection’ instance and then fill it with ‘Schema’ instances for the schemas the application should recognize. This schema collection can then be passed directly to the XML parser whenever a document is processed. Only schemas within the collection will ever be used.

Test Application

import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.EntityResolver;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
public final class ResolverTest
{
public static void main (String [] arguments)
{
ResolverTest application = new ResolverTest ();
for (int index = 0; index < arguments.length; ++index)
application.process (new File (arguments [index]));
}
/**
* Constructs a <B>ResolverTest</B> instance and creates the validating
* namespace aware parser used for the tests configured with a simple
* reporting resolver and error handler.
*/
protected ResolverTest ()
{
factory = DocumentBuilderFactory.newInstance ();
// We want validation and namespaces
factory.setValidating (true);
factory.setNamespaceAware (true);
// We want schema validation
try {
factory.setAttribute (" Boolean.TRUE);
}
catch (IllegalArgumentException error) {
System.err.println ("JAXP implementation does not support schema validation - Aborting");
System.exit (2);
}
// Set up a mapping for to urn:handcoded:mapped
try {
factory.setAttribute ("
" urn:handcoded:mapped");
}
catch (IllegalArgumentException error) {
System.err.println ("JAXP implementation does not support external schema location - Aborting");
System.exit (2);
}
// Create an XML parser
try {
builder = factory.newDocumentBuilder ();
}
catch (ParserConfigurationException error) {
System.err.println ("Failed to create a validating XML parser");
System.exit (2);
}
// Install an entity resolver
builder.setEntityResolver (
new EntityResolver ()
{
public InputSource resolveEntity (String publicId, String systemId)
{
System.out.println ("Resolver called with publidId='" + publicId + "' and systemId='" +systemId + "'");
// Handle the mapped URL (&& short-circuits so a null systemId is safe)
if ((systemId != null) && systemId.equals ("urn:handcoded:mapped"))
return (new InputSource ("test-url-mapped.xsd"));
return (null);
}
});
// Install an error handler
builder.setErrorHandler (
new ErrorHandler ()
{
public void warning (SAXParseException error)
{
System.out.println ("Warning: " + error.getMessage());
}
public void error (SAXParseException error)
{
System.out.println ("Error: " + error.getMessage());
}
public void fatalError (SAXParseException error)
{
System.out.println ("Fatal Error: " + error.getMessage());
}
});
}
/**
* Performs a validating parse of the specified file and reports
* whether the parse worked along with any exception that was caught.
* @param file The file to be parsed.
*/
protected void process (File file)
{
System.out.println ("> Processing: " + file.getName());
try {
if (builder.parse (file) != null)
System.out.println ("Parse Worked!");
else
System.out.println ("Parse Failed!");
}
catch (SAXException error) {
System.out.println ("Caught SAXException: " + error.getMessage());
}
catch (IOException error) {
System.out.println ("Caught IOException: " + error.getMessage());
}
}
/**
* Factory instance used to create XML parsers.
*/
private DocumentBuilderFactory factory;
/**
* The JAXP interface to the DOM parser.
*/
private DocumentBuilder builder;
}