The Florida State University

College of Arts and Sciences

A Java Implementation of the
Simple Object Access Protocol

By Dongmei Gao

December 4, 2001

A project submitted to the Department of Computer Science

In partial fulfillment of requirements for the

Degree of Master of Science

Major Professor: Dr. Robert van Engelen

Master Project committee

------

Prof. Robert van Engelen

Major Professor

------

Prof. Ernest McDuffie

Committee Member

------

Prof. David Whalley

Committee Member

Table Of Contents

Abstract

1 The Simple Object Access Protocol: SOAP

1.1 SOAP Overview

1.2 Interoperability

1.3 Basic Concepts of SOAP

1.4 SOAP Architecture

1.5 The Use of SOAP for RPC

1.6 Security Considerations

1.7 SOAP Advantages

1.8 SOAP Disadvantages

2 SOAP for Java

2.1 SFJ Architecture

2.2 The Remote Method Parameter Types

3 SFJ Design and Implementation

3.1 Serialization

3.2 De-serialization

3.3 Fault Class

4 Client-Server Application Example

4.1 WhoIs

4.2 getAllSOAPServices

5 Conclusions and Future Work

Reference

Abstract

The Simple Object Access Protocol (SOAP) is a lightweight remote method invocation protocol for the exchange of structured data in a decentralized, distributed environment. SOAP is based on XML and HTTP, which makes it a programming-language- and platform-neutral vehicle for remote method invocation over the Internet and through firewalls. The SOAP for Java (SFJ) project implements SOAP in Java by developing algorithms for marshalling native and user-defined Java data structures in SOAP without the use of a library of SOAP-like Java data structures. To this end, the Java reflection package is used to serialize and deserialize data structures in SOAP. With SFJ, a Java program can interoperate with other SOAP applications in a distributed environment through SOAP remote method invocation.

1 The Simple Object Access Protocol: SOAP

This section introduces SOAP as a programming-language- and platform-neutral protocol for remote method invocation. It discusses the basic concepts of SOAP, presents its architecture, illustrates the use of SOAP for remote method invocation, addresses SOAP security issues, and summarizes the advantages and disadvantages of SOAP for remote method invocation.

1.1 SOAP Overview

The Simple Object Access Protocol (SOAP) is a relatively recent development in distributed computing. SOAP is a versatile message exchange format that is simple and lightweight. The XML-based protocol is language and platform neutral, which means that information-sharing relationships can be initiated among disparate parties, across different platforms, languages, and programming environments. SOAP does not compete with component systems and object request broker architectures such as the CORBA component model and DCOM, but rather complements these technologies. CORBA, DCOM, and Enterprise Java enable resource sharing within a single organization, while SOAP aims to bridge the sharing of resources among disparate organizations, possibly located behind firewalls.

SOAP applications exploit a wire protocol (typically HTTP) to communicate with Web services to retrieve dynamic content. For example, real-time stock quote information of a stock portfolio can be graphed on the display of a cell phone or analyzed within a spreadsheet program running on a desktop computer. This allows real-time "what-if" scenarios and enables the development of agents that access real-time information. Other examples are the visualization of factory processes on PDAs, control and visualization of large-scale simulations from a desktop computer, people sharing laboratory results using cell phones, remote database access, and science portals.

There are many existing SOAP implementations (including Java implementations), such as Apache SOAP/Axis (Java), eSOAP, gSOAP, IONA XMLBus, kSOAP, pocketSOAP 1.1 beta, SILAB/TclSOAP, SIM SOAP4R, Spray B2001, SQLData, WASP Advanced 3.0, WASP for C++, White Mesa 2.5, and xSOAP (Java).

Interoperability

SOAP is a language- and platform-neutral RPC protocol that adopts XML as the marshalling format. SOAP applications typically adopt HTTP as a firewall-friendly transport protocol. These and other key interoperability features of SOAP are summarized below:

Ubiquity. The SOAP protocol and its industry-wide support promise to make services available to users anywhere, e.g. in cell phones, pocket PCs, PDAs, embedded systems, and desktop applications.

Services. SOAP Web services are units of application logic providing data and services to other applications over the Internet or intranet. A Web service can be as simple as a shell or Perl script that uses the Common Gateway Interface (CGI) of a Web server such as Apache. A Web service can also be a server-side ASP, JSP, or PHP script, or an executable CGI application implemented in any programming language for which an XML parser is available.

WSDL. The Web Service Description Language (WSDL) is an XML format for describing network services as abstract collections of communication endpoints capable of exchanging structured information. The platform- and language-neutral WSDL descriptions published by Web services enable the automatic generation of SOAP stubs for the development of clients within a specific programming environment. The language-specific stubs can be used to invoke the remote methods of the Web service (see the Web Service, WSDL, and Clients figure above).

UDDI. The Universal Description, Discovery, and Integration (UDDI) specification provides a universal service for registry, lookup, discovery, and integration of world-wide business services. WSDL descriptions complement UDDI by providing the abstract interface to a service.

Firewalls. Firewalls can be configured to selectively allow SOAP messages to pass through, because the intent of a message can be determined from the header part of the SOAP message.

1.2 Basic Concepts of SOAP

1.2.1 Definition

SOAP is a lightweight XML-based protocol for the exchange of information in a decentralized, distributed environment.

The basic design characteristics of SOAP are:

1) It is a lightweight protocol. SOAP standardizes the way data is transmitted between computers. The SOAP authors decided to specify SOAP only as a low-layer protocol for structured data exchange, and clearly stated that they did not want to define an entire distributed object system specification.

2) It is used to exchange structured data. SOAP is designed to exchange structured and typed information. It is a remote method invocation (also known as remote procedure call, RPC) protocol for the Internet.

3) It is an XML-based protocol. The SOAP specification mandates an XML vocabulary that is used for representing remote method parameters, return values, and (remote) exceptions.

4) It works in a decentralized, distributed environment. It is a protocol specification for invoking methods on servers, services, components, and objects in a platform-independent manner. It commonly uses HTTP to transport the XML-encoded remote method parameters over the Internet between disparate systems.

1.2.2 Brief History of SOAP

SOAP was originally developed by Microsoft, DevelopMentor, and UserLand Software. It was first submitted to the Internet Engineering Task Force (IETF) as an Internet-Draft, and SOAP 1.1 was later submitted to the World Wide Web Consortium (W3C), which published it as a W3C Note. The basic specification was drawn up in spring 1998 by Dave Winer of UserLand Software. His XML-RPC specification, on which SOAP is based, is almost identical to the original SOAP specification. With SOAP 1.1, IBM and Lotus joined DevelopMentor, Microsoft, and UserLand Software, along with a group of partners.

1.2.3 Standards

1) SOAP relies on HTTP 1.0 or greater and can take advantage of the HTTP Extension Framework.

2) SOAP also relies on the core W3C XML recommendation.

3) SOAP supports (but does not mandate) the W3C XML namespace recommendation.

4) SOAP payloads must be well-formed XML, but no validation (via DTDs or otherwise) is required. XML Schemas are used to describe SOAP data types.

1.3 SOAP Architecture

This section briefly introduces the architecture of the SOAP protocol.

1.3.1 Elements of the SOAP Protocol

The SOAP protocol consists of four parts:

1) An envelope that defines a framework for describing what is in an XML-encoded SOAP message and how to process it.

2) A set of encoding rules that define a data serialization mechanism that can be used to express instances of application-defined data types in XML.

3) A convention for representing remote procedure calls and responses.

4) A binding convention for exchanging messages between systems using an underlying protocol. Bindings describe how to use SOAP in combination with HTTP and the experimental HTTP Extension Framework.

1.3.2 The Basic SOAP Payload Structure

XML is a simple and extensible markup language. Because XML is just text, any application can understand it as long as the application understands the character encoding in use. By default, XML assumes that all characters belong to ISO/IEC 10646, known as the Universal Character Set (UCS). The XML specification mandates that all XML processors must accept character data encoded using the UCS Transformation Formats UTF-8 or UTF-16. Therefore, any XML data stream encoded in UTF-8 or UTF-16 can be understood regardless of platform or programming language. This makes XML a good choice for describing method invocations in a platform- and language-neutral fashion.
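
The encoding requirement can be illustrated with a minimal Java sketch that uses the standard JAXP parser. The class name, the method name, and the sample element are hypothetical and are not part of SFJ. The same document is encoded once in UTF-8 and once in UTF-16, and both byte streams should parse to the same character data.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Illustrative sketch only; the class and method names are hypothetical.
final class XmlEncodingSketch {

    // Encodes a small XML document with the given charset, parses the bytes,
    // and returns the text content of the root element.
    static String roundTrip(String declaredEncoding, Charset charset) {
        String xml = "<?xml version=\"1.0\" encoding=\"" + declaredEncoding + "\"?>"
                + "<City>Z\u00fcrich</City>";
        byte[] bytes = xml.getBytes(charset);
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String fromUtf8 = roundTrip("UTF-8", StandardCharsets.UTF_8);
        String fromUtf16 = roundTrip("UTF-16", StandardCharsets.UTF_16);
        // Both encodings carry the same characters.
        System.out.println(fromUtf8.equals(fromUtf16));
    }
}
```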

The basic SOAP payload structure consists of three parts.

1) The SOAP envelope: this is the root XML element in the XML document tree representing a message.

2) The (optional) SOAP header: this is a generic mechanism that adds characteristics to the SOAP message. SOAP defines several attributes that can be used to indicate who must process the message, and whether this process is optional or mandatory.

3) The SOAP body: this is the container for the mandatory information being sent to the message endpoint.
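
The three-part structure can be sketched in Java with the standard DOM API. This is only an illustration under stated assumptions: the class name, the "urn:example" namespace, and the NewCustomer element are hypothetical, and the envelope namespace is assumed to be the SOAP 1.1 one. It is not the SFJ implementation.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Illustrative sketch only; names other than the DOM API are hypothetical.
final class SoapPayloadSketch {

    // Assumption: the SOAP 1.1 envelope namespace.
    static final String ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/";

    // Builds Envelope -> (optional Header) -> Body as a DOM tree.
    static Document buildPayload(boolean withHeader) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();

            // 1) The envelope is the root element.
            Element envelope = doc.createElementNS(ENV_NS, "SOAP-ENV:Envelope");
            doc.appendChild(envelope);

            // 2) The header is optional and, when present, comes first.
            if (withHeader) {
                envelope.appendChild(doc.createElementNS(ENV_NS, "SOAP-ENV:Header"));
            }

            // 3) The body carries the information for the endpoint.
            Element body = doc.createElementNS(ENV_NS, "SOAP-ENV:Body");
            envelope.appendChild(body);
            body.appendChild(doc.createElementNS("urn:example", "m:NewCustomer"));
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException("could not build payload", e);
        }
    }

    public static void main(String[] args) {
        Element envelope = buildPayload(true).getDocumentElement();
        for (Node n = envelope.getFirstChild(); n != null; n = n.getNextSibling()) {
            System.out.println(n.getNodeName());
        }
    }
}
```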

1.3.2.1 SOAP Envelope

The envelope is the first part of the SOAP message and the mandatory root element of the XML document tree. It consists of the element name (Envelope), a namespace declaration that identifies the SOAP version being used, and the optional encodingStyle attribute, whose URI identifies the serialization and encoding rules in use. The envelope is presented as follows:

<SOAP-ENV:Envelope
    xmlns:SOAP-ENV="…"
    SOAP-ENV:encodingStyle="…">
  ……
</SOAP-ENV:Envelope>

1.3.2.2 SOAP Header

The header is an optional part of the SOAP message, encapsulated in the SOAP envelope. It carries information to intermediaries and is made up of one or more entries. Each entry has a local name, a fully qualified name, and a namespace, and may carry two attributes: actor, which designates the endpoint of the entry, and mustUnderstand, which indicates whether processing the entry is optional or mandatory. A SOAP application must include the correct SOAP namespace for all the elements and attributes defined in the message it generates. The namespace is a URI that identifies the description of the message information and guarantees that the element names in the message are unique.

<SOAP-ENV:Header>
  <t:newEvent xmlns:t="…"
      SOAP-ENV:actor="…/actor/next/"
      SOAP-ENV:mustUnderstand="1">
    Christmas Event
  </t:newEvent>
</SOAP-ENV:Header>
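
How a receiver might act on the mustUnderstand attribute can be sketched as follows. The HeaderEntry holder and the method name are hypothetical, and the sketch assumes that a receiver should reject a message when it meets a mandatory entry it does not understand; it does not describe how SFJ handles headers.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only; all names are hypothetical.
final class HeaderCheckSketch {

    // Minimal holder for one header entry.
    static final class HeaderEntry {
        final String localName;
        final boolean mustUnderstand;

        HeaderEntry(String localName, boolean mustUnderstand) {
            this.localName = localName;
            this.mustUnderstand = mustUnderstand;
        }
    }

    // Returns the local name of the first mandatory entry that the receiver
    // does not understand, or null when every mandatory entry is understood.
    static String firstNotUnderstood(List<HeaderEntry> entries, Set<String> understood) {
        for (HeaderEntry entry : entries) {
            if (entry.mustUnderstand && !understood.contains(entry.localName)) {
                return entry.localName;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<HeaderEntry> entries = Arrays.asList(
                new HeaderEntry("trace", false),     // optional, may be ignored
                new HeaderEntry("newEvent", true));  // mandatory
        Set<String> understood = new HashSet<>(Arrays.asList("newEvent"));
        String offending = firstNotUnderstood(entries, understood);
        System.out.println(offending == null ? "process message" : "fault on " + offending);
    }
}
```
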
1.3.2.3 SOAP Body

The information to be processed by the endpoint is found in the body of the SOAP message. The body can contain a set of entries, each of which is an immediate child element of the Body element.

<SOAP-ENV:Body>
<m:NewCustomer xmlns:m="Some-URI">
<Name>Dumser</Name>
<Surname>Johann</Surname>
<City>Cambridge</City>
<ZipCode>01800</ZipCode>
<State>MA</State>
<Country>USA</Country>
</m:NewCustomer>
</SOAP-ENV:Body>

1.3.3 HTTP Structure

HTTP is used as the transport protocol for SOAP messages.

1.3.3.1 HTTP Header

The HTTP header immediately precedes the SOAP message. The SOAP request is sent over the network as an HTTP POST request. The first line defines the request method, the request URI, and the protocol version:

POST /Computer HTTP/1.1

The next line gives the target site:

Host:

The next two lines define the MIME type and character encoding of the message and the length of the message.

Content-Type: text/xml; charset="utf-8"
Content-Length: 10

Additional header fields are then added, such as SOAPAction, which indicates the intent of the HTTP request. The identifier following the # sign must match the name of the first tag in the SOAP message body.

SOAPAction: "…"

1.3.4 SOAP Message Structure

Below is an example of the SOAP message request code, followed by an explanatory diagram:

POST /EventManager HTTP/1.1
Host:
Content-Type: text/xml; charset="utf-8"
Content-Length: 60
SOAPAction: "… Customer"

<SOAP-ENV:Envelope
    xmlns:SOAP-ENV="…"
    SOAP-ENV:encodingStyle="…">
  <SOAP-ENV:Header>
    <t:Name xmlns:t="…"
        SOAP-ENV:actor="…/next/"
        SOAP-ENV:mustUnderstand="1">
      Dumser
    </t:Name>
  </SOAP-ENV:Header>
  <SOAP-ENV:Body>
    <m:NewCustomer xmlns:m="…">
      <Entreprise>SQLI</Entreprise>
      <Address>Paris</Address>
    </m:NewCustomer>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>

1.4 The Use of SOAP for RPC

1.4.1 SOAP Web Services

SOAP applications exploit a wire protocol (typically HTTP) to communicate with Web services to retrieve dynamic content. SOAP Web services are units of application logic providing data and services to other applications over the Internet or an intranet. A Web service can be as simple as a shell or Perl script. It can also be a server-side ASP, JSP, or PHP script, or an executable CGI application implemented in any programming language for which an XML parser is available. The SOAP protocol and its industry-wide support promise to make services available to users anywhere. In SOAP RPC, Web services are treated the way procedures or components are treated in traditional programming.

1.4.2 Header requirements

The format of the URI in the first line of the header is not specified. For example, it could be empty (a single slash) if the server only handles XML-RPC calls. However, if the server handles a mix of incoming HTTP requests, the URI can be used to route the request to the code that handles XML-RPC requests.

A User-Agent and Host must be specified.

The Content-Type is text/xml.

The Content-Length must be specified and must be correct.
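
These requirements can be sketched as a small Java helper that assembles the request text. The class name, method name, host, and user-agent values are hypothetical. The point of the sketch is that Content-Length counts the bytes of the encoded payload, not its characters.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch only; all names are hypothetical.
final class HttpRequestSketch {

    // Assembles an HTTP POST request whose Content-Length is the number of
    // bytes of the UTF-8 encoded payload.
    static String buildRequest(String path, String host, String userAgent, String payload) {
        int length = payload.getBytes(StandardCharsets.UTF_8).length;
        return "POST " + path + " HTTP/1.1\r\n"
                + "Host: " + host + "\r\n"
                + "User-Agent: " + userAgent + "\r\n"
                + "Content-Type: text/xml; charset=\"utf-8\"\r\n"
                + "Content-Length: " + length + "\r\n"
                + "\r\n"   // an empty line separates the header from the payload
                + payload;
    }

    public static void main(String[] args) {
        System.out.print(buildRequest("/cgi-bin/purchase-book.cgi", "example.org",
                "sketch-client/0.1", "<PurchaseBook/>"));
    }
}
```

Computing the length from the encoded bytes matters as soon as the payload contains characters outside ASCII, because such characters occupy more than one byte in UTF-8.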

1.4.3 Payload format

The payload is in XML, a single <methodCall> structure.

The <methodCall> must contain a <methodName> sub-item, which is a string containing the name of the method to be called.

For example, the methodName could be the name of a file containing a script that executes on an incoming request. It could be a path to a file contained within a hierarchy of folders and files.

If the procedure call has parameters, the <methodCall> must contain a <params> sub-item. The <params> sub-item can contain any number of <param>s, each of which has a <value>.
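
The payload rules above can be sketched as a Java helper that builds a <methodCall> structure from a method name and string parameters. The class and method names are hypothetical, and the sketch only covers untyped string values.

```java
// Illustrative sketch only; all names are hypothetical.
final class MethodCallSketch {

    // Escapes the characters that are not allowed in XML character data.
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Builds a <methodCall> with a <methodName> and, when there are
    // parameters, a <params> element holding one <param> per argument.
    static String methodCall(String methodName, String... params) {
        StringBuilder xml = new StringBuilder("<methodCall>");
        xml.append("<methodName>").append(escape(methodName)).append("</methodName>");
        if (params.length > 0) {
            xml.append("<params>");
            for (String param : params) {
                xml.append("<param><value>").append(escape(param)).append("</value></param>");
            }
            xml.append("</params>");
        }
        return xml.append("</methodCall>").toString();
    }

    public static void main(String[] args) {
        System.out.println(methodCall("PurchaseBook", "0201379368"));
    }
}
```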

Examples of different types

Number (int, double, float)

String

Array: <array> elements do not have names.

Linkedlist

Vector

Class

1.4.4 Example

Imagine a component that lives somewhere on the Internet that implements the PurchaseBook method on the purchase_book interface. This method would be invoked for a user to purchase a book from an online bookstore. The following HTTP request represents how you would invoke such a method using the SOAP protocol:

POST /cgi-bin/purchase-book.cgi HTTP/1.1
Content-Type: text/xml
Content-Length: 555

<SOAP-ENV:Envelope …>
  <SOAP-ENV:Body>
    <PurchaseBook>
      <ISBN xsi:type="xsd:integer">0201379368</ISBN>
    </PurchaseBook>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>

This HTTP request points to a uniform resource identifier (URI) of /cgi-bin/purchase-book.cgi. Since the SOAP specification says nothing about how a component is activated, it's up to the code behind this URI to decide how to activate the component and invoke the specified method.

Unless there's a lower-level error, the first line of the HTTP header of the response by the service that processes the message always returns 200 OK.

The Content-Type is text/xml. Content-Length must be present and correct.

The body of the response is a single XML structure, containing the Envelope, Body, and a <methodResponse>.

Here is an example of a response to the PurchaseBook request above:

HTTP/1.1 200 OK
Connection: close
Content-Length: 158
Content-Type: text/xml
Date: Fri, 17 Jul 1998 19:55:08 GMT
Server: UserLand Frontier/5.1.2-WinNT

<?xml version="1.0"?>
<SOAP-ENV:Envelope …>
  <SOAP-ENV:Body>
    <purchaseBookResponse>
      <result>
        <charged xsi:type="xsd:int">1</charged>
        <invoice xsi:type="xsd:long">6471274575</invoice>
      </result>
    </purchaseBookResponse>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
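
A client could extract the return values from such a response as in the following Java sketch, which uses the standard DOM parser. The class and method names are hypothetical, and the namespace URIs in the sample are assumptions, since the envelope attributes are elided in the example above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Illustrative sketch only; all names and namespace URIs are assumptions.
final class ResponseParseSketch {

    static final String SAMPLE =
            "<SOAP-ENV:Envelope"
            + " xmlns:SOAP-ENV=\"http://schemas.xmlsoap.org/soap/envelope/\""
            + " xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\""
            + " xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\">"
            + "<SOAP-ENV:Body><purchaseBookResponse><result>"
            + "<charged xsi:type=\"xsd:int\">1</charged>"
            + "<invoice xsi:type=\"xsd:long\">6471274575</invoice>"
            + "</result></purchaseBookResponse></SOAP-ENV:Body></SOAP-ENV:Envelope>";

    // Parses the response and returns the first element with the given
    // (unprefixed) name as a long value.
    static long longValue(String responseXml, String elementName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(responseXml)));
            String text = doc.getElementsByTagName(elementName).item(0).getTextContent();
            return Long.parseLong(text.trim());
        } catch (Exception e) {
            throw new IllegalStateException("could not read " + elementName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println("charged: " + longValue(SAMPLE, "charged"));
        System.out.println("invoice: " + longValue(SAMPLE, "invoice"));
    }
}
```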

When a client sends an invalid SOAP request, the server returns a fault that includes a fault code, a fault string, a run code, and a detail element. The possible fault codes are shown in the following table.

SOAP Fault Codes

Value / Name / Meaning
100 / Version Mismatch / The call was using an unsupported SOAP version.
200 / Must Understand / An XML element was received that contained an element tagged with mustUnderstand="true" that was not understood by the receiver.
300 / Invalid Request / The receiving application did not process the request because it was incorrectly formed or not supported by the application.
400 / Application Faulted / The receiving application faulted when processing the request. The detail element contains the application-specific fault.
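
The table can be mirrored in Java as a small enumeration with a lookup by numeric value. This is a hypothetical sketch whose names are chosen for illustration. It is not the SFJ Fault class.

```java
// Illustrative sketch only; the enum and its members are hypothetical names
// that mirror the fault code table.
enum FaultCode {
    VERSION_MISMATCH(100, "Version Mismatch"),
    MUST_UNDERSTAND(200, "Must Understand"),
    INVALID_REQUEST(300, "Invalid Request"),
    APPLICATION_FAULTED(400, "Application Faulted");

    private final int value;
    private final String label;

    FaultCode(int value, String label) {
        this.value = value;
        this.label = label;
    }

    int value() {
        return value;
    }

    String label() {
        return label;
    }

    // Looks up the fault code for a numeric <faultcode> value.
    static FaultCode fromValue(int value) {
        for (FaultCode code : values()) {
            if (code.value == value) {
                return code;
            }
        }
        throw new IllegalArgumentException("unknown fault code: " + value);
    }

    public static void main(String[] args) {
        FaultCode code = fromValue(400);
        System.out.println(code.value() + " " + code.label());
    }
}
```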

Below is an example of returning an application-specific fault:

<soap:Envelope xmlns:soap='urn:schemas-xmlsoap-org:soap.v1'>
  <soap:Body>
    <soap:Fault>
      <faultcode>400</faultcode>
      <faultstring>
        Divide by zero occurred
      </faultstring>
      <runcode>Maybe</runcode>
      <detail>
        <t:DivideByZeroException xmlns:t="someURI">
          <expression>x = 2 / 0;</expression>
        </t:DivideByZeroException>
      </detail>
    </soap:Fault>
  </soap:Body>
</soap:Envelope>

A response cannot contain both a <fault> and a <methodResponse>.

1.5 Security Considerations

Firewalls can easily recognize SOAP packets based on their Content-Type (text/xml-SOAP), and can filter based on the interface and method name exposed via HTTP headers. These headers include the interface URI and the method name being invoked on that interface. This information is culled from the payload before sending the HTTP request, and is required to match the payload.
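
The requirement that the header information match the payload can be sketched in Java as a consistency check between the SOAPAction value and the first element in the message body. The class and method names, the sample URIs, and the exact matching rule are assumptions made for illustration. A real firewall or server may apply different rules.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Illustrative sketch only; all names and URIs are hypothetical.
final class SoapActionCheckSketch {

    static final String SAMPLE =
            "<e:Envelope xmlns:e=\"http://schemas.xmlsoap.org/soap/envelope/\">"
            + "<e:Body><m:NewCustomer xmlns:m=\"urn:example\"/></e:Body></e:Envelope>";

    // Returns the identifier after the last '#' of a SOAPAction value,
    // or an empty string when there is none.
    static String methodFromAction(String soapAction) {
        String value = soapAction.replace("\"", "");
        int hash = value.lastIndexOf('#');
        return hash < 0 ? "" : value.substring(hash + 1);
    }

    // Returns the local name of the first element inside the Body element.
    static String firstBodyElement(String envelopeXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(envelopeXml)));
            Node body = doc.getElementsByTagNameNS("*", "Body").item(0);
            for (Node n = body == null ? null : body.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() == Node.ELEMENT_NODE) {
                    return n.getLocalName();
                }
            }
            return "";
        } catch (Exception e) {
            throw new IllegalStateException("could not parse envelope", e);
        }
    }

    // True when the method named in the header matches the payload.
    static boolean consistent(String soapAction, String envelopeXml) {
        String method = methodFromAction(soapAction);
        return !method.isEmpty() && method.equals(firstBodyElement(envelopeXml));
    }

    public static void main(String[] args) {
        System.out.println(consistent("\"urn:example#NewCustomer\"", SAMPLE));
        System.out.println(consistent("\"urn:example#DeleteCustomer\"", SAMPLE));
    }
}
```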