Web service shell architecture

2012/11/19

Purpose:

This document outlines the function of the Web Service Shell (WSS), which communicates with both an HTTP client and an internal handler (e.g. a command-line utility) responsible for generating the content returned to the client. It also includes a technical description of the communication protocol between the shell and the handler program. This shell architecture is primarily targeted at International FDSN web services.

General architecture:

The WSS translates web service parameters into options for a localized data content handler that returns data in the format expected by the client. The shell performs the following core functions:

  • Manage the HTTP connection with the client.
  • Perform first-order validation of request parameters; see “Parameter Validation” below.
  • Manage the connection with the localized handler according to configuration parameters.
  • Log service requests, including time series shipment summaries.
  • Handle HTTP authentication (basic or digest) via the servlet container.

The web service shell is written in Java and designed to run in a Tomcat container; other common containers (e.g. Glassfish) will also work.

Parameter Validation

The WSS will check for expected parameters as defined in the configuration file and will validate their type only (date, number, text). Unexpected parameters or parameter values that do not match the expected type will result in an error returned to the client. The WSS will not check the values for valid ranges or for the appropriate grouping of parameters. The content handler should perform all remaining validation of the parameters.
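The type-only check described above can be sketched as follows. This is an illustrative Python sketch, not the shell's Java implementation: the ALLOWED table and the function names are hypothetical, and the error strings are invented for the example.

```python
from datetime import datetime

# Hypothetical in-memory form of the parameter configuration:
# parameter name -> declared type.
ALLOWED = {"starttime": "DATE", "network": "TEXT", "minlatitude": "NUMBER"}


def check_type(kind: str, value: str) -> bool:
    """Return True if value parses as the declared type (type check only)."""
    if kind == "DATE":
        try:
            datetime.strptime(value, "%Y-%m-%dT%H:%M:%S")
        except ValueError:
            return False
        return True
    if kind == "NUMBER":
        try:
            float(value)
        except ValueError:
            return False
        return True
    return kind == "TEXT"  # TEXT is accepted without inspection


def validate(params: dict, allowed: dict = ALLOWED) -> list:
    """Collect one error string per unexpected or mistyped parameter."""
    errors = []
    for name, value in params.items():
        if name not in allowed:
            errors.append(f"Unexpected parameter: {name}")
        elif not check_type(allowed[name], value):
            errors.append(f"Bad {allowed[name]} value for {name}: {value}")
    return errors
```

Note that a value such as minlatitude=95 passes this check because it is a number; rejecting it as out of range is left to the content handler.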

Data Content Handlers

The handler program is responsible for extracting data from the appropriate data center repository, and then returning it in the format expected by the client. There are two options for creating a localized handler: a) an external command-line program that is executed by the shell and b) a Java “stub” that returns data via an internal interface.

External program handler

The shell executes an external program handler using inputs that are supplied through command line arguments or via the standard input stream. The handler transmits data content via the standard output stream. The overall status of the request is communicated using the handler’s exit status. An optional message may be transmitted from the handler to the shell via the standard error stream. Only *nix systems are supported.

Inputs

Request parameters are passed to the handler using command line options roughly equivalent to parameters received via a GET request. In the case of POST requests that may include numerous selections, the POST body is passed to the handler via the standard input stream.
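As an illustration of the input path, the following Python sketch runs a handler and feeds a POST body to its standard input. The function name run_handler is hypothetical, and buffering the complete output is a simplification made for the example; the shell itself is described as streaming data to the client.

```python
import subprocess


def run_handler(command: list, post_body: bytes = b""):
    """Run the handler, feeding any POST body to its standard input.

    Returns (exit status, stdout bytes, stderr bytes). Buffering the whole
    output is a simplification of this sketch, not the shell's behavior.
    """
    result = subprocess.run(command, input=post_body, capture_output=True)
    return result.returncode, result.stdout, result.stderr
```

Passing the command as a list of arguments, rather than as a single shell string, avoids any quoting of parameter values.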

Environment variables

The shell will set the following environment variables for use by the handler:

Variable / Description
REQUESTURL / Full request URL submitted to the shell
USERAGENT / HTTP UserAgent submitted to the shell
IPADDRESS / IP address of client submitting request
APPNAME / Application name as set in the configuration file
VERSION / Web service version as set in the configuration file

Data content

The handler returns the results via the standard output stream. The data content should be limited to the data only (e.g., returned data should not include any HTTP headers). This data is streamed directly to the client after the shell has added the appropriate HTTP headers.

Request status

The handler’s exit status indicates the overall status of the request and is translated to HTTP status codes as follows:

Exit status / HTTP status / Description
0 / 200 / Successfully processed request, data returned via stdout.
1 / 204 / No data. Request was successful but results in no data.
2 / 400 / Invalid or unsupported argument/parameter.
3 / 413 / Too much data requested.
4 / 500 / General error. An error description may be provided on stderr.
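
The table above amounts to a simple lookup, sketched here in Python. Treating exit statuses that are not listed as HTTP 500 is an assumption of this sketch; the document does not state how the shell handles them.

```python
# Exit status -> (HTTP status, description), copied from the table above.
EXIT_TO_HTTP = {
    0: (200, "Successfully processed request"),
    1: (204, "No data"),
    2: (400, "Invalid or unsupported argument/parameter"),
    3: (413, "Too much data requested"),
    4: (500, "General error"),
}


def http_status(exit_status: int) -> int:
    """Map a handler exit status to an HTTP status code.

    Unlisted exit statuses fall back to 500 (an assumption of this sketch).
    """
    return EXIT_TO_HTTP.get(exit_status, (500, "General error"))[0]
```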

Optional message

The handler may return an optional message via the standard error stream. Normally this is a mechanism for the handler to report a description of an error. The shell may include this description in an error message sent back to the client.
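
Taken together, the handler side of this protocol might look like the Python sketch below. Everything specific in it is hypothetical: the REPOSITORY dictionary stands in for a data center repository, the output rows are invented, and the rule that --network is required exists only to show an error path.

```python
# Exit statuses from the "Request status" table.
OK, NO_DATA, BAD_REQUEST, TOO_MUCH_DATA, ERROR = 0, 1, 2, 3, 4

# Hypothetical stand-in for a data center repository, keyed by network code.
REPOSITORY = {"IU": ["IU|ANMO", "IU|COLA"]}


def run(argv, env, stdout, stderr) -> int:
    """Serve one request; the return value is the handler's exit status."""
    # Pair up the "--key value" arguments passed by the shell.
    options = dict(zip(argv[0::2], argv[1::2]))
    app = env.get("APPNAME", "handler")  # APPNAME is set by the shell
    network = options.get("--network")
    if network is None:
        # Optional message on stderr describing the error.
        stderr.write(f"{app}: --network is required\n")
        return BAD_REQUEST
    rows = REPOSITORY.get(network, [])
    if not rows:
        return NO_DATA
    # Data only: the shell adds the HTTP headers.
    stdout.write("\n".join(rows) + "\n")
    return OK
```

A real handler would end by calling sys.exit(run(sys.argv[1:], os.environ, sys.stdout, sys.stderr)) so that the return value becomes the process exit status.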

Internal Java interface

As an alternative to calling an external program, data content may be returned via an internal Java interface. This requires developing data-center-specific code in Java. In some cases this may allow data extraction to be optimized. For example, an intermediate cache could be used instead of always retrieving from the repository.

Unsupported options

If a handler does not support an option that is supported by the shell, it should exit with an exit status of 2 to indicate an error and write the following text to stderr: “Unsupported option: XX”.
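
A handler could implement this rule along the lines of the Python sketch below. The function name is hypothetical, and substituting the option name for “XX” is an assumption; the document does not spell out what XX stands for.

```python
def check_options(options, supported):
    """Return (exit status, stderr text) for the first unsupported option.

    Assumes "XX" in the message is the name of the offending option.
    """
    for name in options:
        if name not in supported:
            return 2, f"Unsupported option: {name}\n"
    return 0, ""
```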

Configuration Files

Two configuration files are associated with the WSS. The first is the application configuration file, which controls basic operational parameters. The second controls the list of allowable query parameters and their basic types (for simple validation by the WSS).

Configuration files are located in the WSS servlet’s /webserviceshell/WEB-INF/classes/META-INF directory. The configuration file format is simply a list of name=value pairs. Names and values are case-sensitive. Whitespace is ignored.
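
For illustration, a file in this format could be read with the Python sketch below. It interprets “whitespace is ignored” as ignoring blank lines and whitespace around names and values, which is an assumption; the function name is hypothetical, and comment lines are not handled because the document does not define a comment syntax.

```python
def parse_config(text: str) -> dict:
    """Parse name=value lines into a dict, preserving case."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines carry no setting
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"Not a name=value pair: {line!r}")
        settings[name.strip()] = value.strip()
    return settings
```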

Application Configuration file (service.cfg)

Parameter / Description
rootServicePath / URI base of the service interface, e.g. service.iris.edu/fdsn/event
rootServiceDoc / URI of document to be served to the client for the service root page. The following strings, if present, will be translated in the document by the shell:
  • BASEURL – replaced with the value of rootServicePath
  • VERSION – replaced with the service version
  • HOST – replaced with the host name
* see Documentation section below
appName / Name of application, e.g. “fdsn-station”. Used in error messages and logging.
version / Web service version, e.g. “1.0.0”. Used in error messages, logging, and potentially content.
outputType / Default output type from set [text, xml, seed, json]. Determines the default output MIME type. Defaults to text.
handlerProgram / Path to the content handler program that returns content, e.g.: “/usr/local/bin/extractminiseed”
handlerTimeout / A timeout value in seconds. If the handler does not respond within the timeout duration, the process is killed and an HTTP status code of 503 is returned to the caller.
handlerWorkingDirectory / Path to the directory on the host where the handlerProgram will nominally run, i.e. where data files may be read, created, etc. The path is interpreted relative to the main directory of the web service shell servlet on the application server. The directory will be erased during standard deployment of the web application unless it is part of the project, i.e. the WAR. Attempts to use a working directory outside of the application’s context on the server will result in an exception. Defaults to “/”.
loggingType / Logging type from set [LOG4J, JMS]. Defaults to LOG4J.
jndiUrl / (JMS specific) URL to the JNDI configuration data for JMS logging
connectionFactory / (JMS specific) connectionFactoryName for establishing publishing connection
topicDestination / (JMS specific) Topic to which to publish
singletonClassName / (Java interface specific) Qualified name of Java class to load at application start
streamingOutputClassName / (Java interface specific) Qualified name of Java class to use for data access.

Query Parameter Configuration File (param.cfg)

This configuration file controls the list of allowed query parameters along with their basic types. Allowable types are listed in the table below.

Parameter Type / Description / Validation Requirement
DATE / Date and time / Must match the FDSN date format YYYY-MM-DDTHH:MM:SS
NUMBER / Number / Value must be parseable as a number
TEXT / Text string / Not validated

Example param.cfg file:

starttime=DATE
endtime=DATE
network=TEXT
station=TEXT
location=TEXT
channel=TEXT
maxlatitude=NUMBER
minlatitude=NUMBER

Mapping between HTTP parameters and handler arguments

The WSS will pass each parameter to the handler by simply translating each parameter to a command line argument and value.

Service parameter: key=value

Command line equivalent: --key value

The handler is expected to use the same formats and conventions for the parameter values as specified for the service parameters. For example, if the service accepts a time value as YYYY-MM-DDTHH:MM:SS, the handler should also accept the equivalent time value in that format. As another example, if the service accepts a depth as a value in kilometers, the equivalent option in the handler should take the value in kilometers.
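
The translation from service parameters to handler arguments can be sketched in Python as follows. The function name is hypothetical, and the shell's actual Java code may differ, for instance in how it treats repeated or empty parameters.

```python
from urllib.parse import parse_qsl


def handler_command(handler_program: str, query_string: str) -> list:
    """Build the handler argument list from a URL query string."""
    command = [handler_program]
    for key, value in parse_qsl(query_string, keep_blank_values=True):
        command += [f"--{key}", value]  # key=value becomes --key value
    return command
```

Values are passed through unchanged, so the handler receives them in the same format the client submitted.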

Documentation

The base URL of the web service is expected to return documentation in HTML format. The page should generally contain an overview of the service operation with links to further documentation as needed. To maximize flexibility, the template for this page is not embedded into the service. The service will fetch and return the HTML document indicated by rootServiceDoc, replacing the strings BASEURL, VERSION and HOST as indicated in the configuration parameter table. The value of rootServicePath (BASEURL) should be the URL used by clients to access the service and can be used to create links in the documentation to example queries and service methods (e.g. WADL or a version method).
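
The string replacement in the root document can be illustrated with the Python sketch below. The function name is hypothetical. The single-pass substitution is a design choice of the sketch: it ensures a replacement value that happens to contain one of the three strings is not substituted again.

```python
import re


def render_root_doc(template: str, base_url: str, version: str, host: str) -> str:
    """Replace the BASEURL, VERSION and HOST strings in the root document."""
    values = {"BASEURL": base_url, "VERSION": version, "HOST": host}
    # One pass over the template, so inserted values are never re-scanned.
    return re.sub("BASEURL|VERSION|HOST", lambda match: values[match.group(0)], template)
```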

Error conditions

On handler errors or a handler timeout, the shell will translate the errors to an appropriate HTTP status and error message and terminate the handler if it is still running.

On client errors the shell will terminate the handler.

Terminating the local handler is done in the following stages:

  • Send the process the termination signal (SIGTERM).
  • If the process remains after 10 seconds, send the kill signal (SIGKILL) and disconnect all streams from the process (stdin, stdout, stderr).
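
The two termination stages can be sketched in Python as follows. This is an illustration on a POSIX system using the standard subprocess module, not the shell's Java code; the function name and the grace_seconds parameter are hypothetical, with the default taken from the 10 seconds stated above.

```python
import subprocess


def terminate(process: subprocess.Popen, grace_seconds: float = 10.0) -> int:
    """Stop a handler: SIGTERM first, SIGKILL if it outlives the grace period."""
    process.terminate()  # sends SIGTERM on POSIX
    try:
        return process.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        process.kill()  # sends SIGKILL on POSIX
        for stream in (process.stdin, process.stdout, process.stderr):
            if stream is not None:
                stream.close()  # disconnect stdin, stdout, stderr
        return process.wait()
```

On POSIX the returned value is the negated signal number when the process was ended by a signal, which lets the caller tell the two stages apart.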

Usage Logging

The WSS can log each request using either Log4j or via an IRIS-designed scheme utilizing JMS (Java Message Service, a publish / subscribe system). The Log4j mechanism will be used by default and is configured to write the log entries to files. Standard Log4j configuration is used which allows the use of a variety of existing Log4j ‘appenders’. The default logging configuration is a daily rolling log.

The JMS mechanism is far more complex and is designed to utilize a centralized (possibly clustered) logging server to which messages are submitted by multiple services. Multiple subscribers can then listen for these messages and log them to a file, database, etc.

In addition to an entry for each request, the WSS will log a summary of each data segment for time series requests.

The usage log message content is designed to be consistent between service types to allow for aggregation and uniform storage.

WSS Logging

The WSS servlet itself utilizes Log4j to log internal log messages indicating the internal state of the WSS program. These are written to a separate log file from the Usage Logging described above.

The locations of the usage and WSS log files are configurable via the Log4j properties configuration file.

Authentication (for fdsn-dataselect only)

For service methods that require authentication for access to restricted data (e.g. fdsn-dataselect/queryauth) the web application container should be configured to negotiate the authentication and provide the WSS servlet with the authenticated user name.

If authentication is successful, the user name will be supplied to the data content handler with a command line argument of “--username <user>” or an internal variable. The implication is that this user has been authenticated and data content should be allowed based on the user name. Except for the user name no other authentication credentials are provided to the data content handler.

Example mapping for the fdsn-station service

HTTP Parameter / Handler argument / Description (from the specification)
starttime / --starttime / Start time (YYYY-MM-DDTHH:MM:SS)
endtime / --endtime / End time (YYYY-MM-DDTHH:MM:SS)
startbefore / --startbefore / Start before
startafter / --startafter / Start after
endbefore / --endbefore / End before
endafter / --endafter / End after
network / --network / SEED network code
station / --station / SEED station code
location / --location / SEED location ID
channel / --channel / SEED channel codes
minlatitude / --minlatitude / Minimum latitude (degrees)
maxlatitude / --maxlatitude / Maximum latitude (degrees)
minlongitude / --minlongitude / Minimum longitude (degrees)
maxlongitude / --maxlongitude / Maximum longitude (degrees)
latitude / --latitude / Latitude for radius (degrees)
longitude / --longitude / Longitude for radius (degrees)
minradius / --minradius / Minimum radius, default 0 (degrees)
maxradius / --maxradius / Maximum radius (degrees)
level / --level / Result level: net,sta,chan,resp
includerestricted / --includerestricted / Include metadata if access is restricted
includeavailability / --includeavailability / Include availability
updatedafter / --updatedafter / Limit to metadata updated after time

A request for:

fdsn/station/query?network=IU&starttime=2012-01-01T12:13:14

might result in handler execution similar to:

/usr/local/bin/MetadataHandler --network IU --starttime 2012-01-01T12:13:14

Ideas for consideration:

* Include an administrative “status” page that shows the configuration of the server. This page could potentially allow changing of log levels.
