European Commission – DG Eurostat

ESS.VIP Programme

Cross-cutting project on sharing statistical SERVices

Statistical Service Implementation

STRUVAL Structural Validation Service
Phase 1 – based on SDMX Converter

Version 0.95

19/10/2015

Table of Contents

1General information

1.1Service name

1.2Service version

1.3Relation to Service Definition and Specification

2Invocation Protocols

3Data-by-reference Protocols

4Canonical data models

5Non-canonical data models

6Distribution

7Service Contract

7.1Operation <validate>

7.1.1Function

7.1.2Statistical methods

7.1.3Invocation protocols

7.1.4Inputs

7.1.5Outputs

7.1.6Pre-conditions

7.1.7Post-conditions

7.1.8Metrics

7.1.9Business Exceptions

7.1.10Compensation

7.1.11Specific requirements

8Parameterization

9Requirements for security

9.1Security mechanisms

9.1.1Non-repudiation

9.1.2Integrity

9.1.3Authentication and trust domains

9.1.4Self-registration

9.1.5Authorization

9.1.6Encryption

9.1.7Data at rest

9.1.8Data in transfer (end-to-end)

9.2Data protection

10Policies

10.1Security assertions

10.2Quality of service assertions

10.3Message format assertions (compliance)

10.4Other Policies

10.5Terms of use

11Non-functional characteristics (QoS)

11.1Reliability

11.2Availability

11.3Performance

11.4Multilingual support

11.5Error handling

11.6Process metrics

12Technical Dependencies

13SOA Layering

1General information

1.1Service name

STRUVAL Structural Validation

1.2Service version

1.0

1.3Relation to Service Definition and Specification

In this document we describe the implementation of the Phase 1 SDMX Converter-based STRUVAL Service, along the lines set out in the service definition and specification.

The STRUVAL Service is based on an extended version 5.1 of the SDMX Converter Web Service, which is part of the SDMX Converter toolchain (alongside the API, GUI, and command-line versions).

The structural validations covered by this initial release of the STRUVAL Service include:

  • Verifying that the SDMX-ML message (the dataset) is a well-formed XML document.
  • Verifying that the structural elements in the SDMX-ML message (header, dataset, groups, series, observations, etc.) are correctly ordered and nested.
  • Detecting misplaced, undefined, and missing dimensions and attributes at the dataset, group, series, and observation levels.
  • Detecting invalid data format and invalid values for time-period concepts.
  • Detecting invalid codes, based on the code lists and the dataflow constraints.
  • Detecting duplicated observations.
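
As an illustration of the last check, duplicated observations can be found by treating the full set of dimension values of an observation as a key. The following is a minimal sketch only, not the SDMX Converter implementation; the function name `find_duplicate_observations` and the dictionary representation of an observation are assumptions made for this example.

```python
from typing import Dict, Iterable, List, Tuple

Observation = Dict[str, str]  # hypothetical: dimension id -> code value


def find_duplicate_observations(
    observations: Iterable[Observation],
) -> List[Tuple[Tuple[str, str], ...]]:
    """Return the dimension keys that occur more than once, in first-repeat order."""
    seen = set()
    duplicates = []
    for obs in observations:
        # Sort the items so that the key does not depend on dimension order.
        key = tuple(sorted(obs.items()))
        if key in seen:
            if key not in duplicates:
                duplicates.append(key)
        else:
            seen.add(key)
    return duplicates
```

Two observations count as duplicates here only when all of their dimension values match, regardless of the order in which the dimensions are written.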

The result of the STRUVAL Service is a machine-readable validation report containing an overall success indication and, if validation errors occur, a list of the detected errors (up to a user-configurable limit). Each detected error is characterized by a standard error code, a descriptive text, and either the line and column of the incorrect XML syntactic element in the input file, or the dimension values of the data unit (series, observation) in which the error was detected.

The initial STRUVAL Service release has the following limitations:

  • The DSD has to be sent (i.e., embedded) with the service call, and needs to be self-contained, i.e., it needs to include all code lists and other artifacts referenced from the DSD. In future releases, it will be possible to refer to the DSD, code lists, and other artifacts via an SDMX registry.
  • Currently only a subset of SDMX-ML formats is supported. The supported formats are SDMX v2.0 Compact and SDMX v2.1 Structure-specific messages. Support for other SDMX-ML formats will be added in future releases.
  • XML syntax errors in the SDMX-ML input messages (datasets), which cause a message not to be a well-formed XML document, are currently non-recoverable, in the sense that the validation process stops upon the first encounter of such an error.
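
The stop-at-first-error behaviour for XML syntax errors can be illustrated with a standard XML parser, which reports the position of the first error and cannot continue past it. This is a sketch using Python's standard library, not the parser used by the SDMX Converter; the function name `first_syntax_error` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def first_syntax_error(xml_bytes: bytes) -> Optional[Tuple[int, int, str]]:
    """Return (line, column, message) for the first XML syntax error, or None."""
    try:
        ET.fromstring(xml_bytes)
    except ET.ParseError as exc:
        # ParseError.position is the (line, column) pair reported by the parser.
        line, column = exc.position
        return line, column, str(exc)
    return None
```

Only one error is ever returned, because parsing cannot resume once the document is known not to be well-formed.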

2Invocation Protocols

STRUVAL is being developed by Eurostat to assist the Member States and Eurostat in the structural validation of statistical data files. The structure and dictionaries against which a data file is validated are defined in a DSD (Data Structure Definition) stored in the Euro SDMX Registry.

The STRUVAL Service is implemented as a SOAP/HTTP Web service that extends the existing SDMX Converter Web service interface by introducing a new service operation named "validate" which accepts:

  • The input SDMX-ML data set, embedded in the service request.
  • The data structure file, embedded in the service request; this is normally a dataflow with the embedded DSD and, optionally, dataflow constraints.
  • The user-defined maximal number of validation errors to be detected and reported by STRUVAL.

The STRUVAL Service returns the validation report.

3Data-by-reference Protocols

In the initial release of the STRUVAL Service, all data are passed by value. In future releases, it will be possible to refer to DSDs, dataflows, code lists, etc. stored in SDMX Registries.

4Canonical data models

To pass structural validation, files must be in an SDMX-ML file format and comply with the SDMX-ML information model.

5Non-canonical data models

N/A

6Distribution

The Phase 1 release will be distributed within Eurostat. Later versions of STRUVAL are planned to be made available for Member States.

7Service Contract

7.1Operation <validate>

7.1.1Function

To validate that the given input is a valid SDMX-ML message (dataset) that conforms to the structural and coding rules defined by the SDMX standard and the given DSD/dataflow.

7.1.2Statistical methods

No statistical method.

7.1.3Invocation protocols

SOAP/HTTP

7.1.4Inputs

Parameter name / Type / Description
inputData / base64Binary / The embedded input SDMX-ML document (the dataset).
dsdStructure / base64Binary / The embedded data structure file (DSD/dataflow with constraints).
maxErrorNumber / int / The maximal number of validation errors to report.
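
A request carrying these three parameters might be assembled as sketched below. The sketch rests on several assumptions that are not taken from this document: SOAP 1.1 is used, the service namespace `urn:example:struval` is a placeholder, and the parameter elements are unqualified. The actual values have to be taken from the service WSDL.

```python
import base64
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 (assumed version)
SERVICE_NS = "urn:example:struval"  # hypothetical; take the real one from the WSDL


def build_validate_request(input_data: bytes, dsd_structure: bytes,
                           max_error_number: int) -> bytes:
    """Build a SOAP envelope for the validate operation (illustrative only)."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    operation = ET.SubElement(body, f"{{{SERVICE_NS}}}validate")
    # base64Binary parameters carry the Base64 text of the embedded files.
    ET.SubElement(operation, "inputData").text = (
        base64.b64encode(input_data).decode("ascii"))
    ET.SubElement(operation, "dsdStructure").text = (
        base64.b64encode(dsd_structure).decode("ascii"))
    ET.SubElement(operation, "maxErrorNumber").text = str(max_error_number)
    return ET.tostring(envelope, encoding="utf-8")
```

In practice a SOAP client library generated from the WSDL would normally perform this encoding; the sketch only shows how the two files and the error limit map onto the request.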

7.1.5Outputs

Parameter name / Type / Description
returnCode / int / The overall return code:
  • <0 for structural errors in the DSD and/or a malformed input data XML document
  • 0 if no structural validation errors have been found
  • >0 if one or more structural validation errors have been found

errorsFound / int / The number of structural validation errors that were found, if returnCode>0.
moreErrors / Boolean / Set to true if there were more errors than reported (i.e., over the user-defined error limit).
Errors / XML [0..*] / Description of each encountered error, if returnCode>0, up to the user-configured error limit.
Each error has:
  • Descriptive errorClass.
  • Numeric code and textual description.
  • Boolean fatal indicating that this error has forced further validation to stop.
  • Attachment level (dataset, series, etc.) described by the attachedTo string field.
  • Either:
      • Numeric line and column indicating the location in inputData, or
      • Set dimensions giving the coordinates of the offending data.
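
A client could interpret the three documented returnCode cases as sketched below. The `ValidationReport` class is a hypothetical, simplified in-memory view of the output fields (errors are reduced to their text); it is not the actual response schema of the service.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ValidationReport:
    """Hypothetical in-memory view of the fields listed in the output table."""
    return_code: int
    errors_found: int = 0
    more_errors: bool = False
    errors: List[str] = field(default_factory=list)  # simplified: text only


def summarize(report: ValidationReport) -> str:
    """Map returnCode to a one-line outcome, following the three documented cases."""
    if report.return_code < 0:
        return "not validated: invalid DSD or malformed input XML"
    if report.return_code == 0:
        return "valid: no structural validation errors"
    suffix = " (more errors exist beyond the limit)" if report.more_errors else ""
    return f"invalid: {report.errors_found} error(s) found{suffix}"
```

Checking the sign of returnCode first keeps the "could not validate" case separate from the "validated, errors found" case, which matters because only the latter populates errorsFound and the error list.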

7.1.6Pre-conditions

Pre-condition / Description
Self-contained data structure / The dsdStructure data structure file has to be self-contained, i.e. needs to contain all necessary structural elements: DSD, code lists, constraints.

7.1.7Post-conditions

Post-condition / Description
Validation performed / returnCode >= -1

7.1.8Metrics

Metric / KPI / Description
Duration of processing / KPI-1 / Duration of the structural validation process
Time before processing / KPI-2 / Maximum delay between service launch and the start of the data file validation
Concurrent access / KPI-3 / Maximum number of concurrent accesses
Maximum processing capacity / KPI-4 / Maximum processing capacity of the service (number of files × size of files processed at the same time or during a defined period)

7.1.9Business Exceptions

None. The validation result is always returned.

7.1.10Compensation

None.

7.1.11Specific requirements

The input encoding for the embedded files in inputData and dsdStructure must be UTF-8.
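
A caller may want to confirm the encoding before embedding a file in the request. The following is a minimal client-side sketch; the helper name `is_valid_utf8` is hypothetical and is not part of the service interface.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError:
        return False
    return True
```

The check is applied to the raw file bytes, before they are Base64-encoded for the inputData or dsdStructure parameter.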

8Parameterization

Currently the only parameter is the maximum number of validation errors to report.
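
The effect of this parameter on the report can be sketched as a bounded collection of errors. How the service applies the limit internally is not described here, so the function below (with the hypothetical name `collect_errors`) is only an assumption about the observable behaviour: at most maxErrorNumber errors are reported, and a flag indicates whether more were detected.

```python
from typing import Iterable, List, Tuple


def collect_errors(detected: Iterable[str],
                   max_error_number: int) -> Tuple[List[str], bool]:
    """Keep at most max_error_number errors; flag whether more were detected."""
    reported: List[str] = []
    for error in detected:
        if len(reported) >= max_error_number:
            return reported, True  # at least one further error exists
        reported.append(error)
    return reported, False
```

The second element of the result plays the role of the moreErrors output field.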

9Requirements for security

9.1Security mechanisms

9.1.1Non-repudiation

N/A

9.1.2Integrity

N/A

9.1.3Authentication and trust domains

Authentication is done using ECAS.

9.1.4Self-registration

Done using ECAS.

9.1.5Authorization

The initial release of the STRUVAL service does not access any external resources, and therefore does not need user authorization of that kind. However, in future releases, the service may require additional authorizations to access data stored in the registry etc.

9.1.6Encryption

The STRUVAL service receives and returns plain-text data.

9.1.7Data at rest

N/A

9.1.8Data in transfer (end-to-end)

For confidential data files, the transmission has to be encrypted (TLS).

9.2Data protection

The STRUVAL Service does not store any data in the file system or

10Policies

10.1Security assertions

HTTPS is used, because confidential data files may be sent for structural validation.

10.2Quality of service assertions

To be elaborated based on the exploitation data:

  • Measurement of the request processing duration.
  • Defining the delay after which the service is stopped.

10.3Message format assertions (compliance)

The message format is SDMX-ML, compliant with SDMX 2.0 / 2.1.

10.4Other Policies

None.

10.5Terms of use

  • General terms of use defined for services at Eurostat.
  • Service security policy of Eurostat.
  • The first version will be a Proof of Concept and not yet ready for production use.

11Non-functional characteristics (QoS)

11.1Reliability

The return of messages to the user is reliable in the sense of guaranteed delivery.

11.2Availability

The service should be available at least 95% of the time during working hours, from 8:00 to 18:00 on working days. This especially applies to the peak times between 10:30 and 16:00.
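
To make the 95% target concrete, the corresponding downtime budget can be computed for a given number of working days. The measurement period is not specified in this document, so the period used below is an assumption, as is the helper name `allowed_downtime_minutes`.

```python
def allowed_downtime_minutes(working_days: int, hours_per_day: float = 10.0,
                             availability: float = 0.95) -> float:
    """Downtime budget within service hours for a given number of working days."""
    # 8:00 to 18:00 gives 10 service hours per working day.
    service_minutes = working_days * hours_per_day * 60
    return service_minutes * (1.0 - availability)
```

For a five-day working week this yields roughly 150 minutes (5 × 10 × 60 × 0.05) of tolerated unavailability during service hours.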

11.3Performance

(To be defined based on the exploitation data.)

11.4Multilingual support

No multilingual requirement here.

11.5Error handling

The STRUVAL Service should never fail by returning a SOAP Fault message, except under abnormal conditions of the execution environment (a failure of the network, servlet container, Web application server, Java Virtual Machine, or operating system).

The STRUVAL Service should always return a response described in Section 7.1.5 within a finite amount of time.

11.6Process metrics

  • Number of concurrent processes
  • Maximum size of the file to validate
  • Maximum duration of a validation process
  • Maximum delay between two processes

12Technical Dependencies

The first version of the STRUVAL Service is self-contained, and has no external technical dependencies.

13SOA Layering

STRUVAL - SERV_Statistical_Service_Implementation v0.95 (05/09/2014)