Doc. Eurostat/ITDG/October 2005/3.4

IT Directors Group

24 and 25 October 2005

Luxembourg, BECH Building

Room Ampère, 09h30-17h30

Data sharing – application for data dissemination

(The SODI project within the ESS)

Item 3.4 of the agenda



1. Background and recent developments

In 2005 Eurostat launched SDMX Open Data Interchange (SODI) as a data sharing project in the European Statistical System. The project was presented in detail at the last meetings of the ITDG[1] and the STNE[2]. It started with a pilot exercise involving the National Statistical Institutes of Germany, France, the Netherlands, Sweden and the United Kingdom, designed to test a new approach to collecting and disseminating short-term statistics, in particular the PEEI (Principal European Economic Indicators), with the aim of making them more timely and accessible. An interesting aspect in this context has been the investigation of the “pull” and “push” technologies using SDMX standards.

One result of the SODI pilots is an Issue Report, which was presented to the FROCH group in June 2005 and is reproduced in the annex. It has since been updated to include a summary progress table in which the status of each issue is evaluated.

Another result of the SODI pilots is a proof of concept for the SODI exercise. This was obtained through numerous data transmissions testing the feasibility of both SDMX technical standards (SDMX-EDI and SDMX-ML), of transforming GESMES messages into SDMX, and of the data sharing approach itself.

A live demonstration of the data sharing will be given at the ITDG meeting.

Starting in early 2006, future work on SODI will be financed under the XDIS project (see agenda item 3.2).

2. Demonstration of data sharing

A demonstration of the data sharing approach has been set up using the monthly Industrial Production Index (IPI) indicator. The demonstration is based on the following technologies and workflow:

Technologies

  • SDMX message web service

Statistics Netherlands provided a web service through which data available via StatLine ( are transformed into SDMX messages and served to requestors.

  • RSS feeds to Eurostat

RSS (Really Simple Syndication) feeds generated by Statistics Netherlands contain information on new and revised data.

  • Requestor (SDMX query message)

When new or revised data are announced, the Eurostat requestor sends an SDMX query message to the Statistics Netherlands web service via the SOAP protocol.

  • Eurobase

Eurobase is the new Eurostat reference database currently under development, which will replace the current NewCronos system. The new system will be used to gather, organize and validate the reference data produced by production units, and to build the statistical datasets that will be disseminated to the public.

  • eXtensible Stylesheet Language Family (XSL)

XSL consists of two parts: a language for transforming XML documents (XSLT), and an XML vocabulary for specifying formatting semantics (XSL-FO). XSL can be applied to SDMX-ML to produce formatted tables for display or publication. End-users may choose to retrieve the SDMX-ML files and to apply their own XSL transformations.
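The requestor's SDMX query over SOAP can be illustrated with the Python standard library. This is a minimal sketch under stated assumptions, not the Eurostat requestor: only the SOAP 1.1 envelope namespace is real, while the `urn:example:sdmx-query` namespace, the `DataQuery`, `Dataflow`, `Period`, `StartTime` and `EndTime` element names and the dataflow identifier `IPI_M` are hypothetical placeholders for the actual SDMX-ML query message schema.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace (real).
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical namespace for a simplified query; the actual SDMX-ML
# query message schema is richer and uses different element names.
QUERY_NS = "urn:example:sdmx-query"

ET.register_namespace("soap", SOAP_NS)
ET.register_namespace("q", QUERY_NS)


def build_query_envelope(dataflow: str, start: str, end: str) -> bytes:
    """Wrap a simplified data query for one dataflow and period in a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    query = ET.SubElement(body, f"{{{QUERY_NS}}}DataQuery")
    ET.SubElement(query, f"{{{QUERY_NS}}}Dataflow").text = dataflow
    period = ET.SubElement(query, f"{{{QUERY_NS}}}Period")
    ET.SubElement(period, f"{{{QUERY_NS}}}StartTime").text = start
    ET.SubElement(period, f"{{{QUERY_NS}}}EndTime").text = end
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


# Usage: query monthly IPI data (hypothetical dataflow id) for the first half of 2005.
payload = build_query_envelope("IPI_M", "2005-01", "2005-06")
```

The payload would then be posted to the provider's endpoint, for instance with `urllib.request.Request(url, data=payload, headers={"Content-Type": "text/xml; charset=utf-8"})`; the endpoint URL is deliberately left out.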

Workflow

  1. Eurostat fetches monthly Industrial Production Index (IPI) data from the Statistics Netherlands web service by making a request via the SOAP protocol.
  2. The request is made from a requestor application within Eurostat (SDMX query message). The requestor pulls the data from the data source (the Statistics Netherlands web service).
  3. The IPI data from Statistics Netherlands are loaded into Eurobase.
  4. In the final step, the compiled IPI data are downloaded from Eurobase as SDMX-ML files, which are displayed using XSL to facilitate the presentation and re-use of the data.
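The four workflow steps can be sketched as a small pipeline. This is a hypothetical Python illustration, not the demonstration system: an in-memory dictionary stands in for Eurobase, the series key `M.NL.IPI` and the observation values are invented, and the `fetch` callable stands in for the requestor's SOAP call to the Statistics Netherlands web service.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Observation = Tuple[str, float]  # (time period, value)


@dataclass
class ReferenceStore:
    """In-memory stand-in for Eurobase: series key -> observations."""
    series: Dict[str, List[Observation]] = field(default_factory=dict)

    def load(self, key: str, observations: List[Observation]) -> None:
        # Replace the whole series so that revised figures overwrite earlier ones.
        self.series[key] = sorted(observations)


def run_workflow(
    fetch: Callable[[str], List[Observation]],
    key: str,
    store: ReferenceStore,
) -> List[Observation]:
    observations = fetch(key)       # steps 1-2: the requestor pulls data from the NSI
    store.load(key, observations)   # step 3: populate the reference database
    return store.series[key]        # step 4: compiled data ready for SDMX-ML export


def fake_fetch(key: str) -> List[Observation]:
    # Hypothetical response; a real fetch would send an SDMX query via SOAP.
    return [("2005-02", 101.2), ("2005-01", 100.0)]


store = ReferenceStore()
print(run_workflow(fake_fetch, "M.NL.IPI", store))
# prints [('2005-01', 100.0), ('2005-02', 101.2)]
```

Injecting `fetch` keeps the pipeline independent of the transport, so the same steps would apply whether data arrive by Pull or by Push.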

This is not yet a production system, but is intended to demonstrate the possibilities of the data sharing approach implemented through SDMX using web services and other available technologies.

3. Progress in SODI

The status of the pilots can be seen from the following table:

Country / Format / Method / Status
DE / SDMX-ML / Pull / Transmission formats and parameters defined, transmission environment set up on both sides, RSS feed and SOAP service available
FR / SDMX-EDI / Push / Data transmission arrived at Eurostat and was made available on the CIRCA web site in SDMX-ML after format checking
NL / SDMX-ML / Pull / Transmission formats and parameters defined, transmission environment set up on both sides, data available in SDMX-ML on the CBS web site and made available on CIRCA, RSS feed available
SE / SDMX-EDI / Push / Data transmissions arrived at Eurostat and were made available on the CIRCA web site in SDMX-ML after format checking
UK / SDMX-ML / Push / Data transmission arrived at Eurostat and was made available on the CIRCA web site in SDMX-ML after format checking

The SODI Pilots Issue Report, presented to the FROCH group in June 2005, summarises issues encountered or identified during the pilot exercise. The issues are categorised as technical, standardisation, statistical, political, organisational and legal. The report marks points for decision, threats and opportunities and gives recommendations for tackling both technical and non-technical issues. These issues are to be discussed by the appropriate bodies (FROCH, ITDG, SODI Task Force) in order to arrive at decisions or recommendations for the future use of the data sharing approach.

4. Current developments

Starting in early 2006, it is intended that future work on SODI will be financed under the XDIS project (see SPC agenda item 6).

The following indicative timetable shows not only a possible timescale but also points to the need for several groups to contribute to the resolution of the issues set out in the Issue Report.

When / Actors / Milestones/work
2005
October / IT Directors Group (ITDG) / Eurostat presented a progress report plus a live demonstration of the data sharing application for data dissemination
November / SPC / Progress report
November / FROCH / Review updated timetable for next stage of work; review progress on statistical, organisational and political issues; Eurostat/ECB/OECD to present paper on the philosophy of data sharing
December / Eurostat / Sign first contracts for further development of SODI
Note: this depends on the timely availability of the IDABC budget, assuming that the IDABC work programme is formally adopted in Q3 2005 and that the Global Implementation Plan for Eurostat’s IDABC project is approved by the SPC.
2006
May / SPC / Eurostat would ask the SPC to endorse plans for the first production phase of SODI
May / STNE / Progress report by Eurostat
Q3-2006 / Eurostat / First public access to information via SODI

5. Decision-making for SODI

In the early stages, SODI was in effect sponsored by the FROCH group as a realisation of the “common dissemination platform” concept proposed by the FROCH in 2003. As SODI advances towards regular production use, it is proposed to define the roles of the various groups according to the following schema. The SODI Task Force does not appear in this schema, as its existence is linked to the SODI pilots, which end in 2005. However, there remains a need for very close operational contacts with the group of countries that are actively implementing SODI. It is assumed that in 2006 this group will expand beyond the original five countries.

6. Questions to Member States

Member States are asked

to comment on any aspect of the SODI project;

to indicate whether they wish to participate actively in the next stage of work on SODI;

to give their views on those items in the Issue Report which are marked for the attention of the ITDG, focusing in particular on the following points:

2.1, 2.2 / Are NSIs willing to generate statistics in SDMX-ML format?
2.3 / Implementation of “pull” method: are NSIs able to set up the required web services?

ANNEX: SODI Issue Report

1. Management Summary

1.1. Overview

SODI (SDMX Open Data Interchange) is a data sharing project in the European Statistical System. The main idea of data sharing is that national PEEI data become visible on Eurostat's web site as soon as they are published nationally.

This document summarises the issues encountered or identified during the SODI pilot exercise. It marks points for decision, threats and opportunities, and gives recommendations and lessons learned, tackling both technical and non-technical issues.

The issues raised in this document should be tackled at the appropriate level; only a few will ultimately be handled by the SODI pilots task force. In addition, while some issues have to be solved before further steps are taken in the SODI project, most of them can be addressed as and when they become prominent.


1.2. Summary progress table

Issue / Discussion by / Conclusions/status
2. / Technical issues
2.1. / SDMX-EDI and SDMX-ML / SODI TF, ITDG / Both formats can be used.
2.2. / SDMX-ML for dissemination / SODI TF, ITDG / Feasible.
2.3. / Push and Pull / SODI TF, ITDG / Both methods are feasible. Pull requires more investment by MS and more harmonisation, but allows more seamless integration into production processes.
2.4. / Registries / SODI TF, ITDG / Further work needed.
3. / Standardisation issues
3.1. / Key Families / SODI TF, ITDG / It is clear from the pilots that much more work is needed to ensure that SDMX-compatible key families are available and used.
3.2. / Statistical terminology / SODI TF, ITDG / Further work needed.
3.3. / Reference metadata / SODI TF, ITDG / Further work needed.
3.4. / Notification mechanism / SODI TF, ITDG / RSS 2.0 used for the pilots. A definitive standard or recommendation for notifications also has to be discussed with the SDMX partners.
4. / Statistical issues
4.1. / Concepts / SODI TF, ITDG / It is already known that statistical concepts are not used and coded consistently across domains. As for 3.1, much more work is needed to ensure that key families meet the requirements of SODI. For the pull method (see 2.3), Member States have to use the same concepts in their publications as Eurostat will use to publish the data. Harmonisation of concepts will be a criterion for deciding which indicators are suitable for data sharing. Further discussion on this point is needed.
4.2. / Seasonal Adjustment / FROCH / Agreed to publish national data “as is”, with footnotes to signal different methods of seasonal adjustment.
4.3. / Statistical confidentiality / Eurostat / In principle this should not be an issue, as the national data used in SODI are already published. However, Eurostat will monitor this as SODI develops.
4.4. / Validation / SODI TF, ITDG / Further work needed.
4.5. / Footnote Treatment / SODI TF, ITDG / Further work needed.
5. / Political Issues / A document on the principles and opportunities for data sharing is being prepared by the ECB, Eurostat and the OECD and will be presented to the FROCH in November 2005. The political issues are addressed in that document.
5.1. / The role of Eurostat in a data sharing environment / FROCH / Further discussion needed.
5.2. / Aggregates / FROCH / Further discussion needed.
5.3. / Releases / FROCH / Further discussion needed.
5.4. / Embargo Policies / FROCH / Further discussion needed.
5.5. / Ownership of the Data / FROCH / Further discussion needed.
6. / Organisational Issues
6.1. / Coexistence of Standards and Methods / SODI TF, ITDG / So far the pilots suggest that it is feasible to allow different methods and standards to coexist. However, this will have to be investigated in more depth in the first phase of development of the SODI production system.
6.2. / Embargo Treatment / SODI TF, ITDG / Further discussion needed.
6.3. / Integration of SODI into Eurostat’s Data Life Cycle / SODI TF, ITDG / Further discussion needed. In addition, the technical implications will have to be investigated in more depth in the first phase of development of the SODI production system.
7. / Legal Issues
7.1. / “Pull” and the Obligations of Member States / Eurostat / Further investigation needed.
7.2. / SODI and SDMX in Legal Acts / Eurostat / Further investigation needed.


2. Technical issues

Note: these issues will be considered by the SODI TF and the ITDG.

2.1. SDMX-EDI and SDMX-ML

One main result of the pilots is that the two data formats do not cause any real problem. Within the pilots, the conversion from SDMX-EDI to SDMX-ML is being done with an existing tool (an MS Access database which is also used to maintain key families); this tool will be replaced in future by a more appropriate one that is not based on proprietary desktop software. SDMX-EDI conforms to GESMES/TS version 3.0. The differences between SDMX-ML and SDMX-EDI (apart from purely syntactical ones) are described in chapter 3 of the "Implementor’s Guide For SDMX Format Standards (Version 1.0)"[3].

2.2. SDMX-ML for dissemination

The SODI pilots project will also test the use of SDMX-ML for dissemination. For this purpose, formatting information (XSL, the eXtensible Stylesheet Language, or CSS, Cascading Style Sheets) will be added to the SDMX-ML data so that they can be displayed in any modern browser. The experience gained will be incorporated into this report at a later stage.
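As an illustration of turning SDMX-ML into a browser-ready table, the Python sketch below does procedurally what an XSLT stylesheet would do declaratively. It is a stand-in for XSL, not an XSLT example, and it assumes a simplified, namespace-free data set loosely modelled on the SDMX-ML compact format; the `REF_AREA`, `INDICATOR`, `TIME_PERIOD` and `OBS_VALUE` attribute names and the values are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

# Simplified, namespace-free sample; real SDMX-ML messages carry namespaces,
# a header and attributes defined by the key family.
SAMPLE = """<DataSet>
  <Series REF_AREA="NL" INDICATOR="IPI" FREQ="M">
    <Obs TIME_PERIOD="2005-01" OBS_VALUE="100.0"/>
    <Obs TIME_PERIOD="2005-02" OBS_VALUE="101.2"/>
  </Series>
</DataSet>"""


def series_to_html(xml_text: str) -> str:
    """Render every observation of every series as one HTML table row."""
    root = ET.fromstring(xml_text)
    rows = []
    for series in root.iter("Series"):
        label = escape(f"{series.get('REF_AREA')} {series.get('INDICATOR')}")
        for obs in series.iter("Obs"):
            period = escape(obs.get("TIME_PERIOD", ""))
            value = escape(obs.get("OBS_VALUE", ""))
            rows.append(f"<tr><td>{label}</td><td>{period}</td><td>{value}</td></tr>")
    header = "<tr><th>Series</th><th>Period</th><th>Value</th></tr>"
    return "<table>" + header + "".join(rows) + "</table>"


print(series_to_html(SAMPLE))
```

An XSLT stylesheet attached to the SDMX-ML file would let the browser perform this step itself, which is why the pilots favour adding formatting information to the data rather than generating separate HTML.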

2.3. Push and Pull

The Push method means that Member States send their data to Eurostat in the conventional way, via STADIUM/eDAMIS.

The Push method has already been tested and verified. Existing tools (STATEL, STADIUM, EDIFLOW, eDAMIS) can be used.

The Pull method is characterised by the fact that a Eurostat application (called the "requestor") fetches the data from the web site of the National Statistical Institute.

The requestor is triggered by an RSS feed. RSS stands for "Really Simple Syndication" and is a widely used web standard for news feeds. The Pull method is currently being implemented at Eurostat and at two National Statistical Institutes (Netherlands and Germany).

The Pull method is more difficult to set up, both at Eurostat and at the NSIs; for this reason it has been the subject of in-depth technical investigations within the SODI pilots. However, this method fits better into a general SDMX-ML dissemination environment in the Member States, so that it can be integrated seamlessly into the production process. In addition, the Pull method uses only standardised, Internet-compatible technologies. Further experience with the Pull method will be incorporated into this report at a later stage.
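The RSS-triggered Pull can be sketched as a requestor that reads an RSS 2.0 feed, skips items it has already handled, and pulls data for each new announcement. This is a minimal Python sketch, not the Eurostat requestor: it assumes that each item's `guid` (or, failing that, its `link`) identifies one release, the `pull` callback stands in for sending the SDMX query to the NSI web service, and the feed URLs are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, Set


def new_items(rss_xml: str, seen: Set[str]) -> List[str]:
    """Return links of RSS 2.0 items whose guid has not been processed yet."""
    channel = ET.fromstring(rss_xml).find("channel")
    links = []
    for item in channel.findall("item"):
        guid = item.findtext("guid") or item.findtext("link")
        if guid and guid not in seen:
            seen.add(guid)  # remember the release so it is pulled only once
            links.append(item.findtext("link"))
    return links


def poll_once(rss_xml: str, seen: Set[str], pull: Callable[[str], None]) -> int:
    """Pull every newly announced release; return how many were pulled."""
    links = new_items(rss_xml, seen)
    for link in links:
        pull(link)  # hypothetical: issue the SDMX query for this release
    return len(links)


FEED = """<rss version="2.0"><channel><title>IPI releases</title>
<item><link>https://nsi.example/sdmx/ipi-2005-05</link><guid>ipi-2005-05</guid></item>
<item><link>https://nsi.example/sdmx/ipi-2005-06</link><guid>ipi-2005-06</guid></item>
</channel></rss>"""

seen: Set[str] = set()
pulled: List[str] = []
poll_once(FEED, seen, pulled.append)  # pulls both releases
poll_once(FEED, seen, pulled.append)  # nothing new on the second poll
```

Keeping the set of processed identifiers on the requestor side means the NSI only has to publish the feed; it does not need to know who has already fetched which release.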

The Pull method ensures that data become available to Eurostat at the same time as they are published nationally, but it requires the National Statistical Institutes to publish data conforming to the European concepts rather than national ones.

2.4. Registries

XML registries (i.e., registries describing the data model and the concepts and code lists used) are a vital part of the XML world, and as such their use is foreseen for SDMX-ML. In the SODI pilots, registries are not used for the interpretation of the messages. This will be addressed at a later stage.

The IDABC programme of the Enterprise and Industry DG of the European Commission foresees the creation of an infrastructure for XML registries for interoperability projects of the Commission. SODI could make use of this registry infrastructure.

3. Standardisation issues

Note: these issues will be considered by the SODI TF and the ITDG, and also by the SDMX partners.

3.1. Key Families

An additional difficulty in the pilots, which required manual intervention, was that the existing key families for the GDP indicator were not compatible with GESMES/TS (and hence with SDMX). A compatible set of key families is still under development in a collaboration between Eurostat and the ECB; a beta version is expected in mid-2005. It is a prerequisite that all key families are SDMX-compliant by the start of full SODI implementation.
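Once a key family's dimensions and code lists are fixed, checking whether a series conforms to it is largely mechanical. The Python sketch below checks a series key against a key family's ordered dimensions and code lists, assuming a dot-separated key; the dimension names, their order and the codes are hypothetical and much smaller than a real key family.

```python
from typing import Dict, List, Sequence

# Hypothetical key family: ordered dimensions with their allowed codes.
KEY_FAMILY: Dict[str, Sequence[str]] = {
    "FREQ": ("M", "Q", "A"),
    "REF_AREA": ("DE", "FR", "NL", "SE", "UK"),
    "INDICATOR": ("IPI", "GDP"),
    "ADJUSTMENT": ("N", "S", "W"),
}


def check_series_key(key: str, key_family: Dict[str, Sequence[str]] = KEY_FAMILY) -> List[str]:
    """Return a list of problems; an empty list means the key conforms."""
    codes = key.split(".")
    dims = list(key_family)  # dict order gives the dimension order
    if len(codes) != len(dims):
        return [f"expected {len(dims)} dimensions, got {len(codes)}"]
    return [
        f"{dim}: unknown code {code!r}"
        for dim, code in zip(dims, codes)
        if code not in key_family[dim]
    ]


print(check_series_key("M.NL.IPI.S"))   # prints []
print(check_series_key("M.XX.IPI.S"))   # prints ["REF_AREA: unknown code 'XX'"]
```

Running such a check on every incoming series would flag incompatible keys automatically instead of leaving them to manual intervention.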

3.2. Statistical terminology

The development of more efficient processes for sharing data also requires the adoption of a standard terminology for describing the statistics being exchanged. The Metadata Common Vocabulary (MCV) elaborated under the SDMX initiative provides a common set of terms (and related definitions) to be used for the sake of terminological consistency. Agreement on such a standard implies continuous updating of the MCV to reflect the core concepts used within SDMX-aligned projects. SODI is therefore one of the initiatives that can provide a valuable “reality check”, through the description of key families and the attachment of a set of reference metadata documenting the data. Existing ambiguities in the use of some terms, and the fact that not all terms have yet been identified in the MCV, call for a parallel expansion of the MCV during 2005-2006, in line with SODI development.

3.3. Reference metadata

At each stage, SODI should make use as far as possible of the latest standards proposed to Member States by Eurostat for associating data with an aligned set of metadata items.

3.4. Notification mechanism

The pilots show that notifications are needed to announce the publication of SDMX data. The SDMX standards do not yet include a notification system. The RSS (“Really Simple Syndication”) standard provides a widely used notification mechanism, so the current solution in the SODI pilots is an SDMX namespace in RSS version 2.0. Whether this solution should be generally recommended for RSS feeds announcing statistics published in SDMX format has to be discussed with the other SDMX stakeholders.
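To illustrate what an SDMX namespace in RSS 2.0 could look like, the Python sketch below reads SDMX-specific fields from feed items. The namespace URI `urn:example:sdmx-rss` and the `dataflow` and `updateType` element names are hypothetical; the namespace actually used in the pilots is not reproduced here. The standard RSS 2.0 elements (`channel`, `item`, `link`, `guid`) are used as RSS 2.0 defines them.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

SDMX_RSS_NS = "urn:example:sdmx-rss"  # hypothetical namespace URI

FEED = f"""<rss version="2.0" xmlns:sdmx="{SDMX_RSS_NS}">
  <channel>
    <title>IPI releases</title>
    <link>https://nsi.example/</link>
    <description>New and revised SDMX data</description>
    <item>
      <title>IPI June 2005</title>
      <link>https://nsi.example/sdmx/ipi</link>
      <guid>ipi-2005-06-rev1</guid>
      <sdmx:dataflow>IPI_M</sdmx:dataflow>
      <sdmx:updateType>revision</sdmx:updateType>
    </item>
  </channel>
</rss>"""


def read_notifications(feed: str) -> List[Dict[str, str]]:
    """Extract the link plus the (hypothetical) SDMX extension fields of each item."""
    ns = {"sdmx": SDMX_RSS_NS}
    notifications = []
    for item in ET.fromstring(feed).iter("item"):
        notifications.append({
            "link": item.findtext("link"),
            "dataflow": item.findtext("sdmx:dataflow", namespaces=ns),
            "update": item.findtext("sdmx:updateType", namespaces=ns),
        })
    return notifications


print(read_notifications(FEED))
```

Putting the SDMX fields in their own namespace keeps the feed valid RSS 2.0: ordinary feed readers ignore elements they do not understand, while a requestor can use them to decide what to pull.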

4. Statistical issues

Note: issues to be reviewed by the FROCH now or later are flagged.

4.1. Concepts [SODI TF + ITDG]

The harmonisation of concepts is indispensable for data sharing. Fortunately, this is already the case for the PEEI and the national accounts data (ESA95). For other data flows, especially in the area of social statistics, this has to be checked case by case before they are integrated into SODI.

As part of Eurostat's Data Life Cycle initiative, it is planned to reduce the number of code lists in use for the same concept. SODI, which requires a single concept for reception, production and dissemination at Eurostat, could contribute to this effort.
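Reducing several code lists to one per concept amounts to maintaining explicit mappings onto a single reference code list. A minimal Python sketch, assuming two hypothetical legacy code lists for seasonal adjustment (`LEGACY_A`, `LEGACY_B`) and hypothetical harmonised codes `N`, `S` and `Y`; none of these identifiers come from an actual Eurostat code list.

```python
from typing import Dict

# Hypothetical mappings from two legacy code lists onto one harmonised
# code list for the same concept (seasonal adjustment).
CODE_MAPS: Dict[str, Dict[str, str]] = {
    "LEGACY_A": {"NSA": "N", "SA": "S", "SWDA": "Y"},
    "LEGACY_B": {"0": "N", "1": "S", "2": "Y"},
}


def harmonise(code_list: str, code: str) -> str:
    """Translate a code from a legacy list into the harmonised list."""
    try:
        return CODE_MAPS[code_list][code]
    except KeyError:
        # Fail loudly: an unmapped code signals a gap in concept harmonisation.
        raise ValueError(f"no mapping for {code!r} in {code_list}") from None


print(harmonise("LEGACY_A", "SA"), harmonise("LEGACY_B", "1"))  # prints S S
```

Raising an error for unmapped codes, rather than passing them through, makes gaps in the mapping visible before the data reach dissemination.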