Mapping the coverage of ESSnet projects relevant to Work Package 2 of the ESSnet on Data Warehousing

  1. Introduction

The purpose of this document is to identify any potential areas of overlap, or any potential gaps, between the deliverables of Work Package 2 (WP2) of the ESSnet on Data Warehousing (hereafter, ESSnetDWH) and the results of other ESSnet projects. We have identified which of the completed and ongoing ESSnet projects are of direct relevance to WP2 of the ESSnetDWH. We also provide a brief summary of the scope of these ESSnet projects, before attempting to identify specific work packages that are of relevance to the ESSnetDWH. This will enable the implementation of established best practice within the ESSnet DWH, and will prevent the duplication of work that has already been carried out. It also has the potential to identify any gaps in the deliverables of WP2 of the ESSnetDWH at an early stage. The intention is that this report can serve as a useful guide to the completion of these deliverables within the ESSnetDWH. It can also act as a “living” document that can be updated if and when new links are identified.

  2. Summary of the ESSnet on Data Warehousing

The ESSnetDWH consists of five work packages; the first and fifth of these relate to the co-ordination and management of the ESSnet and to the dissemination activities of the ESSnet respectively. The remaining work packages focus on metadata (WP1), the methodological aspects of building a DWH (WP2), and the technical and architectural aspects of building a DWH (WP3). WP1 has deliverables covering a framework of metadata requirements, the impact of metadata quality, recommendations on the use of metadata methods, metadata management, and how the metadata system maps onto the architectural framework of the DWH. WP3 has deliverables covering the modular workflow and architectural framework of the DWH, the business architecture of the DWH, and the development of an implementation strategy.

This document focuses on mapping links to the deliverables in WP2, which concentrates on the methodological aspects of building a DWH. WP2 has deliverables that include methodological evaluations of the metadata framework and DWH architecture, the production of guidelines relating to how the business register (BR) interacts with the DWH, recommendations on data linking and confidentiality, and guidelines on selective editing in the DWH. For completeness, a more detailed list of the WP2 deliverables is given in Annex 1.

  3. Summary of ESSnet projects of relevance to the ESSnet on Data Warehousing

The following ESSnet projects have been identified as being highly relevant to the ESSnetDWH. These projects were identified by conducting a short review of the ESSnet projects, both completed and ongoing, which are detailed on the ESSnet portal. A brief summary of the scope and duration of these ESSnet projects is given below.

  3.1. ESSnet on the uses of administrative and accounts data for business statistics

This ESSnet project aims to provide best practice recommendations on the uses of administrative and accounts data for business statistics. Its scope covers issues related to how administrative data are used. In particular, the project focuses on identifying current uses of administrative data for business statistics; producing checklists for use in obtaining administrative data sources and for assessing their quality; identifying methods of estimation for the situation where administrative data are not available for the variable of interest; identifying methods to deal with the timeliness of administrative data; developing quality indicators for outputs produced using a combination of survey and administrative sources; and understanding differences in definitions between accounting practices and business statistics. This ESSnet project is due to complete in August 2013.

  3.2. ESSnet on data integration

This ESSnet project aims to provide recommendations on procedures for data integration. It covers record linkage, statistical matching and micro integration. Its scope covers issues related to linking data sources and producing a complete data set. The work packages of the ESSnet include providing recommendations on the state of the art for data integration, developing methodologies to solve some of the problems encountered when integrating data, identifying and developing software tools to perform data integration, and presenting a series of case studies on the use of data integration. These case studies include micro integration for register-based employment statistics; quality evaluation of micro-integrated employment statistics; combining data from surveys and admin sources (single variable case); profiling Italian patenting enterprises; and development of a framework for error in an integrated survey. This ESSnet project was completed in December 2011.

  3.3. ESSnet on the integration of survey and administrative data

The ESSnet on data integration builds on this earlier ESSnet project on the integration of survey and administrative data. The first work package reported on the state of the art in techniques and experience for integrating survey and administrative data. The second work package provided recommendations on methodological aspects of linking survey and administrative data. The third work package considered software tools for performing the integration of survey and administrative data.

These ESSnets provide potential overlaps with a number of deliverables in WP2 of the ESSnet DWH. In addition, there are other ESSnet projects that may be able to provide relevant information for specific deliverables in WP2. The ESSnet projects that have been identified as being relevant in this context are: the ESSnet on Statistical Disclosure Control[1], which focussed on improving tools and knowledge in statistical disclosure control (SDC); the ESSnet on common tools and harmonised methodology for SDC in the ESS[2]; and the ESSnet on Common Reference Architecture[3]. In addition, the Blue-ETS project[4] has been identified as having links with specific deliverables in WP2 of the ESSnet DWH. The focus of Blue-ETS is on providing best practice recommendations and methodology for business statistics with a view to reducing the burden placed on businesses whilst taking quality into consideration.

It has also been possible to identify where there may be helpful links in the future for the ESSnet on DWH. One instance is the Memobust project[5], whose aim is to review and update the “Handbook on design and implementation of business surveys” (Eurostat 1997). The new handbook produced by Memobust will draw on its predecessor, and will consist of a number of modules that describe methods and themes that are relevant to modern business statistics. The modules in the current version of the Memobust handbook are available on the project website. The project began in January 2011, and will continue until December 2013. Although at the present time there is no clear overlap between the ESSnet DWH and the Memobust project, the latter will consider topics that are of interest to the ESSnet DWH and may therefore provide helpful background information on areas such as data integration, editing techniques and disclosure. There is also a specific module on “Dissemination” in the later phase of the Memobust project, which will cover the methodological aspects of data warehousing (Willenborg and Scholtus 2011). Close communication between the projects will therefore be helpful, especially in this later phase.

  4. Mapping of the ESSnet DWH to relevant ESSnet projects

There are informative links between the ESSnet DWH and the ESSnet projects identified in Section 3. These have been categorised as either “methods” links, where the outputs from other ESSnet projects can directly provide methods or solutions for ESSnet DWH WP2 deliverables, or “background” links, where outputs from other ESSnet projects can provide useful background, or may help identify relevant issues, in the work of the ESSnet DWH. An effort has been made to identify such links, alongside their relevance and time frame, with any web links to existing documentation also provided. For clarity, only the deliverables of WP2 of the ESSnet DWH where links have been identified are shown. The mapping between the ESSnet DWH and the ESSnet on Data Integration is shown in Table 1. Given that the ESSnet on Data Integration builds on the results of the earlier ESSnet on the integration of survey and administrative data, only the former has been explicitly considered here. The mapping between the ESSnet DWH and the ESSnet on Admin Data is shown in Table 2, and links identified between the ESSnet DWH and the ESSnets on SDC and SDC harmonisation are shown in Table 3. Table 4 presents the links identified between the ESSnet DWH and the ESSnet on the Common Reference Architecture, and Table 5 presents the links identified between the ESSnet DWH and the Blue-ETS project.

Table 1. The mapping of links between the ESSnet DWH and the ESSnet on Data Integration

ESSnet on Data Warehousing / ESSnet on Data Integration
Deliverables / Type of link / Relevance / Time frame / Links
2.4 Guidelines and recommendations for data linking / Methods / WP1 State of the art in data integration – provides methods for data integration in a variety of circumstances / The final report is available /
2.4 Guidelines and recommendations for data linking / Methods / WP2 Development of methods – provides methods for establishing linkage error, performing inference, and maintaining micro-consistency and macro-consistency / The final report is available /
2.4 Guidelines and recommendations for data linking / Methods / WP3 Development of common software tools – provides software for data linking in R / The software is available /
2.5 Guidelines and recommendations for confidentiality / Background / WP1 State of the art in data integration – provides a literature review on data integration methods in statistical disclosure control; also provides a bibliography on the intersection between statistical matching and statistical disclosure control / The final report is available; the bibliography is available /
2.6 Guidelines on selective editing options for DWH / Background / WP2 Development of methods – provides some discussion of possible editing strategies at different phases of the production process / The final report is available /

Table 2. The mapping of links between the ESSnet DWH and the ESSnet on Admin Data

ESSnet on Data Warehousing / ESSnet on Admin data
Deliverables / Type of link / Link and relevance / Time frame / Links
2.1 Methodological evaluation of metadata framework / Background / WP2a Checklist for admin data – aims to provide a checklist for the use of a new admin data source / Final versions of checklists available in Feb/March 2013 / On-line draft version can be viewed by registered users through:
2.1 Methodological evaluation of metadata framework / Background / WP2b Checklist for quality of admin data – aims to provide a checklist for assessing the quality of an admin data source / Final versions of checklists available in Feb/March 2013 / On-line draft version can be viewed by registered users through:
2.1 Methodological evaluation of metadata framework / Background / WP6 Development of quality indicators – aims to provide quality indicators for outputs based on survey and admin data / Draft quality indicators available now, final versions available May/June 2013 /
2.4 Guidelines and recommendations for data linking / Background / WP2b Checklist for quality of admin data – aims to provide a checklist for assessing the quality of an admin data source / Final versions of checklists available in Feb/March 2013 / On-line draft version can be viewed by registered users through:
2.4 Guidelines and recommendations for data linking / Background / WP3 Methods of estimation for variables – aims to provide methods to estimate for variables where admin data sources are not available / Reports available for some variables; guide on estimation methods available June 2013 /
2.4 Guidelines and recommendations for data linking / Background / WP4 Timeliness of administrative data – aims to provide methods of estimation when admin data is not available in time / Theory to be developed by end 2012, implementation of methods to be available in 2013 /
2.4 Guidelines and recommendations for data linking / Background / WP6 Development of quality indicators – aims to develop quality indicators for assessing outputs based on survey and admin sources / Draft quality indicators available now, final versions available May/June 2013 /
2.4 Guidelines and recommendations for data linking / Background / WP7 Statistics and accounting standards – aims to align business statistics definitions with accounting characteristics / Information on different variables will be available in 2012 and 2013. Some information on Structural Business Statistics is already available. /

Table 3. The mapping of links between the ESSnet DWH and the ESSnets related to statistical disclosure control

ESSnet on Data Warehousing / ESSnets on: Statistical Disclosure Control (SDC); Common tools and harmonised methodology for SDC in the ESS (SDC harmonisation)
Deliverables / Type of link / Link and relevance / Time frame / Links
2.5 Guidelines and recommendations for confidentiality / Background / Task 2 SDC – focuses on SDC tools, including their sustainability and interaction with IT infrastructure / Final versions of deliverables are available /
2.5 Guidelines and recommendations for confidentiality / Background / All tasks SDC – the outcome of the ESSnet was a manual on SDC / The final version of the manual is available /
2.5 Guidelines and recommendations for confidentiality / Background / WP3 SDC harmonisation – considers the future directions of SDC software / This work is ongoing /


Table 4. The mapping of links between the ESSnet DWH and the ESSnet on the Common Reference Architecture (CORA)

ESSnet on Data Warehousing / ESSnet on Common Reference Architecture (CORA)
Deliverables / Type of link / Link and relevance / Time frame / Links
2.3 Methodological evaluation of the data warehouse business architecture / Background / WP3 Definition of the technical architecture – focuses on a definition of the common reference architecture through mapping to the Generic Statistical Business Process Model (GSBPM) / The final report defining the common reference architecture is available /
2.3 Methodological evaluation of the data warehouse business architecture / Background / WP3 Definition of the technical architecture – focuses on a definition of the common reference architecture through mapping to the GSBPM / An instruction manual on implementing the model is available /

Table 5. The mapping of links between the ESSnet DWH and the Blue-ETS project

ESSnet on Data Warehousing / Blue-ETS
Deliverables / Type of link / Link and relevance / Time frame / Links
2.1 Methodological evaluation of metadata framework / Background / Blue-ETS WP4 – focuses on improving the use of administrative data, including assessing quality / Reports on quality indicators are available; a Quality Report Card with automated prototype will be available June 2012[6] /

  5. Visual depiction of the links with other ESSnets – identifying gaps in coverage

Figure 1 shows a visualisation of the links between the deliverables for WP2 of the ESSnet DWH and the work packages of the other ESSnet projects identified in Sections 3 and 4. The links identified as being “methods” links are shown as solid black lines, and the links identified as “background” links are shown as grey lines. Each of the ESSnet projects considered is represented by circles of a different colour, as shown in the legend: the ESSnets on data warehousing (DWH), admin data (AD), data integration (DI), common reference architecture (CORA), statistical disclosure control (SDC) and statistical disclosure control harmonisation (SDC-H), and the Blue-ETS project. In the case of the ESSnet DWH, the circles represent the different deliverables, and the deliverable number is shown in each circle. For the other ESSnet projects, the numbers in the circles represent the individual work packages where links have been made.

The figure highlights the importance of the results of the Data Integration ESSnet for deliverable 2.4 of the ESSnet DWH. In addition, there are many links to deliverable 2.4 that have been classified as “background” links, meaning that a lot can be learned for this deliverable from work that is currently taking place elsewhere, particularly in the ESSnet on Admin Data. A key observation from Figure 1 is that no direct links have currently been identified for deliverables 2.2 (business register interactions with the DWH) and 2.8 (outlier detection and treatment options). This is perhaps unsurprising: 2.2 is a deliverable specific to the set-up of a data warehouse, and although 2.8 considers outlier treatment, applying outlier detection and treatment to data that are used for a variety of purposes is an issue largely specific to the concept of a data warehouse.
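As a way of keeping this mapping up to date, the links can also be held in a small machine-readable form and the gap check repeated whenever a link is added. The following Python fragment is only an illustrative sketch and is not part of the ESSnet deliverables. The links are transcribed from Tables 1 to 5; the names LINKS, DELIVERABLES, links_by_deliverable and unlinked_deliverables are hypothetical, and the list of deliverables is an assumption restricted to those named in this document (the full list belongs in Annex 1).

```python
from collections import defaultdict

# Links transcribed from Tables 1 to 5 of this document:
# (WP2 deliverable, project, work package or task, type of link).
LINKS = [
    ("2.4", "DI", "WP1", "methods"),
    ("2.4", "DI", "WP2", "methods"),
    ("2.4", "DI", "WP3", "methods"),
    ("2.5", "DI", "WP1", "background"),
    ("2.6", "DI", "WP2", "background"),
    ("2.1", "AD", "WP2a", "background"),
    ("2.1", "AD", "WP2b", "background"),
    ("2.1", "AD", "WP6", "background"),
    ("2.4", "AD", "WP2b", "background"),
    ("2.4", "AD", "WP3", "background"),
    ("2.4", "AD", "WP4", "background"),
    ("2.4", "AD", "WP6", "background"),
    ("2.4", "AD", "WP7", "background"),
    ("2.5", "SDC", "Task 2", "background"),
    ("2.5", "SDC", "All tasks", "background"),
    ("2.5", "SDC-H", "WP3", "background"),
    ("2.3", "CORA", "WP3", "background"),
    ("2.1", "Blue-ETS", "WP4", "background"),
]

# Assumption: only the deliverables named in this document are listed here;
# the complete list of WP2 deliverables should be taken from Annex 1.
DELIVERABLES = ["2.1", "2.2", "2.3", "2.4", "2.5", "2.6", "2.8"]


def links_by_deliverable(links):
    """Group the links by WP2 deliverable number."""
    grouped = defaultdict(list)
    for deliverable, project, work_package, link_type in links:
        grouped[deliverable].append((project, work_package, link_type))
    return grouped


def unlinked_deliverables(deliverables, links):
    """Return the deliverables for which no link has been recorded."""
    linked = {deliverable for deliverable, *_ in links}
    return [d for d in deliverables if d not in linked]


if __name__ == "__main__":
    print("Deliverables without links:", unlinked_deliverables(DELIVERABLES, LINKS))
```

Adding a newly identified link then amounts to appending one entry to the list and re-running the check.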

  6. Conclusions

This exercise has identified potentially useful links between work package 2 of the ESSnet on data warehousing and other current and completed ESSnet projects. These links have been categorised into those where methods can be directly applied in the work of the ESSnet DWH, and those that should provide useful background on the topics being considered by the ESSnet DWH. This document will serve as a useful guide to work package 2 of the ESSnet DWH. However, the document should be considered a “living” one, in which the information can be expanded and updated as work begins on the other deliverables in WP2. It may be that further links can be identified, the usefulness of individual links can be assessed, or the tables can be further populated with contact names where relevant. The document should also be updated as further material is made available by different ESSnet projects, to ensure that the hyperlinks provided remain up to date.

Figure 1. The links between the deliverables of the ESSnet DWH and work packages in the other relevant ESSnet projects. Circles represent the different ESSnet projects according to the legend, and the numbers inside the circles represent the deliverables (in the case of the ESSnet DWH) and the work package numbers (in the case of the other ESSnet projects). The ESSnet projects are data warehousing (DWH), admin data (AD), data integration (DI), common reference architecture (CORA), statistical disclosure control (SDC), SDC harmonisation (SDC-H) and Blue-ETS. Grey lines identify “background” links between work packages/deliverables, and black lines identify “methods” links between work packages/deliverables.

References

Eurostat, 1997, “Handbook on design and implementation of business surveys”, ed. A Willeboordse, available from

Willenborg L., Scholtus S., 2011, “Workplan of the Memobust project”, available to registered users from

Annex 1