NSI / Office for national statistics
Name contact person / Pete Brodie
E-mail contact person /
1. What is the current status of your S-DWH?
in development /
in implementation or operational) / Planned
2.Please give a (short) general description of the nature of your
S-DWH by describing topics as:
Which problems do/did you want to solve by implementing a S-DWH?
What is the goal of the
What are the most important solutions desired by the NSI, to be solved by implementing a S-DWH?
What functionalities are (to be) implemented in the S-DWH? / Modernising system, reducing proliferation of silos ( most important), standardising process and metadata.
To support production of primary outputs. Increase the value of ONSIS data. Identification of reuse.
3. What is the scope of your S-DWH?
(ETL, Data Warehousing, metadata, integrations between metadata system and ETL) / Standardise ETL process. Standardise metadata use and integrate data by breathing down silos.
4. How is the S-DWH organised?
(activities, responsibilities, roles etc)
If you have an architectural diagram, are you willing to share it with us (yes/no)? / Systems are in silos with hard coded processes and metadata.
We have an early version of a business view which we could share.
5. Which problems did you encounter so far? / Funding
Cultural silo mentality and ownership. Legacy technology and a viable transition plan.
6. Does your S-DWH contains:
- micro data,
- macro data or
- both? / Current systems contain micro data and macro data.
7. In your S-DWH, do you save:
- initial data and weights,
- weighted data or
- both? / Current silo system weight and data stored together. Not ideal for a S-DWH
8. How do S-DWH and SR interact?
- completely independent
- fully integrated
- other (please specify) / Difficult to integrate the two for cultural reasons. Ideally we would want to integrate to enrich the metadata.
9. If S-DWH and SR are not fully integrated then is the S-DWH used for updating the SR? / It would be if we had reached this state.
10. How do you manage metadata in your S-DWH?
- per domain,
- other (please specify) / Per domain.
We would lite to centralise this in the future.
11. What metadata model(s)/standards
do you use in your
- etc / Some S-DWH moving towards adopting of GSIM
12. How do you manage the quality of your metadata?
- minimal quality requirements,
- standards / Minimal quality requirements at the moment.
Tools/Technology Used (please read Annex 1 for a summary of tools in NSIs)
Please give a general description of the technology used in your SDWH by describing tools used for:
(Indicate whether these tools were bought or built in house)
13. Capturing data from different sources into the SDWH (Source layer)
14. Cleaning and integrating the data sources into the SDWH (Integration layer)
15. Analysing and interpreting the data (Interpretation and Data Analysis layer)
16. Disseminating and providing access to Macro Data (Access layer)
Annex 1: Tool Inventory
This annex provides a brief description of software packages existing on the market or developed on request in NSIs in order to describe the solutions that would meet NSI needs, implement SDWH concept and provide the necessary functionality for each SDWH level.
The principal purpose of data warehouse is to provide information to its users for strategic decision-making. These users interact with the warehouse throughout Access layer using end user access tools. The examples of some of the end user access tools can be:
· Specialised Business Intelligence Tools for data access
Business intelligence tools are a type of software that is designed to retrieve, analyse and report data. This broad definition includes everything from Reporting and Query Tools, Application Development Tools to Visual Analytics Software, OLAP viewers. The main makers of business intelligence tools are:
o SAS Institute
· Office Automation Tools for regular productivity and collaboration instruments
By Office automation tools we understand all software programs which make it possible to meet office needs. In particular, an office suite therefore usually contains following software programs: word processing, a spreadsheet, a presentation tool, a database, a scheduler. One of the most common office automation tools around:
o Microsoft Office
o IBM‘s Lotus SmartSuite
· Graphics and Publishing tools
Graphics and publishing tools are tools with ability to create one or more infographics from a provided data set or to visualize information. There are a vast variety of tools and software to create any kind of information graphics, depending on the organizations needs:
· Web services tools (M2M) like:
o Microsoft Visual Studio
o Apache Axis
· SDMX tools like:
o SDMX Java Suite developed by European Central Bank
o Flex-CB Visualization
o DSW (Data Structure Wizard) developed by Eurostat
o SDMX Converter developed by Eurostat
o SDMX-RI (Reference Infrastructure) developed by Eurostat
o SDMX Registry developed by Eurostat
o SDMX.NET developed by UNESCO Institute for Statistics
Interpretation and Data Analysis layer
The interpretation and data analysis layer is specifically for statisticians and would enable any data manipulation or unstructured activities. In this layer expert users can carry out data mining or design new statistical strategies.
· Statistical Data Mining Tools
The overall goal of the data mining tools is to extract information from a data set and transform it into an understandable structure for further use. They also are capable to visualise data, which was extracted in data mining process:
o IBM SPSS Modeler
o SAS Enterprise Miner
o Microsoft Analysis Services
· Business Intelligence Tools for data analyse in a direct connection with data base
Business Intelligence tools allow users to create visual reports for data analysis needs. These tools often come as packages that include tools for ETL operations, and for designing specialised OLAP cubes, and finally presentational tools for displaying tabular data from specialised reporting views for end users:
o SAS OLAP Cube Studio
o SQL Server Analysis Services (SSAS)
o Analytic Workspace Manager 11g
· Tools for designing specialised reports like:
o Eclipse BIRT Project
o OpenOffice Base
o Oracle Reports
o SAS Web Report Studio
o SQL Server Reporting Services (SSRS)
o Crystal Reports
The integration layer is where all operational activities needed for all statistical elaboration processes are carried out. This means operations carried out automatically or manually by operators to produce statistical information in an IT infrastructure. With this aim, different subprocesses are predefined and preconfigured by statisticians as a consequence of the statistical survey design in order to support the operational activities.
In general, for the Integration layer there are mostly dedicated software applications usually defined as Data Integration tools. This kind of software is used for metadata management and usually is developed and implemented on NSI request. This is because of specific needs and requirements from customer. It has a user friendly graphic interface to help the integration of different input sources and their manipulation.
The Source Layer is the level in which all the activities related to storing and managing internal or external data sources are located. Internal data are from direct data capturing carried out by CAWI, CAPI or CATI while external data are from administrative archives. Essential component used to load data into data warehouses from the external sources is ETL. The most popular commercial ETL Tools are:
o Oracle Warehouse Builder (OWB)
o SAS Data Integration Studio
o SAP Business Objects Data Services (SAP BODS)
o Microsoft SQL Server Integration Services (SSIS)
o Pentaho Data Integration (Kettle)