NSI / Statistics Estonia
Name contact person / Maia Ennok
E-mail contact person /
1. What is the current status of your S-DWH?
(planned / in development / in implementation or operational) / Indicated in the questionnaire: question 6
No single coherent system which covers most of the data in the production of business statistics yet
Implementing
S-DWH is implemented at first for Population and Housing Census in 2011.
Implementing all social studies and 6 economical statistical activities by the end of 2014
General Description
2.Please give a (short) general description of the nature of your S-DWH by describing topics as:
Which problems do/did you want to solve by implementing a S-DWH?
What is the goal of the S-DWH?
What are the most important solutions desired by the NSI, to be solved by implementing a S-DWH?
What functionalities are (to be) implemented in the S-DWH? / Indicated in the questionnaire: question 1 + 1.1 resp 1 + 1.2 + remarks, 2.1 and 14
No Single conceptual approach for processing data in the production of business statistics.
The current situation be best characterized as ‘The 'Process Model'
We have the "Process Model" approach if there is also possible more than one input for output and inputs (surveys, admin data, census, observation register) and outputs (aggregate statistics, micro data, time series) are the same for the "Process Model" as for the "Data Model". We have the "Data Model" approach for the business register for statistical purposes. (Remark made)
Main motivation to start DWH in your business statistics systems are:
More ways to (re)use data
Data linking / integration
To implement changes in business process architecture
To improve efficiency in business statistics production
Technical efficiency
Most important challenges of the NSI
- Changes in the society are fast.
- The users of statistics in Estonia as well as on international level require data which are reliable, comparable and corresponding to the users’ needs.
- The demand for statistics is continuously increasing.
- For the EU Member States the compatibility of the needs of the European Union and the country is a challenge.
- Local governments and counties need data for observing their development and for making decisions.
- For the new decade the qualitative data have more important role than ever in the society as a whole (in public administration and enterprise). That is the reason why the importance of objectivity and professionalism of statistics is increasing.
- The users of statistics need the data as soon as possible. For this reason the statisticians must choose between being up-to-date and quality, compromises must not be made with regard to the latter.
Most important technical weaknesses
· data is duplicated
· data consistency is not assured
· metadata is missing or not harmonized
· different data sets are hard or impossible to link
· data re-using is very difficult
· impact analysis is not possible
· different statistical activities use different platforms for data processing and data storing
· standardizing of processes is very time consuming
Goals from business strategy
1. Increase the coherence and comparability of statistics of different subject areas
2. Increase the volume of analytical products
3. Respond faster to the needs of clients
4. Standardise the process of data processing
5. Separate product development and orders for information from production process
6. Develop the department-driven organisational culture into the institution-driven one
7. Create a modern working environment
8. Reuse of statistical data, metadata and data steps (data processing code)
Solutions to achieve above mentioned goals
- Centralizing and standardizing meta data (Goals:1;3;4;6;7;8)
- Standardizing and automating processes of data processing (Goals:1;3;4;6;7;8)
- Building and using S-DWH (Goals:1;2;3;5;6;7;8)
- Centralizing technical specialists of data warehouse (Goals:1;4;6)
- Single integrated platform for metadata, data processing and data storing (creating precondition for creating unified archiving solution) (Goals:1;2;3;4;5;6;7;8)
-
Functionalities:
Data processing software (VAIS) has essential role in IT architecture. It is central integration tool and main data processing software. VAIS moves data from one system to another. For example from data collection to data processing and statistical registries, from processing to data warehouse etc. VAIS is metadata driven and enables to combine workflows from prior programmed data processing steps. Manual intervention (edit, impute, code etc) is also possible using operator’s application.
iMeta is central metadata repository, based on MMX MOF 3 meta-meta model, that enables to manage several different meta models. As metadata we manage both reference metadata and structural metadata (including process metadata, technical metadata, user roles and privileges etc).
Data warehouse (conformed collection of datasets) consists of data processed and prepared for analysis. In data warehouse variables (columns) are linked with variable descriptions in iMeta. Data sets in Data Warehouse are versionised. With each version is stored also suitable data processing package version, with what data was produced. Data sets are mutually linked with common dimensions and facts in different data sets are unique (avoid data duplication in different data sets).
Need assistance of centre of competence or best practice from some other NSI’s
· Data duplication in S-DWH
· Store everything vs easy access
· Best practice for storing population and sample
· Role of Statistical Business Registry (?)
· Implementation of standards (for example DDI)
3. What is the scope of your S-DWH?
(ETL, Data Warehousing, metadata, integrations between metadata system and ETL) / Statistical Data Warehouse system consists of:
Phase 2: iMeta application for designing statistical activity (specifying outputs, variables, data collection methods etc)
Phase 2 and 3: VAIS Designer for designing, building and configuring workflows
Phase 5: VAIS Operator for manual processing, XDTL Runtime for automatic meta data driven data processing (ETL) and Data Warehouse (conformed datasets) for storing processed data to be analysed in Phase 6.
Phase 6: VAIS Cube Designer for designing and building metadata enriched cubes, calculating new variables etc.
All metadata is stored in one metadata repository.
Best practice:
· Metadata repository - MMX (integrated metadata system (iMeta), data processing workflow design, build, configuration (VAIS Designer))
· Metadata driven data processing system (XDTL Runtime, VAIS Operator)
· Metadata driven cube generation (VAIS Cube Designer)
4. How is the S-DWH organised?
(activities, responsibilities, roles etc) / In data processing, there are fallowing roles:
Lead statistician – describes metadata of statistical activity in metadata system and data processing rules and checks in data processing system.
Workflow designer – compile workflow in data processing system.
Data Warehouse programmer – programs and complement reusable data processing steps.
Data Warehouse architect– develops and maintains data model of Data Warehouse.
Operator – manually corrects errors found during data processing.
Analyst – analyze data in Data Warehouse and produces statistics.
Lead statistician, operators and analysts are structurally in departments of statistical domains.
Data Warehouse architect, workflow designer, Data Warehouse programmer are situated in unit of data warehouse.
Statistical domains are responsible for statistical methodological side and data warehouse unit is responsible for data warehouse technical realization.
5. Which problems did you encounter so far? / Indicated in the questionnaire: questions 15, 16, 17
Main methodological barriers to implementing an integrated system are:
Not enough expertise
Too difficult, complex
Data linking
Insufficient development time
Too high costs
Problems encountered in data integration you desire to solve with an integrated DWH-system are:
With Data Warehouse system we could have more data available to use and reuse, and by using metadata-driven DWH-system we could have better data quality for output data. DWH could positively effect the response burden (diminish administrative response burden). (remark made)
Problems encountered in process integration you desire to solve with an integrated DWH-system
With Data Warehouse system we could have more effective and unified process (using one framework for all data warehouse processes) and with monitoring process metadata we could have better process quality. (remark made)
· It is difficult to develop massive system if requirements for system are not cleared.
· Creation (agreement) of metadata to new system takes a lot of time.
· People resistance to changes.
· Support of administration for changes could be bigger.
· No agreement how new system has influence division of work between departments.
· Technical problems with implementing freshly finished software.
Characteristics
6. Please check the characteristics as indicated in the questionnaire and correct / complete if necessary? / Indicated in the questionnaire: questions 3 + 4
There is not a one-to-one correspondence between input data and outputs. The data warehouse is passive.
ETL part is active but data storing is passive (no changes in data warehouse).
7. Is the data in the S-DWH unit based (micro data)? / Generally yes, but we can store aggregates also if needed.
8. Do you have weighted data in the
S-DWH? / We plan to store both, original and weighted data together with weights.
Statistical Registers
9. Is the Business Register (BR) or other Statistical Register (SR) managed in the
S-DWH? / Indicated in questionnaire: question 10
The business register currently sits (or will it be) outside the current (or planned) DWH system.
BR is stored and managed outside of SDWH.
10. Which BR / SR data are stored in the
S-DWH? / · stratification variables (from snapshots to Statistical Unit dimension)
· identifying variables (only in house ID, district level address)
11. Is the S-DWH used for updating the BR? / Yes, mainly for updating stratification variables in BR.
12. Do you have a snapshot of the BR in the S-DWH? / No, snapshots of BR is stored together with BR.
13. Do you keep different versions of BR snapshot in the
S-DWH? / No, snapshots of BR are versioned in BR. Statistical Unit dimension in S-DWH is based on BR snapshot and is versioned according to snapshot versions.
Metadata (please read Annex 1 for an explanation of the underlined Metadata concepts!)
A. What kind of reference (business) metadata is used? / B. What kind of structural (technical) metadata is used
14. What kind of statistical metadata is used? / We use iMETA system for statistical metadata (both reference and structural). Neuchatel model: statistical activity descriptions, collection methods. / Codes in Neuchatel model: statistical activity, statistical activity version, variables, classifiers (inc code lists) etc.
15. What kind of process metadata is used? / S-DWH creates and uses process metadata. XDTL metamodel: package, tasks, steps, parameter, variable, connection. Rule, transformation, condition etc.
16. What kind of technical metadata is used? / Relational Database (RDB) Metamodel (database, schema, table, column, roles etc.).
17. What kind of quality metadata is used? / Neuchatel model: statistical activity instance description: user needs, user satisfaction, quality assurance, quality assessment, quality management, quality documentation. / Imputation rate, unit and item response rate etc.
18. What kind of authorisation metadata is used? / Descriptions which statistical activity uses which administrative data. Who is statistical activity manager. / Role-Based Access Control Model and role based security: privileges by statistical activity, by variables, by operations.
17. Specify the key relations between the metadata classes according to your metamodel(s)?
(eg statistical activity, classifier, variable) / Relation between statistical activity and variables, variables and classifiers etc.
18. Specify the key relations between the different kinds of metadata
(reference, structural,
quality) / Different kind of metadata is somehow related. Key relations: relations between variable and database column, relations between variable and privilege (role based data access), etc.
19. How is the metadata maintenance organised and stored? / iMeta is central metadata repository, based on MMX MOF 3 meta-meta model, that enables describe several different meta models. As metadata we manage reference metadata, structural metadata (including process metadata, technical metadata, user roles and privileges etc.).
Methodological unit is responsible for metadata management. Different units are responsible for metadata inputs and updates. Principle is that metadata is filled where it formed (process and unit).
Reference metadata is just descriptive and not used for metadata driven system (it gives user the context).
Structural metadata is used for metadata driven system (example variables codes).
20. What are your metadata quality requirements? / Metadata is described in metadata systems by templates to assure: no missing values, only unique codes, metadata property quality rules (dependence between different types of metadata elements) etc.
Annex 1: Metadata concepts[1]
Reference metadata are metadata that describe the contents and quality of the data in order to help the user understand and evaluate them (conceptually)
Examples: Quality information on survey, register and variable levels; variable definitions; reference dates; confidentiality information; contact information; relations between metadata items
Structural metadata are metadata that help the user find, identify, access and utilise the data (physically)
Examples: Classification codes; parameter lists
Statistical metadata are data about statistical data
This definition will obviously cover all kinds of documentation with some reference to any type of statistical data and is applicable to metadata that refer to data stored in a S-DWH as well as any other type of data store
Examples: Variable definition; register description; code list.
Process metadata are metadata that describe the expected or actual outcome of one or more processes using evaluable and operational metrics
Examples: Operator’s manual (active, structured, reference); parameter list (active, structured, reference); log file (passive, structured, reference/structural)
Technical metadata are metadata that describe or define the physical storage or location of data.
Examples: Server, database, table and column names and/or identifiers; server, directory and file names and/or identifiers
Quality metadata are any kind of metadata that contribute to the description or interpretation of the quality of data.
Examples: Quality declarations for a survey or register (passive, free-form, reference); documentation of methods that were used during a survey (passive, free-form, reference); most log lists (passive, structured, reference/structural)
Authorisation metadata are administrative data that are used by programmes, systems or subsystems to manage users’ access to data.
Examples: User lists with privileges; cross references between resources and users
[1] Metadata Framework for Statistical Data Warehousing v0.9; ESSnet on Data Warehousing