United Nations Economic Commission for Africa
African Centre for Statistics (ACS)

DRAFT

Handbook on Major Statistical Data Management and Dissemination Platforms

Addis Ababa
May 2011
© 2011 African Centre for Statistics (UNECA)

Table of Contents


1. Background Information
1.1. Statistical Data
1.1.1. Micro-data
1.1.2. Macro-data
1.1.3. Metadata
1.2. Statistical Database System
1.3. Current Situation Analysis
1.4. Organization of this Document
2. Project Definition
2.1. Objective
2.2. Mode of Operation
2.3. Scope of Work
3. Major Requirements of Statistical Databases
3.1. Data Capturing
3.2. Data Storage and Retrieval
3.3. Data Processing and Dissemination
3.4. Standard Data Sharing and Exchange
3.5. Metadata Management
3.6. Indicators Management
3.7. Integration with Other Systems
3.8. Data Security
3.8.1. Backup and Restore Feature
3.8.2. Access Control
3.8.3. Users Management
3.8.4. Users and Data Auditing
3.9. GIS Support
3.10. Reporting Features
3.11. Training
3.12. User Interface
3.13. Alerting Feature
3.14. Analysis Tool
3.15. Scalability
3.16. Extendibility
3.17. System Environment
4. Available Statistical Data Management Systems
4.1. List of Statistical Data Management Systems
4.2. Product Descriptions
4.2.1. CountrySTAT (FAO)
4.2.2. DevInfo
4.2.3. Eurotrace Suite
4.2.4. GenderStats (World Bank)
4.2.5. LABORSTA (ILO)
4.2.6. Live Data Base (World Bank)
4.2.7. NESSTAR
4.2.8. StatBase
4.2.9. StatWorks (OECD)
4.3. Feature Comparisons
5. Software Selection Guideline
5.1. Hidden Factors for Software Selection
5.1.1. Vendor History & Experience
5.1.2. Cost
5.1.3. Ease of Use/Adoption
5.1.4. Training
5.1.5. Maintenance
5.1.6. Familiarity
5.1.7. Security
5.1.8. Software as a Service (SaaS)
5.2. Major Steps for Selecting the Right SDMS
5.2.1. Needs Analysis
5.2.2. Management Support
5.2.3. Requirements Specification
5.2.4. RFP Preparation
5.2.5. Short List Identification
5.2.6. Demo Preview
5.2.7. System Selection and Contract Negotiation
6. Conclusion
7. References
8. Appendix
8.1. Questionnaire to NSOs
8.2. Questionnaire to Experts
8.3. Questionnaire to Vendors


1. Background Information

There is broad consensus among African countries and development partners on the need for better statistics to support sound policy decision making for the achievement of internationally agreed goals, including the MDGs. Governments of African States have realized that the proper use of better statistics is essential for good policies and development outcomes. This recognition calls for more accurate and timely statistics, supported by a modern information technology environment.

National Statistical Offices (NSOs) on the continent, however, provide limited statistical products and services in terms of quantity, type and quality, and are therefore unable to respond adequately to the increasing demand by their governments and the international community for better development statistics.

One of the recommendations put forward by the Data Management Working Group during the first and second meetings of the Statistical Commission for Africa (StatCom-Africa) is to set up a group of experts, made up of statisticians, data management and geo-information experts, to evaluate the major existing statistical data management platforms and compare their features, so that member States and their partners can make informed decisions on the selection of platforms for statistical data collection, production and dissemination. The recommendation was prompted by the plethora of offers for data management platforms that member States receive. Some of these offers come at no or reduced cost as part of assistance projects, while others are at commercial rates. Even when there is no financial cost, accepting all offers would result in duplication of efforts, waste of scarce human capacity, and the possibility of data inconsistencies. Such a feature comparison will therefore enhance the sustainability of the information infrastructure and associated tools for effective management and dissemination of statistical data, applications and services.

1.1. Statistical Data

The notion of statistical data covers all facts and estimated values concerning a specific entity. Micro-data, macro-data and metadata are the major types of statistical data.

1.1.1. Micro-data

Micro-data are data about individual objects such as persons, events, transactions, etc. Every object can be characterized by properties, and the values of these properties are considered micro-data.

1.1.2. Macro-data

Macro-data are estimated values of statistical characteristics concerning sets of objects. A statistical characteristic is a measure that summarizes the values of a certain variable of the objects in a population.
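The relationship between micro- and macro-data can be sketched in a few lines of Python. The household records, variable and summary measure below are invented purely for illustration:

```python
# Micro-data: one record per observed object (here, hypothetical households)
households = [
    {"id": 1, "size": 4},
    {"id": 2, "size": 2},
    {"id": 3, "size": 6},
    {"id": 4, "size": 4},
]

# Macro-datum: a statistical characteristic (the mean) summarizing the
# values of one variable across the whole set of objects
mean_size = sum(h["size"] for h in households) / len(households)
print(mean_size)  # 4.0
```

The individual `size` values are micro-data; the computed mean is a macro-datum describing the population as a whole.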

1.1.3. Metadata

Metadata are defined as data about data, and refer to the definitions, descriptions of procedures, methodologies, system parameters, and operational results which characterize and summarize statistical programs. Metadata describe different quality aspects of statistical data, such as: contents, describing definitions of objects, populations, variables, etc.; accuracy, describing different kinds of deviations between observed/estimated and true values of variables and statistical characteristics; and availability, describing which statistical data are available, where they are located, and how they can be accessed. Metadata might also contain descriptions of the statistical surveys themselves, their content and layout, and descriptions of validation, aggregation and report preparation rules. In other words, metadata can be considered an entity describing the meaning, accuracy, availability and other important characteristics of the underlying micro- and macro-data. These characteristics of the underlying data are essential for correctly identifying and retrieving relevant statistical data for a specific problem, as well as for correctly interpreting and (re)using the statistical data.

Metadata may be passive (descriptive) or active (prescriptive). Passive metadata is used as a form of documentation, whereas active metadata is used to determine the actions of automated processes.

Standards have also been developed for metadata creation. A metadata standard outlines the properties to be recorded, as well as the values those properties should have. Standardisation of metadata elements makes information sharing more reliable and universal. The use of metadata standards enables producers to describe datasets fully and coherently, and facilitates data discovery, retrieval and use. If a standard is used, finding a specific piece of information in a metadata record is much easier than if no standard is used. Standards also enable automated searching: when standards are used, computers can be programmed to search for and find useful data sets.

1.2. Statistical Database System

A statistical database management system (SDMS) is a database management system that can model, store and manipulate data in a manner well suited to the needs of users who want to perform statistical analyses on the data. SDMSs have some special characteristics and requirements that are not supported by existing commercial database management systems. For example, while basic aggregation operations like SUM and AVG are part of SQL, there is no support for other commonly used operations like variance and covariance, or for more advanced operations like regression and principal component analysis.
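The gap can be illustrated with a small sketch: a variance that standard SQL does not provide can be emulated from the basic SUM and COUNT aggregates it does. The table and values below are hypothetical, using Python's built-in SQLite driver:

```python
import sqlite3

# In-memory database with a single hypothetical column of observations
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE obs (value REAL)")
con.executemany("INSERT INTO obs VALUES (?)",
                [(2.0,), (4.0,), (4.0,), (4.0,), (5.0,), (5.0,), (7.0,), (9.0,)])

# Population variance as E[X^2] - (E[X])^2, built only from SUM and COUNT,
# since SQLite (like basic SQL) has no VARIANCE aggregate
row = con.execute(
    "SELECT SUM(value*value)/COUNT(*) "
    "       - (SUM(value)/COUNT(*)) * (SUM(value)/COUNT(*)) FROM obs"
).fetchone()
print(row[0])  # 4.0
```

A dedicated SDMS would offer such statistical operations directly rather than forcing users to assemble them from primitive aggregates.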

Macro-data management is also entirely different from micro-data management. Most commercially available database management systems lack the features needed to manage macro-data, as well as metadata.

1.3. Current Situation Analysis

National, regional and sub-regional statistical offices and organizations often need to compile data from various sources and disseminate them to diverse user communities. This need should be a determining factor in choosing a statistical data management platform for such offices and organizations. It presupposes that the officers responsible for the selection have adequate knowledge of the capabilities of the various offerings. This is not always the case, and some offices therefore end up with systems that may not fully satisfy their needs. Some have implemented multiple systems to benefit from the complementarity of systems.

While it is not necessarily wrong to implement multiple systems if the situation so warrants, member States have expressed the need for guidance on the capabilities of the various options in order to make informed decisions on the optimum platform (or platforms) for their particular environments. This handbook is therefore intended to address this need by providing comparative descriptions of the existing platforms. It also presents guidelines for selecting the required platform for the task at hand.

1.4. Organization of this Document

This document is organized as follows. Section 2 presents the project definition, describing the objective, mode of operation and scope. Section 3 outlines the critical requirements of a Statistical Data Management System; it is by no means an exhaustive list of features, but is intended to serve as a reference on the issue. Section 4 documents the features of the major Statistical Data Management Systems currently in use in member States and partner institutions; it is a simple feature-wise documentation of SDM systems which should not be considered a feature comparison. Section 5 outlines the system selection guidelines by describing the major factors which influence, both negatively and positively, the process of SDMS selection; it also presents the steps to be followed in selecting the right SDMS for an organization. Finally, concluding remarks are presented in Section 6.

2. Project Definition

2.1. Objective

The main objective of this initiative is to produce a publication that compares the features of the major statistical data management platforms, to serve as a guide for member States implementing data management services.

2.2. Mode of Operation

In order to achieve the above objective, Participatory Design (PD) principles were strictly followed in the course of implementing this initiative. PD is an approach which gives much attention to the active involvement of all stakeholders throughout the implementation of an initiative. The approach is known to promote participative communication and learning among stakeholders (system vendors, experts, system users, management, etc.). It is also known for reducing last-minute surprises by gradually and continuously keeping participating individuals informed about the project.

To this end, the following operations were performed in the course of the initiative:

  • An experts group, comprising individuals from different countries and institutions, was formed to support the initiative.
  • An online discussion forum was set up to exchange ideas on selecting the right statistical data management platform for the requirement at hand.
  • An experts group meeting was conducted ......
  • Questionnaires were designed and distributed to three different groups of stakeholders, namely: National Statistical Offices, experts and system vendors. (See Appendix)
  • Physical observation of a selected site was conducted, to gauge how comfortable users are in using the system. Other working-environment issues related to system use were also taken into consideration.
  • A review of the technical specifications of selected statistical data management and dissemination platforms was conducted.
  • Demonstrations of selected statistical data management and dissemination systems were undertaken.

In general, quite intensive communications and discussions with all stakeholders were conducted to produce this document. The communication media included the online discussion forum, emails, telephone discussions and questionnaires.

2.3. Scope of Work

This initiative focused on macro-data management systems, which have been identified as the area of immediate need by member States. Micro-data platforms will be dealt with separately, as the needs in that area are different.

The project is also limited to analyzing and documenting the major statistical data management platforms currently in use in National Statistical Offices (NSOs) of member States and/or partner institutions. Systems deployed elsewhere were not given much attention in this document.

3. Major Requirements of Statistical Databases

3.1. Data Capturing

A Statistical Data Management System should obviously allow users to capture statistical data. The main requirement in this regard is that the system be suitable for capturing all the data the users intend to store. The system should also offer appropriate data entry schemes. Some users might need to compile their data in other software, such as MS Excel, and then import it into the system in batch mode.

The system is also expected to validate the data at the time of entry. Data validation is a critical feature for SDMSs.
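A minimal sketch of entry-time validation is shown below. The field names and rules are hypothetical, not taken from any particular SDMS:

```python
# Hypothetical validation rules: each field maps to a predicate
RULES = {
    "year": lambda v: 1950 <= int(v) <= 2011,
    "population": lambda v: float(v) >= 0,
}

def validate(record):
    """Return the field names that fail their validation rule."""
    errors = []
    for field, rule in RULES.items():
        try:
            if not rule(record[field]):
                errors.append(field)
        except (KeyError, ValueError):
            # Missing field or unparseable value also counts as an error
            errors.append(field)
    return errors

print(validate({"year": "2005", "population": "1200"}))  # []
print(validate({"year": "1802", "population": "-3"}))    # ['year', 'population']
```

A real SDMS would typically let administrators define such rules declaratively rather than in code, but the principle of rejecting invalid values at the point of entry is the same.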

Most commercial word processing packages use 'AutoText', recently expanded into 'Building Blocks', to facilitate data entry. In the word processing context, building blocks are stored snippets that can contain formatted or unformatted text, graphics, and other objects, which can be predefined by the user and inserted into a document when needed. The building block concept can also be implemented in SDM systems to improve data entry by speeding up the process and reducing errors.

Pulling data through web services from third-party database systems is also a crucial data capturing feature that most SDMSs are required to possess.
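The sketch below illustrates the idea of consuming a web-service response. The payload structure is invented for illustration; in a live system the `payload` string would be fetched from the service's URL (e.g. with `urllib.request.urlopen`) rather than hard-coded:

```python
import json

# Invented JSON response, standing in for a third-party service's output
payload = ('{"indicator": "SP.POP.TOTL", '
           '"data": [{"year": 2009, "value": 1001.5}, '
           '{"year": 2010, "value": 1020.3}]}')

# Parse the response and extract (year, value) pairs for loading
response = json.loads(payload)
rows = [(d["year"], d["value"]) for d in response["data"]]
print(rows)  # [(2009, 1001.5), (2010, 1020.3)]
```

Capturing data this way avoids manual re-keying and keeps the SDMS synchronized with the source system.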

3.2. Data Storage and Retrieval

Statistical organizations are responsible for collecting and storing huge amounts of statistical data in order to feed decision makers, researchers and the general public with timely and accurate information. Given the magnitude of the data maintained and users' expectations of quality data, storage and accessibility need to be supported by a robust statistical database system.

Storage and retrieval is, therefore, one of the major requirements of any statistical database system. Database systems need to provide and manage storage of huge amounts of data in a systematic manner. They should also offer a flexible, intuitive and simple retrieval module which assists decision makers, the general public, and other users with limited system manipulation expertise in accessing information from the database.

3.3. Data Processing and Dissemination

Any statistical data management system is expected to perform data processing activities such as coding, editing, and data harmonization, to name just a few. Once data are processed and the required adjustments entered, the database system should provide a dissemination facility.

Nowadays, Internet technology is the most widely used dissemination medium. This technology comprises a number of functional features:

  • Electronic mail: a common engine for sending electronic messages. It is most appropriate for sending periodical reports to a selected and predefined user community.
  • Websites: used to publish statistical information at a specified location on the Web for the general public. Websites can transport statistical data files in different formats (Excel, PDF, Word, etc.). Websites are increasingly becoming dissemination channels for statistical data. They offer a simple, comparatively cheap and efficient way to provide timely information to the core users of statistics as well as to a broader audience.

To this end, most statistical database systems possess a facility to publish information in a web-readable format. Accordingly, web publishing is a critical requirement for an SDMS.
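Exporting a disseminated table to one of the portable formats mentioned above can be sketched as follows. The table content is hypothetical, and CSV stands in here for the richer formats (Excel, PDF) a real system would also offer:

```python
import csv
import io

# A hypothetical two-row dissemination table: header plus one indicator series
table = [
    ("Indicator", "2009", "2010"),
    ("Pupil-teacher ratio, primary", "42.1", "40.8"),
]

# Write the table as CSV into an in-memory buffer; a real system would
# stream this to the user's browser or a file
buf = io.StringIO()
csv.writer(buf).writerows(table)
print(buf.getvalue())
```

Note that the CSV writer automatically quotes the indicator name because it contains a comma, which keeps the exported file unambiguous.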

3.4. Standard Data Sharing and Exchange

National Statistical Offices face tremendous pressure to provide reports to other organizations, including government offices, international development organizations, and partners. At the same time, NSOs need to capture data from various other sources, including partner institutions, in different formats. These activities are performed frequently and involve a huge flow of data. Keying in such data manually is a resource-intensive, tedious and error-prone activity which needs to be reduced to the extent possible.

Synergies, standardization and optimization of processes and infrastructures are the only solution to this challenge. Standard exchange formats such as SDMX can help by improving quality and efficiency in the exchange and dissemination of data and metadata through:

  • harmonization and coherence of data;
  • preservation of meaning by coupling data with metadata that defines and explains it accurately;
  • an open format (XML) rather than a proprietary one;
  • facilitating and standardizing the use of new technologies such as XML and Web services. Many NSOs are already using, or are planning to use, XML as the basis for their data management and dissemination systems. By choosing SDMX, one can avoid the proliferation of many XML grammars.
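The idea of coupling data with its defining metadata in an open XML format can be sketched as follows. This is a deliberately simplified, SDMX-style message; the element and attribute names are illustrative and do not follow the official SDMX schemas:

```python
import xml.etree.ElementTree as ET

# Build a simplified SDMX-style observation: the series key (frequency,
# reference area, indicator) travels with the data point itself
root = ET.Element("DataSet")
series = ET.SubElement(root, "Series",
                       {"FREQ": "A", "REF_AREA": "ET", "INDICATOR": "POP"})
ET.SubElement(series, "Obs", {"TIME_PERIOD": "2010", "OBS_VALUE": "82.95"})

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Because the structure is plain XML, any receiving system can parse the message without proprietary tooling, which is exactly the interoperability argument made above.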

3.5. Metadata Management

As presented in the previous sections, support for metadata is one of the critical requirements of SDMSs. It is this feature which manages the metadata required for defining the content, quality, security, accessibility and other aspects of the actual database content. Through the metadata management module, the database system is expected to present descriptions of data content and layout, as well as descriptions of validation, aggregation and report preparation rules.

Metadata is critical because data are only made accessible through their accompanying documentation. Without a description of their various elements, data resources will manifest themselves as more or less meaningless collections of numbers to the end user. The metadata provides the bridge between the producers of data and their users and conveys information that is essential for secondary analysis.

Compliance with metadata standards is also a requirement that SDMSs need to consider. An example of a metadata standard is DDI (Data Documentation Initiative), a standard for documenting datasets that was developed in European and North American data archives, libraries and official statistics agencies. It is designed to be fully machine readable and machine processable.
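Machine processability is the practical payoff: a program can read variable descriptions directly from the document. The fragment below is a minimal, DDI-flavoured sketch; real DDI documents follow the full Codebook or Lifecycle schemas and use XML namespaces, which are omitted here for brevity:

```python
import xml.etree.ElementTree as ET

# Minimal DDI-style codebook fragment with two hypothetical variables
ddi_fragment = """
<codeBook>
  <dataDscr>
    <var name="AGE"><labl>Age of respondent in years</labl></var>
    <var name="SEX"><labl>Sex of respondent</labl></var>
  </dataDscr>
</codeBook>
"""

# Because the layout is standardized, extracting labels is mechanical
root = ET.fromstring(ddi_fragment)
labels = {v.get("name"): v.findtext("labl") for v in root.iter("var")}
print(labels["AGE"])  # Age of respondent in years
```

This is what enables the automated searching and discovery described in section 1.1.3: software does not need to guess where a variable's definition lives.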

3.6. Indicators Management

Statistical indicators are any quantitative data that provide evidence about the quantity, quality or standard of an entity. The following are some examples of indicators from the list proposed by the World Bank:

  • Expenditure per student, primary (% of GDP per capita)
  • Public spending on education, total (% of government expenditure)
  • Expenditure per student, secondary (% of GDP per capita)
  • Pupil-teacher ratio, primary

In most cases, SDMSs are expected to allow users to create new indicators and manage existing ones. Management might include operations such as categorizing indicators into thematic groups, deleting existing indicators, or making other modifications.
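The management operations just described (create, categorize into thematic groups, delete) can be sketched with a minimal, hypothetical registry class; no real SDMS API is implied:

```python
class IndicatorRegistry:
    """Toy registry mapping indicator names to thematic groups."""

    def __init__(self):
        self.indicators = {}  # name -> thematic group

    def create(self, name, group):
        self.indicators[name] = group

    def delete(self, name):
        self.indicators.pop(name, None)

    def by_group(self, group):
        return [n for n, g in self.indicators.items() if g == group]

# Create two indicators, then delete one and list what remains
reg = IndicatorRegistry()
reg.create("Pupil-teacher ratio, primary", "Education")
reg.create("Expenditure per student, primary (% of GDP per capita)", "Education")
reg.delete("Pupil-teacher ratio, primary")
print(reg.by_group("Education"))
```

A production system would add persistence, versioning and access control around these operations, but the core catalogue behaviour is as simple as this.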

3.7. Integration with Other Systems

In this era of technology, one can hardly expect a single software system to manage all the processes of an organization. For various reasons, most organizations deploy multiple technology solutions over time to manage their day-to-day activities. Ultimately, however, since those systems all work to realize the vision of a single organization, the need for integration arises. The same requirement might arise with statistical data management systems.