United Nations Economic Commission for Africa
African Centre for Statistics

Handbook on Major Statistical Data Management Platforms

Addis Ababa
October 2011
© 2011 African Centre for Statistics (UNECA) / Page 1

Contents

I. Background Information

1.1. Statistical data

1.1.1. Microdata

1.1.2. Macrodata

1.2. Statistical data management systems

1.3. Justification of the assignment

1.4. Organization of this document

II. Project Definition

2.1. Objective

2.2. Mode of Operation

2.3. Scope of Work

III. Major Requirements of Statistical Data Management Systems

3.1. Data Capture

3.2. Data Storage and Retrieval

3.3. Data Processing and Dissemination

3.4. Standard Data Sharing and Exchange

3.5. Metadata Management

3.6. Indicators Management

3.7. Integration with Other Systems

3.8. Data Security

3.8.1. Backup and Restore Features

3.8.2. Access Control

3.8.3. User Management

3.8.4. Users and Data Auditing

3.9. GIS Support

3.10. Reporting Features

3.11. Training

3.12. User Interface

3.13. Alerting Feature

3.14. Analysis Tools

3.15. Scalability

3.16. Extendibility

3.17. System Environment

IV. Available Statistical Data Management Systems

4.1. List of Statistical Data Management Systems

4.2. Product Descriptions

4.2.1. CountrySTAT (FAO)

4.2.2. DevInfo (UNICEF)

4.2.3. Eurotrace (Eurostat)

4.2.4. LABORSTA (ILO)

4.2.5. Live Database (World Bank)

4.2.6. Nesstar

4.2.7. StatBase (UNECA)

4.2.8. StatWorks (OECD)

4.3. Feature Comparisons

V. Software Selection Guidelines

5.1. Hidden Factors for Software Selection

5.1.1. Vendor history and experience

5.1.2. Cost

5.1.3. Ease of use/adoption

5.1.4. Maintenance

5.1.5. Familiarity

5.1.6. Security

5.1.7. Software as a service (SaaS)

5.2. Important Steps in Selecting the Right SDMS

5.2.1. Needs Analysis

5.2.2. Management support

5.2.3. Requirements specification

5.2.4. RFP Preparation

5.2.5. Software demonstration

5.2.6. System selection and contract negotiation

VI. Conclusion

VII. Recommendations

ANNEX

1. Questionnaire to NSOs

2. Questionnaire to Experts

3. Questionnaire to Vendors

References


I. BACKGROUND INFORMATION

1. There is broad consensus among African Governments and development partners about the need for better statistics to support sound policy formulation and the achievement of internationally agreed goals, including the Millennium Development Goals (MDGs). Governments of African States recognize that the proper use of better statistics is essential for good policies and development outcomes. Meeting this need requires more accurate and timely statistics, supported by a robust and integrated information technology environment.

2. National statistical offices (NSOs) on the continent, however, provide limited statistical products and services in terms of quantity, type and quality, and are therefore unable to respond adequately to the increasing demand from their Governments and the international community for better development statistics.

3. One of the recommendations put forward by the Data Management Working Group during the first and second meetings of the Statistical Commission for Africa was to set up a group of experts, made up of statisticians and data management and geo-information experts, to evaluate the major statistical data management platforms available and compare their features, so that member States and their partners can make informed decisions when selecting platforms for statistical data collection, production and dissemination. The recommendation was prompted by the plethora of data management platforms offered to member States. Some of these offers come at no or reduced cost as part of assistance projects, while others are commercial. Even when there is no financial cost, accepting every offer would lead to duplicated effort, waste of scarce human capacity and possible data inconsistencies. Documenting the features of these statistical data management systems, together with selection guidelines in the form of a handbook, will therefore facilitate the selection of the right platform and enhance the sustainability of information infrastructures and associated tools for the effective management and dissemination of statistical data, applications and services.

1.1. Statistical data

4. The notion of statistical data encompasses all the observed facts and estimated values relating to a specific entity. In the context of this handbook, “statistical data” refers to sequences of observations or estimated values of social, economic, political and environmental entities. Although statistical data can be classified and differentiated in various ways, the distinction between microdata and macrodata is worth explaining in order to understand the scope of this assignment.

1.1.1. Microdata

5. Microdata are data about individual objects such as a person, an event or a transaction. Every object can be characterized by properties, and the values of these properties constitute microdata. In microdata sets, each row typically represents an individual object and each column an attribute or characteristic of that object. Microdata are often collected from each object through a survey or individual measurement.

1.1.2. Macrodata

6. Macrodata are estimated values of statistical characteristics of sets of objects. Macrodata can be generated by combining, aggregating or summarizing microdata, or by direct observation and estimation for a group of entities. Macrodata typically take the form of tabulations, counts and frequencies.
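The relationship between the two kinds of data can be illustrated with a minimal sketch. The records, attribute names and values below are made up for illustration: each dictionary is one microdata row (one person), and the two helper functions (hypothetical names) derive macrodata from them, a frequency tabulation and a group summary.

```python
from collections import Counter

# Illustrative microdata: one row per person (object), one column per attribute.
# All values are made up for the example.
microdata = [
    {"person_id": 1, "region": "North", "sex": "F", "age": 34},
    {"person_id": 2, "region": "North", "sex": "M", "age": 51},
    {"person_id": 3, "region": "South", "sex": "F", "age": 27},
    {"person_id": 4, "region": "South", "sex": "F", "age": 45},
    {"person_id": 5, "region": "South", "sex": "M", "age": 19},
]

def tabulate(records, *dims):
    """Aggregate microdata into a frequency table (macrodata) keyed by the given attributes."""
    return Counter(tuple(r[d] for d in dims) for r in records)

def mean_by(records, group, measure):
    """Summarize a numeric attribute per group, e.g. mean age by region."""
    totals, counts = {}, Counter()
    for r in records:
        totals[r[group]] = totals.get(r[group], 0) + r[measure]
        counts[r[group]] += 1
    return {k: totals[k] / counts[k] for k in totals}

counts_by_region_sex = tabulate(microdata, "region", "sex")
mean_age_by_region = mean_by(microdata, "region", "age")
```

Here `counts_by_region_sex[("South", "F")]` is 2 and `mean_age_by_region["North"]` is 42.5: the individual records are no longer visible, only characteristics of groups of objects.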

1.2. Statistical Data Management Systems

7. A statistical data management system (SDMS) is a system that can model, store and manipulate data in a manner well suited to users who want to perform statistical analyses on the data. SDMSs offer process-oriented feature sets that take users from data capture, through statistical data validation and production, to information dissemination.

8. Statistical data analysis functionalities, including data validation, standardization support, metadata management and indicator management, are among the core features that differentiate SDMSs from ordinary database systems.

9. Statistical data management systems are expected to:

(a) Increase the quality of the statistical information produced;

(b) Improve processes of statistical data analysis; and

(c) Modernize and increase the quality of data dissemination.

1.3. Justification of the assignment

10. National, regional and subregional statistical offices and organizations often need to compile data from various sources and disseminate them to diverse user communities. That need should be a determining factor in choosing a statistical data management platform, which presupposes that the officers responsible for the selection have adequate knowledge of the capabilities of the various offerings. That is not always the case, and some offices therefore end up with systems that do not fully satisfy their needs. Others have implemented multiple systems to benefit from their complementarity.

11. While it is not necessarily wrong to implement multiple systems if the situation warrants it, member States have expressed the need for guidance on the capabilities of the various options so that they can make informed decisions about the optimum platform (or platforms) for their particular environments. This handbook is therefore intended to address this need by documenting the features of existing platforms. It also presents guidelines to be followed in selecting the platform required for the task at hand.

1.4. Organization of this document

12. This document is organized as follows. Section 2 presents the project definition, describing the objective, mode of operation and scope. Section 3 outlines the critical requirements of a statistical data management system; this is by no means an exhaustive list of features, but is intended to serve as a reference for organizations. Section 4 documents the features of major statistical data management systems currently in use in member States and partner institutions. This is a plain documentation of SDMS features and should not be considered a feature comparison. Section 5 outlines the system selection guidelines, describes the major factors that influence SDMS selection and presents the steps to be followed in selecting the right SDMS for an organization. Finally, concluding remarks and recommendations are presented in Sections 6 and 7, respectively.

II. PROJECT DEFINITION

2.1. Objective

13. The main objective of this initiative is to produce a publication that documents the characteristic features of major statistical data management platforms, to serve as a guide for member States implementing data management services.

2.2. Mode of operation

14. To achieve the above objective, participatory design principles were strictly followed in implementing this initiative. Participatory design is an approach that emphasizes the active involvement of all stakeholders throughout the implementation of an initiative. The approach promotes participative communication and learning among stakeholders (including system vendors, experts, system users and management) and is also known for reducing last-minute surprises by keeping participants informed gradually and continuously throughout the project.

15. To that end, the following activities were carried out in the course of the initiative:

(a) An expert group, comprising individuals from different countries and institutions, was formed to support the initiative;

(b) An online discussion forum was set up to exchange ideas on selecting a suitable statistical data management platform;

(c) An expert group meeting was held, and valuable feedback and suggestions on the draft handbook were provided following the discussions;

(d) Questionnaires were designed and distributed to three types of stakeholders, namely national statistical offices, experts and system vendors (see Annex);

(e) A selected site was physically observed to gauge how comfortable users were with the system; other working environments for the system were also taken into consideration;

(f) A review of technical specifications for selected statistical data management and dissemination platforms was conducted; and

(g) Demonstrations of selected statistical data management and dissemination systems were undertaken.

16. In general, this document was produced through intensive communication and discussion with all stakeholders, including via an online discussion forum, emails, telephone discussions and the distribution of questionnaires.

2.3. Scope of work

17. This initiative focused on macrodata management systems, identified by member States as the area of immediate need. Microdata management platforms will be dealt with separately, as the needs in that area are different.

18. The project is also limited to analysing and documenting statistical data management platforms that are currently in use in the national statistical offices of member States and/or partner institutions. Systems deployed elsewhere are not given much attention in this document.

III. MAJOR REQUIREMENTS OF STATISTICAL DATA MANAGEMENT SYSTEMS

3.1. Data capture

19. First and foremost, a statistical data management system should allow users to capture statistical data. The main requirement is that the system can capture all the data that users intend to store. The system should also offer appropriate data entry schemes. Some users might compile their data in other software, such as MS Excel, and need to import them into the system in batch mode.
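Batch import of data prepared in a spreadsheet can be sketched as follows, assuming the spreadsheet has been saved as CSV with the hypothetical columns `indicator`, `area`, `year` and `value`. The function name `import_batch` and the sample codes are illustrative only; the point shown is that incomplete rows are set aside for correction rather than silently dropped.

```python
import csv
import io

REQUIRED = ("indicator", "area", "year", "value")  # hypothetical column layout

def import_batch(csv_text):
    """Parse a CSV export (e.g. saved from a spreadsheet) into typed records.

    Rows with an empty required column or an unparseable number are returned
    separately, with their row number, so they can be corrected and re-imported.
    """
    accepted, rejected = [], []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_no, row in enumerate(reader, start=2):  # row 1 holds the header
        if any(not (row.get(col) or "").strip() for col in REQUIRED):
            rejected.append((row_no, row))
            continue
        try:
            accepted.append({
                "indicator": row["indicator"].strip(),
                "area": row["area"].strip(),
                "year": int(row["year"]),
                "value": float(row["value"]),
            })
        except ValueError:
            rejected.append((row_no, row))
    return accepted, rejected

# Illustrative file with hypothetical codes; the second data row has no value.
sample = """indicator,area,year,value
IND_01,AREA1,2010,12.5
IND_01,AREA2,2010,
"""
ok, bad = import_batch(sample)
```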

20. The system is also expected to validate data at the time of entry. Data validation is a critical feature of SDMSs.
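Validation at entry time can be sketched as a list of rules checked before a record is saved. The rules, field names and area codes below are hypothetical assumptions; a real SDMS would typically derive them from its metadata rather than hard-code them.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Rule:
    field: str
    check: Callable[[object], bool]
    message: str

# Hypothetical rules; in practice they would come from the system's metadata.
RULES = [
    Rule("year", lambda v: isinstance(v, int) and 1950 <= v <= 2100, "year out of range"),
    Rule("value", lambda v: isinstance(v, (int, float)) and v >= 0, "value must be non-negative"),
    Rule("area", lambda v: v in {"AREA1", "AREA2"}, "unknown area code"),
]

def validate(record: dict, rules: List[Rule] = RULES) -> List[str]:
    """Check a record as it is entered; an empty list means it can be saved."""
    errors = []
    for rule in rules:
        if rule.field not in record:
            errors.append(f"{rule.field}: missing")
        elif not rule.check(record[rule.field]):
            errors.append(f"{rule.field}: {rule.message}")
    return errors
```

Returning every error at once, rather than stopping at the first, lets the person keying in data correct a record in a single pass.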

21. Commercial word processing packages such as Microsoft Word offer AutoText, now extended into Building Blocks, to facilitate data entry. In the word processing context, building blocks are stored snippets, which can contain formatted or unformatted text, graphics and other objects, that the user defines once and inserts into a document when needed. The building blocks concept can also be applied in SDMSs to speed up data entry and reduce errors.
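One way the building blocks idea might carry over to an SDMS is sketched below, under the assumption that many entries share the same source, unit and frequency. The block names and contents are hypothetical: a stored block pre-fills the repeated fields, and the user types only what differs.

```python
# Hypothetical named snippets ("building blocks") that pre-fill fields that
# repeat across many entries; the user supplies only what differs.
BUILDING_BLOCKS = {
    "census_pop": {"source": "Population census", "unit": "persons", "frequency": "decennial"},
    "lfs_quarterly": {"source": "Labour force survey", "unit": "percent", "frequency": "quarterly"},
}

def new_entry(block_name: str, **fields) -> dict:
    """Start a record from a stored building block, then apply user-entered fields."""
    if block_name not in BUILDING_BLOCKS:
        raise KeyError(f"unknown building block: {block_name}")
    entry = dict(BUILDING_BLOCKS[block_name])  # copy so the stored block is never modified
    entry.update(fields)
    return entry

row = new_entry("lfs_quarterly", area="AREA1", period="2011-Q2", value=4.2)
```

Copying the block before updating it keeps the stored template unchanged, so it can be reused for the next entry.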

22. Pulling data through web services from third-party database systems is also a crucial data capture feature that most SDMSs are required to possess.
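Pulling data from a web service can be sketched with the Python standard library. The JSON response shape assumed here (a top-level `data` list of observation objects) is hypothetical, since each real service defines its own structure. The download function is passed in as a parameter so the parsing logic can be tested without network access.

```python
import json
import urllib.request
from typing import Callable, List

def fetch_json(url: str) -> object:
    """Download and decode a JSON document over HTTP(S)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))

def pull_observations(url: str, fetch: Callable[[str], object] = fetch_json) -> List[dict]:
    """Pull observations from a third-party web service.

    Assumes a hypothetical response shaped like
    {"data": [{"indicator": ..., "area": ..., "year": ..., "value": ...}, ...]}.
    """
    payload = fetch(url)
    records = []
    for item in payload.get("data", []):
        if item.get("value") is None:  # skip observations the source reports as missing
            continue
        records.append({
            "indicator": item["indicator"],
            "area": item["area"],
            "year": int(item["year"]),
            "value": float(item["value"]),
        })
    return records
```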

3.2. Data storage and retrieval

23. Statistical organizations are responsible for collecting and storing huge amounts of statistical data in order to provide decision makers, researchers and the general public with timely and accurate information. Given the volume of data maintained and users’ expectations of and demands for quality data, storage and access need to be supported by a robust statistical database system.

24. Storage and retrieval is, therefore, one of the major requirements of any statistical database system. Such a system needs to store huge amounts of data in a systematic manner. It should also offer a flexible, intuitive and simple retrieval module that helps decision makers, the general public and other users with limited technical expertise to access information in the database.
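Systematic storage and simple retrieval can be sketched with SQLite from the Python standard library. The single-table schema (indicator, area, year, value) is a simplifying assumption, not the design of any system described in this handbook. The `retrieve` helper shows how a retrieval module can expose filters to users who never write SQL.

```python
import sqlite3

def create_store(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS observation (
            indicator TEXT NOT NULL,
            area      TEXT NOT NULL,
            year      INTEGER NOT NULL,
            value     REAL,
            PRIMARY KEY (indicator, area, year)  -- one value per cell
        )""")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_obs_area ON observation(area, year)")
    return conn

def store(conn, rows):
    # INSERT OR REPLACE lets a revised figure overwrite the earlier one.
    conn.executemany(
        "INSERT OR REPLACE INTO observation (indicator, area, year, value) VALUES (?, ?, ?, ?)",
        rows)
    conn.commit()

def retrieve(conn, indicator, area=None, years=None):
    """Filtered retrieval; parameters are bound, never pasted into the SQL text."""
    sql = "SELECT area, year, value FROM observation WHERE indicator = ?"
    params = [indicator]
    if area is not None:
        sql += " AND area = ?"
        params.append(area)
    if years is not None:
        sql += " AND year BETWEEN ? AND ?"
        params.extend(years)
    sql += " ORDER BY area, year"
    return conn.execute(sql, params).fetchall()
```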

3.3. Data processing and dissemination

25. Any statistical data management system is expected to perform data processing activities such as coding, editing and data harmonization. Once the data are processed and the required adjustments are made, the system should provide a dissemination facility.
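Two of the processing steps named above, coding and harmonization, can be sketched as lookups. Both the classification mapping and the conversion table are hypothetical examples: coding assigns a classification code to free-text responses, and harmonization expresses quantities reported in different units on a common scale (here, tonnes).

```python
import math

# Hypothetical classification: free-text responses mapped to codes.
SECTOR_CODES = {
    "agriculture": "A",
    "farming": "A",
    "manufacturing": "C",
    "retail trade": "G",
}

# Hypothetical conversion factors used to harmonize units before storage.
TO_TONNES = {"t": 1.0, "kg": 0.001, "quintal": 0.1}

def code_sector(text: str) -> str:
    """Coding: map a free-text answer to a classification code ('UNCODED' if unknown)."""
    return SECTOR_CODES.get(text.strip().lower(), "UNCODED")

def harmonize_quantity(value: float, unit: str) -> float:
    """Harmonization: express quantities reported in different units in tonnes."""
    return value * TO_TONNES[unit]
```

Returning `UNCODED` instead of guessing leaves unmatched responses visible for manual coding during the editing stage.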

26. Nowadays, the Internet is the most widely used dissemination medium. It offers a number of functional features:

(a) Electronic mail serves as a common platform for sending electronic messages. It is best suited to periodic reports for a selected, predefined user community;

(b) Websites are used to publish statistical information at a specified location on the Internet for the general public; and

(c) Websites also allow statistical data files to be downloaded in different formats (Excel, PDF, Word, etc.) and are increasingly becoming a dissemination channel for statistical data. They offer a simple, comparatively cheap and efficient way to provide timely information to the core users of statistics as well as to a broader audience.

27. Most statistical database systems therefore possess a facility to publish information in a web-readable format. Accordingly, web publishing capability is a critical SDMS requirement.
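A minimal sketch of web publishing might render the same table both as an HTML page fragment and as a CSV download. The function names are hypothetical. Cell text is escaped with `html.escape` so values such as "A&B" display correctly in a browser.

```python
import csv
import html
import io

def to_html_table(header, rows, caption=""):
    """Render a table as HTML, escaping cell text so it displays safely in a browser."""
    parts = ["<table>"]
    if caption:
        parts.append(f"<caption>{html.escape(caption)}</caption>")
    parts.append("<tr>" + "".join(f"<th>{html.escape(str(h))}</th>" for h in header) + "</tr>")
    for row in rows:
        parts.append("<tr>" + "".join(f"<td>{html.escape(str(c))}</td>" for c in row) + "</tr>")
    parts.append("</table>")
    return "\n".join(parts)

def to_csv(header, rows):
    """Produce a CSV download that spreadsheet users can open directly."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()
```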

3.4. Standard data sharing and exchange

28. National statistical offices face tremendous pressure to provide reports to other organizations, including Government offices, international development organizations and partners. At the same time, NSOs need to capture data in different formats from various sources, including partner institutions. These activities are performed frequently and entail large data flows. Keying in such data manually is resource-intensive, tedious and error-prone, and should be reduced as far as possible.

29. Synergies, standardization and optimization of processes and infrastructures are key to meeting this challenge. Standard exchange formats such as Statistical Data and Metadata Exchange (SDMX) can improve the quality and efficiency of data and metadata exchange and dissemination through:

(a) Harmonization and coherence of data;

(b) Preservation of meaning, by coupling data with metadata that define and explain them accurately;

(c) Use of an open format such as XML rather than a proprietary one; and

(d) Facilitation and standardization of the use of new technologies such as XML and Web services. Many NSOs are already using, or are planning to use, XML as the basis for their data management and dissemination systems. By choosing SDMX, the proliferation of many XML grammars could be avoided.
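The ideas behind standard exchange can be sketched in a simplified form. In SDMX, a data structure definition (DSD) fixes the ordered dimensions of a dataset and the code list allowed for each one, and a series is identified by a dot-separated key of dimension codes in DSD order. The DSD below is hypothetical. The XML element names are only loosely modelled on SDMX-ML, and the output is not valid against the official SDMX schemas.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical DSD: ordered dimensions and the codes allowed for each.
DSD = {
    "FREQ": {"A": "Annual", "Q": "Quarterly"},
    "REF_AREA": {"ET": "Ethiopia", "KE": "Kenya"},
    "INDICATOR": {"POP": "Total population"},
}

def series_key(**dims):
    """Build a dot-separated series key in DSD order, rejecting codes not in the code lists."""
    parts = []
    for dim, codes in DSD.items():
        code = dims[dim]
        if code not in codes:
            raise ValueError(f"{code!r} is not a valid code for {dim}")
        parts.append(code)
    return ".".join(parts)

def to_xml(dims, observations):
    """Serialize one series to a simplified XML message (not schema-valid SDMX-ML)."""
    series = ET.Element("Series", {"key": series_key(**dims)})
    for dim, code in dims.items():
        ET.SubElement(series, "Value", {"id": dim, "value": code})
    for period, value in observations:
        ET.SubElement(series, "Obs", {"TIME_PERIOD": period, "OBS_VALUE": str(value)})
    return ET.tostring(series, encoding="unicode")
```

Because every partner validates against the same code lists, a key such as `A.KE.POP` means the same thing to sender and receiver. This is the harmonization and preservation of meaning listed in (a) and (b).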

3.5. Metadata management

30. Metadata are defined as data about data, and refer to the definitions, descriptions of procedures, methodologies, system parameters and operational results that characterize and summarize statistical data. Metadata describe different quality aspects of statistical data, such as file contents and definitions of objects, populations, variables, etc. This includes details on data accuracy, for example descriptions of the differences between the observed or estimated values and the true values of variables and statistical characteristics. Metadata can include information on which statistical data are available, where they are located and how they can be accessed. Metadata may also contain a description of the content and layout of the data, and of the rules for validation, aggregation and report preparation. In other words, metadata describe the meaning, accuracy, availability and other important characteristics of the underlying data. These characteristics are essential for correctly identifying and retrieving the statistical data relevant to a specific problem, as well as for correctly interpreting and reusing the data.

31. Metadata are critical because data become usable only through their accompanying documentation. Without a description of their various elements, data resources appear to the end user as more or less meaningless collections of numbers. Metadata provide the bridge between the producers of data and their users, and convey information that is essential for secondary analysis.

32. Because metadata are critical, metadata management is one of the core requirements of SDMSs. This feature manages the metadata required to define the content, quality, security, accessibility and other aspects of the actual database. Through its metadata management module, the system is expected to present a description of data content and layout, as well as a description of the rules for validation, aggregation and report preparation.
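A metadata record that carries layout, validation and aggregation rules can be sketched as a small data class. The class and its fields are hypothetical, not the schema of any particular SDMS. The example also encodes a common statistical caution: ratios and rates are flagged as not additive, because they should be recomputed from aggregated numerators and denominators rather than summed.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class IndicatorMetadata:
    """Hypothetical metadata record stored alongside an indicator's data."""
    code: str
    name: str
    unit: str
    definition: str
    source: str
    valid_range: Tuple[float, float]   # validation rule
    aggregation: str = "sum"           # aggregation rule: "sum" or "not_additive"
    layout: List[str] = field(default_factory=lambda: ["area", "year", "value"])  # column layout

    def is_valid(self, value: float) -> bool:
        """Apply the validation rule recorded in the metadata."""
        low, high = self.valid_range
        return low <= value <= high

    def aggregate(self, values: List[float]) -> float:
        """Combine sub-national values according to the recorded aggregation rule.

        Indicators marked "not_additive" (ratios, rates) are refused here; they
        should be recomputed from aggregated numerators and denominators.
        """
        if self.aggregation == "sum":
            return sum(values)
        raise ValueError(f"{self.code} cannot be aggregated directly ({self.aggregation})")
```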

33. Standardization of metadata elements makes information sharing more reliable and universal. The use of metadata standards enables producers to describe data sets fully and coherently, and facilitates data discovery, retrieval and use. The Data Documentation Initiative (DDI) is an example of a metadata standard used for documenting data sets; it is designed to be fully machine-readable and machine-processable.

Metadata standard compliance is therefore another critical requirement that an SDMS needs to demonstrate.
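A compliance check can be sketched as a comparison of a metadata record against a profile of required elements. The profile below is a hypothetical illustration. Real standards such as DDI or SDMX define their own, much richer element sets, and this list does not reproduce either of them.

```python
# Hypothetical profile listing the elements a metadata record must carry.
# Real standards (e.g. DDI, SDMX) define their own, far richer element sets.
REQUIRED_ELEMENTS = {
    "title": str,
    "definition": str,
    "unit_of_measure": str,
    "source": str,
    "reference_period": str,
    "contact": str,
}

def compliance_report(record: dict) -> dict:
    """List missing and wrongly typed elements of a metadata record."""
    missing = [k for k in REQUIRED_ELEMENTS if not record.get(k)]
    wrong_type = [k for k, t in REQUIRED_ELEMENTS.items()
                  if record.get(k) and not isinstance(record[k], t)]
    return {"compliant": not missing and not wrong_type,
            "missing": missing, "wrong_type": wrong_type}
```

A report like this lets an office see exactly which elements to add before its metadata can be exchanged with partners.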

3.6. Indicators management

34. Statistical indicators are quantitative data that provide evidence about the quantity, quality or standard of an entity. The following are some examples of indicators collected by the World Bank:

(a) Expenditure per student, primary (% of GDP per capita);

(b) Public spending on education, total (% of government expenditure);

(c) Expenditure per student, secondary (% of GDP per capita); and

(d) Pupil-teacher ratio (primary).
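The arithmetic behind indicators of this kind can be sketched with simplified definitions. These formulas are illustrative assumptions and do not reproduce the World Bank's exact methodology (for example, which expenditure items are included). Each indicator is a ratio of underlying statistics, which is why an SDMS usually stores the components as well as the result.

```python
def pupil_teacher_ratio(pupils: int, teachers: int) -> float:
    """Average number of pupils per teacher at a given level of education."""
    return pupils / teachers

def expenditure_per_student_pct_gdp_pc(expenditure: float, students: int,
                                       gdp: float, population: int) -> float:
    """Spending per student expressed as a percentage of GDP per capita."""
    per_student = expenditure / students
    gdp_per_capita = gdp / population
    return 100 * per_student / gdp_per_capita

def education_share_of_gov_expenditure(education: float, total: float) -> float:
    """Public spending on education as a percentage of total government expenditure."""
    return 100 * education / total
```

With illustrative figures of 4,500,000 pupils and 90,000 teachers, the pupil-teacher ratio is 50. Spending of 900 million on those 4.5 million students (200 per student), in an economy with a GDP per capita of 300, gives about 66.7 per cent of GDP per capita.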

35. In most cases, SDMSs should enable users to create new indicators and manage existing ones. Management may include operations such as grouping indicators into thematic categories, deleting existing indicators or making other modifications.
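The indicator management operations just listed can be sketched as a small in-memory catalogue. The class name, method names and indicator codes are hypothetical. A real SDMS would persist the catalogue in its database and link each entry to its metadata.

```python
class IndicatorRegistry:
    """Hypothetical in-memory catalogue of indicators and their thematic groups."""

    def __init__(self):
        self._indicators = {}  # code -> {"name": ..., "theme": ...}

    def create(self, code, name, theme):
        if code in self._indicators:
            raise ValueError(f"indicator {code} already exists")
        self._indicators[code] = {"name": name, "theme": theme}

    def modify(self, code, **changes):
        self._indicators[code].update(changes)  # raises KeyError if the code is unknown

    def delete(self, code):
        del self._indicators[code]

    def by_theme(self, theme):
        """Return the codes of all indicators in a thematic group, sorted."""
        return sorted(c for c, meta in self._indicators.items() if meta["theme"] == theme)
```

For example, after creating `EDU_01` and `EDU_02` under "Education", moving `EDU_02` to another theme with `modify` leaves only `EDU_01` in the Education group.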

3.7. Integration with other systems

36. In practice, an organization's processes are rarely managed by a single software system. For various reasons, most organizations deploy multiple technology solutions over time to manage their day-to-day activities. Because these systems ultimately serve the goals of a single organization, the need to integrate them arises. The same requirement may arise with statistical data management systems.