Statistics Netherlands

Division of Methodology and Quality

P.O.Box 24500

2490 HA Den Haag

The Netherlands

Standardisation and Differentiation in Redesigns

Focussing on Redesign Approach,
Standard Toolbox and Introduction of R

Frank Hofman and Mark van der Loo


Summary: Statistics Netherlands (SN) faces a number of major challenges: improving efficiency and the quality of key statistics, while lowering the administrative burden at the same time. One of the measures to meet those challenges is to promote the reuse of data, methodology, processes and IT systems and to implement standards to support the intended reuse.

SN has developed a number of standards, of which this paper focuses on the Redesign Approach and the Standard Toolbox. The Redesign Approach provides a standard way of working in redesign projects. This helps projects to start up quickly and supports the portability of the documents produced and the interchangeability of the project staff.

The Standard Toolbox provides a set of tools for the small, ‘self-supporting’ projects to automate their statistical process. The aim is to provide maximum functionality with a minimal set of tools, so costs can be reduced, integration is easier and the knowledge becomes more broadly available.

A recent addition to the Standard Toolbox is R, a powerful statistical tool for self-supporting projects. Through so-called packages, R facilitates reuse: a package provides a specific function and can be incorporated by any other user.

Keywords: Redesign Approach, Standard Toolbox, Tools, R, Standards, BAD, MAD, SAD, self-supporting

1. Introduction

This paper first gives an overview of some forms of standardisation. It then discusses two of these forms: the redesign approach and the standard toolbox. The introduction of the tool R is presented as a case study for the introduction of new tools. Finally, we attempt to initiate further discussion with a number of statements and discussion points.

1.1 Context

Statistics Netherlands is in the middle of radical changes. A number of very heterogeneous driving forces pose major challenges that can only be met when the way the institute operates is thoroughly reconsidered. The main challenges are to improve efficiency and quality of key statistics, while at the same time lowering the administrative burden considerably.

In order to stay in control of these challenges, an ambitious modernisation programme, the Master plan ‘Counting on Statistics’ (Ypma and Zeelenberg, 2007), was started in 2005. An important measure in this Master plan is to promote reuse of (meta)data, methods, processes and IT systems (Braaksma, 2009). Reuse of data lowers the administrative burden and improves both efficiency and quality (e.g. coherence). Reuse of methods, processes and tools mainly contributes to efficiency, but also to the quality of the output.

1.2 Standardisation

Standardisation supports the reuse of (meta)data, methods, processes and IT systems. Applying standards simplifies the comparison of two instances, e.g. two processes, and thereby helps to recognise potential reuse.

Table 1 shows, for each of the aspects mentioned, the means of standardisation that SN has developed.

Table 1: Means of standardisation

Aspect and Means / Explanation
Data
  • Steady States
/ Steady states are a key element in SN’s architecture. A steady state is a data set and its metadata with guaranteed quality that is made available for reuse through the Data Service Centre. The steady states describe not only the output products, but explicitly the raw input and intermediate products as well.
  • Data Service Centre (DSC)
/ The DSC is the generic business service that stores all steady states and makes them available for reuse by other statistical processes. The DSC is the treasury of SN, (ultimately) containing all steady states: raw input data, intermediate products and statistical output.
  • Coordination
/ Coordination aims to standardise classifications (code lists), populations, the definitions of variables etc.
Methodology
  • Methods Series
/ The Methods Series consists of scientifically based, well documented and proven methods, which are the preferred methods to use in a specific statistical process.[1]
Processes
  • Business Architecture
/ The Business Architecture consists of a number of principles and models guiding the design of a specific statistical process. For example, it introduces the steady states and requires them to be designed for each statistical process. Another example is the mandatory use of the Generic Business Services. See Huigen (2006) and Renssen (2007).
  • Generic Business Services
/ The Generic Business Services provide common functions that do not require specific statistical knowledge. The first service, the Data Service Centre, has already been addressed. The other formal service is the Data Collection Centre, where ultimately all primary data (collected by questionnaires) and secondary data (from registers) are collected.
  • Standard Process Steps
/ This research project, which has only recently started, aims to provide standard building blocks that can be used to construct a specific statistical process. A process designer can then choose the building blocks he needs and put them together to form his process. Examples of Standard Process Steps are data verification and unit matching. The Standard Process Steps bridge the gap between standard methods and standard IT components (tools). See Renssen et al (2010).
IT systems
  • IT Architecture
/ As the Business Architecture guides the design of processes, the IT Architecture provides principles to guide the use and development of IT systems to support the statistical process.
  • Standard Toolbox
/ The Standard Toolbox provides a list of tools that can be used for smaller projects in order to build the IT system to support their processes. In section 3 we will elaborate upon the standard toolbox.
Way of working
  • Project management, Software development and Testing
/ SN has adopted a number of industry standards for non-statistical processes: Prince2 for project management, Rational Unified Process (RUP) for software development and TMap for testing.
  • Redesign approach
/ The redesign approach supports the many redesign projects with a standardised way of working. It combines the adopted industry standards and adds some aspects that are specific for a statistical institute. Section 2 discusses the redesign approach in more detail.

1.3 Differentiation

In this paper we focus on the redesign projects during which the (meta)data, methods, process and/or IT systems of a specific statistical process are redesigned in accordance with the standards mentioned in section 1.2. Since these redesign projects vary considerably, it would be unwise to blindly make the same standards compulsory for all kinds of projects.

We therefore distinguish between two kinds of projects: A projects and C projects, generally spoken of as small and large projects. Although the informal names may suggest otherwise, this distinction is not only based on the size of the project, but on a combination of aspects:

  • Size of the project: the amount of work needed for the project
  • Size of the organisational unit responsible for the statistical output
  • Complexity of the process, methodology and IT systems
  • Importance of the statistical output, and hence the impact of mistakes in that output

There is no formula to determine whether a project is to be classified as a small or large project. For example, a project may be small in size but involve very complex methodology. It will then probably be classified as a C (large) project. A central board determines a project's classification by weighing the scores on each of these aspects.

Differentiation in Standards

The general idea for the small projects is that the responsible organisational units can support themselves. They are supposed to be self-supporting. The combination of their statistical knowledge, general skills and the available standards for methodology, processes and IT systems is sufficient to successfully implement the needed changes.

The main differences in the standards for large and small projects concern those for the IT systems and for the way of working. Concerning data, methodology and processes the same standards apply for both small and large projects.

For small projects, supporting IT systems may only be developed using the standard tools, whereas for large projects the IT systems are developed by the central IT department and consist largely of complex custom-built components.

As mentioned before, the small projects are staffed locally, mainly with statisticians. Other competences, like methodologists, business analysts or tool experts, are available, but the majority of the work is executed by local staff. Large projects, by contrast, are mainly staffed by central competences, in combination with local statisticians for the necessary specific statistical knowledge.

Since small projects are staffed locally, they are also prioritised and approved by local management, whereas a central board prioritises and approves the large projects.

The final differentiation in the way of working is the required documentation. Large projects are required to provide a full set of design documents, whereas smaller projects may limit themselves to a smaller set of documents. Section 2.3 discusses this differentiation in more detail.

2. Standard Redesign Approach

2.1 The Redesign Process

The redesign process has been developed to support redesign projects with a standard way of working. Before that, each project had to determine its own approach, especially for the part of the process prior to the system development. This was not only inefficient for the project, but also limited the portability of documents and the interchangeability of the project staff.

The redesign process gives a number of related activities that need to be carried out during an average redesign project. Each activity is assigned to a specific role (indicating a set of competences) and produces (a part of) a document (or other artefact).

The redesign process incorporates some well-known standards[2] for parts of the redesign process and adds some parts that are more specific for a statistical institute. In this paper we focus on those non-standard parts of the redesign process. Figure 1 shows the redesign process, although one key element is omitted: the statistician, whose specific knowledge of his statistical domain is vital for (almost) all activities shown. Although the figure may suggest a strict sequential order, the activities may also occur in parallel or iteratively. In this section we will examine each sub-process of the redesign process.

For a more detailed description of the redesign approach, the redesign documents and some reflections on the approach see Hofman (2011) or Hofman and Leerintveld (2010).

Preliminary Investigation & Project Setup

For each (redesign) project the project manager has to draw up a project proposal, describing, among other things, the business case and goals. Since SN uses Prince2 for project management, the project proposal is described in a project brief.

A project proposal is usually preceded by a preliminary investigation. The investigation usually starts with the as-is situation, briefly describing the current process to indicate problems and/or desired changes. During the investigation, alternatives for the future situation are examined. Balancing the pros and cons of each alternative results in a choice by the steering committee, which sets the course for the actual redesign project. A business analyst is responsible for the investigation and captures the results in a document ‘Preliminary Investigation’.


Figure 1: Redesign process

Statistical Design

Since this part of the process is most specific for a national statistical institute (NSI), we discuss the statistical design in more detail than the other steps. The enterprise architecture of SN identifies five steps in (re)designing a statistic, of which the first four are compressed into the first step (Design products & process) in Figure 1:

  1. Determine statistical information needs

As a basic principle, the design process is output-driven. So we start by determining the statistical information needs. During this step, we identify our customers and their needs.

  2. Design statistical product

Knowing the needs of our customers, we can now design the steady states for the actual statistical products. We determine the table(s) to be produced, the population, variables and aggregation levels as well as the quality metadata (indicators and standards) and the publication frequency of the statistic.

  3. Design data sources

Knowing what the statistical process has to deliver, we can now turn to the means of achieving the desired output. This means we have to design the steady states for the input data sources needed to produce the output. The description of the input data sources is comparable to that of the output: conceptual metadata and quality metadata.

  4. Design process model

In the process model, we describe the steps (activities) to be completed to produce the statistical product (output) from the data sources (input), incorporating the generic business services. For each step we describe its goal, input and output. The input may include auxiliary information, for example the calculated weights needed for “grossing up” sample survey results to make them representative of the target population.

The flow through all steps is as important as the steps themselves, especially when the flow is complex, containing branches and loops. The criteria used for decisions, like the stop criteria of loops, are often derived from the quality needs of the statistical product.

  5. Design methodology

The final step in designing a statistic is choosing the suitable methodology from the Methods Series, to produce the output at the required quality from the selected input. In practice, the main methodology is often explored before the process model is designed. Then, in iterations, the methodology and process model are elaborated upon.

Just like the entire process, these steps are generally more iterative and parallel than strictly sequential. The first four steps are conducted by a Business Analyst, capturing the results in the Business Analysis Document (BAD). A methodologist is responsible for the last step which results in a Methodological Advisory Document (MAD).
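As an illustration of the process model in step 4, the following minimal Python sketch represents a process as a sequence of steps, each with a goal, input and output, and uses auxiliary weights to gross up sample results. All names (`ProcessStep`, `gross_up`, `run_process`) and the sample figures are hypothetical and not taken from SN's systems. The weights are assumed to be simple expansion weights N/n, which is only one of several possible weighting methods.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ProcessStep:
    """One step of a process model: its goal, input and output (hypothetical)."""
    goal: str
    input_name: str
    output_name: str
    run: Callable[[dict], object]  # reads the data store, returns the step's output


def gross_up(values, weights):
    """Estimate a population total as the weighted sum of the sample values."""
    return sum(w * y for w, y in zip(weights, values))


def run_process(steps, store):
    """Execute the steps in order; each output is stored under its output name."""
    for step in steps:
        store[step.output_name] = step.run(store)
    return store


# Assumed example: a sample of n = 4 units from a population of N = 100,
# so every unit gets the expansion weight N / n = 25.
store = {"sample_turnover": [10.0, 12.0, 8.0, 14.0], "population_size": 100}

steps = [
    ProcessStep(
        goal="Calculate weights (auxiliary information)",
        input_name="sample_turnover",
        output_name="weights",
        run=lambda s: [s["population_size"] / len(s["sample_turnover"])]
        * len(s["sample_turnover"]),
    ),
    ProcessStep(
        goal="Gross up sample results to the target population",
        input_name="weights",
        output_name="estimated_total",
        run=lambda s: gross_up(s["sample_turnover"], s["weights"]),
    ),
]

result = run_process(steps, store)
print(result["estimated_total"])  # 25 * (10 + 12 + 8 + 14) = 1100.0
```

A real process model would also contain branches and loops with stop criteria; these are left out here to keep the sketch short.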

Software Development

To go from statistical design to information system, SN combines the statistical design process with the RUP as system development process. In this paper, we will briefly describe the requirements discipline and the software architecture part of the analysis and design discipline.

The requirements discipline is crucial for a smooth transition from statistical design to system development. The information analyst captures the system requirements in the Vision document and elaborates the system's usage in the Use Case Model Survey. The detailed description of the system's function is captured in several Use Cases.

The software architectures of the individual projects play a crucial role in managing the overall IT landscape. An important guiding principle is to reuse existing software within SN first, then to buy standard software, and only then to develop bespoke software. The software architect describes the project's software architecture in a Software Architecture Document (SAD).

Central Approval Compliance Testing

All documents are reviewed within the redesign project. Additionally, some documents have to be reviewed by a central authority. The project proposal is reviewed by the 'Central Portfolio Board'. The board checks the legitimacy of the business case and prioritises the projects.

The Business Analysis Document (BAD), the Methodological Advisory Document (MAD) and the Software Architecture Document (SAD) are all reviewed for compliance with the enterprise architecture of SN and the Methods Series.

2.2 Main Redesign Documents

In the previous section, we have shown which design steps contribute to which documents. To support the redesign projects, templates for each of these documents are available. In this section, we will focus on the main documents themselves: the BAD, the MAD and the SAD.

Business Analysis Document (BAD)

The BAD captures the design of the statistical products and process. The main topics in the BAD are:

  • Context

The context describes the environment of the statistical department. It contains the external and internal customers of the statistic and their needs as well as the suppliers of the data sources.

  • Steady states

This section describes all steady states involved in the process: not only the output, but also the data sources and the intermediary products (at micro or macro level). For each steady state, its conceptual and quality metadata are described, such as its population, variables, frequency and quality.

  • Process model

The process model provides the overview of the entire process and its relation to the data sets. For the exact functioning of each methodological step within the process, it refers to the MAD.
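To make the steady-state descriptions in the BAD more concrete, the sketch below models one such description as a small record and reports which metadata items are still missing. The class name, field names and example values are assumptions made for illustration only; they do not reflect the actual BAD template.

```python
from dataclasses import dataclass, field


@dataclass
class SteadyStateDescription:
    """Hypothetical record for one steady state as described in a BAD."""
    name: str
    kind: str                                      # "source", "intermediate" or "output"
    population: str = ""                           # conceptual metadata
    variables: list = field(default_factory=list)  # conceptual metadata
    frequency: str = ""
    quality: dict = field(default_factory=dict)    # quality indicator -> standard

    def missing_metadata(self):
        """Return the names of the metadata items that are still empty."""
        items = {
            "population": self.population,
            "variables": self.variables,
            "frequency": self.frequency,
            "quality": self.quality,
        }
        return [name for name, value in items.items() if not value]


# Invented example: an output steady state without quality metadata yet.
output = SteadyStateDescription(
    name="Quarterly turnover by industry",
    kind="output",
    population="Enterprises active in the quarter",
    variables=["industry", "turnover"],
    frequency="quarterly",
)
print(output.missing_metadata())  # ['quality']
```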

Methodological Advisory Document (MAD)

The methodology is captured in the MAD. The methodology in the MAD covers the entire statistical process from data collection to the dissemination of the final product and defines the actual operation of each step of the process. The MAD not only states the methods chosen, but also why they have been selected and how they are applied: the parameterisation.

Software Architecture Document (SAD)

In agreement with the RUP, we use Kruchten's[3] 4+1 view model for the SAD. Special attention is paid to the reuse of existing tools, both standard tools and custom-made tools.

2.3 Differentiation in Redesign Approach

Small projects do not have to deliver the full set of design documents. If deemed appropriate, they may compress the set of documentation by applying one or more of the following adjustments:

  • The preliminary investigation may be skipped if the problems and the desired solution are clear.
  • The BAD, MAD and Vision may be combined into a single document containing all functional aspects of the new situation. The main topics of each original document are retained, but compressed into a single document.
  • Use Cases are optional in small projects. Sometimes the system's functionality is simple and limited and can therefore be described in more elaborate Features in the Vision document.

The other documents are used for both types of projects, although they will generally be more compact for small projects.
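The adjustments above can be read as three optional switches on the full document set. The following sketch is only an illustration under that reading. The list contains just the documents named in this paper, so the real set may be larger, and the function name, the parameter names and the title ‘Combined functional document’ are hypothetical.

```python
# Documents named in this paper; the real set may contain more.
FULL_SET = [
    "Project Brief",
    "Preliminary Investigation",
    "BAD",
    "MAD",
    "Vision",
    "Use Case Model Survey",
    "Use Cases",
    "SAD",
]


def document_set(small, skip_preliminary=False, combine_functional=False,
                 skip_use_cases=False):
    """Return the design documents a project delivers.

    The three switches only apply to small projects; large projects
    always deliver the full set.
    """
    docs = list(FULL_SET)
    if not small:
        return docs
    if skip_preliminary:
        docs.remove("Preliminary Investigation")
    if combine_functional:
        # Hypothetical name for the single document replacing BAD, MAD and Vision.
        position = docs.index("BAD")
        docs = [d for d in docs if d not in ("BAD", "MAD", "Vision")]
        docs.insert(position, "Combined functional document")
    if skip_use_cases:
        docs.remove("Use Cases")
    return docs


print(document_set(small=True, skip_preliminary=True,
                   combine_functional=True, skip_use_cases=True))
```

With all three adjustments applied, a small project in this sketch is left with four documents, whereas a large project always gets the full list back regardless of the switches.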