EUROSTAT
Deputy Director-General
Task Force Big Data
Draft
Smart Statistics
The members of the DIME/ITDG are invited to comment on this document and provide advice on all strategic aspects of designing, testing, experimenting, prototyping, evaluating and mainstreaming the development of 'trusted' smart statistics. While the initial phase aims at developing a 'proof of concept' for smart statistics, the members of the DIME/ITDG are welcome to propose already at this stage 'use cases' for experimentation.
In this domain, the focus of our experimental research in the next few years should be on the statistical issues rather than on whether the technological, legal or data-access challenges have been resolved. When we view smart statistics in the context of an IoT-enabled world, the cognitive process is fundamental to how these smart statistics are created. We assume a world where sensors are ubiquitous and their interconnection is seamless (the technical details of making this happen are fundamental and critical, of course, but should remain out of the scope of study for a statistician).
In a network of seamlessly interconnected sensors (admittedly with certain limitations in power consumption), we foresee the realisation of a cognitive process where each sensor on its own, but mainly all sensors as a network, can perceive current social/economic conditions, plan, decide, act on those conditions and learn from the consequences of their actions, all while following end-to-end (policy) goals. This loop, the cognition loop, senses the environment, plans actions according to input from sensors and policies, decides which scenario best fits its purpose using a reasoning engine, and finally acts on the chosen scenario. The system learns from the past (situations, plans, decisions, actions) and uses this knowledge to improve future decisions.
The above is a 'high level' simplified narrative which can be used as a starting point for our discussion; such a discussion could include (but is not limited to) the following questions:
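For illustration only, the cognition loop described here can be sketched as a toy program for a single node. This is a minimal sketch under stated assumptions, not a proposed design: the class name, the three candidate scenarios, the smoothing rule used for learning and the numeric policy goal are all hypothetical and do not come from this document.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class CognitionLoop:
    """Toy sense-plan-decide-act loop for one node (all names hypothetical)."""
    target: float                 # assumed numeric policy goal
    estimate: float = 0.0         # belief about the observed condition
    learning_rate: float = 0.5    # weight given to each new reading
    history: List[Tuple[float, str]] = field(default_factory=list)

    def sense(self, reading: float) -> float:
        # Sense and learn: exponentially smooth the new reading into the belief,
        # so past situations influence later decisions.
        self.estimate += self.learning_rate * (reading - self.estimate)
        return self.estimate

    def plan(self) -> Dict[str, float]:
        # Plan: candidate scenarios, each with its predicted outcome.
        return {
            "lower": self.estimate - 1.0,
            "hold": self.estimate,
            "raise": self.estimate + 1.0,
        }

    def decide(self, scenarios: Dict[str, float]) -> str:
        # Decide: pick the scenario whose predicted outcome is closest to the goal.
        return min(scenarios, key=lambda name: abs(scenarios[name] - self.target))

    def act(self, action: str) -> str:
        # Act: here the action is only recorded alongside the belief at the time.
        self.history.append((self.estimate, action))
        return action

    def step(self, reading: float) -> str:
        # One full pass through the loop for a single sensor reading.
        self.sense(reading)
        return self.act(self.decide(self.plan()))
```

A node created with CognitionLoop(target=20.0) and fed the same reading of 25.0 three times first chooses "raise" twice while its belief is still below the goal, then "lower" once the belief has overshot it. A real system would replace the fixed scenarios with a reasoning engine and coordinate many such nodes as a network.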
- What are the basic components of a system(s) of smart statistics?
- Which are the major statistical issues to address?
- What should a 'Proof of Concept' look like in practice?
- Do we need to develop new data and metadata standards together with IoT Standardisation bodies like ETSI?
- Should we think of 'privacy by design' already at the development of a POC?
- Whom should we partner with on this path towards smart statistics?
- What could be the preferred scenario for implementation?
- Do smart statistics represent new opportunities and challenges for citizen science and citizen data?
- Can you propose some promising use cases for experimentation of smart statistics?
- What are the most likely statistical domains for experimentation?
1. Background
Our societies and economies are facing an ever-increasing degree of digitisation that offers unprecedented opportunities for compiling statistics in various domains. The VIP Big Data addresses these opportunities together with the challenges emerging from the use of big data sources for official statistics. However, the impact of these innovative technological developments on societies and economies will be far more fundamental. This development can be expressed by the term "Datafication", which means "taking all aspects of life and turning them into data" (Cukier & Mayer-Schoenberger, 2013)[1]. Smart devices, electronic networks and the constant production of data on all aspects of life and the environment will become an integral component of how our societies and economies function. Most if not all data a decade from now will be "organic", i.e. by-products of the activities of people, systems and things (including billions of low-end and affordable smart devices connected to the internet, i.e. the Internet of Things (IoT)).
During the coming decade, we will see a massive proliferation of electronic devices and sensors that are connected to the internet. They will generate and communicate huge amounts of data via this network. According to a study by the EC, IDC expects that the number of IoT[2] connections will expand from 1.8 billion in 2013 to almost 6 billion in 2020[3]. An entire ecosystem will be developed around the Internet of Things affecting the physical, the digital and the virtual world.
Figure 1: Internet of Things: knowledge integration[4]
The fundamental question for us is how an official statistics organisation should adapt in order to remain relevant within such a data ecosystem. As such, this proposal covers the entire sphere of policy-relevant statistics and issues associated with a full-fledged digital society. The use cases could address selected issues from the priorities of the Digital Single Market, Energy Union and Climate, and the Internal Market. Together with achieving a digital single market, the conditions for monitoring this market could be developed. In the near future, smart energy grids together with smart meters will be introduced in order to successfully manage the change to climate-friendly energy. This provides opportunities for integrated data production for the energy sector as well as for the industry, social and environment domains. Datafication of business processes and business relations could contribute substantially to monitoring the development of the internal market.
In addition, the introduction of smart appliances could be widely used to support collection of data on households and individuals. Similar opportunities will emerge for a wide range of policy areas where official statistics need to be re-thought and re-designed.
Figure 2: Internet of Things: application areas and integration[5]
The project extends the current work on big data in that it considers the entire ecosystem as an opportunity for intelligent production of relevant data for European Statistics. While the big data project follows a centric approach, accessing data in the places where they are collected, this project analyses the conditions for using the network and its components to produce relevant statistics, largely instantly and in an automated way.
We can think of Smart Statistics as the future system of official statistics, where data capture, processing and analysis will be embedded in the system itself, starting with the digital footprints of the activity. Realising such a system will probably be a very long process. This large-scale study, which will be carried out by means of a competitive tender procedure, constitutes a first input on that ambitious path. In that respect, and in the absence of any prior similar research in this area, it is important to underline the highly exploratory nature of this work.
The project should be seen as an integral operational activity of the ESS Vision and the modernisation of the statistical system, while at the same time it tries to sketch possible extensions for the future, i.e. beyond 2020. It is also aligned with current initiatives on data, information and knowledge management within the Commission, as well as with the data4policy activities (under which a number of lead DGs, including ESTAT, are committed to building a permanent analytics capability in the Commission for the purposes of designing, implementing, monitoring and evaluating European policies). In this regard, Smart Statistics will provide substantive inputs on a variety of important aspects related to the functioning of data4policy amid the ever-increasing digitisation and automation of everyday activities. The lead DGs (and other key policy DGs) will be closely consulted in the definition of application areas and policy priorities which will be included in the technical specifications.
2. Scope and Objectives
2.1 Scope
The underlying reference framework of the proposal consists of the ESS Vision 2020 overarching aim of building the future of European Statistics and in particular the harnessing of new data sources. At the same time, the project also addresses other priorities and critical issues which emerge at the intersection of the physical and digital worlds, such as privacy, ethics, algorithmic transparency, quality, etc. Use cases will be selected in close consultation with users at policy level following the priorities of the Commission. By putting intelligence into all stages of the data lifecycle, the project is expected to enhance the efficiency of the entire statistical system and enable the ESS to maintain and reinforce its role as a key provider of data4policy in a digital world.
The project will concentrate on the development of the Internet of Things and its various applications, e.g.
- mobility,
- smart cities,
- smart homes,
- wearables,
- smart energy,
- smart manufacturing,
- smart energy grids,
- smart farming,
- …
as well as on infrastructural aspects (cloud infrastructures, intelligent gateways, mobile phones, and other controlling devices). In addition, it will analyse horizontal issues, such as security, privacy, algorithmic transparency, or IT.
2.2 Aims and Objectives
The aim of this project is to analyse the technological foundations of a future statistical information system (say 10 years from now), which will operate amid increased digitisation of our societies and greater interaction between the web, the web of data and a multitude of smart environments based on the Internet of Things, such as smart cities and Industry 4.0.
The analysis will be carried out along several dimensions. One dimension will cover architectural aspects. IoT devices, gateways, the network, cloud infrastructures and the business premises could take different roles and perform different activities depending on the purpose of the system, the type of data collected, or the phenomenon to be observed. Data is collected and aggregated by different components of the network. Depending on the amount of intelligence put into these components, they may decide autonomously how to interpret collected data and whether or not to invoke certain actions. In this context, data collection and aggregation follow specific objectives. The study will investigate and analyse the possible role of European statistics in such systems. This may include (but is not limited to) (i) using third-party systems (public or private) under specific agreements or legislation and (ii) developing entirely new data gathering approaches managed by the statistical system itself. A possible scenario for the future could be that statistical offices deploy sensors and devices exclusively for the purpose of generating statistical information. Another scenario would be using existing IoT networks that are deployed for purposes other than statistics in order to extract statistical information. The first approach would correspond to running surveys for statistical purposes, while the latter would correspond to using secondary data sources, such as administrative or big data. Mixed scenarios are also possible.
Figure 3: IoT Architecture
Source: Gartner (September 2014)
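As a hedged illustration of the architectural dimension, the sketch below shows how different tiers of a network could each aggregate data so that only summaries travel upstream. It is a minimal sketch under assumptions: the two tiers (gateway and cloud), the Summary type and the function names are hypothetical, and the mean is chosen only because it can be recombined exactly from counts and totals.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Summary:
    """Mergeable aggregate that a node passes upstream instead of raw readings."""
    count: int = 0
    total: float = 0.0

    def merge(self, other: "Summary") -> "Summary":
        # Combining two summaries gives the same result as summarising all
        # underlying readings at once.
        return Summary(self.count + other.count, self.total + other.total)

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else float("nan")


def gateway_aggregate(readings: List[float]) -> Summary:
    # A gateway condenses the readings of its own devices into one summary.
    return Summary(len(readings), sum(readings))


def cloud_aggregate(summaries: List[Summary]) -> Summary:
    # The cloud tier merges gateway summaries; it never receives raw readings.
    result = Summary()
    for summary in summaries:
        result = result.merge(summary)
    return result
```

For example, cloud_aggregate([gateway_aggregate([1.0, 2.0, 3.0]), gateway_aggregate([5.0])]) yields a summary with count 4 and mean 2.75, the same values a central computation over all four readings would give. Whether the statistical office operates such nodes itself or draws on third-party networks corresponds to the two scenarios discussed in the text.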
The second dimension consists of examining the implications for official statistics by analysing the issues in terms of type of application domain, e.g.
- mobility,
- smart cities,
- smart homes,
- wearables,
- smart energy,
- smart manufacturing,
- smart energy grids,
- smart farming, etc.
Use cases should cover a broad range of the above-mentioned application domains.
The third dimension is closely related to the second, i.e. analysing by policy area. There will be a mapping between policy areas and application domains to identify possible inputs for European policies, specifically those belonging to the stated priority areas of the EU.
The main output of the project will be a proof of concept of the implications of the Internet of Things for Official Statistics. The project should analyse latest developments and future perspectives in various domains that will be affected by the IoT and other smart environments, such as energy, transport, employment, health, population, economics, consumption, etc.
It will elaborate and propose a number of use cases that could be followed up in future projects. Wherever possible, the use cases will be mapped against the 10 priorities of the European Commission and assessed for their potential to provide (near-real-time) policy-relevant information. Additional criteria, such as delivery of tangible intermediate results, could be added.
Particular attention will be paid to issues regarding the potential blending of traditional core survey instruments and sources (LFS, SILC, Health, etc.) with data from electronic networks; another priority in this project is the design of a comprehensive data and information ecosystem in which statistics, visual analytics and dissemination services for EU policies form integral components of the system.
The project comprises an analysis of IoT and smart systems developments and possible application domains that could serve different policy areas. It requires an extensive consultation of stakeholders involved in the development, deployment and use of smart systems with official statistics. The final output should include a detailed analysis of the opportunities, obstacles, anticipated benefits and costs that arise when embedding statistical intelligence into smart systems. It is expected that this project will serve as input to a possible next phase of pilots following the definition of the use cases.
3. Impact Assessment
3.1 Stakeholder Analysis
The national statistical offices, the ESS and statistical organisations at UN level are already engaged in projects using big data sources for official statistics. The ESS launched the VIP BIGD in Sep 2015. The UN Global Working Group on Big Data for official statistics was created in 2014. One reason for modernising official statistics in this area is to enlarge the pool of data sources for producing statistics, at a time when citizens and businesses are less and less willing to reply to questionnaires. At the same time, the emergence of a digital society and economy changes the phenomena that should be measured. Additional objectives for engaging in the project are the potential for decreasing the burden on respondents and increasing efficiency in data production. Considering the characteristics of the data collection systems, quality elements such as timeliness or relevance could be enhanced.
One of the priority areas of the European Commission is the digital single market. The communication of the European Commission “Towards a thriving data-driven economy” sketches the features of the future data-driven economy and sets out fields of activity to support the transition to this future economy. In April 2016 the Communication "Digitising European Industry"[6] (COM(2016) 180) was published. It is accompanied by a staff working document "Advancing the Internet of Things in Europe"[7] that analyses three main pillars for achieving the objective of advancing Europe as a leading region in IoT products and services. The Commission Communication on data, information and knowledge management within the Commission, currently in preparation, calls for sharing data, information and knowledge widely within the Commission and for collaborative working practices as the preferred working method. One of the aims of the Communication is also to improve the availability and use of data and information for better policy making.
Industry is developing the components of IoT systems and is offering services related to running IoT systems, processing data and carrying out analysis. Motivations for developing such systems include preventive maintenance, offering better services in the use of machines and devices related to mobility or health, increasing the efficiency of systems, and reducing the impact of services on the environment. Standardisation activities are currently ongoing to ensure communication between the different components of IoT networks. There are currently different initiatives to develop operating systems suitable for the IoT and its different components.
Private entities are important partners in developing components for a statistical system, especially in terms of software development. It is also important to identify win-win situations in cases where statistical offices are using third-party infrastructure for the purpose of collecting statistical information. In addition, statistical offices could deploy IoT systems for their own purposes, (partially) replacing traditional data collection methods.
At the same time enterprises could profit from introducing statistical components into IoT systems by lowering the use of resources for reporting to statistical offices and by their potential utility for the data ecosystems explored by private entities.
The European Central Bank and other central banks are participating in big data activities of the ESS. Joint developments in areas of mutual interest could be proposed to share the burden of developing data collection systems.
Academia uses IoT for research purposes and is developing new approaches for data analysis in IoT systems. Existing partnerships should be extended to this new area to accompany the study. Possible synergies should be analysed for exploration at a later stage.
The public, on the one hand, will benefit from efficiency gains achieved via IoT systems (energy grid, mobility, smart cities) and will experience better services (smart homes, wearables). On the other hand, increased collection of data and information on the use of devices and related personal data will raise issues of privacy, confidentiality, ethics and misuse of data. Approaches such as privacy by design aim to prevent the misuse of personal data. Automatic aggregation at certain nodes of the network could also contribute to preserving the privacy of citizens. Official statistics have to analyse these approaches in terms of their suitability for the purposes of official statistics. The advantages for citizens would be a wider offer of services and/or a reduced burden, while at the same time their privacy is preserved.
It is therefore expected that Smart Statistics will offer a unique opportunity for addressing some fundamental issues relevant to both industry and society which go beyond official statistics, such as algorithmic accountability and transparency.
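To make the idea of privacy-preserving aggregation at a node more concrete, the following is a minimal sketch, assuming a simple minimum-group-size rule: a node releases a group mean only when enough individuals contribute to it. The rule, the threshold value of 5 and the function name are assumptions introduced for illustration; the document does not specify any particular disclosure control method.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

MIN_GROUP_SIZE = 5  # hypothetical disclosure threshold, not taken from this document


def aggregate_at_node(
    records: Iterable[Tuple[str, float]],
    min_group_size: int = MIN_GROUP_SIZE,
) -> Dict[str, Optional[float]]:
    """Return the mean per group, or None where too few records contribute."""
    groups = defaultdict(list)
    for group, value in records:
        groups[group].append(value)
    # Only group-level means leave the node; small groups are suppressed.
    return {
        group: (sum(values) / len(values) if len(values) >= min_group_size else None)
        for group, values in groups.items()
    }
```

With five records for group "A" and two for group "B", the node would release the mean for "A" and suppress "B". Whether a rule of this kind is suitable for official statistics is exactly the type of question the study would need to examine.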
3.2 Project Environment
The project is conducted alongside other modernisation activities that are currently ongoing. It is most closely related to activities exploring big data for statistical purposes and for improved policy making. While these projects concentrate on specific big data sources and the integration of different data, the study proposed here focuses on integrated systems, the Internet of Things and its smart applications, and their potential to produce statistical information. The current projects are creating valuable input for this second step.