DIME/ITDG Plenary February 2017

Item 2- Annex 1: A joint framework for action on linked open data at ESS level

DRAFT PROPOSAL


Table of Contents


Table of Figures

1 Linked open statistical data: the way forward

1.1 Official statistics are high value open data

1.2 Experimenting with linked data technologies in the ESS

2 A proposal for a joint framework for LOD actions

2.1 Strategy and policy

2.2 People and capacities

2.3 Data and metadata

2.4 Technology and infrastructure

2.5 Governance

Annex I: The statistical LOD ecosystem

  • Legal view
  • Organisational view
  • Semantic view
  • Technical view

Table of Figures

Figure 1: Expected benefits of LOD for statistical data

Figure 2: The proposed joint LOD framework in a nutshell

1 Linked open statistical data: the way forward

1.1 Official statistics are high value open data

The increasing importance of data and information for economic and societal growth, underpinned by promising new data and analytics technologies, has increased the need for high-quality, trusted data. Such data should be available at the right time, at the right level of granularity and in several distribution formats. The data should be provided under clear licences for access and reuse, with availability over time.

National Statistical Institutes (NSIs) play an instrumental role in this environment. The Member States assign to the NSIs the responsibility for the development, production and dissemination of European official statistics, as well as of statistical code lists and classifications. NSIs have been serving societies and organisations with high-quality, trusted data for hundreds of years. Technological change has always played an important role in transforming the dissemination of official statistics.

In the European Union, the statistical data domain was one of the first areas to provide transparent and open access to the public, in response to the requirements of the PSI Directive of 2003 and its revision in 2013. Meanwhile, statistical data has been identified as high value data in the G8 Open Data Charter[1], where high value data is defined as data that can improve democracies and encourage innovative reuse. Official statistics have an extremely high reuse potential, even more so when combined with other data. To name a few examples: policy and economic analysts combine official statistics with other sources; integrating statistics with geospatial data enables urban planning and disaster management; and combining official statistics with private market data can inform business decisions, such as where to invest.

In this new reality, NSIs are collaborating and sharing data with several types of data consumers, including businesses, academia, NGOs, data journalists and other organisations. These consumers combine statistical data, often with other types of data such as geospatial data, in value-added services and mobile apps that serve internal needs or external demand. Increased data sharing and integration raise semantic interoperability challenges, i.e. challenges in interpreting the meaning of data and metadata coming from different sources. Furthermore, information models and formats need to be harmonised to create a unified view of the data and to allow data from different internal or external sources to be connected, used and analysed.

The ESS Vision 2020[2] foresees that "we provide a pool of European statistics in a machine-readable open data format. This data pool is publicly available at all times to all user categories. It enables experienced power users such as data-driven journalists, scientists or policy makers to digest statistical datasets in a manner that best suits their needs. Our value proposition is based on their needs. Third parties may also access and re-use the data pool, e.g. for integration (with source notification) in their websites or apps." And "In a next phase we will investigate if the data can be made available as linked open data, for easy combination with other data pools".

1.2 Experimenting with linked data technologies in the ESS

To overcome these challenges, NSIs are continuously striving to improve the way their data is disseminated, inspired by the emergence of state-of-the-art technologies, such as linked data. We have observed that NSIs are increasingly moving towards standardised machine-usable dissemination formats, are investing in communication activities to reach out to existing and new consumers of their data, and are experimenting with these new technologies.

In this report, we understand linked data as a technology paradigm based on Web technologies and standards such as HTTP, RDF and URIs, mostly promoted by the World Wide Web Consortium (W3C). Linked data puts forward a set of design principles for connecting and sharing machine-usable data within and across organisations, and on the Web. It enables the provision of “data services” and conceives the Web as an open ecosystem in which data owners, data publishers and data consumers can interconnect and integrate different datasets. It turns the Web from a “Web of documents” into a “Web of interconnected data”. For statistics, this means moving from “linking tables and datasets” to “linking data”.
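To make the idea of “linking data” more concrete, the following minimal sketch writes two RDF statements in the N-Triples syntax using only the Python standard library. Every example.org URI and the label are hypothetical placeholders; the only real identifiers are the W3C `rdfs:label` and `owl:sameAs` properties. The `owl:sameAs` statement illustrates one common way of linking: it declares that a local area identifier and another publisher's identifier denote the same place.

```python
# Minimal linked data sketch using only the standard library.
# All example.org URIs are hypothetical placeholders, not real NSI identifiers.

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def iri(value: str) -> str:
    """Serialise an IRI term in N-Triples syntax."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Serialise a plain string literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def triple(subject: str, predicate: str, obj: str) -> str:
    """Return one N-Triples statement: subject, predicate, object, final dot."""
    return f"{subject} {predicate} {obj} ."


# A local statistical area, identified by an HTTP URI that could be dereferenced...
area = "http://stats.example.org/id/area/A001"
# ...and the identifier another (hypothetical) publisher uses for the same place.
external_area = "http://geo.example.org/id/municipality/A001"

graph = [
    triple(iri(area), iri(RDFS_LABEL), literal("Example municipality")),
    triple(iri(area), iri(OWL_SAME_AS), iri(external_area)),
]

print("\n".join(graph))
```

Because both statements use global HTTP URIs rather than local database keys, any consumer who already holds data about the external identifier can join it with this data without a bespoke mapping table.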

Applying linked data technologies to statistical data can deliver significant benefits to National Statistical Institutes and to data consumers alike.

Figure 1: Expected benefits of LOD for statistical data

In order to gain hands-on experience with linked data, and to understand how the expected benefits can be delivered and how they can affect the dissemination of official statistics, a number of NSIs, including INSEE (France), ISTAT (Italy), CSO (Ireland), ONS (United Kingdom) and FSO (Switzerland), are already experimenting with statistical linked data. By analysing the current state of affairs in these cases, we observed three linking approaches:

  1. Publishing official statistics in a linkable, machine-readable format (4 stars)

Using linked data as a means of publishing official statistics in a linkable, machine-readable format, which can easily be reused and integrated with other types of data, e.g. geospatial or weather data. The primary driver behind this scenario is to transform the dissemination of official statistics so that they are published in linkable, machine-readable formats, based on open standards such as SDMX, RDF Data Cube or CSV, and at a level of granularity that matches consumers' requirements, from single observations to full datasets. This approach underpins the reuse of official statistics across sectors and opens new opportunities for the dissemination and analysis of data.

  • Example: INSEE has published census data as LOD since 2012.
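As a minimal sketch of this first approach, the following Python snippet serialises one statistical observation as Turtle using the W3C RDF Data Cube vocabulary (`qb:`) together with the SDMX-RDF dimension and measure properties. The dataset, observation and area URIs and the value 1000 are hypothetical placeholders, not INSEE data. A complete dataset would also declare a `qb:DataStructureDefinition` and link to it with `qb:structure`; that part is omitted here for brevity.

```python
from dataclasses import dataclass

# Namespaces of the W3C RDF Data Cube vocabulary and the SDMX-RDF properties.
QB = "http://purl.org/linked-data/cube#"
SDMX_DIM = "http://purl.org/linked-data/sdmx/2009/dimension#"
SDMX_MEASURE = "http://purl.org/linked-data/sdmx/2009/measure#"
XSD = "http://www.w3.org/2001/XMLSchema#"

PREFIXES = (
    f"@prefix qb: <{QB}> .\n"
    f"@prefix sdmx-dimension: <{SDMX_DIM}> .\n"
    f"@prefix sdmx-measure: <{SDMX_MEASURE}> .\n"
    f"@prefix xsd: <{XSD}> .\n"
)


@dataclass(frozen=True)
class Observation:
    """One cell of a statistical table (hypothetical example values)."""

    uri: str      # hypothetical URI of this observation
    dataset: str  # hypothetical URI of the qb:DataSet it belongs to
    area: str     # URI of the reference area: the dimension other data can link to
    year: int     # reference period
    value: int    # observed value, e.g. a population count


def to_turtle(obs: Observation) -> str:
    """Serialise one qb:Observation as a Turtle block (URIs are not escaped)."""
    return (
        f"<{obs.uri}> a qb:Observation ;\n"
        f"    qb:dataSet <{obs.dataset}> ;\n"
        f"    sdmx-dimension:refArea <{obs.area}> ;\n"
        f'    sdmx-dimension:refPeriod "{obs.year}"^^xsd:gYear ;\n'
        f'    sdmx-measure:obsValue "{obs.value}"^^xsd:integer .\n'
    )


obs = Observation(
    uri="http://stats.example.org/data/census/obs/A001-2011",
    dataset="http://stats.example.org/data/census",
    area="http://stats.example.org/id/area/A001",
    year=2011,
    value=1000,  # placeholder, not a real census figure
)

document = PREFIXES + "\n" + to_turtle(obs)
print(document)
```

Publishing at the level of single observations, each with its own URI, is what allows consumers to reference or link an individual figure rather than a whole table.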

NSIs can build new services based on LOD, in particular merging statistics with geospatial information.

  • Example: ISTAT has presented a use case of spatial queries on LOD and of interacting with them through a mobile device: given a set of GPS coordinates, the user can retrieve population indicators for the nearest census sections and visualise them.
  • A pilot case was also conducted in the UK, allowing analysis of the spatial distribution of statistical phenomena.
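The mobile use case described above is essentially a nearest-neighbour search. A production service would probably run a spatial query (for example with GeoSPARQL) against a triple store; the plain-Python sketch below instead ranks census-section centroids by haversine distance from a GPS position. The section URIs, coordinates and population figures are invented placeholders, and this is not a description of ISTAT's implementation.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass(frozen=True)
class CensusSection:
    uri: str         # hypothetical census-section URI
    lat: float       # centroid latitude in degrees
    lon: float       # centroid longitude in degrees
    population: int  # placeholder indicator value


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points on a spherical Earth, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # min() guards against floating-point values marginally above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def nearest_sections(lat: float, lon: float,
                     sections: list[CensusSection], k: int = 1) -> list[CensusSection]:
    """Return the k sections whose centroids are closest to the given position."""
    return sorted(sections, key=lambda s: haversine_km(lat, lon, s.lat, s.lon))[:k]


sections = [
    CensusSection("http://stats.example.org/id/section/S1", 41.90, 12.50, 850),
    CensusSection("http://stats.example.org/id/section/S2", 41.95, 12.45, 1200),
    CensusSection("http://stats.example.org/id/section/S3", 45.46, 9.19, 640),
]

# A GPS fix from a (hypothetical) mobile device.
for section in nearest_sections(41.91, 12.49, sections, k=2):
    print(section.uri, section.population)
```

Ranking by centroid distance is only an approximation; a more faithful version would test which section polygon actually contains the point before looking at its neighbours.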
  2. Interconnecting official statistics datasets with each other and with datasets of other organisations (5 stars)

Using linked data to interconnect several official statistics datasets (covering both data and metadata) housed in different databases, data stores and data warehouses of different NSIs and/or Eurostat, or of an NSI and Other National Authorities. This enables users to interact with data from different sources through a single channel. It can support use cases ranging from calculating, analysing or comparing an indicator across countries to replacing physical data exchanges between organisations by “virtually” tapping into each other's data through linked data browsing. Adopting this paradigm would make it much easier to produce “integrated dissemination products”.

Datasets can be combined with data published on the web by other types of organisations.

  • Example: ISTAT linked the data of its portal with those of the LOD portal of ISPRA (National Institute for Environmental Protection) at municipality level. A single query to the ISTAT portal can request ISTAT's data and ISPRA's data simultaneously. In this approach it is very important to keep information about the provenance of the data, which the LOD paradigm supports well.
  • Example: CSO Ireland linked census data with data from the Department of Education and Skills.
  • This approach could also be useful for linking other sources for Sustainable Development Goals (SDG) indicators.
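The provenance point can be illustrated with quads: each RDF statement carries a fourth element, the named graph it came from, so that merged results still show which publisher supplied each value. The sketch below uses plain Python rather than SPARQL and joins two hypothetical sources on a shared municipality URI. All URIs, predicates and numbers are placeholders and do not reflect the actual ISTAT or ISPRA vocabularies.

```python
from collections import defaultdict

# Hypothetical vocabularies and named graphs; real ones would come from each publisher.
POPULATION = "http://stats.example.org/def/population"
WASTE_PER_CAPITA = "http://env.example.org/def/wasteKgPerCapita"
STATS_GRAPH = "http://stats.example.org/graph/census"
ENV_GRAPH = "http://env.example.org/graph/waste"

# Shared municipality identifiers: the common key that makes the two sources linkable.
MUNI = "http://geo.example.org/id/municipality/"

# Quads: (subject, predicate, object, graph). The graph records each fact's origin.
quads = [
    (MUNI + "M1", POPULATION, 5000, STATS_GRAPH),
    (MUNI + "M2", POPULATION, 12000, STATS_GRAPH),
    (MUNI + "M1", WASTE_PER_CAPITA, 480, ENV_GRAPH),
    (MUNI + "M3", WASTE_PER_CAPITA, 510, ENV_GRAPH),
]


def join_with_provenance(quads, pred_a, pred_b):
    """Return {subject: {predicate: (value, graph)}} for subjects having both predicates.

    If a predicate occurs several times for one subject, the last quad wins;
    a fuller version would keep every (value, graph) pair.
    """
    by_subject = defaultdict(dict)
    for subject, predicate, value, graph in quads:
        if predicate in (pred_a, pred_b):
            by_subject[subject][predicate] = (value, graph)
    return {
        s: facts
        for s, facts in by_subject.items()
        if pred_a in facts and pred_b in facts
    }


merged = join_with_provenance(quads, POPULATION, WASTE_PER_CAPITA)
for municipality, facts in merged.items():
    for predicate, (value, graph) in facts.items():
        print(municipality, predicate, value, "from", graph)
```

In SPARQL, the same effect would be obtained with `GRAPH` patterns over named graphs; it is the shared municipality URI that makes the join possible without a separate mapping step.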

Semantic and technical interoperability barriers, e.g. different data representations or different protocols for data access, would need to be overcome.

  3. Data integration within an NSI

Although there are alternative ways to integrate data, several NSIs have recognised that linked data can be useful to interconnect several official statistics datasets (covering both data and metadata) housed in different databases, data stores and data warehouses within an NSI. The primary driver here is to reduce the cost of data integration and to investigate newer, more flexible and standards-driven technologies for doing so. The outcome is not only a unified view of data that is physically stored in different systems and locations and implemented using different technologies; it also makes this data easier to link with other internal and external data, making it both linkable and interoperable.

The ESS workshop on (linked) open data that took place in Malta on 18-19 January clearly showed that, while the use of LOD technologies in the ESS is still at an experimental stage (with the exception of ISTAT), there are potentially high benefits in progressing along this path in a coordinated way and in developing capacities jointly. Coordination reduces the risks of creating non-interoperable and disconnected information islands, reinventing the wheel and missing out on the sharing of knowledge and experience. A joint strategy for linked data in the ESS is proposed below, building on existing ESS experiences and the results of the Malta workshop.

2 A proposal for a joint framework for LOD actions

It is proposed to have a strategy at ESS level, defining the approach of Eurostat and the NSIs as well as the areas where cooperation is useful. It should be noted that there are other important stakeholders in this domain: at national level, for instance, the organisations managing national data portals; for Eurostat, the Publications Office, DG Connect and DIGIT.

A framework for action on statistical LOD that focuses only on data and technology will not succeed. We therefore put forward a holistic approach, covering both open and closed linked data, which comprises five complementary and tightly interconnected dimensions: strategy and policy; people and capacities; data and metadata; technology and infrastructure; and governance. The five dimensions of the strategy are explained in the following sections.

The joint statistical LOD strategy proposes the following two levels of execution, which complement each other.

  1. Joint activities, executed by NSIs and Eurostat as a group. This is in line with the “shared” approach of the ESS Enterprise Architecture reference framework (common, distributed services that are shared and accessible to all NSIs). Examples include the provision of joint services and software, collaboration on LOD pilots, LOD capability building and standardisation. Such activities will primarily benefit the harmonisation and easier integration of official statistics and statistical code lists across the ESS, interoperability, improved data dissemination, and the sharing of resources, capabilities, skills and knowledge. They will foster collaboration within the ESS and will drive innovation both for NSIs and Eurostat and for the users of their data, across domains and borders.
  2. National activities of the NSIs, building on common approaches, standards and practices. This is in line with the “interoperable” approach of the ESS Enterprise Architecture reference framework (coordination is achieved through interoperability: the NSIs have the autonomy to design and operate their own building blocks, as long as they are able to exchange information and operate together effectively through their respective information systems). Through such activities, NSIs that wish to do so can put in place their LOD infrastructures, provide LOD to data consumers, use linked data for internal data integration and quality improvement, and improve the ways they manage and disseminate their data. These activities will help NSIs engage with national communities of data users and strengthen the link between statistical data supply and demand.

An iterative approach is recommended. Small cycles with well-defined outcomes can help NSIs evaluate the impact and assess the benefits of linked data implementations before proceeding with larger-scale investments. The activities within each of the five dimensions are therefore prioritised, indicating where an NSI should start its LOD journey and which path it should follow.

Figure 2: The proposed joint LOD strategy in a nutshell

2.1 Strategy and policy

The strategy and policy dimension aims to lay the foundations for the ESS to formulate collaboratively, and agree on, a shared vision and way forward for LOD.

Start small, think big. Eurostat and NSIs that want to jump on the LOD train should not worry about getting everything perfect from day one. They are advised to start small and gradually improve their LOD offering based on user demand and feedback. To this end, small LOD implementation cycles with well-defined outcomes and KPIs can help NSIs evaluate the impact and assess the benefits of LOD before proceeding with larger-scale investments.

Joint activities with immediate priority

  • Piloting. Working together on collaborative proofs of concept can contribute greatly to learning, sharing knowledge and developing capabilities collectively in the ESS, but also to measuring and quantifying the benefits of LOD. This could be done through an ESSnet.
  • Coordination of strategy. The overall strategy for LOD should be coordinated by Eurostat and the ESS, and national strategies should be aligned with the common strategy.
  • Management buy-in. The overall direction of the joint LOD strategy needs to be supported and endorsed by the highest management levels at the NSIs and Eurostat, on the basis of clear shared benefits for the NSIs and for data consumers. The benefits need to be advertised and promoted. Ideally, the national strategy should be supported by a high-level champion, e.g. a cabinet minister.
  • Stakeholder identification and engagement. Identify key users and create champions. The objectives and audiences for promotion, communication and publicity should be clearly identified, and those audiences engaged. Audiences include both data consumers and other internal and external stakeholders, such as other administrations. Stakeholder engagement and communication strategies are key to the success of any LOD initiative.
  • Promoting standardisation. The implementation of LOD standards should be promoted in the EU beyond the ESS, e.g. to other public organisations in the Member States and to EU institutions and agencies providing statistical data, as well as to their contractors and collaborators.

Activities for the future

  • Evaluation and next steps. Based on the results of the pilot, the ESS should define a strategy in line with the ESS EA framework.
  • Collaborate with institutions outside the ESS. Collaborate with other organisations that produce statistics in order to raise awareness and maximise the impact of linked open data.
  • Ensuring own funding. Local actions should be undertaken to secure financial and human resources for LOD activities, taking into account the expected benefits or cost reductions for particular NSIs or for the ESS as a whole.
  • Acquiring external funding. The possibility of obtaining external funding in the form of grants or sponsorship may be considered, especially via European Commission programmes such as ISA² and CEF.
  • Innovation. Opportunities for new business models for data dissemination or value-added services should be kept in mind by NSIs that want to reinvent themselves and transform in response to new technological advances.
  • Develop a community of consumers. A community of consumers of statistical LOD should be developed in the ESS and supported in exchanging best practices and how-to knowledge, so that the dissemination and reuse of official statistics as linked data is maximised.
  • Sustainability of the results of hackathons. The objectives, funding and intended results of hackathons and app contests should be defined. Their outcomes should be sustained and further developed whenever a business case exists.

2.2 People and capacities

A lack of skills in the ESS is likely to be an obstacle to further developing LOD artefacts. While external support may be provided, in particular on the technological side, it seems necessary for NSIs and Eurostat to develop in-house expertise. LOD requires multidisciplinary teams, combining competences in IT, dissemination and user interaction, data and metadata management, and semantics. The people and capacities dimension puts forward activities for changing culture and developing capacity in the ESS, both of which are required for moving towards the provision of statistical LOD.

Joint activities with immediate priority

  • Collaboration and knowledge sharing in the ESS. The need for the members of the ESS to work together, share practices and experiences and learn from each other is a cornerstone of the successful implementation of the joint strategy. This will help NSIs progress faster and more efficiently, as they all face common challenges and have similar questions to answer. Collaboration and knowledge sharing can be facilitated by participation in joint pilots, conferences and workshops, joint thought leadership reports, study visits between NSIs, easy access to learning material, etc. Capacity development should be tied to practical experiments and pilots so that practice and theory interact directly: what has been learnt should be put into practice, and engagement should be adapted to the knowledge gaps identified during the work. This effort can be supported through an ESTP course and through the management of a community of practice of actors involved in LOD.
  • Creation of an ESSnet. The creation of an ESSnet on statistical LOD will allow the collection of best practices from both inside the ESS (NSIs, Eurostat) and outside it (e.g. the ISA² programme), and the sharing of these best practices, notably through open publication.
  • Cross-functional communication. LOD requires competences in statistics, IT and dissemination (e.g. deciding which metadata should be used and how to ensure their effective dissemination). Consequently, opportunities (e.g. workshops, pilots) should be created that bring together various profiles: conceptual, statistical, technical and business.
  • Developing sustainable partnerships and relationships. The ESS should further strengthen its partnerships with academia and industry, which can provide the necessary skills and capabilities for publishing and using LOD.
  • Capacity building. The NSIs should first decide which skills they wish to develop in-house and for which they will rely on external expertise, e.g. academic and industry partners. It is imperative for NSIs to develop some core LOD expertise in-house in order to implement and manage LOD activities. To this end, organising LOD training for NSI staff, piloting, and bringing together people with complementary skills and backgrounds will increase their readiness and capability to provide LOD. Training programmes for staff may be set up in cooperation with internal experts and external training providers.

Activities for the future