Highlighting the Added Value of Statistical Linked Open Data

Raffaella Aracri [1], Giovanni Corcione [2], Andrea Pagano [1], Paolo Pizzo [1], Monica Scannapieco [1], Laura Tosco [1], Luca Valentino [1]

Keywords: Linked Open Data, RDF, Spatial Query, Federated Query

1. Introduction

Given the increasing importance of Linked Open Data [1] as a dissemination channel and the need to comply with national and international directives [2] in this field, Istat released its Linked Open Data portal, datiopen.istat.it, in June 2015. The portal publishes as RDF the Italian territory, at both the administrative and the statistical level, and a set of Population Census indicators at the sub-municipal level of census sections. In more detail, the available indicators cover the resident population by sex, age, citizenship, employment status and educational level, as well as families, dwellings and buildings [3]. The portal offers end users several services: (i) guided queries on territorial and census data, (ii) a SPARQL endpoint for free-form queries, (iii) a customized Pubby interface for RDF navigation, (iv) a link to the LODLive portal for visual navigation of the data, (v) a Web service called CensusLodRest that allows machine-to-machine data exchange, (vi) a graphical interface for downloading predefined datasets in CSV format together with the domain-specific ontologies [4][5], and (vii) a documentation service.

From an architectural point of view, the platform is based on Oracle “Spatial & Graph”, used as a triple store with a SPARQL query engine. This choice is fully compatible with the IT infrastructure already in place at Istat and has the major advantage of scaling up to billions of triples, an important requirement for a platform that has to support the Istat LOD dissemination channel.

To highlight the benefits of statistical linked open data, we propose two scenarios in which the Istat LOD portal could offer useful services. Section 2 presents a first scenario in which a web application uses the spatial features of Istat census data. Section 3 describes a second use case that shows the usefulness of a federated query.

2. First Scenario

A bookseller wants to open a new international bookshop. While inspecting an available space, he wants to carry out a market analysis of the potential customers resident in the areas adjacent to the store, broken down by age, country of origin, educational level and employment status, in order to diversify the supply adequately. To this end, the bookseller has an application on his mobile device that detects its GPS coordinates and sends them to the datiopen.istat.it SPARQL endpoint, which makes it possible to query Istat's triple store over HTTP and returns the required information. In more detail, the application builds a GeoSPARQL [6] query that identifies the census sections nearest to the detected position and returns, for each of them, the related WKT [7] geometry and the resident population, both Italian and foreign, broken down by age, sex, country of origin, educational level and employment status. The result thus covers the census section in which the detected position is located and the portion of land nearest to it.
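The query-building step of this scenario can be illustrated with a minimal Python sketch. It is not the application's actual code: the `ex:` vocabulary, the property `ex:residentPopulation`, the search radius and the endpoint URL are all assumptions made for illustration, and only the `geo:`, `geof:` and `uom:` namespaces come from the GeoSPARQL standard. The sketch builds a query for census sections within a given distance of a GPS position and encodes it as an HTTP GET request.

```python
from urllib.parse import urlencode

# Standard GeoSPARQL namespaces; "ex:" is a hypothetical census vocabulary.
PREFIXES = """PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
PREFIX uom: <http://www.opengis.net/def/uom/OGC/1.0/>
PREFIX ex: <http://example.org/census#>
"""


def build_nearest_sections_query(lat, lon, radius_m=500, limit=10):
    """Return a GeoSPARQL query for census sections within radius_m of a point."""
    # WKT in the default CRS84 reference system lists longitude first.
    point = f"POINT({lon:.6f} {lat:.6f})"
    return PREFIXES + f"""
SELECT ?section ?wkt ?population
WHERE {{
  ?section geo:hasGeometry ?geom ;
           ex:residentPopulation ?population .   # hypothetical property
  ?geom geo:asWKT ?wkt .
  BIND(geof:distance(?wkt, "{point}"^^geo:wktLiteral, uom:metre) AS ?dist)
  FILTER(?dist <= {int(radius_m)})
}}
ORDER BY ?dist
LIMIT {int(limit)}
"""


def build_request_url(endpoint, query):
    """Encode the query as a SPARQL-protocol GET request URL."""
    return endpoint + "?" + urlencode({"query": query})


# Example: a position detected by the device (coordinates are illustrative).
query = build_nearest_sections_query(lat=41.9028, lon=12.4964)
url = build_request_url("https://example.org/sparql", query)  # placeholder endpoint
```

Ordering by distance and limiting the result keeps the response small enough for a mobile client. A real deployment would replace the placeholder vocabulary with the terms of the published census ontology.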

Figure 1. Use case 1 – application workflow

Figure 1 shows the workflow of the described application, while Figure 2 shows the application results drawn directly on a map and visualized, for example, on a smartphone.

Figure 2. Use case 1 – application results

3. Second Scenario

A technical official of an Italian province has to analyse the state of degradation of the buildings in its municipalities in relation to land use. For this purpose, the official has an application that builds a federated query over the data published by Istat and the data published by ISPRA (Institute for Environmental Protection and Research) [8], which have been linked at the municipality level; either the Istat endpoint or the ISPRA endpoint [9] can be used as the access point. The federated query selects (i) from the Istat triple store, for each municipality in the province, the name, the cadastral code, the indicator for the total number of buildings in a bad state of preservation and the indicator for the total resident population, and (ii) from the ISPRA triple store, the land-use indicator expressed as a percentage. In this way, the official easily obtains complex information that would otherwise have been hard to get.
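A federated query of this kind can be sketched with the SPARQL 1.1 `SERVICE` keyword, which delegates part of the graph pattern to a second endpoint. The Python sketch below only assembles the query text. Every `ex:` property, the use of `owl:sameAs` to express the municipality-level link, and the endpoint URL are assumptions for illustration, not the vocabularies actually published by Istat or ISPRA.

```python
def build_federated_query(province_code, remote_endpoint):
    """Return a SPARQL 1.1 federated query joining local and remote indicators."""
    # Reject anything that could break out of the quoted literal below.
    if not province_code.isalnum():
        raise ValueError("province_code must be alphanumeric")
    return f"""PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX ex: <http://example.org/stat#>

SELECT ?name ?cadastralCode ?badBuildings ?population ?landUsePct
WHERE {{
  # Local part: evaluated by the endpoint that receives the query.
  ?municipality ex:provinceCode "{province_code}" ;
                ex:name ?name ;
                ex:cadastralCode ?cadastralCode ;
                ex:buildingsInBadState ?badBuildings ;
                ex:residentPopulation ?population ;
                owl:sameAs ?remoteMunicipality .
  # Remote part: delegated to the second endpoint via SERVICE.
  SERVICE <{remote_endpoint}> {{
    ?remoteMunicipality ex:landUsePercentage ?landUsePct .
  }}
}}
ORDER BY ?name
"""


# Example with a placeholder province code and a placeholder remote endpoint.
federated_query = build_federated_query("058", "https://example.org/remote/sparql")
```

Because the join variable `?remoteMunicipality` is bound in the local part, the same query shape works with either endpoint as the access point, provided the local and remote patterns are swapped accordingly.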

Figure 3. Use case 2 – application workflow

Figure 3 shows the application workflow: the federated query is built and executed on the Istat endpoint, which is linked to the ISPRA endpoint, and the results are visualized on a chart to make the data more accessible, as shown in Figure 4.

Figure 4. Use case 2 – visual representation of results

4. Conclusions
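Before charting, the application has to turn the endpoint's response into plain rows. As a rough sketch, assuming the endpoint returns the standard SPARQL 1.1 Query Results JSON format (the source does not say which result format is used), the conversion could look as follows. The variable names and values in the example are invented placeholders, not real results.

```python
import json
from typing import Dict, List, Optional


def bindings_to_rows(results_json: str) -> List[Dict[str, Optional[str]]]:
    """Flatten a SPARQL JSON result document into one dict per solution."""
    doc = json.loads(results_json)
    variables = doc["head"]["vars"]
    rows = []
    for binding in doc["results"]["bindings"]:
        # A variable may be unbound in a solution; represent it as None.
        rows.append(
            {var: binding[var]["value"] if var in binding else None
             for var in variables}
        )
    return rows


# Example with made-up values, shaped like a SPARQL JSON response.
example = json.dumps({
    "head": {"vars": ["name", "landUsePct"]},
    "results": {"bindings": [
        {"name": {"type": "literal", "value": "Municipality A"},
         "landUsePct": {"type": "literal", "value": "12.5"}},
    ]},
})
rows = bindings_to_rows(example)
```

The resulting list of dictionaries can be passed to whatever charting component the application uses.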

As of today, the LOD portal gives access to more than 800 million RDF triples. The LOD datasets planned for release on the portal in the near future include (i) Labour market areas (as of the 2011 Italian census) and (ii) the National Italian Registry of Addresses (with civic numbers). In particular, dataset (ii) has been identified as a priority for 2017, given the large number of requests for its publication as LOD collected by the Italian Agency for IT in Public Administration.

The advanced services described in the use cases could also be particularly well suited to these upcoming releases. In particular, use case 1 could be revisited using the National Italian Registry of Addresses.

References

[1] Linked Open Data – LOD:

[2] National Guidelines for the Valorization of the Public Sector Information (in Italian). English summary available at:

[3] Aracri R., De Francisci S., Pagano A., Scannapieco M., Tosco L., Valentino L. “Publishing the 15th Italian Population and Housing Census in Linked Open Data”, in Proceedings of SemStats - International Workshop on Semantic Statistics, 2014.

[4] Territorial Ontology,

[5] Census Data Ontology,

[6] GeoSPARQL,

[7] Well Known Text,

[8] ISPRA,

[9] ISPRA endpoint,

[1] Istat - Italian National Institute of Statistics

[2] Oracle Italia