ESSnet Big Data
Specific Grant Agreement No 1 (SGA-1)
https://webgate.ec.europa.eu/fpfis/mwikis/essnetbigdata
http://www.cros-portal.eu/......
Framework Partnership Agreement Number 11104.2015.006-2015.720
Specific Grant Agreement Number 11104.2015.007-2016.085
Work Package 2
Web scraping / Enterprise Characteristics
Deliverable 2.1
Legal aspects related to Web scraping of Enterprise Web Sites
Version 2016-08-23
ESSnet co-ordinator:
Peter Struijs (CBS, Netherlands)
telephone : +31 45 570 7441
mobile phone : +31 6 5248 7775
Table of contents
1 Introduction
2 Present Regulations for Official Statistics
2.1 CBS - Netherlands
2.2 GUS - Poland
2.3 SCB - Sweden
2.4 BNSI – Bulgaria
2.5 Istat - Italy
2.6 ONS - United Kingdom
3 Privacy and Copyright protection requirements by enterprises
3.1 CBS - Netherlands
3.2 GUS - Poland
3.3 SCB - Sweden
3.4 BNSI – Bulgaria
3.5 Istat - Italy
3.6 ONS - United Kingdom
4 Discussion of challenges and recommendations from NSIs’ legal departments
4.1 CBS - Netherlands
4.2 GUS - Poland
4.3 SCB - Sweden
4.4 BNSI – Bulgaria
4.5 Istat - Italy
4.6 ONS - United Kingdom
1 Introduction
2 Present Regulations for Official Statistics
2.1 CBS - Netherlands
2.2 GUS - Poland
2.3 SCB - Sweden
2.4 BNSI – Bulgaria
2.5 Istat - Italy
The legislative framework for the use of data for Official Statistics in Italy rests mainly on the following three legal instruments:
· Codice in materia di protezione dei dati personali (Codice privacy), d.lgs. 30 giugno 2003, n. 196 (Personal Data Protection Code, Legislative Decree No. 196 of 30 June 2003), referred to below as the Privacy Code.
· Codice di deontologia e buona condotta per il Sistan (allegato A.3 al Codice Privacy) (Code of ethics and good conduct for Sistan, Annex A.3 to the Privacy Code), referred to below as the Ethics Code.
· Decreto legislativo 6 settembre 1989, n. 322 (Legislative Decree No. 322 of 6 September 1989), referred to below as the National Statistical System (NSS) decree.
With reference to such legislation, the following major points are relevant to the task of accessing, storing and processing data from enterprise web sites:
1. Istat and the other components of the NSS can use personal data for their institutional functions, in compliance with the principles of ‘pertinence’ and ‘no excess’.
2. Personal data can be collected only for ‘defined aims, which have been made explicit and legitimate, and used for other tasks that are compliant with such aims’ (Privacy Code art. 11).
3. ‘Handling personal data for historical, statistical or scientific aims is considered as compliant with the aims for which the data have been collected or handled previously’ (Privacy Code art. 99).
4. The NSS decree introduces the concept of the National Statistical Program. This is a three-year program whose complex approval procedure ends with approval by the National Authority for Privacy (in Italian, Garante per la Privacy).
5. To be completed (regulation on duties for notification)
Given point 3, personal data that third parties have collected for any purpose may be used for statistical purposes.
Given point 4, Istat proposed a project on the scraping of enterprise Web sites in the most recent National Statistical Program. For this project, Istat had to provide details on the following items in particular:
· Purposes for which, and ways in which, Web-scraped data would be handled.
· Data sources, i.e. enterprise Web sites.
· Collaboration with other parties.
· Phases of the statistical process in which the data would be handled.
· For personal data, the time after which the data would be deleted.
2.6 ONS - United Kingdom
3 Privacy and Copyright protection requirements by enterprises
General description, by country, of the protection of owners’ legal rights and of privacy requirements when data are collected, processed and stored.
From the point of view of enterprises, three main issues can arise when a party uses Web scraping to access the data they publish on their Web sites, namely:
· Violation of copyright protection
· Violation of privacy rights
· Violation of ethical principles in accessing data
With respect to copyright protection, from Wikipedia:
“Copyright is a legal right created by the law of a country that grants the creator of an original work exclusive rights for its use and distribution. This is usually only for a limited time. The exclusive rights are not absolute but limited by limitations and exceptions to copyright law, including fair use.”
“Copyrights are considered territorial rights, which means that they do not extend beyond the territory of a specific jurisdiction. While many aspects of national copyright laws have been standardized through international copyright agreements, copyright laws vary by country.”
With respect to privacy rights, data published on enterprise Web sites are intended to be publicly available. However, to guarantee privacy there is one major issue to consider, namely:
· Ensuring that the “purposes” for which data are accessed do not give rise to privacy breaches.
To be completed
3.1 CBS - Netherlands
3.2 GUS - Poland
3.3 SCB - Sweden
3.4 BNSI – Bulgaria
3.5 Istat - Italy
3.6 ONS - United Kingdom
4 Discussion of challenges and recommendations from NSIs’ legal departments
· How can agreement with enterprises be reached on a large scale?
Describe the possible approaches to reaching thousands of enterprises.
· How should the content for scraping be targeted in an ‘uncertain’ situation?
Describe the legal issues that may arise in different use cases.
4.1 CBS - Netherlands
4.2 GUS - Poland
4.3 SCB - Sweden
4.4 BNSI – Bulgaria
4.5 Istat - Italy
4.6 ONS - United Kingdom