‘Policy implications of the data revolution’, 12.7.12

Event summary

Overview:
The amount of data accessible to researchers and decision-makers has increased, and continues to increase, in ways that were scarcely imaginable a generation ago. So far the Government has opened up over 9,000 data sets on its open data portal (http://www.data.gov.uk/), covering health, education, transport, crime and justice. By opening up data the Government hopes to inspire innovation and spur economic growth; openness should also empower citizens. The UK is leading the way in transparency, with 55 Local Authorities involved, including Cambridgeshire County Council, from whom we’ll hear later today. In last month’s White Paper, 'Open Data - Unleashing the potential', the Rt Hon. Francis Maude highlighted that ‘data is the 21st century’s new raw material’ …

The event looked at how the increasing supply of data affects policy research and decision making; how companies are using data for new applications; how researchers and councils are gathering and processing new data; and some of the implications for policy and policy-making.

The event was split into three parts; what follows is a summary of each speaker’s contribution in each section.

1. How the supply of data is changing

Nick Illsley (Dept of Transport / Transport Direct)

a. Transport Direct has led the government departments in transparency and open data.

b. Information is gradually crossing boundaries and silos as services cross boundaries. In 2002, 97 out of 100 data access points were free. The remaining three (Ordnance Survey, the rail industry and the air industry) all charged for their data. In 2011 the PM published a letter about making data open across departments, eg transport. The data included timetables, motorways and cycle routes, and every local authority could publish its roadwork data.

c. A specialist transport sector data transparency board was set up. Both data owners and users attended to discuss what they can / could release. The emphasis was important, and wasn’t on ‘why won’t you release….?’ All minutes and papers were also released.

d. Transport data is complex; rail fares are complex and you need to know how to share the information with the user in mind – a possible application for smart ticketing in the future?

e. The Autumn (2011) Statement declared that the aim of transparency was economic growth overall and stipulated that the CAA also had to release user data. The June 2012 ‘Francis Maude’ White Paper has 3 key chapters: publishing data whilst protecting privacy; Government making data available interactively across agencies; and being able to use that data more smartly.

f. MyData: A Government-run project called MyData enables consumers to access digital data held on them by companies (Google, Tesco, utilities, banks….), which should help consumers to switch products and organisations more easily and save money by eg monitoring their energy bills. For example, this should simplify some tariffs: there are 6 million mobile phone tariffs in the UK. It should also lead to further empowerment of communities eg in bulk-purchasing. The DfT has released performance-related information on ‘mydata’.

g. DfT has used the Open Data Institute mechanism for pump-priming. There needs to be a dynamic between the data user and the provider. Data users are now starting to demand Service Level Agreements on the quality of data provided. The format of data, too, can be a challenge. It’s hard to introduce this into a public department eg the National Audit Office. Over the next 10 years there will be a whole swath of data releases; here are some of the early ones:

o Buses: Traveline holds the repository for buses and has timetables for every bus service under Government licence. It’s a free / useful service for users. And there are 60-70 developers now using that data for further improvements.

o Rail: Network Rail open licence has 500 developers using that licence and processing their data.

o Roads: Lots of data has been released on roads, and many local authorities are happy to share data on roadworks, in the East of England specifically, so people in Cambridge can now collaborate to try to get the A14 sorted using existing resources / data. There is now better information, which should lead to intelligent modelling (and less cost).

h. Collaboration on open data, as in the best examples in the UK at the moment, should involve everyone, not just people you know. Tim Berners-Lee launched an open data initiative in the Autumn of 2010 (http://www.data.gov.uk/). Key issues are: Should open data be regulated? There is also a need to be able to quantify the benefits, eg in saved time if nothing else.

i. Ownership is important and needs protection, eg rail data is privately owned and Ordnance Survey can’t make its data free as it’s a trading company. The application of data also needs to be clear and developed with the end user in mind. It’s likely that many apps will in fact be short term, as new apps and uses come along.

Steve Magenis: Royal Haskoning : ‘The Peterborough Model’

a. Like many councils, Peterborough City Council had paid lots of money for lots of data but was finding it hard to reap the benefit from it. Should they sell the data on? It was also complex to understand. ‘The Peterborough Model’, conceived by a consortium of IBM, Green Ventures, Natural England and Royal Haskoning, is designed to use the data to really understand a city and how it operates, but it can be applied to any complex place from cities to parishes.

b. The project was started 2 years ago with collaboration as its focus. There was initial resistance but when some of the players saw others’ data and possible benefits of sharing, the resistance was gradually turned into acceptance. The collaborators identified various areas where there was data and where it was useful to see data modelling. These areas included Environment, Waste, Energy, Water and Transport.

c. The Peterborough Model uses Google Earth as the foundation and can be seen and built on 2 websites: one for the auto-uploading of data from the stakeholders and one for the public, where there are ‘fly-throughs’ showing how the City works. You can see the outline of the Peterborough Model plus a YouTube clip and the fly-throughs online at: http://www.peterborough.gov.uk/environment/the_peterborough_model.aspx

d. The fly-through online demos show the main issues for the city of Peterborough, modelled from various data that’s simple to obtain. So, for example, the flythrough video on energy explains where Peterborough’s energy comes from and how it is transmitted locally. It demonstrates that the city relies on energy supplied from outside its boundaries, with the local power station providing a backup power supply. It shows that although there are local windfarms connected into the electricity network, the city still depends on fossil fuel energy sources.

2. Analysis and / or applications using new sources of data or new ways of combining data.

Seppe Cassettari: CEO, the GeoInformation Group (http://www.geoinformationgroup.co.uk/)

a. Geographical information is one of the most important datasets that we can use and underpins most spatial information. Most countries use national mapping data. Now the UK is looking at new mapping sources, eg Google, rather than the national base (Government-funded Ordnance Survey). The national base tends to be expensive to maintain and update, and you get locked into a standard specification.

b. All 3D building data is commercially driven and we need to think of ways of integrating this.

c. The GeoInformation Group pushed to map and collect data that’s not used in any other way, eg mapping of anti-social behaviour in children against, say, fast-food restaurants in the London borough of Newham. This doesn’t exist as land-use information and has never been generated.

d. The only national dataset not duplicated is postcodes. Address data needs different data sets to build a complete picture. No one body collects all information and no one organisation owns the IPR. Getting data to the end-user often involves many organisations, and new data is being created all the time. The market is very fast-moving and more data = more apps.

e. National databases cost a lot to maintain and are too clumsy and slow to react quickly, eg the Ordnance Survey costs £120m pa to maintain the maps yet it doesn’t have a lot of the new information on it.

Points to consider:

I. Ownership: Crown copyright or commercial? Who owns the IPR? Licensing? For example, Telecoms planning has to use non-national data for coverage. Possible breach of copyright in relation to GeoInformation Group’s supply of maps to Google.

II. New coverage: GeoInformation Group mapped vegetation in London as the Government had said it would increase tree cover by 5%, although it had no knowledge of existing tree coverage.

III. Currency vs quality: Best and latest maps are hugely expensive (more than the Government put into OS), therefore there is always a compromise. Mapping of residential properties in London could use older datasets (eg aerial photos), but obtaining current information took 100 people a year.

IV. Who pays? Government or consumer? There is an increasing amount of free data, with fewer people buying, so prices become higher. Projects vs products – need the latter for licensing agreements.

V. Consumers’ behaviour: The more you consume the less you value it.

Daniele Quercia: Computer Lab (http://www.cl.cam.ac.uk/~dq209/)

a. Daniele’s research is about the relationship between offline and online data. His group looks at Twitter comments and can overlay this data with census data on deprivation in London, so researchers can analyse the sentiments expressed by topic and correlate this with different social strata. On top of these analyses you could overlay tube usage data to show how mobile people were, so gaining an understanding of behavioural characteristics.

b. Students at the Computer Lab have developed a short online game to try to construct a virtual representation of how residents perceive London. It’s a one minute game and has over 20,000 users already. See http://www.cam.ac.uk/research/news/look-familiar/ and urbanopticon.org for more information.

c. Spatial data can also be used to overlay what we know about buildings to build a network, eg by mapping who talks to whom in the building. Could you then create a building designed for knowledge creation or not?

Presentation: http://www.slideshare.net/daniele.quercia/unleashing-the-potential-of-spatial-data
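As an illustration only of the kind of overlay described in point a. above, the sketch below pairs a sentiment score per area with a deprivation score per area and computes a simple Pearson correlation. It is not the Computer Lab group's method; the area names, the scores and the function names `overlay` and `pearson` are all hypothetical placeholders.

```python
from math import sqrt


def overlay(sentiment_by_area, deprivation_by_area):
    """Pair the two datasets on area name, keeping only areas present in both."""
    shared = sorted(set(sentiment_by_area) & set(deprivation_by_area))
    sentiment = [sentiment_by_area[a] for a in shared]
    deprivation = [deprivation_by_area[a] for a in shared]
    return shared, sentiment, deprivation


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Made-up placeholder values, NOT real Twitter or census figures.
sentiment = {"area_a": 0.6, "area_b": 0.2, "area_c": -0.1, "area_d": 0.4}
deprivation = {"area_a": 10.0, "area_b": 30.0, "area_c": 45.0, "area_e": 20.0}

areas, s, d = overlay(sentiment, deprivation)
r = pearson(s, d)
print(areas, round(r, 3))
```

Only areas that appear in both datasets are compared, which is why `area_d` and `area_e` drop out here. A real analysis would also need to consider sample sizes and how tweets are assigned to areas.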

Cecilia Mascolo: Computer Lab (http://www.cl.cam.ac.uk/~cm542/)

a. Cecilia is a researcher looking into office and mobile sensing data: how people use their mobiles, when and to whom they chat, and what can be understood about emotions from microphones and mobile phone applications (often free apps demonstrate a strong locational bias).

b. Geospatial data, eg from FourSquare (a free app that helps you and your friends make the most of where you are), can also be made public via Twitter, and therefore it’s possible to map people at times across the city to see how the city functions.

c. You can also map the number of times certain words crop up in mobile conversations, eg by mapping phone conversations in Chicago she found that the majority of people mentioned ‘lake’ whilst next to Lake Michigan.

d. But this also has a serious side – mapping the density of mobile phone speech can help to decide the optimum location for mobile phone masts and hotspots.

Presentation: https://dl.dropbox.com/u/18143875/bigdata712.pdf

Richard Hall: IT Applications Architect, Open Data Group, Cambridgeshire County Council.

a. ‘Open data is expected to deliver Euros 40bn boost to the EU economy each year’ (EU press release, 12/12/2011).

b. Cambs County Council publishes to the Government’s open data portal at www.data.gov.uk and http://www.cambridgeshire.gov.uk/council/access/opendata/default.htm

c. One of the main obstacles is getting data users used to the fact that there’s often another application for their data.

d. Open Data can address the concern in public organizations about large and expensive IT systems.

e. Cambs County Council would like to buy innovative services and products too from open data apps; Open Data can enable SMEs to sell better products and services to Governments and Local Authorities.

Examples of Cambridgeshire County Council using open data:

· Real-time bus data: CCC has been interested in publishing real-time bus data (open source Cambridge University minibus app) which will be scaled to work with the Traveline Next-buses API (application programming interface). This app will show you where and when the next buses are due to arrive, nearest to your location in real-time. User experience is becoming more important in app development – so that time isn’t wasted queuing in the rain for buses that don’t arrive.

· Cultural and venue data: Culture Hack East (an event in June 2012 which was a ‘36 hour hack, working in teams to create fresh ideas and prototypes with newly released arts and culture data’). One project aimed to understand the supply and demand for cultural events, which in turn could improve the targeting for arts funding. If you overlay attendee information with MOSAIC (social data) information and then venue information, you can show where an audience is coming from and which venues are most popular. Connectivity, eg using FourSquare, is key for venues. See http://www.apropos-site.com/2012/06/18/culture-hack-east/ for a write-up of many of the projects including VenueData.org
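The overlay of attendee, social and venue data described in the cultural and venue example above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the Culture Hack East project's code: the record fields, venue names, postcode districts and the `segments` lookup (a stand-in for a MOSAIC-style classification, whose real format is not given here) are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical attendance records: which venue, and where the attendee lives.
attendances = [
    {"venue_id": "v1", "home_district": "CB1"},
    {"venue_id": "v1", "home_district": "CB1"},
    {"venue_id": "v1", "home_district": "PE1"},
    {"venue_id": "v2", "home_district": "CB4"},
    {"venue_id": "v2", "home_district": "ZZ9"},
    {"venue_id": "v3", "home_district": "CB1"},  # venue not in the venue list
]

# Hypothetical venue list and district-to-segment lookup.
venues = {"v1": "Venue One", "v2": "Venue Two"}
segments = {"CB1": "segment_x", "CB4": "segment_y", "PE1": "segment_z"}


def summarise(attendances, venues, segments):
    """Count attendances per venue and, per venue, per (district, segment)."""
    popularity = Counter()
    origins = defaultdict(Counter)
    for record in attendances:
        name = venues.get(record["venue_id"])
        if name is None:
            continue  # skip attendances at venues we have no data for
        district = record["home_district"]
        segment = segments.get(district, "unknown")
        popularity[name] += 1
        origins[name][(district, segment)] += 1
    return popularity, origins


popularity, origins = summarise(attendances, venues, segments)
print(popularity.most_common())      # venues ordered by attendance
print(dict(origins["Venue One"]))    # where Venue One's audience comes from
```

Keeping the three sources as separate lookups means any one of them can be refreshed independently as new data is released.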

3. Policy and policy implications of open data

Catherine Stanger, Stanger Consulting (http://efficiencyandtransformation.co.uk/)

June’s White Paper basically said ‘we’re putting the data out there’ (Sec 1.3), which has implications for managers in the public sector. But it’s very important to demonstrate the need for and value of this data to the user.

GIS stuff: Does it exacerbate the rural / urban divide? Lots of modelling of cities, but what about villages? Try googling your home village’s services and see the quality of data that is returned. So an important point has to be made about the interpretation of the data from a user perspective.

In practice:

1. Transport data has developed around need eg personal trips round London (‘walkit’), and it’s easy to see the benefit some of these bring.

2. Public expenditure over £500 is published so the public can access it… but why? Catherine looked herself up on it to see who the competition were, who the ‘big 4’ were etc, but is it actually of any use?

Data has to be contextual (relevant) and current, and these points were not included in the recent White Paper. The implications are:

1. Training – what is it used for: Don’t congratulate people on publishing large amounts of data; understanding it is crucial. Training is needed – don’t just put it out there! Not just policy makers but anyone implementing measures.