ESSnet Big Data

Specific Grant Agreement No 2 (SGA-2)

https://webgate.ec.europa.eu/fpfis/mwikis/essnetbigdata

http://www.cros-portal.eu/......

Framework Partnership Agreement Number 11104.2015.006-2015.720

Specific Grant Agreement Number 11104.2016.010-2016.756

Work Package 6

Early estimates of economic indicators

Coordination meeting, Ljubljana, 4-5 October 2017

Minutes

Version 2017-10-16

ESSnet co-ordinator:

Peter Struijs (CBS, Netherlands)

Telephone: +31 45 570 7441

Mobile phone: +31 6 5248 7775

Participants:

-  CBS: Marco Puts

-  OSF: Henri Luomaranta

-  ISTAT: Alessandra Righi

-  INE: Pedro Campos

-  GUS: G Grygiel

-  SURS: Tomaž Špeh, Boro Nikić, Manca Golmajer, Luka Zupanc, Črt Grahonja, Simona Peceli, Vesna Horvat

1  Welcome and introduction

Tomaž and Boro welcomed the participants and presented the agenda of the meeting.

The main goal of WP6 is to produce at least one early estimate of an economic indicator using a Big Data source.

The main objectives of the meeting were:

- Overview of work done in SGA-1 and by countries in SGA-2 (state of the art)

- Discuss Big Data sources and “traditional sources” which could be combined for the purpose of estimating early economic indicators

- Discuss economic indicators for which early estimates could be calculated using Big Data sources

- Introduce the pilot on early estimates of GDP (Slovenian case: Big Data and other sources, data preparation, combining data, models, estimates and precision of estimates, estimates of GDP components)

- Discuss methods for nowcasting

- Discuss quality measures (with emphasis on precision of estimates)

- SGA-2 planning (actions; country by country, deliverables…)

The participants introduced themselves and their roles within WP6.

2  Review of work done in SGA-1 and SGA-2 (so far)

Italy (by Alessandra Righi):

-  In Italy the preliminary estimate of GDP is officially released at T+45 days and is compiled using the same sources and methods as the full compilation of quarterly national accounts (released at T+60)

-  Review of quality criteria for GDP flash estimates:

o  Accuracy: timely, consistent and coherent estimates of macroeconomic variables (e.g. GDP) should be published that accurately represent productive activity in the economy

o  Limited availability of data, especially for the third month of the quarter

o  The coverage of the source data used to compile national GDP improves significantly in subsequent estimates, so the earlier estimates require more econometric modelling

o  Quality issues: coverage, information available, estimation method, unbiasedness of the estimates (revisions should not systematically go in the same direction), maximum average absolute revision

o  Seasonally adjusted data are revised both because the unadjusted data are revised and because the seasonal pattern is better estimated/identified.

-  The Big Data source used for nowcasting is electronic payment data (in collaboration with the Bank of Italy). The data include transactions by direct debit, payment card, cheque and credit transfer.

-  Analyses of electronic payment data have already been carried out: graphical comparisons of payment flows with GDP, with household consumption, with value added in the services sector and with gross fixed investment.

-  ISTAT aims to improve flash estimates using electronic payment data from the BCE System of Payments. The target variables for the flash estimates are IPI by NACE and turnover by NACE.
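The revision-related quality criteria in the Italian review (unbiasedness, i.e. revisions not systematically going in the same direction, and the average absolute revision) can be computed directly from pairs of earlier and later estimates of the same quantity. The following is a minimal Python sketch, assuming a revision is defined as the later estimate minus the preliminary estimate; the function `revision_stats` and the example figures are hypothetical and not taken from ISTAT's practice.

```python
from statistics import mean


def revision_stats(preliminary, later):
    """Summarise revisions between preliminary and later estimates (hypothetical helper).

    Assumed definition: revision_t = later_t - preliminary_t.
    A mean revision close to 0 suggests no systematic bias; the mean absolute
    revision measures the typical size of revisions regardless of direction.
    """
    if len(preliminary) != len(later) or not preliminary:
        raise ValueError("need two non-empty series of equal length")
    revisions = [l - p for p, l in zip(preliminary, later)]
    return {
        "mean_revision": mean(revisions),
        "mean_abs_revision": mean(abs(r) for r in revisions),
        "max_abs_revision": max(abs(r) for r in revisions),
        # share of periods revised upwards; values near 0 or 1 hint at bias
        "share_upward": sum(r > 0 for r in revisions) / len(revisions),
    }
```

For example, `revision_stats([0.4, 0.3, 0.5, 0.2], [0.5, 0.3, 0.4, 0.4])` gives a mean revision of about 0.05 and a mean absolute revision of about 0.1. A formal bias test on the mean revision would be a natural extension.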

Finland (by Henri Luomaranta):

-  In SGA-1, nowcasting of Finnish turnover indexes using firm-level data was completed; implementation in the turnover index production system is ongoing.

-  In SGA-2 the plan is to:

o  apply the results/methods to other indicators (GDP, Trend Indicator of Output)

o  expand the data sources

o  explore new methodologies that work with high-dimensional problems

-  Three different nowcasting processes were presented. In general, a process consists of four steps: building training and test sets; training, validating and selecting models; applying the models; and evaluating model performance.

-  Methods used were OLS, shrinkage regressions, factor models and non-linear models (boosting, decision trees, neural nets). A large number of different models are tested, and the best-performing ones are selected through a machine learning process to form the ensemble used for the actual nowcasts.

-  Data processing is done in SAS and models are built in R using the caret package. Before the optimal ensemble is chosen, all models available in the caret package are tested.
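The four-step process described for Finland can be sketched as follows. This is not the Finnish implementation (which uses SAS and the R caret package with many model families); as a simplifying assumption, the candidate models here are only ridge regressions with different penalties (a penalty of 0 gives OLS), and the "ensemble" is the average of the k candidates with the lowest validation RMSE. All function names are hypothetical.

```python
import numpy as np


def fit_ridge(X, y, lam):
    """Closed-form ridge regression with an unpenalised intercept (lam=0 gives OLS)."""
    Xc = np.column_stack([np.ones(len(X)), X])
    penalty = lam * np.eye(Xc.shape[1])
    penalty[0, 0] = 0.0  # do not shrink the intercept
    return np.linalg.solve(Xc.T @ Xc + penalty, Xc.T @ y)


def predict(beta, X):
    return np.column_stack([np.ones(len(X)), X]) @ beta


def rmse(y, yhat):
    return float(np.sqrt(np.mean((y - yhat) ** 2)))


def nowcast_ensemble(X, y, lambdas, n_train, n_valid, k=2):
    """Split, train/validate/select, apply, evaluate (rows assumed in time order)."""
    # Step 1: time-ordered training, validation and test sets
    tr = slice(0, n_train)
    va = slice(n_train, n_train + n_valid)
    te = slice(n_train + n_valid, len(y))
    # Step 2: train each candidate on the training set, score it on the validation set
    scored = []
    for lam in lambdas:
        beta = fit_ridge(X[tr], y[tr], lam)
        scored.append((rmse(y[va], predict(beta, X[va])), lam))
    best = [lam for _, lam in sorted(scored)[:k]]
    # Step 3: refit the selected models on train+validation and average their nowcasts
    fit_rows = slice(0, n_train + n_valid)
    preds = [predict(fit_ridge(X[fit_rows], y[fit_rows], lam), X[te]) for lam in best]
    ensemble = np.mean(preds, axis=0)
    # Step 4: evaluate the ensemble on the held-out test periods
    return best, ensemble, rmse(y[te], ensemble)
```

Keeping the rows in time order means the validation and test periods always come after the training period, which mimics the real nowcasting situation better than a random split.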

Portugal (by Pedro Campos):

-  GDP statistics in Portugal are published at:

o  T+45: chain (quarter-on-quarter) and year-on-year rates

o  T+60: GDP components

o  T+85: detailed information (including regional disaggregation)

-  Domain estimation is used to estimate the unemployment rate, poverty indicators and GDP by metropolitan area.

-  Methods used for domain (small area) estimation: synthetic regression estimator, Empirical Best Linear Unbiased Predictor (EBLUP).

-  Big Data sources that could be used for nowcasting are smart meters measuring energy consumption and traffic sensors at the border with Spain.
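The synthetic regression estimator mentioned for Portugal fits one regression on the whole sample and applies it to the known mean of the auxiliary variables in each domain: estimated domain mean = intercept + (domain auxiliary means) x (estimated slopes). A minimal Python sketch, assuming unweighted OLS (survey weights would normally be used) and a dictionary of known domain means; the function name is hypothetical. The EBLUP, which adds a domain-level random effect to combine direct and synthetic estimates, is not shown.

```python
import numpy as np


def synthetic_regression_estimates(X_sample, y_sample, domain_means):
    """Synthetic regression estimator of domain means (hypothetical sketch).

    X_sample:     (n x p) auxiliary variables for the sampled units
    y_sample:     (n,) target variable for the same units
    domain_means: dict mapping domain name -> known population means of the p auxiliaries
    One regression is fitted on the whole sample (unweighted OLS, an assumption)
    and then applied to every domain, including domains with few or no sampled units.
    """
    Xc = np.column_stack([np.ones(len(X_sample)), X_sample])
    beta, *_ = np.linalg.lstsq(Xc, y_sample, rcond=None)
    return {d: float(beta[0] + np.dot(xbar, beta[1:])) for d, xbar in domain_means.items()}
```

Because all domains share the same coefficients, the estimator is stable for small domains but biased when a domain departs from the overall relationship; this trade-off is what the EBLUP addresses.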

Netherlands (by Marco Puts):

-  CBS set up the Center for Big Data Statistics to research the use of Big Data for producing new statistics

-  The first official statistics were produced from traffic loop data

-  CBS also reversed the idea, using economic indicators to predict traffic density with linear regression models

-  Public comments on social media could be used to "harvest" insight into changes in consumer confidence before the results of more formal surveys are published

-  CBS has started to disseminate experimental statistics (beta products under development)

3  Big data sources

A list of (potential) Big Data sources for estimating early economic indicators was drawn up. The sources considered are:

-  Traffic loops data

-  Border crossing traffic sensor data

-  Electronic payment data

-  Smart meters (for energy consumption)

-  Satellite image data (for the amount of emissions in the atmosphere)

-  Social media (economic sentiment indicator can be used as a regressor)

-  Web scraped data on job vacancies

4  Early economic indicators

Participating countries largely share similar ideas about which early economic indicators could be estimated. The following list of (potential) economic indicators was compiled:

-  GDP (aggregated)

-  STS indicators (Industry (IPI), Services, Retail trade, Wholesale, Construction)

-  CPI (inflation)

-  External trade

-  Building permits

5  Early estimates of GDP at SURS

SURS described its work on nowcasting statistical indicators, from data acquisition (traffic loop data) to the end results for GDP and for a known GDP correlate, the Industrial Production Index (IPI). Using linear regression and principal component analysis (PCA) together with traffic loop data could produce accurate estimates of some early economic indicators. Data acquired from traffic sensors are used as a primary or secondary regressor in a linear regression model for nowcasting GDP 45 days after the end of the reference period.

A nowcast is a rapid estimate of a hard economic variable of interest, produced by a statistical authority or by an institution outside the statistical system during the current reference period T (or very close to its end, "the present"), for the same reference period T. A nowcast makes use of all information that becomes available between T-1 and T up to the time of estimation. It relies on statistical or econometric models different from those used in the regular production process, using hard, soft, unconventional and/or financial data.
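Because a nowcast may only use information available at the estimation time, the traffic regressor for the current quarter has to be built from the days observed so far. A minimal Python sketch, assuming daily vehicle counts are held in a dictionary keyed by `datetime.date` (a hypothetical input format, not the SURS data layout); it returns the average daily count per quarter together with the number of days used, so that partially observed quarters can be recognised.

```python
from datetime import date


def quarter_of(d):
    """Return (year, quarter) for a date."""
    return (d.year, (d.month - 1) // 3 + 1)


def quarterly_traffic_regressor(daily_counts, estimation_date):
    """Average daily vehicle count per quarter, using only days up to estimation_date.

    daily_counts: dict mapping datetime.date -> vehicle count (hypothetical format)
    Returns {(year, quarter): (mean_daily_count, days_used)}, sorted by quarter.
    """
    sums, days = {}, {}
    for d, count in daily_counts.items():
        if d > estimation_date:  # not yet available at the time of estimation
            continue
        q = quarter_of(d)
        sums[q] = sums.get(q, 0) + count
        days[q] = days.get(q, 0) + 1
    return {q: (sums[q] / days[q], days[q]) for q in sorted(sums)}


if __name__ == "__main__":
    counts = {date(2017, 6, 30): 100, date(2017, 7, 1): 200, date(2017, 7, 2): 300}
    print(quarterly_traffic_regressor(counts, date(2017, 9, 30)))
```

Averaging over observed days (rather than summing) keeps a partially observed quarter roughly comparable with complete quarters, although within-quarter seasonality can still bias such comparisons.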

6  Demo sessions

SURS presented its first pilot IT solution for nowcasting economic indicators, from data acquisition (data from traffic sensors on Slovenian roads) to the end results for GDP and the IPI (Industrial Production Index). The presentation showed how the usefulness of such data can be demonstrated and what had to be done before the data could actually be used. The traffic data have been used as a secondary regressor in a linear regression in which GDP or the IPI was the dependent variable and the primary regressors were the first few principal components of turnover microdata for industrial enterprises. The traffic sensor data, the nowcasting methods and the IT solution were described to make it easier for partners to use the IT solution for nowcasting additional economic indicators with their own data and models.
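The model structure described above (first few principal components of enterprise turnover as primary regressors, traffic data as a secondary regressor, GDP or IPI as the dependent variable) can be sketched in Python with NumPy. This is an illustration under stated assumptions, not the SURS IT solution: the matrix layout, the standardisation step, the default of two components and the function name `nowcast_latest` are all hypothetical.

```python
import numpy as np


def nowcast_latest(turnover, traffic, y_known, n_components=2):
    """Nowcast the target for the last period (hypothetical sketch).

    turnover: (T x N) enterprise turnover, one row per period; the last row
              is the period being nowcast
    traffic:  (T,) traffic indicator for the same periods (secondary regressor)
    y_known:  (T-1,) target values (e.g. GDP or IPI) for all but the last period
    """
    # Standardise each enterprise's series; guard against constant columns
    sd = turnover.std(axis=0)
    sd[sd == 0] = 1.0
    Z = (turnover - turnover.mean(axis=0)) / sd
    # Principal component scores via SVD (rows of Vt are the loading vectors)
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    pcs = Z @ Vt[:n_components].T
    # Design matrix: intercept, primary regressors (PCs), secondary regressor (traffic)
    X = np.column_stack([np.ones(len(traffic)), pcs, traffic])
    # OLS on the periods where the target is already known
    beta, *_ = np.linalg.lstsq(X[:-1], y_known, rcond=None)
    # Apply the fitted model to the latest period
    return float(X[-1] @ beta)
```

Compressing many enterprise series into a few components keeps the regression small when the number of periods is short, which is the usual situation for quarterly GDP.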

7  Quality measures

Problem 1:

The main issue is how to assess a nowcast: how close should the estimate be to the current official statistic for a given economic indicator?

To know how good an estimate is, one has to know the true value. The first problem, however, is that economic indicators such as GDP are not tangible quantities but concepts.

The group did not reach a conclusion on the precision (accuracy) at which an estimate is good enough to be disseminated.
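Whichever precision threshold is eventually agreed, assessing nowcasts requires error measures against some benchmark. A minimal Python sketch, assuming the official estimate published later is used as the benchmark (with the caveat above that the "true" value is itself a concept); `nowcast_accuracy` is a hypothetical helper that reports the mean absolute error, root mean squared error, mean error (bias) and the share of periods in which the nowcast has the same sign of growth as the benchmark.

```python
import math


def nowcast_accuracy(nowcasts, official):
    """Compare nowcasts with official estimates of the same growth rates (hypothetical helper)."""
    if len(nowcasts) != len(official) or not nowcasts:
        raise ValueError("need two non-empty series of equal length")
    errors = [n - o for n, o in zip(nowcasts, official)]
    k = len(errors)
    return {
        "mae": sum(abs(e) for e in errors) / k,
        "rmse": math.sqrt(sum(e * e for e in errors) / k),
        "mean_error": sum(errors) / k,  # systematic over- or underestimation
        # share of periods where nowcast and benchmark agree on growth vs. no growth
        "direction_hit_rate": sum((a > 0) == (b > 0) for a, b in zip(nowcasts, official)) / k,
    }
```

RMSE penalises occasional large misses more than MAE, so reporting both gives a fuller picture when discussing an acceptable precision.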

Problem 2:

Two representatives of WP8 (Marco Puts, Vesna Horvat) were present at the meeting. We discussed what kind of contribution WP8 expects from the other work packages (specifically WP6).

Some examples on quality, methodology and IT were given. The main quality-related topics mentioned were processing errors, measurement errors, model error and precision, and quality when combining sources. However, WP6 combines data mostly at the aggregate level rather than at the micro level.

As WP8 will soon expect contributions from the other work packages, it was agreed that WP6 will use the findings of Italy, which has done a lot of work in the area of quality.

8  SGA-2 planning (actions)

Review and discussion of Big Data and other sources suitable for estimating early economic indicators.

Outcome: final decision on the sources to use

Review and discussion of early economic indicators which could be tested during SGA-2.

Outcome: decision (country by country, if possible) on the set of early economic indicators to be tested

Work on early estimates of GDP (SURS pilot):

·  IT application (method)

·  Data processing and estimates

Other countries will test the code on their own data, using both our methods and their own methods, and will try to evaluate other indicators.

Outcome: Ideas for improvement

Methods for nowcasting. Review and discussion.

Outcome: determination of methods which could be tested (implemented)

Quality measures with focus on precision criteria and validation of process and results.

Outcome: Proposals of quality indicators

Meeting agenda

Wednesday, October 4th / Item / NSI
10:00-10:15 / Welcome and introductions / Slovenia, All
10:15-10:30 / Objectives of the meeting / Slovenia
10:30-11:00 / Review of work done in SGA-1 and SGA-2 (so far) / Slovenia
11:00-11:20 / Coffee break
11:20-12:40 / Country presentations / Slovenia, Finland, Netherlands, Italy
13:40-14:30 / Country presentations / Portugal, Poland?
14:30-15:00 / Big Data sources / All
15:00-15:30 / Early estimates / All
15:30-15:50 / Coffee break
15:50-17:00 / Early estimates of GDP / Slovenia
17:00-17:15 / Conclusions of the first day / Slovenia, All
Social dinner

Thursday, October 5th / Item / NSI
9:00-10:00 / Combining traffic loops data and survey data for purposes of early estimates / Slovenia
10:00-11:00 / Methods for nowcasting of early estimates / Slovenia, All
11:00-11:20 / Coffee break
11:20-12:20 / Practical work / All
12:20-13:20 / Lunch break
13:20-14:20 / Quality measures / All
14:20-14:50 / SGA2- planning (actions) / All
14:50-15:10 / Conclusion of the meeting / Slovenia