Prediction in science: how big data does – and doesn’t – help

(2717 words, excl. footnotes and references)

Abstract (147 words)

Have big data methods revolutionized our ability to predict field phenomena, i.e. phenomena outside of the laboratory? When are their predictions successful? I draw on three case studies – of weather forecasting, election prediction, and GDP forecasting – to go beyond existing philosophy of science work. Their overall verdict is mixed. Two factors are important for big data prediction to succeed: underlying causal processes must be stable, and the available dataset must be sufficiently rich. However, even satisfying both of these conditions does not guarantee success. Moreover, the case studies also illustrate some of the reasons why the conditions may not be satisfied in any given case. A final lesson is that when predictive success is achieved, it is local and hard to extrapolate to new contexts. More data has not countered this trend; if anything, it exacerbates it. There is reason to think this true of field cases generally.

Synopsis (744 words)

Has the rise of big data, i.e. of data-intensive science, revolutionized our ability to predict, suggesting in turn that science should now emphasize prediction over explanation? Many have claimed so. Accurate prediction has long been possible in the context of a controlled laboratory or an engineered artefact. But, thanks to big data methods, it is now possible for many field phenomena too, i.e. for uncontrolled phenomena away from the laboratory.

But how much difference has big data actually made? And what factors have influenced that? I draw on three case studies, using them to go beyond existing philosophy of science work: weather forecasting (AUTHOR forthcoming), election prediction (AUTHOR 2015), and GDP forecasting (Betz 2006).

1) Weather. Weather forecasting has improved significantly: the reliability of seven-day forecasts today is equal to that of three-day forecasts 20 years ago. Several factors explain this progress: improvements in the quality and quantity of data (forecasters now make use of over 10 million observations per day); better models; new analytical techniques, especially ensemble methods, i.e. the running of multiple simulations to generate probabilistic forecasts; and greatly increased computing power. So, more data is part, although not all, of the explanation. But even with still more data (and computing power, etc), still only probabilistic ensemble forecasts would be possible. How accurate these forecasts could become, how many days ahead, is unknown.

2) Elections. There have been two main approaches to predicting elections: using ‘fundamentals’ such as growth in GDP, jobs or real incomes; or using opinion polls. It turns out that polls predict much better than fundamentals do – although, as several recent elections have shown, still not perfectly. What role for big data? More data has clearly helped: pollsters’ predictions have improved. But yet more data will likely not help as much in the future. First, there is a limited sample of past election results, which is the relevant effect variable, and this inevitably limits the scope for training predictive models. Second, methods that predict well in one election do not necessarily predict well in the next. The model of voter turnout that best predicted the UK general election in 2015, for instance, was not the one that best predicted the election in 2017.

3) GDP. GDP prediction has proved very difficult. One naïve benchmark is to assume that GDP growth will be the same next year as this year. Currently, forecasts for 12 months ahead barely outperform this benchmark; forecasts for 18 months ahead don’t outperform it at all. The record shows little or no sustained difference in the success of different forecasters despite widely varying methods. Moreover, the forecasting record has not improved over the last 50 years despite vast increases in available data and computing power. The induction is that more data will not improve matters here, unlike in the weather case.

What lessons do these case studies teach? First, that two conditions are important to predictive success:

1) Underlying causal processes must be sufficiently stable, else there will be no stable correlations for big data predictions to exploit. (This problem hampers prediction of GDP and elections.)

2) The available dataset must be sufficiently rich, featuring all relevant configurations of cause and effect variables. (Prediction of elections is hampered by this issue too.)

Cases of successful prediction, such as weather forecasting, satisfy both these conditions.

The case studies teach us more. Even satisfying the stability and richness conditions does not yet guarantee success. Other problems can still intrude, such as measurement error or a system being chaotic. Moreover, the case studies also illustrate some of the possible causes of the stability and richness conditions not being satisfied in the first place. These include a system being open, thus threatening the stability condition because of unpredictable disruption by external factors, or a system being reflexive, thus threatening stability by allowing looping effects.

There is one final methodological lesson. In all of our cases, it is not generalizable theory that predicts successfully. Rather, the weather models require many ad hoc adjustments that go beyond, or even contradict, basic Newtonian theory; the ‘fundamentals’ models of elections are out-predicted by opinion polling; and theory-based forecasts of GDP fare no better than those drawn from other methods. As a result, predictive success when it is achieved is local and hard to extrapolate to new contexts. Moreover, more data has not countered this anti-theory trend; if anything, it exacerbates it. There is reason to think this will be true of field cases generally.

Full text

1. Introduction: big data and prediction

Does the rise of data-intensive science, or ‘Big Data’, mandate big changes in scientific method? In particular, has it revolutionized our ability to predict, suggesting that we should now emphasize prediction over explanation? Many have argued so: recent technology, reinforced by commercial imperatives, has greatly increased both the amount of data collected and our ability to process it. As a result, accurate prediction is now often possible where previously it was not. In particular, while accurate prediction has long been possible in the context of a controlled laboratory setting or an engineered artefact, it is claimed that it is now newly possible too for many field phenomena, i.e. for uncontrolled phenomena ‘in the field’ away from the laboratory. Examples include: Amazon’s personalized suggestions of new purchases; predictions of which manhole covers will blow or which rent-controlled apartments will have fires in New York City, or of how to get the cheapest airline tickets (Mayer-Schoenberger and Cukier 2013); the discovery of the CRISPR technology for genome editing in living eukaryotic cells (Lander 2016); Facebook and Google’s experiments regarding page design and marketing; and many others. New analytical techniques have developed in tandem, most notably various forms of machine learning and algorithmic methods. Neural nets, for example, are behind rapid recent advances in natural-language translation (Lewis-Kraus 2016).

But does big data really revolutionize prediction of field phenomena? And what factors determine when it does – and when it doesn’t? I draw on three existing studies (already done for other purposes) of field cases of independent interest, concerning: weather forecasting (AUTHOR forthcoming), election prediction (AUTHOR 2015), and GDP forecasting (Betz 2006). How have big data methods impacted on predictive success each time?[1]

2. First example: Weather

Earth’s weather system is widely believed to be chaotic: in other words, weather outcomes are believed to be indefinitely sensitive to exact initial conditions (Lorenz 1969). Moreover, it has also been argued recently that weather predictions are indefinitely sensitive to model errors too – that is, even tiny inaccuracies in a model can lead to very large errors in the predictions made by that model (Frigg et al 2014). These difficulties would seem to bode ill for the prospects of accurate weather prediction. Yet, despite them, forecasting accuracy has improved significantly over recent decades. Hurricane paths, for instance, are predicted more accurately and further ahead, and temperature and rainfall predictions are more accurate too. Overall, the reliability of seven-day forecasts now is equal to that of three-day forecasts 20 years ago (Bechtold et al 2012).

What explains this tremendous progress? Several factors together. The first is the available data: there has been a huge improvement in its quality and quantity, stemming initially from the launch of the first weather satellites in the 1960s. Today, there are temperature, humidity and other reports of ever greater refinement both horizontally (currently increments of 20km squares) and vertically (currently 91 separate altitude layers). Over 10 million observations per day are inputted into the models of leading forecasters.

The second factor is the forecasting models. At the heart of these models are differential equations that have been known for hundreds of years, namely the equations of fluid dynamics derived from Newton’s laws. These laws are assumed to govern the fiendishly complex movements of air in the atmosphere, and how those are impacted by temperature, pressure, the Earth’s rotation, the cycle of night and day, and so on. So far as is known, this fundamental theory remains a true description of the weather system (or at least as approximately true as any other Newtonian model). However, in practice it is not sufficient to generate accurate forecasts, and refining weather models from first principles is not an effective remedy. Instead, a whole series of additions have had to be made in order to accommodate the impacts of various specific factors, such as mountains, clouds, rainwater run-off, and the coupling of air movements and ocean currents. The exact form that these additions should take is determined by a trial-and-error process. The ones finally adopted are under-determined by fundamental theory, and indeed sometimes contradict it.[2]

Third, new analytical methods have also been developed. The most notable innovation dates from the late 1990s, when models began to feature stochastic terms. This enabled the introduction and refinement of the ensemble method of forecasting: multiple simulations are run, generating probabilistic forecasts. In turn, this has overcome the problem of chaos: in particular, although any one simulation inevitably risks going seriously askew because of an arbitrarily small error in the inputted initial conditions, it has been found from experience that, as in many chaotic systems, these errors cancel out over many iterations. That is, the errors are not systematically in one particular direction, and so the probabilistic forecasts derived from the ensemble method are not biased.
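
The ensemble idea can be sketched with a toy chaotic system. Here the logistic map merely stands in for a real atmospheric model, and the parameter values, perturbation size, and threshold are all illustrative assumptions, not features of any operational forecasting system:

```python
import random

def logistic_step(x, r=3.9):
    # One step of the logistic map, a standard toy chaotic system
    # (a stand-in here for a real weather model).
    return r * x * (1 - x)

def ensemble_forecast(x0, steps, n_members=1000, noise=1e-4, seed=0):
    """Run many simulations from slightly perturbed initial conditions
    and return the ensemble of final states."""
    rng = random.Random(seed)
    finals = []
    for _ in range(n_members):
        # Perturb the observed initial condition within measurement error.
        x = min(max(x0 + rng.uniform(-noise, noise), 0.0), 1.0)
        for _ in range(steps):
            x = logistic_step(x)
        finals.append(x)
    return finals

# A probabilistic forecast: the estimated chance that the final state
# exceeds some threshold of interest.
ensemble = ensemble_forecast(x0=0.2, steps=50)
prob_high = sum(x > 0.5 for x in ensemble) / len(ensemble)
```

Although no single trajectory can be trusted (tiny initial differences are amplified by the chaotic dynamics), the distribution of outcomes across the ensemble supports a meaningful probabilistic forecast.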

Fourth, available computing power has hugely increased. This enables ever more complex models to be used, ever more simulations to be run in timely fashion, and thus the vastly increased quantity of data to be exploited.

These four sources of progress have interacted with each other in several ways. The ensemble method of forecasting was not feasible until sufficient computing power became available, for instance. The increase in data and computing power have together enabled the development and exploitation of more sophisticated models. And experience of what kind of data most improves the accuracy of models’ predictions has influenced the choice of instruments on new satellites.

We may now return to our main topic: what role has big data played? Weather forecasting’s improvement is indeed in part due to exploitation of more and better data. On the other hand: first, this improvement is not due to data alone, as we have seen; and second, it is only of limited extent anyway. Thus, even now forecasts more than seven days ahead cannot beat the naïve baselines of long-run climate averages or simple extrapolation from current conditions; more data is not a panacea.

How much could prediction improve with even more data? Any such improvement would require more and more physical instruments to collect this new data, more and more computing power to process it, and unknown future changes in modelling and analytical methods. But even ignoring that, if the weather system is indeed chaotic, still only probabilistic ensemble forecasts would ever be possible. How accurate could such forecasts become, how far in advance? That is unknown. Overall, the verdict is that data has helped with prediction up to a point, and more data might help more, but not unlimitedly.

3. Second example: Elections

I will discuss here two different approaches to predicting the results of political elections. The first approach is to use models of ‘fundamentals’: there is a literature in political science that models past election results in order to predict future ones on the basis of variables that recur from election to election, most commonly economic ones such as growth in GDP, jobs or real incomes.

The second, very different, approach is opinion polling. An important part of polling methodology is the need to adjust raw averages of survey responses. Most well-known is the need for demographic rebalancing: results will be biased if a sample is unrepresentative of the voting population with respect to factors such as age, income, race, or sex. Pollsters must decide exactly which such factors to allow for: should one rebalance, for instance, for interest in politics or for declared political affiliation? Mistaken treatment of these latter factors has been the source of errors in recent US and UK election polling. And demographic rebalancing is not the only source of pertinent methodological decisions. Others include: how hard and in what way to push initially undecided respondents for their opinions; how hard and in what way to pursue respondents who decline to participate; whether to sample face-to-face, by phone, or online, and (in the latter cases) whether to interview or to let respondents fill out answers alone; how to assess how firmly held a respondent’s preference is; and how to assess the likelihood that a respondent will actually vote. Exactly how pollsters tackle these issues, especially the last one, has been shown to influence the accuracy of their predictions significantly.
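
The mechanics of demographic rebalancing are simple; the hard methodological choices concern which factors to weight on. A minimal sketch, with made-up groups and numbers chosen purely for illustration:

```python
from collections import Counter

def rebalance(responses, population_shares):
    """Weight each response by (population share / sample share) of its
    demographic group, then return the weighted support for a candidate.
    `responses` is a list of (group, supports_candidate) pairs."""
    sample_counts = Counter(group for group, _ in responses)
    n = len(responses)
    weighted_support = total_weight = 0.0
    for group, supports in responses:
        # Groups over-represented in the sample are down-weighted,
        # under-represented groups are up-weighted.
        weight = population_shares[group] / (sample_counts[group] / n)
        total_weight += weight
        if supports:
            weighted_support += weight
    return weighted_support / total_weight

# Toy sample that over-represents young voters (60% vs 40% in the population)
sample = [("young", True)] * 60 + [("old", False)] * 40
adjusted = rebalance(sample, {"young": 0.4, "old": 0.6})  # raw support is 0.60
```

In this toy example the raw sample shows 60% support, but rebalancing to the assumed population shares pulls the estimate down to 40% – which is exactly why the choice of rebalancing factors matters so much.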

How successful are the two approaches? In brief: opinion polls predict much better than fundamentals models do.[3] In some respects, polls predict the actions of millions of voters remarkably accurately. Nevertheless, as we have seen in several recent elections, they are not perfectly reliable.

What role for big data? First, clearly more data has helped: polling today predicts better than in the past, and this is in part due simply to there being more polling data to work with. (Arguably, there were no reliable political polls at all until after World War Two.) As with weather forecasting, improved analytical methods have also helped significantly. In addition to ‘internal’ issues such as demographic rebalancing, recently the systematic aggregation of polling results has also improved predictive accuracy significantly.
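
One simple form of aggregation is a sample-size-weighted average across polls. This is a sketch with invented figures; real aggregators also weight by factors such as recency and pollster track record:

```python
def aggregate_polls(polls):
    """Combine polls as a sample-size-weighted average.
    `polls` is a list of (support_share, sample_size) pairs."""
    total_n = sum(n for _, n in polls)
    return sum(share * n for share, n in polls) / total_n

# Three hypothetical polls of the same race
polls = [(0.52, 1000), (0.48, 500), (0.50, 1500)]
combined = aggregate_polls(polls)
```

Pooling polls in this way reduces the sampling noise of any single survey, though it cannot correct errors that all pollsters share.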

What would happen if even more data about voters’ preferences became available? It seems that predictive paradise would still remain elusive, for two reasons. First, there is only a limited sample of past elections, which is the relevant effect variable, and thus limited scope for training predictive models, regardless of the number of polls conducted. Second, what predicts well in one election, such as a particular method of demographic rebalancing, may not predict well in the next one. The model of voter turnout that best predicted the UK general election in 2015, for instance, was not the one that best predicted the election in 2017. Again, simply acquiring more data, in the form perhaps of larger samples or more detailed knowledge of individual respondents’ preferences or consumption patterns, does not resolve this issue.

Overall, thus, prediction of elections has improved, although it is still limited. More data has helped in the past, but increasing data further likely will not help as much in the future.

4. Third example: Gross Domestic Product

Predicting GDP growth has proved very difficult. One naïve benchmark is to assume that growth will be the same next year as this year. Currently, forecasts for 12 months ahead barely outperform this benchmark. Forecasts for 18 months ahead don’t outperform it at all. Forecasts also persistently fail to predict turning points, i.e. when GDP growth changes sign. In one study, across 60 cases of negative growth, the consensus forecast predicted negative growth on only three occasions.
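
The naïve benchmark is trivial to compute, which is what makes the comparison so striking. A sketch with invented growth figures (the numbers are purely illustrative, not real GDP data):

```python
def mean_abs_error(forecasts, actuals):
    # Average absolute gap between forecast and outcome.
    return sum(abs(f - a) for f, a in zip(forecasts, actuals)) / len(actuals)

def persistence_forecast(series):
    """Naive benchmark: predict that next year's growth equals this year's."""
    return series[:-1]  # the forecast for year t+1 is simply the value at year t

# Hypothetical annual GDP growth rates (%), purely illustrative
growth = [2.1, 2.4, 1.9, -0.5, 1.2, 2.0]
benchmark_mae = mean_abs_error(persistence_forecast(growth), growth[1:])
```

A professional forecast earns its keep only to the extent that its mean absolute error beats `benchmark_mae`; the record discussed above suggests that, beyond about 12 months, it typically does not.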

The record shows little or no sustained difference in the success of different forecasters despite widely varying methods. These different methods include: purely numerical extrapolations, informal and formal; non-theory-based economic correlations, informal (indicators and surveys) and formal (multivariate time series); and theory-based econometric models, which sometimes feature hundreds or even thousands of equations. Moreover, the forecasting record has not improved in 50 years despite vast increases in available data and computing power, not to mention much theoretical development too.