Decision-Making from Probability Forecasts using Calculations of Forecast Value

Kenneth R. Mylne

The Met. Office, Bracknell, UK

(To be submitted to Meteorological Applications)

'How do I make a decision based on a probability forecast?'


Abstract: A method of estimating the economic value of weather forecasts for decision-making is described. The method may be applied equally to either probability forecasts or deterministic forecasts, and provides a forecast user with a direct comparison of the value of each in terms of money saved, which is more relevant to users than most standard verification scores. For a user who wishes to use probability forecasts to decide when to take protective action against a weather event, the method identifies the optimum probability threshold for action, thus answering the question of how to use probability forecasts for decision-making. The system optimises decision-making for any probability forecast system, whatever its quality, and therefore removes any need to calibrate the probability forecasts. The method is illustrated using site-specific probability forecasts generated from the ECMWF ensemble prediction system and deterministic forecasts from the ECMWF high-resolution global model. It is found that for most forecast events and most users the probability forecasts have greater user value than the deterministic forecasts from a higher resolution model.

1. Introduction

A weather forecast, however skilful, has no intrinsic value unless it can be used to make decisions which bring some benefit, financial or otherwise, to the end user. Conventionally, in most weather forecast services the forecast provider supplies the user with their best estimate of whether a defined event will occur (e.g. wind speed will or will not exceed 15 m s⁻¹), or of a value for a measurable parameter (e.g. maximum wind speed = 18 m s⁻¹). Decision-making is often based on whether a defined event is expected to occur or not. For example, the owner of a small fishing boat may decide to seek shelter when the forecast wind speed exceeds 15 m s⁻¹.

The nature of atmospheric predictability is such that there is frequently significant uncertainty associated with such deterministic forecasts. Forecast uncertainty can be expressed in many ways, either qualitatively or quantitatively, and where such information is included in a forecast it can aid the decision-maker who understands the potential impact of a wrong decision. However, uncertainty is most commonly estimated subjectively by a forecaster; such estimates are often inconsistent, and may be affected by factors such as forecasters “erring on the safe side”, which may not lead to optimal decision-making. In recent years there has been considerable development of objective methods of estimating forecast uncertainty, notably ensemble prediction systems (EPS) such as those operated by the European Centre for Medium-Range Weather Forecasts (ECMWF) (Molteni et al., 1996; Buizza and Palmer, 1998) and the US National Centers for Environmental Prediction (NCEP) (Toth and Kalnay, 1993). Output from an EPS is normally in the form of probability forecasts, and there is growing evidence (e.g. Molteni et al., 1996; Toth et al., 1997) that these have greater skill than equivalent deterministic forecasts based on single high-resolution model runs, particularly on medium-range time-scales. To make use of this additional skill, the decision-maker needs to know how to respond to a forecast such as ‘There is a 30% probability that the wind speed will exceed 15 m s⁻¹.’ This paper describes a technique which estimates the economic value of a probability forecast system for a particular user, based on verification of past performance, and uses it to determine the user's optimal decision-making strategy. The value of deterministic forecasts can be calculated in the same way, allowing a direct comparison of the utility of probability and equivalent deterministic forecasts in terms which are clear and relevant to the user.

2. Background to Ensemble Probability Forecasts

Uncertainty in weather forecasts derives from a number of sources, in particular uncertainty in the initial state of the atmosphere and approximations in the model used to predict the atmospheric evolution. Errors in the analysis of the initial state result from observational errors, a shortage of observations in some regions of the globe, and limitations of the data assimilation system. Model errors are due to the numerous approximations which must be made in the formulation of a model, most notably for the many small-scale processes which cannot be resolved explicitly and whose effects must therefore be represented approximately by parametrization. The non-linear nature of atmospheric evolution means that even very small errors in the model representation of the atmospheric state, whether due to the analysis or the model formulation, will be amplified through the course of a forecast and can result in large forecast errors. This sensitivity was first recognised by Lorenz (1963), and was influential in the development of chaos theory. Gross errors in the synoptic-scale evolution are common in medium-range forecasts (beyond 3 days), but can occasionally occur even at less than 24 hours. Errors in the fine detail of a forecast, such as precipitation amounts or locations, are common even in short-range forecasts. Ensemble prediction systems have been developed in an attempt to estimate the probability density function (pdf) of forecast solutions by sampling the uncertainty in the analysis and running a number of forecasts from perturbed analyses (Molteni et al., 1996; Toth and Kalnay, 1993). In a more recent development, Buizza et al. (1999) have included some allowance for model errors in the ECMWF EPS by adding stochastic perturbations to the effects of the model physics. Houtekamer et al. (1996) describe an ensemble which accounts for both model errors and analysis errors by using a range of perturbations in both the model formulation and the analysis cycles. With any of these ensemble systems, probability forecasts may be generated by interpreting the proportion of ensemble members predicting an event as a measure of the probability of that event.
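As an illustration of this interpretation, the following minimal sketch (in Python, with purely illustrative randomly generated data; it is not the operational EPS code, and all names are chosen for this example only) estimates event probabilities as the fraction of ensemble members exceeding a threshold:

```python
import numpy as np

def ensemble_event_probability(members, threshold):
    """Probability of exceeding `threshold`, estimated as the proportion
    of ensemble members exceeding it.

    members: array of shape (n_members, n_cases), e.g. 10 m wind speed
    forecasts from each ensemble member at each site and time.
    """
    members = np.asarray(members, dtype=float)
    return (members > threshold).mean(axis=0)

# Example: a 51-member ensemble (the size of the ECMWF EPS) at 3 sites,
# with randomly generated illustrative wind speeds.
rng = np.random.default_rng(0)
wind = rng.gamma(shape=4.0, scale=3.0, size=(51, 3))
print(ensemble_event_probability(wind, threshold=15.0))  # P(wind > 15 m/s)
```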

A range of standard verification diagnostics is used to assess the skill of such probability forecasts. For example, the Brier Score (see Wilks, 1995), originally introduced by Brier (1950), is essentially the mean square error of probability forecasts of an event. Murphy (1973) showed how the Brier score can be decomposed into three terms, reliability, resolution and uncertainty, which measure different aspects of probabilistic forecasting ability. Of these, the reliability term measures how well forecast probabilities relate to the actual probability of occurrence of the event, and resolution measures how effectively the forecast system is able to distinguish between high and low probabilities on different occasions. ROC (Relative Operating Characteristics), described by Stanski et al. (1989), measures the skill of a forecast in predicting an event in terms of Hit Rates and False Alarm Rates. Rank Histograms (Hamill and Colucci, 1997 and 1998) specifically measure the extent to which the ensemble spread covers the forecast uncertainty, and can also reveal biases in the ensemble forecasts. However, while all these diagnostics are of great value to scientists developing ensemble systems, they are of little interest or relevance to most forecast users. In particular, they do not tell users how useful or valuable the forecasts will be for their applications, nor do they answer the question of how to use probability forecasts for decision-making.
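For concreteness, a minimal sketch of the Brier score calculation is given below (Python; the forecast-outcome pairs are hypothetical, chosen only to illustrate the definition as the mean square error of the probabilities):

```python
import numpy as np

def brier_score(p_forecast, occurred):
    """Mean square error of probability forecasts against 0/1 outcomes."""
    p = np.asarray(p_forecast, dtype=float)
    o = np.asarray(occurred, dtype=float)
    return float(np.mean((p - o) ** 2))

# Three hypothetical forecast occasions: probabilities 0.3, 0.9, 0.1 and
# observed outcomes no, yes, no.
print(brier_score([0.3, 0.9, 0.1], [0, 1, 0]))  # approximately 0.037
```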

3. Calculation of Forecast Value

An overview of techniques for estimating the economic value of weather forecasts is given by Murphy (1994), and a comprehensive review by Katz and Murphy (1997). The method applied in this study is closely related to ROC verification (Stanski et al, 1989). It has recently been discussed by Richardson (2000), and is rapidly becoming accepted as a valuable tool for user-oriented verification of probability forecasts. The method has also been applied to seasonal forecasts by Graham et al (2000). The aim of this paper is to present the method in a way which is particularly suitable for aiding forecast users with decision making.

The concept of forecast value is that forecasts only have value if a user takes action as a result, and the action saves the user money. Calculation of forecast value for predictions of a defined event therefore requires information on (a) the ability of the forecast system to predict the event, and (b) the user's costs and losses associated with the various possible forecast outcomes. Consequently the value depends on the application as well as on the skill of the forecast. Forecast value will be defined first for a simple deterministic forecast, and the generalisation to probability forecasts will be considered in more detail in section 3.6.

3.1 Ability of the Forecast System

The basis of most estimates of forecast value is the cost-loss situation described by Murphy (1977). This is based on forecasts of a simple binary event, against which a user can take protective action when the event is expected to occur. For such an event, the ability of a deterministic forecast system is fully specified by the 2×2 contingency table shown in Table 1, where h, m, f and r give the relative frequencies of occurrence of each possible forecast outcome.
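A sketch of how such a table might be assembled from a verification sample follows (Python; the function name and array layout are assumptions made for this illustration, not part of the method's specification):

```python
import numpy as np

def contingency_frequencies(forecast_yes, observed_yes):
    """Relative frequencies (h, m, f, r) of Table 1, from paired boolean
    arrays of forecasts and observations over the same occasions."""
    fc = np.asarray(forecast_yes, dtype=bool)
    ob = np.asarray(observed_yes, dtype=bool)
    n = fc.size
    h = np.sum(fc & ob) / n      # hits: forecast yes, observed yes
    m = np.sum(~fc & ob) / n     # misses: forecast no, observed yes
    f = np.sum(fc & ~ob) / n     # false alarms: forecast yes, observed no
    r = np.sum(~fc & ~ob) / n    # correct rejections: both no
    return h, m, f, r
```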

3.2 User Costs and Losses

For any user making decisions based on forecasts, each of the four outcomes in table 1 has an associated cost, or loss, as given in table 2.

                            Event Forecast
                     Yes - User Protects      No - No Protective Action
Event      Yes       Hit H (h)                Miss M (m)
Observed   No        False Alarm F (f)        Correct Rejection R (r)

Table 1: Contingency table of forecast performance. Upper case letters H, M, F, R represent the total numbers of occurrences of each contingency, while the lower case versions in brackets represent the relative frequencies of occurrences.

                            Event Forecast
                     Yes - User Protects      No - No Protective Action
Event      Yes       Mitigated Loss Lm        Loss L
Observed   No        Cost C                   Normal Loss N = 0

Table 2: Contingency table of generalised user costs and losses. Note: in the simple cost/loss situation described by Murphy (1977), this is simplified such that Lm = C.

For convenience it is normal to measure all costs and losses relative to the user's costs for a Correct Rejection, so the 'Normal Loss' N for this contingency is set to zero. (Note however that this assumption is not necessary, and the method readily accounts for non-zero values of N.) If the event occurs with no protective action being taken, the user incurs a loss L. If the event is forecast to occur, the user is assumed to take protective action at cost C. In the simple cost/loss situation (Murphy, 1977), this action is assumed to give full protection if the event does occur, so the user incurs the cost C for both Hits and False Alarms. In reality protection will often not be fully effective in preventing all losses, and the losses may be generalised by specifying a Mitigated Loss Lm for Hits, as in table 2.

For a forecast to have value it is necessary that Lm < L. In most circumstances it would be expected that C ≤ Lm < L, but it is possible that in some circumstances Lm < C. For example, protective action could involve using an alternative process which works effectively in the weather conditions specified by the event, but does not work in the non-event conditions; in this case the cost C of a False Alarm would be high compared to Lm.

The above examples assume that costs, losses and forecast value are specified in monetary terms. They could, instead, be calculated in terms of a user's energy consumption, for example; the concept is the same. Note that one limitation of the method arises where a forecast is used to protect life, due to the difficulty of objectively placing a cost on lives lost.

3.3 Mean Expense Following Forecasts

Given the information specified in tables 1 and 2, and assuming the user takes protective action whenever the event is forecast, it can be expected that over a number of forecast occasions the user will experience a mean expense Efx of

Efx = hLm + mL + fC + rN    (1)

Note that the last term rN in equation (1) is normally zero, but this specifies the generalisation to any definition of the Normal Loss.
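Equation (1) translates directly into code; the sketch below (Python) is a straightforward transcription, evaluated with illustrative contingency frequencies (not from the paper) and user A's losses from Table 3 in section 4:

```python
def mean_forecast_expense(h, m, f, r, Lm, L, C, N=0.0):
    """Mean expense per occasion, equation (1), when the user protects
    whenever the event is forecast."""
    return h * Lm + m * L + f * C + r * N

# Illustrative frequencies with user A's losses (N=0, Lm=1, L=5, C=1):
print(mean_forecast_expense(0.2, 0.1, 0.15, 0.55, Lm=1, L=5, C=1))  # 0.85
```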

3.4 Climatological Reference Expense

Forecast value represents the saving the user gains by using the forecasts, and therefore requires a baseline reference expense for comparison with Efx. If no forecast is available the user has two options: either always protect or never protect. In Murphy's (1977) simple cost-loss situation where Lm = C, these options will incur mean expenses Ecl over many occasions of C or ōL respectively, where ō is the climatological frequency of occurrence of the event. The user's best choice is to always protect if C < ōL, i.e. if C/L < ō, and never to protect otherwise. Assuming the user takes the best option, the mean climatological expense in the simple cost-loss situation is thus given by

Ecl = min(C, ōL)    (2)

For the generalised user loss matrix given in Table 2, the mean expense of the always protect option is given by ōLm + (1-ō)C, and the mean expense of following climatology is given by

Ecl = min(ōLm + (1-ō)C, ōL)    (3)

Fully generalising this to allow for the possibility of N ≠ 0 gives

Ecl = min(ōLm + (1-ō)C, ōL + (1-ō)N)    (4)

In this case the user's best strategy is to always take protective action if

ō > (C - N) / (L - Lm + C - N)    (5)

where the quantity on the right-hand side is the generalised cost/loss ratio introduced by Richardson (2000). In some circumstances one of the climatological options may not be viable for a user, since taking protective action may involve stopping their normal economic activity (e.g. a fisherman’s protective action against strong winds may be to leave his boat in port). The user cannot do that all the time or he would go out of business. In this case the forecast value should be calculated using the viable climatological option. This will be considered further in section 4.3.
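A minimal sketch of this generalised climatological baseline, implementing equations (4) and (5), is given below (Python; o_bar denotes ō, and the example numbers are illustrative only):

```python
def climatological_expense(o_bar, Lm, L, C, N=0.0):
    """Equation (4): expense of the cheaper of the two no-forecast options."""
    always_protect = o_bar * Lm + (1.0 - o_bar) * C
    never_protect = o_bar * L + (1.0 - o_bar) * N
    return min(always_protect, never_protect)

def generalised_cost_loss_ratio(Lm, L, C, N=0.0):
    """Always protecting is best when o_bar exceeds this ratio, equation (5)."""
    return (C - N) / (L - Lm + C - N)

# User A of Table 3 (N=0, Lm=1, L=5, C=1) facing an event with an
# illustrative climatological frequency o_bar = 0.3:
print(climatological_expense(0.3, Lm=1, L=5, C=1))   # min(1.0, 1.5) = 1.0
print(generalised_cost_loss_ratio(Lm=1, L=5, C=1))   # 0.2 < 0.3, so always protect
```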

3.5 Definition of Forecast Value

The value of the forecast in monetary terms, V, is the saving the user can expect to make from following the forecast, averaged over a number of occasions:

V = Ecl - Efx    (6)

This basic definition of value is the most relevant for a user, except that it does not account for the cost of buying the forecast service. The true value to the user is therefore

Vu = V - Cfx = Ecl - Efx - Cfx    (7)

where Cfx is the purchase price of the forecast. However, although Vu is the correct definition to use when estimating the value of a forecast to a user, Cfx is specific to any forecast service and cannot be estimated in general terms. For the purposes of this paper this term will therefore be ignored and value will be defined as in equation (6).

For general assessments of forecast value, it is convenient to scale V relative to the value of a perfect forecast, in a similar fashion to the normal definition of a skill score (see Wilks, 1995). With a perfect forecast m = f = 0, and the user takes protective action only when the event occurs. The mean expense is therefore Eperfect = ōLm (or, if N ≠ 0, Eperfect = ōLm + (1-ō)N). The relative economic value of a forecast is then defined as:

Vr = (Ecl - Efx) / (Ecl - Eperfect)    (8)

Vr has a maximum of 1 for a perfect forecast system, and is zero for a climatology forecast. Vr may also be multiplied by 100 and expressed as a percentage.

It is important to note that while Vr has a maximum value of 1 (or 100%), there is no lower limit, and from equations (1) and (8) it is clear that negative values are likely when m or f, or their corresponding user losses Lm or C, are large. Negative value simply indicates that the forecast system does not have sufficient skill to provide value to this particular user. In this case the user's best strategy is to follow the best climatological option. It may also be possible to find a different event for which the associated costs and losses will be different, and for which the forecasts may be more skilful.
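Combining equations (1), (4) and the perfect-forecast expense gives a self-contained sketch of Vr (Python; the frequencies and o_bar are the same illustrative values used above, with user A's losses from Table 3):

```python
def relative_value(h, m, f, r, o_bar, Lm, L, C, N=0.0):
    """Relative economic value Vr, equation (8)."""
    e_fx = h * Lm + m * L + f * C + r * N                 # equation (1)
    e_cl = min(o_bar * Lm + (1 - o_bar) * C,              # always protect
               o_bar * L + (1 - o_bar) * N)               # never protect: eq. (4)
    e_perfect = o_bar * Lm + (1 - o_bar) * N              # perfect forecast
    return (e_cl - e_fx) / (e_cl - e_perfect)

# Illustrative frequencies with user A's losses and o_bar = 0.3:
print(relative_value(0.2, 0.1, 0.15, 0.55, o_bar=0.3, Lm=1, L=5, C=1))  # about 0.21
```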

3.6 Value for a Probability Forecast

Forecast value is defined above for deterministic forecasts of a simple binary event, against which the user takes protective action when the event is expected to occur. Probability forecasts are normally also defined for binary events, since they are expressed as the probability that an event will occur. To make decisions from probability forecasts, the user takes protective action (‘Forecast=Yes’ in Table 1) when the probability p of the event exceeds a chosen threshold pt. The value of the forecast therefore depends on the choice of pt. To completely specify the value of a probability forecast system, the value is calculated for a range of probability thresholds and plotted as a function of pt as shown in figure 1. (This use of probability thresholds is identical to that used in ROC verification, for which hit rates (HR) and false alarm rates (FAR) are calculated for various probability thresholds using the same contingency table as in table 1. The appendix describes how forecast value may be evaluated directly for any forecast system for which ROC verification is available.)
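The construction of such value-versus-threshold curves can be sketched as follows (Python; all data and names are illustrative assumptions, and the operational verification described in section 4 is more extensive):

```python
import numpy as np

def value_curve(p_forecast, occurred, Lm, L, C, N=0.0,
                thresholds=np.arange(0.0, 1.001, 0.1)):
    """Relative value Vr, equation (8), as a function of probability
    threshold pt: the user protects whenever the forecast probability
    exceeds pt. Assumes the event and losses admit positive potential
    value, i.e. e_cl > e_perfect."""
    p = np.asarray(p_forecast, dtype=float)
    ob = np.asarray(occurred, dtype=bool)
    n, o_bar = ob.size, ob.mean()   # sample size, climatological frequency
    e_cl = min(o_bar * Lm + (1 - o_bar) * C, o_bar * L + (1 - o_bar) * N)
    e_perfect = o_bar * Lm + (1 - o_bar) * N
    values = []
    for pt in thresholds:
        protect = p > pt
        h = np.sum(protect & ob) / n     # hits
        m = np.sum(~protect & ob) / n    # misses
        f = np.sum(protect & ~ob) / n    # false alarms
        r = np.sum(~protect & ~ob) / n   # correct rejections
        e_fx = h * Lm + m * L + f * C + r * N            # equation (1)
        values.append((e_cl - e_fx) / (e_cl - e_perfect))
    return thresholds, np.array(values)
```

The user's optimum probability threshold is then simply the value of pt at which this curve peaks.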

4. Results

Three examples of forecast value plots are shown in figure 1. All are based on verification of the same forecasts, but the user costs and losses are different, as shown in Table 3. The verification is based on daily forecasts of the event “10 m wind speed of Beaufort Force 5 or more” for 41 sites in the UK, over two winter seasons (DJF 1998/99 and 1999/2000). Results are shown for forecast lead-times of 48, 96, 144 and 192 hours. Probability forecasts were taken from the 51-member ECMWF operational Ensemble Prediction System (EPS). The value of the probability forecasts is calculated and plotted at probability thresholds of 0, 10, 20, 30, …, 90, 100%. For comparison, equivalent deterministic forecasts from the ECMWF high-resolution (TL319) global model are also included.

Figure 1 (a) and (b): for the full caption see figure 1(c) below.

User     N (CR)     Lm (H)     L (M)     C (FA)
A        0          1          5         1
B        0          1          2         1
C        0          2          10        1

Table 3: User costs and losses in arbitrary monetary units for the three examples of forecast value plots shown in figure 1. (For the meaning of the column headings, see tables 1 and 2.)

Figure 1: Examples of relative forecast value Vr, expressed as a percentage of the value of a perfect forecast, plotted against probability threshold pt for probability forecasts of an event. The values of equivalent deterministic forecasts are shown in the columns at the right-hand side of each graph. The best climatological option, selected following equation (4), is labelled above the graph. Details of the forecasts used are given in Section 4. Forecast lead-times are: 48h (solid line), 96h (dotted), 144h (dot-dot-dot-dash) and 192h (dashed). The three graphs are for the same forecasts, but for different user loss functions as given in Table 3: (a) A, (b) B and (c) C. (Note that values for deterministic forecasts at 144h and 192h in figure 1(c) are missing because they are off the graph, below –40%.)

4.1 Value of Probability Forecasts