THE SCHAAKE SHUFFLE:

A METHOD FOR RECONSTRUCTING SPACE-TIME VARIABILITY IN FORECASTED PRECIPITATION AND TEMPERATURE FIELDS

Martyn Clark1, Subhrendu Gangopadhyay1,2, Lauren Hay3, Balaji Rajagopalan2, and Robert Wilby4

1 Center for Science and Technology Policy Research

Cooperative Institute for Research in Environmental Sciences

University of Colorado, Boulder, Colorado

2 Department of Civil, Environmental and Architectural Engineering

University of Colorado, Boulder, Colorado

3 Water Resources Division

United States Geological Survey

Denver Federal Center, Lakewood, Colorado

4 Environment Agency, Trentside Office

Scarrington Road, West Bridgford

Nottingham, United Kingdom

Submitted to

Journal of Hydrometeorology

May, 2003

Corresponding Author:

Martyn Clark

Center for Science and Technology Policy Research, CIRES

University of Colorado at Boulder

Boulder, CO 80309-0488

Tel: 303-735-3624

Fax: 303-735-1576

Email:

Abstract

A number of statistical methods that are used to provide local-scale ensemble forecasts of precipitation and temperature do not contain realistic spatial co-variability between neighboring stations or realistic temporal persistence for subsequent forecast lead-times. To demonstrate this point, output from a global-scale numerical weather prediction model is used in a stepwise multiple linear regression approach to downscale precipitation and temperature to individual stations located in and around four study basins in the United States. Output from the forecast model is downscaled for lead-times up to 14 days. Residuals in the regression equation are modeled stochastically to provide 30 ensemble forecasts. The precipitation and temperature ensembles from this approach poorly represent the observed spatial variability and temporal persistence. The spatial correlations for downscaled output are considerably lower than observed spatial correlations at short forecast lead-times (e.g., less than 5 days), when forecast accuracy is highest. At longer forecast lead-times the downscaled spatial correlations are close to zero. Similarly, the observed temporal persistence is only partly present at short forecast lead-times.

A method is presented for re-ordering the ensemble output in order to recover the space-time variability in precipitation and temperature fields. In this approach, the ensemble members for a given forecast day are ranked and matched with the ranks of precipitation and temperature data from dates randomly selected from similar calendar periods in the historical record. The ensembles are then re-ordered to correspond to the original order of the selected historical dates. Using this approach, the observed space-time variability in precipitation and temperature fields is almost entirely recovered. This re-ordering methodology also has applications for recovering the space-time variability in modeled streamflow.

Index Terms: Hydrology: 1854; Precipitation: 1869; Stochastic processes: 1899

1. Introduction

Local-scale forecasts of precipitation and temperature are commonly obtained through statistical post-processing of output from numerical weather prediction (NWP) models. A standard approach is Model Output Statistics (MOS), in which predictors from the NWP model (e.g., temperature, humidity, and winds at different pressure levels) are used in a multiple linear regression model to forecast temperature and precipitation at individual stations (Glahn and Lowry, 1972). In this approach, the regression equations (or other statistical transfer functions) are developed separately for each individual station of interest (Crane and Hewitson, 1998; Murphy, 1999; Wilby et al., 2002; Clark et al., 2003), or for stations pooled together in a region (Antolik, 2000). The latter approach (“pooled regression”) is favored in situations where the number of archived forecasts is relatively small, a situation that is very common due to the constant changes in operational forecast models. Both single-site and pooled regression techniques provide local-scale forecasts of precipitation and temperature superior to those obtained using raw output from forecast models (Antolik, 2000; Clark et al., 2003).

The problems with the MOS approach, as defined above, are that it does not preserve the spatial co-variability between neighboring stations, and it does not preserve the observed temporal persistence in the predicted precipitation and temperature fields. Preserving the space-time variability in precipitation and temperature fields is critical for hydrologic applications, for example, if one is to use the downscaled precipitation and temperature forecasts as input to a hydrologic model to forecast streamflow. Consider the snowmelt-dominated river basins in the western United States. The high spatial coherence in daily temperature fields results in high spatial coherence in daily streamflow. If the spatial co-variability in downscaled temperature fields is not preserved, then the resultant streamflow forecasts will mis-represent extreme values—higher runoff in one basin will offset lower runoff in a neighboring basin. Similar problems arise if the temporal persistence in precipitation and temperature fields is not preserved.

A method is presented for re-ordering local-scale ensemble forecasts of precipitation and temperature to reconstruct the space-time variability of precipitation and temperature fields. As an example, results are shown where predictors from the 1998 version of the National Centers for Environmental Prediction (NCEP) Medium-Range Forecast model are used in a forward-screening multiple linear regression model to forecast temperature and precipitation at individual stations. Residuals are modeled stochastically to provide ensemble forecasts. In this example, a different statistical model is used for each station, which allows an assessment of the value of this approach for reconstructing the space-time variability of precipitation and temperature patterns based on local-scale forecasts from multiple models (e.g., a multi-model ensemble comprised of outputs from a mix of statistical and dynamical models). The remainder of this paper is organized as follows: the model output and station data are described in Section 2, and methods are described in Section 3. Results, including assessments of model bias, probabilistic forecast skill, spatial co-variability, and temporal persistence, are described in Section 4.

2. NWP Model Output and Station Data

2.1 The CDC Forecast Archive

The Climate Diagnostics Center (CDC) in Boulder, Colorado, is generating a “reforecast” dataset using a fixed version (circa 1998) of the NCEP operational Medium-Range Forecast (MRF) model. The goal is to generate an archive of 14-day forecasts for the last 24 years (1978-2001). The MRF model has been running in near real-time since February 2002, allowing development of stable MOS equations using the archived forecasts, and application of those MOS equations in near real-time.

Output variables used to develop MOS equations are (a) the accumulated precipitation for a 12-hour period (e.g., 00Z-12Z), (b) 2-meter air temperature, (c) relative humidity at 700 hPa, (d) 10-meter zonal wind speed, (e) 10-meter meridional wind speed, (f) total column precipitable water, and (g) mean sea level pressure. Previous work has demonstrated the importance of these variables when downscaling precipitation and temperature (e.g., Clark et al., 2003). For a given day, forecasts are used from 00Z-12Z from Day-1 (i.e., 5pm-5am local time in Colorado), forecasts from 12Z-00Z on Day+0 (i.e., 5am-5pm local time in Colorado), and forecasts from 00Z-12Z on Day+1 (i.e., 5pm-5am local time in Colorado). This provides a total of 21 potential predictor variables in the regression equations, and helps to account for temporal persistence in the MOS equations derived from the NWP (NCEP-MRF, 1998) archive.
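
To make the structure of the predictor set concrete, the short Python sketch below assembles the seven forecast variables for the three 12-hour windows into a single matrix of 21 candidate predictors. The variable names, dictionary layout, and placeholder data are illustrative assumptions, not part of the CDC archive format.

    # Hypothetical sketch: build the (n_days, 21) candidate-predictor matrix
    # from 7 MRF output variables x 3 twelve-hour windows. Names are
    # illustrative only.
    import numpy as np

    VARIABLES = ["precip_12h", "t2m", "rhum_700", "u10", "v10", "pwat", "mslp"]
    WINDOWS = ["day-1_00Z-12Z", "day+0_12Z-00Z", "day+1_00Z-12Z"]

    def build_predictor_matrix(forecasts):
        """Stack the 7 variables x 3 windows into an (n_days, 21) matrix.

        `forecasts` is assumed to map (variable, window) -> a 1-D array of
        daily forecast values, all of equal length.
        """
        columns = [forecasts[(var, win)] for var in VARIABLES for win in WINDOWS]
        return np.column_stack(columns)

    # Example with random placeholder data for a 100-day record:
    rng = np.random.default_rng(0)
    fake = {(v, w): rng.normal(size=100) for v in VARIABLES for w in WINDOWS}
    X = build_predictor_matrix(fake)
    print(X.shape)  # (100, 21)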

2.2 Station Data

This study uses daily precipitation and maximum and minimum temperature data from the National Weather Service (NWS) cooperative network of climate observing stations across the contiguous USA. These data were extracted from the National Climatic Data Center (NCDC) “Summary of the Day” (TD3200) Dataset by Jon Eischeid, NOAA Climate Diagnostics Center, Boulder, Colorado (Eischeid et al., 2000). Quality control performed by NCDC includes the procedures described by Reek et al. (1992) that flag questionable data based on checks for (a) extreme values, (b) internal consistency among variables (e.g., maximum temperature less than minimum temperature), (c) constant temperature (e.g., 5 or more days with the same temperature are suspect), (d) excessive diurnal temperature range, (e) invalid relations between precipitation, snowfall, and snow depth, and (f) unusual spikes in temperature time series. Records at most of these stations start in 1948, and continue through to the present.

Results are presented for four river basins used by Hay et al. (2002) and Hay and Clark (2003): (a) Cle Elum River in central Washington, (b) East Fork of the Carson River on the California/Nevada border, (c) Animas River in southwest Colorado, and (d) Alapaha River in southern Georgia (Figure 1). Attention is restricted to the “best stations” in the cooperative network that are located within a 100-kilometer search radius of the center of these four basins. These “best stations” are defined as those with less than 10% missing or questionable data during the period 1979-1999. This provides 15 stations for the Animas, 16 stations for the Carson, 18 stations for the Cle Elum, and 10 stations for the Alapaha.

3. Method

3.1 Multiple Linear Regression Model

Multiple linear regression with forward selection is used to develop the MOS equations (Antolik, 2000). The forward selection procedure first identifies the predictor variable (e.g., 500 hPa height) that explains the most variance of the predictand (e.g., maximum temperature at a station location). It then searches through the remaining variables, and selects the variable that reduces the largest portion of the remaining unexplained variance in combination with the variable already chosen. If the improvement in explained variance exceeds a given threshold (taken here as 1%), the variable is included in the multiple linear regression equation. The remaining variables are examined in the same way until no further improvement is obtained based on the explained variance threshold. MOS equations are developed for precipitation occurrence, precipitation amounts, maximum temperature, and minimum temperature. Logistic regression is used to develop equations for precipitation occurrence, and ordinary least squares regression is used for all other variables. A separate regression equation is developed for each station, each forecast lead-time, and each month.
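
The Python sketch below illustrates the forward selection logic just described, with an ordinary least squares fit at each step and a 1% explained-variance stopping threshold. It is a minimal illustration of the screening procedure, not the implementation used in this study (which also embeds the cross-validation described below).

    # A minimal sketch of forward selection with a 1% explained-variance
    # threshold. Each candidate variable is scored by the increase in R^2
    # it yields in combination with the variables already chosen.
    import numpy as np

    def forward_select(X, y, threshold=0.01):
        """Greedily add the predictor column that most increases R^2.

        Stops when the best remaining predictor improves R^2 by less than
        `threshold`. Returns the list of selected column indices.
        """
        n, p = X.shape
        selected, best_r2 = [], 0.0
        while True:
            best_gain, best_j = 0.0, None
            for j in range(p):
                if j in selected:
                    continue
                cols = selected + [j]
                A = np.column_stack([np.ones(n), X[:, cols]])
                coef, *_ = np.linalg.lstsq(A, y, rcond=None)
                resid = y - A @ coef
                r2 = 1.0 - resid.var() / y.var()
                if r2 - best_r2 > best_gain:
                    best_gain, best_j = r2 - best_r2, j
            if best_j is None or best_gain < threshold:
                return selected
            selected.append(best_j)
            best_r2 += best_gain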

The station time series of precipitation are pre-processed before the regression equations are developed. For precipitation occurrence, the daily precipitation data at a given station are converted to a binary time series of 1's (wet days) and 0's (dry days); the regression equation thus predicts the probability of precipitation (see also Antolik, 2000). For precipitation amount, the station precipitation data (only wet days) are transformed to a normal distribution using a non-parametric probability transform (Panofsky and Brier, 1963). We compute the cumulative probability of observed precipitation, and the cumulative probability of a standard normal distribution (mean of zero and standard deviation of one). The cumulative probability of each daily precipitation total in the observed record is matched with the cumulative probability in the standard normal distribution, and the precipitation value is replaced with the corresponding z score.
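
A minimal sketch of the non-parametric probability transform follows, assuming a Weibull plotting position for the empirical cumulative probabilities (a detail not specified above). The forward transform maps wet-day amounts to standard normal z-scores; the inverse maps forecast z-scores back to precipitation amounts.

    # Sketch of the non-parametric probability transform (quantile mapping):
    # wet-day precipitation amounts are replaced by the standard normal
    # z-scores with the same cumulative probability.
    import numpy as np
    from scipy import stats

    def to_zscores(wet_day_precip):
        """Map each wet-day amount to the z-score of its empirical CDF."""
        x = np.asarray(wet_day_precip, float)
        n = x.size
        ranks = stats.rankdata(x)            # ranks 1..n, ties averaged
        cum_prob = ranks / (n + 1.0)         # Weibull plotting position
        return stats.norm.ppf(cum_prob)      # corresponding z-scores

    def from_zscores(z, wet_day_precip):
        """Invert the transform: map forecast z-scores back to amounts."""
        x = np.sort(np.asarray(wet_day_precip, float))
        n = x.size
        cum_prob_obs = np.arange(1, n + 1) / (n + 1.0)
        return np.interp(stats.norm.cdf(z), cum_prob_obs, x)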

To avoid over-specification of the regression equation, or chance selection of a set of insignificant variables, cross-validation techniques are used for variable selection (Michaelsen, 1987; Allen, 1971). The dependent sample is broken into five segments. For a given combination of variables, the regression equation is successively trained on 80% of the sample and applied to predict the 20% of the sample that is withheld. The five segments of withheld data are assembled, and are used to calculate the explained variance for each set of predictor variables. Once the variable set is selected, regression coefficients are estimated using the entire dependent sample. Typically, between three and eight variables ultimately appear in the MOS equations.
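
The cross-validated scoring of a candidate variable set might look like the sketch below, which trains on 80% of the sample and predicts the withheld 20% in each of five folds before computing the explained variance on the assembled withheld predictions. The assignment of days to folds is an assumption.

    # Sketch of five-fold cross-validation used to score a candidate
    # predictor set: fit on four folds, predict the fifth, assemble the
    # withheld predictions, then compute explained variance.
    import numpy as np

    def cv_explained_variance(X, y, n_folds=5):
        n = len(y)
        folds = np.arange(n) % n_folds      # assign each day to a fold
        y_hat = np.empty(n)
        for k in range(n_folds):
            train, test = folds != k, folds == k
            A_train = np.column_stack([np.ones(train.sum()), X[train]])
            coef, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
            A_test = np.column_stack([np.ones(test.sum()), X[test]])
            y_hat[test] = A_test @ coef
        return 1.0 - np.var(y - y_hat) / np.var(y)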

A final step in this procedure is stochastic modeling of the residuals in the multiple linear regression equations to generate ensemble forecasts. Thirty ensemble members are generated. For maximum and minimum temperature, this is achieved by simply extracting a random number from a Gaussian distribution (mean of zero and standard deviation of one), multiplying the random number by the standard deviation of the regression residuals, and adding this product to the forecast of temperature. Precipitation ensembles are generated in two steps. First, precipitation occurrence is estimated for each ensemble member. A random number is drawn from a uniform distribution ranging from zero to one. If the random number is lower than the forecasted probability of precipitation occurrence, the ensemble member is classified as having precipitation. Next, precipitation amounts are estimated for the ensemble members with precipitation. Precipitation z-scores are estimated, and residuals in the precipitation regression equations are modeled stochastically using methods identical to those used for maximum and minimum temperature. The forecasted precipitation z-scores are then transformed back to the original gamma-type distribution of observed precipitation using the non-parametric probability transform technique described above. The stochastic modeling of the regression residuals inflates the variance of precipitation and temperature forecasts, reducing problems of variance under-estimation that are typical of regression-based models.
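
The sketch below illustrates the two ensemble-generation recipes just described, for a single station, day, and lead-time. It reuses the from_zscores helper from the probability-transform sketch above; the function names and the use of numpy's random generator are assumptions.

    # Sketch of ensemble generation: temperature members add Gaussian noise
    # scaled by the residual standard deviation; precipitation members first
    # decide occurrence against the forecast probability, then draw a
    # z-score amount and back-transform it to precipitation units.
    import numpy as np

    rng = np.random.default_rng(42)
    N_ENSEMBLE = 30

    def temperature_ensemble(t_forecast, resid_std):
        """30 members: forecast plus stochastic residual."""
        return t_forecast + resid_std * rng.standard_normal(N_ENSEMBLE)

    def precipitation_ensemble(pop, z_forecast, resid_std, wet_day_precip):
        """30 members: occurrence draw, then amount via inverse transform."""
        members = np.zeros(N_ENSEMBLE)
        wet = rng.uniform(size=N_ENSEMBLE) < pop   # occurrence per member
        z = z_forecast + resid_std * rng.standard_normal(wet.sum())
        # from_zscores is the assumed helper defined in the earlier sketch
        members[wet] = from_zscores(z, wet_day_precip)
        return members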

Cross-validation approaches are also used to validate the MOS equations. Each year in the time series is successively withheld from the dependent sample, the MOS equations are developed using data for all years except the withheld year, and the MOS equations are then used to predict data for the withheld year. This process is repeated for all years in the record.
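
Schematically, the leave-one-year-out validation can be written as below, where fit and predict stand in for the MOS estimation and prediction steps and are purely illustrative:

    # Sketch of leave-one-year-out validation: each year is withheld in
    # turn, equations are fit on the remaining years, and predictions are
    # made for the withheld year. `fit` and `predict` are assumed stand-ins.
    import numpy as np

    def leave_one_year_out(X, y, years, fit, predict):
        y_hat = np.empty_like(y, dtype=float)
        for yr in np.unique(years):
            held = years == yr
            model = fit(X[~held], y[~held])     # train on all other years
            y_hat[held] = predict(model, X[held])
        return y_hat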

3.2 The Ensemble Re-ordering Method

The ensemble re-ordering method was described to the authors by Dr. John Schaake in October 2002, and is now known as the Schaake Shuffle. The approach is very intuitive, and is demonstrated graphically in Figure 2 through an example (pseudo-FORTRAN code illustrating the algorithm is given in Appendix A, and an illustrative sketch in Python follows the numbered steps below):

(1) For a given forecast and lead-time (e.g., the day+1 downscaled forecast initialized on 14th January 1985), the downscaled ensemble members are ranked from lowest to highest. Figure 2a illustrates a hypothetical example of ten ensemble members ranked separately for three stations and one variable (in this case maximum temperature).

(2) A second ensemble, of observed dates, is randomly selected from the historical record. These dates are selected so as to lie within seven days before and after the forecast date (dates can be pulled from all years in the historical record, except for the year of the forecast). Figure 2b illustrates an example for the forecast initialized on 14th January 1985. The first date selected is 8th January 1996, and is labeled “ensemble one.” The second date selected is 17th January 1982 (labeled “ensemble two”), the third date is 13th January 2000 (labeled “ensemble three”), the fourth date is 22nd January 1998 (“ensemble four”), and so forth.

(3) This “observed ensemble” is ranked from lowest to highest, keeping track of the ensemble member associated with each historical value (see Figure 2c). The observed ensemble is ranked separately for each variable and station. Suppose that Figure 2 represents the downscaled forecast for maximum temperature at three neighboring stations. The value for “ensemble one” (8th January 1996 in this example) is the fifth lowest of all randomly selected dates at station one, the sixth lowest of all randomly selected dates at station two, and the fourth lowest of all randomly selected dates at station three (Figure 2c).

(4) The ranked randomly selected “observed” ensemble members (Figure 2c) are matched with the ranked downscaled ensemble members (Figure 2a), and are re-ordered such that ensemble one (through ensemble n) represents the matched downscaled ensemble member for the first (through to the last) date selected from the historical record (Figure 2d). In this case, the first re-sorted ensemble member includes the original ensemble member “10” from station one, the original ensemble member “1” from station two, and the original ensemble member “7” from station three (Figure 2d).

(5) The dates randomly selected to construct the “observed ensemble” are chosen only for the first forecast lead-time (i.e., day+1), and are persisted for subsequent forecast lead-times by advancing each date one day per lead-time. In this example, the dates for the forecast with a lead-time of two days would be 9th January 1996, 18th January 1982, 14th January 2000, 23rd January 1998, etc.
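
A compact Python sketch of steps (1) through (4), for a single forecast day, variable, and lead-time, is given below. It mirrors the logic of the pseudo-FORTRAN in Appendix A but is an illustration rather than the authors' code; array shapes and names are assumptions.

    # Re-order ensemble forecasts so that, at every station, their ranks
    # follow the ranks of historical observations drawn from similar dates.
    import numpy as np

    def schaake_shuffle(forecast, observed):
        """Re-order ensemble forecasts to inherit the observed rank structure.

        forecast : (n_members, n_stations) downscaled ensemble members
        observed : (n_members, n_stations) historical values for the same
                   randomly selected dates at every station (step 2)
        Returns the shuffled forecast array with the same shape.
        """
        shuffled = np.empty_like(forecast)
        n_members, n_stations = forecast.shape
        for s in range(n_stations):
            sorted_fcst = np.sort(forecast[:, s])                # step (1)
            obs_ranks = np.argsort(np.argsort(observed[:, s]))   # step (3), ranks 0..n-1
            shuffled[:, s] = sorted_fcst[obs_ranks]              # step (4)
        return shuffled

In practice the same re-ordering is applied separately for each variable and each forecast lead-time, with the observed array built from the persisted dates of step (5).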

Why does this approach work? Consider first the spatial correlation between neighboring stations. If observed data at two neighboring stations are similar (i.e., a high correlation between stations), then the observations at the two stations on a given (randomly selected) day are likely to have a similar rank. The rank of each downscaled ensemble member at the two stations is matched with the rank of each randomly selected observation, meaning that, for all ensemble members, the rank of the downscaled realizations will be similar at the two stations. When this process is repeated for all forecasts, the ranks of a given ensemble member will on average be similar for the two stations, and the spatial correlation will be reconstructed once the randomly selected days are re-sorted to their original order. Now consider the temporal persistence at a given station. High temporal persistence means that randomly selected historical observations for subsequent days will, on average, have a similar rank. The ensemble output is assigned the same ranks as the randomly selected observations, and thus the temporal persistence is reconstructed once the ensemble output is re-sorted.
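
This argument can be checked numerically. The snippet below (reusing the schaake_shuffle sketch above, with synthetic data) generates highly correlated “observations” at two stations and independently generated forecast members; the inter-station correlation of the shuffled ensemble approaches the observed correlation.

    # Illustrative check: independent forecast members acquire the observed
    # inter-station correlation after the shuffle.
    import numpy as np

    rng = np.random.default_rng(7)
    cov = [[1.0, 0.9], [0.9, 1.0]]                 # high observed correlation
    obs = rng.multivariate_normal([0, 0], cov, size=30)
    fcst = rng.normal(size=(30, 2))                # members generated independently
    shuf = schaake_shuffle(fcst, obs)
    print(np.corrcoef(fcst[:, 0], fcst[:, 1])[0, 1])   # near zero
    print(np.corrcoef(shuf[:, 0], shuf[:, 1])[0, 1])   # near the observed 0.9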