Objective Verification of the SAMEX '98 Ensemble Forecasts

Dingchen Hou1, Eugenia Kalnay1,2,3 and Kelvin K. Droegemeier1,2

1Center for Analysis and Prediction of Storms,

2School of Meteorology,

University of Oklahoma

Norman, OK 73019

3National Centers for Environmental Prediction

Washington DC 20233

Last Revision May 28, 1999

Submitted to Monthly Weather Review

______

Corresponding author address and current affiliation: Dr. Eugenia Kalnay, Department of Meteorology, University of Maryland, College Park, Maryland 20742.

ABSTRACT

During May, 1998, the Center for Analysis and Prediction of Storms (CAPS) at the University of Oklahoma coordinated a multi-institution numerical forecast project known as the Storm and Mesoscale Ensemble Experiment (SAMEX). SAMEX involved the real time operation of four different mesoscale/regional models over the same region and sought, among other things, to compare the value of relatively coarse resolution (30 km) ensemble forecasts against single forecasts made over smaller sub-regions of the Great Plains at both intermediate (10 km) and high (3 km) resolution. Participating in SAMEX were CAPS (running its Advanced Regional Prediction System, or ARPS), the Air Force Weather Agency (AFWA, running MM5), the National Center for Atmospheric Research (NCAR, running MM5), the National Centers for Environmental Prediction (NCEP, running the Eta model and Regional Spectral Model), and the National Severe Storms Laboratory (NSSL, running MM5).

The SAMEX ensembles consisted of a single 36-hour control forecast from each of the ARPS (at CAPS), MM5 (at NSSL), and the Eta and RSM (at NCEP), all with horizontal resolutions of approximately 30 km. In addition, and most importantly, perturbed runs were also made, resulting in a grand ensemble of 25 members. Despite logistical difficulties, most of the ensemble forecasts were completed for the last 18 days of May 1998, and on 8 days a full set of 25 real time forecasts was completed.

Based on a variety of quantitative analyses, we show that an ensemble of multiple forecast systems appears close to optimal, probably because it represents most realistically the current uncertainties in both models and initial conditions. This result is consistent with the behavior of multi-model global ensembles. In addition, the SAMEX results show that perturbations to model physics parameterizations, as well as the use of boundary condition perturbations consistent with those applied to the interior initial state, are important for regional ensemble forecasting. Efforts are now underway to compare the ensemble forecasts against those made using higher spatial resolution. Additionally, follow-on SAMEX experiments are anticipated in other geographical areas and weather regimes.

1. Introduction

Since the first pioneering experiments conducted by Leith (1974), which showed that a number of numerical forecasts created from slightly different initial conditions can, if appropriately averaged, yield improved skill relative to the individual forecasts themselves, ensemble forecasting has grown into a major area of research and is now a cornerstone of several operational prediction centers around the world. The ensemble approach is a computationally tractable method for integrating the probability density function of the atmospheric state forward in time via the prediction of selected individual states, each physically plausible and distinct, drawn from the density function. As shown by Toth and Kalnay (1997), ensemble forecasting is advantageous not because it provides a numerically averaged solution, but because the averaging process serves as a nonlinear filter to selectively smooth unpredictable components of the flow, leaving behind only those features or signals which tend to agree.
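The filtering effect of averaging can be illustrated with a deliberately simple sketch. It assumes a hypothetical one-dimensional "truth" (a sine wave) and members that differ from it only by independent Gaussian noise; the function names, the noise level, and the ensemble size of 25 are illustrative choices and are not taken from SAMEX. Because the toy errors are additive and uncorrelated, the example shows only the cancellation of member-to-member disagreement, not the nonlinear behavior of a real forecast system.

```python
import math
import random


def rmse(forecast, truth):
    """Root-mean-square error of one forecast against the truth."""
    return math.sqrt(sum((f - t) ** 2 for f, t in zip(forecast, truth)) / len(truth))


def make_members(truth, n_members, noise_std, rng):
    """Hypothetical members: truth plus independent Gaussian noise per member."""
    return [[t + rng.gauss(0.0, noise_std) for t in truth] for _ in range(n_members)]


def ensemble_mean(members):
    """Point-by-point arithmetic mean over all members."""
    return [sum(values) / len(members) for values in zip(*members)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
truth = [math.sin(2.0 * math.pi * i / 50.0) for i in range(200)]
members = make_members(truth, n_members=25, noise_std=1.0, rng=rng)

member_rmse = [rmse(m, truth) for m in members]
mean_rmse = rmse(ensemble_mean(members), truth)
print(f"best member RMSE: {min(member_rmse):.2f}, ensemble mean RMSE: {mean_rmse:.2f}")
```

Under these assumptions the error of the mean falls roughly as the inverse square root of the number of members, so the ensemble mean verifies better than even the best individual member.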

Considerable attention has been given to methods for generating the initial conditions of ensemble members, ranging from the simple and efficient (breeding; Toth and Kalnay, 1993) to the complex and computationally expensive (singular vectors; e.g., Buizza and Palmer, 1993; Buizza, 1995). Such methods aim to introduce perturbations within the subspace of the growing errors, with breeding resulting in perturbations that are related to the leading Lyapunov vectors, and the leading singular vectors representing the fastest growing perturbations. The control initial conditions represent the "best" estimate of the state of the atmosphere, and the added perturbations should be representative of the expected analysis errors.
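A breeding cycle can be sketched on a toy system. The example below uses the Lorenz (1963) equations as a stand-in for a forecast model, which is an assumption made purely for illustration; the amplitude, cycle length, and function names are likewise hypothetical. In an operational setting the rescaled difference would be added to each new analysis, whereas here the control trajectory itself plays the role of the analysis.

```python
import math


def lorenz63(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Time tendency of the Lorenz (1963) system (toy 'model')."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(state, dt):
    """One fourth-order Runge-Kutta step."""
    def shifted(base, slope, h):
        return tuple(b + h * s for b, s in zip(base, slope))
    k1 = lorenz63(state)
    k2 = lorenz63(shifted(state, k1, dt / 2.0))
    k3 = lorenz63(shifted(state, k2, dt / 2.0))
    k4 = lorenz63(shifted(state, k3, dt))
    return tuple(s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def integrate(state, dt, n_steps):
    for _ in range(n_steps):
        state = rk4_step(state, dt)
    return state


def norm(vector):
    return math.sqrt(sum(v * v for v in vector))


def breed(control, perturbation, amplitude, n_cycles, steps_per_cycle, dt=0.01):
    """Run breeding cycles; return final control, bred vector, growth factors."""
    scale = amplitude / norm(perturbation)
    perturbation = tuple(scale * p for p in perturbation)
    growth = []
    for _ in range(n_cycles):
        perturbed = tuple(c + p for c, p in zip(control, perturbation))
        control = integrate(control, dt, steps_per_cycle)      # control forecast
        perturbed = integrate(perturbed, dt, steps_per_cycle)  # perturbed forecast
        difference = tuple(p - c for p, c in zip(perturbed, control))
        size = norm(difference)
        growth.append(size / amplitude)
        # Rescale the evolved difference back to the prescribed amplitude.
        perturbation = tuple(amplitude * d / size for d in difference)
    return control, perturbation, growth


final_state, bred_vector, growth = breed(
    control=(1.0, 1.0, 1.0), perturbation=(1.0, 0.0, 0.0),
    amplitude=1e-3, n_cycles=500, steps_per_cycle=8)
mean_log_growth = sum(math.log(g) for g in growth) / len(growth)
print(f"mean log growth per cycle: {mean_log_growth:.3f}")
```

The repeated growth and rescaling is what selects the growing directions: after enough cycles the bred vector is dominated by the fastest-growing instabilities of the toy flow, regardless of the initial perturbation chosen.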

Following Houtekamer et al. (1996), Hamill et al. (1999) recently performed simulation experiments indicating that the most realistic ensemble can be obtained from an ensemble of data assimilations, where the observations are perturbed with random errors. In the perturbed observations (PO) method, the random errors project onto the subspace of leading Lyapunov vectors, so that the results are similar to (but somewhat better than) those of breeding, and the perturbation growth is much smaller than that of singular vectors. More recently, perturbations to the models that account for uncertainties due to the use of imperfect models have also been introduced in ensemble forecasting by varying model physics parameterizations (e.g., Stensrud et al., 1998; Houtekamer and Mitchell, 1998; Miller et al., 1999). Perturbations in the physics may be much more important for mesoscale short range ensemble forecasting than for global models.

Practical experience serves as a useful guide in assessing appropriate ensemble strategies and, perhaps not surprisingly, evidence from the large scale shows that ensemble averaging applied to forecasts from different models, each of which is started from the "best" initial condition possible, yields results that are significantly better than the individual forecasts (e.g., Wobus and Kalnay, 1995). Although the multi-model approach does not fit squarely within the classic ensemble framework of perturbations to initial conditions, it is sensible that the "best possible" forecasts produced by completely different systems would, when averaged, yield greater skill than even the best individual forecast given the nature of the nonlinear filtering process.

Ensemble forecasting has long been applied to the large-scale atmosphere and became operational for the global forecasting systems at NCEP and the ECMWF in December 1992 (Toth and Kalnay, 1993; Molteni and Palmer, 1993). It is now widely used by forecasters to assess the reliability of the day-to-day forecasts (e.g., Toth et al., 1997). More recently, ensemble strategies have been used in limited-area models (e.g., at 60 to 80 km resolution). The potential advantages of this so-called "short range ensemble forecasting" (SREF; Brooks et al., 1995; Du et al., 1997) were first discussed during a workshop held at NCEP in 1994 (Brooks et al., 1995), where it was concluded that SREF should be especially useful for precipitation forecasting. As a result, experimental ensemble forecasting systems were developed using the Eta and RSM models at NCEP (Tracton et al., 1999; Du and Tracton, 1999) and the MM5 at the National Severe Storms Laboratory (NSSL) (Stensrud et al., 1999). SREF has proven notably effective, particularly with regard to ensembles among multiple models (e.g., Hamill and Colucci, 1997).

As numerical forecast systems and observational platforms (e.g., the WSR-88D Doppler radar) continue to focus on smaller scales of the atmosphere, and as our understanding of physical processes continues to improve, greater emphasis will be placed on the prediction of intense local weather using non-hydrostatic models at resolutions of 1 to 10 km. Indeed, the Weather Research and Forecasting (WRF) model, being developed as a dual-purpose research and operational system by the national community (e.g., Dudhia et al., 1998), is targeted specifically at such resolutions.

At present, however, the specific data requirements, analysis and assimilation strategies, and spatial resolution and physics parameterizations needed to accurately predict the initiation, evolution, and decay of intense meso-beta and meso-gamma weather systems are not well established. Thus, as the scientific and operational communities explore strategies for dealing with the detailed short-range prediction of intense local weather, a question of fundamental importance must be addressed: what is the relative value of an ensemble of coarse-resolution (20 to 30 km grid spacing) forecasts compared to a much smaller number of shorter-duration forecasts run at considerably higher resolution (1 to 3 km) and at similar computational cost? The answer to this and related questions has far-reaching implications for the manner in which the US invests in future scientific research and technology acquisition, and efforts must be directed toward providing an answer so as to make the best use of available resources.
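The scale of this trade-off can be estimated with back-of-the-envelope arithmetic. The sketch below assumes, hypothetically, that the number of vertical levels is fixed and that the time step shrinks in proportion to the grid spacing, so that the cost per unit area and forecast hour grows as the cube of the horizontal refinement factor. It ignores differences in physics, input/output, and model formulation, and the function name and reference values are illustrative only.

```python
import math


def relative_cost(dx_km, members=1, area_fraction=1.0, hours=36.0,
                  ref_dx_km=30.0, ref_hours=36.0):
    """Cost relative to one 36-h, 30-km forecast over the full domain.

    Assumes a fixed number of vertical levels and a time step proportional
    to the grid spacing, so that cost scales as refinement ** 3.
    """
    refinement = ref_dx_km / dx_km
    return members * area_fraction * (hours / ref_hours) * refinement ** 3


ensemble_cost = relative_cost(30.0, members=25)  # 25 coarse-resolution members
full_domain_3km = relative_cost(3.0)             # one 3-km run, same domain
affordable_fraction = ensemble_cost / full_domain_3km
print(ensemble_cost, full_domain_3km, affordable_fraction)
```

Under these assumptions a single 3 km forecast over the full domain costs about 1000 times as much as one 30 km member, so the budget of a 25-member coarse ensemble would cover a 3 km forecast over only a few percent of the domain, or a correspondingly shorter forecast over a somewhat larger area.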

During the past decade, a number of groups in the US and abroad have begun to experiment with mesoscale forecast models that seek to resolve explicitly, using high spatial resolution grids and observational data, the most important processes associated with intense convective and winter precipitation systems. Operationally, the Eta model (e.g., Black 1994, Mesinger, 1996) and the Rapid Update Cycle (RUC, Benjamin et al, 1996) have recently been implemented at mesoscale resolution at the National Centers for Environmental Prediction (NCEP) for short-range predictions over North America. Several state-of-the-art non-hydrostatic models appropriate for storm-scale prediction have been developed as well, and some are also used for regional forecasting.

For example, the Advanced Regional Prediction System (ARPS), developed by the Center for Analysis and Prediction of Storms (CAPS) at the University of Oklahoma, has been run on a daily basis for over 2 years with an emphasis on assimilating Doppler radar, satellite, and commercial aircraft observations to predict multi-season storms at spatial resolutions down to 3 km (Droegemeier, 1997; Carpenter et al., 1997, 1998, 1999). The Mesoscale Model version 5 (MM5), developed jointly by the National Center for Atmospheric Research (NCAR) and the Pennsylvania State University (Dudhia, 1993; Grell et al., 1994), also has been used for routine regional short range predictions (e.g., Stensrud, 1999). The NCEP Regional Spectral Model (RSM) is closely related to the NCEP global model and has been used for short-range forecast and climate applications (Juang et al., 1997).

In an attempt to build upon these many modeling activities, several groups joined forces in a national-regional numerical weather prediction experiment during the spring 1998 convective season. Known as SAMEX '98 (Storm and Mesoscale Ensemble Experiment, spring 1998; Droegemeier, 1997), this effort involved a real time comparison of some 20 to 25 ensemble forecasts, run at approximately 30 km resolution using 4 different models, against a much smaller number of forecasts run at both intermediate (10 km) and high (2-3 km) resolution over subsets of the ensemble domain.

Coordinated by the Center for Analysis and Prediction of Storms (CAPS) at the University of Oklahoma, SAMEX involved the National Severe Storms Laboratory (NSSL), Air Force Weather Agency (AFWA), National Center for Atmospheric Research (NCAR), and the National Centers for Environmental Prediction (NCEP). Additionally, a number of other groups participated in the real time forecast evaluation process, including several National Weather Service Forecast Offices (NWSFOs), the Storm Prediction Center (SPC), Tinker Air Force Base, and the Aviation Weather Center (AWC).

SAMEX was novel in several ways. First, it provided for a direct comparison of several techniques for generating mesoscale ensemble initial conditions (bred perturbations, scaled-lagged average forecasting, Monte Carlo, and multiple physics options). Second, the ensembles from each forecast system were themselves combined to create a multi-model "grand ensemble." Third, it provided a framework for quantifying the relative value and computational expense of low-resolution probabilistic and higher resolution deterministic forecasts and developing techniques for verification and comparison. Fourth, it was conducted as a multi-institution effort that included NCEP and leveraged several ongoing activities in experimental real time NWP. And finally, it exposed operational forecasters, in real time, to technologies that are scheduled to become operational within the next several years. SAMEX involved no comparisons of or competition among models; all participating forecast systems demonstrated similar capabilities, on average, with each excelling in one or more aspects.

Despite its rapid organization and relatively short duration, SAMEX '98 resulted in an unprecedented data set that provides an opportunity for a large number of model and ensemble intercomparisons. Plans now underway to complete higher resolution runs for smaller areas embedded within the larger domain of SAMEX '98 should further enhance the utility of this data set. Future coordinated SAMEX experiments in other regions and seasons are planned as part of the US Weather Research Program (USWRP), and should further contribute to our understanding of the characteristics of different models and their application.

The purpose of this paper is to present verifications of the ensemble forecasting system utilized during SAMEX '98, and to explore the advantages and disadvantages of different perturbation methods as well as the use of different model and data assimilation systems. There is no intent to rank a given model's performance: all models used during SAMEX '98 are state-of-the-art and demonstrated similar capability, each excelling in one or more measures. On the other hand, the identification of specific model problems should provide a basis for improvement.

The logistics of SAMEX '98 and the method by which the ensemble initial conditions were created are described in Section 2. An analysis of ensemble spread is presented in Section 3, and in Section 4 we verify the bias and standard deviation of the individual and ensemble mean forecasts. Rank distribution diagrams are presented in Section 5, and Section 6 includes extensive probabilistic verifications. A summary and discussion are given in Section 7.
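For readers unfamiliar with these diagnostics, the following minimal sketch shows one common way to compute ensemble spread and a rank distribution (rank histogram). It uses synthetic Gaussian data rather than SAMEX forecasts, and the function names, the five-member ensemble, and the use of the population standard deviation are assumptions made for illustration; the definitions adopted in the later sections may differ in detail.

```python
import math
import random


def ensemble_spread(members):
    """Standard deviation of the members about the ensemble mean."""
    mean = sum(members) / len(members)
    return math.sqrt(sum((m - mean) ** 2 for m in members) / len(members))


def rank_of_observation(members, obs):
    """Number of members below the verifying value (0 .. n_members)."""
    return sum(1 for m in members if m < obs)


def rank_histogram(ensembles, observations):
    """Tally the rank of the verification among the members, case by case."""
    counts = [0] * (len(ensembles[0]) + 1)
    for members, obs in zip(ensembles, observations):
        counts[rank_of_observation(members, obs)] += 1
    return counts


rng = random.Random(1)  # fixed seed so the sketch is reproducible
n_cases, n_members = 2000, 5
observations = [rng.gauss(0.0, 1.0) for _ in range(n_cases)]
# Members drawn from the same distribution as the verification ...
reliable = [[rng.gauss(0.0, 1.0) for _ in range(n_members)] for _ in range(n_cases)]
# ... and members with too little spread (an under-dispersive ensemble).
narrow = [[rng.gauss(0.0, 0.3) for _ in range(n_members)] for _ in range(n_cases)]

flat_counts = rank_histogram(reliable, observations)
u_counts = rank_histogram(narrow, observations)
print("reliable:", flat_counts)
print("under-dispersive:", u_counts)
```

When the members and the verification are statistically indistinguishable, the histogram is approximately flat; when the spread is too small, the verification falls outside the ensemble envelope too often and the histogram takes on a U shape.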

2. SAMEX '98 Ensemble Forecast System and Data Set

As indicated in the introduction, the first Storm and Mesoscale Ensemble Experiment (SAMEX '98) was conducted during May 1998 over the continental US and central/southern Great Plains. Performed without any new funding, SAMEX '98 was a proof-of-concept effort that sought to:

·  provide an initial quantitative assessment of the value of coarse (30 km) resolution ensemble forecasts relative to a few intermediate (10 km) and high (2-3 km) resolution forecasts;

·  apply short range ensemble techniques and related statistical verification strategies to multi-model mesoscale forecasts;

·  develop and apply strategies for verifying numerical predictions of individual convective storms with emphasis on quantitative precipitation forecasting;

·  expose operational forecasters in a variety of settings to mesoscale ensemble and explicit cloud-resolving numerical predictions;

·  provide the scientific community with initial data sets appropriate for assessing the predictability of the small-scale atmosphere with emphasis on the observations, model physics, and model spatial resolutions needed for generating quality forecasts of meso- and storm-scale phenomena.

Three centers participated in this project by running four different numerical models: the CAPS Advanced Regional Prediction System (ARPS), the NCEP Eta model and the Regional Spectral Model (RSM), and the NCAR/Penn State Mesoscale Model version 5 (MM5) used by NSSL. (Note that NCAR and the Air Force Weather Agency also participated by running the MM5 system, though not in an ensemble mode.) Each model was used to create a control run, along with a number of runs for which the initial conditions (and, in some cases, boundary conditions) were perturbed. The methods used to generate the perturbations in each system are given below (see Table 1).