Bayesian Neural Networks for Tornado Detection

THEODORE B. TRAFALIS
BUDI SANTOSA
School of Industrial Engineering
University of Oklahoma
202 West Boyd, Suite 124
Norman, OK 73019 USA

MICHAEL B. RICHMAN
School of Meteorology
University of Oklahoma
1100 East Boyd, Suite 1310
Norman, OK 73019 USA

Abstract: - In this paper, conventional feedforward artificial neural networks (ANNs) and Bayesian neural networks (BNNs) are applied to tornado detection. Both methods are employed, using radar-derived velocity data, to distinguish pre-tornadic circulations (known as mesocyclones) that remain nontornadic from those that become tornadic. Computational results show that the BNNs are more skillful than conventional ANNs for this type of discrimination. The additional skill seen in BNNs derives from the ability of the technique to more accurately forecast when mesocyclones remain nontornadic.

Key words: ANN, chaotic systems, classification, detection, error generalization, feedforward neural networks, machine learning, performance analysis, severe weather, training samples


1. INTRODUCTION

Detection of tornadoes with ample warning times has long been a goal of severe weather forecasters. Weather phenomena on the space scale of a tornado are thought to be deterministically chaotic, as the observational networks cannot resolve the small-scale circulations well. Moreover, there is no accepted physical model describing the range of atmospheric conditions leading to the formation of tornadoes. Radzicki [7] defines deterministic chaos as characterized by self-sustained oscillations whose period and amplitude are nonrepetitive and unpredictable, yet generated by a system devoid of randomness.

Although the precise location and time at which tornadoes will strike are unknown with advance lead-times, it is known what conditions lead to their development, when and where they are most frequent, and their likely trajectories. Operationally, when large-scale atmospheric conditions appear conducive to tornado development, the Storm Prediction Center issues tornado watches. When tornadoes are detected visually, or when pre-tornadic circulations (known as mesocyclones) are sensed by Doppler radar, a tornado warning is issued. With state-of-the-science weather radar, high-speed computing, and advanced signal processing algorithms, steady progress is being made on increasing the average lead-time of such warnings. As evidenced in the spring of 2003 in the United States, with a record number of tornadoes and a relatively small number of deaths, an extra minute of lead-time can translate into a number of lives saved. Hence, the need to detect as many pre-tornadic circulations as possible is an important aspect of a tornado detection algorithm. However, meeting this goal can result in an algorithm that predicts tornadoes when none are observed. This is known as a “false alarm”. A high false alarm rate can lead to the public ignoring warnings.

One of the severe weather detection algorithms, created by the National Severe Storms Laboratory (NSSL) and in use with the Weather Surveillance Radar-1988 Doppler (WSR-88D), is the Mesocyclone Detection Algorithm (MDA). This algorithm uses the outputs of the WSR-88D and is designed to detect storm-scale circulations associated with regions of rotation in thunderstorms. The MDA is used by meteorologists as one input in their decision to issue tornado warnings. Marzban and Stumpf [6] have shown that the MDA performs worse than artificial neural network (ANN) post-processing of the MDA output.

In this paper, ANNs and Bayesian neural networks (BNNs) are applied to detect tornado circulations sensed by the WSR-88D radar. A Bayesian approach to machine learning, based on the evidence framework, is utilized to develop a variant of ANNs for discriminating between mesocyclones that remain nontornadic and those that become tornadic.

The paper is organized as follows. In Section 2, the problem is defined. Section 3 describes the data, whereas Section 4 provides a brief overview of ANNs and BNNs and discusses the methodology used herein. In Section 5, the experimental setting is detailed. Section 6 provides a sensitivity analysis of ANNs and BNNs for several forecast evaluation indices. Finally, Section 7 concludes with specific recommendations.

2. PROBLEM STATEMENT

There are two classes of problems addressed in this research. One is physical and the other is methodological. The two are intimately entwined for the prediction of tornadoes.

There are two challenges involved in tornado warnings from the meteorological viewpoint. The first is tornado detection: of those tornadoes that do occur, only a fraction are detected. The second is false alarms: the algorithms detect tornado circulations more often than such circulations can be confirmed. This is insidious because the warnings have the potential to go unheeded by the public after a series of false alarms. Accordingly, it is desirable to develop a statistical algorithm that will maximize detection and minimize false alarms. Prediction of tornadoes is a difficult task owing to the small scale of their circulations and the speed with which they develop in the atmosphere. They can form within minutes and disappear just as quickly. The best tool in the meteorologist’s arsenal to remotely sense tornadoes is Doppler radar. However, present-day operational radar takes approximately 6 minutes to complete one volume scan, and the spatial resolution averages close to ¼ km for Doppler radar velocity. Many tornadoes and pre-tornadic circulations are smaller than that. Despite these challenges, lead times for tornadoes have increased from a few minutes (a decade ago) to approximately 11 minutes (with current radar), largely due to improvements in algorithms that use the radar data as inputs.

The second research problem is to develop an intelligent system that can generalize well with data that have a significant noise component. ANNs are considered robust classifiers in terms of input noise. Recent work in BNNs shows that BNNs can be more effective classifiers with noisy data in terms of generalization ([4], [5]).

3. DATA AND ANALYSIS

The data set used for this research consists of outputs from the WSR-88D radar. Tornadoes are one of the three categories of severe weather; the others are hail greater than 1.9 cm in diameter and nontornadic winds in excess of 25 m s-1. Any circulation detected on a particular volume scan of the radar data can be associated with a report of a tornado. In the severe weather database, supplied by NSSL, there are two truth numbers, the first for tornado ground truth and the second for severe weather ground truth [6]. Tornado ground truth is based on the temporal and spatial proximity of the circulation to the radar. If a tornado is reported between the beginning and ending of the volume scan, and the report is within reasonable distance of a circulation detection (input manually), then the ground truth value is flagged. If a circulation detection falls within the prediction "time window" of -20 to +6 minutes of the ground truth report duration, then the ground truth value is also flagged. The rationale behind these timings is to determine whether a circulation will produce a tornado within the next 20 minutes, a suitable lead time for advanced severe weather warnings by the National Weather Service. Any data with the aforementioned flagged values are categorized as tornado cases with label 1. All other circulations are given label -1, corresponding to a no-tornado case.
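To make the labeling rule concrete, the following minimal sketch (in Python; the function name, the single report time standing in for the report duration, and the example timestamps are ours, purely for illustration) flags a detection as a tornado case when it falls within the -20 to +6 minute window.

```python
from datetime import datetime

def label_circulation(scan_time, report_times):
    """Label a circulation detection: 1 if a tornado ground-truth report
    falls within the -20 to +6 minute prediction window of the volume-scan
    time, otherwise -1 (no-tornado case)."""
    for report_time in report_times:
        offset_min = (scan_time - report_time).total_seconds() / 60.0
        # A detection up to 20 minutes before the report, or up to
        # 6 minutes after it, is flagged as a tornado case.
        if -20.0 <= offset_min <= 6.0:
            return 1
    return -1

# Example: a scan 15 minutes before a reported tornado is labeled 1.
print(label_circulation(datetime(2003, 5, 8, 21, 0),
                        [datetime(2003, 5, 8, 21, 15)]))
```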

The predictor pool employed in this study consists of 23 attributes based on Doppler velocity data. These same attributes have been used successfully by Marzban and Stumpf [6] in their work on post-processing radar data.

4. METHODOLOGY

4.1 Artificial Neural Network (ANN)

ANN models are algorithms for intellectual tasks such as learning, classification, recognition, estimation, and optimization that are based on the concept of how the human brain works [3]. An ANN model is composed of a large number of processing elements called neurons. Each neuron is connected to other neurons by links, each with an associated weight. Neurons with no incoming links are called input neurons, and those with no outgoing links are called output neurons. The neurons are represented by state variables, which are functions of the weighted sum of input variables and other state variables. Each neuron performs a simple transformation simultaneously with the others, in a parallel-distributed manner. The input-output relation of the transformation in a neuron is characterized by an activation function. The combination of input neurons, output neurons, and the weighted links between neurons constitutes the architecture of the ANN.
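As a concrete illustration of this architecture, the short sketch below (Python with NumPy; the layer sizes, the tanh activation, and the random weights are arbitrary choices for illustration, not the configuration used in this study) computes the forward pass of a one-hidden-layer feedforward network.

```python
import numpy as np

def forward(x, W_hidden, b_hidden, W_out, b_out):
    """Forward pass of a one-hidden-layer feedforward network.

    x        : input vector (e.g., the 23 Doppler-velocity attributes)
    W_hidden : weights on the links from input neurons to hidden neurons
    b_hidden : hidden-layer biases
    W_out    : weights on the links from hidden neurons to the output neuron
    b_out    : output bias
    """
    # Each hidden neuron applies an activation function to a weighted sum.
    hidden = np.tanh(W_hidden @ x + b_hidden)
    # The output neuron combines the hidden states into a single score.
    return np.tanh(W_out @ hidden + b_out)

# Example with arbitrary dimensions: 23 inputs, 5 hidden neurons, 1 output.
rng = np.random.default_rng(0)
x = rng.normal(size=23)
y = forward(x,
            rng.normal(size=(5, 23)), np.zeros(5),
            rng.normal(size=5), 0.0)
```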

One of the advantages of ANNs is that they can extract patterns and detect trends that are often too complex to be noticed by either humans or other computer techniques. ANNs are appropriate for capturing existing patterns and trends from noisy data. The procedure involves training an ANN with a large sample of representative data and then testing it on data not included in the training, with the aim of assessing how well it predicts new outputs. The trained network comprises layers (input, hidden, and output) of neurons and weighted links between them; the last layer represents the output. The number of hidden layers is user-defined, and the user can modify how many neurons each layer has. Training and testing error tolerances can also be adjusted by the user. After the network has been trained and tested to the user's satisfaction, it is ready for use. New sets of input data can be presented to the network, and it will produce a forecast based on what it has learned. A trained ANN can be treated as an expert in the category of information it has been given to analyze. This expert can then be used to provide predictions for new situations and to answer what-if questions.
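The train-then-test workflow can be sketched as follows, here with a generic scikit-learn multilayer perceptron and synthetic placeholder data rather than the MATLAB implementation and radar attributes actually used in this study; the array names and network size are ours.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Placeholder arrays standing in for the 23-attribute mesocyclone data:
# X_* hold attribute vectors, y_* hold +1/-1 tornado labels.
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(200, 23)), rng.choice([-1, 1], size=200)
X_test, y_test = rng.normal(size=(100, 23)), rng.choice([-1, 1], size=100)

# Train a small feedforward network on the representative sample ...
net = MLPClassifier(hidden_layer_sizes=(5,), max_iter=2000, random_state=0)
net.fit(X_train, y_train)

# ... then evaluate it on data not seen during training.
y_hat = net.predict(X_test)
print("test accuracy:", np.mean(y_hat == y_test))
```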

4.2 Bayesian Neural Network

The Bayesian evidence framework has been successfully applied to the design of multilayer perceptrons (MLPs) in the work of MacKay ([4], [5]).

Bayesian methods have been proposed for neural networks to solve regression and classification problems. These methods claim to overcome some difficulties encountered in the standard approach such as overfitting.

In conventional approaches, the training of ANNs is based on the minimization of an error function and is often motivated by some underlying principle such as maximum likelihood. The disadvantage of such approaches is that the resulting networks can suffer from a number of deficiencies, including the problem of determining the appropriate level of model complexity. More complex models (e.g., ones with more hidden units, more layers, or smaller values of the regularization parameters) give better fits to the training data, but if the model is too complex it may generalize poorly (overfitting).

The Bayesian viewpoint provides a general and consistent framework for statistical pattern recognition and data analysis. In the context of neural networks, a Bayesian approach offers several important features including the following [1]:

·  The technique of regularization arises in a natural way in the Bayesian framework. The corresponding regularization parameters can be treated consistently within the Bayesian setting, without the need for techniques such as cross-validation (a sketch of the resulting hyperparameter re-estimation follows this list).

·  For classification problems, the tendency of conventional approaches to make overconfident predictions in regions of sparse data can be avoided.

·  Bayesian methods provide an objective and principled framework for dealing with the issue of model complexity (for example, how to select the number of hidden units in a feed-forward network), and avoid many of the problems of overfitting which arise when using maximum likelihood.
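As referenced in the first item above, the following sketch illustrates MacKay's evidence-framework re-estimation of the hyperparameters of the regularized objective F(w) = beta*E_D + alpha*E_W. It shows the regression form under a Gauss-Newton approximation of the Hessian; in the classification setting only alpha is re-estimated, since the cross-entropy error carries no separate noise parameter. The function and variable names are ours, not quantities reported in this paper.

```python
import numpy as np

def evidence_update(eigvals_data_hessian, E_W, E_D, n_data, alpha):
    """One evidence-framework re-estimation step for the hyperparameters
    of the regularized objective F(w) = beta * E_D + alpha * E_W.

    eigvals_data_hessian : eigenvalues of beta * (Hessian of E_D) at the
                           current weight minimum (Gauss-Newton approximation)
    E_W                  : weight penalty, 0.5 * sum(w**2)
    E_D                  : data error, 0.5 * sum of squared residuals
    n_data               : number of training targets
    alpha                : current weight-decay hyperparameter
    """
    lam = np.asarray(eigvals_data_hessian)
    # gamma counts the "well-determined" weights.
    gamma = np.sum(lam / (lam + alpha))
    alpha_new = gamma / (2.0 * E_W)
    beta_new = (n_data - gamma) / (2.0 * E_D)
    return alpha_new, beta_new, gamma
```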

4.3 Forecast Evaluation Indices for Tornado Detection

In the detection paradigm, the forecast results are evaluated by using a suite of forecast evaluation indices based on a contingency table (also known as a "confusion matrix"; Table 1).

The cell counts (a, b, c, d) from the confusion matrix, where a is the number of hits (tornado forecast and observed), b the number of false alarms (forecast but not observed), c the number of misses (observed but not forecast), and d the number of correct nulls, can be used to form forecast evaluation indices [9]. One such index is the Probability of Detection, POD, defined as POD = a/(a+c). POD measures the fraction of observed events that were correctly forecast. Its range is 0 to 1 and a perfect score is 1 (or 100%). Note that POD is sensitive to hits and is therefore useful for rare events; however, it ignores false alarms and can be improved artificially by issuing more "yes" forecasts to increase the number of hits. The False Alarm Rate, FAR, is defined as FAR = b/(a+b). FAR measures the fraction of "yes" forecasts in which the event did not occur. Its range is 0 to 1, and 0 is a perfect score. FAR is sensitive to false alarms and ignores misses; it can be improved artificially by issuing more "no" forecasts to reduce the number of false alarms. The concept of skill is one where a forecast is superior to some known reference forecast (e.g., random chance). Skill ranges from -1 (anti-skill) through 0 (no skill over the reference) to +1 (perfect skill). Heidke's skill formulation is commonly used in meteorology since it uses all elements of the confusion matrix and works well for rare-event forecasting (e.g., tornadoes) [2]. Heidke's Skill = 2(ad - bc) / [(a+b)(b+d) + (a+c)(c+d)].
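These indices reduce to simple arithmetic on the four cell counts; the sketch below (function name, variable names, and the example counts are ours) computes POD, FAR, and Heidke's skill for an arbitrary contingency table.

```python
def forecast_indices(a, b, c, d):
    """Compute POD, FAR, and Heidke's skill from confusion-matrix counts.

    a : hits (tornado forecast, tornado observed)
    b : false alarms (forecast, not observed)
    c : misses (not forecast, observed)
    d : correct nulls (not forecast, not observed)
    """
    pod = a / (a + c)   # fraction of observed events correctly forecast
    far = b / (a + b)   # fraction of "yes" forecasts that did not verify
    heidke = 2.0 * (a * d - b * c) / ((a + b) * (b + d) + (a + c) * (c + d))
    return pod, far, heidke

# Example with arbitrary counts: 30 hits, 20 false alarms, 10 misses, 940 nulls.
print(forecast_indices(30, 20, 10, 940))
```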

5. EXPERIMENTS

In the experiments, the data are split into two sets: training and testing. For the training set, the ratio of tornadic to nontornadic observations is approximately one to one. In the testing sets, this ratio is varied from 2% to 10% in 2% increments, and the sensitivity of each method to the ratio is determined. The cases used for training are different from those used in the testing sets. The same training and testing sets are applied to all methods. All experiments were performed on a Pentium IV computer, and the BNN and ANN experiments are carried out in the MATLAB environment.
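As an illustration of how test sets with a controlled tornadic proportion might be assembled (the sampling scheme and names below are ours, not necessarily the authors' exact procedure), the following sketch draws a test set containing a specified fraction of label +1 cases.

```python
import numpy as np

def make_test_set(X, y, tornado_fraction, n_test, seed=0):
    """Sample a test set with a given fraction of tornadic (label +1) cases.

    X, y             : attribute matrix and +1/-1 labels of held-out cases
    tornado_fraction : desired proportion of tornadic cases (e.g., 0.02 to 0.10)
    n_test           : total number of test cases to draw
    """
    rng = np.random.default_rng(seed)
    pos = np.flatnonzero(y == 1)
    neg = np.flatnonzero(y == -1)
    n_pos = int(round(tornado_fraction * n_test))
    idx = np.concatenate([rng.choice(pos, n_pos, replace=False),
                          rng.choice(neg, n_test - n_pos, replace=False)])
    rng.shuffle(idx)
    return X[idx], y[idx]

# Test sets with 2%, 4%, ..., 10% tornadic cases, as in the experiments:
# for frac in (0.02, 0.04, 0.06, 0.08, 0.10):
#     X_t, y_t = make_test_set(X_holdout, y_holdout, frac, n_test=500)
```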