Paper Template for HIC2004

APPLYING ARTIFICIAL NEURAL NETWORKS FOR FORECASTING AND ESTIMATION OF RIVER LEVELS

NITTAYA WANGWONGWIROJ AND ANNA SUKLUAN

Civil Engineering Department,

King Mongkut’s University of Technology Thonburi,

91 Pracha-utis Rd., Bangmod, Tungkhru District, Bangkok, 10140, Thailand

This paper investigates the applicability of artificial neural networks (ANNs) to forecasting and estimating maximum water levels of the Chao Phraya River in Thailand. A multi-layer feed-forward network trained with the back-propagation algorithm was used as the ANN structure. In the forecasting part of the study, predictions of maximum water levels 1 to 5 days ahead were developed and compared. In the estimation part, three issues are addressed: (a) extrapolation of maximum water levels by applying the trained network to data beyond the calibration range; (b) infilling river levels using concurrent upstream and downstream data; and (c) infilling missing data using relevant information from nearby stations. Graphical and statistical assessments indicate that the results are satisfactory and promising.

INTRODUCTION

The design, planning and operation of river systems depend largely on information derived from the forecasting and estimation of extreme events. Reliable flood forecasts are particularly important for warning against dangerous floods and inundation, as well as for the operation of multi-purpose reservoirs. Hydrological data estimation is also significant, since hydrological time series often exhibit deficiencies such as gaps, discontinuities and inadequate record length.

The application of artificial neural networks (ANNs) to various aspects of hydrological modeling has been investigated extensively in recent years. This computational method offers real advantages over conventional modeling, especially when the underlying physical relationships are not fully understood (Nayak et al. [1]).

This study applies ANNs as a modeling tool to forecast and estimate the maximum water levels of the Chao Phraya River in Thailand. The applications comprise predictions of river levels 1 to 5 days in advance, model generalization beyond the calibration range, and infilling of incomplete data sets.

ARTIFICIAL NEURAL NETWORK (ANN)

An ANN is a biologically inspired computational model that acquires its input-output behavior through a learning process. A network is made up of a number of interconnected nodes (called neurons) arranged in three basic types of layer: input, hidden and output. The input nodes perform no computation; they only distribute the inputs to the network. Such a network is called a feed-forward network because information passes in one direction only, from the input layer through the hidden layer(s) to the output layer.
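The feed-forward pass described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' MATLAB code: it assumes a logistic-sigmoid transfer function in the hidden and output nodes (the paper does not name one), and the function names and example weights are hypothetical.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    """Logistic transfer function (assumed; the paper does not name one)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x: List[float], w_hidden: List[List[float]], b_hidden: List[float],
            w_out: List[float], b_out: float) -> float:
    """Propagate one input pattern through an n-h-1 feed-forward network.

    The input nodes do no computation: each hidden node receives the raw x.
    """
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


# Hypothetical 2-2-1 network with hand-picked weights.
y = forward([0.5, 0.2],
            w_hidden=[[0.1, -0.3], [0.4, 0.2]], b_hidden=[0.0, 0.1],
            w_out=[0.7, -0.5], b_out=0.05)
print(round(y, 4))
```

Information flows strictly from the inputs to the hidden nodes to the output; nothing is fed back, which is what makes the network feed-forward.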

This study used feed-forward neural networks trained with the standard back-propagation algorithm on a set of input-output patterns. Each input pattern is propagated through the network to produce an output, which is then compared with the target (observed) output. If the two agree, no learning takes place; otherwise, the error is propagated backward through the network and the connection weights are adjusted to reduce the difference (see Haykin [2] for details).
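A single back-propagation update can be sketched as follows, again as a minimal Python illustration under stated assumptions: one hidden layer, sigmoid units, a squared-error loss and plain gradient descent with a hypothetical learning rate `lr`. The paper does not report its loss function, learning rate or momentum settings. Weights are drawn uniformly from (-1, 1), as the methodology section states, and the input pattern is hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, net):
    """Return (hidden activations, output) of a one-hidden-layer network."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(net["W1"], net["b1"])]
    y = sigmoid(sum(w * hj for w, hj in zip(net["w2"], h)) + net["b2"])
    return h, y


def backprop_step(x, target, net, lr=0.5):
    """One gradient-descent update on E = 0.5 * (y - target)**2, in place."""
    h, y = forward(x, net)
    if y == target:  # no difference, so no learning takes place
        return
    # Error term of the output node; the sigmoid derivative is y * (1 - y).
    d_out = (y - target) * y * (1.0 - y)
    # Error terms of the hidden nodes, propagated backward through w2
    # (computed before w2 is changed).
    d_hid = [d_out * w * hj * (1.0 - hj) for w, hj in zip(net["w2"], h)]
    for j, hj in enumerate(h):
        net["w2"][j] -= lr * d_out * hj
        net["b1"][j] -= lr * d_hid[j]
        for i, xi in enumerate(x):
            net["W1"][j][i] -= lr * d_hid[j] * xi
    net["b2"] -= lr * d_out


random.seed(0)
n_in, n_hid = 6, 7
net = {  # weights and biases drawn uniformly from (-1, 1)
    "W1": [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hid)],
    "b1": [random.uniform(-1, 1) for _ in range(n_hid)],
    "w2": [random.uniform(-1, 1) for _ in range(n_hid)],
    "b2": random.uniform(-1, 1),
}
x, t = [0.30, 0.32, 0.35, 0.37, 0.40, 0.42], 0.45  # hypothetical scaled levels
err_before = abs(forward(x, net)[1] - t)
for _ in range(50):
    backprop_step(x, t, net)
err_after = abs(forward(x, net)[1] - t)
print(err_before, err_after)
```

Repeating the update on one pattern shrinks the output error; in actual training the updates are applied over all training patterns for many epochs.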

APPLICATION OF ANNs TO THE CHAO PHRAYA RIVER DATA

Methodology

A MATLAB program was written to implement the feed-forward back-propagation method. Applying the ANNs to the time series involved two steps. In the first (training) step, input-output patterns of daily water levels were presented to the network and the interconnection weights were determined. In the second (testing) step, the trained networks were applied to the testing data. Determining an appropriate architecture for a particular problem is important, since the network topology directly affects both its computational complexity and its generalization ability (Cigizoglu [3]).

The number of hidden layers and the numbers of nodes in the input and hidden layers were determined by trying various network structures. The structure giving the best training results, i.e. the lowest root mean square error (RMSE) and the highest squared correlation coefficient (R²), was then used in the testing stage. Before training, the connection weights and biases of the ANN model were initialized with random numbers in the range (-1, 1). The training input and output data were scaled between 0 and 1, except in the extrapolation case.
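The two selection criteria can be computed as below. This sketch assumes that R² is the square of the Pearson correlation coefficient between the observed and simulated series, since the paper does not give its formula. The rule in `pick_best` (lowest RMSE, ties broken by higher R²) is an assumed way of combining the two criteria, and all values are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def rmse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Root mean square error, in the units of the data (m for water levels)."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def r_squared(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Square of the Pearson correlation coefficient (assumed definition of R²)."""
    n = len(obs)
    mo, ms = sum(obs) / n, sum(sim) / n
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov * cov / (var_o * var_s)


def pick_best(results: Dict[str, Tuple[float, float]]) -> str:
    """Architecture with the lowest RMSE; ties broken by the higher R²."""
    return min(results, key=lambda k: (results[k][0], -results[k][1]))


obs = [2.10, 2.45, 3.02, 3.60, 3.95]  # hypothetical observed levels (m)
sim = [2.00, 2.50, 3.10, 3.50, 4.05]  # hypothetical network output (m)
print(round(rmse(obs, sim), 4), round(r_squared(obs, sim), 4))

# Hypothetical (RMSE, R²) results for three candidate architectures.
candidates = {"6-5-1": (0.281, 0.989), "6-7-1": (0.262, 0.991), "6-9-1": (0.270, 0.990)}
print(pick_best(candidates))
```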

The Chao Phraya River data

The catchment of the Chao Phraya River is shown in Figure 1. Daily water level data from three gauging stations (C13, C3 and C7a) in the upper part of the catchment were used to demonstrate the ANN model.

Because the focus is on extreme events, only the daily water levels recorded during the flood season, from 1 August to 31 December of each year, were considered. The C3 record is incomplete in 1996 and 1997; these years were therefore used only in the infilling study and excluded from the forecasting part. The training and testing periods are given in Table 1.
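Extracting the flood-season window from a daily record might look like the hypothetical helper below. It assumes the record is stored as a dictionary keyed by `datetime.date`, and it returns missing days as `None` so that an incomplete year (such as 1996 at C3) is flagged rather than silently shortened.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional


def flood_season(series: Dict[date, float], year: int) -> List[Optional[float]]:
    """Daily levels from 1 August to 31 December of `year` (153 days).

    Days absent from `series` are returned as None so gaps stay visible.
    """
    day, end = date(year, 8, 1), date(year, 12, 31)
    out: List[Optional[float]] = []
    while day <= end:
        out.append(series.get(day))
        day += timedelta(days=1)
    return out


def is_complete(season: List[Optional[float]]) -> bool:
    """True when no day in the season is missing."""
    return all(v is not None for v in season)


# Hypothetical record covering the whole 2000 flood season.
record = {date(2000, 8, 1) + timedelta(days=k): 2.0 + 0.01 * k for k in range(153)}
season = flood_season(record, 2000)
print(len(season), is_complete(season))
del record[date(2000, 9, 15)]  # simulate a one-day gap
print(is_complete(flood_season(record, 2000)))
```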

Figure 1. Map of the Chao Phraya River catchment and gauging stations

Table 1. Training and testing periods for different ANN applications

Period / Forecasting (1 to 5 days ahead) / Estimation: extrapolation / Estimation: infilling Case I (using concurrent data) / Estimation: infilling Case II (using data from nearby stations)
Training / 1990-1995, 1998-1999 / 1990-1994, 1998, 2000 / 1990-1995, 1998-1999, 2001-2002 / 1990-1995, 1998-2002
Testing / 2000-2001 / 1995, 1999, 2001-2002 / 2000 / 01.08.1996-30.09.1996, 01.12.1996-30.09.1997

Maximum water level forecasting (predictions 1 to 5 days ahead)

The first set of applications was carried out for 1-day-ahead prediction using the maximum water levels at station C3. Simulations during the training and testing stages showed that a network with six input nodes, one hidden layer of seven nodes and a single output node (6-7-1) gave the lowest RMSE and the highest R². The input nodes represented the current and the five previous daily water levels (times $t-5, \dots, t$), and the output node corresponded to the water level at time $t+1$. Figure 2 compares the predictions for the training period with the observed data, both as time series and as a scatter plot. The statistical measures show satisfactory agreement between observed and forecasted data: RMSE = 0.2396 m and 0.2566 m, and R² = 0.994 and 0.992, for the training and testing stages respectively.
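The input-output patterns for a k-day-ahead forecast can be built with a sliding window. This sketch assumes the six inputs are the current level and the five preceding daily levels, and it builds windows within each flood season separately so that no pattern spans the break between 31 December and the following 1 August. `make_patterns` is a hypothetical name, and the series are hypothetical.

```python
from typing import List, Sequence, Tuple


def make_patterns(seasons: Sequence[Sequence[float]], n_inputs: int = 6,
                  lead: int = 1) -> List[Tuple[List[float], float]]:
    """(input, target) pairs for a `lead`-day-ahead forecast.

    Inputs are the levels at t - n_inputs + 1, ..., t and the target is the
    level at t + lead. Windows never cross from one season into the next.
    """
    patterns = []
    for s in seasons:
        for t in range(n_inputs - 1, len(s) - lead):
            patterns.append((list(s[t - n_inputs + 1:t + 1]), s[t + lead]))
    return patterns


season_a = [float(v) for v in range(10)]      # hypothetical 10-day season
season_b = [float(v) for v in range(20, 28)]  # hypothetical 8-day season
one_day = make_patterns([season_a, season_b], lead=1)
three_day = make_patterns([season_a, season_b], lead=3)
print(len(one_day), one_day[0])
print(len(three_day), three_day[0])
```

A separate network would be trained for each value of `lead`, which is consistent with the different architectures reported for each lead time in Table 2. Longer leads leave fewer usable patterns per season.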

Figure 2. One-day ahead model predictions compared with the observed data and scatter plot (training stage)

Table 2. Network patterns and statistical measures for forecasting river levels

Lead time / Network / RMSE train (m) / RMSE test (m) / R² train / R² test / ΔWLmax* train (m) / ΔWLmax* test (m)
1 day ahead / 6-7-1 / 0.2396 / 0.2566 / 0.994 / 0.992 / 0.20 / 0.27
2 days ahead / 6-6-1 / 0.4251 / 0.4431 / 0.980 / 0.978 / 0.33 / 0.41
3 days ahead / 6-8-1 / 0.5854 / 0.6074 / 0.961 / 0.960 / 0.40 / 0.54
4 days ahead / 6-7-1 / 0.7131 / 0.7614 / 0.942 / 0.937 / 0.61 / 0.68
5 days ahead / 6-5-1 / 0.8562 / 0.9723 / 0.917 / 0.909 / 1.07 / 1.09

* ΔWLmax: maximum water level difference

Figure 3. Plots of 3-days and 5-days ahead predictions compared with observed data

The ANN application was then extended to predictions 2 to 5 days ahead, to check whether the information presented to the model is adequate and to identify the limits of the model. Table 2 shows that the RMSE increases with lead time while R² decreases noticeably. The difference between the forecasted and observed peak water levels also grows with lead time. Figure 3 compares the predictions with the observed data for the 3-day and 5-day forecasts. The 3-day-ahead forecasts deviate slightly from the observed time series but remain satisfactory. In the 5-day-ahead prediction, deviations occur both during low flows and at the main peak. The 4-day-ahead forecast (not shown) also deviates somewhat, with a higher RMSE and lower R² than the 3-day forecast. These results suggest that the input information is adequate for water-level forecasts up to 3 days in advance.
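The peak-difference measure can be sketched as below. The paper does not define the maximum water level difference in Table 2 precisely; this sketch assumes it is the absolute difference between the highest forecasted and the highest observed level over the evaluation period, which matches the discussion of peak differences above. The series are hypothetical.

```python
from typing import Sequence


def peak_difference(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Absolute difference (m) between the forecasted and observed peak levels.

    Assumed reading: the two maxima are compared regardless of the day on
    which each occurs, so peak timing errors are not penalized.
    """
    return abs(max(sim) - max(obs))


obs = [3.1, 4.6, 6.2, 7.9, 7.4, 6.0]  # hypothetical observed levels (m)
sim = [3.4, 4.1, 5.3, 6.8, 7.2, 6.5]  # hypothetical 5-day-ahead forecast (m)
print(round(peak_difference(obs, sim), 2))
```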

Extrapolation ability (generalization beyond the calibration range)

The next application estimates the evolution of water levels when a rise may produce extreme values beyond the range of the available data. To investigate this, the observed data at station C3 were rearranged by grouping years according to the magnitude of their peak water levels; the resulting training and testing periods are given in Table 1. Water levels range from 1.55 to 10.90 m in the training data and from 1.79 to 12.98 m in the testing data. A network with six input nodes and five hidden nodes gave the best performance in the testing period (RMSE = 0.2530 m, R² = 0.994). For this case the data were scaled so that the training values lay between 0.2 and 0.8, leaving room at both ends for values outside the training range. The results in Figure 4 illustrate the ability of the trained network to estimate water levels beyond the calibration range.
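The effect of the narrower [0.2, 0.8] scaling can be checked using the training and testing ranges quoted above. This sketch assumes a linear min-max transform fitted on the training range and an output node whose transfer function is bounded to (0, 1), such as the logistic sigmoid. `make_scaler` is a hypothetical helper.

```python
from typing import Callable, Tuple

Scaler = Callable[[float], float]


def make_scaler(train_min: float, train_max: float,
                lo: float = 0.2, hi: float = 0.8) -> Tuple[Scaler, Scaler]:
    """Linear min-max scaling fitted on the training range, plus its inverse."""
    span = train_max - train_min

    def scale(x: float) -> float:
        return lo + (hi - lo) * (x - train_min) / span

    def unscale(z: float) -> float:
        return train_min + (z - lo) * span / (hi - lo)

    return scale, unscale


scale, unscale = make_scaler(1.55, 10.90)        # training range at C3 (m)
scale01, _ = make_scaler(1.55, 10.90, 0.0, 1.0)  # the [0, 1] alternative
print(round(scale(12.98), 3), round(scale01(12.98), 3))  # highest testing level
```

With [0.2, 0.8] scaling, the highest testing level (12.98 m) maps to about 0.93, inside the output range. Scaling to [0, 1] would map it to about 1.22, a value that a sigmoid output node cannot produce.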

Figure 4. Model predictions compared with the observed data in the extrapolation study

Infilling ability Case I: Using concurrent data from upstream and downstream stations

This application evaluates the ability of ANNs to infill missing data. The first case concerns a series with gaps for which one or more concurrent, complete series are available at nearby stations. The series used here are actually complete, so to evaluate the model, the C3 data for the year 2000 were treated as missing. The complete series at the upstream (C13) and downstream (C7a) stations were used as inputs, singly and in combination, and the output node represents the water level at station C3. In the testing stage, the concurrent upstream and downstream data were fed to the trained networks to reconstruct the C3 series. The statistical measures are summarized in Table 3, and Figure 5 compares the observed and filled data at station C3 for the testing stage.
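The Case I set-up, in which complete records are split by holding out one year at C3 as an artificial gap, can be sketched as follows. The records are hypothetical, and `concurrent_patterns` and `split_by_year` are illustrative names. For the single-input networks in Table 3, only one upstream or downstream series would be passed.

```python
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[List[float], float]


def concurrent_patterns(inputs: Sequence[Sequence[float]],
                        target: Sequence[float]) -> List[Pattern]:
    """Pair same-day values: [inputs[0][t], inputs[1][t], ...] -> target[t]."""
    return [([series[t] for series in inputs], target[t]) for t in range(len(target))]


def split_by_year(data: Dict[int, List[Pattern]],
                  test_years: Sequence[int]) -> Tuple[List[Pattern], List[Pattern]]:
    """Hold out whole years as an artificial gap and train on the rest."""
    train = [p for y, ps in data.items() if y not in test_years for p in ps]
    test = [p for y, ps in data.items() if y in test_years for p in ps]
    return train, test


records = {  # hypothetical (C13, C7a, C3) daily levels in metres
    1999: ([5.0, 5.2, 5.5], [3.0, 3.1, 3.3], [4.1, 4.3, 4.6]),
    2000: ([5.1, 5.4, 5.8], [3.1, 3.3, 3.6], [4.2, 4.5, 4.9]),
    2001: ([4.9, 5.0, 5.3], [2.9, 3.0, 3.2], [4.0, 4.1, 4.4]),
}
data = {y: concurrent_patterns([c13, c7a], c3) for y, (c13, c7a, c3) in records.items()}
train, test = split_by_year(data, test_years=[2000])
print(len(train), len(test), test[0])
```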

Table 3. Network patterns and statistical measures for infilling of water levels at station C3 using concurrent data from upstream and downstream stations

Input station(s) / Network / RMSE train (m) / RMSE test (m) / R² train / R² test / Cross-correlation C3-C13 / Cross-correlation C3-C7a
C13 (upstream) / 1-5-1 / 0.2145 / 0.1960 / 0.996 / 0.995 / 0.9902 / -
C7a (downstream) / 1-5-1 / 0.4490 / 0.3713 / 0.980 / 0.984 / - / 0.9796
C13 & C7a / 2-7-1 / 0.2006 / 0.1865 / 0.997 / 0.996 / - / -


Figure 5. Observed and filled data using input from (a) upstream and (b) downstream stations

Estimation using the upstream station as input performs better than using the downstream station (testing RMSE 0.1960 m versus 0.3713 m). This is attributed to the stronger correlation of C3 with the upstream series (cross-correlation 0.9902) than with the downstream series (0.9796). Furthermore, Table 3 indicates that using both upstream and downstream inputs (2-7-1 network) improves the model further, giving the lowest RMSE and the highest R² in both the training and testing stages.
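Ranking candidate input stations by their cross-correlation with the target station, as in Table 3, might be done as below. The sketch assumes a Pearson correlation at zero lag; a positive `lag` could account for travel time between stations, but the paper does not state which lag was used. The series are hypothetical.

```python
import math
from typing import Sequence


def cross_correlation(x: Sequence[float], y: Sequence[float], lag: int = 0) -> float:
    """Pearson correlation between x[t] and y[t + lag], for lag >= 0."""
    xs, ys = list(x[:len(x) - lag]), list(y[lag:])
    n = min(len(xs), len(ys))
    xs, ys = xs[:n], ys[:n]
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy)


c3 = [4.1, 4.3, 4.6, 5.0, 5.4]                   # hypothetical target station
candidates = {"C13": [5.1, 5.3, 5.6, 6.0, 6.4],  # hypothetical upstream levels
              "C7a": [3.0, 3.3, 3.4, 3.9, 4.1]}  # hypothetical downstream levels
scores = {k: cross_correlation(v, c3) for k, v in candidates.items()}
best = max(scores, key=scores.get)
print(scores, best)
```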

Infilling ability Case II: Using relevant data from nearby stations

This application addresses the actual gaps in the C3 record during 1996 and 1997. The corresponding time series from the upstream (C13) and downstream (C7a) stations were used as input data, and the output node represents the water level at station C3. Figure 6 compares the infilled C3 data with the data from the nearby stations.
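Infilling real gaps differs from Case I in that only the missing days are replaced and observed values are kept. A minimal sketch follows, assuming gaps are marked as `None` and the trained network is available as a callable. The linear `stand_in` model is a hypothetical placeholder, not the trained ANN, and any scaling of inputs and outputs is omitted.

```python
from typing import Callable, List, Optional, Sequence


def infill(target: Sequence[Optional[float]], inputs: Sequence[Sequence[float]],
           model: Callable[[List[float]], float]) -> List[float]:
    """Replace None entries of `target` with model estimates from same-day inputs.

    Observed values are kept unchanged; only the gaps are filled.
    """
    return [v if v is not None else model([s[t] for s in inputs])
            for t, v in enumerate(target)]


def stand_in(x: List[float]) -> float:
    """Hypothetical placeholder for the trained network (NOT the paper's ANN)."""
    return 0.5 * x[0] + 0.5 * x[1]


c3 = [4.1, None, None, 4.9]  # hypothetical C3 record with a two-day gap
c13 = [5.0, 5.4, 5.8, 6.0]   # hypothetical upstream levels
c7a = [3.2, 3.4, 3.6, 3.8]   # hypothetical downstream levels
filled = infill(c3, [c13, c7a], stand_in)
print(filled)
```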

Figure 6. Infilling data at station C3 compared with data from nearby stations

CONCLUSION

The potential of ANNs for hydrological applications such as forecasting, generalization and estimation of daily water levels was examined using data from three gauging stations in the Chao Phraya River catchment. Model performance was assessed both graphically and statistically.

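The results are highly encouraging and suggest that the ANN approach is viable for forecasting and estimating river levels: forecasts were satisfactory up to 3 days ahead, and the trained networks were able both to estimate levels beyond the calibration range and to infill missing data. Further investigation could incorporate prior information from several types of hydrological record, such as precipitation, river flow and water level data.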

REFERENCES

[1] Nayak P.C., Sudheer K.P., Rangan D.M. and Ramasastri K.S., “A neuro-fuzzy computing technique for modeling hydrological time series”, Journal of Hydrology, Vol. 291, (2004), pp 52-66.

[2] Haykin S., “Neural Networks: A Comprehensive Foundation”, Macmillan College Publishing Company, New York, USA, (1995).

[3] Cigizoglu H.K., “Estimation, forecasting and extrapolation of river flows by artificial neural networks”, Hydrological Sciences Journal, Vol. 48, No. 3, (2003), pp 349-361.