Journal of Economics and Management

An Empirical Application of Genetic Algorithm with Artificial Neural Networks to Forecast the Price of Taiwan Index Futures and the Simulation of Butterfly Trading Strategies

Author Name: Wu, Potsang / Author Name: MeiChi Huang
Organization: National Taipei University / Organization: National Taipei University

ABSTRACT

The butterfly trading strategy is one of the most important methods for reducing the considerable risk of futures trading. Most researchers conventionally rely on regression or time-series models. In this paper, an integrated genetic algorithm and artificial neural network with a special research design is proposed to forecast the prices of Taiwan Index Futures (TAIFEX) of various expiry months. Two butterfly trading strategies are compared to explore the effectiveness of the new method for futures investment.

The study period spans from 2003/03/16 to 2012/09/17, a total of 2,360 daily observations. The dependent variables are the rates of return of five futures contracts, from the nearby month to the far month. Nine technical indicators (rate of return, trading volume, open interest volume, 5-day and 10-day K, 5-day and 10-day D, and 5-day and 10-day Bias) are each included with six lag lengths. Preliminary variable selection is done by stepwise regression. An integrated genetic algorithm and artificial neural network model, with various training periods, systematically searched hidden units, and randomly simulated conversion and learning rates, is implemented in the SAS-IML programming language. Using the moving window method and 35 different base periods for the training sample, the rates of return of the five TAIFEX series are forecasted. Two butterfly trading strategies are then applied. The results show that although the forecasting accuracy is generally not very satisfactory, the annualized cumulative returns are quite high: with a base period of 180 days, they reach around 141% and 76% for strategies #1 and #2, respectively. This study shows that the integrated genetic algorithm and artificial neural network model with the special research design is significant and helpful for arbitrage-oriented index futures investment.

Keywords: Butterfly trading, genetic algorithm, artificial neural network, stepwise regression, moving window method

  1. Introduction

With the liberalization and diversification of financial markets, many investment institutions have a high demand for hedging, arbitrage, or speculation through futures trading, especially in index futures. Futures spread trading, in particular, can control risk effectively and earn stable profits. Thus, spread trading has become one of the most important types of transactions in the futures market. Practitioners can use spread trading on a single platform to construct flexible trading strategies. Futures trading is quite risky and can cause a significant loss if carried out improperly. The butterfly trading strategy, however, is an important means of reducing the risk of futures trading. It combines positions in opposite directions: buy (or sell) the near-month contract, sell (or buy) the middle-month contract, and buy (or sell) the far-month contract. The number of middle-month contracts equals the sum of the near-month and far-month contracts, which reduces the risk of the transaction.
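The leg structure of a butterfly position can be illustrated with a minimal Python sketch. The function name `butterfly_legs` and the equal-sized wings are assumptions made for illustration only; they are not taken from the study's program.

```python
def butterfly_legs(units: int = 1, long_wings: bool = True) -> dict:
    """Return signed contract counts per leg: positive = buy, negative = sell.

    Assumption: both wings (near and far month) have the same size, so the
    middle-month leg is twice the size of each wing in the opposite direction.
    """
    side = 1 if long_wings else -1
    return {
        "near": side * units,
        "middle": -2 * side * units,
        "far": side * units,
    }


legs = butterfly_legs(units=1)
print(legs)
```

Because the middle leg offsets the two wings, the signed contract counts always sum to zero, which is the sense in which the position is balanced.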

A profitable butterfly trading strategy requires correct decisions on buying and selling the various contracts. Many studies use time-series models to forecast futures prices. However, the use of artificial neural networks (ANN) in financial markets has drawn increasing attention. Kimoto & Asakawa (1991) used a back-propagation neural network to forecast the TOPIX index. Tanigawa & Kamijo (1992) used ANN to forecast Japanese stock price trends. Gencay (1996) used ANN to generate buy and sell signals for a stock index. Hamid (2004) used ANN to forecast the implied volatility of S&P 500 index futures options. Chang (2012) proposed a novel model that evolves a partially connected ANN for stock price forecasting.

Compared with traditional search algorithms, the genetic algorithm (GA) is more robust and can solve nonlinear problems. In GA-based chromosomal evolution, a problem is encoded into chromosomes that undergo reproduction, crossover, and mutation. The fitness function of the GA calculates an adaptation value for each chromosome; a chromosome with a high fitness value is better adapted to the environment, i.e., closer to the optimal solution, and therefore survives into the next generation. In this study, the GA process is used to select proper independent variables and parameters, while the ANN model is used to generate a nonlinear relationship from inputs to outputs through varying sets of hidden units. Therefore, the purposes of the study are to establish an integrated GA&ANN model for forecasting futures prices, and to evaluate the effectiveness of butterfly trading strategies on the TAIFEX series.
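The reproduction, crossover, and mutation cycle described above can be sketched in Python as follows. This is only an illustrative sketch, not the study's SAS-IML program: the function name `evolve`, the binary tournament used for reproduction, the one-point crossover, the elitism step, and all default rates are assumptions. Each chromosome is a list of 0/1 genes, which could stand for "variable excluded/included".

```python
import random


def evolve(fitness, n_genes, pop_size=20, generations=40,
           crossover_rate=0.8, mutation_rate=0.02, seed=0):
    """Evolve binary chromosomes and return the fittest one found."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)

    def pick():
        # Reproduction (assumed form): binary tournament between two members.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: the fittest chromosome survives intact
        while len(children) < pop_size:
            mother, father = pick(), pick()
            if rng.random() < crossover_rate:
                # One-point crossover at a random cut position.
                cut = rng.randint(1, n_genes - 1)
                child = mother[:cut] + father[cut:]
            else:
                child = mother[:]
            # Mutation: flip each gene with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = children
        best = max(population, key=fitness)
    return best


# Toy fitness for demonstration: count of selected genes.
best_mask = evolve(sum, n_genes=12)
print(best_mask)
```

In an application like the one described here, the toy fitness `sum` would be replaced by a measure such as the hit ratio of an ANN trained on the selected variables. Because of the elitism step, the best fitness in this sketch never decreases from one generation to the next.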

  2. Literature

Kimoto & Asakawa (1991), Irina K (2008), and Huang, A. Y. (2011) used GA or ANN models for forecasting, diagnosis, decision-making, classification, and related tasks. An ANN model consists of a "learning process" and a "recall process", and it can be expressed as the mathematical formula Yj = f(Σ ωij Xi − θj), where Xi is an input unit and Yj is an output unit. Figure 2-1 shows the architecture of the ANN neuron model. In the learning process, known training examples (i.e., known input and output values from historical data) are loaded into the neural network, which is fitted by simulating the thresholds θj, the weights ωij, the hidden units Hj, the conversion function f(.), and the learning parameters. The ANN model can be used for forecasting and classification by adjusting the biases and weights (simulated synapses), their signs and magnitudes, and so on.

Guoqiang Zhang (1998) surveyed the use of ANNs for forecasting, an area that had seen a tremendous surge in research activity over the preceding decade. While ANNs hold a great deal of promise, they also embody much uncertainty, and researchers were still not certain about the effect of key factors on the forecasting performance of ANNs. The survey synthesizes published research in this area and offers insights on ANN modeling issues and future research directions. Stephan Dreiseitl (2002) reviewed logistic regression and ANNs, which are the models of choice in many medical data classification tasks. The review summarizes the differences and similarities of these models from a technical point of view, compares them with other machine learning algorithms, provides considerations for critically assessing the quality of the models and their results, and reports how quality criteria for logistic regression and ANN models are met in a sample of papers from the medical literature.

David Bejou (2013) noted that relationship marketing has emerged as a focal point by which a company can succeed in a competitive environment, so that understanding the methods used to develop long-term relationships with consumers becomes critical to gaining competitive advantage. That article reviews the literature on relationship quality (an important component of relationship marketing), examines the factors that previous research has shown to be important, and then analyzes a survey of financial services consumers using a relatively new technique called artificial neural network analysis (ANNA) to investigate the potential determinants of relationship quality. Methodologically, ANNA is shown to have better predictive power than more conventional analytic techniques such as multiple regression. The ANN model has also been applied to stock market forecasting (Kimoto & Asakawa, 1991; Tanigawa & Kamijo, 1992).

Figure 2-1 ANN model diagram
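As a hedged illustration of the neuron formula in this section, the following Python sketch computes a single output as f(Σ ωX − θ). The logistic form of the conversion function f and the names `sigmoid` and `neuron_output` are assumptions; the text does not specify which f is used.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic conversion function (an assumed choice for f)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs, weights, threshold, activation=sigmoid):
    """Weighted sum of the inputs minus the threshold, passed through f."""
    net = sum(w * x for w, x in zip(weights, inputs)) - threshold
    return activation(net)


# Two inputs whose weighted sum cancels out, so the net input is zero.
print(neuron_output([1.0, 2.0], [0.5, -0.25], threshold=0.0))  # → 0.5
```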

  3. Research Methods

3.1 Data Source

This study uses the settlement price, trading volume, open interest volume, and trading turnover of Taiwan Futures Exchange Index (TAIFEX) contracts of five different expiration months. A total of 2,360 daily observations, ranging from March 16, 2003 to September 17, 2012, are collected from the Taiwan Futures Exchange Corporation.

3.2 Variable definition

In addition to the settlement price, trading volume, and open interest volume of the TAIFEX series of five different expiration months, this study also computes important technical indicators, namely the 5-day and 10-day K, D, and BIAS values. For each TAIFEX series, the forecasting model uses the index futures return as the dependent variable, and six lagged periods of the 8 independent variables plus the dependent variable as inputs. Thus, there are 54 predetermined input variables. All initial independent variables Xi, for i from 1 to 54, are then standardized to adjust for scale differences. The dependent variable is mapped into the range 0 to 1 by a logistic transformation for proper ANN simulation.
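The variable construction in this subsection can be sketched in Python as below. The exact formulas are not recoverable from the text, so the z-score form of the standardization (sample mean and sample standard deviation), the form 1/(1 + exp(−r)) of the logistic transformation, and the names `lagged_inputs`, `standardize`, and `logistic` are all assumptions.

```python
import math
import statistics


def lagged_inputs(series: dict, t: int, n_lags: int = 6) -> list:
    """Collect lags 1..n_lags of every indicator for forecast day t.

    `series` maps an indicator name to its list of daily values.
    """
    return [values[t - lag]
            for values in series.values()
            for lag in range(1, n_lags + 1)]


def standardize(column: list) -> list:
    """Assumed z-score: subtract the mean, divide by the sample std. dev."""
    mean = statistics.mean(column)
    sd = statistics.stdev(column)
    return [(x - mean) / sd for x in column]


def logistic(r: float) -> float:
    """Assumed logistic transformation mapping a return into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-r))


# Nine placeholder indicators with ten days of made-up values each.
demo = {f"x{k}": [float(k + d) for d in range(10)] for k in range(9)}
print(len(lagged_inputs(demo, t=6)))  # 9 indicators x 6 lags
```

With nine indicators and six lags, the input vector for each day has 9 × 6 = 54 entries, which matches the count given in the text.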

3.3 Analytical methods

Unlike previous studies, this study integrates the GA and ANN models, using a back-propagation ANN together with stepwise regression to select significant independent variables.

In the mapping from the input layer to the hidden layer, ωij denotes the input weights, λh denotes the conversion factor from the input layer to the hidden layer, and Hj denotes a hidden unit. In the mapping from the hidden layer to the output layer, λo denotes the conversion factor from the hidden layer to the output layer.

The back-propagation learning adjustment is as follows:

∆ωij = ηh λh Xi, where ηh is the learning rate from the input layer to the hidden layer; ∆ = ηo λo Hi, where ηo is the learning rate from the hidden layer to the output layer (see Figure 3-1).

Figure 3-1 Back-propagation ANN model
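To make the roles of the conversion factors (λh, λo) and learning rates (ηh, ηo) concrete, the following Python sketch performs back-propagation steps for a network with one hidden layer and one output. It is a sketch under stated assumptions rather than the study's implementation: the conversion factor is assumed to act as the slope of a logistic conversion function, the loss is assumed to be the squared error, and the names `squash`, `forward`, and `backprop_step` are hypothetical. Under these assumptions the weight change has the form of the adjustment above (learning rate × conversion factor × input) multiplied by a back-propagated error term.

```python
import math


def squash(z: float, lam: float) -> float:
    """Logistic conversion function with slope lam (assumed role of λ)."""
    return 1.0 / (1.0 + math.exp(-lam * z))


def forward(x, w, v, lam_h, lam_o):
    """w[i][j]: input i -> hidden j weights; v[j]: hidden j -> output."""
    hidden = [squash(sum(w[i][j] * x[i] for i in range(len(x))), lam_h)
              for j in range(len(v))]
    out = squash(sum(v[j] * hidden[j] for j in range(len(v))), lam_o)
    return hidden, out


def backprop_step(x, y, w, v, lam_h, lam_o, eta_h, eta_o):
    """One gradient step on 0.5 * (y - out)^2; updates w and v in place."""
    hidden, out = forward(x, w, v, lam_h, lam_o)
    # Error signal at the output, scaled by the output conversion factor.
    delta_o = (y - out) * lam_o * out * (1.0 - out)
    # Error signals at the hidden units, computed before v is changed.
    delta_h = [delta_o * v[j] * lam_h * hidden[j] * (1.0 - hidden[j])
               for j in range(len(v))]
    for j in range(len(v)):
        v[j] += eta_o * delta_o * hidden[j]
        for i in range(len(x)):
            w[i][j] += eta_h * delta_h[j] * x[i]
    return 0.5 * (y - out) ** 2  # loss before this update


# Made-up weights and a single made-up training example.
w = [[0.1, -0.2], [0.3, 0.4]]
v = [0.2, -0.1]
losses = [backprop_step([0.5, -1.0], 0.9, w, v, 1.3, 0.25, 1.2, 0.22)
          for _ in range(200)]
print(losses[0] > losses[-1])
```

The rate values in the usage lines are chosen to be close to the average values reported in the results section; the weights and the training example are invented for demonstration.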

The analytical procedure starts with stepwise regression, using the minimum Mallows' Cp criterion to select statistically significant independent variables (IVs). Then the moving window method is applied, based on the self-developed GA&ANN program written in the SAS-IML programming language. Taking a 30-day base period as an example, the first to 30th days form the training sample used to simulate and establish an optimal GA_ANN model for forecasting the 31st day's return; the second to 31st days are then used to forecast the 32nd day's return, and so on until the end of the sample. The method is applied to the TAIFEX series of five different expiration months. The features of the designed integrated model are described below:

  1. The GA method is used to select the appropriate number of independent variables. The number of IVs is simulated over the integers from 2/k to k in steps of 1, where k is the number of IVs in the system.
  2. Then, the ANN training process is applied using the back-propagation method.
  3. The base period is simulated at 10, 15, 20, and so on up to 180 days, with a five-day interval, i.e., a total of 35 different base periods.
  4. In order to obtain better results, the conversion rates (λh, λo) and learning rates (ηh, ηo) are each randomly simulated from 0 to 3, independently.
  5. For each training sample, the maximum hit ratio (HR) and R-square (RSQ) are tested and compared. The highest HR, which performs better than RSQ in forecasts throughout the entire sample period, is eventually selected as the criterion for the two investment strategies.
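Items 3 to 5 of the list above can be sketched in Python as follows. The uniform distribution of the random draws and the directional definition of the hit ratio (the share of days on which the sign of the forecast matches the sign of the actual return) are assumptions, as are the names `BASE_PERIODS`, `draw_rates`, and `hit_ratio`.

```python
import random

# Base periods of 10, 15, ..., 180 days: 35 values in total.
BASE_PERIODS = list(range(10, 181, 5))


def draw_rates(rng: random.Random) -> dict:
    """Draw conversion and learning rates independently from [0, 3]."""
    names = ("lam_h", "lam_o", "eta_h", "eta_o")
    return {name: rng.uniform(0.0, 3.0) for name in names}


def hit_ratio(actual, predicted) -> float:
    """Assumed definition: share of days with the correct return direction."""
    hits = sum((a > 0) == (p > 0) for a, p in zip(actual, predicted))
    return hits / len(actual)


print(len(BASE_PERIODS))
print(draw_rates(random.Random(1)))
print(hit_ratio([0.01, -0.02, 0.03, -0.01], [0.02, 0.01, 0.01, -0.03]))  # → 0.75
```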

Figure 3-2 Research Flowchart
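The moving window scheme used in this procedure can be sketched in Python as a generator of training and forecast days. The function name `moving_windows` is hypothetical, and day numbers are 1-based to match the 30-day example in the text.

```python
def moving_windows(n_obs: int, base: int):
    """Yield (train_start, train_end, forecast_day) as 1-based day numbers."""
    for start in range(1, n_obs - base + 1):
        yield start, start + base - 1, start + base


windows = list(moving_windows(n_obs=2360, base=30))
print(windows[0])   # → (1, 30, 31)
print(windows[1])   # → (2, 31, 32)
print(windows[-1])  # → (2330, 2359, 2360)
```

Each window trains on `base` consecutive days and forecasts the following day, so a sample of n observations gives n − base forecasts before accounting for any days lost to lagged inputs.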

3.4 Comparison of Investment Strategies

Based on the forecasted returns of the five series, two trading strategies are applied. Investment returns are calculated for each day, and the accumulated annualized returns are computed and compared. The computation procedure is as follows:

  1. For each simulated base period, find the forecasted returns of the different expiration months on each day, from the nearest to the farthest month, as PR1 to PR5, together with the actual rates of return (AR1~AR5).
  2. Sort PR1 ~ PR5 and identify the smallest, the second smallest, and the largest values among them.
  3. In trading strategy 1, buy the TAIFEX contract with the largest forecasted return and sell the TAIFEX contract with the smallest forecasted return. Calculate the net and accumulated returns each day throughout the sample period.
  4. In trading strategy 2, buy two units of the TAIFEX contract with the largest forecasted return and sell the two TAIFEX contracts with the smallest and the second smallest forecasted returns. Calculate the net and accumulated returns each day throughout the sample period.
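The daily computation in the steps above can be sketched in Python as follows. The sketch assumes that a short position earns the negative of the contract's actual return, that strategy 2 sells one unit of each of the two lowest-ranked contracts, and that transaction costs are ignored; the name `strategy_returns` and the sample figures are made up for illustration.

```python
def strategy_returns(predicted, actual):
    """Return the one-day gross returns of strategies 1 and 2.

    predicted: forecasted returns PR1..PR5; actual: realized AR1..AR5.
    """
    # Rank contract positions by forecasted return, smallest first.
    order = sorted(range(len(predicted)), key=lambda k: predicted[k])
    lowest, second_lowest, highest = order[0], order[1], order[-1]
    # Strategy 1: long the highest forecast, short the lowest.
    s1 = actual[highest] - actual[lowest]
    # Strategy 2: long two units of the highest, short the two lowest.
    s2 = 2 * actual[highest] - actual[lowest] - actual[second_lowest]
    return s1, s2


pr = [0.01, -0.02, 0.03, 0.00, -0.01]       # made-up forecasts
ar = [0.005, -0.010, 0.020, 0.001, -0.004]  # made-up actual returns
print(strategy_returns(pr, ar))
```

Summing these daily figures over the sample period, after deducting taxes and fees, gives the accumulated returns that are compared across base periods.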

  4. Results

4.1 Descriptive Statistics of Sample Overview

Descriptive statistics of the sample are summarized in Table 4-1.

Table 4-1 Descriptive Statistics of Sample Overview

Note: R represents index futures returns, P represents futures prices, V represents open interest (the number of outstanding contracts), and Q represents trading volume. The numbers 1~5 denote the near-month to far-month TAIFEX series.

4.2 Results of Integrated GA & ANN Model

Table 4-2 shows the average values of the final results of the GA-ANN model. The number of iterations for each TAIFEX series is 500,000. The internal validity hit ratios (HR) range from 65% to 69%. Although the external validity hit ratios (PHIT, predicted HIT) range only from 47% to 52%, the results below show that the butterfly trading strategies are still satisfactory. LAMH and LAMO are around 1.3 and 0.25, respectively. ETAH and ETAO are around 1.2 and 0.22, respectively.

Table 4-2 Simulated Mean Results of Integrated GA & ANN Model

Note: J: the number of hidden units; RSQ: the coefficient of determination in the training phase; HR: the hit rate in the training phase; LAMH (λh): the conversion rate from the input layer to the hidden layer; LAMO (λo): the conversion rate from the hidden layer to the output layer; ETAH (ηh): the learning rate from the input layer to the hidden layer; ETAO (ηo): the learning rate from the hidden layer to the output layer; PHIT: the hit rate for the forecast period. The training base periods are 10, 15, 20, and so on up to 180 days with a five-day interval, a total of 35 bases.

4.3 Results of simulated butterfly trading strategies

This study considers two butterfly trading strategies. Strategy 1 buys the expiry-month contract with the highest forecasted return and sells the expiry-month contract with the lowest forecasted return. Strategy 2 buys two units of the contract with the highest forecasted return and sells the contracts with the lowest and second-lowest forecasted returns. Transaction costs (taxes and fees) are deducted from the investment returns.

Figures 4-1 to 4-3 show the simulated cumulative returns of different base periods under the two strategies. The main findings are discussed below:

Figure 4-1 Simulation results

Based on the self-developed genetic algorithm and neural network program, the hit rate during training (HR) across the 35 training base periods ranged from 61% to 71% for the nearest-month TAIFEX (R1), from 63% to 71% for the second-nearest month (R2), from 63% to 70% for the third month (R3), from 61% to 70% for the fourth month (R4), and from 61% to 67% for the fifth month (R5). The hit rate during testing (PHIT) across the 35 training base periods ranged from 48% to 60% for R1, from 48% to 54% for R2, from 47% to 55% for R3, from 48% to 53% for R4, and from 50% to 58% for R5. The training hit rate (HR) tends to decrease gradually as the training period lengthens, whereas the test hit rate (PHIT) tends to rise gradually. Figure 4-1 shows that returns increase with the length of the training base period, with the 180-day base performing best.

Figure 4-2 Trading strategy 1, base = 180 days: cumulative gain or loss, cumulative net profit or loss, and average settlement price

Figure 4-3 Trading strategy 2, base = 180 days: cumulative gain or loss, cumulative net profit or loss, and average settlement price

  5. Conclusions

In this study, we use a self-developed genetic algorithm and neural network program to forecast futures prices for TAIFEX contracts of five different expiration months. The findings of this study are summarized as follows:

With both the conversion rates (λh and λo) and the learning rates (ηh and ηo) simulated uniformly at random, the optimal base period is found to be 180 days. The butterfly trading strategies show that the cumulative portfolio returns, net of taxes and transaction fees, are highest at the 180-day base period. Future research may improve forecasting accuracy, either by adding more variables or by trying more sophisticated models.

References

[1] Kimoto, T. and K. Asakawa (1991), "Stock Market Prediction System with Modular Neural Networks," IEEE International Joint Conference on Neural Networks, pp. I1-I6.

[2] Tanigawa, T. and K. Kamijo (1992), "Stock Price Pattern Matching System: Dynamic Programming Neural Network Approach," Proceedings of the International Joint Conference on Neural Networks, 465-471.

[3] Back, T. (1996), Evolutionary Algorithms in Theory and Practice, Oxford University Press, New York.

[4] Gencay (1996), "Using the moving average method as neural networks determine buy and sell stock index," Journal of International Economics, Volume 47, Issue 1, 1 February 1999, Pages 91–107.

[5] Guoqiang Zhang (1998), "Forecasting with artificial neural networks: The state of the art," International Journal of Forecasting, Volume 14, Issue 1, 1 March 1998, Pages 35–62.