Neural Network Training Techniques for a Gold Trading Model

Erik O. Brauner and Judith E. Dayhoff
[email protected], dayhoff@isr.umd.edu
Institute for Systems Research
University of Maryland, College Park
College Park, MD 20742

Xiaoyun Sun and Sharon Hormby
ssun@bhi-rearm.com
BehavHeuristics, Inc.
7240 Parkway Dr, Suite 170
Hanover, MD 21076

Abstract

The purpose of our study was to build and evaluate a decision support system for financial models, incorporating a neural network approach. Since neural network models may employ a variety of different training techniques, we have developed an approach to choosing and optimizing the structures and procedures used during training, to optimize the fit between the trained neural network and the financial decision-support goals. Here we evaluate alternative methods for training a network to forecast gold market prices. Essential to this evaluation is the identification of an appropriate trading model to evaluate system performance, without constraining the details of the financial decisions that can be made with the resulting trained neural network. The methods explored here were neural network models using both standard and non-standard training techniques, and varying parameters used during weight adjustment and in the training schedule. We illustrate techniques for choosing network structure and inputs, data segmentation, error measures for training, error measures for validation, and optimizing validation set lengths. These techniques are applied in this paper with respect to the aim of forecasting the price of gold.

1. Introduction

Neural networks appear to be a highly promising approach to financial forecasting and decision-making in financial analysis. Initial experimentation with neural networks has been successful in a variety of domains with differing financial instruments [1],[2],[3]. In addition, a compendium of theoretical results has shown neural networks to be a natural extension of linear regression, adding capabilities for learning, and capturing nonlinear relationships over time [4],[5].

Since a neural network is a general type of tool, and is not constrained to a particular type of financial market or a particular type of decision, each neural network must be trained to produce the type of result needed for a specific decision-making task. To this aim, we have identified a series of training techniques that are key to tailoring a neural network to a particular domain of decision-making. We illustrate these techniques in terms of predicting the price of gold, and analyze the results with a trading model. In this paper we describe the basic trading model for evaluation of neural network performance in Section 2. Section 3 covers the neural network structure and data segmentation. Techniques for training are covered in Section 4, including the choice of error measures for training, error measures for validation, optimizing validation set lengths, and tradeoffs in choosing the number of inputs. Section 5 covers an extension to the trading model, in which the cost of each transaction is accounted for.


2. A Trading Model

When designing a neural network, an evaluation must be done to choose between different models and different training techniques. To evaluate the neural network, its performance must be measured in a fashion that reflects the performance of the trained network in decision-making situations. Whereas traditional scientific studies have relied on a root mean square formula for measuring performance, this criterion does not measure performance in a trading situation [6],[7]. For this reason a simulated trading model was chosen to evaluate the performance of our neural models. This model is based upon a model proposed by Blake LeBaron [8], where the idea is to simulate the trading practices of a large trading firm which is changing positions on a daily basis. The premise for the model is that, at the close of each day, the trader is given a forecast for the market tomorrow and must decide whether to take a long or short position for the coming day. If the forecast is that the price will rise, then all capital is put into a long position. If the forecast is that the price will drop, then all capital is put into a short position. Since magnitude is not considered, it is only the direction of the forecast that matters. Thus, if we forecast the direction correctly, then our profit is equal to the absolute value of the percentage change that the commodity undergoes. If we forecast incorrectly, we lose this amount. For ease of keeping track, we assume that our total capital on the first day is simply 1.0, and that the entire accumulated capital is used in each trade. A final capital of 1.5 would indicate a profit of .5, or 50%. The trading rule can be expressed mathematically as a change in capital from day t to t+1:

If our forecast for day t+1 is correct in direction:
    Capital(t+1) = Capital(t) * (1 + |% change in gold|)
else:
    Capital(t+1) = Capital(t) * (1 - |% change in gold|)

Predicting that the price will increase every day is equivalent to a buy and hold strategy. As a
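As an illustration, the daily rule above can be written as a short simulation (an illustrative sketch, not code from the paper; the function name is ours):

```python
def run_trading_model(forecasts, actual_changes, capital=1.0):
    """Apply the daily trading rule of Section 2.

    forecasts and actual_changes are daily percentage changes expressed as
    fractions (e.g. 0.012 for 1.2%). Only the forecast's direction is used:
    capital grows by |change| on a correct call and shrinks by |change|
    on an incorrect one, with all capital reinvested each day.
    """
    for forecast, change in zip(forecasts, actual_changes):
        correct = (forecast >= 0) == (change >= 0)  # direction only
        if correct:
            capital *= 1 + abs(change)
        else:
            capital *= 1 - abs(change)
    return capital
```

Starting from a capital of 1.0, a final value of 1.5 would correspond to the 50% profit mentioned above.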

basis for comparison we note that over the test period of 1988-1994, gold actually lost 17% of its value. If one were to use this trading model with a buy and hold strategy, the final capital would be 0.83.

3. Training Techniques

3.1 Network Structure

The neural networks used for our experimentation were fully interconnected feedforward backpropagation networks. (For a more complete description of neural network architectures and training see [9],[10].) All of our networks were three-layer networks consisting of an input layer, one hidden layer composed of three hidden nodes, and an output layer which supplied the network's prediction of percentage change in price. The input layer nodes had no transform functions and were supplied with the values of the corresponding inputs scaled between .1 and .9. The hidden and output layer nodes used the standard sigmoidal transfer function and supplied an output value between 0 and 1, which was scaled back to the target range. After scaling, output from the single output node was the predicted percentage change in the price of gold for tomorrow.

    Figure 1 illustrates a typical network configuration.


Figure 1. A typical feed-forward network configuration used for these experiments.
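The input and output scaling described above can be sketched as a linear map (a sketch; the observed data range [lo, hi] and the function names are our own):

```python
def scale_inputs(x, lo, hi):
    """Linearly map a value from its observed data range [lo, hi] onto [.1, .9],
    the range supplied to the input layer nodes."""
    return 0.1 + 0.8 * (x - lo) / (hi - lo)

def unscale_output(y, lo, hi):
    """Inverse map: turn the sigmoid output (in [.1, .9] after training) back
    into a percentage change in the original target range."""
    return lo + (y - 0.1) * (hi - lo) / 0.8
```

A round trip through both functions recovers the original value, which is what lets the single output node's activation be read as a predicted percentage change.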


3.2 Data Segmentation

When training and evaluating the network models, the data is always segmented into three non-overlapping sets: a training set, a validation set and a forecast set. The training set is used to update the weights in the network by fitting its output to the data points. The validation set is never trained on, but is used to determine when the training is complete. Too much training results in overfitting, or memorization of the data points, and too little training results in suboptimal performance. Once training is finished, the forecast set is used only once by a model to test on completely blind data. The importance of using a separate validation set cannot be over-emphasized. If the forecast set is used to determine when to stop training, then the model has been tainted, since it has already seen the forecast set. To ensure a completely blind forecast, the model must never use the forecast data in any aspect of its construction, but only after the model is fully done training.

One of our primary assumptions in developing a training technique is that the market goes through phases and that only the most recent past data is likely to reflect the state of the market at any given time. For this reason we decided to build a separate model to forecast each one month period in the forecast set. Since the forecast set spanned the seven year period of 1988 to 1994, it was necessary to build 84 separate models for each experiment. Each one month forecast period used the immediately preceding data to form the training and validation sets. This novel windowing approach is illustrated in Figure 2. Once training was completed for one model and the one month period had been forecast, the sets were slid forward in time until the new forecast set was directly after the old one. As an example, if we were interested in forecasting the one month period Jan. 1990, we might choose a three month validation period of Oct. 1989 - Dec. 1989, and a one year training period of Sept. 1988 - Sept. 1989. Once this model is completed, we would slide these periods forward by one month to forecast the next period. This process is then repeated until the entire seven year forecast set has been covered by these forecasting windows. The results from the construction of these 84 independent models constitute a single experiment. In practice it may be beneficial to extend this rebuilding of the network model to a daily procedure rather than monthly (i.e., a forecast set length of one day), but for the purposes of our current experimentation, this extension proved too time consuming.


Figure 2. Windowing of data into training, validation and forecast sets. The window is then shifted forward to cover a new forecast period.
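The sliding-window segmentation of Figure 2 can be sketched as follows (a sketch under the assumption that months are indexed as integers from the start of the available data; the function name is ours):

```python
def windows(first_forecast_month, n_models, train_months=12, val_months=3):
    """Yield (training, validation, forecast) month indices for each model.

    Months are integers (0 = first month of available data). The validation
    set sits immediately before the one-month forecast set, the training set
    immediately before the validation set, and each successive model slides
    every window forward by one month, as in Figure 2.
    """
    for k in range(n_models):
        forecast = first_forecast_month + k                      # one-month forecast set
        validation = list(range(forecast - val_months, forecast))
        training = list(range(forecast - val_months - train_months,
                              forecast - val_months))
        yield training, validation, forecast
```

With first_forecast_month = 15, for example, the first model trains on months 0-11, validates on months 12-14, and forecasts month 15; 84 such models would cover a seven year forecast set.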

Since we assume that the market undergoes phase changes, the natural question arose as to what happens when one of these is reached. If our training period represents data from one market phase and our forecast set is data from a different market phase, then the model will clearly do a poor job of forecasting. For this reason, the model's performance on the validation set was used not only for deciding when to stop training, but also to decide when a phase shift had occurred. If the validation set reflected a different


market phase than the training set, then repeated training only worsened performance on the validation set. In general, if the training set was from a different phase, the average error over the validation set only decreased for the first few training passes. This information was used to make a decision on whether a model represented the validation period well. If the minimum average error over the validation set was achieved within the first 20 training passes, we assumed that the model was not converging, and we did not use the resulting network for decision making. For those periods in which we failed to build a convergent model, we assumed, for purposes of the trading model, that we withdrew from the market and incurred no profits or losses.

Table 1 details the output from one experimental trial. Each row represents the output from one network, which was trained for 10,000 passes on the training data. Row 3 shows an example where the network did not converge, and a cash out strategy was used. The totals at the bottom are used to measure that model's performance.
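A minimal sketch of this non-convergence rule, assuming the per-pass mean validation errors have been recorded in a list (the function name is ours):

```python
def usable_model(validation_errors_per_pass, min_passes=20):
    """Return True if the model converged in the sense of Section 3.2.

    validation_errors_per_pass[i] is the mean validation error after pass
    i + 1. If the minimum is reached within the first `min_passes` passes,
    the training data is assumed to come from a different market phase and
    the model is discarded (the trader sits the month out in cash).
    """
    best_pass = min(range(len(validation_errors_per_pass)),
                    key=validation_errors_per_pass.__getitem__)
    return best_pass + 1 > min_passes
```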

Period   Market Change   Converged?   Profit   Capital
1/88     -.075%          Yes          1.2%     1.012
2/88     -.15%           Yes          1%       1.023
...
Totals:  Earnings: -17%               Profits: +145%   Capital: 2.45

Table 1. Output from one experimental trial. Each forecast period represents one network trained 10,000 times on the training data set which precedes it.

4. Results

4.1 Error Measures for Training

To train a network, a formula for error measure must be defined; the error formula is then minimized by weight adjustments during training. The error is measured for one training pattern, which is a set of inputs to the network and its corresponding desired output, and the error is then propagated backwards through the network to update the weights. Traditionally, squared error is used as the error function for determining weight updates. The form of the squared error function is given by:

Error(n) = 1/2 * (networkOutput_n - desiredOutput_n)^2

where Error(n) represents the error for pattern n. However, financial decision-making is evaluated by accounting for actual gains and losses, which are not raised to the second power as in the squared error formula. For this reason, we found absolute error a better measure for purposes of training the networks. The error function for absolute error has the form:

Error(n) = |networkOutput_n - desiredOutput_n|

This measure of error, when used with the backpropagation algorithm, results in weight adjustments which are only dependent on the direction of the error and not on the magnitude. This bears a better resemblance to our trading model. A drawback of absolute error training is that patterns which have very little error associated with them will receive as much weight adjustment as those which are farther off. To compensate, an error cutoff of 0.01 was chosen. Only patterns which have error greater than the cutoff are used for weight adjustment; those which the network can predict sufficiently well are not used for adjustments. When used with this cutoff, absolute error was found to produce networks far superior in terms of trading profits to those trained with the traditional squared error measure.
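Under absolute error the derivative of Error(n) with respect to the output depends only on the sign of the error, so every adjusted pattern gets a push of the same magnitude. A sketch of this training signal with the 0.01 cutoff (the function name is ours):

```python
def output_delta(network_output, desired_output, cutoff=0.01):
    """Training signal at the output node for absolute-error backpropagation.

    Returns d|err|/d(output): the sign of the error, or 0.0 when the error
    is within the cutoff, in which case the pattern is predicted well enough
    and is skipped for weight adjustment.
    """
    err = network_output - desired_output
    if abs(err) <= cutoff:
        return 0.0
    return 1.0 if err > 0 else -1.0
```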


4.2 Error Measures for Validation

Error measures serve multiple purposes in training a network. In addition to providing error for the backpropagation algorithm, we need an error measure to evaluate performance on the validation set. This error measure is our stopping criterion, since it is used to decide when to stop training. After each training pass on the training data, the network forecasts the data in the validation set. When forecasting error on this validation set reaches a minimum, we stop the training. Since the error is a measure over all the data points in the validation set, three possible measures seem natural:

  • mean squared error (MSE) = (1/n) * sum_i (networkOutput_i - desiredOutput_i)^2
  • mean absolute error (MAE) = (1/n) * sum_i |networkOutput_i - desiredOutput_i|
  • trading profit over the validation set

All three of these possibilities were tested on various different models. Table 2 details the results from this comparison for different stopping criteria on models with differing numbers of inputs. All models in Table 2 used a training set length of one year and a validation set length of 3 months. Each model was tested in 5 separate trials using randomized starting weights, and the data represents the mean of these 5 trials. This table shows that mean absolute error was usually the best performing error measure for stopping criteria.
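The three candidate stopping measures can be sketched as follows (a sketch; the function names are ours, and the trading-profit measure reuses the Section 2 rule on the validation set):

```python
def mse(outputs, targets):
    """Mean squared error over the validation set."""
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(outputs)

def mae(outputs, targets):
    """Mean absolute error over the validation set."""
    return sum(abs(o - t) for o, t in zip(outputs, targets)) / len(outputs)

def validation_profit(forecasts, changes):
    """Trading profit over the validation set, via the Section 2 rule."""
    capital = 1.0
    for f, c in zip(forecasts, changes):
        capital *= 1 + abs(c) if (f >= 0) == (c >= 0) else 1 - abs(c)
    return capital - 1.0
```

Whichever measure is chosen, it is evaluated after each training pass, and training stops at the pass where the measure is best.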

Table 2. Comparison of different stopping criteria on mean profit from 5 different choices of

inputs. Mean profit is over 5 separate trials with different random seeds.

4.3 Optimal Validation Length

Essential to constructing a good network model is the choice of validation period length. If the forecast data were equally well represented by all historical data, then the correct amount of data to use for validation would be as much as possible. The changing behavior of the market, however, indicates that the smaller the validation period, the more likely it is to represent the coming forecast period well. Too little validation data makes it impossible to get a good measure of the model's fitness, so we are left to search for the minimum period which will give a good measure of network fitness. Differing validation periods were tested on 3 network models having 8, 11, and 15 inputs, respectively. Figure 3 illustrates the results of these tests. Each data point given is the mean of 5 separate trials, run with differing random starting weights for the networks.

For each of the three networks, a maximum profit by validation set length can be seen from the graphs in Fig. 3. The 8-input network had a maximum at 60 days, the 11-input network at 70 days, and the 15-input network at 100 days. Thus, it appears that the more complex networks are best validated by longer validation set lengths.


A clear maximum profit over all models was obtained at a validation set length of 70 days, with the 11-input network. This maximum appears to reflect an underlying structure in the market data that allows the best predictions to be done with an 11-input network and a 70-day validation set.


Figure 3. Comparison of differing validation periods on three different network models. Profit given is the mean over 5 separate trials with different random seeds.
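The search behind Figure 3 amounts to a grid over validation lengths. A hypothetical sketch, assuming a build_and_trade(n_inputs, val_days, seed) callable that runs one full windowed experiment and returns the final capital (that callable, and all names here, are our own, not from the paper):

```python
def best_validation_length(build_and_trade, n_inputs, lengths, n_trials=5):
    """Pick the validation length with the highest mean profit.

    Each candidate length is evaluated over n_trials runs with different
    random starting weights, mirroring the 5-trial means plotted in Fig. 3.
    """
    def mean_profit(val_days):
        finals = [build_and_trade(n_inputs, val_days, seed)
                  for seed in range(n_trials)]
        return sum(finals) / n_trials - 1.0   # final capital minus starting 1.0
    return max(lengths, key=mean_profit)
```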

4.4 Optimal Number of Inputs

Inputs were chosen as a variety of market indicators, with past prices of gold and other indicators such as the Dow Jones Industrial Average included. They were rank-ordered according to expected influence based on business and market analysis. The networks trained for the experiment in Fig. 3 had inputs from the first 8, 11, and 15 financial signals in the rank-ordering. From this system, we found 11 inputs to be optimal for our trading model.

5. A Model With Transaction Costs

One of the difficulties with the trading model described in Section 2 is that it does not take into account any transaction costs. To remedy this, we extended the model by inserting a transaction penalty for each time a position is changed. If we decide to change from a long to a short position, or vice versa, we assume that there is a penalty, which is modeled as a set percentage of the capital being transferred. If we keep the same position which we had on the preceding day, no penalty is incurred. Thus, every time we change positions, we remove a percentage of our capital at that time. To explore the impact of these transaction costs on our model we chose to test a model which had been trained with a one year training length and a 70 day validation period. Before considering transaction costs, the model yielded a final capital of 2.455, indicating a profit of 145% over the seven year forecast period. In fact, the model only incurred 278 changes of position over this period, making the transaction losses an acceptable amount. Table 3 details the final capital and profits for various transaction costs on this model. For costs as high as .25% per transaction, the model still indicates a substantial profit over the market's loss of 17%.

Transaction Cost   Final Capital   Profit
0                  2.455           146%
.01%               2.388           139%
.05%               2.137           114%
.1%                1.859           86%
.25%               1.432           43%

Table 3. Profit measures from one model, with transaction costs considered.
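The extended rule is a small change to the Section 2 simulation (an illustrative sketch; the paper does not say whether the first entry into the market is charged, so this sketch charges only actual position flips):

```python
def run_with_costs(forecasts, actual_changes, cost=0.0025, capital=1.0):
    """Section 2 trading rule with a per-switch transaction penalty.

    cost is the set percentage of capital (as a fraction, e.g. 0.0025 for
    .25%) removed each time the position flips between long and short.
    """
    position = 0  # +1 long, -1 short, 0 not yet in the market
    for forecast, change in zip(forecasts, actual_changes):
        new_position = 1 if forecast >= 0 else -1
        if position != 0 and new_position != position:
            capital *= 1 - cost          # pay the switching penalty
        position = new_position
        correct = (new_position == 1) == (change >= 0)
        capital *= 1 + abs(change) if correct else 1 - abs(change)
    return capital
```

Keeping the same position on consecutive days incurs no penalty, which is why a model with only 278 position changes over seven years loses relatively little to these costs.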


6. Conclusions

We have demonstrated a unique neural network approach to training which was optimized for gold market analysis. Using these training techniques, neural network predictions were used to make gold trading decisions. The result was that the trading model based on neural network predictions outperformed the buy-and-hold strategy significantly. The training techniques described in this paper are general, and can be applied to any market. In addition, these techniques can be applied to a variety of trading schemes, having different order placement, transaction costs, and execution details.

Neural network performance was evaluated using a trading scheme modeled after a large trading firm that can change positions on a daily basis. We considered absolute error as the measure for neural network training because it reflected the trading model that was used to evaluate the trained network. In addition, we have shown that the same error measure led to superior results when used as the stopping criterion for network training.

Other training techniques presented here include optimizing the validation set length, which was tailored to reflect the duration of market phase changes. The number of inputs was also optimized, after a rank-ordering was made of their importance based on business and market analysis. Finally, a trading model that incorporated transaction fees showed that results were still overwhelmingly positive.

Future directions include the application of this type of decision support approach to other markets, besides gold, and the use of the training techniques presented here on differing financial instruments. Whereas previous investigations have demonstrated that neural networks show great promise for modeling the nonlinear and time-varying interactions in the market, this paper demonstrates a new combination of techniques for optimizing neural network training when tailored to financial decision-making in a particular market.

7. References

[1] Deboeck, Guido J., editor, 1994. Trading on the Edge: Neural, Genetic, and Fuzzy Systems for Chaotic Financial Markets. New York: John Wiley and Sons.
[2] Trippi, R. and E. Turban, editors, 1992. Neural Networks in Finance and Investing. Chicago: Probus Publishing.
[3] Nabney, I., et al., 1996. Leading Edge Forecasting Techniques for Exchange Rate Prediction. Forecasting Financial Markets. Chichester: John Wiley and Sons.
[4] Hornik, K., M. Stinchcombe, and H. White, 1989. Multilayer Feedforward Networks are Universal Approximators. Neural Networks 2, 359-366.
[5] Marquez, L., T. Hill, R. Worthley, and W. Remus, 1992. Neural Network Models as an Alternative to Regression. Neural Networks in Finance and Investing, R. Trippi and E. Turban, editors. Chicago: Probus Publishing.
[6] Klimasauskas, Casimir C., 1994. Neural Network Techniques. Trading on the Edge, Deboeck, Guido J., editor. New York: John Wiley and Sons, pp. 3-26.
[7] Pardo, R., 1992. Design, Testing and Optimization of Trading Systems. New York: John Wiley and Sons.
[8] LeBaron, Blake, 1994. Nonlinear Diagnostics and Simple Trading Rules for High Frequency Foreign Exchange Rates. Time Series Prediction, A.S. Weigend and N.A. Gershenfeld, editors, pp. 457-474.
[9] Haykin, Simon, 1994. Neural Networks: A Comprehensive Foundation. New York: Macmillan.
[10] Rumelhart, D.E., G.E. Hinton, and R.J. Williams, 1986. Parallel Distributed Processing. Cambridge, MA: The MIT Press.

8. Acknowledgements

This work was supported by: The Maryland Industrial Partnerships Program (MIPS contract 1716.17), the National Science Foundation (NSF grant # CDF88-03012), the Institute for Systems Research at the University of Maryland, College Park, and BehavHeuristics, Inc.
