[IEEE 2005 International Conference on Neural Networks and Brain - Beijing, China (13-15 Oct. 2005)]...


Application of Mutation Only Genetic Algorithm for the Extraction of Investment Strategy in Financial Time Series

Xia Pan
Department of Physics
Hong Kong University of Science and Technology
Clear Water Bay, Hong Kong SAR, China

Jian Zhang
Department of Mathematics
Hong Kong University of Science and Technology
Clear Water Bay, Hong Kong SAR, China

K.Y. Szeto
Department of Physics
Hong Kong University of Science and Technology
Clear Water Bay, Hong Kong SAR, China
E-mail: phszeto@ust.hk

Abstract-We use the recently introduced method of Mutation Only Genetic Algorithm (MOGA) to search for good strategies of investment in financial time series, as measured by the yield over a fixed period of investment. The rules for buy, sell or hold are introduced as conditional statements involving inequalities of various moving averages, and encoded as string-representation chromosomes in MOGA. The extraction of good investment strategies corresponds to the discovery of rules that are fit in the sense of evolutionary computation. The investment strategy is evaluated using the rate of overall return in both the training set and the test set, thereby converting the problem of discovering good investment strategies into an optimization problem in combinatorics. Stock data from NASDAQ, including Microsoft, Intel, and Dell, are tested. We have compared the performance of investment rules involving a single stock with that of rules involving two stocks. Within the confines of limited data, we find that rules that allow buy, sell, hold and swap between two stocks are superior in all samples tested.

I. INTRODUCTION

Technical analysis has been popular among foreign exchange dealers and stock market investors for making predictions in time series. Many technical indicators have been developed during the last three decades, of which the moving averages are among the simplest. Their efficient use remains worthy of further investigation. In this paper, we follow the idea of our previous work [1] to find ways to further improve the performance of investment rules in the stock market. The conditional statements for buy and sell are established as statements involving inequalities of various moving averages, in different permutations. The forecasting problem in financial time series is therefore rephrased as an optimization problem, which has been shown in our previous work [1] to be effectively solved by genetic algorithms. Prompted by the success of recent work on the mutation only genetic algorithm (MOGA) [2,3] in solving the knapsack problem, we attempt to use MOGA to handle a generalization of the extraction of investment rules in financial time series. This is motivated by the success of traditional genetic algorithms in finding good solutions in less time than many traditional methods such as Random Walk, Buy and Hold, and Exhaustive Search in the paper by Jiang and Szeto [1]. Therefore, it is natural for us to try MOGA on a more complex situation of rule extraction in multiple time series. We address the problem of finding the best strategy, in terms of highest profit in the test set, for two time series. This problem involves a rule with a conditional part that contains the moving average orderings of both time series, and an action part that allows us to buy or sell one of the two stocks, swap the two stocks, or hold. Our intuition is that this more complex form of rule will lead to higher profit than rules involving only one stock. This intuition is verified by numerical tests on real data involving Microsoft, Intel and Dell.

0-7803-9422-4/05/$20.00 (C)2005 IEEE

After we verify the accuracy of the new genetic algorithm (Mutation Only Genetic Algorithm with Mixing, or MOGAM) against the exact method (Exhaustive Search), we search for good rules in our two stocks investment problem with MOGAM. This is necessary in general since Exhaustive Search is too slow to be of practical use.

II. ORDERING OF MOVING AVERAGES

A. Moving Averages

Moving averages are among the oldest and most efficient indicators based on updated price information. Different ways of calculating moving averages have been developed to suit specific problems. However, in our problem, as shown in the paper of Rui Jiang and K.Y. Szeto [1], the choice of which kind of moving average to use is not very important. Therefore, we focus on the simple moving average, SMA for short.


The simple moving average on the m-th day, considering the last n days, is defined as the arithmetic average of the close prices of the past n days:

$$\mathrm{SMA}(m, n) = \frac{1}{n} \sum_{i=0}^{n-1} c_{m-i}$$

where $c_m$ is the close price on the m-th day. We can choose a set of SMAs for a single day, such as M1, M50, and M250.
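As an illustration, the SMA definition can be sketched in a few lines of Python. The function name `sma` and the 0-based day index are choices made for this example, not notation from the paper.

```python
def sma(closes, m, n):
    """Simple moving average on day m (0-based) over the last n close prices.

    Averages closes[m - n + 1 .. m], so it needs at least n prices up to day m.
    """
    if n <= 0 or m < n - 1 or m >= len(closes):
        raise ValueError("need n > 0 and n - 1 <= m < len(closes)")
    window = closes[m - n + 1 : m + 1]
    return sum(window) / n


closes = [10.0, 11.0, 12.0, 13.0, 14.0]
print(sma(closes, 4, 1))  # 14.0 (M1 is just the latest close)
print(sma(closes, 4, 3))  # 13.0
```

With real data, M1, M50 and M250 for a given day would be `sma(closes, m, 1)`, `sma(closes, m, 50)` and `sma(closes, m, 250)`, which is why the first 250 days of a series cannot produce a full set of SMAs.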

B. Rule of Investment using Moving Averages

We define our investment rule by specifying the conditional part (IF) and the action part (THEN). The action part is a combination of a "BUY" part, a "SELL" part and a "HOLD" part. By the action "BUY" we mean to change cash into one kind of stock, "SELL" means to change the stock into cash, and "HOLD" means to keep what we have at hand. The conditional part involves the various orderings of the moving averages. If we denote the simple moving average (SMA) over the past n days as Mn, a simple rule for investing in one stock (A) can read as follows:

If (M5 > M50 > M250 > M100 > M25 > M1), then buy A;

If (M250 > M50 > M5 > M100 > M25 > M1), then sell A;

Else, hold.

Such a strategy can be described in detail as follows. On a day when the six SMAs satisfy M5 > M50 > M250 > M100 > M25 > M1, we buy on the next day if we have money; otherwise we already hold A and we keep it. So we call M5 > M50 > M250 > M100 > M25 > M1 "the buy sequence". By the same token, if "the sell sequence" is satisfied, we sell the stock if we have stock A at hand; otherwise we are carrying cash and we just keep it. Finally, if neither sequence is satisfied, we do nothing.

When this is extended to the case where we can invest in two stocks, the strategies are generalized as, for example:

If (MA50 > MA1 > MA250; MB250 > MB50 > MB1), then buy A;

If (MA250 > MA1 > MA50; MB250 > MB50 > MB1), then sell A;

If (MA50 > MA1 > MA250; MB1 > MB50 > MB250), then buy B;

If (MA50 > MA1 > MA250; MB250 > MB50 > MB1), then sell B;

If (MA250 > MA1 > MA50; MB50 > MB250 > MB1), then convert A to B;

If (MA1 > MA50 > MA250; MB250 > MB50 > MB1), then convert B to A;

Else, hold.

Here MAn stands for the SMA over the past n days for stock A, and MBn the SMA over the past n days for stock B. In addition, we can also have the "converting sequence", which allows us to swap two stocks. Here we use a combination of three SMAs for stock A (the first three moving averages in the brackets) and three for stock B. This is reasonable because all the information is included in the fluctuations of the prices of both stocks A and B, which may provide subtle insight for increasing profit. In this paper, we only compare three SMAs for each stock, since in this way the process of solving the problem is simplified but still representative.

Throughout the paper, this definition of strategies for the single stock investment problem and the two stocks investment problem is employed to demonstrate the efficiency of the "Mutation Only Genetic Algorithm" compared with exhaustive search. In fact, when we increase the number of moving averages, exhaustive search is not practical. Our numerical experiments using MOGA extract good rules efficiently.

C. Rule Extraction as an Optimization Problem

The best strategies are the ones that maximize the total profit over the period under investigation. Therefore, we can treat the stock investing problem as an optimization problem. We encode the sequence of the SMAs in descending order as a sequence of integers. For example, the situation M250 > M1 > M50 is encoded as (2, 0, 1). The smaller the integer is, the shorter the period the corresponding moving average covers. The objective is to find the conditions under which we should buy, sell or hold to maximize the total profit. Consequently, an overall return RN is introduced to evaluate the sequences that stand for various strategies.
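The integer encoding of an SMA ordering can be sketched as follows. The function name `encode_ordering` and the dictionary input are assumptions made for this example; ties between equal SMA values are broken in favour of the shorter window, which the paper does not specify.

```python
def encode_ordering(sma_values):
    """Encode the descending order of one day's SMA values as integers.

    sma_values maps window length -> SMA value. The shortest window gets
    index 0, the next shortest index 1, and so on.
    """
    windows = sorted(sma_values)                    # e.g. [1, 50, 250]
    index = {w: k for k, w in enumerate(windows)}   # 1 -> 0, 50 -> 1, 250 -> 2
    # Rank windows from largest SMA value to smallest (stable on ties).
    ranked = sorted(windows, key=lambda w: sma_values[w], reverse=True)
    return tuple(index[w] for w in ranked)


# M250 > M1 > M50, the example from the text
print(encode_ordering({1: 20.0, 50: 19.0, 250: 21.0}))  # (2, 0, 1)
```

Comparing a day's encoded ordering with a rule's buy or sell sequence is then a plain tuple comparison.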

If we denote by Wi, Si and ci the volume of cash, the quantity of stock, and the close price on the i-th day respectively, and we cover a period of N days, then the overall return is

$$R_N = \frac{W_N}{W_1} \qquad (1)$$

Note that these factors are related. For example,

$$S_i = \frac{W_{i-1}}{c_i}, \quad W_i = 0 \qquad (2)$$

refers to the case where a BUY action is performed when the buy sequence is satisfied and we have cash. Similarly, a SELL action is performed when the sell sequence is satisfied and we have stock:

$$W_i = S_{i-1} \times c_i, \quad S_i = 0 \qquad (3)$$

Our initial state is

$$W_1 = M, \quad S_1 = 0 \qquad (4)$$

which means that initially we have some money but no stocks.
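The bookkeeping for a one-stock rule can be sketched as below. This is a minimal sketch under stated assumptions: the function name `overall_return` and its arguments are hypothetical, a signal seen on one day is acted on at the next day's close, and any stock still held on the last day is valued at the last close price (the text does not say how a final stock position is counted).

```python
def overall_return(closes, orderings, buy_seq, sell_seq, initial_cash=1.0):
    """Apply a buy/sell/hold rule to one stock and return the overall return.

    closes[i] is the close price on day i; orderings[i] is the encoded SMA
    ordering observed on day i.
    """
    cash, stock = initial_cash, 0.0              # start with money, no stock
    for i in range(1, len(closes)):
        signal = orderings[i - 1]                # yesterday's ordering
        if signal == buy_seq and cash > 0:       # BUY: all cash into stock
            stock, cash = cash / closes[i], 0.0
        elif signal == sell_seq and stock > 0:   # SELL: all stock into cash
            cash, stock = stock * closes[i], 0.0
        # otherwise HOLD: nothing changes
    # Assumption: a final stock position is valued at the last close price.
    final_wealth = cash + stock * closes[-1]
    return final_wealth / initial_cash


closes = [8.0, 8.0, 16.0, 16.0, 8.0]
orderings = [(0, 1, 2), (2, 1, 0), (2, 1, 0), (1, 0, 2), (1, 0, 2)]
print(overall_return(closes, orderings, buy_seq=(0, 1, 2), sell_seq=(2, 1, 0)))  # 2.0
```

In the toy run, the rule buys at 8 on the second day and sells at 16 on the third, so the initial cash doubles; later sell signals are ignored because no stock is held.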

During the optimizing process, the SMAs of a set of days are calculated first and then compared, day by day, to the assumed buy sequence, sell sequence and convert sequence (if necessary). The assumed sequences can be generated in many ways, and the exhaustive method, which enumerates all possible combinations of buy and sell (and convert) sequences, extracts the best strategy. However, the exhaustive method is impractical when the number of combinations is large. We only use it to verify our MOGA algorithm.

III. MUTATION ONLY GENETIC ALGORITHM

With similar encoding and evaluation schemes but different GA operators, we employ a new adaptive genetic algorithm called Mutation Only Genetic Algorithm (MOGA) [2,3]. This new method replaces selection, crossover and ordinary mutation in the traditional genetic algorithm with reordering and locus oriented mutation [4], which considerably increases the speed to solution.

A. Encoding Scheme

A genetic algorithm operates on a pool of randomly generated solution candidates, which form the population. Each solution candidate is called a chromosome and appears as a string of characters. The encoding scheme sets a rule for establishing such strings. In our problem, we encode as follows:

(M250 > M50 > M5 > M100 > M25 > M1) → (5 3 1 4 2 0)

for the one stock problem and

(MA50 > MA1 > MA250; MB250 > MB50 > MB1) → (1 0 2; 2 1 0)

for the two stocks problem.

Under this encoding scheme, we have a chromosome of two fragments like (5 3 1 4 2 0; 5 4 2 3 0 1) for the one stock problem, with the first fragment (5 3 1 4 2 0) as the "BUY" part and the second fragment (5 4 2 3 0 1) as the "SELL" part.

Similarly for the two stocks problem, we can encode a chromosome as:

{(0 1 2; 1 2 0); (2 1 0; 2 1 0); (1 2 0; 1 0 2); (2 1 0; 2 1 0); (1 2 0; 0 1 2); (2 1 0; 0 1 2)}

where the six parts are, in order, BUY A, SELL A, BUY B, SELL B, A→B and B→A.

This chromosome has twelve fragments. In the mutation process, each integer can only be exchanged with other integers in the same fragment.
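The constraint that integers are exchanged only within a fragment can be sketched as a swap mutation. The flat-list representation and the name `mutate_within_fragment` are assumptions made for this example; the point is that a swap inside a fragment always leaves that fragment a valid permutation.

```python
import random


def mutate_within_fragment(chromosome, fragment_len, position, rng=random):
    """Swap the gene at `position` with another gene of the same fragment.

    chromosome is a flat list of integers made of equal-length fragments,
    each fragment being a permutation of 0 .. fragment_len - 1.
    """
    start = (position // fragment_len) * fragment_len
    candidates = [k for k in range(start, start + fragment_len) if k != position]
    other = rng.choice(candidates)
    mutated = list(chromosome)                   # leave the input untouched
    mutated[position], mutated[other] = mutated[other], mutated[position]
    return mutated


# One-stock chromosome from the text: BUY fragment followed by SELL fragment.
chromosome = [5, 3, 1, 4, 2, 0, 5, 4, 2, 3, 0, 1]
child = mutate_within_fragment(chromosome, fragment_len=6, position=8)
print(child[:6])          # [5, 3, 1, 4, 2, 0] (BUY fragment is unchanged)
print(sorted(child[6:]))  # [0, 1, 2, 3, 4, 5] (SELL fragment is still a permutation)
```

For the two stocks chromosome, the same function would be called with `fragment_len=3` on a list of 36 integers.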

B. Fitness Evaluation

The overall return RN mentioned before is used here to represent the fitness of every chromosome. As an evolutionary method, GA eliminates the poor chromosomes with small fitness and lets the fit ones survive. We initialize the overall return to 1.0. We sort the moving averages of a given day in descending order to get a corresponding sequence, which is compared with the conditional part of a given rule. If the sequence equals either the buy sequence or the sell sequence of the chromosome, the overall return may be modified using the formulae in Section II.C. Otherwise we leave the overall return unchanged.

C. Genetic Operator

One of the most important advantages of the new GA method (MOGA) is that it uses only mutation. By making use of the locus statistics, the worse genes have higher probabilities to mutate. This simple observation greatly reduces the computation time. In MOGA, we first reorder the rows of the whole population in descending order of fitness. We then calculate the mutation probabilities for every locus in the population. We generate a set of row mutation probabilities,

$$a_i = \frac{i-1}{N-1}$$

for the i-th row, where N is the size of the population. After the mutation probability of a row has been computed, we adopt a column mutation probability to decide which loci in the i-th row are to be changed. Before the column mutation probability is calculated, we define the locus mutation probability of changing to an X at locus j as $p_{jX}$ by

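$$p_{jX} = \frac{\sum_{i=1}^{N} (N-i) \times \delta_{ij}(X)}{\sum_{i=1}^{N} (N-i)}$$

where $\delta_{ij}(X)$ is 1 if the i-th chromosome has an X at locus j, and zero otherwise. Find the maximum $p_{jX}$, denoted $p_{jX}^{\max}$, and calculate $p_j$ by the following formula:

$$p_j = 1 - 2 \times \left| p_{jX}^{\max} - \frac{1}{2} \right|$$

If two or more $p_{jX}$ attain the maximum, one can lift the degeneracy by some specific scheme. Now the mutation probability for every locus can be written as $a_i \times p_j$.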
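The row and column mutation probabilities can be sketched as follows. The formulas in the source are partly illegible, so this sketch assumes a rank weight of (N - i) for the i-th row, normalized so that the locus statistics at each locus sum to one, and a column probability of 1 - 2|p_max - 1/2|; the function names are hypothetical. It requires a population of at least two rows.

```python
def row_mutation_probs(n_rows):
    """a_i = (i - 1) / (N - 1) for rows i = 1..N, best row first."""
    return [(i - 1) / (n_rows - 1) for i in range(1, n_rows + 1)]


def column_mutation_probs(population, alphabet):
    """p_j for every locus j, from rank-weighted locus statistics.

    population is a list of chromosomes sorted by descending fitness.
    """
    n = len(population)
    weights = [n - i for i in range(1, n + 1)]   # assumed weight (N - i)
    total = sum(weights)
    probs = []
    for j in range(len(population[0])):
        p_jx = {x: 0.0 for x in alphabet}
        for w, chrom in zip(weights, population):
            p_jx[chrom[j]] += w / total          # locus statistics at locus j
        p_max = max(p_jx.values())
        probs.append(1 - 2 * abs(p_max - 0.5))
    return probs


population = [[0, 1, 2], [0, 2, 1], [1, 0, 2], [2, 1, 0]]  # best row first
print(row_mutation_probs(4))
print([round(p, 3) for p in column_mutation_probs(population, [0, 1, 2])])
# [0.333, 1.0, 0.667]
```

Under these assumptions the best row is never selected for row mutation (its probability is 0) and the worst row always is; at locus 0, where the two best rows agree, the column probability is low.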

The above definitions give us the basic ingredients of MOGA. In implementing MOGA, we can either perform mutation by row (MOGAR) or mutation by column (MOGAC), or do both in alternating fashion, MOGA with Mixing (MOGAM). The details can be found in the original papers [2,3,4].

At the beginning of the mixed mutation, for a given row i, a random number x is generated and if x < $a_i$, then mutation is performed on this row; otherwise we proceed to the next row. To mutate a row, we sort the $p_j$ values in descending order within every fragment of the chromosome, and exchange the two loci with the highest column mutation probabilities. The above process is called the row mutation, after which we reorder the rows of the population again, as we did just before mutation. Then the column mutation is started. For a given column j, a random number y is generated and if y < $p_j$, then mutation is performed on this column; otherwise we proceed to the next column. In fact, we mutate the last $p_j \times N$ members of column j, for the rows are already in descending order of fitness and we mutate only the least fit rows. When mutating, we exchange the member on row i and column j with another member randomly chosen in the same fragment on the same row. When all this is done, one MOGAM generation is completed. We can run a number of generations to work out the best set of strategies.

The most significant advantage of the new genetic algorithm is that it guarantees that the poorer the genes are, the more likely they are to be mutated. Thus, selection pressure is automated: good chromosomes are kept and bad ones are mutated quickly to generate new ones, which provides an efficient search in the solution space.
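One mixed generation, as described above, might look like the following sketch. It is not the authors' implementation: the function names, the rank weight (N - i) in the locus statistics, the rounding of p_j * N to an integer, and the recomputation of p_j after the rows are re-sorted are all assumptions made here. The fitness function is supplied by the caller, and the population needs at least two rows.

```python
import random


def column_probs(pop, alphabet_size):
    """p_j = 1 - 2|max_X p_jX - 1/2|, rank weights (N - i), best row first."""
    n = len(pop)
    total = n * (n - 1) / 2                      # sum of (N - i) for i = 1..N
    probs = []
    for j in range(len(pop[0])):
        p = [0.0] * alphabet_size
        for i, chrom in enumerate(pop, start=1):
            p[chrom[j]] += (n - i) / total
        probs.append(1 - 2 * abs(max(p) - 0.5))
    return probs


def mogam_generation(pop, fitness, frag_len, rng=random):
    """Row mutation, re-sort, column mutation, re-sort; returns a new list."""
    n, length = len(pop), len(pop[0])
    pop = sorted((list(c) for c in pop), key=fitness, reverse=True)
    p = column_probs(pop, frag_len)
    # Row mutation: row i (1-based) is mutated with probability a_i.
    for i in range(1, n + 1):
        if rng.random() < (i - 1) / (n - 1):
            row = pop[i - 1]
            for start in range(0, length, frag_len):
                # Exchange the two loci with the highest p_j in this fragment.
                hi, lo = sorted(range(start, start + frag_len),
                                key=lambda j: p[j], reverse=True)[:2]
                row[hi], row[lo] = row[lo], row[hi]
    pop.sort(key=fitness, reverse=True)
    p = column_probs(pop, frag_len)              # assumption: recomputed here
    # Column mutation: column j is mutated with probability p_j.
    for j in range(length):
        if rng.random() < p[j]:
            start = (j // frag_len) * frag_len
            k = int(round(p[j] * n))             # the last p_j * N rows
            for row in pop[n - k:]:
                others = [c for c in range(start, start + frag_len) if c != j]
                other = rng.choice(others)
                row[j], row[other] = row[other], row[j]
    pop.sort(key=fitness, reverse=True)
    return pop
```

A caller would wrap this in a loop over generations, passing a fitness function such as the overall return of the rule a chromosome encodes. Because every change is a swap inside a fragment, each fragment stays a valid permutation throughout.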

IV. PRACTICE WITH REAL DATA

We use the MOGAM method on a set of data for securities from www.nasdaq.com, such as Microsoft, Intel and Dell. The data cover ten years, from 1995 to 2005. The first 2000 points are used for training and the next 500 points for testing. When using MOGAM for the two stocks problem, we use points in the training set to generate an optimal population of size 200 after 500 generations, which is used as a database of optimal solution candidates.

To begin with, we compare the results of MOGAM and the exhaustive method using the training data to evaluate the quality of MOGAM. We use the training set for one stock to compare strategies extracted by both the exhaustive method and MOGAM. See Table 1 and Table 2.

Next, we use the training set for two stocks to see how the overall profit can be improved by investing in two kinds of stocks with MOGAM. Here we do not use the exhaustive method as it is too slow: it would need (3!)^12 = 2176782336 recursions, and in each recursion 2000-250 = 1750 points in the training set and 500 in the testing set need to be compared with the solution candidates. From Table 3, we see that investing in more than one stock with the strategies extracted by MOGAM improves the profit greatly: the profit is not less than the sum of the maximum profits of buying stock A or B alone.
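The size of the search space follows directly from the encoding: each fragment is a permutation of the SMAs it orders, so the number of rules is the number of permutations raised to the number of fragments. A short check, with a hypothetical helper name:

```python
from math import factorial


def search_space(num_smas, num_fragments):
    """Number of distinct rules when every fragment is a permutation."""
    return factorial(num_smas) ** num_fragments


print(search_space(6, 2))    # 518400 (one stock: six SMAs, BUY and SELL)
print(search_space(3, 12))   # 2176782336 (two stocks: three SMAs, twelve fragments)
```

By comparison, the MOGAM runs reported here evaluate on the order of population times generations candidates, for example 200 * 500 = 100000 in the two stocks setting.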

Third, we use the optimal population established by MOGAM in the training set to see whether the rules extracted using MOGAM are really good rules in the test set. The average values are the arithmetic averages of the ten highest profits over the 500 testing days using the strategies extracted from the past 2000 days. The results are shown in Table 4.

Although the overall return is not as high as in the training set, the profit, measured by the overall return over the 500 days in the test set, is still substantially bigger when we invest in two stocks, compared to the simple strategy extracted by MOGAM for investing in a single stock.

V. CONCLUSIONS

Comparing the Mutation Only Genetic Algorithm with Mixing to the exhaustive search method shows that the accuracy of the results obtained by MOGAM is quite satisfactory. Moreover, when exhaustive search is prohibitively slow, MOGAM easily extracts complex rules, which demonstrate that investment in two stocks produces much higher overall return than investing in one stock.

Table 1. Strategies Extracted by Exhaustive Method and Their Fitness as the Overall Return

Exhaustive Method

Name        Overall Return   Strategy
Dell        30.3497          012345302145
Intel       7.80897          102345123045
Microsoft   6.74308          210345034125

Exhaustive search needs (6!)^2 = 518400 recursions.

Table 2. Strategies Extracted by MOGAM and Their Fitness as the Overall Return

MOGAM

Name        Overall Return   Strategy
Dell        30.3497          012345302145
Intel       7.79994          012345103245
Microsoft   6.74308          210345034125

In our MOGAM, we use the following parameters: Population = 100; Generation = 1000.

MOGAM tested a pool of 100 * 1000 strategies before an optimal set of strategies was extracted. Note that the number of generations can be reduced at the cost of precision.

Table 3. Investing in Two Stocks with the Optimal Strategies Extracted by MOGAM

We can see from this table that the profit gained with two stocks investment can be greater than the sum of the maximum gains obtained by investing in single stock A and B.

Table 4. Testing for the 500 Days with a Pool of Strategies Extracted by MOGAM

Population = 200; Generation = 500

Stocks                   Micro         Intel        Dell
Average Overall Return   1.029136      1.338895     1.227455

Combined Stocks          Micro-Intel   Dell-Intel   Dell-Micro
Average Overall Return   1.335223      1.322191     1.3274

It can be seen that MOGAM is a good tool for solving complex optimization problems, here illustrated by the combinatorial problem of investment rule extraction in financial time series. Unlike traditional methods such as Random Walk, Buy and Hold, and Exhaustive Search, our MOGAM searches the huge solution space efficiently, as all evolutionary methods do, but faster. In future work, we will generalize our method to tackle more complex problems in investment strategies.

ACKNOWLEDGMENT

K.Y. Szeto acknowledges the support by CERG Grants HKUST 6157-01P and HKUST 6071-02P. * Pan Xia is on leave from the Department of Physics, Shanghai Jiaotong University, Shanghai, China.

REFERENCES

[1] Rui Jiang and K.Y. Szeto, Extraction of Investment Strategies based on Moving Averages, 2003 IEEE International Conference on Computational Intelligence for Financial Engineering, CIFEr 2003, Hong Kong, 2003, pp. 403-410.

[2] K.Y. Szeto and Zhang Jian, Adaptive Genetic Algorithm and Quasi-Parallel Genetic Algorithm: Application to Knapsack Problem (accepted for publication in LNCS for the 5th International Conference on "Large-Scale Scientific Computations", date of acceptance: April 13, 2005).

[3] Zhang Jian and K.Y. Szeto, Mutation Matrix in Evolutionary Computation: An Application to Resource Allocation Problem (accepted for publication in LNCS for the First International Conference on Natural Computation and the Second International Conference on Fuzzy Systems and Knowledge Discovery, 27-29 August 2005, Changsha, China, date of acceptance: April 23, 2005).

[4] C.W. Ma and K.Y. Szeto, Locus Oriented Adaptive Genetic Algorithm: Application to the Zero/One Knapsack Problem, Proceedings of the 5th International Conference on Recent Advances in Soft Computing, RASC 2004, Nottingham, UK, 2004, pp. 410-415.
