Using Java & Genetic Algorithms to Beat the Market

Matthew Ring, JavaOne 2011, BOF Session 22382

[20111007 15:11CDT MRing added results pages]

401K Problem?

I can't speak for you, but...

I know, I'll write software to time the market!

Give it the ol' college try, son!

Shoestring Toolkit

K.I.S.S. Let's trade stocks.

Cheap or free market data. Especially daily OHLCV data.

Generally Liquid.

Low transaction fees.

Many brokers offer equity trading APIs.

Java: my primary language.

Free.

Huge number and variety of open source libraries.

Some Styles of Trading

HFT

Market-making or scalping (opinions vary) in sub-second windows.

Day Trading

Buying & selling a stock in the same trading day, attempting to profit on same-day price moves.

Position Trading

Attempting to profit on trends over a window of a few days to a few years.

Swing Trading

Attempting to profit on volatility over a window of a few days to a few weeks.

How do humans get an edge or time the market?

Dumb luck.

Inside information.

Gut instinct.

Find repeating patterns in the market.

Manipulate other participants.

Better or faster information.

Faster execution.

So, for which edge should *I* try to write software?

I like the sound of Find repeating patterns in the market

Patterns? Signals?

Technical Analysis!

TA-Lib : Technical Analysis Library Open-source API for C/C++, Java, Perl, Python and 100% Managed .NET

What is Technical Analysis?

Bands

Oscillators

Moving averages

Candlesticks

Trading decisions as reactions to these signals

Voodoo?

OK, how can I automate this?

Convert market data to signals.

The software interprets the signals & decides how to trade.

Refinement towards implementation: Map a discrete set of signal responses to a discrete set of trading decisions.

Simulate trades based on these decisions.

Goal is to optimize for profitable trades, of course.

Leads us to: Optimization Algos and/or Machine Learning!

Artificial Neural Networks

Simulated Annealing

Emergent Behavior (swarm, cellular automata)

Genetic Programming

Genetic Algorithms

Support Vector Machines

Genetic Algorithms

Sources: http://www.slideshare.net/kancho/genetic-algorithm-by-example (Nobal Niraula)

Genetic Algorithms

Sources: http://www.slideshare.net/Mathijsje/genetic-algorithms-7276974 (Mathijs van Meerkerk)

Some possibly familiar examples of GA...

Watchmaker Framework: open source, Java, created by Dan Dyer.

Provides easy-to-follow examples & interface-based API.

Our Machine

[Diagram labels: Market Data; Operational Settings; Trading Decisions; ?; Backtesting Feedback; GA]

Piecing our Machine Together

Get the OHLCV market data as CSV files, load into DB. (boring, will not be shown here)

Generate signals from market data with TA-Lib.

Generate choices for trading decisions.

React to the signals

Map reactions to choices

Test the choices made.

Generate Signals
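As an illustration of this step, here is a minimal sketch of turning daily closing prices into one signal, a simple moving average. It is written in plain Java and does not call TA-Lib; the names SignalSketch and simpleMovingAverage are hypothetical, and the prices are made up.

```java
import java.util.Arrays;

class SignalSketch {
    /**
     * Simple moving average of closing prices (hypothetical helper, not the TA-Lib API).
     * Entries before the first full window are NaN.
     */
    static double[] simpleMovingAverage(double[] closes, int period) {
        if (period < 1) {
            throw new IllegalArgumentException("period must be >= 1");
        }
        double[] out = new double[closes.length];
        Arrays.fill(out, Double.NaN);
        double windowSum = 0.0;
        for (int i = 0; i < closes.length; i++) {
            windowSum += closes[i];
            if (i >= period) {
                windowSum -= closes[i - period]; // drop the value leaving the window
            }
            if (i >= period - 1) {
                out[i] = windowSum / period;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[] closes = {10.0, 11.0, 12.0, 13.0, 14.0}; // made-up prices
        System.out.println(Arrays.toString(simpleMovingAverage(closes, 3)));
    }
}
```

The NaN entries at the start are the days a backtest would have to skip before the signal is usable.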

Generate Choices for Trading Decisions

What Decisions? How about...

When to buy?

How much to buy?

How long to hold?

Predicted low price (for buying)?

Predicted high price (for selling)?

Price at which I should sell prior to end of holding period (opportunistic sell)?

Generate Choices for Trading Decisions

Recall that number of choices for each decision is purposely limited.

Choice values are generated from 'random' doubles, scaled to meet their various requirements.
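One way to picture the scaling step, as a sketch only: assume the raw 'random' doubles lie in [0, 1) and are stretched into whatever range a decision needs. The class name, method names and the example ranges below are hypothetical, not taken from the talk's code.

```java
import java.util.Arrays;
import java.util.Random;

class ChoiceSketch {
    /** Scales raw values in [0, 1) into the range [min, max). */
    static double[] scaleChoices(double[] raw, double min, double max) {
        double[] choices = new double[raw.length];
        for (int i = 0; i < raw.length; i++) {
            choices[i] = min + raw[i] * (max - min);
        }
        return choices;
    }

    /** Scales raw values in [0, 1) into whole numbers in [min, max], e.g. holding days. */
    static int[] scaleIntChoices(double[] raw, int min, int max) {
        int[] choices = new int[raw.length];
        for (int i = 0; i < raw.length; i++) {
            choices[i] = min + (int) (raw[i] * (max - min + 1));
        }
        return choices;
    }

    public static void main(String[] args) {
        double[] raw = new Random(42).doubles(4).toArray(); // stand-in for a genome slice
        // Assumed ranges: sell-price multipliers of 1.00-1.10, holding periods of 3-10 days.
        System.out.println(Arrays.toString(scaleChoices(raw, 1.00, 1.10)));
        System.out.println(Arrays.toString(scaleIntChoices(raw, 3, 10)));
    }
}
```

Keeping the number of raw values small is what keeps the number of choices per decision limited.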

React to the signals

Each signal feeds, one day at a time, into a device I named Thresholds.

Each Thresholds instance is initialized w/ a 'random', sorted array of doubles, scaled to span the range of the signal it is to consume.

This initialization step establishes both the number of bins and the bin boundaries.

public int findBin(double testVal)

React to the signals

A Thresholds selects a bin based on the signal level via a simple sigVal < binBoundary check.

The bin index is the reaction.

Sort of like an A/D converter, if you pretend that the signal is analog, sampled daily.

A subclass provides a memory of its previous reaction, so it reacts to both the current signal level and the change from its previous reaction.
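A minimal sketch of what a Thresholds-like class could look like, built only from the description above. The class name ThresholdsSketch, the constructor arguments, and the convention that n boundaries give n + 1 bins are assumptions; only the findBin signature appears in the slides. The memory-keeping subclass is not sketched here.

```java
import java.util.Arrays;

class ThresholdsSketch {
    private final double[] boundaries; // sorted bin boundaries, in signal units

    /**
     * @param raw       'random' values assumed to lie in [0, 1)
     * @param signalMin lowest value the signal is expected to take
     * @param signalMax highest value the signal is expected to take
     */
    ThresholdsSketch(double[] raw, double signalMin, double signalMax) {
        boundaries = new double[raw.length];
        for (int i = 0; i < raw.length; i++) {
            boundaries[i] = signalMin + raw[i] * (signalMax - signalMin);
        }
        Arrays.sort(boundaries);
    }

    /** Returns the index of the first boundary that testVal is below, or the last bin if none. */
    public int findBin(double testVal) {
        for (int bin = 0; bin < boundaries.length; bin++) {
            if (testVal < boundaries[bin]) {
                return bin;
            }
        }
        return boundaries.length;
    }

    /** Assumption: n boundaries split the signal range into n + 1 bins. */
    int binCount() {
        return boundaries.length + 1;
    }

    public static void main(String[] args) {
        // An oscillator-style signal spanning 0..100, with boundaries at 25 and 75.
        ThresholdsSketch t = new ThresholdsSketch(new double[]{0.75, 0.25}, 0.0, 100.0);
        System.out.println(t.findBin(10.0) + " " + t.findBin(50.0) + " " + t.findBin(90.0));
    }
}
```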

Summary, so far...

Daily stock OHLCV data become Signals.

I have chosen 24 various TA signals.

I have defined 6 trading-related decisions: Buy? How much? Holding period? Low price guess? High price guess? Opportunistic sell price?

For each decision, for each signal, a Thresholds instance will react.

Converting reactions to decisions

Each threshold bin in each decision layer is mapped to one of the decision choices.

The mappings are established 'randomly'

The choice that gets the most bin reactions is selected.

Winner take-all voting.

Note that some mappings will effectively abstain from voting

Ties are broken based on iteration order.
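The voting described above could be sketched as follows. This is an illustration under assumptions: the binToChoice table layout, the ABSTAIN marker (one possible way for a mapping to abstain), and all names are hypothetical.

```java
class VotingSketch {
    static final int ABSTAIN = -1; // assumed marker for a bin that casts no vote

    /**
     * @param reactions   bin index chosen for each signal
     * @param binToChoice binToChoice[signal][bin] is a choice index or ABSTAIN
     * @param numChoices  number of choices available for this decision
     * @return the winning choice index, or ABSTAIN if nobody voted
     */
    static int vote(int[] reactions, int[][] binToChoice, int numChoices) {
        int[] tally = new int[numChoices];
        for (int signal = 0; signal < reactions.length; signal++) {
            int choice = binToChoice[signal][reactions[signal]];
            if (choice != ABSTAIN) {
                tally[choice]++;
            }
        }
        int winner = ABSTAIN;
        int bestCount = 0;
        for (int choice = 0; choice < numChoices; choice++) {
            // Strict '>' means the first choice in iteration order wins a tie.
            if (tally[choice] > bestCount) {
                bestCount = tally[choice];
                winner = choice;
            }
        }
        return winner;
    }

    public static void main(String[] args) {
        int[] reactions = {0, 1, 2};
        int[][] binToChoice = {{1, 0, 0}, {0, 1, 0}, {0, 0, ABSTAIN}};
        System.out.println(vote(reactions, binToChoice, 2));
    }
}
```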

Converting reactions to decisions

Now to get a little fancier...

Added combinational voting to each layer.

Some 'randomly' selected bins are ANDed with others, then 'randomly' mapped to choices for the given decision layer.

More opinions in the mix.
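A rough sketch of how ANDed bins might add votes to a layer's tally. The AndRule shape (two signal/bin pairs plus a target choice) is a guess at one workable encoding, not the encoding used in the talk, and every name here is hypothetical.

```java
class ComboVoteSketch {
    /** Hypothetical rule: vote for 'choice' only when both signals land in the named bins. */
    static final class AndRule {
        final int signalA, binA, signalB, binB, choice;

        AndRule(int signalA, int binA, int signalB, int binB, int choice) {
            this.signalA = signalA;
            this.binA = binA;
            this.signalB = signalB;
            this.binB = binB;
            this.choice = choice;
        }

        boolean fires(int[] reactions) {
            return reactions[signalA] == binA && reactions[signalB] == binB;
        }
    }

    /** Returns a copy of the tally with one extra vote for every rule that fires. */
    static int[] addComboVotes(int[] tally, int[] reactions, AndRule[] rules) {
        int[] out = tally.clone();
        for (AndRule rule : rules) {
            if (rule.fires(reactions)) {
                out[rule.choice]++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] reactions = {2, 0}; // signal 0 is in bin 2, signal 1 is in bin 0
        AndRule[] rules = {new AndRule(0, 2, 1, 0, 1), new AndRule(0, 1, 1, 0, 0)};
        int[] tally = addComboVotes(new int[]{1, 1}, reactions, rules);
        System.out.println(tally[0] + " " + tally[1]);
    }
}
```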

Backtesting the decisions

Rules, laws & things to keep in mind:

Cash account trades settle in T+3 days.

Avoid free-riding:

Buy X at T

Sell X at T', where T' < T+3

Buy Y on T' with proceeds from sale of X

Cannot sell Y before T'+3

If you do, the Federal Reserve Board requires that your broker freeze your account for 90 days.

Backtesting the decisions

More rules, laws & things to keep in mind:

Avoid day trading:

Buy X on T

Sell X later on T

If you do this more than 3x in a 5-day period, you will be classified as a Pattern Day Trader.

FINRA rules require that Pattern Day Traders use margin accounts with a min. balance of $25K.

When backtesting:

Allow for slippage (be pessimistic in fill price).

Market will eat up high bids & low asks.

Backtesting the decisions

Based on the last slides, I usually set minHoldDays=3

Iterate signal data while trying to create trades based on machine advice & market actuals.

Remember to skip first max 'lookback' number of days, as many signals involve moving averages.

Strongly bias 'buy' fills toward the high, 'sell' fills toward the low.
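To make the fill bias concrete, here is a small sketch that places a simulated fill inside the day's low-high range. The linear formula and the 0.8 bias are assumptions chosen for illustration, and FillSketch and its methods are hypothetical names.

```java
class FillSketch {
    /** Pessimistic buy fill: bias = 1.0 pays the day's high, bias = 0.0 pays the low. */
    static double buyFill(double low, double high, double bias) {
        return low + bias * (high - low);
    }

    /** Pessimistic sell fill: bias = 1.0 receives the day's low, bias = 0.0 receives the high. */
    static double sellFill(double low, double high, double bias) {
        return high - bias * (high - low);
    }

    public static void main(String[] args) {
        double low = 10.0;
        double high = 20.0;
        double bias = 0.8; // assumed setting: strongly pessimistic
        System.out.println("buy at " + buyFill(low, high, bias));
        System.out.println("sell at " + sellFill(low, high, bias));
    }
}
```

With a bias above 0.5, a same-day buy and sell at these prices always loses money, which is the kind of pessimism the slides ask for.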

Backtesting the decisions

Check proposed buy & sell prices against the current day open as a sanity check.

Sell is greedy: it will try to sell at max(predicted high price, chosen opportunistic sell price).

Any position where holdDays >= maxHoldDays is dumped at the end of the day at a price biased toward the low and the close.

Evolving the Candidates

Notice that the previous slides have mentioned 'random' & 'randomly' several times.

This is where the GA comes into play.

The genome, a double[6850], serves as the settings, the 'randomness', for each candidate.

Evolving the Candidates

The trades generated during backtesting will be used to score the individual candidate.

The GA will select the fittest candidates for mating, based on their scores.

The GA will recombine (crossover) and mutate (spontaneously change values) mated genomes.
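The select, crossover and mutate cycle can be sketched in plain Java. This is not the Watchmaker API. Tournament selection, single-point crossover, the 2% mutation rate and the toy fitness function are all assumptions made for illustration; a real run would score each genome by backtesting.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.function.ToDoubleFunction;

class GaSketch {
    /** Evolves double[] genomes until the best score stops improving; returns the best seen. */
    static double[] evolve(ToDoubleFunction<double[]> fitness, int genomeLength,
                           int populationSize, int maxStagnantGenerations, Random rng) {
        double[][] population = new double[populationSize][];
        for (int i = 0; i < populationSize; i++) {
            population[i] = rng.doubles(genomeLength).toArray(); // genes in [0, 1)
        }
        double[] bestEver = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        int stagnant = 0;
        while (stagnant < maxStagnantGenerations) {
            double[] scores = new double[populationSize];
            boolean improved = false;
            for (int i = 0; i < populationSize; i++) {
                scores[i] = fitness.applyAsDouble(population[i]);
                if (scores[i] > bestScore) {
                    bestScore = scores[i];
                    bestEver = population[i].clone(); // remembered, but not re-inserted (no elitism)
                    improved = true;
                }
            }
            stagnant = improved ? 0 : stagnant + 1;
            double[][] next = new double[populationSize][];
            for (int i = 0; i < populationSize; i++) {
                double[] mom = tournament(population, scores, rng);
                double[] dad = tournament(population, scores, rng);
                next[i] = mutate(crossover(mom, dad, rng), 0.02, rng);
            }
            population = next;
        }
        return bestEver;
    }

    /** Picks two candidates at random and returns the fitter one. */
    static double[] tournament(double[][] pop, double[] scores, Random rng) {
        int a = rng.nextInt(pop.length);
        int b = rng.nextInt(pop.length);
        return scores[a] >= scores[b] ? pop[a] : pop[b];
    }

    /** Single-point crossover: genes before the cut come from mom, the rest from dad. */
    static double[] crossover(double[] mom, double[] dad, Random rng) {
        int cut = rng.nextInt(mom.length + 1);
        double[] child = new double[mom.length];
        System.arraycopy(mom, 0, child, 0, cut);
        System.arraycopy(dad, cut, child, cut, mom.length - cut);
        return child;
    }

    /** Replaces each gene with a fresh random value with probability 'rate'. */
    static double[] mutate(double[] genome, double rate, Random rng) {
        for (int i = 0; i < genome.length; i++) {
            if (rng.nextDouble() < rate) {
                genome[i] = rng.nextDouble();
            }
        }
        return genome;
    }

    public static void main(String[] args) {
        // Toy fitness: the sum of the genes, standing in for a backtest score.
        double[] best = evolve(g -> Arrays.stream(g).sum(), 10, 30, 7, new Random(1));
        System.out.println(Arrays.toString(best));
    }
}
```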

More Details

I have simplified some things for this presentation.

I pre-screen the stocks used, for swing trading appropriateness. I don't want to wait while it runs through the whole Nasdaq.

I backtest & evolve on a subset of available data.

Then I try (backtest w/o evolution) the top evolved candidates against recent out-of-sample data.

Other Comments

The GA is the easy part.

Designing the machine to work with it is a bit of an art.

Reduce the solution space by limiting choices. Ex: Pre-screen stocks, favor discrete choices.

Allow the GA plenty of wiggle room. Ex: Large genome, multiple voting schemes.

How meta to get? I hard-coded some values that could have been part of the genome.

Last Minute Notes

Termination Condition: Num of stagnant generations

Scoring: I'm using the ending account balance (cash + approx val of open positions) as the score.

The chosen scoring formula has a critical impact on the outcome. It provides the 'motivation' for the GA to 'improve'.

What happens when you try a score like numGoodTrades / numBadTrades?

Watchmaker only supports positive double score values.

The Trade PnL calculation includes transaction costs (my target broker is $2.50 a trade)
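A sketch of the scoring arithmetic described above. Charging the $2.50 fee once on the buy and once on the sell, and flooring the score at zero so it is never negative, are assumptions; the class and method names are hypothetical.

```java
class ScoreSketch {
    static final double COST_PER_TRADE = 2.50; // per-trade fee quoted in the slides

    /** PnL of one round trip, assuming one fee on the buy and one on the sell. */
    static double tradePnl(int shares, double buyPrice, double sellPrice) {
        return shares * (sellPrice - buyPrice) - 2 * COST_PER_TRADE;
    }

    /** Ending balance: cash plus the approximate value of open positions, floored at zero. */
    static double score(double cash, int[] openShares, double[] lastPrices) {
        double balance = cash;
        for (int i = 0; i < openShares.length; i++) {
            balance += openShares[i] * lastPrices[i];
        }
        return Math.max(0.0, balance);
    }

    public static void main(String[] args) {
        System.out.println(tradePnl(100, 10.00, 10.50));                      // made-up trade
        System.out.println(score(500.0, new int[]{10}, new double[]{20.0})); // made-up account
    }
}
```

On a small account the fees matter: in the made-up trade above they take 5 of the 50 dollars of gross gain.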

Relevant Links

http://watchmaker.uncommons.org/

http://www.eoddata.com/

http://ta-lib.org/

http://www.slideshare.net/kancho/genetic-algorithm-by-example

http://www.slideshare.net/Mathijsje/genetic-algorithms-7276974

Results: About the demo you didn't get to see (sorry).

Pick 4 random stocks from the list of pre-screened Nasdaq stocks.

Each model starts w/ $2000.

Evolve & test models on 20080101 thru (Today - 45 trading days). This is the in-sample data.

Stagnation Condition: quit after 7 generations w/ no improvement.

Keep the best 2 evolved individuals for each stock, for a total of 8 reserved candidates.

Test them on out-of-sample data of (Today - 44 trading days) thru Today.

Print out the trading results of the top 3 out-of-sample tests.
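The in-sample / out-of-sample split can be sketched on a plain array of daily values, oldest first. The names are hypothetical; the 45-day holdout follows from the dates above, since (Today - 44) thru Today is 45 trading days.

```java
import java.util.Arrays;

class SampleSplitSketch {
    /** Everything up to and including (today - holdoutDays), with today as the last element. */
    static double[] inSample(double[] daily, int holdoutDays) {
        return Arrays.copyOfRange(daily, 0, daily.length - holdoutDays);
    }

    /** The most recent holdoutDays values, ending with today. */
    static double[] outOfSample(double[] daily, int holdoutDays) {
        return Arrays.copyOfRange(daily, daily.length - holdoutDays, daily.length);
    }

    public static void main(String[] args) {
        double[] daily = new double[100]; // made-up series: value i stands for trading day i
        for (int i = 0; i < daily.length; i++) {
            daily[i] = i;
        }
        System.out.println(inSample(daily, 45).length + " in-sample days");
        System.out.println(outOfSample(daily, 45).length + " out-of-sample days");
    }
}
```

Evolution only ever sees the first array; the reserved candidates are then backtested once, without further evolution, on the second.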

Results: Talkin' 'bout an evolution...

[Graphs, Run 1 and Run 2: best candidate score vs. the number of generations]

Yes, the score is in $, but since this is in-sample data, please take it with a grain of salt.

Results: Out-of-sample trades

[Run 1 and Run 2: out-of-sample trade results]

Results: Discussion of

These 2 demo runs took about 5 mins each. I cut corners with some of my settings (number of stocks examined; stagnation), in order to keep the run time short.

The graphs show that evolution is a choppy process, due in part to my use of aggressive crossover & mutation strategies as well as my choice not to enable elitism (best members transcend their own generation).

Results: Conclusions

On average, I am making about 3% per simulated trade. The max gain was about 10%, the max loss was about 7%. The standard deviation was about 4.5%.

So, the generated models appear to have predictive power.

The validity of these results depends on my belief that my backtesting routine is both reasonable & bug-free.

No guarantee that the evolved models will continue to work when facing new, out-of-sample data. I'm still just betting. But, maybe, with an edge.