MACHINE LEARNING IN ALGORITHMIC TRADING SYSTEMS OPPORTUNITIES AND PITFALLS


WHAT I’M GOING TO TALK ABOUT

• Extremely broad topic – will keep it high level

• Why and how you might use ML

• Common pitfalls – not ‘classic’ data science

• Some example applications and algorithms that I like

• I hope I whet your appetite for ML!

A LITTLE ABOUT ME

• Director of Algorithmic Trading at Honour Plus Capital

• Wholesale fund, diversified investment approach – fixed income, equities, foreign exchange, property

• Mechanical engineer by training

• Other than algorithmic trading, I also like learning, travelling, rowing and, these days, hanging out with my young family.

CODING LANGUAGES AND SOFTWARE

• I particularly like R and C, but all of what I present can be implemented in Python, MATLAB, etc.

• For trading simulations, I like the Zorro platform: powerful, flexible, simple C-syntax, R interface, designed specifically for back-testing accuracy

MACHINE LEARNING – WHAT’S THE FUSS?

WHAT IS IT AND WHY SHOULD YOU CARE?

• Algorithms that allow computers to find insights hidden in data

• As simple as linear regression, as complex as a deep neural network with thousands of interconnected nodes

• ‘Mainstream’ by 2018 – big data, fast computers

• Rapidly evolving

• Find new sources of alpha. Maintain your market edge.

THE FASCINATION FACTOR

MACHINE LEARNING – PROS AND CONS

• Powerful, flexible, insightful

• Find complex, non-linear relationships that humans can’t observe directly

• Humans are good at seeing relationships between 2, 3, possibly 4 variables.

• Higher dimensionality requires abstract thinking that we are not really built for.

• Examples

2D RELATIONSHIPS

 X       Y
 8    7.48
 9   10.07
10   10.19
11   11.98
12   10.38
13   12.02
14   15.05
15   13.67
16   15.25
17   16.83
18   17.07
19   19.76
20   19.74
21   21.75
22   19.03
23   23.20
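A 2D relationship like this one is easy to see by eye, and just as easy to summarise with a straight line. The sketch below fits an ordinary least-squares line to the X/Y values from the table using plain Python. The function name `fit_line` is only illustrative and is not from the original talk.

```python
# Least-squares line through the slide's X/Y table (plain Python, no libraries).
xs = list(range(8, 24))
ys = [7.48, 10.07, 10.19, 11.98, 10.38, 12.02, 15.05, 13.67,
      15.25, 16.83, 17.07, 19.76, 19.74, 21.75, 19.03, 23.20]

def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(xs, ys)
print(f"y is roughly {slope:.2f} * x + {intercept:.2f}")
```

The fitted slope comes out a little under 1, which matches what the eye picks up from the table: Y tracks X with some noise.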

3D RELATIONSHIPS

     X        Y       Z
-0.895   -0.895   1.342
-0.789   -0.789   1.274
-0.684   -0.684   1.212
-0.579   -0.579   1.155
-0.474   -0.474   1.107
-0.368   -0.368   1.066
-0.263   -0.263   1.034
-0.158   -0.158   1.012
-0.053   -0.053   1.001
 0.053    0.053   1.001
 0.158    0.158   1.012
 0.263    0.263   1.034

HIGH ORDER DIMENSIONALITY

• Can you visualise the relationships between 4, 5, 6 variables?

• How about 100? 1000?

MACHINE LEARNING – PROS AND CONS

• Powerful – easy to overfit

• Requires effort to understand and set up

• Lots of moving parts = lots can go wrong, particularly when you begin processing data and executing trades in real time

TWO FLAVOURS – SUPERVISED AND UNSUPERVISED LEARNING

• What is the difference?

• When would you use one or the other?

SUPERVISED LEARNING

• Predict some value or class from the features

• For example, predict to which group something belongs based on the values of x1 and x2

• Examples: linear regression, logistic regression, artificial neural networks, support vector machines, decision trees

Image credit: Andrew Ng
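To make the "predict a class from x1 and x2" idea concrete, here is a minimal sketch of logistic regression trained by batch gradient descent in plain Python. The data points, learning rate and epoch count are hypothetical values chosen for illustration, not anything from the talk.

```python
import math

# Hypothetical toy data: (x1, x2) features and a 0/1 class label.
points = [(1.0, 1.2), (1.5, 0.8), (2.0, 1.5), (0.8, 2.0),
          (4.0, 4.2), (4.5, 3.8), (5.0, 4.5), (3.8, 5.0)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(points, labels, lr=0.1, epochs=2000):
    """Fit (bias, w1, w2) by batch gradient descent on the log-loss."""
    b = w1 = w2 = 0.0
    n = len(points)
    for _ in range(epochs):
        gb = g1 = g2 = 0.0
        for (x1, x2), y in zip(points, labels):
            err = sigmoid(b + w1 * x1 + w2 * x2) - y  # prediction error
            gb += err
            g1 += err * x1
            g2 += err * x2
        b -= lr * gb / n
        w1 -= lr * g1 / n
        w2 -= lr * g2 / n
    return b, w1, w2

def predict(weights, x1, x2):
    """Return the predicted class (0 or 1) for a single point."""
    b, w1, w2 = weights
    return 1 if sigmoid(b + w1 * x1 + w2 * x2) >= 0.5 else 0

weights = train_logistic(points, labels)
predictions = [predict(weights, x1, x2) for x1, x2 in points]
print(predictions)
```

The labels are what make this supervised: the algorithm is told the right answer for every training point and learns a boundary that reproduces it.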

UNSUPERVISED LEARNING

• Detect natural structure within the data

• For example, divide the data into its natural segments based on x1 and x2.

• Examples: k-means clustering, self-organizing maps

Image credit: Andrew Ng
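As a rough illustration of finding "natural segments" without labels, the following is a small k-means sketch in plain Python. The points, the choice of k = 2 and the fixed iteration count are assumptions made for the example; a production implementation would also check for convergence and try several random starts.

```python
import random

# Hypothetical unlabelled (x1, x2) observations forming two loose groups.
points = [(1.0, 1.2), (1.5, 0.8), (2.0, 1.5), (0.8, 2.0),
          (4.0, 4.2), (4.5, 3.8), (5.0, 4.5), (3.8, 5.0)]

def kmeans(points, k, iterations=20, seed=0):
    """Very small k-means: returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, (x1, x2) in enumerate(points):
            distances = [(x1 - c1) ** 2 + (x2 - c2) ** 2 for c1, c2 in centroids]
            assignments[i] = distances.index(min(distances))
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                centroids[j] = (sum(p[0] for p in members) / len(members),
                                sum(p[1] for p in members) / len(members))
    return centroids, assignments

centroids, assignments = kmeans(points, k=2)
print(assignments)
```

No labels are supplied here; the grouping comes only from the distances between points.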

EXAMPLE APPLICATIONS

• ‘Blind’ data mining

• ‘Intelligent’ data mining

• Model-based

• These approaches attempt to predict price movement based on some known data

• Strategy insights – train a model using the returns of a trading system as the dependent variable and any factor you think affects it as an independent variable. Essentially the above process in reverse.

• Segregating data into ‘natural’ groupings

‘BLIND’ DATA MINING

• Not recommended

• Mine a data set for combinations of variables that have predictive utility

• Very easy to get lucky

• Requires additional work to statistically account for data-mining bias. See White (2000). His approach was popularised by Aronson (2006).

• White’s approach lends itself to computerised data mining as it can be automated as part of the workflow

• Compare with manually combining technical analysis rules
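The "very easy to get lucky" point can be shown with a small simulation. This is not White's Reality Check itself, only a sketch of the bias it corrects for: every "rule" below has returns that are pure noise with zero mean, yet the best of many still looks attractive in-sample. The rule count, period count and volatility are arbitrary assumptions.

```python
import random
import statistics

def sharpe(returns):
    """Per-period Sharpe ratio (mean / standard deviation), not annualised."""
    return statistics.mean(returns) / statistics.stdev(returns)

def best_of_random_rules(n_rules, n_periods, rng):
    """Best in-sample Sharpe among rules whose returns are pure noise."""
    best = float("-inf")
    for _ in range(n_rules):
        returns = [rng.gauss(0.0, 0.01) for _ in range(n_periods)]
        best = max(best, sharpe(returns))
    return best

rng = random.Random(42)
single_rule = best_of_random_rules(1, 250, rng)
best_of_500 = best_of_random_rules(500, 250, rng)
print(f"one random rule: {single_rule:.3f}, best of 500: {best_of_500:.3f}")
```

None of these rules has any edge by construction, so whatever Sharpe the winner shows is selection luck. That is the quantity a data-mining-bias test has to account for.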

‘INTELLIGENT’ DATA MINING

• Engineer and select variables that are likely to have some relationship with the target variable

• Tap into your domain knowledge and creativity

• Feature engineering takes time and effort

• Some algorithms that can assist – RFE, Boruta

• Also susceptible to data-mining bias

‘INTELLIGENT’ DATA MINING

• The Boruta feature selection algorithm measures the relative importance of your variables by comparing them with random 'shadow' variables obtained by shuffling the values of the actual variables.

• Easy to implement, intuitive, elegant

Figure legend (Boruta variable importance plot):

• Blue: random variables

• Green: variables that ranked higher than the best random variable

• Red: variables that ranked lower than the best random variable

Source: robotwealth.com
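The shadow-variable idea behind Boruta can be sketched in a few lines. This is a simplified stand-in and not the Boruta package: the real algorithm uses random-forest importance and a statistical test over many runs, whereas this sketch uses absolute correlation as a crude importance score. The feature names and data are hypothetical.

```python
import random

def abs_correlation(xs, ys):
    """Absolute Pearson correlation, used here as a crude importance score."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return abs(sxy) / (sxx * syy) ** 0.5

def shadow_feature_screen(features, target, n_rounds=50, seed=1):
    """Count how often each feature beats the best shuffled 'shadow' copy."""
    rng = random.Random(seed)
    hits = {name: 0 for name in features}
    for _ in range(n_rounds):
        shadow_scores = []
        for values in features.values():
            shuffled = values[:]
            rng.shuffle(shuffled)  # shuffling destroys any link to the target
            shadow_scores.append(abs_correlation(shuffled, target))
        threshold = max(shadow_scores)
        for name, values in features.items():
            if abs_correlation(values, target) > threshold:
                hits[name] += 1
    return hits

# Hypothetical data: one informative feature and one pure-noise feature.
rng = random.Random(0)
signal = [rng.gauss(0, 1) for _ in range(200)]
noise = [rng.gauss(0, 1) for _ in range(200)]
target = [2 * s + rng.gauss(0, 1) for s in signal]
hits = shadow_feature_screen({"signal": signal, "noise": noise}, target)
print(hits)
```

A variable that consistently beats the best shadow corresponds to a "green" variable in the plot; one that does not is a candidate for removal.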

Sharpe ratios of models built from various combinations of Boruta-selected variables and trained with various algorithms. Source: robotwealth.com

MODEL-BASED APPROACH

• Construct a mathematical representation of some market phenomenon

• Consider the ARMA model and its limitations

• Artificial Neural Networks – “universal function approximators”
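For reference, the standard textbook form of an ARMA(p, q) model is given below. The notation is the conventional one and is not taken from the slides.

```latex
X_t = c + \varepsilon_t + \sum_{i=1}^{p} \varphi_i X_{t-i} + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}
```

Here $\varepsilon_t$ is white noise, the $\varphi_i$ are the autoregressive coefficients and the $\theta_j$ are the moving-average coefficients. The model is linear in past values and past shocks and assumes a (weakly) stationary series, which is where its limitations for market data tend to come from.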

MODEL-BASED APPROACH

• Some creative examples that leverage the power of deep neural networks and the availability of high quality satellite imagery

• Oil tanks with floating lids for insight into oil supply and demand dynamics

• Car park usage as a predictor of retail sales

SEGREGATION

• Split data into natural groupings to gain insight into how it is structured

• Example: candlestick patterns that actually have a quantitative rationale

Source: robotwealth.com

PITFALLS

• “Classic” data science is difficult to apply to the markets. The Coursera ML courses and the Kaggle data science competitions are a great place to start, but they nearly always deal with data that is fundamentally different to ours.

• Specifically, our data is not IID. It contains complex autocorrelations and is non-stationary. This makes conventional cross-validation techniques difficult to apply – more on this shortly.

• Tendency to overfit

• Data mining and the tendency to be fooled by randomness

TENDENCY TO OVERFIT

• Deal with the easiest problem first

• Reduce the number of features – in practice, avoid the temptation to use more than three or four

• Regularization is your friend

• So is cross-validation – but needs to be done right

REGULARIZATION

• A mathematical technique for reducing the complexity of a model

• For example, reduce the impact of terms in a linear regression:

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$

• Regularization might set some coefficients to zero or to very low values
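A one-variable sketch shows both behaviours: a ridge (L2) penalty shrinks the coefficient towards zero, while a lasso (L1) penalty can set it to exactly zero. The closed forms below are the standard ones for a single predictor with no intercept. The data and penalty values are hypothetical.

```python
def ridge_slope(xs, ys, lam):
    """Slope minimising sum((y - b*x)^2) + lam * b^2 (no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

def lasso_slope(xs, ys, lam):
    """Slope minimising 0.5 * sum((y - b*x)^2) + lam * |b| (no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    shrunk = max(abs(sxy) - lam, 0.0)  # soft-thresholding
    return (shrunk if sxy >= 0 else -shrunk) / sxx

# Hypothetical centred data with a true slope close to 1.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [-1.9, -1.1, 0.1, 0.9, 2.1]

for lam in (0.0, 4.0, 12.0):
    print(lam, round(ridge_slope(xs, ys, lam), 3), round(lasso_slope(xs, ys, lam), 3))
```

With no penalty both functions return the ordinary least-squares slope. As the penalty grows, the ridge slope keeps shrinking but stays non-zero, while the lasso slope reaches exactly zero once the penalty exceeds the strength of the relationship.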

REGULARIZATION

• Getting the regularization parameter right gives models that are neither under-fit nor over-fit

• Need a way of measuring and comparing your out-of-sample performance – cross validation and a final validation test set

CROSS VALIDATION

What?

• A method of measuring the predictive performance of a statistical model

• Several varieties – k-fold, leave one out, bootstrap

Why?

• High R-squared != good model … especially if model is overfit

• Test the model on a set of data not used in model training

• One in-sample plus one OOS set not usually good enough

K-FOLD CROSS VALIDATION

• The original sample is randomly partitioned into k subsamples

• Train the model on k-1 subsamples

• Test it on the kth subsample, that is, the one that was left out, and record performance

• Select a different subsample to leave out and repeat the process until all k subsamples have been used as a test set

• Repeated k-fold CV: repeat the process with a different random partitioning
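The steps above can be sketched as a small index generator in plain Python. The function name `k_fold_indices` and the sample sizes are illustrative only. Note the random shuffle at the start: it is the randomisation step that becomes a problem for time-ordered data.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)       # random partitioning
    folds = [indices[i::k] for i in range(k)]  # k roughly equal subsamples
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

splits = list(k_fold_indices(n_samples=10, k=5))
for train, test in splits:
    print("train:", sorted(train), "test:", sorted(test))
```

Repeated k-fold CV would simply call the generator again with a different `seed` and average the recorded performance across all runs.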

DON’T TREAT CV AS A PANACEA

• CV performance can be an estimate of out-of-sample performance, but in practice it will be a biased estimate, particularly if we repeat it for different variations of our model

• Consider tuning a model parameter (e.g. a regularization parameter) and selecting the model with the best CV performance…selection bias

• Unfortunately k-fold CV is not a great approach for financial data

THE PROBLEM OF NON-I.I.D. DATA

• What is Independent and Identically Distributed data?

• Independent: one event gives no information about another event (P2 is not related to P1)

• Identically distributed: the distribution is constant

• If x and y are taken from the same distribution, they are identically distributed

EXAMPLE: COIN TOSS

The tosses are IID because:

1. The occurrence of one result tells us nothing about the probability of another result. That is, the process has no memory.

2. The distribution from which the tosses are drawn never changes. The probability of each result never changes.

Note that even a biased coin toss is IID (events don’t need to be equiprobable)
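Both properties can be checked empirically on a simulated biased coin. In this sketch the 70% heads probability, the sample size and the seed are arbitrary assumptions. Lag-1 autocorrelation serves as a simple probe for memory, and comparing the heads rate across the two halves of the sample is a simple probe for a changing distribution.

```python
import random

def lag1_autocorrelation(values):
    """Correlation between each value and the one that follows it."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values)
    cov = sum((values[i] - mean) * (values[i + 1] - mean) for i in range(n - 1))
    return cov / var

rng = random.Random(7)
tosses = [1 if rng.random() < 0.7 else 0 for _ in range(20000)]  # biased coin

first_half_rate = sum(tosses[:10000]) / 10000
second_half_rate = sum(tosses[10000:]) / 10000
autocorr = lag1_autocorrelation(tosses)

print(f"heads rate: {first_half_rate:.3f} then {second_half_rate:.3f}")
print(f"lag-1 autocorrelation: {autocorr:.4f}")
```

The autocorrelation sits near zero (no memory) and the heads rate is about the same in both halves (constant distribution), even though the coin is far from fair.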

FINANCIAL DATA

• Highly non-stationary – the statistical moments vary with time and the length of their calculation windows

• Certain events may or may not affect the probability of other events

• Not a total disaster – simply means that the assumptions of many common statistical tests are violated

• Also means that the randomisation process used in regular CV is not appropriate

• Don’t use data from the future to build a model on the past!

TIME SERIES CROSS VALIDATION

• Use a rolling window

• Train the model on the window, test it on the adjacent data

• Shift the window into the future and repeat (the start of the window can be anchored or moving)

• In R, easy to implement with the caret package

• In Zorro, easy to implement with the walk-forward framework
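The rolling-window procedure can be sketched as an index generator in plain Python. This is a language-neutral illustration and not the caret or Zorro implementation. The function name `walk_forward_splits` and its parameters are hypothetical.

```python
def walk_forward_splits(n_samples, train_size, test_size, anchored=False):
    """Yield (train_indices, test_indices) moving forward through time.

    anchored=False: the training window slides forward (moving start).
    anchored=True:  the training window always starts at index 0 and grows.
    """
    start = 0
    while start + train_size + test_size <= n_samples:
        train_start = 0 if anchored else start
        train_end = start + train_size
        train = list(range(train_start, train_end))
        test = list(range(train_end, train_end + test_size))
        yield train, test
        start += test_size  # shift the window into the future

rolling = list(walk_forward_splits(10, train_size=4, test_size=2))
anchored = list(walk_forward_splits(10, train_size=4, test_size=2, anchored=True))

for train, test in rolling:
    print("train:", train, "test:", test)
```

Every test index is later than every training index in the same split, so no information from the future leaks into the model being evaluated.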

TIME SERIES CROSS VALIDATION

In practice, the lengths of the optimization and out-of-sample periods may be important factors in a model’s success. But a robust trading model will be largely insensitive to these.

TIME SERIES CV IS ALSO NOT A PANACEA

• Even if your model passes a TS CV test, there is no guarantee that it won’t just stop working if and when the underlying relationship being modelled changes.

• Particularly true for data-mining systems.

• At least with model-based systems, you have insight into when they are likely to break down.

DIAGNOSTICS - LEARNING PLOTS (GET YOUR MODEL UNDER CONTROL EFFICIENTLY)

High variance – overfit model

High bias – underfit model

Image credit: Andrew Ng
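A learning plot is just training error and validation error recorded as the training set grows. The sketch below builds the numbers behind such a plot for a simple linear model on synthetic data. The data-generating process, split sizes and noise level are all assumptions made for illustration.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mse(model, xs, ys):
    """Mean squared error of a (slope, intercept) model."""
    slope, intercept = model
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Synthetic data: a linear relationship plus unit-variance noise.
rng = random.Random(3)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 1.0) for x in xs]
train_x, train_y = xs[:150], ys[:150]
valid_x, valid_y = xs[150:], ys[150:]

curve = []  # rows of (training size, training error, validation error)
for size in (5, 10, 20, 50, 100, 150):
    model = fit_line(train_x[:size], train_y[:size])
    curve.append((size,
                  mse(model, train_x[:size], train_y[:size]),
                  mse(model, valid_x, valid_y)))

for size, train_err, valid_err in curve:
    print(f"{size:4d}  train={train_err:.3f}  valid={valid_err:.3f}")
```

When reading the curves: a large, persistent gap between training and validation error suggests high variance (overfit), while two curves that converge at a high error level suggest high bias (underfit). In this example the model matches the data-generating process, so both errors should settle near the noise variance.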

ALL THE RAGE: DEEP NEURAL NETWORKS

• What is it?

Broadly - multi-layer neural networks with potentially thousands of nodes

Capable of making high level abstractions in line with what we call ‘intuition’

Computer vision (driverless cars), handwriting recognition, AlphaGo

Made possible through advances in computational efficiency, both hardware and software

• Stacked Auto Encoders – simplify feature selection?

USEFUL RESOURCES

• Max Kuhn, his caret package and book Applied Predictive Modeling

• CRAN machine learning task view

• Winning the Kaggle Algorithmic Trading Competition – an interesting read

• David Aronson’s Statistically Sound Machine Learning

• www.robotwealth.com (of course) – lots of source code examples, R and Lite-C

• Hyndsight - the blog of Rob Hyndman, Monash University statistician - http://robjhyndman.com/