Sparse Optimization Methods and Statistical Modeling with Applications to Finance

Michael Ho
Department of Mathematics, University of California, Irvine
March 25, 2016



  • Outline

    1 Introduction and Contributions
      Mean-Variance Portfolio Selection
      Research Contribution

    2 Pairwise Weighted Elastic Net

    3 Covariance estimation from High Frequency Data

    4 Conclusion


  • Introduction and Contributions

    Section 1

    Introduction and Contributions


  • Introduction and Contributions Mean-Variance Portfolio Selection

    Outline

    1 Introduction and Contributions
      Mean-Variance Portfolio Selection
      Research Contribution

    2 Pairwise Weighted Elastic Net

    3 Covariance estimation from High Frequency Data

    4 Conclusion


  • Introduction and Contributions Mean-Variance Portfolio Selection

    Modern Portfolio Theory

    Modern portfolio theory (MPT) considers the following question: suppose an investor needs to invest in a portfolio of assets. How should the investor choose the portfolio?

    To answer this question, MPT makes the following assumptions: investors make decisions based only on expected return and risk, and given two portfolios with the same expected return, an investor will choose the lower-risk portfolio.


  • Introduction and Contributions Mean-Variance Portfolio Selection

    Mean-variance criteria can be formulated as a quadratic program

    Suppose there are N risky (random-return) assets, and denote the single-period return of the nth asset as r_n. Then a mean-variance optimal portfolio w can be written as the solution to the following quadratic program (QP)

    min_w   w^T Γ w
    s.t.    w^T E[r] ≥ η ≥ 0
            w^T 1 = const                         (MV)

    where Γ is the covariance matrix of r. Here we assume E[r] ≠ 0 and Γ is positive definite. The above problem is convex, and there are many techniques for solving (MV).
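    Since (MV) is a convex QP, off-the-shelf solvers handle it directly. Below is a minimal sketch using SciPy's SLSQP; the three-asset Γ, E[r], and η values are made up for illustration and do not come from the dissertation.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative 3-asset inputs (not from the dissertation):
Gamma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])
Er = np.array([0.05, 0.08, 0.12])   # expected returns E[r]
eta = 0.07                          # required expected return

res = minimize(
    lambda w: w @ Gamma @ w,        # objective: portfolio variance w^T Gamma w
    x0=np.full(3, 1 / 3),
    constraints=[
        {"type": "ineq", "fun": lambda w: w @ Er - eta},   # w^T E[r] >= eta
        {"type": "eq",   "fun": lambda w: w.sum() - 1.0},  # w^T 1 = const (taken as 1)
    ],
    method="SLSQP",
)
w = res.x
```

    The budget constant is taken as 1 here; any nonzero constant works up to rescaling.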


  • Introduction and Contributions Mean-Variance Portfolio Selection

    Sharpe ratio optimal portfolio

    If r_F is the return of a risk-free asset, the excess return of the risky assets is defined as r − r_F. The Sharpe ratio (SR) optimal portfolio of risky assets can be computed via

    max_w   w^T μ / sqrt(w^T Γ w)
    s.t.    w ≠ 0

    where μ is the mean of r − r_F. Since the SR is invariant to positive scaling, this can be reformulated (up to a constant scaling) as

    min_w   w^T Γ w − w^T μ

    The SR optimal portfolio coincides with the risky component of the mean-variance optimal portfolio.
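    The unconstrained reformulation makes the SR-optimal direction explicit: the minimizer of w^T Γ w − w^T μ is ½ Γ⁻¹μ, so any positive multiple of Γ⁻¹μ is SR-optimal. A small sketch with toy numbers (not from the dissertation):

```python
import numpy as np

def sharpe_optimal(Gamma, mu):
    # Minimizer of w^T Gamma w - w^T mu is (1/2) Gamma^{-1} mu; since the
    # Sharpe ratio is scale-invariant, any positive multiple is SR-optimal.
    return np.linalg.solve(Gamma, mu)

def sharpe_ratio(w, Gamma, mu):
    return (w @ mu) / np.sqrt(w @ Gamma @ w)

# Illustrative two-asset values:
Gamma = np.array([[0.04, 0.01], [0.01, 0.09]])
mu = np.array([0.05, 0.08])
w = sharpe_optimal(Gamma, mu)

# Scaling w does not change the Sharpe ratio:
assert np.isclose(sharpe_ratio(w, Gamma, mu), sharpe_ratio(3.7 * w, Gamma, mu))
```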


  • Introduction and Contributions Mean-Variance Portfolio Selection

    The mean-variance criterion is subject to parameter uncertainty

    Implementation of the mean-variance criterion is impeded by a lack of information:

    The mean and covariance are unknown

    An intuitive workaround is to estimate the mean and covariance using sample averages from past return data and plug them into the original MV problem:

    min_w   w^T Γ̂ w − w^T μ̂

    Applied to the stock market, out-of-sample portfolio performance using this technique is poor:

    Noisy data
    Non-stationary statistics
    Ill-conditioned covariance matrix (high sensitivity to errors)


  • Introduction and Contributions Research Contribution

    Outline

    1 Introduction and Contributions
      Mean-Variance Portfolio Selection
      Research Contribution

    2 Pairwise Weighted Elastic Net

    3 Covariance estimation from High Frequency Data

    4 Conclusion


  • Introduction and Contributions Research Contribution

    Overview

    Research investigates two aspects of mean-variance portfolios

    Robustness of mean-variance criterion to modeling errors

    Portfolio design is sensitive to modeling and parameter assumptions. Performance can be severely degraded when incorrect assumptions are made.

    Parameter estimation

    Parameters such as the mean and variance are needed for many portfolio selection criteria. They are often unknown but can be estimated from historical data. Accurate estimation is essential to achieving robust performance.


  • Introduction and Contributions Research Contribution

    Contributions of Dissertation

    1. Weighted elastic net penalized criterion

    A penalization approach that improves portfolio performance under parameter uncertainty. Material presented during the candidacy examination (Nov 2014). The method improves on other techniques proposed in the literature. SIAM J. Financial Math. (with J. Xin, Z. Sun), Vol. 6, 2015.

    2. Robust covariance estimation from high frequency data

    Addresses market microstructure noise, asynchronous trading, and jumps. A sparse modeling approach (ℓ1, spike and slab) adds robustness to jumps. The method outperforms simpler techniques proposed in the literature.


  • Pairwise Weighted Elastic Net

    Section 2

    Pairwise Weighted Elastic Net


  • Pairwise Weighted Elastic Net

    Pairwise Weighted Elastic Net

    To address parameter uncertainty the following is proposed

    Pairwise weighted elastic net (PWEN) penalized criterion

    min_w   w^T Γ̂ w − w^T μ̂ + |w|^T ∆ |w| + ||w||_{β,ℓ1}

    ∆ is a positive semidefinite matrix with non-negative entries, and β is non-negative:

    ||w||_{β,ℓ1} = Σ_i β_i |w_i|

    This reduces to the weighted elastic net penalty when ∆ is diagonal.
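    One standard way to handle the non-smooth PWEN objective is the split w = p − n with p, n ≥ 0, so that p + n plays the role of |w| and the problem becomes smooth and bound-constrained (at the optimum p_i n_i = 0). The sketch below uses that idea with illustrative numbers; it is not the solver used in the dissertation.

```python
import numpy as np
from scipy.optimize import minimize

def pwen_portfolio(Gamma, mu, Delta, beta):
    # Split w = p - n with p, n >= 0; at the optimum a = p + n equals |w|,
    # so the PWEN objective becomes smooth in (p, n).
    N = len(mu)
    def obj(z):
        p, n = z[:N], z[N:]
        w, a = p - n, p + n
        return w @ Gamma @ w - w @ mu + a @ Delta @ a + beta @ a
    res = minimize(obj, np.full(2 * N, 0.01),
                   bounds=[(0, None)] * (2 * N), method="L-BFGS-B")
    return res.x[:N] - res.x[N:]

# Toy two-asset problem (made-up values):
Gamma = np.array([[0.04, 0.01], [0.01, 0.09]])
mu = np.array([0.05, 0.08])
Delta = 0.001 * np.ones((2, 2))   # PSD with non-negative entries
beta = np.array([0.001, 0.001])
w = pwen_portfolio(Gamma, mu, Delta, beta)

def pwen_objective(w):
    # The original non-smooth PWEN criterion, for checking the split solution.
    a = np.abs(w)
    return w @ Gamma @ w - w @ mu + a @ Delta @ a + beta @ a
```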


  • Pairwise Weighted Elastic Net

    PWEN promotes robustness

    Theorem
    The PWEN criterion is equivalent to a robust optimization problem

    min_w  max_{R∈A, v∈B}   w^T R w − v^T w

    A and B are parameter uncertainty sets for the covariance and mean:

    A = { R : R_{i,j} = Γ̂_{i,j} + e_{i,j} ; |e_{i,j}| ≤ ∆_{i,j} ; R ⪰ 0 }
    B = { v : v_i = μ̂_i + c_i ; |c_i| ≤ β_i }

    ∆ is assumed to be diagonally dominant.

    The PWEN criterion optimizes worst-case performance.
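    The equivalence can be checked numerically: for fixed w, the inner maximum over the boxes is attained at e_{i,j} = ∆_{i,j} sign(w_i w_j) and c_i = −β_i sign(w_i), which reproduces the PWEN penalty terms exactly. A sketch with made-up values, chosen small enough that the R ⪰ 0 constraint stays inactive:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4
Gamma_hat = 0.05 * np.eye(N)          # illustrative estimates
Delta = np.full((N, N), 0.01)
beta = np.full(N, 0.02)
mu_hat = rng.normal(0.05, 0.02, N)
w = rng.normal(size=N)                # an arbitrary portfolio

# Worst case in closed form: e_ij = Delta_ij sign(w_i w_j), c_i = -beta_i sign(w_i)
e_star = Delta * np.sign(np.outer(w, w))
c_star = -beta * np.sign(w)
worst = w @ (Gamma_hat + e_star) @ w - (mu_hat + c_star) @ w

# PWEN criterion value at the same w:
a = np.abs(w)
pwen = w @ Gamma_hat @ w - mu_hat @ w + a @ Delta @ a + beta @ a

assert np.isclose(worst, pwen)   # worst-case objective equals the PWEN criterion
```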


  • Pairwise Weighted Elastic Net

    Calibration of PWEN

    Calibration of PWEN can be done by selecting an appropriate uncertainty set for the parameter estimates. Bootstrapping is one way to quantify uncertainty.

    Robust optimization interpretation used in calibration


  • Pairwise Weighted Elastic Net

    Performance Plot

    The performance benefit of PWEN and WEN is demonstrated on U.S. stock return data: 630 stocks, from January 1, 2001 to July 1, 2014, mid to large cap.


  • Covariance estimation from High Frequency Data

    Section 3

    Covariance estimation from High Frequency Data


  • Covariance estimation from High Frequency Data

    Large-Dimensional Covariance Estimation

    Covariance estimation of asset returns is an important step in portfolio optimization. More training data can improve covariance matrix estimation ... however, the time-varying nature of asset return statistics places limits on the time interval over which training data is relevant.

    Figure: Time varying volatility limits amount of relevant data


  • Covariance estimation from High Frequency Data

    Exploiting High Frequency Data

    High-frequency data provides more data in a shorter time interval, so covariance estimates can be obtained using more recent data. However, estimation of covariance from high-frequency data is complicated by

    Asynchronous returns
    Market microstructure noise
    Jumps


  • Covariance estimation from High Frequency Data

    Asynchronous trading

    Standard sample-average estimation of the covariation of returns requires that the returns of all assets be sampled on a common grid. In high frequency data, assets trade asynchronously. Resampling the data to a common grid can be performed, but it does not use all the data and may cause the covariance estimate to be non-positive-definite.


  • Covariance estimation from High Frequency Data

    Market Microstructure Noise

    Market frictions such as the bid-ask spread are a source of noise; the true efficient price is not observed. Over short time periods, price variation due to the bid/ask spread can mask the "true" efficient return:

    lim_{∆→0}  Σ_{n=0}^{T/∆}  ( P_noise(∆(n+1)) − P_noise(∆n) )²  =  ∞
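    This divergence is easy to see in simulation: for a pure-noise price series the sum of squared increments has expectation 2nσ², so it grows without bound as the sampling interval shrinks. A sketch with an illustrative noise level:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 0.001                        # microstructure noise std (illustrative)

def realized_variance(n_samples):
    # Pure-noise "price": i.i.d. noise around a constant efficient price.
    p = rng.normal(0.0, sigma, n_samples + 1)
    return np.sum(np.diff(p) ** 2)   # expectation 2 * n * sigma^2

coarse = realized_variance(100)      # coarse sampling grid
fine = realized_variance(100_000)    # fine sampling grid: much larger value
```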


  • Covariance estimation from High Frequency Data

    Jumps in price can corrupt estimate of covariance

    Jumps in market returns not explained by a diffusion can occur. These jumps can severely bias the covariance estimate of the diffusion component of the returns. Disentangling price movements due to the jump and diffusion components is necessary to estimate the covariance.


  • Covariance estimation from High Frequency Data

    Data Model for Hidden Price Process

    Let X_n be a vector containing all log-prices at time n, and model the discrete-time log-price as

    X_n = X_{n−1} + V_n + J_n ,   V_n ~ N(D, Γ) ,   J_n jump        (1)

    J_n and V_n are i.i.d. sequences and independent of each other. X is unobserved. D and Γ are unknown but assumed to have a known prior distribution.
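    Model (1) is straightforward to simulate. The sketch below uses made-up dimensions, drift, covariance, and jump parameters, with a Bernoulli-normal jump as one simple choice consistent with the sparse-jump priors discussed later:

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 3, 500
D = np.full(N, 1e-5)                          # drift (illustrative)
L = rng.normal(scale=1e-3, size=(N, N))
Gamma = L @ L.T + 1e-6 * np.eye(N)            # diffusion covariance (PSD)
zeta, sigma_j = 0.01, 0.02                    # jump probability and scale

X = np.zeros((T, N))
for t in range(1, T):
    V = rng.multivariate_normal(D, Gamma)                       # V_n ~ N(D, Gamma)
    J = rng.binomial(1, zeta, N) * rng.normal(0, sigma_j, N)    # sparse jumps J_n
    X[t] = X[t - 1] + V + J                                     # X_n = X_{n-1} + V_n + J_n
```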


  • Covariance estimation from High Frequency Data

    Observations are noisy and missing

    Observations are noisy (market microstructure noise) and missing:

    Y_n = Ĩ_n X_n + W_n ,   W_n ~ N(0, Q)        (2)

    where Ĩ_n selects the subset of prices observed at time n and W_n is microstructure noise. V is independent of J, W, and X. Q is unknown and diagonal but assumed to have a known prior distribution. Observations are assumed MAR (missing at random) and independent of prices.


  • Covariance estimation from High Frequency Data

    Missing Data Example

    Single Asset

    Missing data can be inferred from nearby observations.

    Multiple Assets

    Low-rank structure in the covariance can allow for improved inference of missing values: missing data can be inferred from observations of other assets at the same and different times.


  • Covariance estimation from High Frequency Data

    Data Completion through Kalman smoothing

    Kalman smoothing can be used to infer missing data and remove noise. Conditioned on the parameters θ, Kalman smoothing is a recursive method for computing the posterior distribution p(x|y, θ). It applies only to normally distributed data (it computes the mean and variance).

    Rudolf Kalman, 2008

    Smoothing of noisy time series with missing data
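    A minimal scalar illustration of the idea: a random-walk state, noisy observations with a missing stretch (marked NaN), a forward Kalman filter, and a backward Rauch-Tung-Striebel pass. This is a textbook sketch, not the multivariate smoother used in the dissertation:

```python
import numpy as np

def kalman_smooth(y, q, r):
    """Scalar random-walk Kalman (RTS) smoother; NaN entries of y are
    treated as missing observations. q: state noise var, r: obs noise var."""
    T = len(y)
    m = np.zeros(T); P = np.zeros(T)        # filtered mean / variance
    mp = np.zeros(T); Pp = np.zeros(T)      # one-step predictions
    m_prev, P_prev = 0.0, 1e6               # vague prior
    for t in range(T):
        mp[t], Pp[t] = m_prev, P_prev + q   # predict
        if np.isnan(y[t]):                  # missing: skip the measurement update
            m[t], P[t] = mp[t], Pp[t]
        else:
            K = Pp[t] / (Pp[t] + r)         # Kalman gain
            m[t] = mp[t] + K * (y[t] - mp[t])
            P[t] = (1 - K) * Pp[t]
        m_prev, P_prev = m[t], P[t]
    ms, Ps = m.copy(), P.copy()             # backward (RTS) recursion
    for t in range(T - 2, -1, -1):
        G = P[t] / Pp[t + 1]
        ms[t] = m[t] + G * (ms[t + 1] - mp[t + 1])
        Ps[t] = P[t] + G**2 * (Ps[t + 1] - Pp[t + 1])
    return ms, Ps

rng = np.random.default_rng(7)
x = np.cumsum(rng.normal(0.0, 0.01, 200))   # latent efficient log-price
y = x + rng.normal(0.0, 0.05, 200)          # noisy observations
y[50:80] = np.nan                           # a stretch of missing data
ms, Ps = kalman_smooth(y, q=0.01**2, r=0.05**2)
```

    The smoothed mean `ms` fills the gap by borrowing information from both sides, which is exactly the behavior described on this slide.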


  • Covariance estimation from High Frequency Data

    Jumps

    The Kalman filter tends to over-smooth jumps.

    Jumps can contaminate the estimate of the covariance (further degrading Kalman smoothing performance):

    Γ̂ ≈ Γ + E[J J^T]        (jump bias)


  • Covariance estimation from High Frequency Data

    Sparse Jump Models

    For discrete-time modeling we consider two types of prior distributions for the jumps:

    Spike and slab
    Laplace distribution

    Both priors induce sparsity in the posterior mode of the jumps. Both models are also popular for variable selection in regression and machine learning.


  • Covariance estimation from High Frequency Data

    Spike and Slab Jump Model

    For this model the prior of J_i(t) is a mixture of a point mass at 0 (the "spike") and a normal distribution (the "slab"):

    p(j_i(t)) = ζ 1_{j_i(t)=0} + (1 − ζ) N(j_i(t); 0, σ²_{j,i}(t))


  • Covariance estimation from High Frequency Data

    Laplace Distribution

    The spike and slab distribution of J is non-continuous and multi-modal, which complicates estimation of J. As an approximation we consider the Laplace distribution

    p(j_n(t)) ∝ exp(−λ_n(t) |j_n(t)|)        (3)

    This induces a weighted ℓ1 norm in the conditional log-posterior. λ_n(t) is treated as unknown with a known (gamma) distribution. Iterative estimation of λ_n(t) induces a reweighting of the ℓ1 penalty.


  • Covariance estimation from High Frequency Data

    Laplace prior promotes sparse posterior mode

    Consider the following experiment

    Suppose κ is Laplace distributed and q is N(0,1), and we observe η = κ + q. Suppose we observe η = 0.5. The maximum likelihood estimate of κ is 0.5; the posterior mode is 0!

    Figure: likelihood, Laplace prior, and posterior versus κ. The Laplace prior promotes a sparse posterior mode.
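    This toy example is exactly the scalar soft-thresholding (lasso) problem: the posterior mode argmin_κ ½(η − κ)² + λ|κ| is sign(η)·max(|η| − λ, 0). A sketch with a unit Laplace rate (λ = 1, chosen here purely for illustration):

```python
import numpy as np

def posterior_mode(eta, lam):
    # argmin_k 0.5 * (eta - k)**2 + lam * |k|   (Gaussian likelihood, Laplace prior)
    # is the soft-thresholding operator:
    return np.sign(eta) * max(abs(eta) - lam, 0.0)

# With lam = 1 and observation eta = 0.5, the MLE is 0.5
# but the posterior mode is exactly 0:
assert posterior_mode(0.5, 1.0) == 0.0
```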


  • Covariance estimation from High Frequency Data

    Maximum a posteriori (MAP) estimation of covariance

    The MAP estimate of the covariance Γ is the mode of the posterior:

    [Γ̂, θ̂] = argmax_{Γ′,θ′}  log p(θ′, Γ′ | y)

    where θ are the nuisance parameters (jumps, noise variance, etc.). The posterior is difficult to optimize directly due to missing data, so iterative approaches are normally employed.


  • Covariance estimation from High Frequency Data

    ECM approach to MAP estimation

    The expectation conditional maximization (ECM) algorithm (Meng, Rubin 1993) alternates between two steps.

    E-step: compute the surrogate function

    G^(k)([Γ, θ]) = E_{X|Y, Γ^(k), θ^(k)}  log p(Γ, θ | y, x)

    M-step: set [Γ̂^(k+1), θ̂^(k+1)] to the conditional maximizers of G^(k)([Γ, θ]).

    The E-step is performed using the Kalman smoother (jumps are compensated for using the estimate from the prior iteration). The log-posterior increases monotonically, and the algorithm converges to a local mode under mild regularity conditions, which hold for this problem.


  • Covariance estimation from High Frequency Data

    KECM-Laplace recovery, Low Rank Covariance


    The KECM approach can recover missing prices when the covariance is low rank.


  • Covariance estimation from High Frequency Data

    KECM-Laplace recovery, High Rank Covariance

    Figure: recovered price versus time (posterior mean, observations, truth).

    Price recovery is more difficult when the covariance is high rank.


  • Covariance estimation from High Frequency Data

    KECM-Laplace recovery with Jump


    KECM-Laplace


  • Covariance estimation from High Frequency Data

    Bayesian Approach using MCMC

    Problems with the ECM approach

    Reports a single mode
    Nuisance parameters estimated
    Uncertainty not reflected in the mode

    Bayesian approach

    Posterior distribution determined
    Nuisance parameters integrated out

    Moderate jumps: single-mode posterior
    Small jumps: multimodal posterior

  • Covariance estimation from High Frequency Data

    Gibbs sampling approximation to posterior

    Computing the posterior of the covariance directly involves integration over a high-dimensional parameter space. Markov chain Monte Carlo (MCMC) approaches such as Gibbs sampling can be used to approximate the posterior efficiently:

    Sequentially draw each parameter from its conditional posterior distribution
    The sequence converges in distribution to the posterior (under some conditions)

    For this model Gibbs sampling is convenient since each conditional distribution is easy to draw from.
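    A minimal illustration of the mechanics on a toy target, a zero-mean bivariate normal with correlation ρ, where each conditional is itself normal and hence easy to draw from. The model's actual conditionals are different, but the alternating structure is the same:

```python
import numpy as np

rng = np.random.default_rng(3)
rho = 0.8                      # target: zero-mean bivariate normal, corr rho
M = 20_000
x = np.zeros((M, 2))
for m in range(1, M):
    # Each conditional of a bivariate normal is N(rho * other, 1 - rho^2):
    x[m, 0] = rng.normal(rho * x[m - 1, 1], np.sqrt(1 - rho**2))
    x[m, 1] = rng.normal(rho * x[m, 0],     np.sqrt(1 - rho**2))
samples = x[2000:]             # discard burn-in
print(np.corrcoef(samples.T)[0, 1])   # close to rho
```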


  • Covariance estimation from High Frequency Data

    MCMC Example


    MCMC captures uncertainty in parameters


  • Covariance estimation from High Frequency Data

    MCMC Movie


    MCMC escapes from local mode


  • Covariance estimation from High Frequency Data

    Results of Covariance Estimation

    Characterize performance using the normalized Frobenius norm of the error:

    sqrt( Σ_{i,j} |Γ_{i,j} − Γ̂_{i,j}|² )  /  sqrt( Σ_{i,j} |Γ_{i,j}|² )

    Relative covariance estimation error for various jump sizes and frequencies.
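    The error metric is one line of NumPy:

```python
import numpy as np

def relative_frobenius_error(Gamma, Gamma_hat):
    # Normalized Frobenius norm ||Gamma - Gamma_hat||_F / ||Gamma||_F
    return np.linalg.norm(Gamma - Gamma_hat, "fro") / np.linalg.norm(Gamma, "fro")

Gamma = np.eye(3)
assert relative_frobenius_error(Gamma, Gamma) == 0.0
assert np.isclose(relative_frobenius_error(Gamma, 2 * Gamma), 1.0)
```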

  • Covariance estimation from High Frequency Data

    Performance under GARCH(1,1)-jump model

    X_i(t) = X_i(t−1) + sqrt(h_i(t)) V_i(t) + J_i(t) Z_i(t) + D

    h_i(t+1) = b_i h_i(t) + a_i (X_i(t) − X_i(t−1) − D)² + c_i

    Relative covariance estimation error for various jump sizes and frequencies.
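    A scalar sketch of this GARCH(1,1)-jump recursion with illustrative coefficients. Note that, as in the recursion above, jump returns feed back into the volatility update through the squared increment:

```python
import numpy as np

rng = np.random.default_rng(4)
T = 1000
a, b, c = 0.05, 0.90, 1e-6       # GARCH coefficients (illustrative, a + b < 1)
D = 0.0                          # drift
zeta, sigma_j = 0.01, 0.02       # jump frequency and size

X = np.zeros(T)
h = np.full(T, c / (1 - a - b))  # start at the stationary variance
for t in range(1, T):
    jump = rng.binomial(1, zeta) * rng.normal(0, sigma_j)    # J(t) Z(t)
    X[t] = X[t - 1] + np.sqrt(h[t - 1]) * rng.normal() + jump + D
    increment_sq = (X[t] - X[t - 1] - D) ** 2   # jumps leak into h here
    h[t] = b * h[t - 1] + a * increment_sq + c
```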


  • Covariance estimation from High Frequency Data

    Performance with stochastic noise variance

    Here we extend the GARCH(1,1) model to a stochastic microstructure noise variance:

    σ²_{o,i}(t) = a₂ (X_i(t) − X_i(t−1) − D)² + b₂

    Relative covariance estimation error for various jump sizes and frequencies.


  • Conclusion

    Section 4

    Conclusion


  • Conclusion

    Conclusion

    This dissertation has considered the application of sparse optimization and modeling to finance in two ways.

    Portfolio robustness enhancements

    The weighted and pairwise weighted elastic net penalized portfolios were shown to improve the robustness of portfolios using U.S. stock return data.

    Covariance estimation from high frequency data

    The Kalman EM approach was extended to models that include price jumps. The new approach shows enhanced performance under jump models for a variety of simulated data models (jumps, GARCH, dependent observation noise).


  • Conclusion

    Future work

    Pairwise weighted elastic net

    Further investigate calibration of the pairwise weighted elastic net. Relaxing the diagonally dominant restriction on the weighting matrix ∆ may improve performance.

    Covariance estimation from high frequency data

    Further investigate low rank + sparse matrix factorization techniques to enhance covariance estimation: reweighted nuclear norm and reweighted ℓ1 penalties.


  • Backup Charts

    Section 5

    Backup Charts


  • Backup Charts

    Solution via nuclear norm minimization

    Missing data can also be recovered using matrix completion, noting that the returns are low rank.

    Definition
    R_{i,t}: unobserved low-rank component return of asset i at time t
    J_{i,t}: unobserved sparse jump component return of asset i at time t
    X_i: unobserved efficient price of asset i at time 0
    Y_{i_k,t_k}: observed (noisy) price of asset i_k at time t_k
    S: discrete-time integration (in time) operator (rectangular method)

    Nuclear Norm Formulation

    min_{X,J,R}   ||R||_* + λ₁ Σ_k ( X_{i_k} + ((R + J)S)_{i_k,t_k} − Y_{i_k,t_k} )² + λ₂ ||J||_{ℓ1}
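    Proximal algorithms for this kind of formulation rely on the prox of the nuclear norm, which is singular-value soft-thresholding. A sketch of that building block (a full solver would also need the ℓ1 prox and the data-fit gradient):

```python
import numpy as np

def svt(M, tau):
    """Singular-value thresholding: the prox of tau * (nuclear norm)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt   # shrink singular values

rng = np.random.default_rng(5)
A = rng.normal(size=(6, 4))
B = svt(A, 1.0)
# Thresholding can only shrink singular values:
assert np.linalg.svd(B, compute_uv=False).max() <= np.linalg.svd(A, compute_uv=False).max()
```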


  • Backup Charts

    Example Reconstruction 80 percent observed, No Noise

    Figure: reconstructed log-price (full series and zoom-in), reconstructed jump, and singular values of log returns (jumps removed), comparing Truth, Nuclear Norm Minimization, and KECM-Laplace. 80 percent observed, no noise.


  • Backup Charts

    Example Reconstruction 30 percent observed, No Noise

    Figure: reconstructed log-price (full series and zoom-in), reconstructed jump, and singular values of log returns (jumps removed), comparing Truth, Nuclear Norm Minimization, and KECM-Laplace. 30 percent observed, no noise.


  • Backup Charts

    Example Reconstruction 80 percent observed, Noise

    Figure: reconstructed log-price (full series and zoom-in), reconstructed jump, and singular values of log returns (jumps removed), comparing Truth, Nuclear Norm Minimization, and KECM-Laplace. 80 percent observed, with noise.


  • Backup Charts

    Example Reconstruction 30 percent observed, Noise

    Figure: reconstructed log-price (full series and zoom-in), reconstructed jump, and singular values of log returns (jumps removed), comparing Truth, Nuclear Norm Minimization, and KECM-Laplace. 30 percent observed, with noise.


  • Backup Charts

    ECM algorithm for Laplace jump model

    Initialize estimates of Γ, σ², and J
    while not converged:
        E-step: compute the posterior distribution of X given Y, Γ, σ², J, D with the Kalman smoother
        M-step: update Γ, D, and σ², assuming J is fixed
        Compute the MAP estimate of J given Γ and σ² using ADMM, FISTA, etc.
        Update λ_i(t) (effectively reweighting the ℓ1 penalty)

    The algorithm for the spike and slab model is similar.


  • Backup Charts

    Gibbs sampling approach for spike and slab

    Initialize the parameters Θ^(0) = [Y_miss, X, Γ, D, J, σ², ζ, σ²_j]

    for m = 0 ... M
        for k = 1 ... 8
            Sample Θ_k^(m+k/8) from p(Θ_k | Θ_{−k}^(m+(k−1)/8))

    Discard the first P samples ("burn-in")
    Use the remaining covariance samples to estimate the posterior mean of the covariance


  • Backup Charts

    Example: Bootstrapping the uncertainty set when statistics are unknown

    Here we illustrate one way to calibrate the uncertainty set for μ:

    Suppose we have training-data returns r(1), ..., r(T)
    Randomly take T samples from {r(1), ..., r(T)} (with replacement); call these ζ(1), ..., ζ(T)
    Use the empirical distribution of μ̂(ζ(1), ..., ζ(T)) − μ̂(r(1), ..., r(T)) as a proxy for the estimation error
    This can be done via Monte Carlo by resampling many times
    β can be selected as a percentile of the empirical distribution
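    The steps above can be sketched directly; the return matrix below is simulated i.i.d. normal purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(6)
r = rng.normal(0.0005, 0.01, size=(250, 5))   # made-up daily returns, T=250, N=5
mu_hat = r.mean(axis=0)

B = 2000
errors = np.empty((B, r.shape[1]))
for b in range(B):
    resample = r[rng.integers(0, len(r), len(r))]     # T draws with replacement
    errors[b] = resample.mean(axis=0) - mu_hat        # proxy for estimation error

beta = np.percentile(np.abs(errors), 90, axis=0)      # per-asset uncertainty radius
```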


  • Backup Charts

    Sample Average Plug-in Performance is Disappointing

    Consider the following experiment

    Return data were collected from 20 U.S. stocks between 7-2001 and 7-2013. The Sharpe ratio optimal portfolio was computed based on 55 days of training data, and portfolio performance was evaluated using the next 30 trading days.

    The performance of the plug-in mean-variance portfolio is disappointing.


  • Backup Charts

    Bootstrap versus Normal-χ2 Approximation Calibration

    Calibration using bootstrap Calibration using Normal-χ2 approximation

