[IEEE 2012 International Conference on Machine Learning and Cybernetics (ICMLC) - Xian, Shaanxi, China (2012.07.15-2012.07.17)] 2012 International Conference on Machine Learning and

Proceedings of the 2012 International Conference on Machine Learning and Cybernetics, Xian, 15-17 July, 2012

OPTION MONEYNESS CLASSIFICATION USING SUPPORT VECTOR MACHINE

CHIH-HUNG WU1*, YI-LIN TZENG1, CHIH-CHAING LU2, GWO-HSHIUNG TZENG3

1 Digital Content and Technology, National Taichung University of Education, Tai-Chung, Taiwan R.O.C.

2 International Business, National Taipei College of Business, Taipei, Taiwan R.O.C.

3 Graduate Institute of Project Management, Kainan University, Tao Yuan, Taiwan R.O.C.

E-MAIL: [email protected]@[email protected]@gmail.com

Abstract: Determining the theoretical price of an option, or option pricing, is regarded as one of the most important issues in financial research. In recent years, linear and non-linear GARCH (Generalized AutoRegressive Conditional Heteroskedasticity) models have been used to estimate volatility. However, empirical analyses of the different volatility models have not produced consistent results. This study constructs a price classification model for Taiwan's existing tech index options, using various α values to determine the moneyness (at-the-money, in-the-money, out-of-the-money) of the option price. This study tested 140 models, combining 4 kernel functions in the multi-class SVM (Linear, Polynomial, RBF, Sigmoid), 7 types of volatility estimation (historical volatility, implied volatility, GARCH, IGARCH, GJR-GARCH, EGARCH, TBGARCH), and 5 values of α (2%, 4%, 5%, 6%, 8%). The classification results show that using α = 2% and a polynomial-kernel multi-class SVM with the TBGARCH, EGARCH, or GJR-GARCH volatility estimation methods yields better classification performance.

Keywords: Option Moneyness; Volatility; Support Vector Machine; Kernel Function

1. Introduction

Taiwan's options market opened on December 24, 2001, offering Taiwan's first option derivative: the TAIEX Index Options (TXO). This financial derivative allows investors to control their investment risks in the securities market. Determining the theoretical price of an option, or option pricing, is regarded as one of the most important issues in financial research.

Based on financial theory, six major factors influence option premiums: 1. a change in the price of the underlying security; 2. the strike price; 3. the time until expiration; 4. the volatility of the underlying security; 5. dividends; and 6. the risk-free interest rate. Of these factors, volatility is the most subjective and perhaps the most difficult to quantify, yet it can have a significant impact on the time-value portion of an option's premium. Volatility can be interpreted as a measure of the risk (uncertainty), or the variability of the price, of an option's underlying security.

978-1-4673-1487-9/12/$31.00 ©2012 IEEE

To find the best model for estimating volatility in support of option pricing, several models have been proposed. Among the various GARCH models used to estimate volatility, several research gaps remain: 1. No model has achieved widespread acceptance for estimating volatility. 2. Differences in sample periods and characteristics have prevented researchers from reaching consensus on the estimation ability of each model. 3. For option valuation under the GARCH framework, changing volatility does not usually yield a general valuation equation. Although previous studies use different GARCH perspectives to value options, empirical research on GARCH option valuation is generally limited to European options.

Scholars have recently begun to use artificial intelligence techniques, such as the support vector machine (SVM), for evaluation and prediction in finance. In recent years, many researchers have turned to nonparametric methods such as artificial neural networks (ANN) and support vector regression (SVR) for option pricing [1, 2]. Nonparametric techniques such as ANNs and SVMs are the latest and most promising approaches, with respect to unbiasedness and pricing accuracy, relative to the parametric option pricing model (OPM). A well-developed ANN can be trained to price options in regions where the Black-Scholes formula (BSF) is most biased [3].

However, most studies forecast the option price directly, without first classifying the option as at-the-money, in-the-money, or out-of-the-money. Few studies have focused on developing an effective classification model that assigns an option price to one of these three moneyness states.



This study constructs a price classification model for Taiwan's existing tech index options, using various α values to determine the moneyness (at-the-money, in-the-money, out-of-the-money) of the option price. Seven types of volatility (historical volatility, implied volatility (BS model), GARCH, EGARCH, IGARCH, GJR-GARCH, and TB-GARCH) were used separately as input variables, and the classification tool was the multi-class SVM with four different kernel functions (Linear, Polynomial, RBF, and Sigmoid). Finally, the results of the models were compared.

2. Research Methodology

2.1. ADF unit root test and volatility estimation

First, the ADF unit root test is conducted on the prices of the TEO underlying. If the price series satisfies the stationarity assumption, the historical volatility, implied volatility (BS model), GARCH, IGARCH, GJR-GARCH, EGARCH, and TBGARCH models are used to calculate volatility. If it does not, the above steps are carried out after first-differencing the original price series.
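The check-then-difference procedure described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it computes the Dickey-Fuller t-statistic only for the simplest case (intercept, no lagged differences) in pure Python, and the names `first_difference` and `df_t_statistic` as well as the simulated data are hypothetical. In practice the statistic is compared with Dickey-Fuller critical values rather than the usual t table, and lagged differences are added as in the ADF regressions.

```python
import math
import random


def first_difference(series):
    """Return the first-differenced series: y[t] - y[t-1]."""
    return [b - a for a, b in zip(series, series[1:])]


def df_t_statistic(series):
    """t-statistic of gamma in  dY_t = a0 + gamma * Y_{t-1} + e_t.

    Dickey-Fuller regression with an intercept and no lagged differences
    (an assumed simplification of the ADF forms).
    """
    lagged = series[:-1]
    diffs = first_difference(series)
    n = len(diffs)
    mean_x = sum(lagged) / n
    mean_d = sum(diffs) / n
    sxx = sum((x - mean_x) ** 2 for x in lagged)
    sxd = sum((x - mean_x) * (d - mean_d) for x, d in zip(lagged, diffs))
    gamma = sxd / sxx
    a0 = mean_d - gamma * mean_x
    # Residual variance with two estimated parameters (a0 and gamma).
    rss = sum((d - a0 - gamma * x) ** 2 for x, d in zip(lagged, diffs))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return gamma / std_err


# Illustrative data only: a simulated stationary AR(1) series.
random.seed(0)
stationary = [0.0]
for _ in range(300):
    stationary.append(0.2 * stationary[-1] + random.gauss(0.0, 1.0))

stat = df_t_statistic(stationary)
print(f"DF t-statistic: {stat:.2f}")
```

A strongly negative statistic points toward rejecting the unit root; if the test does not reject, `first_difference` would be applied to the price series before the volatility models are estimated.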

2.1.1 Unit Root Test

If a time series is non-stationary, meaning it has a unit root, it must be differenced before it becomes stationary and can be estimated and analyzed statistically. The three forms of the ADF test are as follows:

$$\Delta Y_t = \alpha_0 + \gamma Y_{t-1} + \alpha_2 t + \sum_{i=2}^{p} \beta_i \Delta Y_{t-i+1} + \varepsilon_t \tag{1}$$

$$\Delta Y_t = \alpha_0 + \gamma Y_{t-1} + \sum_{i=2}^{p} \beta_i \Delta Y_{t-i+1} + \varepsilon_t \tag{2}$$

$$\Delta Y_t = \gamma Y_{t-1} + \sum_{i=2}^{p} \beta_i \Delta Y_{t-i+1} + \varepsilon_t \tag{3}$$

Here $p$ is the optimal lag length at which the residuals show no serial correlation, selected by the minimum value of the Akaike information criterion (AIC) or the Schwarz criterion (SC).

2.1.2 Volatility estimation GARCH model

a. GARCH model

$$Y_t = x_t\alpha + \varepsilon_t;\quad Y_t \mid \Omega_{t-1} \sim N(x_t\alpha, h_t) \tag{4}$$

$$\varepsilon_t = Y_t - x_t\alpha;\quad \varepsilon_t \mid \Omega_{t-1} \sim N(0, h_t) \tag{5}$$

$$h_t = \alpha_0 + \alpha_1\varepsilon_{t-1}^2 + \cdots + \alpha_q\varepsilon_{t-q}^2 + \beta_1 h_{t-1} + \cdots + \beta_p h_{t-p} \tag{6}$$

Here $Y_t$ is the dependent variable, $x_t$ is the independent variable, the expected value of $Y_t$ is $x_t\alpha$, $\Omega_{t-1}$ is the set of information available in period $t-1$, and $N(x_t\alpha, h_t)$ expresses that the random variable is drawn from a normal distribution with expected value $x_t\alpha$ and variance $h_t$. $h_t$ is the conditional variance function, a linear combination of the squared disturbance terms of the past $q$ periods and, as equation (6) shows, the conditional variances of the past $p$ periods.

b. IGARCH model

The Integrated Generalized Autoregressive Conditional Heteroskedasticity (IGARCH) model is an estimation equation, proposed by [4], that simplifies earlier models; the format is as follows:

$$\varepsilon_t = Y_t - x_t\alpha;\quad \varepsilon_t \mid \Omega_{t-1} \sim N(0, h_t) \tag{7}$$

$$h_t = \alpha_0 + \alpha_1\varepsilon_{t-1}^2 + \beta_1 h_{t-1} \tag{8}$$

where $\alpha_0 > 0$, $\alpha_1 \ge 0$, $\beta_1 \ge 0$, and $\alpha_1 + \beta_1 = 1$.

c. TBGARCH model

Financial asset random variables tend to have leptokurtic distributions, with observations clustering in the tails, so Bollerslev [5] uses a t-distribution model:

$$\varepsilon_t = Y_t - x_t\alpha;\quad \varepsilon_t \mid \Omega_{t-1} \sim T_v\left(0, (v-2)h_t/v\right) \tag{9}$$

$$h_t = \alpha_0 + \alpha_1\varepsilon_{t-1}^2 + \beta_1 h_{t-1} \tag{10}$$

where $\alpha_0 > 0$, $\alpha_1 > 0$, $\beta_1 > 0$, $\alpha_1 + \beta_1 < 1$, and $v$ is the degrees of freedom.

2.1.3 Asymmetric GARCH model

a. GJR GARCH model

The GJR GARCH model proposed by [6] is used; the theoretical specification is as follows:

$$\varepsilon_t = Y_t - x_t\alpha;\quad \varepsilon_t \mid \Omega_{t-1} \sim N(0, h_t) \tag{11}$$

$$h_t = \alpha_0 + \alpha_1\varepsilon_{t-1}^2 + \gamma\varepsilon_{t-1}^2 D_{t-1} + \beta_1 h_{t-1} \tag{12}$$

Here $\gamma > 0$ means that there is a leverage effect. If the previous period brought bad news, that is, $\varepsilon_{t-1} < 0$, the dummy variable $D_{t-1}$ equals 1, with the result:

$$h_t = \alpha_0 + (\alpha_1 + \gamma)\varepsilon_{t-1}^2 + \beta_1 h_{t-1} \tag{13}$$

If the previous period brought good news, that is, $\varepsilon_{t-1} > 0$, the dummy variable $D_{t-1}$ equals 0, resulting in:

$$h_t = \alpha_0 + \alpha_1\varepsilon_{t-1}^2 + \beta_1 h_{t-1} \tag{14}$$

b. EGARCH model

The exponential GARCH (EGARCH) model proposed by [7] is used; the theoretical specification is as follows:

$$\varepsilon_t = Y_t - x_t\alpha;\quad \varepsilon_t \mid \Omega_{t-1} \sim N(0, h_t) \tag{15}$$

$$\ln(h_t) = \alpha_0 + \alpha_1\left|\frac{\varepsilon_{t-1}}{\sqrt{h_{t-1}}}\right| + \gamma\frac{\varepsilon_{t-1}}{\sqrt{h_{t-1}}} + \beta_1\ln(h_{t-1}) \tag{16}$$

Here $\gamma < 0$ means that the leverage effect exists. If the previous period brought bad news ($\varepsilon_{t-1} < 0$), then $\gamma\varepsilon_{t-1} > 0$, which increases the conditional variance of the current period, consistent with the definition of the leverage effect.
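The conditional variance recursions of the GARCH(1,1) and GJR-GARCH(1,1) models described in this section can be traced with a short sketch. It is an illustration under stated assumptions, not the estimation procedure used in the study: the parameters are taken as given rather than fitted by maximum likelihood, the recursion starts from an assumed initial variance `h0`, and the function name and sample values are hypothetical.

```python
def conditional_variance(residuals, a0, a1, b1, gamma=0.0, h0=None):
    """Conditional variances h_t for GARCH(1,1) or GJR-GARCH(1,1).

    h_t = a0 + a1*e_{t-1}^2 + gamma*e_{t-1}^2*D_{t-1} + b1*h_{t-1},
    where D_{t-1} = 1 if e_{t-1} < 0 and 0 otherwise.
    With gamma = 0 this reduces to the plain GARCH(1,1) recursion.
    """
    if h0 is None:
        # Assumption: start from the sample variance of the residuals.
        h0 = sum(e * e for e in residuals) / len(residuals)
    h = [h0]
    for e_prev in residuals[:-1]:
        dummy = 1.0 if e_prev < 0 else 0.0
        h_next = a0 + (a1 + gamma * dummy) * e_prev ** 2 + b1 * h[-1]
        h.append(h_next)
    return h


# Hypothetical residuals and parameters, for illustration only.
eps = [0.5, -1.0, 0.3, -0.2]
garch = conditional_variance(eps, a0=0.1, a1=0.1, b1=0.8, h0=1.0)
gjr = conditional_variance(eps, a0=0.1, a1=0.1, b1=0.8, gamma=0.2, h0=1.0)
print("GARCH:", garch)
print("GJR:  ", gjr)
```

After the negative shock in the second period, the GJR variance rises above the GARCH variance because the dummy term adds `gamma` to the shock coefficient, which is the leverage effect in miniature.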

2.2. Partition by various α

In order to develop the option price moneyness classification model, the data are separated into three sub-datasets (in-the-money, at-the-money, out-of-the-money) by a cut-off α. The α values are 2%, 4%, 5%, 6%, and 8%. A new variable M is created and included in the following option price classification model. M denotes the set to which each day's option price belongs: in-the-money, at-the-money, or out-of-the-money.

2.3. Option price classification model

2.3.1 Input variables

The data used in this study are the transaction data of Taiwan stock index options traded on the Taiwan International Mercantile Exchange (TAIMEX). This study used TEO call option price data. Only traded prices were used; bid and ask prices were not included. The columns of data are arranged as follows: trading day, date of expiration, strike price (X), option price (C). In addition, the time to maturity (t) was calculated from the trading and expiration dates as one of the input variables. Time to maturity is an important variable for option pricing.
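The time-to-maturity input can be derived from the two date columns as in the following sketch. The day-count convention is an assumption (calendar days over a 365-day year), since the paper does not state one, and the function name and sample dates are hypothetical.

```python
from datetime import date


def time_to_maturity(trading_day, expiration_day, days_per_year=365):
    """Time to maturity t, in years, between two dates.

    Assumption: calendar days over a 365-day year; the paper does not
    state its day-count convention.
    """
    remaining = (expiration_day - trading_day).days
    if remaining < 0:
        raise ValueError("trading day is after the expiration date")
    return remaining / days_per_year


# Hypothetical dates, for illustration only.
t = time_to_maturity(date(2011, 9, 2), date(2011, 9, 21))
print(round(t, 4))
```

Passing a different `days_per_year` (for example a trading-day count) changes the scale of t but not the ordering of the observations.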

This study developed an option price classification model of the following form:

M = f(S, X, t, r, σ)

In this model, the volatility σ is taken in turn from each of the seven types of volatility estimation (historical volatility, implied volatility, GARCH, IGARCH, GJR-GARCH, EGARCH, TBGARCH). This arrangement created seven combinations of six variables built on the input variables of the traditional Black-Scholes model.

2.3.2 The kernel functions of SVM

In this study, the multi-class SVM with four different kernel functions (Linear, Polynomial, RBF, Sigmoid) is used to predict M (whether the option price is in-the-money, at-the-money, or out-of-the-money). There is still dispute over which type of kernel function achieves a better approximation. Therefore, it is necessary to construct a suitable framework and find the optimal classification results.

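Linear kernel:

$$k(x_i, x_j) = x_i^{T} x_j \tag{17}$$

Polynomial kernel:

$$k(x_i, x_j) = (x_i^{T} x_j + c)^{d} \tag{18}$$

Gaussian (RBF) kernel:

$$k(x_i, x_j) = \exp\left(-\frac{\lVert x_i - x_j \rVert^{2}}{2\sigma^{2}}\right) \tag{19}$$

Sigmoid kernel:

$$k(x_i, x_j) = \tanh(a\,x_i^{T} x_j + t) \tag{20}$$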
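The four kernel functions named above can be written out directly. The sketch below uses their standard textbook forms in pure Python; the polynomial offset `c` and degree `d`, all default parameter values, and the sample vectors are assumptions for illustration, not settings reported in the paper.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(u, v):
    return dot(u, v)


def polynomial_kernel(u, v, c=1.0, d=3):
    # Assumed standard form (u.v + c)^d; c and d are illustrative defaults.
    return (dot(u, v) + c) ** d


def rbf_kernel(u, v, sigma=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def sigmoid_kernel(u, v, a=0.5, t=0.0):
    return math.tanh(a * dot(u, v) + t)


# Hypothetical feature vectors, for illustration only.
x_i = [1.0, 2.0]
x_j = [2.0, 0.0]
for name, kernel in [
    ("linear", linear_kernel),
    ("polynomial", polynomial_kernel),
    ("rbf", rbf_kernel),
    ("sigmoid", sigmoid_kernel),
]:
    print(name, kernel(x_i, x_j))
```

Each function takes two feature vectors and returns a scalar similarity, which is the only interface an SVM solver needs from a kernel.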

2.3.3 Classification objects

Moneyness is defined as the quotient of the stock price and the strike price. In standard option pricing terminology, options are referred to as in-the-money, at-the-money, or out-of-the-money. In-the-money: the option's strike price is below the current market price of the underlier (for a call option) or above it (for a put option). At-the-money: the strike price of the option is equal to (or nearly equal to) the market price of the underlying security. Out-of-the-money: a call option whose strike price is higher than the market price of the underlying security, or a put option whose strike price is lower than it.

Based on the above terminology, a call option is in-the-money when S > X, at-the-money when S = X, and out-of-the-money when S < X. Even in the absence of transaction costs, an exactly at-the-money option is hard to find in real-world data. Hence, following previous studies, this study uses α to define at-the-money as S ≈ X. Out-of-the-money is defined as S/X ≤ 1 − α, in-the-money as S/X > 1 + α, and at-the-money as 1 − α < S/X ≤ 1 + α. The data are partitioned according to moneyness (stock price/strike price) for various values of α. This study tests α values in the range 1% to 8% (2%, 4%, 5%, 6%, and 8%) to partition the data, and then uses the α with the highest SVM classification accuracy to train the in-the-money option SVM model, the at-the-money option SVM model, and the out-of-the-money option SVM model.
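The α-based partition rule for call options translates directly into code. The following is a minimal sketch of the three inequalities stated above; the function name and the sample prices are hypothetical.

```python
def classify_moneyness(stock_price, strike_price, alpha):
    """Label a call option by the ratio S / X and the cut-off alpha.

    in-the-money:      S / X > 1 + alpha
    out-of-the-money:  S / X <= 1 - alpha
    at-the-money:      otherwise (1 - alpha < S / X <= 1 + alpha)
    """
    ratio = stock_price / strike_price
    if ratio > 1 + alpha:
        return "in-the-money"
    if ratio <= 1 - alpha:
        return "out-of-the-money"
    return "at-the-money"


# Hypothetical prices, for illustration only: the same option changes
# class as the at-the-money band widens.
for alpha in (0.02, 0.04, 0.05, 0.06, 0.08):
    print(alpha, classify_moneyness(103.0, 100.0, alpha))
```

Because a wider band moves more observations into the at-the-money class, the choice of α changes the class balance that the SVM is trained on.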

3. Empirical analysis and results

3.1 Sample selection

The empirical data of this study are daily stock option data from the Taiwan Economic Journal (TEJ) database, selected from the call options of the Taiwan Electronic Sector Index Options (TEO). This database includes data on options maturing in the most recent month. This study selected training data between September 2, 2011 and December 22, 2011, for 130 daily data items.

3.2 Classification and evaluation of moneyness

This study tested 140 models, combining 4 kernel functions in the multi-class SVM (Linear, Polynomial, RBF, Sigmoid), 7 types of volatility estimation (historical volatility, implied volatility, GARCH, IGARCH, GJR-GARCH, EGARCH, TBGARCH), and 5 values of α (2%, 4%, 5%, 6%, 8%), which gives 4 × 7 × 5 = 140 combinations. Classification accuracy was evaluated by the hit ratio, and the best option price classification model was determined on that basis.
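The size of the model grid and the evaluation metric can be checked with a short sketch. It assumes that the hit ratio is the percentage of observations whose predicted class equals the actual class; the label lists are made-up values for illustration, and the names `model_grid` and `hit_ratio` are hypothetical.

```python
from itertools import product

KERNELS = ["Linear", "Polynomial", "RBF", "Sigmoid"]
VOLATILITIES = [
    "historical volatility", "implied volatility", "GARCH", "IGARCH",
    "GJR-GARCH", "EGARCH", "TBGARCH",
]
ALPHAS = [0.02, 0.04, 0.05, 0.06, 0.08]


def hit_ratio(actual, predicted):
    """Percentage of observations whose predicted class matches the actual one."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    hits = sum(1 for a, p in zip(actual, predicted) if a == p)
    return 100.0 * hits / len(actual)


# Every (kernel, volatility model, alpha) combination is one model.
model_grid = list(product(KERNELS, VOLATILITIES, ALPHAS))
print(len(model_grid))  # 4 * 7 * 5 = 140

# Made-up class labels, for illustration only.
actual = ["ITM", "ATM", "OTM", "ATM"]
predicted = ["ITM", "ATM", "ATM", "ATM"]
print(hit_ratio(actual, predicted))  # 3 of 4 correct -> 75.0
```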

The accuracy of each model is shown in Tables 1-5, and the models are compared in Figures 1-5. First, among the 4 kernel functions in the multi-class SVM, the polynomial function performs best when α is set to 2%, 4%, 5%, and 6%. The RBF function is better than the linear function when α is set to 4%, 5%, and 6%. The sigmoid function has the lowest accuracy in every model. In particular, when α is 2%, the RBF and linear functions have almost the same performance. When α is 8%, the polynomial, RBF, and linear functions have almost the same performance.

Second, among the 7 types of volatility estimation, GJR-GARCH achieves the best performance three times, when α is 2%, 5%, and 8%. EGARCH achieves the best performance twice, when α is 2% and 4%. TBGARCH achieves the best performance only when α is 6%. In particular, GJR-GARCH and EGARCH tie for the best performance when α is 2%.

Third, among the 5 values of α (Figure 6), the 2% model has the best performance (accuracy = 89.26%), followed by the 5% model (accuracy = 88.03%). The 6% model (87.9%), the 8% model (87.46%), and the 4% model (87.15%) have lower accuracy.

3.3 Discussion

This study tested 140 models. The best combination is the polynomial function with GJR-GARCH or EGARCH at α = 2%, with an accuracy of 89.26%. The next best combination is the polynomial function with TBGARCH at α = 2%, with an accuracy of 89.2%. These results suggest that using α = 2% and a polynomial-kernel multi-class SVM with the TBGARCH, EGARCH, or GJR-GARCH volatility estimation methods yields better classification performance. This result is consistent with past research [8].

TABLE 1. ACCURACY OF EACH MODEL (α=2%)

Accuracy (%), α=2%

| Volatility estimation | Linear | Polynomial | RBF | Sigmoid |
|---|---|---|---|---|
| Historical volatility | 85.83 | 88.67 | 86.03 | 70.11 |
| Implied volatility | 85.70 | 89.06 | 85.91 | 66.30 |
| GARCH | 85.94 | 89.12 | 86.11 | 69.95 |
| IGARCH | 85.83 | 89.01 | 86.12 | 69.95 |
| GJR-GARCH | 85.93 | 89.26 | 86.13 | 66.22 |
| EGARCH | 85.91 | 89.26 | 86.00 | 69.93 |
| TBGARCH | 85.91 | 89.20 | 86.16 | 69.93 |

TABLE 2. ACCURACY OF EACH MODEL (α=4%)

Accuracy (%), α=4%

TABLE 3. ACCURACY OF EACH MODEL (α=5%)

Accuracy (%), α=5%

TABLE 4. ACCURACY OF EACH MODEL (α=6%)

Accuracy (%), α=6%




TABLE 5. ACCURACY OF EACH MODEL (α=8%)

Accuracy (%), α=8%

| Volatility estimation | Linear | Polynomial | RBF | Sigmoid |
|---|---|---|---|---|
| Historical volatility | 86.7 | 87.30 | 86.88 | 62.63 |
| Implied volatility | 86.5 | 87.45 | 86.71 | 58.31 |
| GARCH | 86.71 | 87.42 | 86.80 | 62.11 |
| IGARCH | 86.71 | 87.42 | 86.80 | 62.11 |
| GJR-GARCH | 86.68 | 87.46 | 86.74 | 57.65 |
| EGARCH | 86.71 | 87.40 | 86.75 | 57.47 |
| TBGARCH | 86.76 | 87.39 | 86.90 | 57.00 |

Figure 1. Accuracy of each model (α=2%). Accuracy (%) by volatility estimation model (historical volatility, BS model, GARCH, IGARCH, TB-GARCH, GJR-GARCH, EGARCH) for the Linear, Polynomial, RBF, and Sigmoid kernels.

Figure 2. Accuracy of each model (α=4%). Same axes and series as Figure 1.

Figure 3. Accuracy of each model (α=5%). Same axes and series as Figure 1.

Figure 4. Accuracy of each model (α=6%). Same axes and series as Figure 1.

Figure 5. Accuracy of each model (α=8%). Same axes and series as Figure 1.

Figure 6. Accuracy of each α (kernel: Polynomial). Accuracy (%) by volatility estimation model, with one series for each of α=2%, α=4%, α=5%, α=6%, and α=8%.

4. Conclusions

This study constructs a price classification model for Taiwan's existing tech index options, using various α values to determine the moneyness (at-the-money, in-the-money, out-of-the-money) of the option price. This study tested 140 models, combining 4 kernel functions in the multi-class SVM (Linear, Polynomial, RBF, Sigmoid), 7 types of volatility estimation (historical volatility, implied volatility, GARCH, IGARCH, GJR-GARCH, EGARCH, TBGARCH), and 5 values of α (2%, 4%, 5%, 6%, 8%). The classification results show that using α = 2% and a polynomial-kernel multi-class SVM with the TBGARCH, EGARCH, or GJR-GARCH volatility estimation methods yields better classification performance.

This study developed an option price classification model that forecasts the moneyness of an option price in 3 classes: at-the-money, in-the-money, and out-of-the-money. The classification result can serve as an input variable for developing the final two-stage option price forecasting model.

Acknowledgement

Wu, C. H., Lu, C. H., and Tzeng, G. H. thank the National Science Council of Taiwan (grant NSC 100-2410-H-142-009) for support.

References

[1] Liang, X., et al., "Improving option price forecasts with neural networks and support vector regressions." Neurocomputing, Vol. 72 No. 13-15, pp. 3055-3065, 2009.

[2] Wang, P., "Pricing currency options with support vector regression and stochastic volatility model with jumps." Expert Systems with Applications, Vol. 38 No. 1, pp. 1-7, 2011.

[3] Lajbcygier, P., "Improving option pricing with the product constrained hybrid neural network." IEEE Transactions on Neural Networks, Vol. 15 No. 2, pp. 465-476, 2004.

[4] Engle, R. and T. Bollerslev, "Modelling the persistence of conditional variances." Econometric Reviews, Vol. 5 No. 1, pp. 1-50, 1986.

[5] Bollerslev, T., "Generalized autoregressive conditional heteroskedasticity." Journal of Econometrics, Vol. 31 No. 3, pp. 307-327, 1986.

[6] Glosten, L.R., R. Jagannathan, and D.E. Runkle, "On the relation between the expected value and the volatility of the nominal excess return on stocks." Journal of Finance, Vol. 48 No. 5, pp. 1779-1801, 1993.

[7] Nelson, D.B., "ARCH models as diffusion approximations." Journal of Econometrics, Vol. 45 No. 1-2, pp. 7-38, 1990.

[8] Lu, C.C. and C.-H. Wu, "Support Vector Machine Combined with GARCH Models for Call Option Price Prediction." Proceedings of the 2009 International Conference on Artificial Intelligence and Computational Intelligence (AICI'09), Shanghai, pp. 35-40, 2009.

