Stock Market Volatility Prediction: A Service-Oriented Multi-Kernel Learning Approach

Feng Wang
State Key Lab of Software Engineering, Wuhan University, Wuhan, China
Email: [email protected]

Ling Liu
College of Computing, Georgia Institute of Technology, Atlanta, Georgia, USA
Email: [email protected]

Chenxiao Dou
School of Computer Science, Wuhan University, Wuhan, China
Email: [email protected]
Abstract—The stock market is an important and active part of today's financial markets. Stock time series volatility analysis is regarded as one of the most challenging time series forecasting tasks due to the hard-to-predict volatility observed in worldwide stock markets. In this paper we argue that the stock market state is dynamic and invisible, but that it is influenced by some visible stock market information. Existing research on financial time series analysis and stock market volatility prediction can be classified into two categories: in-depth study of one market factor on stock market volatility prediction, or prediction by combining historical price fluctuations with either trading volume or news. In this paper we present a service-oriented multi-kernel learning (MKL) framework for stock volatility analysis. Our MKL service framework promotes a two-tier learning architecture. In the top tier, we develop a suite of data preparation and data transformation techniques to provide source-specific modeling, which transforms and normalizes a source-specific input dataset into the MKL-ready data representation. Then we apply data alignment techniques to prepare the datasets from multiple information sources based on the classification model we choose for cross-source correlation analysis. In the next tier, we develop model integration methods to perform three analytic tasks: (i) building one sub-kernel per source, (ii) learning and tuning the weights for sub-kernels through weight adjustment methods, and (iii) performing multi-kernel based cross-correlation analysis of market volatility. To validate the effectiveness of our service-oriented MKL approach, we performed experiments on HKEx 2001 stock market datasets with three important market information sources: historical prices, trading volumes and stock-related news articles.
Our experiments show that 1) the multi-kernel learning method has a higher degree of accuracy and a lower degree of false prediction compared to existing single-kernel methods; and 2) integrating both news and trading volume data with historical stock price information can significantly improve the effectiveness of stock market volatility prediction compared to many existing prediction methods.
Keywords—multiple kernel learning; stock prediction; support vector machine; multi-data source integration;

I. INTRODUCTION
The stock market is an important and active part of today's financial markets. As financial markets have become more and more competitive, stock price prediction has been a hot research topic over the past few decades. Autoregressive and moving average models are among the popular stock volatility prediction techniques, and have dominated time series analysis for years. With the advancement of computing technology, data mining techniques have been widely used for stock price prediction. Several approaches using inductive learning for prediction have been developed using historical stock price data, such as k-nearest neighbor and neural networks, which have greatly improved prediction performance. However, one major weakness of these existing approaches is that they rely only on historical prices, and neglect other information and its influence on market volatility.
Many factors observed from other data sources tend to be less well aligned, or not aligned at all, with the time series information of the stock data such as historical prices; examples include news about big mergers, the bankruptcy of some companies, or unexpected elections and economic turmoil. However, such sources of information may have significant impacts on the time series behavior of the market. In recent years, several researchers have engaged in studying stock price prediction by combining historical prices with news to help improve prediction performance [4]. Most of them have applied or extended existing data mining techniques to explore how and when market news may influence investors' actions and in turn affect stock prices. At the same time, other researchers have argued that the trading volume is an important source of information, which may reflect some actions taken by investors in stock trading. Some preliminary studies have been conducted to show the impact of stock volume fluctuation on stock price movements. Other studies have investigated the impact of combining the trading volume with historical prices on the effectiveness of stock volatility prediction. Surprisingly, none of the existing research, to the best of our knowledge, has explored whether correlations of fluctuations in historical prices, in trading volumes and in news (bad versus good news) may provide more and stronger indicators and improve the accuracy of stock market volatility prediction.
In this paper we argue that combining historical price fluctuations with two or more sources of information, such as both volume and news, can significantly improve the quality of stock market volatility prediction. For example, the correlation analysis of more than two sources of information will provide stronger indicators than correlating historical price with news independently of the correlation analysis of historical price with volume. This is because when historical price and news fail to show good correlation during a time interval, the prediction based solely on price and news may fail, but if the correlation between historical price and volume during the same time interval is strong, then we can still deliver a high-quality prediction with high confidence. Based on this intuition and our observations over real stock trading datasets and related news datasets, we argue that the more sources of time series information related to the stock market we use, the higher the quality of the stock market volatility prediction may be. Our experimental results show that using all three sources of time series data, historical prices, trading volume and news, one can significantly improve the stock market volatility prediction over a prediction based on one or two sources of information. Our second argument is the need for a systematic framework for integrating the N (N ≥ 3) factors by taking into account three types of inconsistencies: (i) the inconsistency observed within the same source of information, (ii) the inconsistency observed in the correlation of historical price with one other source, and (iii) the inconsistency observed when correlating all N sources of information.
For example, all existing work that combines historical price fluctuations with news, or historical price fluctuations with trading volume fluctuations, for market prediction has encountered a difficult dilemma in terms of prediction consistency. Figure 1(a) shows that the stock prices are strongly correlated with the feedback of news and trading volume during the time period T1. More interestingly, whether we choose to combine historical prices with news articles or with trading volumes is not critical, due to the fact that both news and volume have strong correlations with the historical price fluctuations, so the prediction result will be consistent with the actual price movement in the market. However, for some other datasets, or for some different periods of the same dataset, we may observe inconsistent or possibly conflicting results with respect to the positive or negative movements of news feedback, the high or low of trading volumes, and how they relate to the ups and downs of stock prices. This is primarily due to the differences in correlation strength and in the ways correlation intervals in different time series data sources are established. Figure 1(b) shows that for stock 0005.HK, using the news and historical price we can make a correct prediction during the time period T2, while using trading volume will produce an incorrect prediction during the same time period. In other cases, using news with historical price will generate an incorrect prediction while combining transaction volume with historical price will provide better prediction accuracy [3]. Surprisingly, very few have engaged in investigating the problem of integrating more than two types of information sources and studying how to utilize multiple information sources to improve the accuracy and consistency of stock market volatility prediction.
In this paper we develop a service-oriented multi-kernel learning method. We argue that it is not only necessary but also critical to investigate a general, multiple-kernel-based methodology that utilizes a weight function with weight update methods to incorporate and balance the contributions from multiple data sources in market volatility prediction. We conjecture that to deliver multi-kernel learning based stock volatility analysis as a service, we need to address two major technical challenges: (1) How to extract the information
[Figure 1(a): three aligned panels for stock HK-0005 over 2001/05-2001/07, plotting Trading Volume, News Feedback and Stock Price against Date; during period T1, trading volume goes DOWN, news feedback goes DOWN, and the stock price goes DOWN.]
(a) Stock 0005.HK (2001/05-2001/07)
[Figure 1(b): three aligned panels for stock HK-0005 over 2001/08-2001/10, plotting Trading Volume, News Feedback and Stock Price against Date; during period T2, trading volume goes DOWN, news feedback goes UP, and the stock price goes UP.]
(b) Stock 0005.HK (2001/08-2001/10)
Fig. 1. Trading Volume, News Articles and Historical Price Movements With Conflicts
presented in the market news articles as well as in the trading volume? (2) How to formulate a model combining N (N ≥ 3) data sources (e.g., historical prices, current prices, news article impacts, and trading volume)? In this paper, we develop a two-tier multi-kernel learning (MKL) framework using a service-oriented approach, aiming to adaptively and progressively integrate multiple kernels to improve both the performance and the effectiveness of prediction. In the top tier, for the different data sources, we develop a suite of data cleaning and data transformation techniques to provide source-specific data modeling and preparation, which transforms and normalizes each source-specific input dataset into the MKL-ready data stream representation. In the next tier, we first train the models with a single information source to generate a source-specific sub-kernel. Then we perform iterative learning to adjust and tune the weights of the sub-kernels, and finally we predict market volatility based on cross-correlation analysis of the MKL learning results. We evaluate our MKL method over the HKEx 2001 stock market price dataset, stock trading volume dataset, and news articles of the year 2001. Our experimental results show that the service-oriented development of our multi-kernel learning approach can seamlessly integrate multiple sources into our market volatility analysis framework and increase prediction accuracy significantly.
The rest of this paper is organized as follows. Section II gives an overview of our proposed two-tier system architecture and focuses on the tasks to be performed and the techniques to be used in the top tier of the MKL system. Section III presents the main tasks in the second tier of our system. Experimental results and discussion are reported in Section IV. The conclusion and future work are given in Section V.
II. ARCHITECTURE OVERVIEW
The multiple data source prediction system has a two-tier architecture. The top tier is dedicated to preparing the datasets from multiple information sources to make them ready for the prediction tasks in the next tier. It is composed of two major parts. The first part is data preprocessing. The second part is data alignment. Since the news release time is independent of stock trading, we must align the relevant news articles to the market trading data.
The second tier is dedicated to the market volatility analysis and prediction through model integration and training, which uses the multiple kernel learning methodology to train the classifier. It consists of three tasks: first, we build one sub-kernel per source; second, we learn to update and tune the weights for the sub-kernels through weight adjustment methods; and finally we perform the multi-kernel based cross-correlation analysis of market volatility.
In this paper, we give an instantiation of the multi-kernel learning method in the context of three information sources: historical prices, news articles and stock trading volume. In the rest of this section we use these three information sources as an example to illustrate the key techniques involved in the data preprocessing and data alignment tier. We describe the three tasks of the tier-two development in Section III.
A. Data Preprocessing
1) News Articles: News articles are regarded as raw material that needs to be preprocessed before aligning it with the time series of historical stock price data. Given a set of news articles, our objective is to extract useful information from them. For example, we are interested in information that may trigger stock price changes. One of the key challenges in the data preprocessing phase is to determine how to extract useful information from the raw data sources. Some researchers align the contents of whole news articles to stock price movements. We believe that only certain terms from the raw news articles have a high likelihood of triggering stock price changes. For text-document-based information sources such as news articles, we perform the following four steps in the data preprocessing phase:
• Language-specific term segmentation and refinement. This step segments each news article into a collection of terms. Documents written in different languages need different term segmentation techniques. For Chinese news articles, we perform term segmentation using an existing Chinese segmentation software1. Although the segmentation software produces output of high quality, many words that are specific to the finance domain cannot be segmented correctly. A finance dictionary is thus employed to refine the segmentation results.
1http://ictclas.org/
• Term normalization and filtering. This step performs two tasks: (1) remove stop words and (2) filter out other unimportant terms/words and keep representative terms, such as nouns, verbs and adjectives.
• Feature selection. For long articles, not all terms/words are included in the final feature list. For example, Feldman [2] (Chapter IV.3.1) selects about 10% of the words as features. After the filtering step, we obtain a word-frequency matrix with 7052 terms/words. We compute the χ2 score of these words and select the 1000 terms/words with the highest scores as news features.
• Term weighting. For the selected 1000 features, we calculate the widely used tf·idf value of each selected term/word as its weight.
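As a rough illustration, the term-weighting step can be sketched in pure Python (a simplified tf·idf over already tokenized documents; the function name and input format are our own, and the χ2 feature selection step is omitted):

```python
import math
from collections import Counter

def tfidf_weights(docs):
    """Compute tf-idf weights for each term in each tokenized document.

    docs: list of token lists, one list per news article.
    Returns one {term: weight} dict per document."""
    n = len(docs)
    # document frequency: in how many articles each term appears
    df = Counter(t for doc in docs for t in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        # tf * idf with idf = log(n / df); terms in every document get weight 0
        weights.append({t: (c / total) * math.log(n / df[t])
                        for t, c in tf.items()})
    return weights
```

A production pipeline would typically add smoothing to the idf term and normalize the resulting vectors; this sketch keeps the textbook definition for clarity.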
2) Historical Prices: With the development of high-frequency trading, stock tick data is widely used for market volatility prediction. It is important to note that tick data is distinguished from daily data in the following aspects.
• Large amount. Tick data has far more records than daily data over the same time period.
• Unordered. Tick data is not recorded in the order of its time stamps, but in the order of arrival at the logging system.
• Variant intervals. Time intervals between different tick records may not be the same. As transactions may happen at any second, the time interval between consecutive records is not constant.
• Incomplete or missing items. Tick data may also arrive with incomplete or missing information due to collection errors or transmission errors.
Thus, raw tick price data is preprocessed through the following two steps:
• Sorting. Since transactions do not arrive in the order of their time stamps, we must first sort the entire list of tick records by time stamp. This allows us to perform time-series-based alignment at a later stage.
• Interpolation. Time intervals between consecutive transactions are not the same, and sometimes no record exists for certain time periods, which raises one problem: in order to perform time-aligned correlation analysis, we need to determine what price value should be filled in for such a period. There are two common ways to address this problem: (1) linear time-weighted interpolation, and (2) nearest closing price. The former takes into account the price information in the neighboring periods and places a higher weight on the nearest neighboring period to compute a price value during the interpolation process. The latter splits tick data into given time-unit intervals (e.g., one minute) and samples the closing price in each time unit (e.g., per minute). If there is no record in a given time unit, then the closing price of the previous time unit is taken as the closing price of this time unit. The choice of time unit depends on the time series alignment requirement and the tick data sampling rate. Both approaches have their pros and cons. In the experiments reported in this paper we used the second method, as it is simple and easy to implement.
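The nearest-closing-price method can be sketched as follows (a minimal illustration; the function name and the simplified integer-minute representation of tick time stamps are our own):

```python
def minute_closes(ticks, start, end):
    """Nearest-closing-price interpolation: one closing price per minute.

    ticks: list of (minute, price) pairs, sorted by time stamp.
    Returns the closing price for every minute in [start, end]."""
    by_minute = {}
    for minute, price in ticks:
        by_minute[minute] = price  # last tick within a minute is its close
    closes, last = [], None
    for m in range(start, end + 1):
        if m in by_minute:
            last = by_minute[m]
        closes.append(last)        # an empty minute inherits the previous close
    return closes
```

Note that the input must already be sorted (the Sorting step above), since the last pair seen for a minute is taken as that minute's closing price.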
3) Trading Volume: Trading volume as time series data shares many similarities with price data in terms of data characteristics and hence preprocessing steps. First, extracting the trading volume from the high-frequency trading records is similar to the process of extracting the price data. Since the value of trading volume data fluctuates sharply in the market, in order to minimize noise in the classifier training process, the trading volume ratio vr, the difference between the current trading volume and the previous trading volume normalized by the previous trading volume, can be used to represent the trading volume data series.
Thus, raw trading volume data is also preprocessed using the following two steps.
• Sorting. Given that the trading volume is extracted from the transaction records, and the transaction records obtained from the data source are unordered, we first need to sort the trading records in order to prepare for the alignment of trading volume with news and historical prices.
• Interpolation. There may be many transactions during a time period Ti. In this circumstance, we sum the trading volume of each transaction during this period to obtain the trading volume vi for time interval Ti. For those time periods that have no transaction record, or whose transaction records are not readable, we consider that no trading activity occurred and label the volume as zero.
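The per-interval aggregation above, together with the trading volume ratio vr, can be sketched as (function names and the integer-interval time representation are our own):

```python
def interval_volumes(transactions, start, end):
    """Sum the per-interval trading volume.

    transactions: list of (interval, volume) pairs.
    Intervals with no readable record get volume 0."""
    vol = {t: 0 for t in range(start, end + 1)}
    for t, v in transactions:
        vol[t] += v
    return [vol[t] for t in range(start, end + 1)]

def volume_ratios(vols):
    """vr_i = (v_i - v_{i-1}) / v_{i-1}: the normalized change used
    to damp the sharp fluctuation of raw volume (assumes v_{i-1} != 0)."""
    return [(vols[i] - vols[i - 1]) / vols[i - 1]
            for i in range(1, len(vols))]
```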
Upon completion of the source-specific data preprocessing, we have cleaned and normalized every information source required by our multi-kernel learning framework. In order to construct a sub-kernel for each information source in line with the multi-kernel learning objectives, we next perform data alignment to finalize the set of alignment operations needed before we enter the tier-two learning phase.
B. Data Alignment
Support Vector Machines (SVMs) are popular for classifying time series data. In order to use an SVM classifier, we need to re-sample the time series to a common length or to extract a fixed number of features before applying static kernels. We call this type of alignment the classifier-specific alignment. In addition, we also need to correlate one information source with another, such as correlating news articles with the historical price time series; we refer to this type of alignment as correlation-based data alignment.
1) Alignment of Market Price Time Series: Given that our goal for market volatility analysis is to predict the upcoming trend of the market price in terms of going up, going down, or holding, this can be done using the rate of change over a given period k on a price time series P, often referred to as the relative difference in percentage (RDP):

RDP-k(Pt) = 100 × (Pt − Pt−k) / Pt−k
where k is the prediction horizon. Based on this preprocessing, we can examine many different input time series and their performance in conjunction with the output series; the best results are typically achieved by using a multidimensional input vector consisting of several rates of change with different periods. For example, we can incorporate the time series RDP-1, RDP-2, RDP-3, RDP-4, RDP-5 and so forth. As a result, the different values at each time point represent by which ratio the current price differs from a distinct price in the past.
Another advantage of transforming the original market price time series into the relative difference in percentage of price (RDP-k) is that it preserves the sequential dependency of the price time series. If we naively take 30 price points as 30 independent features, the sequential dependency of the price series p−30, p−29, . . . , p−1 is not preserved, and thus the machine learning model cannot use the sequence information. Several existing works [1] have shown that this preprocessing improves predictive power.
From the trading perspective, the prediction horizon should be sufficiently long to avoid the excessive transaction costs that result from over-trading. But from the prediction perspective, the prediction horizon should be short, since the persistence of financial time series is of limited duration. The moving average (MA) is commonly used with time series data to smooth out short-term fluctuations and highlight longer-term trends or cycles, and MA-5 is the most commonly used in time series prediction. Thus, in the experiments reported in this paper, we set the value of k to 5; the calculations for the RDP indicators are given in Table I.
TABLE I
THE FORMULAE OF RDPs

RDP      Formula
RDP-5    100 ∗ (pi − pi−5)/pi−5
RDP-10   100 ∗ (pi − pi−10)/pi−10
RDP-15   100 ∗ (pi − pi−15)/pi−15
RDP-20   100 ∗ (pi − pi−20)/pi−20
RDP-25   100 ∗ (pi − pi−25)/pi−25
RDP-30   100 ∗ (pi − pi−30)/pi−30
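The RDP transformation of Table I reduces to a one-line computation; a sketch (the function name is ours):

```python
def rdp(prices, k):
    """RDP-k: 100 * (p_i - p_{i-k}) / p_{i-k} for every i with a k-step lookback.

    Returns len(prices) - k values; the first k points have no lookback."""
    return [100.0 * (prices[i] - prices[i - k]) / prices[i - k]
            for i in range(k, len(prices))]
```

Building the multidimensional input vector described above then amounts to evaluating rdp for k = 5, 10, ..., 30 and stacking the aligned results.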
In addition to RDPs, one can also employ other popular market indicators from stock technical analysis. Table II lists the formulae of some of these market indicators.

TABLE II
OTHER MARKET INDICATORS

Indicator   Formula
RSI(q)      100 ∗ UpAvg/(UpAvg + DownAvg), where
            UpAvg = Σ_{pi > (Σi pi)/q} (pi − (Σi pi)/q)
            DownAvg = Σ_{pi < (Σi pi)/q} ((Σi pi)/q − pi)
RSV(q)      100 ∗ (p0 − minq(pi))/(maxq(pi) − minq(pi))
R(q)        100 ∗ (maxq(pi) − p0)/(maxq(pi) − minq(pi))
BIAS(q)     100 ∗ (p0 − (Σi pi)/q)/((Σi pi)/q)
PSY(q)      100 ∗ (Σ 1{pi > pi−1})/q

Here pi is the price at minute i, q refers to the order counted in minutes, RSI(q) is the relative strength index, RSV(q) is the raw stochastic value, R(q) is the Williams index, BIAS(q) represents the bias of the stock price, and PSY(q) denotes the psychological line.
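Two of the Table II indicators, RSV(q) and PSY(q), can be sketched as follows (function names are ours; prices are ordered oldest to newest, with the last element taken as the current price p0):

```python
def rsv(prices, q):
    """Raw stochastic value: position of the current price within
    the range of the last q prices, scaled to [0, 100]."""
    window = prices[-q:]
    lo, hi = min(window), max(window)
    return 100.0 * (window[-1] - lo) / (hi - lo)

def psy(prices, q):
    """Psychological line: percentage of up-moves among the last q
    consecutive price changes."""
    ups = sum(1 for i in range(len(prices) - q, len(prices))
              if prices[i] > prices[i - 1])
    return 100.0 * ups / q
```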
2) Alignment of Stock Trading Volume Time Series: The stock trading volume data can be extracted from the tick-by-tick trading records. In order to predict the volatility of stock price movements, we can re-sample the trading volume time series V in terms of the rate of trading volume change, defined as the relative difference in percentage of trading volume, denoted by vRDP:

vRDP-k(Vi) = 100 × (Vi − Vi−k) / Vi−k
In order to ensure that the features of trading volume data that we feed into the SVR classifier learning model are independent, we also convert the trading volume data into trading volume indicators using the vRDP formulae before training the model. Table III lists the set of vRDP formulas for the volume indicator conversion used in the experiments reported in this paper.
TABLE III
THE FORMULAE OF vRDPs

vRDP      Formula
vRDP-5    100 ∗ (vri − vri−5)/vri−5
vRDP-10   100 ∗ (vri − vri−10)/vri−10
vRDP-15   100 ∗ (vri − vri−15)/vri−15
vRDP-20   100 ∗ (vri − vri−20)/vri−20
vRDP-25   100 ∗ (vri − vri−25)/vri−25
vRDP-30   100 ∗ (vri − vri−30)/vri−30
3) Alignment of News Articles with Market Price Time Series: This type of alignment prepares and aligns the other information sources for market volatility analysis based on their correlation with the extracted historical price features. This enables us to train a machine learning model with information from multiple sources.
In the context of news articles, we need to extract news features that correlate with stock price volatility. Consider a set of stocks S = S1, S2, . . ., where a stock Si = si1, si2, . . . , siT is a sequence of stock prices over the time interval T (for example, at a 1-minute sampling rate). In the same interval there exists a set of news articles A = A1, A2, . . ., where a news article Ai contains a set of features (such as terms) fij (j = 1, . . . , m). We assume that it is known which stock Si an article is related to. We represent a feature f related to a stock Si over the time interval T as a time series f(i) = f i(1), f i(2), . . . , f i(T), where f i(t) is defined as follows:

f i(t) = (AF i,f (t) / Ni(t)) × log(Ni(A) / AF i,f (A))

where AF i,f (t) denotes the number of related articles in A containing the feature f for the stock Si at time t, and Ni(t) denotes the total number of articles in A which are related to stock Si at time t. Here t is a user-defined time unit, such as one minute, one day, or one week. We call f i the adjusted TF-IDF value of the news feature f related to stock Si at time t. Similarly, AF i,f (A) denotes the number of articles in A that are related to stock Si and contain the feature f during the interval T, and Ni(A) denotes the total number of articles in A which are related to stock Si during the interval T.
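The adjusted TF-IDF series for a single stock and a single feature can be sketched as (the function name and the simplified bucketed-time representation are our own; we also set f i(t) to 0 when no related article falls in bucket t, a choice the paper does not spell out):

```python
import math

def adjusted_tfidf_series(article_times, has_feature, T):
    """f^i(t) = (AF(t)/N(t)) * log(N(A)/AF(A)) for one stock and one feature.

    article_times[j]: time bucket (0..T-1) of article j, all related to the stock.
    has_feature[j]:   whether article j contains the feature."""
    N_A = len(article_times)             # articles related to the stock in T
    AF_A = sum(has_feature)              # of those, articles containing f
    idf = math.log(N_A / AF_A)
    series = []
    for t in range(T):
        N_t = sum(1 for at in article_times if at == t)
        AF_t = sum(1 for at, h in zip(article_times, has_feature)
                   if at == t and h)
        series.append(0.0 if N_t == 0 else (AF_t / N_t) * idf)
    return series
```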
Similarly, we can also align news with the trading volume time series.
In principle, with a total of N information sources, if our goal is to predict the volatility of market prices, we need to align all N − 1 other sources with the stock price time series.
In addition, one can also introduce different weights for different features. Due to space constraints, we omit this normalization step in this paper.
III. MULTI-KERNEL LEARNING AND CLASSIFICATION
In this section we describe the main functional components in the second tier of our MKL framework, including training and building kernels for the different information sources and integrating multiple sub-kernels through a chosen machine learning model for classification-based prediction.
A. Model Integration and Training with MKL
1) Model Integration: In order to perform model integration, we explore the idea of 'incremental learning' to construct five different prediction models for comparison and a better understanding of the effectiveness of our MKL model. First, we use single-information-source models to make predictions. Depending on the data source, we call these the news-article-only model, which uses only news data for prediction, and the historical-prices-only model, which uses only historical price data for prediction. We then make a naive combination of the data sources, in a model named the naive combination model, which simply combines the news data and price data. Third, we explore the multiple-data-source model, named the MKL model. In order to show the significance of incorporating trading volume in MKL prediction, we compare the performance of the model using news data and price data only (MKL-NP) with the model using news data, price data and trading volume data together (MKL-NPV).
• News articles only (Model 1). This model takes labeled news instances as the input of the SVM. It tests the prediction ability when only news articles are available.
• Historical prices only (Model 2). This model takes labeled price data as the input of the SVM. It tests the prediction ability when only historical prices are available.
• Naive combination of news articles and historical prices (Model 3). This approach uses the simple combination of news articles and prices. Naive combination means combining the 1000 features from news and the 11 features from indicators to form a 1011-dimensional feature vector.
• Multiple kernels combining news articles and historical prices (MKL-NP). Based on Model 3, this model uses the multiple kernel learning approach to analyze the news articles and price data.
• Multiple kernels combining news articles, historical prices and trading volume (MKL-NPV). Based on the MKL-NP model, this model uses the multiple kernel learning approach to analyze the news articles, price data and trading volume.
Many other prediction models could also be employed and integrated in this MKL framework. Due to space constraints, we select a small set of representative models to discuss and compare in this paper.
2) Model Training: Kernel-based methods such as support vector machines (SVMs) have been successfully applied to classify time series using characteristic attributes extracted from the time series and related datasets as inputs. Basically, an SVM uses a hyperplane to separate two classes. For classification problems that cannot be linearly separated in the input space, the SVM finds a solution using a nonlinear mapping from the original input space into a high-dimensional feature space, where an optimally separating hyperplane is sought. We call a hyperplane optimal if it has maximal margin, namely the minimal distance from the separating hyperplane to the closest (mapped) data points (the so-called support vectors). The transformation is usually realized by nonlinear kernel functions. There are many kinds of kernels that can be used to transform the original feature space into a high-dimensional kernel feature space, in which the SVM is used to find a linear classifier.
Suppose the training set is G = {(xi, yi) | i = 1, . . . , n}, where xi belongs to some input space X, yi is the desired target value for pattern xi, and n is the total number of data patterns. The SVM-based classifier is defined as:

min (1/2)‖w‖² + C Σ_{i=1..n} ξi

s.t. yi(⟨w, Φ(xi)⟩ + b) ≥ 1 − ξi,  ξi ≥ 0

where w is the vector of parameters defining the optimal decision hyperplane ⟨w, Φ(x)⟩ + b = 0, and b represents the bias. The term (1/2)‖w‖² is the regularization item; it controls the generalization capabilities of the classifier and can be tuned during the training process. The second term, C Σ_{i=1..n} ξi, is the empirical risk (error). C is referred to as the regularization constant and determines the trade-off between the empirical risk and the regularization item: increasing the value of C increases the relative importance of the empirical risk with respect to the regularization item. The positive slack variables ξi allow the permitted errors to be handled.
The bottleneck for any kernel method is the definition of a kernel k that accurately reflects the similarity among data samples. However, not all metric distances are permitted; in fact, valid kernels are only those fulfilling Mercer's Theorem. The most common ones are the following: (1) The linear kernel, k(xi, xj) = Φ(xi) · Φ(xj); the value of the kernel equals the inner product of the two vectors xi and xj in the kernel space, Φ(xi) and Φ(xj). (2) The polynomial kernel, k(xi, xj) = (⟨xi, xj⟩ + 1)^d, where d is an integer. (3) The radial basis function (RBF), k(xi, xj) = exp(−‖xi − xj‖² / (2σ²)), σ ∈ R, also called the Gaussian kernel. The kernel parameter should be carefully chosen, as it implicitly defines the structure of the high-dimensional feature space Φ(x) and thus controls the complexity of the final solution.
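The RBF kernel is straightforward to implement directly; a sketch (the function name is ours):

```python
import math

def rbf_kernel(xi, xj, sigma):
    """Gaussian/RBF kernel: k(xi, xj) = exp(-||xi - xj||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))
```

A small sigma makes the kernel sharply peaked (each support vector influences only nearby points); a large sigma smooths the decision boundary, which is why the parameter must be tuned.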
In the experiments reported in this paper, we choose the radial basis function (RBF) as the kernel in the single-kernel-based models. There are several parameters in the kernel function to tune; for the RBF kernel, the two parameters γ and C are optimized during the training process. Detailed parameter tuning methods are discussed in Section IV.
For training Model 1, Model 2 and Model 3, we use grid search and 5-fold cross-validation to find the best combination of model parameters. For MKL, parameter selection is slightly different. As the best parameter combinations for the sub-kernels of news and indicators have already been found during the training of Model 1 and Model 2, we only need to adopt the derived parameters and search for the best parameters that are specific to MKL.
B. Multi-Kernel Based Classification
1) Multi-kernel Learning: For SVM, we need cross validation to find the best parameters and the best kernel. But for multiple data sources, it is too complicated to map the data to a highly nonlinear space with a single kernel. MKL uses multiple kernels to map the data into different spaces and merges the kernels to obtain a better separation function F. In MKL, many kernels are combined, and the weight of each kernel regulates its importance. The kernels can be different kernel functions, the same kernel with different parameters, or kernels built on different data sets for the same labels. In the multi-kernel learning framework, the optimal kernel is learned from data by building a weighted linear combination of M base kernels. Each kernel in the mixture may account for a different feature or set of features. The use of multiple kernels can enhance the performance of the model and, more importantly, the interpretability of the results.
Suppose k_m (m = 1, ..., M) are M positive definite kernels on the same input space X. The MKL primal problem is

min (1/2) Σ_{m=1}^{M} (1/d_m) ‖F_m‖²_{H_m} + C Σ_{i=1}^{N} ξ_i

s.t. ∀i, m: y_i Σ_{m=1}^{M} F_m(x_i) + y_i b ≥ 1 − ξ_i, ξ_i ≥ 0

Σ_{m=1}^{M} d_m = 1, d_m ≥ 0
where d_m is the weight of sub-kernel m, the ξ_i are the slack variables, and C controls the generalization capability of the classifier and is chosen by cross validation. In our case, the learned (optimized) weights d_m directly give a ranked relevance of each kind of data information used in the prediction process.
2) MKL based Classification: In order to decrease the training time and increase the prediction ability, MKL is applied in the multiple data source prediction system. In the implementation of MKL, model selection was performed by building several kernels with different kernel parameters.
In our case, we use the RBF kernel for both SVM and multiple kernel learning (MKL). We employ multiple kernel learning to aggregate the information within news articles, historical prices and stock trading volumes, and we apply the multiple kernel learning toolbox SHOGUN [5] in our experiments. This is unlike the naive combination model (model 3), which trains an SVM using

f(x) = sign( Σ_{i=1}^{m} α_i l_i k_naive(x_i, x) + b )

where x_i, i = 1, 2, ..., m are labeled training samples of 1011 features and l_i ∈ {±1}. In the MKL based models, similarity is measured separately among the instances of news, the instances of price indicators and the instances of trading volumes. We construct one similarity matrix per data source using the Euclidean distance, take these three similarity matrices as the three sub-kernels of MKL, and learn the weights d_{m,news}, d_{m,indicator} and d_{m,volume} for the sub-kernels,
k(x_i, x_j) = d_{m,news} k_news(x_i^(1), x_j^(1)) + d_{m,indicator} k_indicator(x_i^(2), x_j^(2)) + d_{m,volume} k_volume(x_i^(3), x_j^(3))

where d_{m,news}, d_{m,indicator}, d_{m,volume} ≥ 0 and d_{m,news} + d_{m,indicator} + d_{m,volume} = 1; x^(1) are news instances of 1000 features, x^(2) are indicator instances of 11 features, and x^(3) are volume instances of 6 features.

For other types of information sources or sub-kernel combinations, similar distance based similarity matrices and kernel functions can be constructed and easily imported into our multi-kernel based learning framework.
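A minimal sketch of this weighted sub-kernel construction is shown below. The weights, σ values and data are illustrative placeholders; in the actual system the weights are learned by SHOGUN's MKL solver rather than fixed by hand.

```python
import numpy as np

def rbf_kernel_matrix(X, sigma=1.0):
    # Pairwise RBF kernel built from squared Euclidean distances.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))

def combined_kernel(views, weights):
    # views: per-source feature matrices over the same samples,
    #        e.g. news (1000 cols), indicators (11 cols), volume (6 cols).
    # weights: non-negative sub-kernel weights that sum to 1.
    assert abs(sum(weights) - 1.0) < 1e-9 and all(w >= 0 for w in weights)
    K = np.zeros((views[0].shape[0],) * 2)
    for X, w in zip(views, weights):
        K += w * rbf_kernel_matrix(X)
    return K

# Illustrative data: 5 samples seen through the three feature views.
rng = np.random.default_rng(0)
views = [rng.standard_normal((5, f)) for f in (1000, 11, 6)]
K = combined_kernel(views, [0.5, 0.3, 0.2])  # placeholder weights
```

The resulting matrix K can be fed to any SVM implementation that accepts precomputed kernels.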
IV. EXPERIMENTAL RESULTS AND DISCUSSION
To validate the effectiveness of our service oriented MKL approach, we performed experiments on HK exchange datasets with three important market information sources: historical prices, transaction volumes and stock related news articles. Our experimental results show that the proposed multiple data source prediction system with MKL achieves better performance, a higher degree of accuracy and a lower degree of false prediction compared to single kernel models. Moreover, integrating both news and trading volume data with historical stock price information can significantly increase the prediction accuracy.
All experiments reported in this paper are performed on a server with a 2.53 GHz CPU and 4 GB of memory, running a Linux operating system. The average running time of our MKL algorithm is about 11 minutes with three types of data sources (historical prices, news articles and stock trading volume), which is very close to the time (nearly 10 minutes) spent in the experiments performed on two types of data sources (historical prices and news articles). This indicates that the multiple data source prediction system has good scalability.
A. Data Sets
In order to evaluate the performance of the multiple data source prediction system, we used the following three kinds of data sources: 1) stock price data, 2) news articles and 3) stock trading volume.
• Market prices. The market prices contain all the stocks' prices of HKEx in year 2001.
• News articles. The news articles of year 2001 used in our experiments were purchased from Caihua2. All the news articles are written in Traditional Chinese. Each piece of news carries a time stamp indicating when it was released.
• Stock trading volume. The stock trading volumes are extracted from the corresponding stocks' trading records of HKEx in year 2001.
Time stamps of news articles and prices/trading volumes aretick based.
HKEx has thousands of stocks, and not all of them are active in the market. We mainly focus on the constituents of the Hang Seng Index3 (HSI) which, according to the change log, included 33 stocks in year 2001. However, the constituents of the HSI changed twice in 2001, on June 1st and July 31st. Due to the tyranny of indexing, the price movement of a newly added constituent is not rational, and the stock is usually mispriced during its first few months. We therefore only select the stocks that remained constituents throughout the whole year, which leaves 23 stocks. The first 10 months of data are used as the training/cross-validation set and the last 2 months are used as the testing set.
B. Parameter Selection
During the model training period, parameters are determined by two methods: grid search and 5-fold cross validation. Take model 1's training for example: the SVM parameters to be tuned are γ and C. For γ, the algorithm searches from 0 to 10 with step size 0.2; for C, the step size is 1 and C searches from 1 to 20. Thus, there are in total 50 × 20 = 1000 combinations of parameters (in other words, 1000 loops). In each loop, 5-fold cross validation is executed to validate the system's performance: the first 10 months of data are split equally into 5 parts, 4 of which are used to train the model and the remaining part to validate it. Among the 1000 combinations, the one with the best performance is preserved and used to configure the final model for testing.
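This grid search with 5-fold cross validation can be sketched with scikit-learn's SVC and GridSearchCV (a stand-in for the authors' implementation; the data is synthetic, and the γ range {0.2, ..., 10} is one reading of the stated grid that yields the 50 values counted above):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for the 11 price indicators and UP/DOWN labels.
rng = np.random.default_rng(42)
X = rng.standard_normal((120, 11))
y = np.where(X[:, 0] + 0.1 * rng.standard_normal(120) > 0, 1, -1)

# gamma in {0.2, 0.4, ..., 10} (50 values) and C in {1, ..., 20}
# (20 values): 50 x 20 = 1000 combinations, each scored by
# 5-fold cross validation.
param_grid = {
    "gamma": np.arange(0.2, 10.0 + 1e-9, 0.2),
    "C": np.arange(1, 21),
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
best = search.best_params_  # the configuration kept for final testing
```

The best-scoring combination is then used to configure the final model, mirroring the procedure described above.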
For model 1, model 2 and model 3, we select the parameters in the same way. For the multiple kernel learning models (MKL-NP and MKL-NPV), we simply adopt the γ values that were already selected during the single models' training. MKL's parameter C is selected by the same method as for the other models.
C. Experimental Results
The accuracy of the prediction is obtained by checking whether the direction of the predicted volatility is the same
2 http://www.finet.hk/mainsite/index.htm
3 http://www.hsi.com.hk/HSI-Net/
as the actual trend. For instance, if the prediction for the upcoming trend is 'UP' and the upcoming trend is indeed rising, then the prediction is correct; if the prediction is 'DOWN' but the upcoming trend is rising, then the prediction is wrong. We use the standard formula based on precision and recall to measure the prediction accuracy, which has been adopted by many previous works [4]:

accuracy = (tp + tn) / (tp + fp + tn + fn)

where tp and tn refer to the number of true positives and true negatives respectively, and fp and fn denote the number of false positives and false negatives.
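The measure is a direct computation over the four confusion counts (the helper below is our own transcription, not code from the paper):

```python
def prediction_accuracy(tp, tn, fp, fn):
    # Fraction of UP/DOWN predictions whose direction matches
    # the actual trend: (tp + tn) / (tp + fp + tn + fn).
    return (tp + tn) / (tp + fp + tn + fn)
```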
Table IV lists the cross validation results and Table V lists the results of independent testing (numbers in bold font indicate the best results at the given time point; the second best results are underlined). From these two tables, we can see that the multiple kernel learning models (MKL-NP and MKL-NPV) outperform the other three single models in both cross validation and independent testing, except at point 5m in cross validation and point 10m in testing. Although both the naive combination approach (model 3) and the multiple kernel learning models make use of market news and prices, the naive combination does not outperform the single information source based models as expected. On the other hand, due to a different learning approach, the MKL models balance the predictability of news, prices and trading volume (market news, stock prices and trading volume each have their own characteristics, and the information hidden in any one of them can complement the others). Compared to the cross validation results in Table IV, although the performance of the MKL models decreases in independent testing, MKL-NP achieves 3 second-best results and MKL-NPV achieves 4 best results and 1 second-best result, which also shows that the MKL models outperform the other models. From these tables, it can be clearly observed that MKL-NPV performs better than MKL-NP in most circumstances, which means that taking the trading volume information into account is very helpful for improving the performance of the prediction system.
TABLE IV
PREDICTION ACCURACY OF 5-FOLD CROSS VALIDATION (%)
Model               5m     10m    15m    20m    25m
Indicator           60.24  58.42  57.59  58.48  57.99
News                59.12  62.65  62.84  63.06  60.93
Naive combination   60.05  61.78  62.84  62.84  60.85
MKL-NP              60.20  62.94  63.68  64.23  61.14
MKL-NPV             59.43  62.30  64.12  65.29  62.56
TABLE V
PREDICTION ACCURACY OF INDEPENDENT TESTING (%)
Model               5m     10m    15m    20m    25m
Indicator           53.48  50.23  47.84  48.92  45.38
News                52.94  51.13  44.40  52.38  53.41
Naive combination   56.15  61.54  49.14  52.38  45.78
MKL-NP              56.68  57.84  53.20  53.87  50.34
MKL-NPV             59.10  59.58  58.25  56.69  54.39
V. CONCLUSION
We have presented a service-oriented approach to market volatility analysis and prediction based on a multi-kernel learning framework (MKL). By service oriented in the context of market prediction, we mean that the users of our MKL service framework can feed the service with n sources of stock market time series datasets and obtain a prediction of up, down or hold for a given stock in the forecasting time interval. The MKL service framework is designed using a two-tier service oriented architecture. The first tier focuses on data cleaning and data transformation techniques, which perform data alignment and normalize a source specific input dataset into the MKL ready data representation. The second tier is dedicated to model integration and normalization: it prepares the datasets from multiple information sources, builds one sub-kernel per information source, learns and tunes the weights for combining the sub-kernels, and performs multi-kernel based cross correlation analysis of market volatility. To the best of our knowledge, this is the first work that presents a service oriented approach to facilitating the incorporation of multiple factors (more than two) in stock market volatility prediction. We conduct experiments using one whole year of Hong Kong stock market tick data. The results show that 1) multiple kernel learning methods have a higher accuracy rate and a lower false prediction rate compared to single kernel methods; and 2) both news and trading volume information are critical sources of information for improving the effectiveness of stock market volatility prediction.
ACKNOWLEDGEMENT
The first author's research is partially supported by the National Natural Science Foundation of China under grant No. 61103125, the Doctoral Fund of the Ministry of Education of China under grant No. 20100141120046, the Natural Science Foundation of Hubei Province of China (Grant No. 2010CDB08504), and the Research Project of the Shanghai Key Laboratory of Intelligent Information Processing (Grant No. IIPL-2010-007). The second author's research is partially sponsored by grants from the USA NSF CISE NetSE program and CyberTrust program, an IBM faculty award and a grant from the Intel ISTC on Cloud Computing.
REFERENCES
[1] L. Cao and F. Tay. Support vector machine with adaptive parameters in financial time series forecasting. IEEE Transactions on Neural Networks, 14(6):1506–1518, 2003.
[2] R. Feldman and J. Sanger. The Text Mining Handbook. 2007.
[3] X. Li, C. Wang, J. Dong, F. Wang, X. Deng, and S. Zhu. Improving stock market prediction by integrating both market news and stock prices. In DEXA (2), pages 279–293, 2011.
[4] R. Schumaker and H. Chen. Textual analysis of stock market prediction using breaking financial news: The AZFin text system. ACM Transactions on Information Systems (TOIS), 27(2):12, 2009.
[5] S. Sonnenburg, G. Ratsch, S. Henschel, C. Widmer, J. Behr, A. Zien, F. Bona, A. Binder, C. Gehl, and V. Franc. The SHOGUN machine learning toolbox. The Journal of Machine Learning Research, 99:1799–1802, 2010.