
Source: hkumath.hku.hk/~wkc/talks/IhmmOR.pdf

A High-order Interactive Hidden Markov Model and Its Applications 1

Wai-Ki CHING
Department of Mathematics
The University of Hong Kong

Dedicated to Professor Raymond H. CHAN on the occasion of his 60th birthday

In this talk, we discuss hidden Markov models (HMMs). In particular, we propose a high-order Interactive Hidden Markov Model (IHMM), which incorporates both the feedback effect of observable states on hidden states and their mutual long-term dependence. The key idea of the model is to assume that the probability laws governing both the observable and hidden states can be written as a pair of high-order stochastic difference equations. We also present an efficient estimation method for the model parameters. Some applications are given to demonstrate the effectiveness of the proposed model.

Keywords: Interactive Hidden Markov Model (IHMM); Hidden Markov Model (HMM); Feedback Effect; Stochastic Difference Equations.

1 A joint work with Robert J. ELLIOTT (University of Calgary), T.K. SIU (Macquarie University), L. ZHANG (Nanjing University) and D. ZHU (Southeast University). OR Spectrum, 39 (2017) 1055–1069. Research supported in part by the Research Grants Council of Hong Kong under Grant Number 17301214.


The Outline

1. Motivations

2. Background

3. The High-Order Interactive Hidden Markov Model

4. Numerical Examples

5. Summary


1 Motivations

• Markov chain models have many applications in diverse fields: Engineering, Biomedical Science, Management Science, Economics, Finance and Actuarial Science, etc.

  - Queueing networks (Chan and Ching (1996))
  - Credit risk management (Siu et al. (2005), Leung and Kwok (2009))
  - Customer classification (Ching et al. (2004a))
  - Environmental problems (Yang et al. (2011))
  - Pricing problems (Ching et al. (2009) and Choi et al. (2011))

• For a comprehensive discussion on the applications of Markov chains, interested readers may consult Ching and Ng (2006).


1.1 High-order Markov Chains

• The probability law of a stochastic system may depend on the past states of the system up to a certain lag.

• Biomedical Science and Bioinformatics: DNA sequencing (A, C, T, G).

• Economics and Finance: long-term dependence in economic and financial time series, the Joseph effect (Cutland et al. (1993, 1995)).

• To model the long-term dependence in Markov chains, high-order Markov chains (weak Markov chains) are introduced.


• The probability law of a high-order Markov chain can be written in the form of a stochastic difference equation.

  - Ching et al. (2004b) considered a high-order Markov chain model for analyzing categorical data sequences.

  - Siu et al. (2009) applied a discrete-time weak Markov chain to risk measurement.

• Most of these applications indicate that taking high-order effects into consideration yields better results than the first-order case.

• The order can be determined by using the BIC (Bayesian Information Criterion), Raftery (1985).


1.2 Hidden Markov Models (HMMs)

• Applications in Speech Recognition, Bioinformatics, Signal Processing, Wireless Communication, Economics and Finance (MacDonald and Zucchini (1999) and Elliott et al. (1994)).

• A tutorial paper can be found in Rabiner (1989).

• A high-order mixed-memory HMM was proposed by Saul and Jordan (1999).

• Its model parameters admit a simple probabilistic interpretation and can be fitted iteratively by an Expectation-Maximization (EM) algorithm.


1.3 Interactive Hidden Markov Models (IHMMs)

• Ching et al. (2007) first introduced the idea of Interactive Hidden Markov Models (IHMMs), which can incorporate the feedback effect of the observed Markov chain on the hidden one.

• In this talk, we introduce a high-order IHMM which can incorporate both the long-term dependence in the observed and hidden Markov chains and the feedback effect of the observed chain on the hidden one.

• The basic idea of the high-order IHMM is to model the probability laws of both the hidden and observable Markov chains as a pair of high-order stochastic difference equations.


1.4 Main Contributions

• We extend the first-order IHMM to a high-order case, which can incorporate the long-term dependence in both hidden and observed sequences.

• We propose an effective algorithm based on nonnegative matrix factorization techniques (Berry et al. (2007) and Lin (2007)) and the idea of bi-level optimization (Bard (1998)) to estimate the unknown parameters from limited information.

• Numerical examples with real-world data are given to illustrate the feasibility and efficiency of our proposed model and algorithm.


2 Background

2.1 High-order Markov Chain Models

• A high-order Markov chain can better model a categorical data sequence (it captures the delay effect).

• Problem: a conventional n-th order Markov chain with m states is equivalent to a first-order chain on m^n states, so the number of transition probabilities to be estimated grows exponentially with the order n of the model. There are also computational problems.

• Raftery (1985) proposed a high-order Markov chain model which involves only one additional parameter for each extra lag.


• The model:

  P(X_t = k | X_{t−1} = k_1, . . . , X_{t−n} = k_n) = Σ_{i=1}^n λ_i q_{k k_i},  (2.1)

where k, k_1, . . . , k_n ∈ M. Here M = {1, 2, . . . , m} is the set of possible states,

  Σ_{i=1}^n λ_i = 1,

and Q = [q_{ij}] is a transition probability matrix with column sums equal to one, such that

  0 ≤ Σ_{i=1}^n λ_i q_{k k_i} ≤ 1,  k, k_1, . . . , k_n = 1, 2, . . . , m.  (2.2)
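Model (2.1) is a convex mixture of columns of Q, which makes it cheap to evaluate. A small sketch with illustrative Q and λ_i (not taken from the talk); note that since each column of Q sums to one and the λ_i sum to one, constraint (2.2) holds automatically:

```python
import numpy as np

# Raftery's model (2.1) with illustrative parameters (not from the talk).
# Q is column-stochastic: Q[k, j] = P(next state = k | current state = j).
m, n = 3, 2                          # m states, order n
Q = np.array([[0.5, 0.2, 0.3],
              [0.3, 0.6, 0.1],
              [0.2, 0.2, 0.6]])
lam = np.array([0.7, 0.3])           # lambda_1, lambda_2, summing to one

def raftery_prob(k, past):
    """P(X_t = k | X_{t-1} = past[0], ..., X_{t-n} = past[n-1])."""
    return sum(lam[i] * Q[k, past[i]] for i in range(n))

# For any fixed history, the probabilities over k form a distribution:
dist = [raftery_prob(k, past=[0, 1]) for k in range(m)]
```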


• Raftery proved that Model (2.1) is analogous to the standard AR(n) model in time series.

• The parameters q_{k k_i} and λ_i can be estimated numerically by maximizing the log-likelihood of (2.1) subject to the constraints (2.2).

• Problems:

  (i) This approach involves solving a highly non-linear optimization problem (which is not easy to solve).

  (ii) The proposed numerical method guarantees neither convergence nor a global maximum.


• Ching et al. (2004) generalized Raftery's model as follows:

  X_{t+n+1} = Σ_{i=1}^n λ_i Q_i X_{t+n+1−i}.  (2.3)

This is a stochastic difference equation approach. Here Q_i is the i-step transition probability matrix of the sequence and n is the order of the model. We note that if Q_1 = Q_2 = . . . = Q_n, then (2.3) is just the Raftery model.

• We also assume that the λ_i are non-negative and Σ_{i=1}^n λ_i = 1, so that the right-hand side of Eq. (2.3) is a probability distribution.


Proposition 2.1: If Q_i is irreducible, λ_i > 0 for i = 1, 2, . . . , n and Σ_{i=1}^n λ_i = 1, then the model in (2.3) has a stationary distribution X̄ as t → ∞, independent of the initial state vectors X_0, X_1, . . . , X_{n−1}. The proof is based on the Perron-Frobenius Theorem.

• The stationary distribution X̄ is the unique solution of the linear system of equations:

  (I − Σ_{i=1}^n λ_i Q_i) X̄ = 0  and  1^T X̄ = 1,  (2.4)

where I is the m-by-m identity matrix (m is the number of possible states taken by each data point).
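Since I − Σ_i λ_i Q_i is singular, a standard way to solve (2.4) in practice is to replace one redundant equation by the normalization 1^T X̄ = 1 and solve the resulting nonsingular system. A sketch with illustrative Q_i and λ_i (not from the talk):

```python
import numpy as np

# Solving (2.4) for the stationary distribution X_bar.
Q1 = np.array([[0.5, 0.2, 0.3],
               [0.3, 0.6, 0.1],
               [0.2, 0.2, 0.6]])
Q2 = np.array([[0.4, 0.3, 0.2],
               [0.4, 0.4, 0.3],
               [0.2, 0.3, 0.5]])
lam = [0.6, 0.4]
m = 3

A = np.eye(m) - (lam[0] * Q1 + lam[1] * Q2)   # I - sum_i lambda_i Q_i
A[-1, :] = 1.0                                # replace one equation by 1^T X = 1
b = np.zeros(m)
b[-1] = 1.0
x_bar = np.linalg.solve(A, b)                 # stationary distribution
```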


• Estimation of Q_i, the i-step transition probability matrix: one can count the transition frequency f^(i)_{jk} in the sequence from state k to state j in i steps. We get

  F^(i) = [ f^(i)_{11} · · · f^(i)_{m1}
            f^(i)_{12} · · · f^(i)_{m2}
               ...     ...     ...
            f^(i)_{1m} · · · f^(i)_{mm} ],  for i = 1, 2, . . . , n.  (2.5)

• From F^(i), we get Q̂_i by column normalization:

  Q̂_i = [ q̂^(i)_{11} · · · q̂^(i)_{m1}
           q̂^(i)_{12} · · · q̂^(i)_{m2}
              ...     ...     ...
           q̂^(i)_{1m} · · · q̂^(i)_{mm} ],  where  q̂^(i)_{kj} = f^(i)_{kj} / Σ_{r=1}^m f^(i)_{rj}.  (2.6)
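The counting-and-normalizing estimator (2.5)-(2.6) takes only a few lines. The short sequence below is illustrative, and states are 0-indexed here:

```python
import numpy as np

# Estimate the i-step transition matrix of a categorical sequence
# (states 0..m-1) by frequency counting and column normalization.
def estimate_Q(seq, i, m):
    F = np.zeros((m, m))              # F[j, k] = # transitions k -> j in i steps
    for t in range(len(seq) - i):
        F[seq[t + i], seq[t]] += 1.0
    col = F.sum(axis=0)
    col[col == 0] = 1.0               # avoid division by zero for unseen states
    return F / col                    # columns sum to one

seq = [0, 1, 1, 2, 0, 1, 2, 2, 0, 1]
Q1_hat = estimate_Q(seq, 1, 3)
```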


• Linear programming formulation for the estimation of λ_i.

Note: Proposition 2.1 gives a sufficient condition for the sequence X_t to converge to a stationary distribution X̄.

• We assume X_t → X̄ as t → ∞.

• X̄ can be estimated from the sequence {X_t} by computing the proportion of the occurrence of each state in the sequence; denote this estimate by X̂.

• From Eq. (2.4) one would expect that

  Σ_{i=1}^n λ_i Q̂_i X̂ ≈ X̂.  (2.7)

Therefore the λ_i can be estimated by minimizing the discrepancy in Eq. (2.7) over the simplex of weights.
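For order n = 2 the weights reduce to a single λ, so the discrepancy in (2.7) can be minimized by a one-dimensional search. The talk formulates the general case as a linear program; the sketch below, with illustrative Q̂_i and X̂, simply scans a grid:

```python
import numpy as np

# Pick lambda on the simplex minimizing ||sum_i lambda_i Q_i x_hat - x_hat||_1.
Q1 = np.array([[0.8, 0.1],
               [0.2, 0.9]])
Q2 = np.array([[0.6, 0.4],
               [0.4, 0.6]])
x_hat = np.array([0.35, 0.65])        # empirical state proportions (illustrative)

grid = np.linspace(0.0, 1.0, 1001)    # lambda_1 candidates; lambda_2 = 1 - lambda_1
errs = [np.abs((l * Q1 + (1 - l) * Q2) @ x_hat - x_hat).sum() for l in grid]
lam1 = grid[int(np.argmin(errs))]
```

An LP or constrained least-squares solver would replace the grid scan for larger n.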


2.2 The Basic Idea of HMM Through an Example

• We consider the process of choosing a die and recording the number of dots obtained by throwing it. Each die has six faces (a cube) numbered 1, 2, 3, 4, 5, 6.

• Suppose we have two such dice A and B, where Die A is fair and Die B is biased.

• The probability distributions of the dots obtained by throwing Die A and Die B are given in the table below.

  Die   1     2     3     4    5    6
  A    1/6   1/6   1/6   1/6  1/6  1/6
  B    1/12  1/12  1/12  1/4  1/4  1/4

  Table 1.


• Each time a die is chosen: with probability α, Die B is chosen given that Die A was chosen last time, and with probability β, Die A is chosen given that Die B was chosen last time. This is a 2-state Markov chain with transition probability matrix:

             Die A   Die B
  Die A   (  1−α      α   )
  Die B   (   β      1−β  )

• This process is hidden because we do not actually know which die is being chosen.

• The chosen die is then thrown and the number of dots (this is observable) is recorded. The following is a possible realization of the process:

  Hidden:      A  →  A  →  B  →  B  →  A  →  · · ·
               ↓     ↓     ↓     ↓     ↓
  Observable:  1     3     2     1     2     · · ·
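The dice process above can be simulated directly; the values of α, β and the run length below are illustrative choices:

```python
import numpy as np

# Simulate the dice example: a hidden chain over {A, B} with switching
# probabilities alpha, beta, and the emission distributions of Table 1.
rng = np.random.default_rng(0)
alpha, beta = 0.3, 0.4
emit = {"A": np.full(6, 1.0 / 6.0),
        "B": np.array([1/12, 1/12, 1/12, 1/4, 1/4, 1/4])}

die, rolls = "A", []
for _ in range(1000):
    rolls.append(int(rng.choice(6, p=emit[die])) + 1)   # observable dots 1..6
    if die == "A":
        die = "B" if rng.random() < alpha else "A"      # hidden transition
    else:
        die = "A" if rng.random() < beta else "B"
```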


2.2.1 The Usual Model Parameters of a HMM

N : number of hidden states
K : number of observable sequences
T : length of the observation period
M : number of distinct observable states
S = {S_1, . . . , S_N} : the set of hidden states
V = {v_1, . . . , v_M} : the set of observable states
π : initial state distribution
O^k = (o^k_1, o^k_2, . . . , o^k_T) : the k-th observation sequence
w_k : the weighting of the k-th observation sequence
Q = (q_0, q_1, q_2, . . . , q_T) : the sequence of hidden states
q_t : hidden state at time t
a_ij : transition probability from hidden state i to hidden state j
b^k_j(v) : the probability of symbol v ∈ V being observed in state j in the k-th sequence
λ = (A, B, π) : the model parameters to be trained (N, M are fixed)


2.2.2 Model Parameter Estimation

To construct an HMM, one has to solve three standard problems.

• Problem (I): efficiently compute P(O|λ), the likelihood of a given observation sequence, when we are given the model λ = (A, B, π) and the observation sequence O = O_1 O_2 . . . O_T. [Forward-Backward Algorithm (Baum, 1972).]

• Problem (II): find the most likely hidden state sequence. [Viterbi algorithm (Viterbi, 1967), a dynamic programming approach.]

• Problem (III): adjust the parameters λ = (A, B, π) of the model so as to maximize P(O|λ). [Baum-Welch algorithm (Bunke and Caelli, 2001), an EM-like algorithm.]

• Once the model parameters are obtained, one can easily simulate the model.
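Problem (I) is solved by the forward recursion. A minimal sketch for a 2-state HMM with illustrative parameters (A is row-stochastic here, i.e. A[i, j] = P(q_{t+1} = j | q_t = i)):

```python
import numpy as np

# Forward algorithm: P(O | lambda) for a toy 2-state, 2-symbol HMM.
A = np.array([[0.7, 0.3],       # hidden-state transition probabilities
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],       # B[i, v] = P(observe v | state i)
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])       # initial state distribution
O = [0, 1, 1, 0]                # observation sequence

alpha = pi * B[:, O[0]]         # alpha_1(i) = pi_i * b_i(O_1)
for o in O[1:]:
    alpha = (alpha @ A) * B[:, o]
likelihood = alpha.sum()        # P(O | lambda)
```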


2.3 Interactive Hidden Markov Model (IHMM)

• Suppose we are given a categorical data sequence of six possible (observable) sales volumes (1, 2, 3, 4, 5, 6) of a certain product as follows:

  1, 2, 1, 2, 1, 2, 2, 4, 2, 5, 6, 2, 1, . . . .

  1 = very high, 2 = high, 3 = moderately high, 4 = moderately low, 5 = low, 6 = very low.

• Suppose there are two hidden states: Bad Market Situation (A) and Good Market Situation (B).

• In the Good Market Situation, the probability distribution of the sales volume is assumed to be

  (1/4, 1/4, 1/4, 1/4, 0, 0),

while in the Bad Market Situation, the probability distribution of the sales volume is assumed to be

  (0, 0, 1/4, 1/4, 1/4, 1/4).


• In our model (Ching et al. 2009), we assume that the market situation and the sales volume interact with each other; the sales volume (observable state) can be used to infer the market situation.

• In the Markov chain, the states are A, B, 1, 2, 3, 4, 5 and 6. We assume that when the observable state is i, the market digests the information, and the probabilities that the hidden state is A and B in the next time step are α_i and 1 − α_i, respectively. The transition matrix governing the Markov chain (states ordered A, B, 1, . . . , 6) is

  P_2 = [  0      0     1/4  1/4  1/4  1/4   0    0
           0      0      0    0   1/4  1/4  1/4  1/4
          α_1   1−α_1    0    0    0    0    0    0
          α_2   1−α_2    0    0    0    0    0    0
          α_3   1−α_3    0    0    0    0    0    0
          α_4   1−α_4    0    0    0    0    0    0
          α_5   1−α_5    0    0    0    0    0    0
          α_6   1−α_6    0    0    0    0    0    0  ].


• In order to define the IHMM, one has to estimate

  α = (α_1, α_2, α_3, α_4, α_5, α_6)

from an observed data sequence. We consider the two-step transition probability matrix P_2^2, whose rows (states ordered A, B, 1, . . . , 6) are:

  Row A:  ( (α_1+α_2+α_3+α_4)/4,  1 − (α_1+α_2+α_3+α_4)/4,  0, 0, 0, 0, 0, 0 )
  Row B:  ( (α_3+α_4+α_5+α_6)/4,  1 − (α_3+α_4+α_5+α_6)/4,  0, 0, 0, 0, 0, 0 )
  Row i (i = 1, . . . , 6):  ( 0,  0,  α_i/4,  α_i/4,  1/4,  1/4,  1/4 − α_i/4,  1/4 − α_i/4 ).


• We then extract the one-step transition probability matrix of the observable states from the matrix P_2^2: P̃_2 is the 6-by-6 block whose i-th row (i = 1, . . . , 6) is

  ( α_i/4,  α_i/4,  1/4,  1/4,  1/4 − α_i/4,  1/4 − α_i/4 ).

• The advantage of looking at the matrix P̃_2 is that it gives the information of one-step transitions from one observable state to another observable state.

• Here P̃_2 can be estimated from the observed data, and there are six parameters α_i to be estimated.


2.3.1 Estimation Method

• To estimate the parameters α_i, we estimate the one-step transition probability matrix from the given observed sequence. This can be done by counting the transition frequencies of the states in the observed sequence.

• Suppose the estimate for this example is P̂_2. We expect P̃_2 ≈ P̂_2 and, hence, the α_i can be obtained by solving the following minimization problem:

  min_{α_i} ||P̃_2 − P̂_2||_F^2   subject to 0 ≤ α_i ≤ 1.

Here ||·||_F is the Frobenius norm, i.e.,

  ||A||_F^2 = Σ_i Σ_j A_ij^2.

• We remark that other matrix norms can also be used as the objective function.
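Because row i of P̃_2 depends on α_i alone, each α_i can in fact be fitted in closed form: the Frobenius objective restricted to row i is a one-dimensional quadratic in α_i, whose stationary point (derived from the row structure above, then clipped to [0, 1]) is r_1 + r_2 + 1/2 − r_5 − r_6. The row r_hat below is illustrative, not from the talk:

```python
import numpy as np

# Fit alpha_i for one row of P~2: (a/4, a/4, 1/4, 1/4, 1/4 - a/4, 1/4 - a/4).
def fit_alpha(r):
    # Stationary point of the quadratic sum_j (row_j(a) - r_j)^2, clipped to [0, 1].
    a = r[0] + r[1] + 0.5 - r[4] - r[5]
    return min(1.0, max(0.0, a))

r_hat = np.array([0.15, 0.15, 0.25, 0.25, 0.10, 0.10])   # an estimated row (illustrative)
alpha_i = fit_alpha(r_hat)
```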


• For a general situation, we define the following transition probability matrices:

  P = [ p_11  p_12  · · ·  p_1n
        p_21  p_22  · · ·  p_2n
         ...   ...   ...    ...
        p_m1  p_m2  · · ·  p_mn ],  (2.8)

and

  α = [ α_11  α_12  · · ·  α_1m
        α_21  α_22  · · ·  α_2m
         ...   ...   ...    ...
        α_n1  α_n2  · · ·  α_nm ].  (2.9)

• Here the matrix α is unknown, while the matrix P can be either unknown or given.


Case I: P is known. The one-step transition probability matrix of the observable states is given by P̃_2 = αP, i.e.

  [P̃_2]_ij = Σ_{k=1}^m α_ik p_kj,  i, j = 1, 2, . . . , n.

Here the α_ij are unknown but the probabilities p_ij are given.

• Suppose [Q]_ij is the one-step transition probability matrix estimated from the observed sequence. Then, for each fixed i, the α_ij, j = 1, 2, . . . , m, can be obtained by solving the following constrained least-squares problem:

  min_{α_ik}  Σ_{j=1}^n ( Σ_{k=1}^m α_ik p_kj − [Q]_ij )^2

  subject to  Σ_{k=1}^m α_ik = 1  and  α_ik ≥ 0.
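The constrained least-squares subproblem above can be solved, for each fixed i, by projected gradient descent with Euclidean projection onto the probability simplex. This is one standard choice of solver, not one prescribed by the talk, and P and q below are illustrative:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum x = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u - css / np.arange(1, len(v) + 1) > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

def fit_row(P, q, steps=500, lr=0.1):
    """Minimize ||a @ P - q||^2 over the simplex by projected gradient."""
    a = np.full(P.shape[0], 1.0 / P.shape[0])     # start at the barycenter
    for _ in range(steps):
        grad = 2.0 * P @ (a @ P - q)              # gradient of the quadratic
        a = project_simplex(a - lr * grad)
    return a

P = np.array([[0.7, 0.3],                         # p_kj, rows sum to one
              [0.2, 0.8]])
q = np.array([0.6, 0.4])                          # observed row of Q (illustrative)
a_hat = fit_row(P, q)
```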


Case II: P is unknown. Suppose all the probabilities p_ij are unknown. We adopt bi-level programming techniques:

Initialize p^(0)_ij; e = 1; h = 1;

Solve for α^(h)_ik:

  min_{α^(h)_ik}  Σ_{j=1}^n ( Σ_{k=1}^m α^(h)_ik p^(h−1)_kj − [Q]_ij )^2

  subject to  Σ_{k=1}^m α^(h)_ik = 1  and  α^(h)_ik ≥ 0;

Solve for p^(h)_ik:

  min_{p^(h)_ik}  Σ_{j=1}^n ( Σ_{k=1}^m α^(h)_ik p^(h)_kj − [Q]_ij )^2

  subject to  Σ_{k=1}^n p^(h)_ik = 1  and  p^(h)_ik ≥ 0.


While e > tolerance:

  h := h + 1;

  Solve for α^(h)_ik:

    min_{α^(h)_ik}  Σ_{j=1}^n ( Σ_{k=1}^m α^(h)_ik p^(h−1)_kj − [Q]_ij )^2

    subject to  Σ_{k=1}^m α^(h)_ik = 1  and  α^(h)_ik ≥ 0;

  Solve for p^(h)_ik:

    min_{p^(h)_ik}  Σ_{j=1}^n ( Σ_{k=1}^m α^(h)_ik p^(h)_kj − [Q]_ij )^2

    subject to  Σ_{k=1}^n p^(h)_ik = 1  and  p^(h)_ik ≥ 0;

  e := ||α^(h) − α^(h−1)||_2^2 + ||P^(h) − P^(h−1)||_2^2;

end.


3 The High-Order Interactive Hidden Markov Model

3.1 High-Order IHMM

• A discrete time parameter set

T := {0, 1, 2, . . . , T}.

• Suppose there are m hidden states and n observable states.

• The hidden state qt at time t is unobservable:

S = {S1, S2, . . . , Sm}.

• The observable state Ot at time t is observable:

V = {v1, v2, . . . , vn}.


• x_t : the probability vector of the hidden Markov chain (x_t ∈ R^m).

  - The i-th component of x_t represents the probability that the hidden Markov chain is in state i at time t.

  - x_t = e_k is used to represent the hidden sequence being in state k ∈ S at time t.

• y_t : the probability vector of the observable process at time t (y_t ∈ R^n).

  - y_t = e_j denotes the observable process being in state j ∈ V.


• In the high-order IHMM, the hidden states and the observable states influence each other according to the following pair of linear, vector-valued, high-order stochastic difference equations:

  x_t = Σ_{i=1}^h λ_i P_i y_{t−i}
  y_t = Σ_{j=1}^k μ_j M_j x_{t−j+1}    (3.1)

• We suppose that only the process {y_t} is observable.


• The key feature of an IHMM: the transitions of the hidden process {x_t} are affected by the observable process {y_t} and vice versa.

• Here h and k are the orders of the hidden states and observable states, respectively.

• The matrices P_i and M_j are the multi-step transition probability matrices from observable states to hidden states and from hidden states to observable states, respectively.

• Moreover, we have the following constraints:

  0 ≤ λ_i, μ_j ≤ 1  and  Σ_{i=1}^h λ_i = Σ_{j=1}^k μ_j = 1.


• When k = h = 1, the model in Eq. (3.1) reduces to the IHMM introduced in (Ching et al., 2007).

• By substituting x_t into the second equation of Eq. (3.1), a high-order Markov chain model in y_t is obtained:

  y_t = Σ_{j=1}^k Σ_{i=1}^h λ_i μ_j (M_j P_i) y_{t−i−j+1}.  (3.2)

• To determine the parameters λ_i, μ_j, P_i and M_j, we employ the Non-negative Matrix Factorization (NMF) techniques in Lin (2007).


3.2 Second-Order IHMM

• A high-order homogeneous IHMM with k = h = 2:

  x_t = λ P y_{t−1} + (1−λ) Q y_{t−2}
  y_t = μ M x_t + (1−μ) N x_{t−1}    (3.3)

where 0 ≤ λ, μ ≤ 1.

• By Eq. (3.3),

  y_t = λμ MP y_{t−1} + [(1−λ)μ MQ + λ(1−μ) NP] y_{t−2} + (1−λ)(1−μ) NQ y_{t−3}.  (3.4)
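The reduction from the pair (3.3) to the single recursion (3.4) can be checked numerically: propagating x_t and then y_t gives exactly the same y_t as the reduced three-term recursion. A sketch with randomly generated column-stochastic matrices:

```python
import numpy as np

rng = np.random.default_rng(1)

def col_stoch(rows, cols):
    """A random column-stochastic matrix (columns sum to one)."""
    A = rng.random((rows, cols))
    return A / A.sum(axis=0)

m, n = 2, 3                                  # hidden / observable state counts
P, Q = col_stoch(m, n), col_stoch(m, n)      # observable -> hidden
M, N = col_stoch(n, m), col_stoch(n, m)      # hidden -> observable
lam, mu = 0.6, 0.3

y = [np.eye(n)[0], np.eye(n)[1], np.eye(n)[2]]   # y_0, y_1, y_2 (unit vectors)

# via the pair (3.3): compute x_2, x_3, then y_3
x2 = lam * P @ y[1] + (1 - lam) * Q @ y[0]
x3 = lam * P @ y[2] + (1 - lam) * Q @ y[1]
y3_pair = mu * M @ x3 + (1 - mu) * N @ x2

# via the reduced recursion (3.4)
C1 = lam * mu * (M @ P)
C2 = (1 - lam) * mu * (M @ Q) + lam * (1 - mu) * (N @ P)
C3 = (1 - lam) * (1 - mu) * (N @ Q)
y3_reduced = C1 @ y[2] + C2 @ y[1] + C3 @ y[0]
```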


• We define Z_t as follows:

  Z_t = [ y_t          [ C1  C2  C3       [ y_{t−1}
          y_{t−1}   =    I   0   0    ×     y_{t−2}    = C Z_{t−1},  (3.5)
          y_{t−2} ]      0   I   0  ]       y_{t−3} ]

where

  C1 = λμ MP
  C2 = (1−λ)μ MQ + λ(1−μ) NP
  C3 = (1−λ)(1−μ) NQ.  (3.6)

• In order to estimate λ, μ, M, N, P and Q, we first estimate the matrix C.


• The matrix C can be estimated by minimizing the Frobenius norm of the difference of two matrices, i.e.

  min_C  Σ_{t=3}^T ||Z_t − C Z_{t−1}||_F^2

subject to some constraints, where T is the length of the observable sequence.

• Let 1_n denote the 1×n vector (1, 1, . . . , 1). Then from Eq. (3.6):

  1_n C1 = λμ 1_n
  1_n C2 = [(1−λ)μ + λ(1−μ)] 1_n
  1_n C3 = (1−λ)(1−μ) 1_n.  (3.7)

• This means that the column sums of C1, C2 and C3 are all constant; set them to be γ_1, γ_2 and γ_3, respectively.


• Adding these three equations, we obtain 1_n(C1 + C2 + C3) = 1_n, i.e., γ_1 + γ_2 + γ_3 = 1.

• Suppose that C has been obtained. Then by Eq. (3.7), we have

  λμ = γ_1
  (1−λ)μ + λ(1−μ) = γ_2
  (1−λ)(1−μ) = γ_3

or, equivalently,

  λμ = γ_1
  λ + μ = γ_2 + 2γ_1.


• Here λ and μ are the roots of the equation

  f(x) = x^2 − (γ_2 + 2γ_1)x + γ_1 = 0.

• However, 0 ≤ λ, μ ≤ 1 and f(x) = 0 may have no real root. We note that

  f(0) = γ_1 ≥ 0  and  f(1) = γ_3 ≥ 0,

and the global minimum of f occurs at

  0 ≤ x* = γ_2/2 + γ_1 ≤ 1,

which is the root of f′(x) = 2x − (γ_2 + 2γ_1) = 0. Then

  f(x*) = γ_1 − (γ_1 + γ_2/2)^2.


• Thus there are THREE cases:

(i) If f(x*) < 0, then the function f(x) has two real roots for λ and μ in [0, 1];

(ii) If f(x*) = 0, then the function f(x) has the repeated root x* in the interval [0, 1] and λ = μ = x*;

(iii) If f(x*) > 0, then the function f(x) has no real root in the interval [0, 1] and we take x* = γ_2/2 + γ_1 for both λ and μ, as it minimizes |f(x)| for x ∈ [0, 1].
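Recovering λ and μ from the column sums γ_1, γ_2 amounts to solving the quadratic above, with case (iii) falling back to the minimizer of |f(x)|. A small sketch that round-trips known values (note λ and μ are only identified up to swapping):

```python
import numpy as np

def recover(g1, g2):
    """Recover (lambda, mu) from gamma_1, gamma_2 via f(x) = x^2 - (g2+2g1)x + g1."""
    b = g2 + 2.0 * g1                 # = lambda + mu
    disc = b * b - 4.0 * g1           # discriminant of f
    if disc >= 0.0:                   # cases (i)-(ii): real roots
        r = np.sqrt(disc)
        return (b - r) / 2.0, (b + r) / 2.0
    x_star = b / 2.0                  # case (iii): minimizer of |f(x)| on [0, 1]
    return x_star, x_star

# Round-trip: build gamma_1, gamma_2 from known lambda, mu and recover them.
lam, mu = 0.3, 0.8
g1 = lam * mu
g2 = (1 - lam) * mu + lam * (1 - mu)
lam_hat, mu_hat = recover(g1, g2)
```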


• Based on the estimates of the parameters λ and μ, the following equations are employed to obtain the transition matrices:

  λμ MP = C1
  (1−λ)μ MQ + λ(1−μ) NP = C2
  (1−λ)(1−μ) NQ = C3.

• Algorithm 1 is proposed to give the optimal estimation of the unknown parameters.

• This algorithm is based on NMF and the idea of bi-level optimization; see, for instance, Lin (2007), Ching et al. (2007) and Bard (1998).


Step 1: Initialize M^(0), N^(0); d = 1.
Step 2: With the sub-problem algorithm from Lin, C.J. (2007), solve for P^(d), Q^(d) by minimizing
  ||λμ M^(d−1) P^(d) − C1||_F^2  and  ||(1−λ)(1−μ) N^(d−1) Q^(d) − C3||_F^2.
Step 3: Solve for M^(d), N^(d) by minimizing
  ||C2 − (1−λ)μ M^(d) Q^(d) − λ(1−μ) N^(d) P^(d)||_F^2
  subject to the column sums of M^(d) and N^(d) being 1 and 0 ≤ M^(d), N^(d) ≤ 1.
Step 4: If ||M^(d) − M^(d−1)||_F^2 + ||N^(d) − N^(d−1)||_F^2 < tolerance, stop;
  otherwise d = d + 1 and go back to Step 2.

Table 1: Algorithm 1.


• To avoid being trapped in a local optimum, we choose the initial guess randomly several times.

• Among all runs, we select the parameters that minimize the following function:

  ||λμ MP − C1||_F^2 + ||C2 − (1−λ)μ MQ − λ(1−μ) NP||_F^2 + ||(1−λ)(1−μ) NQ − C3||_F^2.

• After that, both P and Q are normalized so that they retain the defining property of transition probability matrices, i.e., each column sum equals one.


4 Numerical Examples

4.1 SSE Composite Index Data

• We try to extract the underlying macro-economic states from the observable SSE Composite Index data by employing the model discussed above.

• In the following application, we assume that a hidden state depends on two previous observable states:

  x_t = λ P y_{t−1} + (1−λ) Q y_{t−2}  (t = 2, . . . , T)  (4.1)

where 0 ≤ λ ≤ 1.

• Also we have

  y_t = M x_t.  (4.2)


• From Eq. (4.1) and Eq. (4.2),

  y_t = λ MP y_{t−1} + (1−λ) MQ y_{t−2},  t = 2, . . . , T.  (4.3)

• From the above discussion, after estimating λ, an algorithm similar to Algorithm 1 proceeds as follows:

Step 1: Initialize M^(1); d = 2.
Step 2: By the sub-problem algorithm from Lin, C.J. (2007), solve for P^(d), Q^(d) by minimizing
  ||C1 − λ M^(d−1) P^(d)||_F^2  and  ||C2 − (1−λ) M^(d−1) Q^(d)||_F^2, respectively.
Step 3: Then by the same algorithm, solve for M1^(d), M2^(d) by minimizing
  ||C1 − λ M1^(d) P^(d)||_F^2  and  ||C2 − (1−λ) M2^(d) Q^(d)||_F^2.
Step 4: If the column sums of M1^(d), M2^(d) both equal 1, go to Step 7;
  otherwise, divide each entry by the column sum of the column in which it lies.
Step 5: Let D1^(d), D2^(d) be two diagonal matrices whose diagonal entries are the
  corresponding column sums of M1^(d) and M2^(d).
Step 6: Set P^(d), Q^(d) to be D1^(d) P^(d) and D2^(d) Q^(d), respectively.
Step 7: Set M^(d) = (M1^(d) + M2^(d)) / 2.
Step 8: If ||M^(d) − M^(d−1)||_F^2 / W < tolerance, stop;
  otherwise d = d + 1 and go back to Step 2.

Table 2: Algorithm 2.


Figure 1: Daily SSE Composite Index, 01/01/08 to 01/01/18 (index range roughly 1500 to 5500).

We use daily data of the SSE Composite Index from January 2nd 2008 to October 12th 2016. There are 2133 observations (data) in total.


• To apply our model, we divide the index data into 4 states:
State 1 if the index goes beyond 4000;
State 2 if between 3000 and 4000;
State 3 if between 2000 and 3000;
State 4 if it declines under 2000.
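Under the stated thresholds, this discretization can be sketched as follows (the treatment of boundary values, e.g. an index of exactly 4000, is a convention chosen here, as the slides leave it unspecified):

```python
import numpy as np

def index_to_state(index_values):
    """Map SSE Composite Index levels to states 1-4:
    state 1 above 4000, state 2 in (3000, 4000), state 3 in (2000, 3000),
    state 4 below 2000. Boundary values fall into the upper band here,
    a convention the slides leave unspecified."""
    bins = np.array([2000.0, 3000.0, 4000.0])
    # np.digitize counts how many thresholds lie at or below each value,
    # so higher index levels get larger counts; subtract from 4 so that
    # a higher index maps to a smaller state number.
    return 4 - np.digitize(index_values, bins)
```

For example, index levels 4500, 3500, 2500 and 1500 map to states 1, 2, 3 and 4 respectively.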

• By employing our proposed algorithm, we get λ = 0.5996, and the estimates of M, P and Q are given as follows:

M =
  [ 0.0691  0.9708
    0.6380  0.0000
    0.1465  0.0146
    0.1464  0.0146 ],

P =
  [ 0.1404  1.0000  0.8229  0.8228
    0.8596  0.0000  0.1771  0.1772 ]  and

Q =
  [ 0.3478  1.0000  0.8275  0.8275
    0.6522  0.0000  0.1725  0.1725 ].


• For the hidden states:
State 1 indicates "Recession" (Bear Market) and
State 2 represents "Expansion" (Bull Market).

• The estimated hidden sequence is obtained by using the model in Eq. (3.1) with h = 2, k = 1.

• That is, we suppose that the condition of the stock market is mainly affected by the current macro-economic state, while the macro-economic condition is influenced by the previous and current market states.
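Eq. (3.1) itself is not reproduced in this excerpt; as a rough sketch, decoding under the second-order relation Eq. (4.1) picks the most probable hidden state at each time (decode_hidden is a hypothetical helper name):

```python
import numpy as np

def decode_hidden(y_states, P, Q, lam):
    """Most probable hidden state at each t >= 2, using the second-order
    relation x_t = lam * P y_{t-1} + (1 - lam) * Q y_{t-2} (Eq. (4.1)).
    y_states holds 0-based observable state indices, so P y_{t-1} is
    simply the corresponding column of P."""
    hidden = []
    for t in range(2, len(y_states)):
        x = lam * P[:, y_states[t - 1]] + (1 - lam) * Q[:, y_states[t - 2]]
        hidden.append(int(np.argmax(x)))
    return hidden
```

Running this with the estimated λ, P and Q from the SSE example gives a hidden-state label for every day from t = 2 onwards.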


Figure 2: Daily SSE Composite Index, 01/01/08 to 01/01/18 (index range 0 to 6000).


• The results show that the Chinese macro-economy remained in the "Recession" state from Sept. 2008 until Feb. 2009.

• This round of economic downturn may be attributed to foreign impact, as the whole world was affected by the U.S. subprime mortgage crisis.

• In this period, the Chinese government released a series of successive bailout policies, including decreasing the stamp duty and the lending rate, and providing trillion-scale investment plans, etc.

• Our results show that China then went through a plateau period, which lasted until July 2013.

• This shows that our model and the related algorithm can capture the considerable changes in the underlying economic situation.


4.2 Default Data

• We employ our model in a real scenario, where the default events are taken from a large database provided by Standard & Poor's, as used in Giampieri et al. (2005).

• Hidden risk states: 1 (Enhanced Risk) and 2 (Normal Risk).

• Default data from 4 sectors: consumer, energy, media and transport, where the default times of bonds in each considered US sector are extracted from the Standard & Poor's CreditPro 6.2 database, from 1 Jan. 1981 to 31 Dec. 2002 (88 quarterly observations).

• State 1 if the number of default events is between 0 and 5;
State 2 if between 5 and 10;
State 3 if between 10 and 15;
State 4 if greater than 15.


• By applying the above algorithm, we obtain λ = 0.5346 and

M =
  [ 0.0000  0.7978
    0.7933  0.0000
    0.1033  0.1011
    0.1034  0.1011 ],

P =
  [ 0.0525  0.9268  0.4994  0.4991
    0.9475  0.0732  0.5006  0.5009 ]  and

Q =
  [ 0.0772  0.9039  0.4993  0.4995
    0.9228  0.0961  0.5007  0.5005 ].
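Since M, P and Q above are column-stochastic, the combined map S = λMP + (1 − λ)MQ from Eq. (4.3) is also column-stochastic, and its fixed point is the long-run distribution over the four default-count states. The following sketch (not part of the original slides) computes it by power iteration using the estimates above:

```python
import numpy as np

lam = 0.5346
M = np.array([[0.0000, 0.7978],
              [0.7933, 0.0000],
              [0.1033, 0.1011],
              [0.1034, 0.1011]])
P = np.array([[0.0525, 0.9268, 0.4994, 0.4991],
              [0.9475, 0.0732, 0.5006, 0.5009]])
Q = np.array([[0.0772, 0.9039, 0.4993, 0.4995],
              [0.9228, 0.0961, 0.5007, 0.5005]])

# Combined one-step map on observable states, from Eq. (4.3) with
# y_{t-1} = y_{t-2} = pi at the fixed point.
S = lam * M @ P + (1 - lam) * M @ Q

# Power iteration: S has strictly positive entries and unit column sums,
# so the iterates converge to the stationary distribution pi = S pi.
pi = np.full(4, 0.25)
for _ in range(500):
    pi = S @ pi
```

The resulting π gives the long-run fraction of quarters spent in each default-count state.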


Figure 3: Default events per quarter, 1980 to 2004 (counts 0 to 25).


• The blue line gives the total number of defaults from the four sectors in each quarter from January 1981 to December 2002; the yellow background indicates the estimated "Enhanced Risk" state.

• High-risk states occurred mainly during 1990 to 1992, 1995 to 1996 and 1997 to 2002.

• According to the National Bureau of Economic Research (2003), the two most serious recessions in this period were July 1990 to March 1991 and March 2001 to November 2001, respectively. There was also a pause in economic growth during 1995, caused by the interest rate increases of the Federal Reserve and the Mexican financial crisis of 1995.

• It is quite interesting to note that our estimate of the high-risk states anticipates the onset of each recession and matches the whole recession period.

• Furthermore, our estimates persist for a few months after the end of each recession.


Figure 4: Hidden states, shown against default events per quarter, 1977 to 2005 (counts 0 to 25).


5 Summary

• In this talk, we propose a high-order IHMM, which takes into consideration the long-term dependence in both the observed and hidden Markov chains.

• An efficient algorithm based on NMF techniques and the idea of bi-level optimization was proposed to estimate the unknown parameters and the hidden sequence.

• We used both financial data and default data to illustrate the practical implementation and application of the proposed model.


References

[1] J.F. Bard, "Practical Bilevel Optimization: Algorithms and Applications", Kluwer Academic Publishers, Dordrecht, 1998.

[2] L. Baum, "An inequality and associated maximization techniques in statistical estimation for probabilistic functions of Markov processes", Inequalities, 3 (1972), 1–8.

[3] H. Bunke and T. Caelli (Editors), "Hidden Markov Models: Applications in Computer Vision", World Scientific, Singapore, 2001.

[4] M. Berry, M. Browne, A. Langville, V. Pauca and R.J. Plemmons, Algorithms and applications for approximate nonnegative matrix factorization, Computational Statistics and Data Analysis, 52 (2007), 155–173.

[5] R. Chan and W. Ching, Toeplitz-circulant preconditioners for Toeplitz systems and their applications to queueing networks with batch arrivals, SIAM Journal on Scientific Computing, 17 (1996), 762–772.

[6] W. Ching, M. Ng and K. Wong, Hidden Markov model and its applications in customer relationship management, IMA Journal of Management Mathematics, 15 (2004a), 13–24.

[7] W. Ching, E. Fung and M. Ng, Higher-order Markov chain models for categorical data sequences, Naval Research Logistics, 51 (2004b), 557–574.

[8] W. Ching and M. Ng, "Markov Chains: Models, Algorithms and Applications", International Series on Operations Research and Management Science, Springer, New York, 2006.

[9] W. Ching, E. Fung, M. Ng, T. Siu and W. Li, Interactive hidden Markov models and their applications, IMA Journal of Management Mathematics, 18 (2007), 85–97.

[10] W. Ching, S. Choi, T. Li and I. Leung, A tandem queueing system with applications to pricing strategy, Journal of Industrial and Management Optimization, 5 (2009), 103–114.

[11] W. Ching, T. Siu, L. Li, T. Li and W. Li, "Modeling default data via an interactive hidden Markov model", Computational Economics, 34 (2009), 1–19.

[12] S. Choi, X. Huang, W. Ching and M. Huang, Incentive effects of multiple-server queueing networks: the principal-agent perspective, East Asian Journal on Applied Mathematics, 1 (2011), 379–402.

[13] N.J. Cutland, P.E. Kopp and W. Willinger, From discrete to continuous financial models: new convergence results for option pricing, Mathematical Finance, 3 (1993), 101–123.


[14] N.J. Cutland, P.E. Kopp and W. Willinger, Stock price returns and the Joseph effect: a fractional version of the Black-Scholes model, Progress in Probability, 36 (1995), 327–351.

[15] R.J. Elliott, L. Aggoun and J.B. Moore, "Hidden Markov Models: Estimation and Control", Springer-Verlag, Berlin-Heidelberg-New York, 1994.

[16] G. Giampieri, M. Davis and M. Crowder, "Analysis of default data using hidden Markov models", Quantitative Finance, 5 (2005), 27–34.

[17] K. Leung and Y. Kwok, Counterparty risk for credit default swaps: Markov chain interacting intensities model with stochastic intensity, Asia-Pacific Financial Markets, 16 (2009), 169–181.

[18] C. Lin, Projected gradient methods for non-negative matrix factorization, Neural Computation, 19 (2007), 2756–2779.

[19] D. Lee and H. Seung, Learning the parts of objects by nonnegative matrix factorization, Nature, 401 (1999), 788–791.

[20] I. MacDonald and W. Zucchini, "Hidden Markov and Other Models for Discrete-valued Time Series", Chapman & Hall, London, 1999.

[21] L. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition, Proceedings of the IEEE, 77 (1989), 257–286.

[22] A. Raftery, A model for high-order Markov chains, Journal of the Royal Statistical Society B, 47 (1985), 528–539.

[23] L. Saul and M. Jordan, Mixed memory Markov models: decomposing complex stochastic processes as mixtures of simpler ones, Machine Learning, 37 (1999), 75–86.

[24] T. Siu, W. Ching, M. Ng and E. Fung, On a multivariate Markov chain model for credit risk measurement, Quantitative Finance, 5 (2005), 543–556.

[25] A. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm", IEEE Transactions on Information Theory, 13 (1967), 260–269.

[26] H. Yang, Y. Li, L. Lu and R. Qi, First order multivariate Markov chain model for generating annual weather data for Hong Kong, Energy and Buildings, 43, 2371–2377.
