
Page 1: Bart Vanluyten

Monday 5 May 2008

KATHOLIEKE UNIVERSITEIT LEUVEN

Supervisors:
Prof.dr.ir. Bart De Moor (supervisor)
Prof.dr.ir. Jan Willems (co-supervisor)

Jury members:
Prof.dr.ir. H. Van Brussel (chair)
Prof.dr. A. Bultheel
Prof.dr. V. Blondel (UCL, Louvain-la-Neuve)
Prof.dr. P. Spreij (UVA, Amsterdam)
Prof.dr.ir. L. Finesso (ISIB-CNR, Padova)
Prof.dr.ir. K. Meerbergen

Ph.D. defence

Realization, identification and filtering for hidden Markov models using matrix factorization techniques

Bart Vanluyten

Page 2: Bart Vanluyten

1. INTRODUCTION 2. MATRIX FACTORIZATIONS 3. REALIZATION 4. IDENTIFICATION 5. ESTIMATION 6. CONCLUSIONS

[Figure: Bel-20 index, April 2006 to April 2008]

Mathematical modeling

Process with finite-valued output: {up, down, unchanged}

1. INTRODUCTION

Modeling—HMMs—Finite valued process—Open problems—Relation to LSM

Page 3: Bart Vanluyten


Hidden Markov model

Example: Bel-20
• Output process: {up, down, unchanged}
• State process: {bull market, bear market, stable market}

The state process has the Markov property and is hidden.

Andrey Markov (1856 - 1922)

[Figure: state diagram with the hidden states Bull Market, Stable Market and Bear Market, the transition probabilities between them, and for each state the probabilities that the Bel-20 goes up, goes down or stays unchanged]
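To make the example concrete, here is a minimal NumPy sketch of how a Moore-type HMM assigns a probability to a string of Bel-20 moves by summing over the hidden states (forward recursion). The matrices `A` and `C`, the state order (bull, stable, bear) and the output order (up, down, unchanged) are hypothetical placeholders, not the numbers in the figure.

```python
import itertools
import numpy as np

# Hypothetical Moore HMM (placeholder numbers, not the slide's).
# States: 0 = bull, 1 = stable, 2 = bear. Outputs: 0 = up, 1 = down, 2 = unchanged.
A = np.array([[0.6, 0.3, 0.1],    # A[i, j] = P(x_{t+1} = j | x_t = i)
              [0.2, 0.5, 0.3],
              [0.2, 0.3, 0.5]])
C = np.array([[0.7, 0.1, 0.2],    # C[i, k] = P(y_t = k | x_t = i)
              [0.3, 0.3, 0.4],
              [0.1, 0.6, 0.3]])
UP, DOWN, UNCHANGED = 0, 1, 2

def stationary(A):
    """Stationary distribution: left eigenvector of A for eigenvalue 1, scaled to sum 1."""
    w, v = np.linalg.eig(A.T)
    p = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    return p / p.sum()

def string_probability(pi, A, C, ys):
    """P(y_1 ... y_t) by the forward recursion over the hidden states."""
    alpha = pi.copy()
    for k, y in enumerate(ys):
        alpha = alpha * C[:, y]      # weight each state by the probability of emitting y
        if k < len(ys) - 1:
            alpha = alpha @ A        # move one step forward in the Markov chain
    return alpha.sum()

pi = stationary(A)
probs3 = {ys: string_probability(pi, A, C, ys)
          for ys in itertools.product(range(3), repeat=3)}
p = probs3[(UP, UP, DOWN)]
```

Because every row of `C` and `A` sums to one, the probabilities of all strings of a given length add up to one; collections of such string probabilities are exactly what the realization problem starts from.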

Page 4: Bart Vanluyten

[Figure: Bel-20 index, April 2006 to April 2008, ranging between about 3,600 and 4,800]

Finite-valued processes
• Economics (Bel-20): {up, down, unchanged}
• Bio-informatics (DNA): {A, C, G, T}
• Coin flipping and dice tossing (with memory): {head, tail}, {1, 2, ..., 6}
• Speech recognition: {i:, e, æ, a:, ai, ..., z}

TGGAGCCAACGTGGAATGTCACTAGCTAGCTTAGATGGCTAAACGTAGGAATACACGTGGAATATCGAATCGTTAGCTTAGCGCCTCGACCTAGATCGAGCCGATCGGACTAGCTAGCTCGCTAGAAGCACCTAGAAGCTTAGACGTGGAAATTGCTTAATC

Page 5: Bart Vanluyten

Open problems for HMMs

Obtain model from data

Use model for estimation

Estimation problem

Given: output sequence

Find: state distribution at time t

Identification problem

Given: output sequence

Find: HMM that models the sequence

Realization problem

Given: string probabilities

Find: HMM that generates the string probabilities

Page 6: Bart Vanluyten

Relation to linear stochastic model (LSM)

Mathematical model for stochastic processes

• Output process: continuous range of values
• State process: continuous range of values

[Figure: block diagram of the LSM in which noise is added to the state and to the output]

Page 9: Bart Vanluyten

Relation to linear stochastic model

Linear stochastic model: realization, identification, estimation (tool: singular value decomposition)

Hidden Markov model: realization, identification, estimation (tool: nonnegative matrix factorization)

Page 10: Bart Vanluyten

Outline

Estimation problem

Given: output sequence

Find: state distribution at time t

Identification problem

Given: output sequence

Find: HMM that models the sequence

Realization problem

Given: string probabilities

Find: HMM that generates the string probabilities

Matrix factorizations

Given: matrix

Find: low rank approximation of the matrix

2nd objective

1st objective

Page 12: Bart Vanluyten

Matrix, decomposition and rank: example

Matrix

Matrix decomposition

Matrix rank: minimal inner dimension of an exact decomposition
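A small numerical illustration of the definition, using a hypothetical 4 x 3 matrix built as the product of a 4 x 2 and a 2 x 3 factor. The exact decomposition shows the rank is at most 2; NumPy's `matrix_rank`, which counts singular values above a tolerance, confirms that no decomposition with inner dimension 1 exists.

```python
import numpy as np

# Hypothetical example: a 4 x 3 matrix formed as (4 x 2) times (2 x 3).
left = np.array([[1.0, 0.0],
                 [0.0, 1.0],
                 [1.0, 1.0],
                 [2.0, 1.0]])
right = np.array([[1.0, 2.0, 0.0],
                  [0.0, 1.0, 3.0]])
M = left @ right                      # exact decomposition with inner dimension 2

# An exact decomposition with inner dimension 2 exists, so rank(M) <= 2.
# NumPy counts the singular values above a tolerance; here it finds 2,
# so no exact decomposition with inner dimension 1 exists.
r = np.linalg.matrix_rank(M)
s = np.linalg.svd(M, compute_uv=False)
```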

2. MATRIX FACTORIZATIONS

Introduction—Existing factorizations—Structured NMF—NMF without nonneg. factors

Page 13: Bart Vanluyten


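Low rank matrix approximation

Rank-r approximation of a matrix A

Singular value decomposition (SVD): $A = U \Sigma V^T$ with $U$ and $V$ orthogonal

James Sylvester (1814 - 1897)

The SVD yields the (globally) optimal rank-r approximation in Frobenius distance, obtained by keeping only the r largest singular values.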
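A minimal NumPy sketch of the optimal low rank approximation via a truncated SVD (the Eckart-Young property). The random test matrix and the helper name `best_rank_r` are illustrative choices; the check is that the Frobenius error equals the norm of the discarded singular values and is no larger than that of an arbitrary rank-2 matrix.

```python
import numpy as np

def best_rank_r(A, r):
    """Best rank-r approximation of A in Frobenius norm via the truncated SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :], s

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 5))
A2, s = best_rank_r(A, 2)

# The approximation error equals the norm of the discarded singular values.
err = np.linalg.norm(A - A2, "fro")
```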

Page 14: Bart Vanluyten

Nonnegative matrix factorization

In some applications the matrix A is nonnegative and its factors need to be nonnegative too

Nonnegative matrix factorization (NMF) of A: $A \approx WH$ with $A$, $W$ and $H$ nonnegative

Algorithm (Kullback-Leibler divergence) [Lee, Seung]

This thesis: 2 modifications to NMF
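A minimal NumPy sketch of the Lee-Seung multiplicative updates for NMF with the (generalized) Kullback-Leibler divergence, in the notation $A \approx WH$. The function names, the iteration count and the small `eps` guard against division by zero are illustrative choices, not taken from the thesis.

```python
import numpy as np

def kl_divergence(A, B, eps=1e-12):
    """Generalized Kullback-Leibler divergence D(A || B) = sum(A log(A/B) - A + B)."""
    return np.sum(A * np.log((A + eps) / (B + eps)) - A + B)

def nmf_kl(A, r, iters=500, seed=0, eps=1e-12):
    """NMF A ~ W H (W, H >= 0) by multiplicative updates for the KL divergence."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, r)) + eps
    H = rng.random((r, n)) + eps
    for _ in range(iters):
        # H update: H <- H * (W^T (A / WH)) / (column sums of W)
        H *= (W.T @ (A / (W @ H + eps))) / (W.sum(axis=0)[:, None] + eps)
        # W update: W <- W * ((A / WH) H^T) / (row sums of H)
        W *= ((A / (W @ H + eps)) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H

# Usage on a hypothetical 8 x 10 nonnegative matrix that is exactly a product of inner dimension 3.
rng = np.random.default_rng(1)
A = rng.random((8, 3)) @ rng.random((3, 10))
W0, H0 = nmf_kl(A, 3, iters=0)      # the random starting point
W, H = nmf_kl(A, 3)
```

The updates only multiply by nonnegative ratios, so nonnegative starting factors stay nonnegative without any projection step.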

Page 15: Bart Vanluyten

Structured NMF

Structured nonnegative matrix factorization of A: $A \approx W H W^T$ with $A$, $W$ and $H$ nonnegative

Algorithm (Kullback-Leibler divergence)

Convergence to a stationary point of the divergence

Page 16: Bart Vanluyten

Structured NMF: application

Applications apart from HMMs: clustering data points

Given: petal width, petal length, sepal width and sepal length of 150 iris flowers (Setosa, Versicolor, Virginica)

Asked: divide the 150 flowers into clusters

Page 17: Bart Vanluyten

Structured NMF: application

Clustering obtained by:
• Computing the distance matrix between the points
• Applying structured nonnegative matrix factorization to the distance matrix

[Figure: scatter plots of the iris flowers (sepal length, sepal width, petal length, petal width) showing cluster 1, cluster 2 and cluster 3]

Page 18: Bart Vanluyten

NMF without nonnegativity of the factors

NMF without nonnegativity constraints on the factors of A: $A \approx WH$ with $A$ and $WH$ nonnegative, but no nonnegativity constraints on $W$ and $H$

We provide an algorithm (Kullback-Leibler divergence)

The problem allows upper bounds to be handled easily

Example 3

3

Page 19: Bart Vanluyten

NMF without nonnegativity of the factors

Applications apart from HMMs: database compression

Given: database containing 1000 facial images of size 19 x 19 = 361 pixels

Asked: compression of the database using matrix factorization techniques (361 x 1000 data matrix, inner dimension 20)

[Figure: original images and their reconstructions by NMF without nonnegative factors, upperbounded NMF without nonnegative factors, and NMF]

Kullback-Leibler divergence (reconstructions in the order listed above): 339, 383, 564 (figure annotation: > 1)
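The compression gain follows from simple counting. This sketch assumes, as the figure suggests, that the 361 x 1000 data matrix (one column per image) is approximated by a 361 x 20 factor times a 20 x 1000 factor; the variable names are illustrative.

```python
# Storage for the face database, assuming (as the figure suggests) that the 361 x 1000
# data matrix (one column per 19 x 19 image) is approximated by a 361 x 20 factor
# times a 20 x 1000 factor.
pixels = 19 * 19          # 361 pixels per image
images = 1000
inner = 20                # inner dimension of the factorization (assumed from the figure)

original_entries = pixels * images             # 361000 numbers
factored_entries = inner * (pixels + images)   # 20 * 1361 = 27220 numbers
compression_ratio = original_entries / factored_entries
```

So the factored form stores roughly 13 times fewer numbers than the raw database, whichever of the three factorizations is used.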

Page 21: Bart Vanluyten

Hidden Markov models: Moore - Mealy

[Figure: equations of a Moore HMM and a Mealy HMM; the system matrices are nonnegative and the state dimension is the order]

3. REALIZATION

Introduction—Realization—Quasi realization—Approx. realization—Modeling DNA

Page 22: Bart Vanluyten


Realization problem

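Strings and string probabilities

String probabilities generated by a Mealy HMM: $P(y_1 \cdots y_t) = \pi^T A_{y_1} \cdots A_{y_t} \mathbb{1}$, with $\pi$ the state distribution, $\mathbb{1}$ the vector of ones, and nonnegative matrices $A_y$ whose sum $\sum_y A_y$ is row stochastic

Positive realization: find nonnegative system matrices such that the HMM generates the given string probabilities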

Page 23: Bart Vanluyten

Realization problem: importance

• Theoretical importance: transform an ‘external’ model into an ‘internal’ model
• Realization can be used to identify a model from data

Page 24: Bart Vanluyten

Realizability problem

Generalized Hankel matrix: the entry in row u and column v is the probability of the concatenated string uv

Necessary condition for realizability: the Hankel matrix has finite rank

No necessary and sufficient conditions for realizability are known

No procedure is known to compute a minimal HMM from string probabilities

This thesis: two relaxations of the positive realization problem
• Quasi realization problem
• Approximate positive realization problem

Hermann Hankel (1839 - 1873)
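The necessary condition can be checked numerically. The sketch below assumes a Mealy parametrization $P(w) = \pi^T A_{w_1} \cdots A_{w_t} \mathbb{1}$ and a hypothetical 2-state HMM over a binary alphabet. Each Hankel entry factors as $(\pi^T A_u)(A_v \mathbb{1})$, so the Hankel matrix is a product of a matrix with n columns and a matrix with n rows, and its rank cannot exceed the number of states n.

```python
import itertools
import numpy as np

# Hypothetical 2-state Mealy HMM over the alphabet {0, 1}:
# A[y][i, j] = P(x_{t+1} = j, y_t = y | x_t = i), and A[0] + A[1] is row stochastic.
A = [np.array([[0.5, 0.1],
               [0.1, 0.2]]),
     np.array([[0.1, 0.3],
               [0.3, 0.4]])]
pi = np.array([0.5, 0.5])              # stationary distribution of A[0] + A[1]
n_states = 2

def prob(word):
    """String probability P(y_1 ... y_t) = pi^T A_{y_1} ... A_{y_t} 1 (empty word: 1)."""
    v = pi
    for y in word:
        v = v @ A[y]
    return v.sum()

def words_up_to(length, alphabet=(0, 1)):
    """All strings (as tuples) of length 0 .. length."""
    return [w for k in range(length + 1) for w in itertools.product(alphabet, repeat=k)]

def hankel(past_len, future_len):
    """Generalized Hankel matrix: entry (u, v) is P(uv), for past string u and future string v."""
    rows, cols = words_up_to(past_len), words_up_to(future_len)
    return np.array([[prob(u + v) for v in cols] for u in rows])

H = hankel(3, 3)                       # 15 x 15 matrix
rank = np.linalg.matrix_rank(H)        # bounded by the number of states
```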

Page 25: Bart Vanluyten

Quasi realization problem

Quasi realization: system matrices such that the model generates the given string probabilities, with NO nonnegativity constraints

• Finite rank of the Hankel matrix is a necessary and sufficient condition for quasi realizability
• Rank of the Hankel matrix = minimal order of an exact quasi realization
• A quasi realization is easier to compute than a positive realization
• A quasi realization typically has lower order than a positive realization

Negative probabilities
• No disadvantage in several estimation applications
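One standard way to compute a quasi realization from an exact, finite-rank Hankel matrix is a Ho-Kalman-style SVD factorization. The sketch below assumes this construction (it is not necessarily the exact algorithm of the thesis), the Mealy-type form $P(w) = \pi^T A_{w_1} \cdots A_{w_t} \mathbb{1}$, and a hypothetical 2-state HMM that only serves to generate the string probabilities. Factoring $H = LR$ and using the shifted blocks $H_y(u, v) = P(uyv)$ gives $A_y = L^+ H_y R^+$.

```python
import itertools
import numpy as np

# Hypothetical "true" 2-state Mealy HMM; it only serves to produce string probabilities.
A_true = [np.array([[0.5, 0.1], [0.1, 0.2]]),
          np.array([[0.1, 0.3], [0.3, 0.4]])]
pi_true = np.array([0.5, 0.5])

def prob(word, pi, A, final):
    """pi^T A_{y_1} ... A_{y_t} final."""
    v = pi
    for y in word:
        v = v @ A[y]
    return v @ final

def p_true(word):
    return prob(word, pi_true, A_true, np.ones(2))

def words_up_to(length, alphabet=(0, 1)):
    return [w for k in range(length + 1) for w in itertools.product(alphabet, repeat=k)]

def quasi_realization(p, order, depth=2, alphabet=(0, 1)):
    """Quasi realization (pi, [A_y], final) from a string-probability function p, via SVD."""
    words = words_up_to(depth, alphabet)           # used both as past and future strings
    H = np.array([[p(u + v) for v in words] for u in words])
    U, s, Vt = np.linalg.svd(H)
    L = U[:, :order] * s[:order]                   # H = L R (exactly when rank(H) = order)
    R = Vt[:order, :]
    L_pinv, R_pinv = np.linalg.pinv(L), np.linalg.pinv(R)
    # Shifted Hankel block H_y(u, v) = p(u y v) = L A_y R, hence A_y = L^+ H_y R^+.
    A = [L_pinv @ np.array([[p(u + (y,) + v) for v in words] for u in words]) @ R_pinv
         for y in alphabet]
    empty = words.index(())
    return L[empty, :], A, R[:, empty]             # row / column of the empty string

pi_q, A_q, final_q = quasi_realization(p_true, order=2)
```

The recovered matrices reproduce the string probabilities, but they are only determined up to a similarity transform, so in general they need not be nonnegative: this is exactly the freedom a quasi realization allows.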

Page 26: Bart Vanluyten

Partial quasi realization problem

Given: String probabilities of strings up to length t

Asked: Quasi HMM that generates the string probabilities

This thesis:

• The partial quasi realization problem always has a solution

• A minimal partial quasi realization is obtained with the quasi realization algorithm if a rank condition on the Hankel matrix holds

• The minimal partial quasi realization problem has a unique solution (up to a similarity transform) if this rank condition holds

Page 27: Bart Vanluyten

Approximate quasi realization problem

Given: String probabilities of strings up to length t

Asked: Quasi HMM that approximately generates the string probabilities

This thesis: algorithm
• Compute a low rank approximation of the largest Hankel block subject to consistency and stationarity constraints (upperbounded NMF without nonnegativity of the factors, with additional constraints)
• Reconstruct the Hankel matrix from the largest block (we prove that the rank does not increase in this step)
• Apply the partial quasi realization algorithm

Page 28: Bart Vanluyten

Approximate positive realization problem

Approximate positive realization: nonnegative system matrices such that the HMM approximately generates the string probabilities

Given: String probabilities of strings up to length t

Asked: Positive HMM that approximately generates the string probabilities

Page 29: Bart Vanluyten

Approximate positive realization problem

Moore, t = 2
• Define the matrix of probabilities of strings of length 2
• If the string probabilities are generated by a Moore HMM, this matrix has a structured nonnegative matrix factorization

Mealy, general t: generalize the approach for Moore, t = 2
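Why a structured NMF appears can be checked directly. This sketch assumes the Moore convention $A_{ij} = P(x_{t+1} = j \mid x_t = i)$, $C_{ik} = P(y_t = k \mid x_t = i)$ and a stationary $\pi$ (our notation, not necessarily the slide's). Summing over the two hidden states gives $P_2 = C^T \operatorname{diag}(\pi) A C$, which has the form $W H W^T$ with nonnegative $W = C^T$ and $H = \operatorname{diag}(\pi) A$. The numbers are hypothetical.

```python
import itertools
import numpy as np

# Hypothetical Moore HMM: A[i, j] = P(x_{t+1} = j | x_t = i), C[i, k] = P(y_t = k | x_t = i).
A = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.2, 0.3, 0.5]])
C = np.array([[0.7, 0.1, 0.2], [0.3, 0.3, 0.4], [0.1, 0.6, 0.3]])

# Stationary state distribution pi (pi^T A = pi^T).
w, v = np.linalg.eig(A.T)
pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
pi = pi / pi.sum()

# Length-2 string probabilities by summing over the hidden state pairs:
# P(y_1 = a, y_2 = b) = sum_{x1, x2} pi[x1] C[x1, a] A[x1, x2] C[x2, b].
n, m = C.shape
P2 = np.zeros((m, m))
for a, b, x1, x2 in itertools.product(range(m), range(m), range(n), range(n)):
    P2[a, b] += pi[x1] * C[x1, a] * A[x1, x2] * C[x2, b]

# The same matrix in structured form W H W^T with nonnegative W = C^T and H = diag(pi) A.
W = C.T
H_mid = np.diag(pi) @ A
```

Fitting such a $W H W^T$ factorization to measured length-2 string probabilities therefore yields (approximate) output and transition matrices of a Moore HMM.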

Page 30: Bart Vanluyten

Modeling DNA sequences

DNA: 40 sequences of length 200

String probabilities of strings up to length 4 stacked in a Hankel matrix

Kullback-Leibler divergence per model order:

Order                   1       2       3       4       5       6       7
Quasi realization       0.1109  0.0653  0.0449  0.0263  0.0220  0.0211  0.0210
Positive realization    0.3065  0.1575  0.0690  0.0411  0.0374  0.0373  0.0371

[Figure: singular values of the Hankel matrix]
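Before a Hankel matrix can be built, the string probabilities have to be estimated from the sequences. A simple estimator, assumed here since the slide does not specify one, counts every substring of length k over all sequences and divides by the number of windows of that length. Such frequencies are generally not exactly consistent across lengths, which is one reason the approximate realization algorithm imposes consistency and stationarity constraints.

```python
from collections import Counter
from itertools import product

def string_probabilities(sequences, max_len, alphabet="ACGT"):
    """Empirical probabilities of all strings of length 1..max_len, assuming stationarity:
    the frequency of each substring among all windows of that length."""
    probs = {}
    for k in range(1, max_len + 1):
        counts = Counter()
        total = 0
        for s in sequences:
            for i in range(len(s) - k + 1):
                counts[s[i:i + k]] += 1
                total += 1
        for word in map("".join, product(alphabet, repeat=k)):
            probs[word] = counts[word] / total if total else 0.0
    return probs

# Usage on two short hypothetical sequences.
probs = string_probabilities(["ACGTAC", "GGTA"], max_len=2)
```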

Page 32: Bart Vanluyten

Identification problem

Given: output sequence of length T

Asked: (quasi) HMM that models the sequence (positive HMM: nonnegative system matrices; quasi HMM: NO nonnegativity constraints!)

Approach

Linear stochastic models: prediction error identification, subspace based identification (SVD)

Hidden Markov models: Baum-Welch identification, subspace inspired identification (NMF)

4. IDENTIFICATION

Introduction—Subspace inspired identification—HIV modeling

Page 33: Bart Vanluyten


Identification problem

[Figure: Baum-Welch and subspace inspired identification, both computing the state sequence and the system matrices from the output sequence]

Page 34: Bart Vanluyten

Subspace inspired identification

Estimate the (quasi) state distribution. We have shown that:
• a quasi state predictor can be built from data using upperbounded NMF without nonnegativity of the factors
• a state predictor can be built from data using NMF

Compute the system matrices: least squares problem

[Figure: data matrices used for the quasi HMM and for the positive HMM]

Page 35: Bart Vanluyten


Modeling sequences from HIV genome

25 mutated sequences of length 222 from the part of the HIV-1 genome that codes for the envelope protein [NCBI database]
• Training set
• Test set

HMM model using Baum-Welch and subspace inspired identification

[Figure: mutation in the HIV virus; the virus consists of an envelope, a matrix and a core]

Page 36: Bart Vanluyten

Modeling sequences from HIV genome

Kullback-Leibler divergence (string probabilities of length-4 strings):

Order        1     2     3     4      5
Baum-Welch   3.15  4.65  8.27  21.02  22.93
Subspace     3.15  2.14  1.13  1.08   1.10

Mean likelihood of the given sequences:

Order        1           2           3           4           5
Baum-Welch   8.13·10^-5  9.03·10^-5  1.40·10^-4  1.45·10^-4  1.50·10^-4
Subspace     8.14·10^-5  8.84·10^-5  9.84·10^-5  9.60·10^-5  9.83·10^-5

Likelihood of the test sequences using the third order subspace inspired model:

9.18·10^-5, 9.15·10^-5, 9.26·10^-5, 8.82·10^-5, 9.15·10^-5

The model can be used to predict new viral strains and to distinguish between different HIV subtypes

Page 38: Bart Vanluyten

Estimation for HMMs

State estimation and output estimation

We derive recursive formulas to solve the state and output filtering, prediction and smoothing problems

[Figure: HMM diagrams for state estimation and output estimation]

5. ESTIMATION

Estimation for HMMs—Application

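Filtering, smoothing and prediction

[Figure: three time axes marking time t and the span of available measurements]

FILTERING: estimate at time t from measurements up to time t

SMOOTHING: estimate at time t from measurements up to a time later than t

PREDICTION: estimate at time t from measurements up to a time earlier than t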
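As an illustration of a recursive state estimator, here is the standard normalized forward recursion for filtering in a Moore HMM: a measurement update followed by a time update. It is a textbook sketch, not the thesis's own formulas; the model numbers are hypothetical. The intermediate `pred` is the one-step state prediction, and smoothing would additionally need a backward pass (not shown). A brute-force sum over all state paths is included to check the result.

```python
import itertools
import numpy as np

def filter_moore(pi, A, C, ys):
    """Filtered state distributions P(x_t | y_1 .. y_t), t = 1 .. len(ys), for a Moore HMM.
    A[i, j] = P(x_{t+1} = j | x_t = i), C[i, k] = P(y_t = k | x_t = i), pi = P(x_1)."""
    filtered = []
    pred = pi.copy()                   # P(x_1) before any measurement
    for y in ys:
        post = pred * C[:, y]          # measurement update with P(y_t | x_t)
        post = post / post.sum()       # normalize: now P(x_t | y_1 .. y_t)
        filtered.append(post)
        pred = post @ A                # time update: P(x_{t+1} | y_1 .. y_t)
    return np.array(filtered)

def brute_force_last(pi, A, C, ys):
    """Same quantity for the last time step, by summing over all state paths (for checking)."""
    n = len(pi)
    joint = np.zeros(n)
    for xs in itertools.product(range(n), repeat=len(ys)):
        p = pi[xs[0]] * C[xs[0], ys[0]]
        for k in range(1, len(ys)):
            p *= A[xs[k - 1], xs[k]] * C[xs[k], ys[k]]
        joint[xs[-1]] += p
    return joint / joint.sum()

# Hypothetical model and measurements.
A = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.2, 0.3, 0.5]])
C = np.array([[0.7, 0.1, 0.2], [0.3, 0.3, 0.4], [0.1, 0.6, 0.3]])
pi = np.full(3, 1.0 / 3.0)
ys = [0, 0, 1, 2]
F = filter_moore(pi, A, C, ys)
```

The recursion costs O(n^2) per time step, while the brute-force sum grows exponentially with the sequence length.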

Page 39: Bart Vanluyten


Estimation for HMMs

Example:
• Recursive algorithm to compute

Recursive output estimation algorithms are effective with a quasi HMM:
• Finite rank of the Hankel matrix is a necessary and sufficient condition for quasi realizability
• Rank of the Hankel matrix = minimal order of an exact quasi realization
• A quasi realization is easier to compute than a positive realization
• A quasi realization typically has lower order than a positive realization

Negative probabilities
• No disadvantage in output estimation problems
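Why negative entries do no harm for output estimation: a one-step output predictor only uses products of the form $\pi^T A_{y_1} \cdots A_{y_t} A_y \mathbb{1}$, and these are invariant under a similarity transform $A_y \to T^{-1} A_y T$, $\pi^T \to \pi^T T$, $\mathbb{1} \to T^{-1}\mathbb{1}$. The sketch below assumes the Mealy form and a hypothetical 2-state HMM; the transform `T` is an arbitrary choice that produces a quasi HMM with negative entries.

```python
import numpy as np

# Hypothetical 2-state Mealy HMM (P(w) = pi^T A_{w_1} ... A_{w_t} 1).
A = [np.array([[0.5, 0.1], [0.1, 0.2]]),
     np.array([[0.1, 0.3], [0.3, 0.4]])]
pi, final = np.array([0.5, 0.5]), np.ones(2)

def predict_next(pi, A, final, ys):
    """One-step output predictor P(y_{t+1} = y | y_1 .. y_t), via a normalized recursion."""
    v = pi
    for y in ys:
        v = v @ A[y]
        v = v / (v @ final)            # normalize so that v @ final == 1
    return np.array([v @ Ay @ final for Ay in A])

# Similarity transform T: a quasi HMM with negative entries but the same string probabilities.
T = np.array([[1.0, 2.0], [-1.0, 1.0]])
Ti = np.linalg.inv(T)
A_q = [Ti @ Ay @ T for Ay in A]
pi_q, final_q = pi @ T, Ti @ final

ys = [0, 1, 1, 0]
p_pos = predict_next(pi, A, final, ys)
p_quasi = predict_next(pi_q, A_q, final_q, ys)
```

Both models give identical predictions, even though the state "probabilities" propagated inside the quasi model are not valid probabilities.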

Page 40: Bart Vanluyten

Finding motifs in DNA sequences

Find motifs in muscle-specific regulatory regions [Zhou, Wong]
• Make a motif model
• Make a quasi background model (see Section 3, Realization)
• Build a joint HMM
• Perform output estimation

Results (compared to results from Motifscanner [Aerts et al.])

[Figure: motif probability versus position for the motifs Mef-2, Myf, Sp-1, SRF and TEF]

Page 41: Bart Vanluyten

Conclusions

Two modifications to nonnegative matrix factorization
• Structured nonnegative matrix factorization
• Nonnegative matrix factorization without nonnegativity of the factors

Two relaxations of the positive realization problem for HMMs
• Quasi realization problem
• Approximate positive realization problem
Both methods were applied to modeling DNA sequences

We derive equivalence conditions for HMMs

We propose a new identification method for HMMs. The method was applied to modeling DNA sequences of the HIV virus

Quasi realizations suffice for several estimation problems. Quasi estimation methods were applied to finding motifs in DNA sequences

6. CONCLUSIONS

Conclusions—Further research—List of publications

Page 42: Bart Vanluyten


Further research

Matrix factorizations
• Develop a nonnegative matrix factorization with a nesting property (cf. SVD)

Hidden Markov models
• Investigate Markov models (a special case of hidden Markov models)
• Develop realization and identification methods that allow prior knowledge about the Markov chain to be incorporated
• Method to estimate the minimal order of a positive HMM from string probabilities
• Canonical forms of hidden Markov models
• Model reduction for hidden Markov models
• System theory for hidden Markov models with external inputs
• . . .

Page 43: Bart Vanluyten

List of publications

Journal papers

• B. Vanluyten, J.C. Willems and B. De Moor. Recursive Filtering using Quasi-Realizations. Lecture Notes in Control and Information Sciences, 341, 367–374, 2006.

• B. Vanluyten, J.C. Willems and B. De Moor. Equivalence of State Representations for Hidden Markov Models. Systems and Control Letters, 57(5), 410–419, 2008.

• B. Vanluyten, J.C. Willems and B. De Moor. Structured Nonnegative Matrix Factorization with Applications to Hidden Markov Realization and Filtering. Accepted for publication in Linear Algebra and its Applications, 2008.

• B. Vanluyten, J.C. Willems and B. De Moor. Nonnegative Matrix Factorization without Nonnegativity Constraints on the Factors. Submitted for publication.

• B. Vanluyten, J.C. Willems and B. De Moor. Approximate Realization and Estimation for Quasi hidden Markov models. Submitted for publication.

International conference papers

• I. Goethals, B. Vanluyten and B. De Moor. Reliable spurious mode rejection using self learning algorithms. In Proc. of the International Conference on Modal Analysis Noise and Vibration Engineering (ISMA 2004), Leuven, Belgium, pages 991–1003, 2004.

• B. Vanluyten, J.C. Willems and B. De Moor. Model Reduction of Systems with Symmetries. In Proc. of the 44th IEEE Conference on Decision and Control (CDC 2005), Seville, Spain, pages 826–831, 2005.

• B. Vanluyten, J.C. Willems and B. De Moor. Matrix Factorization and Stochastic State Representations. In Proc. of the 45th IEEE Conference on Decision and Control (CDC 2006), San Diego, California, pages 4188–4193, 2006.

• I. Markovsky, J. Boets, B. Vanluyten, K. De Cock and B. De Moor. When is a pole spurious? In Proc. of the International Conference on Noise and Vibration Engineering (ISMA 2007), Leuven, Belgium, pages 1615–1626, 2007.

• B. Vanluyten, J.C. Willems and B. De Moor. Equivalence of State Representations for Hidden Markov Models. In Proc. of the European Control Conference 2007 (ECC 2007), Kos, Greece, 2007.

• B. Vanluyten, J.C. Willems and B. De Moor. A new Approach for the Identification of Hidden Markov Models. In Proc. of the 46th IEEE Conference on Decision and Control (CDC 2007), New Orleans, Louisiana, 2007.
