
Applications of Hidden Markov Model: state-of-the-art

Marcin PIETRZYKOWSKI and Wojciech SAŁABUN

Department of Artificial Intelligence Methods and Applied Mathematics, Faculty of Computer

Science and Information Technology, West Pomeranian University of Technology, Szczecin,

ul. Żołnierska 49, 71-210 Szczecin, Poland E-mail: [email protected], [email protected]

Abstract

This paper presents a state-of-the-art literature review that classifies and interprets the ongoing and emerging issues associated with the Hidden Markov Model (HMM) over the last decade. The HMM is a commonly used method in many scientific areas. It is a temporal probabilistic model in which the state of the process is described by a single discrete random variable. The theory of HMMs was developed in the late 1960s. Today, it is especially known for its application in temporal pattern recognition, e.g. speech recognition, handwriting recognition, and bioinformatics. After a brief description of the study methodology, this paper compares the most important HMM publications by field of interest, most cited authors, authors' nationalities, and scientific journals. The comparison is based on papers indexed in the Institute for Scientific Information (ISI) Web of Knowledge and ScienceDirect databases.

Keywords: Markov Chains, Hidden Markov Model,

application areas, literature review.

1. Introduction

The Hidden Markov Model (HMM) is a statistical model named after the Russian mathematician Andrey Markov. HMMs form a large and useful class of stochastic processes. They are characterized by the Markov property, which means that the future state of the process depends only on the present state, not on the sequence of events that preceded it. The HMM was originally introduced by Baum and Petrie [4]. The first and foremost engineer-friendly work was an application to automatic speech recognition [56]. Markov models are very rich in mathematical structure and, when applied properly, work very well in practice for several applications. Hidden Markov Models are especially known for their application in temporal pattern recognition, such as speech, handwriting and gesture recognition, part-of-speech tagging, musical score following, partial discharges and bioinformatics. This paper provides a state-of-the-art literature survey of Hidden Markov Model applications and methodologies. A reference repository has been established based on a classification scheme; it includes 73 papers published in 42 scholarly journals from 2003 to 2012.

The rest of the paper is organized as follows. Section 2 presents a basic description of the HMM method and its main concepts, without detailed mathematical definitions; it shows only the fundamental formulas that are necessary to introduce the method. Section 3 presents the methodology used for paper selection, together with basic bibliographic parameters and statistics. Section 4 presents a set of the most important selected publications with respect to primary application areas and bibliographic parameters. Section 5 contains some concluding remarks.

2. Markov Model description

Consider a system which consists of a set of $N$ distinct states $S_1, S_2, \ldots, S_N$. At each discrete time moment $t$ the system is in exactly one state, denoted $q_t$. In the general case the current state $q_t$ depends on the previous state $q_{t-1}$ and on the whole history of all earlier states. In a Markov chain the history is truncated to just the predecessor state. Moreover, we consider a system in which the transition probabilities between states are constant in time:

$$a_{ij} = P(q_t = S_j \mid q_{t-1} = S_i), \quad 1 \le i, j \le N \tag{1}$$

The transition probability matrix is defined as $A = \{a_{ij}\}$, where $a_{ij} \ge 0$ and $\sum_{j=1}^{N} a_{ij} = 1$. This kind of stochastic process can be called an observable Markov Model, but it additionally needs the probability distribution for the moment $t = 1$. The initial distribution is denoted as:

$$\pi = \{\pi_i\}, \quad 1 \le i \le N \tag{2}$$

where $\sum_{i=1}^{N} \pi_i = 1$. The Markov Model is defined by the pair:

$$\lambda = (A, \pi) \tag{3}$$
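
As a minimal sketch of the pair defined in Eq. (3), the following Python fragment stores a transition matrix and an initial distribution as plain lists, checks the constraints stated with Eqs. (1) and (2), and samples a state sequence. The three-state matrix, its numbers and the helper names `is_stochastic` and `sample_chain` are illustrative assumptions, not taken from the paper; states are indexed from 0.

```python
import random

# Hypothetical 3-state chain; all numbers are illustrative only.
A = [
    [0.7, 0.2, 0.1],
    [0.3, 0.5, 0.2],
    [0.2, 0.3, 0.5],
]
pi = [0.6, 0.3, 0.1]


def is_stochastic(A, pi, tol=1e-9):
    """Check that every row of A and the vector pi are probability distributions."""
    rows_ok = all(
        all(a >= 0 for a in row) and abs(sum(row) - 1.0) < tol for row in A
    )
    pi_ok = all(p >= 0 for p in pi) and abs(sum(pi) - 1.0) < tol
    return rows_ok and pi_ok


def sample_chain(A, pi, T, rng):
    """Draw q_1..q_T (as 0-based state indices) from the model (A, pi)."""
    states = list(range(len(pi)))
    q = [rng.choices(states, weights=pi)[0]]  # q_1 is drawn from pi
    for _ in range(T - 1):
        # The next state depends only on the current one (Markov property).
        q.append(rng.choices(states, weights=A[q[-1]])[0])
    return q


if __name__ == "__main__":
    print(is_stochastic(A, pi))
    print(sample_chain(A, pi, 10, random.Random(0)))
```

Passing an explicit `random.Random` instance keeps the sampling reproducible without touching the global random state.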

Wojciech Salabun et al, Int.J.Computer Technology & Applications,Vol 5 (4),1384-1391

IJCTA | July-August 2014 Available [email protected]

1384

ISSN:2229-6093

Given a specified Markov Model, three interesting questions can be asked (and answered):

1. What is the probability of a sequence of observations $O$, e.g. $O = \{S_1, S_3, S_4, S_4, S_1, S_9\}$?

2. What is the probability distribution over all states at $t = T$ (after $T - 1$ moments have passed)?

3. What is the probability of staying in a fixed state $S_i$ for exactly $d$ successive moments, given that the system is currently in that state and given the model (where the observation sequence is defined as $O = \{S_i, S_i, \ldots, S_i, S_j\}$, with $S_i$ at positions $1, 2, \ldots, d$ and $S_j \neq S_i$ at position $d + 1$)?

This paper describes the method only briefly. Answers to the above questions and solutions to the basic problems (shown later in this section) are not described here but can be found in the literature.
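
For an observable Markov Model the three questions above have short closed-form answers, sketched below in Python. The model values and the function names are assumptions made for illustration only, and states are indexed from 0 rather than 1. The third function uses the standard geometric form $a_{ii}^{d-1}(1 - a_{ii})$ for the duration probability.

```python
# Hypothetical 3-state model; the numbers are illustrative only.
A = [
    [0.7, 0.2, 0.1],
    [0.3, 0.5, 0.2],
    [0.2, 0.3, 0.5],
]
pi = [0.6, 0.3, 0.1]


def sequence_probability(A, pi, seq):
    """Question 1: probability of an observed state sequence given (A, pi)."""
    p = pi[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        p *= A[prev][cur]
    return p


def state_distribution(A, pi, T):
    """Question 2: distribution over states at t = T, i.e. after T - 1 transitions."""
    n = len(pi)
    dist = list(pi)
    for _ in range(T - 1):
        dist = [sum(dist[i] * A[i][j] for i in range(n)) for j in range(n)]
    return dist


def duration_probability(A, i, d):
    """Question 3: probability of exactly d successive moments in state i,
    given that the system is currently in state i."""
    return A[i][i] ** (d - 1) * (1.0 - A[i][i])


if __name__ == "__main__":
    print(sequence_probability(A, pi, [0, 0, 1]))
    print(state_distribution(A, pi, 5))
    print(duration_probability(A, 0, 3))
```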

In a Markov Model, the states of the model correspond to observable events. In a Hidden Markov Model the states are hidden and not observable; we can only see the sequence of observations. The set of observation symbols is finite and contains $M$ distinct elements. The observation symbols correspond to the physical output of the system being modeled. We denote the set of symbols as $V = \{v_1, v_2, \ldots, v_M\}$. An HMM is a doubly embedded stochastic process. The first process determines transitions from one state to another and is identical to the process described above. The second stochastic process produces the sequence of observations. The observation symbol probability distribution is defined by the matrix $B = \{b_j(k)\}$, where:

$$b_j(k) = P(v_k \text{ at } t \mid q_t = S_j), \quad 1 \le j \le N, \; 1 \le k \le M \tag{4}$$

Each row of the matrix contains the distribution of the observation symbols for a single state $j$. The HMM is defined as a triplet:

$$\lambda = (A, B, \pi) \tag{5}$$
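
To make the doubly embedded process concrete, the sketch below generates a hidden state sequence and the corresponding observations from a triplet as in Eq. (5). The two-state, three-symbol model and the name `sample_hmm` are hypothetical choices for illustration; states and symbols are 0-based indices.

```python
import random

# Hypothetical discrete HMM with N = 2 states and M = 3 symbols.
A = [[0.9, 0.1], [0.2, 0.8]]            # state transition probabilities
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]  # row j: symbol distribution in state j
pi = [0.5, 0.5]                         # initial state distribution


def sample_hmm(A, B, pi, T, rng):
    """Return (states, observations), each of length T."""
    state_ids = list(range(len(pi)))
    symbol_ids = list(range(len(B[0])))
    states, observations = [], []
    q = rng.choices(state_ids, weights=pi)[0]
    for _ in range(T):
        states.append(q)
        # Second process: emit a symbol according to row q of B.
        observations.append(rng.choices(symbol_ids, weights=B[q])[0])
        # First process: move to the next hidden state according to row q of A.
        q = rng.choices(state_ids, weights=A[q])[0]
    return states, observations


if __name__ == "__main__":
    hidden, visible = sample_hmm(A, B, pi, 8, random.Random(1))
    print(hidden)   # not available to an observer of a real system
    print(visible)  # the only data an observer sees
```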

There are three basic problems of interest that must be solved for the model to be useful in real-world applications:

1. Given a sequence of observations $O = O_1, O_2, \ldots, O_T$ and a model $\lambda$, what is the probability of the sequence given the model, $P(O \mid \lambda)$?

2. Given a sequence of observations $O = O_1, O_2, \ldots, O_T$ and a model $\lambda$, how do we choose the best state sequence $Q = q_1, q_2, \ldots, q_T$ that corresponds to the observation sequence $O$?

3. Given a sequence of observations $O = O_1, O_2, \ldots, O_T$ and knowing $M$ and $N$, how do we tune the model, i.e. how do we choose the best content for the triplet $\lambda = (A, B, \pi)$ in order to maximize $P(O \mid \lambda)$?

The above problems can be solved with the following methods, respectively:

1. Forward-Backward Algorithm

2. Viterbi Algorithm

3. Baum-Welch reestimation procedure
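
The first two methods in the list above can be sketched in a few lines for a discrete HMM. The code below is a hedged illustration rather than a reference implementation: it shows only the forward half of the Forward-Backward Algorithm (sufficient for problem 1) and an unscaled Viterbi recursion. The model values and the function names are assumptions, and the Baum-Welch procedure is omitted for brevity.

```python
# Hypothetical discrete HMM with 2 states and 3 symbols (0-based indices).
A = [[0.9, 0.1], [0.2, 0.8]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
pi = [0.5, 0.5]


def joint_probability(A, B, pi, states, obs):
    """P(O, Q | lambda) for one fixed state sequence Q."""
    p = pi[states[0]] * B[states[0]][obs[0]]
    for t in range(1, len(obs)):
        p *= A[states[t - 1]][states[t]] * B[states[t]][obs[t]]
    return p


def forward(A, B, pi, obs):
    """Problem 1: P(O | lambda) via the forward recursion."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
            for j in range(n)
        ]
    return sum(alpha)


def viterbi(A, B, pi, obs):
    """Problem 2: most likely state sequence and its joint probability."""
    n = len(pi)
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    backptr = []  # backptr[t - 1][j] = best predecessor of state j at time t
    for o in obs[1:]:
        step, new_delta = [], []
        for j in range(n):
            best = max(range(n), key=lambda i: delta[i] * A[i][j])
            step.append(best)
            new_delta.append(delta[best] * A[best][j] * B[j][o])
        backptr.append(step)
        delta = new_delta
    last = max(range(n), key=lambda i: delta[i])
    path = [last]
    for step in reversed(backptr):
        path.append(step[path[-1]])
    path.reverse()
    return path, delta[last]


if __name__ == "__main__":
    observations = [0, 2, 1]
    print(forward(A, B, pi, observations))
    print(viterbi(A, B, pi, observations))
```

Both recursions multiply many probabilities, so for long sequences a practical implementation would typically rescale the intermediate values or work with logarithms to avoid numerical underflow.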

The HMM described above can be called a non-parametric discrete HMM. Instead of the probabilities defined by the matrix $B$ we can use almost any parametric probability distribution, e.g. binomial, Gaussian, Poisson, etc. For example, the observation emission probability for the Poisson Discrete Hidden Markov Model is denoted as:

$$b_j(n) = \frac{e^{-\theta_j} \theta_j^{\,n}}{n!}, \quad n \in \mathbb{N} \tag{6}$$

where $b_j(n)$ is a Poisson model for state $j$ with parameter $\theta_j$. For more general models, e.g. multinomials, the parameter $\theta_j$ could be a vector.

When the observations are real-valued, the model is called a Continuous Hidden Markov Model. The discrete observation probability $b_j(k)$ is replaced by a continuous probability density function. For example, for the Gaussian Hidden Markov Model:

$$b_j(x) = \mathcal{N}(x, \mu_j, \sigma_j) \tag{7}$$

In both discrete and continuous Hidden Markov Models the use of mixture distributions is also possible. For example, the Gaussian Mixture HMM has the following form:

$$b_j(x) = \sum_{m=1}^{M} c_{jm}\, \mathcal{N}(x, \mu_{jm}, \Sigma_{jm}) \tag{8}$$


where $x$ is the vector being modeled and $c_{jm}$ is the mixture coefficient for the $m$-th mixture component in state $j$.
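
The three parametric emission models of Eqs. (6)-(8) can be written as small functions, sketched here for the scalar case only. The function and argument names are assumptions made for this illustration; in particular, `sigma` is taken to be a standard deviation, and a vector-valued $x$ would need a multivariate density instead.

```python
import math


def poisson_emission(n, theta):
    """Eq. (6): probability of count n in a state with Poisson parameter theta."""
    return math.exp(-theta) * theta ** n / math.factorial(n)


def gaussian_emission(x, mu, sigma):
    """Eq. (7), scalar case: normal density with mean mu and std. deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_emission(x, weights, mus, sigmas):
    """Eq. (8), scalar case: weighted sum of Gaussian densities for one state.

    weights are the mixture coefficients of that state and should sum to 1.
    """
    return sum(
        c * gaussian_emission(x, mu, sigma)
        for c, mu, sigma in zip(weights, mus, sigmas)
    )


if __name__ == "__main__":
    print(poisson_emission(2, 3.0))
    print(gaussian_emission(0.0, 0.0, 1.0))
    print(mixture_emission(0.5, [0.3, 0.7], [0.0, 2.0], [1.0, 0.5]))
```

Any of these functions can stand in for the lookup `B[j][k]` of the discrete model, which is why the forward and Viterbi recursions carry over unchanged to parametric and continuous HMMs.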

3. Study of the art

The literature review was undertaken to identify papers in the highest-ranking journals that provide the most valuable information to researchers and practitioners studying issues concerning the HMM. During the last ten years (2003-2012) many significant papers on the HMM were published. With this scope in mind, we conducted an extensive search for HMM in the title, abstract and keywords of scientific papers. We particularly targeted the ISI Web of Knowledge and Elsevier ScienceDirect databases. In this period, 11,081 papers were indexed in ISI Web of Knowledge and 11,764 papers were indexed in ScienceDirect. Table 1 shows the frequency distribution by publication year. Since 2006, more than 1,000 papers per year have been indexed in each database.

More than one quarter (27.81%) of all papers were published by U.S. researchers. This is slightly less than the combined share of Chinese, French, English and German scientists (30.85%). A little more than three-quarters (78.31%) of all papers were written by authors from the ten most productive countries. Table 2 shows detailed data on the most productive nationalities that participated in HMM publications.
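
The shares quoted in this paragraph can be re-derived from the article counts in Table 2 and the ISI Web of Knowledge total in Table 1. The short check below is only an arithmetic sketch; the helper name `share` and the dictionary layout are assumptions.

```python
TOTAL_ISI = 11081  # total number of papers indexed in ISI Web of Knowledge (Table 1)

# Article counts of the ten most productive countries (Table 2).
counts = {
    "USA": 3082, "China": 1438, "France": 703, "England": 684,
    "Germany": 594, "Canada": 575, "Japan": 575, "Australia": 354,
    "Italy": 338, "South Korea": 335,
}


def share(n, total=TOTAL_ISI):
    """Percentage of all indexed papers, rounded to two decimals."""
    return round(100.0 * n / total, 2)


usa = share(counts["USA"])
four = share(sum(counts[c] for c in ("China", "France", "England", "Germany")))
top_ten = share(sum(counts.values()))

if __name__ == "__main__":
    print(usa, four, top_ten)
```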

The 10 most popular sources published 2,164 scholarly papers, almost one-fifth (19.53%) of all published papers. Table 3 shows the number of scholarly papers by source. According to Table 3, Lecture Notes in Computer Science is the most popular source; it published 636 (5.74%) of the discussed HMM papers. In second place is the International Conference on Acoustics, Speech and Signal Processing, which published 389 (3.51%) papers on HMM.

The largest number of papers on HMM (49) was written by Pieczyński W., who is currently a Professor at Telecom SudParis (formerly Telecom INT). The results of his research greatly improve classification using HMMs for unsupervised data [10, 18, 52]. Professor Rigoll G. is also a leading scientist on HMM (45 papers). He is the head of the Institute for Human-Machine Communication, Technical University Munich. His paper on handwritten address recognition using HMM is the most frequently cited paper of his research [9]. Table 4 shows the number of scholarly papers by author.

The most often cited article was cited 4013 times. It describes improvements to SignalP, currently the most popular method for prediction of classically secreted proteins, which consists of predictors based on neural networks and HMMs [5]. The second is a paper on the SWISS-MODEL workspace, a web-based environment for protein structure homology modeling; it was cited 2021 times [2]. Finally, the third is an article on Pfam, published in a special issue of Nucleic Acids Research. Pfam is a comprehensive collection of protein domains and families, represented as multiple sequence alignments and as profile HMMs [20, 21].

4. Application areas

The last 10 years have seen a large number of major scientific papers, from the construction of an extensive database of genomic information to better denoising of signals. Below is a short list of the most important scientific achievements of the last decade in common application areas. We selected the 73 most important papers with respect to citation count. The lower bound on the citation count was set to 300, because we wanted to select only the most important scientific articles.

HMM is most widely used and most important in Genetics and Heredity [1, 7, 8, 16, 25, 28, 36, 37, 39, 43, 48, 49, 51, 60, 63, 68, 69, 70, 71, 74, 75] and Biochemistry and Molecular Biology [2, 3, 5, 6, 13, 15, 17, 27, 30, 31, 33, 35, 44, 50, 59, 64, 66].

Table 1: The distribution of papers by year of publication.

Year      ISI Web of Knowledge    ScienceDirect
2003                       746              635
2004                       866              687
2005                       946              839
2006                     1,158            1,064
2007                     1,293            1,113
2008                     1,284            1,167
2009                     1,448            1,400
2010                     1,114            1,441
2011                     1,139            1,549
2012                     1,087            1,869
Total:                  11,081           11,764

Table 2: The distribution of papers by authors' nationality.

No.    Country        No. of articles    Percent of all
1      USA                      3,082             27.81
2      China                    1,438             12.98
3      France                     703              6.34
4      England                    684              6.17
5      Germany                    594              5.36
6      Canada                     575              5.19
7      Japan                      575              5.19
8      Australia                  354              3.19
9      Italy                      338              3.05
10     South Korea                335              3.02
Total:                          8,678             78.31


For example, the PANTHER (Protein ANalysis THrough Evolutionary Relationships) database was proposed for high-throughput analysis of protein sequences. One of its key features is the use of statistical models (Hidden Markov Models). A separate HMM is built for each protein group. The advantage of using HMMs is that new sequences can be automatically classified as they become available. The HMMs have been used to classify gene products across entire genomes, such as the human genome [48, 49, 73, 74].

HMMs are very frequently used for prediction. For instance, a peptide predictor was presented in the paper "A combined transmembrane topology and signal peptide prediction method". This predictor is based on an HMM and models the different sequence regions of a signal peptide and the different regions of a transmembrane protein in a series of interconnected states [33].

HMM also has a very significant impact in Electricity, Electronics, Computer Science and Artificial Intelligence [9, 10, 11, 18, 24, 32, 42, 46, 52, 54, 55, 61, 62, 72] and in Mathematical and Computational Biology [26, 40, 41, 57, 65]. For example, the authors of [10] dealt with the statistical restoration of hidden discrete signals, extending the classical methodology based on HMM. The aim was to take into account the hidden signal and complex relationships between the noises, which can come from different parametric models, be non-independent, and be of a class-varying nature. In [65] the authors generalized the alignment of protein sequences with a profile Hidden Markov Model to the case of pairwise alignment of profile HMMs. They presented a method for detecting distant homologous relationships between proteins based on this approach. HMM was also used for the representation and recognition of human gait [32]. In the first method, the gait information in the sequences of frame-to-exemplar distance (FED) vectors was captured in an HMM. In the second method, referred to as the direct approach, the authors worked with the feature vector directly (as opposed to computing the FED) and trained an HMM. The HMM parameters (specifically the observation probability B) were estimated based on the distance between the exemplars and the image features. In this way, learning high-dimensional probability density functions was avoided. The statistical nature of HMMs lends overall robustness to representation and recognition [9, 72]. An HMM, MEME/MAST (Multiple EM for Motif Elicitation/Motif Alignment and Search Tool) and a hybrid model that combined two or more models were developed in [57]. As a result, a high prediction accuracy was obtained. Another interesting application of HMM is the Automatic Linguistic Indexing of Pictures (ALIP) system. The paper [42] introduces a statistical modeling approach to automatic linguistic indexing of pictures, an important but highly challenging problem for researchers in computer vision and content-based image retrieval. The authors implemented and tested their ALIP system using HMMs. The paper [55], in turn, describes a method for removing noise from digital images, based on a statistical model of the coefficients of an over-complete multiscale oriented basis. Neighborhoods of coefficients at adjacent positions and scales were modeled as the product of two independent random variables: a Gaussian vector and a hidden positive scalar multiplier.

Table 3: The distribution of papers by journals.

No.    Name of source                                                           No. of articles
1      Lecture Notes in Computer Science                                                    636
2      International Conference on Acoustics Speech and Signal Processing                   389
3      Bioinformatics                                                                       243
4      Lecture Notes in Artificial Intelligence                                             200
5      IEEE Transactions on Audio Speech and Language Processing                            188
6      BMC Bioinformatics                                                                   165
7      Nucleic Acids Research                                                               132
8      Speech Communication                                                                  77
9      IEEE Transactions on Pattern Analysis and Machine Intelligence                        76
10     BMC Genomics                                                                          58
Total:                                                                                    2,164

Table 4: The distribution of papers by author.

No.    Name of author    No. of articles
1      Pieczyński W.                  49
2      Rigoll G.                      45
3      Bunke H.                       42
4      Liu Y.                         42
5      Tokuda K.                      41
6      Kobayashi T.                   36
7      Carin L.                       34
8      Nakamura Y.                    32
9      Lee CH.                        31
10     Schuller B.                    31
Total:                               383


Other research areas include Pharmacology and Pharmacy [22, 23], Mechanics [29], Microbiology [34, 38], Biophysics [47, 73], Cell Biology [12], Neurosciences and Neurology [19] and Multidisciplinary Sciences [45, 53]. For instance, the authors of [47] developed an analysis scheme that casts single-molecule time-binned FRET (Fluorescence Resonance Energy Transfer) trajectories as HMMs.

5. Conclusions

This paper presents a state-of-the-art literature review that classifies and interprets the ongoing and emerging issues in applications of the HMM. Overall, the authors show that HMMs have been successfully applied to a wide range of application areas and industrial sectors with varying terms and subjects. The insights identified in this review will help channel research efforts and fulfill researchers' needs for easy reference to HMM publications.

6. References

[1] Abecasis, G. R., Wigginton, J. E.: Handling Marker-Marker linkage disequilibrium: Pedigree analysis with clustered Markers, AMERICAN JOURNAL OF HUMAN GENETICS, 77(5), pp. 754-767, Nov. 2005.

[2] Arnold, K., Bordoli, L., Kopp, J., et al.: The SWISS-MODEL workspace: a web-based environment for protein structure homology modelling, BIOINFORMATICS, 22(2), pp. 195-201, Jan. 2006.

[3] Babu, M. M., Luscombe, N. M., Aravind, L., et al.: Structure and evolution of transcriptional regulatory networks, CURRENT OPINION IN STRUCTURAL BIOLOGY, 14(3), pp. 283-291, Jun. 2004.

[4] Baum, L., Petrie, T.: Statistical inference for probabilistic functions of finite state Markov chains, ANNALS OF MATHEMATICAL STATISTICS, 37, pp. 1554-1563, 1966.

[5] Bendtsen, J. D., Nielsen, H., von Heijne, G., et al.: Improved prediction of signal peptides: SignalP 3.0, JOURNAL OF MOLECULAR BIOLOGY, 340(4), pp. 783-795, Jul. 2004.

[6] Bennett-Lovsey, R. M., Herbert, A. D., Sternberg, M. J., et al.: Exploring the extremes of sequence/structure space with ensemble fold recognition in the program Phyre, PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS, 70(3), pp. 611-625, Feb. 2008.

[7] Berriman, M., Haas, B. J., LoVerde, P. T., et al.: The genome of the blood fluke Schistosoma mansoni, NATURE, 460(7253), pp. 352-U65, Jul. 16 2009.

[8] Birney, E., Clamp, M., Durbin, R.: GeneWise and genomewise, GENOME RESEARCH, 14(5), pp. 988-995, May 2004.

[9] Brakensiek, A., Rigoll, G.: Handwritten address recognition using hidden Markov models, LECTURE NOTES IN COMPUTER SCIENCE, 2956, pp. 103-122, 2004.

[10] Brunel, N., Pieczyński, W.: Unsupervised signal restoration using hidden Markov chains with copulas, SIGNAL PROCESSING, 85(12), pp. 2304-2315, May 2005.

[11] Cappe, O., Godsill, S. J., Moulines, E.: An overview of existing methods and recent advances in sequential Monte Carlo, PROCEEDINGS OF THE IEEE, 95(5), pp. 899-924, May 2007.

[12] Carter, C., Pan, S. Q., Jan, Z. H., et al.: The vegetative vacuole proteome of Arabidopsis thaliana reveals predicted and unexpected proteins, PLANT CELL, 16(12), pp. 3285-3303, Dec. 2004.

[13] Chandonia, J. M., Hon, G., Walker, N. S., et al.: The ASTRAL Compendium in 2004, NUCLEIC ACIDS RESEARCH, 32(SI), pp. D189-D192, Jan. 2004.

[14] Cohen, I., Sebe, N., Garg, A., et al.: Facial expression recognition from video sequences: temporal and static modeling, COMPUTER VISION AND IMAGE UNDERSTANDING, 91(1-2), pp. 160-187, Jul.-Aug. 2003.

[15] Colella, S., Yau, C., Taylor, J. M., et al.: QuantiSNP: an Objective Bayes Hidden-Markov Model to detect and accurately map copy number variation using SNP genotyping data, NUCLEIC ACIDS RESEARCH, 35(6), pp. 2013-2025, Mar. 2007.

[16] Corander, J., Waldmann, P., Sillanpaa, M. J.: Bayesian analysis of genetic differentiation between populations, GENETICS, 163(1), pp. 367-374, Jan. 2003.

[17] D'Andrea, L. D., Regan, L.: TPR proteins: the versatile helix, TRENDS IN BIOCHEMICAL SCIENCES, 28(12), pp. 655-662, Dec. 2003.

[18] Derrode, S., Pieczyński, W.: Signal and image segmentation using Pairwise Markov chains, IEEE TRANSACTIONS ON SIGNAL PROCESSING, 52(9), pp. 2477-2489, Sep. 2004.

[19] Dombeck, D. A., Khabbaz, A. N., Collman, F., et al.: Imaging large-scale neural activity with cellular resolution in awake, mobile mice, NEURON, 56(1), pp. 43-57, Oct. 2007.

[20] Finn, R. D., Mistry, J., Tate, J., et al.: The Pfam protein families database, NUCLEIC ACIDS RESEARCH, 38(S1), pp. D211-D222, Jan. 2010.

[21] Finn, R. D., Tate, J., Mistry, J., et al.: The Pfam protein families database, NUCLEIC ACIDS RESEARCH, 36(SI), pp. D281-D288, Jan. 2008.

[22] Fredriksson, R., Lagerstrom, M. C., Lundin, L. G., et al.: The G-protein-coupled receptors in the human genome form five main families. Phylogenetic analysis, paralogon groups, and fingerprints, MOLECULAR PHARMACOLOGY, 63(6), pp. 1256-1272, Jun. 2003.

[23] Fredriksson, R., Schioth, H. B.: The repertoire of G-protein-coupled receptors in fully sequenced genomes, MOLECULAR PHARMACOLOGY, 67(5), pp. 1414-1425, May 2005.

[24] Fridlyand, J., Snijders, A. M., Pinkel, D., et al.: Hidden Markov models approach to the analysis of array CGH data, JOURNAL OF MULTIVARIATE ANALYSIS, 90(1), pp. 132-153, Jul. 2004.

[25] Gerstein, M. B., Bruce, C., Rozowsky, J. S., et al.: What is a gene, post-ENCODE? History and updated definition, GENOME RESEARCH, 17(6), pp. 669-681, Jun. 2007.

[26] Haft, D. H., Selengut, J., Mongodin, E. F., et al.: A guild of 45 CRISPR-associated (Cas) protein families and multiple CRISPR/Cas subtypes exist in prokaryotic genomes, PLOS COMPUTATIONAL BIOLOGY, 1(6), pp. 474-483, Nov. 2005.

[27] Haft, D. H., Selengut, J. D., White, O.: The TIGRFAMs database of protein families, NUCLEIC ACIDS RESEARCH, 31(1), pp. 371-373, Jan. 2003.

[28] Hoggart, C. J., Parra, E. J., Shriver, M. D., et al.: Control of confounding of genetic associations in stratified populations, AMERICAN JOURNAL OF HUMAN GENETICS, 72(6), pp. 1492-1504, Jun. 2003.

[29] Jardine, A. K., Lin, D., Banjevic, D.: A review on machinery diagnostics and prognostics implementing condition-based maintenance, MECHANICAL SYSTEMS AND SIGNAL PROCESSING, 20(7), pp. 1483-1510, Oct. 2006.

[30] Juncker, A. S., Willenbrock, H., Von Heijne, G., et al.: Prediction of lipoprotein signal peptides in Gram-negative bacteria, PROTEIN SCIENCE, 12(8), pp. 1652-1662, Aug. 2003.

[31] Käll, L., Krogh, A., Sonnhammer, E. L.: Advantages of combined transmembrane topology and signal peptide prediction - the Phobius web server, NUCLEIC ACIDS RESEARCH, 35(S), pp. W429-W432, Jul. 2007.

[32] Kale, A., Sundaresan, A., Rajagopalan, A. N., et al.: Identification of humans using gait, IEEE TRANSACTIONS ON IMAGE PROCESSING, 13(9), pp. 1163-1173, Sep. 2004.

[33] Käll, L., Krogh, A., Sonnhammer, E. L.: A combined transmembrane topology and signal peptide prediction method, JOURNAL OF MOLECULAR BIOLOGY, 338(5), pp. 1027-1036, May 2004.

[34] Kazmierczak, M. J., Mithoe, S. C., Boor, K. J., et al.: Listeria monocytogenes sigma(B) regulates stress response and virulence functions, JOURNAL OF BACTERIOLOGY, 185(19), pp. 5722-5734, Oct. 2003.

[35] Kim, D. E., Chivian, D., Baker, D.: Protein structure prediction and analysis using the Robetta server, NUCLEIC ACIDS RESEARCH, 32(S2), pp. W526-W531, Jul. 2004.

[36] Kim, H., Melen, K., Osterberg, M., et al.: A global topology map of the Saccharomyces cerevisiae membrane proteome, PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 103(30), pp. 11142-11147, Jul. 2006.

[37] Korn, J. M., Kuruvilla, F. G., McCarroll, S. A., et al.: Integrated genotype calling and association analysis of SNPs, common copy number polymorphisms and rare CNVs, NATURE GENETICS, 40(10), pp. 1253-1260, Oct. 2008.

[38] la Cour, T., Kiemer, L., Molgaard, A., et al.: Analysis and prediction of leucine-rich nuclear export signals, PROTEIN ENGINEERING DESIGN AND SELECTION, 17(6), pp. 527-536, Jun. 2004.

[39] Lagesen, K., Hallin, P., Rodland, E. A., et al.: RNAmmer: consistent and rapid annotation of ribosomal RNA genes, NUCLEIC ACIDS RESEARCH, 35(9), pp. 3100-3108, May 2007.

[40] Lai, W. R., Johnson, M. D., Kucherlapati, R., et al.: Comparative analysis of algorithms for identifying amplifications and deletions in array CGH data, BIOINFORMATICS, 21(19), pp. 3763-3770, Oct. 2005.

[41] Leslie, C. S., Eskin, E., Cohen, A., et al.: Mismatch string kernels for discriminative protein classification, BIOINFORMATICS, 20(4), pp. 467-476, Mar. 2004.

[42] Li, J., Wang, J. Z.: Automatic linguistic indexing of pictures by a statistical modeling approach, IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 25(9), pp. 1075-1088, Sep. 2003.

[43] Li, N., Stephens, M.: Modeling linkage disequilibrium and identifying recombination hotspots using single-nucleotide polymorphism data, GENETICS, 165(4), pp. 2213-2233, Dec. 2003.

[44] Liu, G. Y., Loraine, A. E., Shigeta, R., et al.: NetAffx: Affymetrix probesets and annotations, NUCLEIC ACIDS RESEARCH, 31(1), pp. 82-86, Jan. 2003.

[45] Loytynoja, A., Goldman, N.: An algorithm for progressive multiple alignment of sequences with insertions, PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 102(30), pp. 10557-10562, Jul. 2005.


[46] Markou, M., Singh, S.: Novelty detection: a

review - part 1: statistical approaches, SIGNAL

PROCESSING, 83(12), pp. 2481-2497, Dec. 2003.

[47] McKinney, S. A., Joo, C., Ha, T.: Analysis of

single-molecule FRET trajectories using hidden

Markov modeling, BIOPHYSICAL JOURNAL,

91(5), pp. 1941-1951, Sep. 2006.

[48] Mi, H., Guo, N., Kejariwal, A., et al.:

PANTHER version 6: protein sequence and func-

tion evolution data with expanded representation of

biological pathways, NUCLEIC ACIDS RE-

SEARCH, 35(SI), pp. D247-D252, Jan. 2007.

[49] Mi, H. Y., Lazareva-Ulitsky, B., Loo, R., et

al.: The PANTHER database of protein families,

subfamilies, functions and pathways, NUCLEIC

ACIDS RESEARCH, 33(SI), pp. D284-D288, Jan.

2005.

[50] Nielsen, M., Lundegaard, C., Worning, P., et al.: Reliable prediction of T-cell epitopes using neural networks with novel sequence representations, PROTEIN SCIENCE, 12(5), pp. 1007-1017, May 2003.

[51] Patterson, T. A., Thomas, L., Wilcox, C., et al.: State-space models of individual animal movement, TRENDS IN ECOLOGY AND EVOLUTION, 23(2), pp. 87-94, Feb. 2008.

[52] Pieczyński, W.: Pairwise Markov chains, IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 25(5), pp. 634-639, May 2003.

[53] Pinto, D., Pagnamenta, A. T., Klei, L., et al.: Functional impact of global rare copy number variation in autism spectrum disorders, NATURE, 466(7304), pp. 368-372, Jul. 15 2010.

[54] Po, D. D., Do, M. N.: Directional multiscale modeling of images using the contourlet transform, IEEE TRANSACTIONS ON IMAGE PROCESSING, 15(6), pp. 1610-1620, Jun. 2006.

[55] Portilla, J., Strela, V., Wainwright, M. J., et al.: Image denoising using scale mixtures of Gaussians in the wavelet domain, IEEE TRANSACTIONS ON IMAGE PROCESSING, 12(11), pp. 1338-1351, Nov. 2003.

[56] Rabiner, L. R.: A tutorial on hidden Markov models and selected applications in speech recognition, PROCEEDINGS OF THE IEEE, 77(2), pp. 257-286, 1989.

[57] Rashid, M., Saha, S., Raghava, G. P.: Support Vector Machine-based method for predicting subcellular localization of mycobacterial proteins using evolutionary information and motifs, BMC BIOINFORMATICS, 8(337), Sep. 2007.

[58] Riley, T., Sontag, E., Chen, P., et al.: Transcriptional control of human p53-regulated genes, NATURE REVIEWS MOLECULAR CELL BIOLOGY, 9(5), pp. 402-412, May 2008.

[59] Sadreyev, R., Grishin, N.: COMPASS: A tool for comparison of multiple protein alignments with assessment of statistical significance, JOURNAL OF MOLECULAR BIOLOGY, 326(1), pp. 317-336, Feb. 2003.

[60] Scheet, P., Stephens, M.: A fast and flexible statistical model for large-scale population genotype data: Applications to inferring missing genotypes and haplotypic phase, AMERICAN JOURNAL OF HUMAN GENETICS, 78(4), pp. 629-644, Apr. 2006.

[61] Sheikh, H. R., Bovik, A. C.: Image information and visual quality, IEEE TRANSACTIONS ON IMAGE PROCESSING, 15(2), pp. 430-444, Feb. 2006.

[62] Sheikh, H. R., Bovik, A. C., de Veciana, G.: An information fidelity criterion for image quality assessment using natural scene statistics, IEEE TRANSACTIONS ON IMAGE PROCESSING, 14(12), pp. 2117-2128, Dec. 2005.

[63] Siepel, A., Bejerano, G., Pedersen, J. S., et al.: Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes, GENOME RESEARCH, 15(8), pp. 1034-1050, Aug. 2005.

[64] Siepel, A., Haussler, D.: Phylogenetic estimation of context-dependent substitution rates by maximum likelihood, MOLECULAR BIOLOGY AND EVOLUTION, 21(3), pp. 468-488, Mar. 2004.

[65] Soding, J.: Protein homology detection by HMM-HMM comparison, BIOINFORMATICS, 21(7), pp. 951-960, Apr. 2005.

[66] Soding, J., Biegert, A., Lupas, A. N.: The HHpred interactive server for protein homology detection and structure prediction, NUCLEIC ACIDS RESEARCH, 33(2), pp. W244-W248, Jul. 2005.

[67] Thomas, P. D., Campbell, M. J., Kejariwal, A., et al.: PANTHER: A library of protein families and subfamilies indexed by function, GENOME RESEARCH, 13(9), pp. 2129-2141, Sep. 2003.

[68] Thomas, P. D., Kejariwal, A., Campbell, M. J., et al.: PANTHER: a browsable database of gene products organized by biological function, using curated protein family and subfamily classification, NUCLEIC ACIDS RESEARCH, 31(1), pp. 334-341, Jan. 2003.

[69] Tunnacliffe, A., Wise, M. J.: The continuing conundrum of the LEA proteins, NATURWISSENSCHAFTEN, 94(10), pp. 791-812, Oct. 2007.

[70] Vassilatis, D. K., Hohmann, J. G., Zeng, H., et al.: The G protein-coupled receptor repertoires of human and mouse, PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 100(8), pp. 4903-4908, Apr. 2003.

[71] Wang, K., Li, M., Hadley, D., et al.: PennCNV: An integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data, GENOME RESEARCH, 17(11), pp. 1665-1674, Nov. 2007.

[72] Wang, L. A., Hu, W. M., Tan, T. N.: Recent developments in human motion analysis, PATTERN RECOGNITION, 36(3), pp. 585-601, Mar. 2003.

[73] Whisstock, J. C., Lesk, A. M.: Prediction of protein function from protein sequence and structure, QUARTERLY REVIEWS OF BIOPHYSICS, 36(3), pp. 307-340, Aug. 2003.

[74] Xu, R., Wunsch, D.: Survey of clustering algorithms, IEEE TRANSACTIONS ON NEURAL NETWORKS, 16(3), pp. 645-678, May 2005.

[75] Zhang, Z. M., Henzel, W. J.: Signal peptide prediction based on analysis of experimentally verified cleavage sites, PROTEIN SCIENCE, 13(10), pp. 2819-2824, Oct. 2004.
