Applications of Hidden Markov Model: state-of-the-art
Marcin PIETRZYKOWSKI and Wojciech SAŁABUN
Department of Artificial Intelligence Methods and Applied Mathematics, Faculty of Computer
Science and Information Technology, West Pomeranian University of Technology, Szczecin,
ul. Żołnierska 49, 71-210 Szczecin, Poland E-mail: [email protected], [email protected]
Abstract
This paper performs a state-of-the-art literature review to classify and interpret the ongoing and emerging issues associated with the Hidden Markov Model (HMM) in the last decade. HMM is a commonly used method in many scientific areas. It is a temporal probabilistic model in which the state of the process is described by a single discrete random variable. The theory of HMMs was developed in the late 1960s. Today, it is especially known for its application in temporal pattern recognition, e.g. speech, handwriting, and bioinformatics. After a brief description of the study methodology, this paper comprehensively compares the most important HMM publications by field of interest, most cited authors, authors' nationalities, and scientific journals. The comparison is based on papers indexed in the Institute for Scientific Information (ISI) Web of Knowledge and ScienceDirect databases.
Keywords: Markov Chains, Hidden Markov Model,
application areas, literature review.
1. Introduction
Hidden Markov Model (HMM) is a statistical model named after the Russian mathematician Andrey Markov. It is built on Markov processes, a large and useful class of stochastic processes characterized by the Markov property: the future state of the process depends only on the present state, not on the sequence of events that preceded it. HMM was originally introduced by Baum and Petrie [4]. One of the first and foremost engineer-friendly works described its application to automatic speech recognition [56]. Markov Models are very rich in mathematical structure and, when applied properly, work very well in practice in several applications. Hidden Markov Models are especially known for their application in temporal pattern recognition such as speech, handwriting and gesture recognition, part-of-speech tagging, musical score following, partial discharges and bioinformatics. This paper provides a state-of-the-art literature survey on Hidden Markov Model applications and methodologies. A reference repository has been established based on a classification scheme, which includes 73 papers published in 42 scholarly journals from 2003 to 2012.
The rest of the paper is organized as follows. Section 2 presents the basic concepts of the HMM method without detailed mathematical definitions; only the fundamental formulas necessary to introduce the method are shown. Section 3 presents the methodology used for paper selection, together with basic bibliographic parameters and statistics. Section 4 then presents a set of the most important selected publications with respect to primary application areas and bibliographic parameters. Section 5 contains concluding remarks.
2. Markov Model description
Consider a system that consists of a set of $N$ distinct states $\{S_1, S_2, \dots, S_N\}$. At each discrete time moment $t$ the system is in a single state, denoted $q_t$. In the general case, the current state $q_t$ depends on the previous state $q_{t-1}$ and on the whole history of earlier states. In the case of a Markov chain, the history is truncated to just the predecessor state. Moreover, we consider a system in which the transition probabilities between states are constant in time:
$$ a_{ij} = P(q_t = S_j \mid q_{t-1} = S_i), \qquad 1 \le i, j \le N \tag{1} $$
The transition probability matrix is defined as $A = \{a_{ij}\}$, where $a_{ij} \ge 0$ and $\sum_{j=1}^{N} a_{ij} = 1$. Such a stochastic process could be called an observable Markov Model, but it additionally needs the probability distribution for the moment $t = 1$. The initial distribution is denoted as:
$$ \pi = \{\pi_i\}, \quad \pi_i = P(q_1 = S_i), \qquad 1 \le i \le N \tag{2} $$
where $\sum_{i=1}^{N} \pi_i = 1$. The Markov Model is defined by a pair:
$$ \lambda = (A, \pi) \tag{3} $$
Wojciech Salabun et al, Int.J.Computer Technology & Applications,Vol 5 (4),1384-1391
IJCTA | July-August 2014 Available [email protected]
1384
ISSN:2229-6093
Given a specified Markov Model, three interesting questions can be asked (and answered):
1. What is the probability of the sequence of observations $O$, e.g. $O = \{S_1, S_3, S_4, S_4, S_1, S_9\}$?
2. What is the probability distribution over all states at $t = T$ (after $T - 1$ moments have passed)?
3. What is the probability of staying in a fixed state $S_i$ for exactly $d$ successive moments, given that the system is currently in that state and given the model (where the observation sequence is defined as $O = \{q_1 = S_i, q_2 = S_i, \dots, q_d = S_i, q_{d+1} = S_j \neq S_i\}$)?
This paper describes the method only briefly. Answers to the above questions and solutions to the basic problems (listed later in this section) are not derived here but can be found in the relevant literature.
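For a concrete picture of the three questions above, the standard textbook answers for an observable Markov model can be sketched with NumPy. This is a minimal sketch under stated assumptions: the function names and the 3-state model are hypothetical, states are 0-based indices (index 0 stands for $S_1$), and question 3 uses the usual geometric duration result $a_{ii}^{d-1}(1 - a_{ii})$.

```python
import numpy as np

def sequence_probability(A, pi, states):
    """Question 1: P(O | A, pi) for a fully observed state sequence.

    `states` holds 0-based indices (hypothetical convention: 0 stands for S_1).
    P(O) = pi[q_1] * a[q_1, q_2] * ... * a[q_{T-1}, q_T].
    """
    p = pi[states[0]]
    for prev, cur in zip(states[:-1], states[1:]):
        p *= A[prev, cur]          # a_ij = P(q_t = S_j | q_{t-1} = S_i)
    return p

def state_distribution(A, pi, T):
    """Question 2: P(q_T = S_i) for every i, i.e. the row vector pi A^(T-1)."""
    return pi @ np.linalg.matrix_power(A, T - 1)

def duration_probability(A, i, d):
    """Question 3: stay in S_i for exactly d moments, then leave:
    a_ii^(d-1) * (1 - a_ii)."""
    return A[i, i] ** (d - 1) * (1.0 - A[i, i])

# Hypothetical 3-state model; each row of A and the vector pi sum to 1.
A = np.array([[0.4, 0.3, 0.3],
              [0.2, 0.6, 0.2],
              [0.1, 0.1, 0.8]])
pi = np.array([0.2, 0.3, 0.5])

print(sequence_probability(A, pi, [2, 2, 2]))   # 0.5 * 0.8 * 0.8, about 0.32
print(state_distribution(A, pi, 3))
print(duration_probability(A, 2, 3))            # 0.8**2 * 0.2, about 0.128
```

The matrix power in question 2 is the simplest form; for very large $T$ one would iterate the row vector instead, but for a survey-level illustration the closed form keeps the link to $A$ and $\pi$ visible.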
In a Markov Model, the states of the model correspond to observable events. In a Hidden Markov Model, the states are hidden and not observable; we can only see the sequence of observations. The set of observation symbols is finite and contains $M$ distinct elements. The observation symbols correspond to the physical output of the system being modeled. We denote the set of symbols as $V = \{v_1, v_2, \dots, v_M\}$. An HMM is a doubly embedded stochastic process. The first process determines transitions from one state to another and is identical to the process described above. The second stochastic process produces the sequence of observations. The observation symbol probability distribution is defined by the matrix $B = \{b_j(k)\}$, where:
$$ b_j(k) = P(v_k \text{ at } t \mid q_t = S_j), \qquad 1 \le j \le N, \; 1 \le k \le M \tag{4} $$
Each row of the matrix contains the distribution of the observation symbols for a single state $j$. The HMM is defined as a triplet:
$$ \lambda = (A, B, \pi) \tag{5} $$
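The "doubly embedded" structure is easiest to see by generating data from $\lambda = (A, B, \pi)$. The sketch below is a hypothetical illustration (the function name `sample_hmm` and the 2-state, 3-symbol model are assumptions, not taken from the surveyed papers): the hidden chain is drawn from $\pi$ and $A$, and at each moment a symbol index is emitted from the row of $B$ belonging to the current hidden state.

```python
import numpy as np

def sample_hmm(A, B, pi, T, rng=None):
    """Draw hidden states q_1..q_T and observation indices O_1..O_T from (A, B, pi).

    First stochastic process: Markov chain over hidden states (pi, then rows of A).
    Second stochastic process: at each moment t, emit symbol index k with
    probability b_{q_t}(k) = B[q_t, k].
    """
    rng = np.random.default_rng() if rng is None else rng
    N, M = B.shape
    states = np.empty(T, dtype=int)
    obs = np.empty(T, dtype=int)
    for t in range(T):
        p_state = pi if t == 0 else A[states[t - 1]]
        states[t] = rng.choice(N, p=p_state)     # hidden transition
        obs[t] = rng.choice(M, p=B[states[t]])   # visible emission
    return states, obs

# Hypothetical 2-state, 3-symbol model (symbols v_1..v_3 map to indices 0..2).
A = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]])
pi = np.array([0.5, 0.5])
states, obs = sample_hmm(A, B, pi, T=10, rng=np.random.default_rng(0))
print(obs)   # only this sequence is observable; `states` stays hidden in practice
```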
There are three basic problems of interest that must be solved for the model to be useful in real-world applications:
1. Given a sequence of observations $O = \{O_1, O_2, \dots, O_T\}$ and a model $\lambda$, what is the probability of the sequence given the model, $P(O \mid \lambda)$?
2. Given a sequence of observations $O = \{O_1, O_2, \dots, O_T\}$ and a model $\lambda$, how do we choose the best state sequence $Q = \{q_1, q_2, \dots, q_T\}$ that corresponds to the observation sequence $O$?
3. Given a sequence of observations $O = \{O_1, O_2, \dots, O_T\}$ and knowing $M$ and $N$, how do we tune the model, i.e. choose the best content of the triplet $\lambda = (A, B, \pi)$, in order to maximize $P(O \mid \lambda)$?
The above problems can be solved with the following methods, respectively:
1. Forward-Backward Algorithm
2. Viterbi Algorithm
3. Baum-Welch re-estimation procedure
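The three methods listed above can be sketched compactly in NumPy for a discrete HMM $\lambda = (A, B, \pi)$. This is a minimal, unscaled sketch under stated assumptions: the function names and the toy model are hypothetical, the forward pass alone already yields $P(O \mid \lambda)$ (the backward variables are needed for Baum-Welch), and because $\alpha$ and $\beta$ are not rescaled they underflow on long sequences, where practical implementations use scaling or log-space arithmetic.

```python
import numpy as np

def forward(A, B, pi, obs):
    """alpha[t, i] = P(O_1..O_t, q_t = S_i | lambda); P(O | lambda) = alpha[-1].sum()."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha

def backward(A, B, obs):
    """beta[t, i] = P(O_{t+1}..O_T | q_t = S_i, lambda), with beta[T-1, i] = 1."""
    T, N = len(obs), A.shape[0]
    beta = np.ones((T, N))
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta

def viterbi(A, B, pi, obs):
    """Problem 2: most likely hidden state sequence, computed with log-probabilities."""
    T, N = len(obs), len(pi)
    with np.errstate(divide="ignore"):       # log(0) = -inf is acceptable here
        logA, logB, logpi = np.log(A), np.log(B), np.log(pi)
    delta = logpi + logB[:, obs[0]]
    psi = np.zeros((T, N), dtype=int)        # back-pointers
    for t in range(1, T):
        scores = delta[:, None] + logA       # scores[i, j]: best path into i, then i -> j
        psi[t] = np.argmax(scores, axis=0)
        delta = scores[psi[t], np.arange(N)] + logB[:, obs[t]]
    path = [int(np.argmax(delta))]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1]

def baum_welch_step(A, B, pi, obs):
    """Problem 3: one Baum-Welch re-estimation of (A, B, pi) from one sequence."""
    obs = np.asarray(obs)
    alpha, beta = forward(A, B, pi, obs), backward(A, B, obs)
    likelihood = alpha[-1].sum()
    gamma = alpha * beta / likelihood        # gamma[t, i] = P(q_t = S_i | O, lambda)
    # xi[t, i, j] = P(q_t = S_i, q_{t+1} = S_j | O, lambda)
    xi = (alpha[:-1, :, None] * A[None, :, :]
          * (B[:, obs[1:]].T * beta[1:])[:, None, :]) / likelihood
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B, dtype=float)
    for k in range(B.shape[1]):
        new_B[:, k] = gamma[obs == k].sum(axis=0)
    new_B /= gamma.sum(axis=0)[:, None]
    return new_A, new_B, new_pi

# Hypothetical 2-state, 3-symbol model and a short observation sequence.
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
pi = np.array([0.6, 0.4])
obs = [0, 1, 2]
print(forward(A, B, pi, obs)[-1].sum())   # P(O | lambda), about 0.03628
print(viterbi(A, B, pi, obs))             # [0, 0, 1]
A1, B1, pi1 = baum_welch_step(A, B, pi, obs)
```

Repeating `baum_welch_step` until $P(O \mid \lambda)$ stops increasing gives the usual iterative training loop; each step never decreases the likelihood, which is the property the re-estimation procedure is built on.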
The HMM described above can be called a non-parametric discrete HMM. Instead of the probabilities defined by the matrix $B$, almost any parametric probability distribution can be used, e.g. binomial, Gaussian, Poisson, etc. For example, the observation emission probability for a Poisson Discrete Hidden Markov Model is denoted as:
$$ b_j(n) = \frac{\lambda_j^{\,n}\, e^{-\lambda_j}}{n!}, \qquad n \in \mathbb{N} \tag{6} $$
where $b_j(n)$ is a Poisson model for state $j$ with parameter $\lambda_j$. For more general models, e.g. multinomials, the parameter $\lambda_j$ could be a vector.
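The Poisson emission of Eq. (6) replaces a row of $B$ by a single rate per state. A minimal sketch, assuming hypothetical rates for a 2-state model; the parameter is called `rate` in the code because `lambda` is a reserved word in Python.

```python
import math

def poisson_emission(n, rate):
    """b_j(n) = rate**n * exp(-rate) / n! for a non-negative integer count n.

    `rate` plays the role of the Poisson parameter of state j.
    """
    if n < 0:
        raise ValueError("counts must be non-negative")
    return rate ** n * math.exp(-rate) / math.factorial(n)

# Hypothetical per-state rates for a 2-state Poisson HMM.
rates = [1.5, 6.0]
count = 4
emissions = [poisson_emission(count, r) for r in rates]   # b_1(4), b_2(4)
print(emissions)
```

These values can be used wherever the discrete algorithms read `B[:, obs[t]]`, i.e. as the per-state emission vector for the count observed at moment $t$.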
When the observations are real-valued, the model is called a Continuous Hidden Markov Model. The discrete observation probability $b_j(k)$ is replaced by a continuous probability density function. For example, for a Gaussian Hidden Markov Model:
$$ b_j(x) = \mathcal{N}(x; \mu_j, \Sigma_j) \tag{7} $$
In both discrete and continuous Hidden Markov Models, mixture probabilities can also be used. For example, a Gaussian Mixture HMM has the following form:
$$ b_j(x) = \sum_{m=1}^{M} c_{jm}\, \mathcal{N}(x; \mu_{jm}, \Sigma_{jm}) \tag{8} $$
where $x$ is the vector being modeled and $c_{jm}$ is the mixture coefficient for the $m$-th mixture component in state $j$, with $c_{jm} \ge 0$ and $\sum_{m} c_{jm} = 1$.
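The Gaussian density of Eq. (7) and the mixture of Eq. (8) can be evaluated with a few lines of NumPy. This is a minimal sketch assuming full covariance matrices; the function names and the 2-component, 2-D mixture are hypothetical.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Multivariate normal density N(x; mu, Sigma) for a single vector x."""
    x, mu, sigma = np.atleast_1d(x), np.atleast_1d(mu), np.atleast_2d(sigma)
    d = x.shape[0]
    diff = x - mu
    _, logdet = np.linalg.slogdet(sigma)          # log |Sigma|, numerically stable
    maha = diff @ np.linalg.solve(sigma, diff)    # (x - mu)^T Sigma^{-1} (x - mu)
    return float(np.exp(-0.5 * (d * np.log(2 * np.pi) + logdet + maha)))

def gmm_emission(x, c_j, mus_j, sigmas_j):
    """b_j(x) = sum_m c_jm N(x; mu_jm, Sigma_jm) for one state j."""
    return sum(c * gaussian_pdf(x, mu, s) for c, mu, s in zip(c_j, mus_j, sigmas_j))

# Hypothetical 2-component mixture in 2-D for a single state j.
c_j = [0.3, 0.7]
mus_j = [np.zeros(2), np.array([2.0, 1.0])]
sigmas_j = [np.eye(2), np.diag([0.5, 2.0])]
print(gmm_emission(np.array([1.0, 0.5]), c_j, mus_j, sigmas_j))
```

Using `slogdet` and `solve` instead of an explicit determinant and inverse is a design choice for numerical stability; it does not change the formula.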
3. Study of the art
The literature review was undertaken to identify papers in the highest-ranking journals that provide the most valuable information to researchers and practitioners studying issues concerning HMMs. Many significant papers on HMMs were published in the last ten years (2003-2012). With this scope in mind, we conducted an extensive search for HMM in the titles, abstracts and keywords of scientific papers, targeting in particular the ISI Web of Knowledge and Elsevier (ScienceDirect) databases. In this period, 11,081 papers were indexed in ISI Web of Knowledge and 11,764 papers were indexed in ScienceDirect. Table 1 gives the frequency distribution by publication year. Since 2006, more than 1,000 papers have been published every year in each database.
More than one-quarter (27.81%) of all papers were published by U.S. researchers. This is slightly less than the combined output of Chinese, English, French and German researchers (3,419 papers). A little more than three-quarters (78.31%) of all papers were written by researchers from the ten most productive countries. Table 2 shows detailed data on the most productive nationalities that participated in HMM publications.
The 10 most popular publication sources published 2,164 scholarly papers, almost one-fifth (19.53%) of all published papers. Table 3 shows the number of scholarly papers by publication source. According to Table 3, Lecture Notes in Computer Science is the most popular source: it published 636 (5.74%) of all HMM papers considered. In second place is the International Conference on Acoustics, Speech and Signal Processing (ICASSP), which published 389 (3.51%) papers on HMMs.
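The percentages quoted in the two paragraphs above are shares of the 11,081 papers indexed in ISI Web of Knowledge (Table 1 total). A small check script, using only counts that appear in the text and in Table 2 (the helper name `share` is hypothetical), reproduces them:

```python
# Counts taken from the text and from Table 2; the denominator is the
# 11,081 papers indexed in ISI Web of Knowledge for 2003-2012 (Table 1).
ISI_TOTAL = 11_081

top_countries = {"USA": 3_082, "China": 1_438, "France": 703, "England": 684,
                 "Germany": 594, "Canada": 575, "Japan": 575, "Australia": 354,
                 "Italy": 338, "South Korea": 335}

def share(count, total=ISI_TOTAL):
    """Percentage of `total`, rounded to two decimals as in the tables."""
    return round(100 * count / total, 2)

top10_total = sum(top_countries.values())
rivals = sum(top_countries[c] for c in ("China", "England", "France", "Germany"))
print(share(top_countries["USA"]))             # 27.81
print(top10_total, share(top10_total))         # 8678 78.31
print(rivals)                                  # 3419, slightly above the USA count
# Ten most productive sources (2,164 papers), LNCS (636) and ICASSP (389):
print(share(2_164), share(636), share(389))    # 19.53 5.74 3.51
```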
The largest number of papers on HMMs (49) was written by W. Pieczyński, currently Professor at Telecom SudParis (formerly Telecom INT). His research has greatly improved classification by using HMMs for unsupervised data [10, 18, 52]. Professor G. Rigoll is also a leading scientist on HMMs (45 papers). He is the head of the Institute for Human-Machine Communication at the Technical University of Munich. His paper on handwritten address recognition using HMMs is his most frequently cited work [9]. Table 4 shows the number of scholarly papers by author.
The most frequently cited article was cited 4,013 times. It describes improvements to SignalP, currently the most popular method for predicting classically secreted proteins, which consists of predictors based on a neural network and an HMM [5]. The second is a paper on the SWISS-MODEL workspace, a web-based environment for protein structure homology modelling, cited 2,021 times [2]. Finally, the third is an article on Pfam, published in a special issue of Nucleic Acids Research. Pfam is a comprehensive collection of protein domains and families, represented as multiple sequence alignments and as profile HMMs [20, 21].
4. Application areas
The last 10 years have seen a large number of major scientific papers, ranging from the construction of extensive databases of genomic information to better denoising of signals. Below is a short overview of the most important scientific achievements of the last decade in common application areas. We selected the 73 most important papers with respect to citation count; the lower bound was set at 300 citations so that only the most important scientific articles were included.
HMMs are most widely used, and most important, in Genetics and Heredity [1, 7, 8, 16, 25, 28, 36, 37, 39, 43, 48, 49, 51, 60, 63, 68, 69, 70, 71, 74, 75] and Biochemistry and Molecular Biology [2, 3, 5, 6, 13, 15, 17, 27, 30, 31, 33, 35, 44, 50, 59, 64, 66].
Table 1: The distribution of papers by year of publication
Year     ISI Web of Knowledge     ScienceDirect
2003     746                      635
2004     866                      687
2005     946                      839
2006     1,158                    1,064
2007     1,293                    1,113
2008     1,284                    1,167
2009     1,448                    1,400
2010     1,114                    1,441
2011     1,139                    1,549
2012     1,087                    1,869
Total    11,081                   11,764
Table 2: The distribution of papers by authors' nationality
No.   Country        No. of articles   Percent of all (%)
1     USA            3,082             27.81
2     China          1,438             12.98
3     France         703               6.34
4     England        684               6.17
5     Germany        594               5.36
6     Canada         575               5.19
7     Japan          575               5.19
8     Australia      354               3.19
9     Italy          338               3.05
10    South Korea    335               3.02
      Total:         8,678             78.31
For example, the PANTHER (Protein ANalysis THrough Evolutionary Relationships) database was proposed for high-throughput analysis of protein sequences. One of its key features is the use of statistical models (Hidden Markov Models): a separate HMM is built for each protein group. The advantage of using HMMs is that new sequences can be classified automatically as they become available. The HMMs have been used to classify gene products across the entire human genome [48, 49, 73, 74].
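The idea of one HMM per group, with automatic classification of new sequences, can be illustrated as scoring a sequence under every group's model and picking the highest likelihood. This is only a toy sketch of that general principle, not PANTHER's pipeline (which relies on profile HMMs and dedicated scoring software): the two "families", their parameters, the 2-symbol alphabet and the function names are all hypothetical. The forward algorithm is rescaled at every step so that $\log P(O \mid \lambda)$ stays finite for long sequences.

```python
import numpy as np

def log_likelihood(A, B, pi, obs):
    """log P(O | lambda) from the forward algorithm, rescaled at every step
    so that long sequences do not underflow."""
    log_p = 0.0
    alpha = pi * B[:, obs[0]]
    for t in range(len(obs)):
        if t > 0:
            alpha = (alpha @ A) * B[:, obs[t]]
        scale = alpha.sum()
        log_p += np.log(scale)       # log P(O) = sum of log scaling factors
        alpha = alpha / scale
    return log_p

def classify(obs, family_models):
    """Assign `obs` to the family whose HMM (A, B, pi) scores it highest."""
    scores = {name: log_likelihood(*model, obs) for name, model in family_models.items()}
    return max(scores, key=scores.get), scores

# Hypothetical toy "families" over a 2-symbol alphabet (indices 0 and 1).
families = {
    "family_a": (np.array([[0.9, 0.1], [0.1, 0.9]]),
                 np.array([[0.9, 0.1], [0.6, 0.4]]),
                 np.array([0.5, 0.5])),
    "family_b": (np.array([[0.5, 0.5], [0.5, 0.5]]),
                 np.array([[0.2, 0.8], [0.1, 0.9]]),
                 np.array([0.5, 0.5])),
}
best, scores = classify([0, 0, 0, 1, 0], families)
print(best)   # family_a
```

Adding a new family only requires training one more model; already-trained families and their scores are unaffected, which matches the incremental classification advantage described in the text.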
HMMs are very frequently used for prediction. For instance, a signal peptide predictor was presented in the paper "A combined transmembrane topology and signal peptide prediction method". This predictor is based on an HMM and models the different sequence regions of a signal peptide and the different regions of a transmembrane protein as a series of interconnected states [33].
HMMs also have a very significant impact in Electricity, Electronics, Computer Science and Artificial Intelligence [9, 10, 11, 18, 32, 42, 24, 46, 52, 54, 55, 61, 62, 72] and Mathematical and Computational Biology [26, 40, 41, 57, 65]. For example, in [10] the authors dealt with the statistical restoration of hidden discrete signals, extending the classical HMM-based methodology. The aim was to take into account the hidden signal and complex relationships between the noises, which can come from different parametric models, be non-independent, and be of class-varying nature. In [65] the authors generalized the alignment of protein sequences with a profile Hidden Markov Model (HMM) to the case of pairwise alignment of profile HMMs, and presented a method for detecting distant homologous relationships between proteins based on this approach.
HMMs were also used for the representation and recognition of human gait [32]. In the first method, the gait information in the frame-to-exemplar distance (FED) vector sequences was captured in an HMM. In the second method, referred to as the direct approach, the authors worked with the feature vector directly (as opposed to computing the FED) and trained an HMM. The HMM parameters (specifically the observation probability B) were estimated based on the distance between the exemplars and the image features. In this way, learning high-dimensional probability density functions was avoided. The statistical nature of HMMs lends overall robustness to representation and recognition [9, 72].
HMM, MEME/MAST (Multiple EM for Motif Elicitation / Motif Alignment and Search Tool) and hybrid models combining two or more of these models were developed in [57], and high prediction accuracy was obtained. Another interesting application of HMMs is the Automatic Linguistic Indexing of Pictures (ALIP) system. The paper [42] introduces a statistical modeling approach to automatic linguistic indexing of pictures, an important but highly challenging problem for researchers in computer vision and content-based image retrieval. The authors implemented and tested their ALIP system using HMMs. The paper [55], in turn, describes a method for removing noise from digital images, based on a statistical model of the coefficients of an overcomplete multiscale oriented basis. Neighborhoods of coefficients at adjacent positions and scales were modeled as the product of two independent random variables: a Gaussian vector and a hidden positive scalar multiplier.
Table 3: The distribution of papers by journals
No.   Name of source                                                        No. of articles
1     Lecture Notes in Computer Science                                     636
2     International Conference on Acoustics Speech and Signal Processing    389
3     Bioinformatics                                                        243
4     Lecture Notes in Artificial Intelligence                              200
5     IEEE Transactions on Audio Speech and Language Processing             188
6     BMC Bioinformatics                                                    165
7     Nucleic Acids Research                                                132
8     Speech Communication                                                  77
9     IEEE Transactions on Pattern Analysis and Machine Intelligence        76
10    BMC Genomics                                                          58
      Total:                                                                2,164
Table 4: The distribution of papers by author.
No. Name of author No. of articles
1 Pieczyński W. 49
2 Rigoll G. 45
3 Bunke H. 42
4 Liu Y. 42
5 Tokuda K. 41
6 Kobayashi T. 36
7 Carin L. 34
8 Nakamura Y. 32
9 Lee CH. 31
10 Schuller B. 31
Total: 383
Other research areas include Pharmacology and Pharmacy [22, 23], Mechanics [29], Microbiology [34, 38], Biophysics [47, 73], Cell Biology [12], Neurosciences and Neurology [19] and Multidisciplinary Sciences [45, 53]. For instance, the authors of [47] developed an analysis scheme that casts single-molecule time-binned FRET (Fluorescence Resonance Energy Transfer) trajectories as HMMs.
5. Conclusions
This paper performs a state-of-the-art literature review to classify and interpret the ongoing and emerging issues in the application of HMMs. Overall, the review shows that HMMs have been successfully applied to a wide range of application areas and industrial sectors, covering varied topics. The insights identified in this review will help channel research efforts and give researchers an easy reference to HMM publications.
6. References
[1] Abecasis, G. R., Wigginton, J. E.: Handling Marker-Marker linkage disequilibrium: Pedigree analysis with clustered Markers, AMERICAN JOURNAL OF HUMAN GENETICS, 77(5), pp. 754-767, Nov. 2005.
[2] Arnold, K., Bordoli, L., Kopp, J., et al.: The SWISS-MODEL workspace: a web-based environment for protein structure homology modelling, BIOINFORMATICS, 22(2), pp. 195-201, Jan. 2006.
[3] Babu, M. M., Luscombe, N. M., Aravind, L., et al.: Structure and evolution of transcriptional regulatory networks, CURRENT OPINION IN STRUCTURAL BIOLOGY, 14(3), pp. 283-291, Jun. 2004.
[4] Baum, L., Petrie, T.: Statistical inference for probabilistic functions of finite state Markov chains, ANNALS OF MATHEMATICAL STATISTICS, 37, pp. 1554-1563, 1966.
[5] Bendtsen, J. D., Nielsen, H., von Heijne, G., et al.: Improved prediction of signal peptides: SignalP 3.0, JOURNAL OF MOLECULAR BIOLOGY, 340(4), pp. 783-795, Jul. 2004.
[6] Bennett-Lovsey, R. M., Herbert, A. D., Sternberg, M. J., et al.: Exploring the extremes of sequence/structure space with ensemble fold recognition in the program Phyre, PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS, 70(3), pp. 611-625, Feb. 2008.
[7] Berriman, M., Haas, B. J., LoVerde, P. T., et al.: The genome of the blood fluke Schistosoma mansoni, NATURE, 460(7253), pp. 352-U65, Jul. 16 2009.
[8] Birney, E., Clamp, M., Durbin, R.: GeneWise and genomewise, GENOME RESEARCH, 14(5), pp. 988-995, May 2004.
[9] Brakensiek, A., Rigoll, G.: Handwritten address recognition using hidden Markov models, LECTURE NOTES IN COMPUTER SCIENCE, 2956, pp. 103-122, 2004.
[10] Brunel, N., Pieczynski, W.: Unsupervised signal restoration using hidden Markov chains with copulas, SIGNAL PROCESSING, 85(12), pp. 2304-2315, May 2005.
[11] Cappe, O., Godsill, S. J., Moulines, E.: An overview of existing methods and recent advances in sequential Monte Carlo, PROCEEDINGS OF THE IEEE, 95(5), pp. 899-924, May 2007.
[12] Carter, C., Pan, S. Q., Jan, Z. H., et al.: The vegetative vacuole proteome of Arabidopsis thaliana reveals predicted and unexpected proteins, PLANT CELL, 16(12), pp. 3285-3303, Dec. 2004.
[13] Chandonia, J. M., Hon, G., Walker, N. S., et al.: The ASTRAL Compendium in 2004, NUCLEIC ACIDS RESEARCH, 32(SI), pp. D189-D192, Jan. 2004.
[14] Cohen, I., Sebe, N., Garg, A., et al.: Facial expression recognition from video sequences: temporal and static modeling, COMPUTER VISION AND IMAGE UNDERSTANDING, 91(1-2), pp. 160-187, Jul.-Aug. 2003.
[15] Colella, S., Yau, C., Taylor, J. M., et al.: QuantiSNP: an Objective Bayes Hidden-Markov Model to detect and accurately map copy number variation using SNP genotyping data, NUCLEIC ACIDS RESEARCH, 35(6), pp. 2013-2025, Mar. 2007.
[16] Corander, J., Waldmann, P., Sillanpaa, M. J.: Bayesian analysis of genetic differentiation between populations, GENETICS, 163(1), pp. 367-374, Jan. 2003.
[17] D'Andrea, L. D., Regan, L.: TPR proteins: the versatile helix, TRENDS IN BIOCHEMICAL SCIENCES, 28(12), pp. 655-662, Dec. 2003.
[18] Derrode, S., Pieczynski, W.: Signal and image segmentation using Pairwise Markov chains, IEEE TRANSACTIONS ON SIGNAL PROCESSING, 52(9), pp. 2477-2489, Sep. 2004.
[19] Dombeck, D. A., Khabbaz, A. N., Collman, F., et al.: Imaging large-scale neural activity with cellular resolution in awake, mobile mice, NEURON, 56(1), pp. 43-57, Oct. 2007.
[20] Finn, R. D., Mistry, J., Tate, J., et al.: The Pfam protein families database, NUCLEIC ACIDS RESEARCH, 38(S1), pp. D211-D222, Jan. 2010.
[21] Finn, R. D., Tate, J., Mistry, J., et al.: The Pfam protein families database, NUCLEIC ACIDS RESEARCH, 36(SI), pp. D281-D288, Jan. 2008.
[22] Fredriksson, R., Lagerstrom, M. C., Lundin, L. G., et al.: The G-protein-coupled receptors in the human genome form five main families. Phylogenetic analysis, paralogon groups, and fingerprints, MOLECULAR PHARMACOLOGY, 63(6), pp. 1256-1272, Jun. 2003.
[23] Fredriksson, R., Schioth, H. B.: The repertoire of G-protein-coupled receptors in fully sequenced genomes, MOLECULAR PHARMACOLOGY, 67(5), pp. 1414-1425, May 2005.
[24] Fridlyand, J., Snijders, A. M., Pinkel, D., et al.: Hidden Markov models approach to the analysis of array CGH data, JOURNAL OF MULTIVARIATE ANALYSIS, 90(1), pp. 132-153, Jul. 2004.
[25] Gerstein, M. B., Bruce, C., Rozowsky, J. S., et al.: What is a gene, post-ENCODE? History and updated definition, GENOME RESEARCH, 17(6), pp. 669-681, Jun. 2007.
[26] Haft, D. H., Selengut, J., Mongodin, E. F., et al.: A guild of 45 CRISPR-associated (Cas) protein families and multiple CRISPR/Cas subtypes exist in prokaryotic genomes, PLOS COMPUTATIONAL BIOLOGY, 1(6), pp. 474-483, Nov. 2005.
[27] Haft, D. H., Selengut, J. D., White, O.: The TIGRFAMs database of protein families, NUCLEIC ACIDS RESEARCH, 31(1), pp. 371-373, Jan. 2003.
[28] Hoggart, C. J., Parra, E. J., Shriver, M. D., et al.: Control of confounding of genetic associations in stratified populations, AMERICAN JOURNAL OF HUMAN GENETICS, 72(6), pp. 1492-1504, Jun. 2003.
[29] Jardine, A. K., Lin, D., Banjevic, D.: A review on machinery diagnostics and prognostics implementing condition-based maintenance, MECHANICAL SYSTEMS AND SIGNAL PROCESSING, 20(7), pp. 1483-1510, Oct. 2006.
[30] Juncker, A. S., Willenbrock, H., von Heijne, G., et al.: Prediction of lipoprotein signal peptides in Gram-negative bacteria, PROTEIN SCIENCE, 12(8), pp. 1652-1662, Aug. 2003.
[31] Käll, L., Krogh, A., Sonnhammer, E. L.: Advantages of combined transmembrane topology and signal peptide prediction - the Phobius web server, NUCLEIC ACIDS RESEARCH, 35(S), pp. W429-W432, Jul. 2007.
[32] Kale, A., Sundaresan, A., Rajagopalan, A. N., et al.: Identification of humans using gait, IEEE TRANSACTIONS ON IMAGE PROCESSING, 13(9), pp. 1163-1173, Sep. 2004.
[33] Käll, L., Krogh, A., Sonnhammer, E. L.: A combined transmembrane topology and signal peptide prediction method, JOURNAL OF MOLECULAR BIOLOGY, 338(5), pp. 1027-1036, May 2004.
[34] Kazmierczak, M. J., Mithoe, S. C., Boor, K. J., et al.: Listeria monocytogenes sigma(B) regulates stress response and virulence functions, JOURNAL OF BACTERIOLOGY, 185(19), pp. 5722-5734, Oct. 2003.
[35] Kim, D. E., Chivian, D., Baker, D.: Protein structure prediction and analysis using the Robetta server, NUCLEIC ACIDS RESEARCH, 32(S2), pp. W526-W531, Jul. 2004.
[36] Kim, H., Melen, K., Osterberg, M., et al.: A global topology map of the Saccharomyces cerevisiae membrane proteome, PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 103(30), pp. 11142-11147, Jul. 2006.
[37] Korn, J. M., Kuruvilla, F. G., McCarroll, S. A., et al.: Integrated genotype calling and association analysis of SNPs, common copy number polymorphisms and rare CNVs, NATURE GENETICS, 40(10), pp. 1253-1260, Oct. 2008.
[38] la Cour, T., Kiemer, L., Molgaard, A., et al.: Analysis and prediction of leucine-rich nuclear export signals, PROTEIN ENGINEERING DESIGN AND SELECTION, 17(6), pp. 527-536, Jun. 2004.
[39] Lagesen, K., Hallin, P., Rodland, E. A., et al.: RNAmmer: consistent and rapid annotation of ribosomal RNA genes, NUCLEIC ACIDS RESEARCH, 35(9), pp. 3100-3108, May 2007.
[40] Lai, W. R., Johnson, M. D., Kucherlapati, R., et al.: Comparative analysis of algorithms for identifying amplifications and deletions in array CGH data, BIOINFORMATICS, 21(19), pp. 3763-3770, Oct. 2005.
[41] Leslie, C. S., Eskin, E., Cohen, A., et al.: Mismatch string kernels for discriminative protein classification, BIOINFORMATICS, 20(4), pp. 467-476, Mar. 2004.
[42] Li, J., Wang, J. Z.: Automatic linguistic indexing of pictures by a statistical modeling approach, IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 25(9), pp. 1075-1088, Sep. 2003.
[43] Li, N., Stephens, M.: Modeling linkage disequilibrium and identifying recombination hotspots using single-nucleotide polymorphism data, GENETICS, 165(4), pp. 2213-2233, Dec. 2003.
[44] Liu, G. Y., Loraine, A. E., Shigeta, R., et al.: NetAffx: Affymetrix probesets and annotations, NUCLEIC ACIDS RESEARCH, 31(1), pp. 82-86, Jan. 2003.
[45] Loytynoja, A., Goldman, N.: An algorithm for progressive multiple alignment of sequences with insertions, PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 102(30), pp. 10557-10562, Jul. 2005.
[46] Markou, M., Singh, S.: Novelty detection: a review - part 1: statistical approaches, SIGNAL PROCESSING, 83(12), pp. 2481-2497, Dec. 2003.
[47] McKinney, S. A., Joo, C., Ha, T.: Analysis of single-molecule FRET trajectories using hidden Markov modeling, BIOPHYSICAL JOURNAL, 91(5), pp. 1941-1951, Sep. 2006.
[48] Mi, H., Guo, N., Kejariwal, A., et al.: PANTHER version 6: protein sequence and function evolution data with expanded representation of biological pathways, NUCLEIC ACIDS RESEARCH, 35(SI), pp. D247-D252, Jan. 2007.
[49] Mi, H. Y., Lazareva-Ulitsky, B., Loo, R., et al.: The PANTHER database of protein families, subfamilies, functions and pathways, NUCLEIC ACIDS RESEARCH, 33(SI), pp. D284-D288, Jan. 2005.
[50] Nielsen, M., Lundegaard, C., Worning, P., et al.: Reliable prediction of T-cell epitopes using neural networks with novel sequence representations, PROTEIN SCIENCE, 12(5), pp. 1007-1017, May 2003.
[51] Patterson, T. A., Thomas, L., Wilcox, C., et al.: State-space models of individual animal movement, TRENDS IN ECOLOGY AND EVOLUTION, 23(2), pp. 87-94, Feb. 2008.
[52] Pieczyński, W.: Pairwise Markov chains, IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 25(5), pp. 634-639, May 2003.
[53] Pinto, D., Pagnamenta, A. T., Klei, L., et al.: Functional impact of global rare copy number variation in autism spectrum disorders, NATURE, 466(7304), pp. 368-372, Jul. 15 2010.
[54] Po, D. D., Do, M. N.: Directional multiscale modeling of images using the contourlet transform, IEEE TRANSACTIONS ON IMAGE PROCESSING, 15(6), pp. 1610-1620, Jun. 2006.
[55] Portilla, J., Strela, V., Wainwright, M. J., et al.: Image denoising using scale mixtures of Gaussians in the wavelet domain, IEEE TRANSACTIONS ON IMAGE PROCESSING, 12(11), pp. 1338-1351, Nov. 2003.
[56] Rabiner, L. R.: A tutorial on hidden Markov models and selected applications in speech recognition, PROCEEDINGS OF THE IEEE, 77(2), pp. 257-286, 1989.
[57] Rashid, M., Saha, S., Raghava, G. P.: Support Vector Machine-based method for predicting subcellular localization of mycobacterial proteins using evolutionary information and motifs, BMC BIOINFORMATICS, 8(337), Sep. 2007.
[58] Riley, T., Sontag, E., Chen, P., et al.: Transcriptional control of human p53-regulated genes, NATURE REVIEWS MOLECULAR CELL BIOLOGY, 9(5), pp. 402-412, May 2008.
[59] Sadreyev, R., Grishin, N.: COMPASS: A tool for comparison of multiple protein alignments with assessment of statistical significance, JOURNAL OF MOLECULAR BIOLOGY, 326(1), pp. 317-336, Feb. 2003.
[60] Scheet, P., Stephens, M.: A fast and flexible statistical model for large-scale population genotype data: Applications to inferring missing genotypes and haplotypic phase, AMERICAN JOURNAL OF HUMAN GENETICS, 78(4), pp. 629-644, Apr. 2006.
[61] Sheikh, H. R., Bovik, A. C.: Image information and visual quality, IEEE TRANSACTIONS ON IMAGE PROCESSING, 15(2), pp. 430-444, Feb. 2006.
[62] Sheikh, H. R., Bovik, A. C., de Veciana, G.: An information fidelity criterion for image quality assessment using natural scene statistics, IEEE TRANSACTIONS ON IMAGE PROCESSING, 14(12), pp. 2117-2128, Dec. 2005.
[63] Siepel, A., Bejerano, G., Pedersen, J. S., et al.: Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes, GENOME RESEARCH, 15(8), pp. 1034-1050, Aug. 2005.
[64] Siepel, A., Haussler, D.: Phylogenetic estimation of context-dependent substitution rates by maximum likelihood, MOLECULAR BIOLOGY AND EVOLUTION, 21(3), pp. 468-488, Mar. 2004.
[65] Soding, J.: Protein homology detection by HMM-HMM comparison, BIOINFORMATICS, 21(7), pp. 951-960, Apr. 2005.
[66] Soding, J., Biegert, A., Lupas, A. N.: The HHpred interactive server for protein homology detection and structure prediction, NUCLEIC ACIDS RESEARCH, 33(2), pp. W244-W248, Jul. 2005.
[67] Thomas, P. D., Campbell, M. J., Kejariwal, A., et al.: PANTHER: A library of protein families and subfamilies indexed by function, GENOME RESEARCH, 13(9), pp. 2129-2141, Sep. 2003.
[68] Thomas, P. D., Kejariwal, A., Campbell, M. J., et al.: PANTHER: a browsable database of gene products organized by biological function, using curated protein family and subfamily classification, NUCLEIC ACIDS RESEARCH, 31(1), pp. 334-341, Jan. 2003.
[69] Tunnacliffe, A., Wise, M. J.: The continuing conundrum of the LEA proteins, NATURWISSENSCHAFTEN, 94(10), pp. 791-812, Oct. 2007.
[70] Vassilatis, D. K., Hohmann, J. G., Zeng, H., et al.: The G protein-coupled receptor repertoires of human and mouse, PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 100(8), pp. 4903-4908, Apr. 2003.
[71] Wang, K., Li, M., Hadley, D., et al.: PennCNV: An integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data, GENOME RESEARCH, 17(11), pp. 1665-1674, Nov. 2007.
[72] Wang, L. A., Hu, W. M., Tan, T. N.: Recent developments in human motion analysis, PATTERN RECOGNITION, 36(3), pp. 585-601, Mar. 2003.
[73] Whisstock, J. C., Lesk, A. M.: Prediction of protein function from protein sequence and structure, QUARTERLY REVIEWS OF BIOPHYSICS, 36(3), pp. 307-340, Aug. 2003.
[74] Xu, R., Wunsch, D.: Survey of clustering algorithms, IEEE TRANSACTIONS ON NEURAL NETWORKS, 16(3), pp. 645-678, May 2005.
[75] Zhang, Z. M., Henzel, W. J.: Signal peptide prediction based on analysis of experimentally verified cleavage sites, PROTEIN SCIENCE, 13(10), pp. 2819-2824, Oct. 2004.