Gene Prediction Using Hidden Markov Model & Recurrent Neural Network

Ahmed Hani AlGhidani

MSc Student in Computer Science at Cairo University

Research and SDE at RDI Egypt

[email protected]

Agenda

• DNA Structure
- Eukaryotic and Prokaryotic Cells

• Gene Prediction Methods
- Empirical Methods
- Ab initio Methods

• Hidden Markov Model
- Existing HMM-based systems

• Recurrent Neural Network

• Other Methods

DNA Structure

• Prokaryotic Cells

• Most of the DNA is coding

• No introns

• Promoters

DNA Structure (Cont.)

• Eukaryotic Cells

• Exons (coding)

• Introns (non-coding)

• Acceptors (end of an intron, reading 5’ to 3’)

• Donors (start of an intron, reading 5’ to 3’)
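To make the donor/acceptor definitions concrete, here is a minimal sketch that lists candidate splice sites in a toy sequence. It assumes the canonical GT-AG rule (most introns start with GT at the donor site and end with AG at the acceptor site). The function name `candidate_splice_sites` and the toy sequence are illustrative and do not come from the slides. Most GT/AG hits in real DNA are not true splice sites, which is why a statistical model is needed on top of this scan.

```python
def candidate_splice_sites(seq):
    """Return (donors, acceptors): 0-based indices of GT and AG dinucleotides.

    Assumption: canonical GT-AG introns only; every hit is just a candidate.
    """
    seq = seq.upper()
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    acceptors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    return donors, acceptors


# Toy layout (hypothetical): exon + intron + exon.
toy = "ATGGCC" + "GTAAGTTTCAG" + "GATTAA"
donors, acceptors = candidate_splice_sites(toy)
print("donor candidates:", donors)
print("acceptor candidates:", acceptors)
```

In the toy sequence the intron starts at index 6, so 6 appears among the donor candidates, together with a false positive inside the intron.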

Gene Prediction

• Identify the exon regions that are translated into amino acids (protein)

Gene Prediction (Cont.)

• Empirical methods are used specifically for prokaryotic cells

• Most of the prokaryotic genome is coding, with no introns

• Feature Engineering method

• Open Reading Frames (ORFs)
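As an illustration of the ORF idea, the sketch below scans the three forward reading frames for stretches that begin with a start codon and end at the first in-frame stop codon. It assumes the standard genetic code (start `ATG`, stops `TAA`, `TAG`, `TGA`) and ignores the reverse strand. The function name `find_orfs` and the `min_codons` threshold are hypothetical choices for this example, not taken from the slides.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return sorted (start, end) pairs of forward-strand ORFs, end exclusive.

    An ORF here runs from an ATG to the first in-frame stop codon
    and includes the stop codon itself.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i                      # open a new ORF
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None                   # close it and keep scanning
    return sorted(orfs)


toy = "CCATGAAATTTTAGGGATGCCCTGA"
for begin, end in find_orfs(toy):
    print(begin, end, toy[begin:end])
```

A real prokaryotic gene finder would also scan the reverse complement and use a much larger minimum length to filter out ORFs that occur by chance.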

Gene Prediction (Cont.)

• Pros
- Simple and easy to implement
- Works well with prokaryotic DNA because of its simplicity

• Cons
- Performs poorly on large sequences
- Performs poorly on complex DNA such as eukaryotic DNA

Gene Prediction (Cont.)

• Ab initio methods for Eukaryotic cells

• Depend on statistical methods and computational models

• Feature engineering could be involved in the computations

• Hidden Markov Model and Recurrent Neural Networks

Hidden Markov Model

• The basic idea is a Markov chain

• Set of finite states

• Transition Matrix
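A small sketch of these two ingredients, assuming a first-order chain whose states are the four nucleotides. The transition values, the uniform initial distribution, and the helper names are made up for illustration. Each row of the matrix gives the distribution of the next state given the current one, so every row sums to 1.

```python
import random

STATES = "ACGT"
INITIAL = [0.25, 0.25, 0.25, 0.25]      # assumed uniform start distribution
TRANSITION = [                           # illustrative values, rows sum to 1
    [0.4, 0.2, 0.2, 0.2],                # from A
    [0.2, 0.4, 0.2, 0.2],                # from C
    [0.2, 0.2, 0.4, 0.2],                # from G
    [0.2, 0.2, 0.2, 0.4],                # from T
]


def sequence_probability(seq):
    """P(seq) = P(first state) * product of one-step transition probabilities."""
    idx = [STATES.index(base) for base in seq]
    prob = INITIAL[idx[0]]
    for prev, cur in zip(idx, idx[1:]):
        prob *= TRANSITION[prev][cur]
    return prob


def sample_sequence(length, rng):
    """Draw a sequence by walking the chain for `length` steps."""
    state = rng.choices(STATES, weights=INITIAL)[0]
    out = [state]
    for _ in range(length - 1):
        row = TRANSITION[STATES.index(state)]
        state = rng.choices(STATES, weights=row)[0]
        out.append(state)
    return "".join(out)


print(sequence_probability("AAC"))       # 0.25 * 0.4 * 0.2
print(sample_sequence(12, random.Random(0)))
```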

Hidden Markov Model (Cont.)

• In practice, it may be hard to observe directly the patterns or classes that we want to predict

• We need indicators (visible states) to infer the hidden patterns

Hidden Markov Model (Cont.)

• Observation Probability Estimation (problem 1)
- Estimate the probability of an observation sequence given the model

• Optimal Hidden State Sequence (problem 2)
- Determine the most likely sequence of hidden states for the observations

• HMM Parameter Estimation (problem 3)
- Find the model parameters that maximize the probability of the given observations
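The first of these tasks, the probability of an observation sequence given the model, is usually solved with the forward algorithm. Below is a minimal sketch on a two-state toy HMM. All probabilities, state names, and the function name `forward` are assumptions chosen for illustration, not parameters from the slides.

```python
STATES = ("exon", "intron")
START_P = {"exon": 0.5, "intron": 0.5}
TRANS_P = {
    "exon": {"exon": 0.9, "intron": 0.1},
    "intron": {"exon": 0.2, "intron": 0.8},
}
EMIT_P = {
    "exon": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "intron": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
}


def forward(obs):
    """Return P(obs | model), summing over all hidden state paths."""
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = {s: START_P[s] * EMIT_P[s][obs[0]] for s in STATES}
    for symbol in obs[1:]:
        alpha = {
            s: EMIT_P[s][symbol] * sum(alpha[p] * TRANS_P[p][s] for p in STATES)
            for s in STATES
        }
    return sum(alpha.values())


print(forward("GCGCAT"))
```

The recursion costs time proportional to the sequence length times the square of the number of states, instead of enumerating every possible state path. For long sequences the products underflow, so practical implementations rescale `alpha` at each step or work in log space.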

Hidden Markov Model (Cont.)

• In gene prediction, the observations are the A, C, G, T sequences, and the hidden states are Exon, Intron and Other

• Use the training data to set the model parameters (problem 3) using the Baum-Welch algorithm

• For the given observations, predict the hidden states (problem 2) using the Viterbi algorithm
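The decoding step can be sketched as follows: a log-space Viterbi pass over a toy three-state model with Exon, Intron and Other as hidden states. The probabilities are invented for illustration (in the approach described above they would come from training), and the helper names `viterbi` and `path_log_prob` are hypothetical.

```python
import math

STATES = ("exon", "intron", "other")
START_P = {"exon": 0.2, "intron": 0.1, "other": 0.7}
TRANS_P = {
    "exon": {"exon": 0.8, "intron": 0.1, "other": 0.1},
    "intron": {"exon": 0.1, "intron": 0.85, "other": 0.05},
    "other": {"exon": 0.1, "intron": 0.05, "other": 0.85},
}
EMIT_P = {
    "exon": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "intron": {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35},
    "other": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
}


def path_log_prob(path, obs):
    """Log of the joint probability of a state path and the observations."""
    lp = math.log(START_P[path[0]]) + math.log(EMIT_P[path[0]][obs[0]])
    for prev, cur, sym in zip(path, path[1:], obs[1:]):
        lp += math.log(TRANS_P[prev][cur]) + math.log(EMIT_P[cur][sym])
    return lp


def viterbi(obs):
    """Return the most likely hidden state path for obs."""
    # best[s] = log-probability of the best path that ends in state s
    best = {s: math.log(START_P[s]) + math.log(EMIT_P[s][obs[0]]) for s in STATES}
    back = []                                  # one back-pointer table per step
    for sym in obs[1:]:
        pointers, scores = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: best[p] + math.log(TRANS_P[p][s]))
            pointers[s] = prev
            scores[s] = best[prev] + math.log(TRANS_P[prev][s]) + math.log(EMIT_P[s][sym])
        back.append(pointers)
        best = scores
    state = max(STATES, key=best.get)          # best final state
    path = [state]
    for pointers in reversed(back):            # trace the pointers backwards
        state = pointers[state]
        path.append(state)
    return path[::-1]


print(viterbi("GCGCGCATATATAT"))
```

Working with log-probabilities avoids numerical underflow on long DNA sequences. All toy probabilities are kept strictly positive so that `math.log` is always defined.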

Neural Network (Cont.)

• A largely unexplored area in bioinformatics

• No need for feature engineering

• Often outperforms classical machine learning methods

• Based on biological philosophy!

Recurrent Neural Networks

• Acceptor/Donor experiments

Recurrent Neural Networks (Cont.)

• Exon/intron experiments are still in progress

• Dataset size is 800K sequences

• Sequences aren’t fixed-size

• LSTM instead of Vanilla RNN

• TensorFlow
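The slides do not say how the variable-length sequences were fed to the LSTM. One common approach, shown here purely as an assumption, is to one-hot encode each base, pad every sequence in a batch to the longest length, and keep a mask so that padded positions can be ignored by the recurrent layer. The helper names `one_hot` and `pad_batch` are hypothetical, and the sketch uses plain Python lists rather than TensorFlow tensors.

```python
ALPHABET = "ACGT"


def one_hot(base):
    """Encode one base as a 4-element vector; unknown bases become all zeros."""
    return [1.0 if base == b else 0.0 for b in ALPHABET]


def pad_batch(sequences):
    """Return (batch, mask) with every sequence padded to the longest one.

    batch[i][t] is the one-hot vector at step t of sequence i;
    mask[i][t] is 1 for a real base and 0 for padding.
    """
    max_len = max(len(s) for s in sequences)
    batch, mask = [], []
    for seq in sequences:
        pad = max_len - len(seq)
        encoded = [one_hot(b) for b in seq.upper()]
        encoded += [[0.0] * len(ALPHABET) for _ in range(pad)]
        batch.append(encoded)
        mask.append([1] * len(seq) + [0] * pad)
    return batch, mask


batch, mask = pad_batch(["ACGT", "GG", "TAC"])
print(mask)
```

The resulting nested lists have the shape (batch size, longest length, 4), which is the layout a recurrent layer typically expects once converted to a tensor.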

Other Methods

• Naive Bayes + Statistical Features

• Hidden Markov Model Support Vector Machine (HMM-SVM)

• Open Reading Frames + Hidden Markov Model

• Open Reading Frames + Statistical Features + Hidden Markov Model
