Presentation prepared for the WNAR conference held at Portland State University in 2009
XPRIME: A Novel Motif Searching Method
Rachel L. Poulsen
Department of Statistics, Brigham Young University
June 15, 2009
Introduction
DNA contains the genetic instructions that uniquely define an organism
RNA is created to carry genetic instructions from the DNA to the rest of the cell
The process of copying DNA into RNA, the first step in DNA “talking” to the rest of the cell, is called transcription
Transcription
Figure: Transcription diagram (DNA → RNA)
Position Weight Matrix (PWM) (Hertz et al. 1990)
ETS1 transcription factor (TF) binding motif
Position  1      2      3     4     5     6      7      8
A         0.067  0.333  0.0   0.0   1.0   0.533  0.267  0.067
C         0.933  0.600  0.0   0.0   0.0   0.133  0.067  0.400
G         0.000  0.000  1.0   1.0   0.0   0.000  0.667  0.000
T         0.000  0.067  0.0   0.0   0.0   0.333  0.000  0.533
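To make the layout of the matrix concrete, here is a minimal Python sketch that stores the ETS1 PWM above as a dictionary keyed by base, checks that the probabilities at each position sum to one (up to the three-decimal rounding in the table), and reads off the consensus sequence. The dictionary layout and the helper names (width, column_sums, consensus) are illustrative choices, not part of the original method.

```python
# ETS1 PWM from the table above: PWM[base][j] is P(base at motif position j + 1).
PWM = {
    "A": [0.067, 0.333, 0.0, 0.0, 1.0, 0.533, 0.267, 0.067],
    "C": [0.933, 0.600, 0.0, 0.0, 0.0, 0.133, 0.067, 0.400],
    "G": [0.000, 0.000, 1.0, 1.0, 0.0, 0.000, 0.667, 0.000],
    "T": [0.000, 0.067, 0.0, 0.0, 0.0, 0.333, 0.000, 0.533],
}
BASES = "ACGT"


def width(pwm):
    """Motif width w: the number of positions (columns)."""
    return len(pwm["A"])


def column_sums(pwm):
    """Total probability at each position; each should be about 1 (rounding aside)."""
    return [sum(pwm[b][j] for b in BASES) for j in range(width(pwm))]


def consensus(pwm):
    """Most probable base at each position."""
    return "".join(max(BASES, key=lambda b: pwm[b][j]) for j in range(width(pwm)))


if __name__ == "__main__":
    print(column_sums(PWM))
    print(consensus(PWM))  # CCGGAAGT
```

The consensus CCGGAAGT contains the GGAA core visible at positions 3 to 5, where the matrix is deterministic.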
Sequence Logos
Figure: DNA binding motif for the ETS1 TF
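A sequence logo is usually built from a PWM by giving each position a total height equal to its information content, 2 - H_j bits for DNA, where H_j is the Shannon entropy of that column, and splitting the height among the letters in proportion to their probabilities. The sketch below computes those heights for the ETS1 matrix. It assumes this standard construction, renormalizes each column to absorb the table's rounding, and omits the small-sample correction that some logo tools apply, so it may not reproduce the figure exactly.

```python
import math

PWM = {
    "A": [0.067, 0.333, 0.0, 0.0, 1.0, 0.533, 0.267, 0.067],
    "C": [0.933, 0.600, 0.0, 0.0, 0.0, 0.133, 0.067, 0.400],
    "G": [0.000, 0.000, 1.0, 1.0, 0.0, 0.000, 0.667, 0.000],
    "T": [0.000, 0.067, 0.0, 0.0, 0.0, 0.333, 0.000, 0.533],
}
BASES = "ACGT"


def information_content(pwm):
    """Bits per position: log2(4) - H_j, with no small-sample correction."""
    ic = []
    for j in range(len(pwm["A"])):
        probs = [pwm[b][j] for b in BASES]
        total = sum(probs)
        probs = [p / total for p in probs]  # absorb rounding in the table
        entropy = -sum(p * math.log2(p) for p in probs if p > 0)
        ic.append(2.0 - entropy)
    return ic


def letter_heights(pwm):
    """Height of each letter in each logo column: p_bj times the column's information content."""
    return [
        {b: pwm[b][j] * ic_j for b in BASES}
        for j, ic_j in enumerate(information_content(pwm))
    ]


if __name__ == "__main__":
    for j, heights in enumerate(letter_heights(PWM), start=1):
        print(j, {b: round(h, 2) for b, h in heights.items()})
```

Fully determined positions (3, 4 and 5) reach the 2-bit maximum, which is why they dominate the logo.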
De Novo motif searching
Regular expression enumeration
1 Actual count vs. expected count
2 Dictionary-based sequence model (Bussemaker et al. 2000)
PWM updating
1 MEME (Bailey et al. 1995)
2 Gibbs Motif Sampler (GMS) (Lawrence et al. 1993)
3 BioProspector (Liu et al. 2001)
4 AlignACE (Roth et al. 1998)
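The slide names the "actual count vs. expected count" idea without giving a statistic, so the following is only a sketch under stated assumptions. Every w-mer over {A, C, G, T} is enumerated, and its observed count is compared with the count expected if bases were drawn independently using the sample's own base frequencies. Words are then ranked by the observed/expected ratio. The independent-base background, the ratio, and the helper names are assumptions; practical enumeration methods typically use richer background models and a proper significance test.

```python
import math
from collections import Counter
from itertools import product

BASES = "ACGT"


def wmer_counts(seqs, w):
    """Observed count of every length-w substring across all sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - w + 1):
            counts[s[i:i + w]] += 1
    return counts


def base_frequencies(seqs):
    """Background frequency of each base, pooled over all sequences (ACGT only)."""
    totals = Counter()
    for s in seqs:
        totals.update(s)
    n = sum(totals[b] for b in BASES)
    return {b: totals[b] / n for b in BASES}


def enumerate_wmers(seqs, w):
    """(word, observed, expected, observed/expected) for all 4**w words, best ratio first.

    Expected counts assume an i.i.d. background (an assumption, not the slide's model).
    """
    observed = wmer_counts(seqs, w)
    freqs = base_frequencies(seqs)
    n_windows = sum(max(len(s) - w + 1, 0) for s in seqs)
    rows = []
    for letters in product(BASES, repeat=w):
        word = "".join(letters)
        expected = n_windows * math.prod(freqs[b] for b in word)
        ratio = observed[word] / expected if expected > 0 else 0.0
        rows.append((word, observed[word], expected, ratio))
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows
```

For example, enumerate_wmers(seqs, 8)[:10] would list the ten most over-represented 8-mers. Because 4**w grows quickly, full enumeration is only practical for short motif widths.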
Known Motif Search
1 GREP
2 Database search with scoring function (Hertz et al. 1990)
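The simplest known-motif search is a grep-style pattern match. As an illustration only, the sketch below turns the ETS1 PWM into a regular expression by allowing every base at each position whose probability is at least a chosen threshold (0.25 here, an arbitrary choice). It then reports every match, including overlapping ones, using Python's re module in place of grep. The threshold, the helper names, and the forward-strand-only search are assumptions.

```python
import re

PWM = {
    "A": [0.067, 0.333, 0.0, 0.0, 1.0, 0.533, 0.267, 0.067],
    "C": [0.933, 0.600, 0.0, 0.0, 0.0, 0.133, 0.067, 0.400],
    "G": [0.000, 0.000, 1.0, 1.0, 0.0, 0.000, 0.667, 0.000],
    "T": [0.000, 0.067, 0.0, 0.0, 0.0, 0.333, 0.000, 0.533],
}
BASES = "ACGT"


def pwm_to_regex(pwm, threshold=0.25):
    """One character class per position, keeping bases with probability >= threshold."""
    parts = []
    for j in range(len(pwm["A"])):
        # Fall back to any base if nothing reaches the threshold.
        allowed = "".join(b for b in BASES if pwm[b][j] >= threshold) or BASES
        parts.append(allowed if len(allowed) == 1 else "[" + allowed + "]")
    return "".join(parts)


def grep_sites(pattern, sequence):
    """0-based start of every match, including overlapping ones (zero-width lookahead)."""
    return [m.start() for m in re.finditer("(?=" + pattern + ")", sequence)]


if __name__ == "__main__":
    pattern = pwm_to_regex(PWM)
    print(pattern)  # C[AC]GGA[AT][AG][CT]
    print(grep_sites(pattern, "TTCCGGAAGTTT"))  # [2]
```

A pattern match like this gives only a yes/no answer for each position. A scoring function, by contrast, ranks near-matches by how well they fit the matrix.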
XPRIME: An Improved Method
TRANSFAC (Matys et al. 2003)
Information pulled from in vitro experiments and literature
Most methods justify results using TRANSFAC
XPRIME incorporates prior information
XPRIME can search for both de novo motifs and known motifs simultaneously
Notation and Data
Indices
w: width of motif
L: length of sequence
m: motif indicator
i: position in sequence
j: position in motif
s: indicates sequence
The data, $z_s$
$$z_s = (y_{is}, \Delta_{1i}, \Delta_{2i}, \dots, \Delta_{(m+1)i})$$
$y_i$ represents the w-mer at position $i$
$\Delta_{mi}$ indicates whether $y_i$ belongs to motif $m$
$\Delta_{(m+1)i}$ indicates whether $y_i$ belongs to the background motif
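Here is the same notation as a Python sketch. It assumes that y_i is the w-mer starting at position i, so a sequence of length L contributes L - w + 1 overlapping w-mers, and that the indicators (Δ_1i, ..., Δ_(m+1)i) form a one-hot vector with the background as the last component. The 0-based indexing and the helper names are assumptions made for the example.

```python
def wmers(sequence, w):
    """y_1, ..., y_{L-w+1}: the w-mer starting at each position (list index is 0-based)."""
    return [sequence[i:i + w] for i in range(len(sequence) - w + 1)]


def delta_vector(component, n_motifs):
    """(Delta_1i, ..., Delta_(m+1)i) as a one-hot list; index n_motifs is the background."""
    delta = [0] * (n_motifs + 1)
    delta[component] = 1
    return delta


def complete_data(sequence, w, labels, n_motifs):
    """Pair each y_i with its indicator vector; labels[i] is a 0-based component index."""
    ys = wmers(sequence, w)
    if len(labels) != len(ys):
        raise ValueError("need one label per w-mer")
    return [(y, delta_vector(k, n_motifs)) for y, k in zip(ys, labels)]


if __name__ == "__main__":
    # One motif plus background; the second w-mer is assigned to the motif.
    print(complete_data("ACGTAC", 3, [1, 0, 1, 1], n_motifs=1))
```

In the model the labels are not observed; they are the missing data that the Gibbs sampler fills in.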
The Scoring Function
$$\text{MotifScore} = f(y) = \prod_{j=1}^{w} \sum_{i \in \{A,C,G,T\}} p_{ij}\, I(y_j = i)$$
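The indicator I(y_j = i) is nonzero for exactly one base at each position, so the inner sum picks out a single PWM entry, and f(y) is the product of the entries selected by the bases of y. The sketch below evaluates f(y) with the ETS1 matrix and scans a sequence window by window. This is one reading of the "database search with scoring function" approach; the helper names and the forward-strand-only scan are assumptions.

```python
PWM = {
    "A": [0.067, 0.333, 0.0, 0.0, 1.0, 0.533, 0.267, 0.067],
    "C": [0.933, 0.600, 0.0, 0.0, 0.0, 0.133, 0.067, 0.400],
    "G": [0.000, 0.000, 1.0, 1.0, 0.0, 0.000, 0.667, 0.000],
    "T": [0.000, 0.067, 0.0, 0.0, 0.0, 0.333, 0.000, 0.533],
}


def motif_score(y, pwm):
    """f(y) = prod_j sum_i p_ij I(y_j = i); the sum keeps only the entry for base y_j."""
    score = 1.0
    for j, base in enumerate(y):
        score *= pwm[base][j]
    return score


def scan(sequence, pwm):
    """(start, w-mer, f(w-mer)) for every window of width w on the given strand."""
    w = len(pwm["A"])
    return [
        (i, sequence[i:i + w], motif_score(sequence[i:i + w], pwm))
        for i in range(len(sequence) - w + 1)
    ]


if __name__ == "__main__":
    best = max(scan("TTCCGGAAGTTT", PWM), key=lambda hit: hit[2])
    print(best)  # the window starting at index 2, CCGGAAGT
```

A single zero entry sends f(y) to 0, so any w-mer without GG at positions 3 and 4 scores 0 under this matrix. Pseudo-counts are a common way to soften that.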
Methods: Complete Data Likelihood
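(m+1)-component mixture model
$$L(\theta \mid z) = \prod_{i=1}^{L_s} C(y_i)\,[r_1 f_1(y_i)]^{\Delta_{1i}}\,[r_2 f_2(y_i)]^{\Delta_{2i}} \cdots [r_{m+1} f_{m+1}(y_i)]^{\Delta_{(m+1)i}}$$
$f_m(y)$ is the Motif Score equation for component $m$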
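On the log scale the complete-data likelihood becomes a sum. For each w-mer, only the component whose indicator equals 1 contributes, adding log r plus log f(y_i). Below is a minimal sketch. It assumes C(y_i) does not depend on θ and can therefore be dropped (the slide does not define C), and that the indicators, weights, and score values are supplied as plain lists with hypothetical names.

```python
import math


def complete_log_likelihood(deltas, r, f_values):
    """log L(theta | z), omitting the log C(y_i) terms.

    deltas[i][m]   -- Delta_{mi}, one-hot over the m + 1 components
    r[m]           -- mixing weight r_m
    f_values[i][m] -- score f_m(y_i)
    """
    total = 0.0
    for delta_i, f_i in zip(deltas, f_values):
        for d, r_m, f_m in zip(delta_i, r, f_i):
            if d:
                p = r_m * f_m
                if p <= 0.0:
                    return -math.inf  # an assigned w-mer has zero probability
                total += math.log(p)
    return total


if __name__ == "__main__":
    # Two w-mers: the first assigned to the motif, the second to the background.
    print(complete_log_likelihood([[1, 0], [0, 1]], [0.2, 0.8], [[0.5, 0.1], [0.25, 0.5]]))
```

Working in logs avoids underflow, since products of many probabilities quickly drop below floating-point range.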
Methods: Priors
$f_{m+1}(y)$ is fixed a priori
$\Delta_{(m+1)i}$'s are missing a priori
$f_1(y), \dots, f_m(y)$ have product Dirichlet priors such that
$$\pi(f_m(y)) \propto \prod_{j=1}^{w} \prod_{k \in \{A,C,G,T\}} p_{mjk}^{\,a_{p_{mjk}} - 1}$$
$r$ also has a Dirichlet prior
$$\pi(r) \propto \prod_{i=1}^{M} r_i^{\,a_{r_i} - 1}$$
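Drawing from these priors needs only a Dirichlet sampler, and a Dirichlet draw can be built by normalizing independent Gamma(α, 1) variates. The sketch below uses only Python's standard random module. It draws one motif from a product Dirichlet prior (an independent Dirichlet at each of the w positions) and one set of mixing weights r. The hyperparameter values and helper names are illustrative assumptions, not the settings used in XPRIME.

```python
import random


def dirichlet(alphas, rng=random):
    """One draw from Dirichlet(alphas), all alphas > 0, via normalized Gamma(alpha, 1) draws."""
    gammas = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(gammas)
    return [g / total for g in gammas]


def draw_motif_from_prior(a_p, rng=random):
    """Product Dirichlet: a_p[j] holds (a_A, a_C, a_G, a_T) for motif position j."""
    return [dirichlet(a_j, rng) for a_j in a_p]


def draw_weights_from_prior(a_r, rng=random):
    """Mixing weights r = (r_1, ..., r_M) ~ Dirichlet(a_r)."""
    return dirichlet(a_r, rng)


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical hyperparameters: flat prior on an 8-position motif, M = 2 components.
    print(draw_motif_from_prior([(1.0, 1.0, 1.0, 1.0)] * 8, rng))
    print(draw_weights_from_prior([1.0, 9.0], rng))
```

Prior information, such as a TRANSFAC matrix, could plausibly enter through the a_p hyperparameters, for example as pseudo-counts proportional to the known PWM. This is an assumption about how such a prior might be set, not a description of XPRIME's choice.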
Methods: Gibbs Algorithm
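1 Draws $\Delta$'s from a multinomial distribution
$$p(\Delta_{Mi} = 1) \propto r_M\, f_M(y_i), \quad M = 1, \dots, m+1$$
2 Draws $r$ from a Dirichlet distribution
$$\alpha_{r_M} = \sum_{i=1}^{L} \Delta_{Mi} + a_{r_M}$$
3 Draws $p_{mjk}$ from a Dirichlet distribution
$$\alpha_{p_{mjk}} = \sum_{i=1}^{L} \Delta_{mi}\, I(y_{ij} = k) + a_{p_{mjk}}, \quad k \in \{A,C,G,T\}$$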
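The three Gibbs steps can be sketched as a short Python loop over the w-mers. This is a minimal illustration, not the XPRIME implementation, and it makes several assumptions. The background f_{m+1} is a fixed PWM with strictly positive entries, and each indicator vector is stored as the index of its single 1 (a hypothetical encoding). Overlapping w-mers are treated as independent, exactly as in the product likelihood, and only the final draw is returned. The function and argument names are hypothetical.

```python
import random

BASES = "ACGT"


def dirichlet(alphas, rng):
    """Dirichlet draw via normalized Gamma(alpha, 1) variates."""
    gammas = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(gammas)
    return [g / total for g in gammas]


def score(y, pwm):
    """f(y) = prod_j p_{j, y_j}, with pwm[j][k] for position j and base k."""
    s = 1.0
    for j, base in enumerate(y):
        s *= pwm[j][BASES.index(base)]
    return s


def gibbs(ys, n_motifs, background, a_p, a_r, n_iter=100, seed=0):
    """Gibbs sampler for the (m+1)-component mixture; returns the last (pwms, r, labels).

    ys         -- list of w-mers y_i
    background -- fixed w x 4 PWM for component m+1 (entries > 0)
    a_p        -- a_p[m][j][k], Dirichlet prior for motif m, position j, base k
    a_r        -- Dirichlet prior for r, length n_motifs + 1
    """
    rng = random.Random(seed)
    w = len(ys[0])
    n_comp = n_motifs + 1
    # Initialise the motif PWMs and the weights with draws from their priors.
    pwms = [[dirichlet(a_p[m][j], rng) for j in range(w)] for m in range(n_motifs)]
    r = dirichlet(a_r, rng)
    labels = [n_motifs] * len(ys)  # start with every w-mer in the background
    for _ in range(n_iter):
        components = pwms + [background]
        # Step 1: Delta_i ~ Multinomial, P(component M) proportional to r_M f_M(y_i).
        for i, y in enumerate(ys):
            weights = [r[M] * score(y, components[M]) for M in range(n_comp)]
            labels[i] = rng.choices(range(n_comp), weights=weights)[0]
        # Step 2: r ~ Dirichlet(sum_i Delta_Mi + a_r_M).
        counts = [0] * n_comp
        for M in labels:
            counts[M] += 1
        r = dirichlet([counts[M] + a_r[M] for M in range(n_comp)], rng)
        # Step 3: p_mj ~ Dirichlet(count of each base at position j among motif-m w-mers + a_p).
        for m in range(n_motifs):
            members = [y for y, M in zip(ys, labels) if M == m]
            for j in range(w):
                alpha = list(a_p[m][j])
                for y in members:
                    alpha[BASES.index(y[j])] += 1
                pwms[m][j] = dirichlet(alpha, rng)
    return pwms, r, labels
```

In practice one would discard a burn-in period and summarize many draws, for example as posterior mean PWMs, rather than keep only the final state. Label switching between motif components is another issue a full implementation has to handle.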
An Example: ETS1
We hypothesize that ETS1 has a specific binding site
The Data
1 ETS1 only
2 GABP only
3 ETS1 and GABP
ETS1 Binding Motifs
(a) ETS1 from TRANSFAC (b) ETS1 from ETS1 only
(c) ETS1 from GABP only (d) ETS1 from ETS1/GABP
Justification of Prior Information
Pete Hollenhorst sequence logo
Figure: Motif found without prior specification
Figure: Motif found with prior specification
Conclusions and Future Research
XPRIME successfully searches for de novo and known motifs
Evidence found suggesting ETS1 has its own binding motif
Hidden Markov models and the forward-backward algorithm
Prior information on r