DNA Motif and protein domain discovery
Presented by:
Deeter Neumann
Peter St. Andre
PDB; zinc finger 224
PDB; human enhancer binding protein
Outline
What are DNA motifs & protein domains?
Their importance and function
motif algorithms
locating domain/motif experimentally
available programs: PFAM & SMART
Taken from wikimedia.org
What are DNA sequence motifs?
“Sequence motifs are short recurring patterns in DNA that are presumed to have biological function.”
D’haeseleer, P. Nature Biotechnology 24, 423-425 (2006).
Image taken from bio.miami.edu
Indicates common structural protein domains
Identifies similar function
Other possible biological functions, e.g. transcription factors, mRNA processing
Why are DNA sequence motifs important to know?
What is the function of DNA domains?
specific and non-specific interactions
permits binding of transcription factor to target gene
sequence-specific recognition
Human Molecular Genetics 3; Strachan & Read
What are protein domains?
Protein sequences and structures that evolve, function, and exist independently from the rest of the protein
They often form functional units, like metal binding domains
Image of human zinc finger domain
Taken from .ionchannels.org
Why are Protein Domains Important?
Bind to other molecules in the cell
Signal transduction pathways
Genetically engineering novel proteins
Pharmaceutical importance
Algorithmic Approaches for both DNA motifs and protein domain searches
Three general approaches are used:
Enumeration
Deterministic optimization
Probabilistic optimization
Enumeration
Employs the broadest approach
Looks at all possible motifs
Few restrictions are imposed on the search
Enumeration, cont.
Key point: Covers all possible sequence motifs with few limitations
Pros: Does not get stuck in local optimum
Cons: May overlook subtle patterns
Programs like WeederWeb and YMF use this type of algorithm
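The enumerative idea can be sketched in a few lines of Python. This is only an illustration, not how WeederWeb or YMF are implemented: it counts exact k-mers, whereas real enumerative tools typically also allow mismatches and score statistical significance. The function name `enumerate_kmers` and the toy sequences are hypothetical.

```python
from collections import Counter

def enumerate_kmers(sequences, k):
    """Count, for every k-mer, how many sequences contain it at least once."""
    support = Counter()
    for seq in sequences:
        # Collect each distinct k-mer of this sequence once.
        seen = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        support.update(seen)
    return support

# Hypothetical toy input: three short sequences sharing one 5-mer.
seqs = ["ACGTGACC", "TTGTGACA", "GGTGACTT"]
counts = enumerate_kmers(seqs, 5)

# Candidate motifs: k-mers present in every sequence.
shared = [kmer for kmer, n in counts.items() if n == len(seqs)]
print(shared)
```

Because every k-mer is examined, the search cannot get trapped in a local optimum, which is the advantage listed above.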
Deterministic optimization
Takes into account an Expectation Maximization model and a position weight matrix
MEME is one program that uses this approach
What does this mean?
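One way to picture this is a small expectation maximization loop that alternates between scoring every candidate site against a position weight matrix (E-step) and rebuilding the matrix from those weighted sites (M-step). The sketch below is not MEME itself; it assumes exactly one motif occurrence per sequence, a uniform background, and a user-supplied starting site. All function names and the toy sequences are hypothetical.

```python
ALPHABET = "ACGT"

def build_pwm(weighted_sites, width, pseudocount=0.5):
    """Position weight matrix: per-column base probabilities from weighted sites."""
    counts = [{b: pseudocount for b in ALPHABET} for _ in range(width)]
    for site, weight in weighted_sites:
        for pos, base in enumerate(site):
            counts[pos][base] += weight
    return [{b: col[b] / sum(col.values()) for b in ALPHABET} for col in counts]

def site_likelihood(pwm, site, background=0.25):
    """Likelihood ratio of a site under the PWM versus a uniform background."""
    ratio = 1.0
    for col, base in zip(pwm, site):
        ratio *= col[base] / background
    return ratio

def em_motif(sequences, width, seed_site, iterations=20):
    """EM assuming one motif occurrence per sequence (an assumption of this sketch)."""
    pwm = build_pwm([(seed_site, 1.0)], width)
    for _ in range(iterations):
        weighted = []
        for seq in sequences:
            sites = [seq[i:i + width] for i in range(len(seq) - width + 1)]
            # E-step: posterior probability of each start position.
            scores = [site_likelihood(pwm, s) for s in sites]
            total = sum(scores)
            weighted.extend((s, sc / total) for s, sc in zip(sites, scores))
        # M-step: re-estimate the PWM from the weighted sites.
        pwm = build_pwm(weighted, width)
    return pwm

seqs = ["ACGTGACC", "TTGTGACA", "GGTGACTT"]  # hypothetical toy input
pwm = em_motif(seqs, 5, seed_site="GTGAC")
consensus = "".join(max(col, key=col.get) for col in pwm)
print(consensus)
```

The procedure is deterministic: the same sequences and the same starting site always give the same matrix, which is why the result depends on the starting point.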
Probabilistic optimization
Uses a Gibbs sampling approach: a randomized implementation of the expectation maximization model
How is this applied?
Probabilistic optimization, cont.
Selects random sites and each is weighted against known motifs
Allows program to add or remove sequences and continuously update motifs
AlignAce 3.0
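The sampling loop described above can be sketched as follows. This is a minimal Gibbs site sampler, not the AlignACE implementation: it assumes one site per sequence, a fixed motif width, and a uniform background, and it does not add or remove sequences. The names `column_probs` and `gibbs_sample` and the toy sequences are hypothetical.

```python
import random

ALPHABET = "ACGT"

def column_probs(sites, width, pseudocount=1.0):
    """Per-column base probabilities estimated from the current sites."""
    counts = [{b: pseudocount for b in ALPHABET} for _ in range(width)]
    for site in sites:
        for pos, base in enumerate(site):
            counts[pos][base] += 1
    return [{b: col[b] / sum(col.values()) for b in ALPHABET} for col in counts]

def gibbs_sample(sequences, width, sweeps=200, seed=0):
    """Return one sampled motif start position per sequence."""
    rng = random.Random(seed)
    # Start from a random site in every sequence.
    starts = [rng.randrange(len(s) - width + 1) for s in sequences]
    for _ in range(sweeps):
        for held_out, seq in enumerate(sequences):
            # Build the motif model from every sequence except the held-out one.
            others = [s[p:p + width]
                      for i, (s, p) in enumerate(zip(sequences, starts))
                      if i != held_out]
            probs = column_probs(others, width)
            # Weight every candidate site in the held-out sequence by the model.
            weights = []
            for p in range(len(seq) - width + 1):
                w = 1.0
                for col, base in zip(probs, seq[p:p + width]):
                    w *= col[base] / 0.25
                weights.append(w)
            # Sample (rather than maximise) a new start in proportion to the weights.
            starts[held_out] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts

seqs = ["ACGTGACC", "TTGTGACA", "GGTGACTT"]  # hypothetical toy input
starts = gibbs_sample(seqs, 5)
print([s[p:p + 5] for s, p in zip(seqs, starts)])
```

Because each new site is drawn at random, different seeds can give different answers, which is one reason rerunning such programs matters.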
Which one to use?
Recent research showed that enumeration approaches worked very well
Generally accepted that no one approach is the best
Programs that incorporate several approaches work the best
Important to rerun programs
Examples of programs
WeederWeb is a web-based interface with an enumerative approach
YMF is another enumerative program
MEME is an online program that uses a deterministic optimization approach
MotifSampler is a program that combines Gibbs sampling and a third-order Markov model
Measurements used to score sequence motifs
Three main statistics used:
Information content
Log likelihood
MAP score
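As an illustration of the first statistic, information content is commonly computed per column as the sum over bases of f·log2(f / background) and then summed across columns. The sketch below assumes a uniform background of 0.25 and no pseudocounts; the function name and the example sites are hypothetical, and the log likelihood and MAP score are not shown.

```python
import math

def information_content(sites, background=0.25):
    """Total information content of aligned sites, in bits."""
    width = len(sites[0])
    total = 0.0
    for pos in range(width):
        column = [s[pos] for s in sites]
        for base in set(column):
            freq = column.count(base) / len(column)
            # Bases that never occur contribute nothing and are skipped.
            total += freq * math.log2(freq / background)
    return total

conserved = ["GTGAC", "GTGAC", "GTGAC"]          # hypothetical aligned sites
variable = ["GTGAC", "GTGAC", "GTGAC", "GTGAT"]  # last column is less conserved
ic_conserved = information_content(conserved)
ic_variable = information_content(variable)
print(ic_conserved, ic_variable)
```

A perfectly conserved DNA column contributes 2 bits, so the fully conserved 5-mer scores 10 bits and any variation lowers the total.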
Other measures of motif quality
Group specificity, or site specificity
• Probability of having a certain number of target sequences with the site in question
Sequence specificity
• Accounts for both the number of sequences with the sites in question and the number of sites per sequence
Positional bias, or uniformity
• Looks at how uniformly the sites in question are distributed with respect to the transcription start sites of the genes
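The slides do not give a formula for group specificity. One common way to express "the probability of having a certain number of target sequences with the site" is a hypergeometric tail, and the sketch below assumes that formulation. The function name, parameter names, and the numbers in the example are hypothetical.

```python
from math import comb

def group_specificity(total_genes, group_size, genes_with_site, overlap):
    """P(at least `overlap` of the genes carrying the site fall in the target group)."""
    upper = min(group_size, genes_with_site)
    favourable = sum(
        comb(group_size, i) * comb(total_genes - group_size, genes_with_site - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / comb(total_genes, genes_with_site)

# Hypothetical example: 10 genes in total, 3 in the target group,
# and the 3 genes carrying the site are exactly those 3.
p_value = group_specificity(10, 3, 3, 3)
print(p_value)
```

A small value suggests the site is concentrated in the target group more than chance would explain.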
Identification and preliminary characterization of a protein
motif related to the zinc finger
Lovering et al. (1993)
What is a zinc finger?
PDB; single zinc finger in solution
autonomously folding domain
structural motif
zinc required for folding and DNA interactions
part of protein that is used to regulate DNA
Classic zinc finger
conserved cysteines and histidines
binds with zinc
Tetrahedral structure
antiparallel two-stranded β-sheets and an α-helix
image from wikipedia
Actual RING1 sequence
MTTPANAQNASKTWELSLYELHRTPQEAIMDGTEIAVSPRSLHSELMCPICLDMLKNTMTTKECLHRFCSDCIVTALRSGNKECPTCRKKLVSKRSLRPDPNFDALISKIYPSREEYEAHQDRVLIRLSRLHNQQALSSSIEEGLRMQAMHRAQRVRRPIPGSDQTTTMSGGEGEPGEGEGDGEDVSSDSAPDSAPGPAPKRPRGGGAGGSSVGTGGGGTGGVGGGAGSEDSGDRGGTLGGGTLGPPSPPGAPSPPEPGGEIELVFRPHPLLVEKGEYCQTRYVKTTGNATVDHLSKYLALRIALERRQQQEAGEPGGPGGGASDTGGPDGCGGEGGGAGGGDGPEEPALPSLEGVSEKQYTIYIAPGGGAFTTLNGSLTLELVNEKFWKVSRPLELCYAPTKDPK
RING finger
Cys1-Xaa-hydrophobic aa-Cys2-Xaa9-27-Cys3-Xaa1-3-His-Xaa-hydrophobic aa-Cys4-Xaa2-Cys5-hydrophobic aa-Xaa5-47-Cys6-Xaa2-Cys7
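The consensus above translates fairly directly into a regular expression. In this sketch the set of "hydrophobic" residues is an assumption, since the slide does not list them, and the test string is a fragment copied from the RING1 sequence shown earlier rather than the full protein.

```python
import re

H = "[AVLIMFWY]"  # assumption: which residues count as hydrophobic

RING_PATTERN = re.compile(
    rf"C.{H}C"     # Cys1-Xaa-hydrophobic-Cys2
    r".{9,27}C"    # Xaa(9-27)-Cys3
    r".{1,3}H"     # Xaa(1-3)-His
    rf".{H}C"      # Xaa-hydrophobic-Cys4
    r".{2}C"       # Xaa2-Cys5
    rf"{H}"        # hydrophobic
    r".{5,47}C"    # Xaa(5-47)-Cys6
    r".{2}C"       # Xaa2-Cys7
)

# Fragment of the RING1 sequence from the slide above.
fragment = "SLHSELMCPICLDMLKNTMTTKECLHRFCSDCIVTALRSGNKECPTCRKKLVSKRS"

match = RING_PATTERN.search(fragment)
print(match.group(0))
```

A regular expression only checks spacing and residue identity; it says nothing about whether the region actually folds around zinc.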
RING1 peptide
55 aa synthetic peptide (residues 12-66 in RING1 seq) RING finger
metal binding ---> prefers Zinc
cobalt
cadmium
copper
RING1 function
1992: No known function (not published until 1993)
2004: Inhibits transactivation of recombination signal binding protein-J (RBP-J) (Hongyan et al.)
Ubiquitin-protein ligases
Pfam database
http://pfam.sanger.ac.uk/
Database that contains large collection of protein domains and families
Represented as sequence alignments and HMMs
List of key features about protein
New interface that combined other Pfam versions
New updates have made it more user-friendly
SMART
Multiple sequence alignment of members
>400 domains in >54,000 different proteins
Searches database using HMMs
http://smart.embl-heidelberg.de/