
DNA Motif and protein domain discovery

Presented by:

Deeter Neumann

Peter St. Andre

PDB: zinc finger 224; PDB: human enhancer binding protein

Outline

What are DNA motifs & protein domains?

Their importance and function

motif-finding algorithms

locating a domain/motif experimentally

available programs: Pfam & SMART

Taken from wikimedia.org

What are DNA sequence motifs?

“Sequence motifs are short recurring patterns in DNA that are presumed to have biological function.” (D’haeseleer, P. Nature Biotechnology 24, 423-425 (2006))
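To make "short recurring patterns" concrete, here is a minimal Python sketch that scans DNA sequences for a motif written as an IUPAC consensus string (for example `S` = C or G) by translating it into a regular expression. The consensus `TGASTCA`, the toy sequences, and the helper names `consensus_to_regex` and `find_motif` are hypothetical, chosen only for illustration.

```python
import re

# IUPAC nucleotide ambiguity codes as regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a regular expression."""
    return "".join(IUPAC[base] for base in consensus.upper())

def find_motif(consensus, sequences):
    """Map each sequence name to the 0-based start of every (possibly overlapping) match."""
    # The lookahead (?=...) lets overlapping occurrences be reported.
    pattern = re.compile("(?=" + consensus_to_regex(consensus) + ")")
    return {name: [m.start() for m in pattern.finditer(seq.upper())]
            for name, seq in sequences.items()}

# Hypothetical toy sequences.
seqs = {"seq1": "GGTGACTCAGGATGAGTCATT", "seq2": "CCCCCCCC"}
print(find_motif("TGASTCA", seqs))  # {'seq1': [2, 12], 'seq2': []}
```

The degenerate position `S` is what lets both `TGACTCA` and `TGAGTCA` in `seq1` count as occurrences of the same motif.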

Image taken from bio.miami.edu

Why are DNA sequence motifs important to know?

Indicate common structural protein domains

Identify similar function

Other possible biological functions, e.g., transcription factor binding, mRNA processing

What is the function of DNA-binding domains?

specific and non-specific interactions with DNA

permit binding of a transcription factor to its target gene

sequence-specific recognition

Human Molecular Genetics 3; Strachan & Read

What are protein domains?

Protein sequences and structures that evolve, function, and exist independently of the rest of the protein

They often form functional units, such as metal-binding domains

Image of human zinc finger domain

Taken from .ionchannels.org


Why are Protein Domains Important?


Bind to other molecules in the cell

Signal transduction pathways

Genetic engineering of novel proteins

Pharmaceutical importance

Algorithmic Approaches for both DNA motif and protein domain searches

Three general approaches are used:

Enumeration

Deterministic optimization

Probabilistic optimization

Enumeration

Employs the broadest approach

Looks at all possible motifs

Few constraints are placed on the search

Enumeration, cont.

Key point: Covers all possible sequence motifs with few limitations

Pros: Does not get stuck in a local optimum

Cons: May overlook subtle patterns

Programs like WeederWeb and YMF use this type of algorithm
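The enumerative idea can be sketched as follows: every possible DNA word of length k is generated and scored by how many input sequences contain it, optionally allowing a few mismatches. This is a simplified illustration that assumes a plain "number of sequences containing the word" score; it is not the scoring used by WeederWeb or YMF, and the function names are hypothetical.

```python
from itertools import product

def occurs(word, seq, max_mismatches):
    """True if `word` occurs in `seq` with at most `max_mismatches` substitutions."""
    k = len(word)
    return any(sum(a != b for a, b in zip(word, seq[i:i + k])) <= max_mismatches
               for i in range(len(seq) - k + 1))

def enumerate_motifs(sequences, k, max_mismatches=0):
    """Score every one of the 4**k possible words by how many sequences contain it."""
    scores = {}
    for letters in product("ACGT", repeat=k):
        word = "".join(letters)
        scores[word] = sum(occurs(word, s, max_mismatches) for s in sequences)
    # Best-supported words first; ties broken alphabetically so output is reproducible.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

seqs = ["AAACGTAC", "CCACGTTT", "GGGACGTG"]
print(enumerate_motifs(seqs, k=4)[:1])  # [('ACGT', 3)]
```

Because all 4^k words are tested, the search cannot get stuck in a local optimum, but the cost grows exponentially with k, which is one reason enumerative tools limit the motif length and the number of allowed mismatches.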

WeederWeb

WeederWeb Results

Deterministic optimization

Uses expectation maximization (EM) to refine a position weight matrix (PWM) describing the motif

MEME is one program that uses this approach
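A minimal sketch of how EM refines a position weight matrix, assuming the simplest "one occurrence per sequence" (OOPS) model, a uniform background of 0.25 per base, and a single seed word supplied by the user. MEME itself starts EM from many points and supports other site models; the names here (`em_oops`, `pwm_from_sites`, `site_ratio`, `consensus`) and the toy sequences are hypothetical.

```python
BASES = "ACGT"

def pwm_from_sites(sites, pseudocount=0.5):
    """Position weight matrix: one {base: probability} dict per motif column."""
    pwm = []
    for j in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[j]] += 1
        total = sum(counts.values())
        pwm.append({b: counts[b] / total for b in BASES})
    return pwm

def site_ratio(pwm, kmer, background=0.25):
    """Likelihood ratio P(kmer | motif) / P(kmer | uniform background)."""
    ratio = 1.0
    for j, base in enumerate(kmer):
        ratio *= pwm[j][base] / background
    return ratio

def em_oops(sequences, seed, iterations=50, pseudocount=0.5):
    """EM for one motif occurrence per sequence (OOPS), started from a seed word."""
    width = len(seed)
    pwm = pwm_from_sites([seed], pseudocount)
    for _ in range(iterations):
        counts = [{b: pseudocount for b in BASES} for _ in range(width)]
        for seq in sequences:
            windows = [seq[i:i + width] for i in range(len(seq) - width + 1)]
            # E-step: posterior probability that the motif starts at each window.
            ratios = [site_ratio(pwm, w) for w in windows]
            total = sum(ratios)
            # Accumulate expected base counts, weighted by those posteriors.
            for window, r in zip(windows, ratios):
                for j, base in enumerate(window):
                    counts[j][base] += r / total
        # M-step: renormalise the expected counts into a new PWM.
        pwm = [{b: col[b] / sum(col.values()) for b in BASES} for col in counts]
    return pwm

def consensus(pwm):
    """Most probable base in each column."""
    return "".join(max(col, key=col.get) for col in pwm)

seqs = ["GCGCTTGACAGCGC", "CGTTGACAGGCCGG", "TTGACAGCGCGCGC", "GGCCGGCCTTGACA"]
print(consensus(em_oops(seqs, seed="TTGACA")))  # TTGACA
```

Given the same seed, EM always returns the same PWM, which is why EM-based tools typically try many starting points and keep the best result.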

What does this mean?

Deterministic optimization, cont.

Deterministic optimization, cont.

Taken from ws.nbcr.net/app1234127263839/meme.html

Probabilistic optimization

Uses a Gibbs sampling approach: a randomized implementation of the expectation maximization model

How is this applied?

Probabilistic optimization, cont.

Selects random starting sites; each candidate site is weighted against the current motif model built from the other sequences

Allows the program to add or remove sites and continuously update the motif
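The site-sampling idea can be sketched like this, assuming one site per sequence, a fixed motif width and a uniform background. It illustrates Gibbs sampling in general rather than the code of any program named here, and the function names are hypothetical. In each iteration one sequence is left out, a profile is built from the sites in the other sequences, and the left-out sequence's site is redrawn at random in proportion to how well each position fits that profile.

```python
import random

BASES = "ACGT"

def build_profile(sites, width, pseudocount=0.5):
    """Column-wise base probabilities from aligned sites, with pseudocounts."""
    columns = []
    for j in range(width):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[j]] += 1
        total = sum(counts.values())
        columns.append({b: c / total for b, c in counts.items()})
    return columns

def gibbs_sampler(sequences, width, iterations=500, seed=0):
    """Site-sampling Gibbs motif search: one site per sequence, uniform background."""
    rng = random.Random(seed)
    # Start from a random site in every sequence.
    starts = [rng.randrange(len(s) - width + 1) for s in sequences]
    for _ in range(iterations):
        # Leave one sequence out and build the motif model from the others' sites.
        k = rng.randrange(len(sequences))
        others = [s[p:p + width]
                  for i, (s, p) in enumerate(zip(sequences, starts)) if i != k]
        model = build_profile(others, width)
        seq = sequences[k]
        # Weight each candidate site by its motif / background likelihood ratio.
        weights = []
        for i in range(len(seq) - width + 1):
            w = 1.0
            for j, base in enumerate(seq[i:i + width]):
                w *= model[j][base] / 0.25
            weights.append(w)
        # Sample the new site in proportion to its weight (not simply the best one).
        starts[k] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts, [s[p:p + width] for s, p in zip(sequences, starts)]

seqs = ["GCGCTTGACAGCGC", "CGTTGACAGGCCGG", "TTGACAGCGCGCGC", "GGCCGGCCTTGACA"]
starts, sites = gibbs_sampler(seqs, width=6, seed=1)
print(starts, sites)
```

Because sites are sampled rather than chosen greedily, different random seeds can give different answers; running several seeds and keeping the best-scoring motif is common practice.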


Results

Which one to use?

Recent research showed that enumeration approaches worked very well

Generally accepted that no one approach is the best

Programs that incorporate several approaches work best

Important to rerun programs, since randomized methods such as Gibbs sampling can give different results on each run

Examples of programs

WeederWeb is a web-based interface with an enumerative approach

YMF is another enumerative program

MEME is an online program that uses a deterministic optimization approach

MotifSampler is a program that combines Gibbs sampling with a third-order Markov background model
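In MotifSampler the Markov model describes the background sequence against which candidate sites are scored. The sketch below is a simplified illustration, not MotifSampler's implementation: it estimates a third-order model (probability of each base given the previous three) with add-one pseudocounts and uses it to score a sequence. The function names, the toy training sequence, and the fallback to a uniform distribution for unseen contexts are assumptions of this sketch.

```python
import math
from collections import defaultdict

def train_markov(sequences, order=3, pseudocount=1.0):
    """Estimate P(base | preceding `order` bases) from background sequences."""
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        for i in range(order, len(seq)):
            counts[seq[i - order:i]][seq[i]] += 1
    model = {}
    for context, following in counts.items():
        total = sum(following.values()) + 4 * pseudocount
        model[context] = {b: (following.get(b, 0.0) + pseudocount) / total
                          for b in "ACGT"}
    return model

def log_prob(seq, model, order=3):
    """Natural-log probability of seq[order:] given each base's preceding context.
    Contexts never seen in training fall back to a uniform distribution."""
    uniform = {b: 0.25 for b in "ACGT"}
    return sum(math.log(model.get(seq[i - order:i], uniform)[seq[i]])
               for i in range(order, len(seq)))

background = train_markov(["ACGTACGTACGT"])
print(background["ACG"]["T"])  # 4/7, since ACG is always followed by T here
```

A higher-order background captures short repeats and compositional bias, so sites that merely resemble common background words score lower than they would against a flat 0.25 model.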

YMF

YMF results

Measurements used to score sequence motifs

Three main statistics used:

Information content

Log likelihood

MAP score
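Two of these statistics can be written down directly for a position weight matrix. The sketch below assumes a uniform background of 0.25 per base and PWM columns stored as base-to-probability dictionaries (a hypothetical representation); the log likelihood is shown in one common form, a log-likelihood ratio against the background. The MAP score, which also depends on the sampling model, is not shown.

```python
import math

UNIFORM = {b: 0.25 for b in "ACGT"}

def information_content(pwm, background=UNIFORM):
    """Total information content in bits: sum over columns j and bases b of
    p_jb * log2(p_jb / q_b); zero-probability entries contribute nothing."""
    return sum(p * math.log2(p / background[b])
               for col in pwm for b, p in col.items() if p > 0)

def log_likelihood_ratio(sites, pwm, background=UNIFORM):
    """Sum over aligned sites of ln P(site | PWM) - ln P(site | background).
    Assumes the PWM has no zero entries for observed bases (use pseudocounts)."""
    return sum(math.log(pwm[j][base] / background[base])
               for site in sites for j, base in enumerate(site))

pwm = [{"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},
       {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}]
print(information_content(pwm))  # 2.0
```

A perfectly conserved column contributes the maximum 2 bits against a uniform DNA background, while a column that matches the background contributes 0.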

Other measures of motif quality

Group specificity, or site specificity: probability of having a certain number of target sequences with the site in question

Sequence specificity: accounts for both the number of sequences with the site in question and the number of sites per sequence

Positional bias, or uniformity: looks at how uniformly the sites in question are distributed with respect to the transcription start sites of the genes
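Group specificity is often computed as a hypergeometric tail probability: if a group of n genes is drawn at random from N genes, K of which carry the site, how likely is it that at least k of the group carry it? The sketch below assumes that formulation; the exact definition used by a given program may differ, and the function name and example numbers are hypothetical.

```python
from math import comb

def group_specificity(total_genes, group_size, genes_with_site, group_with_site):
    """Hypergeometric tail: P(at least `group_with_site` of a random group of
    `group_size` genes carry the site, when `genes_with_site` of `total_genes` do)."""
    hits = range(group_with_site, min(group_size, genes_with_site) + 1)
    favourable = sum(comb(genes_with_site, k)
                     * comb(total_genes - genes_with_site, group_size - k)
                     for k in hits)
    return favourable / comb(total_genes, group_size)

# Hypothetical numbers: 8 of 20 target genes have the site, versus 300 of 6000 overall.
print(group_specificity(6000, 20, 300, 8))
```

A small value means the site is much more common in the target group than expected by chance, which is what a group-specific motif should show.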

Identification and preliminary characterization of a protein motif related to the zinc finger

Lovering et al. (1993)

What is a zinc finger?

PDB; single zinc finger in solution

autonomously folding domain

structural motif

zinc required for folding and DNA interactions

part of proteins that regulate DNA

Classic zinc finger

conserved cysteines and histidines

binds zinc in a tetrahedral arrangement

a two-stranded antiparallel β-sheet and an α-helix

Image from Wikipedia
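The conserved cysteine and histidine spacing of a classic zinc finger is often summarised as the consensus C-x(2,4)-C-x(12)-H-x(3,5)-H. Below is a minimal sketch that converts such PROSITE-style patterns into Python regular expressions. It handles only the basic elements (x, single residues, [..] allowed sets, {..} excluded sets, repeat counts, and the < and > terminal anchors); the helper name and the test sequence are hypothetical.

```python
import re

ELEMENT = re.compile(r"(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)")

def prosite_to_regex(pattern):
    """Convert a simple PROSITE-style pattern such as 'C-x(2,4)-C' to a regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        n_anchor, residue, low, high, c_anchor = m.groups()
        if residue == "x":
            regex = "."                          # any residue
        elif residue.startswith("{"):
            regex = "[^" + residue[1:-1] + "]"   # any residue except these
        else:
            regex = residue                      # single residue or [..] set
        if low:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if n_anchor else "") + regex + ("$" if c_anchor else ""))
    return "".join(parts)

C2H2 = prosite_to_regex("C-x(2,4)-C-x(12)-H-x(3,5)-H")
print(C2H2)  # C.{2,4}C.{12}H.{3,5}H

# Hypothetical sequence containing one C2H2-like segment.
seq = "MACKKCGRSFSRSDELTRHIRTHTG"
match = re.search(C2H2, seq)
print(match.start(), match.end())  # 2 23
```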

Figure 1A

Lovering et al.

Actual RING1 sequence

MTTPANAQNASKTWELSLYELHRTPQEAIMDGTEIAVSPRSLHSELMCPICLDMLKNTMTTKECLHRFCSDCIVTALRSGNKECPTCRKKLVSKRSLRPDPNFDALISKIYPSREEYEAHQDRVLIRLSRLHNQQALSSSIEEGLRMQAMHRAQRVRRPIPGSDQTTTMSGGEGEPGEGEGDGEDVSSDSAPDSAPGPAPKRPRGGGAGGSSVGTGGGGTGGVGGGAGSEDSGDRGGTLGGGTLGPPSPPGAPSPPEPGGEIELVFRPHPLLVEKGEYCQTRYVKTTGNATVDHLSKYLALRIALERRQQQEAGEPGGPGGGASDTGGPDGCGGEGGGAGGGDGPEEPALPSLEGVSEKQYTIYIAPGGGAFTTLNGSLTLELVNEKFWKVSRPLELCYAPTKDPK

RING finger

Cys1-Xaa-hydrophobic aa-Cys2-Xaa(9-27)-Cys3-Xaa(1-3)-His-Xaa-hydrophobic aa-Cys4-Xaa(2)-Cys5-hydrophobic aa-Xaa(5-47)-Cys6-Xaa(2)-Cys7
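The RING finger consensus can be turned into a regular expression and applied to the RING1 sequence. In this sketch the set of "hydrophobic" residues (`[AVILMFWY]`) is an assumption, since the slide does not list them; only an N-terminal fragment of the sequence shown above is scanned, and positions follow that sequence's numbering. The names `PHI`, `RING_FINGER` and `find_ring_fingers` are hypothetical.

```python
import re

# Assumed "hydrophobic amino acid" set; the slide does not define it.
PHI = "[AVILMFWY]"

RING_FINGER = re.compile(
    "C." + PHI + "C"           # Cys1-Xaa-hydrophobic-Cys2
    + ".{9,27}C"               # Xaa(9-27)-Cys3
    + ".{1,3}H." + PHI + "C"   # Xaa(1-3)-His-Xaa-hydrophobic-Cys4
    + "..C" + PHI              # Xaa(2)-Cys5-hydrophobic
    + ".{5,47}C..C"            # Xaa(5-47)-Cys6-Xaa(2)-Cys7
)

def find_ring_fingers(protein):
    """(1-based start, matched segment) for each non-overlapping RING-like match."""
    return [(m.start() + 1, m.group()) for m in RING_FINGER.finditer(protein)]

# N-terminal fragment of the RING1 sequence shown above.
RING1_NTERM = ("MTTPANAQNASKTWELSLYELHRTPQEAIMDGTEIAVSPRSLHSELMCPICLDMLKNTMTTKE"
               "CLHRFCSDCIVTALRSGNKECPTCRKKLVSKRSLRPDPNFDALISKIYPSREEYEAHQDRVLIRLSRLHNQQ")
print(find_ring_fingers(RING1_NTERM))
# [(48, 'CPICLDMLKNTMTTKECLHRFCSDCIVTALRSGNKECPTC')]
```

The match contains the seven cysteines and one histidine (C3HC4) that the consensus requires.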

Figure 1B

Fig. 1B Lovering et al.

Gene expression is similar in a variety of cell lines

Figure 2

Lovering et al.

DNA binding

regulation

recombination

repair

RING1 peptide

55-aa synthetic RING finger peptide (residues 12-66 of the RING1 sequence)

metal binding: prefers zinc

cobalt

cadmium

copper

Figure 3A

Fig. 3A Lovering et al.

Solid line (___): cobalt

Dashed line (-----): zinc

S-Co(II)

Co(II) d-d transitions

Figure 4A

Zinc-dependent binding

RING1 function

1992: No known function (not published until 1993)

2004: Inhibits transactivation of recombination signal binding protein-J (RBP-J) (Hongyan et al.)

Ubiquitin-protein ligases

Pfam database

http://pfam.sanger.ac.uk/

Database that contains a large collection of protein domains and families

Represented as sequence alignments and HMMs

List of key features of the protein

New interface that combines the other Pfam versions

New updates have made it more user-friendly

Pfam search of RING1

Pfam search

Pfam search results

Pfam search results

Pfam link out

HMM logo of sequence motif

SMART

Multiple sequence alignment of members

>400 domains in >54,000 different proteins

Searches database using HMMs

http://smart.embl-heidelberg.de/

SMART

2 different modes

normal: Swiss-Prot, SP-TrEMBL, and Ensembl

genomic: proteomes of sequenced genomes

SMART


More motif madness


PRINTS


PROSITE


Questions?


How primitive is this RING-finger motif? The authors only discuss genes containing this motif that come from eukaryotes. Is this motif found in prokaryotes as well?