Multiple sequence alignments and motif discovery

Page 1: Multiple sequence alignments  and motif discovery

Multiple sequence alignments and motif discovery

Tutorial 5

Page 2: Multiple sequence alignments  and motif discovery

• Multiple sequence alignment
  – ClustalW
  – Muscle

• Motif discovery
  – MEME
  – Jaspar

Multiple sequence alignments and motif discovery

Page 3: Multiple sequence alignments  and motif discovery

• More than two sequences
  – DNA
  – Protein

• Evolutionary relation
  – Homology, phylogenetic tree
  – Detect motif

Multiple Sequence Alignment

Aligned sequences:

GTCGTAGTCG-GC-TCGAC
GTC-TAG-CGAGCGT-GAT
GC-GAAG-AG-GCG-AG-C
GCCGTCG-CG-TCGTA-AC

Unaligned sequences:

GTCGTAGTCGGCTCGAC
GTCTAGCGAGCGTGAT
GCGAAGAGGCGAGC
GCCGTCGCGTCGTAAC

[Figure: tree with leaves A, B, C, D]

Page 4: Multiple sequence alignments  and motif discovery

• Dynamic Programming
  – Optimal alignment
  – Exponential in the number of sequences

• Progressive
  – Efficient
  – Heuristic
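To make "exponential in the number of sequences" concrete, the sketch below counts the cells of the full dynamic-programming lattice and the number of predecessor moves per cell. The function names are illustrative and not taken from any alignment tool.

```python
def dp_table_cells(lengths):
    """Cells in the full DP lattice: the product of (length + 1) over all sequences."""
    cells = 1
    for n in lengths:
        cells *= n + 1
    return cells


def predecessors_per_cell(k):
    """Each cell has up to 2**k - 1 predecessors:
    every non-empty subset of the k sequences may advance by one residue."""
    return 2 ** k - 1


if __name__ == "__main__":
    # Table size for k sequences of length 100 each.
    for k in range(2, 7):
        print(k, dp_table_cells([100] * k), predecessors_per_cell(k))
```

Each additional sequence of length 100 multiplies the table size by 101, which is why progressive heuristics are used in practice.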

Multiple Sequence Alignment

Page 5: Multiple sequence alignments  and motif discovery

ClustalW

“CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice”, J D Thompson et al

Pairwise alignment – calculate distance matrix

Guide tree

Progressive alignment using the guide tree
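The first two steps can be sketched in a few lines. This is a simplified stand-in, not ClustalW's actual procedure: it assumes unit-cost edit distance as the pairwise alignment score and average-linkage (UPGMA-style) clustering for the tree, whereas ClustalW uses its own pairwise scoring and neighbour-joining. All function names are hypothetical.

```python
from itertools import combinations


def edit_distance(a, b):
    """Unit-cost global alignment (Levenshtein) distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # gap in b
                           cur[j - 1] + 1,              # gap in a
                           prev[j - 1] + (ca != cb)))   # match or mismatch
        prev = cur
    return prev[-1]


def distance_matrix(seqs):
    """Step 1: all pairwise distances, normalised by the longer sequence."""
    dist = {}
    for x, y in combinations(list(seqs), 2):
        d = edit_distance(seqs[x], seqs[y]) / max(len(seqs[x]), len(seqs[y]))
        dist[frozenset((x, y))] = d
    return dist


def guide_tree(seqs):
    """Step 2: average-linkage clustering; returns the tree as nested tuples."""
    dist = distance_matrix(seqs)
    clusters = {name: (name,) for name in seqs}   # node -> leaf names below it

    def avg(c1, c2):
        pairs = [(x, y) for x in clusters[c1] for y in clusters[c2]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda p: avg(*p))
        leaves = clusters.pop(a) + clusters.pop(b)
        clusters[(a, b)] = leaves                 # merged node replaces its children
    return next(iter(clusters))


if __name__ == "__main__":
    toy = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "TTTTGGGG", "D": "TTTTGGGA"}
    print(guide_tree(toy))
```

The nested tuples give the order in which the progressive step would merge sequences and sub-alignments: the closest pairs first, then the groups.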

Page 6: Multiple sequence alignments  and motif discovery

ClustalW

• Progressive
  – At each step align two existing alignments or sequences
  – Gaps present in older alignments remain fixed

-TGTTAAC
-TGT-AAC
-TGT--AC
ATGT---C
ATGT-GGC
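A minimal sketch of why older gaps stay fixed: when a new sequence is added, the existing alignment is treated as a list of immutable columns. The only allowed operations are matching a column to a residue, skipping a column, or inserting a whole new gap column. The scoring values (match 1, mismatch -1, gap -1) and function names are assumptions for illustration, not ClustalW's parameters.

```python
GAP = "-"


def column_score(column, residue, match=1, mismatch=-1, gap=-1):
    """Score one new residue against every row of an existing alignment column."""
    return sum(gap if c == GAP else match if c == residue else mismatch
               for c in column)


def add_sequence(alignment, seq, gap=-1):
    """Align seq to a fixed alignment; returns the old rows plus the new row."""
    cols = list(zip(*alignment))
    m, n, rows = len(cols), len(seq), len(alignment)
    # Cost of placing a gap in the new sequence opposite each old column.
    skip = [gap * sum(c != GAP for c in col) for col in cols]
    score = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        score[i][0] = score[i - 1][0] + skip[i - 1]
    for j in range(1, n + 1):
        score[0][j] = score[0][j - 1] + gap * rows
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + column_score(cols[i - 1], seq[j - 1]),
                score[i - 1][j] + skip[i - 1],     # gap in the new sequence
                score[i][j - 1] + gap * rows,      # new gap column in all old rows
            )
    # Traceback: old columns are copied unchanged, so their gaps persist.
    old_rows, new_row = [[] for _ in alignment], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == (
                score[i - 1][j - 1] + column_score(cols[i - 1], seq[j - 1])):
            column, residue = cols[i - 1], seq[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + skip[i - 1]:
            column, residue = cols[i - 1], GAP
            i -= 1
        else:
            column, residue = GAP * rows, seq[j - 1]
            j -= 1
        for row, c in zip(old_rows, column):
            row.append(c)
        new_row.append(residue)
    return ["".join(reversed(r)) for r in old_rows + [new_row]]


if __name__ == "__main__":
    for row in add_sequence(["TGTTAAC", "TGT-AAC", "TGT--AC"], "ATGTGGC"):
        print(row)
```

Because old columns are never edited, an early misplaced gap cannot be repaired later, which is the price of the progressive heuristic.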

Page 7: Multiple sequence alignments  and motif discovery

ClustalW - Input

http://www.ebi.ac.uk/Tools/clustalw2/index.html

Input sequences

Gap scoring

Scoring matrix

Email address

Output format

Page 8: Multiple sequence alignments  and motif discovery

ClustalW - Output

Match strength in decreasing order: * (identical), : (strongly similar), . (weakly similar)
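The symbol line can be reproduced with a short function. The "strong" and "weak" residue groups below are written from memory of the Clustal documentation and should be checked against it before use. The example alignment is the protein alignment shown later in this tutorial.

```python
# Residue groups as recalled from the Clustal documentation (an assumption to verify).
STRONG = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND", "SNDEQK", "NDEQHK",
        "NEQHRK", "FVLIM", "HFY"]


def column_symbol(column):
    """Return '*', ':', '.' or ' ' for one alignment column."""
    residues = set(column)
    if "-" in residues:
        return " "
    if len(residues) == 1:
        return "*"                       # identical in every sequence
    if any(residues <= set(group) for group in STRONG):
        return ":"                       # all residues within one strong group
    if any(residues <= set(group) for group in WEAK):
        return "."                       # all residues within one weak group
    return " "


def conservation_line(alignment):
    """Build the symbol line printed under a Clustal alignment block."""
    return "".join(column_symbol(col) for col in zip(*alignment))


if __name__ == "__main__":
    rows = ["YDEEGGDAEE", "YDEEGGDAEE", "YGEEGADYED",
            "YDEEGADYEE", "YNDEGDDYEE", "YHDEGAADEE"]
    for row in rows:
        print(row)
    print(conservation_line(rows))
```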

Page 12: Multiple sequence alignments  and motif discovery

ClustalW - Output

Pairwise alignment scores

Building alignment

Final score

Building tree

Page 14: Multiple sequence alignments  and motif discovery

ClustalW - Output

Sequence names

Sequence positions

Match strength in decreasing order: * : .

Page 16: Multiple sequence alignments  and motif discovery

ClustalW - Output

Branch length

Page 19: Multiple sequence alignments  and motif discovery

http://www.ebi.ac.uk/Tools/muscle/index.html

Muscle

Page 20: Multiple sequence alignments  and motif discovery

Muscle - Output

Page 21: Multiple sequence alignments  and motif discovery

What’s the difference between Muscle and ClustalW?

ClustalW Muscle

Page 22: Multiple sequence alignments  and motif discovery

http://www.megasoftware.net/index.html

Page 23: Multiple sequence alignments  and motif discovery

Can we find motifs using multiple sequence alignment?

Position   1    2    3    4    5    6    7    8    9    10
A          0    0    0    0    0    1/2  1/6  1/3  0    0
D          0    1/2  1/3  0    0    1/6  5/6  1/6  0    1/6
E          0    0    2/3  1    0    0    0    0    1    5/6
G          0    1/6  0    0    1    1/3  0    0    0    0
H          0    1/6  0    0    0    0    0    0    0    0
N          0    1/6  0    0    0    0    0    0    0    0
Y          1    0    0    0    0    0    0    1/2  0    0

  1 3 5 7 9
..YDEEGGDAEE..
..YDEEGGDAEE..
..YGEEGADYED..
..YDEEGADYEE..
..YNDEGDDYEE..
..YHDEGAADEE..
  * :**   *:

Motif: a widespread pattern with biological significance
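A position frequency matrix of this kind can be computed directly from the aligned block. The sketch below uses exact fractions so the output can be compared with the slide; the function name is illustrative.

```python
from collections import Counter
from fractions import Fraction


def frequency_matrix(alignment):
    """Per-column residue frequencies of an ungapped alignment block."""
    n = len(alignment)
    matrix = []
    for column in zip(*alignment):
        counts = Counter(column)
        matrix.append({res: Fraction(c, n) for res, c in counts.items()})
    return matrix


if __name__ == "__main__":
    rows = ["YDEEGGDAEE", "YDEEGGDAEE", "YGEEGADYED",
            "YDEEGADYEE", "YNDEGDDYEE", "YHDEGAADEE"]
    for position, column in enumerate(frequency_matrix(rows), start=1):
        summary = ", ".join(f"{res}={freq}" for res, freq in sorted(column.items()))
        print(position, summary)
```

Every column sums to 1, which is a quick sanity check on any hand-built matrix.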

Page 24: Multiple sequence alignments  and motif discovery

Can we find motifs using multiple sequence alignment?

YES! NO

Page 25: Multiple sequence alignments  and motif discovery

MEME – Multiple EM* for Motif Elicitation

• http://meme.sdsc.edu/
• Motif discovery from unaligned sequences
  – Genomic or protein sequences
• Flexible model of motif presence (motif can be absent in some sequences or appear several times in one sequence)

*Expectation-maximization
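The expectation-maximization idea can be sketched for the simplest case, in which every sequence is assumed to contain exactly one motif occurrence. This is a teaching sketch under that assumption, not MEME's implementation: MEME also supports the more flexible occurrence models listed above. The uniform background, the pseudocount, and the seed word are all illustrative choices.

```python
ALPHABET = "ACGT"


def seed_pwm(seed, strong=0.7):
    """Initial motif model built from one seed word."""
    weak = (1.0 - strong) / (len(ALPHABET) - 1)
    return [{a: (strong if a == ch else weak) for a in ALPHABET} for ch in seed]


def e_step(seqs, pwm, background=0.25):
    """E-step: posterior probability of each motif start position per sequence."""
    w = len(pwm)
    posteriors = []
    for s in seqs:
        ratios = []
        for start in range(len(s) - w + 1):
            r = 1.0
            for k in range(w):
                r *= pwm[k][s[start + k]] / background   # motif vs background odds
            ratios.append(r)
        total = sum(ratios)
        posteriors.append([r / total for r in ratios])
    return posteriors


def m_step(seqs, posteriors, w, pseudocount=0.1):
    """M-step: re-estimate the motif from posterior-weighted residue counts."""
    counts = [{a: pseudocount for a in ALPHABET} for _ in range(w)]
    for s, post in zip(seqs, posteriors):
        for start, p in enumerate(post):
            for k in range(w):
                counts[k][s[start + k]] += p
    return [{a: c[a] / sum(c.values()) for a in ALPHABET} for c in counts]


def run_em(seqs, seed, iterations=20):
    """Alternate E and M steps; returns the motif model and final posteriors."""
    pwm = seed_pwm(seed)
    for _ in range(iterations):
        pwm = m_step(seqs, e_step(seqs, pwm), len(seed))
    return pwm, e_step(seqs, pwm)


def consensus(pwm):
    """Most probable residue at each motif position."""
    return "".join(max(col, key=col.get) for col in pwm)


if __name__ == "__main__":
    seqs = ["CCATTGACGTCAGGAC", "TGACGTCACCGGTTAA", "GGTTCCAATGACGTCA"]
    pwm, post = run_em(seqs, seed="TGACGTCA")
    print(consensus(pwm))
    print([max(range(len(p)), key=p.__getitem__) for p in post])
```

Here the seed is a word taken from the first sequence. In a real search the seed is unknown, so a practical approach is to try many candidate words and keep the best-scoring result.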

Page 26: Multiple sequence alignments  and motif discovery

MEME - Input

Email address

Input file (fasta file)

How many times in each sequence?

How many motifs?

How many sites?

Range of motif lengths

Page 27: Multiple sequence alignments  and motif discovery

MEME - Output

Motif score

Page 28: Multiple sequence alignments  and motif discovery

MEME - Output

Motif length

Number of times

Motif score

Page 29: Multiple sequence alignments  and motif discovery

MEME - Output

Low uncertainty = high information content
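The relation can be written out per motif column: information content is the maximum possible entropy, log2 of the alphabet size, minus the observed Shannon entropy. The sketch below assumes a DNA alphabet of size 4 by default and ignores the small-sample correction that logo tools may apply.

```python
import math


def information_content(column, alphabet_size=4):
    """Bits of information in one motif column of residue probabilities."""
    entropy = -sum(p * math.log2(p) for p in column if p > 0)
    return math.log2(alphabet_size) - entropy


if __name__ == "__main__":
    print(information_content([1.0, 0.0, 0.0, 0.0]))      # fully conserved column
    print(information_content([0.5, 0.5, 0.0, 0.0]))      # two equally likely bases
    print(information_content([0.25, 0.25, 0.25, 0.25]))  # no preference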

Page 30: Multiple sequence alignments  and motif discovery

MEME - Output

Multilevel Consensus

Page 31: Multiple sequence alignments  and motif discovery

Sequence names

Position in sequence

Strength of match

Motif within sequence

MEME - Output

Page 32: Multiple sequence alignments  and motif discovery

Overall strength of motif matches

Motif location in the input sequence

MEME - Output

Sequence names

Page 33: Multiple sequence alignments  and motif discovery

MAST

• Searches for motifs (one or more) in sequence databases:
  – Like BLAST, but with motifs as input
  – Similar to iterations of PSI-BLAST

• Profile defines strength of match
  – Multiple motif matches per sequence
  – Combined E-value for all motifs

• MEME uses MAST to summarize results:
  – Each MEME result is accompanied by the MAST result for searching the discovered motifs on the given sequences.

http://meme.sdsc.edu/meme4_4_0/cgi-bin/mast.cgi

Page 34: Multiple sequence alignments  and motif discovery

MAST - Input

Email address

Input file (motifs)

Database

Page 35: Multiple sequence alignments  and motif discovery

JASPAR

• Profiles
  – Transcription factor binding sites
  – Multicellular eukaryotes
  – Derived from published collections of experiments

• Open data access

Page 36: Multiple sequence alignments  and motif discovery

JASPAR

• Profiles
  – Modeled as matrices
  – Can be converted into a PSSM for scanning genomic sequences

Position   1    2    3    4    5    6    7    8    9    10
A          0    0    0    0    0    1/2  1/6  1/3  0    0
D          0    1/2  1/3  0    0    1/6  5/6  1/6  0    1/6
E          0    0    2/3  1    0    0    0    0    1    5/6
G          0    1/6  0    0    1    1/3  0    0    0    0
H          0    1/6  0    0    0    0    0    0    0    0
N          0    1/6  0    0    0    0    0    0    0    0
Y          1    0    0    0    0    0    0    1/2  0    0
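Converting a count matrix into a PSSM and scanning a sequence with it can be sketched as follows. The DNA counts below are hypothetical and not taken from JASPAR. The uniform background and the pseudocount of 1 are common but arbitrary choices, and the function names are illustrative.

```python
import math


def to_pssm(counts, background=None, pseudocount=1.0):
    """Turn per-position base counts into log2-odds scores against a background."""
    bases = sorted(counts)
    background = background or {b: 1.0 / len(bases) for b in bases}
    width = len(next(iter(counts.values())))
    pssm = []
    for k in range(width):
        total = sum(counts[b][k] for b in bases) + pseudocount * len(bases)
        pssm.append({
            b: math.log2(((counts[b][k] + pseudocount) / total) / background[b])
            for b in bases
        })
    return pssm


def scan(sequence, pssm):
    """Score every window of the sequence; returns (start, score) pairs."""
    w = len(pssm)
    return [(i, sum(pssm[k][sequence[i + k]] for k in range(w)))
            for i in range(len(sequence) - w + 1)]


if __name__ == "__main__":
    # Hypothetical counts for a 4-bp motif resembling TATA (10 sites).
    counts = {"A": [0, 10, 0, 10], "C": [0, 0, 0, 0],
              "G": [0, 0, 0, 0], "T": [10, 0, 10, 0]}
    pssm = to_pssm(counts)
    for start, score in scan("GGTATACC", pssm):
        print(start, round(score, 2))
```

The pseudocount keeps unseen bases from producing a score of minus infinity, so a single mismatch lowers a window's score without ruling it out.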

Page 37: Multiple sequence alignments  and motif discovery

Search profile

http://jaspar.genereg.net/

Page 38: Multiple sequence alignments  and motif discovery

Score

Organism

Logo

Name of gene/protein