The FREAKS of PROTEIN SEQUENCE
Session 3.1: Repeats
Session 3.2: Biased regions
Miguel Andrade
Johannes-Gutenberg University of Mainz
Andrade@uni-mainz.de
Frequency
14% of proteins contain repeats (Marcotte et al., 1999)
1: Single amino acid repeats.
2: Longer imperfect tandem repeats. These assemble in structure.
Definition repeats
Sequence, long, imperfect, tandem
MRAVVKSPIMCHEKSPSVCSPLNMTSSVCSPAGINSVSSTTASFGSFPVHSPITQGTPLTCSPNVENRGSRSHSPAHASNVGSPLSSPLSSMKSSISSPPSHCSVKSPVSSPNNVTLRSSVSSPANINN
MRAVVKSPIM CHEKSPSVCSPLNMTSSVCSPAG INSVSSTTASFGSFPVHSPIT QGTPLTCSPNV ENRGSRSHSPAH ASNVGSPLSSPLS SMKSSISSPPS HCSVKSPVSSPNN VTLRSSVSSPAN INN
Tandem repeats fold together
Definition repeats
Sequence, long, imperfect, tandem
MRAVVKSPIM CHEKSPSVCSPLNMTSSVCSPAG INSVSSTTASFGSFPVHSPIT QGTPLTCSPNV ENRGSRSHSPAH ASNVGSPLSSPLS SMKSSISSPPS HCSVKSPVSSPNN VTLRSSVSSPAN INN
(Vlassi et al, 2013)
http://weblogo.berkeley.edu
Andrade et al. (2001) J Struct Biol
Definition CBRs
Perfect repeat: QQQQQQQQQQQ
Imperfect: QQQQPQQQQQQ
Amino acid type: DDDDDEEEDEDEED
Compositionally biased regions (CBRs)
High frequency of one or two amino acids in a region.
Particular case of low complexity region
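The definition above (a high frequency of one or two amino acids in a region) can be turned into a rough sliding-window check. This is only a minimal sketch: the function names, the window length of 10 and the 0.8 threshold are illustrative assumptions, not values from the slides or from any published CBR detection tool.

```python
from collections import Counter


def top_two_fraction(segment):
    """Fraction of the segment made up by its two most common residues."""
    counts = Counter(segment)
    top = sum(n for _, n in counts.most_common(2))
    return top / len(segment)


def biased_windows(sequence, window=10, threshold=0.8):
    """Return 0-based start positions of windows dominated by one or two residues.

    window and threshold are illustrative assumptions, not values from the slides.
    """
    starts = []
    for i in range(len(sequence) - window + 1):
        if top_two_fraction(sequence[i:i + window]) >= threshold:
            starts.append(i)
    return starts


# The three example strings from the slide, plus the start of the repeat
# example sequence as an unbiased contrast.
examples = ["QQQQQQQQQQQ", "QQQQPQQQQQQ", "DDDDDEEEDEDEED", "MRAVVKSPIMCHEK"]
for seq in examples:
    print(seq, round(top_two_fraction(seq), 2), biased_windows(seq))
```

Counting the two most common residues together lets the same test cover both the single-residue case (poly-Q) and the amino acid type case (D/E-rich).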
Detection CBRs
Sometimes straightforward. N-terminal human Huntingtin. How many CBRs can you find?
>sp|P42858|HD_HUMAN Huntingtin OS=Homo sapiens
MATLEKLMKAFESLKSFQQQQQQQQQQQQQQQQQQQQQPPPPPPPPPPPQLPQPPPQAQP
LLPQPQPPPPPPPPPPGPAVAEEPLHRPKKELSATKKDRVNHCLTICENIVAQSVRNSPE
FQKLLGIAMELFLLCSDDAESDVRMVADECLNKVIKALMDSNLPRLQLELYKEIKKNGAP
RSLRAALWRFAELAHLVRPQKCRPYLVNLLPCLTRTSKRPEESVQETLAAAVPKIMASFG
NFANDNEIKVLLKAFIANLKSSSPTIRRTAAGSAVSICQHSRRTQYFYSWLLNVLLGLLV
PVEDEHSTLLILGVLLTLRYLVPLLQQQVKDTSLKGSFGVTRKEMEVSPSAEQLVQVYEL
TLHHTQHQDHNVVTGALELLQQLFRTPPPELLQTLTAVGGIGQLTAAKEESGGRSRSGSI
VELIAGGGSSCSPVLSRKQKGKVLLGEEEALEDDSESRSDVSSSALTASVKDEISGELAA
SSGVSTPGSAGHDIITEQPRSQHTLQADSVDLASCDLTSSATDGDEEDILSHSSSQVSAV
PSDPAMDLNDGTQASSPISDSSQTTTEGPDSAVTPSDSSEIVLDGTDNQYLGLQIGQPQD
EDEEATGILPDEASEAFRNSSMALQQAHLLKNMSHCRQPSDSSVDKFVLRDEATEPGDQE
NKPCRIKGDIGQSTDDDSAPLVHCVRLLSASFLLTGGKNVLVPDRDVRVSVKALALSCVG
AAVALHPESFFSKLYKVPLDTTEYPEEQYVSDILNYIDHGDPQVRGATAILCGTLICSIL
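The "straightforward" case can be checked with a simple scan for runs of a single residue. The sketch below is an illustration, not a method from the slides: the function name and the minimum run length of 5 are assumptions. It also assumes that the first FASTA line of the huntingtin fragment holds 60 residues, which gives runs of 21 Q and 11 P; the sequence is written with explicit run lengths to keep it short.

```python
import re

# First 60 residues of the huntingtin fragment, assuming a 60-residue FASTA
# line; run lengths are spelled out instead of typing the long Q and P runs.
n_term = "MATLEKLMKAFESLKSF" + "Q" * 21 + "P" * 11 + "QLPQPPPQAQP"


def homopolymer_runs(sequence, min_length=5):
    """Return (residue, 1-based start, length) for each run of a single residue.

    min_length is an illustrative cutoff, not a value from the slides.
    """
    pattern = r"(.)\1{%d,}" % (min_length - 1)
    runs = []
    for match in re.finditer(pattern, sequence):
        runs.append((match.group(1), match.start() + 1, len(match.group(0))))
    return runs


for residue, start, length in homopolymer_runs(n_term):
    print(residue, start, length)
```

A run scan like this only finds perfect single-residue stretches. Imperfect regions such as QLPQPPPQAQP need a composition-based test instead.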
Detection repeats
Often NOT straightforward. N-terminal human Huntingtin. How many repeats can you find?
EFQKLLGIAMELFLLCSDDAESDVRMVADECLNKVIKA
CRPYLVNLLPCLTRTSKRP-EESVQETLAAAVPKIMAS
NDNEIKVLLKAFIANLKSSSPTIRRTAAGSAVSICQHS
TQYFYSWLLNVLLGLLVPVEDEHSTLLILGVLLTLRYL
PSAEQLVQVYELTLHHTQHQDHNVVTGALELLQQLFRT
Frequency repeats
Fraction of proteins annotated with the keyword REPEAT in SwissProt
Group               Annotated/Total   %
Archaea             27/3428           0.79
Viruses             81/8048           1.00
Bacteria            299/28438         1.05
Fungi               232/8334          2.78
Viridiplantae       153/6963          2.20
Metazoa             1538/28948        5.31
Rest of Eukaryota   92/2434           3.78
(Andrade et al 2001)
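The percentages in this table are simply the annotated count divided by the total count for each group. A short sketch that recomputes them from the counts on the slide (the dictionary and function names are illustrative):

```python
# (proteins annotated with REPEAT, total proteins) per group, from the slide
counts = {
    "Archaea": (27, 3428),
    "Viruses": (81, 8048),
    "Bacteria": (299, 28438),
    "Fungi": (232, 8334),
    "Viridiplantae": (153, 6963),
    "Metazoa": (1538, 28948),
    "Rest of Eukaryota": (92, 2434),
}


def percent(annotated, total):
    """Percentage of proteins carrying the REPEAT keyword."""
    return 100.0 * annotated / total


for group, (annotated, total) in counts.items():
    print(f"{group:18s} {percent(annotated, total):.2f}")
```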
Detection of repeats
Dotplots
Comparing a sequence against itself
TLRSSVSSPANINNS
NMTSSVCSPANISV
   ||| |||||     8 matches
Shifting one sequence against the other gives 1, 8, 2 and 1 matches at four successive offsets.
Each offset is one diagonal of the dot plot: 1 8 2 1
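The match counting behind a dot plot can be sketched in a few lines. In the sketch below the function names and the offset convention (residue j of the second sequence is compared with residue j + offset of the first) are assumptions made for illustration. With offsets 1, 0, -1 and -2 it gives the counts 1, 8, 2 and 1 for the two fragments on the slide.

```python
def diagonal_matches(a, b, offset):
    """Count identical residues when b[j] is compared with a[j + offset]."""
    return sum(
        1
        for j in range(len(b))
        if 0 <= j + offset < len(a) and a[j + offset] == b[j]
    )


def dotplot(a, b):
    """Return one text row per residue of b: '*' where residues match, '.' elsewhere."""
    return ["".join("*" if x == y else "." for x in a) for y in b]


a = "TLRSSVSSPANINNS"
b = "NMTSSVCSPANISV"

for offset in (1, 0, -1, -2):
    print(offset, diagonal_matches(a, b, offset))

print("\n".join(dotplot(a, b)))
```

Tools such as Dotlet usually score a window of residues with a substitution matrix rather than counting single identities, so this identity count is a simplification.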
Exercise 1. Using Dotlet with the human mineralocorticoid receptor (MR)
• Obtain the sequence from UniProt
• Go to the Dotlet web page
• Click on the input button and paste the sequence there
• Try to find combinations of parameters that show patterns in the dot plot
• Find repeats by clicking on the diagonal patterns
Detection of repeats
Using a multiple sequence alignment helps
Conserved repeated patterns
JalView with Regular Expression searches
Regular Expressions:
[LS]P.A matches L or S, followed by P, followed by anything, followed by A
Which one is not matched?
LPTA, SPAA, LPPA, LPAP, SPLA
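The regular expression from the slide can be tried outside JalView as well. The sketch below uses Python's standard `re` module, whose syntax for this pattern is the same, and checks each of the five candidate strings. The variable names are illustrative.

```python
import re

# [LS] = L or S, P = proline, . = any single character, A = alanine
pattern = re.compile(r"[LS]P.A")

candidates = ["LPTA", "SPAA", "LPPA", "LPAP", "SPLA"]

for candidate in candidates:
    print(candidate, bool(pattern.fullmatch(candidate)))

not_matched = [c for c in candidates if not pattern.fullmatch(c)]
print("Not matched:", not_matched)
```

`fullmatch` requires the whole four-letter string to fit the pattern. To locate occurrences inside a longer protein sequence, `pattern.finditer(sequence)` would be the equivalent of the JalView "find" option.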
Exercise 2. Using JalView with an MSA of the MR and its orthologs
• Run JalView using the JNLP file on the desktop (from http://www.jalview.org/Download)
• Load this MSA in JalView
• Use the "find" option with a regular expression and mark all matches
• Try to find the expression that matches the most repeats. How many repeats do you see? How long are they? Would you correct the alignment based on these findings?
[Figure: elements labelled #T1 to #T15 and #F1 to #F11, some marked with asterisks; image not preserved]
(Vlassi et al, 2013)
Andrade and Bork (1995) Nature Genetics
A subunit PP2A structure
PDB:1b3u
Groves et al. (1999) Cell
Ap1 Clathrin Adaptor Core
PDB:1w63
Heldwein et al. (2004) PNAS
Neural Network!
Secondary structure
Transmembrane helices
Residue exposure
Andrade, Petosa, O'Donoghue, Müller and Bork (2001) J Mol Biol
Neural Network
Architecture
Backpropagation neural network
…GTAARTWCGASLFVPRLLAGHVDITSLVRALAKSGDLFVARSTKT…
Central position
Input layer n=39x20
Hidden layer n=3
Output neuron = 0.1 - 0.9
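The architecture above (a 39-residue window, 20 inputs per residue, 3 hidden units, 1 output for the central position) can be sketched as a forward pass. This is only an illustration under stated assumptions: the one-hot encoding, the sigmoid activations, the residue alphabet order and the random untrained weights are all hypothetical, and training by backpropagation is not shown.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed alphabet order
WINDOW = 39   # residues per input window, from the slide
HIDDEN = 3    # hidden units, from the slide


def encode_window(window):
    """One-hot encode a 39-residue window into 39 x 20 = 780 inputs."""
    if len(window) != WINDOW:
        raise ValueError("window must have 39 residues")
    inputs = []
    for residue in window:
        inputs.extend(1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS)
    return inputs


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, w_hidden, b_hidden, w_out, b_out):
    """Input layer -> 3 sigmoid hidden units -> 1 sigmoid output."""
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(w_hidden, b_hidden)
    ]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


# Random, untrained weights: placeholders for illustration only.
rng = random.Random(0)
w_hidden = [[rng.uniform(-0.1, 0.1) for _ in range(WINDOW * 20)] for _ in range(HIDDEN)]
b_hidden = [0.0] * HIDDEN
w_out = [rng.uniform(-0.1, 0.1) for _ in range(HIDDEN)]
b_out = 0.0

sequence = "GTAARTWCGASLFVPRLLAGHVDITSLVRALAKSGDLFVARSTKT"
scores = []
for start in range(len(sequence) - WINDOW + 1):
    window = sequence[start:start + WINDOW]
    centre = start + WINDOW // 2  # the score is assigned to the central residue
    scores.append((centre, forward(encode_window(window), w_hidden, b_hidden, w_out, b_out)))

for centre, score in scores:
    print(centre, sequence[centre], round(score, 3))
```

Sliding the window one residue at a time yields one score per central position, which is how a per-residue profile along the protein would be obtained.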
Palidwor et al. (2009) PLoS Comp Biol
Neural Network
Architecture
[Figure labels: H1, H2, L; image not preserved]
Fournier et al. (2013) PLoS One
Taxon               Total      Hits    Hits/Total
Virus/phages        1379599    16      1.16E-05
Archaea             362208     296     8.17E-04
  Euryarchaeota     225118     247     1.10E-03
  Crenarchaeota     100611     32      3.18E-04
Bacteria            14505441   3939    2.72E-04
  Acidobacteria     40456      27      6.67E-04
  Actinobacteria    1634898    462     2.83E-04
  Bacteroidetes     724491     135     1.86E-04
  Chlamydiae        103375     155     1.50E-03
  Chloroflexi       57584      74      1.29E-03
  Cyanobacteria     279184     549     1.97E-03
  Firmicutes        3837822    627     1.63E-04
  Planctomycetes    56777      129     2.27E-03
  Proteobacteria    7220418    1599    2.21E-04
  Spirochetes       135404     100     7.39E-04
Eukaryota           5710673    14659   2.57E-03
  Apicomplexa       129180     237     1.83E-03
  Ciliophora        66444      128     1.93E-03
  Bacillariophyta   24672      74      3.00E-03
  Viridiplantae     1577040    4086    2.59E-03
    Chlorophyta     82919      321     3.87E-03
    Streptophyta    930229     1463    1.57E-03
  Kinetoplastida    119358     385     3.23E-03
  Mycetozoa         34276      103     3.01E-03
  Phaeophyceae      21376      63      2.95E-03
  Fungi             1256144    4228    3.37E-03
  Metazoa           2796864    6873    2.46E-03
    Nematodes       223421     365     1.63E-03
    Trematodes      39558      97      2.45E-03
    Insecta         694809     1173    1.69E-03
    Sarcopterygii   1145548    3909    3.41E-03
Bar chart scale: 0 to 4E-03
Fournier et al. (2013) PLoS One
http://cbdm.mdc-berlin.de/~ard2/
Exercise 3. Detecting repeats in human huntingtin
• Go to the ARD2 web page
• Paste this sequence (1-780 fragment of human huntingtin) in the input window
• Run ARD2 and interpret the output
Exercise 4. Viewing detected repeats in a protein structure
• Go to the PDBPaint web page
• In the "Query PDB" window type "2IE3", in the "web service" menu choose the "ARD2" option and select a large window size (e.g. 800*800)
• Hit the "Go!" button
• Rotate the structure and examine the correspondence between the hits and the structure