© Burkhard Rost
title: Computational Biology 1 - Protein structure: Intro into protein structure
short title: cb1_intro2_structure
lecture: Protein Prediction 1 - Protein structure, Computational Biology 1 - TUM Summer 2016

cb1f 20180424 intro2 structure - Rostlab · 2018. 4. 30. · cb1_intro2_structure lecture: Protein Prediction 1 - Protein structure


  • /105© Burkhard Rost

    1

    title: Computational Biology 1 - Protein structure: Intro into protein structure

    short title: cb1_intro2_structure

    lecture: Protein Prediction 1 - Protein structure, Computational Biology 1 - TUM Summer 2016

  • /105© Burkhard Rost

    Videos: YouTube / www.rostlab.org/talks

    THANKS:

    EXERCISES:

    Special lectures:
    • TBN

    No lecture:
    • 04/26 Security check Rostlab (exercise WILL take place)
    • 05/01 May Day (also no exercise)
    • 05/10 Ascension Day (also no exercise)
    • 05/22 Whitsun holiday (also no exercise)
    • 05/31 Corpus Christi (also no exercise)
    • 06/21 no lecture (but exercise)

    LAST lecture: bef: Jul 14, after: Jul xx

    Exam: x(!!) TBA
    • Makeup: Oct 16 (TBC)

    2

    CONTACT: [email protected]

    Dmitrij Nechaev

    Your Name

    Lothar Richter

    Michael Heinzinger

    http://www.rostlab.org

  • /105© Burkhard Rost

    3

    Questions about last lecture?

    ??

    ???

    © Wikipedia

  • © Burkhard Rost

    Recap

    4

  • /105© Burkhard Rost

    • common to life: DNA/cells
    • DNA->RNA->proteins = machinery of life

    5

    Recap: genes - proteins

  • /105© Burkhard Rost

    6

    Central dogma

    slide: Andrea Schafferhans

    Structure

    Function

  • /105© Burkhard Rost

    7

    Codon wheel

    © yourgenome.org slide: Andrea Schafferhans
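    The codon wheel is a lookup table: each DNA/RNA triplet maps to one amino acid or to a stop signal. A minimal sketch of that lookup, assuming the standard genetic code and a coding-strand DNA input; the function names `transcribe` and `translate` are illustrative and not part of the lecture.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order (TTT, TTC, TTA, ...).
# "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna: str) -> str:
    """Coding-strand DNA -> mRNA (T is replaced by U)."""
    return dna.upper().replace("T", "U")


def translate(dna: str) -> str:
    """Translate coding-strand DNA codon by codon, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(transcribe("ATGGCCTGGTAA"))  # AUGGCCUGGUAA
print(translate("ATGGCCTGGTAA"))   # MAW
```

    The example reads one frame only, starting at the first base; real gene finding also has to locate the start codon and handle all reading frames.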

  • /105© Burkhard Rost

    • run: 6 hours
    • full capacity: ~5-10 TB data / day

    8

    Illumina: MiSeq

    Illumina - Early 2012

  • /105© Burkhard Rost

    • common to life: DNA/cells
    • DNA->RNA->proteins = machinery of life
    • proteins made up of amino acids (20 different: like pearl chains with pearls of 20 different sizes & “colors”, i.e. biophysical features)
    • ~20k proteins in human
    • ~11M protein sequences known
    • protein length (number of amino acids): 35-30K

    9

    Recap: genes - proteins

  • /105© Burkhard Rost

    10

    1ej9 Topoisomerase
    1i6h RNA Polymerase
    1ais TATA-binding Protein/Transcription Factor IIB
    1cgp Catabolite Gene Activator Protein
    1tau DNA Polymerase
    1lbh+1efa lac Repressor
    1aoi Nucleosome
    1b7t Myosin
    1atn Actin
    1tub Microtubule
    1bkv Collagen

    © David Goodsell

    scale reduced

  • /105© Burkhard Rost

    11

    Cells: outside & inside

    Illustration of Mycoplasma genitalium by David S. Goodsell, the Scripps Research Institute, UCSD, USA

  • /105© Burkhard Rost

    Previous lecture
    • Organisms, genes, central dogma

    TODAY: Protein introduction
    • Amino acids
    • Protein structure
    • Bonds & energies
    • domains
    • 3D comparisons

    NEXT lectures
    • sequence comparisons/alignments

    12

    TOC today

  • © Burkhard Rost

    Protein Prediction I: Beginners

    1 Introduction

    1.2 - Proteins/domains

    1.3 - 3D comparisons

    13

  • © Burkhard Rost

    Reality and images

    14

  • /105© Burkhard Rost

    15

    slide: Marco Punta

  • /105© Burkhard Rost

    Georges Braque - Houses at L'Estaque

    16

    slide: Marco Punta

  • /105© Burkhard Rost

    17

    Where is that?

    Illustration by David S. Goodsell, the Scripps Research Institute, UCSD, USA

  • /105© Burkhard Rost

    18

    Mycoplasma genitalium

    Illustration by David S. Goodsell, the Scripps Research Institute, UCSD, USA

  • /105© Burkhard Rost

    19

    Eukaryotic cell

    Illustration by David S. Goodsell, the Scripps Research Institute, UCSD, USA

  • /105© Burkhard Rost

    20

    slide: Marco Punta

  • /105© Burkhard Rost

    Doyle et al. (1998) Science 280:69-77 - The structure of the potassium channel: molecular basis of K+ conduction and selectivity

    21

    slide: Marco Punta

  • /105© Burkhard Rost

    22

    http://www.proteopedia.org/wiki/images/7/7b/1htb2.png

    Alcohol dehydrogenase (ADH)

    PC Sanghani, H Robinson, WF Bosron, TD Hurley (2002) Biochemistry 41:10778-86

    http://upload.wikimedia.org/wikipedia/commons/thumb/a/a5/Protein_ADH5_PDB_1m6h.png/800px-Protein_ADH5_PDB_1m6h.png

    homodimer ADH5


  • /105© Burkhard Rost

    23

    slide: Marco Punta

  • /105© Burkhard Rost

    Umberto Boccioni - Dynamism of a soccer player

    24

    slide: Marco Punta

  • /105© Burkhard Rost

    Photograph: Filippo Monteforte/AFP/Getty Images

    25

    slide: Marco Punta

    Umberto Boccioni - Dynamism of a soccer player

  • /105© Burkhard Rost

    26

    Different levels of abstraction

    Wu et al. unpublished

    Photograph: Filippo Monteforte/AFP/Getty Images slide: Marco Punta

    Umberto Boccioni - Dynamism of a soccer player

  • © Burkhard Rost

    Constituents of proteins: amino acids

    27

  • /105© Burkhard Rost

    28

    Amino acid

    [Figure: amino acid. Backbone H2N-CH(R)-COOH, with the side-chain R attached to the central carbon.]

    slide: Marco Punta

  • /105© Burkhard Rost

    29

    Joining amino acids into proteins

    [Figure: isolated amino acid with backbone and side-chain labeled.]

    slide: Marco Punta

  • /105© Burkhard Rost

    30-35

    Joining amino acids into proteins

    [Figure, built up over six slides: two amino acids joined into a dipeptide; backbone and side-chains labeled.]

    From Wikipedia
    www.webchem.net/notes/chemical_bonding/covalent_bonding.htm

    slide: Marco Punta

  • /105© Burkhard Rost

    36

    Joining amino acids into proteins

    [Figure: polypeptide chain of four residues running from the N-terminus (N’) to the C-terminus (C’); backbone and side-chains labeled.]

  • /105© Burkhard Rost

    37

    Joining amino acids into proteins

    [Figure: isolated amino acid with backbone and side-chain labeled.]

  • /105© Burkhard Rost

    38-40

    Joining amino acids into proteins

  • © Burkhard Rost

    Rationalizing biophysical features of constituents

    41

  • /105© Burkhard Rost

    42

    Side chain properties

  • /105© Burkhard Rost

    43

    Side chain properties

    Positively charged amino acids (+): R, K

  • /105© Burkhard Rost

    44

    Negatively charged amino acids (-): D, E

  • /105© Burkhard Rost

    45

    Polar amino acids: N, Q, S, T, Y

    [Figure: side-chains with + and - marks]

  • /105© Burkhard Rost

    46

    amino acids
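    The side-chain classes on the preceding slides can be written down as a small lookup table. A minimal sketch, using only the classes named on the slides (positive: R, K; negative: D, E; polar: N, Q, S, T, Y); every other residue is labeled "other" rather than guessed, and the net-charge estimate is a deliberate simplification (it ignores pH, histidine and the chain termini). The function names are illustrative.

```python
# Side-chain classes exactly as listed on the slides.
SIDE_CHAIN_CLASS = {
    "R": "positive", "K": "positive",
    "D": "negative", "E": "negative",
    "N": "polar", "Q": "polar", "S": "polar", "T": "polar", "Y": "polar",
}


def classify(sequence):
    """Return the side-chain class of every residue in a one-letter sequence."""
    return [SIDE_CHAIN_CLASS.get(aa, "other") for aa in sequence.upper()]


def net_charge(sequence):
    """Crude net charge: +1 per R/K, -1 per D/E."""
    classes = classify(sequence)
    return classes.count("positive") - classes.count("negative")


# Sequence fragment shown later in the lecture (1D/2D/3D notation slide).
seq = "PQITLWQRPLVTIKIGGQLKEALLDTGADDTVL"
print(classify(seq)[:8])
print(net_charge(seq))  # -1
```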

  • © Burkhard Rost

    “components” of protein structure: domains

    47

  • /105© Burkhard Rost

    48

    Domain from introns?

    © Wikipedia

    RNA splicing

    Gene product = protein

  • /105© Burkhard Rost

    49

    Domain merger

    prokaryote P, protein A
    prokaryote P, protein B

    prokaryote P2, protein C

  • /105© Burkhard Rost

    50

    3D modules

    Multiple 3D alignment identifies consensus secondary structure

    © Christine Orengo

  • /105© Burkhard Rost

    51

    3D modules

    Multiple 3D alignment identifies consensus secondary structure

    © Christine Orengo

  • /105© Burkhard Rost

    52

    Guessing domains from sequence

    [Figure: proteins A, B, C, D, E and F aligned; shared regions marked as domain 1 and domain 2]

  • /105© Burkhard Rost

    53

    Guessing domains from sequence

    [Figure: proteins A, B, C, D, E and F aligned; shared regions marked as domain 1 and domain 2]

  • /105© Burkhard Rost

    54

    Most proteins multi-domain

    Liu, Hegyi, Acton, Montelione & Rost 2003 Proteins 56:188-200
    Liu & Rost 2004 Proteins 55:678-686

    Single-domain proteins: 61% in PDB, 28% in 62 proteomes

  • © Burkhard Rost

    3D comparisons: basic principle

    how to?

    55

  • /105© Burkhard Rost

    56

    Matching shapes

    © http://www.whattoexpect.com/toddler/photo-gallery/best-toys-for-toddlers.aspx#/slide-3


  • /105© Burkhard Rost

    57

    How to match?

    ?

  • /105© Burkhard Rost

    58

    How to match?

    ?

  • /105© Burkhard Rost

    59

    Differences for corresponding points

  • /105© Burkhard Rost

    60

    Differences for corresponding points

  • /105© Burkhard Rost

    61

    Differences for corresponding points

  • /105© Burkhard Rost

    62

    Differences for corresponding points

  • /105© Burkhard Rost

    63

    Differences for corresponding points

    [Figure: shapes with corresponding points labeled 1-8]

  • /105© Burkhard Rost

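    64

    Differences for corresponding points

    [Figure: shapes with corresponding points labeled 1-8]

    Difference
    = d1 + d2 + d3 + ... + d8
    = |r1a - r1b| + ... + |r8a - r8b|

    RMSD (root mean square deviation): square the differences, average over the N points, take the square root

    $\mathrm{RMSD}(a,b) = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(r_{ia} - r_{ib}\right)^{2}}$, here with N = 8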
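    Both measures on this slide can be computed directly from two lists of corresponding points. A minimal sketch in plain Python (needs Python 3.8+ for `math.dist`); the two 8-point outlines are made-up data, and the function names are illustrative.

```python
import math


def distances(a, b):
    """Distance d_i between each pair of corresponding points."""
    if len(a) != len(b):
        raise ValueError("point sets must have the same number of points")
    return [math.dist(p, q) for p, q in zip(a, b)]


def summed_difference(a, b):
    """d1 + d2 + ... + dN."""
    return sum(distances(a, b))


def rmsd(a, b):
    """Root mean square deviation over corresponding points."""
    d = distances(a, b)
    return math.sqrt(sum(x * x for x in d) / len(d))


# Hypothetical 8-point outline and a copy shifted by one unit along x.
a = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
b = [(x + 1, y) for x, y in a]

print(summed_difference(a, b))  # 8.0
print(rmsd(a, b))               # 1.0
```

    Unlike the plain sum, the RMSD does not grow with the number of points, which makes values for fragments of different length easier to compare.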

  • /105© Burkhard Rost

    65

    Differences for corresponding points

    [Figure: two shapes, each with points labeled 1-8]

    $\mathrm{RMSD}(a,b) = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(r_{ia} - r_{ib}\right)^{2}}$

  • /105© Burkhard Rost

    1st: find corresponding points
    2nd: superimpose

    66

    Actual algorithm inverted

    $\mathrm{RMSD}(a,b) = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(r_{ia} - r_{ib}\right)^{2}}$
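    Once corresponding points are known, the superposition step has a closed-form solution. A minimal sketch for the 2D case drawn on the slides: center both point sets, then rotate by the angle that maximizes their overlap. This is a simplified 2D stand-in chosen for illustration (3D structure superposition is usually done with a least-squares method on 3x3 matrices); the data and function names are made up.

```python
import math


def centroid(points):
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def center(points):
    """Shift the points so that their centroid sits at the origin."""
    cx, cy = centroid(points)
    return [(x - cx, y - cy) for x, y in points]


def superimpose_2d(mobile, target):
    """Rotate and translate `mobile` onto `target` (same point order, 2D only)."""
    p, q = center(mobile), center(target)
    dot = sum(px * qx + py * qy for (px, py), (qx, qy) in zip(p, q))
    cross = sum(px * qy - py * qx for (px, py), (qx, qy) in zip(p, q))
    theta = math.atan2(cross, dot)  # rotation angle that minimizes the RMSD
    c, s = math.cos(theta), math.sin(theta)
    tx, ty = centroid(target)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in p]


def rmsd(a, b):
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))


target = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
# Same shape, rotated by 90 degrees and shifted.
mobile = [(-y + 5, x - 3) for x, y in target]

fitted = superimpose_2d(mobile, target)
print(rmsd(mobile, target))  # large: shapes are not yet superimposed
print(rmsd(fitted, target))  # close to zero after superposition
```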

  • /105© Burkhard Rost

    67

    fit now?

  • /105© Burkhard Rost

    68

    Scaling easy for simple shapes

    x^2+y^2=r^2

  • /105© Burkhard Rost

    69

    Proteins: points are defined -> no scaling

    [Figure: points labeled 1-8]

    Doyle et al (1998) Science 280:69-77 - The structure of the potassium channel

  • /105© Burkhard Rost

    70

    Global vs. local comparisons

  • /105© Burkhard Rost

    71

    Global vs. local comparisons

  • /105© Burkhard Rost

    72

    Global vs. local comparisons

    global

    solution 1:

    global

    solution 2:

  • /105© Burkhard Rost

    73

    cut into “units”

  • /105© Burkhard Rost

    74

    cut into “units”

  • /105© Burkhard Rost

    75

    trouble: where to stop?

    valid “unit” for comparison?

  • /105© Burkhard Rost

    76

    How to decide what is a valid unit?

    ??

    ???

  • /105© Burkhard Rost

    77

    valid “unit” for comparison?

    Decision upon validity

  • /105© Burkhard Rost

    Scientifically significant: some expert says

    78

    Valid or not?

  • /105© Burkhard Rost

    79

    How can a machine decide what is a valid unit?

    ??

    ???

  • /105© Burkhard Rost

    Scientifically significant: some expert says

    Statistically significant: background

    80

    Valid or not?

    [Figure labels: background, signal]
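    One common way to let a machine judge "signal versus background" is to express a score in standard deviations from a background distribution (a z-score). A minimal sketch, assuming the background is a sample of RMSD values from unrelated fragment pairs; all numbers are invented for illustration and the threshold is left to the reader.

```python
import statistics


def z_score(score, background):
    """Number of background standard deviations between score and background mean."""
    mu = statistics.mean(background)
    sigma = statistics.stdev(background)
    return (score - mu) / sigma


# Hypothetical RMSD values (in Å) for unrelated fragment pairs.
background = [6.1, 5.8, 7.2, 6.5, 6.9, 5.5, 6.3, 7.0]

# A low RMSD is a good match, so a strongly negative z-score is "signal".
z = z_score(1.2, background)
print(round(z, 1))
```

    A match sitting in the middle of the background (z near 0) is what two random fragments would give; the further below the background a match falls, the less likely it arose by chance.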

  • /105© Burkhard Rost

    81

    Cut, match, compare by RMSD

    $\mathrm{RMSD}(a,b) = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(r_{ia} - r_{ib}\right)^{2}}$

  • /105© Burkhard Rost

    82

    Only Cartesian RMSD comparison?

    [Figure: two shapes, each with points labeled 1-8]

    $\mathrm{RMSD}(a,b) = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(r_{ia} - r_{ib}\right)^{2}}$

  • /105© Burkhard Rost

    83

    2D: difference matrix

    [Figure: shape with points labeled 1-8 and the matrix of differences between them, rows and columns 1-8]

  • /105© Burkhard Rost

    84

    Comparison 2D: differences of differences

    [Figure: shape with points labeled 1-8]

    Total of 8 x 8 differences
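    The "differences of differences" idea can be sketched as follows: build the matrix of all point-to-point distances within each shape, then compare the two matrices entry by entry. Because only internal distances are used, no superposition is needed. This is an illustrative sketch with made-up data; averaging over all N x N entries (including the zero diagonal) follows the "8 x 8" count on the slide and is one of several possible conventions.

```python
import math


def distance_matrix(points):
    """N x N matrix of distances between all pairs of points in one shape."""
    return [[math.dist(p, q) for q in points] for p in points]


def drmsd(a, b):
    """Root mean square difference between the two distance matrices."""
    da, db = distance_matrix(a), distance_matrix(b)
    n = len(a)
    total = sum((da[i][j] - db[i][j]) ** 2 for i in range(n) for j in range(n))
    return math.sqrt(total / (n * n))


a = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
rotated = [(-y + 5, x - 3) for x, y in a]   # rigid motion of a
stretched = [(x, 2 * y) for x, y in a]      # distorted copy of a

print(drmsd(a, rotated))    # zero: internal distances are unchanged
print(drmsd(a, stretched))  # greater than zero: the shape itself changed
```

    One caveat of distance matrices: a shape and its mirror image have identical internal distances, so this comparison cannot tell them apart.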

  • © Burkhard Rost

    3D comparisons: biology

    85

  • /105© Burkhard Rost

    Slides taken from Patrice Koehl, UC Davis

    86

    Structure alignment

    Patrice Koehl

  • /105© Burkhard Rost

    1. Identify equivalent positions (residues that match in 3D)
    2. Find superposition independent of domain movements

    87

    Structure alignment: two steps

    Patrice Koehl, UC Davis: http://nook.cs.ucdavis.edu:8080/~koehl/Classes/CSB/lecture6.pdf
    © Patrice Koehl, UC Davis

  • /105© Burkhard Rost

    Step 1: find corresponding points in proteins A and B
    d(i) are the distances between all corresponding points (typically: Cα, or all atoms)

    88

    $\mathrm{rmsd}(A,B) = \sqrt{\frac{1}{N}\sum_{i=1}^{N} d_i^{2}}$

    Root mean square displacement (rmsd)

  • /105© Burkhard Rost

    89

    RMSD is not a metric

    A similar B, B similar C: NOT implying A similar C

    [Figure: structures A, B and C]

    cRMSD = 2.8 Å = 0.28 nm

    cRMSD = 2.85 Å = 0.285 nm

  • © Burkhard Rost

    DALI 3D alignment

    Holm & Sander

    90

  • /105© Burkhard Rost

    91

    Notation: protein structure 1D, 2D, 3D

    [Figure with three panels labeled 1D, 2D and 3D for a sequence starting PQITLWQRPLVTIKIGGQLKEALLDTGADDTVL. 1D: per-residue strings, including strand assignments (E). 2D: residue-by-residue matrix, both axes 1-90, scale 0 to -5 kcal/mol. 3D: three-dimensional structure.]

  • /105© Burkhard Rost

    L Holm & C Sander (1993) JMB 233:123-38

    Distance matrix Alignment
    Algorithm: Monte Carlo on all-against-all for hexapeptides (5)

    92

    Structural alignment: DALI

    L Holm & C Sander (1993) JMB 233:123-38: Fig. 1

  • © Burkhard Rost

    Vorolign 3D alignment

    Birzele & Zimmer

    93

  • /105© Burkhard Rost

    Fabian Birzele, Jan E Gewehr, Gergely Csaba & Ralf Zimmer (2006) Bioinformatics 23:e205-11
    Dynamic programming on Voronoi environments

    94

    Structural alignment: VOROLIGN

    F Birzele, JE Gewehr, G Csaba & R Zimmer (2006) Bioinformatics 23:e205-11: Fig. 2

  • © Burkhard Rost

    3D comparisons: local vs. global

    95

  • /105© Burkhard Rost

    96

    Two forms of calcium-bound Calmodulin:

    Ligand free

    Complexed with trifluoperazine

    Patrice Koehl, UC Davis: http://nook.cs.ucdavis.edu:8080/~koehl/Classes/CSB/lecture6.pdf

    © Patrice Koehl, UC Davis

  • /105© Burkhard Rost

    97

    Global alignment: RMSD = 15 Å / 143 residues

    Local alignment: RMSD = 0.9 Å / 62 residues

    Patrice Koehl, UC Davis: http://nook.cs.ucdavis.edu:8080/~koehl/Classes/CSB/lecture6.pdf

    © Patrice Koehl, UC Davis
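    The calmodulin numbers show why a global comparison can hide a near-perfect local match: two rigid parts that move relative to each other give a poor global score although each part is unchanged. A toy sketch of that effect, assuming a 12-point chain that bends at a hinge and using a distance-matrix score so that no superposition is needed; the chain, the window size and the function names are invented for illustration and are not the method behind the values on the slide.

```python
import math


def drmsd(a, b):
    """Root mean square difference between the internal distance matrices."""
    n = len(a)
    total = sum(
        (math.dist(a[i], a[j]) - math.dist(b[i], b[j])) ** 2
        for i in range(n) for j in range(n)
    )
    return math.sqrt(total / (n * n))


def best_local_window(a, b, window):
    """Slide a window over equivalent positions; return (start, score) of the best fragment."""
    best = None
    for start in range(len(a) - window + 1):
        score = drmsd(a[start:start + window], b[start:start + window])
        if best is None or score < best[1]:
            best = (start, score)
    return best


# Straight chain of 12 points ...
a = [(float(i), 0.0) for i in range(12)]
# ... and the same chain bent by 90 degrees after the sixth point (a hinge).
b = a[:6] + [(5.0, float(j)) for j in range(1, 7)]

print(drmsd(a, b))                  # global comparison: clearly non-zero
print(best_local_window(a, b, 6))   # local comparison: a fragment that matches exactly
```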

  • © Burkhard Rost

    Many other 3D alignment methods exist

    98

  • © Burkhard Rost

    Recap: proteins/cells

    99

  • /105© Burkhard Rost

    100

    Cells: outside & inside

    Illustration of Mycoplasma genitalium by David S. Goodsell, the Scripps Research Institute, UCSD, USA

  • /105© Burkhard Rost

    101

    A gallery of proteins

    1ej9 Topoisomerase
    1i6h RNA Polymerase
    1ais TATA-binding Protein/Transcription Factor IIB
    1cgp Catabolite Gene Activator Protein
    1tau DNA Polymerase
    1lbh+1efa lac Repressor
    1aoi Nucleosome
    1b7t Myosin
    1atn Actin
    1tub Microtubule
    1bkv Collagen

    © David Goodsell

    scale reduced

  • /105© Burkhard Rost

    102

    HIV-1 and a Human T-cell

    gp120

    CD4

    CCR5

    HIV-1 envelope glycoprotein

    T-cell surface

    slide: Natasha Wood, Cape Town

  • /105© Burkhard Rost

    103

    HIV-1 and a Human T-cell

    IMAGE: http://www.sciencemag.org/content/320/5877/760/F3.large.jpg

    V3 loop

    slide: Natasha Wood, Cape Town

  • /105© Burkhard Rost

    104

    3D modules

    Multiple 3D alignment identifies consensus secondary structure

    © Christine Orengo

  • /105© Burkhard Rost

    01: 04/10 Tue: No lecture
    02: 04/12 Thu: No lecture
    03: 04/17 Tue: No lecture
    04: 04/19 Thu: Intro 1: organization of lecture: intro into cells & biology
    05: 04/24 Tue: Intro 2: amino acids, protein structure (comparison), domains
    06: 04/26 Thu: No lecture
    07: 05/01 Tue: SKIP: May Day
    08: 05/03 Thu: Alignment 1
    09: 05/08 Tue: Alignment 2
    10: 05/10 Thu: SKIP: Ascension Day
    11: 05/15 Tue: Comparative modeling & exp structure determination & secondary structure assignment
    12: 05/17 Thu: 1D: Secondary structure prediction 1
    13: 05/22 Tue: SKIP: Whitsun holiday
    14: 05/24 Thu: 1D: Secondary structure prediction 2
    15: 05/29 Tue: 1D: Secondary structure prediction 3
    16: 05/31 Thu: SKIP: Corpus Christi
    17: 06/05 Tue: 1D: Transmembrane structure prediction 1
    18: 06/07 Thu: 1D: Transmembrane structure prediction 2 / Solvent accessibility prediction
    19: 06/12 Tue: 1D: Transmembrane structure prediction 3 / Solvent accessibility prediction
    20: 06/14 Thu: 1D: Disorder prediction
    21: 06/19 Tue: 2D prediction / 3D prediction
    22: 06/21 Thu: No lecture
    23: 06/26 Tue: recap 1
    24: 06/28 Thu: recap 2
    25: 07/03 Tue: TBA
    26: 07/05 Thu: TBA
    27: 07/10 Tue: TBA
    28: 07/12 Thu: TBA

    105

    Lecture plan (CB1 structure: INF)

    today