MQTLSERLKKRRIALKMTQTELATKAGVKQQSIQLIEAGVTKRPRFLFEIAMALNCDPVWLQYGTKRGKAA
atgcaaactctttctgaacgcctcaagaagaggcgaattgcgttaaaaatgacgcaaaccgaactggcaaccaaagccggtgttaaacagcaatcaattcaactgattgaagctggagtaaccaagcgaccgcgcttcttgtttgagattgctatggcgcttaactgtgatccggtttggttacagtacggaactaaacgcggtaaagccgcttaa
augcaaacucuuucugaacgccucaagaagaggcgaauugcguuaaaaaugacgcaaaccgaacuggcaaccaaagccgguguuaaacagcaaucaauucaacugauugaagcuggaguaaccaagcgaccgcgcuucuuguuugagauugcuauggcgcuuaacugugauccgguuugguuacaguacggaacuaaacgcgguaaagccgcuuaa
Proteins are the primary functional manifestation of genomes
DNA sequence → (transcription) → RNA sequence → (translation) → protein sequence → protein structure → protein function
Being able to predict the protein sequence from the gene sequence allows us to predict structure, which in turn helps us understand how the protein does what it does
Outline
• DNA sequence to protein sequence
• From protein sequence to secondary structure
• Protein tertiary structure
• Predicting protein structure
Predicting protein sequence from DNA sequence
• Protein sequence can be predicted by translating the cDNA and using the genetic code.
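The translation step can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard ("universal") genetic code, a reading frame that starts at the first base, and termination at the first stop codon; the names `STANDARD_CODE` and `translate` are made up for this sketch.

```python
from itertools import product

# Standard ("universal") genetic code, listed in the conventional TCAG codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cdna, code=STANDARD_CODE):
    """Translate a cDNA (or mRNA) string codon by codon, stopping at the first stop codon."""
    seq = cdna.upper().replace("U", "T")  # accept RNA input as well
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = code[seq[i:i + 3]]
        if aa == "*":  # stop codon
            break
        protein.append(aa)
    return "".join(protein)


# First 27 bases of the DNA sequence shown at the start of the slides.
print(translate("atgcaaactctttctgaacgcctcaag"))  # MQTLSERLK
```

The output matches the first nine residues of the protein sequence shown at the start of the slides.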
Translating yeast mitochondrial cDNA into protein sequence
ATG TCT CTT ATA TGA ……… SECIS sequence
Actual reading: Met-Ser-Thr-Met-sCys
Universal genetic code: Met-Ser-Leu-Ile-Ter
This gene has a considerably different protein sequence from the one we would predict from the universal genetic code!
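The two readings above can be reproduced by applying codon overrides on top of the standard table. This is a sketch only: `SLIDE_OVERRIDES` is a hypothetical table copied from the slide's non-standard reading, not an official code table. For comparison, the NCBI yeast mitochondrial table (translation table 3) reads CTN as Thr, ATA as Met and TGA as Trp, while reading TGA as selenocysteine depends on a SECIS element in the transcript.

```python
from itertools import product

# Standard genetic code in the conventional TCAG codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# One-letter to three-letter names; "U" (Sec) is selenocysteine, "*" is a stop.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys", "Q": "Gln",
    "E": "Glu", "G": "Gly", "H": "His", "I": "Ile", "L": "Leu", "K": "Lys",
    "M": "Met", "F": "Phe", "P": "Pro", "S": "Ser", "T": "Thr", "W": "Trp",
    "Y": "Tyr", "V": "Val", "U": "Sec", "*": "Ter",
}

# Hypothetical overrides reproducing the slide's non-standard reading.
SLIDE_OVERRIDES = {"CTT": "T", "ATA": "M", "TGA": "U"}


def read_codons(cdna, overrides=None):
    """Read every codon (stops included) using the standard code plus optional overrides."""
    code = dict(STANDARD_CODE)
    code.update(overrides or {})
    seq = cdna.upper()
    return "-".join(
        THREE_LETTER[code[seq[i:i + 3]]] for i in range(0, len(seq) - 2, 3)
    )


print(read_codons("ATGTCTCTTATATGA"))                   # Met-Ser-Leu-Ile-Ter
print(read_codons("ATGTCTCTTATATGA", SLIDE_OVERRIDES))  # Met-Ser-Thr-Met-Sec
```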
Amino acids are the primary building blocks of proteins
• The sequence of amino acids (AAs) is the primary structure of proteins
• Sequence determines structure
• Amino acids don’t fall neatly into classes
• How we casually speak of them can affect the way we think about their behavior. For example, if you think of Cys as a polar residue, you might be surprised to find it in the hydrophobic core of a protein, unpaired with any other polar group. But this does happen.
• The properties of a residue type can also vary with conditions/environment
Proteins are made by controlled polymerization of amino acids
[Figure: two amino acids, H2N-CH(R1)-COOH (residue 1) and H2N-CH(R2)-COOH (residue 2), condense to form a dipeptide, H2N-CH(R1)-CO-NH-CH(R2)-COOH, + HOH. A peptide bond is formed and water is eliminated. Labels: N or amino terminus; C or carboxy terminus.]
Two amino acids condense to form a dipeptide. If there are more, it becomes a polypeptide. Short polypeptide chains are usually called peptides, while longer ones are called proteins.
Secondary structure elements in proteins
beta-strand (nonlocal interactions)
alpha-helix (local interactions)
A secondary structure element is a contiguous region of a protein sequence characterized by a repeating pattern of main-chain hydrogen bonds and backbone phi/psi angles
These elements reflect the tendency of the backbone to hydrogen bond with itself in a semi-ordered fashion when compacted
Principal types of secondary structure found in proteins
Structure (H-bonded residues)        Repeating (φ, ψ) values
α-helix (1-5), right-handed          -63°, -42°
3_10 helix (1-4)                     -57°, -30°
Parallel β-sheet                     -119°, +113°
Antiparallel β-sheet                 -139°, +135°
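As a rough illustration of how these repeating values are used, the sketch below assigns a measured (φ, ψ) pair to the nearest canonical conformation. The nearest-neighbor rule and the names `CANONICAL`, `angle_diff` and `nearest_type` are assumptions of this sketch; real assignment programs use hydrogen-bond patterns or broader regions of phi-psi space.

```python
import math

# Canonical (phi, psi) values in degrees, copied from the table above.
CANONICAL = {
    "alpha-helix": (-63.0, -42.0),
    "3-10 helix": (-57.0, -30.0),
    "parallel beta-sheet": (-119.0, 113.0),
    "antiparallel beta-sheet": (-139.0, 135.0),
}


def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees (wraps at +/-180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def nearest_type(phi, psi):
    """Return (name, distance) of the canonical conformation closest to (phi, psi)."""
    def distance(name):
        ref_phi, ref_psi = CANONICAL[name]
        return math.hypot(angle_diff(phi, ref_phi), angle_diff(psi, ref_psi))

    best = min(CANONICAL, key=distance)
    return best, distance(best)


print(nearest_type(-60.0, -45.0)[0])   # alpha-helix
print(nearest_type(-120.0, 120.0)[0])  # parallel beta-sheet
```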
The alpha-helix: repeating i,i+4 h-bonds
[Figure: α-helix with residues 1-12 labeled, showing the hydrogen bonds; alongside, a phi-psi plot (both axes from -180° to 180°) with the right-handed helical region of phi-psi space marked around the right-handed α-helix values of -63°, -42°.]
By DSSP definitions, which of residues 1-12 are in the helix? Does this coincide with the residues in the helical region of phi-psi space?
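One way to approach the DSSP question is to apply the rule directly. As described by Kabsch & Sander, DSSP calls an H-bond from CO(i) to NH(i+n) an n-turn at residue i, and two consecutive n-turns at residues i-1 and i define a minimal helix covering residues i to i+n-1. The sketch below applies that rule to a hypothetical H-bond pattern; the actual H-bonds in the figure are not recoverable from the slide text, so the input is an assumption.

```python
def dssp_helix_residues(turn_starts, n=4):
    """Residues in an n-helix under the 'two consecutive n-turns' rule.

    turn_starts: set of residue numbers i with an H-bond CO(i) -> NH(i+n).
    """
    helix = set()
    for i in turn_starts:
        if i - 1 in turn_starts:           # consecutive turns at i-1 and i
            helix.update(range(i, i + n))  # residues i .. i+n-1 are helical
    return sorted(helix)


# Hypothetical pattern for a 12-residue peptide: H-bonds 1->5, 2->6, ..., 8->12.
turns = set(range(1, 9))
print(dssp_helix_residues(turns))  # [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
```

Under this assumed pattern the terminal residues 1 and 12 take part in helical H-bonds but are not assigned to the helix, so the H-bond-based assignment need not coincide exactly with the residues whose angles fall in the helical region of phi-psi space.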
strands/sheets
[Figure: β-sheet with residues 49-54, 56 and 57 labeled; alongside, a phi-psi plot (both axes from -180° to 180°) with the beta-strand region of phi-psi space marked around the parallel β-sheet values of -119°, +113°.]
Is this a parallel or anti-parallel sheet?
By DSSP definitions, which of res 49-57 are in the sheet? Does this coincide with the residues in the beta-strand region of phi-psi space?
Contact maps of protein structures
1avg--structure of triabin
map of C-C distances < 6 Å
rainbow ribbon diagram, blue to red: N to C
both axes are the sequence of the protein
near diagonal: local contacts in the sequence
off-diagonal: long-range (nonlocal) contacts
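A contact map of this kind can be computed directly from coordinates. The sketch below assumes one representative atom per residue (for example Cα; the atom type is not legible in the slide text), uses the 6 Å cutoff from the slide, and separates near-diagonal from off-diagonal contacts with a hypothetical sequence-separation threshold. The coordinates are invented toy values, not taken from 1avg.

```python
import math


def contact_map(coords, cutoff=6.0):
    """Boolean matrix: True where two residues are closer than `cutoff` angstroms."""
    n = len(coords)
    return [
        [math.dist(coords[i], coords[j]) < cutoff for j in range(n)]
        for i in range(n)
    ]


def split_contacts(cmap, min_separation=3):
    """Split contacts (i < j) into local (near-diagonal) and long-range (off-diagonal)."""
    local, long_range = [], []
    n = len(cmap)
    for i in range(n):
        for j in range(i + 1, n):
            if cmap[i][j]:
                (long_range if j - i >= min_separation else local).append((i, j))
    return local, long_range


# Toy hairpin: residues 0-2 run along x, residues 3-5 run back 4.5 angstroms away.
toy = [(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0), (7.6, 4.5, 0), (3.8, 4.5, 0), (0, 4.5, 0)]
cmap = contact_map(toy)
local, long_range = split_contacts(cmap)
print(long_range)  # [(0, 4), (0, 5), (1, 4), (1, 5)]
```

The long-range pairs are the off-diagonal signal that reveals the two strands packing against each other.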
What does secondary structure teach?
• If one can predict secondary structure from the primary structure, this may help in predicting protein function via evolutionary relationships with known folds
Tertiary structure in proteins
• Single polypeptide chain
• The number and order of secondary structures in the sequence (connectivity) and their arrangement in space define a protein’s fold or topology
• The pattern of contacts between side chains/backbone is also an aspect of tertiary structure
• Outer surface and interior
Obvious interactions in native protein structures
[Figure: interactions between side chains (R1, R2, R3) in a native protein: disulfide crosslinks (S-S); polar interactions (hydrogen bond, O···H-N; salt bridge between CO2 and NH3 groups); hydrophobic interactions]
The Protein Data Bank
The Protein Data Bank (PDB) is a central repository of protein structures
http://www.rcsb.org/pdb/home/home.do
Major structure classification systems
• SCOP (Structural Classification of Proteins)
• CATH (Class-Architecture-Topology-Homology)
• DALI/FSSP (Fold classification based on Structure-Structure Alignment)
SCOP and CATH are quite similar and generally combine automated and manual aspects. They are both “curated” by human experts.
The nuts and bolts behind fold prediction
Training set of known structures; training set of corresponding sequences
Test set of known structures; test set of corresponding sequences
Database of known structures; database of corresponding sequences (ACDEFGTYAEE……)
Single-residue probabilities:
        p(α-helix)    p(coil)         p(β-strand)
A       0.23          0.28            0.5
Pairwise probabilities, p(aa1-helix), p(aa1-coil), p(aa1-strand), …:
        α-helix       coil            β-strand
        A … C …       A … C …         A … C …
A       0.1 … 0.03    0.04 … 0.002    0.1 … 0.21
Predict secondary structure for the test set and compare with the known structures
Bad predictions: reshuffle training set and test set and repeat until predictions are correct
Good predictions: method ready for secondary structure prediction of new sequences
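The train/test loop above can be sketched with single-residue probabilities only. This is a deliberately minimal illustration: the state labels (H = helix, C = coil, E = strand), the toy sequences and the function names are all invented for the sketch, and real predictors use sequence windows, pairwise terms and far larger databases.

```python
from collections import Counter, defaultdict

STATES = "HCE"  # helix, coil, strand (labels chosen for this sketch)


def train(sequences, structures):
    """Estimate p(state | amino acid) from paired sequence/structure strings."""
    counts = defaultdict(Counter)
    for seq, ss in zip(sequences, structures):
        for aa, state in zip(seq, ss):
            counts[aa][state] += 1
    return {
        aa: {s: c[s] / sum(c.values()) for s in STATES}
        for aa, c in counts.items()
    }


def predict(seq, probs):
    """Give each residue its most probable state; unseen residues default to coil."""
    return "".join(
        max(STATES, key=lambda s: probs[aa][s]) if aa in probs else "C"
        for aa in seq
    )


def accuracy(predicted, known):
    """Fraction of residues whose predicted state matches the known one."""
    return sum(p == k for p, k in zip(predicted, known)) / len(known)


# Toy training set (invented data, for illustration only).
probs = train(["AAEELLKK", "VVIIYYGG"], ["HHHHHHHH", "EEEEEECC"])

# Toy test set: predict, then compare with the known structure.
predicted = predict("AELVIG", probs)
print(predicted, accuracy(predicted, "HHHEEC"))  # HHHEEC 1.0
```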
How does a fold prediction server work?
Database of known structures
Database of corresponding sequences
Database of probabilities of aa in secondary structure
YOUR SEQUENCE → Server (homology-based; helix-coil-strand profile folds database)
Strong homology → … → fold prediction
Weak/no homology → helix-coil-strand profile prediction → … → fold prediction
Predicting protein structure
• Homology modeling: 3D-JIGSAW, SWISSMODEL
• Ab initio modeling: ROBETTA
How does a homology modeling server work?
Database of known structures
Database of corresponding sequences
…YDVRSEQVENCE… → Server/Program → strong homologues
Best possible alignment (sequence + structure):
…YDVR-SEQVENCE…
…YDVRMSD-VDNCD…
…
Thread the sequence to be predicted over the known structure according to the alignment
… Optimization via energy minimization, etc. …
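The threading step amounts to reading the alignment column by column and recording which template residue each target residue should be placed on. The sketch below uses the two aligned strings from the slide, assuming the gapped form of the query (YDVR-SEQVENCE) is the target and the other row is the template of known structure; `map_alignment` is a hypothetical helper, not part of any modeling server.

```python
def map_alignment(target_aln, template_aln):
    """For each target residue, return (target_index, residue, template_index or None)."""
    mapping = []
    t_idx = s_idx = 0  # running positions in the ungapped target and template
    for t, s in zip(target_aln, template_aln):
        if t != "-":
            # A gap in the template means there are no coordinates to copy.
            mapping.append((t_idx, t, s_idx if s != "-" else None))
            t_idx += 1
        if s != "-":
            s_idx += 1
    return mapping


target = "YDVR-SEQVENCE"
template = "YDVRMSD-VDNCD"
for t_index, residue, s_index in map_alignment(target, template):
    print(t_index, residue, s_index)
```

Target residues mapped to `None` (here the Q facing a template gap) have no template coordinates and must be built separately, and template residue M (index 4) is skipped; these are the regions where the subsequent optimization has the most work to do.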
Predicting protein structure
• Homology modeling: 3D-JIGSAW, SWISSMODEL
• Ab initio modeling: ROSETTA
Predicting protein structure by ab initio methods
Database of corresponding sequences
…YDVRSEQVENCE… → Server/Program → NO homologues
Database of structures for smaller amino acid runs
…YDVR-SEQ matched against …YDVRMSD-…, …YPVRMSD-…, …
…VENCE… matched against …YDNCD…, …VEQCE…, …
… Assemble → energy minimization & optimization → …
Accuracy of modelling
• Accuracy varies widely.
• The quality of the model is VERY dependent on the quality of the alignment.
• Globular proteins are more accurately predicted.
• Membrane proteins are still a big problem.
• Homology modelling is “bad” if homology < 30%.
• CASP is a biennial meeting where the accuracy of the different methods is assessed.
  – The Baker group is usually and consistently more accurate than others.
http://www.predictioncenter.org/
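The 30% rule of thumb above can be checked on an alignment by computing percent identity. The sketch below assumes that "homology < 30%" means sequence identity below 30%, and it divides by the number of aligned (gap-free) columns; other denominators are also in use, so the numbers are convention-dependent. The example strings are the aligned pair from the homology-modeling slide.

```python
def percent_identity(aln_a, aln_b):
    """Percent of gap-free alignment columns in which both residues are identical."""
    aligned = [(a, b) for a, b in zip(aln_a, aln_b) if a != "-" and b != "-"]
    if not aligned:
        return 0.0
    identical = sum(a == b for a, b in aligned)
    return 100.0 * identical / len(aligned)


def usable_template(aln_a, aln_b, threshold=30.0):
    """Apply the rule of thumb: identity below the threshold flags a poor template."""
    return percent_identity(aln_a, aln_b) >= threshold


identity = percent_identity("YDVR-SEQVENCE", "YDVRMSD-VDNCD")
print(round(identity, 1), usable_template("YDVR-SEQVENCE", "YDVRMSD-VDNCD"))  # 72.7 True
```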
Summary
• DNA sequence to protein sequence
• From protein sequence to secondary structure
• Protein tertiary structure
• Predicting protein structure
“Accessible Surface”
Lee & Richards, 1971; Shrake & Rupley, 1973
• represent atoms as spheres w/ appropriate radii and eliminate overlapping parts...
• mathematically roll a sphere all around that surface...
• the sphere’s center traces out a surface as it rolls...
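The rolling-sphere idea can be approximated numerically in the spirit of Shrake & Rupley: place test points on each atom's sphere expanded by the probe radius, and count the points that do not fall inside any neighbor's expanded sphere. The sketch below is a simplified illustration with assumed parameters (a 1.4 Å probe, 200 points per atom, a golden-spiral point layout); it is not a reproduction of either published program.

```python
import math


def sphere_points(n):
    """Roughly uniform points on a unit sphere (golden-spiral layout)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for k in range(n):
        z = 1.0 - (2.0 * k + 1.0) / n
        r = math.sqrt(1.0 - z * z)
        points.append((r * math.cos(golden_angle * k), r * math.sin(golden_angle * k), z))
    return points


def accessible_areas(atoms, probe=1.4, n_points=200):
    """Per-atom accessible surface area; `atoms` is a list of (x, y, z, radius)."""
    unit = sphere_points(n_points)
    areas = []
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        big_r = ri + probe  # surface traced by the probe center
        exposed = 0
        for ux, uy, uz in unit:
            point = (xi + big_r * ux, yi + big_r * uy, zi + big_r * uz)
            buried = any(
                math.dist(point, (xj, yj, zj)) < rj + probe
                for j, (xj, yj, zj, rj) in enumerate(atoms)
                if j != i
            )
            if not buried:
                exposed += 1
        areas.append(4.0 * math.pi * big_r ** 2 * exposed / n_points)
    return areas


# An isolated atom is fully exposed; a close neighbor buries part of its surface.
print(accessible_areas([(0.0, 0.0, 0.0, 1.8)]))
print(accessible_areas([(0.0, 0.0, 0.0, 1.8), (2.0, 0.0, 0.0, 1.8)]))
```

Increasing `n_points` trades speed for a smoother area estimate.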
The outer surface: water in protein structures
Structures of water-soluble proteins determined at reasonably high resolution will be decorated on their outer surfaces with water molecules (cyan balls) with relatively well-defined positions, and waters may also occur internally
Water is not just surrounding the protein--it is interacting with it
Water interacts with protein surfaces
second shell waters: only contact other waters
first shell waters: in contact with/hydrogen bonded to protein
most waters visible in structures make hydrogen bonds to each other and/or to the protein, as donor/acceptor/both
Side chain conformation
• side chains differ in their number of degrees of conformational freedom (some don’t have any, such as Ala and Gly)
• but side chains of very different size can have the same number of χ angles.
Supersecondary structures/structural motifs
• just as there are certain secondary structure elements that are common, there are also particular arrangements of multiple secondary structure elements that are common
• supersecondary structures emphasize issue of topology in protein structure
[Figure: example motifs, including the Greek key motif]
Topology: differences in connectivity
• example: a four-stranded antiparallel β sheet can have many different topologies based on the order in which the four β strands are connected:
[Figure: “greek key” and “up-and-down” topologies]
Topology: differences in handedness
• example: An extremely common supersecondary structure in proteins is the beta-alpha-beta motif, in which two adjacent beta-strands are arranged in parallel and are separated in the sequence by a helix which packs against them.
• if the two parallel strands are oriented to face toward you, the helix can be either above or below the plane of the strands.
huge preference for right-handed arrangement in proteins
The CATH Hierarchy
1. Divide PDB structure entries into domains (using domain recognition algorithms); the domain is the fundamental unit of structure classification
2. Classify each domain according to a five-level hierarchy:
   Class
   Architecture
   Topology
   Homologous Superfamily
   Sequence Family
The top 3 levels of the hierarchy are purely phenetic: based on characteristics of the structure, not on evolutionary relationships.
The bottom two levels include some phyletic classification as well: groupings according to putative common ancestry based on structural similarity, functional similarity, and sequence similarity.
There is no purely phyletic system of protein classification! (It is also unlikely that there is any common ancestor to all proteins.)
SCOP: A different (but similar) taxonomy system
Correspondences between SCOP and CATH hierarchies:
SCOP            CATH
class           class
                architecture
fold            topology
superfamily     homologous superfamily
family          sequence family
domain          domain
CATH is more directed toward structural classification, whereas SCOP pays more attention to evolutionary relationships. Both have manual aspects and are curated by experts.
Amino acids: the building blocks of proteins
[Figure: an amino acid in its neutral form, H2N-CH(R)-COOH, and in its zwitterionic form, H3N-CH(R)-COO, with labels: amino group, carboxylic acid group, side chain, alpha carbon.]
The zwitterionic form is the predominant form at neutral pH.
The alpha carbon is a chiral center--natural proteins are made of L amino acids (shown above) as opposed to D.