HW Clarifications
• Homology implies shared ancestry
• Partial sequence identity does not necessarily imply homology
• A high coverage of sequence identity can imply homology
Identity and Homology
HW Clarifications
Insertions and Deletions
Prediction of functional/structural sites in a protein using conservation and hyper-variation (ConSeq, ConSurf, Selecton)
Empirical findings of conservation variation among sites:
Functional/structural sites evolve slower than nonfunctional/nonstructural sites
Conservation = functional/structural importance
Histone 3 protein
Xenopus  MALWMQCLP-LVLVLLFSTPNTEALANQHL
Bos      MALWTRLRPLLALLALWPPPPARAFVNQHL
         **** :  * *.*: *:..* :. : ****

Xenopus  CGSHLVEALYLVCGDRGFFYYPKIKRDIEQ
Bos      CGSHLVEALYLVCGERGFFYTPKARREVEG
         **************:***** ** :*::*

Xenopus  AQVNGPQDNELDG-MQFQPQEYQKMKRGIV
Bos      PQVG---ALELAGGPGAGGLEGPPQKRGIV
         .**.     ** *       *    *****

Xenopus  EQCCHSTCSLFQLENYCN
Bos      EQCCASVCSLYQLENYCN
         **** *.***:*******

Alignment of pre-pro-insulin
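To make the link between an alignment and sequence identity concrete, here is a minimal Python sketch that counts identical residues over the aligned, gap-free columns of two sequences. The function name `pairwise_identity` is hypothetical, and the example input is the last block of the pre-pro-insulin alignment (Xenopus vs. Bos). Identity over a short fragment like this says nothing by itself about homology.

```python
def pairwise_identity(seq_a, seq_b, gap="-"):
    """Return (identical columns, compared columns) for two aligned sequences.

    Columns where either sequence has a gap are skipped (an assumption;
    other conventions count gaps as mismatches).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    matches = sum(1 for a, b in pairs if a == b)
    return matches, len(pairs)


# Last block of the pre-pro-insulin alignment shown in the slides.
xenopus = "EQCCHSTCSLFQLENYCN"
bos = "EQCCASVCSLYQLENYCN"

matches, compared = pairwise_identity(xenopus, bos)
print(f"{matches}/{compared} identical ({100.0 * matches / compared:.1f}%)")
```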
Conservation-based inference
Conserved sites:
• Important for the function or structure
• Not allowed to mutate
• “Slow evolving” sites
• Low rate of evolution
Variable sites:
• Less important (usually)
• Change more easily
• “Fast evolving” sites
• High rate of evolution
Detecting conservation: Evolutionary rates
$$r = \frac{d}{2T}$$
• Rate = distance/time
• Distance (d) = number of substitutions per site
• Time = 2 × #years (2T; doubled because the sequences evolved independently)
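The rate formula can be sketched in a few lines of Python. The function name `evolutionary_rate` and the input numbers are hypothetical and chosen only for illustration; the calculation is simply distance divided by twice the divergence time.

```python
def evolutionary_rate(distance, divergence_years):
    """Substitutions per site per year: r = d / (2T).

    distance         -- number of substitutions per site between two sequences
    divergence_years -- years since the two lineages split (T)
    """
    if divergence_years <= 0:
        raise ValueError("divergence time must be positive")
    # The time is doubled because both lineages evolved independently.
    return distance / (2.0 * divergence_years)


# Hypothetical example: 0.2 substitutions per site, split 100 million years ago.
rate = evolutionary_rate(0.2, 100e6)
print(rate)
```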
Rate computation
MSA + Phylogeny + Evolutionary Model

Position        1  2  3  4  5  6  7
Human           D  M  A  A  H  A  M
Chimp           D  E  A  A  G  G  C
Cow             D  Q  A  A  W  A  P
Fish            D  L  A  A  C  A  L
S. cerevisiae   D  D  G  A  F  A  A
S. pombe        D  D  G  A  L  G  E
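As a rough illustration of why some columns of this MSA look conserved and others variable, the sketch below counts the distinct residues in each column. This is only a crude proxy and an assumption of this example: a real rate computation also uses the phylogeny and an evolutionary model, which this count ignores. The function name is hypothetical.

```python
# The seven-column toy MSA from the slide, one string per species.
msa = {
    "Human": "DMAAHAM",
    "Chimp": "DEAAGGC",
    "Cow": "DQAAWAP",
    "Fish": "DLAACAL",
    "S. cerevisiae": "DDGAFAA",
    "S. pombe": "DDGALGE",
}


def distinct_residues_per_column(alignment):
    """Count the distinct residues in every alignment column.

    1 means a fully conserved column; larger values mean more variation.
    """
    sequences = list(alignment.values())
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all aligned sequences must have the same length")
    return [len(set(column)) for column in zip(*sequences)]


counts = distinct_residues_per_column(msa)
for position, count in enumerate(counts, start=1):
    print(position, count)
```

Columns 1 and 4 come out fully conserved, while columns 5 and 7 differ in every species.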
http://conseq.tau.ac.il
Site-specific rate computation tool
Locating the active site of Pyruvate kinase
Glycolysis pathway
Conservation scores:
• The scores are standardized: the average score of all residues is 0, and the standard deviation is 1
• Negative values: slowly evolving (= low evolutionary rate), conserved sites. The most conserved site in the protein has the lowest score
• Positive values: rapidly evolving (= fast evolutionary rate), variable sites. The most variable site in the protein has the highest score
Scores are relative to the protein and cannot be compared between different proteins!
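The standardization described above is an ordinary z-score, sketched below in Python. The raw rates are made-up numbers, and using the population standard deviation (`pstdev`) is an assumption of this sketch, not a statement about how ConSeq computes it.

```python
from statistics import mean, pstdev


def standardize(rates):
    """Rescale per-site rates to mean 0 and standard deviation 1."""
    mu = mean(rates)
    sigma = pstdev(rates)  # population standard deviation (an assumption)
    if sigma == 0:
        raise ValueError("all rates are identical; scores are undefined")
    return [(rate - mu) / sigma for rate in rates]


# Hypothetical raw rates for four sites of one protein.
raw_rates = [0.1, 0.5, 1.0, 2.4]
scores = standardize(raw_rates)

# The slowest site gets the lowest (most negative) score.
most_conserved_site = scores.index(min(scores))
print(scores, most_conserved_site)
```

Because the mean and standard deviation are taken over one protein's sites, the same raw rate can map to different scores in different proteins, which is why scores cannot be compared across proteins.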
SWISS-PROT
Combining protein structure
• Each protein has a particular 3D structure that determines its function
• Protein structure is better conserved than protein sequence and more closely related to function
• Analyzing a protein structure is more informative than analyzing its sequence for function inference
Protein core: structurally constrained - usually conserved
Active site: functionally constrained - usually conserved
Surface: tolerant to mutations - usually variable
Core
Surface
Conservation in the structure
Active site
http://consurf.tau.ac.il
Same algorithm as ConSeq, but here the results are projected onto the 3D structure of the protein
The structure-function of the potassium channel transmembrane region
cytoplasm
ConSeq/ConSurf user intervention (advanced options)
1. Choosing the method for calculating the amino-acid conservation scores: (Bayesian/Maximum Likelihood)
2. Entering your own MSA file
3. Performing the MSA using: (MUSCLE/CLUSTALW)
4. Collecting the homologs from: (SWISS-PROT/UniProt)
5. Max. number of homologs: (50)
6. No. of PSI-BLAST iterations: (1)
7. PSI-BLAST E-value cutoff: (0.001)
8. Model of substitution for proteins: (JTT/Dayhoff/mtREV/cpREV/WAG)
9. Entering your own PDB file
10. Entering your own TREE file
Codon-level selection
ConSeq/ConSurf:
• Compute the evolutionary rate of amino-acid sites → the data are amino acids
• Compute only the rate of non-synonymous substitutions
UUU → UUC (Phe → Phe ): synonymous
UUU → CUU (Phe → Leu): non-synonymous
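The two examples above can be checked mechanically. The following Python sketch builds the standard genetic code and labels a codon change as synonymous or non-synonymous; the function name `classify_substitution` is hypothetical, and the standard (nuclear) code is assumed.

```python
from itertools import product

# Standard genetic code, codons ordered U, C, A, G at each position.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # codons starting with U
    "LLLLPPPPHHQQRRRR"  # codons starting with C
    "IIIMTTTTNNKKSSRR"  # codons starting with A
    "VVVVAAAADDEEGGGG"  # codons starting with G
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(codon_before, codon_after):
    """Label a codon change by whether the encoded amino acid changes."""
    before = CODON_TABLE[codon_before.upper().replace("T", "U")]
    after = CODON_TABLE[codon_after.upper().replace("T", "U")]
    return "synonymous" if before == after else "non-synonymous"


print(classify_substitution("UUU", "UUC"))  # Phe -> Phe
print(classify_substitution("UUU", "CUU"))  # Phe -> Leu
```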
For most proteins, the rate of synonymous substitutions is much higher than the non-synonymous rate
This is called purifying selection (= conservation in ConSeq/ConSurf)
Synonymous vs. non-synonymous substitutions
There are rare cases where the non-synonymous rate is much higher than the synonymous rate
This is called positive (Darwinian) selection
Synonymous vs. nonsynonymous substitutions
Positive Selection
The hypothesis: promotes the fitness of the organism
Examples:
• Pathogen proteins evading the host immune system
• Proteins of the immune system detecting pathogen proteins
• Pathogen proteins that are drug targets
• Proteins that are products of gene duplication
• Proteins involved in the reproductive system
Computing synonymous and non-synonymous rates
Codon MSA + Phylogeny + Evolutionary Model
Inferring positive selection
Look at the ratio between the non-synonymous rate (Ka) and the synonymous rate (Ks):
$$\frac{K_a}{K_s}$$
Ka/Ks < 1: purifying selection
Ka/Ks > 1: positive selection
Ka/Ks = 1: no selection (neutral)
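The three cases translate directly into a small decision function. This is a minimal sketch: the function name, the tolerance used to treat a ratio as exactly 1, and the input numbers are all assumptions for illustration. A point estimate classified this way still needs a statistical test before positive selection is claimed.

```python
def selection_regime(ka, ks, tolerance=1e-9):
    """Classify a site or gene by its Ka/Ks ratio."""
    if ks <= 0:
        raise ValueError("Ks must be positive for the ratio to be defined")
    ratio = ka / ks
    if abs(ratio - 1.0) <= tolerance:
        return "no selection (neutral)"
    if ratio < 1.0:
        return "purifying selection"
    return "positive selection"


# Hypothetical (Ka, Ks) estimates.
for ka, ks in [(0.1, 0.5), (0.6, 0.2), (0.3, 0.3)]:
    print(ka, ks, selection_regime(ka, ks))
```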
Our evolutionary model assumes there is positive selection in the data
• By chance alone we expect our model to find a few sites with Ka/Ks > 1
• Is this really indicative of positive selection or plain randomness?
• Maybe there’s no positive selection after all?
Codon MSA + Phylogeny + Evolutionary Model (0 ≤ Ka/Ks)
Solution: statistically compare between hypotheses
H0: There’s no positive selection
H1: There is positive selection
• H0: compute the probability (likelihood) of the data using a model that does not account for positive selection (0 ≤ Ka/Ks ≤ 1)
• H1: compute the probability (likelihood) of the data using a model that does account for positive selection (0 ≤ Ka/Ks)
Perform a statistical test to accept or reject H0 (likelihood ratio test):
$$2\ln\left(\frac{L(\mathrm{Data}\mid M(H_1))}{L(\mathrm{Data}\mid M(H_0))}\right) \sim \chi^2$$
P-value < 0.05: reject H0
P-value > 0.05: accept H0
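A likelihood ratio test of this kind can be sketched with the Python standard library. The sketch takes the log-likelihoods of the two models, forms the statistic 2(lnL1 - lnL0), and compares it with a chi-square distribution. Using one degree of freedom is an assumption of this example (the appropriate value depends on the pair of models being compared), and the log-likelihood values are made up.

```python
import math


def likelihood_ratio_test(lnl_h0, lnl_h1, alpha=0.05):
    """Return (statistic, p_value, reject_h0) for two nested models.

    lnl_h0 -- log-likelihood under the model without positive selection
    lnl_h1 -- log-likelihood under the model allowing positive selection
    Assumes a chi-square distribution with 1 degree of freedom.
    """
    statistic = max(2.0 * (lnl_h1 - lnl_h0), 0.0)
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value, p_value < alpha


# Hypothetical log-likelihoods for H0 and H1.
statistic, p_value, reject_h0 = likelihood_ratio_test(-1000.0, -995.0)
print(statistic, p_value, reject_h0)
```

A small p-value means the model that allows positive selection fits the data significantly better, so H0 is rejected.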
Note: saturation of synonymous substitutions
• Human and wheat are too evolutionarily remote → saturation of synonymous substitutions
• Pick closer sequences for positive selection analysis
Syn.
Nonsyn.
http://selecton.tau.ac.il
Selecton input
• Coding sequences - only ORFs
• No stop codons
• If an MSA is provided it must be codon aligned (RevTrans)
• The user must provide the sequences - no PSI-BLAST option
Codon-level sequences!
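Before submitting sequences, the input requirements above can be checked locally. The helper below is a hypothetical pre-check written for illustration, not part of Selecton: it verifies that a sequence uses only nucleotide letters, has a length divisible by three, and contains no in-frame stop codon (standard genetic code assumed).

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_coding_sequence(sequence):
    """Return a list of problems found in an ungapped coding sequence."""
    seq = sequence.upper().replace("U", "T")
    problems = []
    if set(seq) - set("ACGT"):
        problems.append("non-ACGT characters")
    if len(seq) % 3 != 0:
        problems.append("length is not a multiple of 3")
    # Read complete codons in frame 0 and report the first stop codon, if any.
    complete = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, complete, 3)]
    stops = [index for index, codon in enumerate(codons) if codon in STOP_CODONS]
    if stops:
        problems.append(f"stop codon at codon index {stops[0]}")
    return problems


print(check_coding_sequence("ATGGCCAAA"))
print(check_coding_sequence("ATGTAAGCC"))
```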
Positive selection in the primate TRIM5α
Primate TRIM5α
TRIM5α from humans, rhesus monkeys, and African green monkeys are all unable to restrict retroviruses isolated from their own species, yet are able to restrict retroviruses from the other species
TRIM5α is an important natural barrier to cross-species retrovirus transmission
TRIM5α is in an antagonistic conflict with the retroviral capsid proteins
TRIM5α is under positive selection
Positive selection analysis
Positive selection analysis in Selecton
H0
H1
Comparing H0 and H1 in Selecton
Selecton results:
Results
Human-rhesus swaps at sites 332 and 335-340 (SPRY domain) significantly elevate human resistance to HIV and rhesus resistance to SIV