HW Clarifications: Identity and Homology


Page 1

HW Clarifications

• Homology implies shared ancestry

• Partial sequence identity does not necessarily imply homology

• A high coverage of sequence identity can imply homology

Identity and Homology

Page 2

HW Clarifications

Insertions and Deletions

Page 3

Prediction of functional/structural sites in a protein using conservation and hyper-variation (ConSeq, ConSurf, Selecton)

Page 4

Empirical findings of conservation variation among sites:

Functional/structural sites evolve slower than nonfunctional/nonstructural sites

Page 5

Conservation = functional/structural importance

Page 6

Histone 3 protein

Page 7

Alignment of pre-pro-insulin

Xenopus  MALWMQCLP-LVLVLLFSTPNTEALANQHL
Bos      MALWTRLRPLLALLALWPPPPARAFVNQHL
         **** :  * *.*: *:..* :. : ****

Xenopus  CGSHLVEALYLVCGDRGFFYYPKIKRDIEQ
Bos      CGSHLVEALYLVCGERGFFYTPKARREVEG
         **************:***** ** :*::*

Xenopus  AQVNGPQDNELDG-MQFQPQEYQKMKRGIV
Bos      PQVG---ALELAGGPGAGGLEGPPQKRGIV
         .**.     ** *       *    *****

Xenopus  EQCCHSTCSLFQLENYCN
Bos      EQCCASVCSLYQLENYCN
         **** *.***:*******


Page 10

Conservation-based inference

Conserved sites:
• Important for the function or structure
• Not allowed to mutate
• “Slow evolving” sites
• Low rate of evolution

Variable sites:
• Less important (usually)
• Change more easily
• “Fast evolving” sites
• High rate of evolution

Page 11

Detecting conservation: Evolutionary rates

$$ r = \frac{d}{2T} $$

• Rate = distance/time
• Distance = number of substitutions per site
• Time = 2 × number of years (doubled because the sequences evolved independently)
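The rate definition on this slide can be sketched in a few lines of Python. The function name `evolutionary_rate` and the example numbers are illustrative assumptions, not values from the lecture.

```python
def evolutionary_rate(distance: float, divergence_years: float) -> float:
    """Substitutions per site per year, r = d / (2T)."""
    if divergence_years <= 0:
        raise ValueError("divergence time must be positive")
    # The factor 2 reflects that both lineages have been evolving
    # independently since their common ancestor.
    return distance / (2 * divergence_years)


# Hypothetical example: 0.1 substitutions per site between two sequences
# that diverged 5 million years ago.
example_rate = evolutionary_rate(0.1, 5e6)
```

The result is in substitutions per site per year, so very small numbers are expected.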

Page 12

Rate computation

Position        1  2  3  4  5  6  7
Human           D  M  A  A  H  A  M
Chimp           D  E  A  A  G  G  C
Cow             D  Q  A  A  W  A  P
Fish            D  L  A  A  C  A  L
S. cerevisiae   D  D  G  A  F  A  A
S. pombe        D  D  G  A  L  G  E

Figure labels: MSA, Phylogeny, Evolutionary Model
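As a rough illustration of why some columns look "slow" and others "fast", the sketch below counts the distinct residues in each column of the toy MSA from this slide. This is only a crude proxy chosen for illustration: it ignores the phylogeny and the evolutionary model that the slide lists as inputs, and it is not the ConSeq algorithm. The sequences are transcribed from the slide; the function name is an assumption.

```python
# Toy MSA transcribed from the slide (positions 1-7).
msa = {
    "Human": "DMAAHAM",
    "Chimp": "DEAAGGC",
    "Cow": "DQAAWAP",
    "Fish": "DLAACAL",
    "S. cerevisiae": "DDGAFAA",
    "S. pombe": "DDGALGE",
}


def distinct_residues_per_column(alignment):
    """Count how many different residues appear in each alignment column."""
    columns = zip(*alignment.values())  # transpose rows into columns
    return [len(set(column)) for column in columns]


variability = distinct_residues_per_column(msa)
# Columns with a count of 1 are fully conserved in this toy alignment;
# larger counts indicate more variable positions.
```

Positions 1 and 4 come out fully conserved, while positions 5 and 7 carry a different residue in every sequence.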

Page 13

http://conseq.tau.ac.il

Site-specific rate computation tool

Page 14

Locating the active site of Pyruvate kinase

Glycolysis pathway


Page 18

Conservation scores:

• The scores are standardized: the average score of all residues is 0, and the standard deviation is 1
• Negative values: slowly evolving (= low evolutionary rate), conserved sites. The most conserved site in the protein has the lowest score
• Positive values: rapidly evolving (= fast evolutionary rate), variable sites. The most variable site in the protein has the highest score

Scores are relative to the protein and cannot be compared between different proteins!
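The standardization described here is an ordinary z-score transformation. A minimal sketch, assuming the population standard deviation is used (the slide does not say which variant the tool applies) and using made-up raw rates:

```python
from statistics import mean, pstdev


def standardize(rates):
    """Rescale per-site rates to mean 0 and standard deviation 1."""
    mu = mean(rates)
    sigma = pstdev(rates)  # assumption: population standard deviation
    if sigma == 0:
        raise ValueError("all rates are identical; scores are undefined")
    return [(rate - mu) / sigma for rate in rates]


raw_rates = [0.2, 1.5, 0.9, 3.1, 0.1]  # hypothetical per-site rates
scores = standardize(raw_rates)
most_conserved_site = scores.index(min(scores))  # lowest score
most_variable_site = scores.index(max(scores))   # highest score
```

Because the mean and standard deviation are computed within one protein, the same raw rate would map to different scores in a different protein, which is why scores are not comparable across proteins.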


Page 20

SWISS-PROT

Page 21

Combining protein structure

• Each protein has a particular 3D structure that determines its function
• Protein structure is better conserved than protein sequence and more closely related to function
• Analyzing a protein structure is more informative than analyzing its sequence for function inference

Page 22

Conservation in the structure

• Protein core: structurally constrained - usually conserved
• Active site: functionally constrained - usually conserved
• Surface: tolerant to mutations - usually variable

Figure labels: Core, Surface, Active site

Page 23

http://consurf.tau.ac.il

Same algorithm as ConSeq, but here the results are projected onto the 3D structure of the protein

Page 24

The structure-function of the potassium channel transmembrane region

cytoplasm


Page 29

ConSeq/ConSurf user intervention (advanced options)

1. Choosing the method for calculating the amino-acid conservation scores: (Bayesian/Maximum Likelihood)
2. Entering your own MSA file
3. Performing the MSA using: (MUSCLE/CLUSTALW)
4. Collecting the homologs from: (SWISS-PROT/UniProt)
5. Max. number of homologs: (50)
6. No. of PSI-BLAST iterations: (1)
7. PSI-BLAST E-value cutoff: (0.001)
8. Model of substitution for proteins: (JTT/Dayhoff/mtREV/cpREV/WAG)
9. Entering your own PDB file
10. Entering your own TREE file

Page 30

Codon-level selection

ConSeq/ConSurf:
• Compute the evolutionary rate of amino-acid sites → the data are amino acids
• Compute only the rate of non-synonymous substitutions

UUU → UUC (Phe → Phe): synonymous

UUU → CUU (Phe → Leu): non-synonymous
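The two examples can be checked mechanically by translating both codons and comparing the amino acids. The sketch below assumes the standard genetic code; the function name `classify_substitution` is illustrative.

```python
from itertools import product

# Standard genetic code, with codons ordered by base U, C, A, G at each
# position ("*" marks a stop codon).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(codon_before: str, codon_after: str) -> str:
    """Label a codon change as synonymous or non-synonymous."""
    same_product = CODON_TABLE[codon_before] == CODON_TABLE[codon_after]
    # A change to or from a stop codon is reported as non-synonymous here.
    return "synonymous" if same_product else "non-synonymous"


first_example = classify_substitution("UUU", "UUC")   # Phe -> Phe
second_example = classify_substitution("UUU", "CUU")  # Phe -> Leu
```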

Page 31

Synonymous vs. non-synonymous substitutions

For most proteins, the rate of synonymous substitutions is much higher than the non-synonymous rate

This is called purifying selection (= conservation in ConSeq/ConSurf)

Page 32

Synonymous vs. non-synonymous substitutions

There are rare cases where the non-synonymous rate is much higher than the synonymous rate

This is called positive (Darwinian) selection

Page 33

Positive Selection

The hypothesis: positive selection promotes the fitness of the organism

Examples:
• Pathogen proteins evading the host immune system
• Proteins of the immune system detecting pathogen proteins
• Pathogen proteins that are drug targets
• Proteins that are products of gene duplication
• Proteins involved in the reproductive system

Page 34

Computing synonymous and non-synonymous rates

Figure labels: Codon MSA, Phylogeny, Evolutionary Model

Page 35

Inferring positive selection

Look at the ratio between the non-synonymous rate (Ka) and the synonymous rate (Ks):

$$ \frac{K_a}{K_s} $$

Page 36

Inferring positive selection

Ka/Ks < 1: purifying selection

Ka/Ks > 1: positive selection

Ka/Ks = 1: no selection (neutral)
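The three cases translate directly into a small classifier. This is a sketch only: the function name and the `tolerance` parameter (how close to 1 counts as neutral) are assumptions, and a ratio estimated from real data still needs a statistical test before it is interpreted.

```python
def selection_regime(ka: float, ks: float, tolerance: float = 1e-9) -> str:
    """Map a Ka/Ks ratio onto the three regimes listed on the slide."""
    if ks <= 0:
        raise ValueError("Ks must be positive for the ratio to be defined")
    ratio = ka / ks
    if abs(ratio - 1.0) <= tolerance:
        return "neutral"
    return "positive" if ratio > 1.0 else "purifying"


# Hypothetical rate estimates for three sites.
regimes = [
    selection_regime(0.1, 0.5),  # ratio 0.2
    selection_regime(0.6, 0.3),  # ratio 2.0
    selection_regime(0.4, 0.4),  # ratio 1.0
]
```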

Page 37

Our evolutionary model assumes there is positive selection in the data

By chance alone we expect our model to find a few sites with Ka/Ks > 1

Is this really indicative of positive selection or plain randomness?

Maybe there’s no positive selection after all?

Figure labels: Codon MSA, Phylogeny, Evolutionary Model; Ka/Ks axis starting at 0

Page 38

Solution: statistically compare between hypotheses

H0: There’s no positive selection

H1: There is positive selection

H0: compute the probability (likelihood) of the data using a model that does not account for positive selection (Ka/Ks axis: 0 to 1)

H1: compute the probability (likelihood) of the data using a model that does account for positive selection (Ka/Ks axis: from 0)

Perform a statistical test to accept or reject H0 (likelihood ratio test):

$$ 2\ln\left(\frac{L(\mathrm{Data} \mid M(H_1))}{L(\mathrm{Data} \mid M(H_0))}\right) \sim \chi^2 $$

P-value < 0.05: reject H0

P-value > 0.05: accept H0
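The likelihood ratio test on this slide can be sketched with the standard library alone. The slide does not state the degrees of freedom of the chi-square distribution, so `df` is an assumed parameter here, and only `df = 1` and `df = 2` are supported because their tail probabilities have simple closed forms. The log-likelihood values are made up.

```python
import math


def lrt_p_value(log_l_h0: float, log_l_h1: float, df: int = 1):
    """Return (test statistic, p-value) for a likelihood ratio test.

    The statistic is 2 * (lnL(H1) - lnL(H0)), compared against a
    chi-square distribution with `df` degrees of freedom.
    """
    statistic = max(2.0 * (log_l_h1 - log_l_h0), 0.0)
    if df == 1:
        p_value = math.erfc(math.sqrt(statistic / 2.0))
    elif df == 2:
        p_value = math.exp(-statistic / 2.0)
    else:
        raise ValueError("this sketch only supports df = 1 or df = 2")
    return statistic, p_value


# Hypothetical log-likelihoods of the same data under the two models.
statistic, p_value = lrt_p_value(log_l_h0=-1000.0, log_l_h1=-995.0, df=2)
```

Working with log-likelihoods avoids numerical underflow, since the likelihood of a whole alignment is an extremely small number.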

Page 39

Note: saturation of synonymous substitutions

Human and wheat are too evolutionarily remote: saturation of synonymous substitutions

Pick closer sequences for positive selection analysis

Figure labels: Syn., Nonsyn.

Page 40

http://selecton.tau.ac.il

Page 41

Selecton input

Codon-level sequences!

• Coding sequences - only ORFs
• No stop codons
• If an MSA is provided it must be codon aligned (RevTrans)
• The user must provide the sequences (no PSI-BLAST option)
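Two of these input requirements are easy to check before submitting sequences. The sketch below is a hypothetical pre-check, not part of Selecton: it assumes the standard genetic code's stop codons and only tests that the length is a multiple of three and that no in-frame stop codon is present.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code


def check_coding_sequence(sequence: str) -> list:
    """Return a list of problems; an empty list means both checks passed."""
    sequence = sequence.upper().replace("U", "T")
    problems = []
    if len(sequence) % 3 != 0:
        problems.append("length is not a multiple of 3")
    # Only complete codons are inspected; a trailing partial codon is ignored.
    complete_length = len(sequence) - len(sequence) % 3
    for start in range(0, complete_length, 3):
        codon = sequence[start:start + 3]
        if codon in STOP_CODONS:
            problems.append(f"stop codon {codon} at codon {start // 3 + 1}")
    return problems


clean = check_coding_sequence("ATGTTTCTT")
with_stop = check_coding_sequence("ATGTAACTT")
truncated = check_coding_sequence("ATGTT")
```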

Page 42

Positive selection in the primate TRIM5α

Page 43

Primate TRIM5α

TRIM5α from humans, rhesus monkeys, and African green monkeys are all unable to restrict retroviruses isolated from their own species, yet are able to restrict retroviruses from the other species

TRIM5α is an important natural barrier to cross-species retrovirus transmission

TRIM5α is in an antagonistic conflict with the retroviral capsid proteins

TRIM5α is under positive selection

Page 44

Positive selection analysis

Page 45

Positive selection analysis in Selecton

H0

H1

Page 46

Comparing H0 and H1 in Selecton

Page 47

Comparing H0 and H1 in Selecton


Page 49

Selecton results:


Page 51

Results

Human-rhesus swaps at sites 332 and 335-340 (SPRY domain) significantly elevate human resistance to HIV and rhesus resistance to SIV