
HW Clarifications

• Homology implies shared ancestry

• Partial sequence identity does not necessarily imply homology

• A high coverage of sequence identity can imply homology

Identity and Homology

HW Clarifications

Insertions and Deletions

Prediction of functional/structural sites in a protein using conservation and hyper-variation (ConSeq, ConSurf, Selecton)

Empirical findings of conservation variation among sites:

Functional/structural sites evolve slower than nonfunctional/nonstructural sites

Conservation = functional/structural importance

Histone 3 protein

Xenopus  MALWMQCLP-LVLVLLFSTPNTEALANQHL
Bos      MALWTRLRPLLALLALWPPPPARAFVNQHL
         **** :  * *.*: *:..* :. : ****

Xenopus  CGSHLVEALYLVCGDRGFFYYPKIKRDIEQ
Bos      CGSHLVEALYLVCGERGFFYTPKARREVEG
         **************:***** ** :*::*

Xenopus  AQVNGPQDNELDG-MQFQPQEYQKMKRGIV
Bos      PQVG---ALELAGGPGAGGLEGPPQKRGIV
         .**.     ** *       *    *****

Xenopus  EQCCHSTCSLFQLENYCN
Bos      EQCCASVCSLYQLENYCN
         **** *.***:*******

Alignment of pre-pro-insulin
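To make "sequence identity" concrete, here is a minimal sketch that computes the percent identity of two already-aligned sequences, using the second block of the pre-pro-insulin alignment. The function name and the choice to skip gapped columns are assumptions made for illustration; other definitions of identity divide by the full alignment length instead.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent of gap-free aligned columns in which both residues are identical."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    # Assumption: columns with a gap in either sequence are ignored.
    columns = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Second block of the Xenopus/Bos pre-pro-insulin alignment.
xenopus = "CGSHLVEALYLVCGDRGFFYYPKIKRDIEQ"
bos = "CGSHLVEALYLVCGERGFFYTPKARREVEG"
identity = percent_identity(xenopus, bos)
print(f"{identity:.1f}% identity")
```

A high identity over a short block like this one is only partial evidence. As the clarifications at the start note, homology is better supported when identity covers most of the sequence.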


Conserved sites:

• Important for the function or structure

• Not allowed to mutate

• "Slow evolving" sites

• Low rate of evolution

Variable sites:

• Less important (usually)

• Change more easily

• "Fast evolving" sites

• High rate of evolution

Conservation-based inference

Detecting conservation: Evolutionary rates

$$r = \frac{d}{2T}$$

• Rate = distance/time

• Distance = number of substitutions per site

• Time = 2 × number of years (doubled because the sequences evolved independently)
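The rate definition above can be sketched as a small function. The function name and the example numbers are hypothetical and only illustrate the arithmetic.

```python
def evolutionary_rate(distance: float, years_since_divergence: float) -> float:
    """Rate r = d / (2T), in substitutions per site per year.

    distance: number of substitutions per site between the two sequences.
    years_since_divergence: T; it is doubled because both lineages
    evolved independently since they split.
    """
    if years_since_divergence <= 0:
        raise ValueError("years_since_divergence must be positive")
    return distance / (2.0 * years_since_divergence)


# Hypothetical example: 0.1 substitutions per site, divergence 50 million years ago.
rate = evolutionary_rate(distance=0.1, years_since_divergence=5e7)
print(rate)
```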

Rate computation

Position        1  2  3  4  5  6  7
Human           D  M  A  A  H  A  M
Chimp           D  E  A  A  G  G  C
Cow             D  Q  A  A  W  A  P
Fish            D  L  A  A  C  A  L
S. cerevisiae   D  D  G  A  F  A  A
S. pombe        D  D  G  A  L  G  E

MSA, Phylogeny, Evolutionary Model
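As a rough illustration of how sites differ in the toy alignment above, the sketch below counts the distinct residues in each column. This is only a crude proxy and not how the rates are actually computed: the rate computation uses the MSA together with a phylogeny and an evolutionary model, all of which this sketch ignores. The function name is hypothetical.

```python
# Toy MSA from the rate-computation slide (7 aligned positions).
msa = {
    "Human": "DMAAHAM",
    "Chimp": "DEAAGGC",
    "Cow": "DQAAWAP",
    "Fish": "DLAACAL",
    "S. cerevisiae": "DDGAFAA",
    "S. pombe": "DDGALGE",
}


def distinct_residues_per_column(alignment: dict[str, str]) -> list[int]:
    """Number of different residues observed at each aligned position."""
    sequences = list(alignment.values())
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("all sequences must have the same aligned length")
    # zip(*sequences) yields one tuple of residues per column.
    return [len(set(column)) for column in zip(*sequences)]


variability = distinct_residues_per_column(msa)
for position, count in enumerate(variability, start=1):
    print(f"position {position}: {count} distinct residue(s)")
```

Positions 1 and 4 are invariant in this toy data, while positions 5 and 7 differ in every sequence, which matches the intuition of slow and fast evolving sites.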

http://conseq.tau.ac.il

Site-specific rate computation tool

Locating the active site of pyruvate kinase

Glycolysis pathway

Conservation scores:

• The scores are standardized: the average score of all residues is 0, and the standard deviation is 1

• Negative values: slowly evolving (= low evolutionary rate), conserved sites. The most conserved site in the protein has the lowest score

• Positive values: rapidly evolving (= fast evolutionary rate), variable sites. The most variable site in the protein has the highest score

• Scores are relative to the protein and cannot be compared between different proteins!
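The standardization described above (mean 0, standard deviation 1) can be sketched as follows. The raw rates are made-up numbers, and the use of the population standard deviation is an assumption; the source does not say which estimator the tool uses.

```python
from statistics import mean, pstdev


def standardize(rates: list[float]) -> list[float]:
    """Rescale per-site rates so that their mean is 0 and their std. dev. is 1."""
    mu = mean(rates)
    sigma = pstdev(rates)  # assumption: population standard deviation
    if sigma == 0:
        raise ValueError("all rates are identical; scores are undefined")
    return [(r - mu) / sigma for r in rates]


# Hypothetical raw rates for a 7-site protein.
raw_rates = [0.2, 1.5, 0.4, 0.1, 2.3, 0.5, 2.0]
scores = standardize(raw_rates)

# Lowest score = most conserved site, highest score = most variable site.
most_conserved = min(range(len(scores)), key=scores.__getitem__)
most_variable = max(range(len(scores)), key=scores.__getitem__)
print(most_conserved + 1, most_variable + 1)
```

Because each protein is rescaled by its own mean and spread, a score of -1 in one protein says nothing about a score of -1 in another.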

SWISS-PROT

Combining protein structure

• Each protein has a particular 3D structure that determines its function

• Protein structure is better conserved than protein sequence and more closely related to function

• Analyzing a protein structure is more informative than analyzing its sequence for function inference


Protein core: structurally constrained - usually conserved

Active site: functionally constrained - usually conserved

Surface: tolerant to mutations - usually variable

Conservation in the structure

[Figure: protein structure with the core, surface, and active site labeled]

http://consurf.tau.ac.il

Same algorithm as ConSeq, but here the results are projected onto the 3D structure of the protein

The structure-function of the potassium channel transmembrane region

cytoplasm

ConSeq/ConSurf user intervention (advanced options)

1. Choosing the method for calculating the amino-acid conservation scores: (Bayesian/Maximum Likelihood)

2. Entering your own MSA file

3. Performing the MSA using: (MUSCLE/CLUSTALW)

4. Collecting the homologs from: (SWISS-PROT/UniProt)

5. Max. number of homologs: (50)

6. No. of PSI-BLAST iterations: (1)

7. PSI-BLAST E-value cutoff: (0.001)

8. Model of substitution for proteins: (JTT/Dayhoff/mtREV/cpREV/WAG)

9. Entering your own PDB file

10. Entering your own TREE file

Codon-level selection

ConSeq/ConSurf:

• Compute the evolutionary rate of amino-acid sites → the data are amino acids

• Compute only the rate of non-synonymous substitutions

UUU → UUC (Phe → Phe): synonymous

UUU → CUU (Phe → Leu): non-synonymous
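The two examples above can be checked with a short sketch that translates each codon with the standard genetic code and compares the amino acids. The function names are hypothetical. The table is built from the standard code in T, C, A, G order, and RNA codons are accepted by mapping U to T.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(codon: str) -> str:
    """One-letter amino acid for a DNA or RNA codon ('*' marks a stop codon)."""
    return CODON_TABLE[codon.upper().replace("U", "T")]


def classify_substitution(before: str, after: str) -> str:
    """Label a codon change as synonymous or non-synonymous."""
    if translate(before) == translate(after):
        return "synonymous"
    return "non-synonymous"


print(classify_substitution("UUU", "UUC"))  # Phe -> Phe
print(classify_substitution("UUU", "CUU"))  # Phe -> Leu
```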

For most proteins, the rate of synonymous substitutions is much higher than the non-synonymous rate

This is called purifying selection (= conservation in ConSeq/ConSurf)

Synonymous vs. non-synonymous substitutions

There are rare cases where the non-synonymous rate is much higher than the synonymous rate

This is called positive (Darwinian) selection

Synonymous vs. non-synonymous substitutions

Positive Selection

The hypothesis: positive selection promotes the fitness of the organism

Examples:

• Pathogen proteins evading the host immune system

• Proteins of the immune system detecting pathogen proteins

• Pathogen proteins that are drug targets

• Proteins that are products of gene duplication

• Proteins involved in the reproductive system

Computing synonymous and non-synonymous rates

Codon MSA, Phylogeny, Evolutionary Model

Inferring positive selection

Look at the ratio between the non-synonymous rate (Ka) and the synonymous rate (Ks)

$$\frac{K_a}{K_s}$$

Inferring positive selection

Ka/Ks < 1: purifying selection

Ka/Ks > 1: positive selection

Ka/Ks = 1: no selection (neutral)
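The three cases above can be written as a small helper. This is only a sketch: the function name and the tolerance used to decide that a ratio equals 1 are assumptions, and a single ratio above 1 is not proof of positive selection without a statistical test.

```python
import math


def interpret_ka_ks(ka: float, ks: float, tolerance: float = 1e-9) -> str:
    """Map a Ka/Ks ratio to the selection regime it suggests."""
    if ks <= 0:
        raise ValueError("Ks must be positive to form the ratio")
    ratio = ka / ks
    # Assumption: ratios within `tolerance` of 1 are treated as neutral.
    if math.isclose(ratio, 1.0, abs_tol=tolerance):
        return "no selection (neutral)"
    if ratio < 1.0:
        return "purifying selection"
    return "positive selection"


# Hypothetical rates, chosen only to exercise each branch.
print(interpret_ka_ks(ka=0.02, ks=0.40))
print(interpret_ka_ks(ka=0.90, ks=0.30))
print(interpret_ka_ks(ka=0.25, ks=0.25))
```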

Our evolutionary model assumes there is positive selection in the data

By chance alone we expect our model to find a few sites with Ka/Ks > 1

Is this really indicative of positive selection or plain randomness?

Maybe there's no positive selection after all?

Codon MSA, Phylogeny, Evolutionary Model

$0 \le K_a / K_s$

Solution: statistically compare between hypotheses

H0: There's no positive selection

H1: There is positive selection

H0: compute the probability (likelihood) of the data using a model that does not account for positive selection ($0 \le K_a / K_s \le 1$)

H1: compute the probability (likelihood) of the data using a model that does account for positive selection ($0 \le K_a / K_s$)

Perform a statistical test to accept or reject H0 (likelihood ratio test):

$$2 \ln\left(\frac{L(\text{Data} \mid M(H_1))}{L(\text{Data} \mid M(H_0))}\right) \sim \chi^2$$

P-value < 0.05: reject H0

P-value > 0.05: do not reject H0
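A minimal sketch of the likelihood ratio test follows. The log-likelihood values are made up, and the degrees of freedom are an assumption, since the source does not state how many extra parameters the H1 model has. To stay dependency-free, the chi-square tail probability is implemented only for 1 and 2 degrees of freedom, where it has a closed form.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) of a chi-square distribution (df = 1 or 2)."""
    if x <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch supports only df = 1 or df = 2")


def likelihood_ratio_test(log_l_h0: float, log_l_h1: float,
                          df: int = 1, alpha: float = 0.05):
    """Return (statistic, p_value, reject_h0) for nested models H0 and H1.

    statistic = 2 * (ln L(Data | H1) - ln L(Data | H0))
    """
    statistic = 2.0 * (log_l_h1 - log_l_h0)
    p_value = chi2_sf(statistic, df)
    return statistic, p_value, p_value < alpha


# Hypothetical log-likelihoods reported for the two models.
statistic, p_value, reject_h0 = likelihood_ratio_test(-1234.5, -1230.0, df=1)
print(statistic, p_value, reject_h0)
```

A small P-value means the H0 model (no positive selection) explains the data significantly worse than H1, so H0 is rejected. A large P-value means there is no evidence for positive selection.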

Note: saturation of synonymous substitutions

Human and wheat are too evolutionarily remote → saturation of synonymous substitutions

Pick closer sequences for positive selection analysis

[Figure: synonymous (Syn.) vs. nonsynonymous (Nonsyn.) substitutions]

http://selecton.tau.ac.il

Selecton input

• Coding sequences: only ORFs

• No stop codons

• If an MSA is provided it must be codon aligned (RevTrans)

• The user must provide the sequences: no PSI-BLAST option

Codon-level sequences!
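Before submitting sequences, the input requirements above can be pre-checked with a short script. This is a sketch of one possible check, not part of Selecton: the function name is hypothetical, and "codon aligned" is interpreted here as "gaps occur only as whole codons", which is an assumption.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_coding_sequence(seq: str, gap: str = "-") -> list[str]:
    """Return a list of problems found in one (possibly aligned) coding sequence."""
    problems = []
    seq = seq.upper().replace("U", "T")
    if len(seq) % 3 != 0:
        problems.append("length is not a multiple of 3")
    # Walk over complete codons only.
    for start in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[start:start + 3]
        number = start // 3 + 1
        if codon in STOP_CODONS:
            problems.append(f"stop codon {codon} at codon {number}")
        elif gap in codon and codon != gap * 3:
            # Assumption: a codon-aligned MSA has gaps in whole-codon units.
            problems.append(f"gap splits codon {number}")
    return problems


print(check_coding_sequence("ATGGCCTTT"))
print(check_coding_sequence("ATGTAAGCC"))
print(check_coding_sequence("ATG-CCTTT"))
```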

Positive selection in the primate TRIM5α

Primate TRIM5α

TRIM5α from humans, rhesus monkeys, and African green monkeys are all unable to restrict retroviruses isolated from their own species, yet are able to restrict retroviruses from the other species

TRIM5α is an important natural barrier to cross-species retrovirus transmission

TRIM5α is in an antagonistic conflict with the retroviral capsid proteins

TRIM5α is under positive selection

Positive selection analysis

Positive selection analysis in Selecton

[Figure labels: H0, H1]

Comparing H0 and H1 in Selecton

Selecton results:

Results

Human-rhesus swaps at sites 332 and 335-340 (SPRY) significantly elevate human resistance to HIV and rhesus resistance to SIV
