DECONSTRUCTING EVOLUTION TO PERTURB PROTEINS AND NETWORKS Olivier Lichtarge MD, PhD Cullen Professor of Human and Molecular Genetics Baylor College of Medicine Houston, Texas USA


Page 1: What I will tell

DECONSTRUCTING EVOLUTION TO PERTURB PROTEINS AND NETWORKS

Olivier Lichtarge MD, PhD

Cullen Professor of Human and Molecular Genetics
Baylor College of Medicine

Houston, Texas, USA

Page 2: What I will tell

What I will tell

What I want to tell

What I tried to tell

PROLOGUE

Page 3: What I will tell

MORBIDITY AND MORTALITY OF PROTEIN DISEASES

• Alzheimer’s
• Cancers
• Sickle cell
• HIV entry
• Autoimmune diseases
• Amyloidosis
• Type II diabetes
• Bleeding diathesis
• Molecular mimicry
• Cardiomyopathies
• Cystic fibrosis
• Huntington’s chorea
• Ataxias…

PROTEIN DYSFUNCTION IS LINKED TO MANY AILMENTS

Page 4: What I will tell

Functional site identification has widespread applications

UNDERSTAND FUNCTIONAL SITES

Molecular recognition: protein-small molecule, protein-peptide, protein-protein, protein-nucleic acid

Function: catalysis, signaling, motion, metabolism, immunity, transport…

FOCUS EXPERIMENTS RELEVANT TARGETS

Engineer
• drugs
• peptide mimics
• binding sites
• catalytic sites

Modulate pathways: signaling, transcription, development, apoptosis…

FUNCTIONAL SITES MEDIATE PROTEIN FUNCTION

Lichtarge Lab, Baylor College of Medicine

Page 5: What I will tell

HOW DO PROTEINS WORK?

To control proteins, know their functional determinants

Page 6: What I will tell

functional determinants

RELEVANT PATTERNS OF EVOLUTIONARY VARIATIONS

Page 7: What I will tell

FUNCTIONAL DETERMINANTS IN PROTEINS

Lichtarge LabBaylor College of Medicine

Page 8: What I will tell

Block, separate function

RATIONAL PROTEIN DESIGN

Lichtarge LabBaylor College of Medicine

Galpha Bourne Onrust Science (1997)
RGS Wensel Sowa PNAS (2000)
Nucl. Transp Moore Cushman JMB ’04
Nucl. Recept. Smith Raviscioni Proteins ’06
Ku70/80 Bertuch Ribes-Zamora NSMB ’07
GRK Clark Baamaeur Mol. Pharm ’10
RecA-LexA Lichtarge Adikesavan (submitted)

Page 9: What I will tell

Peptide inhibitor or

trigger

RATIONAL PROTEIN DESIGN

Lichtarge LabBaylor College of Medicine

Cohesin Pati
GRK Clarke
Ku70/80 Bertuch

Page 10: What I will tell

Rewire function

RATIONAL PROTEIN DESIGN

Lichtarge LabBaylor College of Medicine

RGS Wensel Sowa NSMB (2001)
Proneural Tx Hassan Quan Develop. ’04
Nucl. Recept. Cooney Raviscioni JMB ’06
GPCR Wensel Rodriguez PNAS ’10

Page 11: What I will tell

Constitutive activity“internal reprogramming”

RATIONAL PROTEIN DESIGN

Lichtarge LabBaylor College of Medicine

GPCR Wensel Madabushi JBC ’04
GPCR Lefkowitz Shenoy JBC ’06
GPCR Wensel Rodriguez PNAS ’10

Page 12: What I will tell

MUTATIONAL IMPACT

Lichtarge LabBaylor College of Medicine

Item Mol. Genet. Metab. ’07
Shaibani Arch. Neurol. ’09
Haberle Hum. Mutations ’10
Katsonis in prep

Page 13: What I will tell

PROTEIN FUNCTION PREDICTION

Lichtarge LabBaylor College of Medicine

Page 14: What I will tell

PROTEIN FUNCTION PREDICTION

Lichtarge LabBaylor College of Medicine

Kristensen Prot Sci ’06
Kristensen BMC Bioinfo ’08
Ward PLoS One ’09
Kristensen J Mol Biol ’09
Venner PLoS One ’10

Page 15: What I will tell

PATTERNS AND EMERGING PROTEOMIC RULES

• Scalable
• Robust
• Not random
• Match known sites
• Predict and guide experiments

Three classes of automated accurate ET ranking functions
Three ET servers: http://mammoth.bcm.tmc.edu

[Diagram labels: Function, Sequence, Structure, EVOLUTION]

1. Amino acids may be ranked by importance
2. Top-ranked residues cluster
3. Clusters predict functional sites
4. ET quality measures (clustering, rank information) correlate with prediction quality
5. ET clusters exchange water more slowly and have more H-bonds and salt bridges
6. Importance symmetry across interface
7. Top-ranked residue variations: specificity key

Lichtarge LabBaylor College of Medicine

Page 16: What I will tell

What I will tell

What I want to tell

What I tried to tell

– LECTURE 1 –

Page 17: What I will tell

PROBLEM
Given a structure
• Where is the active site?
• What are the key residue determinants of function?

METHOD
Evolutionary Trace (ET): use evolution’s mutations and assays

• Overview: SH2, SH3, ZnF
• Functional sites, 4º structure: RGS, Gα
• Functional annotation: ZnF, GPCRs
• Remote homology and alignments: GPCRs
• Generality

EVOLUTION: A COMPUTATIONAL TOOL FOR PROTEIN FUNCTIONAL SITE DISCOVERY

Lichtarge LabBaylor College of Medicine

Page 18: What I will tell

INTEGRATINGSEQUENCE-STRUCTURE-FUNCTION INFORMATION

Lichtarge LabBaylor College of Medicine

SEQUENCE

STRUCTURE FUNCTION

A FUNDAMENTAL CHALLENGE

Non-deterministic process

Deterministic process

Page 19: What I will tell

Lichtarge LabBaylor College of Medicine

A SIMPLER PROBLEM

[Diagram labels: SEQUENCE, STRUCTURE X, FUNCTIONAL SITE, "?", EXPERIMENTS, THEORY]

Given structure x, where are its functional sites?

Page 20: What I will tell

NEED a CHEAP, SCALABLE method to characterize the key residue determinants of protein function

• What is important in the structure?
• Where are the functional sites?
• How is specificity encoded?

FUNCTIONAL SITE CHARACTERIZATION:A LIMITING STEP IN EXPLOITING STRUCTURES

• Mutational analysis is precise, but protein specific, costly, and requires assays.

• Structural Genomics producing vast numbers of new structures.

Lichtarge LabBaylor College of Medicine

Page 21: What I will tell

Very basic Evolutionary Tracing (ET)

Page 22: What I will tell

• Location, architecture and function of active sites are conserved
• Specific variations impart novel and unique functional modulations

GAALF…….RT…W…KL
GAALY…….RT…W…KD
GAQLF…….FT…W…RE

IF these macroscopic observations apply to proteins,
THEN active site residues will be invariant within functional classes;
THEREFORE identify functional sites by looking for such class-specific residues.


FUNCTIONAL SITES EVOLVE THROUGH VARIATIONS ON A CONSERVED ARCHITECTURE

Page 23: What I will tell

0. GATHER SEQUENCES

KERTFTGHKKLM
KERTFTGHKRLM
KERTFTVHKRLM
KEKTFTGHKKLM
VERTFTGHKSQM
VERTDTGHKRQM
VERTFTGMKRQM
ASR.YTGVKKNV
ASR.YTGVKKNV
ASR.YTGHKKNM
ASR.YTGHKKNM

Lichtarge LabBaylor College of Medicine

Page 24: What I will tell

1. SPLIT THEM INTO FUNCTIONAL SUBGROUPS

KERTFTGHKKLM
KERTFTGHKRLM
KERTFTVHKRLM
KEKTFTGHKKLM

VERTFTGHKSQM
VERTDTGHKRQM
VERTFTGMKRQM

ASR.YTGVKKNV
ASR.YTGVKKNV

ASR.YTGHKKNM
ASR.YTGHKKNM

Lichtarge LabBaylor College of Medicine

Page 25: What I will tell

GROUP 1
  sequences: KERTFTGHKKLM KERTFTGHKRLM KERTFTVHKRLM KEKTFTGHKKLM
  consensus: KE-TFT-HK-LM
GROUP 2
  sequences: VERTFTGHKSQM VERTDTGHKRQM VERTFTGMKRQM
  consensus: VERT-TG-K-QM
GROUP 3
  sequences: ASR.YTGVKKNV ASR.YTGVKKNV
  consensus: ASR.YTGVKKNV
GROUP 4
  sequences: ASR.YTGHKKNM ASR.YTGHKKNM
  consensus: ASR.YTGHKKNM

Consensus sequence: residues that are invariant within that group

Lichtarge LabBaylor College of Medicine

2. IDENTIFY KEY RESIDUES IN EACH SUBGROUP
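The consensus step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the lab's implementation: the function name `consensus` and the `groups` dictionary are hypothetical, and the sequences are the toy alignment from the slide. A column keeps its residue only when every sequence in the group agrees; otherwise it is written as `-`.

```python
def consensus(seqs):
    """Consensus of one group: the residue where a column is invariant, '-' elsewhere."""
    return "".join(col[0] if len(set(col)) == 1 else "-" for col in zip(*seqs))

# Toy alignment from the slide, already split into four groups (hypothetical container).
groups = {
    1: ["KERTFTGHKKLM", "KERTFTGHKRLM", "KERTFTVHKRLM", "KEKTFTGHKKLM"],
    2: ["VERTFTGHKSQM", "VERTDTGHKRQM", "VERTFTGMKRQM"],
    3: ["ASR.YTGVKKNV", "ASR.YTGVKKNV"],
    4: ["ASR.YTGHKKNM", "ASR.YTGHKKNM"],
}

for name, seqs in groups.items():
    print(name, consensus(seqs))
# 1 KE-TFT-HK-LM
# 2 VERT-TG-K-QM
# 3 ASR.YTGVKKNV
# 4 ASR.YTGHKKNM
```

The gap character `.` is treated like any other symbol here, so an all-gap column stays `.` in the consensus, as on the slide.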

Page 26: What I will tell

GROUP 1
  sequences: KERTFTGHKKLM KERTFTGHKRLM KERTFTVHKRLM KEKTFTGHKKLM
  consensus: KE-TFT-HK-LM
GROUP 2
  sequences: VERTFTGHKSQM VERTDTGHKRQM VERTFTGMKRQM
  consensus: VERT-TG-K-QM
GROUP 3
  sequences: ASR.YTGVKKNV ASR.YTGVKKNV
  consensus: ASR.YTGVKKNV
GROUP 4
  sequences: ASR.YTGHKKNM ASR.YTGHKKNM
  consensus: ASR.YTGHKKNM

Compare consensus sequences

EVOLUTIONARY TRACE
KE-TFT-HK-LM
VERT-TG-K-QM
ASR.YTGVKKNV
ASR.YTGHKKNM

Lichtarge LabBaylor College of Medicine

3. COMPARE KEY RESIDUES ACROSS GROUPS

Page 27: What I will tell

GROUP 1
  sequences: KERTFTGHKKLM KERTFTGHKRLM KERTFTVHKRLM KEKTFTGHKKLM
  consensus: KE-TFT-HK-LM
GROUP 2
  sequences: VERTFTGHKSQM VERTDTGHKRQM VERTFTGMKRQM
  consensus: VERT-TG-K-QM
GROUP 3
  sequences: ASR.YTGVKKNV ASR.YTGVKKNV
  consensus: ASR.YTGVKKNV
GROUP 4
  sequences: ASR.YTGHKKNM ASR.YTGHKKNM
  consensus: ASR.YTGHKKNM

EVOLUTIONARY TRACE
KE-TFT-HK-LM
VERT-TG-K-QM
ASR.YTGVKKNV
ASR.YTGHKKNM
X

By definition: if X varies, function changes, the sine qua non of importance.

Lichtarge LabBaylor College of Medicine

4. IDENTIFY CLASS SPECIFIC RESIDUES X
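The comparison across groups can be sketched as follows. This is a hedged illustration: the function name `trace` is hypothetical, and the rule for gaps (any `-` or `.` in a column makes the position non-trace) is an assumption inferred from the slide's own output, where the gapped fourth column is written as `_`. A position invariant across all groups keeps its residue; a position invariant within each group but different between groups is marked `X`.

```python
def trace(consensi):
    """Compare group consensus sequences column by column.

    '_'     : some group varies (or is gapped) at this position
    residue : invariant across the whole family
    'X'     : invariant within every group, but different between groups
    """
    out = []
    for col in zip(*consensi):
        if any(c in "-." for c in col):
            out.append("_")
        elif len(set(col)) == 1:
            out.append(col[0])
        else:
            out.append("X")
    return "".join(out)

# Consensus sequences of the four groups, copied from the slide.
consensi = ["KE-TFT-HK-LM", "VERT-TG-K-QM", "ASR.YTGVKKNV", "ASR.YTGHKKNM"]
print(trace(consensi))  # XX___T__K_XX
```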

Page 28: What I will tell

GROUP 1
  sequences: KERTFTGHKKLM KERTFTGHKRLM KERTFTVHKRLM KEKTFTGHKKLM
  consensus: KE-TFT-HK-LM
GROUP 2
  sequences: VERTFTGHKSQM VERTDTGHKRQM VERTFTGMKRQM
  consensus: VERT-TG-K-QM
GROUP 3
  sequences: ASR.YTGVKKNV ASR.YTGVKKNV
  consensus: ASR.YTGVKKNV
GROUP 4
  sequences: ASR.YTGHKKNM ASR.YTGHKKNM
  consensus: ASR.YTGHKKNM

ACTIVE SITE
A site where any variation is linked to functional change.

EVOLUTIONARY TRACE
KE-TFT-HK-LM
VERT-TG-K-QM
ASR.YTGVKKNV
ASR.YTGHKKNM
XX___T__K_XX

Lichtarge LabBaylor College of Medicine

5. MAP TRACE RESIDUES ON THE STRUCTURE
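Mapping trace residues onto a structure amounts to asking whether the marked positions sit close together in space. The sketch below is illustrative only: the coordinates are invented, the 4.0 cutoff is arbitrary, and the names `trace_positions` and `contact_pairs` are hypothetical. With a real structure the coordinates would come from the solved model.

```python
import math

def trace_positions(trace):
    """1-based positions marked in a trace string (anything other than '_')."""
    return [i for i, c in enumerate(trace, start=1) if c != "_"]

def contact_pairs(coords, positions, cutoff=4.0):
    """Pairs of trace positions whose coordinates lie within `cutoff` of each other."""
    pairs = []
    for idx, p in enumerate(positions):
        for q in positions[idx + 1:]:
            if math.dist(coords[p], coords[q]) <= cutoff:
                pairs.append((p, q))
    return pairs

# Invented coordinates (arbitrary units), one point per trace position.
coords = {
    1: (0.0, 0.0, 0.0), 2: (3.0, 0.0, 0.0), 6: (3.0, 3.0, 0.0),
    9: (0.0, 3.0, 0.0), 11: (20.0, 0.0, 0.0), 12: (23.0, 0.0, 0.0),
}

positions = trace_positions("XX___T__K_XX")
print(positions)
print(contact_pairs(coords, positions))
```

Many contacts among few residues suggest a cluster; scattered residues give few or none.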

Page 29: What I will tell

HOW TO DEFINE FUNCTIONAL SUBGROUPS?

KERTFTGHKKLM
KERTFTGHKRLM
KERTFTVHKRLM
KEKTFTGHKKLM
VERTFTGHKSQM
VERTDTGHKRQM
VERTFTGMKRQM
ASR.YTGVKKNV
ASR.YTGVKKNV
ASR.YTGHKKNM
ASR.YTGHKKNM

Lichtarge LabBaylor College of Medicine

Expert bias

Experiments

Approximation

Page 30: What I will tell

APPROXIMATE FUNCTIONAL SUBGROUPS FROM EVOLUTIONARY INFORMATION

GROUP 1
  sequences: KERTFTGHKKLM KERTFTGHKRLM KERTFTVHKRLM KEKTFTGHKKLM
GROUP 2
  sequences: VERTFTGHKSQM VERTDTGHKRQM VERTFTGMKRQM
GROUP 3
  sequences: ASR.YTGVKKNV ASR.YTGVKKNV
GROUP 4
  sequences: ASR.YTGHKKNM ASR.YTGHKKNM

Hypothesis

A sequence identity tree approximates a functional classification.

If so, each node is a virtual assay that defines functional subgroups.

4 branches → 4 functional groups.

Lichtarge LabBaylor College of Medicine
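The hypothesis that a sequence identity tree approximates a functional classification can be tried on the toy alignment. The sketch below is an assumption-laden stand-in for a real tree-building program: it uses plain average-linkage clustering on mismatch counts, stops when `k` clusters remain, and breaks ties by input order. The names `mismatches` and `cut_into_groups` are hypothetical. In this toy data the last merges are tied, so a different input order could give a different four-group cut.

```python
from itertools import combinations

def mismatches(a, b):
    """Number of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def cut_into_groups(seqs, k):
    """Average-linkage clustering on mismatch counts, stopped at k clusters."""
    clusters = [[i] for i in range(len(seqs))]
    while len(clusters) > k:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            dists = [mismatches(seqs[p], seqs[q])
                     for p in clusters[i] for q in clusters[j]]
            d = sum(dists) / len(dists)
            if best is None or d < best[0]:  # strict "<": ties keep the earlier pair
                best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters[j]  # merge j into i (j > i, so i stays valid)
        del clusters[j]
    return [[seqs[p] for p in c] for c in clusters]

# Toy alignment in the order shown on the slide.
seqs = [
    "KERTFTGHKKLM", "KERTFTGHKRLM", "KERTFTVHKRLM", "KEKTFTGHKKLM",
    "VERTFTGHKSQM", "VERTDTGHKRQM", "VERTFTGMKRQM",
    "ASR.YTGVKKNV", "ASR.YTGVKKNV",
    "ASR.YTGHKKNM", "ASR.YTGHKKNM",
]

for group in cut_into_groups(seqs, 4):
    print(group)
```

Each value of `k` plays the role of one node cut in the tree, that is, one "virtual assay".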

Page 31: What I will tell

GROUP 1
  sequences: KERTFTGHKKLM KERTFTGHKRLM KERTFTVHKRLM KEKTFTGHKKLM
  consensus: KE-TFT-HK-LM
GROUP 2
  sequences: VERTFTGHKSQM VERTDTGHKRQM VERTFTGMKRQM
  consensus: VERT-TG-K-QM
GROUP 3
  sequences: ASR.YTGVKKNV ASR.YTGVKKNV
  consensus: ASR.YTGVKKNV
GROUP 4
  sequences: ASR.YTGHKKNM ASR.YTGHKKNM
  consensus: ASR.YTGHKKNM

Lichtarge LabBaylor College of Medicine

2. IDENTIFY KEY RESIDUES IN EACH SUBGROUP

Page 32: What I will tell

GROUP 1
  sequences: KERTFTGHKKLM KERTFTGHKRLM KERTFTVHKRLM KEKTFTGHKKLM
  consensus: KE-TFT-HK-LM
GROUP 2
  sequences: VERTFTGHKSQM VERTDTGHKRQM VERTFTGMKRQM
  consensus: VERT-TG-K-QM
GROUP 3
  sequences: ASR.YTGVKKNV ASR.YTGVKKNV
  consensus: ASR.YTGVKKNV
GROUP 4
  sequences: ASR.YTGHKKNM ASR.YTGHKKNM
  consensus: ASR.YTGHKKNM

EVOLUTIONARY TRACE
KE-TFT-HK-LM
VERT-TG-K-QM
ASR.YTGVKKNV
ASR.YTGHKKNM

Lichtarge LabBaylor College of Medicine

3. COMPARE KEY RESIDUES ACROSS GROUPS

Page 33: What I will tell

GROUP 1
  sequences: KERTFTGHKKLM KERTFTGHKRLM KERTFTVHKRLM KEKTFTGHKKLM
  consensus: KE-TFT-HK-LM
GROUP 2
  sequences: VERTFTGHKSQM VERTDTGHKRQM VERTFTGMKRQM
  consensus: VERT-TG-K-QM
GROUP 3
  sequences: ASR.YTGVKKNV ASR.YTGVKKNV
  consensus: ASR.YTGVKKNV
GROUP 4
  sequences: ASR.YTGHKKNM ASR.YTGHKKNM
  consensus: ASR.YTGHKKNM

EVOLUTIONARY TRACE
KE-TFT-HK-LM
VERT-TG-K-QM
ASR.YTGVKKNV
ASR.YTGHKKNM
X

Lichtarge LabBaylor College of Medicine

4. IDENTIFY TRACE RESIDUES X

Definition: A trace residue is one that does NOT vary within branches. Generically this property is also called class specificity.

Page 34: What I will tell

GROUP 1
  sequences: KERTFTGHKKLM KERTFTGHKRLM KERTFTVHKRLM KEKTFTGHKKLM
  consensus: KE-TFT-HK-LM
GROUP 2
  sequences: VERTFTGHKSQM VERTDTGHKRQM VERTFTGMKRQM
  consensus: VERT-TG-K-QM
GROUP 3
  sequences: ASR.YTGVKKNV ASR.YTGVKKNV
  consensus: ASR.YTGVKKNV
GROUP 4
  sequences: ASR.YTGHKKNM ASR.YTGHKKNM
  consensus: ASR.YTGHKKNM

ACTIVE SITE

EVOLUTIONARY TRACE
KE-TFT-HK-LM
VERT-TG-K-QM
ASR.YTGVKKNV
ASR.YTGHKKNM
XX___T__K_XX

Lichtarge LabBaylor College of Medicine

5. MAP TRACE RESIDUES ON THE STRUCTURE

Page 35: What I will tell

GROUP 1
  sequences: KERTFTGHKKLM KERTFTGHKRLM KERTFTVHKRLM KEKTFTGHKKLM
             VERTFTGHKSQM VERTDTGHKRQM VERTFTGMKRQM
             ASR.YTGVKKNM ASR.YTGVKKNV ASR.YTGHKKNM ASR.YTGHKKNM
  consensus: ---.-T--K---

EVOLUTIONARY TRACE
-----T--K---
_____T__K___  (rank 1)

Definition: The trace rank is the smallest number of branches at which a residue first becomes class specific.

ACTIVE SITE

Lichtarge LabBaylor College of Medicine

6. EVOLUTIONARY IMPORTANCE RANK
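The rank definition can be sketched by repeating the trace at successive cuts of the tree and recording, for each position, the first cut at which it appears. This is an illustrative sketch, not the published algorithm: the function names are hypothetical, the partitions into 1, 2 and 3 branches are hard-coded rather than read from a tree, and gapped columns are assumed to be non-trace.

```python
def consensus(seqs):
    """Residue where a column is invariant within the group, '-' elsewhere."""
    return "".join(col[0] if len(set(col)) == 1 else "-" for col in zip(*seqs))

def trace(consensi):
    """'_' if any group varies or is gapped, the residue if invariant overall, else 'X'."""
    out = []
    for col in zip(*consensi):
        if any(c in "-." for c in col):
            out.append("_")
        elif len(set(col)) == 1:
            out.append(col[0])
        else:
            out.append("X")
    return "".join(out)

def trace_ranks(partitions):
    """Map each trace position to the smallest number of branches at which it appears."""
    ranks = {}
    for groups in sorted(partitions, key=len):
        t = trace([consensus(g) for g in groups])
        for pos, symbol in enumerate(t, start=1):
            if symbol != "_":
                ranks.setdefault(pos, len(groups))
    return ranks

# Toy family from the rank slides, split as in the dendrogram (hard-coded assumption).
g1 = ["KERTFTGHKKLM", "KERTFTGHKRLM", "KERTFTVHKRLM", "KEKTFTGHKKLM"]
g2 = ["VERTFTGHKSQM", "VERTDTGHKRQM", "VERTFTGMKRQM"]
g3 = ["ASR.YTGVKKNM", "ASR.YTGVKKNV", "ASR.YTGHKKNM", "ASR.YTGHKKNM"]
partitions = [[g1 + g2 + g3], [g1 + g2, g3], [g1, g2, g3]]

print(trace_ranks(partitions))  # {6: 1, 9: 1, 2: 2, 1: 3, 11: 3}
```

Positions 6 and 9 are invariant across the whole family (rank 1), position 2 becomes class specific with two branches, and positions 1 and 11 with three.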


Page 36: What I will tell

GROUP 1
  sequences: KERTFTGHKKLM KERTFTGHKRLM KERTFTVHKRLM KEKTFTGHKKLM
             VERTFTGHKSQM VERTDTGHKRQM VERTFTGMKRQM
  consensus: -E-T-T--K--M
GROUP 2
  sequences: ASR.YTGVKKNM ASR.YTGVKKNV ASR.YTGHKKNM ASR.YTGHKKNM
  consensus: ASR.YTG-KKN-

EVOLUTIONARY TRACE
-E-T-T--K--M
ASR.YTG-KKN-
_X___T__K___  (rank 2)

ACTIVE SITE

Lichtarge LabBaylor College of Medicine

6. RANK OF EVOLUTIONARY IMPORTANCE

2

Page 37: What I will tell

A WELL DEFINED ALGORITHMIC PROCEDURE

GROUP 1
  sequences: KERTFTGHKKLM KERTFTGHKRLM KERTFTVHKRLM KEKTFTGHKKLM
  consensus: KE-TFT-HK-LM
GROUP 2
  sequences: VERTFTGHKSQM VERTDTGHKRQM VERTFTGMKRQM
  consensus: VERT-TG-K-QM
GROUP 3
  sequences: ASR.YTGVKKNM ASR.YTGVKKNV ASR.YTGHKKNM ASR.YTGHKKNM
  consensus: ASR.YTG-KKN-

EVOLUTIONARY TRACE
KE-TFT-HK-LM
VERT-TG-K-QM
ASR.YTG-KKN-
XX___T__K_X_  (rank 3)

ACTIVE SITE

Use the tree’s intrinsic hierarchy to assign an evolutionary trace rank to every residue.

Page 38: What I will tell

PROBLEM
Given a protein structure
• Where is the active site?
• What are the key residue determinants of function?

METHOD
The Evolutionary Trace (ET): use evolution’s mutations and assays

• Overview: SH2, SH3, ZnF
• Functional sites, 4º structure: RGS, Gα
• Functional annotation: ZnF, GPCRs
• Remote homology and alignments: GPCRs
• Generality

EVOLUTIONARY TRACE ANNOTATION OF PROTEIN FUNCTIONAL SITES

Lichtarge LabBaylor College of Medicine

Page 39: What I will tell

Lichtarge LabBaylor College of Medicine

SH2 DOMAIN

Get an SH2 structure

Extract the sequence

Gather homologs: Blast, FASTA...

Align: PILEUP, CLUSTALW...

Construct a tree: PHYLIP,…

Trace!

Page 40: What I will tell

Lichtarge LabBaylor College of Medicine

[Figure: structure shown at 0°, 90°, 180° and 270°, with labels A to G]

SH2 DOMAIN

Page 41: What I will tell

Trace residues (colored)
• exist
• accumulate with more branches
• map unevenly on the structure,
• up until they scatter

[Figure: structure shown at 0°, 90°, 180° and 270°, with labels A to G]

SH2 DOMAIN

Page 42: What I will tell

Mutations of residues ranked
• best: kill function
• lower: modulate it
• worst: no effect

[Figure: structure shown at 0°, 90°, 180° and 270°, with labels A to G]

SH2 DOMAIN

Page 43: What I will tell

Lichtarge LabBaylor College of Medicine

Trace cluster matches binding site

Binding site (Waksman et al.)

80 sequences

40 sequences

SH2 DOMAIN

Page 44: What I will tell

Trace cluster matches the binding site (cyan).

But it matches the functional site (red) even better.

Evolution’s experiments agree with laboratory experiments

SH3 DOMAIN

Lichtarge LabBaylor College of Medicine

Page 45: What I will tell

Lichtarge LabBaylor College of Medicine

INTRACELLULAR HORMONE RECEPTORS

Trace residue clusters match the protein-DNA interface

Page 46: What I will tell

If
• The dendrogram approximates a functional tree.
• The active site evolves through variations on a conserved architecture.

Then
• Class-specific residues can be found.
• They cluster at functional sites (protein-protein, protein-DNA interfaces).
• They are ranked following a functional hierarchy:
  • functionally essential residues are first,
  • modulators of specificity follow,
  • then noise appears; unlike signal, it is scattered rather than clustered.

Lichtarge LabBaylor College of Medicine

Lichtarge et al. J. Mol. Biol. (1996)

PROOF OF PRINCIPLE

Page 47: What I will tell

PROBLEM
Given a protein structure
• Where is the active site?
• What are the key residue determinants of function?

METHOD
The Evolutionary Trace (ET): use evolution’s mutations and assays

• Overview, control studies: SH2, SH3, ZnF
• Bona fide predictions of functional sites and 4º structure: Galpha, RGS
• Functional annotation: ZnF, GPCRs
• Remote homology and alignments: GPCRs
• Generality

EVOLUTIONARY TRACE ANNOTATION OF PROTEIN FUNCTIONAL SITES

Lichtarge LabBaylor College of Medicine

Page 48: What I will tell

Lichtarge LabBaylor College of Medicine

G PROTEIN SIGNALING

[Diagram, step 1; labels: 7TMR, Gα, GDP, Gβγ, Effector]

Ligands: light, calcium, epinephrine, angiotensin, thrombin, LH, FSH (>1000)

• Ubiquitous in eukaryotes
• Sight, smell, taste, pain, reward, inflammation
• ≥ 80% of neuroendocrine signaling
• 100% of autonomic physiology
• 40-60% of all drugs

Page 49: What I will tell

Lichtarge LabBaylor College of Medicine

G PROTEIN-COUPLED RECEPTOR ACTIVATION

[Diagram, step 1; labels: 7TMR, Gα, GDP, Gβγ, Effector]

Ligands: light, calcium, epinephrine, angiotensin, thrombin, LH, FSH (>1000)

Page 50: What I will tell

Lichtarge LabBaylor College of Medicine

G PROTEIN ACTIVATION

[Diagram, step 2, activation; labels: 7TMR, Gα, GDP, GTP, Gβγ, Effector]

Ligands: light, calcium, epinephrine, angiotensin, thrombin, LH, FSH (>1000)

Page 51: What I will tell

Lichtarge LabBaylor College of Medicine

EFFECTOR ACTIVATION

[Diagram, step 3; labels: 7TMR, activation, Gα, GDP, GTP, Gβγ, Effector]

Ligands: light, calcium, epinephrine, angiotensin, thrombin, LH, FSH (>1000)

Gα subtypes: Gαs, Gαt, Gαi, Gαq

Effectors: adenylyl cyclase, cGMP-PDE, K channels, phospholipase C β

Page 52: What I will tell

Lichtarge LabBaylor College of Medicine

CELLULAR EFFECT

[Diagram, step 4; labels: 7TMR, activation, Gα, GDP, GTP, Gβγ, Effector]

Ligands: light, calcium, epinephrine, angiotensin, thrombin, LH, FSH (>1000)

Gα subtypes: Gαs, Gαt, Gαi, Gαq

Effectors: adenylyl cyclase, cGMP-PDE, K channels, phospholipase C β

Changes in concentration of intracellular 2nd messengers

Page 53: What I will tell

Lichtarge LabBaylor College of Medicine

FIRST PROSPECTIVE STUDY

[Diagram, step 5; labels: 7TMR, Gα, GDP, GTP, Gβγ, Effector, RGS, STOP]

Ligands: light, calcium, epinephrine, angiotensin, thrombin, LH, FSH (>1000)

Changes in concentration of intracellular 2nd messengers

Where does Galpha bind the receptor?

Page 54: What I will tell

Lichtarge LabBaylor College of Medicine

A MODEL OF THE G PROTEIN TRIMER-RECEPTOR COMPLEX

• ET identifies 3 surfaces on Gα
  1. Cleft → GTP/GDP
  2. C-term → GPCR
  3. ? → Gβ

• Since Gβ also interacts with the 7TMR, this leads to a low-resolution model of the complex

Structures from Wall et al. Cell ’95; Lambright et al. Nature ’96

Page 55: What I will tell

Lichtarge LabBaylor College of Medicine

THE Galpha-Gbeta INTERFACE

• A2 and B1 match the footprints of Gbeta and Galpha

• A2 goes beyond the Gbeta footprint: additional interaction ?

Lichtarge et al. (1996) PNAS

Page 56: What I will tell

PREDICTION vs ALA SCAN

Lichtarge LabBaylor College of Medicine

Lichtarge et al. (1996) PNAS

No effect

108 alanine mutants Two assays:

• Activation-dependent protection from Trypsin degradation

• Binding to photoactivated rhodopsin in membranes

Impaired

Onrust et al. (1997) Science

Page 57: What I will tell

Lichtarge LabBaylor College of Medicine

          Ala scan +   Ala scan -
ET +          36           17
ET -          17           38

• Accuracy > 70%
• p = 0.004
• Disagreement in yellow region linked to assay limitations

Lichtarge et al. Meth Enzym. (2002)

PREDICTION vs ALA SCAN

Page 58: What I will tell

EVOLUTIONARY TRACE IN G PROTEINS

Lichtarge LabBaylor College of Medicine

PROSPECTIVE STUDY

Multiple clusters of trace residues

Each assigned to a specific ligand (receptor, effector, Gb, nucleotides)

Low resolution 4º structure follows from the assignments

Anticipates 7 out of 10 alanine mutations correctly (p=0.004), disagreements reflect assay limitations

Lichtarge et al. (1996) PNAS Onrust et al. (1997) Science

Lichtarge et al. (2001, In Press) Meth. Enzym.

Page 59: What I will tell

REGULATORS OF G PROTEIN SIGNALING

[Diagram: membrane (out/in) with ligand, GPCR, Gα, Gβγ, GTP, GDP, PDE and RGS (Regulator of G protein Signaling)]

RGS proteins bind Galpha and enhance GTPase activity

PDEgamma slows the GTPase-accelerating property of RGS7

PDEgamma boosts the GTPase-accelerating property of RGS9

What is the basis for this difference?

Page 60: What I will tell

Lichtarge LabBaylor College of Medicine

EVOLUTION-BASED PROTEIN DESIGN

[Figure labels: Family 1 to Family 7]

1. Identify relevant patterns of variation
2. Map these positions onto the structure
3. Clusters predict functional sites
4. Model 4º structure
5. Selectively block function
6. Swap residues to swap function

Page 61: What I will tell

17 trace residues

• 10 at Galpha interface

• 7 that extend beyond: a second active site S2 ?

A NEW FUNCTIONAL SITE - S2

RGS

RGS

S2

Lichtarge LabBaylor College of Medicine

Page 62: What I will tell

AN RGS SITE LINKED TO PDEgamma

Putative Gα-PDEγ binding site; RGS cluster S2 (Sowa et al (2000) PNAS)

S2 interacts with PDEgamma and modulates its effect on Galpha

[Bar chart: Δkinact (s-1), axis from 0.025 to 0.200, for RGS7 and RGS9, - PDE versus + PDE, including mutants 117/124 and 117/124/131]

• (117,124) mutations mimic PDEγ
• 131 confers RGS9-like activity

Sowa et al 2001 Nature Struct Bio

Page 63: What I will tell

PDE INTERACTION

Direct contact between S2 and PDEgamma (Slep et al. (2001) Nature)

[Structure figure: residues 117, 124 and 131 labeled; putative Gα-PDEγ binding site; RGS cluster S2]

Sowa et al (2000) PNAS

Lichtarge LabBaylor College of Medicine

Page 64: What I will tell

• Uncover in part how G protein signaling turns off
• Link raw sequence and structure data to function
• Guide mutational studies and anticipate outcome
• Design specificity by swapping trace residues among homologs
• Anticipate protein-protein 4º structure

Predict
• novel functional interface
• specificity determinants
• RGS-effector 4º structure

Target mutagenesis
• allosteric on-off switch
• RGS7-RGS9 specificity
• trace residue pathway

Crystallography
• Confirms RGS-effector 4º structure

Sowa et al. PNAS 2000

Sowa et al. Nature Struc Biol 2001

PREDICTION and VALIDATION in RGS

Lichtarge LabBaylor College of Medicine

Page 65: What I will tell

PROBLEM
Given a protein structure
• Where is the active site?
• What are the key residue determinants of function?

METHOD
The Evolutionary Trace (ET): use evolution’s mutations and assays

• Overview, control studies: SH2, SH3, ZnF
• Bona fide predictions of functional sites and 4º structure: Gα, RGS
• Functional consistency and annotation: RGS, ZnF, GPCRs
• Remote homology and alignments: GPCRs
• Generality

EVOLUTIONARY TRACE ANNOTATION OF PROTEIN FUNCTIONAL SITES

Lichtarge LabBaylor College of Medicine

Page 66: What I will tell

Function has been experimentally determined in 0.5% of sequences. It has been inferred by homology in another 4.5% (Karp P. Bioinformatics 2001).

Lichtarge LabBaylor College of Medicine

SEQUENCE

STRUCTURE FUNCTION

DATA vs INFORMATION

Page 67: What I will tell

Lichtarge LabBaylor College of Medicine

FUNCTIONAL ANNOTATION

1. How does specificity arise at a functional site?

2. Do these proteins perform the same function?

Page 68: What I will tell

INTRACELLULAR HORMONE RECEPTORS

Lichtarge LabBaylor College of Medicine

• The largest eukaryotic family of transcriptional regulators.

• Steroid IRs homodimerize onto palindromic response elements.

• Others heterodimerize onto double or inverted repeats.

• All bind DNA via a Zn-finger domain.

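[Diagram: receptor family tree (ESTR, THR, PPA, ROR, NUR, ANDR, PRGR, MCR, GCR, RAR, EAR, RXR) grouped by DNA-binding mode: steroid, head to head; non-steroid, head to tail; non-steroid, tail to tail; monomer. Domain map between the N and C termini, with labels: activation, chaperone binding, DNA binding, ligand binding, silencing.]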

Page 69: What I will tell

Trace residue clusters match the protein-DNA interface

INTRACELLULAR HORMONE RECEPTORS

Lichtarge LabBaylor College of Medicine

Page 70: What I will tell

TRACE RANK AND CORRELATED EVOLUTION OF A PROTEIN-DNA INTERFACE

Lichtarge LabBaylor College of Medicine

Page 71: What I will tell

Lichtarge LabBaylor College of Medicine

GROUP 1 TRACE RESIDUES
YF 452, H 451, KA 461, R 496, R 489, R 466, F 463

[Figure labels: RESPONSE ELEMENT; MOSTLY INVARIANT; MOSTLY VARIABLE]

TRACE RANK AND CORRELATED EVOLUTION OF A PROTEIN-DNA INTERFACE

Page 72: What I will tell

Lichtarge LabBaylor College of Medicine

GROUP 2 TRACE RESIDUES
513 GKDEMN, 511 RKLVQ, 465 GKR, 458 EGDN, 459 GSA, 490 TKNRSC, 493 KPQR

[Figure labels: RESPONSE ELEMENT; MOSTLY VARIABLE; MOSTLY INVARIANT]

TRACE RANK AND CORRELATED EVOLUTION OF A PROTEIN-DNA INTERFACE

Page 73: What I will tell

Lichtarge LabBaylor College of Medicine

• DNA binding evolves through variations on a theme.
• Protein-DNA contacts have similar patterns of variation.

GROUP 1 TRACE RESIDUES
YF 452, H 451, KA 461, R 496, R 489, R 466, F 463

GROUP 2 TRACE RESIDUES
513 GKDEMN, 511 RKLVQ, 465 GKR, 458 EGDN, 459 GSA, 490 TKNRSC, 493 KPQR

[Figure labels: RESPONSE ELEMENT; MOSTLY VARIABLE; MOSTLY INVARIANT]

TRACE RANK AND CORRELATED EVOLUTION OF A PROTEIN-DNA INTERFACE

Page 74: What I will tell

DNA RECOGNITION DETERMINANTS

Lichtarge LabBaylor College of Medicine

POSITION

N490, D513
G459, R465, Q493, E458
R511
Y452, K461
H451, F463, R466, R489, R496

• Highly conserved
• Bind conserved bases

• Highly variable
• Not conserved
• Bind variable bases

Page 75: What I will tell

Lichtarge LabBaylor College of Medicine

DNA RECOGNITION DETERMINANTS

[Figure: amino acid VARIATION observed at each POSITION: N490, D513, G459, R465, Q493, E458, R511, Y452, K461, H451, F463, R466, R489, R496]

• Highly conserved
• Bind conserved bases

• Highly variable
• Not conserved
• Bind variable bases

Page 76: What I will tell

A DNA RECOGNITION KEY ?

Lichtarge LabBaylor College of Medicine

[Figure: amino acid VARIATION at each POSITION (N490, D513, G459, R465, Q493, E458, R511, Y452, K461, H451, F463, R466, R489, R496) across the tree PARTITION (P1, P7, P9, P15, P17, P19, P21, Q3) for kni, andr, prgr, mcr, gcr, estr, thra, thrb, ppa, rar, ror, nur, ear, rxr]

By tracking class-specific variations along the tree, it is possible to link specific side chains to specific functions

• Highly conserved
• Bind conserved bases
• K461A correlates with a base change

• Highly variable
• Not conserved
• Bind variable bases

Page 77: What I will tell

• Uncover in part how G protein signaling turns off
• Link raw sequence and structure data to function
• Guide mutational studies and anticipate outcome
• Design specificity by swapping trace residues among homologs
• Anticipate protein-protein 4º structure

Predict
• novel functional interface
• specificity determinants
• RGS-effector 4º structure

Target mutagenesis
• allosteric on-off switch
• RGS7-RGS9 specificity
• trace residue pathway

Crystallography
• Confirms RGS-effector 4º structure

Sowa et al. PNAS 2000

Sowa et al. Nature Struc Biol 2001

THIS WAS THE TECHNIQUE USED IN THIS RGS STUDY

Lichtarge LabBaylor College of Medicine

Page 78: What I will tell

Lichtarge LabBaylor College of Medicine

FUNCTIONAL ANNOTATION

1. How does specificity arise at a functional site?

2. Do these proteins perform the same function?

Page 79: What I will tell

Lichtarge LabBaylor College of Medicine

NO SIGNAL AT THE DIMER INTERFACE !

Page 80: What I will tell

INTRACELLULAR HORMONE RECEPTORS

Lichtarge LabBaylor College of Medicine

[Diagram: receptor family tree (ESTR, THR, PPA, ROR, NUR, ANDR, PRGR, MCR, GCR, RAR, EAR, RXR) grouped by DNA-binding mode: steroid, head to head; non-steroid, head to tail; non-steroid, tail to tail; monomer. Domain map between the N and C termini, with labels: activation, chaperone binding, DNA binding, ligand binding, silencing.]

• Steroid IRs homodimerize onto palindromic response elements

• Other IRs heterodimerize onto double or inverted repeats

Page 81: What I will tell

THE STEROID DIMER INTERFACE

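[Diagram: receptor family tree (ESTR, THR, PPA, ROR, NUR, ANDR, PRGR, MCR, GCR, RAR, EAR, RXR) grouped by DNA-binding mode: steroid, head to head; non-steroid, head to tail; non-steroid, tail to tail; monomer. Domain map between the N and C termini, with labels: activation, chaperone binding, DNA binding, ligand binding, silencing.]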

Can identify an interface that is unique to a subgroup (steroids) by restricting ET to that branch

Lichtarge LabBaylor College of Medicine

Page 82: What I will tell

DO THRs USE THE DIMER INTERFACE ?

Lichtarge LabBaylor College of Medicine

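[Diagram: receptor family tree (ESTR, THR, PPA, ROR, NUR, ANDR, PRGR, MCR, GCR, RAR, EAR, RXR) grouped by DNA-binding mode: steroid, head to head; non-steroid, head to tail; non-steroid, tail to tail; monomer. Domain map between the N and C termini, with labels: activation, chaperone binding, DNA binding, ligand binding, silencing.]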

Can test whether a specific subgroup uses a given functional site. NO!

Page 83: What I will tell

Lichtarge LabBaylor College of Medicine

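[Diagram: receptor family tree (ESTR, THR, PPA, ROR, NUR, ANDR, PRGR, MCR, GCR, RAR, EAR, RXR) grouped by DNA-binding mode: steroid, head to head; non-steroid, head to tail; non-steroid, tail to tail; monomer. Domain map between the N and C termini, with labels: activation, chaperone binding, DNA binding, ligand binding, silencing.]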

DO PPARs USE THE DIMER INTERFACE ?

Can test whether a specific subgroup uses a given functional site. NO!

Page 84: What I will tell

Lichtarge LabBaylor College of Medicine

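[Diagram: receptor family tree (ESTR, THR, PPA, ROR, NUR, ANDR, PRGR, MCR, GCR, RAR, RXR) grouped by DNA-binding mode: steroid, head to head; non-steroid, head to tail; non-steroid, tail to tail; monomer. Domain map between the N and C termini, with labels: activation, chaperone binding, DNA binding, ligand binding, silencing.]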

DO RXRs USE THE DIMER INTERFACE ?

NO!

Page 85: What I will tell

Lichtarge LabBaylor College of Medicine

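[Diagram: receptor family tree (ESTR, THR, PPA, NUR, ANDR, PRGR, MCR, GCR, RAR, EAR, RXR) grouped by DNA-binding mode: steroid, head to head; non-steroid, head to tail; non-steroid, tail to tail; monomer. Domain map between the N and C termini, with labels: activation, chaperone binding, DNA binding, ligand binding, silencing.]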

DO RARs USE THE DIMER INTERFACE ?

Can test whether a specific branch uses a given functional site. YES !

Page 86: What I will tell

SUBGROUP ANALYSIS or DIFFERENCE ET

• ET IDENTIFIES RESIDUES INVARIANT WITHIN EVERY BRANCH OF A FAMILY.
• BRANCHES CAN BE PRUNED, TO SEARCH FOR SURFACES UNIQUE TO THE REMAINING BRANCHES.
• IF FOUND, THESE SURFACES SUGGEST THAT THE REMAINING BRANCHES SHARE COMMON SPECIFIC FUNCTIONS.

[Diagram: set notation over branches A, B, C and D, contrasting the trace of all branches with traces of pruned subsets]

Lichtarge LabBaylor College of Medicine
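Difference ET can be sketched on the toy alignment used earlier in the lecture. This is a hedged illustration rather than the lab's code: the names `trace_positions` and `difference_trace` are hypothetical, the four branches are the toy groups from the earlier slides, and gapped columns are assumed to be non-trace. The idea is to trace only the branches that remain after pruning and report the positions that were not trace positions for the full family.

```python
def consensus(seqs):
    """Residue where a column is invariant within the group, '-' elsewhere."""
    return "".join(col[0] if len(set(col)) == 1 else "-" for col in zip(*seqs))

def trace_positions(groups):
    """1-based positions that are invariant within every group and free of gaps."""
    consensi = [consensus(g) for g in groups]
    return {i for i, col in enumerate(zip(*consensi), start=1)
            if not any(c in "-." for c in col)}

def difference_trace(family, keep):
    """Positions that become trace residues only once the other branches are pruned."""
    full = trace_positions(list(family.values()))
    pruned = trace_positions([family[name] for name in keep])
    return sorted(pruned - full)

# Toy family with four branches (labels are hypothetical).
family = {
    "1": ["KERTFTGHKKLM", "KERTFTGHKRLM", "KERTFTVHKRLM", "KEKTFTGHKKLM"],
    "2": ["VERTFTGHKSQM", "VERTDTGHKRQM", "VERTFTGMKRQM"],
    "3": ["ASR.YTGVKKNV", "ASR.YTGVKKNV"],
    "4": ["ASR.YTGHKKNM", "ASR.YTGHKKNM"],
}

print(difference_trace(family, keep=["3", "4"]))  # [3, 5, 7, 8, 10]
print(difference_trace(family, keep=["1", "2"]))  # [4]
```

On a real family the extra positions would then be mapped onto the structure to see whether they form a surface patch.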

Page 87: What I will tell

INTRACELLULAR RECEPTORS DNA BINDING DOMAIN

Lichtarge LabBaylor College of Medicine

• Identify protein-DNA binding sites.

• Suggest how DNA recognition specificity is encoded.

• Identify subgroup specific active sites (dimerization, LH).

• Find sites, and by inference functions, that may be shared by distant branches of a sequence family.

• Consistency and intersection of ET signal is important

Lichtarge et al. J. Mol. Biol. (1997) 274:325-337

Page 88: What I will tell

PROBLEM
Given a protein structure
• Where is the active site?
• What are the key residue determinants of function?

METHOD
The Evolutionary Trace (ET): use evolution’s mutations and assays

• Overview, control studies: SH2, SH3, ZnF
• Bona fide predictions of functional sites and 4º structure: Gα, RGS
• Functional annotation: RGS, ZnF, GPCRs
• Remote homology and alignments: GPCRs
• Generality

EVOLUTIONARY TRACE ANNOTATION OF PROTEIN FUNCTIONAL SITES

Lichtarge LabBaylor College of Medicine

Page 89: What I will tell

Lichtarge LabBaylor College of Medicine

FUNCTIONAL ANNOTATION

1. How does specificity arise at a functional site?

2. Do these proteins perform the same function?

Page 90: What I will tell

Asthma. Expert Opin Investig Drugs. 2000 Bertrand CP, Ponath PD
Cardiac diseases Cell Signal. 2000 Chakraborti S, Chakraborti T, Shaw G.
Inflammation and infectious diseases. Blood Murdoch C, Finn A. 2000 May 15;95(10):3032-43.
Proliferative vascular disease. Life. 1999 Sep;48(3):257-61. Iaccarino G,
Hypercalcemia of malignancy: Int J Oncol. 2000 Rabbani SA
Allergic lung disease Inflamm Res. 1999 . Wells TN, Proudfoot AE.
Hyperthyroidism. Thyroid. 1999 Jul;9(7):727-33. Zimmerman D.
HIV-1 co-receptor. Annu Rev Immunol. 1999; Berger EA,
The next generation of drug targets? Br J Pharmacol. 1998 Wilson S, et al
Kidney: Exp Nephrol. 1998 Breyer MD.
Calcium receptor Exp Nephrol. 1998 Hory B, et al.
Nephrogenic diabetes insipidus. J Mol Med. 1998 Oksche A, Rosenthal W.
Lung cancer. J Clin Oncol. 1998 Salgia R, Skarin AT.
Alzheimer's disease Life Sci. 1995 Flynn DD, Ferrari-DiLeo G, Levey AI, Mash DC.

GPCRs

• Ubiquitous eukaryotic receptors
• Mediate
  • sight/smell/taste
  • nearly all neuroendocrine signaling
  • all autonomic physiology
• 40-60% of all drugs target GPCRs

Lichtarge LabBaylor College of Medicine

Page 91: What I will tell

• 7 transmembrane helices
• Variable length of loops and termini (N is out, C is in)
• 5 main classes:
  • Rhodopsin-like
  • Secretin-like
  • Metabotropic glutamate / pheromone
  • Fungal pheromone
  • cAMP Dicty receptors
  • Frizzled/Smoothened
  • Drosophila odorant
  • Nematode Chemor…
  • Class Y
  • Bacterial rhodopsins
• Helices 3, 6 and 7 change relative orientation upon activation.

Lichtarge LabBaylor College of Medicine

Page 92: What I will tell

PURPOSE OF ET STUDIES IN GPCRS

Determine
• Ligand binding site
• Conformational switch involving H3, H6,...
• Dimerization?
• G protein coupling site

Goals
• Target mutations and drug design
• Predict the G protein target
• Create constitutively active receptors for assays
• Modify G protein target for assay purposes

Lichtarge LabBaylor College of Medicine

Page 93: What I will tell

Trace all GPCRs

Lichtarge LabBaylor College of Medicine

STRATEGY TO IDENTIFY FUNCTIONAL DETERMINANTS IN A RECEPTOR FAMILY

Page 94: What I will tell

STRATEGY TO IDENTIFY FUNCTIONAL DETERMINANTS IN A RECEPTOR FAMILY

Trace of GPCR X

Page 95: What I will tell

STRATEGY TO IDENTIFY FUNCTIONAL DETERMINANTS IN A RECEPTOR FAMILY

Trace of GPCR X - Trace all GPCRs =

Page 96: What I will tell

STRATEGY TO IDENTIFY FUNCTIONAL DETERMINANTS IN A RECEPTOR FAMILY

Trace of GPCR X - Trace all GPCRs = X specificity determinants
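
The subtraction on this slide can be read as a set difference. The sketch below is only an illustration under that assumption: the residue numbers are made up, and the slides do not say how trace residues are stored or compared.

```python
# Hypothetical residue positions, invented for illustration only.
trace_gpcr_x = {113, 117, 122, 134, 265, 296}   # trace residues of one receptor X
trace_all_gpcrs = {122, 134, 265}               # trace residues shared by the whole family

# Residues important in X but not across all GPCRs: candidate specificity determinants.
specificity_determinants = trace_gpcr_x - trace_all_gpcrs

# Residues important in both: candidate family-wide (generic) determinants.
shared_determinants = trace_gpcr_x & trace_all_gpcrs

print(sorted(specificity_determinants))
print(sorted(shared_determinants))
```

In practice the two traces would first have to be mapped onto a common residue numbering, which is the alignment problem the following slides address.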

Page 97: What I will tell

ET IN GPCRS

This strategy is justified only if GPCRs have related structures and functions.

• Do they share a common structure?
• Do they share common functional determinants?

1. Show similarities in related GPCRs (+ control).
2. Show similarities in unrelated GPCRs (test).
3. Show no similarities in non-GPCRs (- control).

Page 98: What I will tell

EVOLUTIONARY TRACE ANNOTATION OF PROTEIN FUNCTIONAL SITES

PROBLEM
Given a protein structure
• Where is the active site?
• What are the key residue determinants of function?

METHOD
The Evolutionary Trace (ET): use evolution’s mutations and assays
• Overview: SH2, SH3, ZnF
• Functional sites, 4º structure: RGS, Gα
• Functional annotation: ZnF, GPCRs
• Remote homology and alignments: GPCRs
• Generality

Page 99: What I will tell

COGNATE RESIDUES OFTEN HAVE SIMILAR TRACE RANKS

[Figure: trace ranks along helix 5, with panels labeled OPSIN, THR, ADR, and COMBINED]

Peaks = greater importance. Troughs = lesser importance.

Peaks and troughs tend to align, more so closer to the G protein interface.

Evolutionary importance appears to be correlated across GPCR families.

Page 100: What I will tell

TRACE RANKS ARE CORRELATED IN CLASS A

OPSIN, C5A, and OLF are the same at every offset; only ADR shifts.

OFFSET   OPSIN           C5A             OLF             ADR
-4       AVTRILTVLLIFS   MAVRLMASTYPYA   GIMRIAFLTYTFL   VMLAVTAPL....
-3       AVTRILTVLLIFS   MAVRLMASTYPYA   GIMRIAFLTYTFL   RVMLAVTAPL...
-2       AVTRILTVLLIFS   MAVRLMASTYPYA   GIMRIAFLTYTFL   LRVMLAVTAPL..
-1       AVTRILTVLLIFS   MAVRLMASTYPYA   GIMRIAFLTYTFL   ILRVMLAVTAPL.
 0       AVTRILTVLLIFS   MAVRLMASTYPYA   GIMRIAFLTYTFL   IILRVMLAVTAPL
 1       AVTRILTVLLIFS   MAVRLMASTYPYA   GIMRIAFLTYTFL   .IILRVMLAVTAP
 2       AVTRILTVLLIFS   MAVRLMASTYPYA   GIMRIAFLTYTFL   ..IILRVMLAVTA
 3       AVTRILTVLLIFS   MAVRLMASTYPYA   GIMRIAFLTYTFL   ...IILRVMLAVT
 4       AVTRILTVLLIFS   MAVRLMASTYPYA   GIMRIAFLTYTFL   ....IILRVMLAV

[Figure: trace rank correlation (y-axis, -0.2 to 0.4) plotted against offset (x-axis, -4 to 4)]

Page 101: What I will tell

TRACE RANKS ARE CORRELATED IN CLASS A

[Figure, build step: same offset alignment table (OPSIN, C5A, OLF, ADR) and correlation-versus-offset plot as on Page 100]

Page 102: What I will tell

TRACE RANKS ARE CORRELATED IN CLASS A

[Figure, build step: same offset alignment table (OPSIN, C5A, OLF, ADR) and correlation-versus-offset plot as on Page 100]

Page 103: What I will tell

TRACE RANKS ARE CORRELATED IN CLASS A

[Figure, build step: same offset alignment table (OPSIN, C5A, OLF, ADR) and correlation-versus-offset plot as on Page 100]

Page 104: What I will tell

TRACE RANKS ARE CORRELATED IN CLASS A

[Figure: same offset alignment table (OPSIN, C5A, OLF, ADR) and correlation-versus-offset plot as on Page 100]

• The correlation is significant and greatest at the correct alignment.
• Can this guide the alignment of Class A with Class B?
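
The offset scan on these slides can be sketched as follows: slide one rank profile against another and compute a Spearman rank-order correlation over the overlapping positions at each shift. This is a minimal sketch, not the lab's implementation; the rank values are hypothetical, the function names are invented here, and the shift convention (positive offsets move the query to the right, as ADR does in the table) is an assumption.

```python
from statistics import mean

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

def correlation_by_offset(reference, query, max_shift=4):
    """Correlate the overlapping positions after sliding `query` by each offset."""
    scores = {}
    for shift in range(-max_shift, max_shift + 1):
        pairs = [(reference[i], query[i - shift])
                 for i in range(len(reference))
                 if 0 <= i - shift < len(query)]
        ref_vals, qry_vals = zip(*pairs)
        scores[shift] = spearman(ref_vals, qry_vals)
    return scores

# Hypothetical trace ranks for two 13-residue segments (not data from the slides).
reference_ranks = [3, 9, 1, 7, 12, 2, 10, 5, 13, 4, 11, 6, 8]
query_ranks = [3, 9, 1, 7, 12, 2, 10, 5, 13, 4, 11, 8, 6]

scores = correlation_by_offset(reference_ranks, query_ranks)
best_offset = max(scores, key=scores.get)
print(best_offset, round(scores[best_offset], 3))
```

Because the overlap shrinks as the shift grows, correlations at large offsets rest on fewer positions; a real analysis would need to account for that when judging significance.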

Page 105: What I will tell

CLASS A GPCRs CAN BE CO-ALIGNED

[Figure: four panels (ADR vs Class A, Class B vs Class A, Class C vs Class A, BR vs Class A). Each panel plots three quantities against residue position shift (offset -4 to 4): percent identity (0 to 25%), Spearman rank-order correlation coefficient (-0.2 to 0.5), and percent identity × rank correlation (-2 to 12).]

Page 106: What I will tell

CLASS A and B GPCRs CAN BE CO-ALIGNED

[Figure: four panels (ADR vs Class A, Class B vs Class A, Class C vs Class A, BR vs Class A). Each panel plots three quantities against residue position shift (offset -4 to 4): percent identity (0 to 25%), Spearman rank-order correlation coefficient (-0.2 to 0.5), and percent identity × rank correlation (-2 to 12).]

Page 107: What I will tell

CLASS A, B, and C GPCRs CAN BE CO-ALIGNED

[Figure: four panels (ADR vs Class A, Class B vs Class A, Class C vs Class A, BR vs Class A). Each panel plots three quantities against residue position shift (offset -4 to 4): percent identity (0 to 25%), Spearman rank-order correlation coefficient (-0.2 to 0.5), and percent identity × rank correlation (-2 to 12).]

Page 108: What I will tell

CLASS A, B, C GPCRs CAN BE CO-ALIGNED BUT BACTERIORHODOPSIN CANNOT

[Figure: four panels (ADR vs Class A, Class B vs Class A, Class C vs Class A, BR vs Class A). Each panel plots three quantities against residue position shift (offset -4 to 4): percent identity (0 to 25%), Spearman rank-order correlation coefficient (-0.2 to 0.5), and percent identity × rank correlation (-2 to 12).]
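
The third quantity in these panels, percent identity multiplied by rank correlation, can be sketched as below. The two sequence segments are the OPSIN and ADR segments shown on the earlier slides; everything else is an assumption made for illustration: the function name, the choice to compute identity over the overlapping positions only, and the rank-correlation values, which are placeholders rather than numbers read off the plots.

```python
def percent_identity(reference, query, shift):
    """Percent of identical residues over the overlap when `query` is slid by `shift`."""
    pairs = [(reference[i], query[i - shift])
             for i in range(len(reference))
             if 0 <= i - shift < len(query)]
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)

OPSIN = "AVTRILTVLLIFS"  # segment shown on the slides
ADR = "IILRVMLAVTAPL"    # segment shown on the slides at offset 0

# Hypothetical rank correlations for three offsets; NOT values from the figure.
rank_correlation = {-1: 0.05, 0: 0.40, 1: 0.10}

# Combined score: percent identity multiplied by rank correlation at each offset.
combined = {shift: percent_identity(OPSIN, ADR, shift) * rho
            for shift, rho in rank_correlation.items()}

for shift in sorted(combined):
    print(shift, round(combined[shift], 2))
```

Multiplying the two signals rewards offsets where both sequence identity and evolutionary importance agree, which is presumably why the slides plot the product alongside each quantity on its own.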

Page 109: What I will tell

TRACE RESIDUES UNIQUE TO RHODOPSIN

ET (Rhodopsin) - ET (A+B) = residues that
• Surround retinal
• Funnel towards the generic signaling determinants

Page 110: What I will tell

TRACE RESIDUES UNIQUE TO OTHER GPCRs

[Figure: panels labeled CHEMOKINE, OPSIN, OLFACTORY, ADRENERGIC]

These variations suggest there are significant differences in the details of ligand coupling.

Page 111: What I will tell

CONCLUSIONS

A strategy to:
• Align receptors from different classes.
• Identify global determinants of the switch mechanism.
• Identify specific determinants of ligand binding.

FUTURE
• Test in specific receptors.
• Extend to study other aspects of GPCR function (dimerization, G protein specificity).
• Correlate binding specificity determinants with ligand binding affinity data.

Sheikh et al., JBC 1999; Baranski et al., JBC 1999; Geva et al., JBC 2000; Lichtarge et al., Meth. Enzymol. 2001

Page 112: What I will tell

EVOLUTIONARY TRACE ANNOTATION OF PROTEIN FUNCTIONAL SITES

PROBLEM
Given a protein structure
• Where is the active site?
• What are the key residue determinants of function?

METHOD
The Evolutionary Trace (ET): use evolution’s mutations and assays
• Overview: SH2, SH3, ZnF
• Functional sites, 4º structure: RGS, Gα
• Functional annotation: ZnF, GPCRs
• Remote homology and alignments: GPCRs
• Generality

Page 113: What I will tell

LARGE SCALE ET

Large scale ET
• Scalability - Input: tolerance to insertions and deletions
• Statistics - Output: objective significance
• Pipeline - Automation and GUI

Applications
• Structural Genomics
• Functional Genomics
• Pharmaceuticals & Bioengineering

Page 114: What I will tell

STATISTICS OF CLUSTER NUMBER AND SIZE

If trace residues were randomly picked

ET clusters are fewer and larger than random.

An actual trace of pyruvate decarboxylase.

Page 115: What I will tell

Random Distribution of the total number of clusters at 15% coverage in Pyruvate Decarboxylase

80 residues out of 537 (15%) were drawn randomly, and the number of clusters was tallied in each of 5000 trials.

[Figure: histogram of the total number of clusters (x-axis) versus percentage frequency (y-axis), with the 5% and 95% levels marked]

Page 116: What I will tell

Random Distribution of the total number of clusters at 15% coverage in Pyruvate Decarboxylase

[Figure: histogram of the total number of clusters (x-axis) versus percentage frequency (y-axis), with the 1% level marked; inset: 1% threshold versus protein size]

A random draw of 80 out of 537 (15%) residues will generate 27 clusters or fewer only once every 100 trials.
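
The random-draw experiment described on these two slides can be sketched as a small Monte Carlo simulation. This is an illustration under loudly stated assumptions, not the published procedure: the protein is replaced by a toy cubic lattice (a hypothetical contact map, since no structure is available here), a "cluster" is taken to be a connected component of picked residues with singletons counted, and all function names are invented for this sketch.

```python
import random

def lattice_neighbors(n_residues, side=8):
    """Toy contact map: residues fill a cubic lattice; contacts are adjacent lattice sites."""
    coord_of = [(i % side, (i // side) % side, i // (side * side))
                for i in range(n_residues)]
    index_at = {c: i for i, c in enumerate(coord_of)}
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    neighbors = []
    for x, y, z in coord_of:
        near = [index_at[(x + dx, y + dy, z + dz)]
                for dx, dy, dz in steps
                if (x + dx, y + dy, z + dz) in index_at]
        neighbors.append(near)
    return neighbors

def count_clusters(selected, neighbors):
    """Number of connected components among the selected residues (singletons count)."""
    selected = set(selected)
    seen = set()
    clusters = 0
    for start in selected:
        if start in seen:
            continue
        clusters += 1
        stack = [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            for nxt in neighbors[node]:
                if nxt in selected and nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return clusters

def null_cluster_counts(n_residues, n_picked, neighbors, trials, seed=0):
    """Cluster counts for repeated random draws of n_picked residues."""
    rng = random.Random(seed)
    return [count_clusters(rng.sample(range(n_residues), n_picked), neighbors)
            for _ in range(trials)]

def lower_threshold(counts, fraction=0.01):
    """Cluster count at the lower `fraction` tail of the random distribution."""
    ordered = sorted(counts)
    index = max(0, int(fraction * len(ordered)) - 1)
    return ordered[index]

# 80 of 537 residues, as on the slide; 1000 trials here (the slide used 5000).
neighbors = lattice_neighbors(537)
counts = null_cluster_counts(537, 80, neighbors, trials=1000)
print("1% threshold on the toy lattice:", lower_threshold(counts, 0.01))
```

A trace whose residues form no more clusters than this threshold would be called significant at the 1% level; the number printed for the toy lattice is not expected to match the 27 reported for pyruvate decarboxylase, because the contact map differs.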

Page 117: What I will tell

Random Distribution of the total number of clusters at 15% coverage in α-Amylase

[Figure: histogram of the total number of clusters (x-axis) versus percentage frequency (y-axis), with the 1% level marked; inset: 1% threshold versus protein size]

A random draw of 64 of 425 residues (15%) will generate fewer than 20 clusters only once every 100 trials in α-amylase.

Page 118: What I will tell

Random Distribution of the total number of clusters at 15% coverage in annexin III

[Figure: histogram of the total number of clusters (x-axis) versus percentage frequency (y-axis), with the 1% level marked; inset: 1% threshold versus protein size]

A random draw of 48 of 323 residues (15%) will generate fewer than 20 clusters only once every 100 trials in annexin III.

Page 119: What I will tell

SIGNIFICANCE THRESHOLDS FOR THE NUMBER OF CLUSTERS VARY LINEARLY WITH PROTEIN SIZE

[Figure: 1% threshold for the total number of clusters (y-axis, 10 to 30) versus protein size (x-axis, 100 to 500), with regions labeled "Significant at the 1% confidence level" and "Not significant at the 1% confidence level"]

Similar linear relationships hold at other confidence levels.

Page 120: What I will tell

ET IN 46 PROTEINS THAT ARE FUNCTIONALLY, STRUCTURALLY, AND EVOLUTIONARILY DIVERSE

Protein | Function
Ligand binding domain of LDL receptor | LDL receptor
c-Src tyrosine kinase; SH3 | Tyrosine kinase
Biotinyl domain | Carboxylase
Acyl CoA binding protein | Binding protein
c-Src tyrosine kinase; SH2 | Tyrosine kinase
Bikunin | Kunitz type inhibitor
Mannose binding protein | Binds mannose
Tpr2a-domain of Hop | Chaperone
Pseudoazurin | Electron transport
Tpr1 domain of Hop | Chaperone
Regulator of G-protein signaling | Regulator of G-protein signaling
Galectin-3 CRD | Galectin carbohydrate recognition domain
Myoglobin | Oxygen transport
Thermosome | Chaperonin
Poly-A binding protein | Gene regulation
Growth hormone | Growth hormone
Growth hormone receptor | Growth hormone receptor
Astacin | Metalloproteinase (hydrolase)
von Willebrand factor | Blood coagulation
HSP-90 | Chaperone
Glutathione S-transferase, type-III | Transferase
Adenylate kinase | Phosphotransferase
F-MuLV | Viral glycoprotein
Estrogen receptor | Nuclear receptor
Indole-3-glycerol phosphate synthase | Synthase
Triosephosphate isomerase | Gluconeogenesis
Cyclins | Transferase
β-Lactamase | Hydrolase
Deacetoxycephalosporin C | Oxidoreductase
2,5-diketo-D-gluconic acid reductase A | Oxidoreductase
Endonuclease IV | Endonuclease
Dihydropteroate | Synthase
Protein phosphatase-1 | Hydrolase
Signal sequence recognition protein
Cyclins | Transferase
Thioredoxin reductase | Reductase
Annexin III | Calcium/phospholipid binding protein
Transferrin | Iron transport
Peroxidase | Peroxidase
Rhodopsin | Signaling protein
Serine/Threonine phosphatase | Hydrolase
Citrate synthase | Synthase
Phosphoglycerate kinase | Kinase
α-Amylase | α-amylase
HIV Reverse transcriptase | Reverse transcriptase
Pyruvate decarboxylase | Carbon-carbon lyase

• Folds: 19 α/β, 15 α, 7 β, 2 small, 1 multidomain, 1 membrane protein.
• Origin: 24 eukaryotic, 18 euk.+prok., 2 prokaryotic, 2 viral proteins.
• Role: signaling, metabolism, gene regulation, transport, folding, etc.

Page 121: What I will tell

Number of Clusters Statistics: 20% coverage fraction, with gaps

[Figure: number of clusters (y-axis, 10 to 30) versus protein size in aa (x-axis, 0 to 600), with threshold lines at 0.3%, 5%, and 30%]

32 proteins have a coverage fraction of ~20%.

• 29 are significant with a p-value ≤ 5%.
• 19 are significant at a p-value ≤ 0.3%.
• Only 3 are not significant at a level of 5%.

Page 122: What I will tell

[Figure: for each protein, three views: Structural Epitope, Trace Without Gaps, Trace With Gaps]

• 1a3k: Galectin-3 CRD
• 1elw: Tpr2a-domain of Hop
• 1bqk: Pseudoazurin
• 1am1: HSP-90

http://imgen.bcm.tmc.edu/molgen/labs/lichtarge/trace_of_the_week/

Page 123: What I will tell

[Figure: for each protein, three views: Structural Epitope, Trace Without Gaps, Trace With Gaps]

• 16pk: Phosphoglycerate kinase
• 3ert: Estrogen receptor
• 1a80: 2,5-diketo-D-gluconic acid reductase A
• 1qum: Endonuclease IV

http://imgen.bcm.tmc.edu/molgen/labs/lichtarge/trace_of_the_week/

Page 124: What I will tell

LARGE SCALE ET

• Gap tolerance improves signal:noise and ease of use.

• Trace residues cluster non-randomly.

• Consistent with the cooperative nature of folding/function.

• In nearly all proteins tested thus far, trace clusters are statistically significant and overlap with the binding sites.

Madabushi et al. JMB 2002

Page 125: What I will tell

SUMMARY

ET is useful to
• Rank residue importance
• Identify functional sites, ligand binding pockets, and specificity determinants (SH2, SH3, GPCRs)
• Anticipate mutation outcomes (RGS, Gα) and quaternary structure (Gαβγ, Gα-RGS-PDEγ)
• Recognize remote homology (GPCR)
• Target mutations to relevant sites (GPCR, RGS, NTP,...)
• Infer which homologs may share functions (ZnF, GPCR)

ET is
• Statistically significant
• Applicable to a significant fraction of the PDB

Page 126: What I will tell

STRATEGIES OF SEQUENCE ANALYSIS

SEQUENCE CONSERVATION (COMPUTATIONAL)
• Sequence A ~ Sequence B
• Protein A ~ Protein B
• Structure A ~ Structure B, Function A ~ Function B

SEQUENCE VARIATION (BIOLOGICAL)
• Mutate sequences so Sequence A ≠ Sequence B
• Assay Function A & Function B
• Deduce R(function, sequence) = 0

Page 127: What I will tell

THE EVOLUTIONARY TREE

Page 128: What I will tell

EVOLUTION INTEGRATES SEQUENCE AND STRUCTURE

[Diagram labels: SEQUENCE DATABASE, STRUCTURE DATABASE, EXPERIMENTS, THEORY, FUNCTION, Meta-DATABASE]

Page 129: What I will tell

EVOLUTION AS A COMPUTATIONAL PRINCIPLE

[Diagram labels: FACTS, EXPERIMENTS, THEORY, EVOLUTIONARY FILTER, RELEVANT FACTS]

Annotation of functional sites in proteins

Page 130: What I will tell

USING EVOLUTION TO INTEGRATE SEQUENCE-STRUCTURE-FUNCTION INFORMATION

[Diagram labels: SEQUENCE, STRUCTURE, FUNCTION, EVOLUTION]

A FUNDAMENTAL CHALLENGE