Class II MHC binding

Gibbs Clustering

Massimo Andreatta, Morten Nielsen
CBS, Department of Systems Biology, DTU, Denmark

Class II MHC binding

• MHC class II binds peptides in the class II antigen presentation pathway
• Binds peptides of length 9-18 (even whole proteins can bind!)
• Binding cleft is open
• Binding core is 9 aa

Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark

Conventional Gibbs sampling: MHC class II binding

SLFIGLKGDIRESTV
DGEEEVQLIAAVPGK
VFRLKGGAPIKGVTF
SFSCIAIGIITLYLG
IDQVTIAGAKLRSLN
WIQKETLVTFKNPHAKKQDV
KMLLDNINTPEGIIP
ELLEFHYYLSSKLNK
LNKFISPKSVAGRFA
ESLHNPYPDYHWLRT
...
DFAAQVDYPSTGLY

9 AAs

Gibbs sampling - sequence alignment

Why sampling?

50 sequences, each 12 amino acids long

Try all possible combinations with a 9-mer overlap: each 12-mer contains 4 possible 9-mer cores

4^50 ≈ 10^30 possible combinations

...computationally infeasible
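The size of the search space can be checked with a few lines of Python. This is only a sketch of the counting argument on the slide; the variable names are chosen here for illustration.

```python
# Count the alignments a brute-force search would have to score.
n_sequences = 50      # peptides in the data set (from the slide)
peptide_length = 12   # amino acids per peptide (from the slide)
core_length = 9       # length of the binding core

# A 12-mer contains 12 - 9 + 1 = 4 possible 9-mer cores.
cores_per_peptide = peptide_length - core_length + 1

# Every peptide picks its core independently of the others.
n_alignments = cores_per_peptide ** n_sequences

print(cores_per_peptide)       # 4
print(f"{n_alignments:.2e}")   # 1.27e+30
```

Scoring even a billion alignments per second would leave an exhaustive search far out of reach, which is the motivation for sampling.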

Gibbs sampling - sequence alignment: State transition

move to state +1

Accept or reject the move?

Note that the probability of going to the new state depends on the previous state only
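The accept-or-reject step can be sketched as follows. The acceptance formula itself did not survive in this transcript, so the sketch assumes the standard Metropolis criterion, P = min(1, exp(dE/T)), where dE is the change in score caused by the move and T is the temperature. The function names are hypothetical.

```python
import math
import random

def acceptance_probability(d_e, temperature):
    """Assumed Metropolis rule: always accept improvements,
    accept a worse state with probability exp(dE / T)."""
    if d_e >= 0:
        return 1.0
    return math.exp(d_e / temperature)

def accept_move(d_e, temperature, rng=random):
    """Draw a uniform number in [0, 1) and compare it with the probability."""
    return rng.random() < acceptance_probability(d_e, temperature)

# The decision uses only the score difference between the current
# state and the proposed one, not the history of earlier states.
print(acceptance_probability(0.2, 0.1))   # 1.0
print(accept_move(-0.3, 0.4))             # True or False, at random
```

Under this rule a move that improves the score is accepted with probability 100%, while a move that lowers it is accepted only some of the time.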

Gibbs sampling - sequence alignment

Numerical example - 1

move to state +1
Accept move with Prob = 100%

Gibbs sampling - sequence alignment

Numerical example - 2

move to state +1
Accept move with Prob = 63.8%

Gibbs sampling - sequence alignment

What is the MC temperature?

It's a scalar decreased during the simulation.

[Figure: temperature T versus iteration]

E.g. same dE = -0.3 but at different temperatures:

t1 = 0.4
t2 = 0.1
t3 = 0.02
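The effect of the three temperatures can be worked through numerically. The sketch below assumes the standard Metropolis acceptance rule, P = min(1, exp(dE/T)); the slide's own formula is not preserved in this transcript, so treat the rule as an assumption.

```python
import math

d_e = -0.3   # the same unfavourable move in all three cases (from the slide)
temperatures = {"t1": 0.4, "t2": 0.1, "t3": 0.02}

# Assumed acceptance rule: P = min(1, exp(dE / T)).
probabilities = {
    name: min(1.0, math.exp(d_e / t)) for name, t in temperatures.items()
}

for name, p in probabilities.items():
    print(f"{name}: T = {temperatures[name]}, P = {p:.2g}")
```

Under this assumption the move is accepted about 47% of the time at t1, about 5% of the time at t2, and essentially never at t3 (about 3 in 10 million).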

[Figure: fitness f(Z) versus state Z]

Move freely around states when the system is "warm", then cool it off to force it into a state of high fitness
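A cooling schedule of this kind could look like the sketch below. The slides do not say how the temperature is lowered, so the linear schedule, the function name and the number of steps are assumptions made for illustration; the start and end temperatures reuse t1 and t3 from the previous slide.

```python
def linear_cooling(t_start, t_end, n_steps):
    """Return n_steps temperatures going linearly from t_start down to t_end."""
    if n_steps < 2:
        return [t_start]
    step = (t_start - t_end) / (n_steps - 1)
    return [t_start - i * step for i in range(n_steps)]

# Warm at the beginning (many moves accepted), cold at the end
# (almost only improving moves accepted).
schedule = linear_cooling(t_start=0.4, t_end=0.02, n_steps=5)
for iteration, temperature in enumerate(schedule):
    print(iteration, round(temperature, 3))
```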

Does it work?

HLA-DRB3*01:01

It works

High Temperature Low Temperature

Gibbs sampler. Prediction accuracy

Next generation peptide-chip technology

Identification of receptor binding motifs

Data interpretation

Peptide           Signal
LMLSLFEQSLSCQAQ        9
QGTDATKSIIFEAER      -12
RLEEAQAYLAAGQHD       10
EISELRTKVQEQQKQ       44
FAGAKKIFGSLAFLP       70
VRASSRVSGSFPEDS       -7
CKAFFKRSIQGHNDY       86
CEGCKAFFKRSIQGH      100
RLSEADIRGFVAAVV       -7

Or with multiple specificities?

Gibbs clustering (multiple specificities)

Unaligned peptides:

SLFIGLKGDIRESTV
DGEEEVQLIAAVPGK
VFRLKGGAPIKGVTF
SFSCIAIGIITLYLG
IDQVTIAGAKLRSLN
WIQKETLVTFKNPHAKKQDV
KMLLDNINTPEGIIP
ELLEFHYYLSSKLNK
LNKFISPKSVAGRFA
ESLHNPYPDYHWLRT
NKVKSLRILNTRRKL
MMGMFNMLSTVLGVS
AKSSPAYPSVLGQTI
RHLIFCHSKKKCDELAAK

The peptides separate into two clusters (Cluster 1 and Cluster 2), each aligned to its own motif:

ELLEFHYYLSSKLNK
LNKFISPKSVAGRFA
ESLHNPYPDYHWLRT
NKVKSLRILNTRRKL
MMGMFNMLSTVLGVS
AKSSPAYPSVLGQTI
RHLIFCHSKKKCDELAAK

SLFIGLKGDIRESTV
DGEEEVQLIAAVPGK
VFRLKGGAPIKGVTF
SFSCIAIGIITLYLG
IDQVTIAGAKLRSLN
WIQKETLVTFKNPHAKKQDV
KMLLDNINTPEGIIP

Multiple motifs!

The algorithm

1. List of peptides

SLFIGLKGDIRESTV
DGEEEVQLIAAVPGK
VFRLKGGAPIKGVTF
SFSCIAIGIITLYLG
IDQVTIAGAKLRSLN
WIQKETLVTFKNPHAKKQDV
KMLLDNINTPEGIIP
ELLEFHYYLSSKLNK
LNKFISPKSVAGRFA
ESLHNPYPDYHWLRT
NKVKSLRILNTRRKL
MMGMFNMLSTVLGVS
AKSSPAYPSVLGQTI
RHLIFCHSKKKCDELAAK

2. Create N random groups

g1:
IDQVTIAGAKLRSLN
WIQKETLVTFKNPHAKKQDV
ELLEFHYYLSSKLNK
MMGMFNMLSTVLGVS
AKSSPAYPSVLGQTI

g2:
SLFIGLKGDIRESTV
SFSCIAIGIITLYLG
KMLLDNINTPEGIIP
LNKYVHGTWRSILP
NKVKSLRILNTRRKL

gN:
ESLHNPYPDYHWLRT
RHLIFCHSKKKCDELAAK
VFRLKGGAPIKGVTF
LNKFISPKSVAGRFA
DGEEEVQLIAAVPGK

Three types of MC move: simple shift, remove peptide, phase shift
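The random initialisation in step 2 can be sketched as below. The slides only show the resulting groups, so the details here are assumptions: each peptide is placed in a uniformly chosen group and given a uniformly chosen start position for its 9-mer core. The function name and the seed are hypothetical.

```python
import random

def make_random_groups(peptides, n_groups, core_length=9, seed=None):
    """Assign each peptide to a random group with a random core start offset."""
    rng = random.Random(seed)
    groups = [[] for _ in range(n_groups)]
    for peptide in peptides:
        max_offset = len(peptide) - core_length   # last valid core start
        offset = rng.randint(0, max_offset)       # randint is inclusive
        groups[rng.randrange(n_groups)].append((peptide, offset))
    return groups

peptides = [
    "SLFIGLKGDIRESTV", "DGEEEVQLIAAVPGK", "VFRLKGGAPIKGVTF",
    "SFSCIAIGIITLYLG", "IDQVTIAGAKLRSLN", "WIQKETLVTFKNPHAKKQDV",
]
groups = make_random_groups(peptides, n_groups=3, seed=0)

for index, group in enumerate(groups):
    for peptide, offset in group:
        # Show the peptide together with its currently selected 9-mer core.
        print(f"g{index + 1}", peptide, peptide[offset:offset + 9])
```

Storing an offset per peptide keeps the alignment implicit: shifting a core is then just a change of one integer.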

3. Finding the optimal configuration (the MC moves)

Pick one peptide from a group, here MMGMFNMLSTVLGVS from g1.

For the selected peptide (MMGMFNMLSTVLGVS, taken out of g1):

4. Random shift of the core
5. Score the new core against the log-odds matrices
6. Accept or reject move
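Steps 4 to 6 can be sketched as follows. The slides do not specify how the log-odds matrix is built, so this example assumes a uniform background frequency of 1/20 and a flat pseudocount, both simplifications. The cores of the other group members and the two offsets are arbitrary illustrative choices.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

def log_odds_matrix(cores, pseudocount=1.0):
    """Position-specific log-odds matrix from equal-length cores.
    Assumes a uniform background and a flat pseudocount."""
    background = 1.0 / len(ALPHABET)
    total = len(cores) + pseudocount * len(ALPHABET)
    matrix = []
    for pos in range(len(cores[0])):
        counts = Counter(core[pos] for core in cores)
        matrix.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in ALPHABET
        })
    return matrix

def score_core(core, matrix):
    """Sum the log-odds of each residue at its position in the core."""
    return sum(matrix[pos][aa] for pos, aa in enumerate(core))

# Cores of the peptides left in the group (offsets chosen for illustration).
group_cores = ["IDQVTIAGA", "IQKETLVTF", "LLEFHYYLS", "KSSPAYPSV"]
matrix = log_odds_matrix(group_cores)

peptide = "MMGMFNMLSTVLGVS"
old_offset, new_offset = 0, 3                                        # step 4
old_score = score_core(peptide[old_offset:old_offset + 9], matrix)   # step 5
new_score = score_core(peptide[new_offset:new_offset + 9], matrix)
d_e = new_score - old_score   # step 6 accepts or rejects based on this value
print(round(d_e, 3))
```

The matrix is built without the selected peptide, so the peptide is scored against a motif it did not contribute to.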

Moving a peptide between groups (MMGMFNMLSTVLGVS, taken out of g1):

4b. Random shift of the core
5b. Score the new core against all log-odds matrices (E1, E2, E3)
6b. Accept or reject move; if accepted, the peptide joins the chosen group

The scoring function

-SLFIGLKGDIRESTV-----SFSCIAIGIITLYLGKMLLDNINTPEGIIP----LNKYVHGTWRSILP-----NKVKSLRILNTRRKL-

-SLFIGLKGDIRESTV---SFSCIAIGIITLYLG--KMLLDNINTPEGIIP----LNKYVHGTWRSILP-----NKVKSLRILNTRRKL-

P = probability of accepting the move

Maximize intra-cluster similarity whilst minimizing inter-cluster similarity

Avoid small specialized clusters (σ = 10)

A mixture of 500 9mer peptides. How many motifs?

OR

Two MHC class I alleles: HLA-A*0101 and HLA-B*4402

Mixture of 100 binders for the two alleles

Peptide     Allele
ATDKAAAAY   A*0101
EVDQTKIQY   A*0101
AETGSQGVY   B*4402
ITDITKYLY   A*0101
AEMKTDAAT   B*4402
FEIKSAKKF   B*4402
LSEMLNKEY   A*0101
GELDRWEKI   B*4402
LTDSSTLLV   A*0101
FTIDFKLKY   A*0101
TTTIKPVSY   A*0101
EEKAFSPEV   B*4402
AENLWVPVY   B*4402

Two MHC class I alleles: HLA-A*0101 and HLA-B*4402

Mixed → Resolved

        A0101   B4402
G 1        97       3
G 2         3      97

Five MHC class I alleles

       A0101   A0201   A0301   B0702   B4402   Dominant allele
G 0        0       1      76       1       0   HLA-A0301 97%
G 1        2       4       0       0      95   HLA-B4402 94%
G 2        5      87       5       1       0   HLA-A0201 89%
G 3       93       2      19       0       2   HLA-A0101 80%
G 4        0       6       0      98       3   HLA-B0702 92%
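The percentages on this slide are consistent with each cluster's dominant allele expressed as a share of that cluster. The sketch below reads the counts on the slide as five rows (G 0 to G 4) of five columns (one per allele) and recomputes the shares; the variable names are chosen here for illustration.

```python
alleles = ["A0101", "A0201", "A0301", "B0702", "B4402"]

# Rows are clusters, columns follow the order of `alleles`.
counts = {
    "G 0": [0, 1, 76, 1, 0],
    "G 1": [2, 4, 0, 0, 95],
    "G 2": [5, 87, 5, 1, 0],
    "G 3": [93, 2, 19, 0, 2],
    "G 4": [0, 6, 0, 98, 3],
}

summary = {}
for group, row in counts.items():
    best = max(range(len(row)), key=row.__getitem__)   # index of dominant allele
    purity = row[best] / sum(row)                      # its share of the cluster
    summary[group] = (alleles[best], round(100 * purity))

for group, (allele, percent) in summary.items():
    print(f"{group}: HLA-{allele} {percent}%")
```

Each column sums to 100, matching 100 binders per allele, and the recomputed shares reproduce the five percentages shown on the slide.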

The right number of specificities

How many motifs?

λ = 0

How many motifs?

OR

λ > 0

Adding in alignment (MHC class II)

HLA-DRB1*03:01
HLA-DRB1*04:01

Dealing with noisy data

• Experimental data often contain false positives

• Outliers do not match any recurrent motif

• Introduce a garbage bin to collect outliers

Groups: g1, g2, ..., gn, plus a trash cluster

Peptides that do not match any motif are moved to the trash cluster
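One way to picture the trash cluster is as a fallback when no group scores well enough. The sketch below is an assumption about how such a rule could look, not the method used by the authors: the threshold value, the function name and the example scores are all made up for illustration.

```python
def assign_with_trash(scores, threshold=0.0):
    """Return the best-scoring group, or 'trash' when no group
    scores above the (hypothetical) threshold."""
    best = max(scores, key=scores.get)
    return best if scores[best] > threshold else "trash"

# Illustrative, invented scores of one peptide against each group's matrix.
print(assign_with_trash({"g1": 4.2, "g2": -1.0, "gn": 0.3}))    # g1
print(assign_with_trash({"g1": -2.5, "g2": -0.7, "gn": -3.1}))  # trash
```

A higher threshold sends more peptides to the trash cluster, trading sensitivity for cleaner motifs.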

Dealing with noisy data

200 binders to each of 3 MHC class I alleles
50 random sequences are added to the data set

         HLA-A0101   HLA-B0702   HLA-B4001   Random   Total
g0               0         197           2        6     205
g1             199           1           0        2     202
g2               0           0         196        5     201
trash            1           2           2       37      42
Total          200         200         200       50

DHHFTPQII
NAFGWENAY
SQTSYQYLI

ELPIVTPAL
ADKNLIKCS

SH3 domains

- SH3 domains are small interaction modules abundantly found in eukaryotes

- They preferentially bind proline-rich regions (minimum consensus sequence is PxxP)

- Two main classes of ligands exist:

class I: +xΦPxΦP
class II: ΦPxΦPx+

+: R or K; Φ: hydrophobic; x: any amino acid

Figure: Surface representation of the Hck SH3 domain bound to an artificial high-affinity peptide ligand, in green. Schmidt et al. (2007)

SH3 domains - Gibbs clustering

- Large data set of 2,457 unique peptide sequences* binding to Src SH3 domain

- Peptides are 12 amino acids long, unaligned with respect to the binding core

WVTAPRSLPVLP
GSWVVDISNVED
NYSGNRPLPGIW
RSPIVRQLPSLP
RALPVMPTNGPM
AKSRPLPMVGLV
VRPLPSPEGFGQ
YRGRMLPVIWGT
RALPLPRAYEGI
VPLLPIRNGAVN
PVRVLPNIPLSV
...

*Kim, T., et al. (2011) MUSI: an integrated system for identifying multiple specificity from large peptide or nucleic acid data sets, Nucleic Acids Res, 1-8.

SH3 domains: Gibbs clustering - 1 cluster

2,350 sequences (107 to trash)

SH3 domains: Gibbs clustering - 2 clusters

Class I: 1,928 sequences
Class II: 466 sequences
(63 to trash)

SH3 domains: Gibbs clustering - 3 clusters

1,404 sequences
560 sequences
445 sequences
(48 to trash)

SH3 domains

Optimal solution has two specificities: class I and class II

http://www.cbs.dtu.dk/services/GibbsCluster

Acknowledgements

• Massimo Andreatta, PhD (for doing all the work)

• John Sidney (La Jolla Institute for Allergy and Immunology, California, USA) for the binding affinity validation of the predicted MHC class I outliers.

• The European Union 7th Framework Program FP7/2007-2013 [grant number 222773]
