

Biol. Chem. 2020; 401(1): 165–182

Review

Shiyu Chen, Joshua J. Yim and Matthew Bogyo*

Synthetic and biological approaches to map substrate specificities of proteases

https://doi.org/10.1515/hsz-2019-0332

Received August 1, 2019; accepted October 11, 2019; previously published online October 22, 2019

Abstract: Proteases are regulators of diverse biological pathways including protein catabolism, antigen processing and inflammation, as well as various disease conditions, such as malignant metastasis, viral infection and parasite invasion. The identification of substrates of a given protease is essential to understand its function and this information can also aid in the design of specific inhibitors and active site probes. However, the diversity of putative protein and peptide substrates makes connecting a protease to its downstream substrates technically difficult and time-consuming. To address this challenge in protease research, a range of methods have been developed to identify natural protein substrates as well as map the overall substrate specificity patterns of proteases. In this review, we highlight recent examples of both synthetic and biological methods that are being used to define the substrate specificity of proteases so that new protease-specific tools and therapeutic agents can be developed.

Keywords: activity-based probe; combinatorial peptide library; protease reactive warhead; protease substrate; proteomics; substrate specificity.

Introduction

It is estimated that over 600 proteases, accounting for roughly 2% of the human genome (Rawlings et al., 2006; Quesada et al., 2009), function together in diverse aspects of normal cellular physiology. In addition, proteases are key regulators of numerous pathological processes such as tumor metastasis (Mason and Joyce, 2011; Russell et al., 2015), angiogenesis (Bauvois, 2004) and inflammation, and play critical roles in the life cycles of various pathogens. For example, HIV-1 protease, which cleaves newly synthesized polyproteins to create the mature protein components of an HIV virion, is essential for the life cycle of HIV (Brik and Wong, 2003). The Gram-positive human pathogen Staphylococcus aureus uses the serine protease fluorophosphonate-binding hydrolase B (FphB) to manipulate host-pathogen interactions and establish infection in distinct sites in vivo (Lentz et al., 2018). Studying protease function is crucial for understanding the mechanisms of both healthy and diseased states. Therefore, proteases are promising therapeutic targets for a multitude of disease indications.

Proteases are enzymes that hydrolyze peptide bonds in a process that can result not only in the destruction of the protein target but also in the activation of signaling and other biological functions. The active site of proteases generally contains a substrate recognition motif and a catalytic triad or dyad where the chemical reaction to break the scissile amide bond of a substrate protein occurs. The primary catalytic mechanisms used by proteases have largely remained unchanged over evolution. However, substrate specificity has evolved to enable processing of diverse substrates for both protein turnover and functional activation of substrate proteins. Most serine proteases, for example, have active sites composed of two β-barrels, with the catalytic Ser195, His57 and Asp102 amino acids at the interface of the two domains for forming H-bonds with the P1–P3 residues of the substrate. The surface loops around the active site have evolved to enable highly divergent substrate recognition (Perona and Craik, 1997). An understanding of the substrate specificities of proteases provides the potential to define protease function as it has evolved over long periods of time.

Functional characterization of proteases in a biological system has traditionally involved determining the substrate specificity and generating a specific probe or inhibitor. Due to the coexistence of a large number of proteases and their diverse roles in many biological pathways (Lopez-Otin and Bond, 2008), it is difficult to identify specific cleavage products in a pool of complex cellular components. Therefore, technological advances are required to fully define protease function. Furthermore, a number of proteases belong to families that share highly related active sites, making conventional analytical methods based on gel electrophoresis ineffective for distinguishing substrates for a single member within the family. Detailed knowledge of substrate specificities of individual proteases in complex biological systems affords new opportunities to understand their roles in homeostasis and disease. Information on substrate specificity can also guide the development of chemical tools for protease detection or inhibition. Over the past decades, both synthetic and biological methods for generating combinatorial peptide libraries have greatly facilitated the process of mapping protease substrate specificity. A number of highly successful gel-free proteomic methods have also been developed to globally monitor proteolysis of native protein substrates. Review papers focusing on broader topics (Poreba and Drag, 2010), such as protease hydrolysis mechanisms (Vizovisek et al., 2018), applications of protease probes and profiling approaches (Rut et al., 2015; Kasperkiewicz et al., 2017), have been published. This review will focus on the synthetic and biological approaches to generate and screen diverse peptide libraries to map protease substrate specificities, followed by a summary of some recent examples of protease specificity profiles mapped using these methods (Table 1).

*Corresponding author: Matthew Bogyo, Department of Pathology, Stanford University School of Medicine, Stanford, CA 94305, USA; and Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, CA 94305, USA, e-mail: [email protected]. https://orcid.org/0000-0003-3753-4412
Shiyu Chen: Department of Pathology, Stanford University School of Medicine, Stanford, CA 94305, USA
Joshua J. Yim: Department of Chemical and Systems Biology, Stanford University School of Medicine, Stanford, CA 94305, USA

Table 1: Examples of protease substrate specificities mapped using technologies reported in this review.

Protease name; Protease type; Method/library; Discovered substrate sequences [a]; Reference
Hepatocyte growth factor activator (HGFA); Transmembrane serine protease; PS-SCL; K(L/M/n)R | ACC [b]; (Damalanka et al., 2019)
KLK8/neuropsin; Serine protease; PS-SCL; (T/W)(R/K)(L/V/I)R | Acc; (Debela et al., 2018)
Cathepsin L; Cysteine protease; PS-SCL; (Dab/Dap/Orn/Agp)(R/K/Orn/Dab)FR | ACC; (Poreba et al., 2018)
Caspase-1; Cysteine protease; PS-SCL; WXHD | ACC; (Ramirez et al., 2018)
Caspase-11; Cysteine protease; PS-SCL; VXHD | ACC; (Ramirez et al., 2018)
Factor VII (FVII) activating protease (FSAP); Serine protease; PS-SCL; X(K/R)Nle(K/R) | ACC; (Kara et al., 2017)
Urokinase-type plasminogen activator (uPA); Serine protease; PS-SCL; AcGTAR-pNA; (Li et al., 2019)
Caspase-3; Cysteine protease; PS-SCL; DE(V/I) | ACC; (Poreba et al., 2014b)
Human neutrophil serine protease 4; Serine protease; PS-SCL; Ac-hCha-Phe(guan)-Oic-Arg-ACC; (Kasperkiewicz et al., 2015)
S. aureus ClpXP; Multiple proteases; PS-SCL; (E/I/V/P)(E/K/I/L)(A/L/D/G) | (L/I); (Gersch et al., 2016)
Escherichia coli ClpXP; Multiple proteases; PS-SCL; (E/P/V/I)(K/L/E/Y)(L/A/G/N/M/G) | (L/I); (Gersch et al., 2016)
Homo sapiens ClpXP; Multiple proteases; PS-SCL; (P/V/L/E)(L/F/E/K/V)(L/A/G/D/N) | (K/Q/L/E); (Gersch et al., 2016)
TcMCP-1; Cysteine protease; PS-SCL; TcMCP-1 Abz-GXX(K/R/F/Y)(R/T/F)K(Dnp)-OH; (Ekino et al., 2018)
TbMCP-1; Cysteine protease; PS-SCL; TbMCP-1 Abz-GXX(K)FK(Dnp)-OH; (Ekino et al., 2018)
Hydrolase important for pathogenesis 1 (Hip1); Serine protease; PS-SCL; (W/F)(K/P)(L/n) | G(F/n)F(I/F/n); (Lentz et al., 2016)
Caspase-1; Cysteine protease; HyCoSuL; VXHD-ACC; (Ramirez et al., 2018)
Caspase-11; Cysteine protease; HyCoSuL; WXHD-ACC; (Ramirez et al., 2018)
Legumain (AEP); Asparaginyl protease; HyCoSuL; Ac-D-Tyr-L-Tic-L-Ser-L-Asp-ACC; (Poreba et al., 2017a)
Cathepsin L; Cysteine protease; HyCoSuL; Ac-Dap-Orn-Phe(3-Cl)-Cys(OMeBzl)-ACC; (Poreba et al., 2018)
Caspase-11; Cysteine protease; HyCoSuL; Ac-Tle-Bpa-His(Bzl)-Asp-ACC; (Ramirez et al., 2018)
Caspase-11; Cysteine protease; HyCoSuL; Ac-Tle-Bip-His-Asp-ACC; (Ramirez et al., 2018)
Caspase-2; Cysteine protease; HyCoSuL; Ac-Idc-hGlu-Thr(Bzl)-Ser-Asp-ACC; (Poreba et al., 2019c)
Caspase-9; Cysteine protease; HyCoSuL; Oic-Tle-His-Asp-ACC; (Poreba et al., 2019a)
Caspase-9; Cysteine protease; HyCoSuL; Lys(tfa)-Tle-His-Asp-ACC; (Poreba et al., 2019a)
Caspase-9; Cysteine protease; HyCoSuL; Lys(Ac)-Tle-His-Asp-ACC; (Poreba et al., 2019a)
Cathepsin B; Cysteine protease; HyCoSuL; Ac-Cha-Leu-hSer(Bzl)-Arg-ACC; (Poreba et al., 2019b)
Factor VII activating protease; Serine protease; HyCoSuL; Ac-Pro-DTyr-Lys-Arg-ACC; (Rut et al., 2019)
hSENP1; Cysteine protease; HyCoSuL; (Q/L/n)(S/T/F/V)GG | ACC; (Ponder et al., 2011)
Caspase-6; Cysteine protease; CoSeSuL; TETD | ACC; (Edgington et al., 2012)
Mucosa-associated lymphoid tissue lymphoma translocation protein 1 (MALT1); Cysteine protease; CoSeSuL; Acc(Ahx)ALVSRG(nV)K(Dnp)G; (Kasperkiewicz et al., 2018)
DPP-VII; Aminopeptidase; ISFPL; H-KP-AMC; (Leiting et al., 2003)
DPP-II; Aminopeptidase; ISFPL; H-Nle-Pro-AMC; (Leiting et al., 2003)
DPP-IV; Aminopeptidase; ISFPL; H-Ala-Pro-AFC; (Leiting et al., 2003)
Human cathepsin C; Aminopeptidase; ISFPL; Met-Nle(O-Bzl)-ACC; (Poreba et al., 2014a)
Leukotriene A4 hydrolase; Aminopeptidase; ISFPL; AspBzl-ACC; (Byzia et al., 2014)
Bleomycin hydrolase; Aminopeptidase; ISFPL; H-Lys(2-Cl-Cbz)-ACC; (van der Linden et al., 2015)
Kidney cell lysate; Aminopeptidase; ISFPL; hPhe-ACC; (Byzia et al., 2016)
Malarial dipeptidyl aminopeptidase 3; Aminopeptidase; ISFPL; Met-nLeu(o-Bzl)-ACC; (de Vries et al., 2019)
TbMCP-1; Metallocarboxypeptidase; IQF; Abz-LLKFK(Dnp)-OH; (Frasch et al., 2018)
TcbMCP-1; Metallocarboxypeptidase; IQF; Abz-RRFFK(Dnp)-OH; (Frasch et al., 2018)
NS2B-NS3 protease; Serine protease; IQF; ABZ-VK(K/R)R-ANB-NH2; (Gruba et al., 2016)
KLK13; Serine protease; IQF; ABZ-VRFR-ANB-NH2; (Gruba et al., 2019)
Caspase-3; Cysteine protease; IQF; Abz-GDEVD | GVY(NO2)D-OH; (Stennicke et al., 2000)
NS2B/NS3; Serine protease; IQF; Bz-Nle-Lys-Arg-Arg-ACMC; (Li et al., 2005)
Caspase-3; Cysteine protease; IQF; ACC-GDEVD | GVK(DNP)D-NH2; (Poreba et al., 2017b)
Caspase-7; Cysteine protease; IQF; ACC-GDEVD | GVK(DNP)D-NH2; (Poreba et al., 2017b)
Caspase-8; Cysteine protease; IQF; ACC-GDEVD | GVK(DNP)D-NH2; (Poreba et al., 2017b)
Legumain; Cysteine protease; IQF; ACC-GPTN | KVK(DNP)R-NH2; (Poreba et al., 2017b)
Elastase; Serine protease; IQF; ACC-GAEPV | SLK(DNP)L-NH2; (Poreba et al., 2017b)
MMP-2; Metalloprotease; IQF; ACC-GPLG | LK(DNP)AR-NH2; (Poreba et al., 2017b)
MMP-9; Metalloprotease; IQF; ACC-GPLG | LK(DNP)AR-NH2; (Poreba et al., 2017b)
MALT1; Cysteine protease; IQF; H2N-ACC-Ahx-ALVSRGT-K(Dnp)G-OH; (Kasperkiewicz et al., 2018)
Cathepsin G; Serine protease; IQF; ACC-Gly-His(Bzl)-Tle-Pro-Phe-Ser-Asp-Met(O)-Gly-Lys(DNP)-Gly-NH2; (Groborz et al., 2019)
HiGlpG; Serine protease; Synthetic library; (mca)RPKPYAvWMK(dnp); (Arutyunova et al., 2018)
Plasmodium proteasome; Proteasome; Synthetic library; Mor-Hfe-Ser(Me)-Thi-ACC; (Dydio et al., 2017)
Thrombin; Serine protease; Phage display; (P/A/V/L)R | (S/A); (Kretz et al., 2018)
ADAMTS13; Metalloprotease; Phage display; (L/I/M)XY | (Y/L/M/F); (Kretz et al., 2018)
Hepatitis C virus (HCV) NS3/4A protease; Serine protease; Yeast display; PSTVFC | A; (Pethe et al., 2019)
Atypical aspartic protease in roots 1 (ASPR1); Aspartic protease; Tryptic proteome library; G(Y/E)(E/V/I)(L) | (F/Y/H)(A/V)(A/G/N)(P/N); (Soares et al., 2019)
Atypical aspartic protease in roots 1 (ASPR1); Aspartic protease; GluC proteome library; (N/F)(F/Y/N)(K/I/V)(L/N/K) | (F/Y/V)(V/A/I)(K/G/A)(N/P/T); (Soares et al., 2019)
Sirtilin-a; Serine protease; Legumain proteome library; V(G/A)R | (S/T/V)(A/G/F)(F/E/M); (Dahms et al., 2019)
Sirtilin-a; Serine protease; GluC proteome library; (L/Y/V)(G/A)R | (V/T)(A/G/Y); (Dahms et al., 2019)
FIXa; Serine protease; Legumain proteome library; (G/V)R | (T/S/C/R)(L/I); (Dahms et al., 2019)
FIXa; Serine protease; GluC proteome library; L(L/G)R | (A/S)(L/I); (Dahms et al., 2019)
FXa; Serine protease; Legumain proteome library; (G/A/E)R | (A/S/G)(G/A); (Dahms et al., 2019)
FXa; Serine protease; GluC proteome library; (G/A)R | (A/S)(L/G); (Dahms et al., 2019)
Trypsin-3; Serine protease; Proteome libraries; (K/R) | T(D/E); (Schilling et al., 2018)
GluC; Serine protease; ChaFRAtip; E | X; (Nguyen et al., 2018)
Caspase-3; Cysteine protease; ChaFRAtip; DX(V/P/L)D | (G/A); (Nguyen et al., 2018)
Chymotrypsin; Serine protease; ChaFRAtip; A(V/T)(L/F/W/Y/M) | (K/T); (Nguyen et al., 2018)
MMP-1; Metalloprotease; ChaFRAtip; (P/A)Q(A/N/D) | (L/I/K)(T/K/V)(A/D); (Nguyen et al., 2018)
Cathepsin G; Serine protease; ChaFRAtip; E(P/K)(L/F/Y/M/N) | (K/I/A/S)(D/E); (Nguyen et al., 2018)
Subtiligase; Cysteine protease; PILS; | (A/G/S/M/R)(F/W/Y/I/L/V); (Weeks and Wells, 2018)
Caspase-3; Cysteine protease; N-terminal peptides; DE(V/I/P)D | (G/S); (Mahrus et al., 2008)
Caspase-1; Cysteine protease; N-terminal peptides; (F/L/Y/W)(E/V/L/D)(S/P/T/V)D | (G/S/A)(V/F/L/Y); (Agard et al., 2010)
Caspase-8; Cysteine protease; N-terminal peptides; EXD | (G/S); (Agard et al., 2012)
Caspase; Multiple proteases; N-terminal peptides; DEVD | (G/S/A)V; (Julien et al., 2014)
Caspase; Multiple proteases; N-terminal peptides; HtrA2: A | AVPSPPPASPR; (Wiita et al., 2014a)
Caspase; Multiple proteases; N-terminal peptides; Vimentin: D | ALKGTNESLER; (Wiita et al., 2014a)
Caspase-2; Cysteine protease; N-terminal peptides; DE(V/T/P)D | (G/S/A)(V/A/L); (Julien et al., 2016)
Caspase-6; Cysteine protease; N-terminal peptides; (V/T)(E/D)(V/T)D | (G/S/A)(V/A); (Julien et al., 2016)
Blood proteases; Multiple proteases; N-terminal peptides; (R/K/N) | (S/A/G/V); (Wildes and Wells, 2010)
Mitochondrial proteases; Multiple proteases; TAILS; (R/A/L/V)(R/A/S/L/P)(L/A/R/K) | (S/A/L/M)(S/T/A/E)(S/G/A/T); (Marshall et al., 2018)
Neutrophil elastase; Serine protease; TAILS; P(V/I) | ALXL; (King et al., 2018)
MMP-9; Metalloprotease; TAILS; PXP | C(R/Q); (King et al., 2018)
Poliovirus 3Cpro; Cysteine protease; TAILS; (A/V)XXQ | (G/A/Q/M); (Jagdeo et al., 2018)
CVB3 3Cpro; Cysteine protease; TAILS; (A/V/I)X(P/H)Q | (G/A)(G/E); (Jagdeo et al., 2018)
ADAM10; Metalloprotease; TAILS; GHIYG | EEGSF; (Jefferson et al., 2013)
MMP-1; Metalloprotease; TAILS; SFPAT | LE | | TQ | EQD; (Jefferson et al., 2013)
MMP-7; Metalloprotease; TAILS; LPLPQ | E | AGGMS; (Jefferson et al., 2013)
ADAM9; Metalloprotease; TAILS; YVIQA | EGKEH; (Jefferson et al., 2013)
ADAMTS-1; Metalloprotease; TAILS; SDALG | RPSEE | DEELV; (Jefferson et al., 2013)
KLK7; Serine protease; TAILS; TAGEE | AQG | DKIID; (Jefferson et al., 2013)
MMP-2; Metalloprotease; TAILS; (P/A/V)(A/S/R)(A/G/N) | (L/I)(K/A/Y)(A/S/G); (Prudova et al., 2010)
MMP-9; Metalloprotease; TAILS; GPK(G/P) | (L/I)K(G/A)(A/P/Y); (Prudova et al., 2010)
MMP-2; Metalloprotease; TAILS; VIQH | FQEKVESLEQEAANER; (Keller et al., 2010)
MT6-MMP; Metalloproteinase; PICS; (A/P/V)(A/N/E)(E/A/N) | (L/I)(V/L/T)Q; (Starr et al., 2012)
AtCathB2; Cysteine proteinase; PICS; (P/I/L/V)(P/V/D)(G/A/T) | (V/LI)(A/T); (Porodko et al., 2018)
AtCathB3; Cysteine proteinase; PICS; (P/I/F)(V/R/P)(A/G/R/T) | (V/I/L)(D/A); (Porodko et al., 2018)
SlPhyt1; Cysteine protease; PICS; (V/I/L)XP(D/E) | (K/A); (Reichardt et al., 2018)
SlPShyt3; Cysteine protease; PICS; (A/I)D | (S/G/H)(V/I); (Reichardt et al., 2018)
SlPhyt4; Cysteine protease; PICS; P(D/M) | HT(E/V)(E/D/A); (Reichardt et al., 2018)
SlPhyt5; Cysteine protease; PICS; AD | (G/E/H)V; (Reichardt et al., 2018)
SlP69A; Cysteine protease; PICS; (A/T/I)D | (G/H)(Y/I/A); (Reichardt et al., 2018)
Pseudogymnoascus destructans PdCP1; Serine protease; MSP-MS; (n/K/V)(H/K/R/W)(R/P)R | (R/n); (Beekman et al., 2018)
Angiostrongylus costaricensis adult worm lysates; Mainly aspartyl peptidase, pH 3; MSP-MS; (F/L/n) | (F/Y/n)(R/T); (Rebello et al., 2018)
A. costaricensis adult worm lysates; Mainly cysteine peptidase, pH 5; MSP-MS; (K/R)X(V/F)(K/R) | (n/F); (Rebello et al., 2018)
A. costaricensis adult worm lysates; Mainly cysteine peptidase, pH 8; MSP-MS; (D/Y) | XR; (Rebello et al., 2018)
A. costaricensis L1 lysates; Mainly aspartyl peptidase, pH 3; MSP-MS; E(F/Y) | nXV; (Rebello et al., 2018)
A. costaricensis L1 lysates; Mainly cysteine peptidase, pH 8; MSP-MS; (R/I)(R/A)L(R/K/H/W) | X; (Rebello et al., 2018)
FheCL1; Cysteine protease; MSP-MS; (V/I/L/M)(K/R/Q) | X; (Corvo et al., 2013)
FheCL3; Cysteine protease; MSP-MS; GP(K/R/Q) | (S/G/A/M); (Corvo et al., 2013)
Neutrophil extracellular traps (NETs); Multiple proteases; MSP-MS; (R/Y)(Q/S)P(I/V/T) | (S/R/n)W; (O’Donoghue et al., 2013)
Destructin-1; Serine protease; MSP-MS; (I/n/F)(R/W/K)(n/I)(Q/Y/F) | (K/T)(I/W/Y); (O’Donoghue et al., 2015)
Plasmodium falciparum 20S proteasome; Proteasome; MSP-MS; (F/I/n)(W/L/I/V)(R/Y/K)(F/Y/L/W) | (R/A); (Li et al., 2016)
Constitutive proteasome; Proteasome; MSP-MS; (P/n/F/I/L)(I/K/L/V)(K/S/R/T/Q)(R/L/F/H) | (K/R/A/N)(n/L/W); (Winter et al., 2017)
Immunoproteasome; Multiple proteases; MSP-MS; (P/F/I)(V/L/I/n)(K/R/N)(L/F/n/W/Y) | (R/H/A/N); (Winter et al., 2017)
Pd_dinase; Cysteine protease; MSP-MS; (G/n)(N/H/Q/R) | (n/S/L); (Xu et al., 2018)
Schistosoma mansoni serine protease 2 (SmSP2); Serine protease; MSP-MS; (R/K) | S(A/G); (Leontovyc et al., 2018)
PsAarA; Serine protease; qMSP-MS; FXL(A/V) | (R/S)F; (Lapek et al., 2019)
Haemophilus influenzae rhomboid peptidase (Higlpg); Serine protease; qMSP-MS; Mca-VKLFRFN | WMK(DNP)-NH2; (Lapek et al., 2019)
Aspergillus phoenicis mono-carboxypeptidase; Multiple proteases; qMSP-MS; (A/H)(R/K/Y/E)W(P/R) | VnK; (Lapek et al., 2019)
Aspergillus phoenicis f endopeptidases; Multiple proteases; qMSP-MS; (T/L)(R/K/H)(I/T/n)(R/K/n) | (n/I/F)(F/R)(F/K)W; (Lapek et al., 2019)
Lung cancer secretions monoaminopeptidases; Multiple proteases; qMSP-MS; (A/W/F/Y) | (n/A)(n/Y)(H/L/S)(Y/G); (Lapek et al., 2019)
Lung cancer secretions di-aminopeptidases; Multiple proteases; qMSP-MS; (A/F/G)(S/N/n) | (Y/n)(F/Y)(W/K/R/N)(R/Y/T); (Lapek et al., 2019)

[a] ‘|’ indicates the protease hydrolysis position.
[b] ‘n’ corresponds to norleucine or nongrayed residues.

The search for natural substrates

The simplest and most effective way to confirm hydrolysis of individual substrate proteins by a protease is to resolve the resulting polypeptide fragments using sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE), followed by chromogenic staining or immunoblotting to visualize the breakdown products. However, these methods require prior knowledge of candidate substrates for a given protease. In a complex biological sample, protease substrates cannot be easily identified with the limited predictive value of SDS-PAGE staining. As an alternative, methods such as PROTOMAP (Dix et al., 2008) have been developed to globally identify proteolytic events in complex proteomic samples that have been resolved by SDS-PAGE. This allows the direct identification of proteolytic fragments recovered from SDS-PAGE gels using unbiased mass spectrometry (MS) detection. The benefits of this approach are the need for only small quantities of protein and the ability to monitor dynamic changes in proteolytic processing events en masse without any prior knowledge of the potential substrates. However, this method is time-consuming, requires extensive MS analysis for each sample tested and is restricted to protein fragments that can be resolved by SDS-PAGE.

As an alternative to PROTOMAP, a number of gel-free methods have been developed to enable simultaneous identification and quantification of substrate hydrolysis events in complex cellular environments (Van Damme et al., 2005; Enoksson et al., 2007; Schilling and Overall, 2008; Impens et al., 2010; Kleifeld et al., 2010; Wiita et al., 2014b). All of these methods depend on the enrichment of unique α-amino peptides or isotopically labeled fragments of native proteins produced by proteolysis. These approaches have enabled a better understanding of protease substrate networks and their roles in specific biological pathways. The identification of substrate recognition motifs has also contributed to the categorization of proteases based on their specificities for substrate processing. An interesting example of the value of proteomics-based protease substrate discovery was the identification of Dicer as one of the large number of substrates for the caspase family of cysteine proteases involved in apoptosis (Pop and Salvesen, 2009; Crawford et al., 2013; Julien and Wells, 2017). Dicer, one of the central proteins of the microRNA (miRNA) processing machinery, was found to be a target for caspases during apoptosis of HeLa cells triggered by tumor necrosis factor α (TNFα) (Matskevich and Moelling, 2008). The level of miRNAs was also substantially repressed during glucocorticoid-induced apoptosis of primary rat thymocytes, due to Dicer depletion by caspases (Smith et al., 2010).
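To illustrate how an enriched α-amino (neo-N-terminal) peptide points back to a cleavage event, the following minimal Python sketch locates such a peptide in its parent protein and reports the residues flanking the inferred scissile bond. The function name, the four-residue window and the toy sequence are assumptions made for illustration only; they are not taken from any of the cited methods, which additionally rely on MS search engines, labeling chemistry and statistical filtering.

```python
from typing import Optional, Tuple


def cleavage_window(protein: str, neo_n_peptide: str,
                    flank: int = 4) -> Optional[Tuple[str, str]]:
    """Infer the residues around a cleavage site from a neo-N-terminal peptide.

    Returns (non_prime, prime): up to `flank` residues N-terminal to the
    scissile bond (P4..P1) and up to `flank` residues C-terminal to it
    (P1'..P4'). Returns None if the peptide is absent or starts at the
    protein's own N-terminus, in which case no cleavage is implied.
    Only the first occurrence of the peptide is considered.
    """
    start = protein.find(neo_n_peptide)
    if start <= 0:
        return None
    non_prime = protein[max(0, start - flank):start]
    prime = protein[start:start + flank]
    return non_prime, prime


# Hypothetical toy protein containing a caspase-like DEVD | G site
toy_protein = "MKTAYDEVDGSLLQR"
print(cleavage_window(toy_protein, "GSLLQR"))  # ('DEVD', 'GSLL')
```

Collecting many such windows from one experiment and aligning them at the scissile bond is one way the consensus motifs listed in Table 1 can be derived.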

The disadvantage of using substrate cleavage to study protease activity is that it relies on the assumption that only a single protease is responsible for the cleavage of a given substrate protein. This assumption is often not true, especially for large, closely related protease families such as the matrix metalloproteases (MMPs) (Prudova et al., 2010), cysteine cathepsins (Turk et al., 2012) and caspases (Pop and Salvesen, 2009), in which multiple proteases share highly similar substrate specificity profiles. Furthermore, conditional or transient protease-substrate interactions may also lead to false negatives in the discovery of natural substrates (Seo and Rhee, 2018). Therefore, as a complement to proteomic methods that map cleavage events in native proteins, a number of methods have been developed that allow direct screening of randomized peptide sequences to identify global patterns of substrate specificity for a single protease of interest. These methods are the focus of this review and make use of both synthetic chemistry and biological expression systems to generate the necessary diversity of peptides to perform effective substrate specificity profiling studies.

Synthetic combinatorial peptide library

Solid-phase peptide synthesis (SPPS) with Fmoc chemistry is the most widely used synthetic strategy to prepare peptide substrates of proteases (Merrifield, 1985; Behrendt et al., 2016). By attaching the C-terminus of the peptide chain to a solid support, the polypeptide chain can be synthesized in high yields by rounds of amide bond couplings. Due to the large chemical space of peptides that results from the combination of 20 natural amino acid building blocks, it is logistically difficult to synthesize and test a sufficiently diverse library of peptides individually. Using mixtures of peptide substrates, libraries can be efficiently screened in a high-throughput manner. Two methods have been developed and are commonly used for the rapid and efficient synthesis of peptide libraries using combinatorial chemistry techniques. In the first approach, amino acids are mixed according to their coupling efficiency to produce combinatorial libraries with equal distribution of each amino acid at designated positions (Ostresh et al., 1994). Establishing the isokinetic mixture of amino acids is important to avoid over-population of highly reactive amino acids and to ensure equal amino acid distribution in the final product. Beyond amino acids, this technique has also been used to incorporate carboxylic acids in mixture-based combinatorial libraries (Acharya et al., 2002). In the second approach, solid support beads are physically split into equal portions and individual amino acids are coupled separately, ensuring equimolar substitution (Furka et al., 1991). By splitting and combining beads over multiple rounds of amino acid couplings, millions of peptides can be synthesized to produce ‘one-bead, one-peptide’ libraries (Lam et al., 1991). However, these libraries are typically screened with the peptides attached to the beads and require some form of decoding to identify the sequences on beads that contain the optimal substrates.
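To give a sense of scale for the combinatorial argument above, the following Python sketch counts the sequences in a fully randomized library and enumerates an idealized split-and-mix synthesis. The function names and the two-letter toy alphabet are illustrative assumptions and are not taken from any of the cited protocols; real syntheses also deviate from this ideal through incomplete couplings.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 natural amino acids, one-letter codes

def library_size(n_positions, alphabet=AMINO_ACIDS):
    """Number of distinct peptides when every position is fully randomized."""
    return len(alphabet) ** n_positions

def split_and_mix(n_rounds, alphabet):
    """Enumerate the sequences produced by idealized split-and-mix synthesis.

    Each round splits the pooled beads into one portion per building block,
    couples a single building block to each portion, and recombines them.
    SPPS grows the chain from the C-terminus, so the new residue is prepended.
    """
    pool = [""]
    for _ in range(n_rounds):
        pool = [aa + seq for aa in alphabet for seq in pool]
    return pool

print(library_size(4))          # 160000 tetrapeptides
print(library_size(6))          # 64000000 hexapeptides
print(split_and_mix(2, "AG"))   # ['AA', 'AG', 'GA', 'GG']
```

Every bead ends up carrying one sequence because each bead visits exactly one coupling vessel per round, which is why the pool size multiplies by the alphabet size at every step.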

To evaluate the peptide substrate specificity for a given protease, the cleavage of the peptide substrate must be detected. Fluorescent reporter groups such as 7-amino-4-methylcoumarin (AMC, Figure 1) are attached by an amide bond to the carboxylic acid end of the peptide substrate to convert it into a fluorogenic substrate. While the peptide is intact, the amide form of the coumarin stays optically silent. Upon proteolytic hydrolysis of the peptide substrate at the P1 residue, the free amino form of AMC is released and becomes approximately 700-fold brighter, allowing for fluorescent detection of a cleavage event (Zimmerman et al., 1976). To allow for solid-phase compatible synthesis of fluorogenic substrates, AMC has been further modified to 7-amino-4-carbamoylmethylcoumarin (ACC), which contains an additional carboxylic acid that can be directly coupled to a solid support (Harris et al., 2000). This allows direct split-and-mix or isokinetic mixture synthesis of diverse peptide substrate libraries.
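The practical consequence of the 700-fold difference in brightness can be illustrated with a small calculation. The sketch below assumes that the signals of intact and cleaved substrate add linearly and that nothing else in the well fluoresces; both are simplifying assumptions, and the function names are hypothetical.

```python
FOLD_INCREASE = 700.0  # brightness of free AMC relative to the intact amide (from the text)

def relative_signal(fraction_cleaved, fold=FOLD_INCREASE):
    """Signal relative to fully intact substrate, assuming the two species add linearly."""
    return (1.0 - fraction_cleaved) + fraction_cleaved * fold

def fraction_from_signal(signal, fold=FOLD_INCREASE):
    """Invert relative_signal: estimate the cleaved fraction from a measured fold increase."""
    return (signal - 1.0) / (fold - 1.0)

print(relative_signal(0.01))        # about 7.99: 1% turnover gives a roughly 8-fold signal
print(fraction_from_signal(350.5))  # 0.5
```

Under these assumptions, even very low substrate turnover is readily distinguished from background, which is what makes the reporter suitable for screening weak substrates in mixtures.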

The development of positional scanning substrate combinatorial libraries (PS-SCL) made it possible to rapidly and exhaustively screen peptide substrates to determine primary amino acid specificity without any knowledge of natural substrates (Rano et al., 1997; Thornberry et al., 1997; Harris et al., 2000). The PS-SCL is composed of fluorogenic sub-libraries in which one position of the peptide is fixed as a single amino acid, while the remaining positions contain an equimolar mixture of amino acids (Figure 2). With the PS-SCL format, proteases can be assayed and the preferred amino acid residue at each position of the peptide can be rapidly identified. Ultimately, optimal residues at each position in the peptide can be combined to generate substrate sequences that are highly specific to the protease of interest. Because substrate libraries are synthesized in an unbiased manner, it is possible to use the resulting specificity profiles to uncover substrate specificities other than those defined by the most abundant and efficiently cleaved native protein substrate. In the first example of applying PS-SCL to determine protease specificity, optimal substrate sequences were discovered for interleukin-1β converting enzyme (ICE; now known as caspase 1) that were divergent from its native substrates (Rano et al., 1997). Substrate specificities of cathepsins K and S were also profiled and showed sequence preferences that matched known physiological substrates (Choe et al., 2006). Using PS-SCL, human mitochondrial intermediate peptidase (hMIP) was discovered to prefer polar, uncharged residues at the P1 and P1′ substrate positions (Marcondes et al., 2015). The substrate specificity of hepatocyte growth factor activator (HGFA) was elucidated through PS-SCL screening, and this information was then used to design selective inhibitors of matriptase and hepsin (Damalanka et al., 2019). PS-SCL is one of the most widely applied methods to map protease substrate specificity, and more examples are listed in Table 1.
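The bookkeeping behind a tetrapeptide PS-SCL can be sketched in a few lines of Python. This is a minimal illustration, assuming four positions (P4 to P1) and the 18-residue alphabet described in the Figure 2 caption (no cysteine or methionine). The rate values are invented solely to show the mechanics and are not data from any cited study.

```python
AMINO_ACIDS = "ADEFGHIKLNPQRSTVWY"  # 18 residues: the 20 natural ones minus Cys and Met
POSITIONS = ["P4", "P3", "P2", "P1"]

def sublibraries():
    """One sub-library per (position, fixed residue) pair."""
    return [(pos, aa) for pos in POSITIONS for aa in AMINO_ACIDS]

def peptides_per_sublibrary():
    """Each sub-library randomizes the three positions that are not fixed."""
    return len(AMINO_ACIDS) ** (len(POSITIONS) - 1)

def optimal_sequence(rates):
    """Pick the fixed residue with the highest rate at each position.

    `rates` maps (position, residue) to a measured hydrolysis rate;
    sub-libraries that are missing from the mapping count as zero.
    """
    best = {}
    for pos in POSITIONS:
        best[pos] = max(AMINO_ACIDS, key=lambda aa: rates.get((pos, aa), 0.0))
    return "".join(best[pos] for pos in POSITIONS)

# Hypothetical plate-reader rates (arbitrary units), invented for illustration.
example_rates = {
    ("P4", "D"): 9.0, ("P3", "E"): 7.5, ("P2", "V"): 8.1,
    ("P1", "D"): 10.0, ("P1", "A"): 0.2,
}

print(len(sublibraries()))              # 72
print(peptides_per_sublibrary())        # 5832
print(optimal_sequence(example_rates))  # DEVD
```

Combining the best residue from each position treats the subsites as independent. That is a convenient first approximation, but cooperativity between subsites means the combined sequence should still be resynthesized and tested individually.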

Figure 1: Fluorescent properties of 7-amido- and 7-amino-4-methylcoumarin fluorophores. The excitation maxima (Ex) and emission maxima (Em) of 7-amido- and 7-amino-4-methylcoumarin are distinct. The relative fluorescence (RF) intensity of 7-amino-4-methylcoumarin is approximately 700-fold greater than that of an equimolar amount of 7-amido-4-methylcoumarin when excited at 380 nm with emission detected at 460 nm.

Figure 2: Positional scanning-substrate combinatorial library (PS-SCL) for mapping protease substrate preferences. In each sub-library, a single position is fixed with a defined amino acid and the remaining positions contain equimolar mixtures of amino acids (minus cysteine and methionine to avoid oxidation). The 7-amino-4-carbamoylmethylcoumarin (ACC) reporter is conjugated to the C-terminus of the peptide library so that hydrolysis of each sub-library can be measured using a fluorescent plate reader. Ultimately, the optimal residues at each position (P1–P4) can be determined and then combined to make an optimal set of substrates for a given protease.

However, due to overlapping substrate specificity of proteases from the same family, conventional PS-SCL approaches have often been insufficient to generate selective substrates for a single family member. In addition, proteases from different but related families can also have a great deal of overlap in substrate specificities. To explore a larger chemical space of substrate peptides, hybrid combinatorial substrate libraries (HyCoSuL) using both natural and non-natural amino acids have been developed. This use of non-natural amino acids has led to the development of protease substrates, inhibitors and activity-based probes with increased selectivity over molecules that contain only natural amino acids (Poreba et al., 2017a). The HyCoSuL approach has been successfully applied to generate selective tools for proteases such as caspases (Poreba et al., 2014b; Ramirez et al., 2018), human neutrophil serine protease 4 (Kasperkiewicz et al., 2015), neutrophil elastase (Kasperkiewicz et al., 2014) and a protease expressed by Mycobacterium tuberculosis (Lentz et al., 2016), as well as others. It has also been used to develop selective active site probes and inhibitors for individual protease activities within multi-catalytic protease complexes such as the proteasome (Rut et al., 2018; Yoo et al., 2018). The application of non-natural amino acids in individual substrate fluorogenic peptide libraries (ISFPL) has also proven valuable for profiling substrate preferences of mono-, di- and tri-aminopeptidases (Drag et al., 2010; Poreba et al., 2012).

PS-SCL strategies using a C-terminal reporter are valuable for mapping substrate specificities of proteases that derive the majority of their specific binding interactions from the non-prime residues on the N-terminal side of the scissile bond. For proteases that derive specificity from sequences on both sides, or from the prime, C-terminal side of the amide bond, internally quenched fluorescent (IQF) peptide substrate libraries can be used. In IQF peptide substrate libraries, a fluorescent reporter is attached to one end of the peptide while a quencher molecule is incorporated at the other end. While the peptide is intact, the proximal quencher molecule absorbs the fluorescence from the fluorophore, keeping the peptide substrate optically silent. Upon proteolytic hydrolysis of the substrate peptide anywhere in the sequence between the fluorophore and quencher, the reporter is released to emit fluorescence, identifying the specific peptide sequences that the protease prefers (Yaron et al., 1979). The main limitation of this method is that cleavage at any of several positions in the peptide sequence results in a positive signal, making it difficult to map the exact cleavage site without further analytical studies of the optimized substrates. Overall, synthetic combinatorial libraries of both natural and hybrid natural/non-natural peptides have enabled the profiling of substrate specificities for many proteases (see Table 1 for recent examples). This general approach has greatly promoted the study of peptide sequence preferences of many proteases, accelerated the understanding of their biological functions and facilitated the design and discovery of clinically relevant protease inhibitors.
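The cleavage-site ambiguity of IQF substrates, and the prime/non-prime numbering used throughout this review, can be made concrete with a short sketch. The peptide sequence and function names below are hypothetical and chosen only for illustration.

```python
def label_subsites(peptide, bond_index):
    """Label residues around the scissile bond.

    The bond lies between peptide[bond_index - 1] (P1) and peptide[bond_index] (P1').
    Non-prime residues count up toward the N-terminus, prime residues toward the C-terminus.
    """
    labels = {}
    for i, aa in enumerate(peptide):
        if i < bond_index:
            labels[f"P{bond_index - i}"] = aa
        else:
            labels[f"P{i - bond_index + 1}'"] = aa
    return labels

def candidate_bonds(peptide):
    """Every internal bond whose hydrolysis would separate fluorophore from quencher."""
    return [(i, peptide[:i], peptide[i:]) for i in range(1, len(peptide))]

peptide = "AGLVKR"  # hypothetical sequence between fluorophore and quencher
print(label_subsites(peptide, 3))
# {'P3': 'A', 'P2': 'G', 'P1': 'L', "P1'": 'V', "P2'": 'K', "P3'": 'R'}
print(len(candidate_bonds(peptide)))  # 5 bonds give the same fluorescent readout
```

Because all five bonds in this toy hexapeptide produce an identical signal, the fluorescence measurement alone cannot assign the P1 residue, and a follow-up analysis of the cleavage products is needed.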

Fragment-based discovery of small molecule substrates

In addition to using peptide substrate libraries to identify native substrate cleavage specificities, it is also possible to design and sequentially build small molecule fragment-based substrate libraries to identify non-peptidic building blocks that can be efficiently recognized by proteases. Substrate activity screening (SAS) is a fragment-based method for the rapid development of selective small molecule protease substrates and inhibitors (Wood et al., 2005). SAS can efficiently identify weak binding fragments and allows rapid optimization of these initial fragments into higher-affinity compounds (Figure 3). The SAS method consists of two sequential screening steps to discover non-peptidic small molecule substrates of proteases and a final step to convert them into potent inhibitors. Initially, a fluorogenic coumarin derivative substrate library is synthesized with diverse, low-molecular-weight N-acyl fragments and screened against a given protease. In the first step, the protease substrates are identified through a high-throughput fluorescence-based assay, and generally only weak substrates are discovered. In the second step, the substrates with low activity are further elaborated using combinatorial chemistry. Key chemical structures are identified from the first fragment library screen and are incorporated to generate a new focused substrate library. After a second round of screening against the given protease, specific substrates can be rapidly discovered. These substrates can then be converted into reversible or irreversible inhibitors by directly replacing the aminocoumarin with known mechanism-based warheads. This method has been successfully applied to multiple protease targets (Rawls et al., 2009; Verdoes et al., 2012; Jamali et al., 2015).

Biological display methods to generate diverse protease substrate libraries

While synthetic chemistry methods to generate diverse peptide libraries have the advantage of overall flexibility and the potential to include both natural and non-natural amino acids, they cannot access the complete diversity of sequences for longer peptides and lack rapid methods to screen and select for optimal sequences over multiple rounds of screening. To address these shortcomings of chemically synthesized libraries, biological tools have been developed to rapidly display and screen highly diverse pools of peptide substrates. Phage display is a technique in which degenerate DNA sequences are fused to the pIII gene such that randomized polypeptide libraries are expressed on the phage surface. Using this approach, billions of proteins or peptides can be rapidly produced with their genetic information embedded in the connected phage particles. In this way, high diversity libraries have been used to discover peptides, small proteins and single chain antibody fragment binders of target proteins (Smith, 1985; Ward et al., 1989; Scott and Smith, 1990; Clackson et al., 1991). When an extra affinity tag is placed at the N-terminus of the peptide substrate library, phage display can be used to iteratively screen for selective protease substrates (Figure 4). So far, various peptide libraries and tags have been employed to generate phage libraries for the discovery of protease substrates (Matthews and Wells, 1993; Deperthes, 2002; Capek et al., 2010; Caberoy et al., 2011). Compared with synthetic peptide libraries, phage-displayed peptide libraries have much higher diversity, and optimal substrate peptide sequences can be iteratively enriched over multiple rounds of selection by increasing the stringency of selection conditions. The identified peptides are then synthesized and tested in vitro to establish the exact site of hydrolysis.
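The effect of iterative selection can be illustrated with a deliberately simple deterministic model, in which each clone is released from the support with a fixed per-round probability and the released pool is amplified without bias. The clone names, starting frequencies and release probabilities below are invented for illustration, and the model ignores sampling noise and growth bias, both of which matter in real selections.

```python
def enrich(frequencies, release_prob, rounds):
    """Track clone frequencies over successive selection rounds.

    frequencies:  clone -> starting fraction of the phage pool
    release_prob: clone -> probability of being cleaved off the support in one round
    Returns a list of frequency dictionaries, one per round (round 0 first).
    """
    current = dict(frequencies)
    history = [dict(current)]
    for _ in range(rounds):
        released = {clone: frac * release_prob[clone] for clone, frac in current.items()}
        total = sum(released.values())
        current = {clone: amount / total for clone, amount in released.items()}
        history.append(dict(current))
    return history

# Hypothetical pool: a rare, efficiently cleaved clone in a poorly cleaved background.
start = {"good_substrate": 0.001, "background": 0.999}
release = {"good_substrate": 0.9, "background": 0.01}

for round_number, pool in enumerate(enrich(start, release, 3)):
    print(round_number, round(pool["good_substrate"], 3))
```

In this model the ratio of the two clones is multiplied by the ratio of their release probabilities in every round, so a clone that starts at 0.1% of the pool dominates it after three rounds.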

Figure 3: The substrate activity screening (SAS) method. A library of N-acyl aminocoumarins with diverse, low-molecular-weight N-acyl fragments is prepared and screened to identify protease substrates that bind with low affinity (red colored). A focused library is synthesized based on the substrate identified from the initial screen, and this second library of closely related non-peptidic fragments is screened to identify potent protease substrates (orange + red colored). The most potent substrate can also be converted to an inhibitor or activity-based probe by replacing the coumarin reporter with a protease reactive pharmacophore (W).

Figure 4: Schematic illustration of a phage-based approach to discover protease substrate peptides. A random peptide library (X represents any canonical amino acid) and an affinity tag (for immobilizing phage on a solid support) are fused to the N-terminus of the phage pIII D1D2 domains. After adsorbing phage onto a solid support and adding the protease of interest, phage containing peptides that are efficiently hydrolyzed by the protease are released, recovered and subjected to amplification. Iterative rounds of selection identify specific peptide substrate sequences. Sequencing the phage pIII gene yields the identity of the corresponding optimal substrate peptides.

As an alternative to phage screening, yeast endoplasmic reticulum sequestration screening (YESS) coupled with next generation sequencing (NGS) has been reported as a method to survey protease substrate specificity (Li et al., 2017). In this approach, a combinatorial substrate library conjugated with an HA tag and a FLAG tag is targeted to the yeast endoplasmic reticulum (ER) and transported through the secretory pathway, allowing any proteases present in the ER to cleave the peptide substrate (Figure 5). Fluorescein isothiocyanate (FITC)-conjugated anti-HA and phycoerythrin (PE)-conjugated anti-FLAG antibodies, combined with multicolor fluorescence-activated cell sorting (FACS) screening, are used to detect and isolate cells that present a cleaved peptide sequence. Cleavage is detected by monitoring the ratio of PE to FITC fluorescence. High amounts of both fluorescent signals indicate a lack of cleavage, whereas a high PE to FITC ratio indicates cleavage by an expressed protease. This method was used to profile the tobacco etch virus (TEV) protease and confirmed the substrate preference reported previously (Li et al., 2017).
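The two-color readout described above amounts to a simple gating rule, sketched here in Python. The threshold values and the function name are hypothetical; in practice the gates are drawn on the cytometer against uncleaved and fully cleaved control populations.

```python
def classify_cell(pe, fitc, min_pe=1000.0, ratio_cutoff=5.0):
    """Classify one cell from its PE (anti-FLAG) and FITC (anti-HA) intensities.

    A cell must show PE signal to count as displaying the construct at all.
    Among displaying cells, loss of FITC relative to PE indicates that the
    HA tag was removed by cleavage. Thresholds are illustrative assumptions.
    """
    if pe < min_pe:
        return "not displayed"
    ratio = pe / max(fitc, 1.0)  # guard against division by zero
    return "cleaved" if ratio >= ratio_cutoff else "uncleaved"

# Hypothetical (PE, FITC) intensities in arbitrary units.
for pe, fitc in [(5000.0, 4000.0), (5000.0, 200.0), (100.0, 50.0)]:
    print(pe, fitc, classify_cell(pe, fitc))
# 5000.0 4000.0 uncleaved
# 5000.0 200.0 cleaved
# 100.0 50.0 not displayed
```

Using the ratio rather than the FITC signal alone normalizes for differences in surface expression, so that a weakly displaying cell with an intact substrate is not mistaken for a cleaved one.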

A method for profiling protease substrates displayed between two affinity domains on the Gram-positive bacterium Staphylococcus carnosus has also recently been reported (Sandersjoo et al., 2017). The reporter tag (RT) has affinity for both the reporter (R) and the blocking domain (BD), so the BD blocks the RT from interacting with the reporter as long as the substrate peptide is intact. If the substrate peptide is hydrolyzed by the protease, the BD diffuses away and the RT is able to bind to the reporter (Figure 6). Proteolysis is therefore reflected by reporter binding. Incubation with fluorescently labeled reporter and human serum albumin (HSA) enables simultaneous flow cytometric analysis of proteolysis and surface expression levels. When the method was applied to MMP-1, peptides with PXXXHy consensus sequences were enriched, and the discovered peptides were shown to be effectively cleaved by the protease.

Microarray peptide libraries

Peptide microarrays display a collection of peptides on a solid surface and are widely used to profile protein-protein interactions and enzyme activity, as well as to map antibody epitopes. The advantages of using microarray peptide libraries include minimal sample and enzyme usage for analysis and ease of recording data through direct scanning of the microarray slide. Peptide microarrays have helped elucidate protease-substrate interactions to advance the understanding of proteases and have the potential for diagnostic applications (Salisbury et al., 2002; Gosalia and Diamond, 2003; Gosalia et al., 2005a,b). Fluorogenic peptide substrate microarrays provide a rapid way to identify substrate specificity and can help to design selective protease inhibitors. In one study, a 722-member library of fluorogenic protease substrates of the general format Ac-Ala-X-X-(Arg/Lys)-coumarin was synthesized and arrayed, providing maps of protease specificity for human thrombin, factor Xa, plasmin and urokinase plasminogen activator (Gosalia et al., 2005b).
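The size of the arrayed library follows directly from its general format, as the enumeration below shows. The text does not state which residues X covers; the sketch assumes 19 residues (the 20 natural amino acids without cysteine) only because 19 × 19 × 2 reproduces the reported count of 722, so the excluded residue should be read as an assumption.

```python
from itertools import product

X_RESIDUES = "ADEFGHIKLMNPQRSTVWY"  # 19 one-letter codes; omitting Cys is an assumption
P1_RESIDUES = ("Arg", "Lys")        # the two P1 choices named in the general format

def build_library():
    """Enumerate every member of the Ac-Ala-X-X-(Arg/Lys)-coumarin format."""
    return [
        f"Ac-Ala-{x1}-{x2}-{p1}-coumarin"
        for x1, x2, p1 in product(X_RESIDUES, X_RESIDUES, P1_RESIDUES)
    ]

library = build_library()
print(len(library))   # 722
print(library[0])     # Ac-Ala-A-A-Arg-coumarin
```

Because each member is a single defined sequence at a known array position, no deconvolution step is needed, in contrast to the mixture-based libraries described earlier.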

Figure 5: Schematic illustration of the yeast endoplasmic reticulum sequestration screening (YESS) system. The substrate peptide library cassette is fused to the C-terminus of the Aga2 protein and translocated to the ER secretory pathway. Interaction of Aga2 with the Aga1 protein displays the peptide on the yeast surface. When the peptide substrate is recognized and hydrolyzed by the protease, the HA tag is released. After staining with PE-labeled anti-FLAG and FITC-labeled anti-HA antibodies, FACS sorting isolates yeast cells containing only PE fluorescence. The identity of the peptide substrate can be determined by sequencing the peptide gene from the recovered yeast.

Figure 6: Schematic illustration of bacterial surface display for discovering protease substrates. The substrate peptide library is inserted between a pair of interacting proteins consisting of a reporter tag (RT) and a blocking domain (BD). When a peptide substrate is recognized and hydrolyzed by the protease, the BD diffuses away, allowing the PE-labeled reporter (R) domain to bind to the RT. The peptide expression level is quantified through the albumin binding protein (ABP), which binds Alexa647-labeled human serum albumin (HSA). After FACS sorting of bacteria carrying both PE and Alexa647 dyes, the substrate peptide sequences are determined by sequencing the peptide gene.

Figure 7: Schematic illustration of peptide nucleic acid (PNA)-tagged synthetic peptide libraries for discovering protease substrates. Rhodamine-based fluorogenic substrates encoded with PNA tags are chemically synthesized and treated with a protease of interest. Recognition by the protease results in the hydrolysis of the C-terminal amide bonds to generate free amino-rhodamine, which becomes fluorescent. After hybridization to the DNA microarray and fluorescent scanning, the sequence of the substrate peptide is deconvoluted by way of the DNA sequence information of the fluorescent array spots.

Microarray strategies can also be used to deconvolute proteolytic activity signals after peptide mixtures have been incubated with a small amount of a given protease. Peptide nucleic acid (PNA)-tagged rhodamine-based fluorogenic substrates have been employed to study protease hydrolysis activity (Winssinger et al., 2004) (Figure 7). Peptide libraries are synthesized and conjugated to the ends of a PNA-barcoded rhodamine. These libraries are pooled and incubated with the protease of interest. Cleaved peptides generate a free amino-rhodamine, resulting in a 1000-fold increase in fluorescence. To deconvolute multiple signals from the peptide mixtures, the PNA-barcoded libraries are allowed to hybridize to a DNA microarray chip. The microarray chip is then fluorescently scanned and the peptide sequence determined from the corresponding PNA barcode. PNA-tagged synthetic peptide libraries have been used to quantify protease activity and substrate specificity of serine and cysteine proteases including caspase-3, thrombin and plasmin (Winssinger et al., 2004). While there have not been many recent examples of applications of nucleotide-encoded protease substrate libraries, the approach retains considerable potential for rapid screening of highly diverse substrate libraries now that sequencing methods have improved.

Mass spectrometry-based approaches

One of the limitations of most synthetic PS-SCLs and peptide microarrays is the use of tags or fluorescent reporters for readout of proteolytic activity. These labels can alter the substrate structure and affect cleavage rates. As an alternative, MS has a significant advantage over other methods to detect substrate specificity as it does not require analytes to be labeled with reporters, offering greater flexibility in experiments. In one recent example of an MS-based approach, self-assembled monolayers for matrix-assisted laser desorption/ionization mass spectrometry (SAMDI-MS) are used to detect peptide substrates in their native states. Peptides from libraries are individually treated with a protease in 384-well plates and then immobilized onto a monolayer array plate. The monolayer is then irradiated with a laser that releases the ionized peptide species from the surface, which can be analyzed by a mass spectrometer for characterization. In this way, a 76-peptide array for scanning the P2, P1, P1′ and P2′ substrate positions led to the identification of a tetrapeptide substrate exhibiting high activity for the bacterial outer-membrane protease (OmpT) (Wood et al., 2017).

In another recently developed MS method, multiplex substrate profiling by mass spectrometry (MSP-MS), the substrate specificity of endo- or exopeptidases is determined using liquid chromatography-tandem mass spectrometry (LC-MS/MS) sequencing (O’Donoghue et al., 2012). This method is built around a synthetic library of 124 tetradecapeptides, which were computationally designed to provide 1612 potential protease cleavage sites that comprehensively cover the substrate signatures of all protease families. Comparing the LC-MS/MS traces of the library with and without peptidases added at multiple time intervals reveals cleavage sites and amino acid residue preferences for a given protease of interest. Since its first development in 2012, this approach has been used to map the substrate specificity of a number of important protease targets including caspases, rhomboids, matriptase and hepsin. It has also been used to map the substrate specificity of the malaria parasite proteasome and of proteases from other important pathogens (Corvo et al., 2013; Julien et al., 2016; Lentz et al., 2016; Li et al., 2016; Beekman et al., 2018; Leontovyc et al., 2018; Dahms et al., 2019).
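The figure of 1612 potential cleavage sites follows from the library design: each 14-residue peptide contains 13 peptide bonds, and 124 × 13 = 1612. The sketch below reproduces this count and shows how a cleavage site can be located once a product fragment has been sequenced. The parent peptide is a made-up 14-mer, not a member of the published library, and the function names are hypothetical.

```python
def count_bonds(n_peptides=124, length=14):
    """Total number of peptide bonds available for cleavage in the library."""
    return n_peptides * (length - 1)

def locate_cleavage(parent, fragment, window=4):
    """Map a product fragment back onto its parent peptide.

    Returns (bond_index, non_prime_side, prime_side) when the fragment is the
    N- or C-terminal product of a single cleavage, otherwise None.
    """
    if not fragment or fragment == parent:
        return None
    if parent.startswith(fragment):
        bond = len(fragment)
    elif parent.endswith(fragment):
        bond = len(parent) - len(fragment)
    else:
        return None
    return bond, parent[max(0, bond - window):bond], parent[bond:bond + window]

print(count_bonds())  # 1612

parent = "AGLVKRFSYNDEWH"  # hypothetical 14-mer
print(locate_cleavage(parent, "AGLVKR"))    # (6, 'LVKR', 'FSYN')
print(locate_cleavage(parent, "FSYNDEWH"))  # (6, 'LVKR', 'FSYN')
```

Both products of the same cleavage map to the same bond, so observing either fragment is enough to assign the P4 to P4′ residues around the site.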

Quantitative information on peptide cleavage can be collected by further incorporating isobaric tandem mass tag (TMT) labels into the MSP-MS workflow. This method, called quantitative multiplex substrate profiling by mass spectrometry (qMSP-MS), minimizes experimental and instrument-derived variance while improving the throughput of the assay. Furthermore, by labeling samples at multiple time intervals, it is possible to accurately quantify peptides and calculate turnover rates of each proteolytic event (Figure 8). To validate the qMSP-MS workflow, the substrate specificities of papain, HiGlpG, PsAarA and lung cancer secretions were characterized (Lapek et al., 2019).
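One common way to turn a time course of substrate abundance into a rate is a pseudo-first-order fit, sketched below. This is an illustrative assumption rather than the published qMSP-MS analysis: it presumes that the substrate concentration is well below its Km, so that the fraction remaining decays exponentially with an observed rate constant proportional to kcat/Km and the enzyme concentration. The time points and function names are hypothetical.

```python
import math

def first_order_rate(times, fraction_remaining):
    """Least-squares fit of -ln(fraction remaining) = k * t through the origin."""
    numerator = sum(t * -math.log(f) for t, f in zip(times, fraction_remaining))
    denominator = sum(t * t for t in times)
    return numerator / denominator

def half_life(k):
    """Time for half of the substrate to be consumed at rate constant k."""
    return math.log(2) / k

# Hypothetical time points (minutes) and noise-free abundances generated with k = 0.02 per minute.
times = [15, 60, 240]
fractions = [math.exp(-0.02 * t) for t in times]

k = first_order_rate(times, fractions)
print(round(k, 4))             # 0.02
print(round(half_life(k), 1))  # 34.7
```

Ranking substrates by such rate constants, rather than by their presence or absence at a single time point, is what distinguishes preferred cleavage sites from those that are hydrolyzed only slowly.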

Other related methods to search for protease substrates

Besides the synthetic and biological approaches described above, peptide libraries can also be generated by processing native proteomes to produce fragments that can then be used to map substrate specificity by MS. Terminal amine isotopic labeling of substrates (TAILS) applies quantitative proteomic methods to compare the N-terminal fragments of proteins in a proteome sample before and after addition of a target protease in order to identify native cut sites (Kleifeld et al., 2010). Proteomic identification of cleavage sites (PICS) quantifies the prime side of protein sequences generated by protease hydrolysis and searches for the non-prime portion of the discovered protein fragment to simultaneously identify both the prime- and non-prime-side specificities of individual protease targets (Schilling et al., 2011). Some examples of protease substrates identified using proteomics-based methods are summarized in Table 1.
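The database step of a PICS-style analysis, in which the non-prime side is inferred from an identified prime-side fragment, can be sketched as a simple sequence lookup. The protein sequence, fragment and function name below are invented for illustration; a real analysis searches a full proteome database and must handle fragments that match more than one protein.

```python
def infer_nonprime(protein, prime_fragment, n=4):
    """Return the n residues that precede an identified prime-side fragment.

    The fragment's first residue is taken as P1', so the residues immediately
    upstream in the parent sequence are the non-prime side (P1 is the last one).
    Returns None if the fragment is absent or starts at the protein N-terminus.
    """
    start = protein.find(prime_fragment)
    if start <= 0:
        return None
    return protein[max(0, start - n):start]

protein = "MKTAYIAKQRQISFVKSHFSRQ"  # hypothetical parent sequence
print(infer_nonprime(protein, "ISFVKSH"))  # KQRQ, so P1 = Q and P1' = I
print(infer_nonprime(protein, "MKTA"))     # None: no upstream residues
```

Aligning many such inferred sites at the scissile bond yields the position-specific residue frequencies from which prime- and non-prime-side preferences are read.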

Conclusions and future perspectives

Advances in protease substrate library design and synthesis, including chemical and biological approaches, have greatly aided in the development of selective protease substrates and inhibitors. These peptides and small molecules have become powerful tools to study the roles of proteases in complex biological contexts including their mechanisms of action, regulation, processing and their association with specific human disease states. Furthermore, innovations in substrate library generation, screening and deconvolution techniques have accelerated the identification of native cellular protease substrates. However, it is still challenging to develop specific substrates that target a single protease without being recognized and cleaved by other proteases sharing similar substrate recognition preferences. In addition to the sequence of the peptide chains, the conformation that the substrate can adopt plays a crucial role in determining substrate specificity. This is largely due to the intrinsic flexibility of linear polypeptide chains. The flexibility around Cα drives linear polypeptide chains to adopt conformations influenced by disulfide bonding and hydrophobic restraints. On the other hand, the flexibility of linear peptide chains enables these substrates to fit into defined specificity pockets of different proteases. Cyclizing polypeptide chains to generate rigid conformations has been shown to be a promising strategy to reduce flexibility and increase target binding affinity and specificity (Heinis et al., 2009; Angelini et al., 2012; Baeriswyl et al., 2012; Chen et al., 2013; Chen et al., 2014). Cyclic peptides have been demonstrated to be potent protease inhibitors that are less prone to nonspecific proteolytic degradation, resulting in increased bioavailability and improved drug-like properties. Due to the difficulties of generating cyclic peptide libraries and sequencing cyclic peptides with tandem MS methods, there are currently no synthetic cyclic peptide libraries for the screening and discovery of protease substrates. However, current biological display methods are being engineered to allow cyclic and bicyclic peptide display, which should help to facilitate future screening of these potentially valuable scaffolds as substrates that can be converted into inhibitors with good pharmacological properties (Maola et al., 2019; Wang et al., 2019). This review has hopefully provided some insight into the current synthetic and biological methods used to generate highly diverse substrates for mapping protease substrate specificity, and has also highlighted the potential application of cyclic peptides in generating potent and specific probes with improved bioavailability. It is likely that future advances in these methods will lead to a further expansion of the toolbox of reagents for the study and therapeutic targeting of proteases.

Acknowledgments: This work was funded by NIH grant R01 EB026285 02 (Funder Id: http://dx.doi.org/10.13039/100000002) (to M.B.), Swiss National Science Foundation Postdoc. Mobility fellowship P2ELP3_155323 P300PB_164725 (to S.C.), Stanford ChEM-H Chemistry/Biology Interface Predoctoral Training Program and NSF Graduate Research Fellowship Grant DGE-114747 (to J.J.Y.). Stanford University is also acknowledged.

References

Acharya, A.N., Ostresh, J.M., and Houghten, R.A. (2002). Determination of isokinetic ratios necessary for equimolar incorporation of carboxylic acids in the solid-phase synthesis of mixture-based combinatorial libraries. Biopolymers 65, 32–39.

Figure 8: Schematic illustration of quantitative multiplex substrate profiling by mass spectrometry (qMSP-MS). A mixture of synthetic peptide libraries is treated with and without the protease of interest. Substrate peptides that are hydrolyzed expose fresh free amino groups at the N-terminus, which are further labeled with different tandem mass tags (TMT) over various timepoints. After being subjected to tandem mass spectrometric analysis, the full-length peptide substrates and their hydrolysis fragments are analyzed to reveal the sequence of the substrate peptides. The substrate sequence is resynthesized as a fluorescently quenched peptide and treated with the protease for validation of the peptide sequence.

Agard, N.J., Maltby, D., and Wells, J.A. (2010). Inflammatory stimuli regulate caspase substrate profiles. Mol. Cell. Proteomics 9, 880–893.

Agard, N.J., Mahrus, S., Trinidad, J.C., Lynn, A., Burlingame, A.L., and Wells, J.A. (2012). Global kinetic analysis of proteolysis via quantitative targeted proteomics. Proc. Natl. Acad. Sci. U.S.A. 109, 1913–1918.

Angelini, A., Cendron, L., Chen, S., Touati, J., Winter, G., Zanotti, G., and Heinis, C. (2012). Bicyclic peptide inhibitor reveals large contact interface with a protease target. ACS Chem. Biol. 7, 817–821.

Arutyunova, E., Jiang, Z., Yang, J., Kulepa, A.N., Young, H.S., Verhelst, S., O’Donoghue, A.J., and Lemieux, M.J. (2018). An internally quenched peptide as a new model substrate for rhomboid intramembrane proteases. Biol. Chem. 399, 1389–1397.

Baeriswyl, V., Rapley, H., Pollaro, L., Stace, C., Teufel, D., Walker, E., Chen, S., Winter, G., Tite, J., and Heinis, C. (2012). Bicyclic peptides with optimized ring size inhibit human plasma kallikrein and its orthologues while sparing paralogous proteases. ChemMedChem 7, 1173–1176.

Bauvois, B. (2004). Transmembrane proteases in cell growth and invasion: new contributors to angiogenesis? Oncogene 23, 317–329.

Beekman, C., Jiang, Z., Suzuki, B.M., Palmer, J.M., Lindner, D.L., O’Donoghue, A.J., Knudsen, G.M., and Bennett, R.J. (2018). Characterization of PdCP1, a serine carboxypeptidase from Pseudogymnoascus destructans, the causal agent of White-nose Syndrome. Biol. Chem. 399, 1375–1388.

Behrendt, R., White, P., and Offer, J. (2016). Advances in Fmoc solid-phase peptide synthesis. J. Pept. Sci. 22, 4–27.

Brik, A. and Wong, C.H. (2003). HIV-1 protease: mechanism and drug discovery. Org. Biomol. Chem. 1, 5–14.

Byzia, A., Haeggstrom, J.Z., Salvesen, G.S., and Drag, M. (2014). A remarkable activity of human leukotriene A4 hydrolase (LTA4H) toward unnatural amino acids. Amino Acids 46, 1313–1320.

Byzia, A., Szeffler, A., Kalinowski, L., and Drag, M. (2016). Activity profiling of aminopeptidases in cell lysates using a fluorogenic substrate library. Biochimie 122, 31–37.

Caberoy, N.B., Alvarado, G., and Li, W. (2011). Identification of calpain substrates by ORF phage display. Molecules 16, 1739–1748.

Capek, P., Kirkconnell, K.S., and Dickerson, T.J. (2010). A bacteriophage-based platform for rapid trace detection of proteases. J. Am. Chem. Soc. 132, 13126–13128.

Chen, S., Gfeller, D., Buth, S.A., Michielin, O., Leiman, P.G., and Heinis, C. (2013). Improving binding affinity and stability of peptide ligands by substituting glycines with D-amino acids. Chembiochem 14, 1316–1322.

Chen, S., Bertoldo, D., Angelini, A., Pojer, F., and Heinis, C. (2014). Peptide ligands stabilized by small molecules. Angew. Chem. Int. Ed. 53, 1602–1606.

Choe, Y., Leonetti, F., Greenbaum, D.C., Lecaille, F., Bogyo, M., Bromme, D., Ellman, J.A., and Craik, C.S. (2006). Substrate profiling of cysteine proteases using a combinatorial peptide library identifies functionally unique specificities. J. Biol. Chem. 281, 12824–12832.

Clackson, T., Hoogenboom, H.R., Griffiths, A.D., and Winter, G. (1991). Making antibody fragments using phage display libraries. Nature 352, 624–628.

Corvo, I., O’Donoghue, A.J., Pastro, L., Pi-Denis, N., Eroy-Reveles, A., Roche, L., McKerrow, J.H., Dalton, J.P., Craik, C.S., Caffrey, C.R., et al. (2013). Dissecting the active site of the collagenolytic cathepsin L3 protease of the invasive stage of Fasciola hepatica. PLoS Negl. Trop. Dis. 7, e2269.

Crawford, E.D., Seaman, J.E., Agard, N., Hsu, G.W., Julien, O., Mahrus, S., Nguyen, H., Shimbo, K., Yoshihara, H.A., Zhuang, M., et al. (2013). The DegraBase: a database of proteolysis in healthy and apoptotic human cells. Mol. Cell. Proteomics 12, 813–824.

Dahms, S.O., Demir, F., Huesgen, P.F., Thorn, K., and Brandstetter, H. (2019). Sirtilins – the new old members of the vitamin K-dependent coagulation factor family. J. Thromb. Haemost. 17, 470–481.

Damalanka, V.C., Han, Z., Karmakar, P., O’Donoghue, A.J., La Greca, F., Kim, T., Pant, S.M., Helander, J., Klefstrom, J., Craik, C.S., et al. (2019). Discovery of selective matriptase and hepsin serine protease inhibitors: useful chemical tools for cancer cell biology. J. Med. Chem. 62, 480–490.

de Vries, L.E., Sanchez, M.I., Groborz, K., Kuppens, L., Poreba, M., Lehmann, C., Nevins, N., Withers-Martinez, C., Hirst, D.J., Yuan, F., et al. (2019). Characterization of P. falciparum dipeptidyl aminopeptidase 3 specificity identifies differences in amino acid preferences between peptide-based substrates and covalent inhibitors. FEBS J. 286, 3998–4023.

Debela, M., Magdolen, V., Skala, W., Elsasser, B., Schneider, E.L., Craik, C.S., Biniossek, M.L., Schilling, O., Bode, W., Brandstetter, H., et al. (2018). Structural determinants of specificity and regulation of activity in the allosteric loop network of human KLK8/neuropsin. Sci. Rep. 8, 10705.

Deperthes, D. (2002). Phage display substrate: a blind method for determining protease specificity. Biol. Chem. 383, 1107–1112.

Dix, M.M., Simon, G.M., and Cravatt, B.F. (2008). Global mapping of the topography and magnitude of proteolytic events in apoptosis. Cell 134, 679–691.

Drag, M., Bogyo, M., Ellman, J.A., and Salvesen, G.S. (2010). Aminopeptidase fingerprints, an integrated approach for identification of good substrates and optimal inhibitors. J. Biol. Chem. 285, 3310–3318.

Dydio, P., Key, H.M., Hayashi, H., Clark, D.S., and Hartwig, J.F. (2017). Chemoselective, enzymatic C–H bond amination catalyzed by a cytochrome P450 containing an Ir(Me)-PIX cofactor. J. Am. Chem. Soc. 139, 1750–1753.

Edgington, L.E., van Raam, B.J., Verdoes, M., Wierschem, C., Salvesen, G.S., and Bogyo, M. (2012). An optimized activity-based probe for the study of caspase-6 activation. Chem. Biol. 19, 340–352.

Ekino, K., Yonei, S., Oyama, H., Oka, T., Nomura, Y., and Shin, T. (2018). Cloning, purification, and characterization of tripeptidyl peptidase from Streptomyces herbaricolor TY-21. Appl. Biochem. Biotechnol. 184, 239–252.

Enoksson, M., Li, J., Ivancic, M.M., Timmer, J.C., Wildfang, E., Eroshkin, A., Salvesen, G.S., and Tao, W.A. (2007). Identification of proteolytic cleavage sites by quantitative proteomics. J. Proteome Res. 6, 2850–2858.

Frasch, A.P., Bouvier, L.A., Oppenheimer, F.M., Juliano, M.A., Juliano, L., Carmona, A.K., Cazzulo, J.J., and Niemirowicz, G.T. (2018). Substrate specificity profiling of M32 metallocarboxypeptidases from Trypanosoma cruzi and Trypanosoma brucei. Mol. Biochem. Parasitol. 219, 10–16.

Furka, A., Sebestyen, F., Asgedom, M., and Dibo, G. (1991). General method for rapid synthesis of multicomponent peptide mixtures. Int. J. Pept. Protein Res. 37, 487–493.

Gersch, M., Stahl, M., Poreba, M., Dahmen, M., Dziedzic, A., Drag, M., and Sieber, S.A. (2016). Barrel-shaped ClpP proteases display attenuated cleavage specificities. ACS Chem. Biol. 11, 389–399.

Gosalia, D.N. and Diamond, S.L. (2003). Printing chemical libraries on microarrays for fluid phase nanoliter reactions. Proc. Natl. Acad. Sci. U.S.A. 100, 8721–8726.

Gosalia, D.N., Salisbury, C.M., Ellman, J.A., and Diamond, S.L. (2005a). High throughput substrate specificity profiling of serine and cysteine proteases using solution-phase fluorogenic peptide microarrays. Mol. Cell. Proteomics 4, 626–636.

Gosalia, D.N., Salisbury, C.M., Maly, D.J., Ellman, J.A., and Diamond, S.L. (2005b). Profiling serine protease substrate specificity with solution phase fluorogenic peptide microarrays. Proteomics 5, 1292–1298.

Groborz, K., Kolt, S., Kasperkiewicz, P., and Drag, M. (2019). Internally quenched fluorogenic substrates with unnatural amino acids for cathepsin G investigation. Biochimie 166, 103–111.

Gruba, N., Rodriguez Martinez, J.I., Grzywa, R., Wysocka, M., Skorenski, M., Burmistrz, M., Lecka, M., Lesner, A., Sienczyk, M., and Pyrc, K. (2016). Substrate profiling of Zika virus NS2B-NS3 protease. FEBS Lett. 590, 3459–3468.

Gruba, N., Bielecka, E., Wysocka, M., Wojtysiak, A., Brzezinska-Bodal, M., Sychowska, K., Kalinska, M., Magoch, M., Pecak, A., Falkowski, K., et al. (2019). Development of chemical tools to monitor human kallikrein 13 (KLK13) activity. Int. J. Mol. Sci. 20, 1557.

Harris, J.L., Backes, B.J., Leonetti, F., Mahrus, S., Ellman, J.A., and Craik, C.S. (2000). Rapid and general profiling of protease specificity by using combinatorial fluorogenic substrate libraries. Proc. Natl. Acad. Sci. U.S.A. 97, 7754–7759.

Heinis, C., Rutherford, T., Freund, S., and Winter, G. (2009). Phage-encoded combinatorial chemical libraries based on bicyclic peptides. Nat. Chem. Biol. 5, 502–507.

Impens, F., Colaert, N., Helsens, K., Ghesquiere, B., Timmerman, E., De Bock, P.J., Chain, B.M., Vandekerckhove, J., and Gevaert, K. (2010). A quantitative proteomics design for systematic identification of protease cleavage events. Mol. Cell. Proteomics 9, 2327–2333.

Jagdeo, J.M., Dufour, A., Klein, T., Solis, N., Kleifeld, O., Kizhakkedathu, J., Luo, H.L., Overall, C.M., and Jan, E. (2018). N-terminomics TAILS identifies host cell substrates of poliovirus and coxsackievirus B3 3C proteinases that modulate virus infection. J. Virol. 92, 23.

Jamali, H., Khan, H.A., Stringer, J.R., Chowdhury, S., and Ellman, J.A. (2015). Identification of multiple structurally distinct, nonpeptidic small molecule inhibitors of protein arginine deiminase 3 using a substrate-based fragment method. J. Am. Chem. Soc. 137, 3616–3621.

Jefferson, T., Keller, U.A.D., Bellac, C., Metz, V.V., Broder, C., Hedrich, J., Ohler, A., Maier, W., Magdolen, V., Sterchi, E., et al. (2013). The substrate degradome of meprin metalloproteases reveals an unexpected proteolytic link between meprin β and ADAM10. Cell. Mol. Life Sci. 70, 309–333.

Julien, O. and Wells, J.A. (2017). Caspases and their substrates. Cell Death Differ. 24, 1380–1389.

Julien, O., Kampmann, M., Bassik, M.C., Zorn, J.A., Venditto, V.J., Shimbo, K., Agard, N.J., Shimada, K., Rheingold, A.L., Stockwell, B.R., et al. (2014). Unraveling the mechanism of cell death induced by chemical fibrils. Nat. Chem. Biol. 10, 969–976.

Julien, O., Zhuang, M., Wiita, A.P., O’Donoghue, A.J., Knudsen, G.M., Craik, C.S., and Wells, J.A. (2016). Quantitative MS-based enzymology of caspases reveals distinct protein substrate specificities, hierarchies, and cellular roles. Proc. Natl. Acad. Sci. U.S.A. 113, E2001–E2010.

Kara, E., Manna, D., Loset, G.A., Schneider, E.L., Craik, C.S., and Kanse, S. (2017). Analysis of the substrate specificity of Factor VII activating protease (FSAP) and design of specific and sensitive peptide substrates. Thromb. Haemost. 117, 1750–1760.

Kasperkiewicz, P., Poreba, M., Snipas, S.J., Parker, H., Winterbourn, C.C., Salvesen, G.S., and Drag, M. (2014). Design of ultrasensitive probes for human neutrophil elastase through hybrid combinatorial substrate library profiling. Proc. Natl. Acad. Sci. U.S.A. 111, 2518–2523.

Kasperkiewicz, P., Poreba, M., Snipas, S.J., Lin, S.J., Kirchhofer, D., Salvesen, G.S., and Drag, M. (2015). Design of a selective substrate and activity based probe for human neutrophil serine protease 4. PLoS One 10, e0132818.

Kasperkiewicz, P., Poreba, M., Groborz, K., and Drag, M. (2017). Emerging challenges in the design of selective substrates, inhibitors and activity-based probes for indistinguishable proteases. FEBS J. 284, 1518–1539.

Kasperkiewicz, P., Kolt, S., Janiszewski, T., Groborz, K., Poreba, M., Snipas, S.J., Salvesen, G.S., and Drag, M. (2018). Determination of extended substrate specificity of the MALT1 as a strategy for the design of potent substrates and activity-based probes. Sci. Rep. 8, 15998.

Keller, U.A.D., Prudova, A., Gioia, M., Butler, G.S., and Overall, C.M. (2010). A statistics-based platform for quantitative N-terminome analysis and identification of protease cleavage products. Mol. Cell. Proteomics 9, 912–927.

King, S.L., Goth, C.K., Eckhard, U., Joshi, H.J., Haue, A.D., Vakhrushev, S.Y., Schjoldager, K.T., Overall, C.M., and Wandall, H.H. (2018). TAILS N-terminomics and proteomics reveal complex regulation of proteolytic cleavage by O-glycosylation. J. Biol. Chem. 293, 7629–7644.

Kleifeld, O., Doucet, A., auf dem Keller, U., Prudova, A., Schilling, O., Kainthan, R.K., Starr, A.E., Foster, L.J., Kizhakkedathu, J.N., and Overall, C.M. (2010). Isotopic labeling of terminal amines in complex samples identifies protein N-termini and protease cleavage products. Nat. Biotechnol. 28, 281–288.

Kretz, C.A., Tomberg, K., Van Esbroeck, A., Yee, A., and Ginsburg, D. (2018). High throughput protease profiling comprehensively defines active site specificity for thrombin and ADAMTS13. Sci. Rep. 8, 2788.

Lam, K.S., Salmon, S.E., Hersh, E.M., Hruby, V.J., Kazmierski, W.M., and Knapp, R.J. (1991). A new type of synthetic peptide library for identifying ligand-binding activity. Nature 354, 82–84.

Lapek Jr, J.D., Jiang, Z., Wozniak, J.M., Arutyunova, E., Wang, S.C., Lemieux, M.J., Gonzalez, D.J., and O’Donoghue, A.J. (2019). Quantitative multiplex substrate profiling of peptidases by mass spectrometry. Mol. Cell. Proteomics 18, 968–981.

Leiting, B., Pryor, K.D., Wu, J.K., Marsilio, F., Patel, R.A., Craik, C.S., Ellman, J.A., Cummings, R.T., and Thornberry, N.A. (2003). Catalytic properties and inhibition of proline-specific dipeptidyl peptidases II, IV and VII. Biochem. J. 371, 525–532.

Lentz, C.S., Ordonez, A.A., Kasperkiewicz, P., La Greca, F., O’Donoghue, A.J., Schulze, C.J., Powers, J.C., Craik, C.S., Drag, M., Jain, S.K., et al. (2016). Design of selective substrates and activity-based probes for hydrolase important for pathogenesis 1 (HIP1) from Mycobacterium tuberculosis. ACS Infect. Dis. 2, 807–815.

Lentz, C.S., Sheldon, J.R., Crawford, L.A., Cooper, R., Garland, M., Amieva, M.R., Weerapana, E., Skaar, E.P., and Bogyo, M. (2018). Identification of a S. aureus virulence factor by activity-based protein profiling (ABPP). Nat. Chem. Biol. 14, 609–617.

Leontovyc, A., Ulrychova, L., O’Donoghue, A.J., Vondrasek, J., Maresova, L., Hubalek, M., Fajtova, P., Chanova, M., Jiang, Z., Craik, C.S., et al. (2018). SmSP2: a serine protease secreted by the blood fluke pathogen Schistosoma mansoni with anti-hemostatic properties. PLoS Negl. Trop. Dis. 12, e0006446.

Li, J., Lim, S.P., Beer, D., Patel, V., Wen, D., Tumanut, C., Tully, D.C., Williams, J.A., Jiricek, J., Priestle, J.P., et al. (2005). Functional profiling of recombinant NS3 proteases from all four serotypes of dengue virus using tetrapeptide and octapeptide substrate libraries. J. Biol. Chem. 280, 28766–28774.

Li, H., O’Donoghue, A.J., van der Linden, W.A., Xie, S.C., Yoo, E., Foe, I.T., Tilley, L., Craik, C.S., da Fonseca, P.C., and Bogyo, M. (2016). Structure- and function-based design of Plasmodium-selective proteasome inhibitors. Nature 530, 233–236.

Li, Q., Yi, L., Hoi, K.H., Marek, P., Georgiou, G., and Iverson, B.L. (2017). Profiling protease specificity: combining Yeast ER Sequestration Screening (YESS) with next generation sequencing. ACS Chem. Biol. 12, 510–518.

Li, C.Y., de Veer, S.J., Law, R.H.P., Whisstock, J.C., Craik, D.J., and Swedberg, J.E. (2019). Characterising the subsite specificity of urokinase-type plasminogen activator and tissue-type plasminogen activator using a sequence-defined peptide aldehyde library. Chembiochem 20, 46–50.

Lopez-Otin, C. and Bond, J.S. (2008). Proteases: multifunctional enzymes in life and disease. J. Biol. Chem. 283, 30433–30437.

Mahrus, S., Trinidad, J.C., Barkan, D.T., Sali, A., Burlingame, A.L., and Wells, J.A. (2008). Global sequencing of proteolytic cleavage sites in apoptosis by specific labeling of protein N termini. Cell 134, 866–876.

Maola, K., Wilbs, J., Touati, J., Sabisz, M., Kong, X.D., Baumann, A., Deyle, K., and Heinis, C. (2019). Engineered peptide macrocycles can inhibit matrix metalloproteinases with high selectivity. Angew. Chem. Int. Ed. 58, 11801–11805.

Marcondes, M.F., Alves, F.M., Assis, D.M., Hirata, I.Y., Juliano, L., Oliveira, V., and Juliano, M.A. (2015). Substrate specificity of mitochondrial intermediate peptidase analysed by a support-bound peptide library. FEBS Open Bio 5, 429–436.

Marshall, N.C., Klein, T., Thejoe, M., von Krosigk, N., Kizhakkedathu, J., Finlay, B.B., and Overall, C.M. (2018). Global profiling of proteolysis from the mitochondrial amino terminome during early intrinsic apoptosis prior to caspase-3 activation. J. Proteome Res. 17, 4279–4296.

Mason, S.D. and Joyce, J.A. (2011). Proteolytic networks in cancer. Trends Cell Biol. 21, 228–237.

Matskevich, A.A. and Moelling, K. (2008). Stimuli-dependent cleavage of Dicer during apoptosis. Biochem. J. 412, 527–534.

Matthews, D.J. and Wells, J.A. (1993). Substrate phage: selection of protease substrates by monovalent phage display. Science 260, 1113–1117.

Merrifield, R.B. (1985). Solid-phase synthesis (Nobel lecture). Angew. Chem. Int. Ed. 24, 799–810.

Nguyen, M.T.N., Shema, G., Zahedi, R.P., and Verhelst, S.H.L. (2018). Protease specificity profiling in a pipet tip using ‘charge-synchronized’ proteome-derived peptide libraries. J. Proteome Res. 17, 1923–1933.

O’Donoghue, A.J., Eroy-Reveles, A.A., Knudsen, G.M., Ingram, J., Zhou, M., Statnekov, J.B., Greninger, A.L., Hostetter, D.R., Qu, G., Maltby, D.A., et al. (2012). Global identification of peptidase specificity by multiplex substrate profiling. Nat. Methods 9, 1095–1100.

O’Donoghue, A.J., Jin, Y., Knudsen, G.M., Perera, N.C., Jenne, D.E., Murphy, J.E., Craik, C.S., and Hermiston, T.W. (2013). Global substrate profiling of proteases in human neutrophil extracellular traps reveals consensus motif predominantly contributed by elastase. PLoS One 8, e75141.

O’Donoghue, A.J., Knudsen, G.M., Beekman, C., Perry, J.A., Johnson, A.D., DeRisi, J.L., Craik, C.S., and Bennett, R.J. (2015). Destructin-1 is a collagen-degrading endopeptidase secreted by Pseudogymnoascus destructans, the causative agent of white-nose syndrome. Proc. Natl. Acad. Sci. U.S.A. 112, 7478–7483.

Ostresh, J.M., Winkle, J.H., Hamashin, V.T., and Houghten, R.A. (1994). Peptide libraries: determination of relative reaction rates of protected amino acids in competitive couplings. Biopolymers 34, 1681–1689.

Perona, J.J. and Craik, C.S. (1997). Evolutionary divergence of substrate specificity within the chymotrypsin-like serine protease fold. J. Biol. Chem. 272, 29987–29990.

Pethe, M.A., Rubenstein, A.B., and Khare, S.D. (2019). Data-driven supervised learning of a viral protease specificity landscape from deep sequencing and molecular simulations. Proc. Natl. Acad. Sci. U.S.A. 116, 168–176.

Ponder, E.L., Albrow, V.E., Leader, B.A., Bekes, M., Mikolajczyk, J., Fonovic, U.P., Shen, A., Drag, M., Xiao, J., Deu, E., et al. (2011). Functional characterization of a SUMO deconjugating protease of Plasmodium falciparum using newly identified small molecule inhibitors. Chem. Biol. 18, 711–721.

Pop, C. and Salvesen, G.S. (2009). Human caspases: activation, specificity, and regulation. J. Biol. Chem. 284, 21777–21781.

Poreba, M. and Drag, M. (2010). Current strategies for probing substrate specificity of proteases. Curr. Med. Chem. 17, 3968–3995.

Poreba, M., McGowan, S., Skinner-Adams, T.S., Trenholme, K.R., Gardiner, D.L., Whisstock, J.C., To, J., Salvesen, G.S., Dalton, J.P., and Drag, M. (2012). Fingerprinting the substrate specificity of M1 and M17 aminopeptidases of human malaria, Plasmodium falciparum. PLoS One 7, e31938.

Poreba, M., Mihelic, M., Krai, P., Rajkovic, J., Krezel, A., Pawelczak, M., Klemba, M., Turk, D., Turk, B., Latajka, R., et al. (2014a). Unnatural amino acids increase activity and specificity of synthetic substrates for human and malarial cathepsin C. Amino Acids 46, 931–943.

Poreba, M., Szalek, A., Kasperkiewicz, P., and Drag, M. (2014b). Positional scanning substrate combinatorial library (PS-SCL) approach to define caspase substrate specificity. Methods Mol. Biol. 1133, 41–59.

Poreba, M., Salvesen, G.S., and Drag, M. (2017a). Synthesis of a HyCoSuL peptide substrate library to dissect protease substrate specificity. Nat. Protoc. 12, 2189–2214.

Poreba, M., Szalek, A., Rut, W., Kasperkiewicz, P., Rutkowska-Wlodarczyk, I., Snipas, S.J., Itoh, Y., Turk, D., Turk, B., Overall, C.M., et al. (2017b). Highly sensitive and adaptable fluorescence-quenched pair discloses the substrate specificity profiles in diverse protease families. Sci. Rep. 7, 43135.

Poreba, M., Rut, W., Vizovisek, M., Groborz, K., Kasperkiewicz, P., Finlay, D., Vuori, K., Turk, D., Turk, B., Salvesen, G.S., et al. (2018). Selective imaging of cathepsin L in breast cancer by fluorescent activity-based probes. Chem. Sci. 9, 2113–2129.

Poreba, M., Groborz, K., Navarro, M., Snipas, S.J., Drag, M., and Salvesen, G.S. (2019a). Caspase selective reagents for diagnosing apoptotic mechanisms. Cell Death Differ. 26, 229–244.

Poreba, M., Groborz, K., Vizovisek, M., Maruggi, M., Turk, D., Turk, B., Powis, G., Drag, M., and Salvesen, G.S. (2019b). Fluorescent probes towards selective cathepsin B detection and visualization in cancer cells and patient samples. Chem. Sci. 10, 8461–8477.

Poreba, M., Rut, W., Groborz, K., Snipas, S.J., Salvesen, G.S., and Drag, M. (2019c). Potent and selective caspase-2 inhibitor prevents MDM-2 cleavage in reversine-treated colon cancer cells. Cell Death Differ. doi: 10.1038/s41418-019-0329-2.

Porodko, A., Cirnski, A., Petrov, D., Raab, T., Paireder, M., Mayer, B., Maresch, D., Nika, L., Biniossek, M.L., Gallois, P., et al. (2018). The two cathepsin B-like proteases of Arabidopsis thaliana are closely related enzymes with discrete endopeptidase and carboxydipeptidase activities. Biol. Chem. 399, 1223–1235.

Prudova, A., auf dem Keller, U., Butler, G.S., and Overall, C.M. (2010). Multiplex N-terminome analysis of MMP-2 and MMP-9 substrate degradomes by iTRAQ-TAILS quantitative proteomics. Mol. Cell. Proteomics 9, 894–911.

Quesada, V., Ordonez, G.R., Sanchez, L.M., Puente, X.S., and Lopez-Otin, C. (2009). The degradome database: mammalian proteases and diseases of proteolysis. Nucleic Acids Res. 37, D239–D243.

Ramirez, M.L.G., Poreba, M., Snipas, S.J., Groborz, K., Drag, M., and Salvesen, G.S. (2018). Extensive peptide and natural protein substrate screens reveal that mouse caspase-11 has much narrower substrate specificity than caspase-1. J. Biol. Chem. 293, 7058–7067.

Rano, T.A., Timkey, T., Peterson, E.P., Rotonda, J., Nicholson, D.W., Becker, J.W., Chapman, K.T., and Thornberry, N.A. (1997). A combinatorial approach for determining protease specificities: application to interleukin-1β converting enzyme (ICE). Chem. Biol. 4, 149–155.

Rawlings, N.D., Morton, F.R., and Barrett, A.J. (2006). MEROPS: the peptidase database. Nucleic Acids Res. 34, D270–D272.

Rawls, K.A., Lang, P.T., Takeuchi, J., Imamura, S., Baguley, T.D., Grundner, C., Alber, T., and Ellman, J.A. (2009). Fragment-based discovery of selective inhibitors of the Mycobacterium tuberculosis protein tyrosine phosphatase PtpA. Bioorg. Med. Chem. Lett. 19, 6851–6854.

Rebello, K.M., McKerrow, J.H., Mota, E.M., O’Donoghue, A.J., and Neves-Ferreira, A.G.C. (2018). Activity profiling of peptidases in Angiostrongylus costaricensis first-stage larvae and adult worms. PLoS Negl. Trop. Dis. 12, e0006923.

Reichardt, S., Repper, D., Tuzhikov, A.I., Galiullina, R.A., Planas-Marques, M., Chichkova, N.V., Vartapetian, A.B., Stintzi, A., and Schaller, A. (2018). The tomato subtilase family includes several cell death-related proteinases with caspase specificity. Sci. Rep. 8, 10531.

Russell, D.L., Brown, H.M., and Dunning, K.R. (2015). ADAMTS proteases in fertility. Matrix Biol. 44–46, 54–63.

Rut, W., Kasperkiewicz, P., Byzia, A., Poreba, M., Groborz, K., and Drag, M. (2015). Recent advances and concepts in substrate specificity determination of proteases using tailored libraries of fluorogenic substrates with unnatural amino acids. Biol. Chem. 396, 329–337.

Rut, W., Poreba, M., Kasperkiewicz, P., Snipas, S.J., and Drag, M. (2018). Selective substrates and activity-based probes for imaging of the human constitutive 20S proteasome in cells and blood samples. J. Med. Chem. 61, 5222–5234.

Rut, W., Nielsen, N.V., Czarna, J., Poreba, M., Kanse, S.M., and Drag, M. (2019). Fluorescent activity-based probe for the selective detection of Factor VII activating protease (FSAP) in human plasma. Thromb. Res. 182, 124–132.

Salisbury, C.M., Maly, D.J., and Ellman, J.A. (2002). Peptide microarrays for the determination of protease substrate specificity. J. Am. Chem. Soc. 124, 14868–14870.

Sandersjoo, L., Jonsson, A., and Lofblom, J. (2017). Protease substrate profiling using bacterial display of self-blocking affinity proteins and flow-cytometric sorting. Biotechnol. J. 12, 1600365.

Schilling, O. and Overall, C.M. (2008). Proteome-derived, database-searchable peptide libraries for identifying protease cleavage sites. Nat. Biotechnol. 26, 685–694.

Schilling, O., Huesgen, P.F., Barre, O., auf dem Keller, U., and Overall, C.M. (2011). Characterization of the prime and non-prime active site specificities of proteases by proteome-derived peptide libraries and tandem mass spectrometry. Nat. Protoc. 6, 111–120.

Schilling, O., Biniossek, M.L., Mayer, B., Elsasser, B., Brandstetter, H., Goettig, P., Stenman, U.H., and Koistinen, H. (2018). Specificity profiling of human trypsin-isoenzymes. Biol. Chem. 399, 997–1007.

Scott, J.K. and Smith, G.P. (1990). Searching for peptide ligands with an epitope library. Science 249, 386–390.

Seo, M.Y. and Rhee, K. (2018). Caspase-mediated cleavage of the centrosomal proteins during apoptosis. Cell Death Dis. 9, 571.

Smith, G.P. (1985). Filamentous fusion phage: novel expression vectors that display cloned antigens on the virion surface. Science 228, 1315–1317.

Smith, L.K., Shah, R.R., and Cidlowski, J.A. (2010). Glucocorticoids modulate microRNA expression and processing during lymphocyte apoptosis. J. Biol. Chem. 285, 36698–36708.

Soares, A., Niedermaier, S., Faro, R., Loos, A., Manadas, B., Faro, C., Huesgen, P.F., Cheung, A.Y., and Simoes, I. (2019). An atypical aspartic protease modulates lateral root development in Arabidopsis thaliana. J. Exp. Bot. 70, 2157–2171.

Starr, A.E., Bellac, C.L., Dufour, A., Goebeler, V., and Overall, C.M. (2012). Biochemical characterization and N-terminomics analysis of leukolysin, the membrane-type 6 matrix metalloprotease (MMP25) chemokine and vimentin cleavages enhance cell migration and macrophage phagocytic activities. J. Biol. Chem. 287, 13382–13395.

Stennicke, H.R., Renatus, M., Meldal, M., and Salvesen, G.S. (2000). Internally quenched fluorescent peptide substrates disclose the subsite preferences of human caspases 1, 3, 6, 7 and 8. Biochem. J. 350, 563–568.

Thornberry, N.A., Rano, T.A., Peterson, E.P., Rasper, D.M., Timkey, T., Garcia-Calvo, M., Houtzager, V.M., Nordstrom, P.A., Roy, S., Vaillancourt, J.P., et al. (1997). A combinatorial approach defines specificities of members of the caspase family and granzyme B. Functional relationships established for key mediators of apoptosis. J. Biol. Chem. 272, 17907–17911.

Turk, V., Stoka, V., Vasiljeva, O., Renko, M., Sun, T., Turk, B., and Turk, D. (2012). Cysteine cathepsins: from structure, function and regulation to new frontiers. Biochim. Biophys. Acta 1824, 68–88.

Van Damme, P., Martens, L., Van Damme, J., Hugelier, K., Staes, A., Vandekerckhove, J., and Gevaert, K. (2005). Caspase-specific and nonspecific in vivo protein processing during Fas-induced apoptosis. Nat. Methods 2, 771–777.

van der Linden, W.A., Segal, E., Child, M.A., Byzia, A., Drag, M., and Bogyo, M. (2015). Design and synthesis of activity-based probes and inhibitors for bleomycin hydrolase. Chem. Biol. 22, 995–1001.

Verdoes, M., Edgington, L.E., Scheeren, F.A., Leyva, M., Blum, G., Weiskopf, K., Bachmann, M.H., Ellman, J.A., and Bogyo, M. (2012). A nonpeptidic cathepsin S activity-based probe for noninvasive optical imaging of tumor-associated macrophages. Chem. Biol. 19, 619–628.

Vizovisek, M., Vidmar, R., Drag, M., Fonovic, M., Salvesen, G.S., and Turk, B. (2018). Protease specificity: towards in vivo imaging applications and biomarker discovery. Trends Biochem. Sci. 43, 829–844.

Wang, X.S., Chen, P.C., Hampton, J.T., Tharp, J.M., Reed, C.A., Das, S.K., Wang, D.S., Hayatshahi, H.S., Shen, Y., Liu, J., et al. (2019). A genetically encoded, phage-displayed cyclic-peptide library. Angew. Chem. Int. Ed. 58, 15904–15909.

Ward, E.S., Gussow, D., Griffiths, A.D., Jones, P.T., and Winter, G. (1989). Binding activities of a repertoire of single immunoglobulin variable domains secreted from Escherichia coli. Nature 341, 544–546.

Weeks, A.M. and Wells, J.A. (2018). Engineering peptide ligase specificity by proteomic identification of ligation sites. Nat. Chem. Biol. 14, 50–57.

Wiita, A.P., Hsu, G.W., Lu, C.Y.M., Esensten, J.H., and Wells, J.A. (2014a). Circulating proteolytic signatures of chemotherapy-induced cell death in humans discovered by N-terminal labeling. Proc. Natl. Acad. Sci. U.S.A. 111, 7594–7599.

Wiita, A.P., Seaman, J.E., and Wells, J.A. (2014b). Global analysis of cellular proteolysis by selective enzymatic labeling of protein N-termini. Methods Enzymol. 544, 327–358.

Wildes, D. and Wells, J.A. (2010). Sampling the N-terminal proteome of human blood. Proc. Natl. Acad. Sci. U.S.A. 107, 4561–4566.

Winssinger, N., Damoiseaux, R., Tully, D.C., Geierstanger, B.H., Burdick, K., and Harris, J.L. (2004). PNA-encoded protease substrate microarrays. Chem. Biol. 11, 1351–1360.

Winter, M.B., La Greca, F., Arastu-Kapur, S., Caiazza, F., Cimermancic, P., Buchholz, T.J., Anderl, J.L., Ravalin, M., Bohn, M.F., Sali, A., et al. (2017). Immunoproteasome functions explained by divergence in cleavage specificity and regulation. eLife 6, e27364.

Wood, W.J., Patterson, A.W., Tsuruoka, H., Jain, R.K., and Ellman, J.A. (2005). Substrate activity screening: a fragment-based method for the rapid identification of nonpeptidic protease inhibitors. J. Am. Chem. Soc. 127, 15521–15527.

Wood, S.E., Sinsinbar, G., Gudlur, S., Nallani, M., Huang, C.F., Liedberg, B., and Mrksich, M. (2017). A bottom-up proteomic approach to identify substrate specificity of outer-membrane protease OmpT. Angew. Chem. Int. Ed. 56, 16531–16535.

Xu, J.H., Jiang, Z., Solania, A., Chatterjee, S., Suzuki, B., Lietz, C.B., Hook, V.Y.H., O’Donoghue, A.J., and Wolan, D.W. (2018). A commensal dipeptidyl aminopeptidase with specificity for N-terminal glycine degrades human-produced antimicrobial peptides in vitro. ACS Chem. Biol. 13, 2513–2521.

Yaron, A., Carmel, A., and Katchalski-Katzir, E. (1979). Intramolecularly quenched fluorogenic substrates for hydrolytic enzymes. Anal. Biochem. 95, 228–235.

Yoo, E., Stokes, B.H., de Jong, H., Vanaerschot, M., Kumar, T., Lawrence, N., Njoroge, M., Garcia, A., Van der Westhuyzen, R., Momper, J.D., et al. (2018). Defining the determinants of specificity of Plasmodium proteasome inhibitors. J. Am. Chem. Soc. 140, 11424–11437.

Zimmerman, M., Yurewicz, E., and Patel, G. (1976). A new fluorogenic substrate for chymotrypsin. Anal. Biochem. 70, 258–262.