www.sciencesignaling.org/cgi/content/full/2/98/ra76/DC1
Supplementary Materials for
Eukaryotic Protein Domains as Functional Units of Cellular Evolution
Jing Jin,* Xueying Xie, Chen Chen, Jin Gyoon Park, Chris Stark, D. Andrew James, Marina Olhovsky, Rune Linding, Yongyi Mao,* Tony Pawson*
*To whom correspondence should be addressed. E-mail: [email protected] (T.P.);
[email protected] (Y.M.); [email protected] (J.J.)
Published 24 November 2009, Sci. Signal. 2, ra76 (2009) DOI: 10.1126/scisignal.2000546
This PDF file includes:
Section S-I. Characterization of the functional relationships between RhoGEF/RhoGAP pairs Tuba/Rich1, and DEPDC1/P-Rex1.
Section S-II. Correlation between protein kinase regulatory and catalytic domains.
Section S-III. Web site for domain clubs across seven genomes.
Section S-IV. Supplementary methods.
Section S-V. Supplementary references.
Table S1. Domains found in common versus domains that are unique to SH2 domain proteins or SH3 domain proteins.
Fig. S1. Functional characterization of RhoGEF and RhoGAP protein pairs.
Fig. S2. Domain profiling of human protein tyrosine kinases (PTK) and protein tyrosine phosphatases (PTP).
Fig. S3. Domain profiling of the human kinome.
Fig. S4. Domain profiling across six eukaryotic kinomes.
Fig. S5. Domain club cut-off selection.
Fig. S6. Test of the robustness of the domain profiling method for applications in large data sets.
Fig. S7. A Web site for domain club analysis: example of Dicer1 and its DSRM domain.
Fig. S8. General procedure for spreading-on-graph (SOG) clustering in the analysis of the molecular environments of domain families.
Fig. S9. The interactomes of the 70 domain families employed in the analysis of domain-based functional compartments.
Fig. S10. An overview of the Royal Family of domains.
Fig. S11. Sequence characteristics of the mouse Piwi clade of the Argonaute proteins.
Fig. S12. An MS/MS spectrum of a tryptic peptide containing dimethylated (2me) Arg53 of endogenous Miwi protein from testis.
Section S-I. Characterization of the functional relationships between
RhoGEF/RhoGAP pairs Tuba/Rich1, and DEPDC1/P-Rex1.
Human proteins with RhoGEF and RhoGAP domains were clustered based on their
non-catalytic interaction domains (Fig. 2 and fig. S1A). We have explored whether co-
clustered GEFs and GAPs may have related functions by focusing on two clusters that
share a BAR domain or a DEP domain, respectively (Fig. 2, boxes B and C respectively).
BAR domains, present in GAPs (Rich1, SH3BP1 and KIAA0672) and GEFs
(Tuba/Dnmbp, FLJ41603 and ENSG00000188436) in Cluster B (blue box in fig. S1A),
can potentially bind phospholipids and curved membranes (1). Rich1 and related proteins
have an N-terminal BAR domain, a central GAP domain, and a C-terminal proline-rich
sequence with SH3 domain-binding sites, whereas Tuba has a BAR domain immediately
following the RhoGEF domain (fig. S1B). Tuba also has four N-terminal SH3 domains
which selectively bind dynamin, a GTPase involved in endocytosis, and two C-terminal
SH3 domains (fig. S1B), previously shown to recruit actin regulatory proteins, including
N-WASP, CR16, WAVE1, WIRE, PIR121, NAP1 and Ena/VASP proteins (2). Both
Rich1 and Tuba localize to cell-cell junctions, associate with proteins involved in
junction formation and cell polarity, and are active against the Cdc42 GTPase (3, 4). In
an analysis of binding partners for the C-terminal SH3 domains of Tuba, we identified a
number of known and novel targets involved in processes such as the formation of cell
junctions, cytoskeletal regulation and vesicle trafficking (fig. S1B). Some of these
proteins, including MUPP1, CD2AP and CIN85 also associate with Rich1 (fig. S1B,
asterisks) (4), directly or indirectly. These data suggest that Tuba and Rich1 impinge on
common aspects of cellular function, consistent with the notion that co-clustering based
on domain composition is indicative of a functional relationship.
Proteins in cluster C have GEF (P-Rex1 and DEPDC2/P-Rex2) or GAP (DEPDC1
and DEPDC1b) domains, in conjunction with a DEP domain (fig. S1C), which has been
variously implicated in membrane association and interactions with G protein coupled
receptors (5, 6). Of these, P-Rex1 has a Rac-specific GEF activity that is stimulated by
PI(3,4,5)P3 and Gβγ signalling (7). The GAP domain of DEPDC1 lacks an arginine
residue that is normally critical for GAP activity (not shown), and its precise role in
regulating Rho GTPases remains uncertain. It reportedly has a nuclear location (8), which
we have confirmed by transfection of Flag-tagged DEPDC1 into HEK293 cells (fig. S1C,
right), a finding that appears contrary to the involvement of other DEP domain proteins
with signalling at the plasma membrane. However, upon breakdown of the nuclear
envelope during mitosis, DEPDC1 becomes localized to the cell cortex (fig. S1C, right).
In addition, over-expression of P-Rex1, which is cytoplasmic, led to the relocalization of
DEPDC1 to the plasma membrane as judged by immunofluorescence, even in interphase
cells (fig. S1C, right). A similar recruitment of DEPDC1 from the nucleus to the
cytoplasm upon P-Rex1 over-expression was seen following cell fractionation (fig. S1C,
bottom). Over-expression of a distinct Rac-specific GEF, Tiam1, did not influence
DEPDC1 localization (data not shown). These results are consistent with the hypothesis
that ancillary interaction domains may target GEF and GAP proteins to related cellular
processes, although they do not necessarily imply that such GEF/GAP pairs work directly
in tandem.
Section S-II. Correlation between protein kinase regulatory and catalytic domains
The human genome encodes 518 protein kinases, which fall into ten classes based
on the sequence relationships and functional properties of their catalytic domains (9).
About half of these protein kinases also possess one or more interaction domains. These
are a common feature of some sub-groups, such as the TK, TKL and RGC kinase families
(fig. S3, top panel), and can have diverse functions, including regulating catalytic
activity, targeting the kinase to specific subcellular locations, and interacting with
activators and substrates. Some kinases (e.g., CK and CMGC family members) do not
possess intrinsic non-catalytic domains (fig. S3, top panel), but are often associated with
regulatory domain-based subunits; we have not considered such regulatory proteins in the
analysis outlined below.
We used domain profiling to explore whether multi-domain kinases can be grouped
based on their non-catalytic domains, and whether the resulting clusters are related to the
grouping obtained by sequence analysis of kinase domains. To this end, we analyzed all
annotated human protein kinases by hierarchical clustering, based on their domain
composition after removal of the common kinase domain (fig. S3, bottom panel). The
resulting matrix, with domains arrayed on the x-axis and kinases on the y-axis, reveals
groups of kinases that cluster because they possess related non-kinase domains (detailed
in panels a-f). Kinases are color-coded by their position on the kinome tree, which is
largely based on the sequences of their kinase domains. This analysis reveals a selective
relationship between kinase and interaction domains. For example, the tyrosine kinase
(TK) group largely separates from serine/threonine kinases, and forms two
clusters (d and e) that contain receptor tyrosine kinases (RTK) on the one hand (cluster d)
and cytoplasmic tyrosine kinases on the other (cluster e). The RTKs are co-clustered due
to the presence of domains such as Ig, FN3 and SAM, found in both their extracellular
and cytoplasmic regions, whereas the cytoplasmic tyrosine kinases are grouped through
the frequent presence of SH2, SH3, PH, Btk and B41/FERM domains. In the case of
serine/threonine kinases, the largest cluster (cluster c) is primarily formed by members of
the AGC family, which are mostly basophilic kinases that are co-clustered on the matrix
because they contain domains such as C1, C2, PH, PB1 and RGS.
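The domain profiling step described above can be illustrated with a minimal sketch. It assumes Jaccard distance between domain sets and average-linkage agglomerative clustering; the source states only that hierarchical clustering was used, so both choices are assumptions. The annotations are simplified toy examples rather than the data set analyzed in the paper, and the names `KINASE_DOMAINS`, `domain_profiles` and `average_linkage` are hypothetical.

```python
from itertools import combinations

# Toy, simplified domain annotations (illustrative only, not the paper's data).
KINASE_DOMAINS = {
    "SRC":   {"SH3", "SH2", "TyrKc"},
    "BTK":   {"PH", "BTK", "SH3", "SH2", "TyrKc"},
    "EPHA2": {"EPH_lbd", "FN3", "SAM", "TyrKc"},
    "EPHB2": {"EPH_lbd", "FN3", "SAM", "TyrKc"},
    "AKT1":  {"PH", "S_TKc"},
    "PKCA":  {"C1", "C2", "S_TKc"},
}
CATALYTIC = {"TyrKc", "S_TKc"}  # common kinase domains, removed before clustering


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def domain_profiles(annotations, catalytic):
    """Drop the catalytic domains so only non-catalytic domains remain."""
    return {name: domains - catalytic for name, domains in annotations.items()}


def average_linkage(profiles, n_clusters):
    """Merge the two closest clusters until n_clusters remain (assumed linkage)."""
    clusters = [[name] for name in sorted(profiles)]

    def dist(c1, c2):
        total = sum(jaccard_distance(profiles[x], profiles[y]) for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return [sorted(c) for c in clusters]


profiles = domain_profiles(KINASE_DOMAINS, CATALYTIC)
clusters = average_linkage(profiles, n_clusters=4)
```

In this toy input the two Eph receptors share an identical non-catalytic profile, and SRC and BTK are joined through their shared SH2 and SH3 domains, which mirrors the kind of grouping described for clusters d and e.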
We also conducted a co-clustering analysis using the non-catalytic domains of
human protein tyrosine kinase (PTK) and protein tyrosine phosphatase (PTP) groups.
We observed co-clusters of PTKs and PTPs due to the presence of common SH2 and
FERM domains (fig. S2). Similarly, receptor tyrosine kinases (RTKs) and
transmembrane PTPs co-cluster due to common domains such as FN3 and Ig family
members (fig. S2). It is less common for serine/threonine kinases and phosphatases to
share common domains (data not shown), potentially because both classes of enzymes
employ a diverse array of regulatory subunits (10).
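As a rough illustration of how shared and family-specific domains can be tabulated for two protein families, the sketch below uses plain set arithmetic. The annotations are simplified toy examples, not the paper's data, and the function name `compare_families` is hypothetical.

```python
# Toy, simplified non-catalytic domain annotations (illustrative only).
PTK = {
    "SRC":   {"SH3", "SH2"},
    "JAK2":  {"FERM", "SH2"},
    "EPHA2": {"EPH_lbd", "FN3", "SAM"},
}
PTP = {
    "SHP2/PTPN11": {"SH2"},
    "MEG1/PTPN4":  {"FERM", "PDZ"},
    "LAR/PTPRF":   {"IG", "FN3"},
}


def compare_families(family_a, family_b):
    """Return domains common to both families and those unique to each."""
    domains_a = set().union(*family_a.values())
    domains_b = set().union(*family_b.values())
    return {
        "common": domains_a & domains_b,
        "only_a": domains_a - domains_b,
        "only_b": domains_b - domains_a,
    }


result = compare_families(PTK, PTP)
```

The same comparison, applied to SH2 domain proteins and SH3 domain proteins, yields the kind of common-versus-unique listing given in table S1.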
Furthermore, we used domain profiling to cluster protein kinases from
Saccharomyces cerevisiae (Sc), Dictyostelium discoideum (Dd), Caenorhabditis elegans
(Ce), Drosophila melanogaster (Dm), Mus musculus (Mm) and Homo sapiens (Hs), based
on their non-catalytic domains (fig. S4, left panel). This shows the extent to which
kinases from different species contribute to the various domain-based clusters (detailed in
boxes a-i). As examples, clusters a and b (equivalent to clusters d and e in fig. S3) primarily
contain tyrosine kinases, which show marked expansions and contractions between
species. For example, receptor tyrosine kinases, most notably the Eph receptor TK group
with SAM and FN3 domains, have markedly increased from invertebrates to vertebrates
(box a). Cytoplasmic TKs appear in cluster b, which shows the presence of SH2-
containing kinases (Shk1-5) in Dictyostelium, each with an SH2 domain linked to a TKL
catalytic domain, a striking expansion of C. elegans kinases with SH2 and TK domains
(11), and an increase in the numbers of SH2-containing kinases with additional
interaction domains, such as SH3, in vertebrates. In contrast, yeast (Sc) and Dictyostelium
(Dd), both lacking a bona fide tyrosine kinase, are only represented in box a by Ste11
family kinases with a SAM domain linked to a serine/threonine kinase (STK) domain.
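The contribution of each species to a cluster can be tallied from kinase names that carry a species suffix, as in the labels of fig. S4 (for example `EphA2_Hs`). The following is a minimal sketch; the membership list is an illustrative subset chosen for this example, not the actual content of box a, and `species_counts` is a hypothetical helper.

```python
from collections import Counter


def species_counts(cluster_members):
    """Count kinases per species, reading the suffix after the last underscore."""
    counts = Counter()
    for name in cluster_members:
        _gene, sep, species = name.rpartition("_")
        counts[species if sep else "unknown"] += 1
    return counts


# Illustrative subset of labels (assumed membership, for demonstration only).
box_a_example = [
    "EphA2_Hs", "EphB2_Hs", "EphA2_Mm", "EphB2_Mm",
    "Eph_Dm", "vab-1_Ce", "STE11_Sc",
]
counts = species_counts(box_a_example)
```

Comparing such counts between invertebrate and vertebrate species is one simple way to express the expansions and contractions described above.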
The results indicate that several families of protein kinases can be similarly grouped
either by the sequence relationships of their kinase domains, or by unbiased hierarchical
clustering of their non-catalytic interaction domains. The results support a model in
which the catalytic and interaction domains of protein kinases have co-evolved to allow
appropriate physiological regulation of kinase activity and selection of specific
substrates.
Section S-III. Website for domain clubs across seven genomes – Readers can display
any individual domain file (domain terms defined in SMART and Pfam) from an
interactive website http://pawsonlab.mshri.on.ca/DomainClub/domainClub.php. The
website allows the reader to search by protein and domain names. An analysis of protein
Dicer and an evolutionary overview of its DSRM domain are presented as an example
(also in fig. S7).
Section S-IV. Supplementary methods
Experimental procedures for transfection, affinity chromatography, mass
spectrometry, immunofluorescence and nuclear fractionation
Miwi, mAgo1 and Tdrd1 cDNAs were derived from MGC:150072, 150309 and
72119 respectively. Miwi and mAgo1 protein-coding sequences were fused in-frame to an
N-terminal FLAG sequence in a pcDNA3 vector. Tdrd1 was expressed using a Creator
EGFP vector system (12). FLAG-M2 antibody was from Sigma-Aldrich; EGFP antibody
was from Abcam (ab290). For endogenous Miwi immunoprecipitation, approximately 0.4
g of 4-week-old mouse testis tissue was homogenized and lysed in buffer containing
0.5% NP40, 10 mM HEPES pH 7.5, 150 mM NaCl and 1 mM EDTA, and 2 μg of
anti-Hiwi/Miwi antibody (Abcam) was used. The immunoprecipitated Miwi protein was
separated by SDS-polyacrylamide gel electrophoresis, isolated from a gel band, and
subjected to standard proteolytic digestion using trypsin. The resulting peptides were
analyzed on a QSTAR Elite Hybrid LC/MS/MS System (Applied
Biosystems/MDS Sciex). Peptide identification (including modifications) was assisted by
the mass spectrometry software Protein Pilot (from BC Proteomic Network). Transfection of
HEK293T cells and co-immunoprecipitation assays have been described previously (13).
Full length human cDNA for Tuba/Dnmbp was derived from clone
DKFZp451J178. Recombinant GST fusion proteins containing Tuba SH3 domains 1-4
(residues 1-315) and SH3 domains 5-6 (residues 1276-1577) were expressed and purified
following the standard pGEX-4t-2 protocol (GE Healthcare Life Sciences). MYC-epitope
tagged P-Rex1 expression vector was a gift from Dr. Heidi Welch. Full-length human
DEPDC1 coding cDNA sequence was derived from MGC:70715, and then fused in-
frame to an N-terminal FLAG sequence in a pcDNA3 vector. Myc-9E10, Histone H1
(FL-219), Dynamin-H300, Mena-H50, WASP-H100, CIN85-H300 and CD2AP-B4
antibodies were from Santa Cruz; Zo-1 and MUPP1 antibodies were from Zymed.
General methods for protein identification from affinity chromatography using LC-
tandem mass spectrometry and methods for transfection and immunofluorescence
microscopy were described previously (13). Nuclear-cytoplasmic fractionation
was carried out following a modified protocol from abcam.com. Briefly, one
day after transfection, HEK293 cells were lysed on ice in buffer containing 0.05% NP40,
10 mM HEPES pH 7.5, 1.5 mM MgCl2, 10 mM KCl, 0.5 mM DTT. The supernatant
representing the cytoplasmic fraction was collected after low-speed centrifugation. To
extract a nuclear fraction, the pellet was resuspended in buffer containing 300 mM NaCl,
5 mM HEPES pH 7.9, 1.5 mM MgCl2, 0.2 mM EDTA, 0.5 mM DTT, 26% glycerol
(v/v).
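For readers preparing the cytoplasmic lysis buffer, the dilution arithmetic (C1 x V1 = C2 x V2) can be sketched as follows. The final concentrations are those given above; the stock concentrations and the 10 mL batch volume are hypothetical values chosen for illustration and do not come from the source.

```python
def stock_volume_ml(final_conc, stock_conc, total_volume_ml):
    """Volume of stock to add, from C1*V1 = C2*V2 (both concentrations in the same unit)."""
    if stock_conc <= 0 or final_conc > stock_conc:
        raise ValueError("stock must be positive and at least as concentrated as the target")
    return final_conc / stock_conc * total_volume_ml


# Final concentrations of the cytoplasmic lysis buffer, as stated in the text.
LYSIS_BUFFER = {
    "NP40 (%)": 0.05,
    "HEPES pH 7.5 (mM)": 10,
    "MgCl2 (mM)": 1.5,
    "KCl (mM)": 10,
    "DTT (mM)": 0.5,
}

# Hypothetical stock concentrations in matching units (assumptions, not from the source).
STOCKS = {
    "NP40 (%)": 10,
    "HEPES pH 7.5 (mM)": 1000,
    "MgCl2 (mM)": 1000,
    "KCl (mM)": 1000,
    "DTT (mM)": 1000,
}


def recipe(final, stocks, total_volume_ml):
    """Per-component stock volumes in mL, with water making up the remainder."""
    volumes = {
        name: stock_volume_ml(conc, stocks[name], total_volume_ml)
        for name, conc in final.items()
    }
    volumes["water"] = total_volume_ml - sum(volumes.values())
    return volumes


volumes = recipe(LYSIS_BUFFER, STOCKS, total_volume_ml=10.0)
```

The same function applies to the nuclear extraction buffer once its own stock concentrations are supplied.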
Section S-V. Supplementary references
1. B. J. Peter, H. M. Kent, I. G. Mills, Y. Vallis, P. J. Butler, P. R. Evans et al., BAR domains as sensors of membrane curvature: the amphiphysin BAR structure. Science 303, 495-499 (2004).
2. M. A. Salazar, A. V. Kwiatkowski, L. Pellegrini, G. Cestra, M. H. Butler, K. L. Rossman et al., Tuba, a novel protein containing bin/amphiphysin/Rvs and Dbl homology domains, links dynamin to regulation of the actin cytoskeleton. J Biol Chem 278, 49031-49043 (2003).
3. T. Otani, T. Ichii, S. Aono, M. Takeichi, Cdc42 GEF Tuba regulates the junctional configuration of simple epithelial cells. J Cell Biol 175, 135-146 (2006).
4. C. D. Wells, J. P. Fawcett, A. Traweger, Y. Yamanaka, M. Goudreault, K. Elder et al., A Rich1/Amot complex regulates the Cdc42 GTPase and apical-polarity proteins in epithelial cells. Cell 125, 535-548 (2006).
5. J. D. Axelrod, J. R. Miller, J. M. Shulman, R. T. Moon, N. Perrimon, Differential recruitment of Dishevelled provides signaling specificity in the planar cell polarity and Wingless signaling pathways. Genes Dev 12, 2610-2622 (1998).
6. M. Boutros, N. Paricio, D. I. Strutt, M. Mlodzik, Dishevelled activates JNK and discriminates between JNK pathways in planar polarity and wingless signaling. Cell 94, 109-118 (1998).
7. H. C. Welch, W. J. Coadwell, C. D. Ellson, G. J. Ferguson, S. R. Andrews, H. Erdjument-Bromage et al., P-Rex1, a PtdIns(3,4,5)P3- and Gbetagamma-regulated guanine-nucleotide exchange factor for Rac. Cell 108, 809-821 (2002).
8. M. Kanehira, Y. Harada, R. Takata, T. Shuin, T. Miki, T. Fujioka et al., Involvement of upregulation of DEPDC1 (DEP domain containing 1) in bladder carcinogenesis. Oncogene 26, 6448-6455 (2007).
9. G. Manning, D. B. Whyte, R. Martinez, T. Hunter, S. Sudarsanam, The protein kinase complement of the human genome. Science 298, 1912-1934 (2002).
10. D. M. Virshup, S. Shenolikar, From promiscuity to precision: protein phosphatases get a makeover. Mol Cell 33, 537-545 (2009).
11. G. D. Plowman, S. Sudarsanam, J. Bingham, D. Whyte, T. Hunter, The protein kinases of Caenorhabditis elegans: a model for signal transduction in multicellular organisms. Proc Natl Acad Sci U S A 96, 13603-13610 (1999).
12. K. Colwill, C. D. Wells, K. Elder, M. Goudreault, K. Hersi, S. Kulkarni et al., Modification of the Creator recombination system for proteomics applications--improved expression by addition of splice sites. BMC Biotechnol 6, 13 (2006).
13. J. Jin, F. D. Smith, C. Stark, C. D. Wells, J. P. Fawcett, S. Kulkarni et al., Proteomic, functional, and domain-based analysis of in vivo 14-3-3 binding proteins involved in cytoskeletal regulation and cellular organization. Curr Biol 14, 1436-1450 (2004).
table S1. Domains found in common (asterisks and color) versus domains that are unique to the human SH2 domain proteins or SH3 domain proteins.
Domains common to SH2 domain proteins and SH3 domain proteins (*):
B41/FERM*, BTK*, C1*, C2*, CH*, FABD*, FCH*, PH*, PLCXc*, PLCYc*, PTB*, RA*, RasGAP*, RhoGAP*, RhoGEF*, RING*, SAM*, SH2*, SH3*, STYKc*, TyrKc*, UBA*
Domains unique to SH2 domain proteins:
IPPc, PTPc, PTPc_DSPc, RasGEF, S1, SOCS, VPS9, YqgFc
Domains unique to SH3 domain proteins:
ADF, ANK, ArfGap, BAR, BP1, Efh, FN3, GuKc, IG, IQ, L27, MYSc, MyTH4, NEB, PDZ, PX, RUN, S_TKc, SEC14, Sorb, SPEC, TBC, TPR, UIM, VHS, WD40, WW, ZU5
[Figure S1 (image not reproduced). Panel A: matrix of human RhoGAP and RhoGEF proteins (rows) against their domains (columns), with clusters B and C boxed. Panel B: BAR domain cluster (KIAA0672, ARHGAP17/RICH1, SH3BP1, DNMBP/TUBA, FLJ41603, ENSG00000188436); domain architectures of Tuba (four SH3 domains, GEF, BAR, two SH3 domains) and of Rich1, SH3BP1 and KIAA0672 (BAR, GAP, proline-rich region); Coomassie-stained gel and western blots (Dynamin, Zo-1, WASP, Mena, CD2AP*, MUPP1*, CIN85*) for GST-SH3(1-4), GST-SH3(5,6), GST and total lysate. Panel C: DEP domain cluster (DEPDC2, PREX1, DEPDC1, DEPDC1B; domains DEP and PDZ); immunofluorescence of FLAG-DEPDC1 alone and of FLAG-DEPDC1 + MYC-P-Rex1 (DEPDC1, P-Rex1, DAPI, composite, composite+Actin); western blots of cytoplasmic (C) and nuclear (N) fractions probed for MYC (P-Rex1), FLAG (DEPDC1) and Histone H1.]
Proteins identified by GST-Tuba-SH3(5,6) affinity purification (panel B):
Category              Protein            Mascot Score
Cell junction         Afadin             665
Cell junction         Zo-1               396
Cell junction         Zo-2               347
Cell junction         Scribble           104
Cell junction         MUPP1*             80
Cytoskeleton          Mena               415
Cytoskeleton          N-WASP             414
Cytoskeleton          α-actinin          261
Cytoskeleton          Wire               145
Cytoskeleton          WAVE               124
Cytoskeleton          diaphanous 1       94
Cytoskeleton          NCKAP1             92
Cytoskeleton          KIF15              45
Vesicle trafficking   CD2AP*             513
Vesicle trafficking   RIN3               409
Vesicle trafficking   CIN85/SH3KBP1*     245
Vesicle trafficking   EXOSC10            107
Vesicle trafficking   LYST               32
Function unknown      CrkRS/cdc2L5       466
Function unknown      ZAP3               429
Function unknown      Bat2               266
* also interacting with Rich1 (Wells et al.)
fig. S1. Functional characterization of RhoGEF and RhoGAP protein pairs
A. Domain profiling of human RhoGAP and RhoGEF proteins, as in Figure 2. B. A
cluster of GEF and GAP proteins containing BAR domains (blue box in A) (top panel),
and their corresponding domain architectures (middle panel); Bottom panel:
Identification of proteins that interact with SH3 domains 1-4 or 5-6 of Tuba. Affinity
chromatography using GST fusion proteins containing Tuba SH3 domains 5-6 was
used to purify proteins from HEK-293 cell lysate, and associated proteins were
identified by mass spectrometry (left). Selected examples were confirmed by
western blotting (right; baits marked with arrows on the Coomassie-stained gel). C.
Relationships between P-Rex1 and DEPDC1. Top left: Co-clustering of DEP domain-
containing GEF and GAP proteins (marked with the purple box in A). Upper-right:
Subcellular localization of FLAG-DEPDC1 in HEK-293 cells (Staining with anti-FLAG
M2 antibody. Arrow: mitotic cell). Lower-right: Over-expression of P-Rex1 (MYC-
tagged) causing translocation of FLAG-DEPDC1 to the cell cortex. Bottom-left: Nuclear
(N) and cytoplasmic (C) fractionation of DEPDC1. Western blots compare the nuclear
and cytoplasmic levels of transfected FLAG-DEPDC1 in cells co-transfected with
either an empty pcDNA3 vector (-) or MYC-P-Rex1.
[Figure S2 (image not reproduced). Two-way clustered matrix with domains on the x-axis and proteins on the y-axis; proteins are labeled as Protein Tyrosine Kinase (PTK) or Protein Tyrosine Phosphatase (PTP). The catalytic domain columns are PTPc, TyrKc and STYKc.]
fig. S2. Domain profiling of human protein tyrosine kinases (PTK) and protein
tyrosine phosphatases (PTP)
Two-way hierarchical clustering of human PTK (red) and PTP (blue) by their non-
catalytic domains (the catalytic domains are listed in separate columns at left). Domains
such as SH2, FERM, FN3 and IGs (in bold) are present in both PTK and PTP proteins.
Proteins possessing these domains are indicated in yellow.
[Figure S3 (image not reproduced). Upper panel: one pie chart per kinase group (TK, TKL, RGC, STE, CAMK, CK, CMGC, Other, AGC, Atypical), showing kinases with other domains besides the kinase domain versus kinases with only a kinase domain. Lower panels: clustered matrix with domains on the x-axis and proteins on the y-axis, color-coded by kinase group, with magnified views of clusters a-f.]
fig. S3. Domain profiling of the human kinome
Human protein kinases were each annotated for their domain composition. Each pie-chart
(upper panel) represents a distinct functional sub-group, and indicates the number of
kinases within each group that contain or lack non-kinase domains. Lower panels:
Kinases were clustered through their non-catalytic domains (left, pixel color indicating
kinase class). To the right: Magnified version of selected clusters (boxes a-f).
[Figure S4 (image not reproduced). Clustered matrix of non-catalytic domains against protein kinases from six species; kinase labels carry a species suffix (_Sc, _Dd, _Ce, _Dm, _Mm or _Hs).]
3/MINK_HsHPK1_MmGCK_MmCG7097_DmZC3_MmZC404.9_CeF19C6.1_CeGPRK6_HsGPRK7_HsRHOK_MmGPRK4_MmGPRK4_HsGPRK5_HsGPRK5_MmRHOK_HsGPRK6_MmGprk2_DmPAK2_HsPAK3_Mmpak-1_CePAK1_MmPAK4_HsPAK1_HsPAK5_MmPAK2_MmPak_DmSTE20_ScPakB_DdPakA_DdPAK4_Mmmbt_DmPAK3_HsPAK5_HsPAK6_MmC45B11.1_CePAK6_HsCLA4_ScPakC_DdSKM1_ScPakD_DdPak3_DmPKCt_Hsksr-2_CePKCdelta_DmDDB0231197_Ddksr-1_CeKSR2_HsKSR2_MmKSR1_MmPKCt_Mmtpa-1_Ceksr_DmKSR1_Hsphl_DmARAF_HsRAF1_Mmlin-45_CeARAF_MmRAF1_HsBRAF_MmBRAF_HsPKCeps_MmPkc98E_DminaC_DmPKCd_Mmpkc-2_CePKCa_HsPKCh_HsPKCeta_MmPKCa_MmPkc53E_DmPKCg_Mmkin-13_CePKCd_HsPKCe_HsPKCg_HsPKCb_HsPKCb_MmPKC1_ScMEKKalpha_DdMAP3K3_MmMAP3K2_MmMAP3K2_HsMAP2K5_MmMAP3K3_HsMAP2K5_HsaPKC_DmPKCz_MmPKCz_HsPKCi_Mmpkc-3_CePKCi_HsDDB0229867_DdROCK1_MmROCK1_HsROCK2_Mmrok_DmPKD3_HsPKD2_HsCG7125_DmT25E12.4_CeDMPK2_Mmlet-502_CeW09C5.5_CeDMPK2_HsROCK2_HsPKD2_MmPKD3_MmPKD1_MmPKD1_HsK08B12.5_CeMRCKa_MmMRCKa_HsMRCKb_HsMRCKb_Mmgek_DmCRIK_HsCRIK_MmW02B3.2_CeBARK2_MmBARK2_HsGprk1_DmBARK1_MmBARK1_HsPdk1A_DdAkt1_Dmpdk-1_CeAKT1_MmAKT1_HsAKT3_HsAkt1_DdDDB0229957_Ddakt-2_Ceakt-1_CeAKT3_MmAKT2_HsAKT2_MmDDB0229973_DdBCR_MmBCR_HsPKN3_MmPKN1_HsPKN3_HsPKN2_MmPKN1_MmF46F6.2_CePKN2_HsDDB0220670_DdSCH9_ScBMX_HsBMX_MmITK_HsTEC_HsBTK_HsTEC_MmITK_MmBTK_MmF26E4.5_CeC55C3.4_CeT25B9.4_CeSYK_MmC18H7.4_CeF23C8.7_CeT21G5.1_CeR05H5.4_CeY69E1A.3_CeaSWK345_CeShk1_DdY4C6A.1_CeF59A3.8_CeZAP70_HsShk2_Ddkin-26_CeT25B9.5_CeW01B6.5_CeW02A2.4_CeR11E3.1_CeC34F11.5_CeCG17309_Dmspe-8_CeY52D5A.2_CeaSWK377_CeZK593.9_CeShk4_DdF57B9.8_CeF46F5.2_Cekin-28_CeK09B11.5_CeT06C10.3_Cekin-21_CeW03F8.2_CeZK622.1_CeSYK_HsM176.9_CeShk5_DdC25A8.5_CeC35E7.10_Cekin-5_CeZC581.7_CeShk3_DdZAP70_Mmkin-24_Cekin-14_CeY116A8C.24_Cefrk-1_CeF22B3.8_CeF01D4.3_CeFGR_HsCTK_MmYES_Mmabl-1_CeCSK_MmFRK_HsBLK_MmABL_HsBRK_MmARG_MmAbl_DmBtk29A_DmTXK_MmLYN_MmARG_HsCSK_HsFGR_MmTXK_HsLCK_HsY48G1C.2_CeBLK_HsLYN_HsFYN_HsSRM_MmSRC_HsHCK_MmABL_MmHCK_HsSrc42A_DmSrc64B_DmBRK_HsSRC_MmCTK_HsY47G6A.5_CeFYN_Mmkin-22_CeLCK_MmFRK_Mmsrc-1_CeSRM_HsYES_HsFER_MmFER_HsFES_MmFES_HsFps85D_DmFAK_MmFAK_H
sFak56D_DmPYK2_HsTYK2_HsJAK2_Mmhop_DmJAK1_HsTYK2_MmJAK3_MmJAK1_MmJAK3_HsJAK2_Hskin-31_CePYK2_MmCaki_Dmlin-2_CeCASK_HsCASK_MmACK_MmMLK1_MmK11D12.10_CeMLK1_Hskin-25_Ceslpr_DmMLK4_MmTNK1_MmMkcF_DdMLK2_MmMLK3_HsTNK1_HsMLK4_HsACK_HsMLK3_MmMLK2_HsPR2_Dmark-1_CeObscn_MmObscn_HsTrad_HsTrad_MmTrio_MmTrio_HsDDB0229940_DdDDB0218878e_DdMAST4_HsCG6498_DmMAST1_MmMAST1_HsMAST4_Mmkin-4_CeMAST3_HsMAST2_MmMAST2_HsMAST3_MmLIMK2_HsLIMK2_MmLIMK1_MmLIMK1_HsLIMK1_DmLATS1_HsLATS2_HsLATS2_MmLATS1_MmMAP3K1_HsMAP3K1_MmDDB0230051_DdTIF1b_MmTIF1b_HsTIF1g_MmTIF1a_HsTIF1a_MmTIF1g_HsDDB0220714_DdTaf250_Dmtaf-1_CeDDB0220693_DdBRD3_HsY119C1B.8_CeTAF1L_HsBRDT_HsBRD4_HsBRDT_MmTAF1_Mmfs(1)h_DmTAF1_DdBRD2_HsTAF1_HsBRD2_MmBRD4_MmBRD3_MmDDB0220694_DdTAF1_ScFASTK_HsFASTK_MmRON_MmMET_MmMET_HsRON_HsW05B2.1_CeDDB0216331_DdDDB0229344_DdDCAMKL2_MmCG10177_Dmzyg-8_CeDCAMKL2_HsDCAMKL1_MmCG17528_DmDCAMKL1_HsSNF1_ScBUB1_ScBub1_DmCG14030_DmBUBR1_HsBUB1_HsBub1_DdBUB1_MmBUBR1_MmH11_HsH11_MmSgK396_MmSgK396_HsKIS_HsKIS_MmRIPK2_MmRIPK2_HsY39G8B.5_CeFNIPK-C_DdFNIPK-D_DdFNIPK-A_DdFNIPK-E_DdFNIPK-D_C2d_DdDDB0231199_DdDDB0216375_DdPKR_HsPKR_MmC25F6.4_CeCG11573_DmDDR1_MmDDR2_HsF11D5.3_CeDDR1_HsDDR2_Mmtwf_Dmunc-60_CeA6_MmA6r_MmF38E9.5_CeA6r_HsA6_HsTWF1_Scgcy-27_Ceodr-1_Cegcy-25_Cegcy-23_Cegcy-21_Cegcy-17_CeCG4224_DmCYGX_Mmgcy-19_Cegcy-15_CeT01A4.1_Cegcy-11_CeCG3216_DmC04H5.4_CeKSGC_MmGyc32E_Dmdaf-11_CeHSER_MmCG9783_Dmgcy-7_Cegcy-14_CeHSER_HsCYGF_HsCYGD_Mmgcy-18_Cegcy-6_Cegcy-13_CeCG10738_Dmgcy-12_Cegcy-3_Cegcy-8_CeANPa_MmANPb_Mmgcy-22_CeANPb_HsCYGD_HsANPa_Hsgcy-5_CeCYGF_Mmgcy-20_Cegcy-9_Cegcy-1_CeGyc76C_Dmgcy-2_Cegcy-4_CeC16A11.3_CeH12I13.1_CeCG10951_DmNEK8_MmNEK9_MmNEK9_HsDDB0219986_DdTBK1_MmPRPK_HsTBK1_HsIKKe_HsaSWK440_CeRio2_DdCG11859_DmRIOK2_MmCG10673_DmRIOK2_Hs
HsMmDmCeDdSc
supplementary Figure S4 -- Jin et al.
[Figure S4, right panels: magnified clusters a-i, each showing a subset of the kinases against their non-catalytic domains (axes: protein, domain). Domain labels legible in the extraction include FCH, SH2, SH3, BTK, IGc2, RhoGEF, SEC14, SPEC, FN3, C1, C2, PH, PB1, WD40, ANK and DEATH.]
fig. S4. Domain profiling across six eukaryotic kinomes
Amalgamated kinomes of six eukaryotic species, Saccharomyces cerevisiae (Sc),
Dictyostelium discoideum (Dd), Caenorhabditis elegans (Ce), Drosophila melanogaster
(Dm), Mus musculus (Mm) and Homo sapiens (Hs), were clustered through their
non-catalytic domains (similar to fig. S3, lower panel). Kinase sequences and nomenclature
were adapted from kinase.com; pixel color indicates species. Right: magnified versions
of selected clusters (boxes a-i). The great majority of proteins in clusters a and b
(indicated by the red box) are tyrosine kinases (TK).
[Figure S5, plot panel: compact factor (# of clubs / total # of proteins) plotted against the similarity coefficient index (Jaccard's), with the cut-off point marked; similarity coefficient values are annotated at the tree branches.]
supplementary Figure S5 -- Jin et al.
[Figure S5, center panels: example domain clubs #53-56 and #420-423 at the chosen cut-off, repeated in the background for the other cut-offs. Domain labels shown for clubs #53-56: BBOX, BROMO, PHD, PWWP, SAND, AT_hook, DDT, MBD, BBC, RING, Zf_C2H2 and WD40. Domain labels shown for clubs #420-423: SH2, PTB, PTPc_DSPc, C1, SOCS, RhoGAP, TyrKc, SH3, FCH and PTPc.]
fig. S5. Domain club cut-off selection
For every possible value of the domain club cut-off from the seven-genome co-clustering, we
computed the number of resulting domain clubs and plotted this number as a function of
the cut-off (plot panel). The figure shows this function, as well as the chosen cut-off
(similarity coefficient index of 0.502, indicated by an arrow). The chosen cut-off
corresponds to a natural jump in the plotted curve (plot panel) and results in 1,245 clubs
(#1-1,245, Figure 3A). At this cut-off (illustrated by a black dashed line), example
domain clubs (#53-56 and #420-423, as in Figure 3A) are shown in the foreground (solid
boxes, center panels). The branching patterns of the tree in these two areas are also
mapped (solid lines, with values of the similarity coefficient marked at the branches). The
groupings at the various cut-offs (corresponding to the dashed lines) are shown in the
background (dashed boxes in fading colors). Note that at a lower cut-off (right panels),
where proteins that contain only an SH2 domain form a club of their own, this club remains
on the same branch of the tree and is surrounded by the same neighbourhood of domain
compositions.
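
The cut-off scan described in this legend can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each protein is reduced to the set of its domain names, the Jaccard index serves as the similarity coefficient, and proteins are grouped by single linkage (two proteins share a club when a chain of pairwise similarities at or above the cut-off connects them). The toy proteins, the linkage rule and the function names are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two domain sets: |A & B| / |A | B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def count_clubs(proteins, cutoff):
    """Count single-linkage groups of proteins at a similarity cut-off."""
    names = list(proteins)
    parent = {n: n for n in names}

    def find(n):
        # Union-find root lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for p, q in combinations(names, 2):
        if jaccard(proteins[p], proteins[q]) >= cutoff:
            parent[find(p)] = find(q)
    return len({find(n) for n in names})


# Hypothetical toy proteome; the domain compositions are illustrative only.
toy = {
    "kinase_1": {"SH3", "SH2", "TyrKc"},
    "kinase_2": {"SH2", "TyrKc"},
    "adaptor_1": {"SH2", "SH3"},
    "adaptor_2": {"SH2"},
    "reader_1": {"PHD", "BROMO"},
    "reader_2": {"PHD", "BROMO", "PWWP"},
}

for cutoff in (0.3, 0.6, 0.7):
    clubs = count_clubs(toy, cutoff)
    # Compact factor as labeled on the plot axis: # of clubs / total # of proteins.
    print(f"cut-off {cutoff:.1f}: {clubs} clubs, compact factor {clubs / len(toy):.2f}")
```

Scanning a fine grid of cut-offs with such a function yields the step-shaped curve in which a "natural jump" can be looked for; the choice of linkage rule changes where the jumps fall.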
[Figure S6 plots. A: scatter plots for six proteins; x-axis, similarity index in seven-proteome; y-axis, similarity index in nine-proteome. Pearson r per panel: Dicer1, 0.9975; ZMYND11, 0.9444; Tdrd1, 0.9219; Fes, 0.9566; JMJD2C, 0.9363; EGFR, 0.9593 (p < 0.001 in each panel). B: similarity to Dicer1 plotted against similarity to Fes; seven-proteome, r = -0.0204; nine-proteome, r = -0.0174 (p < 0.001 in each panel).]
supplementary Figure S6 -- Jin et al.
fig. S6. Test of the robustness of the domain profiling method for applications in
large data sets
A. We directly compared the results from clustering of seven proteomes
(Figure 3) with those from another amalgamated set containing two additional proteomes,
those of Xenopus tropicalis and Mus musculus (nine-proteome). Specifically, we selected
six example proteins (Dicer1, Fes, ZMYND11, JMJD2C, Tdrd1 and EGFR) involved in
various biological processes and, for each protein, computed its distance to every other
protein in the seven-proteome protein set (total number of proteins exceeding 40,000) and
in the nine-proteome protein set. The distance pairs (in fact, the equivalent similarity
pairs) are represented in each plot as circles whose x-coordinates are the measurements in
the seven-proteome protein set and whose y-coordinates are the measurements in the
nine-proteome protein set. We note a consistent distribution pattern along the y = x line
in every plot, which is indicative of a positive correlation between x- and y-axis values.
Because the majority of the measurement pairs are densely located in the upper-right
corner of each graph (0.95-1.0 along both x- and y-axes), the details of such correlations
are not visually apparent; we therefore also calculated Pearson’s correlation coefficient
(r) to relate the x- and y-axis values. For all six examples, strong positive correlations
are observed (r values exceeding 0.9, in red), indicating consistency between the
tree-grams resulting from datasets of different sizes (seven and nine proteomes). Also
given in the plots are p values against the null hypothesis that the measurements in the
different-sized datasets are completely unrelated.
B. In a negative control analysis, two example proteins (Dicer1 and Fes, residing in club
#188 and club #423, respectively) sampled from distinct branches of the tree (Figure 3A,
S6, and supplementary section S-III) are compared. In the plots, the y-coordinate is the
similarity between Dicer1 and every other protein in the protein set, and the x-coordinate
is the Fes counterpart. In both the seven-proteome and the nine-proteome plots, this
yielded distribution patterns distinct from those shown in A, and in both instances a lack
of correlation, indicated by small r values (-0.0204 and -0.0174, respectively, in blue),
was observed. These results indicate that distant protein pairs in the proteomes (e.g.
Dicer1 and Fes) consistently remain separated from each other following domain profiling,
and that this is not affected by the size of the dataset.
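
The correlation test in panel A amounts to computing Pearson's r between two paired vectors: the similarities of one query protein to the same set of proteins, measured once in each dataset. The sketch below shows only that calculation. The numeric values are hypothetical and are not taken from the figure, and the function name is an assumption.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical similarity values of one query protein to five other proteins,
# measured in the smaller dataset (x) and in the larger dataset (y).
seven_proteome = [0.97, 0.99, 1.00, 0.62, 0.98]
nine_proteome = [0.96, 0.99, 1.00, 0.60, 0.98]

r = pearson_r(seven_proteome, nine_proteome)
print(f"r = {r:.4f}")
```

A value of r close to 1 indicates that the query protein keeps the same relative similarity to the other proteins in both datasets, which is the reading applied to panel A.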
[Figure S7, panel A: domain club profiles (domain clubs 1 to 1245) returned by a query for DICER1.]
supplementary Figure S7 -- Jin et al.
[Figure S7, panel A plots: profiles for the HELICc domain and the DSRM domain. x-axis, domain club number (200 to 1200); rows, Yeast, Slime Mold, Worm, Fruit Fly, Zebrafish, Chicken and Human; rod height, number of proteins; color scale, similarity index (0.96 to 1.00).]
[Figure S7, panel B: domain architectures of human proteins containing the DSRM domain, grouped by domain club. Domains are listed in the order extracted; asterisks as in the figure.]
Club #180
DGCR8*: WW, DSRM, DSRM
DUS2L: Pfam:Dus, DSRM
EIF2AK2: DSRM, DSRM, STYKc
ILF3: DZF, DSRM, DSRM
PRKRA: DSRM, DSRM, DSRM
STAU1*: DSRM, DSRM, DSRM
STAU2: DSRM, DSRM, DSRM, DSRM
STRBP: DZF, DSRM, DSRM
TARBP2: DSRM, DSRM, DSRM
Club #181
ADAD1: DSRM, ADEAMc
ADAR: Zalpha, Zalpha, DSRM, DSRM, DSRM, ADEAMc
ADARB1: DSRM, DSRM, ADEAMc
ADARB2: DSRM, DSRM, ADEAMc
Club #188
DHX9*: DSRM, DSRM, DEXDc, HELICc, HA2, Pfam:DUF1605
DICER1*: DEXDc, HELICc, Pfam:dsRNA_bind, Pfam:PAZ, RIBOc, RIBOc, DSRM
Club #190
NKRF: DSRM, DSRM, G_patch, R3H
Club #265
SLC4A1AP: FHA, DSRM, CC, CC
Club #458
RNASEN/DROSHA*: RIBOc, RIBOc, DSRM
fig. S7. A website for domain club analysis – example of Dicer1 and its DSRM
domain
Panel A. A keyword query for Dicer (protein) directs the reader to a page that displays
domain club profiles across seven eukaryotic proteomes (similar to the panels in Figure
3B), for each of Dicer’s domains (SMART definition), namely DEXDc, HELICc, DSRM
and RIBOc. The profiles for HELICc and DSRM are shown. The top panel shows all the
clubs that contain HELICc domains, and the height of the rod indicates the number of
proteins in these clubs with HELICc domains. Similarly, the lower panel indicates all of
the clubs that contain the DSRM domain, and the height of the rod indicates the total
number of proteins in these clubs with DSRM domains. The domain club containing
Dicer1 is labeled in red. It is interesting to note that DSRM (and RIBOc, not shown)
domains are absent from the “red” club (#188) of yeast and slime mold.
A query for the DSRM domain displays the relevant domain club profiles (shown at the
bottom of panel A) and the domain architectures of all human proteins containing the
DSRM domain (panel B). In yeast and slime mold, there is only one DSRM domain club,
which is occupied by the more ancient RNASEN/Ribonuclease III proteins (dashed boxes
in A, bottom panel and B) and an atypical slime mold Dicer-like protein that, like
RNASEN, has only DSRM and RIBOc domains. The metazoan Drosha protein has the
same domain combination as RNASEN, and is therefore found in club #458. The results
indicate that, in metazoan species (arrow), the DSRM domain has expanded to new domain
clubs (#180, 181, 188, 190 and 265). Asterisks mark proteins with known functions in
small non-coding RNA biology.
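
A domain club profile of the kind described in this legend can be sketched as a count, per club, of the proteins that contain the queried domain. The example below is an illustration only, not the code behind the Web site: the data structure and function name are assumptions, and the entries are a subset of the human architectures shown in panel B, transcribed from the figure labels.

```python
# Subset of human clubs from panel B: club number -> {protein: domain list}.
# The dictionary layout is hypothetical; the domain lists follow the figure labels.
human_clubs = {
    181: {
        "ADAD1": ["DSRM", "ADEAMc"],
        "ADAR": ["Zalpha", "Zalpha", "DSRM", "DSRM", "DSRM", "ADEAMc"],
        "ADARB1": ["DSRM", "DSRM", "ADEAMc"],
        "ADARB2": ["DSRM", "DSRM", "ADEAMc"],
    },
    188: {
        "DHX9": ["DSRM", "DSRM", "DEXDc", "HELICc", "HA2", "Pfam:DUF1605"],
        "DICER1": ["DEXDc", "HELICc", "Pfam:dsRNA_bind", "Pfam:PAZ",
                   "RIBOc", "RIBOc", "DSRM"],
    },
    190: {"NKRF": ["DSRM", "DSRM", "G_patch", "R3H"]},
    458: {"RNASEN/DROSHA": ["RIBOc", "RIBOc", "DSRM"]},
}


def club_profile(clubs, domain):
    """Map each club to the number of its proteins that contain `domain`.

    Clubs without any matching protein are omitted from the result.
    """
    profile = {}
    for club_id, members in clubs.items():
        count = sum(1 for domains in members.values() if domain in domains)
        if count:
            profile[club_id] = count
    return profile


print(club_profile(human_clubs, "DSRM"))
print(club_profile(human_clubs, "HELICc"))
```

Repeating the count for each species gives one row of rods per proteome, which is how the per-species panels can be compared for the presence or absence of a domain in a given club.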
[Figure S8 schematic. A: Proteins A, B and C interacting with a polypeptide that carries Domain X and Domain Y. B: protein-by-domain clustergram with panels for simple spreading, refined spreading and relatedness, shown in vertical and horizontal views, followed by a density (height) survey (density plot on the matrix for visualization, and vector scanning of density elements between density maps for computing correlations); spreading and spreading with decay (smoothing). Legend: background, whole-proteome domain clustergram; pixels on the grid (right panel), molecular environment (niche) of the domain X family, that is, proteins interacting with Domain X-containing proteins.]
supplementary Figure S8 -- Jin et al.
fig. S8. General procedure for spreading-on-graph (SOG) clustering in the analysis
of the molecular environments of domain families
A. The molecular environment of a domain family, say X, is the set of all proteins that
reportedly interact with domain X (direct interaction) or with coexisting domains/motifs
on the same polypeptide chain as domain X (associated interactions). B. As a template to
plot such interactions, we generated a protein-domain clustergram of the human proteome
(schematized in B, left panel), and scored those proteins on the matrix that interact with
proteins containing domain X, yielding a scatter pattern that represents the molecular
environment of domain X (B, right panel). The molecular environments of multiple
domains are then compared by spreading-on-graph (SOG) clustering of their respective
scatter patterns. In SOG clustering, the impact of each point in the scatter pattern is
spread to neighbouring areas in the clustergram to reflect the relatedness of the pixels
(B, right), yielding a smoothed distribution of density (represented by height). Most
simply, spreading extends the impact of a point uniformly to all coordinates within a
fixed distance. This approach may be refined by spreading the impact of a point to all
other coordinates non-uniformly, following a Gaussian function of distance.
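
The two spreading schemes, and the comparison of the resulting density maps, can be sketched as below. This is a simplified illustration under stated assumptions rather than the authors' SOG implementation: the clustergram is treated as a plain rectangular grid, distance is Euclidean distance between grid coordinates, and density maps are compared by Pearson correlation of their flattened values. The grid size, point coordinates, radius, sigma and function names are all hypothetical.

```python
import math


def spread_uniform(points, shape, radius):
    """Simple spreading: each point adds 1 to every cell within `radius`."""
    rows, cols = shape
    density = [[0.0] * cols for _ in range(rows)]
    for pr, pc in points:
        for r in range(rows):
            for c in range(cols):
                if math.hypot(r - pr, c - pc) <= radius:
                    density[r][c] += 1.0
    return density


def spread_gaussian(points, shape, sigma):
    """Refined spreading: each point adds a Gaussian of distance to all cells."""
    rows, cols = shape
    density = [[0.0] * cols for _ in range(rows)]
    for pr, pc in points:
        for r in range(rows):
            for c in range(cols):
                d2 = (r - pr) ** 2 + (c - pc) ** 2
                density[r][c] += math.exp(-d2 / (2.0 * sigma ** 2))
    return density


def map_correlation(a, b):
    """Pearson correlation between two density maps of the same shape."""
    x = [v for row in a for v in row]
    y = [v for row in b for v in row]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scatter patterns (grid coordinates of interacting proteins)
# for three domain families on an 8 x 8 grid.
shape = (8, 8)
env_x = spread_gaussian([(1, 1), (1, 2)], shape, sigma=1.0)
env_y = spread_gaussian([(1, 2), (2, 2)], shape, sigma=1.0)
env_z = spread_gaussian([(6, 6)], shape, sigma=1.0)

corr_xy = map_correlation(env_x, env_y)
corr_xz = map_correlation(env_x, env_z)
print(f"X vs Y: {corr_xy:.3f}   X vs Z: {corr_xz:.3f}")
```

Without spreading, the scatter patterns of X and Y would share only one grid cell; after spreading, their neighbouring points reinforce each other, so the two environments score as similar while the distant pattern Z does not.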
[Figure S9 plots: two bar charts over the 70 domain families, with axes "# of interacting proteins" and "# of domain-containing proteins". Domain families along the x-axis, as recovered from the axis labels: SH3, RING, SH2, PH, C2, PDZ, WD40, C1, ANK, EF, BROMO, 14_3_3, LRR, UBA, CH, WW, B41, LIM, DEATH, TPR, ARM, MH1, MH2, CHROMO, BTB, SAM, PTB, RHOGEF, BRCT, HECT, FHA, PB1, TRAF, PAS, RGS, CARD, IQ, TUDOR, PX, FYVE, DSH, UIM, DED, F_BOX, TIR, WH1, BH4, VHS, EH, T_SNARE, SOCS, HEAT, BAR, CUE, BIR, POLO_BOX, GEL, ADF, ENTH, GAT, FH2, VHL, START, GRAM, FF, TUB, PUMILIO, GYF, BEACH, GRIP.]
supplementary Figure S9 -- Jin et al.
fig. S9. The interactomes of the 70 domain families employed in the analysis of domain-based functional compartments
The panels plot the number of interacting proteins reported in HPRD for each studied domain family (top; the solid bars correspond to the heat map examples in Figure 4B) and the total number of proteins in the human proteome containing each studied domain family (bottom).
[Figure S10, panel A legend ("other domain", 60 co-existing domains): PostSET, SET, BRCT, G_patch, KH, BRIGHT, Pfam:zfC2HC, Pfam:MRG, RING, Pfam:MutS_I, PHD, Pfam:HA2, Pfam:zfCW, HMG, SAM, UBA(elongation...), Pfam:DNA_methylase, Pfam:Myosin_N, Pfam:MutS_II, MUTSac, ChSh, Pfam:NAD_binding_2, JmjC, Pfam:RBB1NT, Pfam:TCH, Pfam:DUF1087, JmjN, Pfam:MOZ_SAS, ZnF_C2H2, Pfam:PreSET, Pfam:CHDCT2, Pfam:Pkinase, MBD, SANT, Pfam:CHDNT, ANK, Pfam:ECH, ZnF_C3H1, AT_hook, Pfam:MBD, DEXDc, IQ, MYSc, Pfam:Shikimate_DH, Pfam:zfMYND, BROMO, Pfam:SWIRM, AWS, PTX, BRK, Pfam:DUF1295, Pfam:ERG4_ERG24, HELICc, PreSET, Pfam:Myosin_tail_1, VWA, Pfam:DUF1086, SNc, MUTSd, Pfam:F420_oxidored.]
[Figure S10, panel A wheel: 'Royal family' domain instances (Tudor, Tudor-like, PWWP, Mbt, Chromo, Chromo shadow and Agenet), each labeled with domain name, protein name and tandem-repeat index, e.g. "Tudor TDRD6 5".]
supplementary Figure S10 -- Jin et al.
[Figure S10, panel B: domain architectures of the proteins in A, grouped under the headings chromo domain, MBT domain, PWWP domain, Tudor domain (Smn, SND1, germline) and Tudor domain (histone).]
fig. S10. An overview of the Royal Family of domains
A. A wheel constructed from multiple sequence alignments of Chromo,
Chromo shadow, Agenet, Mbt, PWWP and Tudor domains (the degree of similarity is
represented by the hierarchical tree at the center). These domains are likely derived from
a common ancestor (12). Each domain is labeled with the domain name, followed by the
protein name and, for proteins with tandem repeats of the same domain type, the
sequential order of the domain from N- to C-terminus. The color bars at the
outermost rim of the wheel show one or more of 60 co-existing domains (see legend to
the left) that are linked to a Royal Family domain.
B. Domain architectures of proteins in A (color matched to legend in A).
supplementary Figure S11 -- Jin et al.
Ago1  ---------------------------------------------------MEAGPS----------GAAAGAYLPPLQQVFQAPRRPGIG---------------------------------------------------------TVGKPIKLLANYFEVDIPK---------IDVY 53
Ago3  ---------------------------------------------------MEIGSA----------GPIG------AQPLFIVPRRPGYG---------------------------------------------------------TMGKPIKLLANCFQVEIPK---------IDVY 47
Ago4  ---------------------------------------------------MEALGP----------GPPA--------SLFQPPRRPGLG---------------------------------------------------------TVGKPIRLLANHFQVQIPK---------IDVY 45
Ago2  ---------------------------------------------------MYSGAGPVL-------ASPAPTTSPIPGYAFKPPPRPDFG---------------------------------------------------------TTGRTIKLQANFFEMDIPK---------IDIY 56
Miwi  ----------------------------MT-------GRARARARGRARG----QETVQH------VGAAASQQPGYIPPR---PQQSPTE---GDLVGRGR-----QRG-------------------MVVGAT-------------SKSQELQISAGFQELSLAE---RGGR-RRDFH 88
Miwi2 -----------------MRLRILGVHRALP-------THARVAVCSNYLGKLEYSQTPSH------SHTVSFAKEKTLLLRLTSPGKPLAP---RNMSGRAR-----VRARGITTGHSAREVGRSSRDLMVTSASPGDSEAGGGTSVISQPYELGVSSGDGGRTFME---RRGKGRQDFE 139
Mili  MDPVRPLFRGPTPVHPSQCVRMPGCWPQAPRPLEPAWGRAGPAGRGLVFRKPEDSSPPLQPVQKDSVGLVSMFRGMGLDTAFRPPSKREVPPLGRGVLGRGLSANMVRKDREEPRSSLPDPSVLAAGDSKLAEASVGWSRMLGRGSSEVSLLPLGRAASSIGRGMDKPPSAFGLTARDPP 180
. * : : :. : : *

Ago1  HYE-----------------VDIKPDKCPRRVNREVVEYMVQHFKPQIFGDRKPVYDGKENIYTVTALPIGNERVDFEVTIPGEG---KDRIFKVSIKWLAIVSWRMLHEALVSGQIPVPLES---------------VQALDVAMR-HLASMRYTPVGRSFFSPPEGYYHPLGGGREVW 197
Ago3  LYE-----------------VDIKPDKCPRRVNREVVDSKVQHFKVTIFGDRRPVYDGKRSLYTANPLPVATTGVDLDVTLPGEGG--KDRPFKVSVKFVSRVSWHLLHEALAGGTLPEPLELDKPVS-------TNPVHAVDVVLR-HLPSMKYTPVGRSFFSAPEGYDHPLGGGREVW 200
Ago4  HYD-----------------VDIKPEKRPRRVNREVVDTMVRHFKMQIFGDRQPGYDGKRNMYTAHPLPIGRDRIDMEVTLPGEG---KDQTFKVSVQWVSVASLQLLLEALAGHLNEVPDDS---------------VQALDVITR-HLPSMRYTPVGRSFFSPPEGYYHPLGGGREVW 189
Ago2  HYE-----------------LDIKPEKRPRRVNREIVEHMVQHFKTQIFGDRKPVFDGRKNLYTAMPLPIGRDKVELEVTLPGEG---KDRILKVSIKWVSCVSLQALHDALSGRLPSVPFET---------------IQALDVVMR-HLPSMRYTPVGRSFFTASEGCSNPLGGGREVW 200
Miwi  DLG-----------------VNTRQNLDHVKESKTGSSGIIVKLSTNHFRLTSRPQWALYQYHIDYNPLMEARRLRSALLFQHEDLIGRCHAFDGTILFLPKRLQHKVTEVFSQTRNGEHVRITITLTNELPPTSPTCLQFYNIIFRRLLKIMNLQQIGRNYYNPSDPIDIPNHR-LVIW 250
Miwi2 ELG-----------------VCTREKLTHVKDCKTGSSGIPVRLVTNLFNLDLPQDWQLYQYHVTYSPDLASRRLRIALLYNHSILSDKAKAFDGASLFLSEKLDQKVTELTSETQRGETIKITLTLTSKLFPNSPVCIQFFNVIFRKILKNLSMYQIGRNFYKPSEPVEIPQY------ 296
Mili  RLPQPPALSPTSLHSADPPPVLTMERKEKELLVKQGSKGTPQSLGLNLIKIQCHN-EAVYQYHVTFSPSVECKSMRFGMLKDHQSVTGNVTAFDGSILYLPVKLQQVVELKSQRKTDDAEISIKIQLTKILEPCSDLCIPFYNVVFRRVMKLLDMKLVGRNFYDPTSAMVLQQHR-LQIW 358
: : . : : . : : : : . . :. : ::. : : : :: * : : :**.:: ...

Ago1  FGFHQSVRPAMWKMMLNIDVSATAFYKAQPVIEFMCEVLDIRNIDEQPKPLTDSQRVRFTKEIKGLKVEVTHCGQMKRKYRVCNVTRRPASHQTFPLQLESGQTVECTVAQHFKQKYNLQLKYPHLPCLQVGQ---------EQKHTYLPLEVCNIVAGQRCIKKLTDNQTSTMIKATAR 368
Ago3  FGFHQSVRPAMWKMMLNIDVSATAFYKAQPVIQFMCEVLDIHNIDEQPRPLTDSHRVKFTKEIKGLKVEVTHCGTMRRKYRVCNVTRRPASHQTFPLQLENGQTVERTVAQYFREKYTLQLKYPHLPCLQVGQ---------EQKHTYLPLEVCNIVAGQRCIKKLTDNQTSTMIKATAR 371
Ago4  FGFHQSVRPAMWNMMLNIDVSATAFYRAQPIIEFMCEVLDIQNINEQTKPLTDSQRVKFTKEIRGLKVEVTHCGQMKRKYRVCNVTRRPASHQTFPLQLENGQAMECTVAQYFKQKYSLQLKHPHLPCLQVGQ---------EQKHTYLPLEVCNIVAGQRCIKKLTDNQTSTMIKATAR 360
Ago2  FGFHQSVRPSLWKMMLNIDVSATAFYKAQPVIEFVCEVLDFKSIEEQQKPLTDSQRVKFTKEIKGLKVEITHCGQMKRKYRVCNVTRRPASHQTFPLQQESGQTVECTVAQYFKDRHKLVLRYPHLPCLQVGQ---------EQKHTYLPLEVCNIVAGQRCIKKLTDNQTSTMIRATAR 371
Miwi  PGFTTSILQYENNIMLCTDVSHKVLRSET-VLDFMFNLYQQTEEHKFQEQVS--------KELIGLIVLTKYN---NKTYRVDDIDWDQNPKSTFKKA----DGSEVSFLEYYRKQYNQEITDLKQPVLVSQP-KRRRGPGGTLPGPAMLIPELCYLTGLTDKMRNDFNVMKDLAVHTRL 413
Miwi2 -----------NKLLFNADVNYKVLRNET-VLDFMTDLCLRTGMSCFTEMCH--------KQLVGLVVLTRYN---NKTYRIDDIDWSVKPTQAFQKR----DGSEVTYVDYYKQQYDITLSDLNQPVLVSLL-KRKRN-DNSEPQMVHLMPELCFLTGLSSQATSDFRLMKAVAEETRL 447
Mili  PGYAASIRRTDGGLFLLADVSHKVIRNDS-VLDVMHAIYQQNKE-HFQDECS--------KLLVGSIVITRYN---NRTYRIDDVDWNKTPKDSFVMS----DGKEITFLEYYSKNYGITVKEDDQPLLIHRPSERQNNHGMLLKGEILLLPELSFMTGIPEKMKKDFRAMKDLTQQINL 512
::: **. ..: :::.: * : * * : .:.**: :: . .:* : * : ::: ..: : . * * : ::* . . :

Ago1  SAPDRQEEISRLMKNASCN--LDPYIQ-----------EFGIK---VKDDMTEVTGRVLPAPILQYGGRNRAIATP-----------NQGVWDMRGKQFYNGIEIKVWAIACFAPQKQCREEVLKNFTDQLRKISKDAGMPIQGQPCFCKYAQGADSVEPMFRHLKNTYSGLQLIIVILP 521
Ago3  SAPDRQEEISRLVRSANYE--TDPFVQ-----------EFQLK---VRDEMAHVTGRVLPAPMLQYGGRNRTVATP-----------SHGVWDMRGKQFHTGVEIKMWAIACFATQRQCREEILKGFTDQLRKISKDAGMPIQGQPCFCKYAQGADSVEPMFRHLKNTYSGLQLIIVILP 524
Ago4  SAPDRQEEISRLVKSNSMVGGPDPYLK-----------EFGIV---VHNEMTELTGRVLPAPMLQYGGRNKTVATP-----------SQGVWDMRGKQFYAGIEIKVWAVACFAPQKQCREDLLKSFTDQLRKISKDAGMPIQGQPCFCKYAQGADSVEPMFKHLKMTYVGLQLIVVILP 515
Ago2  SAPDRQEEISKLMRSASFN--TDPYVR-----------EFGIM---VKDEMTDVTGRVLQPPSILYGGRNKAIATP-----------VQGVWDMRNKQFHTGIEIKVWAIACFAPQRQCTEVHLKSFTEQLRKISRDAGMPIQGQPCFCKYAQGADSVEPMFRHLKNTYAGLQLVVVILP 524
Miwi  TPEQRQREVGRLIDYIHKDDNVQRELR-----------DWGLS---FDSNLLSFSGRILQSEKIHQGGKTFDYNP----------QFADWSKETRGAPLISVKPLDNWLLIYTRR----NYEAANSLIQNLFKVTPAMGIQMKKAIMIEV-DDRTEAYLRALQQKV--TSDTQIVVCLLS 562
Miwi2 SPVGRQQQLARLVDDIQRTLPSSQEVLSHTSLPLWAPEPGGLSSAIPLSTVLPFAQQLLTALSLSPGIPLPHLKPPSFLFLCQPAFAADWSKDMRSCKVLSSQPLNRWLIVCCNR----AEHLIEAFLSCLRRVGGSMGFNVGYPKIIKV-DETPAAFLRAIQVHG--DPDVQLVMCILP 620
Mili  SPKQHHGALECLLQRISQNETASNELT-----------RWGLS---LHKDVHKIEGRLLPMERINLRNTSFVTSED-----------LNWVKEVTRDASILTIPMHFWALFYPKR----AMDQARELVNMLEKIAGPIGMRISPPAWVELKDDRIETYIRTIQSLLGVEGKIQMVVCIIM 672
:. :: : *: . : : . : . ::* : . : :. * : . : . * :: *: : . : : :: *::: ::

Ago1  G-KTPVYAEVKRVGDTLLGMATQCVQVKNAVK--TSPQTLSNLCLKINVKLGGINNILVPHQRSAVFQQPVIFLGADVTHPPAGDGKKPSITAVVGSMDAHPSRYCATVRVQRPRQ----------EIIEDLSYMVRELLIQFYKSTRFKPTRIIFYRDGVPEGQLPQILHYELLAIRDA 688
Ago3  G-KTPVYAEVKRVGDTLLGMATQCVQVKNVIK--TSPQTLSNLCLKINVKLGGINNILVPHQRPSVFQQPVIFLGADVTHPPAGDGKKPSIAAVVGSMDAHPSRYCATVRVQRPRQ----------EIIQDLASMVRELLIQFYKSTRFKPTRIIFYRDGVSEGQFRQVLYYELLAIREA 691
Ago4  G-KTPVYAEVKRVGDTLLGMATQCVQIKNVVK--TSPQTLSNLCLKMNAKLGGINNVPVPHQRPSVFQQPVIFLGADVTHPPAGDGKKPSIAAVVGSMDGHPSRYCATVWVQTSRQEIAQELLYSQEVVQDLTSMARELLIQFYKSTRFKPTRIIYYRGGVSEGQMKQVAWPELIAIRKA 692
Ago2  G-KTPVYAEVKRVGDTVLGMATQCVQMKNVQR--TTPQTLSNLCLKINVKLGGVNNILLPQGRPPVFQQPVIFLGADVTHPPAGDGKKPSIAAVVGSMDAHPNRYCATVRVQQHRQ----------EIIQDLAAMVRELLIQFYKSTRFKPTRIIFYRDGVSEGQFQQVLHHELLAIREA 691
Miwi  SNRKDKYDAIKKYLCTDCPTPSQCVVARTLGKQQTVMAIATKIALQMNCKMGG------ELWRVDMPLKLAMIVGIDCYHDTTA--GRRSIAGFVASINEGMTRWFSRCVFQDRGQ----------ELVDGLKVCLQAALRAWSGCNEYMPSRVIVYRDGVGDGQLKTLVNYEVPQFLDC 724
Miwi2 SNQKNYYDSIKKYLSSDCPVPSQCVLTRTLNKQGTMLSVATKIAMQMTCKLGG------ELWSVEIPLKSLMVVGIDICRDALN--KNVVVVGFVASINSRITRWFSRCVLQRTAA----------DIADCLKVCMTGALNRWYRHNHDLPARIVVYRDGVGNGQLKAVLEYEVPQLLKS 782
Mili  GTRDDLYGAIKKLCCVQSPVPSQVINVRTIGQPTRLRSVAQKILLQMNCKLGG------ELWGVDIPLKQLMVIGMDVYHDPSR--GMRSVVGFVASINLTLTKWYSRVVFQMPHQ----------EIVDSLKLCLVGSLKKYYEVNHCLPEKIVVYRDGVSDGQLKTVANYEIPQLQKC 834
. : * :*: .:* : :. : :: :::. *:** : : :.:* * : . :...*.*:: .:: : .* :: : * * : .. * ::: **.** :**: : *: : ..

Ago1  CIKLEKDYQPGITYIVVQKRHHTRLFCADKNERIGKSGNIPAGTTVDTNITHPFEFDFYLCSHAGIQGTSRPSHYYVLWDDNRFTADELQILTYQLCHTYVRCTRSVSIPAPAYYARLVAFRARYHLVDKEHDSGEGSHISGQSNGRDPQALAKAVQVHQDTLRTMYFA 857
Ago3  CISLEKDYQPGITYIVVQKRHHTRLFCADRTERVGRSGNIPAGTTVDTDITHPYEFDFYLCSHAGIQGTSRPSHYHVLWDDNFFTADELQLLTYQLCHTYVRCTRSVSIPAPAYYAHLVAFRARYHLVDKEHDSAEGSHVSGQSNGRDPQALAKAVQIHQDTLRTMYFA 860
Ago4  CISLEEDYRPGITYIVVQKRHHTRLFCADKMERVGKSGNVPAGTTVDSTVTHPSEFDFYLCSHAGIQGTSRPSHYQVLWDDNCFTADELQLLTYQLCHTYVRCTRSVSIPAPAYYARLVAFRARYHLVDKDHDSAEGSHVSGQSNGRDPQALAKAVQIHHDTQHTMYFA 861
Ago2  CIKLEKDYQPGITFIVVQKRHHTRLFCTDKNERVGKSGNIPAGTTVDTKITHPTEFDFYLCSHAGIQGTSRPSHYHVLWDDNRFSSDELQILTYQLCHTYVRCTRSVSIPAPAYYAHLVAFRARYHLVDKEHDSAEGSHTSGQSNGRDHQALAKAVQVHQDTLRTMYFA 860
Miwi  LKSVGRGYNPRLTVIVVKKRVNARFFAQSG----GRLQNPLPGTVIDVEVTRPEWYDFFIVSQAVRSGSVSPTHYNVIYDSSGLKPDHIQRLTYKLCHVYYNWPGVIRVPAPCQYAHKLAF------------------LVGQSIHREP---------NLSLSNRLYYL 862
Miwi2 VTECG----------------------------------------------SDARYDFYLISQTANRGTVSPTHYNVIYDDNALKPDHMQRLTFKLCHLYYNWQGLISVPAPCQYAHKLTF------------------LVAQSVHKEP---------SLELANNLFYL 878
Mili  FEAFDN-YHPKMVVFVVQKKISTNLYLAAP----DHFVTPSPGTVVDHTITSCEWVDFYLLAHHVRQGCGIPTHYICVLNTANLSPDHMQRLTFKLCHMYWNWPGTIRVPAPCKYAHKLAF------------------LSGQILHHEP---------AIQLCGNLFFL 971
**:: :: * *:** : : :..*.:* **::*** * . : :***. **: :: * .* :: . :::
Fig. S11. Sequence characteristics of the mouse Piwi clade of the Argonaute proteins.
Argonaute family proteins were aligned with the ClustalW program
(www.ebi.ac.uk/clustalw/). The Piwi clade and Ago clade proteins differ at their
N-termini, which show low overall sequence similarity compared with the rest of the
sequences, which are more highly conserved (compare the frequency of conserved
residues, indicated below the alignment). Of interest, the Piwi clade proteins have
multiple arginine-glycine motifs (RG-boxes, highlighted) in their N-termini, which are
absent from the Ago proteins.
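The RG-box observation can be checked with a short script. The following is only an illustrative sketch, not the procedure used for the figure: it assumes that the first block of the alignment above stands in for the "N-terminus" and that a literal Arg-Gly dipeptide ("RG") is an adequate proxy for an RG-box. The sequences are the first-block rows for Ago1, Ago2, Miwi, and Mili with gap characters removed; the names `N_TERMINI` and `count_rg` are hypothetical.

```python
# First-block rows of the alignment with gap characters ("-") removed.
# Assumption: this block is used as a stand-in for the N-terminal region.
N_TERMINI = {
    "Ago1": "MEAGPSGAAAGAYLPPLQQVFQAPRRPGIGTVGKPIKLLANYFEVDIPKIDVY",
    "Ago2": "MYSGAGPVLASPAPTTSPIPGYAFKPPPRPDFGTTGRTIKLQANFFEMDIPKIDIY",
    "Miwi": (
        "MTGRARARARGRARGQETVQHVGAAASQQPGYIPPRPQQSPTEGDLVGRGRQRGMVVGAT"
        "SKSQELQISAGFQELSLAERGGRRRDFH"
    ),
    "Mili": (
        "MDPVRPLFRGPTPVHPSQCVRMPGCWPQAPRPLEPAWGRAGPAGRGLVFRKPEDSSPPLQ"
        "PVQKDSVGLVSMFRGMGLDTAFRPPSKREVPPLGRGVLGRGLSANMVRKDREEPRSSLPD"
        "PSVLAAGDSKLAEASVGWSRMLGRGSSEVSLLPLGRAASSIGRGMDKPPSAFGLTARDPP"
    ),
}


def count_rg(sequence: str) -> int:
    """Count Arg-Gly dipeptides; 'RG' cannot overlap with itself."""
    return sequence.count("RG")


if __name__ == "__main__":
    for name, sequence in N_TERMINI.items():
        print(f"{name}: {len(sequence)} residues, {count_rg(sequence)} RG dipeptides")
```

Under these assumptions, the Ago rows contain no RG dipeptide in this region, whereas the Miwi and Mili rows contain several.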
Peptide sequence annotated on the spectrum: Q-(2me)R53-G-M-V-V-G-A-T-S-K
Residue labels along the spectrum, read from the C-terminus: K, S, T, A, G, V, V, M+o, G, R+2me, Q
Fig. S12. An MS/MS spectrum of a tryptic peptide containing dimethylated (2me) Arg53 of endogenous Miwi protein from testis.
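The residue numbering in this caption can be cross-checked against the Miwi sequence in Fig. S11. The sketch below is illustrative only: it assumes the Miwi row of the first alignment block (gaps removed) as the reference sequence and the unmodified peptide QRGMVVGATSK as read from the spectrum labels. The names `MIWI_N_TERM`, `PEPTIDE`, and `locate` are hypothetical.

```python
# Miwi row of the first alignment block in Fig. S11, gap characters removed.
MIWI_N_TERM = (
    "MTGRARARARGRARGQETVQHVGAAASQQPGYIPPRPQQSPTEGDLVGRGRQRGMVVGAT"
    "SKSQELQISAGFQELSLAERGGRRRDFH"
)

# Peptide from the spectrum labels, written without modifications.
PEPTIDE = "QRGMVVGATSK"


def locate(peptide: str, protein: str) -> int:
    """Return the 1-based position of the first residue of peptide in protein."""
    index = protein.find(peptide)
    if index < 0:
        raise ValueError("peptide not found in protein sequence")
    return index + 1


start = locate(PEPTIDE, MIWI_N_TERM)      # 1-based start of the peptide
end = start + len(PEPTIDE) - 1            # 1-based end of the peptide
arg_position = start + PEPTIDE.index("R")  # 1-based position of the peptide's Arg
preceding = MIWI_N_TERM[start - 2]         # residue just before the peptide

if __name__ == "__main__":
    print(f"peptide spans residues {start}-{end}; Arg at position {arg_position}")
    print(f"residue before the peptide: {preceding}; last residue: {MIWI_N_TERM[end - 1]}")
```

With this reference sequence, the peptide's arginine falls at position 53, and the peptide is flanked in the way expected for a tryptic fragment (preceded by Arg, ending in Lys).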