Rewiring the dynamic interactome

ISSN 1742-206X

1742-206X(2012)8:8;1-5

www.molecularbiosystems.org Volume 8 | Number 8 | August 2012 | Pages 2013–2222

PAPER: Mark A. Ragan, Rewiring the dynamic interactome

Interfacing chemical biology with the -omic sciences and systems biology

MolecularBioSystems

Indexed in MEDLINE!

Published on 01 June 2012. Downloaded by Universitat Autonoma de Barcelona on 30/10/2014 12:40:45.

2054 | Mol. BioSyst., 2012, 8, 2054–2066 | This journal is © The Royal Society of Chemistry 2012

Cite this: Mol. BioSyst., 2012, 8, 2054–2066

Rewiring the dynamic interactome†

Melissa J. Davis,‡abc Chang Jin Shin,‡ab Ning Jing,a and Mark A. Ragan*ab

Received 14th February 2012, Accepted 30th May 2012

DOI: 10.1039/c2mb25050k

Transcriptomics continues to provide ever-more evidence that in morphologically complex

eukaryotes, each protein-coding genetic locus can give rise to multiple transcripts that differ in

length, exon content and/or other sequence features. In humans, more than 60% of loci give rise

to multiple transcripts in this way. Motifs that mediate protein–protein interactions can be

present or absent in these transcripts. Analysis of protein interaction networks has been a

valuable development in systems biology. Interactions are typically recorded for representative

proteins or even genes, although exploratory transcriptomics has revealed great spatiotemporal

diversity in the output of genes at both the transcript and protein-isoform levels. The increasing

availability of high-resolution protein structures has made it possible to identify the

domain–domain interactions that underpin many protein interactions. To explore the impact of

transcript and isoform diversity we use full-length human cDNAs to interrogate the protein-

coding transcriptional output of genes, identifying variation in the inclusion of protein interaction

domains. We map these data to a set of high-quality protein interactions, and characterise the

variation in network connectivity likely to result. We find strong evidence for altered interaction

potential in nearly 20% of genes, suggesting that transcriptional variation can significantly rewire

the human interactome.

Introduction

Mammalian transcriptome sequencing has revealed surprising

diversity in the transcriptional output of those regions of the

genome we typically think of as genes. The “one gene–one

protein” hypothesis prevalent in molecular biology since the

1940s1 has been superseded by a new paradigm formulated

from the results of projects such as the Functional Annotation

of Mouse (FANTOM)2,3 and the Human Genome Informa-

tion Integration Project.4 These projects have demonstrated

great diversity not only in the transcriptional output of genetic

loci, but also in the complement of proteins produced from

those transcripts.3 The idea of a gene as a unit of information

encoding a single piece of biological functionality has given

way to the realisation that any given genetic locus within the

genome may produce a great variety of protein-coding and

non-coding transcripts. Even a single coding transcript may

be spliced in different ways to produce a variety of protein

isoforms. It is likely that in excess of 60% of what were

once thought of as human genes are alternatively spliced.5–7

This, coupled with other sources of variation such as alter-

native promoter/first exon usage, and alternative transcrip-

tional initiation and termination sites, generates great diversity

in the mammalian transcriptome and proteome. These obser-

vations have also given rise to the use of terms such as

“transcriptional unit” to capture the idea that a genomic

region can generate a set of related transcripts sharing a core

of genetic information.8

In parallel with our evolving understanding of the complex

mammalian transcriptome, high-throughput experimental

techniques have led to the increasing use of network analysis

as a framework for interpreting results. Systems of molecular

interactions underpin all cellular processes; as such, under-

standing the ways in which molecular-interaction networks

function in cells provides vital insight into cellular processes

and enhances our understanding of both normal and pathological

cellular states. In recent years, significant advances have been

made in characterizing the molecular interaction networks of

morphologically complex eukaryotes. Currently, public and com-

mercial databases provide access to collections of interaction data

extracted manually from research publications (for example

IntAct,9 DIP,10 and MINT11), or integrated collections

harvested from primary databases (for example STRING,12

APID,13 and ConsensusPathDB14). Many such resources

are listed at http://ppi.fli-leibniz.de/jcb_ppi_databases.html.

It is possible to query these resources and find very large

numbers of interactions, either for specific organisms, or for

a The University of Queensland, Institute for Molecular Bioscience, St Lucia, Queensland 4072, Australia. E-mail: [email protected]; Fax: 61-7-3346-2101; Tel: 61-7-3346-2616

b ARC Centre of Excellence in Bioinformatics, Australia

c Queensland Facility for Advanced Bioinformatics, Australia

† Electronic supplementary information (ESI) available. See DOI: 10.1039/c2mb25050k

‡ Joint First Authors.

individually queried proteins, thus building a picture of the

networks present in a given system of interest. However, these

networks still do not capture the dynamics of the interactome:

cellular processes are by nature dynamic, and often specific to

a particular cell type, stage or physiological context.15,16 Like-

wise, many databases and meta-collections assign interactions

to the longest protein known for a particular gene,17,18 or to

the gene itself,19 and the specific protein isoform is rarely

identified. Indeed, the nature of the experiment may mean that

the precise identity of the interacting protein is not known.

Proteins are typically composed of structural and/or func-

tional modules referred to as domains, and the specific,

ordered set of these domains within a protein forms its domain

architecture. Many domains are well-studied, and computa-

tional models exist for their prediction from sequence. Analysis

of the alternative splicing of protein-coding transcripts indicates

that complete domains are alternatively spliced more frequently

than expected by chance,20 and alteration of the domain

architecture of proteins due to alternative splicing or other

post-transcriptional mechanisms is known to result in several

types of loss or gain of function. For example, recent studies

have described changed enzymatic activity,21 altered protein

stability, and changed subcellular localisation.22

Certain combinations of domains are also known to mediate

protein–protein interactions23–25 as determined either by

examination of high-resolution structures,26,27 or by methods

that examine domain co-occurrence in the interactome.28,29

Domains known to be involved in protein interactions are

more likely to be spliced than are others,30 and many alter-

natively spliced regions correspond to interaction interfaces.31

Although some smaller-scale studies have investigated the

effects of splicing on protein functions,32,33 to our knowledge

there has as yet been no systematic, transcriptome-wide study

to examine the collective impact of these changes on the

mammalian interactome. We present here the first such

systematic analysis over integrated human transcriptomic

datasets. We characterise the extent to which changes in the

domain architecture of protein isoforms are likely to alter their

interaction potential, and we illustrate our findings with

experimentally verified examples.

Results

Integrated human transcriptome data

We identified two sources of transcriptome data that provide

full-length cDNA sequences suitable for building a high-

quality picture of the human transcriptome.34 Transcript

clusters based on full-length human cDNA derived from the

H-Invitational Database (H-InvDB)4 and FANTOM3 (ref. 2) were

integrated. A full characterisation of this dataset is available in

the Supplementary Material, and notable attributes of the

integrated dataset important for our analysis are described

briefly here.

Integration of these two datasets created a set of 26 955 clusters,

covering 68 748 transcripts that encode distinct protein isoforms.

Fewer than half (11 397) of the clusters in our set are composed

of transcripts from both H-InvDB and FANTOM3. A further

700 contained representatives from both sets, but not all

transcripts were identified with an Entrez Gene ID (gene

identifier). Over half of the clusters were derived from only one of

H-InvDB or FANTOM3: 6517 (1453 with gene identifiers)

contain sequences only from H-InvDB, while 8341 (7268 with gene

identifiers) contain sequences only from FANTOM3. We further

analysed the composition of data drawn from each resource by

examining the extent to which transcripts were shared between

the datasets, and the extent to which individual exons are

shared between sets (details in Supplementary Fig. 1).

We divided transcript clusters into two categories: those

containing only a single protein-coding transcript (single-isoform

units, 48%) and those containing multiple protein-coding

transcripts (multiple-isoform clusters, 52%) (Fig. 1A). The

average length of coding regions in the transcripts of single-

isoform units was significantly shorter than the average from

multiple-isoform clusters (184 amino acids (aa) compared to

498 aa, p ≪ 0.0001). Further, half of the single-isoform units

have no match to an Entrez gene identifier. Units containing

only one protein-coding transcript are not useful for this study

because they cannot illuminate variation in the protein pro-

ducts of genes. The multiple-isoform clusters contain an

average of four transcripts per unit. Only these clusters con-

tain data informative for our analysis. The high degree of

concordance we see between multiple-isoform clusters and

Entrez gene identifiers (fewer than 2% of these clusters fail

to map to a gene identifier) means that we can, for practical

purposes, consider these transcript clusters to represent genes

in the sense that the encoded isoforms share common genetic

information associated with a particular gene identifier.
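The split into single-isoform units and multiple-isoform clusters can be sketched as a simple grouping step. This is a minimal illustration, not the pipeline used in the study: the input format (pairs of cluster identifier and translated sequence), the function name `partition_clusters` and the toy sequences are all hypothetical, and the sketch assumes that two transcripts encode the same isoform only when their translated sequences are identical.

```python
from collections import defaultdict


def partition_clusters(coding_transcripts):
    """Split clusters into single-isoform units and multiple-isoform clusters.

    coding_transcripts: iterable of (cluster_id, protein_sequence) pairs,
    one per protein-coding transcript (hypothetical input format).
    """
    proteins_by_cluster = defaultdict(set)
    for cluster_id, protein_sequence in coding_transcripts:
        # Assumption: transcripts encode the same isoform only if the
        # translated sequences are identical strings.
        proteins_by_cluster[cluster_id].add(protein_sequence)

    single = {c for c, seqs in proteins_by_cluster.items() if len(seqs) == 1}
    multiple = {c for c, seqs in proteins_by_cluster.items() if len(seqs) > 1}
    return single, multiple


# Toy data: c1 has two transcripts encoding one protein, c2 encodes two.
toy = [("c1", "MKT"), ("c1", "MKT"), ("c2", "MAL"), ("c2", "MALQ"), ("c3", "MSS")]
single_units, multi_clusters = partition_clusters(toy)
```

Only the clusters returned in `multi_clusters` would carry information about isoform-level variation in the sense used here.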

Domain predictions

We generated Pfam domain predictions for all protein-coding

transcripts in our dataset; 4251 unique types of Pfam domain

were predicted. The most abundant domains were the

Cys2His2 Zinc finger (PF00096), the Protein kinase domain

(PF00069), and the WD40 repeat (PF00400) (Supplementary

Fig. 2A and B). For 72% of the single-isoform units no Pfam

domain was predicted (Fig. 1A), whereas when we examined the

longest protein-coding region from each multiple-isoform gene,

Fig. 1 Analysis of transcriptional units encoding either a single isoform

(single-isoform unit, or si Unit) or multiple protein-coding transcripts

(multiple-isoform cluster, or mi Cluster). (A) Pfam domain predictions

for clusters, showing the proportion with and without Pfam domain

predictions. (B) Protein–protein interactions (PPI) classified by the type

of gene that encodes each interacting protein (both si Units (s,s), both mi

Clusters (m,m), or one from each (m,s)). The pie chart depicts the

percentage of PPIs collected in this study according to this classification

(χ² = 1692.969, df = 2, P ≪ 0.001).

85% had at least one predicted Pfam domain. A chi-squared

test shows this domain distribution to be highly significant

(χ² = 9021.24, df = 3, p ≪ 0.0001). This contrasts with the

reported coverage of Pfam domains over human protein

sequences (∼72%; ref. 35), and suggests that these genes are enriched

with well-studied functions.

We further investigated the low level of domain prediction

observed in the single-isoform units, examining both sequence

length and conservation against orthologous mouse sequences

using SSEARCH.36 The average length of protein-coding

regions without domain prediction in this set (369 nt) was

one-third the length of regions with domain predictions (1014 nt).

Moreover, those protein-coding sequences with no predicted

domains showed low sequence identity when compared to a

presumptive ortholog in the mouse proteome (Supplementary

Table 1), averaging only 38% identity. In contrast, protein-

coding regions with no domain in multiple-isoform genes were

substantially longer on average (498 aa) and showed higher

average sequence identity to mouse (68%).

Variation of domain architecture

Here we are specifically interested in the use of domains likely to

be involved in protein interactions, which we refer to as Protein

Interaction Domains (PIDs). We define a PID as a Pfam domain

present in one of two structurally derived domain–domain inter-

action datasets, 3DID and iPfam (Materials and Methods). Of

11 890 multi-isoform genes that contain predicted Pfam domains

(Fig. 1A), we identified a subset of 8860 in which at least one

protein-coding transcript is predicted to contain at least one PID.

We compared the PID architecture of isoforms within each

of these transcript clusters (TUXs), defining five types of variation in PID

architecture: identical, conserved, subset, mutually exclusive and

totally removed (Table 1). Except for the identical types, we

did not consider non-PID domains. Genes with identical

structure show no variation in domain architecture. For all

the other categories of variation, we considered the PIDs

predicted in each protein-coding transcript as a multiset

(a set for which we consider the number, but not the order,

of items). Due to the presence of more than two isoforms in

many clusters, individual genes may show more than one kind

of variation.

The conserved type indicates that no variation exists

between a pair of isoforms which have the same multiset of

PIDs (although other isoforms from the gene may vary). The

subset type of variation is seen where the set of PIDs in one

isoform is a subset of the PIDs in another isoform. Mutually

exclusive variation is found where two isoforms each have a

PID not found in the other. Duplicated domains may appear

as mutually exclusive if a novel domain is present in the

isoform with a lower domain copy number (e.g. for isoforms

P and P′, where P = [A1, A2, C1] and P′ = [A1, B1, C1], A2 and

B1 are mutually exclusive). The last category, totally removed,

indicates that at least one protein-coding transcript has no

PIDs. A gene that does not have identical domain archi-

tectures (5612 in our dataset, see Table 1) may contain any

combination of kinds of variation, depending on the number

of isoforms and the diversity observed in the use of PIDs for

products of that gene. In this schema, repeats of a specific

domain are considered to be variation.
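The pairwise comparison described above can be sketched with multisets. This is an illustrative reading of the classification, not the authors' code: the function names are hypothetical, and the order in which the rules are applied (an empty PID multiset is reported as totally removed rather than as a subset) is an assumption made for the sketch. The identical category is not covered, because it is defined over all domains of all isoforms of a gene rather than over PIDs in a pair.

```python
from collections import Counter
from itertools import combinations


def classify_pair(pids_a, pids_b):
    """Classify PID variation between two isoforms of one gene.

    Each argument is a list of PID identifiers; order is ignored but
    copy number is kept, i.e. the lists are treated as multisets.
    """
    a, b = Counter(pids_a), Counter(pids_b)
    if a == b:
        return "conserved"
    if not a or not b:
        # Assumption: an isoform with no PIDs is reported as
        # 'totally removed' rather than as a trivial subset.
        return "totally removed"
    only_a, only_b = a - b, b - a  # multiset differences
    if only_a and only_b:
        return "mutually exclusive"
    return "subset"


def classify_gene(isoform_pids):
    """Collect every kind of variation seen across all isoform pairs."""
    return {classify_pair(x, y) for x, y in combinations(isoform_pids, 2)}


# The example from the text: P = [A, A, C] and P' = [A, B, C].
example = classify_pair(["A", "A", "C"], ["A", "B", "C"])
```

Because `classify_gene` returns a set, a gene with more than two isoforms can carry several kinds of variation at once, which matches the observation that categories are not exclusive at the gene level.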

We used this classification to determine how many genes

show the different kinds of variable domain usage (Table 1,

derived from data presented in Supplementary Data File 1).

Notably, we found that of 8860 genes, 40% do not demon-

strate any variation in the use of PIDs. The proteins produced

from these transcripts are likely to have the same domain-

mediated interaction potential, although their interactions

may be modulated in other ways, for example through post-

translational variation.

As seen in Table 1, 40% of the clusters also produce at least

one isoform that has completely lost all of the known PIDs

found in other isoforms produced from that gene (totally

removed). Unless they contain currently unknown PIDs, these

isoforms are unlikely to interact with other proteins, lacking as

they do the structural elements required to underpin potential

interactions. The next-most-common kind of variation is the

loss of some, but not all, PIDs from an isoform (subset); 31% of our mi-TUXs contain such isoforms. These proteins may

share some, but not all, interaction partners with isoforms

containing more PIDs. The least common category of varia-

tion, mutually exclusive, was observed for relatively few genes

(7%). Finally, 48% of TUXs have at least two protein iso-

forms for which there is no variation in the PID architecture,

although other isoforms from that TUX may vary (conserved).

Table 1 Patterns of variation in protein interaction domains for genes with variable transcriptional output affecting the coding regions of transcripts

Domain type | All domains (a) | Protein interaction domains (b) (across remaining 5612 genes with variable architectures)

Category of variation | Identical architectures | Conserved | Subset | Mutually exclusive | Totally removed

Graphical representation of pattern | (diagrams not reproduced)

Number of TUXs with pattern (total: 8860) | 3248 | 4282 | 2669 | 631 | 3562

(a) These identical domain architectures include PID and non-PID domains. All coding transcripts produced from these genes appear to generate

proteins with identical domain architectures. (b) Clusters can contain more than one kind of variation (Conserved (C), Subset (S), Mutually

exclusive (M), Totally removed (R)): 1413 genes have only one kind of variation (C = 320, S = 344, M = 14, R = 735), 3081 show two kinds

of variation (most common: CR = 1789, CS = 1043, other combinations total 249) and 1118 show three or more categories of variation

(most common CRS = 607).
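The counts in Table 1 can be turned into shares of the 8860 genes with a few lines of arithmetic. The sketch below only re-derives percentages and checks the footnote totals from the numbers printed in the table; the variable names are hypothetical and nothing here comes from the supplementary data.

```python
# Counts copied from Table 1; names are illustrative only.
TOTAL_GENES = 8860
counts = {
    "identical": 3248,
    "conserved": 4282,
    "subset": 2669,
    "mutually exclusive": 631,
    "totally removed": 3562,
}

# Share of all 8860 genes showing each pattern, in percent.
shares = {name: round(100 * n / TOTAL_GENES, 1) for name, n in counts.items()}

# Genes with variable architectures are those that are not identical.
variable_genes = TOTAL_GENES - counts["identical"]

# Footnote (b): genes with one, two, and three or more kinds of variation.
one_kind = 320 + 344 + 14 + 735
two_kinds = 1789 + 1043 + 249
three_or_more = 1118

# Genes without PID variation: identical, plus those with only Conserved pairs.
non_variable_share = round(100 * (counts["identical"] + 320) / TOTAL_GENES, 1)
```

Categories other than identical overlap, so the individual shares are not expected to sum to 100%.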

We also examined a set of 4196 human signalling genes

derived from the Gene Ontology database. Using Entrez Gene

Identifiers, we mapped 4151 of these genes to clusters in our

dataset, and determined that signalling genes were strongly

enriched for genes with variation in protein interaction

domains, and that reciprocally, variable genes are strongly

enriched for signalling function (χ² = 428.6, df = 3, p ≪ 0.0001). The full set of 8860 genes, with variation classified and

corresponding domain predictions is presented as Supplementary

Data 1.

Isoforms with altered interaction potential in a set of known

protein interactions

In order to determine the likely impact of these variations on

known protein–protein interactions (PPIs), we mapped our

dataset onto a collection of high-quality, experimentally

determined PPIs (see Materials and Methods). This process

mapped 28 309 interactions involving 8522 unique protein

isoforms. Genes that produce more than one distinct protein

encode the majority of the proteins involved in these PPIs.

Only 14% of proteins involved in our PPI dataset belong to

single-isoform units. As expected given these numbers, most

protein interactions (77%) occur between a pair of proteins

that are each encoded by a multiple-isoform gene. A substan-

tial proportion (20%) of interactions occur between a protein

from a multiple-isoform gene and another protein from a

single-isoform gene, whereas few protein interactions occur

between proteins encoded only by single-isoform genes (3%)

(Fig. 1B).

From this set, we identified 7805 PPIs in which the inter-

acting proteins together contain at least one pair of domains

known to interact (Supplementary Table 2). These interacting

protein pairs contain 1649 domain–domain interactions (DDIs), the vast majority of which

(73%) are found in only five or fewer PPIs. Only 2% of DDIs

occur in more than 50 of our mapped protein interactions

(the full table of DDIs is provided in Supplementary Data 2).

These 41 domain pairs include many signalling-related

domains (involving common domains such as SH2, SH3,

and kinase-related domains) that are abundant in the human

proteome (see Supplementary Fig. 2). The pattern for DDI

frequency within the PPI set is similar: 63% of the DDIs occur

five or fewer times in our PPI set, while 4% occur over 200 times.

The 69 most-frequently occurring DDIs include interactions

between known protein interaction domains (such as Immuno-

globulin I-set, Fibronectin type III and Spectrin domains).

Many PPIs contain more than one DDI, but half of the 7805 PPIs

contain only one identified DDI and the great majority (86%)

contain five or fewer.
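Mapping DDIs onto PPIs, and then counting how many PPIs each DDI supports, can be sketched as follows. This is a minimal illustration under stated assumptions: the data structures (a dictionary from protein to domain list, DDIs stored as unordered pairs) and all names and toy identifiers are hypothetical, and the real DDI sets come from 3DID and iPfam rather than from the toy set used here.

```python
from collections import Counter
from itertools import product


def supporting_ddis(domains_a, domains_b, known_ddis):
    """Return the known DDIs that could underpin a PPI between two proteins.

    known_ddis is a set of frozensets, so a pair is matched regardless of
    which protein carries which domain; a homotypic DDI is a 1-element set.
    """
    pairs = (frozenset((x, y)) for x, y in product(set(domains_a), set(domains_b)))
    return {pair for pair in pairs if pair in known_ddis}


def ddi_usage(ppis, protein_domains, known_ddis):
    """Count, for each DDI, the number of PPIs in which it is found."""
    usage = Counter()
    for a, b in ppis:
        usage.update(supporting_ddis(protein_domains[a], protein_domains[b], known_ddis))
    return usage


# Toy example with placeholder domain labels D1..D4.
protein_domains = {"p1": ["D1", "D2"], "p2": ["D3"], "p3": ["D3", "D4"]}
known = {frozenset(("D1", "D3")), frozenset(("D2", "D4"))}
ppis = [("p1", "p2"), ("p1", "p3"), ("p2", "p3")]
usage = ddi_usage(ppis, protein_domains, known)
rare = [ddi for ddi, n in usage.items() if n <= 5]  # DDIs seen in five or fewer PPIs
```

A PPI with an empty result from `supporting_ddis` would fall outside the reference set of interactions that have at least one candidate domain pair.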

Our results show that variable use of PIDs occurs in 60%

of transcript clusters for which we have informative data

(Table 1 – all clusters not identical are variable, with the

exception of 320 transcript clusters with only Conserved PID

multisets). We used the loss or gain of PIDs (as described in

the previous section) to infer the interactive potential of the

protein products of 3814 genes that map to the set of 7805

reference PPIs. With respect to variation in PID architecture,

we observe three classes of genes: (i) those with only one

apparent protein product, which are therefore not capable of

giving rise to functionally differentiated products (614); (ii) those

which generate multiple protein products with identical PID

architectures (1413); and (iii) those which generate multiple

protein products with variable PID architectures (1787, Table 2).

Isoforms of the first two classes (53%) have a static interaction

domain architecture that is unlikely to affect connectivity.

However, nearly half of the genes that map to our reference

PPI network have isoforms with variation in the PID archi-

tecture that may modulate network connectivity.

We examined the effect of this variation on the connectivity

of proteins from these 1787 genes (Table 2). By far the most

common form of variation we found was the loss of all PIDs

from an isoform, implying that 1287 genes produce at least

one protein isoform incapable of participating in known

protein–protein interactions. In the second scenario, the number

of occurrences of a given domain varies. Only a small number of

genes is affected by such variation (215) and it is generally

unclear what effect such variation might have on connectivity.

An example of this type of variation involves the domain repeat

WD40 (PF00400), where 4–16 WD40 repeats are known to form

a rigid scaffold for protein interactions. The outcome of an

alteration in the number of repeats is not obvious and it is not

clear if truncation of domain repeats affects a protein’s binding

potential. Finally, 644 genes showed a pattern of PID variation

in which loss of some domains changed interaction potential.

Below we explore examples of this kind of variation identified

through searches of the primary literature.

Table 2 Three possible scenarios in isoform interactions

Scenario | # of genes

Loss of whole DDI domain(s), no interaction | 1287

Variable members of DDI domains (repeats only), uncertain impact on interaction mechanism | 215

Variable use of DDI domain(s), modulation of connectivity | 644

Diagram: (not reproduced)

P1 and P1′ are isoforms, and P2 and P3 represent partners known to interact with P1. Rectangles and circles in each panel indicate the presence of

interaction domains. The solid line between proteins represents a known interaction and a dotted line indicates a predicted interaction based on DDI

existence in PPI. Genes may fall into more than one of the above categories.
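The three scenarios in Table 2 can be expressed as a small decision rule that compares a variant isoform with a fuller isoform of the same gene. This is one possible reading of the table, offered as a sketch: the function name, the scenario labels and the choice of a single reference isoform are assumptions, and the domain labels in the example are used only for illustration.

```python
from collections import Counter


def connectivity_scenario(reference_pids, variant_pids):
    """Assign a variant isoform to one of the Table 2 scenarios.

    reference_pids: PIDs of the fuller isoform (assumed reference).
    variant_pids:   PIDs of the isoform being compared against it.
    """
    ref, var = Counter(reference_pids), Counter(variant_pids)
    if ref == var:
        return "unchanged"
    if not var:
        # Scenario 1: no PID remains, so no domain-mediated interaction.
        return "loss of all PIDs"
    if set(ref) == set(var):
        # Scenario 2: same domain types, different copy numbers.
        return "repeat number only"
    # Scenario 3: some domain types gained or lost.
    return "variable PID use"


# WD40 (PF00400) repeat example from the text: fewer repeats in the variant.
wd40_case = connectivity_scenario(["PF00400"] * 7, ["PF00400"] * 4)
```

Applying the rule to every variant of a gene and collecting the distinct labels would reproduce the property noted under Table 2 that a gene can fall into more than one category.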

To characterise the variable genes (Table 2) more fully, we

performed functional enrichment clustering using the DAVID

Bioinformatics Database (see Materials and Methods). Clusters

with enrichment scores >10 are presented in full in Supplementary

Data 3. The most-significantly enriched cluster was, unsurprisingly,

related to protein binding and protein dimerisation (enrichment

score = 30.02, p value range: 1.5 × 10⁻³⁸–2.4 × 10⁻¹⁷).

The second most-enriched cluster was related to protein kinase

activity and phosphorylation (enrichment score = 27.46,

p value range: 5.3 × 10⁻⁴²–1.1 × 10⁻¹⁸), while the third cluster

was related to the regulation and induction of apoptosis and

programmed cell death (enrichment score = 24.34, p value

range: 5.6 × 10⁻³⁸–5.8 × 10⁻¹²). Other clusters with strong

enrichment scores and significant p values include (in decreas-

ing order of enrichment) associations with (i) cell junctions,

focal adhesion and the basolateral plasma membrane,

(ii) cell–cell adhesion, (iii) cell migration and motility,

(iv) regulation of kinase activity and MAPK signalling, and

(v) response to hormone and insulin stimulus. Reactome

pathways related to signalling, apoptosis, and cell surface

interactions were also enriched in this set (Supplementary

Data 3). These results are consistent with the strong enrich-

ment for signalling proteins in the whole variable dataset, and

also indicate that this subset at least contains a strong repre-

sentation of functions associated with cell adhesion and

motility, and apoptosis.

Validation and case studies

Literature searches for papers describing alternative splicing

and protein–protein interactions prioritised the review of fifty

articles, thirty of which met the selection criteria, that is,

they reported experimentally verified protein interactions with

isoform-level specificity. Of these papers, 17 described scenarios

in which alternative splicing changed protein interactions, or

had other significant effects likely to impact interactions, such

as changing the subcellular location of the protein. The effect

of alternative splicing on the 22 genes identified in these papers

is presented in Supplementary Table 3. We select two case

studies from this review (ADAM15 and NRP1), along with a

well-known signalling pathway (JAK-STAT pathway), to

illustrate the impact of changes in the PID architecture on

protein interactions. We also briefly highlight other examples in

which alternative splicing causes the disruption of protein–protein

interactions through other mechanisms (BRCA-1) or disrupts

protein-RNA interactions (RBM9).

Transcript variations alter protein interactions

ADAM15. The protein disintegrin and metalloproteinase

domain-containing protein 15 (ADAM15: ADA15_HUMAN)

is a single-pass type I membrane protein encoded by the

ADAM15 gene. The protein appears to localise largely to a

perinuclear compartment, where it may associate with the

trans-Golgi network or the late endosome.37 It has also been

shown to localise to adherens junctions in epithelial cells.38

Following standard Type I membrane protein topology, the

C-terminus of the protein remains in the cytoplasm.39 The

cytoplasmic part of ADAM15 isoforms is encoded by exons

18–23, and most ADAM15 protein isoforms contain

proline-rich SH3 ligand domains in this C-terminal region.40

Domain analysis of the N-terminal (lumenal or extracellular)

region of ADAM15 isoforms indicates that most conserve the

N-terminal Disintegrin (PF00200) and ADAM, cysteine-rich

(PF08516) domains, and the peptidase functionality (PF01421

and PF01562) expected of a peptidase involved in ectodomain

shedding.41,42

Alternative splicing causes the cytoplasmic SH3 ligand

domains encoded by exons 18–23 to be assembled in different

combinations across six different isoforms: five contain

domains in various combinations, whereas one isoform lacks

any cytoplasmic SH3-binding domains.40 Two of the SH3

ligand domain subtypes bind most SH3 domains in known

binding partners Src, Tks5 and Lyn amongst others, while an

alternative domain subtype predominantly binds SNX9 and

SNX33. Notably, the isoform lacking SH3 ligand domains in

the C-terminal region fails to bind any SH3-domain-containing

proteins.40 This clearly demonstrates that alternative splicing of

protein interaction domains, in this case SH3 ligand domains,

alters the ability of isoforms to bind SH3-containing signalling

proteins, while the loss of all SH3 ligand domains completely

removes the ability of this protein to transduce an intracellular signal

via SH3 signalling. This suggests a divergence in the functional

roles of the ADAM15 isoforms, allowing extracellular signalling

to stimulate alternative pathways of intracellular signalling, or

to completely decouple the external function of the molecule

(integrin binding and proteolysis) from signal transduction.

NRP1. Neuropilins are transmembrane glycoproteins with

large extracellular regions containing two CUB domains

(PF00431), two coagulation factor V/VIII homology domains,

and a MAM domain (PF00629).43,44 In addition they contain

a transmembrane domain and a short intracellular domain

with no clearly defined signalling function.45,46 The protein

neuropilin-1 (NRP1: NRP1_HUMAN) is encoded by the

NRP1 gene. To date, NRP1 is thought to function in

endothelial cells by enhancing vascular endothelial growth

factor (VEGF) binding to vascular endothelial growth factor

receptor 2 (KDR) and downstream signalling events, through

binding of NRP1 to VEGF and KDR.47 Isoform VEGF-165

promotes the formation of a complex of NRP1 and KDR in

endothelial cells, which is thought to be important for optimal

VEGF signalling and function.47 Moreover, the

MAM domain of NRP1 is required for interaction of NRP1

and KDR.48

Based on our computational analysis as well as previous

experimental results, neuropilin-1 (NRP1) interacts with KDR

through the DDI between the MAM and immunoglobulin-like

C2-type domains (PF05790). Of the two NRP1 isoforms,

isoform 1 is a membrane-bound receptor involved in the

development of the cardiovascular system, angiogenesis and

organogenesis outside the nervous system, whereas isoform 2 is a

soluble protein.49 The interaction of isoform 1 with KDR

leads to increased VEGF-165 binding to KDR as well as

increased chemotaxis. Alternative splicing removes the signal

peptide and transmembrane domains to generate a soluble

protein. The MAM domain is also excluded from isoform 2 by

alternative splicing, thus disrupting the DDI between MAM

and Ig-like C2-type domains. Consequently, the PPI between

NRP1 and KDR is lost.50
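The NRP1 case can be restated as a check of which known partners remain supported by at least one DDI for each isoform. The sketch below is illustrative only: the function name is hypothetical, the domain lists are simplified to the Pfam accessions quoted in the text (the coagulation factor V/VIII homology domains are omitted), and the single MAM to Ig-like C2-type DDI is taken from the description above rather than looked up in 3DID or iPfam.

```python
def retained_partners(isoform_domains, partner_domains, known_ddis):
    """Return the partners that still share a known DDI with this isoform."""
    kept = set()
    for partner, domains in partner_domains.items():
        if any(frozenset((x, y)) in known_ddis
               for x in isoform_domains for y in domains):
            kept.add(partner)
    return kept


# Pfam accessions as given in the text.
CUB, MAM, IG_C2 = "PF00431", "PF00629", "PF05790"

known_ddis = {frozenset((MAM, IG_C2))}   # DDI described for NRP1 and KDR
partners = {"KDR": [IG_C2]}              # simplified partner architecture

nrp1_isoform1 = [CUB, CUB, MAM]          # membrane-bound form (simplified)
nrp1_isoform2 = [CUB, CUB]               # soluble form, MAM excluded

kept_by_isoform1 = retained_partners(nrp1_isoform1, partners, known_ddis)
kept_by_isoform2 = retained_partners(nrp1_isoform2, partners, known_ddis)
```

Under these assumptions isoform 1 keeps KDR as a supported partner and isoform 2 does not, which is the rewiring effect the case study describes.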

Transcript variation affects the JAK-STAT signalling pathway.

JAK-STAT signalling represents a classical signal transduc-

tion pathway, and is in many respects well-studied and well-

understood. We mapped our dataset of genes with variable

protein interaction domain (PID) architectures onto the 155 genes

recognised by KEGG as components of the JAK-STAT

signalling pathway: first we retrieved Uniprot IDs for the

KEGG pathway components, yielding 152 matches, and then

we retrieved the genes encoding these proteins from our

dataset. Of the genes encoding pathway elements, 72% con-

tained one or more isoforms, and 43 produced alternative

isoforms that did not include these domains. Fig. 2 shows the

KEGG JAK-STAT pathway. We have outlined in red any

node associated with genes for which we found variable PID

architecture in the protein products.

Several well-known genes in the pathway highlight the

potential impact of these variations on signal transduction:

STAT1. Two isoforms of STAT1, α and β, are known.

These isoforms are identical except that isoform β lacks 38

C-terminal residues present in the α form. This region encodes

the STAT1 TAZ2 binding domain (PF12162), which selectively

binds the TAZ2 domain of the CREB-binding protein

CREBBP (p300), an acetyltransferase involved in transcriptional

regulation.51 STAT1β is unable to bind CREBBP.52

Loss of binding to CREBBP results in the failure of STAT1β

to activate transcription on chromatin templates, as normal

STAT1 transcriptional activation involves the CREBBP-mediated

acetylation of histones at the site of transcription.52

AKT1. There are three canonical transcripts for AKT1

recorded in Entrez Gene DB (GeneID 207), all of which

encode the same protein sequence. Our analysis identified an

additional four transcripts that encode additional putative

isoforms of the protein (Supplementary Data 1). Three distinct

PID architectures are seen for AKT1 products: the canonical

protein contains a pleckstrin homology (PH) domain (PF00169),

a protein kinase catalytic domain (PF00069) and a protein

kinase C-terminal domain (PF00433). Two isoforms retain the

kinase domains, but do not include the PH domain, raising the

possibility that there exist isoforms of AKT1 that retain kinase

function, but lose the ability to interact with other signalling

molecules that is usually mediated through the PH domain.

PTPN11. The non-receptor protein tyrosine phosphatase

PTPN11 (SHP-2) contains two SH2 domains in the N-terminal

region of the protein. It has been demonstrated that the

N-terminal SH2 domain binds Jak2, and is responsible for

the recruitment of Jak2 to the angiotensin II type AT1

receptor, whereas PTPN11 lacking this region is unable to

bind Jak2, and thus cannot recruit it to the receptor.53

Although this study was conducted using a cell line in which

a point mutation in PTPN11 results in aberrant splicing that

removes exon 3 and results in the deletion of the N-terminal

SH2 domain,54 we hypothesise that naturally occurring alter-

natively spliced isoforms that lack the SH2 domain would

demonstrate a similar uncoupling of downstream JAK-STAT

signalling from angiotensin II stimulation.

GRB2. The gene GRB2, which encodes the adaptor protein growth

factor receptor-bound protein 2, generates a canonical protein

product that contains an SH2 domain flanked by two SH3

domains. This protein acts to couple receptor signalling (by

binding to receptor tyrosine kinases via the SH2 domain) to

downstream signal transduction (binding signalling molecules

through the SH3 domains).55 Alternative splicing of GRB2

generates an isoform (known as GRB3-3) lacking a functional

SH2 domain, but retaining the flanking SH3 domains. The

GRB3-3 isoform of GRB2 has been noted to have a dominant

Fig. 2 The JAK-STAT signalling pathway (hsa04630) from KEGG. Nodes associated with genes that have variable PID architecture are

highlighted in red.

negative effect over canonical GRB2, causing apoptosis in an

over-expression system.56 It is generally expressed at very low

levels in adult tissue, and is often not detected. Experiments in

rat, however, detected the presence of this isoform spiking in

the rat hippocampus at the same time as a wave of pro-

grammed cell death responsible for neuronal pruning in that

region.56 GRB3-3 has also been observed to be selectively

up-regulated in HIV infected T-cells, where it appears to

promote an environment conducive to HIV replication

through a pathway unrelated to the normal signalling pathway

of the canonical GRB2 protein.57 Identification of a GRB3-3

specific binding partner, adenosine deaminase (ADA),58 which

is known to play a role in immunodeficiency, hints at a

potential mechanism for this effect, as well as pointing to the

possibility of GRB3-3-specific downstream signalling unrelated

to normal GRB2 signalling.

The strong representation of signalling proteins and

domains in our results, as illustrated by the JAK-STAT results

(1.5 times as many variable genes as expected, with a hyper-

geometric probability of 0.00073) suggests that the production

of proteins with tunable interaction potential generates

significant potential for plasticity in signalling networks, and

presents an important mechanism through which these key

information transduction networks can be rapidly modulated,

without disrupting the genomic encoding of core components

of the network, which remain expressed in canonical isoforms.

Alternative splicing changes the location of BRCA1, altering

its interaction profile. BRCA1 is a nuclear protein with a

molecular mass of 220 kDa. Defects in BRCA1 function have

been implicated as a cause of susceptibility to breast cancer,

breast-ovarian cancer familial type 1, and ovarian cancer.59,60

Experiments to identify the subcellular location of the BRCA1

isoform BRCA1Δ672-4095 have demonstrated that in contrast

to the full-length BRCA1, which is found primarily in the

nucleus, BRCA1Δ672-4095 is found in the cytoplasm.61

BRCA1Δ672-4095 is generated by exclusion of exon 11 by

in-frame splicing and produces a 97 kDa protein lacking a

functional nuclear localisation signal. Full-length BRCA1 protein

implements DNA-repair tasks by binding to FANCD1-BRCA2

and RAD51 in a nuclear complex.62 Isoforms missing exon 11,

however, are located only in the cytoplasm, and are therefore not

available to interact with BRCA1’s nuclear-localised binding

partners. This example illustrates how the splicing of a localisa-

tion signal resulting in the altered location of the protein within

the cell disrupts protein interactions that normally occur with

correctly localised binding partners, and renders the isoform

unavailable for its normal function.

Alternative splicing disrupts the protein–RNA interactions of

RBM9. RBM9 (Fox-2) is one member of the Fox protein family,

members of which control the alternative splicing of many

transcripts in neurons, muscle, and other tissues. RBM9

produces proteins with a single RNA-binding domain

(RRM) flanked by N- and C-terminal domains that are highly

diversified through the utilization of alternative promoters and

alternative splicing patterns.63–65 All the isoforms have calcitonin

gene-related peptide regulator C terminal domains (PF12414)

and RNA recognition motif 1 domains (RRM_1) (PF00076),

though the RRM_1 domain of isoform 3 is 36 aa shorter than

the canonical one (67 aa). Fox-induced splicing creates a Fox

isoform that lacks a proper RRM_1 domain, and thus will

not mediate splicing enhancement or repression through the

UGCAUG element. Instead, the intact N- and C-terminal

domains of this isoform can counteract the effect of full-length

Fox proteins in enhancing a Fox-dependent exon. As a

consequence, rather than the auto-regulated splicing reducing

the overall level of the protein, the new isoform directly

antagonizes Fox activity.66

Discussion

Quality of the datasets

The most-recent analysis comparable to the work we present

here was published in 2004, from assembled EST data, and

covered ~4500 genes.30 By contrast we considered ~26 K

transcript clusters, and found informative data on 13 704 genes.

Our analysis is based on full-length cDNAs, avoiding the

problems of 3′ and 5′ bias seen in EST-based studies.67 We

also filtered domain–domain interactions to ensure only highly

confident DDIs with evidence from eukaryotic structures were

included in our analysis. Because of the stringent criteria we

applied in filtering the DDI data, some, perhaps many, inter-

actions may have been omitted from our dataset. For example,

the disintegrin domain occurs in the 3DID dataset, but because

DDIs including it were not supported by evidence from

eukaryotic PDB structures, DDIs with this domain were not

included in our dataset.

Based on our analysis of the integrated transcript set, we

were initially surprised at the low rate of variation, compared

with recent estimates of transcript variability, that range from

68–92% of genes.68,69 However, higher estimates include

several categories of variability in the untranslated regions of

transcripts,68 only 20% of which is estimated to impact open

reading frames.70 Thus the majority of these events will not be

captured in our variable dataset. Additionally, these studies

consider transcripts mapped to genes, and not all of our

transcripts map to gene identifiers. If we remove clusters not

mapped to known genes, our rate of variability increases to

67%, well within range of some recent estimates. We suspect

that low-quality or dubious transcripts are concentrated in the

single-isoform units rather than in the multiple-isoform genes.

Of those clusters that fail to map to Entrez Gene identifiers,

97% are units producing only a single coding transcript. This,

along with low rates of domain prediction and the very short

average length of sequences that fail to generate domains (369 nt),

suggests that at least the uncharacterised transcriptional units in

the single-isoform set may (i) be of dubious quality and provenance,

(ii) be newly discovered genes, (iii) represent

transcripts of unknown function that do not necessarily encode

proteins or, more likely, (iv) simply represent short proteins, which

are typically not well studied or annotated.71 We also noted that

4% of Entrez Gene IDs in our data set map to more than one

transcript cluster. In most of the above cases, one cluster has

multiple isoforms and the other is a single-isoform unit,

suggesting that transcripts assigned to some genes via gene

identifiers do not completely cluster using our criteria.

We built our proteome dataset by examining the proteins

encoded by ORFs in our transcript dataset. If two transcripts

encoded the same protein sequence, the transcripts were

included in the cluster, but only one representative protein

sequence was included in the proteome set attached to that

gene. Single-nucleotide polymorphisms and minor sequence

variants that produce changes in the amino acid sequence are

included as distinct proteins in our dataset. In most cases, such

minor variation will not affect the domain composition or

function of the protein; however it is possible that variation at

critical points may produce a functionally distinct protein, as

seen, for example, in the human growth hormone receptor

GHR, in which many single-nucleotide polymorphisms have

been associated with diseases (NCBI OMIM database MIM

ID *600946). Inclusion of such sequences in our dataset

represents natural variations on the canonical human genome

sequence.
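The deduplication rule described above can be sketched in a few lines. This is a minimal illustration rather than the pipeline used in the study; the function name `build_proteome`, the data layout and the toy sequences are all hypothetical.

```python
def build_proteome(cluster_transcripts):
    """Keep one representative protein per distinct sequence in each cluster.

    cluster_transcripts maps a cluster ID to a list of
    (transcript_id, protein_sequence) pairs (a hypothetical layout).
    Proteins are deduplicated by exact sequence, so a variant that changes
    a single residue is kept as a distinct protein.
    """
    proteome = {}
    for cluster_id, transcripts in cluster_transcripts.items():
        # dict.fromkeys keeps the first occurrence and preserves order.
        unique = dict.fromkeys(protein for _, protein in transcripts)
        proteome[cluster_id] = list(unique)
    return proteome


# Toy input: two transcripts encode the same protein, one encodes a variant.
toy_clusters = {
    "cluster_1": [
        ("tx_a", "MKTAYIAKQR"),
        ("tx_b", "MKTAYIAKQR"),  # identical protein, not repeated
        ("tx_c", "MKTAYIAKQW"),  # single-residue variant, kept
    ]
}
proteome = build_proteome(toy_clusters)
```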

The DDI data used in this study were derived from high-

resolution structures deposited in the Protein Data Bank.72

We use the domain interaction datasets made available by

3DID and Pfam, without further curation. However, not all

the DDI pairs in the dataset may have a causative role in

protein interactions, but instead may owe their appearance in

the DDI data to protein interactions caused by other features.

Additionally, many domains known to have a role in signal

transduction (for example the SH2, SH3 and kinase domains

common in our dataset) may participate in transient, rather

than stable, protein interactions and may interact with peptide

regions rather than with structured domains. Although the

interaction target of such domains may not always be another

domain, their loss will nevertheless alter the ability of protein

isoforms to participate in transient signalling interactions, and

for this reason we consider the variable use of these domains to

be informative with respect to variation in the interactome.

Emerging theme of isoforms having opposing function

The computational results and examples presented here high-

light an interesting theme: protein isoforms frequently appear

to have opposing function. This trend has been noted for

enzymes, in which splicing events produce truncated versions

of the enzyme, lacking the active centre or deleting catalytic

domains, thus generating isoforms with a dominant negative

effect.73,74 A review of the functional impact of alternative

splicing75 describes alternatively spliced isoforms that display

a dominant negative effect with respect to interactions between

proteins and small ligands and nucleic acids, as well as in the

function of enzymes.

This trend is also seen in receptors. For example, in

membrane-bound receptors, alternative splicing of trans-

membrane domains can generate soluble isoforms which often

act as decoy receptors to decouple a signal from its trans-

membrane transduction. A classic example is the difference in

function of membrane-bound and soluble variants of the surface

antigen FAS (TNR6_HUMAN), for which the isoform integral

to the membrane promotes apoptosis, whereas an alternatively

spliced soluble isoform inhibits it.76 In another example, loss

of the ligand-binding domain (LBD) in androgen receptor

(AR) isoforms leads to constitutively active AR signalling,

uncoupled from ligand-binding activation (that is, signalling is

ON in the absence of ligand, whereas AR with a functional

LBD has signalling OFF in absence of ligand).77 The presence

of this constitutively active AR splice variant is known to be

a significant factor in development of hormone-refractory

prostate cancer.78

Unfortunately, most functional annotation (such as with

terms from the Gene Ontology) is aggregated at the gene level,

and very little transcript- or isoform-specific annotation is

available external to the literature or in machine-readable

format. Increasing appreciation for the importance of anno-

tating biological entities with greater resolution (i.e. at the

isoform-specific level, rather than at the gene or representative

protein level) will enable broader analysis of this phenomenon.

Variable use of other features also impacts interaction potential

of proteins

Other mechanisms through which alternative splicing or other

kinds of transcript variation may induce alterations in the

interactome are hinted at in the literature. Specifically, we

observe that variation in sequence features that encode locali-

sation signals (such as the NLS in BRCA1) can change the

location of the protein, resulting in the disruption of inter-

actions with otherwise co-located partners. A previous study

characterised extensive alteration in the use of other sequence

features (specifically signal peptides and transmembrane

domains) that determine localisation of mouse proteins.22

Given our understanding of the importance of co-localisation

for protein interactions,79 it is likely that many of these events,

which frequently result in dramatically different protein locali-

sation or topology with respect to the membrane, will have

consequences for the participation of those isoforms in molecular

interactions. There is also evidence that small sequence varia-

tions that do not disrupt domains may, nonetheless, have

potential to disrupt post-translational modifications. As many

binding proteins recognise specific post-translational modifi-

cations in target proteins (for example, PTPN11 binds Jak2 at

the phosphorylated Y201 residue,53 loss of which prevents

binding), such small variations may still alter interactions

between proteins.

Implications for the interactome and systems biology

There is evidence that the frequency of alternative splicing is

inversely proportional to gene or paralog copy number.80,81

This suggests that alternative splicing provides an alternative

to gene duplication and divergence as a source of protein

diversity. We also see a strong enrichment for signalling

pathways and function in our set of genes with variable PID

architecture, and a strong enrichment for variability in signal-

ling proteins. Together, these results suggest that the variation

at the level of the transcriptome we describe here presents a

more-rapid, context-specific mechanism to modulate the con-

nectivity of signalling systems than changes to the genome

itself. Other studies have also demonstrated that the inter-

actions of signalling networks evolve at a faster rate than

many other kinds of interactions82 highlighting the importance

of plasticity and robustness in these systems. We see examples of

isoforms which retain selective binding ability while decoupling

physical interactions with other proteins or complexes, or

alternate between signalling pathways (see the ADAM15 and

GRB2 examples above).

It is now known that the production of variable transcripts

can be regulated in a tissue- or developmental stage-specific

manner.83–86 For this reason, we hypothesise that modulation

of interaction networks by different protein isoforms will also

demonstrate tissue- or stage-specificity. Evidence in the litera-

ture supports this hypothesis. For example, FLT1 (VEGF

receptor-1) gene expresses a soluble, kinase-deficient isoform

in the murine cornea.87 This isoform functions as a ligand trap,

soaking up the growth factor ligands of FLT1 and preventing

the development of vasculature in the cornea. This demon-

strates how the dominant negative effect of isoforms generates

a functional result in a tissue-specific manner.

Since isoforms frequently exhibit very different interaction

capabilities, it is critical to identify specific isoforms in experi-

ments, and in the capture of literature-based metadata for PPI

databases. A shallow (one protein deep) and flattened (ignoring

the specificity introduced by various cell types representing

different tissues or developmental stages) interactome completely

fails to capture the complexity present in the full set of molecular

interactions supported by a proteome of >68 000 protein

isoforms, many with the capability for distinct and indepen-

dent function. The results we present here demonstrate that

protein interactions need to be analysed at the level of protein

isoforms, not of representative proteins or, worse, genes.

Increasingly, advanced-generation sequencing datasets will

capture organism-specific, tissue-specific, or even cell-type-

specific transcript sequences. Consideration of the impact of

transcript diversity will enable context-specific networks to be

derived from these data, as opposed to mapping data to a

shallowly annotated reference interactome.

Material and Methods

Transcriptome data

Two transcriptional datasets were integrated: transcrip-

tional units (TUs) from FANTOM3 (2004-10-17)2,88 and

H-Invitational clusters (HIXs) from H-Invitational DB

(H-InvDB) (2009-03-30).4,89 In our study, only HIXs and

TUs which are constructed based on the full-length cDNAs

and encode proteins were used. We identified transcriptional

units in our merged dataset using an internal transcriptional

unit index (TUX). Clusters were built using the following

shared attributes: genomic loci, mRNAs and assigned Entrez

GeneID (or gene identifier) annotated for each TU and HIX.

Then interacting proteins in the collected PPI set were mapped

to the corresponding cluster and isoform based on either an

exact sequence match to an isoform within a cluster or the

assigned gene identifier for the protein (Supplementary Table 4).

In the FANTOM dataset, if a TU is composed of more than

one transcription framework (a region containing commonly

expressed regions and sharing common transcriptional

features, described in ref. 2) the TU was split into separate

clusters based on the annotated transcriptional frameworks.2,90

Clusters from either set composed of chimeric genes were

discarded. To obtain a non-redundant protein set from each

HIX or TU, we used the isoform protein set (IPS)2 from

FANTOM3 and collected proteins annotated as an isoform

verified by full length cDNA in H-InvDB. Thus, if a HIX

or TU contained no protein-coding transcript, it was not

included in the merge process (depicted in Supplementary

Fig. 3) outlined in the steps below:

STEP (1) TUs in FANTOM3 were built based on the

human genome 17 (hg17) while HIXs in H-InvDB were

constructed based on the human genome 18 (hg18). For

consistency of locus annotation between datasets, we mapped

TUs from hg17 to hg18 by using ‘liftover’ in the UCSC

genome browser91 with the following options: minimum ratio

of bases that must remap (1.0), minimum chain size in target (0),

minimum hit size in query (0), allow multiple output regions

(NO) and min ratio of alignment blocks/exons that must

map (1). After the mapping process, we retained 21480 TUs

from FANTOM3 and 18513 HIXs from H-InvDB for inte-

gration (Supplementary Table 5).

STEP (2) We grouped HIXs and TUs according to their

orientation on the human genome: 20119 and 19874 (HIXs

and TUs in total) for the FORWARD and REVERSE strand

sets, respectively.

STEP (3) HIXs and TUs were merged when the genomic

locations of their transcripts overlapped in the same strand.

HIXs and TUs within a group were regrouped based on

mRNAs produced from each cluster member to distinguish

overlapped genes such as nested genes.92,93 For this regrouping,

we used GenBank accession numbers to identify each mRNA.

When HIXs or TUs within a cluster share at least one mRNA,

they were merged into a sub-cluster. For a cluster in which

there is no common mRNA, we used the gene identifier given

to each transcriptional unit as a proxy for shared transcripts.

Thus, if a transcriptional unit has the same gene identifier as

another transcriptional unit, they are merged. For example,

the cluster consisting of TU7908, HIX0023813 and TU30007

was separated into two groups based on shared transcripts:

HIX0023813 and TU30007 share AK130277 (cluster 2), while

TU7908 (cluster 1) has no common mRNA or gene identifier

with TU30007 or HIX0023813.

STEP (4) We reassessed our clusters by comparing the gene

identifiers of the transcripts in each cluster. If there were any

transcripts with different annotated gene identifiers, we discarded

those clusters from our study. We also examined redundant

isoforms presented in different clusters. If corresponding clusters

have different gene identifiers (GI), those clusters were removed from our dataset.
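The grouping logic of STEPs 2 and 3 can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: it merges two units in one pass when they overlap on the same strand and share either an mRNA accession or a gene identifier, whereas the text applies the gene identifier only when no mRNA is shared. The genomic coordinates, the chromosome label and the accessions marked `HYPOTHETICAL_*` are invented for the example; only the unit names and AK130277 come from the text.

```python
from collections import defaultdict


class UnionFind:
    """Minimal disjoint-set structure for merging unit IDs."""

    def __init__(self, items):
        self.parent = {item: item for item in items}

    def find(self, item):
        while self.parent[item] != item:
            self.parent[item] = self.parent[self.parent[item]]  # path halving
            item = self.parent[item]
        return item

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def overlaps(a, b):
    """True if two units share chromosome and strand and their spans overlap."""
    return (a["chrom"] == b["chrom"] and a["strand"] == b["strand"]
            and a["start"] <= b["end"] and b["start"] <= a["end"])


def cluster_units(units):
    """Group overlapping units that share an mRNA or a gene identifier."""
    uf = UnionFind(units)
    ids = sorted(units)
    for i, a in enumerate(ids):          # pairwise comparison, O(n^2)
        for b in ids[i + 1:]:
            ua, ub = units[a], units[b]
            if not overlaps(ua, ub):
                continue
            shared_mrna = bool(ua["mrnas"] & ub["mrnas"])
            same_gene = ua["gene_id"] is not None and ua["gene_id"] == ub["gene_id"]
            if shared_mrna or same_gene:
                uf.union(a, b)
    groups = defaultdict(set)
    for unit_id in ids:
        groups[uf.find(unit_id)].add(unit_id)
    return sorted(groups.values(), key=sorted)


# Coordinates and HYPOTHETICAL_* accessions are invented for illustration.
units = {
    "TU7908": {"chrom": "chrN", "strand": "+", "start": 100, "end": 900,
               "mrnas": {"HYPOTHETICAL_1"}, "gene_id": None},
    "HIX0023813": {"chrom": "chrN", "strand": "+", "start": 300, "end": 700,
                   "mrnas": {"AK130277"}, "gene_id": None},
    "TU30007": {"chrom": "chrN", "strand": "+", "start": 350, "end": 650,
                "mrnas": {"AK130277", "HYPOTHETICAL_2"}, "gene_id": None},
}
clusters = cluster_units(units)
```

With this toy input, HIX0023813 and TU30007 fall into one cluster through their shared mRNA, and TU7908 remains on its own even though it overlaps both, as in the example given in STEP 3.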

Protein and protein interaction data

Interaction databases and transcriptome data sources use

different standard systems for protein identifiers. We imple-

mented the CRC64 checksum algorithm94 for consistent

identification of protein sequences. All protein sequences in

interaction databases and the two isoform sets from H-InvDB and

FANTOM3 were assigned a 16-character alphanumeric checksum

using SPcrc

(ftp://ftp.ebi.ac.uk/pub/software/swissprot/Swissknife/old/SPcrc.tar.gz).
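To illustrate checksum-based identification, the sketch below computes a CRC-64 with the ISO 3309 generator polynomial (x^64 + x^4 + x^3 + x + 1) and prints it as 16 hexadecimal characters. It is a minimal Python sketch, not the SPcrc code, and agreement with SPcrc output is assumed rather than verified here; the function name `crc64` and the toy sequences are hypothetical.

```python
# Reflected representation of x^64 + x^4 + x^3 + x + 1 (ISO 3309).
POLY = 0xD800000000000000


def _build_table():
    """Precompute the CRC contribution of every possible byte value."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _build_table()


def crc64(sequence):
    """Return a 16-character hexadecimal CRC-64 of a protein sequence."""
    crc = 0
    for residue in sequence:
        crc = (crc >> 8) ^ _TABLE[(crc ^ ord(residue)) & 0xFF]
    return f"{crc:016X}"


# Identical sequences from different sources receive the same key,
# whatever identifier scheme the source database uses.
key_a = crc64("MKTAYIAKQR")
key_b = crc64("MKTAYIAKQR")
key_c = crc64("MKTAYIAKQW")  # single-residue difference gives another key
```

A checksum is not a proof of identity: two different sequences can in principle collide, so an exact string comparison is a sensible confirmation step when two keys match.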

We downloaded human protein–protein interaction (PPI)

sets obtained from six publicly available PPI databases: BIND

(2006-05-25),95 DIP (2008-10-14),10 HPRD (2009-06-07),96

IntAct (2009-08-05),9 MINT (2009-07-29)11 and MPPI

(2005).97 We considered only PPIs defined as binary, direct

physical interactions. We excluded indirectly predicted inter-

actions in which PPIs are predicted based on the SPOKE and

the MATRIX models98 because of the high false-positive (FP)

rate of those predicted interactions from protein complexes.99

Then, we merged PPIs based on sequence identity of interacting

proteins (see above Section 1).

Artificial conditions in high-throughput PPI detection methods

are known to generate a high proportion of false positive PPIs,

especially in yeast-two-hybrid screens (Y2H).100 PPIs discovered

through low-throughput screens (LTS) and supported by several

different experimental methods showed more-reliable interactions

than those identified in high-throughput screens (HTS).101 To

obtain a more-reliable PPI set, we filtered PPIs based on the

discovery methods and the lines of evidence, the scale of a

detection method and supporting literature:

1. PPIs generated from only the Y2H analysis were

removed from each PPI dataset due to the high FP rate;102

2. then, we integrated all of the six PPI datasets with

43104 PPIs and 10484 proteins (Supplementary Table 2); and

3. we retained a PPI in our dataset if the interaction has

been reported in more than one paper, or if a PPI was detected

in a small-scale experiment (one where the associated paper

describes fewer than 10 PPIs).

This process resulted in a reliable PPI dataset of 29761

interactions between 8872 proteins (Supplementary Table 2).
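The three filtering rules can be expressed as a single predicate over the evidence attached to each protein pair. The sketch below is illustrative only: the `Evidence` record, the method labels and the toy pairs are hypothetical, and it assumes that all publications supporting a pair count towards the "more than one paper" rule.

```python
from dataclasses import dataclass

SMALL_SCALE_MAX = 10  # a paper describing fewer than 10 PPIs is small-scale


@dataclass(frozen=True)
class Evidence:
    paper_id: str       # hypothetical publication identifier
    method: str         # hypothetical method label, e.g. "Y2H" or "coIP"
    ppis_in_paper: int  # number of PPIs described in that paper


def is_reliable(evidence):
    """Return True if a protein pair passes the filters described above."""
    if not evidence or all(e.method == "Y2H" for e in evidence):
        return False  # no support, or supported by Y2H only
    papers = {e.paper_id for e in evidence}
    small_scale = any(e.ppis_in_paper < SMALL_SCALE_MAX for e in evidence)
    return len(papers) > 1 or small_scale


candidates = {
    ("P1", "P2"): [Evidence("paper_1", "Y2H", 500)],             # Y2H only
    ("P1", "P3"): [Evidence("paper_2", "coIP", 3)],              # small-scale
    ("P2", "P3"): [Evidence("paper_3", "Y2H", 800),
                   Evidence("paper_4", "pull-down", 40)],        # two papers
    ("P3", "P4"): [Evidence("paper_5", "pull-down", 200)],       # one large screen
}
reliable = {pair for pair, ev in candidates.items() if is_reliable(ev)}
```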

We assigned an interacting protein in the PPI set to the

corresponding transcript cluster where that interacting protein

is encoded. For the assignment, the protein sequence and GI

were used. If an interacting protein matches to an isoform with

100 percent sequence identity over the entire length, this

protein was considered to be encoded by that transcript and

to belong to the corresponding gene. If no exactly matched

isoforms could be found, interacting proteins were mapped to

genes identified by the same gene identifier as the interacting

protein. This process mapped 28 309 interactions involving

8522 proteins from our dataset.
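The two-stage assignment (exact sequence match first, gene identifier as fallback) might look like the sketch below. The index structures, the function name `map_protein` and the toy entries are hypothetical; in practice the sequence key could be the checksum described earlier rather than the raw sequence.

```python
def map_protein(sequence, gene_id, isoform_index, gene_index):
    """Return (cluster_id, isoform_id), (cluster_id, None) or None.

    isoform_index maps a full-length protein sequence to its
    (cluster_id, isoform_id); gene_index maps a gene identifier to a
    cluster_id. Both layouts are assumptions made for this example.
    """
    hit = isoform_index.get(sequence)  # 100% identity over the entire length
    if hit is not None:
        return hit
    cluster_id = gene_index.get(gene_id)  # fall back to the gene identifier
    if cluster_id is not None:
        return (cluster_id, None)  # gene known, isoform unresolved
    return None  # protein cannot be placed in the dataset


isoform_index = {"MKTAYIAKQR": ("cluster_1", "isoform_1")}
gene_index = {"GENE_A": "cluster_1"}

exact = map_protein("MKTAYIAKQR", "GENE_A", isoform_index, gene_index)
by_gene = map_protein("MKTAYIAK", "GENE_A", isoform_index, gene_index)
unmapped = map_protein("AAAA", "GENE_Z", isoform_index, gene_index)
```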

Domain prediction and domain–domain interactions

To examine domain architecture for isoforms and interacting

proteins, we used Pfam–A (version 24.035). We used ‘pfam_scan.pl’

which is based on HMM3.0 beta 3103 with Pfam-A.hmm for

the profile-hidden Markov models. 54 630 out of 68748 protein

isoforms in our dataset have at least one Pfam domain.

We took domain–domain interactions (DDIs) from iPfam

(2007, Pfam 21)27 and 3DID (2010, Pfam 24).26 Both DDI sets

were generated based on known, high-resolution 3D structures

available in PDB.104 Only DDIs found in known 3D structures

of eukaryotes were collected, with 1962 DDIs contributed by

iPfam and 2642 DDIs contributed by 3DID to give a total of

3021 DDIs (Supplementary Fig. 4). For this study, we accept

DDIs that result from both inter- and intra-chain molecular

interactions. From 3DID, we used only DDIs with a Z-score ≥1.3

corresponding to a significance of over 90%.105 These conditions

are stringent, and generate a conservative list of DDIs.
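Two of the figures quoted above can be checked with simple arithmetic. The sketch assumes that the merged total counts each DDI once, so inclusion-exclusion gives the number of DDIs common to both sources, and that the quoted significance corresponds to a one-sided standard normal probability; neither assumption is stated explicitly in the text.

```python
from statistics import NormalDist

IPFAM_DDIS = 1962    # eukaryote-supported DDIs contributed by iPfam
THREEDID_DDIS = 2642  # eukaryote-supported DDIs contributed by 3DID
MERGED_DDIS = 3021   # total after merging

# Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|
shared_ddis = IPFAM_DDIS + THREEDID_DDIS - MERGED_DDIS

# One-sided standard normal probability below a Z-score threshold of 1.3
one_sided_level = NormalDist().cdf(1.3)
```

Under these assumptions the two sources share 1583 DDIs, and a Z-score of 1.3 corresponds to a level of roughly 0.90, consistent with the stated significance of over 90%.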

The impact of variation in transcript sequences on potential

isoform interactions was inferred with reference to our DDI

dataset and known protein interactions (Supplementary

Fig. 5). First, the integrated set of DDIs was mapped to our

set of high-quality human PPIs. We then identified interacting

protein pairs that contain at least one DDI, and examined the

isoforms of these reference proteins to see if they maintain

the relevant domains. If a pair of isoforms maintains one of

the DDIs found in the reference PPI, this pair of isoforms was

designated a potential isoform interaction.
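The inference rule can be sketched as a pair of set operations. This is an illustration under simplifying assumptions, not the authors' code: domain architectures are reduced to sets of labels, DDIs are unordered pairs, and the domain sets in the usage example are truncated to the domains relevant to the NRP1-KDR case discussed earlier (the `OTHER` label is a placeholder).

```python
def realised_ddis(domains_a, domains_b, ddis):
    """Return the DDIs that can form between two domain sets.

    Each DDI is an unordered pair given as a tuple (x, y).
    """
    return {(x, y) for x, y in ddis
            if (x in domains_a and y in domains_b)
            or (y in domains_a and x in domains_b)}


def potential_isoform_interaction(iso_a, iso_b, ref_a, ref_b, ddis):
    """True if the isoform pair keeps at least one DDI of the reference PPI."""
    reference = realised_ddis(ref_a, ref_b, ddis)
    return bool(realised_ddis(iso_a, iso_b, reference))


# Toy DDI set: MAM with the immunoglobulin-like C2-type domain (PF05790).
ddis = {("MAM", "PF05790")}
nrp1_reference = {"MAM", "OTHER"}  # reference protein (isoform 1)
nrp1_isoform_2 = {"OTHER"}         # MAM domain removed by splicing
kdr = {"PF05790"}

keeps = potential_isoform_interaction(nrp1_reference, kdr, nrp1_reference, kdr, ddis)
loses = potential_isoform_interaction(nrp1_isoform_2, kdr, nrp1_reference, kdr, ddis)
```

Restricting the second lookup to the DDIs found in the reference PPI means that an isoform pair is never credited with an interaction that the reference proteins could not have supported.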

We analysed the frequency of protein interaction domains

(PIDs) in isoforms and PPIs (Supplementary Data 2), counting

the number of hits for a given PID in isoforms. For the

abundance of DDI in PPIs, numbers of hits for a given DDI

in PPIs were counted. In both analyses, when a given domain

or DDI was repeated in an isoform or a PPI, only one hit was

counted.
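The counting rule (a repeated domain contributes one hit per isoform) amounts to counting over sets rather than lists. The sketch below is a minimal illustration; the function name and the toy architectures are hypothetical, although the Pfam accessions are ones mentioned elsewhere in this paper.

```python
from collections import Counter


def pid_frequency(isoform_domains):
    """Count, for each domain, the number of isoforms that contain it."""
    counts = Counter()
    for domains in isoform_domains.values():
        counts.update(set(domains))  # a repeated domain counts once
    return counts


toy_isoforms = {
    "isoform_1": ["PF00076", "PF00076", "PF12414"],  # PF00076 occurs twice
    "isoform_2": ["PF00076"],
}
frequency = pid_frequency(toy_isoforms)
```

The same function applies to DDIs in PPIs if each value is the list of DDIs found in one PPI.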

Statistical tests

Enrichment tests for the JAK-STAT pathway were based on

the hypergeometric distribution function, against the back-

ground of our complete dataset, in which genes were assigned

as either variable or not variable. For the JAK-STAT pathway

enrichment, 155 gene identifiers associated with the pathway

were downloaded from KEGG. Clusters were mapped to the

JAK-STAT set via Entrez Gene IDs. Five genes mapped to

two clusters (Gene IDs: 867, 2057, 3575, 10000, and 30837). In

these cases, we took the value (variable or not variable) of the

multi-isoform cluster mapped to the gene in preference to the

single-isoform unit. Four JAK-STAT genes were not mapped

to clusters in our dataset (Gene IDs: 1270, 3563, 10379 and

64109). In the absence of data, these were assigned as not

variable. All other enrichment tests were calculated using the

Chi-squared distribution.
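A hypergeometric enrichment test of this kind, together with the rules for genes that map to two clusters or to none, can be sketched as below. The function names are hypothetical and the usage example is a toy case, not the JAK-STAT counts; the sketch computes the upper-tail probability of observing at least `k` variable genes, which is assumed here to be the quantity reported.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) when drawing `draws` genes without replacement.

    population: genes in the background; successes: variable genes in the
    background; draws: genes in the pathway; k: variable genes in the pathway.
    """
    total = comb(population, draws)
    upper = min(draws, successes)
    favourable = sum(comb(successes, i) * comb(population - successes, draws - i)
                     for i in range(k, upper + 1))
    return favourable / total


def gene_is_variable(clusters):
    """Resolve a gene's value from its clusters.

    clusters is a list of (n_isoforms, is_variable) pairs. Multi-isoform
    clusters take precedence over single-isoform units, and a gene with no
    mapped cluster is treated as not variable.
    """
    if not clusters:
        return False
    multi = [variable for n_isoforms, variable in clusters if n_isoforms > 1]
    if multi:
        return any(multi)
    return any(variable for _, variable in clusters)


# Toy case: 10 genes, 4 variable, pathway of 3 genes, 2 of them variable.
p_toy = hypergeom_upper_tail(2, 10, 4, 3)  # (6*6 + 4*1) / 120 = 1/3
```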

The statistical significance of differences in length, and

identity of proteins from single-isoform units compared with

proteins from multiple-isoform clusters, were calculated using

Student’s T test. The analysis of the distribution of categories

of PPI in Fig. 1B used an expected frequency calculated from

the conditional probability of picking a pair of proteins

given the number of proteins in each category (P(m,m) = 0.66,

P(m,s) = 0.30, P(s,s) = 0.04). Pairs are undirected, so P(m,s)

includes P(m,s)=0.15 and P(s,m) = 0.15.
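The expected frequencies quoted above are consistent with drawing the two proteins of a pair independently. The sketch below assumes sampling with replacement and back-calculates the fraction of proteins from multiple-isoform clusters from P(m,m) = 0.66; both the assumption and the function name are ours, since the exact calculation is not spelled out in the text.

```python
from math import sqrt


def pair_category_probabilities(frac_multi):
    """Expected pair frequencies when partners are drawn independently.

    frac_multi is the fraction of proteins from multiple-isoform clusters.
    Pairs are undirected, so the mixed category covers both orders.
    """
    frac_single = 1.0 - frac_multi
    return {
        "m,m": frac_multi ** 2,
        "m,s": 2 * frac_multi * frac_single,
        "s,s": frac_single ** 2,
    }


# Fraction implied by P(m,m) = 0.66 under the independence assumption.
frac_multi = sqrt(0.66)
expected = pair_category_probabilities(frac_multi)
```

Under this assumption the mixed and single-single categories come out close to the reported 0.30 and 0.04 after rounding.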

Functional analysis of the variable genes mapped to PPIs

was conducted using the DAVID Bioinformatics Database

(http://david.abcc.ncifcrf.gov/). Entrez Gene IDs associated

with our variable multiple-isoform clusters were uploaded

as a query list, and compared to the whole human genome

background. Gene ontology categories Biological Process,

Molecular Function and Cellular component were collectively

interrogated using the functional annotation clustering

method. DAVID pathway analysis was conducted against

annotation from the Reactome Pathway Database. The p values

we report from DAVID control for multiple hypothesis testing

using the Benjamini-Hochberg correction. Results of these

enrichment tests are available in Supplementary Data File 3.

Signalling proteins were identified based on annotation in

the Gene Ontology Database, where 7053 gene products are

annotated with the term GO:0023052 Signalling, or annotated

with children of this term. We collected these proteins from

the human gene association file (September 2011). Uniprot

entries for the Uniprot IDs in this file were downloaded, and

Entrez Gene IDs and HGNC IDs were extracted from the

entries. If Entrez Gene IDs were not present in the Uniprot

protein entry, the HGNC ID was used to retrieve Entrez Gene

IDs from the HGNC database. The 571 proteins for which we

were unable to retrieve an Entrez Gene ID were omitted from

the analysis. These lists were converted to a non-redundant set

of Entrez Gene IDs, which were mapped onto our dataset to

assign a value of either variable or not variable.

Acknowledgements

The authors acknowledge the computational infrastructure

provided by the Queensland Facility for Advanced Bioinformatics,

which is supported by a Queensland Government Smart State

grant. The authors also thank Chang Liu for his contribution

to the STAT1 case study, and Dr Stefan Maetschke for

technical advice. MJD and MAR are funded by the Australian

Research Council [grant number DP110103384].

References

1 G. W. Beadle and E. L. Tatum, Proc. Natl. Acad. Sci. U. S. A., 1941, 27, 499–506.

2 P. Carninci, T. Kasukawa, S. Katayama, J. Gough, M. C. Frith, N. Maeda, R. Oyama, T. Ravasi, B. Lenhard, C. Wells, R. Kodzius, K. Shimokawa, V. B. Bajic, S. E. Brenner, S. Batalov, A. R. Forrest, M. Zavolan, M. J. Davis, L. G. Wilming, V. Aidinis, J. E. Allen, A. Ambesi-Impiombato, R. Apweiler, R. N. Aturaliya, T. L. Bailey, M. Bansal, L. Baxter, K. W. Beisel, T. Bersano, H. Bono, A. M. Chalk, K. P. Chiu, V. Choudhary, A. Christoffels, D. R. Clutterbuck, M. L. Crowe, E. Dalla, B. P. Dalrymple, B. de Bono, G. Della Gatta, D. di Bernardo, T. Down, P. Engstrom, M. Fagiolini, G. Faulkner, C. F. Fletcher, T. Fukushima, M. Furuno, S. Futaki, M. Gariboldi, P. Georgii-Hemming, T. R. Gingeras, T. Gojobori, R. E. Green, S. Gustincich, M. Harbers, Y. Hayashi, T. K. Hensch, N. Hirokawa, D. Hill, L. Huminiecki, M. Iacono, K. Ikeo, A. Iwama, T. Ishikawa, M. Jakt, A. Kanapin, M. Katoh, Y. Kawasawa, J. Kelso, H. Kitamura, H. Kitano, G. Kollias, S. P. Krishnan, A. Kruger, S. K. Kummerfeld, I. V. Kurochkin, L. F. Lareau, D. Lazarevic, L. Lipovich, J. Liu, S. Liuni, S. McWilliam, M. Madan Babu, M. Madera, L. Marchionni, H. Matsuda, S. Matsuzawa, H. Miki, F. Mignone, S. Miyake, K. Morris, S. Mottagui-Tabar, N. Mulder, N. Nakano, H. Nakauchi, P. Ng, R. Nilsson, S. Nishiguchi, S. Nishikawa, F. Nori, O. Ohara, Y. Okazaki, V. Orlando, K. C. Pang, W. J. Pavan, G. Pavesi, G. Pesole, N. Petrovsky, S. Piazza, J. Reed, J. F. Reid, B. Z. Ring, M. Ringwald, B. Rost, Y. Ruan, S. L. Salzberg, A. Sandelin, C. Schneider, C. Schonbach, K. Sekiguchi, C. A. Semple, S. Seno, L. Sessa, Y. Sheng, Y. Shibata, H. Shimada, K. Shimada, D. Silva, B. Sinclair, S. Sperling, E. Stupka, K. Sugiura, R. Sultana, Y. Takenaka, K. Taki, K. Tammoja, S. L. Tan, S. Tang, M. S. Taylor, J. Tegner, S. A. Teichmann, H. R. Ueda, E. van Nimwegen, R. Verardo, C. L. Wei, K. Yagi, H. Yamanishi, E. Zabarovsky, S. Zhu, A. Zimmer, W. Hide, C. Bult, S. M. Grimmond, R. D. Teasdale, E. T. Liu, V. Brusic, J. Quackenbush, C. Wahlestedt, J. S. Mattick, D. A. Hume, C. Kai, D. Sasaki, Y. Tomaru, S. Fukuda, M. Kanamori-Katayama, M. Suzuki, J. Aoki, T. Arakawa, J. Iida, K. Imamura, M. Itoh, T. Kato, H. Kawaji, N. Kawagashira, T. Kawashima, M. Kojima, S. Kondo, H. Konno, K. Nakano, N. Ninomiya, T. Nishio, M. Okada, C. Plessy, K. Shibata, T. Shiraki, S. Suzuki, M. Tagami, K. Waki, A. Watahiki, Y. Okamura-Oho, H. Suzuki, J. Kawai and Y. Hayashizaki, Science, 2005, 309, 1559–1563.

3 Y. Hayashizaki and P. Carninci, PLoS Genet., 2006, 2, e63.

4 T. Imanishi, T. Itoh, Y. Suzuki, C. O’Donovan, S. Fukuchi, K. O. Koyanagi, R. A. Barrero, T. Tamura, Y. Yamaguchi-Kabata, M. Tanino, K. Yura, S. Miyazaki, K. Ikeo, K. Homma, A. Kasprzyk, T. Nishikawa, M. Hirakawa, J. Thierry-Mieg, D. Thierry-Mieg, J. Ashurst, L. Jia, M. Nakao, M. A. Thomas, N. Mulder, Y. Karavidopoulou, L. Jin, S. Kim, T. Yasuda, B. Lenhard, E. Eveno, Y. Suzuki, C. Yamasaki, J. Takeda, C. Gough, P. Hilton, Y. Fujii, H. Sakai, S. Tanaka, C. Amid, M. Bellgard, F. Bonaldo Mde, H. Bono, S. K. Bromberg, A. J. Brookes, E. Bruford, P. Carninci, C. Chelala, C. Couillault, S. J. de Souza, M. A. Debily, M. D. Devignes, I. Dubchak, T. Endo, A. Estreicher, E. Eyras, K. Fukami-Kobayashi, G. R. Gopinath, E. Graudens, Y. Hahn, M. Han, Z. G. Han, K. Hanada, H. Hanaoka, E. Harada, K. Hashimoto, U. Hinz, M. Hirai, T. Hishiki, I. Hopkinson, S. Imbeaud, H. Inoko, A. Kanapin, Y. Kaneko, T. Kasukawa, J. Kelso, P. Kersey, R. Kikuno, K. Kimura, B. Korn, V. Kuryshev, I. Makalowska, T. Makino, S. Mano, R. Mariage-Samson, J. Mashima, H. Matsuda, H. W. Mewes, S. Minoshima, K. Nagai, H. Nagasaki, N. Nagata, R. Nigam, O. Ogasawara, O. Ohara, M. Ohtsubo, N. Okada, T. Okido, S. Oota, M. Ota, T. Ota, T. Otsuki, D. Piatier-Tonneau, A. Poustka, S. X. Ren, N. Saitou, K. Sakai, S. Sakamoto, R. Sakate, I. Schupp, F. Servant, S. Sherry, R. Shiba, N. Shimizu, M. Shimoyama, A. J. Simpson, B. Soares, C. Steward, M. Suwa, M. Suzuki, A. Takahashi, G. Tamiya, H. Tanaka, T. Taylor, J. D. Terwilliger, P. Unneberg, V. Veeramachaneni, S. Watanabe, L. Wilming, N. Yasuda, H. S. Yoo, M. Stodolsky, W. Makalowski, M. Go, K. Nakai, T. Takagi, M. Kanehisa, Y. Sakaki, J. Quackenbush, Y. Okazaki, Y. Hayashizaki, W. Hide, R. Chakraborty, K. Nishikawa, H. Sugawara, Y. Tateno, Z. Chen, M. Oishi, P. Tonellato, R. Apweiler, K. Okubo, L. Wagner, S. Wiemann, R. L. Strausberg, T. Isogai, C. Auffray, N. Nomura, T. Gojobori and S. Sugano, PLoS Biol., 2004, 2, e162.

5 A. J. Matlin, F. Clark and C. W. Smith, Nat. Rev. Mol. Cell Biol., 2005, 6, 386–398.

6 B. Modrek and C. Lee, Nat. Genet., 2002, 30, 13–19.

7 B. R. Graveley, Trends Genet., 2001, 17, 100–107.

8 Y. Hasegawa and Y. Hayashizaki, in Introduction to Systems Biology, ed. S. Choi, Humana Press, Totowa, 2007, pp. 85–105.

9 S. Kerrien, Y. Alam-Faruque, B. Aranda, I. Bancarz, A. Bridge, C. Derow, E. Dimmer, M. Feuermann, A. Friedrichsen, R. Huntley, C. Kohler, J. Khadake, C. Leroy, A. Liban, C. Lieftink, L. Montecchi-Palazzi, S. Orchard, J. Risse, K. Robbe, B. Roechert, D. Thorneycroft, Y. Zhang, R. Apweiler and H. Hermjakob, Nucleic Acids Res., 2007, 35, D561–D565.

10 L. Salwinski, C. S. Miller, A. J. Smith, F. K. Pettit, J. U. Bowie and D. Eisenberg, Nucleic Acids Res., 2004, 32, D449–D451.

11 A. Chatr-aryamontri, A. Ceol, L. M. Palazzi, G. Nardelli, M. V. Schneider, L. Castagnoli and G. Cesareni, Nucleic Acids Res., 2007, 35, D572–D574.

12 D. Szklarczyk, A. Franceschini, M. Kuhn, M. Simonovic, A. Roth, P. Minguez, T. Doerks, M. Stark, J. Muller, P. Bork, L. J. Jensen and C. von Mering, Nucleic Acids Res., 2011, 39, D561–D568.

13 C. Prieto and J. De Las Rivas, Nucleic Acids Res., 2006, 34, W298–W302.

14 A. Kamburov, K. Pentchev, H. Galicka, C. Wierling, H. Lehrach and R. Herwig, Nucleic Acids Res., 2011, 39, D712–D717.

15 R. M. Kaake, T. Milenkovic, N. Przulj, P. Kaiser and L. Huang, J. Proteome Res., 9, 2016–2029.

16 A. J. Lusis and J. N. Weiss, Circulation, 2010, 121, 157–170.

17 B. Lehne and T. Schlitt, Hum. Genomics, 2009, 3, 291–297.

18 A. Chatr-aryamontri, S. Kerrien, J. Khadake, S. Orchard, A. Ceol, L. Licata, L. Castagnoli, S. Costa, C. Derow, R. Huntley, B. Aranda, C. Leroy, D. Thorneycroft, R. Apweiler, G. Cesareni and H. Hermjakob, Genome Biol., 2008, 9(Suppl 2), S5.

19 S. Mathivanan, B. Periaswamy, T. K. Gandhi, K. Kandasamy, S. Suresh, R. Mohmood, Y. L. Ramachandra and A. Pandey, BMC Bioinformatics, 2006, 7(Suppl 5), S19.

20 E. V. Kriventseva, I. Koch, R. Apweiler, M. Vingron, P. Bork, M. S. Gelfand and S. Sunyaev, Trends Genet., 2003, 19, 124–128.

21 M. Lorenz, B. Hewing, J. Hui, A. Zepp, G. Baumann, A. Bindereif, V. Stangl and K. Stangl, FASEB J., 2007, 21, 1556–1564.

22 M. J. Davis, K. A. Hanson, F. Clark, J. L. Fink, F. Zhang, T. Kasukawa, C. Kai, J. Kawai, P. Carninci, Y. Hayashizaki and R. D. Teasdale, PLoS Genet., 2006, 2, e46.

23 T. Pawson and P. Nash, Science, 2003, 300, 445–452.

24 P. Aloy and R. B. Russell, FEBS Lett., 2005, 579, 1854–1858.

25 T. Pawson and P. Nash, Genes Dev., 2000, 14, 1027–1047.

26 A. Stein, A. Panjkovich and P. Aloy, Nucleic Acids Res., 2009, 37, D300–D304.

27 R. D. Finn, M. Marshall and A. Bateman, Bioinformatics, 2005, 21, 410–412.

28 M. Deng, S. Mehta, F. Sun and T. Chen, Genome Res., 2002, 12, 1540–1548.

29 R. Riley, C. Lee, C. Sabatti and D. Eisenberg, Genome Biol., 2005, 6, R89.

30 A. Resch, Y. Xing, B. Modrek, M. Gorlick, R. Riley and C. Lee, J. Proteome Res., 2004, 3, 76–83.

31 K. Yura, M. Shionyu, K. Hagino, A. Hijikata, Y. Hirashima, T. Nakahara, T. Eguchi, K. Shinoda, A. Yamaguchi, K. Takahashi, T. Itoh, T. Imanishi, T. Gojobori and M. Go, Gene, 2006, 380, 63–71.

32 P. R. Romero, S. Zaidi, Y. Y. Fang, V. N. Uversky, P. Radivojac, C. J. Oldfield, M. S. Cortese, M. Sickmeier, T. LeGall, Z. Obradovic and A. K. Dunker, Proc. Natl. Acad. Sci. U. S. A., 2006, 103, 8390–8395.

33 A. Valletti, A. Anselmo, M. Mangiulli, I. Boria, F. Mignone, G. Merla, V. D’Angelo, A. Tullo, E. Sbisa, A. D’Erchia and G. Pesole, Mol. Cancer, 2010, 9, 230.

34 J. Kawai, A. Shinagawa, K. Shibata, M. Yoshino, M. Itoh, Y. Ishii, T. Arakawa, A. Hara, Y. Fukunishi, H. Konno, J. Adachi, S. Fukuda, K. Aizawa, M. Izawa, K. Nishi, H. Kiyosawa, S. Kondo, I. Yamanaka, T. Saito, Y. Okazaki, T. Gojobori, H. Bono, T. Kasukawa, R. Saito, K. Kadota, H. Matsuda, M. Ashburner, S. Batalov, T. Casavant, W. Fleischmann, T. Gaasterland, C. Gissi, B. King, H. Kochiwa, P. Kuehl, S. Lewis, Y. Matsuo, I. Nikaido, G. Pesole, J. Quackenbush, L. M. Schriml, F. Staubli, R. Suzuki, M. Tomita, L. Wagner, T. Washio, K. Sakai, T. Okido, M. Furuno, H. Aono, R. Baldarelli, G. Barsh, J. Blake, D. Boffelli, N. Bojunga, P. Carninci, M. F. de Bonaldo, M. J. Brownstein, C. Bult, C. Fletcher, M. Fujita, M. Gariboldi, S. Gustincich, D. Hill, M. Hofmann, D. A. Hume, M. Kamiya, N. H. Lee, P. Lyons, L. Marchionni, J. Mashima, J. Mazzarelli, P. Mombaerts, P. Nordone, B. Ring, M. Ringwald, I. Rodriguez, N. Sakamoto, H. Sasaki, K. Sato, C. Schonbach, T. Seya, Y. Shibata, K. F. Storch, H. Suzuki, K. Toyo-oka, K. H. Wang, C. Weitz, C. Whittaker, L. Wilming, A. Wynshaw-Boris, K. Yoshida, Y. Hasegawa, H. Kawaji, S. Kohtsuki and Y. Hayashizaki, Nature, 2001, 409, 685–690.

35 R. D. Finn, J. Mistry, J. Tate, P. Coggill, A. Heger, J. E. Pollington, O. L. Gavin, P. Gunasekaran, G. Ceric, K. Forslund, L. Holm, E. L. Sonnhammer, S. R. Eddy and A. Bateman, Nucleic Acids Res., 2010, 38, D211–D222.

36 W. R. Pearson, Methods Mol. Biol., 2000, 132, 185–219.

37 L. Lum, M. S. Reid and C. P. Blobel, J. Biol. Chem., 1998, 273, 26236–26247.

38 C. Ham, B. Levkau, E. W. Raines and B. Herren, Exp. Cell Res., 2002, 279, 239–247.

39 M. Higy, T. Junne and M. Spiess, Biochemistry, 2004, 43, 12716–12722.

40 I. Kleino, R. M. Ortiz, M. Yritys, A.-P. J. Huovila and K. Saksela, J. Cell. Biochem., 2009, 108, 877–885.

41 D. R. Dries and G. Yu, Proc. Natl. Acad. Sci. U. S. A., 2009, 106, 14737–14738.

42 P. D. Sbarba and E. Rovida, Biol. Chem., 2002, 383, 69–83.

43 B. Herzog, C. Pellet-Many, G. Britton, B. Hartzoulakis and I. C. Zachary, Mol. Biol. Cell, 2011, 22, 2766–2776.

44 C. Pellet-Many, P. Frankel, H. Jia and I. Zachary, Biochem. J., 2008, 411, 211–226.

45 H. Fujisawa and T. Kitsukawa, Curr. Opin. Neurobiol., 1998, 8, 587–592.

46 M. Rossignol, M. L. Gagnon and M. Klagsbrun, Genomics, 2000, 70, 211–222.

47 S. Soker, H. Q. Miao, M. Nomi, S. Takashima and M. Klagsbrun, J. Cell. Biochem., 2002, 85, 357–368.

48 C. Prahst, M. Heroult, A. A. Lanahan, N. Uziel, O. Kessler, N. Shraga-Heled, M. Simons, G. Neufeld and H. G. Augustin, J. Biol. Chem., 2008, 283, 25110–25114.

49 B. A. Appleton, P. Wu, J. Maloney, J. Yin, W. C. Liang, S. Stawicki, K. Mortara, K. K. Bowman, J. M. Elliott, W. Desmarais, J. F. Bazan, A. Bagri, M. Tessier-Lavigne, A. W. Koch, Y. Wu, R. J. Watts and C. Wiesmann, EMBO J., 2007, 26, 4902–4912.

50 M. L. Gagnon, D. R. Bielenberg, Z. Gechtman, H. Q. Miao, S. Takashima, S. Soker and M. Klagsbrun, Proc. Natl. Acad. Sci. U. S. A., 2000, 97, 2573–2578.

51 S. Biethahn, F. Alves, S. Wilde and W. Hiddemann, Exp. Hematol., 1999, 27, 885–894.

52 N. Zakharova, E. S. Lymar, E. Yang, S. Malik, J. J. Zhang, R. G. Roeder and J. E. Darnell, J. Biol. Chem., 2003, 278, 43067–43073.

53 M. D. Godeny, J. Sayyah, D. VonDerLinden, M. Johns, D. A. Ostrov, J. Caldwell-Busby and P. P. Sayeski, Cell. Signalling, 2007, 19, 600–609.

54 T. M. Saxton, M. Henkemeyer, S. Gasca, R. Shen, D. J. Rossi, F. Shalaby, G.-S. Feng and T. Pawson, EMBO J., 1997, 16, 2352–2364.

55 E. J. Lowenstein, R. J. Daly, A. G. Batzer, W. Li, B. Margolis, R. Lammers, A. Ullrich, E. Y. Skolnik, D. Bar-Sagi and J. Schlessinger, Cell, 1992, 70, 431–442.

56 I. Fath, F. Schweighoffer, I. Rey, M.-C. Multon, J. Boiziau, M. Duchesne and B. Tocque, Science, 1994, 264, 971–974.

57 X. Li, M.-C. Multon, Y. Henin, F. Schweighoffer, C. Venot, J. Josef, C. Zhou, J. LaVecchio, P. Stuckert, M. Raab, A. Mhashilkar, B. Tocque and W. A. Marasco, J. Biol. Chem., 2000, 275, 30925–30933.

58 F. Ramos-Morales, A. Domínguez, R. M. Rios, S. I. Barroso, C. Infante, F. Schweighoffer, B. Tocque, J. Pintor-Toro and M. Tortolero, Biochem. Biophys. Res. Commun., 1997, 237, 735–740.

59 S. Cantor, D. Bell, S. Ganesan, E. Kass, R. Drapkin, S. Grossman, D. Wahrer, D. Sgroi, W. Lane and D. Haber, Cell, 2001, 105, 149–160.

60 F. Durocher, D. Shattuck-Eidens, M. McClure, F. Labrie, M. Skolnick, D. Goldgar and J. Simard, Hum. Mol. Genet., 1996, 5, 835.

61 S. Thakur, H. B. Zhang, Y. Peng, H. Le, B. Carroll, T. Ward, J. Yao, L. M. Farid, F. J. Couch, R. B. Wilson and B. L. Weber, Mol. Cell. Biol., 1997, 17, 444–452.

62 E. Witt and A. Ashworth, Science, 2002, 297, 534.

63 G. W. Yeo, E. Van Nostrand, D. Holste, T. Poggio and C. B. Burge, Proc. Natl. Acad. Sci. U. S. A., 2005, 102, 2850–2855.

64 J. L. Ponthier, C. Schluepen, W. Chen, R. A. Lersch, S. L. Gee, V. C. Hou, A. J. Lo, S. A. Short, J. A. Chasis, J. C. Winkelmann and J. G. Conboy, J. Biol. Chem., 2006, 281, 12468–12474.

65 A. P. Baraniak, J. R. Chen and M. A. Garcia-Blanco, Mol. Cell. Biol., 2006, 26, 1209–1222.

66 A. Damianov and D. L. Black, RNA, 16, 405–416.

67 B. J. Blencowe, Cell, 2006, 126, 37–47.

68 E. T. Wang, R. Sandberg, S. Luo, I. Khrebtukova, L. Zhang, C. Mayr, S. F. Kingsmore, G. P. Schroth and C. B. Burge, Nature, 2008, 456, 470–476.

69 G. Koscielny, V. Le Texier, C. Gopalakrishnan, V. Kumanduri, J. J. Riethoven, F. Nardone, E. Stanley, C. Fallsehr, O. Hofmann, M. Kull, E. Harrington, S. Boue, E. Eyras, M. Plass, F. Lopez, W. Ritchie, V. Moucadel, T. Ara, H. Pospisil, A. Herrmann, G. R. J. R. Guigo, P. Bork, M. K. Doeberitz, J. Vilo, W. Hide, R. Apweiler, T. A. Thanaraj and D. Gautheret, Genomics, 2009, 93, 213–220.

70 F. Denoeud, P. Kapranov, C. Ucla, A. Frankish, R. Castelo, J. Drenkow, J. Largarde, T. Alioto, C. Manzano, J. Chrast, S. Dike, C. Wyss, C. Henrichsen, N. Holroyd, M. Dickson, R. Taylor, Z. Hance, S. Foissac, R. Myers, J. Rogers, T. Hubbard, J. Harrow, R. Guigo, T. Gingeras, S. Antonarakis and A. Reymond, Genome Res., 2007, 17, 746–759.

71 M. C. Frith, A. R. Forrest, E. Nourbakhsh, K. C. Pang, C. Kai, J. Kawai, P. Carninci, Y. Hayashizaki, T. L. Bailey and S. M. Grimmond, PLoS Genet., 2006, 2, e52.

72 A. Kouranov, L. Xie, J. de la Cruz, L. Chen, J. Westbrook, P. Bourne and H. Berman, Nucleic Acids Res., 2006, 34, 302–305.

73 S. Li and A. E. Koromilas, J. Biol. Chem., 2001, 276, 13881–13890.

74 Y. Stasiv, M. Regulski, B. Kuzin, T. Tully and G. Enikolopov, J. Biol. Chem., 2001, 276, 42241–42251.

75 S. Stamm, S. Ben-Ari, I. Rafalska, Y. Tang, Z. Zhang, D. Toiber, T. A. Thanaraj and H. Soreq, Gene, 2005, 344, 1–20.

76 I. Cascino, G. Fiucci, G. Papoff and G. Ruberti, Immunology, 1995, 154, 2706–2713.

77 S. M. Dehm and D. J. Tindall, Endocrine Related Cancer, 2011.

78 R. Hu, T. A. Dunn, S. Wei, S. Isharwal, R. W. Veltri, E. Humphries, M. Han, A. W. Partin, R. L. Vessella, W. B. Isaacs, G. S. Bova and J. Luo, Cancer Res., 2009, 69, 16–22.

79 C. J. Shin, S. Wong, M. J. Davis and M. A. Ragan, BMC Syst. Biol., 2009, 3, 28.

80 N. M. Kopelman, D. Lancet and I. Yanai, Nat. Genet., 2005, 37, 588–589.

81 Z. Su, J. Wang, J. Yu, X. Huang and X. Gu, Genome Res., 2006, 16, 182–189.

82 C. Shou, N. Bhardwaj, H. Y. K. Lam, K.-K. Yan, P. M. Kim, M. Snyder and M. B. Gerstein, PLoS Comput. Biol., 2011, 7, e1001050.

83 S.-J. Noh, K. Lee, H. Paik and C.-G. Hur, DNA Res., 2006, 13, 229–243.

84 Q. Xu, B. Modrek and C. Lee, Nucleic Acids Res., 2002, 30, 3754–3766.

85 Y. Xing and C. J. Lee, PLoS Genet., 2005, 1, e34.

86 K. Shimokawa, Y. Okamura-Oho, T. Kurita, M. C. Frith, J. Kawai, P. Carninci and Y. Hayashizaki, BMC Bioinformatics, 2007, 8, 161.

87 B. Ambati, M. Nozaki, N. Singh, A. Takeda, P. Jani, T. Suthar, R. Albuquerque, E. Richter, E. Sakurai, M. Newcomb, M. Kleinman, R. Caldwell, Q. Lin, Y. Ogura, A. Orecchia, D. Samuelson, D. Agnew, J. St Leger, W. Green, P. Mahasreshti, D. Curiel, D. Kwan, H. Marsh, S. Ikeda, L. Leiper, J. Collinson, S. Bogdanovich, T. Khurana, M. Shibuya, M. Baldwin, N. Ferrara, H. Gerber, S. De Falco, J. Witta, J. Baffi, B. Raisler and J. Ambati, Nature, 2006, 443.

88 T. Kasukawa, S. Katayama, H. Kawaji, H. Suzuki, D. A. Hume and Y. Hayashizaki, Genomics, 2004, 84, 913–921.

89 J. Takeda, Y. Suzuki, M. Nakao, R. A. Barrero, K. O. Koyanagi, L. Jin, C. Motono, H. Hata, T. Isogai, K. Nagai, T. Otsuki, V. Kuryshev, M. Shionyu, K. Yura, M. Go, J. Thierry-Mieg, D. Thierry-Mieg, S. Wiemann, N. Nomura, S. Sugano, T. Gojobori and T. Imanishi, Nucleic Acids Res., 2006, 34, 3917–3928.

90 S. Gustincich, A. Sandelin, C. Plessy, S. Katayama, R. Simone, D. Lazarevic, Y. Hayashizaki and P. Carninci, J. Physiol., 2006, 575, 321–332.

91 Q. Pan, O. Shai, L. J. Lee, B. J. Frey and B. J. Blencowe, Nat. Genet., 2008, 40, 1413–1415.

92 A. Kumar, Eukaryotic Cell, 2009, 8, 1321–1329.

93 J. Takeda, Y. Suzuki, M. Nakao, T. Kuroda, S. Sugano, T. Gojobori and T. Imanishi, Nucleic Acids Res., 2007, 35, D104–D109.

94 W. H. Press and W. H. Press, Numerical recipes in C: the art of scientific computing, Cambridge University Press, Cambridge, 2nd edn, 1992.

95 C. Alfarano, C. E. Andrade, K. Anthony, N. Bahroos, M. Bajec, K. Bantoft, D. Betel, B. Bobechko, K. Boutilier, E. Burgess, K. Buzadzija, R. Cavero, C. D’Abreo, I. Donaldson, D. Dorairajoo, M. J. Dumontier, M. R. Dumontier, V. Earles, R. Farrall, H. Feldman, E. Garderman, Y. Gong, R. Gonzaga, V. Grytsan, E. Gryz, V. Gu, E. Haldorsen, A. Halupa, R. Haw, A. Hrvojic, L. Hurrell, R. Isserlin, F. Jack, F. Juma, A. Khan, T. Kon, S. Konopinsky, V. Le, E. Lee, S. Ling, M. Magidin, J. Moniakis, J. Montojo, S. Moore, B. Muskat, I. Ng, J. P. Paraiso, B. Parker, G. Pintilie, R. Pirone, J. J. Salama, S. Sgro, T. Shan, Y. Shu, J. Siew, D. Skinner, K. Snyder, R. Stasiuk, D. Strumpf, B. Tuekam, S. Tao, Z. Wang, M. White, R. Willis, C. Wolting, S. Wong, A. Wrong, C. Xin, R. Yao, B. Yates, S. Zhang, K. Zheng, T. Pawson, B. F. Ouellette and C. W. Hogue, Nucleic Acids Res., 2005, 33, D418–D424.

96 T. S. Keshava Prasad, R. Goel, K. Kandasamy, S. Keerthikumar, S. Kumar, S. Mathivanan, D. Telikicherla, R. Raju, B. Shafreen, A. Venugopal, L. Balakrishnan, A. Marimuthu, S. Banerjee, D. S. Somanathan, A. Sebastian, S. Rani, S. Ray, C. J. Harrys Kishore, S. Kanth, M. Ahmed, M. K. Kashyap, R. Mohmood, Y. L. Ramachandra, V. Krishna, B. A. Rahiman, S. Mohan, P. Ranganathan, S. Ramabadran, R. Chaerkady and A. Pandey, Nucleic Acids Res., 2009, 37, D767–D772.

97 P. Pagel, S. Kovac, M. Oesterheld, B. Brauner, I. Dunger-Kaltenbach, G. Frishman, C. Montrone, P. Mark, V. Stumpflen, H. W. Mewes, A. Ruepp and D. Frishman, Bioinformatics, 2005, 21, 832–834.

98 G. D. Bader and C. W. Hogue, Nat. Biotechnol., 2002, 20, 991–997.

99 B. Zhang, B. H. Park, T. Karpinets and N. F. Samatova, Bioinformatics, 2008, 24, 979–986.

100 S. Fields, FEBS J., 2005, 272, 5391–5399.

101 C. von Mering, R. Krause, B. Snel, M. Cornell, S. G. Oliver, S. Fields and P. Bork, Nature, 2002, 417, 399–403.

102 E. Sprinzak, S. Sattath and H. Margalit, J. Mol. Biol., 2003, 327, 919–923.

103 S. R. Eddy, Bioinformatics, 1998, 14, 755–763.

104 S. Dutta, K. Burkhardt, J. Young, G. J. Swaminathan, T. Matsuura, K. Henrick, H. Nakamura and H. M. Berman, Mol. Biotechnol., 2009, 42, 1–13.

105 P. Aloy and R. B. Russell, Bioinformatics, 2003, 19, 161–162.
