Upload
phungminh
View
218
Download
0
Embed Size (px)
Citation preview
Network analyses for functional genomic screens incancer
by
Jennifer L. Wilson
B.S. Biomedical Engineering, University of Virginia, 2010
SUBMITTED TO THE DEPARTMENT OF BIOLOGICAL ENGINEERING IN PARTIALFULFILLMENT OF THE REQUIREMENTS FOR, THE DEGREE OF
Doctor of Philosophy
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
JUNE 2016
@Massachusetts Institute of Technology, 2016. All rights reserved.
Signature of Author:
MASSACHUSETTS INSTITUTEOF TECHNOLOGY
JUN 0 8 2016
LIBRARIESARCHIVE
Signature redacted(7 Department of Biological Engineering
Signature redactedCertified by:
May 3rd, 2016
Douglas Lauffenburger
Biological Engineering
Signature redactedAccepted by: -
Thesis Advisor
Forest M. White
Chair, Graduate Program Committee
This doctoral thesis has been examined by the following committee:
Ernest Fraenkel (Thesis Committee Chair), Biological Engineering, MIT
Michael Hemann, Department of Biology, MIT
Andreas Herrlich, Brigham and Women's, Harvard Institutes of Medicine
2
Network analyses for functional genomic screens in cancer
by
Jennifer L. Wilson
Submitted to the Department of Biological Engineering on May 3, 2016 in partialfulfillment of the requirements for the degree of Doctor of Philosophy
Abstract
Gene interference screens are a widely adopted and popular tool for uncovering genefunction but imperfections in the technology limit the power of these investigations. Thereare many completed and on-going RNAi investigations across a multitude of biological sys-tems because these experiments are scalable, cost-effective, and relatively easily adaptedto multiple experimental environments. The most influential disadvantage is that many ofthe individual reagents are non-specific and interfere with genes other than the intendedtarget. Efforts to improve limitations in RNAi have focused on statistical models and improv-ing reagents, yet have not explored using biological context to select gene targets. Thisthesis uses network modeling and data integration to provide context for gene interferencestudies, and demonstrates the utility of this approach in two systems:
Acute Lymphoblastic Leukemia (ALL) is a disease of undifferentiated B-cells that re-sults from accumulation of genetic lesions, yet we have an incomplete understanding of allgenes contributing to the disease and how they interact. To discover genetic mediators ofthis disease, we employ a genome-scale shRNA screen, and complement this data withdifferential mRNA expression and ChIP-seq data using network integration. The integratedmodel identifies processes not represented in any input set and predicts novel genes con-tributing to disease. We specifically validate the role of Wwpl as a tumor suppressor inALL.
Aberrant growth factor pathway activity drives cancer pathology and is the target ofmolecular cancer therapies. Specifically, the epidermal growth factor receptor (EFGR)pathway and its ligand, transforming growth factor alpha (TGFo) are clinically relevant togastric cancer. We use an shRNA screen and Prize Collecting Steiner Forest (PCSF) algo-rithm to discover the pathway regulating TGF shedding. This pathway identifies commonregulators of TGFa shedding and NFxB regulation, yet targeting NFxB and the EGFR path-way has thus far been unsuccessful in cancer therapies. Our network identifies IRAK1 asa viable path forward for modulating both TGFL and NFxB in gastric cancer.
Thesis Supervisor: Douglas A. LauffenburgerTitle: Ford Professor of Bioengineering
3
Acknowledgements
My years at MIT have been some of the best of my life so far. I have very much enjoyed
the enthusiasm of this environment, admired the quality of my peers, and appreciate the
challenges that come with research. It will be a bitter sweet departure.
Firstly, I am eternally grateful to my advisor Doug Lauffenburger. Without him, I may
not have pursued graduate school, let alone at MIT. Doug creates a collaborative, warm,
and enabling culture in the department and in his lab. His people skills are second to none,
and his ability to navigate difficult situations while supporting his students is impressive.
I thank you for your ability to diffuse whatever panic, frustration, or anxiety I brought to
our meetings, and for creating the relief to think critically about the science and tackle
challenging problems. Thank you for encouraging me to pursue this work and for making
it possible.
My thesis committee has also been instrumental to my success. My project was highly
collaborative, and I was fortunate to work closely with so many high caliber individuals.
Ernest Fraenkel - thank you for inviting me to join the lab and become a Fraenkelian.
The lab's expertise was invaluable to my work, and the immersion into the computational
environment was essential to all of my projects. Mike Hemann - thank you for including me
at group meetings, and opening your lab for experimental work. I will never forget the night
where you injected ~30 mice for me in under 30 minutes. Andreas Herrlich - Thank you for
taking a chance on my network modeling approaches, and allowing me to work on some
incredibly interesting biology with your group.
I would like to thank my colleagues and office mates in BE and beyond: Hsinhwa Le,
Sara Gosline, Simona Dalin, Eirini Kefaloyianni, Brian Joughin, Christina Harrison, Sarah
Schrier, Allison Claas, Melody Morris, Annelian Zweemer, Manu Kumar, Eleanor Fiedler,
Pete Bruno, Emanuel Kreidl, Nurcan Tuncbag, Chris Ng, Gabriela Pregernig, Anthony
Soltis, Susy Ramos, Jordan Bartlebaugh, and Aran Parillo. All of you have contributed to
making this work possible, and I would need an entire document to thank you individually.
Additionally, I want to express gratitude to my undergraduate student, Christina "Tia" Har-
rison. You've been an incredible companion and mentee through these projects and have
4
encouraged me to think about the work in a different light. I wish you the best in your future
endeavors as a scientist.
To my friends, Jenn Brophy, Byron Kwan, Marcus Parrish, and Erin Turowski, thank
you for regular games nights, hugs, and good food. My residents at McCormick Hall for
making my home life special; I have been humbled to know all of you and watch you
progress through MIT. I would like to thank the BE Communications Lab for the opportunity
to develop as a communication coach and develop as a professional scientist. Thank you
Jaime Goldstein, Scott Olesen, Diana Chien, and Georgia Lagoudas.
I would not be where I am now without the MIT Cycling Team and would like to thank
them separately as this group has been a definitive part of my time at MIT. Thank you for
accepting me as a novice rider, teaching me to ride a bike, and allowing me to develop
a new hobby. I cannot say enough about how teammates mitigate the stress of graduate
school and the uncertainty of research. I will miss training camps, morning coffee (and
pastry) rides, racing the team time trial in skinsuits and aero helmets, eating pie after
riding up big mountains, and having the opportunity to compete at collegiate nationals.
Thank you to my coaches, Nicole Freedman and Kolie Moore for supporting me through
training and teaching me how to race bikes. I'll look forward to seeing many of you for the
usual weekend rides that now increasing occur in the greater San Francisco area.
Lastly, this work wouldn't have been possible without my family. They have been sup-
portive of my unpredictable schedule, enthusiasm for all things bike racing, and have made
the highs and lows of this lifestyle bearable. I want to thank my mother, Carol Wilson, for
being an example of a nurturing, loving, career-driven, and successful female role model.
I want to thank my father, Charles Wilson, for being compassionate, understanding, and
for the many, many cards. I have kept them all. I would like to thank my younger sister,
Kelsey, for being the good daughter and for being a life companion; she knows me better
than most, and it takes a special person to have grown up with me and still want to hang
out. Thank you.
5
Contents
I Introduction 11
The role and consequences of biological pathways . . . . . . . . . . . . . . . . . 12
Network analyses for functional genomic screens . . . . . . . . . . . . . . . . . . 14
II Integrated Network Analyses for Functional Genomic Studies in Can-
cer 17
RNAi screens as a tool for cancer biology . . . . . . . . . . . . . . . . . . . . . . 18
RNAi screening challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
Biological networks add contextual value . . . . . . . . . . . . . . . . . . . . . . 21
Functional pathways explain variability among genes classified as "hits" . . . . . 23
Network integration is a viable tool for hypothesis generation . . . . . . . . . . . 24
Reciprocal engineering for future gene-interference investigations . . . . . . . . 26
III Pathway-based network modeling finds hidden genes in shRNA screen
for regulators of Acute Lymphoblastic Leukemia 31
Insight, innovation, integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
R esults . . . . . . . . . . . . . . . . . ... . . . . . . . . . . . . . . . . . . . . . . 34
A network-based data integration scheme . . . . . . . . . . . . . . . . . . . 34
shRNA screen and mRNA expression data identify distinct and incomplete
gene sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
Measuring histone activation for model specificity . . . . . . . . . . . . . . 40
SAMNet identifies a network of genes affecting ALL progression . . . . . . 41
Integrated approach finds genes connecting disparate data sets . . . . . . 45
Robustness metrics prioritize and predict genes relevant for ALL progression 47
Predicted pathway contains genes contributing to ALL progression specifi-
cally in vivo and genes affecting B-cell viability . . . . . . . . . . . . 48
6
Author Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
D iscussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
Materials and Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
Supplemental Figures and Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
IV Network modeling identifies common regulators of NFxB and TGFa
ectodomain shedding 62
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
Results ..... ......................................... 67
A kinome/phosphatome screen for regulators of TGFu shedding . . . . . . 67
An shEnrich method for selecting gene candidates . . . . . . . . . . . . . . 68
PCSF identifies a TGFca shedding pathway . . . . . . . . . . . . . . . . . . 72
Robustness analysis prioritizes predicted pathway genes . . . . . . . . . . 75
NFxB activation and TGFa shedding share common regulators . . . . . . . 79
Author Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
D iscussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
M ethods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
Supplemental Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
Supplemental Figures and Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
V Mutual information identifies gene cohorts predictive of response to
Dasatinib treatment in Acute Lymphoblastic Leukemia 99
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
M ethods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
R esults . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
An shRNA screen measures shRNA representation before and after treatmenti 05
A mutual information heuristic selects cohorts of co-acting shRNAs . . . . . 105
shRNA cohorts identify gene targets affecting disease progression and re-
sponse to treatment . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
7
Biclustering benchmarks mutual information heuristic . . . . . . . . . . . . . 116
In vitro validation measures shRNA cohort depleting in response to treatment 116
Author Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
VI Conclusions and Discussion 122
Characterization of cancer pathways will affect personalized and precision
medicine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
Loss of function characterizations for forward engineering approaches . . . 126
8
List of Figures
1 Network Filtering Uses Functional Relations to Identify Candidate Targets. 28
2 Reciprocal Engineering Balances Forward and Reverse Engineering Princi-
ples . . . . . . . . . . . . . . ... . . . . . . . . . . . . . . . . . . . . . . . . 30
3 Constructing a network model from multiple 'omic measurements. . . . . . 35
4 ChIP-seq with valley-finding identifies regions for transcription-factor binding. 41
5 SAMNet identifies integrated network for ALL progression. . . . . . . . . . . 44
6 SAMNet selects transcription factors that explain genes with greatest differ-
ential expression. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
7 Validation shows in vitro and in vivo effects for Hgs and Wwpl. . . . . . . . 50
8 The whole network, unformatted as created by SAMNet. . . . . . . . . . . . 60
9 Percent reduction in mRNA after knockdown as measured via qPCR. . . . . 61
10 An shRNA screen measures kinase and phosphatase effects on TGFax shed-
ding ............................................. 68
11 shEnrich selects genes for consistency and effect size . . . . . . . . . . . . 71
12 PCSF identifies the TGFx shedding regulatory network . . . . . . . . . . . . 74
13 TAB1 increases and XIAP decreases TGFc at the surface . . . . . . . . . . 80
14 histograms for shEnrich (forward) . . . . . . . . . . . . . . . . . . . . . . . . 91
15 histograms for shEnrich (reverse) . . . . . . . . . . . . . . . . . . . . . . . . 92
16 Scatter plots for shEnrich method . . . . . . . . . . . . . . . . . . . . . . . . 93
17 Optimization of network node selection . . . . . . . . . . . . . . . . . . . . . 94
18 The full network selected by PCSF . . . . . . . . . . . . . . . . . . . . . . . 95
19 Centrality metrics test gene robustness. . . . . . . . . . . . . . . . . . . . . 96
20 Lightening plots for NFxB regulators. . . . . . . . . . . . . . . . . . . . . . 97
21 A longitudinal screen for mediators of ALL and response to Dasatinib . . . . 105
22 Hierarchical clustering of 'early' and 'late' fold-change data . . . . . . . . . 106
23 Intra-cluster distances for all mice . . . . . . . . . . . . . . . . . . . . . . . . 108
24 K-means clustering for all mice with k=9 . . . . . . . . . . . . . . . . . . . . 109
9
25 Mutual information between mice shows some mice are more similar than
others . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
26 Component mutual information scores show that few clusters have high mu-
tual inform ation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
27 Hypergeometric testing of component comparisons. . . . . . . . . . . . . . 112
28 shRNA fold-changes for clusters with significant overlap . . . . . . . . . . . 115
29 M9cO and m3cl identify overlapping gene targets . . . . . . . . . . . . . . . 116
30 Biclustering of shRNAs across mice. . . . . . . . . . . . . . . . . . . . . . . 117
31 Branch 931 recapitulates trends in m9cO/m3cl clusters. . . . . . . . . . . . 118
32 In vitro competition assay shows shRNA cohort sensitizes ALL to Dasatinib
treatm ent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
List of Tables
1 Top 1 % of depleting shRNAs in vitro and in vivo. . . . . . . . . . . . . . . . 37
2 GO enrichment of shRNA targets from the in vitro screen. . . . . . . . . . 38
3 Genes selected as top candidates from mRNA expression data. . . . . . . 39
4 GO enrichment for genes up-regulated in vivo. . . . . . . . . . . . . . . . . 40
5 GO enrichment of network gene identifies processes associated with B-cell
leukem ia. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
6 Aggregate scoring of top network nodes. . . . . . . . . . . . . . . . . . . . 48
7 Gene hits selected by shEnrich . . . . . . . . . . . . . . . . . . . . . . . . . 72
8 Experimental genes ranked by multiple robustness metrics . . . . . . . . . 77
9 Predicted genes ranked by multiple robustness metrics . . . . . . . . . . . 78
10 Comparison of alternative scoring metrics for redundant shRNAs. . . . . . 98
11 Gene-targeting shRNAs in significantly overlapping clusters . . . . . . . . . 114
12 Gene targets overlapping between m9c0 and m3cl . . . . . . . . . . . . . . 116
10
Part I
Introduction
Gene interference technology and the advent of high-throughput functional
genomics
The ability to modulate gene function is central to pursuing many biological hypotheses.
Before the 1990s, classical genetic approaches depended upon the creation of mutated
organisms (e.g. plants, mice, yeast) to test gene function. The time required to test gene
function was enormous compared to modern methods, and this greatly limited the number
of genes tested and the number of contexts in which one could interrogate gene func-
tion. Fire and Mello's discovery of RNA-interference (RNAi) [99] drastically changed this
landscape by removing the need to create individual mutant organisms. By mimicking the
dsRNAs of invading pathogens, the technology manipulated the cell's inherent ability to
recognize and silence dsRNA sequences. Cells can process these dsRNA sequences
in a way that selectively binds, and degrades existing mRNA sequences that are compli-
mentary to the original dsRNA. Coupled with oligonucleotide synthesis technology, it was
relatively fast and easy to make these dsRNA reagents. Further, because the dsRNAs se-
lectively degraded complimentary sequences, one could target any gene by designing the
dsRNA to match a gene of interest. Finally, RNAi enabled manipulation of gene function in
living systems ad hoc and could be applied to many conditions, treatments, etc.
Nowadays, RNA-interference (RNAi) is just one tool among many functional genomic
technologies that enable the modulation of gene function. Other technologies include TAL-
ENS, CRISPR, and many varieties of RNAi (shRNAs, siRNAs, esiRNAs). All of these
technologies aimed to improve gene knockdown and used different techniques to reach
this goal. shRNAs and siRNAs differ in the chemical structure of the dsRNA, esiRNAs
undergo an enzymatic pre-processing before introduction into the cell, and TALENS and
CRISPR rely on distinct enzymatic processes for gene processing. For biologists, this
expanded tool set was an exciting addition to the artillery available for interrogating the
11
cell. With an experimental assay for a phenotype of interest, scientists could use RNAi to
uncover the contribution of a gene of interest to many biological processes.
In the age of high-throughput biological sciences, gene-interference technologies have
moved to screening platforms where many genes can be tested in tandem. Several exam-
ples of RNAi screens highlight the types of biological questions enabled by this technology:
the Broad Achilles project used genome wide RNAi screens in over 100 cancer cell lines
to systematically identify essential genes, and compare these effects across multiple can-
cer types[89]. Meacham et al adapted a genome-scale screen to test genetic mediators
of lymphoma in vivo[92]. Further applications include using RNAi to identify genes that
affect homologous recombination [1], searching for genes that exhibit synthetic lethality
with KRAS mutations[130], and identifying genes that regulate ectodomain shedding[24].
It appeared that RNAi technology had transitioned classical genetics into the realm of high-
throughput biology.
The role and consequences of biological pathways
Pathways biology appreciates the interconnectedness of gene products and is an attrac-
tive application of RNAi technologies. Pathways thinking considers a gene as part of a
collection of genes rather than as a single entity. In contrast to pathways thinking, reduc-
tionist theories characterize individual parts within the cell - genes, proteins, enzymes - but
are unable to appreciate the full spectrum of their interactions. For a simplistic example,
if knock-down of a given gene yielded reduced cell growth, one could infer that the gene
was required for the cell viability phenotype. In practice, these individual parts function in
a vast network of interactions, and biological phenotypes such as growth result from the
concerted effort of many genes. The effect of concerted gene action is that sometimes
a knocking down a single gene doesn't show an effect on a phenotype such as growth.
Especially in the case of disease characterization, biologists have come to appreciate the
interconnectedness and complexity of combinations of genes that give rise to pathology.
In cancer, biologists have recognized that combinations of gene alterations give rise to dis-
ease. At its simplest level, the discrepancy between reductionist and pathway biology can
12
be explained through a sports analogy: auditioning an athlete separately from the team
will rarely predict whether or not the team will win the game; rather it is how that player in-
teracts with and supports the other players that determines victory. Given the relative ease
of adaptation and the ability to conduct experiments in high-throughput, RNAi appeared
an attractive technology for perturbing whole pathways and presented a viable avenue for
moving classical genetic approaches into the realm of pathways biology.
As we discover new pathways using functional genetics, these gene annotations are
added to a growing cohort of existing pathways. Databases catalogue the sets of molecu-
lar interactions that constitute pathways. Examples of these databases include KEGG, GO,
STRING, MINT, HMDB, and iRefWeb. Annotated entries include growth factor pathways,
biochemical pathways, drug pathways, and many others. As we explore more biological
systems - for instance targeted therapies in cancer - it becomes possible to identify path-
ways for nuanced aspects of biology - pathways for disease progression, drug response,
and relapse.
Biological pathways have real consequences for cancer biology and therapeutic de-
velopment. Molecular-based medicine aims to classify cancers based on molecular vul-
nerabilities and then target these vulnerabilities with therapeutic compounds. A striking
example is that of Herceptin therapy, which was designed to target HER2, a protein recep-
tor that is over-expressed in some types of cancers. The treatment gained attention as it
increased overall survival for breast cancer patients with over-expressed HER2 receptor.
Extensive profiling of cancers, at the cell and patient levels, have endeavored to find these
molecular vulnerabilities, design novel therapies, and bring these to the clinic. However,
these molecular vulnerabilities belong to cellular pathways that are interconnected and re-
dundant. As a result, these pathways cause even well-designed and specific chemical
inhibitors to fail in the clinic, or worse, cause patients to become resistant to the initial ther-
apy. Even in the case of the acclaimed Herceptin therapy, patients saw remissions and
metastases after treatment. Metaphorically, this process can be compared to plugging a
leak in a water-filled balloon, except that plugging the initial leak forces additional leaks or
the expansion of smaller, and previously unnoticed leaks. Thus, the field of cancer biology
has moved to investigating these pathways holistically to characterize all molecular vulner-
13
abilities to try and design the best therapeutic interventions. A more extensive discussion
about resistance to therapy and the importance of pathways is included in Chapter 3.
Network analyses for functional genomic screens
RNAi quickly became a well-developed and widely adopted technology in the cancer bi-
ology field. There are numerous screens employed to tackle many biological problems.
However, attractive as the technology us, RNAi is not without limitations. The largest crit-
icisms arise as a result of 'off-target' effects. Primary off-target effects occur when the
RNAi reagent (siRNAs, shRNAs, etc.) have sequence matches with genes other than the
original target. Certain amounts of similarity between the sequences in dsRNA could af-
fect multiple mRNAs, and lead researchers to misidentify genes responsible for a given
phenotype. These effects diminish the power of any individual RNAi in a screen, making it
difficult to reproduce and validate results.
Scientists in RNAi biology have explored many techniques to fix the reagent-based lim-
itations of RNAi. Innovation has focused on improving sequence matching, adding chem-
ical modifications to each RNAi reagent to promote specific binding, creating algorithms
to predict if and when sequence mismatching will occur, and deploying redundant RNAis
against the same gene in aggregate to ensure on-target effects. I refer to these innovations
as reagent-based because they exclusively consider the performance of the technology to
improve existing approaches. I more thoroughly review these limitations and current inno-
vations in Chapter 1.
However, knowing that these genes belong to cellular pathways became a novel av-
enue for interpreting and modeling the results from these screens. For instance, using an
orchestral analogy, it's possible to continue the performance if any individual player (ex-
cluding soloists) is unable to play; the instrument section can compensate for this loss and
the concert continues. In cellular systems, which have evolved to be robust to perturba-
tions (such as mutations or copy number alterations), this same redundancy exists. Some
genes exist within redundant pathways or sections, and thus, even if you knock them down,
it may not be possible to 'hear' the effects of their loss. Knowing that genes act in concert
14
shifts the emphasis from discovering single genes to discovering cohorts of genes respon-
sible for a phenotype. This perspective changes the interpretation of RNAi screen results
and improves the utility of existing RNAi investigations.
However, as mentioned, pathway annotations are incomplete, and so resolving cohorts
from RNAi data is not as simple as matching the genes to an existing directory of biology.
Instead, we can leverage other biological information to create plausible pathways which
best explain the data. There exists databases of protein-protein interactions; catalogues
of which molecular entities (proteins, enzymes, receptors, genes, etc.) affiliate with each
other. Metaphorically, this information is akin to knowing the available roadways of biology.
If we think about genes as intersections of these biological roadways, we can infer path-
ways by collecting interactions between genes that scored in RNAi experiments. In this
process, we will also computationally predict additional gene intersections that are rele-
vant to our phenotype. This type of pathway discovery is valuable for growing our existing
pathway annotations.
In this thesis I explore the nexus of RNAi screens, and pathways construction in the
context of cancer. I employ network biology techniques to construct plausible biological
roadmaps using RNAi datasets, and use targeted molecular biology techniques to validate
the predictions from these models. This work has two main contributions: an extension of
seminal work to find pathways from loss-of-function screens[1 66] where I explore methods
for testing the robustness of network predictions and gain practical insights into how to find
signal within RNAi datasets that have not been designed for downstream network model-
ing. I expand the value of this methodological extension by validating specific predictions
from these networks to uncover new aspects of biology relevant to cancer.
The first chapter of this thesis work explores the challenges and limitations of RNAi
screening technology. While the majority of work has focused on improving the technology
itself, few investigations have considered using biology to overcome the limitations of RNAi.
I review functional genomic studies, their limitations, and articulate how network integra-
tion and pathways approaches can improve these investigations. In the second section, I
identify a pathway governing Acute Lymphoblastic Leukemia (ALL) progression using net-
work integration. This project demonstrates the value of network-based approaches and
15
uncovers novel biology missed by the original shRNA screen. In the third section, we in-
vestigate signaling regulators of TGFa shedding, a cellular growth factor that is important
for multiple biological processes. We construct a pathway around shRNA data and predict
novel genes affecting TGFa shedding. The network predicts that modulators of the NFxB
pathway also belong to the TGFa shedding pathway, thus identifying the first connection
between the NFxB pro-survival pathway and the shedding of this growth factor. In the
final project, we use a pathways-based understanding to formulate a heuristic for finding
cohorts of shRNAs governing response to therapy. This project uses an abstraction of
mutual information with machine learning to associate the behavior of these regions with
regions that have known function. Lastly, the discussion focuses on the potential benefits
of having another framework for interpreting gene interference screens. As these experi-
ments have been employed in numerous settings, a pathways-based framework stands to
have impacts in many areas of biology. This pathway-discovery framework is also attrac-
tive for forward-engineering contexts where engineers aim to assemble and select genes
to program a specific cellular output.
16
Part I
Integrated Network Analyses for FunctionalGenomic Studies in Cancer
Discovery of gene products vital for function of a biological system, whether cells in
vitro or tissues/organs/animals in vivo, using gene-interference studies at both small and
large scales has become increasing popular because of the capability for RNAi methods
for manipulating multiple cellular components in either biased or unbiased manner. These
experiments aspire to identify high-confidence "hit" sets as putatively responsible for an
experimental phenotype and conceivably imaginable as drug "targets", although requiring
dedicated follow-up tests to buttress confidence in validity. Typically, the findings from the
initial "screen" study are compiled as list of individual genes whose knockdown yielded
significant alteration of biological system function, and the follow-up validation experiments
are considered in isolation. While there are encouraging successes along this avenue, the
realization that molecular components executing or governing cell/tissue phenotypic oper-
ation work in concert among myriad dynamic partners - directly and indirectly - motivates
appreciation for considering a more integrative perspective on interpretation of RNAi-based
functional genomic studies.
'Concerted' operation brings to mind an instrumental orchestra as one notional metaphor.
Proper generation of a musical program depends on the collective efforts of the players
involved, and deviations of any individual in pitch, volume, or timing can produce inap-
propriate sound and affect the overall orchestral performance as other individuals attempt
to adapt - or naturally produce further errors themselves. The sound of any particular
individual is rarely decisive, while an instrumental section can either mitigate or amplify
aberrations and other instrumental sections may aim to compensate. Accordingly, flawed
performance may be viewed as arising from identifiable "drivers" but sustained pathology
is more likely manifested by inability of the overall company to find an appropriate new bal-
ance via diverse modulations. And when aspiring for remediation, as the music proceeds
the original deviations no longer remain the most effective points of correction because
the propagated adaptations and compensations render a simple "re-set" difficult to achieve
17
dynamically.
We use this integrative, or 'concerted' point of view to inform our recommendations
about the investigation of cancer systems using RNAi. We offer that a most effective
framework uses multi-node pathways for gaining greatest insight about how a system is
dysregulated and for how that system might be remediated, and further that this point of
view is essential to RNAi analyses.
RNAi screens as a tool for cancer biology
Because cancer is a mutation-driven disease, many investigators have focused on using
genetic characterizations of cancers, yet there are often non-intuitive relationships between
gene features and disease phenotypes[1 1, 139, 91, 10]. Because cancer is a mutation-
driven disease, many investigators have focused on using genetic characterizations of
cancers, yet there are often non-intuitive relationships between gene features and disease
phenotypes [139, 153, 97, 71, 151]. Further, occurrence of drug resistance also does
not exhibit direct correlation with mutational status [91, 134]. For instance, in pediatric
medulloblastoma, systematic measurement of mutation-status and transcriptional profiling
revealed that mutation rates are not consistent across pediatric tumors[1 34, 120].
In our orchestral analogy, these investigations are akin to rating the quality of the com-
pany using each players' individual audition. This perspective lacks context and an un-
derstanding of the player's contribution to the orchestra's performance. To account for this
context, investigators have turned to RNA-mediated interference (RNAi) technologies to
fine tune a genetic player's ability. These tools manipulate genetic features at a functional
level and may be a complementary approach for studying the non-intuitive relationship be-
tween mutation, expression, and disease phenotype[97, 134, 65]just as a conductor may
better appreciate a musician's performance while playing with their section.
From an engineering perspective, gene-interference experiments are attractive exper-
iments for understanding cancer because of the opportunity to modulate gene function
under diverse potentially relevant conditions. Investigators have targeted single genes,
or multiple genes together, in large scale screens, as well as pathway specific studies
[97, 134]. When investigating genetic amplifications in liver cancer, one group simulta-
18
neously explored the role of these amplification events and the relative contribution of
the in vivo environment with a genome-scale RNAi screen [1, 169]. In this instance, and
many others, RNAi screens afford the opportunity to explore numerous targets simultane-
ously. The Achilles Project from the Broad Institute added another dimension to genome-
wide screens by drastically increasing the scale of their investigation and challenging the
reproducibility of shRNA libraries. They introduced a library of shRNAs into more than
100 established cancer cell lines and identified functional phenotypes that were common
and unique to each cell line [135, 94]. Researchers can take advantage of varying RNAi
reagent targeting efficacy to create titrations of gene interference, known as epi-allelic
series [135, 56]. This technique manipulates variation in mRNA expression to create a
gradient of disease phenotype. As expected, this approach created varying lymphoma
phenotypes which increased in disease severity as shRNA targeting efficiency against
p53 increased [56]. While we note here only a few investigations, RNAi experiments lend
themselves to the perturbation of many more parameters: multiple cues, multiple dosing
schemes, multiple environments, and multiple time points.
RNAi reagents hold significant advantages over other interference methods, such as
small molecule inhibitors. More specifically, siRNA offers the advantage of isoform speci-
ficity and enable fine-tuning of individual isoform expression and activity. For an investiga-
tion of T-cell Erk regulation, researchers used epi-allelic series with siRNAs against ERK1
and ERK2 to identify the role of these kinases on downstream IL-2 production [158]. The
epi-allelic series again showed a correlation between siRNA targeting efficiency and phe-
notype. In addition, the researchers identified that IL-2 production scaled with total ERK
activation and was not isoform specific. When siRNA-mediated effects on IL-2 to those
of a MEK inhibitor's effect, they also found that the gene-interference methods reduced
IL-2 production to a greater extent than chemical inhibitor dosing at an equivalent level of
ERK activation [139, 151, 158]. From this finding they inferred that ERK may also have a
role as a scaffold in downstream IL2 production; such a phenomenon may have not been
indicated using only either approach alone.
19
RNAi screening challenges
Gene interference screens are quickly becoming high-throughput, but they are poorly
suited to the well-accepted data analysis tools from other 'omics biology experiments.
Birmingham et al (2009) provide a thorough review of statistical adaptations for target
discovery from RNAi experiments [11, 91]. Generally, these adaptations consist of nor-
malization, and some means of 'top-hit' identification based on outstanding performance
relative to the remaining population. However, inconsistent reagent performance limits
statistical power and subsequent validation of these candidates often fails.
Variability in RNAi screening data can derive from a variety of factors, both off-target
and crosstalk events, and cause varying rates of false positives and false negatives in RNAi
screens, reducing confidence in final hit selection [97, 71, 120]. Off-target events are a non-
specific result of the experimental reagents, and may include the inadvertent knockdown
of additional transcripts through microRNA-like effects and the incomplete knockdown of
a protein target due to a protein half-life greater than the experimental timeline. Crosstalk
events, on the other hand, are a result of the biological response to RNAi perturbation as
opposed to the experimental reagents used. These events may include increased expres-
sion of transcripts normally repressed by microRNAs that have to compete for use of the
internal degradation machinery, and increased expression or activity of proteins which are
compensatory for the RNAi target [97, 134, 65].
Many approaches attempt to compensate for off-target effects. One method utilizes
multiple RNAi reagents against the same gene, and only considers the gene a hit if mul-
tiple reagents yield a similar phenotype [97, 134]. However, the ability to identify true
positives from redundant reagents is complicated by the targeted gene product's context
within the cell [134, 169]. For example, unintended effects are less likely for gene targets
with highly specific, non-redundant roles or those that exist in linear pathways. However, for
highly connected genes or those involved in multiple pathways, there is a greater chance
of biological crosstalk, and thus varied results between redundant siRNAs [134, 94].
A genome-wide screen for homologous recombination (HR) mediators highlights the
role of unintended effects and how redundant RNAi reagents may mislead results [1]. For
20
instance, 5 out of 10 RNAi reagents against the HIRIP3 gene decreased capacity for ho-
mologous recombination. While all reagents successfully reduced mRNA expression, res-
cue experiments with RNAi-resistant mRNA failed to recover homologous recombination
activity. Further, relative mRNA expression changes did not correlate with changes in ho-
mologous recombination.
Computational analyses of sequence similarity between siRNA reagents and non-targeted,
mRNA transcripts can predict off-target effects but is imperfect in all situations. Genome-
wide enrichment of seed sequences (GESS) analysis looks for enrichment of non-targeted
3' UTR regions in siRNA sense and antisense sequences [135]. In theory, these 3'UTR
matches identify unintended target genes and subsequent modulation of these genes
should recapitulate the phenotype erroneously assigned to the original siRNA. The method
successfully identifies genes enriched in active siRNAs for multiple screens, and can filter
primary screening hits to decrease the false positive rate [135].
In the previously mentioned screen for homologous recombination mediators, GESS
analysis identified a significant enrichment for RAD51 3' UTR in the high-scoring, non-
RAD51 siRNAs [1]. As expected, RAD51 mRNA was depleted in the presence of 4 of
the 7 siRNAs against HIRIP3 and RAD51 mRNA levels better correlated with changes in
the homologous recombination phenotype than HIRIP3 mRNA levels. Yet, only 1 of the 7
HIRIP3 siRNAs actually contained the seed match for the RAD51 UTR demonstrating that
additional cross-talk events may occur in the presence of the HIRIP3 siRNAs. While GESS
successfully identified RAD51 mRNA levels as the true predictor for homologous recombi-
nation, it was unable to fully explain the observed changes in this gene's transcription, as
all HIRIP3 siRNAs did not reduce RAD51.
Biological networks add contextual value
A network framework enables researchers to consider contextual influences on how path-
way components assimilate, integrate, and propagate knowledge in a manner that is dis-
tinct from the list model [9, 75]. More specifically, a network cluster, consisting of a coherent
group of functionally-related genetic regulators, may better explain an observed phenotype
21
where statistically-ranked lists are insufficient [10, 164]. Already, these network motifs for
target discovery have lead to better understanding of the non-intuitive relationships be-
tween genotype and disease phenotype and identification of better therapeutic targets
[10, 151].
Networks can be useful for predicting drug targets and also for selecting drug combina-
tions [75]. Their functional context provides rational selection of single targets as well
as combinatorial targets that could synergistically affect a desired phenotype because
they consider pathway membership [75]. Where toxicity had previously constrained the
selection of combination therapies, researchers may now instead prioritize combinations
based on specificity to controlling a particular phenotype. Understanding macromolecu-
lar pathways led to conclusions about synergies between doxorubicin and TNF-alpha as
chemotherapeutic agents [143]. The authors showed that administering TNF-alpha as an
adjuvant to doxorubicin treatment increased apoptotic cell death in the presence of low-
levels of DNA damage by using an integrated network approach. Without pathway and
network-level information, this non-intuitive relationship may have been missed.
Network interpretation has already added depth to non-intuitive instances of drug re-
sistance. Recently, Wilson et al (2012) showed that growth-factors within the tumor mi-
croenvironment may increase resistance to kinase inhibitor therapy [162]. While this might
seem counterintuitive in a linear-process formalism, considering the cell's underlying sig-
naling network make these results less surprising. Wagner et al used network inference
methods to create interaction networks by combining systematic RNAi-perturbation data
with phosphorylation information at multiple time points for six receptor-tyrosine kinases
(RTKs) (EGFR, FGFR1,c-Met,IGF-1R,NTRK2, and PDGFR-beta) [154]. From the result-
ing networks, they clustered each RTK network, identifying core signaling components
shared between all RTKs as well as cluster-specific modules. They postulated that mod-
ules shared between RTKs within the same cluster could explain resistance to targeted
RTK therapy. More specifically, if RTKs of a particular class shared signaling components
and affected the same downstream phenotypes, then these within-cluster RTKs could com-
pensate for chemical inhibition by actuating the original downstream phenotype[1 54]. They
demonstrated this compensation within the EGFR/c-Met/FGFR1 cluster by showing corre-
22
lation of receptor expression with resistance to therapies targeted to other within-cluster
RTKs.
Functional pathways explain variability among genes classified as "hits"
A meta-analysis of nine RNAi screens for HIV-replication factors used functional enrich-
ment to explain discrepancies across high-scoring targets from each screen[14]. When
they investigated the percentage of scoring targets across three screens, this overlap only
included a modest 3-6% of gene targets. They show that variability between screens, vari-
ability between experimental timing and toxicity thresholds all contributed to the minimal
overlap among these screens. However, when they looked at gene membership in GO
ontology categories, they found much greater overlap in the enrichment of GO categories
across screens than in the individual gene targets. This finding indicates that a more global,
functional filter is useful for identifying true positives from highly variable RNAi screens.
Additionally, using functional pathway membership increased experimental validation
rates in an RNAi screen for DNA-damage mediators [123]. The authors screened all
protein-coding genes in Drosophila melanogaster and compared top hits to an analogous
screen in Saccharomyces cervisiae, but did not see a statistically significant overlap be-
tween screening targets [123]. They expanded their target list by identifying functional
pathways that contained gene targets from their screen in Drosophila melanogaster and
were able to find enrichment of targets from the Saccharomyces cervisiae screen within
this set. Further, using this expanded set for validation experiments identified false nega-
tives from the original screen. These results reaffirm the utility of filtering data by pathway
membership to identify true positives and also using pathway membership as a search
space for false negatives.
In a pioneering study, Jones et al (2008) demonstrated the significance of using path-
way context in a patient setting [69]. They performed a global analysis of mutations in
pancreatic cancers, but found little overlap in the specific mutations across patients. How-
ever, they instead found a core set of signaling pathways that consistently enriched for
patient-specific mutations. They postulate that targeting the physiological consequences
23
of these pathways instead of the individual mutations would improve therapeutic devel-
opment [69]. If we consider the discrepancy between RNAi reagent performance across
replicates as similar to the mutational differences between patients, these findings present
more motivation for using a pathway-centered approach for functional genomic studies.
Network integration is a viable tool for hypothesis generation
Given the importance of understanding the functional context of a genetic alteration, net-
work methods are a useful computational tool. Additionally, these tools enable the incor-
poration of multiple data sets and experiments to create more holistic interpretations of
biological systems. Because of the availability of many experimental datasets through var-
ious databases, data integration will be influential in future investigations [75]. Here, we
review a few integrated network approaches and highlight how networks have improved
the interpretation of biological investigations and affected further hypothesis generation.
In metastatic breast cancer, integrating copy-number variation (CNV) and gene expres-
sion data across multiple samples accurately predicted novel drivers of disease [145]. The
authors used a refined method for first identifying recurrent CNVs from gene expression
data and then used a Bayesian methodology to create a network of mutated genes. From
this network, they found master regulators by selecting genes that had a high authority
score. Mathematically, the authority score identified genes with a statistically significant
number of outgoing connections as compared to the mean number of connections. To test
their hypotheses about mediators for breast cancer, they performed an siRNA screen test-
ing the effect of gene interference on cell viability. Of the gene targets that had the greatest
effect on cell viability, they found a significant enrichment of their high-authority regulators
[145]. This finding demonstrates that networks can synchronize disparate datasets and
that network properties are viable characterizations for finding novel regulators.
Utilizing a data-integration approach, Huang et al (2013) constructed an EGFRvIII-
specific signaling network for glioblastoma multiforme (GBM) by incorporating proteomic
and transcriptional data[62]. More interestingly, they used this network to identify, test and
validate novel therapies. Their final networks consisted of a high-confidence set of experi-
24
mental data points as well as gene targets not included in the original set, but rather added
through known protein-protein interactions. From this network set, they systematically ex-
panded targets for therapeutic intervention by identifying targets with known chemical in-
hibitors and ranking them based on their proximity to the core functional network. From this
target set, they identified compounds in clinical trial with known effects on cancer systems
and chemical inhibitors not yet tested for GBM [62].
In the interferon-stimulatory DNA (ISD) sensing pathway, an integrated network ap-
proach proved successful for identifying novel regulators of this process and for testing
new therapeutics[83]. In this analysis, the authors created an interaction network of po-
tential ISD regulators by combining direct interacting partners of known ISD pathway com-
ponents with interacting pairs from their own quantitative mass-spectrometry experiments.
Perturbation of this compendium network with RNAi reagents identified Abcf1, Cdc37, ad
Ptpnl as effectors of the ISD-sensing response to dsDNA. In this situation, curating and
expanding interaction information around known pathway components successfully identi-
fied novel genes for the ISD response. The authors also measured ISD-pathway induction
after treatment with chemical inhibitors against their novel genes and demonstrated a re-
duction in deleterious interferon production. These results show that integration is useful
for developing new hypotheses for therapeutic development and supports the Jones et
al perspective concerning efficacy of designing therapeutic options around downstream
pathway physiology[69].
Data integration within a network framework also added depth to understanding metabolic
disorders using SNP and genetic linkage data [19]. In this investigation, researchers
created a network where interactions depended upon significant co-expression and link-
age data between genes. Using optimization, they selected highly connected gene sub-
modules and then used these modules for further hypothesis generation. Many sub-
modules were enriched for genetic features that were significantly associated with disease
traits (fat mass, weight, plasma insulin levels etc) and one sub-module was significantly en-
riched for genetic features with significant correlation to all disease traits. They expanded
this module, and created a macrophage-driven superior module from which they selected
and further perturbed genetic loci. From these perturbations, they were able to demon-
25
strate the sub-network's contribution to the observed disease traits and classify genetic
features previously not associated with metabolic traits. This collective understanding of
genetic interactions ultimately created a more comprehensive view of their dataset and
novel hypotheses distinct from those that would be identified from looking at gene-trait
associations independently.
Recently, our own efforts investigating in vivo mediators of Acute Lymphoblastic Leukemia
(ALL) have employed a data integration approach to ascertain GO biological function en-
richment rather than to looking at screening targets independently (unpublished). A B-cell
model of ALL was infected with a genome-scale shRNA library and after infection, cells
were plated in vitro or tail-vein injected into syngeneic recipient mice. After disease de-
veloped, cells were harvested and sequenced for final shRNA representation. Simulta-
neous Analysis of Multiple Networks (SAMNet), is a flow-based formalism which relates
screening hits to downstream expression data using the interactome as a guide for pos-
sible connections among the data [32]. The method generated a network enriched for
functional pathways, such as developmental processes, that are known to play a role in
ALL - whereas these were not identified when analyzing experimental data independently.
This enrichment increases confidence that RNAi hits identified within the network are true
positives. Further, SAMNet adds targets, or nodes, to the network that were not present in
the original high-scoring target set, making it possible to hypothesize about potential false
negatives in the data. In these examples, data analysis in isolation was insufficient for
discovering novel regulators and targets for therapeutic intervention. Instead, a concerted
network approach, integrating multiple data sets or experimental results, improved target
identification and created testable hypotheses for therapeutic development.
Reciprocal engineering for future gene-interference investigations
Understanding and modulating cancer requires a concerted understanding of gene func-
tion and appreciation for each gene's pathway membership. Much like an orchestra, the
performance of the group depends on the collective group effort rather than the ability of
any one player. Auditioning players individually is important for assessing skills and musi-
26
cality, yet their full potential depends on their ability to contribute to the sound of the group.
Gene-interference studies are the experimental parallel of 'auditioning', yet their interpreta-
tion is limited if each player is considered in isolation. Instead, the conductor must observe
the player within his section to see if deficiencies affect the overall sound or if the sound
of his peers compensate for his weaknesses. In the same way, building biological net-
works using RNAi experimental data puts the player in his section, and uses his pathway
membership to assess his effect on the sound of the orchestra.
'Network Filtering' techniques will increasingly become a secondary post-processing
step to statistical analyses for gene-interference studies. We have conceptualized how
identifying network clusters may complement existing statistical approaches in 1. The net-
work provides context for any RNAi reagent's performance and leverages a gene target's
pathway membership to identify true and false positives. This approach assumes that if
multiple RNAi target genes enrich within a pathway, they are more likely contributing to
an experimental phenotype. Similarly, if an RNAi-targeted gene stands out experimentally,
but does not have interactions with other targets, this target may be a false positive. In this
mindset, networks offer a filter for removing false positives from entering the final 'hit-list'.
Where RNAi study results are affected by unintended effects, noise-reduction techniques
will have a large impact on data interpretation.
However, there is still much to be learned about gene pathways and their modulation for
cancer systems. While pathways provide context for gene function, pathways themselves
are context specific and require systematic perturbation[1 53]. For instance, melanoma pa-
tients with BRAF mutations have drastically different sensitivity to therapy than colorectal
cancer patients with similar mutations[1 53]. As such, many considerations remain about
interpreting pathways and using them to refine experimental evidence. Conceptualizing
pathways instead of individual molecules changes the hypotheses generated as well as
the experimental validations that follow[9]. For instance, because of the increased level
of interconnectedness, disease modules are computationally identifiable by graph theory
parameters such as clustering coefficients, and shorter path lengths. Further, designing
validation experiments around these modules may provide novel insight into understand-
ing disease and also improve correlation between predicted perturbation and experimen-
27
1. High-throughput Measurements
Y V2. Statistical FiltersSelect with Significance Thres?
< 0.05
3. Experimental Validatior
Traditional'TargetSelection
NetworkMotif BasedTargetSelection2a. Network FilterSelect Based on) Relationshipsto Other Experimental Data
Possible Methods:Correlation MethodsInteractome ScaffoldingPathway RecoveryNetwork Flow
Figure 1: Network Filtering Uses Functional Relations to Identify Candidate Targets.Traditional methods for target selection collect high-throughput data, then use statisticalmethods to filter data for top candidates and further experimental validation. A network fil-tering approach would collect high-throughput data, perform statistical analyses, but wouldadd an additional network-construction step to find functional consensus among top targetsbefore identifying candidate regulators or predicted pharmacological targets and moving toexperimental validation.
28
n
tal phenotype[151, 154]. This conceptualization further requires consideration of network
properties instead of only experimental phenotype and may instead prioritize candidates
based on number of connections (network degree) or enrichment against randomized net-
works. Yet, it is unclear which graphical parameters are most predictive of false positives or
false negatives and whether these parameters are consistently predictive across multiple
systems.
As network motif discovery becomes more common, we envision an accompanying
shift in approach to these methods. This shift incorporates reverse engineering principles
through a desire to find models that explain system behaviors as well as forward engineer-
ing principles in which the investigator designs the system to control a particular pheno-
type. We propose that future efforts to construct and manipulate cancer networks will use
an 'Reciprocal Engineering' approach (Figure 2). In this 'Reciprocal Engineering' mindset,
researchers balance motivation to explain a system with motivation to design a controllable
system. Already researchers have alluded to the value of 'Integrated Interactomics' and
how future data integration which balances motivations for understanding and prediction,
changes our investigation of cancer systems[1 0].
Recent commentaries in the area underscore the potential impact of this paradigm shift.
These articles concur with the notion that signaling pathways drive cancer progression, and
are a rich source of targets for therapeutic development [153, 164]. Both biological network
models and gene-interference studies are cutting edge techniques that have greatly added
to our understanding of cancer systems. As such, future endeavors merging these growing
fields will enhance understanding of cancer systems and improve ability to manipulate a
complicated disease.
29
'Adivabng'
r- -A.w nn ..1ritfty"
Different gene targets showdifferential effects ondownstream phenotype,
0-
o 00
Inference Methods and NetworkConstruction IdentifyRelationships BetweenActivating and InhibitoryTargets.
Experimental results reveal howsome targets arc interrelated basedon treatment or condition
0-0O 0
The interactome provides a scaffoldfor connecting experimental resultsand identifying the pathway contextfor targets. This balances discoveryfor mechanism and desire formanipulation.
/010
00G0o\0 O
OO
Using identified and relevantpathways, select the targetslikely to affect the down-stream phenotype.
Wild-type
}Sigle Tre.-nwiilcombination
Use pathway modulationto manipulate the downstreamphenotype.
Mechanism-Motivated Design-Motivated
Figure 2: Reciprocal Engineering Balances Forward andReciprocal Engineering balances the mechanism-driven
Reverse Engineering Principles.paradigm of reverse engineering
with the design-motivated paradigm of forward engineering. In the graphical representa-tion, reverse engineering is exemplified by the desire to uncover mechanisms explainingrelationships among the data - for example using correlation to infer relationships amongexpression. On the other hand, forward engineering is using a known network to ma-nipulate the system - in this example, inhibiting node functions to alter downstream geneactivation. Blending both paradigms, Reciprocal Engineering balances relationship discov-ery with design motives to create networks that infer biological function as well as discoverpoints for manipulation.
30
I
Part III
Pathway-based network modeling findshidden genes in shRNA screen forregulators of Acute LymphoblasticLeukemia
Abstract
Data integration stand to improve interpretation of RNAi screens which, as a re-sult of off-target effects, typically yield numerous gene hits, of which, only a few vali-date. These off-target effects can result from seed matches to unintended gene targets(reagent-based) or cellular pathways which can compensate for gene perturbations(biology-based). We focus on the biology-based effects and use network modelingtools to discover pathways de novo around RNAi hits. By looking at hits in a functionalcontext, we can uncover novel biology not identified from any individual 'omics mea-surement. We leverage multiple 'omic measurements using the Simultaneous Analy-sis of Multiple Networks (SAMNet) computational framework to model a genome scaleshRNA screen investigating Acute Lymphoblastic Leukemia (ALL) progression in vivo.Our network model is enriched for cellular processes associated with hematapoeticdifferentiation and homeostasis even though none of the individual 'omic sets showedthis enrichment. The model identifies genes associated with the TGF-beta pathwayand predicts a role in ALL progression for many genes without functional annotation.We further experimentally validate the hidden genes - Wwpl, a ubiquitin ligase, andHgs, a multi-vesicular body associated protein - for their role in ALL progression. OurALL pathway model includes genes with roles in multiple leukemias and roles in hema-tological development. We identify a tumor suppressor role for Wwpl in ALL progres-sion. This work demonstrates that network integration approaches can compensate foroff-target effects, and that these methods can uncover novel biology retroactively onexisting screening data. We anticipate that this framework will be valuable to multiplefunctional genomic technologies -siRNA, shRNA, and CRISPR - generally, and willimprove the utility of functional genomic studies.
31
Insight, innovation, integration
This work integrates multiple 'omic data sets to better understand leukemia pathology. The
most striking finding in this work is that we can identify biological annotations from these
data when these datasets are integrated in a network model even though these annotations
were not present in either dataset alone. Further, we use RNAi screening data as the
model's foundation; this screening technology is a popular tool for probing gene function
that is criticized for noise and off-target effects. Though, with integration, we derive novel
insights in spite of this noise and are able to test our theoretical findings through dedicated
validation.
Introduction
Functional genomic screens are a powerful tool for systematically probing gene function in
the context of many biological systems[97, 142, 38, 98, 161, 12]. Shortly after its adapta-
tion to experimental work, RNAi gained popularity as the technology is relatively easily and
quickly adaptable to multiple biological systems[142, 38, 12]. However, off-target effects
(OTEs) which can result from seed matches between the individual RNAi and unintended
genes[65] are a widely criticized limitation of this technology. These effects complicate
validation and pursuit of further hypotheses[142, 71, 135] and thus, RNAi screens require
analysis methods that can more efficiently identify true targets for validation[1 42, 65]. Many
studies have focused on improving the stability and specificity of the RNAi reagents them-
selves[1 0], or have developed algorithms for predicting real effects by considering unin-
tended seed matches[1 35, 122, 13]. Other studies have considered alternative platforms,
such as TALENS or CRISPR, for probing gene function[12]. CRISPR systems can ex-
hibit stronger loss of function phenotypes than RNAi and seem a promising alternative[48].
These systems enable precision genome engineering by more efficiently blocking gene
function, and have the added capacity of being able to activate and inactivate genes[48].
CRISPR systems still suffer from OTEs and are sensitive to mismatches in the 12 bases
proximal to the guide strand[14]. Overall, these approaches only address the technical
challenges of gene interference and do not consider the biological consequences.
32
Few investigations have considered how a cell may compensate for a gene perturba-
tion. An RNAi reagent may perfectly block an mRNA transcript, but a redundant protein
may compensate for the loss of a particular gene or the targeted protein may be stable
beyond the duration of knockdown, leading to false negatives. Further reasons for false
negatives include the inability for RNAi to sufficiently diminish expression of highly stable
proteins (e.g. enzymes which may retain function after knockdown) or the use of small, tar-
geted libraries. To compensate for the possibility of false positives and false negatives, one
group used GO analysis to find consensus among three different siRNA screens for HIV
replication factors. The group saw little overlap between the specific hits from each screen,
but saw that all three screens had top hits enriched for the same GO functions[14]. Given
the propensity for OTEs, it is not surprising that three independent screens identified dif-
ferent candidate hits, though it is striking that the individual hits fall in similar pathways[1 4].
This work foreshadows the value of using pathways to provide context around any individ-
ual hit. Though, our current definitions of cellular pathways are incomplete and there is a
real need for discovering pathways and attributing new genes to existing pathways.
Here we pursued an integrated, pathway-based approach to identify in vivo specific reg-
ulators of Acute Lymphoblastic Leukemia (ALL) progression. Development of treatments
for Acute Lymphoblastic Leukemia (ALL) has had mixed success and improvements in
patient overall survival is still unchanging[1 26]. For childhood ALL patients, 10% suffer
remissions and further, treatments have high toxicity[64]. We already know that the tumor
microenvironment affects how cancers progress and respond to therapies in a complex
manner. Paracrine signaling in the bone-marrow microenvironment can confer resistance
to therapy in myeloma[58] and local cytokines can promote cancer development in the con-
text of specific genetic lesions[47]. More thorough disease characterization in the native
microenvironment would facilitate development of new treatment strategies. Further, ALL
is just one of many types of cancers which arises from incomplete hematopoieitic differ-
entiation. Given the similar origin of these disesases, it is possible that we can learn and
re-purpose molecular studies from other hematopoeitc cancers to accelerate development
for ALL. To explore the role of the microenvironment in determining genetic mediators of
ALL, we have used a genome-wide shRNA screen for mediators of pre-B-cell ALL progres-
33
sion in vivo, and demonstrated the ability to identify micro-environment-specific genes[93].
Recognizing the likelihood of significant OTEs and acknowledging the significant mer-
its of pathways analysis methods, we developed a network model to discover missed tar-
gets, or predicted genes, from the initial shRNA screen. This network model builds on the
pathway-based approach described earlier by incorporating diverse experimental datasets
to better model the genes contributing to ALL progression. We assume that many pathway
annotations are incomplete, and that modeling multiple experimental datasets will uncover
novel pathways. We construct our network model using shRNA, ChlPseq, and mRNA
expression data and use this model to understand and validate features of the in vivo sys-
tem. We experimentally validate novel roles for Hgs and Wwpl: Hgs is a gene that is
generally deleterious to B-cell ALL viability, and Wwpl is an in vivo specific regulator of
disease progression. We perform this analysis using screeing data that was not designed
for further computational modeling in mind - the screen did not contain redundant shRNAs
or non-targeting controls. Taken together these results demonstrate the ability of network
models to select candidate targets from an shRNA screen and discover novel pathways
from disparate datasets. Biologically, the model makes specific predictions about gene
targets that affect ALL progression by affecting the tumor microenvironment, illuminating
multiple pathways that are relevant for therapeutic development in ALL.
Results
A network-based data integration scheme
To identify pathways that mediate ALL progression, using multiple experimental data- we
introduce a network-based approach, described in Figure 3. Conceptually, this approach
uses published protein-protein interaction data (Figure 3A) alongside computational de-
rived protein-DNA interactions to construct a set of all possible interactions that can relate
experimental measurements from shRNA screening and mRNA expression. This larger
network will then be reduced (Figure 3B, C) to identify biological pathways, either known
or unknown, that are implicated by the experimental data, described below.
34
A 0
B C
Figure 3: Constructing a network model from multiple 'omic measurements.A. We start with a probabilistic interactome that includes protein-protein interactions scoredby the confidence of their interaction. This confidence score reflects the strength of evi-dence across multiple interaction databases and this score constrains the edge's capac-ity within our flow-based model. Higher confidence leads to higher capacity. Some ofthese proteins are transcription factors (triangles). We complement these edges withtranscription-factor (triangles) to DNA (octagons) binding interactions. We predict theseinteractions and their edge probabilities by measuring active and open chromatin via ChIP-seq and looking for enrichment of transcription factor binding motifs. Conceptually, this isour available road map for creating pathways where the capacities are akin to speed limits.B. We connect an artificial source node to all proteins that have corresponding shRNAsthat were considered hits in the screen. These edge capacities reflect the strength of theshRNA effect. In our model, these edges reflect how strongly an shRNA depletes frominput to morbidity. We connect an artificial sink node to differentially expressed mRNAs.These edges reflect the fold-change in expression. The algorithm introduces flow into thenetwork and looks for an optimal route from the source to the sink, selecting edges basedon available capacity. C. The final path through the interactome becomes the de novopathway. This pathway may or may not include all of the original inputs (e.g. differentiallyexpressed mRNA or depleted shRNAs). Further, SAMNet allows the simultaneous con-struction of pathways for multiple conditions. In our investigation we treated the parallelin vitro and in vivo screens as separate conditions. D. Screening design interrogates invivo specific regulators of ALL progression. A genome-scale library was introduced to ALLcells in vitro. Representative samples were either maintained in culture or transplantedinto mouse models. At time of morbidity, blood and culture samples were re-sequenced tomeasure shRNA representation.
35
shRNA screen and mRNA expression data identify distinct and incomplete gene sets
We collected previously-published shRNA screens and mRNA expression data from a
mouse model of ALL, one in vitro and the other in vivo [93] (Figure 3D). Together this
data represents the direct and indirect effects of RNA knockdown in both environments.
To identify the direct effects of shRNA screens, we calculated a fold-change in shRNA
representation, comparing sequencing reads for each shRNA at input (time of transplant)
and post disease burden (at morbidity). We ranked genes based on the greatest depletion
from input to post disease burden and considered the top 1% (84/87 genes for the in
vivo/in vitro screens) for further investigation (Table 1). GO enrichment of the ranked list,
using GOrilla [29, 30] of targets from the in vivo dataset did not identify any enrichment
of GO terms. GO enrichment of the ranked in vitro targets found enrichment of cellular
homeostasis, and cation regulation; however, these terms were enriched merely based
on the presence of TMEM165, KCNA5, and CEBPA, without contribution from any other
targets in the ranking (Table 2). Overall, we found little functional enrichment or information
from shRNA data alone, suggesting that we had an incomplete picture of genes regulating
ALL progression.
36
In vivo In vitroGene Abs(f.c.)I Gene Abs(f.c.) I Gene Abs(f.c.) I Gene Abs(f.c.)ASRGL1FNDC5MRPL37ATP6V1CI1DHX37NRG2ATP1OBHOXD3HDHD1PCBP4DHDDSSERF1ASLC29A2STK32BPOLOGTF3C5SPC24IPo11lGMPSUBE2HZNF416CASP14SCDMAPRE3PLEKHA5SMARCD1DENND1CTMEM156TMEM176ALPPR5SRP72ATP6AP2FOXO6WISP3LCE3CZBTB4RCFXYD6UBE2L6SNTB1TIMD4DVL1COLEC12DHRS7CCTSCCECR5IOCHMTRF1L
TGS1
TIGD5SPATA13
10.81810.2339.6709.4789.2449.1558.9408.7658.7608.7448.6338.4748.3908.2968.2418.1648.1408.1028.1028.0738.0258.0148.0007.8987.8797.8567.8407.8217.8007.7517.7047.5957.5887.5647.5627.5127.5127.5127.5057.4987.4807.4527.4407.4407.4357.4257.4257.3667.3377.3277.306
Table 1: Top 1% of depleting shRNAs in vitro and in vivo.We calculated the absolute values of fold-changes for all genes that depleted from inputto end point. We mapped all genes to their human homologues for use with SAMNet.Fold-changes were calculated using DESeq2.
37
CPSF6NDUFS1ARMCX6OR13C4BCMO1SF3A1DIAPH3PGLYRP2ABHD12BDRD1
ODF2POGZ
TDRD7
C9orf69
ALKBH2
GRHL3C17orf97
SUZ12
TMEM79
LAG3
IDH3A
C4oif32
PNO1
FUT10VCPIP1TFDP1
RPL7
RNF11SLC6A2
YEATS2
LAP3
ADAMTS18TRIM33
7.2967.2927.2817.2757.2707.2687.2667.2497.2427.2417.2387.2157.2067.1937.1437.1197.1117.1117.1067.0627.0427.0397.0187.0056.9906.9746.9666.9566.9516.9326.9286.9276.916
ClOorf71
KCNA5TMEM165C10,185
KLF3CEBPATPSG1ZNF367GTPBP10MFSD3GLB1L
DNTTMYT 1ANKS6KRT77NRARPPPP3CBMAD2L1CRNKL1ACTC1AHCYL1FCRL3SURF4RCOR2
FCGR3A
ZNF616BAG5SASS6SMYD5TRIM59ZNF347NFAT5MATR3
MED8CCT2RA833AMCPH1KCTD1POLR3GL
MPZL1INTS1
DNMT3AARL9DOK2
SYNJ2
MBTPS2
C4orf17NHLRC1ATXN3
MID2
CHM
10.0389.9659.7519.6719.6339.3338.9758.8268.7748.7548.5588.4988.2858.2628.160
8.1038.0998.0558.0168.0168.0167.9967.9967.9627.9557.9137.9027.8037.7607.7457.7237.7217.6497.6437.5507.5457.4407.4077.3807.3657.3417.3347.3047.2857.2857.2287.1837.1077.0907.0907.022
MEIS1ZNF583LATZNF217TPRKBPOLR2BSPIRE1ENO3ZKSCAN2C10r106
REG1AZBTB24TTLL9ITGB1BP1IL5RASNTB2AKR1D1ZFYVE26MRPL13TDGF1LGALS7GCH1DSELEIF2S3S100A4MROHMGA2RASLIOAGIT2PPARGSCLT1ASXL2TM7SF3NOS3
BICD1NR3C1
6.9976.9226.9096.9086.8936.8816.8496.8266.722&B6946.6756.6676.6636.6336.6076.598&.5606.5546.4846.4576.4286.3786.3016.3006.2996.2796.1576.1546.1486.1316.1046.0916.0786.0446.040
8.035
cellular homeostasis 3.79E-02ion homeostasis 4.07E-02cellular divalent inorganiccation homeostasis 4.38E-02divalent inorganic cationhomeostasis 4.74E-02cellular chemicalhomeostasis 5.17E-02cation homeostasis 5.69E-02calcium ion homeostasis 6.32E-02metal ion homeostasis 7.12E-02monovalent inorganiccation homeostasis 8.13E-02chemical homeostasis 9.32E-02inorganic ion homeostasis 9.49E-02metal ion transport 1.1 4E-01cellular cation homeostasis 1.42E-01cellular metal ionhomeostasis 1.90E-01cellular ion homeostasis 2.85E-01cellular calcium ionhomeostasis 5.69E-01
Table 2: GO enrichment of shRNA targets from the in vitro screen.Enrichment used a single ranked list against the whole genome via the GOrilla web tool.
Using microarrays, we also collected differential mRNA expression data to compare
between cellular contexts at morbidity. Genes were considered based on their fold-change
relative to the in vivo context, and ranked based on fold-change. Again, we investigated
the top 1% of genes up-regulated in vivo (77 genes) and up-regulated in vitro (66 genes)
(Table 3). Using GOrilla, we found few enriched GO terms in the in vivo data genes as
compared to the whole genome; these terms included regulation of transport, and actin
cytoskeleton organization (Table 4). There were no enriched GO terms in the set of genes
up regulated in the in vitro context. We would expect that for any biological process, not
all relevant genes will be differentially expressed or be sensitive to shRNA perturbation.
Given these circumstances, we would not expect differential expression analysis or shRNA
screening to uncover all relevant genes or expect these measurements to identify the same
genes. The fact that there was little functional enrichment in these top candidates reaffirms
the incomplete nature of individual high-throughput measurements[1 66], and suggests an
integrated approach could find hidden information.
38
In vivo, genes with fold-change values12.10 1 LOC671894 // LOC674
RTP4
MS4A14732416N19RIK2900041A09RIK
CASP4IFITM3
SLFN45830431A10RIKZBP1
B230343A10RIKCCL54921525009RIKC5AR1
7.75 1 MYOM2
7.637.617.577.567.557.537.537.49
7.447.417.377.367.33
MX1TIAM1D430019H16RIKDOCK9 /// LOC670309SPARCFPR-RS2PLF /// PLF2 /// MRP
MYH6 /// LOC671894/
CCL3S100A5TIMM8A2LGMNDLGH3
HBA-A1
S100A9S100A8
IIGP1
RHOJ
NKG7AQP1ANXA2CSF1R
IL18
NRP1FOSSAA3P
CH13L3
TCRB-J ///
TCRB-V13LOC240327IGL-V1 /I
2010309G2ENPP3
PGLYRP1
SLC9A3R2
LCN2
BLR1
HYDIN
EPPK1
NGP
MPA2L /LOC626578
8.29 C1QB8.27 HTRA3
LAMB2PLXNB1
IL2RAA33010 2K04RIKGPRC5A
TCRB-V13 /// LOC6655TYROBP1100001G20RIKLMNA
7.80 1 XDH
7.267.257.247.197.167.127.107.057.04
7.04
LGALS3BP
CD97DIRAS2KLF4
GZMA2010300C02RIK /// LOADCY6
A930013B1 0RIKTLR1
In vitro, genes with fold-changesIL21RBEX61190002F15RIKFETUBNUPRIREEP1GLRP1
SENP8
CCR2PAX7POLH
2610019103RIKFADS2DKK3
PKP2AXIN2KIF2C /// LOC631653NAP1L32610021K21 RIKBARD1CHAC14731417B20RIK
-4.82
-4.57-4.39-4.34
-4.33-4.29-4.25
-4.17-4.15-4.13-4.10
-4.05-4.03-3.99-3.95-3.92-3.88-3.84-3.83-3.83-3.80-3.79
5730442G03RIK1700025G04RIKB230107K20RIKJDP2NETO2
PRG3RTN4RL2
SLC6A13
D19ERTD652E
ORCIL
CD248
DPM3
GM129
USP2
EVA1
ABCG1
AMMECR1CMAHPLK1
ZDHHC2GCM2MAP6
Table 3: Genes selected as top candidates from mRNA expression data.The table shows the top 1% of genes up-regulated in vivo (top) and in vitro (bottom).
39
11.9310.929.919.809.769.759.648.94
8.858.688.528.478.45
7.30 ITGA57.28 KLK3
7.037.017.006.976.966.966.936.916.816.756.736.726.696.65
6.656.64
6.64
6.636.606.556.526.516.516.496.48
8.268.24
8.19
8.11
8.007.907.897.887.82
TGFB3HBB-BH1VLDLRHS3ST1PLA2G2F1700097N02RIKNKX1-2ACTR3BPPP1R3BUBQLN2ANKRD15SOX6CDH118100llHllRIKCD28
LOC433844A1427515ART4PTGS1GFI1BSELENBP1CTH
-9.74
-8.50-7.05-6.99-6.02
-5.97-5.77-5.69-5.52-5.49-5.42
-5.38-5.37-5.35-5.20-5.13-5.09-5.01-4.95-4.95-4.93
-4.93
-3.78-3.77-3.77-3.77-3.76-3.69-3.57-3.57
-3.53-3.51-3.51-3.48-3.48-3.48-3.45
-3.44-3.43-3.42
-3.40-3.39-3.39-3.38
GO Function FDR q-value
regulation of transport 1.24E-01response to transitionmetal nanoparticle 1.96E-01positive regulation oftransport 2.15E-01actin filament-basedprocess 2.46E-01actin cytoskeletonorganization 3.27E-01
Table 4: GO enrichment for genes up-regulated in vivo.Enrichment used a single ranked list against the whole genome via the GOrilla web tool.
There were no enriched GO terms for genes up-regulated in vitro.
Measuring histone activation for model specificity
When designing this model, we wanted to find interaction sets that connected the genes
identified from the shRNA screen and from differential expression analysis. This model re-
quired protein to DNA interactions in addition to the existing protein to protein interactions
in our interactome (Figure 3A). These interactions were not publically available so we used
a combined computational and experimental approach. Specifically, we used measure-
ments of open and active chromatin to identify regions of putative transcriptional activity
and predicted transcription factor/DNA binding interactions using computational modeling
techniques.
We collected ChIP-seq data for the activating histone markers H3K27Ac and H3K4me3
from our ALL model in culture. Compared to an IgG control, we identified 29468 and
18142 peaks in the H3K27Ac and H3K4me3 datasets respectively. To identify regions of
possible transcription factor binding, we searched within these peak regions for local min-
ima, or valleys, between histones; from this analysis we found 70894 and 24617 valley
sequences in the H3K27Ac and H3K4me3 datasets. Of these valleys, 15178 regions over-
lapped between the two datasets (21% of the H3K27Ac and 62% of the H3K4me3). Figure
4 shows example ChIP-seq reads, MACS peaks, valley regions, and IgG controls for 3
control genes: Trim27, E2f3, and Hist1h1b. We selected these example genes because
they had relevant histone activation marks in mouse B-cell samples that were previously
40
published in the UCSC Genome Browser. For these genes, the H3K4me3 and H3K27Ac
ChIP-seq and MACs peaks were well aligned, although the valleys varied slightly. In the
case of Histi hi b, the H3K27Ac data had a larger MACS peak and thus more valley re-
gions. Given these trends, we used the union of all valley regions to identify possible
regions of transcription factor binding.
From these active valley regions, we used our own software suite, Garnet, to identify
transcription factor-DNA binding interactions. Garnet uses a weighted scoring approach to
quantify the probability that a transcription factor occupies a region based on the strength
of the transcription factor motif [50, 106]. The analysis created a list of 262705 interac-
tions with scores distributed from 0.3 to 0.99. We append these interactions to an existing
protein-protein interaction network derived from iRefWeb (interactome described in meth-
ods).
H3K27Ac -- - --
H3K4me3 -A-. ______
Figure 4: ChIP-seq with valley-finding identifies regions for transcription-factor binding.Genome viewer tracks for Trim27 (chr1 3:21,267,345-21,277,316), E2f3 (chr1 3:30,071,171-30,083,320), and Histi hi b (chr1 3:21,868,763-21,874,488), showing ChIP-seq reads (top),MACs peaks (middle), valley regions (lower, orange), and IgG control (grey, lower) forH3K27Ac (top 4 rows) and H3K4me3 (bottom 4 rows).
SAMNet identifies a network of genes affecting ALL progression
As we already knew that shRNA and mRNA measurements capture distinct pathway components[1 66],
we pursued a data integration approach to predict new genes in the ALL progression path-
way. We used the Simultaneous Analysis of Multiple Networks (SAMNet)[49] to construct
such a pathway because this was the most powerful tool given the data we had collected.
The algorithm integrates diverse perturbation and response datasets and maps them to a
41
physical interaction network comprising all possible ways by which the perturbed species
(e.g. shRNA hits) can give rise to the observed response (e.g. differentially expressed
transcripts). SAMNet then applies a network flow-based paradigm to select a subset of
interactions that relates the experimentally-perturbed genes from the shRNA screen to
the differentially expressed mRNA through the specified interactome. We conceptually
depict how SAMNet integrates shRNA and mRNA data with our interaction network, or
interactome in Figure 3. In the context of the resulting model, RNAi genes are connected
upstream to transcription factors and differentially-expressed mRNAs; sometimes, RNAi
genes represent transcription factors themselves and are directly connected to differential
mRNAs. Mathematically, the algorithm selects an interaction sub-network by pushing flow
through a probabilistically-weighted interactome (described in methods). Flow initiates at
the RNAi hits and terminates on differentially expressed mRNAs. Fold-change values from
RNAi and mRNA expression data constrain the amount of flow any experimental gene hit
can capture in the network. Genes are able to capture flow from either commodity. Flow
sharing is constrained by the edge capacity (set by the edge confidence); common nodes
force the algorithm to select interaction edges beyond the common nodes. and the algo-
rithm Flow is shared between data from the in vivo and in vitro experiments through the
multi-commodity abstraction in the underlying SAMNet implementation.
The foundational model consisted of 311 nodes and 480 edges (Supplementary Figure
8). We proceeded from this foundation by removing the 'source' and 'sink' nodes which are
topological formalities for creating the network model. This created an interaction network
where upstream protein-protein interactions converge on transcription factor/DNA binding
interactions. To first focus on a protein interaction sub-network, we omitted the 43 mRNA
nodes and transcription factors that are not directly connected to the protein-protein inter-
actome (see below). The resulting sub-network contains 258 nodes and 259 edges (Figure
5). This network contained 91 targets from the RNAi inputs (i.e. experimental genes), and
167 predicted genes. Of the 167 predicted genes, 33 are transcription factors.
A predicted transcription factor/DNA binding sub-network contains 84 nodes with 79
edges (Figure 6); 41 nodes are transcription factors, and 43 nodes are differentially ex-
pressed mRNAs. This sub-network includes an additional 7 experimental genes from the
42
RNAi input set; this yields a total of 40 transcription factors in the whole network, combin-
ing the two sub-networks. These 7 targets are transcription factors which had predicted
connections to altered mRNAs, but did not have further connections to the protein-protein
interaction network. This set of transcription factors comprises Tfdpl, Foxo6, Hmga2,
Meis1, Myt1, Cebpa, and Klf3. Another identified transcription factor, Pparg, had con-
nectivity both to the protein-protein interactome and directly to differential mRNA (Figures
5&6).
43
4 zS4 c C
4C
a
Figure 5: SAMNet identifies integrated network for ALL progression.The magenta/green edges represent interactions from the in vivo/in vitro screens. RNAihits are represented by a shaded square; the shading refers to the extent of depletion inthe original screen. A diamond is a transcription factor selected by SAMNet; those thatare shaded are also hits from the shRNA screen. All white-face nodes are hidden targetsselected by the algorithm. Node border color represents fractional representation in afamily of 100 random networks. Downstream mRNA not pictured.where Wwpl, Hgs, Lmo2, and Pogz exist within the network.
Red arrows indicate
44
mRlNA o d change-- 9.97
gene0 d n -2euae Cn v hi5 J
8Q, 85V *AeIu-rgltN nvv
Figure 6: SAMNet selects transcription factors that explain genes with greatest differentialexpression.The transcription factors and differentially expressed genes are represented as trianglesand octagons respectively. Grey shading on the transcription factors represents the extentof depletion in the original screen (legend shown in Figure 3). Shading on the differentiallyexpressed genes reflects either down-regulation (green) or up-regulation (pink) in the invivo screen relative to the in vitro screen. The thickness of the interaction line representsthe amount of flow captured by that interaction. Node shapes and border colors are thesame as explained in Figure 3.
Integrated approach finds genes connecting disparate data sets
This network further specifies interactions specific to either the in vivo or in vitro screen, as
well as interactions common to both screens (Figures 3&4). The network identified genes
hits that connected the in vitro data (e.g. HGS, ARR, CASP1) or the in vivo data (e.g.
CPSF6, WWP1, TCF4, KLF5, HOXA9), and genes that connected data from both screens
(e.g. IKBKB, NFKBIA, RELA).
We can also use the predicted genes identified by the algorithm to enhance the abil-
ity to identify known pathways using Gene Ontology (GO) functional enrichment statistics.
Functional enrichment of the network genes compared to the whole genome identified GO
processes distinct from those identified in the RNAi or mRNA expression data alone (Ta-
ble 5). The processes include many associated with hematopoiesis including leukocyte
homeostasis (q-value: 9.76e-1o), lymphocyte homeostasis (q-value: 1 .67e-9), regulation
of leukocyte differentiation (q-value: 2.05e-8), hemopoiesis (q-value: 4.16e-8), and positive
regulation of lymphocyte proliferation (q-value: 1.97e-4). These processes reflected B-cell
45
specific functions such as negative regulation of cell differentiation (q-value: 4.92e-10)
which reflect the undifferentiated state of B-cell leukemias[20]. This set of GO processes
also included functions associated with other hematopoietic progenitors, namely leuko-
cytes. This enrichment occurs because genes with these annotations are shared among
these two biological functions. 12 of the 13 genes annotated as having a role in leukocyte
homeostasis are also involved in lymphocyte homeostasis.
The enrichment analysis also identified pathways with known relationships to leukemias,
if not specifically acute lymphoblastic leukemia. We address the potential signifigance of
these pathways in the discussion.
transforming growth factorbeta receptor signaling 1376 Fos, Parpi, Skil, Smad. Tgfb3, Smad2, Smad9, Ptk2, Smad3 Trp53pathway 1 27E-12 9.29E-11 (20822,69,307.14) Creb1. Jun Map3kl, Src
Med1. Hdac2, Vhl, Lmo2, TWt712, Pparg, Hoxa9 Ptk2, 1118, Trp53,Foxo . Jdp2. ApIs, Pkp2, Xdh, ltgbl, Vim, Ezh2, Erbb2, E2f1. Skil
negative regulation of cell 389 Erbb4, Myc. Smad3. Hmga2, Meisi. Gsk3b, Itgbibpl, Mapki, Pax6,differentiation 7 07E-12 4.92E-10 (20822,610,307,35) Suzl2. Trp73. Ctnnbl. Stat5a, NfktAa
Mcphl Jak2, Med1, Plscri. Pcna, Bag5, Map2k1, Vhl, Brcal, Pparg,Fadd, Hifla CmkIl, Trp53, Foxo, Tnm28, Mdfi, Traf2, Gch, Mecom.Atxn3 Ubqln2, Traf6, A4. Bag6, Xdh, Nck1. Zbpt. Kf4. Ccr2 Parpi.
positive regulation of protein 300 Skil Epm2a Dv12. My Smad3 Mapkl Ppp4c. Trp73. Rela Cd28,import into nucleus 8 16E-12 565E-10 (20822,1084,307.48) fl2ra Map3k3, Stat5a. Map3kl Map2k4, Nfkbia. Rab33a
13.18 Ahr, Sos1, Fas, Lat, Skil. Ppp3cb, Hif ta, Fadd, Casp3. lkbkb. If2ra,leukocyte homeostasis 1 50E-11 9 76E-10 20822.67,307,13) Stat5a Mecom
14. 3 Sos1. fkbkb. Ahr. Lat, Fas, lf2ra, Skil, Ppp3cb. Stat5a, Fadd, HifIa.lymphocyte homeostasis 2 66E-11 1,67E-09 (20822.56.307.12) Casp3
regulation of leukocyte 575 Sost. Fas. Ccr2, Erbb2, Fos. Fadd. Myc.Tal1, Creb1, Gfib, Jun. Apms.differentiation 3 73E-10 2.05E-08 (20822,236,307 20) Asxl2. Cd28, ff2ra, Ctnnb1. Rb1, Stat5a 1 raf. Tyrubp
regulation of myeloid 9 13 Fos, Fadd. Myc, Tal, Crebt, Gfitb, Jun, Asxl2. Apcs, Ctnnbl, Rbt,Ieukocyte differentiation 4 13E-10 2.24E-08 (20822.104,307.14) Stat5a. Traf6. Tyrobp
8.71 Jak2, Med1, Klf4, Ahr, Ccr2 Lmo2, Hitla. Hoxa9, Meisi, Tall, Sox6hemopoeisis 7 79E-10 4.16E-08 (20822,109,307,14) Ctnnbl. Sp3. Sp1positive regulation of 8 48 Brmaf, Smad4. Eed, Trp53, Tall. Plk1, Ctbp1, Jdp2, Gfilb Asxl2,chromosome organization 4406-09 2.13E-07 (20822,104,307,13) Ctnnbl, Rb, Ep300positive regulation of myeloid 11.52leukocyte differentiation 7 35E-08 3.06E-06 (20822.53.307.9) Gfllb, Jun, Fos. Asxl2. nb1, Stat5a Fadd, Trafe. Creb
positive regulation of 4.21 Jak2, 2Atp6ap2, Ccr2, Brcal, Fadd, Hifla, Smad3, ff18, Crebi, Rela,cytokine production 7.47E-08 3.10E-06 (20822,322,307,20) Cd28, Rel. Traf2. Stat5a, Arn, Hdacl, Traf6, Src, At14,regulation of apoptotic 531 Jak2, Pfscrt. Nck1, Fas, Skil. Fadd, Mapk8. Smacd. Myc, Trp53.signaling pathway 1 16E-06 3.79E-05 (20822,166 307.13) Gsk3b Trp73, Traf2
10 32myeloid cell development 4 62E-06 1 31E-04 (20822,46 307.7) Sox6, Med1, Ptpn11, Tal, Meis1, Ep300, Sr
positive regulation of 595lymphocyte proliteration 7.38E-06 1.97E-04 (20822,114,307,10) Nckt, Ccr2. Cdknla. Cd28, Stat5a, Fadd, Traf6
Hdac2 Cdhl Atp6ap2, Tcf712. Dy12. Smad33, Dkk3Regulation of Wnt signaling 3.88 Dvl, Foxol, Mdi. Hdac1, Src. Xiappathway 3 49E-05 7 84-04 l20810,227,307,13 )I
Table 5: GO enrichment of network gene identifies processes associated with B-cellleukemia.GOrilla identified enriched GO processes using the network nodes as the foregroundagainst a background of the whole genome.
46
Robustness metrics prioritize and predict genes relevant for ALL progression
Toward determining targets for dedicated validation, we quantified the robustness of each
gene selected in the network using randomization. This routine involved running SAM-
Net 100 times with randomly selected input protein and mRNA targets. The number and
value of prizes remained constant, but were assigned to random genes, and the under-
lying interactome was unchanged. We counted a gene's fractional representation in the
family of random networks and display this as a specificity score (see Figures 5 & 6); a
lower score represents a greater specificity. This randomization mitigated against potential
targets arising artificially due simply to high connectivity within the interactome.
To prioritize targets for further investigation, we leveraged information contained in the
network to create a ranking score metric. The ranking reflects the gene's robustness to
randomization (1-specificity) as calculated above, an authority index (reflecting the number
of out-going connections from a gene), and aflow quantity (the amount of 'weight' upstream
of a gene) (Table 6 and Supplemental Table 1). The aggregated ranking score is a weighted
sum of these three factors, where each is weighted by the standard deviation of that factor;
the sum is normalized to the max possible score (the total of the standard deviations). The
specificity varied the most across all network nodes, and was the most discerning factor in
contribution to the aggregate ranking metric.
47
Phenotypic HitsNode Acoreaate ScoieRp17 0.811Lap3 0.811St3al 0.803Pogz 0.803Lat 0.836T prkb 0.835ZbIh24 0.835htgb IbpI 0835Sc8t1 0.835_Ppp3cb 0.828Fcr15 0.828
Tsgl _ 0.40
Hr2 0.80Ubpd4 08007Tre2i 0.265
Arnt 0.261_-K-f6 0.840Map3k14 0.840T g101 0.840Smur12 0.840TrIrn24 0.840Pfscr1 0 840WWp1 0.838SkI[ 0.832Lmo2 0.824T0f4 0.8232810408M09Rlik 0.833Mid1 0.833Rabacl 0.816
Hgs 0 800Tbc1d4 0,800
Transcription FactorsNode Aggregate Score
Hinga2 0897
TIdp1_ 0 883Foxo6 0. 85r
Slat5a 0.496
Pei 0.457
Gabpbi 0.832Foxol 0.831
Sp3 0.750Alt4 0.749
Table 6: Aggregate scoring of top network nodes.Genes are grouped by type (transcription factor, phenotypic, and hidden) and contextual
effect. The shading in the right columns refers to colors from the network key in Figures
3&4.
Predicted pathway contains genes contributing to ALL progression specifically in
vivo and genes affecting B-cell viability
As initial evidence supporting the effective capabilities of our integrative pathways-based
modeling approach, we note that the model predicts in vivo specific effects for Lmo2 and
Pogz. These two targets have, in fact, been previously validated successfully[93]. For
additional evidence, we undertook here new dedicated experimental tests of predicted
hidden genes selected from the model-determined ranking list (Table 6). From the ranked
gene list (Table 6), we used in vivo competition assays (Figure 7) to measure the effects of
loss of these predicted genes on ALL progression.
Hgs is a vesicular body-associated protein and was not included in our initial library
48
screen. The model predicts that shRNAs targeting Hgs will deplete in vitro. In competition
assays, an shRNA against Hgs depleted in culture, the blood, bone marrow, and spleen.
The depletion in culture was greater than in all organs and was consistent with our model
predictions (Figure 7). These findings suggest that while Hgs confers a growth disadvan-
tage to a pre-B-cell ALL, the in vivo environment mitigates the effect of Hgs loss. This
may be due to growth factors or other paracrine signals that are relevant to hematological
malignancies[47], direct cell-to-cell contact, disease compartmentalization, or many other
factors unique to the in vivo environment[93].
Wwpl, is a ubiquitin conjugating enzyme that was in our initial library, though the se-
quencing reads were below the limit of detection, suggesting a bad shRNA reagent. The
model predicts that Wwpl knowckdown has an in vivo specific effect, and competition as-
says confirm this phenotype. shWwpl enriches in the blood, bone marrow, and spleen
(Figure 7).
49
Input
Input
Hgs (sh1), normalized
0.0001< 0.0001
17-0 0002
*g
0,0078 l
0
- 0
0
0 0 0-'\0 \0 0 e 4
V, llf' '0 4 "q e10'0
In Vitro
In Vivo
Wwpl (sh5), normalized
00001 <00001 0.0046
1.5- ns -Z
1.0-1b- A t1
n n 1 I
0. 0
C:'
Figure 7: Validation shows in vitro and in vivo effects for Hgs and Wwpl.In the competition assays, we measure the relative abundances of pre-B-cells with andwithout an shRNA against our gene of interest either in culture or transplanted into mice.We measure relative proportions at the time of morbidity using FACS. All plots are mean+/- S.D. For all samples, n=3, except for Hgs, and Wwpl tissue samples where n=4.
50
I
1.5
1.0
C00.1.
0CLo 0
MM16=
SInfect*g
Author Contributions
All authors: Douglas A. Lauffenburger (D.A.L.), Ernest Fraenkel (E.F), Michael Hemann
(M.H.), Sara Gosline (S.G.), Simona Dalin (S.D.), and Jennifer L. Wilson (J.L.W.)
Contributions: D.A.L., E.F., J.L.W., and S.G. designed the project; J.L.W. and S.D.
performed research; E.F., D.A.L., M.H., and S.G., contributed new reagents/analytic tools;
J.L.W. analyzed data; J.L.W. wrote the paper; and E.F., and D.A.L. supervised all aspects
of the work.
Discussion
Discovering pathways de novo by leveraging multiple 'omics measurements can find latent
information from functional genomic screens. We confirm the relevance of our model first
through GO analysis. GO enrichment of the input RNAi and mRNA expression sets found
no enrichment of relevant functions. This reaffirmed the disparate nature of individual data
sets and suggested the possible value of an integrated approach. Further, we validated
specific gene predictions from our network model. The model predicted an in vivo specific
effect for Lmo2 and Pogz. They were not within the top 1% of ranked shRNAs, but were
functionally related to the very top gene targets as predicted by the pathway model, and
were found to have in vivo specific effects through further validation. Hgs and Wwpl were
not included in the original screen analysis due to low sequencing coverage, yet the model
predicted their relevance. Without such a model, they may never have been considered.
Through focused validation, they both proved relevant to ALL progression.
The model makes many predictions about genes that have unclear roles in ALL and
this underscores our incomplete knowledge of biological pathways. Some of these genes
have mixed roles in development. Hemopoeisis is governed by stage- and lineage- specific
transcription factor regulation [136, 107]; finding these relationships is valuable to under-
standing disease progression. Some transcription factors, such as TCF3, PAX5, IKZF1,
and EBF1, are involved during many hematopoietic stages[l 36]. Other network genes are
characteristic of non-ALL leukemias. For instance, our network selects the transcription
factor TCF4. This factor is up-regulated in the solid tumors of adult T-cell ALL patients[1 07]
51
but currently has an unknown roll in B-cell ALL, though it is highly expressed in this pre-
B-cell ALL system. Additionally, functional enrichment identifies that network genes are
enriched for the TGF-beta growth factor pathway process. These pathways have been
identified in leukemias other than ALL, and have been characterized for clinical utility. In-
vestigations following on the utility of these findings could be fruitful paths forward in the
context of ALL. Multiple leukemias, including chronic myelogenous (CML)[1 04], acute T-cell
leukemia (ATL)[1 09], and ALL[28], express TGF-beta associated genes. Though, expres-
sion of TGF-beta components was higher in the T-cell leukemias than in the other leukemic
cell lines[1 09] and the ATL samples responded to exogenous TGF-beta. Further, TGF-beta
and Foxo3a activity both promote the maintenance of leukemia-initiating cells in CML[1 04]
and loss of Foxo3a and TGF-beta inhibition better sensitized cells to Imatinib treatment.
Clinically, patients with higher levels of TGF-beta are considered high-risk, as they often
harbor additional mutations that prevent the tumor-suppressor roles of TGF-beta[28]. The
network also identifies the Wnt signaling pathway which is relevant in multiple hemato-
logical malignancies; specifically, stabilized beta-catenin is associated with differentiation
arrest, is highly expressed in childhood T-cell ALL, and Wnt signaling can promote drug
resistance[107, 39, 7, 43]. Understanding the role of TGF-beta and Wnt in ALL will in-
form research in therapeutics as these pathways have already been characterized in other
leukemic contexts.
These findings confirm the relevance of network gene selections and highlight the simi-
larity of hematopoietic malignancies; many of the same genes are involved, though degree
of expression and directionality of their effects can be context dependent. Further, B-cell
development pathways have been identified as hallmarks of drug-resistant leukemias and
thus, therapeutic strategies promoting B-cell maturation show promise[64]. Previous stud-
ies have also demonstrated the role of histone modification pathways in relapsed ALL;
specifically CREBBP and CTCF, are mutated in these patients and may affect treatment
response[1 02]. The network confirms the relevance of CREBBP to pre-B-cell ALL, though,
the model indicates that this gene is deleterious to B-cell viability and does not specifically
modulate the response to the in vivo environment. Identifying developmental pathways in
ALL is not novel, but there is novelty in specifying which of these genes are deleterious for
52
pre-B-cell growth and those that are deleterious in vivo.
We develop our model to select for context-specific interactions and to identify genes
relevant to the in vitro or in vivo settings, or genes that are common to both. In the case of
Hgs, the model predicts an in vitro specific effect. We demonstrate that Hgs loss confers
a growth disadvantage in culture, blood, spleen, and bone marrow, but that this loss is
most drastic in culture. This trend emphasizes the ability to distinguish genes relevant to
the pre-B-cell model and those specifically responding to the cancer microenvironment.
In the case of Hgs, loss of this gene is significant in culture, blood, spleen, and bone
marrow, though our model predicts and confirms that this is not an effect specific to the
tumor microenvironment. This framework is valuable as screening context (e.g. treatments
or experimental environment) can affect which parts of a pathway are relevant. In our
example, all genes in the model are predicted as part of the ALL pathway, but only a
subset are relevant to ALL progression in vivo.
Hgs (Hrs) is a member of the ESCRT-0 family of proteins involved with multivesicular
body (MVB) formation [31]. This protein is involved with the internalization and degradation
of activated cell-surface receptors[1 44, 103] and loss of Hrs is associated with increased
accumulation of E-cadherin and decreased cell proliferation[144]. Hrs is also essential
for the termination of 1L-6 signals by sorting gpl 30, a transducer of IL-6 stimulation, for
endosomal degradation[141]. Further, Hrs deficiency in mice decreased B-cell receptor
expression; BCR expression is necessary for pre-B-cell expansion[1 03, 170]. The conse-
quences of Hgs loss suggest that deficiencies in receptor internalization are deleterious to
this type of pre-B-cell ALL.
The network predicts a novel role for Wwpl in pre-B-cell ALL. Even though the network
model was constructed using shRNAs that conferred a growth disadvantage, the model
does not predict the directionality of effects for predicted genes. Validation experiments
confirmed that Wwpl had an in vivo specific effect, and that Wwp1 loss conferred a growth
advantage. This enrichment is surprising because so far, most evidence suggests that
Wwpl is an oncogene; our competition assays suggests that in ALL, Wwpl may have a
tumor suppressor role. WWP1 is a member of the Nedd4 family of ubiquitin ligases, many
of which are over-expressed in cancer[18]. Many results suggest that WWP1 acts an
53
oncogene by targeting tumor-suppressing proteins, such as LATS, for degradation. WWP1
is over-expressed in breast and prostate cancers, and seems to function in this oncogenic
manner[17, 167]. Though, WWP1 also participates in a unique feedback loop with p53.
Wwpl stabilizes p53 protein in the cytoplasm, but decreases its expression, and in turn,
p53 reduces the expression of Wwpl[82]. This feedback loop is dependent upon p53
mutation status; mutated p53 abrogates this feedback dynamic leading to increased Wwpl
expression. In our pre-B-cell ALL, Wwpl has relatively low expression and intact p53
suggesting an intact feedback loop. Loss of Wwpl could lead to decreased p53 protein
stability and enable interactions with other proteins. For instance, these cells express high
levels of survivin (Birc5), which can prevent p53-mediated apoptosis in pediatric ALL[1 47].
Off target effects are one of the largest criticisms of RNAi screens, yet even with these
effects, we construct and validate a model for genes regulating ALL progression. We cre-
ate this model with less than 100 input targets from each screen and find GO enrichment
of genes related to leukemia and hematopoietic development. This demonstrates that the
network filter is sufficiently powerful for analysis of screening data and that even stringent
thresholds are sufficient for identifying genes relevant to a pathway. Further, we perform
this analysis retroactively without controlling for OTEs or requiring specific negative con-
trols, showing that RNAi analyses do not have to specifically compensate for OTEs. This
suggests that even with limited and imperfect screening results, we can learn more from
these screens, and that pathway discovery approaches are a viable path forward for the
gene-interference community.
There are limitations associated with this approach. We have not yet made an estimate
of validation rate for these network-based filters. To make this estimate, we would need
to validate all genes in the network and determine the false positive rate. This method
requires multiple datasets to create a possible network whereas other approaches require
fewer input datasets. Some examples of tools that require a single input dataset include the
Prize Collecting Steiner Forest (PCSF) [61, 146], TieDIE[1 16], and HOTNET [149, 150].
For a biologist, these tools could be easier to implement, but they lack the perspective
gained by integrating multiple 'omic measurements. Each 'omic measurement captures a
slightly different aspect of the cellular process under investigation, and so there are trade-
54
offs between completeness and simplicity when selecting these tools.
Application of integration methods will improve understanding of gene-interference screens
regardless of improvements in reagent design. CRISPR systems are more sensitive than
RNAi and are better suited for discovering essential human genes[52, 53]. Though, both
technologies are limited by biological compensation - alternative pathways or redundant
proteins. Thus, adapting and applying interaction-based methods for data integration will
continue to be of importance for functional genomic investigations. Many have promoted
the value of data integration analysis methods[98, 161, 135, 166], however, these analysis
approaches remain under utilized in the gene interference community.
55
Materials and Methods
shRNA screen, and data processing
The original shRNA screen and mRNA expression data are described previously[93]. DE-
SEQ2 size-factor normalized and calculated fold-changes for shRNA sequencing data. We
ranked these fold-change values by p-value from DESEQ2[5]. The GOrilla tool (http://cbl-
gorilla.cs.technion.ac.il/) determined GO functional enrichment using the top 1 % of dif-
ferentially depleted shRNAs against the background of the whole mouse genome. GO-
rilla also determined functional enrichment of the top 1% of differentially expressed (up-
regulated/down-regulated mRNAs for the in vivo/in vitro contexts) mRNAs against the
background of the whole mouse genome.
Cell culture
Pre-B-cell ALL cells[1 60, 159] were cultured as published[93].
ChIP-SEQ
We performed ChIP assays as previously described[90]. Briefly, 8.5x106 pre-B-cells were
cross-linked with 11% formaldehyde. Pellets were lysed and sonicated using the BioRup-
tor. Following sonication, activated DNA regions were immunoprecipitated using the fol-
lowing antibodies: K4me3 (Millipore Lot#1974075), K27Ac (Abcam Lot#GR1048521), and
IgG (Millipore Lot#JBC17938060).
Peak Finding
We used the MACS algorithm [37]to identify peak regions 10K kb upstream of the transcrip-
tion start site. Within these peak regions, we searched for valleys (local minima within peak
regions) and then used these regions as inputs for potential transcription factor binding lo-
cations. The ChIP-seq data are available: GSE77570 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?ac
56
Interactome construction
We used Garnet to create transcription factor to DNA binding interactions. The Garnet
package uses a weighted scoring function[106] to determine which transcription factor
motifs were the most likely to bind a set of Fasta sequences. For input, we used the valleys
identified from our previously mentioned ChIP-seq data. The software is part of a suite of
'omics integration tools (http://fraenkel.mit.edu/omicsintegrator).
For our starting interaction network, we downloaded interactions from iRefWeb version
9 and we only kept interactions that mapped to UNIPROT reviewed human proteins. These
interactions were scored using the miscore framework[1 52] to create a probabilistically-
weighted interactome. This score considers the number of publications, the type of inter-
action, and the evidence supporting the interaction. We discarded interactions with a score
below 0.3. This yielded a starting network of 88117 interactions. The interactome is hosted
on the Fraenkel Lab Website (http://fraenkel-nsf.csbi.mit.edu/psiquic/).
SAMNet
We built the model using SAMNet[49]. We used the top 1% of depleted shRNAs (for in
vivo and in vitro), the top 1% differentially expressed mRNAs (up-regulated genes were
assigned prizes for the in vivo context and down-regulated genes were assigned prizes for
the in vitro context), the transcription factor to DNA binding interactions, and the weighted
interactome as inputs to the algorithm. For simplicity, we mapped all mouse data to human
gene symbols using homology. We used a gamma parameter of 17 as this was found to
maximize the number of shRNA-targeted genes in the network. We again used GOrilla
to determine the functional enrichment of the network, this time using the set of network
nodes as a foreground against the background of the whole mouse genome.
Randomizations and Robustness Analysis
For randomizations, we completed 100 runs of the algorithm at the same gamma parame-
ter, but with the depletion and differential expression scores distributed to random sets of
genes. We calculated node enrichment fraction, or specificity, by counting the number of
57
times a node from the real network was selected in a random network. For further scoring
and ranking, we used 1-specificity to keep all network parameters on a 0-1 scale. We used
networkx in Python to determine the authority scores for network genes. SAMNet outputs
the amount of flow converging on a gene. Our aggregate scoring method weighted each of
these normalized values by the standard deviation of these metrics across all genes and
was normalized to the sum of all metric standard deviations:
R = (1-specificty)*o( 1-specifcit-y)+authority*Oauthaority+Iow*O1fIow0(1- specifcity) +01authority +aflow
We completed this calculation separately for genes unique to in vivol in vitro contexts
and common between both contexts because common genes had two authority and flow
scores. Because we designed the model to allow for multiple out-going connections from
transcription factors but not from hidden genes and because phenotypic (shRNA) hits did
not have incoming flow in the mode, we created separate rankings for transcription factors
and phenotypic hits.
GFP Competition Assays
For model validation, we conducted parallel in vitro and in vivo GFP competition as-
says. We created shRNA constructs for Hgs and Wwpl, and infected pure populations
of mCherry positive ALL cells as described[931. At the time of experiment, infected pre-
B-cells were mixed 50:50 with mCherry positive ALL cells. From these populations, 100K
cells were tail-vein injected into four 8-week-old, female C57BL6 mice, and 100K cells
were plated in triplicate. Cultured populations were split 1:5 daily until mice reached
morbidity (10-12 days following injection). At the time of morbidity, we collected blood,
spleen, and bone marrow. For all in vivo and in vitro samples, we measure the %GFP
of mCherry-labeled cells and calculated a fold-change relative to the 50:50 ratio at input.
We normalized %GFP-fold-change to the MLS control for the respective tissue and used a
t-test with Welch's correction to determine significance. The sequences used for Hgs and
Wwpl shRNAs are CCAGAAACCACTTATATGTCTA and CTCCCTATTTTATACAGAGCAA.
We tested shRNA knockdown using qPCR relative to Gapdh using Taqman expression
assays. shHgs and shWwpl left 23.28% and 67.64% mRNA remaining (Supplementary
58
Figure 9).
59
Supplemental Figures and Tables
__ iA
V I
_4A
Figure 8: The whole network, unformatted as created by SAMNet.The Source and Sink nodes have been colored blue and orange respectively to demon-strate their connectivity to the original inputs. Labels remain in Human Uniprot identifierswith the exception of mRNA targets which are mapped to gene symbol. We show the fullnetwork for completeness, but do not use the network in the unformatted version for anyinterpretation. For better viewing, an interactive format is available on the Fraenkel Labwebsite (http://fraenkel-nsf.csbi.mit.edu/psiquic/).
60
qPnnr
Figure 9: Percent reduction in mRNA after knockdown as measured via qPGR.
61
-- ------ 061M -- - . - - -%mom.
Part IV
Network modeling identifies commonregulators of NFxB and TGFa ectodomainshedding
Abstract
Studying pathway dynamics affected by growth factors has implications for under-standing cancer and designing therapeutics, yet few studies have investigated regu-lation of the growth factors themselves. Relevant growth factors, such as the Epider-mal growth factor ligand pro-Transforming-growth-factor-alpha (pro-TGFc), can haveboth autocrine and paracrine effects. Sheddases, notably the metalloprotease A-Disintegrin-And-Metalloprotease-1 7 (ADAM1 7), can cleavage-activate multiple cell sur-face pro-ligands on the cell surface, in a process termed ectodomain shedding. In thiswork we investigate the role of select signaling genes that specifically affect the cleav-age of TGFa. We use an shRNA screen and network optimization to identify a putativepathway regulating shedding of TGFa. This network contains experimental genes fromthe shRNA screen and predicts additional "hidden" genes not previously analyzed. Inour analysis we identify a TGFa cleavage regulatory pathway that contains a cluster ofgenes with known NFxB regulatory functions - PEF1, PEF2, CALM1, OBSCN, IRAK1,XIAP, and TAB1. We validate that the hidden genes XIAP and TAB1 have roles inTGFt shedding. Further, we provide evidence of NFxB and EGFR pathway crosstalk,and demonstrate for the first time that regulators common to both pathways affectectodomain shedding of TGFL.
62
Introduction
Growth factor pathways, and specifically receptor tyrosine kinases (RTKs) are the focus
of multiple therapeutic efforts in cancer [51, 157, 95], though even with well-designed in-
hibitors, resistant cancers arise [34, 33, 108, 148]. For instance, tumors can evolve point
mutations that render the therapies ineffective, or the cell can activate alternative pathways
to overcome inhibition[33, 114, 163]. Alternative pathway activation, sometimes referred to
as 'RTK switching'[44], results from bypass signaling mechanisms such as feedback loops
or the presence of growth ligands for other growth receptors. For instance, exogenous EGF
can confer full resistance to the cMet inhibitor, Crizotinib, treatment in gastric cancers, and
partial resistance to this therapy in lung cancer[1 62]. Wilson et al further characterize this
phenomenon in 41 cells lines, with six different RTK ligands in the presence of nine RTK
inhibitors. Further examples of growth factor-induced resistance include NRG1 resistance
to HER2 inhibition, HGF- and FGF-conferred resistance to NRG1 inhibition, and EGF-
conferred resistance to FGFR inhibition[1'62]. These findings suggest that these pathways
are redundant in their ability to actuate growth programs. Though, it is unclear if these
pathways converge on the same downstream signaling molecules or if they use unique
signaling paths to actuate growth. To understand growth factor induced mechanisms for
resistance, we need a thorough understanding of the pathways that govern regulation and
cleavage of these factors.
The growth factor TGFcx has already been studied in many disease contexts, yet, com-
prehensive understanding of the signaling network surrounding this ligand is lacking. In
breast cancer, TGFc expression is negatively correlated with progesterone receptor ex-
pression, but positively associated with tumor aggressiveness[87]. In gastric cancer, sig-
nal peptidase 18 (SECi 1 A) loss increased tumor invasion and tumor volume, and also
induced increased TGFa cleavage[1 13]. In pancreatic cancers, cells with increased TGFX
expression were more sensitive to gefitinib than those with lower TGFX expression. The
authors found that gefitinib sensitivity was dependent on basal EGFR pathway activation,
suggesting a feedback mechanism between activated EGFR and TGFa [118]. While TGFi
expression has been characterized in cancer and response to therapy, there is an incom-
63
plete understanding of how regulation of the shedding of mature TGFa relates to this pro-
cess. We know that activation of the RTKs FGFR, PDGFR, stimulates TGF shedding and
that the Erk MAP pathway is necessary for this receptor-mediated stimulation[35]. When
compared to shedding of other ligands, MAP kinase signaling was identified as a general
mechanism for shedding of TGFc, TNFcx, and L-selectin[35], though in other work, using
hypertonic stress, LPA, or TPA stimulation could selectively shed HB-EGF, EGF, NRG, and
TGFa[57]. Further work extended this result and showed that different PRKC isoforms
were required for shedding depending on the stimulus[25].
The EGFR pathway and its related growth ligands are becoming interesting molec-
ular targets for gastric cancer which has limited targeted therapies. Currently, gastric
cancer is one of the largest causes of cancer-associated death worldwide[8] and surgi-
cal resection remains the frontline treatment[132] The disease is associated with obesity,
and H.pylori infection, and these lifestyle factors have been associated with molecular
level phenotypes[132, 70]. Specifically, H. pylori infection increased expression of EGF,
and EGFR, and these expression patterns persisted after eradication of the infection[70].
EGFR inhibition blocked tumor growth and invasion in gastric cancer cells[174]. EGFR
status had an association with short overall survival whereas HER3 expression had a
borderline association with longer overall survival in an immunohistochemical analysis of
esophageal and gastric cancer patient samples[54].
Some gastric cancer samples harbor c-Met amplifications, thus making both EGFR
and c-Met pathways the focus of therapeutic interventions[1 12, 8, 67]. In lung cancer
patients, c-Met amplifications were associated with resistance to gefinitib and erlotinib,
and combined inhibition of both c-Met and EGFR was necessary and sufficient for block-
ing downstream MEK/ERK activity[33]. However, targeting c-Met and EGFR has conse-
quences beyond reducing activity of the RTKs themselves. First, pEGFR, and the ma-
trix metalloproteinases, MMP7 and MMP9, were significantly elevated in gastric cancer
resections[165]and TGFa, and the matrix metalloproteinases, MMP-7, and MMP-9 were
associated with poor overall survival in gastric cancer patient biopsies[38]. Further experi-
ments in gastric cancer cell lines showed that blocking EGF-induced EGFR activation could
prevent MMP7 and MMP9 activation[165].Because the metalloproteinases are responsible
64
for shedding multiple surface-bound growth factors, this finding suggests that targeting the
EGFR pathway could alter growth factor dynamics for multiple pathways. Further, targeting
both c-Met and EGFR had limited efficacy in MKN45 cell lines, a c-Met amplified gastric
cancer cell model. Specifically, siRNA knockdown of c-Met decreased activation of the im-
portant downstream signaling molecules p-AKT and p-pl3K in this cell line, though these
cells did not show an altered sensitivity to gefinitib treatment[1 75].
From a molecular perspective, NFxB appears to be an ideal co-therapy for EGFR in-
hibitors in epithelial cancers. NFxB has been associated with resistance to EGFR therapies
in cancer[1 33], and inhibition of NFxB has also sensitized cells to erlotinib treatment. Anti-
EGFR therapies showed reduction in NFxB activation, but did not drastically alter patient
outcome and survival[133]. The EGFR pathway can activate NFxB through proteasome-
mediated degradation of IkBot, yet proteasome inhibition with Bortezomib was not suc-
cessful because non-canonical NFkB activation remained intact[3, 4]. Unfortunately, IkK
inhibitors have not been effective co-therapies for EGFR inhibition[133]. Further, NFxB
targeted therapies have high toxicity [26].
The sheddase enzymes ADAM 10 and ADAM 17 are a major focus of growth factor reg-
ulation because they cleave and activate these factors. Though, these enzymes serve a
multitude of functions: they shed growth factors, cytokines, growth factor receptors, and ad-
hesion molecules, and as a result of these activities, ADAMs can actuate cell growth, death,
differentiation, and angiogensis, among others[22]. The ADAMs have widespread effects
that can either inhibit or activate phenotypes/cellular processes; e.g. ADAM17 cleavage of
EGF family ligands is activating, ADAM10 cleavage of Ephrin ligands is inhibitory. These
qualities make them unideal candidates for therapeutic intervention[1 19, 2], and thus it is
important to consider other means for modulating growth factor levels in the presence of
targeted therapies.
Here we endeavored to discover a pathway regulating TGFa cleavage and further char-
acterize novel regulators of this pathway. We first pursue an shRNA screen for kinases and
phosphatases affecting TGFx release. Knowing that RNAi screens are incomplete and bi-
ased due to off-target effects, we pursued an integrated network approach for discovering
the true TGFa pathway from this type of data. We have previously demonstrated the power
65
of integrated network tools for the de novo discovery of pathways from gene interference
screens, and believed this would be a powerful approach for the discovery of the poorly
characterized pathway affecting TGFa shedding. From this pathway, we discover a novel
connection between NFxB signaling and TGFa shedding; specifically that these pathways
share common regulators. Using this information, we explore IRAK1 as a specific modu-
lator of both NFxB and TGFx and postulate that therapies against this enzyme could have
therapeutic potential.
66
Results
A kinome/phosphatome screen for regulators of TGFa shedding
We conducted a screen for signaling regulators of TGFa shedding using a FACS based
assay (10A). We measured the relative extracellular TGFa (APC) to intracellular TGFa
(GFP) and normalize this red:green ratio for all shRNAs to the shlacZ controls (distribution
in 10 C). Most individual shRNAs fall in the region of no effect, or have a normalized ratio
with an absolute value less than 1 z-score from the distribution mean. 1 OB represents the
distribution of redundant shRNAs against a given gene; most genes have 4 or 5 redundant
shRNAs, others have 2,3,8,9, or 10 redundant shRNAs. In further interpretations, we
use the shRNA family size to represent the number of redundant shRNAs against a given
gene. This screen was originally published and validated in[25], based on a two-best
shRNAs scoring metric (7). We briefly compare different scoring methods and the gene
hits selected by these methods in 10. The two-best shRNA scoring method requires a
gene's top two shRNAs contain one hairpin with a score 2 z-scores above the mean and
one hairpin above 1.5 z-scores above the mean; this method selects 5 genes. The median
score method simply takes the median of all redundant shRNAs and finds 22 genes with
an effect 1 z-score above the mean (in the positive and negative direction). The shEnrich
method selects the most targets of all the methods. Across the three methods, PRKCA
scores in all methods. TRPV5 scores in the two-best and shEnrich method (explained
below).
67
A
TGra H'- mH acelular
.racellularGFP GFP
B
01)
E
500
400
300
200
100
02 3 4 5 8
num. shRNAs
C
4
a Soo 1000 1h00 noC 3'00 000 15W
rank
_5 4 j 2 0 1 2 3zscore
0 500 10 110 2000 2509 IAX
9 10
Figure 10: An shRNA screen measures kinase and phosphatase effects on TGFCt shedding(A) The Jurkat cell line expressed a GFP-HA tagged TGFa. After staining with an APC-coupled anti-HA antibody, we measured relative TGFot shedding by red:green ratio. (B)shRNA coverage per gene; for most genes in the library, we have 4-5 shRNAs against thesame gene. (C) Z-score normalized red:green ratios for all independent shRNAs relativeto shlacZ controls plotted as a ranked distribution and as a histogram of scores. Z-scores< 0 correspond to shRNAs which had a low red:green ratio and induced shedding, and z-scores > 0 correspond to shRNAs which had a relatively high red:green ratio and preventedshedding. The image below shows where all lacZ shRNAs (20 total) and all AXL shRNAs(10 total) fell within this distribution, and demonstrates the difficulty in determining whichgenes are consistently affecting shedding.
An shEnrich method for selecting gene candidates
To score the genes from the screen, we used an shEnrich method to measure consis-
tency of redundant shRNAs and strength of their effect. The enrichment scoring method
is conceptually modeled after the original GSEA method, and is representative of other
rank-order methods for RNAi scoring (e.g. RIGER and dRIGER)[140, 89, 171, 72, 121].
Our method most closely resembles that of Kampmann et al [72] in that our screening
readout was a single measurement (i.e. z-score normalized red:green ratios) and that we
derived our null distributions from a cohort of non-targeting controls. Our approach differed
68
3500
in that we calculated expected distributions for each gene based on the shRNA family size;
e.g. for a gene with 4 shRNAs, we calculated 1000 permutations using 4 shLacZs from
our control set. The final result of our enrichment method is a projected z-score and a
maximal enrichment score. This represents the rank at which maximal enrichment occurs
for a gene's shRNA family.
We demonstrate this selection process in 11. 10C depicts a conceptual motivation for
AXL relative to the lacZ controls. The dot plots represent where redundant shRNAs against
the same gene score in the distribution (AXL in green, lacZ in black). 11 A represents
the shEnrich score for Axi in the forward direction (highest rank reflects that an shRNA
increases shedding) and for PPP1 R1 4D in the reverse direction (highest rank indicates an
shRNA decreases shedding or maintains TGFa at the surface). Both AXL and PPP1 R1 4D
show high enrichment scores relative to the lacZ controls indicating high consistency. We
repeated this process for all genes of all shRNA family sizes in both the forward and reverse
directions. Within each family size, we compared gene shEnrich scores against 1000
permutations calculated using subsets of the shlacZ controls. 10C shows the distribution
of shEnrich scores for all genes relative to the distribution of shlacZ scores for shRNA
family size 5 (all other family sizes plotted in 14&15). Most genes did not show an effect
that was as consistent or more consistent than the shlacZ controls.
To additionally filter gene candidates for strength of effect, we plotted genes' shEnrich
score against the projected z-score (explained above) (11 D). Many genes had a low shEn-
rich score and a low projected z-score (low referring to a score with a value >-1 in the
forward direction or <1 in the reverse direction). Only a few genes had a strong shEnrich
score and a relatively high projected z-score; these genes fell in the upper-left/upper-right
quadrants (highlighted in orange) in the forward/reverse directions (scatter plots for all other
family sizes shown in supplementary 12). For our example genes, AXL did not make the
projected z-score cut-off of -1, but PPP1 R1 4D did make the 1 z-score cut-off. The full list
of genes that passed these stringent criteria is shown in 7.
We validated a portion of these gene targets in an MDA-231 cell line, measuring TGFo
cleavage via ELISA assay. Effects for PPP1 R1 4D, and PRKCA are shown in 11 D. We
have already published additional validation for PRKCA and PPP1 R1 4D[25]. Combined,
69
these experiments show that the shEnrich method can identify gene targets with an effect
on TGFa cleavage and that these effects are consistent in new cellular contexts. However,
not all targets were consistent with the shEnrich predictions; DUSP9 showed an effect on
TGFc cleavage in MDA-231 cells, but the directionality of this effect was reversed (11 D,
bottom) as compared to the effects observed in Jurkat cells. This switch in directionality
suggests that contextual factors affect the shedding phenotype, making it difficult to identify
pathways components from enrichment scores alone. This further suggests the need for
additional analysis to identify the true TGFa pathway.
For comparison, we also created tables for a "two-best" and median scoring approach.
The median scoring approach took the median value of all shRNAs against a given gene.
10 contains the results of these scoring methods. The "two-best" approach selects only
five genes as hits. The median approach identifies 22 genes as hits. While the most strin-
gent, the "two-best" approach selects genes identified with the shEnrich method, where
the median approach selects distinct hits from the other two methods.
70
A
1.2 J
B
Co sme*"
D TGFa cleavage in MDA-231
44
00
TGFa cleavage in MDA-231
4 .
Figure 11: shEnrich selects genes for consistency and effect size(A) Lightening plots show the enrichment score for AXL in the forward direction (red, left)and PPP1 R1 4D (gold, right) in the reverse direction. Grey lines represent 100/1000 enrich-ment scores calculated for shlacZ. Dashed line represents position of maximal enrichmentfor the gene of interest. (B) Normalized enrichment scores (NES) for all genes with 5redundant shRNAs plotted against 1000 NES for lacZ in the forward direction (left) andreverse direction (right). (C) Maximum NES score for all genes and lacZ controls with 5redundant shRNAs plotted against the z-score at which maximum enrichment occurs ('pro-jected z-score'). Normalized distributions and maximal enrichment plots for all other genefamily sizes are included in the supplement. (D) ELISA for TGFa cleavage in MDA-231 cellsfor PPP1 R1 4D and PRKCA (top), and DUSP9 (bottom). Each point represents redundanttests of the same shRNA; error bars are standard errors of the ratio of shRNA:shlacZ.
71
m piatelA
plate1B40 pla*38
p4Me3A
Splate2Aplate2B
mpIatelA3'0 pate3B
plale3A-l Plate2A
plIabo2B
shEmich Method
shRNA famdy size 2 DDR2 PPP5CPPPR1A DKFZP566K0524 PRG-3PII3P1 DUSP13 PRKWNK3RIOK2 EAT2 PTPRMshRNA family size 3 EEF2K RAF1CDC25A EIF2AK3 RIMS4PTPN18 EPM2A SGPP2TP53RK FBP2 SH2DIAshRNA family size 4 FLJ25449 SMG1BLK Gp 109b SNRKCKM GUCY2C SPAPIDUSP5 IMPA2 SYT14LEGLN1 INPP5F WWP2FLJ16518 IRAK1 shRNA family size 8GALK2 ITPKA CIB2GMFB LO283871 ERBB4HYPB LOC401313INPP5B LPP84KIAA2002 MAP3K1 1 vssnc
LOC400687 MAP3K71P1 shRNA family size 2LOC441868 MAPK4 CALM1LOC90353 NRI13 PPP2RIALOC91461 NT5E DUSP9NEK7 NUDT11 shRNA lamilysire 3
NUOTlO OBSCN PPAP2APHKA2 PCTK3 SSH3PIN1 PIB5PA shRNA family size 4PMVK PIP5K1A GPR109APTPN22 PLCB4 LCKPTPN5 PPAPDC1 PPP1R14DPTPRE PPAPDC1A SIK2RIPK5 PPEF1 TRPV5RXRB PPEF2 shRNA family size 5SACMIL PPFIA1 CDC14ATEX14 PPM1J ENPP1TRPM7 PPPIR12B EPB41L4AshRNA family size 6 PPP2CB NMESABLI PPP3CA PIM1AKAPI1 PPP4R1 PKIBCKS2 PPP4R1L PRKCA
Table 7: Gene hits selected by shEnrich
PCSF identifies a TGFu shedding pathway
We have previously articulated the role of off-target effects (OTEs) in RNAi screens[1 61];
specifically that both reagent-based and biology-based effects affect if a gene can be iden-
tified as part of a pathway from RNAi data. Our previous work demonstrated that data in-
tegration in a network context could construct pathways de novo despite noise from RNAi
screens. Given these results, we used the Prize-Collecting Steiner Forest (PCSF) to con-
struct a pathway for TGFax cleavage; in this process we predict additional pathway genes
not identified in the initial experimental set. Functionally, the Prize Collecting Steiner Forest
(PCSF) acts as a filter, requiring genes to have functional associations with other hits from
the screen to be included in the final pathway. To define all sets of possible associations,
we derived an interactome from iRefWeb and add predicted kinase- and phophatase- site-
specific interactions from Networkin and DEPOD (interactome construction explained in
methods). The algorithm then constructs an optimal sub-network from this interactome
by selecting edges that capture as many genes from the shEnrich ("experimental genes")
72
method as possible. In this process, the algorithm adds predicted genes that connect
genes without prizes, but pays a cost for adding these edges (12, top). We use a set of
parameters to tune the algorithm to ensure an appropriate selection of experimental and
predicted genes and explain this process in methods.
The network contains 86 genes, and 97 edges (12, bottom; full network in 18). Of the
original 108 genes selected from the shEnrich method, 43 are retained in the final network.
The remaining 43 genes are predicted by the algorithm to be relevant to TGFX cleavage.
These predicted genes include proteins that are refractory to shRNA knockdown and those
that were not in the original screen. We hypothesized that the network gene list contained
subsets of functionally related genes. To find these subsets, we leveraged edges in the
network solution; genes with a high number of inter-neighbor edges are more likely to
have functional similarities. We used the GLay Community clustering Cytoscape plug-in
to subdivide the final target set into 9 sub-modules. This process removed 14 edges (12,
bottom).
73
CASP9 $183 XIAP
CAR3
WNK3
PPPtf2BAKIO 11
Ef2K
EEF2K S78
ITP-RK F K
TP5L"15
SIK2 5 S 73
S*2
P (Rl E
BCL_ Sr87 P"C
P
/ PP1A
T '5 Nrl
TRPV5 654 3
\ NR1I3 T38
VASP 9157 PTB P1 16
Pti
SLAME11
SLAMW Y307 S
Pow- 9
- - LCK ITyr394
RAF1 S499
BCL2 S70
CASPS Y153
C(02
TS(:2gD4
PINt $138
MAl 11
lA C A
CDC25A Sor116Ser32'
*2 CD64A
Closeness Sensitiviq
AAB1 Insensitive
A Sensitive
Af1 0,668
SWId 0.394
ilCA Experimental gene, non specific
iR15i Experimental gene. specific
( Predicted gene, non specific, sensitive
BCL2 Predicted gene, specific, insensitive, high closeness centrality
Figure 12: PCSF identifies the TGFa shedding regulatory network(top) The cartoon schematic represents how PCSF selects an optimal sub-network. In-teraction edges are aggregated across multiple databases and all edges are scored (twoshown for simplicity). Prizes are assigned from experimental data; in our case prizeswere assigned to genes selected by shEnrich. The algorithm weights the cost of addingedges with capturing prizes in selecting an optimal network. (bottom)The optimal networkcontains 86 nodes (genes) and 83 edges. Grey face coloring indicates genes selectedfrom the original experimental data set; white face represents hidden nodes (genes andphospho-sites) selected by the algorithm. Node border represents specificity to random-ization (pink<=0.1; orange<=0.05), and node size represents closeness centrality (largernodes are more central and more robust). Grey node labels indicate nodes that are sen-sitive to edge noise. Nodes on the far right are annotated examples to help interpret all oftheir network properties.
74
1 -Specificity
I0 950.950.99
Robustness analysis prioritizes predicted pathway genes
To vet the candidates selected by PCSF, we performed a series of robustness and con-
nectivity analyses. We measure the sensitivity of each gene in the solution by counting
representation in 100 runs of PCSF with noise added to the edges. Genes that are insen-
sitive to noise show up in all networks and are assigned a score of 0.00, whereas genes
that are sensitive appear in a few networks and are assigned a score equal to 1-fraction of
network in which they are represented. A score of 0.99 indicates that the gene was present
in only one network. None of the experimental genes were sensitive to edge noise, though
12 of the predicted genes (e.g. MYH9, ING4, PPP1 CB) were highly sensitive and showed
up only in the solution without edge noise (Node label color in 12; quantification in8&9).
Randomization using random inputs from the initial library tests specificity of the final
solution. This process involves selecting random sets of genes from the original library,
rerunning the PCSF routine at the optimal parameters and counting gene representation
in this family of 100 random networks. Gene border color indicates fractional represen-
tation in these 100 random networks (Border color in 12; quantification in 8). Of the 43
experimental genes, 30 of these have low specificity, indicating general association among
targets from the original library; these targets may still have a role in shedding regulation
(such as PRKCA), though, we cannot distinguish these effects from general library asso-
ciation. Of the 47 predicted genes, 13 of these - PHKA2, PTPRM, ABL1, NR1 13, RXRB,
GMFB, PTBP1, CKS2, TRPM7, WWP2, PPEF2, PPEF1, PPP2R1 A, SNRK- are robust to
randomization and have high specificity.
We further use centrality metrics to quantify the robustness of our network selection.
Centrality does not imply a biological centrality per se, but reflects how the genes were
selected in this network solution. Having a high centrality implies a greater resilience to
experimental noise from the input set; mathematically, centrality reflects how interaction
edges contribute to a gene's presence in the final network. We calculated the degree,
betweenness, page-rank, and closeness centrality, and an aggregate score (1 9A) based
on interactions in the original, unclustered network. We created a normalized histogram
of centrality scores to look at the relative distribution of values (1 9B). Closeness centrality
75
best discerns connectivity, where other metrics are dominated by a few high-valued nodes
(i.e. Page-Rank, Betweenness, and Degree centrality). Using closeness centrality, we can
visualize these genes within the network solution (Node size inl 2; quantification in 8).
Lastly, we create a normalized, weighted sum reflecting an experimental gene's perfor-
mance across these three metrics. We adjust specificity/sensitivity to be (1 -specificity)/(1 -
sensitivity), and normalize closeness centrality to the max value to scale all metrics from
0-1. We weight each metric by the standard deviation for that metric and normalize to the
maximum score possible (8).
W = (1-sens) *Osens +(1 -spec)*sipec+normd*OnormciOsens +OSpec+Ofnormcl
We complete a similar analysis for the predicted genes, and again measure sensi-
tivity to edge noise, specificity using random input terminals, and closeness centrality
as a metric for connectivity (9; all nodes in Supplemental Table 2). From this analysis,
RAF1_S499, CASP9, XIAP, CASP9_S183, PIN1_S16, VASP_S157, CASP3, NR113_T38,
and PTBP1_S16 are ranked highly.
76
LCK 0.00 0.891 0.920 0.963RAF1 0.00 0.723 1.000 0.956PHKA2 0.00 0.941 0.780 0.936PIN1 0.00 0.871 0.821 0.935PRKCA 0.00 0.713 0.923 0.935PTPRM 0.00 0.921 0.730 0.921PPP3CA 0.00 0.851 0.774 0.920BLK 0.00 0.861 0.758 0.918SH2D1A 0.00 0.851 0.738 0.911PTPN22 0.00 0.832 0.732 0.907ABL1 0.00 0.901 0.668 0.902NR113 0.00 0.960 0.609 0.897RXRB 0.00 0.911 0.635 0.895GMFB 0.00 0.950 0.609 0.895PIM1 0.00 0.881 0.654 0.895PPP2CB 0.00 0.842 0.673 0.894PTBP1 0.00 0.901 0.609 0.887TRPV5 0.00 0.891 0.609 0.886IRAK1 0.00 0.911 0.587 0.883CKS2 0.00 0.881 0.563 0.873CDC14A 0.00 0.891 0.554 0.872DUSP5 0.00 0.891 0.554 0.872TRPM7 0.00 0.911 0.540 0.872CDC25A 0.00 0.505 0.801 0.872MAP3K11 0.00 0.871 0.563 0.871WWP2 0.00 0.931 0.524 0.871PPEF2 0.00 0.950 0.509 0.870PPEF1 0.00 0.941 0.509 0.869PPP2R1A 0.00 0.822 0.574 0.866ERBB4 0.00 0.802 0.562 0.860OBSCN 0.00 0.881 0.509 0.859AKAP11 0.00 0.891 0.497 0.858CALM1 0.00 0.733 0.594 0.857PPP5C 0.00 0.891 0.489 0.856PPP4R1 0.00 0.881 0.493 0.855PPP1R12B 0.00 0.871 0.497 0.855SNRK 0.00 0.911 0.441 0.847PPFIA1 0.00 0.802 0.497 0.844EEF2K 0.00 0.851 0.415 0.831SIK2 0.00 0.871 0.393 0.829EGLN1 0.00 0.851 0.390 0.825TP53RK 0.00 0.822 0.394 0.821SMG1 0.00 0.802 0.394 0.818
Table 8: Experimental genes ranked by multiple robustness metricsSensitivity represents fractional representation in a family of 100 networks created with10% noise added to the interactome edges. Specificity represents the fractional repre-sentation in a family of 100 networks created with random inputs; 1 -Specificity is used forranking as a low specificity indicates robustness to this type of randomization. Closenesscentrality reflects a node's robustness as this metric indicates association to other nodeswithin the network. Shading highlights the highest scoring nodes from this analysis.
77
BCL2 0.00 0 901 0 964 0 975RAF1 S499 000 0 950 0 876 0 961BCL2_ S70 0.00 0.970 0.849 0.958XIAP 0 00 0 950 0.851 0.955GASP9 0.00 0 941 0.835 0 950CA SP3 0 00 0 931 0 821 0.945CASP9_ S183 0.00 0.960 0.799 0.944BCL2_Ser87 0.00 0.950 0.795 0,941PIN1 S16 0.00 0.941 0.786 0.938SLAMF1 .Y307 0 00 0,990 0738 0.933NR113_T38 0.00 0.990 0,736 0.933PTBP1 S16 000 0,970 0.736 0.930LCK Tyr394 0.00 0 970 0-732 0.929VASP S157 0.00 0950 0.739 0.928TRPV5_3S654 0.00 0.950 0.736 0.927CASP9_.Y153 0.00 0.990 0.688 0.921PIN1 S138 0.00 0.950 0.670 0.910WNK3 0.00 0.950 0.667 0.909CDC25ASer16Ser321 0.00 0.931 0.656 0.904TAB1 0.00 0.842 0.703 0.901TSC22D4 0.00 0.871 0.670 0.898VASP 0.00 0.941 0.614 0.895SLAMF1 0.00 0.891 0.616 0.887PPP4C 0.00 0.881 0.572 0.875SH2D1B 0.00 0.891 0.525 0.865PTEN 0.00 0.683 0.656 0.864EEF2K_S78 0.00 0.950 0.470 0.861SIK2 T175 0.00 0.921 0.442 0.849SNRKT173 0.00 0.911 0.441 0.847TP53_S15 0.00 0.644 0.443 0.805STK11 0.00 0.535 0.503 0.803LCKS42 0.99 0.990 0.842 0.373PTENY240 0.99 0.970 0.768 0.351MYH9 S1916 0.99 0.980 0.751 0.348GMFBS72 0.99 0.980 0.736 0.345CASP9 Thrl25 0.99 0.960 0.685 0.329MYH9 0.99 0.980 0.630 0.318MAPT Ser 98Serl 99Ser202Thr2O5Thr212Ser214Ser235Ser262Ser396Ser404Ser409 0.99 0.950 0.568 0.298MYH9_T1800 0.99 0.990 0.538 0.297PTENT383 0.99 0.921 0.571 0.294MAPK3_na 0.99 0.782 0.656 0.293ING4 0.99 0.960 0.438 0.268PPP1CB 0.99 0.723 0.579 0.265
Table 9: Predicted genes ranked by multiple robustness metricsSensitivity represents fractional representation in a family of 100 networks created with10% noise added to the interactome edges. Specificity represents the fractional repre-sentation in a family of 100 networks created with random inputs; 1-Specificity is used forranking as a low specificity indicates robustness to this type of randomization. Closenesscentrality reflects a node's robustness as this metric indicates association to other nodeswithin the network. Shading highlights the highest scoring nodes from this analysis.
78
:111 11, 1111 A, 111311 1T, 11,!!
NFxB activation and TGFa shedding share common regulators
Our network contains a 15-gene cluster with high centrality, specificity, and low sensitivity
(12). This cluster contains 7 experimental genes from the original screen: PPEF1, PPEF2,
OBSCN, CALM1, IRAK1, PPP1 R1 2B, and AKAP1 1. shRNAs against all of these genes
decreased shedding except for AKAP1 1 where shRNAs induced shedding. We show the
original screening data as lightening plots in Figure 20.
Many of these genes have roles in NF-xB and apoptotic signaling; Loss of obscurins
increased cell survival and is associated with decreased caspase 3 and 9 activity[1 17].
TAB1 interacts with TAK1 which is an IKK kinase[23]. Calmodulin is needed for IKK acti-
vation during hypoxic conditions[23]. IRAK1 is an intermediate in the IL-1p induced NF-xB
pathway[27]. In a meta-analysis of late onset Alzheimer's disease, PPEF1 was associated
with disease pathology and NF-xB was one of the most perturbed pathways[85]. PPEF2
is a negative regulator of ASK1, which can disrupt NF-xB activation[96, 80]. There is ev-
idence that the NF-xB and EGFR signaling pathways positively regulate each other, but
there is not evidence of NF-xB regulators affecting TGFa cleavage.
The cluster also predicts that TAB1 and XIAP are closely related to TGFX shedding, yet
they were not included in our original analysis. To validate these predicted genes, we used
3 shRNAs to test each gene target, and measured total TGFa, surface-bound TGFot, and
plot the relative amount of TGFe at the surface (13); we made these measurements with
and without PMA stimulation. TAB1 shRNAs consistently reduce the proportion of TGFa
at the surface, and show little effect on total TGFa. shRNAs against XIAP showed mixed
effects on the total TGFa and the proportion at the surface. However, the amount of TGF
at the surface track with XIAP knockdown, and this suggests a dosing relationship between
XIAP levels and TGFa. XIAP and TAB1 also have roles in regulating the NFxB pathway
and cell death. This identifies a novel link between regulators of the NFxB pathway and
ectodomain shedding of the EGFR ligand, TGFa.
79
I-A
t
Relative TGFa at the surface (TABI)nsa
0.4- --7
unstim -- -
.1
stimulated
TGFa total (TABI)
20000- s
m stimI7 77- -
stimulated
TGFa at the surface (TAB1)i C, M
400 -
unstim - -
Mmm
100-
0-
Figure 13: TAB1 increases and XIAP decreases TGFa at the surfaceWe measure the proportion of TGFI at the surface (A,B), total TGFY (CD), and TGFa at
the surface (E, F) for 3 shRNAs against TABI (left column) and XIAP (right column). qPCRresults for all shRNAs shown for shRNAs against TAB1 (G) and XIAP (H).
80
B Relative TGFa at the surface (XIAP)
Ls-n Ia 4
stimulated
D TGFa total (XIAP)
m
atim - -- --- unStim ---- - - ---
a .. I 4 . a .a
stirrulated
F TGFa at the surface (XIAP)
4000- __ --
Unsns
aiim *
stimulated
Ii XIAP PCR
100.
50 AE
C
0.
EF
G
M
E
Author Contributions
All authors: Douglas A. Lauffenburger (D.A.L.), Ernest Fraenkel (E.F), Andreas Herrlich
(A.H.), Eirini Kefaloyianni (E.K.), Christina Harrison (C.H.), and Jennifer L. Wilson (J.L.W.)
Contributions: J.L.W., A.H., E.F., and D.A.L. designed the project. J.L.W., C.H., E.K.
and A.H. acquired the data. J.L.W. analyzed the data. J.L.W. wrote the manuscript. E.K.,
A.H., E.F., and D.A.L. provided materials and reagents. A.H. and D.A.L. supervised the
research.
Discussion
Here, we started exploring regulators of TGFa shedding in an effort to better explain the
relationship between the EGFR pathway and its relevance to gastric cancer. Through a
combined computational and experimental approach, we identify NFxB pathway regula-
tors as affecting TGFa shedding. We validated the relevance of these gene components
through experimental validation. Our work demonstrates interplay between the NFkB and
EGFR pathways and suggests novel consequences for modulating these pathways, sin-
gularly or in combination. Though, there exists a link between NFxB and resistance to
EGFR therapies, no one has explored whether growth factor shedding is involved with this
form of resistance. It's possible that dual inhibition of NFkB and EGFR was not success-
ful because upstream modulators of the NFkB pathway affect TGFa cleavage. Targeting
the NFkB pathway downstream of these common regulators could result in more growth
factor in the environment, competing with EGFR inhibitors. We lastly hypothesize a path
forward using existing IRAK1 inhibitors, as this target sits at the nexus of the NFkB and
TGFa pathways.
Current literature has characterized interplay between NF-xB and EGFR, but has not
yet identified ectodomain shedding as a mechanistic link between these pathways. The
NF-xB pathway can promote epithelial to mesenchymal transition in cancer cells through
up-regulation of matrix metalloproteinases[63, 59]. EGFR family growth factors can initi-
ate and activate NF-xB signaling using pathways that are distinct from cytokines[133]. In
head and neck squamous cell carcinoma, EGFR and TGFt genes are up-regulated con-
81
currently with NF-xB signaling components[1 10]. The NF-xB activators, IKKat and IKKp
enhance EGFR signaling, and activation of both lKKs can induce EGFR, TGFa, and Jun
expression[1 10].
Previously, IRAK1 had limited connections to gastric cancer, but has been better char-
acterized in other pathologies. This gene has been identified as a prognostic factor in gas-
tric cancer through its seed match with miR-146a which targets both EGFR and IRAK1.
miR-1 46a was significantly under expressed in clinical gastric cancer samples, and ectopic
expression reduced invasion and down-regulated both EGFR and IRAK1[77]. IRAK1 has
been characterized in breast cancer where it is over-expressed in triple negative breast
cancer (TNBC) samples and is involved in resistance to microtubule-targeting agents. Pa-
clitaxel induces IRAK1 phosphorylation and subsequent release of pro- NFkB cytokines
preventing apoptosis[1 55]; this effect is also observed with vincristine. The group also
saw that using IKKB inhibition could not fully abrogate effects of IRAK1 - NFkB activation,
implying that an alternative mechanism may exist. Lastly, the group explored the use
of traditional Chinese medicines, Ginseng products ginsenoside Rbl and its metabolite
compound K (CK), to modulate IRAK1. The metabolite CK could recapitulate the IRAK1 -
inhibitor phenotype. IRAK1 has also already been explored for its ability to modulate NFkB
activity in myelodysplastic syndromes (MDs) [125].
Modulating the NFxB pathway could have other effects on EGFR dynamics beyond
affecting TGFa shedding. In the case of CAPE, another modulator of NFxB activation,
administration of this compound increased the anti-tumor effects of tamoxifen in breast
cancer[ll01], and sensitized ovarian cancer cells to chemotherapy[46]. In gastric cancer,
CAPE inhibited proliferation of gastric cancer cells on multiple substrates and reduced
the expression of MMP9[78]. While, we know that sheddase enzymes are involved in
ectodomain shedding, we have not characterized their role in NFxB modulation and re-
sponse to targeted therapies, the effects of this network on the sheddases themselves
remains undetermined.
While we focus on one potential therapeutic route forward, our model also predicts new
mechanistic information about XIAP a gene that is currently a focus for therapeutic devel-
opment. XIAP-TAB1 binding is an upstream regulator of NFxB signaling[21] and over ex-
82
pression of XIAP is associated with loss of apoptotic signaling in cancer[1 11]. XIAP binds
and inhibits caspases 3 and 9, and also TAK1; binding TAK1 causes ubiquitination and
degradation of TAK1 to prevent its activity with JNK[73]. NF023, a new compound, inhibits
XIAP-TAB1 binding, and thus could be a novel pro-apoptotic therapeutic in cancer[21 ]. Our
work suggests that XIAP targeting can affect EGFR signaling by modulating TGFa cleav-
age. Reducing TGFt at the surface may diminish the effects of the autocrine loop between
TGF and EGFR[79] and may be another justification for why XIAP targeting could be an
effective cancer therapy.
Our network approach is useful for identifying how signaling pathways intersect and
can affect targeted therapies. Already, single RTK-targeted therapies have had success
in circulating tumors and mixed success in solid tumors. The idea that solid tumors are
genetically more complex than circulating tumors suggests that multi-targeted therapies
may be more successful, and further, that instead of targeting multiple RTKs, a successful
approach may involve the incorporation of signaling intermediates, such as the case of ABL
as a down-stream mediator of MET[44]. While discovering how these pathways intersect
remains a difficult challenge, our application of network methods to understand growth
factor shedding demonstrates a path forward.
Methodologically, we offer a few considerations for future work modeling RNAi data: We
compared three methods based on different RNAi assumptions and find that methods that
equally weight all shRNAs diminish any possible real effects. The median method is simple
and accessible, but assumes that all shRNAs are creating an on-target effect and that the
cell cannot compensate for these perturbations. Though, given literature about seed-driven
effects, and many publications about off-target effects, we are unconvinced that the median
approach is appropriate[97, 71,161, 42]. Ranking methods such as shEnrich (e.g. RIGER)
offer the richest path forward.
The shEnrich method is robust to redundant shRNA performance and is useful for se-
lecting targets from the initial screen. This is unsurprising, as many other groups have
also shown that similar enrichment methods reduce the role of off-target effects on screen
interpretation[140, 89, 171, 72, 121]. Though, even this approach lead to inconsistent
validation results, indicating residual complexity in RNAi effects. However, using the net-
83
work approach as a filter accommodates this type of complexity and can confirm a gene's
membership in the final target set.
The two-best shRNAs method is also simple, but yields few targets. In using shEnrich,
it appeared that the enrichment score trended with the two strongest shRNAs (e.g. for
PRKCA and TRPV5). So while simplistic but not comprehensive, the two-best shRNA
approach may have merits as a first pass analysis technique.
It's unclear as to what number of redundant shRNAs are necessary and sufficient for
conducting pooled shRNA screens. The field has moved to screens with increasing num-
bers of RNAi reagents, sometimes using more than 20 shRNAs against a given gene. This
approach does increase statistical power for analyses, though it disregards the richness
of pathways-based approaches for sorting out off-target effects. Our network analysis and
scoring comparison suggests that smaller numbers (-5 shRNAs) can be sufficient when
analyzing screens with network-filtering approaches.
The network approach may also minimize seed-driven RNAi effects. In a screen for host
factors responsible for viral and bacterial infection, the authors demonstrated that specific
seed sequences were more predictive of phenotypic outcome than their gene targets[42].
In our screen, seed-driven effects may be responsible for measured TGFa cleavage. In
the case of the PRKC family, redundant shRNAs against the same gene had varying phe-
notypes in the initial shEnrich analyses, yet almost every family member had at least one
shRNA with a relatively strong effect. Second, subsequent validation of PRKCA with mul-
tiple shRNAs showed both decreased and non-affected levels of TGFX cleavage. The
network identified just PRKCZ and PRKCA as the most likely candidates; thus, network-
based approaches may be sufficient filters for reducing the tendency for seed-driven RNAi
effects.
For pathways discovery, interactome-based methods offer a means to discover path-
ways that are tissue non-specific. When conducting pathways discovery based solely on
experimental data, one is constrained by the elements expressed within the experimen-
tal tissue. By interpreting these experimental results in the context of an interactome,
we create a larger possible pathway landscape. It's possible that all components of an
interactome-derived pathway are not expressed in all tissues. In this case, filtering the
84
pathway after creation maybe appropriate. However, the initial pathway model stands as a
general reference pathway and is likely more comprehensive than a pathway derived from
any single tissue or cell model.
85
Methods
shRNA screen for regulators of TGFx shedding
The initial screen was conducted in Jurkats and is fully described in [25]. An important
feature of this screen is that it uses a Jurkat-TE cell line which expresses TGFc with an
intracellular GFP and extracellular HA tag.
shEnrich method
The shEnrich method analyzes redundant shRNAs against a given gene to select for con-
sistency and strength of biological effect (in our case, projected effect on TGFL shedding).
The method rank-orders all shRNAs and calculates a moving enrichment score (ES) based
on summing the probability of finding an shRNA within the gene's family (probability of a
hit, "phit") and the probability of finding an shRNA outside the family (probability of a miss,
"pmiss"). For each gene's shRNA cohort, there is a total effect represented by the sum of
all shRNAs z-scores; the phit is a fractional representation of how much of this total effect
is captured at each rank. The p_miss is a fractional penalty equivalent to the inverse of the
number of nom-family shRNAs in the screen.
ELISA assays in MDA-231
Validation of screen hits was conducted using MDA-231 cells plated in 96 well plates. 90%
confluent cells were stimulated with 100 nM PMA for 2 hours after which time supernatant
was collected. A microsphere-based Luminex Technology and ELISA kits (DY239, R&D
Systems) were used to measure a panel of shed growth factors, as previously described
[129].
A phosphorylation-site specific interactome
We used the set of human interactions contained in version 13 of the iRefIndex database
[124] as our source for protein-protein interactions, which consolidates information from
a variety of source databases. We used the Mlscore system [152] to assign confidence
86
scores (ranging from 0 to 1) to these interactions, which considers the number of publica-
tions (publication score), the type of interaction (type score), and the experimental method
used to find the interaction (method score). We extracted the relevant scoring information
for interactions from the iRefindex MITAB2.6 file, using the redundant interaction group
identifier (RIGID) to consolidate interactions between the same two proteins reported by
multiple databases, and used a java implementation of Mlscore (version 1.3.2, available
at https://github.com/EBI-IntAct/miscore/blob/wiki/api.md) with default parameters for indi-
vidual score weights. We only considered interactions between two human proteins (i.e.
we excluded human-viral interactions) and converted protein identifiers, generally provided
as UniProt or RefSeq accessions, to valid HUGO gene nomenclature committee (HGNC)
symbols. Once converted, we removed redundant interactions (generally arising from
isoform-specific interactions that map to the same protein/gene symbols) and retained
the maximum observed score. This produced a total of 175,854 unique protein-protein
interactions.
To this interaction set, we added predicted and experimental interactions for kinase and
phosphatase sites. We collected the kinase-site interactions from Phosphosite [60] and
phosphatase-site interactions from DEPOD[86]. Where site-specific information existed,
we created edges in the interactome, first from the kinase/phosphatase to a substrate-site
node (represented in the network as 'PROTEIN_SITE') and second from the substrate-site
to the substrate (represented as 'PROTEIN'). The kinase/phosphatase to substrate-site
interactions were scored using a modified miscore framework[152]. This scoring method
is a normalized, weighted sum of an interaction type score(Sp), evidence type score(St),
and publication value score(Sp):
S KpSp+KmSm+KtStKp+Km+Kt
Using the miscore framework as a guide, all interaction type scores (St) were set to
1 (because this interaction information was 'direct'). Publication scores were set using
the miscore scale (Sp=0.00/0.33/0.53/0.67/0.77/0.86/0.94/1.00 for 0/1/2/3/4/5/6/7 publica-
tions). The method scores (Sm) were set based on the evidence available for each data
set:
87
DEPOD Phosphosite
in vitro reaction or in vivo in one lab 0.33 in vitro evidence 0.33
in vitro and in vivo or evidence in multiple labs 0.66 in vivo evidence 0.66
in vitro and in vivo in multiple labs 1.0 both 1.0
All K values were set to 1. The interactome uses Uniprot identifiers.
Prize-collecting Steiner Forest (PCSF)
We used the genes selected by the shEnrich method and the interactome network de-
scribed above as inputs to the Prize Collecting Steiner Forest algorithm[1 46]. The genes
from shEnrich were all assigned prize values of 1 and converted to Uniprot identifiers. We
used the parameters P=10, D=10, w=1, and p=0.006 (parameter optimization explained
in supplemental methods). After finding an optimal network of genes, we augmented this
forest and added back all possible interactions among these genes from our initial interac-
tome. We converted to gene identifiers back to gene symbol and visualized the network in
Cytoscape. The PCSF algorithm is available online as part of our 'Omics Integrator suite
of tools (http://fraenkel-nsf.csbi.mit.edu/omicsintegrator/).
Sensitivity, Specificity, and Centrality Metrics
To determine a gene's sensitivity within our network solution, we create a family of 100
networks using the original prize values and genes selected from the shEnrich method,
and add 0.5% noise to the interaction edge scores in our interactome. For a given gene, we
measure sensitivity by counting the gene's representation in the family of noisy networks.
A gene that shows up in all of the networks is insensitive to noise and thus robust to
our method. Because the prize nodes are constant inputs to the algorithm, we expect
them to be insensitive to edge noise. To measure specificity, we again create a family
of 100 networks. This time, we hold the interactome constant and randomly select gene
targets from our initial library as inputs. We run the algorithm at the same parameters as
above and count a gene's representation in the family of random networks. We calculate
1-specificity to rank genes that are specific to the real experimental data and eliminate
88
biases in selection due to the initial library construction. For all centrality metrics we use
the networkX module in Python.
Measuring surface bound TGFot in Jurkat cells
For all measurements in Jurkat-TE cells, we used the following staining and washing pro-
cedure: Jurkat-TEs were spun out of media and re-suspended in 25 uL of 1:100 goat
anti-HA (lot #) on ice for 1 hour. They were washed 3 times with 200 uL cold PBS with 3%
FCS, and resuspended with 25 uL 1:100 APC-coupled anti-goat (lot#) on ice for 2 hours.
After secondary staining, they spun down, and were resuspended with propidium iodide
(PI). Relative TGFx at the surface was quantified using FACS. We analyzed all FACS data
using FlowJo. We measured the FITC and APC-A geometric means in the PI-negative
(live) population. Relative TGFa is determined from the ratio of red:green fluorescence
and represents the relative amounts of extracellular:intracellular TGF.
shRNA knockdown experiments
We ordered three Mission SHRNA Transduction Particles from Sigma Aldrich for TAB1
(clone IDs: TRCN0000381913, TRCN0000380746, TRCN0000380345) and XIAP (clone
IDs: TRCN0000231575, TRCN0000231576, TRCN0000231577). Jurkat cells were plated
at 2.5x105 cells per well in 96 well v-bottom plates, and infected with 1 uL of virus in
media with 4ug/mL polybrene. After 24 hours (day 1), cells were switched to media with
2.5ug/mL of puromycin. On day 3, fresh media with puromycin was added. On day 5, cells
were switched to regular growth media and allowed to recover before stimulation. On day
7, cells were either stimulated with 100 nM PMA or left in media. At 5 minutes cells were
spun down, media was removed, and we measured surface bound TGFa as described
previously.
89
Supplemental Methods
Parameter Optimization
We first optimized the algorithm and parameters to minimize selection of "hub" nodes (or
genes that are highly connected due to publication bias). We set these as our "grey listed"
targets and quantify their representation in 100 runs of the algorithm at various degree
penalty values (controlled by the mu parameter; data now shown).
We further optimize the remaining parameters - P, D, and w. These parameters con-
trol the algorithm's inclusion of experimental genes ("terminal nodes") relative to non-
experimental genes ("hidden nodes"), network size, and relative density. We ran PCSF
over 36 parameter configurations, using all combinations of the following sets: P [1,10,100],
o [0.1, 1, 10, 100], and D [10, 20, 30]. Briefly, p sets the relative cost of capturing gene
(node) prizes with adding edges, o tunes the initial cost associated with connecting the
dummy node to all terminal prizes in the input set, and the depth, D, controls how long a
given pathway goes in the final network. We look for a balance of efficiency (ratio of ter-
minal nodes:hidden nodes), network size, and tree composition (rejecting parameter sets
that contain any sub-trees with 3 or fewer nodes.
We find the optimal network parameters by looking for a high efficiency (high termi-
nal:hidden node ratio), reasonable network size relative to the input set, and exclusion of
grey-listed nodes (Figure 17). After vetting the sub-network creation, we select an optimal
parameter setting of p=1, D=30, co=10, and ji = 0.006 (Figure 12).
We also consider representation of hub nodes, or nodes that have extremely high con-
nectivity in the interactome, and consider how often a parameter set includes these hub
nodes in the final solution. Prior to running PCSF simulations, we identify a set of grey-
listed nodes by ranking all nodes in our interaction by degree and taking those with a
degree > 200 to add to the grey list. At each parameter set, we count the fractional rep-
resentation of these grey nodes in 100 runs of PCSF with random input prizes. Each
heatmap shows this fractional representation across the previously mentioned 36 param-
eter configurations. The final optimal parameters were P=10, D=1 0, co=1, and pL=0.006.
90
Supplemental Figures and Tables
forw~ard NLS for al,
71 fAnily size 2
tl1~
forward NES fu, )Ifamily size 4
genes
I
A
forward NES for afarnily size 9
ag41,1 laZ
genres
LiEn' I I ~ I I06 08 10 12 14 1f 18
,nr o Nt 1
F1
1150
for ward NIS for alfArrily size 10
im 18C
0.8 20 1.2 14 1 b 8 2.0 2e
Figure 14: histograms for shEnrich (forward)Forward normalized enrichment scores (NES) histograms for shRNA families with
2/3/4/8/9/10 (A/B/C/D/E/F) shRNAs.Histograms for 5-shRNA family shown in 11. In all
panels, the dark purple distributions represent a distribution of shRNA scores calculated
from lacZ controls and the light pink distributions represent scores calculated using gene-
targeting shRNAs.
91
acZgenes
B
C100
'orward NLS for allfarriiy siZe:3
genes
foi ward NES for adfamily size: 8
genes
200
I1D
E
MRL
L&kL
'I
10D
L
2i 3 0.8 1 C 1,
AE B,I t'
NFIC
E F[UY-:7
Figure 15: histograms for shEnrich (reverse)Reverse normalized enrichment scores (NES) histograms for all shRNA families with2/3/4/8/9/10 (A/B/C/D/E/F) shRNAs. Histograms for 5-shRNA family shown in 11. In allpanels, the teal distributions represent a distribution of shRNA scores calculated fromlacZ controls and the yellow distributions represent scores calculated using gene-targetingshRNAs.
92
BA NEI 1 1 CY - 3
A B
A*I,
E
H
:i.
/
K
F
*11~L
L.
Figure 16: Scatter plots for shEnrich methodScatter plots for all shRNA family sizes in the forward (A-F) and reverse directions (G-L).
In all plots, each individual point represents an shEnrich score (y-axis) and max effect size
(x-axis) for each gene (colored points) with a particular shRNA family size or a random
sampling of lacZ controls (black points). Orange rectangles define a region for selecting
gene targets for further modeling. This region is defined as having an shEnrich score
above all lacZ controls and a max effect size < -1.0 z-scores (for forward calculations, A-F)
or an shEnrich score above lacZ controls and a max effect size >1.0 z-scores (for reverse
calculations, G-L). For most genes, they were no more consistent than the lacZ controls
and did not make the threshold for further modeling.
93
D0
-~G
J
B C
J.34I1
I
A D
B EE 60.20
'60r
C F
- d
Figure 17: Optimization of network node selection
We optimized the PCSF parameters, beta, omega, and depth. (A-C) show fractional rep-
resentation of 'grey nodes' (the nodes with the highest connectivity in our starting inter-
actome) in 100 random networks created with depth=1 0 (A), depth=20 (B), and depth=30
(C). Each row represents an individual grey node, and the columns represent different
combinations of omega (0.1,1,10, and 100) and beta (1,10,100). We additionally measure
network efficiency (ratio of terminal:hidden nodes) (D), the relative network size (E), and
the number of 1/2/3-node trees (F) at each of these parameter combinations.
94
ii
P*c
/ Ts ?D4 Ty
PIN _S138 PTBP1 S16
-- 1
EVAsp 8157 TRPV5 654
Ei1iBCL2 SaL7
EPZ2 $72
cAS#9 Y153 ASP9 183 4*3qNK3
RA1 Si199
PPPi12B OLN
A A
CDC25A Der 18Ssr321
CD 4A
jyt394 TPqXRK
'2 TP 1i5
SNII Ti 73
RKCA Experimental gene, non specific
tj:1 Experimental gene, specific
Predicted gene, non specific. sensitive
IBCL2j Predicted gene, specific, insensitive, high closeness centrality
Figure 18: The full network selected by PCSFGrey face coloring indicates genes selected from the original experimental data set; whiteface represents hidden nodes (genes and phospho-sites) selected by the algorithm. Nodeborder represents specificity to randomization (pink<=0.1; orange<=0.05), and node sizerepresents closeness centrality (larger nodes are more central and more robust). Greynode labels indicate nodes that are sensitive to edge noise. Nodes on the far right areannotated examples to help interpret all of their network properties.
95
EEF2v S78
E10K
SH211B
-jSLAJJF1 Y307
Normalized Closeness
1.00
A1 0.668
Sensitivity
1 Insensitive5/ Sensitive
1 -specificity
0.950 99 SW31 0394
_141111911111- - - - -
Genes oinneo oy centrahity metrics
L1
LFi
Uj Ki3
IS
CentralwC' tyv"
Figure 19: Centrality metrics test gene robustness.(A) Network genes ranked by Degree, Betweenness, Page-Rank, and Closeness centrality.The Aggregate represents the multiplicative sum of all ranks. (B) Gene membership incentrality metric bins. Bin 1 contains genes with low centrality scores and bin 15 containsgenes with high centrality scores.
96
C 1
CASP9 183 LAP
CASPt. t W NA~3
PF 3 H
PPPO 2BAK 11
1 K
NJ
--------- K
V -i:-- ;~h~
Figure 20: Lightening plots for NFxB regulators.The cluster of genes is shown in the upper left. Below, lightening plots for all experimentalgenes are as in 11: the enrichment score (ES) is plotted against the ranked shRNAs. TheES increases each time that an shRNA at a ranking targets the gene of interest. Otherwise,the ES decreases linearly. The grey lines are 100 representative plots of subsets of thenon-targeting shlacZ controls. Genes which are enriched in the forward/reverse directioncorrespond to genes whose shRNAs increase/decrease shedding.
97
ORM I S ft, USI-N
Two best ShONA;Gene shRNA_ I shRNA-.2PHKCA 3 1/2 1 430I HPV, 2042 1.530PPP1R14D 2.455 2.220MAST2 2.684 2.621PPEF1 2.676 1.608
MAT? 1 790
PKIB 1 472
DUSP6 1.292SSH1 -1.275FASTK 1.225PTPRE -1 219PTP4A1 1.178GUCY2C -1.121INPPSA -1101PPAP2A 1.090SSH3 1.068GMFG 1.059CDC14A 1.052DUSP9 1 050IRAK4 1 048SGKL -1,042STK33 1.042ERBB3 1.035INPPSD 1.030PPP4R1 -1.029CHP -1.024TPS3RK -1.008
Table 10: Comparison of alternative scoring metrics for redundant shRNAs.The median score reflects the median value of all shRNAs against a given gene. Medianvalues that are above or below +/- 1 z-score are shown. The two-best approach requiresthat a gene have one shRNA with an abs(z-score) > 2.0 and a second shRNA with anabs(z-score) > 1.25 to be considered a hit.
98
Part V
Mutual information identifies gene cohortspredictive of response to Dasatinibtreatment in Acute Lymphoblastic Leukemia
Abstract
Many patients suffer relapse after treatment with Dasatinib, a second-line tyrosinekinase inhibitor, yet, little is know about what genetic factors govern this process. Wesystematically measure a 20K pool of shRNAs in vivo to identify genetic mediatorsof Dasatinib response in an pre-B-cell Acute Lymphoblastic Leukemia (ALL) model.Mouse to mouse variability greatly affects the ability to easily measure shRNA effects inthis environment. We develop a mutual information heuristic to find cohorts of shRNAsgoverning response to treatment and identify a gene set that confers a growth advan-tage to ALL prior to treatment and a disadvantage following Dasatinib treatment. Theset in includes AF366264, Art4, Fati, GtpbplO, KIf13, Nxphl, and Ube2ql. Thesegenes have known associations with leukemic progression but have not been exploredas prognostic markers for predicting response to Dasatinib.
99
Introduction
Dasatinib is a second-line tyrosine kinase inhibitor (TKI) for treating hematopoietic can-
cers, yet many patients still experience relapse after this secondary treatment[137]. Point
mutations in the drug target, BCR-ABL fusion protein, are one source of resistance. In
one mutational analysis, 78% of patients harbored single mutations in the BCR-ABL, and
58% had multiple mutations in this protein[137]. Therefor it is desirable to identify which
genes modulate response to Dasatinib and thus infer which genetic lesions could abrogate
or enhance response to this therapy.
As understanding of complexity in cancer genetics increases, the field has started
adopting a systems-wide approach. It's becoming more clear that single mutations or sin-
gle sensitivities are individually insufficient for explaining disease progression, response to
treatment, and other complex phenotypes. Rather, whole pathways or groups of genes,
mutations or proteins are more informative for understanding these phenotypes, such as
drug resistance(161, 551. High-throughput screens are a tool which can uncover such
pathways.
High-throughput rna-interference (RNAi) screens in vivo are one tool for probing ge-
netic pathways and their contribution to disease. These screens have identified novel
targets in hematopoietic malignancies[97, 92]. One screen used stably integrated short-
hairpin RNAs (shRNAs) to identify genes relevant to Lymphoma progression in vivo. This
shRNA library tested thousands of targets simultaneously in mice, and identified compo-
nents of cytoskeletal dynamics specifically as in vivo regulators of the disease [92]. RNAi
experiments in mice can now accommodate tens of thousands of shRNAs and multiple
treatment conditions, expanding the use of these loss of function techniques.
Here we sought to discover genes contributing to Dasatinib response in Acute Lym-
phoblastic Leukemia (ALL). To understand this effect in vivo, we introduced an shRNA
library into mice using a B-cell ALL model and tested whether or not cells representing
shRNAs remained in the circulation before and after Dasatinib treatment. However, due
to mouse to mouse variation, there is often differential shRNA response across mice un-
der similar conditions. This consequence masks the effect of some shRNAs that may be
100
relevant to disease and/or treatment effects, and necessitates novel methods for extract-
ing these potentially informative shRNAs. Here, we employ a modified mutual information
heuristic which analyzes sub-sets of shRNAs in each mouse and looks for patterns of
co-acting shRNAs across mice. From these co-acting subgroups, we hypothesize new
pathways relevant to disease progression, and response to treatment.
101
Methods
shRNA screen and Dasatinib treatment
We infected B-cell acute lymphoblastic leukemia cells (B-ALL) in culture with a 20K library
of shRNAs at a multiplicity of infection < 1. After integration, we FACS-sorted cells for
shRNA integration and tail-vein injected 1x106 cells per mouse. At the time of injection,
we reserved 1x106 cells as the input sample. Leukemia developed over 11 days, at which
time, we sampled pre-B-cell ALL cells from the blood as the pre-treatment sample. After
sampling, mice were treated with Dasatinib at 10 mg/kg for 3 days. Mice were then allowed
to relapse, at which time, we sampled B-ALL cells as the post treatment sample. All
samples were then sequenced to determine shRNA representation.
Data Processing, clustering, and mutual information
We filtered out all shRNA counts below 100 reads and performed size-factor normalization
on raw sequencing counts using DESeq2[5]. To further reduce noise across samples, we
calculated the natural logarithm of the fold-change between pre and input samples (early
change, "pre-treatment") and post and pre samples (late change, "post-treatment"). We
collapse these fold-changes into a single (x,y) variable for clustering and plotting. In the
end, we have one data vector for each mouse (5 in total: "mouse 2", "mouse 3", "mouse 5",
"mouse 8", "mouse 9"), where each data point is a (x,y) coordinate representing the (pre-
treatment,post-treatment) log-transformed fold changes. We used hierarchical clustering
to compare the pre-treatment and post-treatment data across the five mice that survived
the screen using the Pycluster module in Python. We next clustered data points within
each mouse vector using Pycluster's K-means clustering algorithm. We used Euclidean
distance and k = 2,3,4,5,6,9,10 initially. We determined that 9 clusters represented the
'elbow' on the intra-cluster distance vs. k plot and selected 9 clusters as ideal divisions
among the data (explained in results). The selection of 9 clusters was also consistent with
our anticipated number of shRNA groups given the possible combination of responses at
each time point. We employed a mutual information metric to compare the overlap between
mice. The mutual information metric is given by the following:
102
Equation M.I. = ijp(a, b) * log (ab))
Where p(a,b) = (no. of shRNAs in both cluster a and cluster b)/(total shRNAs); p(a) M.I.
= (number of shRNAs in cluster a)/(total shRNAs); and p(b) = number of shRNAs in cluster
b)/(total shRNAs); summed over all clusters ij in the two mice under comparison(similar
to[138]).
From this analysis, we were interested in the extent to which each of the inter-cluster
components were contributing to the overall score between mice. Identifying these sub-
groups is consistent with our aim to find cohorts of shRNAs responding to disease and
treatment. To identify these clusters, we separately plotted each component comparison
in a heat map. The components are equivalent to p(a,b) * log(p(a,b)/p(a)p(b)) for a two
cluster comparison.
To correct for the amount of overlap expected by chance we background-corrected the
component scores using randomized data. For each mouse, we created random clusters
by dividing the shRNA data into 9 bins of the same size as the original clusters, and re-
peated the M.1. component calculation 100 times. We averaged these random M.l. scores
and subtracted this 'background' value from the original value. This correction method also
accounts for differences in cluster size as set by the k-means algorithm.
We used a standard hypergeometric test to compute the significance of overlap be-
tween clusters and used the scipy package in python for this calculation.
Bi-clustering
We used the clustergram function in MATLAB to bicluster the normalized, fold-change data.
In vitro validation of shRNA cohort
We designed 97mer sequences using the Euphrates website (http://Euphrates.mit.edu/cgi-
bin/shRNA/index.pl). We ordered the following oligos (forward and reverse sequences
represented for each gene):
103
AF366264AF366264Art4Art4FatiFatiGtpbpl0Gtpbpl0KIfM3KIf13Nfat5Nfat5NxphlNxphlUbe2q1Ube2ql
V2MM191661V2MM19166_2V2MM_17020_1V2MM_17020_2V2MM_109794_1V2MM_109794_2V2MM35901V2MM35902V2MM1766071V2MM_176607_2V2MM_2869_1V2MM_2869_2V2MM_981121V2MM981122V2MM_9292_1V2MM92922
TGCTGTTGACAGTGAGCGCGGGCAGAGAAATGTATCAAATTAGTGAAGCCACATGAAGCCACAGATGTAATTTGATACATTTCTCTGCCCTTGCCTACTGCCTCGGATGCTGTTGACAGTGAGCGCGGAGACTATTCATAAAGGAATAGTGAAGCCACATGAAGCCACAGATGTATTCCTrATGAAATAGTCTCCTTGCCTACTGCCTCGGATGCTGTTGACAGTGAGCGACCTGTCTGAAGCATCTGTAATTAGTGAAGCCACATGAAGCCACAGATGTAATACAGATGCTTCAGACAGGGTGCCTACTGCCTCGGATGCTGTrGACAGTGAGCGACGCACTCTTAGCAATTAATAATAGTGAAGCCACATGAAGCCACAGATGTATTATTAATTGCTAAGAGTGCGGTGCCTACTGCCTCGGATGCTGTrGACAGTGAGCGCGGCATAGATTCTATATTGTAATAGTGAAGCCACATGAAGCCACAGATGTATTACAATATAGAATCTATGCCTTGCCTACTGCCTCGGATGCTGTTGACAGTGAGCGACCTGTATCAGTGGGAATATATTAGTGAAGCCACATGAAGCCACAGATGTAATATATTCCCACTGATACAGGCTGCCTACTGCCTCGGATGCTGTTGACAGTGAGCGACCCTTCAAGGTGATCTGTATTrAGTGAAGCCACATGAAGCCACAGATGTAAATACAGATCACCTTGAAGGGCTGCCTACTGCCTCGGATGCTGTTGACAGTGAGCGCCCAGAGGCAAGATTACTTAAATAGTGAAGCCACATGAAGCCACAGATGTATrAAGTAATCTTGCCTCTGGTTGCCTACTGCCTCGGA
We PCR amplified these sequences, extracted the 138bp product and ligated into an
MLS + GFP vector. With this vector we transformed competent cells and verified liga-
tion products by sequencing. We maxi prepped DNA from transformed competent cells,
cleaned DNA using a zymoclean kit, and used this DNA to transfect Phoenix cells. We
collected virus from Phoenix cells and used this to infect tomato+ pre-B-cells.
We plated 2.0 x106pre-B-cell ALL cells into 6-well plates. For each shRNA construct
(including MLS) we tested 3 volumes of viral media (1.5, 2.0, 2.5 mL). After 24 hours, all
wells were sampled via FACS and the sample with the greatest infection was considered
the input (day 0) measurement for each construct. This sample was split 1:3 and main-
tained separately in culture for the duration of the experiment. The cells were split every 24
hours, we used FACS to measure the GFP percentage. On day 6, the cells were switched
to media with 1 nM Dasatinib. FACs analysis was conducted using FlowJo.
104
4 pre-B-cells representing20K shRNA pool
Input Pre Post
Amplify and sequence for shRNA representation
Figure 21: A longitudinal screen for mediators of ALL and response to Dasatinib
We introduce a 20K pool of shRNAs into pre-B-cells in culture (top). We take representa-
tive samples of this population and transplant them into syngeneic mice recipients. At the
time of input we take a representative sample and sequence for shRNA representation.At disease representation, we sample blood from leukemic mice and again sequence for
shRNA representation. At this time, we administer Dasatinib treatment. Following treat-
ment, we again sample blood and sequence for shRNA representation. We can charac-terize shRNA trends by looking at the fold-change of any shRNA at the early time point
(pre-treatment/input) and the late time point (post-treatment/pre-treatment).
Results
An shRNA screen measures shRNA representation before and after treatment
We conducted a longitudinal shRNA screen to identify genetic mediators of ALL and re-
sponse to treatment. We represent the work-flow for this experiment in Figure 21.
We normalized sequencing reads across mice, filtered reads for high variance, and
calculated early (pre-treatment/input) and late (post-treatment/pre-treatment) fold-changes
(Figure 22). We used hierarchical clustering to identify distinct groups between the early
and late change data points (Figure 22).
A mutual information heuristic selects cohorts of co-acting shRNAs
We chose to look for trends across pairwise comparisons of mice because we expected
variability across mice due to differences in library representation. The longitudinal na-
ture of the screen and the added effects of treatment make it difficult to identify a strict
105
I I S VI.H 14-
-i 4 s% .- Tt /.
I .
%I~
i,,n
. am ia
UI
r'i
L aetJ"'.iU o m
I M[4G-U. a'J
0.37 -0.25 1&
0.12-0.00-0.12-0.2-0.371
Figure 22: Hierarchical clustering of 'early' and 'late' fold-change dataBlue/red indicates shRNAs which deplete/enrich from input -> pre-treatment or from pre-treatment -> post-treatment. Color bar shows the scale of fold-changes. An enlargeddendrogram is included on the right.
106
fold-change threshold for identifying shRNAs related to disease progression and treatment
response. We were interested to see if mutual information could be extended to this prob-
lem because we saw it as a means for avoiding the need to define fold-change thresholds.
To identify cohorts within mice, we first binned the shRNAs using k-means clustering
on the log-transformed fold-change data. For each mouse, we sampled k=2,3,4,5,6,9,and
12 clusters. At each value of k, we measured the intra-cluster distance and looked for the
inflection point at which this distance stopped decreasing (Figure 23). For all mice, the
inflection point for intra-cluster distance was found at k=9. This is fitting because we hy-
pothesized 9 possible groupings based on the possible shRNA response at (early/late) time
points: (no-change/no-change), (no-change/increase), (no-change/decrease), (increase/no-
change), (increase/increase), (increase/decrease), (decrease/no-change), (decrease/increase),
and (decrease/decrease). We plot all of the shRNAs clustered per mice (Figure 24). K-
means clustering does not cleaning separate the data and it's possible that other clustering
metrics, such as fuzzy k-means, would be better suited to this data set. It is also the case
that many of the shRNAs have subtle effects making it difficult to cleaning separate them
into distinct bins. However, this approach was strongly motivated to reduce the necessity
of setting specific fold-change cut-offs. K-means clustering allowed for an unsupervised
means to create these bins within mice without having to manually set thresholds for each
mouse.
We first look across mice by asking by looking at how much overlap existed between
all pairwise combinations of inter-mouse clusters. We use Equation I and plot the results
as a heatmap in Figure 25. Because we expect the self-comparisons to have the highest
possible mutual information, we compute the highest possible mutual information score
for all comparisons and subtract this background score to more clearly compare between
mice. As expected, self-comparisons drop to near zero, but the comparisons between
other mice remain positive. Mouse 2 and mouse 9 were the most similar, and mouse 8/9
and mouse 2/3 also having relatively high mutual information.
107
ise 2
mouse 5
mouse 9W4ltha 1uster sun 0 d starte,
mouse 3
mouse 8
Figure 23: Intra-cluster distances for all miceFor each mouse, we used a k-means clustering algorithm to cluster shRNAs and calculatethe intra-cluster distance based on the early and late fold-change values.
108
I
MOMEM116-
mou
-2 4 6
mouse 2
t a... .I
mouse 5
mouse 9
*-- -.-
mouse 3
mouse 8
I 4~SO
Figure 24: K-means clustering for all mice with k=9Each point is an shRNAs where (x,y) represents (early, late) fold-changes.
109
0.9
0.8
0.7
0.6
0.5
0.4
-0 3
0.2
0. 1
00
0.105
0.090
0075
0.060
- 0,045
- 0.030
0.0 15
0.000
Figure 25: Mutual information between mice shows some mice are more similar than oth-ersWe used Equationi to determine similarity across mice and plot the raw scores (top) andthe background-subtracted scores (bottom) as a heatmap.
110
mouse 2 c 0mouse-2 c I
mouse 2_ 0,120mouise 2 e-mouse 2mouse 2mouse_3- 0.105mouise 3moose 3mnouse 3 e-moujse 3mouse-3 0.090mouse 3mouse 3mouse 3
0
,mouse-5-,mose 5 -0.075
mouse c -nouse S ,
mouse 5 1mouse 5 0moure-80
mouse E
mouje 9
moc-se-8-
mouse 0mouse s 0.015mouse 9mouse 8
mouse_0mouse 2mouse-9 0.015
mouse 9mouse 9
mouse' 0.0150
mouse 3 . -0mouse 2 -
mouse_4
mou e_ , - 0.0100mojse J
mouSe 5
Mouse2
mouse 5 2noss.5 8 .
mous:3 0.0175mousez
5
nooe910.00250
mousenouse .4
mc, Se~s, 0.0125
mouse5 - 7mouse 5 c
ose 5 e -c
mouse_ 7 e0.000mouse5
S
mcous e 5 4 -
mouse_5 c BI,)- se 6 0.0050mouse 8 8mou-,se-9 emous~e 9
monse_892 0.00250
m e'9 4-
mouse9~-6 c -7 0.0000mouse 9_c:_8I
Figure 26: Component mutual information scores show that few clusters have high mutual
information.We plot all pairwise comparisons in raw scores (A) and background-subtracted scores (B).
111
m
0 r
I I *.
" %*I .*; *
- * U U
UE U
* dl
If MA-
I1 -
m Na ICI
M= r,
0 OI I
mm 1&m.
I In. I IN m 0 v I N
a ~*iu161 U* 71
%~~ %= -_%, - . 1
U ii
Figure 27: Hypergeometric testing of component comparisons.We show the raw Fisher's exact p-values (A) and the bonferroni-corrected p-values (B).
112
* U
mu -Ua
Ue
U -
U.
UaQU
I i f I i i , a | i i i 6 a i r i a a i ] ieU
-
I A
I
-7
shRNA cohorts identify gene targets affecting disease progression and response to
treatment
We then wanted to know which groupings of shRNAs were contributing to the mutual in-
formation observed across mice and if any of these groupings were statistically signifi-
cant. To test this hypothesis, we plotted the "component" mutual information scores for all
cluster-cluster comparisons. In this context, a "component" refers to any individual shRNA
cluster. We plot a heatmap of all component comparisons in Figure 26. As expected, self-
comparisons had the highest mutual information scores in the raw values, and intra-mouse
comparisons had a score of zero (because no shRNA could be assigned to two different
clusters). To mitigate the strong effect of self-comparisons, we again calculated an average
expected mutual information score for all cluster comparisons, and background subtracted
this score from the raw values (Figure 26). The background-subtracted heatmap highlights
clusters that have more shRNA overlap than expected by chance. Clusters which have
relatively high overlap include mouse 2 cluster 0 and mouse 8 cluster 1, mouse 9 cluster
0 and mouse 3 cluster 1, mouse 9 cluster 0 and mouse 5 cluster 0, mouse 5 cluster 4 and
mouse 8 cluster 1, and mouse 9 cluster 4 and mouse 8 cluster 1. This analysis suggests
that there is relatively high overlap of shRNAs in each of these clusters, and we can infer
that the shRNAs are showing similar changes at early and late time points across mice.
Before investigating which shRNAs belonged to which cluster and the direction of their
effects, we wanted to vet our method statistically, and with other machine learning ap-
proaches. In our adaptation of mutual information, we used background subtraction to
correct for expected overlaps. Fisher's exact test is a standard, well-recognized method
for looking at statistical significance between gene groups. To compare results from our
method, we calculated a Fisher's exact test for each component comparison and use a
bonferroni mutilple hypothesis correction to filter component scores that remain significant
(Figure 27). As expected, self comparisons were the most statistically significant (because
they had 100% overlap in shRNAs). Only three other cluster comparisons remained sig-
nificant - mouse 8 cluster 1 and mouse 2 cluster 0, mouse 9 cluster 0 and mouse 3 cluster
1, and mouse 5 cluster 4 and mouse 8 cluster 1 (Figure 27).
113
Mcaa Fbxll? LelDoc*3 RimkItP L21 (rid ,Olank LrriqiD2Eting s 70002B 1 RikgynLJgt Ia2 6,3G407111ikOllr8S4 [inda
Gm51011AF366264(deliMatsUgtla2I rim37
Table 11: Gene-targeting shRNAs in significantly overlapping clusters
We plot these clusters and look at shRNA fold-changes at pre/input and post/pre (Figure28).
Group 1 shows mouse 8 cluster 1 (m8c1) and mouse 2 cluster 0 (m2cO), between which
there are 12 overlapping shRNAs (Figure 28, top). M8cl contains shRNAs that enrich from
input to pre-treatment and then deplete from pre-treatment to post-treatment and m2cO
identifies shRNAs that enrich both from input to pre-treatment and from pre-treatment to
post-treatment. Of these 12 overlapping shRNAs, there are 5 targeting known genes (Ta-
ble 11). Group 2 compares mouse 9 cluster 0 (m9cO) and mouse 3 cluster 1 (m3c1). Both
of these clusters identify shRNAs that enrich from input to pre-treatment and from pre-
treatment to post-treatment. Within this comparison there are 19 shRNAs represented in
both clusters; 13 of these shRNAs target annotated parts of the genome (Table 11). Group
3 contains mouse 5 cluster 4 (m5c4) and mouse 8 cluster 1 (m8cl) (Figure 28, bottom).
Between these clusters there are 6 overlapping shRNAs, 2 of which target genome regions
with known annotations (Table 11). M5c4 identifies shRNAs with enrich at both early and
late time points.
M8cl had significant overlaps with both m2cO and m5c4. Across all three clusters 2
shRNAs are represented in all three clusters; only one of these genes targets a gene with
known annotation: D2Ertd750e.
Group 2 (Figure 28, middle) contains the highest number of overlapping shRNAs with
known gene annotations. If we also consider overlap of gene targets (instead of consider-
ing overlap of individual shRNAs), the number of gene targets identified as overlapping in
this cluster increases to 32 (Figure 29). The full list of gene targets from this cluster is in
Table 12.
114
G ru :l 1? qhRNA
Mouse 8, cluster 1 (50 shRNAs) Mouse 2, cluster 0 (67 shRNAs)
Mouse 9, cluster 0 (50 shRNAs)0.8
,0 00.6 0
0000. 0 0
r~ %
06
00
0.8-
0.
0.8
-0.2
O
000.4
-0.2-' pre/input
Group 2: 19 shRNA overlap
Mouse 3, cluster 1 (64 shRNAs)
0.4
0L
-0.2-0.2-
0.4 0.6
0.4-0
0
8 0
A9M a a noel0.4 00.6
pre/input
Mouse 8, cluster 1 (50 shRNAs)
0.2 0
0.8Odm 0.4 0 0.6
0 0
bpre/input
Mouse 5, cluster 4 (36 shRNAs)
0
-0.5 0.0
0
(D0
,0*' _ 00S6b 0
0
Figure 28: shRNA fold-changes for clusters with significant overlapGroup 1 (top, red) contains mouse 8 cluster 1 and mouse 2 cluster 0. Group 2 (middle,blue) contains mouse 9 cluster 0 and mouse 3, cluster 1. Group 3 (bottom, orange) con-tains mouse 5 cluster 4, and mouse 8 cluster 1. For all figures, open circles represent allshRNAs in the cluster and red/blue/yellow indicates shRNAs which overlapped betweenthe two clusters.
115
0.2-
-0.2
-0.6-
00
pre/input
0
-0.2 0.0 0.2
pre/input
-0.2 D
-0.6-0.5 1.0
pre/input
Mouse 9, cluster 0 (50 shRNAs) Mouse 3, cluster 1 (64 shRNAs)
0.8- 0.850 * Fat1 .8Fat1
0010 * GtpbplO * GtpbplO
0 0 KI13 * KIfM3
0 0-4 0 0 Nxphl
SUbe2ql 0 Ube2ql0
0 0 post/pre
0 post/pre
overlapping genes -0.2 -. 0.4 00.6 overlapping genes
-0.2 0.0 0.2 0.4 0.6 2 prelnputpreoinput
Figure 29: M9cO and m3cl identify overlapping gene targetsScatter plots represent all shRNA fold-changes (open circles) in m9cO (top) and m3cl (bot-tom) at early (x-axis) and late (y-axis) time points. Grey circles highlight shRNAs targetinggenes that are represented in both clusters. Coloring highlights specific genes and theshRNAs targeting those genes in each cluster.
Gem tP% onnd in mo4 and mnu11190020J1290 GmOSdO OXLad41700025811 Rik GtpbplO Fhoa
633040 7011 Fik K01113 HImpIsAF386264 I.,aia SI(A Ila
B u rck3 b kql Imnes3L.1l Ncapg mrmdFall Nats Tinm3Fball 1 Nfe?12 lUhe2ql1(i'li Nxph1 [Jgtla3
Gll 01~11419 Zip7iSGmS11)1 01111290
Table 12: Gene targets overlapping between m9cO and m3cl
Biclustering benchmarks mutual information heuristic
We used biclustering to bench mark against an alternative, and widely-adopted, unsuper-
vised machine learning approach3O. Bi-clustering separates all early fold-changes from
late fold-changes. We can manually select subsets of the data and notice trends that were
also selected in the mutual information heuristic. Branch '931 " in the bi-clustered heatmap
overlaps with gene selections identified by the m9cO and m3cl overlapping cluster. This
section of the clustergram contains shRNAs against KIM 1, Eeal, Nxphl, Zfp71 9, and
Gm5546 (Figure 31).
In vitro validation measures shRNA cohort depleting in response to treatment
As a first pass at method validation, we used in vitro assays to test the response of gene
targets selected by our mutual information heuristic. We anticipated that the fold-change
values from the original screen were affected by mouse to mouse variability and wanted
116
I
I
1.q
-0.50
-0.5
-L C L L C C) C) 0) C) C
V) U) tr) W) U) C) )1 0 ) 0 0 0 0 0
CO CNJ m~ kl ob 00 M L CW) CNJE E E E E E E E E E
Figure 30: Biclustering of shRNAs across mice.The heatmap represents all shRNAs (rows) and their fold-changes in either early or latetime points (columns).
117
MEMOMMEMM-_
I
- -
NV1I 131/5/- 2 MM 1241025";MM 215301
,MM 2051b9V MMI 96112
M V2MMA 128:)bV2MM 1400b8V2MM 1 b948b
rl V2MM 11660i
V2MM 128193V2MMA 1 / Z33V2MM 13:35/1
~ o c ~ Q~ 0 0 0 0
m~ (VI o) In 00 0 IE E E E E E E E E
Figure 31: Branch 931 recapitulates trends in m9cO/m3cl clusters.As in Figure 30, rows represent individual shRNAs and columns are fold-changes at earlyand late time points. Branch 931 contains a cohort of shRNAs that target similar genes asidentified in m9cO and m3cl.
to use follow-up assays to test their response. Using this assay we could see that our
cohort set of shRNAs enriched relative to and MLS control,yet depleted after Dasatinib
treatment. This suggests that loss of AF366264, Art4, Fati, Gtpbpl0, KIfM1, Nxphl, and
Ube2ql confer a growth advantage to pre-B-celI ALL. Depletion of these shRNAs following
treatment suggests that loss of these genes confers sensitivity to treatment.
118
Mm_ 7 __M
Percent GFP /oGFP relative to MLS
MLS MLSAF366264. AF366264A04 Art4
IL FatT FatGtpbpl0 GtpbplO
-0- KIfM3 -0- K~fl3_ Nxphl MLS- Nxph
10 + Ube2ql * 10 + Ube2ql
t t+Dasatinib +Dasabnib
Figure 32: In vitro competition assay shows shRNA cohort sensitizes ALL to DasatinibtreatmentWe measure the percentage of cells expressing an shRNA ("percent GFP", top) for eachsingle infected population, and track this percentage for the duration of the competitionassay. At day 6 we introduce Dasatinib treatment. Each point represents 3 biological repli-cates. We additionally calculated the fold-change relative to a non-targeting MLS control.
Author Contributions
All authors: Douglas A. Lauffenburger (D.A.L.), Ernest Fraenkel (E.F.), Michael Hemann
(M.H.), Eleanor (Cameron) Fiedler (E.C.F.), Christina Harrison (C.H.), and Jennifer L. Wil-
son (J.L.W.)
Contributions: J.L.W. designed the project; E.C.F, J.L.W. and C.H. performed research;
D.A.L., M.H., and E.C.F., contributed new reagents/analytic tools; J.L.W. analyzed data;
J.L.W. wrote the paper; and D.A.L. supervised all aspects of the work.
Discussion
Here we present a mutual information heuristic for identifying cohorts of co-acting shRNAs.
The approach was conceptually motived by the fact that complex biological phenotypes are
governed by pathways, and at a simplistic level we can begin to discern these pathways by
looking at co-acting genes. We derived a heuristic for selecting cohorts of these shRNAs
based on mutual information. The method identified a set of genes that conferred growth
advantages to ALL and to Dasatinib treatment. Among this cohort we validate AF366264,
Art4, Fat1, Gtpbpl0, Klf13, Nxphl, and Ube2q1 and show that this cohort enriches from
input to pre-treatment and then depletes following Dasatinib treatment.
There are many factors affecting shRNA performance in screens with high library com-
119
plexity that are longitudinal or subject to treatments. Mouse to mouse variability can greatly
skew results for an individual shRNA. In our screen, if an shRNA did not integrate prop-
erly, we would not measure its effects at all. On the other hand, if an shRNA is inserted
behind a highly active promoter, shRNA representation can be drastically inflated at any
time point and create inflated fold-changes at either time point. Further, preparations prior
to sequencing can greatly affect shRNA representation during the experiment. In the end,
these effects make it difficult to use standard approaches such as averaging fold-changes
across mice.
Given these confounding effects, it is necessary to think about novel approaches for an-
alyzing the data to find signal among these measurements. We variance filtered the data
set to remove the most extreme of insertion effects and to reduce computational time. Fur-
ther, initial clustering and later biclustering saw more similarity between treatment groups
than between individual mice, suggesting that inter-mice comparisons would be viable for
looking at trends between early and late time points. This heuristic also eliminated the
need to define fold-change thresholds for the individual shRNAs. A disadvantage is that
our heuristic creates an arbitrary score for all component comparisons. We used this score
to rank and select clusters for further validation. Bi-clustering identified similar trends, but
this method requires manual selection to find gene sets for further validation. The Fisher's
exact test creates the most interpretable score for these comparisons by identifying a p-
value. Experimental validation verified the utility of our approach for data sets with low
signal-to-noise. As a proof of principle, the mutual information heuristic is viable as a novel
means for considering these data.
We computed this method only looking at individual shRNA performance but could have
considered other pre-processing before using our heuristic. For instance, when looking at
the overlap between mouse 9 cluster 4 and mouse 3 cluster 1, we find overlaps in the gene
targets within these clusters even though the mutual information heuristic only calculated
overlap based on individual shRNA performance. We could have also calculated scores
per gene based on redundant shRNAs and then looked for trends in gene performance.
However, we pursued this approach to minimize data processing and chose to rely on the
heuristic to look for trends.
120
Many of the gene targets in this shRNA cohort are associated with leukemia, and none
are associated with response to Dasatinib. Art4, which encodes Ecto-ADP-ribosyltransferase
4, is associated with chromosomal translocation events in pediatric ALL patients at re-
lapse [16]. Fat1 encodes a development-associated protein which acts as a tumor sup-
pressor. However, Fat1 is aberrantly expressed, frequently mutated, and associated with
advanced disease in solid tumors [105]. Recently, NGS of adult ALL patients confirmed
a high occurrence of FAT1 mutations in T-cell ALL patients, though, Fat1 did not corre-
late directly with any treatment response phenotype [105]. KIf13 is a zinc-finger family
transcription factor that had increased expression in dexamethasone sensitive ALL patient
derived xenografts as compared to insensitive xenografts [68]. Neurexophilin-1 is a ligand
for the alpha-neurexin cell adhesion molecules. In addition to affecting synapse signaling,
Nxphl can suppress hematopoietic cell differentiation [76]. Nxphl is also hyper-methylated
in low grade breast carcinoma suggesting that this gene would be a viable biomarker for
disease [36]. Ube2q1 is a ubiquitin-conjugating enzyme that shows decreased expression
in pediatric ALL patients when compared to non-leukemic controls [131]. Gtpbp1 0 is a
small G-protein is a signaling protein without any annotated function in cancer. Though, it
trended with genes that have roles in ALL development and disease. Generally, published
evidence confirms that loss of cohort genes is associated with advanced disease. This
work confirms that finding, but also predicts that low expression of these genes could sen-
sitize these leukemias to Dasatinib treatment. This cohort may also serve as a prognostic
marker for predetermining if patients would respond to second-line Dasatinib treatment.
Further molecular biology and in vivo validation would improve the predictions made by
this method. In vitro characterization was a relatively quick and useful first step, but does
not tell the whole story as to why or how these gene targets are affecting ALL progression
or response to Dasatinib. Further, there were complications with the MLS control leading
to large deviations in these measurements.
121
Part VI
Conclusions and Discussion
This thesis has demonstrated the utility and power of pathways-based modeling for RNAi
loss of function screens. The popularity of RNAi techniques waned when discovery of
off-target effects cast doubt on the tool's power {Jackson2003 Bickle}. Yet, compared to
other functional genomic techniques, RNAi is still the most cost-effective and most easily
adaptable to many experimental systems. For this reason, scientists have endeavored to
compensate for these limitations and improve the technology; this has caused a modest
and cautious return to RNAi technology {Bickle}. Yet, few investigations have considered
the use of biological networks for overcoming these limitations. This thesis explores these
network models and identifies viable hypotheses from existing RNAi studies. The success
of this application further supports the revival of RNAi investigations, and underscores the
value of 'network filtering' as a necessary companion to these investigations.
This work began with a thorough characterization of the limitations of RNA-interference.
Chapter 1 reviewed the role of off-target effects and the current approaches to mitigate
these effects. This review also highlighted the absence of pathway-based modeling for
RNAi screens. Chapter 2 then applied network-based integration to identify novel media-
tors of Acute Lymphoblastic Leukemia (ALL). This work demonstrated the utility of network-
based integration and identified a novel tumor suppressor, Wwpl, in the context of ALL.
Next, Chapter 3 modeled a novel pathway governing shedding of the growth factor TGFa
and used this pathway to explore mechanisms of growth-factor induced resistance to tar-
geted chemotherapy. This model identified common upstream regulators of NFxB and
TGFa, and suggests novel genes to target therapeutically. While this work focused specif-
ically on effects in gastric cancer, the network model is general to all tissue and cell types,
suggesting that these pathway components could be useful for other cancer indications.
Lastly, Chapter 4 explored a novel computational approach for selecting shRNA cohorts
from a pooled, longitudinal screen. The technique identified a set of co-acting shRNAs
that governed ALL progression and response to Dasatinib treatment. However, further
122
molecular characterization would improve the predictions made in this investigation. In ad-
dition to the highlighted stories, this thesis contributed to the tool set used to quantitatively
examine network model predictions. Specifically, this work explored multiple randomiza-
tion and robustness metrics for these network models. These metrics include routines for
minimizing the selection of hub nodes (genes within the interaction network that are highly
connected due to study bias), specificity (routines for selecting genes that are unique to
experimental data and not selected due to bias in the initial screening library), and sen-
sitivity (routines for minimizing bias in edge scoring). This thesis also explored multiple
network centrality metrics for discerning the number and nature of interaction edges that
lead to selection of an individual gene target. So far, network centrality has been explored
mostly for its proposed utility in selecting biologically important genes. But these metrics
are also valuable for scrutinizing the genes selected by network methods. These best
practices are essential for dissemination and use of these models in future investigations
and are described in the methods and supplemental chapter sections. The ability to adopt
these analyses into mainstream is supported by the maturity of the relevant computational
algorithms. Specifically, SAMNet Web and Omics Integrator are stable implementations of
the SAMNet, PCSF, and Garnet algorithms used in this work. These tools make it possi-
ble for researchers to adapt this network modeling framework to their investigations and
bring network-filtering of RNAi data into the mainstream. This work also opened many
unanswered questions.
- Chapter 2 demonstrated the ability to uncover hidden genes from RNAi studies. Yet,
without the ability to validate every prediction in the network model, it is unclear if and
how much we can improve validation rates by using these modeling approaches as
to compared to existing statistical methods.
- In both chapters 2 and 3, I applied a general interactome to develop the optimal sub-
network. I did not filter the network for tissue specific expression and instead con-
sidered all possible interactions. However, in practice, all documented interactions
are not relevant in every tissue and cell type, and this could bias gene candidate
selections that include genes not expressed in the system of interest.
123
- In chapter 3, I explored augmentation of the interaction network with novel biological
interactions. Specifically, I added predicted phosphorylation and de-phosphorylation
events from Phosphosite and DEPOD. Computationally, it is relatively easy to incor-
porate these predictions in the network and the approach is attractive for considering
any type of post-translational modification. Currently, though, the work lacks thor-
ough validation of these predictions for biological relevance.
- Currently, there is no consensus in the RNAi field as to how many redundant shRNAs
are sufficient for screening applications. Anecdotally, experts claim that at least 10
shRNAs are required to have confidence in a gene's selection. In practice, each gene
is differentially sensitive to RNAi perturbation, suggesting that even with increasing
number of redundant shRNAs, some genes may not show a phenotype. Further,
this work's emphasis on pathways-level effects suggests that very high coverage
may not be necessary for initial screening approaches. Thus, especially as network
approaches become mainstream, it may be more beneficial for scientists to focus on
post-processing and analysis instead of high-coverage libraries. Though, the validity
of this claim would drastically be improved if we could better estimate validation rates
from network-modeling approaches.
- These networks may offer a viable path for parsing the effects of gene homologues.
For instance, the shRNA screen in chapter 3 included redundant shRNAs for all mem-
bers of the PRKC family. Initial screening data for all members of this family had at
least one shRNA performing at the extreme end of the distribution, though our final
network solution only included PRKCa. It is widely accepted that there are seed-
matching effects that cause shRNAs to affect genes beyond the original target, so it
is possible that many of these extreme effects are due to seed-matching with gene
homologues. Using a network filter could reduce the selection of genes affected by
seed-matching by requiring connectivity to other experimental genes. Conceptually,
this approach could be modeled after work in the Fraenkel lab using network models
to make predictions from untargeted metabolomic data. These approaches allow an
unassigned m/z value to be connected to multiple nodes within the network and then
124
allow the algorithm to select the most likely proteomic candidate.
Lastly, I highlight two additional areas where I believe the combination of network mod-
els and loss of functions screens will have impact: integrated personalized medicine and
synthetic loss of function engineering.
Characterization of cancer pathways will affect personalized and precision medicine
Genome sequencing technology has matured such that it is feasible to sequence the can-
cers of individual patients and ideally define a molecular cause for disease. There were
early successes with this approach such as the discovery that BRAF is frequently mutated
at codon 600 in melanoma [40, 88, 6]. There are other similar successes in non-small
cell lung cancer (NSCLC) where scientists have identified EGFR and ALK mutations in
many patients [115, 128, 84]. These findings were great successes and motivators for the
field of molecular targeted therapeutics and spurred development of BRAF, EGFR, and
ALK inhibitors. Targeted molecular therapies showed increased survival and therapeutic
response. However, patients ultimately relapsed, suggesting that these initial molecular
vulnerabilities were insufficient for characterizing the disease[66]. In most cases patients
carry a heterogeneous set of genetic abnormalities and any cancer can contain subpopula-
tions with various combinations of mutations. Computational approaches have made great
strides in considering heterogeneous clones and demonstrating that optimal drug dosing
can prevent the outgrowth of these sub-clones [172]. Further, charting clonal evolution in
tumors can identify points of genetic vulnerability than can specifically sensitize cancers
to targeted therapies [173]. These findings highlight two distinctive features of the cur-
rent cancer treatment landscape: cancer is a molecularly-driven disease and single-agent
therapies are insufficient to prevent relapse.
Next generation sequencing technology has created a wealth of information about po-
tential molecular vulnerabilities, but it is not clear how these characterizations should guide
therapeutic development. Next generation sequencing can measure RNA expression and
epigenetic markers in addition to sequencing mutations. From these measurements, it is
desirable to have a molecular 'signature' or combination of features that can be used to
125
profile any individual patient and make clinical decisions. Frampton et al demonstrate one
implementation of this idea through the creation of a cancer genome profiling test [41]. The
test uses shotgun sequencing to identify actionable molecular features in cancer patients,
and they report a 76% rate of actionable items from this one assay[41]. Ultimately, creat-
ing these reduced assays is going to be invaluable to making clinical decisions. However,
there are unanswered questions related to what molecular entities these assays should
measure. Further, most investigations focus narrowly on a single 'omic measurement:
GWAS, mRNA, RNAi, etc. Some investigations make multiple 'omic measurements, but it
is unclear how to collectively interpret these disparate data sets.
Parallel development in understanding gene membership in pathways will influence
how these molecular assays succeed in the clinic. For instance, in pancreatic cancer,
individual mutations vary across patients. Jones et al. characterized this phenotype and
showed that these mutations occurred in a core set of signaling pathways[69]. In practice,
the cell is an integrated system that is highly interconnected; any individual component
exists in an ecosystem of interactions, and any of our experimental measurements can
only capture a subset of these processes. This realization isn't new, but its acceptance into
routine biological investigations is lacking. Data integration tools can solve these problems
and create novel representations for next generation sequencing data sets. As these tools
mature, data integration will guide molecular characterization and profiling for oncology.
Loss of function characterizations for forward engineering approaches
The field of synthetic biology is a growing discipline that could benefit from tools adapted
in this thesis. The synthetic biology field aims to engineer the cell to actuate a specific
program, and has applications for many societal problems, from medicine to green energy.
Early successes in the field, such as the creation of the bi-stable genetic switch in E. coi
[45] and the repressilator [32], demonstrated the possibility of engineering logical circuits
with biology. This field has blossomed as a result of systems biology approaches and
the ability to create recombinant DNA [15]. Synthetic biology has advanced beyond the
demonstration of simple circuitry and started tackling more complex biological problems.
126
For instance, engineered Notch receptors can tune the specificity of cell response [100]
and can be combined in engineered T-cells for selective cancer targeting[1 27]. However,
the success of many synthetic biology approaches depend on the fidelity of individual com-
ponents; circuit component expression depends on promoter activity, and leaky transcrip-
tion can prevent circuit specificity. Coordinated expression of these promoters is difficult
and can cause build up of components that are detrimental for cell growth[74]. As such,
many investigations have sought to define and characterize libraries of parts for rational
design [156]. In contrast to building large parts libraries, discovery and development of
CRISPR elements has increased the tools available by offering an alternative platform for
genome engineering. For example, a combination of constitutively active scRNAs and an
inducible Cas9 master regulator could selectively direct flow through the VioC pathway
in E. coli [168]. The ability of CRISPR activation and inhibition (CRISPRa and CRISPRi
respectively) holds additional promise for synthetic biology's toolbox[81].
However, this field depends on characterized cellular pathways to identify the parts
list associated with any biological phenotype and thus, is limited to controlling simple cir-
cuits. As synthetic biology matures and tackles more complex problems, pathway dis-
covery will become essential for more thoroughly defining the processes which we hope
to control. This thesis emphasized the uncharacterized nature of many biological path-
ways and demonstrated a means for annotating novel pathways from loss of function stud-
ies. Currently, RNAi screens have an unappreciated role in synthetic biology but could
be coupled with transcriptional synthetic technologies to actuate complex programs. For
instance, Zalatan et al's use of a Cas9 master regulator with multiple scRNAs is an attrac-
tive approach for modulating a cellular process. However, this system is limited to a sim-
ple, well-characterized pathway. For more complex systems, loss of function screens and
network models could identify molecular components amenable to RNA perturbation and
could eliminate the dependency on specific promoters for tuning component expression.
For the most part, synthetic biology has focused on the induction of required components,
but hasn't yet explored the use of RNA interference as a means of damping existing com-
ponents. This platform could become an alternative or orthogonal approach to increase
tenability of biological systems.
127
References
[1] Britt Adamson, Agata Smogorzewska, Frederic D Sigoillot, Randall W King, and
Stephen J Elledge. A genome-wide homologous recombination screen identifies the
RNA-binding protein RBMX as a component of the DNA-damage response. Nature
Cell Biology, 14(3):318-328, February 2012.
[2] C Adrain and M Freeman. Regulation of Receptor Tyrosine Kinase Ligand Process-
ing. Cold Spring Harbor Perspectives in Biology, 6(1 ):a008995-a008995, January
2014.
[3] Clint Allen, Kunal Saigal, Liesl Nottingham, Pattatheyil Arun, Zhong Chen, and
Carter Van Waes. Bortezomib-induced apoptosis with limited clinical response is
accompanied by inhibition of canonical but not alternative nuclear factor-kappaB
subunits in head and neck cancer. Clinical cancer research : an official journal
of the American Association for Cancer Research, 14(13):4175-4185, July 2008.
[4] Clint T Allen, Barbara Conley, John B Sunwoo, and Carter Van Waes. CCR 20th
anniversary commentary: Preclinical study of proteasome inhibitor bortezomib in
head and neck cancer. Clinical cancer research : an official journal of the American
Association for Cancer Research, 21(5):942-943, March 2015.
[5] Simon Anders and Wolfgang Huber. gb-2010-11-10-r106. Genome Biology,
11(1 0):R1 06, October 2010.
[6] P A Ascierto, J M Kirkwood, J J Grob, and E Simeone. The role of BRAF V600
mutation in melanoma. J Transl..., 2012.
[7] Eishi Ashihara, Tetsuya Takada, and Taira Maekawa. Targeting the canonical Wnt/-
catenin pathway in hematological malignancies. Cancer Science, 106(6):665-671,
April 2015.
[8] Sabarish Ayyappan, Dhivya Prabhakar, and Neelesh Sharma. Epidermal growth
factor receptor (EGFR)-targeted therapies in esophagogastric cancer. Anticancer
research, 33(10):4139-4155, October 2013.
128
[9] Albert-Ldszl6 Barabdsi, Natali Gulbahce, and Joseph Loscalzo. Network medicine:
a network-based approach to human disease. Nature Reviews Genetics, 12(1):56-
68, January 2011.
[101 Bonnie Berger, Jian Peng, and Mona Singh. Computational solutions for omics data.
Nature Reviews Genetics, 14(5):333-346, May 2013.
[11] Amanda Birmingham, Laura M Selfors, Thorsten Forster, David Wrobel, Caleb J
Kennedy, Emma Shanks, Javier Santoyo-Lopez, Dara J Dunican, Aideen Long,
Dermot Kelleher, Queta Smith, Roderick L Beijersbergen, Peter Ghazal, and Car-
oline E Shamu. Statistical methods for analysis of high-throughput RNA interference
screens. Nature Publishing Group, 6(8):569-575, August 2009.
[12] M Boettcher and M T McManus. Choosing the Right Tool for the Job: RNAi, TALEN,
or CRISPR. Molecular Cell, 2015.
[13] E Buehler, A A Khan, S Marine, M Rajaram, and A Bahl. siRNA off-target effects in
genome-wide screens identify signaling pathway members. Scientific reports, 2012.
[14] Frederic D Bushman, Nirav Malani, Jason Fernandes, lvdn D'Orso, Gerard Cagney,
Tracy L Diamond, Honglin Zhou, Daria J Hazuda, Amy S Espeseth, Renate K6nig,
Sourav Bandyopadhyay, Trey Ideker, Stephen P Goff, Nevan J Krogan, Alan D
Frankel, John A T Young, and Sumit K Chanda. Host Cell Factors in HIV Repli-
cation: Meta-Analysis of Genome-Wide Studies. PLoS Pathogens, 5(5):el 000437,
May 2009.
[15] D Ewen Cameron, Caleb J Bashor, and James J Collins. A brief history of synthetic
biology. Nature reviews. Microbiology, 12(5):381-390, May 2014.
[16] Cai Chen, Christoph Bartenhagen, Michael Gombert, Vera Okpanyi, Vera Binder,
Silja R6ttgers, Jutta Bradtke, Andrea Teigler-Schlegel, Jochen Harbott, Sebastian
Ginzel, Ralf Thiele, Peter Husemann, Pina F I Krell, Arndt Borkhardt, Martin Dugas,
Jianda Hu, and Ute Fischer. Next-generation-sequencing of recurrent childhood
129
high hyperdiploid acute lymphoblastic leukemia reveals mutations typically associ-
ated with high risk patients. Leukemia research, 39(9):990-1001, September 2015.
[17] Ceshi Chen. The Oncogenic Role of WWP1 E3 Ubiquitin Ligase in Prostate Cancer
Development. May 2011.
[18] Ceshi Chen and Lydia E Matesic. The Nedd4-like family of E3 ubiquitin ligases and
cancer. Cancer and Metastasis Reviews, 26(3-4):587-604, August 2007.
[19] Yanqing Chen, Jun Zhu, Pek Yee Lum, Xia Yang, Shirly Pinto, Douglas J Mac-
Neil, Chunsheng Zhang, John Lamb, Stephen Edwards, Solveig K Sieberts,
Amy Leonardson, Lawrence W Castellini, Susanna Wang, Marie-France Champy,
Bin Zhang, Valur Emilsson, Sudheer Doss, Anatole Ghazalpour, Steve Horvath,
Thomas A Drake, Aldons J Lusis, and Eric E Schadt. Variations in DNA elucidate
molecular networks that cause disease. Nature, 452(7186):429-435, March 2008.
[20] C Cobaleda and I Sdnchez Garcia. B-cell acute lymphoblastic leukaemia: towards
understanding its cellular origin. Bioessays, 2009.
[21] Federica Cossu, Mario Milani, Serena Grassi, Francesca Malvezzi, Alessandro
Corti, Martino Bolognesi, and Eloise Mastrangelo. NF023 binding to XIAP-BIR1:
searching drugs for regulation of the NF-rB pathway. Proteins, 83(4):612-620, April
2015.
[22] L M Coussens, B Fingleton, and L M Matrisian. Matrix metalloproteinase inhibitors
and cancer-trials and tribulations. Science, 2002.
[23] C Culver, A Sundqvist, S Mudie, A Melvin, D Xirodimas, and S Rocha. Mecha-
nism of Hypoxia-Induced NF-rB. Molecular and Cellular Biology, 30(20):4901--4921,
September 2010.
[24] M Dang, N Armbruster, M A Miller, E Cermeno, M Hartmann, G W Bell, D E
Root, D A Lauffenburger, H F Lodish, and A Herrlich. Regulated ADAM17-
dependent EGF family ligand release by substrate-selecting signaling pathways.
PNAS, 110(24):9776-9781, June 2013.
130
[251 Michelle Dang, Karen Dubbin, Antonio D'Aiello, Monika Hartmann, Harvey Lodish,
and Andreas Herrlich. Epidermal growth factor (EGF) ligand release by substrate-
specific a disintegrin and metalloproteases (ADAMs) involves different protein kinase
C (PKC) isoenzymes depending on the stimulus. Journal of Biological Chemistry,
286(20):17704-17713, May 2011.
[26] Joseph A DiDonato, Frank Mercurio, and Michael Karin. NF-rB and the link between
inflammation and cancer. Immunological reviews, 246(1):379-400, March 2012.
[27] D Dinour, L Ganon, L I Nomy, R Ron, and E J Holtzman. Wild-type uromodulin
prevents NFkB activation in kidney cells, while mutant uromodulin, causing FJHU
nephropathy, does not. Journal of nephrology, 2014.
[28] James R Downing. TGF-beta signaling, tumor suppression, and acute lymphoblastic
leukemia. The New England journal of medicine, 351(6):528-530, August 2004.
[29] Eran Eden, Doron Lipson, Sivan Yogev, and Zohar Yakhini. Discovering motifs in
ranked lists of DNA sequences. PLoS Computational Biology, 3(3):e39, March 2007.
[30] Eran Eden, Roy Navon, Israel Steinfeld, Doron Lipson, and Zohar Yakhini. GOrilla: a
tool for discovery and visualization of enriched GO terms in ranked gene lists. BMC
Bioinformatics, 10:48, 2009.
[31] J R Edgar, E R Eden, and C E Futter. Hrs- and CD36- Dependent Competing
Mechanisms Make Different Sized Endosomal Intraluminal Vesicles. Traffic, 2014.
[32] M B Elowitz and S Leibler. A synthetic oscillatory network of transcriptional regula-
tors. Nature, 403(6767):335-338, January 2000.
[33] J A Engelman and P A Janne. Mechanisms of Acquired Resistance to Epidermal
Growth Factor Receptor Tyrosine Kinase Inhibitors in Non-Small Cell Lung Cancer.
Clinical Cancer Research, 14(10):2895-2899, May 2008.
[34] Jeffrey A Engelman, Kreshnik Zejnullahu, Tetsuya Mitsudomi, Youngchul Song,
Courtney Hyland, Joon Oh Park, Neal Lindeman, Christopher-Michael Gale, Xiaojun
131
Zhao, James Christensen, Takayuki Kosaka, Alison J Holmes, Andrew M Rogers,
Federico Cappuzzo, Tony Mok, Charles Lee, Bruce E Johnson, Lewis C Cantley,
and Pasi A Jdnne. MET amplification leads to gefitinib resistance in lung cancer by
activating ERBB3 signaling. Science, 316(5827):1039-1043, May 2007.
[35] H Fan and R Derynck. Ectodomain shedding of TGF-alpha and other transmem-
brane proteins is induced by receptor tyrosine kinase activation and MAP kinase
signaling cascades. The EMBO journal, 18(24):6962-6972, December 1999.
[36] Marta Faryna, Carolin Konermann, Sebastian Aulmann, Justo Lorenzo Bermejo,
Markus Brugger, Sven Diederichs, Joachim Rom, Dieter Weichenhan, Rainer Claus,
Michael Rehli, Peter Schirmacher, Hans-Peter Sinn, Christoph Plass, and Clarissa
Gerhauser. Genome-wide methylation screen in low-grade breast cancer identi-
fies novel epigenetically altered genes as potential biomarkers for tumor diagnosis.
FASEB journal : official publication of the Federation of American Societies for Ex-
perimental Biology, 26(12):4937-4950, December 2012.
[37] J Feng, T Liu, and Y Zhang. Using MACS to identify peaks from ChIP-seq data.
Current Protocols in Bioinformatics, 2011.
[38] M Fennell, Q Xiang, A Hwang, and C Chen. Impact of RNA-guided technologies for
target identification and deconvolution. Journal of..., 2014.
[39] W Fiskus, S Sharma, S Saha, B Shah, S G T Devaraj, B Sun, S Horrigan, C Leveque,
Y Zu, S lyer, and K N Bhalla. Pre-clinical efficacy of combined therapy with novel
-catenin antagonist BC2059 and histone deacetylase inhibitor against AML cells.
Leukemia, 29(6):1267-1278, December 2014.
[40] S A Forbes, G Bhamra, and S Bamford. The catalogue of somatic mutations in
cancer (COSMIC). Current protocols in ... , 2008.
[41] Garrett M Frampton, Alex Fichtenholtz, Geoff A Otto, Kai Wang, Sean R Downing,
Jie He, Michael Schnall-Levin, Jared White, Eric M Sanford, Peter An, James Sun,
Frank Juhn, Kristina Brennan, Kiel Iwanik, Ashley Maillet, Jamie Buell, Emily White,
132
Mandy Zhao, Sohail Balasubramanian, Selmira Terzic, Tina Richards, Vera Ban-
ning, Lazaro Garcia, Kristen Mahoney, Zac Zwirko, Amy Donahue, Himisha Beltran,
Juan Miguel Mosquera, Mark A Rubin, Snjezana Dogan, Cyrus V Hedvat, Michael F
Berger, Lajos Pusztai, Matthias Lechner, Chris Boshoff, Mirna Jarosz, Christine Vi-
etz, Alex Parker, Vincent A Miller, Jeffrey S Ross, John Curran, Maureen T Cronin,
Philip J Stephens, Doron Lipson, and Roman Yelensky. Development and validation
of a clinical cancer genomic profiling test based on massively parallel DNA sequenc-
ing. Nature biotechnology, 31(11):1023-1031, November 2013.
[42] A Franceschini, R Meier, A Casanova, S Kreibich, N Daga, D Andritschke, S Dilling,
P Ramo, M Emmenlauer, A Kaufmann, R Conde-Alvarez, S H Low, L Pelkmans,
A Helenius, W D Hardt, C Dehio, and C von Mering. Specific inhibition of diverse
pathogens in human cells by synthetic microRNA-like oligonucleotides inferred from
RNAi screens. PNAS, 111(12):4548-4553, March 2014.
[431 T K Fung, AYH Leung, and CWE So. The Wnt/-Catenin Pathway as a Potential
Target for Drug Resistant Leukemic Stem Cells. Stem Cells and Cancer Stem Cells,
2013.
[44] A Furlan, V Stagni, A Hussain, S Richelme, F Conti, A Prodosmo, A Destro, M Ron-
calli, D BarilA, and F Maina. Abl interconnects oncogenic Met and p53 core pathways
in cancer cells. Cell Death and Differentiation, 18(10):1608-1616, April 2011.
[45] T S Gardner, C R Cantor, and J J Collins. Construction of a genetic toggle switch in
Escherichia coli. Nature, 2000.
[46] Claudia Gherman, Ovidiu Leonard Braicu, Oana Zanoaga, Anca Jurj, Valentina
Pileczki, Mahafarin Maralani, Flaviu Drigla, Cornelia Braicu, Liviuta Budisan, Patri-
ciu Achimas-Cadariu, and loana Berindan-Neagoe. Caffeic acid phenethyl ester
activates pro-apoptotic and epithelial-mesenchymal transition-related genes in ovar-
ian cancer cells A2780 and A2780cis. Molecular and cellular biochemistry, 413(1-
2):189-198, February 2016.
133
[47] L A Gilbert and M T Hemann. Context-specific roles for paracrine IL-6 in lymphoma-
genesis. Genes & development, 26(15):1758-1768, August 2012.
[48] Luke A Gilbert, Max A Horlbeck, Britt Adamson, Jacqueline E Villalta, Yuwen Chen,
Evan H Whitehead, Carla Guimaraes, Barbara Panning, Hidde L Ploegh, Michael C
Bassik, Lei S Qi, Martin Kampmann, and Jonathan S Weissman. Genome-Scale
CRISPR-Mediated Control of Gene Repression and Activation. Cell, 159(3):647-
661, October 2014.
[49] Sara Gosline, Sarah Spencer, and Ernest Fraenkel. SAMNET: a network-based
approach to integrate multi-dimensional high throughput datasets. Technical report.
[50] SJC Gosline, C Oh, and E Fraenkel. SAMNetWeb: identifying condition-specific
networks linking signaling and transcription. Bioinformatics, 2015.
[51] Andreas Gschwind, Oliver M Fischer, and Axel Ullrich. The discovery of receptor
tyrosine kinases: targets for cancer therapy. Nature Reviews Cancer, 4(5):361-370,
May 2004.
[52] T Hart, K R Brown, F Sircoulomb, R Rottapel, and J Moffat. Measuring error rates
in genomic perturbation screens: gold standards for human functional genomics.
Molecular Systems Biology, 10(7):733-733, July 2014.
[53] Traver Hart, Megha Chandrashekhar, Michael Aregger, Zachary Steinhart, Kevin R
Brown, Stephane Angers, and Jason Moffat. Systematic discovery and classification
of human cell line essential genes. Technical report, February 2015.
[54] Charlotta Hedner, David Borg, Bj6rn Nodin, Emelie Karnevi, Karin Jirstr6m, and
Jakob Eberhard. Expression and Prognostic Significance of Human Epidermal
Growth Factor Receptors 1 and 3 in Gastric and Esophageal Adenocarcinoma. PLoS
ONE, 11 (2):eOl 48101, February 2016.
[55] M T Hemann. From Breaking Bad to Worse: Exploiting Homologous DNA Repair
Deficiency in Cancer. Cancer Discovery, 4(5):516-518, May 2014.
134
[56] Michael T Hemann, Jordan S Fridman, Jack T Zilfou, Eva Hernando, Patrick J Pad-
dison, Carlos Cordon-Cardo, Gregory J Hannon, and Scott W Lowe. An epi-allelic
series of p53 hypomorphs created by stable RNAi produces distinct tumor pheno-
types in vivo. Nature Genetics, 33(3):396-400, March 2003.
[57] Andreas Herrlich, Eva Klinman, Jonathan Fu, Cameron Sadegh, and Harvey Lodish.
Ectodomain cleavage of the EGF ligands HB-EGF, neuregulini -beta, and TGF-alpha
is specifically triggered by different stimuli and involves different PKC isoenzymes.
FASEB journal : official publication of the Federation of American Societies for Ex-
perimental Biology, 22(12):4281-4295, December 2008.
[58] Teru Hideshima, Constantine Mitsiades, Giovanni Tonon, Paul G Richardson, and
Kenneth C Anderson. Understanding multiple myeloma pathogenesis in the bone
marrow to identify new therapeutic targets. Nature Reviews Cancer, 7(8):585-598,
August 2007.
[59] B Hoesel and J A Schmid. The complexity of NF-KB signaling in inflammation and
cancer. Molecular cancer, 2013.
[60] Peter V Hornbeck, Indy Chabra, Jon M Kornhauser, Elzbieta Skrzypek, and Bin
Zhang. PhosphoSite: A bioinformatics resource dedicated to physiological protein
phosphorylation. PROTEOMICS, 4(6):1551-1561, June 2004.
[61] S s C Huang and E Fraenkel. Integrating Proteomic, Transcriptional, and Interactome
Data Reveals Hidden Components of Signaling and Regulatory Networks. Science
Signaling, 2(81):ra4O-ra4O, July 2009.
[62] Shao-shan Carol Huang, David C Clarke, Sara J C Gosline, Adam Labadorf, Can-
dace R Chouinard, William Gordon, Douglas A Lauffenburger, and Ernest Fraenkel.
Linking Proteomic and Transcriptional Data through the Interactome and Epigenome
Reveals a Map of Oncogene-induced Signaling. PLoS Computational Biology,
9(2):el 002887, February 2013.
135
[63] M A Huber, N Azoitei, and B Baumann. NF-KB is essential for epithelial-
mesenchymal transition and metastasis in a model of breast cancer progression.
The Journal of..., 2004.
[64] Julie A E Irving. Towards an understanding of the biology and targeted treatment of
paediatric relapsed acute lymphoblastic leukaemia. British journal of haematology,
November 2015.
[65] Aimee L Jackson and Peter S Linsley. Recognizing and avoiding siRNA off-target
effects for target identification and therapeutic application. Nature reviews Drug dis-
covery, pages 1-11, January 2010.
[66] Filip Janku. Tumor heterogeneity in the clinic: is it a real problem? Therapeutic
advances in medical oncology, 6(2):43-51, March 2014.
[67] Lu Jiang, Yongchang Chen, Yueying Li, Ting Lan, Min Wu, Ying Wang, and Hai Qian.
Type 11 cGMP-dependent protein kinase inhibits ligand-induced activation of EGFR
in gastric cancer cells. Molecular Medicine Reports, February 2014.
[68] Duohui Jing, Vivek A Bhadri, Dominik Beck, Julie A I Thoms, Nurul A Yakob, Jason
W H Wong, Kathy Knezevic, John E Pimanda, and Richard B Lock. Opposing regu-
lation of BIM and BCL2 controls glucocorticoid-induced apoptosis of pediatric acute
lymphoblastic leukemia cells. Blood, 125(2):273-283, January 2015.
[69] S Jones, X Zhang, D W Parsons, J C H Lin, R J Leary, P Angenendt, P Mankoo,
H Carter, H Kamiyama, A Jimeno, S M Hong, B Fu, M T Lin, E S Calhoun,
M Kamiyama, K Walter, T Nikolskaya, Y Nikolsky, J Hartigan, D R Smith, M Hidalgo,
S D Leach, A P Klein, E M Jaffee, M Goggins, A Maitra, C lacobuzio-Donahue, J R
Eshleman, S E Kern, R H Hruban, R Karchin, N Papadopoulos, G Parmigiani, B Vo-
gelstein, V E Velculescu, and K W Kinzler. Core Signaling Pathways in Human Pan-
creatic Cancers Revealed by Global Genomic Analyses. Science, 321 (5897):1801-
1806, September 2008.
136
[70] Grazyna Jurkowska, Grazyna Piotrowska-Staworko, Katarzyna Guzihska-
Ustymowicz, Andrzej Kemona, Agnieszka Swidnicka-Siergiejko, Wiktor kaszewicz,
and Andrzej Dqbrowski. The impact of Helicobacter pylori on EGF, EGF receptor,
and the c-erb-B2 expression. Advances in Medical Sciences, 59(2):221-226,
September 2014.
[71] William G Kaelin. Molecular biology. Use and abuse of RNAi to study mammalian
gene function. Science, 337(6093):421-422, July 2012.
[72] M Kampmann, M C Bassik, and J S Weissman. Integrated platform for genome-wide
screening and construction of high-density genetic interaction maps in mammalian
cells. PNAS, 110(25):E2317-E2326, June 2013.
[73] S Kaur, F Wang, M Venkatraman, and M Arsura. X-linked inhibitor of apoptosis
(XIAP) inhibits c-Jun N-terminal kinase 1 (JNK1) activation by transforming growth
factor 31 (TGF-,31) through ubiquitin-mediated ....... journal of biological..., 2005.
[74] Jay D Keasling. Synthetic biology and the development of tools for metabolic engi-
neering. Metabolic engineering, 14(3):189-195, May 2012.
[75] Boris Kholodenko, Michael B Yaffe, and Walter Kolch. Computational approaches for
analyzing information flow in biological networks. Science Signaling, 5(220):rel-rel,
April 2012.
[76] John Kinzfogl, Giao Hangoc, and Hal E Broxmeyer. Neurexophilin 1 suppresses the
proliferation of hematopoietic progenitor cells. Blood, 118(3):565-575, July 2011.
[77] Ryunosuke Kogo, Koshi Mimori, Fumiaki Tanaka, Shizuo Komune, and Masaki Mori.
Clinical significance of miR-1 46a in gastric cancer cases. Clinical cancer research
: an officialjournal of the American Association for Cancer Research, 17(13):4277-
4284, July 2011.
[78] F Kosova, FQ Kurt, E Olmez, I Tuglu, and Z Ari. Effects of caffeic acid phenethyl
ester on matrix molecules and angiogenetic and anti-angiogenetic factors in gastric
137
cancer cells cultured on different substrates. Biotechnic and histochemistry: official
publication of the Biological Stain Commission, 91(1 ):38-47, jan 2016.
[79] Christoph KOper, Helmut Bartels, Maria-Luisa Fraek, Franz X Beck, and Wolfgang
Neuhofer. Ectodomain shedding of pro-TGF-alpha is required for COX-2 induction
and cell survival in renal medullary cells exposed to osmotic stress. AJP: Cell Phys-
iology, 293(6):C1 971-C1982, October 2007.
[80] M A Kutuzov, N Bennett, and A V Andreeva. Protein phosphatase with EF-hand
domains 2 (PPEF2) is a potent negative regulator of apoptosis signal regulating
kinase-1 (ASK1). The international journal of..., 2010.
[81] Marie F La Russa and Lei S Qi. The New State of the Art: Cas9 for Gene Activa-
tion and Repression. Molecular and Cellular Biology, 35(22):3800-3809, November
2015.
[82] A Laine and Z Ronai. Regulation of p53 localization and transcription by the HECT
domain E3 ligase WWP1. Oncogene, 26(10):1477-1483, August 2006.
[83] Mark N Lee, Matthew Roy, Shao-En Ong, Philipp Mertins, Alexandra-Chloe Villani,
Weibo Li, Farokh Dotiwala, Jayita Sen, John G Doench, Megan H Orzalli, Igor Kram-
nik, David M Knipe, Judy Lieberman, Steven A Carr, and Nir Hacohen. Identification
of regulators of the innate immune response to cytosolic DNA and retroviral infection
by an integrative approach. Nature Immunology, 14(2):179-185, December 2012.
[84] Taebum Lee, Boram Lee, YoonLa Choi, Joungho Han, Myung-Ju Ahn, and Sang-
Won Um. Non-small Cell Lung Cancer with Concomitant EGFR, KRAS, and ALK
Mutation: Clinicopathologic Features of 12 Cases. Journal of pathology and transla-
tional medicine, April 2016.
[85] X Li, J Long, T He, R Belshaw, and J Scott. Integrated genomic approaches identify
major pathways and upstream regulators in late onset Alzheimer's disease. Scientific
reports, 2015.
138
[86] X Li, M Wilmanns, J Thornton, and m Kohn. Elucidating human phosphotase-
substrate networks. Science Signaling, pages 1-15, May 2013.
[87] C Lindet, F R6villion, V Lhotellier, and L Hornez. Relationships between proges-
terone receptor isoforms and the HER/ErbB receptors and ligands network in 299
primary breast cancers. ... journal of biological..., 2011.
[88] Georgina V Long, Alexander M Menzies, Adnan M Nagrial, Lauren E Haydu,
Anne L Hamilton, Graham J Mann, T Michael Hughes, John F Thompson,
Richard A Scolyer, and Richard F Kefford. Prognostic and clinicopathologic asso-
ciations of oncogenic BRAF in metastatic melanoma. Journal of Clinical Oncology,
29(10):1239-1246, April 2011.
[89] Biao Luo, Hiu Wing Cheung, Aravind Subramanian, Tanaz Sharifnia, Michael
Okamoto, Xiaoping Yang, Greg Hinkle, Jesse S Boehm, Rameen Beroukhim, Bar-
bara A Weir, Craig Mermel, David A Barbie, Tarif Awad, Xiaochuan Zhou, Tuyen
Nguyen, Bruno Piqani, Cheng Li, Todd R Golub, Matthew Meyerson, Nir Hacohen,
William C Hahn, Eric S Lander, David M Sabatini, and David E Root. Highly par-
allel identification of essential genes in cancer cells. PNAS, 105(51):20380-20385,
December 2008.
[90] K D Maclsaac and E Fraenkel. Sequence analysis of chromatin immunoprecipita-
tion data for transcription factors. Computational Biology of Transcription Factor ... ,
2010.
[91] Howard L McLeod. Cancer pharmacogenomics: early promise, but concerted effort
needed. Science, 339(6127):1563-1566, March 2013.
[92] Corbin E Meacham, Emily E Ho, Esther Dubrovsky, Frank B Gertler, and Michael T
Hemann. In vivo RNAi screening identifies regulators of actin dynamics as key deter-
minants of lymphoma progression. Nature Genetics, 41(10):1133-1137, September
2009.
139
[93] Corbin E Meacham, Lee N Lawton, Yadira M Soto-Feliciano, Justin R Pritchard,
Brian A Joughin, Tobias Ehrenberger, Nina Fenouille, Johannes Zuber, Richard T
Williams, Richard A Young, and Michael T Hemann. A genome-scale in vivo loss-of-
function screen identifies Phf6as a lineage-specific regulator of leukemia cell growth.
Genes & development, 29(5):483-488, March 2015.
[94] JP Mesirov and LA Garraway. Systematic investigation of genetic vulnerabilities
across cancer cell lines reveals lineage-specific dependencies in ovarian cancer. In
Proceedings of the ... , pages 1-6, 2011.
[95] Aaron S Meyer, Miles A Miller, Frank B Gertler, and Douglas A Lauffenburger. The
receptor AXL diversifies EGFR signaling and limits the response to EGFR-targeted
inhibitors in triple-negative breast cancer cells. Science Signaling, 6(287):ra66, Au-
gust2013.
[96] Y Mochida, K Takeda, M Saitoh, and H Nishitoh. ASK1 inhibits interleukin-1 -induced
NF-KB activity through disruption of TRAF6-TAK1 interaction. ... journal of biological
... ,2000.
[97] Stephanie Mohr, Chris Bakal, and Norbert Perrimon. Genomic Screening with RNAi:
Results and Challenges. Annual Review of Biochemistry, 79(1):37-64, June 2010.
[98] Stephanie E Mohr, Jennifer A Smith, Caroline E Shamu, Ralph A NeumOller, and
Norbert Perrimon. RNAi screening comes of age: improved techniques and com-
plementary approaches. Nature Reviews Molecular Cell Biology, 15(9):591-600,
August 2014.
[99] M K Montgomery, S A Kostas, S E Driver, and C C Mello. Potent and specific genetic
interference by double-stranded RNA in Caenorhabditis elegans. Nature, 1998.
[100] Leonardo Morsut, Kole T Roybal, Xin Xiong, Russell M Gordley, Scott M Coyle,
Matthew Thomson, and Wendell A Lim. Engineering Customized Cell Sensing
and Response Behaviors Using Synthetic Notch Receptors. Cell, 164(4):780-791,
February 2016.
140
[101] Tarek K Motawi, Samy A Abdelazim, Hebatallah A Darwish, Eman M Elbaz, and
Samia A Shouman. Could Caffeic Acid Phenethyl Ester Expand the Antitumor Effect
of Tamoxifen in Breast Carcinoma? Nutrition and cancer, pages 1-11, March 2016.
[102] Charles G Mullighan, Jinghui Zhang, Lawryn H Kasper, Stephanie Lerach, Debbie
Payne-Turner, Letha A Phillips, Sue L Heatley, Linda Holmfeldt, J Racquel Collins-
Underwood, Jing Ma, Kenneth H Buetow, Ching-Hon Pui, Sharyn D Baker, Paul K
Brindle, and James R Downing. CREBBP mutations in relapsed acute lymphoblastic
leukaemia. Nature, 471(7337):235-239, March 2011.
[103] Takayuki Nagata, Kazuko Murata, Ryo Murata, Shu-lan Sun, Yutaro Saito, Shuhei
Yamaga, Nobuyuki Tanaka, Keiichi Tamai, Kunihiko Moriya, Noriyuki Kasai, Kazuo
Sugamura, and Naoto Ishii. Hepatocyte growth factor regulated tyrosine kinase sub-
strate in the peripheral development and function of B-cells. Biochemical and Bio-
physical Research Communications, 443(2):351-356, January 2014.
[104] Kazuhito Naka, Takayuki Hoshii, Teruyuki Muraguchi, Yuko Tadokoro, Takako
Ooshio, Yukio Kondo, Shinji Nakao, Noboru Motoyama, and Atsushi Hirao. TGF-,3-
FOXO signalling maintains leukaemia-initiating cells in chronic myeloid leukaemia.
Nature, 463(7281):676-680, February 2010.
[105] M Neumann, M Seehawer, C Schlee, and S Vosberg. FAT1 expression and muta-
tions in adult acute lymphoblastic leukemia. Blood cancer ... , 2014.
[106] Christopher W Ng, Ferah Yildirim, Yoon Sing Yap, Simona Dalin, Bryan J Matthews,
Patricio J Velez, Adam Labadorf, David E Housman, and Ernest Fraenkel. Extensive
changes in DNA methylation are associated with expression of mutant huntingtin.
2013.
[107] 0 H Ng, Y Erbilgin, S Firtina, T Celkan, and Z Karakas. Deregulated WNT signaling
in childhood T-cell acute lymphoblastic leukemia. Blood cancer ... , 2014.
141
[108] Matthew J Niederst and Jeffrey A Engelman. Bypass mechanisms of resistance
to receptor tyrosine kinase inhibition in lung cancer. Science Signaling, 6(294):re6,
September 2013.
[109] Y Niitsu, Y Urushizaki, Y Koshida, K Terui, and K Mahara. Expression of TGF-beta
gene in adult T cell leukemia. Blood, pages 1-5, November 2015.
[110] L K Nottingham, C H Yan, X Yang, H Si, J Coupar, Y Bian, T-F Cheng, C Allen,
P Arun, D Gius, L Dang, C Van Waes, and Z Chen. Aberrant IKKa and IKK 3 co-
operatively activate NF-B and induce EGFR/AP1 signaling to promote survival and
migration of head and neck cancer. Oncogene, 33(9):1135-1147, March 2013.
[111] Petra Obexer and Michael J Ausserlechner. X-linked inhibitor of apoptosis protein -
a critical death resistance regulator and therapeutic target for personalized cancer
therapy. Frontiers in oncology, 4:197, 2014.
[112] Wataru Okamoto, Isamu Okamoto, Tokuzo Arao, Kiyoko Kuwata, Erina Hatashita,
Haruka Yamaguchi, Kazuko Sakai, Kazuyoshi Yanagihara, Kazuto Nishio, and
Kazuhiko Nakagawa. Antitumor action of the MET tyrosine kinase inhibitor crizotinib
(PF-02341066) in gastric cancer positive for MET amplification. Molecular Cancer
Therapeutics, 11 (7):1557-1564, July 2012.
[113] N Oue, Y Naito, T Hayashi, M Takigahira, A Kawano-Nagatsuma, K Sentani,
N Sakamoto, H Zarni Oo, N Uraoka, K Yanagihara, A Ochiai, H Sasaki, and W Yasui.
Signal peptidase complex 18, encoded by SEC11 A, contributes to progression via
TGF-a secretion in gastric cancer. Oncogene, 33(30):3918-3926, September 2013.
[114] A Pallis, E Briasoulis, and H Linardou. Mechanisms of resistance to epidermal
growth factor receptor tyrosine kinase inhibitors in patients with advanced non-small-
cell lung cancer: clinical and molecular .... Current medicinal..., 2011.
[115] W Pao and J Chmielecki. Rational, biologically based treatment of EGFR-mutant
non-small-cell lung cancer. Nature Reviews Cancer, 2010.
142
[116] Evan 0 Paull, Daniel E Carlin, Mario Niepel, Peter K Sorger, David Haussler, and
Joshua M Stuart. Discovering causal pathways linking genomic events to transcrip-
tional states using Tied Diffusion Through Interacting Events (TieDIE). Bioinformat-
ics, 29(21):2757-2764, November 2013.
[117] Nicole A Perry, Marey Shriver, Marie G Mameza, Bryan Grabias, Eric Balzer, and
Aikaterini Kontrogianni-Konstantopoulos. Loss of giant obscurins promotes breast
epithelial cell survival through apoptotic resistance. FASEBjournal: officialpublica-
tion of the Federation of American Societies for Experimental Biology, 26(7):2764-
2775, July 2012.
[118] M S Pino, M Shrader, C H Baker, F Cognetti, H Q Xiong, J L Abbruzzese, and D J
McConkey. Transforming Growth Factor Expression Drives Constitutive Epidermal
Growth Factor Receptor Pathway Activation and Sensitivity to Gefitinib (Iressa) in
Human Pancreatic Cancer Cell Lines. Cancer Research, 66(7):3802-3812, April
2006.
[119] M E Pitulescu and R H Adams. Eph/ephrin molecules-a hub for signaling and en-
docytosis. Genes & development, 24(22):2480-2492, November 2010.
[120] Trevor J Pugh, Shyamal Dilhan Weeraratne, Tenley C Archer, Daniel A Pomer-
anz Krummel, Daniel Auclair, James Bochicchio, Mauricio 0 Carneiro, Scott L
Carter, Kristian Cibulskis, Rachel L Erlich, Heidi Greulich, Michael S Lawrence,
Niall J Lennon, Aaron McKenna, James Meldrim, Alex H Ramos, Michael G Ross,
Carsten Russ, Erica Shefler, Andrey Sivachenko, Brian Sogoloff, Petar Stojanov,
Pablo Tamayo, Jill P Mesirov, Vladimir Amani, Natalia Teider, Soma Sengupta,
Jessica Pierre Francois, Paul A Northcott, Michael D Taylor, Furong Yu, Gerald R
Crabtree, Amanda G Kautzman, Stacey B Gabriel, Gad Getz, Natalie Jager, David
T W Jones, Peter Lichter, Stefan M Pfister, Thomas M Roberts, Matthew Meyerson,
Scott L Pomeroy, and Yoon-Jae Cho. Medulloblastoma exome sequencing uncovers
subtype-specific somatic mutations. Nature, 488(7409):106-110, July 2012.
143
[121] J Rameseder, K Krismer, Y Dayma, T Ehrenberger, M K Hwang, E M Airoldi, S R
Floyd, and M B Yaffe. A Multivariate Computational Method to Analyze High-Content
RNAi Screening Data. Journal of Biomolecular Screening, 20(8):985-997, August
2015.
[122] Amy Rapkiewicz, Virginia Espina, Jo Anne Zujewski, Peter F. Lebowitz, Armando
Filie, Julia Wulfkuhle, Kevin Camphausen, Emanuel Petricoin, Lance Liotta, and An-
drea Abati. The Needle in the Haystack: Application of Breast Fine-needle Aspira-
tion Samples to Quantitative Protein Microarray Technology. Cancer, 111:1-12, May
2007.
[123] Dashnamoorthy Ravi, Amy M Wiles, Selvaraj Bhavani, Jianhua Ruan, Philip Leder,
and Alexander J R Bishop. A network of conserved damage survival pathways re-
vealed by a genomic RNAi screen. PLoS Genetics, 5(6):el 000527, June 2009.
[124] Sabry Razick, Antonio Mora, Katerina Michalickova, Paul Boddie, and Ian M Don-
aldson. iRefScape. A Cytoscape plug-in for visualization and data mining of protein
interaction data from iRef Index. BMC Bioinformatics, 12:388, 2011.
[125] Garrett W Rhyasen, Lyndsey Bolanos, Jing Fang, Andres Jerez, Mark Wunder-
lich, Carmela Rigolino, Lesley Mathews, Marc Ferrer, Noel Southall, Rajarshi Guha,
Jonathan Keller, Craig Thomas, Levi J Beverly, Agostino Cortelezzi, Esther N
Oliva, Maria Cuzzola, Jaroslaw P Maciejewski, James C Mulloy, and Daniel T Star-
czynowski. Targeting IRAK1 as a therapeutic approach for myelodysplastic syn-
drome. Cancer Cell, 24(1):90-104, July 2013.
[126] Jacob M Rowe. Reasons for optimism in the therapy of acute leukemia. Best practice
& research. Clinical haematology, 28(2-3):69-72, June 2015.
[127] Kole T Roybal, Levi J Rupp, Leonardo Morsut, Whitney J Walker, Krista A McNally,
Jason S Park, and Wendell A Lim. Precision Tumor Recognition by T Cells With
Combinatorial Antigen-Sensing Circuits. Cell, 164(4):770-779, February 2016.
144
[128] A Russo, T Franchina, GRR Ricciardi, and A Picone. A decade of EGFR inhibition
in EGFR-mutated non small cell lung cancer (NSCLC): Old successes and future
perspectives. Oncotarget, 2015.
[129] V S Sabbisetti, K Ito, C Wang, and L Yang. Novel assays for detection of urinary
KIM-1 in mouse models of kidney injury. Toxicological..., 2012.
[130] Claudia Scholl, Stefan Fr6hling, Ian F Dunn, Anna C Schinzel, David A Barbie,
So Young Kim, Serena J Silver, Pablo Tamayo, Raymond C Wadlow, Sridhar Ra-
maswamy, Konstanze DOhner, Lars Bullinger, Peter Sandy, Jesse S Boehm, David E
Root, Tyler Jacks, William C Hahn, and D Gary Gilliland. Synthetic Lethal Interaction
between Oncogenic KRAS Dependency and STK33 Suppression in Human Cancer
Cells. Cell, 137(5):821-834, May 2009.
[131] Atefeh Seghatoleslam, Farzaneh Bozorg-Ghalati, Ahmad Monabati, Mohsen
Nikseresht, and Ali Akbar Owji. UBE2Q1, as a Down Regulated Gene in Pedi-
atric Acute Lymphoblastic Leukemia. International journal of molecular and cellular
medicine, 3(2):95-101, 2014.
[132] Jing Shi. Pathogenetic mechanisms in gastric cancer. World Journal of Gastroen-
terology, 20(38):13804, 2014.
[133] K Shostak and A Chariot. EGFR and NF-rB: partners in cancer. Trends in molecular
medicine, 2015.
[134] Frederic D Sigoillot and Randall W King. Vigilance and validation: Keys to success
in RNAi screening. ACS Chemical Biology, 6(1):47-60, January 2011.
[135] Frederic D Sigoillot, Susan Lyman, Jeremy F Huckins, Britt Adamson, Eunah Chung,
Brian Quattrochi, and Randall W King. A bioinformatics method identifies prominent
off-targeted transcripts in RNAi screens. Nature Publishing Group, 9(4):363-366,
February 2012.
[136] R Somasundaram, M Prasad, and J Ungerbdck. Transcription factor networks in
B-cell differentiation link development to acute lymphoid leukemia. Blood, 2015.
145
[137] Simona Soverini, Caterina De Benedittis, Cristina Papayannidis, Stefania Paolini,
Claudia Venturi, Ilaria lacobucci, Mario Luppi, Paola Bresciani, Marzia Salvucci,
Domenico Russo, Simona Sica, Ester Orlandi, Tamara Intermesoli, Antonella
Gozzini, Massimiliano Bonifacio, Gian Matteo Rigolin, Fabrizio Pane, Michele Bac-
carani, Michele Cavo, and Giovanni Martinelli. Drug resistance and BCR-ABL
kinase domain mutations in Philadelphia chromosome-positive acute lymphoblas-
tic leukemia from the imatinib to the second-generation tyrosine kinase inhibitor
era: The main changes are in the type of mutations, but not in the fr. Cancer,
120(7):1002-1009, December 2013.
[138] R Steuer, J Kurths, C 0 Daub, J Weise, and J Selbig. The mutual information:
detecting and evaluating dependencies between variables. Bioinformatics, 18 Suppl
2:S231-40, 2002.
[139] Michael R Stratton, Peter J Campbell, and P Andrew Futreal. The cancer genome.
Nature, 458(7239):719-724, April 2009.
[140] A Subramanian and P Tamayo. Gene set enrichment analysis: a knowledge-based
approach for interpreting genome-wide expression profiles. In Proceedings of the
... 2005.
[141] Y Tanaka, N Tanaka, Y Saeki, K Tanaka, M Murakami, T Hirano, N Ishii, and K Sug-
amura. c-Cbl-Dependent Monoubiquitination and Lysosomal Degradation of gp130.
Molecular and Cellular Biology, 28(15):4805-4818, July 2008.
[142] J Taylor and S Woodcock. A Perspective on the Future of High-Throughput RNAi
Screening: Will CRISPR Cut Out the Competition or Can RNAi Help Guide the Way?
Journal of Biomolecular Screening, 20(8):1040-1051, August 2015.
[143] Andrea R Tentner, Michael J Lee, Gerry J Ostheimer, Leona D Samson, Douglas A
Lauffenburger, and Michael B Yaffe. Combined experimental and computational
analysis of DNA damage signaling reveals context-dependent roles for Erk in apop-
tosis and G1/S arrest after genotoxic stress. Molecular Systems Biology, 8:1-18,
January 2012.
146
[144] M Toyoshima, N Tanaka, J Aoki, Y Tanaka, K Murata, M Kyuuma, H Kobayashi,
N Ishii, N Yaegashi, and K Sugamura. Inhibition of Tumor Growth and Metastasis by
Depletion of Vesicular Sorting Protein Hrs: Its Regulatory Role on E-Cadherin and
-Catenin. Cancer Research, 67(11):5162-5171, June 2007.
[145] Linh M Tran, Bin Zhang, Zhan Zhang, Chunsheng Zhang, Tao Xie, John R Lamb,
Hongyue Dai, Eric E Schadt, and Jun Zhu. Inferring causal genomic alterations in
breast cancer using gene expression data. BMC systems biology, 5(1):121, August
2011.
[146] Nurcan Tuncbag, Alfredo Braunstein, Andrea Pagnani, Shao-shan Carol Huang,
Jennifer Chayes, Christian Borgs, Riccardo Zecchina, and Ernest Fraenkel. Simul-
taneous reconstruction of multiple signaling pathways via the prize-collecting steiner
forest problem. Journal of Computational Biology, 20(2):124-136, February 2013.
[147] J W Tyner, A M Jemal, M Thayer, B J Druker, and B H Chang. Targeting survivin and
p53 in pediatric acute lymphoblastic leukemia. Leukemia, 2012.
[148] Beth 0 Van Emburgh, Andrea Sartore-Bianchi, Federica Di Nicolantonio, Salvatore
Siena, and Alberto Bardelli. ScienceDirect. Molecular Oncology, 8(6):1084-1094,
September 2014.
[149] F Vandin, E Upfal, and B J Raphael. Algorithms for detecting significantly mutated
pathways in cancer. Research in Computational Molecular ... , 2010.
[150] Fabio Vandin, Eli Upfal, and Benjamin J Raphael. Algorithms for detecting signifi-
cantly mutated pathways in cancer. Journal of Computational Biology, 18(3):507-
522, March 2011.
[151] Marc Vidal, Michael E Cusick, and Albert-Lcszl6 Barabdsi. Interactome Networks
and Human Disease. Cell, 144(6):986-998, March 2011.
[152] J M Villaveces, R C Jimenez, P Porras, N del Toro, M Duesbury, M Dumousseau,
S Orchard, H Choi, P Ping, N C Zong, M Askenazi, B H Habermann, and H Her-
mjakob. Merging and scoring molecular interactions utilising existing community
147
standards: tools, use-cases and a case study. Database, 2015(0):bau131-bau131,
January 2015.
[153] B Vogelstein, N Papadopoulos, V E Velculescu, S Zhou, L A Diaz, and K W Kinzler.
Cancer Genome Landscapes. Science, 339(6127):1546-1558, March 2013.
[154] Joel Wagner, Alejandro Wolf-Yadlin, Mark Sevecka, Jennifer K Grenier, David E
Root, Douglas A Lauffenburger, and G MacBeath. Receptor Tyrosine Kinases Fall
Into Distinct Classes Based on Their Inferred Signaling Networks. Technical report,
March 2013.
[155] Zhen Ning Wee, Siti Maryam J M Yatim, Vera K Kohlbauer, Min Feng, Jian Yuan Goh,
Bao Yi, Puay Leng Lee, Songjing Zhang, Pan Pan Wang, Elgene Lim, Wai Leong
Tam, Yu Cai, Henrik J Ditzel, Dave S B Hoon, Ern Yu Tan, and Qiang Yu. IRAK1 is a
therapeutic target that drives breast cancer metastasis and resistance to paclitaxel.
Nature communications, 6:8746, 2015.
[156] Tim Weenink and Tom Ellis. Creation and characterization of component libraries for
synthetic biology. Methods in molecular biology (Clifton, N.J.), 1073:51-60, 2013.
[157] Scott Wilhelm, Christopher Carter, Mark Lynch, Timothy Lowinger, Jacques Dumas,
Roger A Smith, Brian Schwartz, Ronit Simantov, and Susan Kelley. Discovery and
development of sorafenib: a multikinase inhibitor for treating cancer. Nature reviews
Drug discovery, 5(10):835-844, October 2006.
[158] Lucia Wille, Melissa L Kemp, Peter Sandy, Christina L Lewis, and Douglas A Lauf-
fenburger. Epi-allelic Erk1 and Erk2 knockdown series for quantitative analysis of T
cell Erk regulation and IL-2 production. Molecular Immunology, 44(12):3085-3091,
May 2007.
[159] Richard T Williams, Willem den Besten, and Charles J Sherr. Cytokine-dependent
imatinib resistance in mouse BCR-ABL+, Arf-null lymphoblastic leukemia. Genes &
development, 21(18):2283-2287, September 2007.
148
[160] R.T. Williams, M.F. Roussel, and C.J. Sherr. Arf gene loss enhances oncogenicity
and limits imatinib response in mouse models of Bcr-Abl-induced acute lymphoblas-
tic leukemia. Proceedings of the National Academy of Sciences, 103(17):6688-
6693, 2006.
[161] Jennifer L Wilson, Michael T Hemann, Ernest Fraenkel, and Douglas A Lauffen-
burger. Integrated network analyses for functional genomic studies in cancer. Sem-
inars in Cancer Biology, 23(4):213-218, August 2013.
[162] Timothy R Wilson, Jane Fridlyand, Yibing Yan, Elicia Penuel, Luciana Burton, Emily
Chan, Jing Peng, Eva Lin, Yulei Wang, Jeff Sosman, Antoni Ribas, Jiang Li, John
Moffat, Daniel P Sutherlin, Hartmut Koeppen, Mark Merchant, Richard Neve, and
Jeff Settleman. Widespread potential for growth-factor-driven resistance to anti-
cancer kinase inhibitors. Nature, 487(7408):505-509, July 2012.
[163] Meng Xu, Yiqun Xie, Songshi Ni, and Hua Liu. The latest therapeutic strategies
after resistance to first generation epidermal growth factor receptor tyrosine kinase
inhibitors (EGFR TKIs) in patients with non-small cell lung cancer (NSCLC). Annals
of translational medicine, 3(7):96, May 2015.
[164] Michael B Yaffe. The scientific drunk and the lamppost: massive sequencing ef-
forts in cancer discovery and treatment. Science Signaling, 6(269):pel 3-pel 3, April
2013.
[165] Yinghai Ye, Xiaocong Zhou, Xiaoyang Li, Yinhe Tang, Yusheng Sun, and Jun Fang.
Inhibition of epidermal growth factor receptor signaling prohibits metastasis of gastric
cancer via downregulation of MMP7 and MMP13. Tumor Biology, 35(11):10891-
10896, August 2014.
[166] Esti Yeger-Lotem, Laura Riva, Linhui Julie Su, Aaron D Gitler, Anil G Cashikar,
Oliver D King, Pavan K Auluck, Melissa L Geddie, Julie S Valastyan, David R Karger,
Susan Lindquist, and Ernest Fraenkel. Bridging high-throughput genetic and tran-
scriptional data reveals cellular responses to alpha-synuclein toxicity. Nature Genet-
ics, 41(3):316-323, March 2009.
149
[167] Benjamin Yeung, King-Ching Ho, and Xiaolong Yang. WWP1 E3 Ligase Targets
LATS1 for Ubiquitin-Mediated Degradation in Breast Cancer Cells. PLoS ONE,
8(4):e61027, April 2013.
[168] Jesse G Zalatan, Michael E Lee, Ricardo Almeida, Luke A Gilbert, Evan H White-
head, Marie La Russa, Jordan C Tsai, Jonathan S Weissman, John E Dueber, Lei S
Qi, and Wendell A Lim. Engineering complex synthetic transcriptional programs with
CRISPR RNA scaffolds. Cell, 160(1-2):339-350, January 2015.
[169] Lars Zender, Wen Xue, Johannes Zuber, Camile P Semighini, Alexander Kras-
nitz, Beicong Ma, Peggy Zender, Stefan Kubicka, John M Luk, Peter Schirmacher,
W Richard McCombie, Michael Wigler, James Hicks, Gregory J Hannon, Scott Pow-
ers, and Scott W Lowe. An Oncogenomics-Based In Vivo RNAi Screen Identifies
Tumor Suppressors in Liver Cancer. Cell, 135(5):852-864, November 2008.
[170] Min Zhang, Gopesh Srivastava, and Liwei Lu. The pre-B cell receptor and its function
during B cell development. Cellular & molecular immunology, 1(2):89-94, April 2004.
[171] X D Zhang, F Santini, R Lacson, S D Marine, Q Wu, L Benetti, R Yang, A McCamp-
bell, J P Berger, D M Toolan, E M Stec, D J Holder, K A Soper, J F Heyse, and
M Ferrer. cSSMD: assessing collective activity for addressing off-target effects in
genome-scale RNA interference screens. Bioinformatics, 27(20):2775-2781, Octo-
ber2011.
[172] Boyang Zhao, Justin R Pritchard, Douglas A Lauffenburger, and Michael T Hemann.
Addressing genetic tumor heterogeneity through computationally predictive combi-
nation therapy. Cancer Discovery, 4(2):166-174, February 2014.
[173] Boyang Zhao, Joseph C Sedlak, Raja Srinivas, Pau Creixell, Justin R Pritchard,
Bruce Tidor, Douglas A Lauffenburger, and Michael T Hemann. Exploiting Temporal
Collateral Sensitivity in Tumor Clonal Evolution. Cell, 165(1):234-246, March 2016.
[174] Y Zhen, L Guanghui, and Z Xiefu. Knockdown of EGFR inhibits growth and invasion
of gastric cancer cells. Cancer Gene Therapy, 21(11):491-497, November 2014.
150
[175] D Zhou, W Zou, Y Wei, and Y Zhou. Downregulation of c-Met expression does not
enhance the sensitivity of gastric cancer cell line MKN-45 to gefitinib. Molecular...,
2015.
151