Network analyses for functional genomic screens in cancer

Network analyses for functional genomic screens incancer

by

Jennifer L. Wilson

B.S. Biomedical Engineering, University of Virginia, 2010

SUBMITTED TO THE DEPARTMENT OF BIOLOGICAL ENGINEERING IN PARTIALFULFILLMENT OF THE REQUIREMENTS FOR, THE DEGREE OF

Doctor of Philosophy

at the

MASSACHUSETTS INSTITUTE OF TECHNOLOGY

JUNE 2016

@Massachusetts Institute of Technology, 2016. All rights reserved.

Signature of Author:

MASSACHUSETTS INSTITUTEOF TECHNOLOGY

JUN 0 8 2016

LIBRARIESARCHIVE

Signature redacted(7 Department of Biological Engineering

Signature redactedCertified by:

May 3rd, 2016

Douglas Lauffenburger

Biological Engineering

Signature redactedAccepted by: -

Thesis Advisor

Forest M. White

Chair, Graduate Program Committee

This doctoral thesis has been examined by the following committee:

Ernest Fraenkel (Thesis Committee Chair), Biological Engineering, MIT

Michael Hemann, Department of Biology, MIT

Andreas Herrlich, Brigham and Women's, Harvard Institutes of Medicine

2

Network analyses for functional genomic screens in cancer

by

Jennifer L. Wilson

Submitted to the Department of Biological Engineering on May 3, 2016 in partialfulfillment of the requirements for the degree of Doctor of Philosophy

Abstract

Gene interference screens are a widely adopted and popular tool for uncovering genefunction but imperfections in the technology limit the power of these investigations. Thereare many completed and on-going RNAi investigations across a multitude of biological sys-tems because these experiments are scalable, cost-effective, and relatively easily adaptedto multiple experimental environments. The most influential disadvantage is that many ofthe individual reagents are non-specific and interfere with genes other than the intendedtarget. Efforts to improve limitations in RNAi have focused on statistical models and improv-ing reagents, yet have not explored using biological context to select gene targets. Thisthesis uses network modeling and data integration to provide context for gene interferencestudies, and demonstrates the utility of this approach in two systems:

Acute Lymphoblastic Leukemia (ALL) is a disease of undifferentiated B-cells that re-sults from accumulation of genetic lesions, yet we have an incomplete understanding of allgenes contributing to the disease and how they interact. To discover genetic mediators ofthis disease, we employ a genome-scale shRNA screen, and complement this data withdifferential mRNA expression and ChIP-seq data using network integration. The integratedmodel identifies processes not represented in any input set and predicts novel genes con-tributing to disease. We specifically validate the role of Wwpl as a tumor suppressor inALL.

Aberrant growth factor pathway activity drives cancer pathology and is the target ofmolecular cancer therapies. Specifically, the epidermal growth factor receptor (EFGR)pathway and its ligand, transforming growth factor alpha (TGFo) are clinically relevant togastric cancer. We use an shRNA screen and Prize Collecting Steiner Forest (PCSF) algo-rithm to discover the pathway regulating TGF shedding. This pathway identifies commonregulators of TGFa shedding and NFxB regulation, yet targeting NFxB and the EGFR path-way has thus far been unsuccessful in cancer therapies. Our network identifies IRAK1 asa viable path forward for modulating both TGFL and NFxB in gastric cancer.

Thesis Supervisor: Douglas A. LauffenburgerTitle: Ford Professor of Bioengineering

3

Acknowledgements

My years at MIT have been some of the best of my life so far. I have very much enjoyed

the enthusiasm of this environment, admired the quality of my peers, and appreciate the

challenges that come with research. It will be a bitter sweet departure.

Firstly, I am eternally grateful to my advisor Doug Lauffenburger. Without him, I may

not have pursued graduate school, let alone at MIT. Doug creates a collaborative, warm,

and enabling culture in the department and in his lab. His people skills are second to none,

and his ability to navigate difficult situations while supporting his students is impressive.

I thank you for your ability to diffuse whatever panic, frustration, or anxiety I brought to

our meetings, and for creating the relief to think critically about the science and tackle

challenging problems. Thank you for encouraging me to pursue this work and for making

it possible.

My thesis committee has also been instrumental to my success. My project was highly

collaborative, and I was fortunate to work closely with so many high caliber individuals.

Ernest Fraenkel - thank you for inviting me to join the lab and become a Fraenkelian.

The lab's expertise was invaluable to my work, and the immersion into the computational

environment was essential to all of my projects. Mike Hemann - thank you for including me

at group meetings, and opening your lab for experimental work. I will never forget the night

where you injected ~30 mice for me in under 30 minutes. Andreas Herrlich - Thank you for

taking a chance on my network modeling approaches, and allowing me to work on some

incredibly interesting biology with your group.

I would like to thank my colleagues and office mates in BE and beyond: Hsinhwa Le,

Sara Gosline, Simona Dalin, Eirini Kefaloyianni, Brian Joughin, Christina Harrison, Sarah

Schrier, Allison Claas, Melody Morris, Annelian Zweemer, Manu Kumar, Eleanor Fiedler,

Pete Bruno, Emanuel Kreidl, Nurcan Tuncbag, Chris Ng, Gabriela Pregernig, Anthony

Soltis, Susy Ramos, Jordan Bartlebaugh, and Aran Parillo. All of you have contributed to

making this work possible, and I would need an entire document to thank you individually.

Additionally, I want to express gratitude to my undergraduate student, Christina "Tia" Har-

rison. You've been an incredible companion and mentee through these projects and have

4

encouraged me to think about the work in a different light. I wish you the best in your future

endeavors as a scientist.

To my friends, Jenn Brophy, Byron Kwan, Marcus Parrish, and Erin Turowski, thank

you for regular games nights, hugs, and good food. My residents at McCormick Hall for

making my home life special; I have been humbled to know all of you and watch you

progress through MIT. I would like to thank the BE Communications Lab for the opportunity

to develop as a communication coach and develop as a professional scientist. Thank you

Jaime Goldstein, Scott Olesen, Diana Chien, and Georgia Lagoudas.

I would not be where I am now without the MIT Cycling Team and would like to thank

them separately as this group has been a definitive part of my time at MIT. Thank you for

accepting me as a novice rider, teaching me to ride a bike, and allowing me to develop

a new hobby. I cannot say enough about how teammates mitigate the stress of graduate

school and the uncertainty of research. I will miss training camps, morning coffee (and

pastry) rides, racing the team time trial in skinsuits and aero helmets, eating pie after

riding up big mountains, and having the opportunity to compete at collegiate nationals.

Thank you to my coaches, Nicole Freedman and Kolie Moore for supporting me through

training and teaching me how to race bikes. I'll look forward to seeing many of you for the

usual weekend rides that now increasing occur in the greater San Francisco area.

Lastly, this work wouldn't have been possible without my family. They have been sup-

portive of my unpredictable schedule, enthusiasm for all things bike racing, and have made

the highs and lows of this lifestyle bearable. I want to thank my mother, Carol Wilson, for

being an example of a nurturing, loving, career-driven, and successful female role model.

I want to thank my father, Charles Wilson, for being compassionate, understanding, and

for the many, many cards. I have kept them all. I would like to thank my younger sister,

Kelsey, for being the good daughter and for being a life companion; she knows me better

than most, and it takes a special person to have grown up with me and still want to hang

out. Thank you.

5

Contents

I Introduction 11

The role and consequences of biological pathways . . . . . . . . . . . . . . . . . 12

Network analyses for functional genomic screens . . . . . . . . . . . . . . . . . . 14

II Integrated Network Analyses for Functional Genomic Studies in Can-

cer 17

RNAi screens as a tool for cancer biology . . . . . . . . . . . . . . . . . . . . . . 18

RNAi screening challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

Biological networks add contextual value . . . . . . . . . . . . . . . . . . . . . . 21

Functional pathways explain variability among genes classified as "hits" . . . . . 23

Network integration is a viable tool for hypothesis generation . . . . . . . . . . . 24

Reciprocal engineering for future gene-interference investigations . . . . . . . . 26

III Pathway-based network modeling finds hidden genes in shRNA screen

for regulators of Acute Lymphoblastic Leukemia 31

Insight, innovation, integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

R esults . . . . . . . . . . . . . . . . . ... . . . . . . . . . . . . . . . . . . . . . . 34

A network-based data integration scheme . . . . . . . . . . . . . . . . . . . 34

shRNA screen and mRNA expression data identify distinct and incomplete

gene sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

Measuring histone activation for model specificity . . . . . . . . . . . . . . 40

SAMNet identifies a network of genes affecting ALL progression . . . . . . 41

Integrated approach finds genes connecting disparate data sets . . . . . . 45

Robustness metrics prioritize and predict genes relevant for ALL progression 47

Predicted pathway contains genes contributing to ALL progression specifi-

cally in vivo and genes affecting B-cell viability . . . . . . . . . . . . 48

6

Author Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

D iscussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

Materials and Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

Supplemental Figures and Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

IV Network modeling identifies common regulators of NFxB and TGFa

ectodomain shedding 62

Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

Results ..... ......................................... 67

A kinome/phosphatome screen for regulators of TGFu shedding . . . . . . 67

An shEnrich method for selecting gene candidates . . . . . . . . . . . . . . 68

PCSF identifies a TGFca shedding pathway . . . . . . . . . . . . . . . . . . 72

Robustness analysis prioritizes predicted pathway genes . . . . . . . . . . 75

NFxB activation and TGFa shedding share common regulators . . . . . . . 79


D iscussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81

M ethods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86

Supplemental Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90

Supplemental Figures and Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . 91

V Mutual information identifies gene cohorts predictive of response to

Dasatinib treatment in Acute Lymphoblastic Leukemia 99

Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100

M ethods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102

R esults . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105

An shRNA screen measures shRNA representation before and after treatmenti 05

A mutual information heuristic selects cohorts of co-acting shRNAs . . . . . 105

shRNA cohorts identify gene targets affecting disease progression and re-

sponse to treatment . . . . . . . . . . . . . . . . . . . . . . . . . . . 113

7

Biclustering benchmarks mutual information heuristic . . . . . . . . . . . . . 116

In vitro validation measures shRNA cohort depleting in response to treatment 116


Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119

VI Conclusions and Discussion 122

Characterization of cancer pathways will affect personalized and precision

medicine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125

Loss of function characterizations for forward engineering approaches . . . 126

8

List of Figures

1 Network Filtering Uses Functional Relations to Identify Candidate Targets. 28

2 Reciprocal Engineering Balances Forward and Reverse Engineering Princi-

ples . . . . . . . . . . . . . . ... . . . . . . . . . . . . . . . . . . . . . . . . 30

3 Constructing a network model from multiple 'omic measurements. . . . . . 35

4 ChIP-seq with valley-finding identifies regions for transcription-factor binding. 41

5 SAMNet identifies integrated network for ALL progression. . . . . . . . . . . 44

6 SAMNet selects transcription factors that explain genes with greatest differ-

ential expression. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

7 Validation shows in vitro and in vivo effects for Hgs and Wwpl. . . . . . . . 50

8 The whole network, unformatted as created by SAMNet. . . . . . . . . . . . 60

9 Percent reduction in mRNA after knockdown as measured via qPCR. . . . . 61

10 An shRNA screen measures kinase and phosphatase effects on TGFax shed-

ding ............................................. 68

11 shEnrich selects genes for consistency and effect size . . . . . . . . . . . . 71

12 PCSF identifies the TGFx shedding regulatory network . . . . . . . . . . . . 74

13 TAB1 increases and XIAP decreases TGFc at the surface . . . . . . . . . . 80

14 histograms for shEnrich (forward) . . . . . . . . . . . . . . . . . . . . . . . . 91

15 histograms for shEnrich (reverse) . . . . . . . . . . . . . . . . . . . . . . . . 92

16 Scatter plots for shEnrich method . . . . . . . . . . . . . . . . . . . . . . . . 93

17 Optimization of network node selection . . . . . . . . . . . . . . . . . . . . . 94

18 The full network selected by PCSF . . . . . . . . . . . . . . . . . . . . . . . 95

19 Centrality metrics test gene robustness. . . . . . . . . . . . . . . . . . . . . 96

20 Lightening plots for NFxB regulators. . . . . . . . . . . . . . . . . . . . . . 97

21 A longitudinal screen for mediators of ALL and response to Dasatinib . . . . 105

22 Hierarchical clustering of 'early' and 'late' fold-change data . . . . . . . . . 106

23 Intra-cluster distances for all mice . . . . . . . . . . . . . . . . . . . . . . . . 108

24 K-means clustering for all mice with k=9 . . . . . . . . . . . . . . . . . . . . 109

9

25 Mutual information between mice shows some mice are more similar than

others . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

26 Component mutual information scores show that few clusters have high mu-

tual inform ation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111

27 Hypergeometric testing of component comparisons. . . . . . . . . . . . . . 112

28 shRNA fold-changes for clusters with significant overlap . . . . . . . . . . . 115

29 M9cO and m3cl identify overlapping gene targets . . . . . . . . . . . . . . . 116

30 Biclustering of shRNAs across mice. . . . . . . . . . . . . . . . . . . . . . . 117

31 Branch 931 recapitulates trends in m9cO/m3cl clusters. . . . . . . . . . . . 118

32 In vitro competition assay shows shRNA cohort sensitizes ALL to Dasatinib

treatm ent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119

List of Tables

1 Top 1 % of depleting shRNAs in vitro and in vivo. . . . . . . . . . . . . . . . 37

2 GO enrichment of shRNA targets from the in vitro screen. . . . . . . . . . 38

3 Genes selected as top candidates from mRNA expression data. . . . . . . 39

4 GO enrichment for genes up-regulated in vivo. . . . . . . . . . . . . . . . . 40

5 GO enrichment of network gene identifies processes associated with B-cell

leukem ia. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

6 Aggregate scoring of top network nodes. . . . . . . . . . . . . . . . . . . . 48

7 Gene hits selected by shEnrich . . . . . . . . . . . . . . . . . . . . . . . . . 72

8 Experimental genes ranked by multiple robustness metrics . . . . . . . . . 77

9 Predicted genes ranked by multiple robustness metrics . . . . . . . . . . . 78

10 Comparison of alternative scoring metrics for redundant shRNAs. . . . . . 98

11 Gene-targeting shRNAs in significantly overlapping clusters . . . . . . . . . 114

12 Gene targets overlapping between m9c0 and m3cl . . . . . . . . . . . . . . 116

10

Part I

Introduction

Gene interference technology and the advent of high-throughput functional

genomics

The ability to modulate gene function is central to pursuing many biological hypotheses.

Before the 1990s, classical genetic approaches depended upon the creation of mutated

organisms (e.g. plants, mice, yeast) to test gene function. The time required to test gene

function was enormous compared to modern methods, and this greatly limited the number

of genes tested and the number of contexts in which one could interrogate gene func-

tion. Fire and Mello's discovery of RNA-interference (RNAi) [99] drastically changed this

landscape by removing the need to create individual mutant organisms. By mimicking the

dsRNAs of invading pathogens, the technology manipulated the cell's inherent ability to

recognize and silence dsRNA sequences. Cells can process these dsRNA sequences

in a way that selectively binds, and degrades existing mRNA sequences that are compli-

mentary to the original dsRNA. Coupled with oligonucleotide synthesis technology, it was

relatively fast and easy to make these dsRNA reagents. Further, because the dsRNAs se-

lectively degraded complimentary sequences, one could target any gene by designing the

dsRNA to match a gene of interest. Finally, RNAi enabled manipulation of gene function in

living systems ad hoc and could be applied to many conditions, treatments, etc.

Nowadays, RNA-interference (RNAi) is just one tool among many functional genomic

technologies that enable the modulation of gene function. Other technologies include TAL-

ENS, CRISPR, and many varieties of RNAi (shRNAs, siRNAs, esiRNAs). All of these

technologies aimed to improve gene knockdown and used different techniques to reach

this goal. shRNAs and siRNAs differ in the chemical structure of the dsRNA, esiRNAs

undergo an enzymatic pre-processing before introduction into the cell, and TALENS and

CRISPR rely on distinct enzymatic processes for gene processing. For biologists, this

expanded tool set was an exciting addition to the artillery available for interrogating the

11

cell. With an experimental assay for a phenotype of interest, scientists could use RNAi to

uncover the contribution of a gene of interest to many biological processes.

In the age of high-throughput biological sciences, gene-interference technologies have

moved to screening platforms where many genes can be tested in tandem. Several exam-

ples of RNAi screens highlight the types of biological questions enabled by this technology:

the Broad Achilles project used genome wide RNAi screens in over 100 cancer cell lines

to systematically identify essential genes, and compare these effects across multiple can-

cer types[89]. Meacham et al adapted a genome-scale screen to test genetic mediators

of lymphoma in vivo[92]. Further applications include using RNAi to identify genes that

affect homologous recombination [1], searching for genes that exhibit synthetic lethality

with KRAS mutations[130], and identifying genes that regulate ectodomain shedding[24].

It appeared that RNAi technology had transitioned classical genetics into the realm of high-

throughput biology.

The role and consequences of biological pathways

Pathways biology appreciates the interconnectedness of gene products and is an attrac-

tive application of RNAi technologies. Pathways thinking considers a gene as part of a

collection of genes rather than as a single entity. In contrast to pathways thinking, reduc-

tionist theories characterize individual parts within the cell - genes, proteins, enzymes - but

are unable to appreciate the full spectrum of their interactions. For a simplistic example,

if knock-down of a given gene yielded reduced cell growth, one could infer that the gene

was required for the cell viability phenotype. In practice, these individual parts function in

a vast network of interactions, and biological phenotypes such as growth result from the

concerted effort of many genes. The effect of concerted gene action is that sometimes

a knocking down a single gene doesn't show an effect on a phenotype such as growth.

Especially in the case of disease characterization, biologists have come to appreciate the

interconnectedness and complexity of combinations of genes that give rise to pathology.

In cancer, biologists have recognized that combinations of gene alterations give rise to dis-

ease. At its simplest level, the discrepancy between reductionist and pathway biology can

12

be explained through a sports analogy: auditioning an athlete separately from the team

will rarely predict whether or not the team will win the game; rather it is how that player in-

teracts with and supports the other players that determines victory. Given the relative ease

of adaptation and the ability to conduct experiments in high-throughput, RNAi appeared

an attractive technology for perturbing whole pathways and presented a viable avenue for

moving classical genetic approaches into the realm of pathways biology.

As we discover new pathways using functional genetics, these gene annotations are

added to a growing cohort of existing pathways. Databases catalogue the sets of molecu-

lar interactions that constitute pathways. Examples of these databases include KEGG, GO,

STRING, MINT, HMDB, and iRefWeb. Annotated entries include growth factor pathways,

biochemical pathways, drug pathways, and many others. As we explore more biological

systems - for instance targeted therapies in cancer - it becomes possible to identify path-

ways for nuanced aspects of biology - pathways for disease progression, drug response,

and relapse.

Biological pathways have real consequences for cancer biology and therapeutic de-

velopment. Molecular-based medicine aims to classify cancers based on molecular vul-

nerabilities and then target these vulnerabilities with therapeutic compounds. A striking

example is that of Herceptin therapy, which was designed to target HER2, a protein recep-

tor that is over-expressed in some types of cancers. The treatment gained attention as it

increased overall survival for breast cancer patients with over-expressed HER2 receptor.

Extensive profiling of cancers, at the cell and patient levels, have endeavored to find these

molecular vulnerabilities, design novel therapies, and bring these to the clinic. However,

these molecular vulnerabilities belong to cellular pathways that are interconnected and re-

dundant. As a result, these pathways cause even well-designed and specific chemical

inhibitors to fail in the clinic, or worse, cause patients to become resistant to the initial ther-

apy. Even in the case of the acclaimed Herceptin therapy, patients saw remissions and

metastases after treatment. Metaphorically, this process can be compared to plugging a

leak in a water-filled balloon, except that plugging the initial leak forces additional leaks or

the expansion of smaller, and previously unnoticed leaks. Thus, the field of cancer biology

has moved to investigating these pathways holistically to characterize all molecular vulner-

13

abilities to try and design the best therapeutic interventions. A more extensive discussion

about resistance to therapy and the importance of pathways is included in Chapter 3.

Network analyses for functional genomic screens

RNAi quickly became a well-developed and widely adopted technology in the cancer bi-

ology field. There are numerous screens employed to tackle many biological problems.

However, attractive as the technology us, RNAi is not without limitations. The largest crit-

icisms arise as a result of 'off-target' effects. Primary off-target effects occur when the

RNAi reagent (siRNAs, shRNAs, etc.) have sequence matches with genes other than the

original target. Certain amounts of similarity between the sequences in dsRNA could af-

fect multiple mRNAs, and lead researchers to misidentify genes responsible for a given

phenotype. These effects diminish the power of any individual RNAi in a screen, making it

difficult to reproduce and validate results.

Scientists in RNAi biology have explored many techniques to fix the reagent-based lim-

itations of RNAi. Innovation has focused on improving sequence matching, adding chem-

ical modifications to each RNAi reagent to promote specific binding, creating algorithms

to predict if and when sequence mismatching will occur, and deploying redundant RNAis

against the same gene in aggregate to ensure on-target effects. I refer to these innovations

as reagent-based because they exclusively consider the performance of the technology to

improve existing approaches. I more thoroughly review these limitations and current inno-

vations in Chapter 1.

However, knowing that these genes belong to cellular pathways became a novel av-

enue for interpreting and modeling the results from these screens. For instance, using an

orchestral analogy, it's possible to continue the performance if any individual player (ex-

cluding soloists) is unable to play; the instrument section can compensate for this loss and

the concert continues. In cellular systems, which have evolved to be robust to perturba-

tions (such as mutations or copy number alterations), this same redundancy exists. Some

genes exist within redundant pathways or sections, and thus, even if you knock them down,

it may not be possible to 'hear' the effects of their loss. Knowing that genes act in concert

14

shifts the emphasis from discovering single genes to discovering cohorts of genes respon-

sible for a phenotype. This perspective changes the interpretation of RNAi screen results

and improves the utility of existing RNAi investigations.

However, as mentioned, pathway annotations are incomplete, and so resolving cohorts

from RNAi data is not as simple as matching the genes to an existing directory of biology.

Instead, we can leverage other biological information to create plausible pathways which

best explain the data. There exists databases of protein-protein interactions; catalogues

of which molecular entities (proteins, enzymes, receptors, genes, etc.) affiliate with each

other. Metaphorically, this information is akin to knowing the available roadways of biology.

If we think about genes as intersections of these biological roadways, we can infer path-

ways by collecting interactions between genes that scored in RNAi experiments. In this

process, we will also computationally predict additional gene intersections that are rele-

vant to our phenotype. This type of pathway discovery is valuable for growing our existing

pathway annotations.

In this thesis I explore the nexus of RNAi screens, and pathways construction in the

context of cancer. I employ network biology techniques to construct plausible biological

roadmaps using RNAi datasets, and use targeted molecular biology techniques to validate

the predictions from these models. This work has two main contributions: an extension of

seminal work to find pathways from loss-of-function screens[1 66] where I explore methods

for testing the robustness of network predictions and gain practical insights into how to find

signal within RNAi datasets that have not been designed for downstream network model-

ing. I expand the value of this methodological extension by validating specific predictions

from these networks to uncover new aspects of biology relevant to cancer.

The first chapter of this thesis work explores the challenges and limitations of RNAi

screening technology. While the majority of work has focused on improving the technology

itself, few investigations have considered using biology to overcome the limitations of RNAi.

I review functional genomic studies, their limitations, and articulate how network integra-

tion and pathways approaches can improve these investigations. In the second section, I

identify a pathway governing Acute Lymphoblastic Leukemia (ALL) progression using net-

work integration. This project demonstrates the value of network-based approaches and

15

uncovers novel biology missed by the original shRNA screen. In the third section, we in-

vestigate signaling regulators of TGFa shedding, a cellular growth factor that is important

for multiple biological processes. We construct a pathway around shRNA data and predict

novel genes affecting TGFa shedding. The network predicts that modulators of the NFxB

pathway also belong to the TGFa shedding pathway, thus identifying the first connection

between the NFxB pro-survival pathway and the shedding of this growth factor. In the

final project, we use a pathways-based understanding to formulate a heuristic for finding

cohorts of shRNAs governing response to therapy. This project uses an abstraction of

mutual information with machine learning to associate the behavior of these regions with

regions that have known function. Lastly, the discussion focuses on the potential benefits

of having another framework for interpreting gene interference screens. As these experi-

ments have been employed in numerous settings, a pathways-based framework stands to

have impacts in many areas of biology. This pathway-discovery framework is also attrac-

tive for forward-engineering contexts where engineers aim to assemble and select genes

to program a specific cellular output.

16

Part I

Integrated Network Analyses for FunctionalGenomic Studies in Cancer

Discovery of gene products vital for function of a biological system, whether cells in

vitro or tissues/organs/animals in vivo, using gene-interference studies at both small and

large scales has become increasing popular because of the capability for RNAi methods

for manipulating multiple cellular components in either biased or unbiased manner. These

experiments aspire to identify high-confidence "hit" sets as putatively responsible for an

experimental phenotype and conceivably imaginable as drug "targets", although requiring

dedicated follow-up tests to buttress confidence in validity. Typically, the findings from the

initial "screen" study are compiled as list of individual genes whose knockdown yielded

significant alteration of biological system function, and the follow-up validation experiments

are considered in isolation. While there are encouraging successes along this avenue, the

realization that molecular components executing or governing cell/tissue phenotypic oper-

ation work in concert among myriad dynamic partners - directly and indirectly - motivates

appreciation for considering a more integrative perspective on interpretation of RNAi-based

functional genomic studies.

'Concerted' operation brings to mind an instrumental orchestra as one notional metaphor.

Proper generation of a musical program depends on the collective efforts of the players

involved, and deviations of any individual in pitch, volume, or timing can produce inap-

propriate sound and affect the overall orchestral performance as other individuals attempt

to adapt - or naturally produce further errors themselves. The sound of any particular

individual is rarely decisive, while an instrumental section can either mitigate or amplify

aberrations and other instrumental sections may aim to compensate. Accordingly, flawed

performance may be viewed as arising from identifiable "drivers" but sustained pathology

is more likely manifested by inability of the overall company to find an appropriate new bal-

ance via diverse modulations. And when aspiring for remediation, as the music proceeds

the original deviations no longer remain the most effective points of correction because

the propagated adaptations and compensations render a simple "re-set" difficult to achieve

17

dynamically.

We use this integrative, or 'concerted' point of view to inform our recommendations

about the investigation of cancer systems using RNAi. We offer that a most effective

framework uses multi-node pathways for gaining greatest insight about how a system is

dysregulated and for how that system might be remediated, and further that this point of

view is essential to RNAi analyses.

RNAi screens as a tool for cancer biology

Because cancer is a mutation-driven disease, many investigators have focused on using

genetic characterizations of cancers, yet there are often non-intuitive relationships between

gene features and disease phenotypes[1 1, 139, 91, 10]. Because cancer is a mutation-

driven disease, many investigators have focused on using genetic characterizations of

cancers, yet there are often non-intuitive relationships between gene features and disease

phenotypes [139, 153, 97, 71, 151]. Further, occurrence of drug resistance also does

not exhibit direct correlation with mutational status [91, 134]. For instance, in pediatric

medulloblastoma, systematic measurement of mutation-status and transcriptional profiling

revealed that mutation rates are not consistent across pediatric tumors[1 34, 120].

In our orchestral analogy, these investigations are akin to rating the quality of the com-

pany using each players' individual audition. This perspective lacks context and an un-

derstanding of the player's contribution to the orchestra's performance. To account for this

context, investigators have turned to RNA-mediated interference (RNAi) technologies to

fine tune a genetic player's ability. These tools manipulate genetic features at a functional

level and may be a complementary approach for studying the non-intuitive relationship be-

tween mutation, expression, and disease phenotype[97, 134, 65]just as a conductor may

better appreciate a musician's performance while playing with their section.

From an engineering perspective, gene-interference experiments are attractive exper-

iments for understanding cancer because of the opportunity to modulate gene function

under diverse potentially relevant conditions. Investigators have targeted single genes,

or multiple genes together, in large scale screens, as well as pathway specific studies

[97, 134]. When investigating genetic amplifications in liver cancer, one group simulta-

18

neously explored the role of these amplification events and the relative contribution of

the in vivo environment with a genome-scale RNAi screen [1, 169]. In this instance, and

many others, RNAi screens afford the opportunity to explore numerous targets simultane-

ously. The Achilles Project from the Broad Institute added another dimension to genome-

wide screens by drastically increasing the scale of their investigation and challenging the

reproducibility of shRNA libraries. They introduced a library of shRNAs into more than

100 established cancer cell lines and identified functional phenotypes that were common

and unique to each cell line [135, 94]. Researchers can take advantage of varying RNAi

reagent targeting efficacy to create titrations of gene interference, known as epi-allelic

series [135, 56]. This technique manipulates variation in mRNA expression to create a

gradient of disease phenotype. As expected, this approach created varying lymphoma

phenotypes which increased in disease severity as shRNA targeting efficiency against

p53 increased [56]. While we note here only a few investigations, RNAi experiments lend

themselves to the perturbation of many more parameters: multiple cues, multiple dosing

schemes, multiple environments, and multiple time points.

RNAi reagents hold significant advantages over other interference methods, such as

small molecule inhibitors. More specifically, siRNA offers the advantage of isoform speci-

ficity and enable fine-tuning of individual isoform expression and activity. For an investiga-

tion of T-cell Erk regulation, researchers used epi-allelic series with siRNAs against ERK1

and ERK2 to identify the role of these kinases on downstream IL-2 production [158]. The

epi-allelic series again showed a correlation between siRNA targeting efficiency and phe-

notype. In addition, the researchers identified that IL-2 production scaled with total ERK

activation and was not isoform specific. When siRNA-mediated effects on IL-2 to those

of a MEK inhibitor's effect, they also found that the gene-interference methods reduced

IL-2 production to a greater extent than chemical inhibitor dosing at an equivalent level of

ERK activation [139, 151, 158]. From this finding they inferred that ERK may also have a

role as a scaffold in downstream IL2 production; such a phenomenon may have not been

indicated using only either approach alone.

19

RNAi screening challenges

Gene interference screens are quickly becoming high-throughput, but they are poorly

suited to the well-accepted data analysis tools from other 'omics biology experiments.

Birmingham et al (2009) provide a thorough review of statistical adaptations for target

discovery from RNAi experiments [11, 91]. Generally, these adaptations consist of nor-

malization, and some means of 'top-hit' identification based on outstanding performance

relative to the remaining population. However, inconsistent reagent performance limits

statistical power and subsequent validation of these candidates often fails.

Variability in RNAi screening data can derive from a variety of factors, both off-target

and crosstalk events, and cause varying rates of false positives and false negatives in RNAi

screens, reducing confidence in final hit selection [97, 71, 120]. Off-target events are a non-

specific result of the experimental reagents, and may include the inadvertent knockdown

of additional transcripts through microRNA-like effects and the incomplete knockdown of

a protein target due to a protein half-life greater than the experimental timeline. Crosstalk

events, on the other hand, are a result of the biological response to RNAi perturbation as

opposed to the experimental reagents used. These events may include increased expres-

sion of transcripts normally repressed by microRNAs that have to compete for use of the

internal degradation machinery, and increased expression or activity of proteins which are

compensatory for the RNAi target [97, 134, 65].

Many approaches attempt to compensate for off-target effects. One method utilizes

multiple RNAi reagents against the same gene, and only considers the gene a hit if mul-

tiple reagents yield a similar phenotype [97, 134]. However, the ability to identify true

positives from redundant reagents is complicated by the targeted gene product's context

within the cell [134, 169]. For example, unintended effects are less likely for gene targets

with highly specific, non-redundant roles or those that exist in linear pathways. However, for

highly connected genes or those involved in multiple pathways, there is a greater chance

of biological crosstalk, and thus varied results between redundant siRNAs [134, 94].

A genome-wide screen for homologous recombination (HR) mediators highlights the

role of unintended effects and how redundant RNAi reagents may mislead results [1]. For

20

instance, 5 out of 10 RNAi reagents against the HIRIP3 gene decreased capacity for ho-

mologous recombination. While all reagents successfully reduced mRNA expression, res-

cue experiments with RNAi-resistant mRNA failed to recover homologous recombination

activity. Further, relative mRNA expression changes did not correlate with changes in ho-

mologous recombination.

Computational analyses of sequence similarity between siRNA reagents and non-targeted,

mRNA transcripts can predict off-target effects but is imperfect in all situations. Genome-

wide enrichment of seed sequences (GESS) analysis looks for enrichment of non-targeted

3' UTR regions in siRNA sense and antisense sequences [135]. In theory, these 3'UTR

matches identify unintended target genes and subsequent modulation of these genes

should recapitulate the phenotype erroneously assigned to the original siRNA. The method

successfully identifies genes enriched in active siRNAs for multiple screens, and can filter

primary screening hits to decrease the false positive rate [135].

In the previously mentioned screen for homologous recombination mediators, GESS

analysis identified a significant enrichment for RAD51 3' UTR in the high-scoring, non-

RAD51 siRNAs [1]. As expected, RAD51 mRNA was depleted in the presence of 4 of

the 7 siRNAs against HIRIP3 and RAD51 mRNA levels better correlated with changes in

the homologous recombination phenotype than HIRIP3 mRNA levels. Yet, only 1 of the 7

HIRIP3 siRNAs actually contained the seed match for the RAD51 UTR demonstrating that

additional cross-talk events may occur in the presence of the HIRIP3 siRNAs. While GESS

successfully identified RAD51 mRNA levels as the true predictor for homologous recombi-

nation, it was unable to fully explain the observed changes in this gene's transcription, as

all HIRIP3 siRNAs did not reduce RAD51.

Biological networks add contextual value

A network framework enables researchers to consider contextual influences on how path-

way components assimilate, integrate, and propagate knowledge in a manner that is dis-

tinct from the list model [9, 75]. More specifically, a network cluster, consisting of a coherent

group of functionally-related genetic regulators, may better explain an observed phenotype

21

where statistically-ranked lists are insufficient [10, 164]. Already, these network motifs for

target discovery have lead to better understanding of the non-intuitive relationships be-

tween genotype and disease phenotype and identification of better therapeutic targets

[10, 151].

Networks can be useful for predicting drug targets and also for selecting drug combina-

tions [75]. Their functional context provides rational selection of single targets as well

as combinatorial targets that could synergistically affect a desired phenotype because

they consider pathway membership [75]. Where toxicity had previously constrained the

selection of combination therapies, researchers may now instead prioritize combinations

based on specificity to controlling a particular phenotype. Understanding macromolecu-

lar pathways led to conclusions about synergies between doxorubicin and TNF-alpha as

chemotherapeutic agents [143]. The authors showed that administering TNF-alpha as an

adjuvant to doxorubicin treatment increased apoptotic cell death in the presence of low-

levels of DNA damage by using an integrated network approach. Without pathway and

network-level information, this non-intuitive relationship may have been missed.

Network interpretation has already added depth to non-intuitive instances of drug re-

sistance. Recently, Wilson et al (2012) showed that growth-factors within the tumor mi-

croenvironment may increase resistance to kinase inhibitor therapy [162]. While this might

seem counterintuitive in a linear-process formalism, considering the cell's underlying sig-

naling network make these results less surprising. Wagner et al used network inference

methods to create interaction networks by combining systematic RNAi-perturbation data

with phosphorylation information at multiple time points for six receptor-tyrosine kinases

(RTKs) (EGFR, FGFR1,c-Met,IGF-1R,NTRK2, and PDGFR-beta) [154]. From the result-

ing networks, they clustered each RTK network, identifying core signaling components

shared between all RTKs as well as cluster-specific modules. They postulated that mod-

ules shared between RTKs within the same cluster could explain resistance to targeted

RTK therapy. More specifically, if RTKs of a particular class shared signaling components

and affected the same downstream phenotypes, then these within-cluster RTKs could com-

pensate for chemical inhibition by actuating the original downstream phenotype[1 54]. They

demonstrated this compensation within the EGFR/c-Met/FGFR1 cluster by showing corre-

22

lation of receptor expression with resistance to therapies targeted to other within-cluster

RTKs.

Functional pathways explain variability among genes classified as "hits"

A meta-analysis of nine RNAi screens for HIV-replication factors used functional enrich-

ment to explain discrepancies across high-scoring targets from each screen[14]. When

they investigated the percentage of scoring targets across three screens, this overlap only

included a modest 3-6% of gene targets. They show that variability between screens, vari-

ability between experimental timing and toxicity thresholds all contributed to the minimal

overlap among these screens. However, when they looked at gene membership in GO

ontology categories, they found much greater overlap in the enrichment of GO categories

across screens than in the individual gene targets. This finding indicates that a more global,

functional filter is useful for identifying true positives from highly variable RNAi screens.

Additionally, using functional pathway membership increased experimental validation

rates in an RNAi screen for DNA-damage mediators [123]. The authors screened all

protein-coding genes in Drosophila melanogaster and compared top hits to an analogous

screen in Saccharomyces cervisiae, but did not see a statistically significant overlap be-

tween screening targets [123]. They expanded their target list by identifying functional

pathways that contained gene targets from their screen in Drosophila melanogaster and

were able to find enrichment of targets from the Saccharomyces cervisiae screen within

this set. Further, using this expanded set for validation experiments identified false nega-

tives from the original screen. These results reaffirm the utility of filtering data by pathway

membership to identify true positives and also using pathway membership as a search

space for false negatives.

In a pioneering study, Jones et al (2008) demonstrated the significance of using path-

way context in a patient setting [69]. They performed a global analysis of mutations in

pancreatic cancers, but found little overlap in the specific mutations across patients. How-

ever, they instead found a core set of signaling pathways that consistently enriched for

patient-specific mutations. They postulate that targeting the physiological consequences

23

of these pathways instead of the individual mutations would improve therapeutic devel-

opment [69]. If we consider the discrepancy between RNAi reagent performance across

replicates as similar to the mutational differences between patients, these findings present

more motivation for using a pathway-centered approach for functional genomic studies.

Network integration is a viable tool for hypothesis generation

Given the importance of understanding the functional context of a genetic alteration, net-

work methods are a useful computational tool. Additionally, these tools enable the incor-

poration of multiple data sets and experiments to create more holistic interpretations of

biological systems. Because of the availability of many experimental datasets through var-

ious databases, data integration will be influential in future investigations [75]. Here, we

review a few integrated network approaches and highlight how networks have improved

the interpretation of biological investigations and affected further hypothesis generation.

In metastatic breast cancer, integrating copy-number variation (CNV) and gene expres-

sion data across multiple samples accurately predicted novel drivers of disease [145]. The

authors used a refined method for first identifying recurrent CNVs from gene expression

data and then used a Bayesian methodology to create a network of mutated genes. From

this network, they found master regulators by selecting genes that had a high authority

score. Mathematically, the authority score identified genes with a statistically significant

number of outgoing connections as compared to the mean number of connections. To test

their hypotheses about mediators for breast cancer, they performed an siRNA screen test-

ing the effect of gene interference on cell viability. Of the gene targets that had the greatest

effect on cell viability, they found a significant enrichment of their high-authority regulators

[145]. This finding demonstrates that networks can synchronize disparate datasets and

that network properties are viable characterizations for finding novel regulators.

Utilizing a data-integration approach, Huang et al (2013) constructed an EGFRvIII-

specific signaling network for glioblastoma multiforme (GBM) by incorporating proteomic

and transcriptional data[62]. More interestingly, they used this network to identify, test and

validate novel therapies. Their final networks consisted of a high-confidence set of experi-

24

mental data points as well as gene targets not included in the original set, but rather added

through known protein-protein interactions. From this network set, they systematically ex-

panded targets for therapeutic intervention by identifying targets with known chemical in-

hibitors and ranking them based on their proximity to the core functional network. From this

target set, they identified compounds in clinical trial with known effects on cancer systems

and chemical inhibitors not yet tested for GBM [62].

In the interferon-stimulatory DNA (ISD) sensing pathway, an integrated network ap-

proach proved successful for identifying novel regulators of this process and for testing

new therapeutics[83]. In this analysis, the authors created an interaction network of po-

tential ISD regulators by combining direct interacting partners of known ISD pathway com-

ponents with interacting pairs from their own quantitative mass-spectrometry experiments.

Perturbation of this compendium network with RNAi reagents identified Abcf1, Cdc37, ad

Ptpnl as effectors of the ISD-sensing response to dsDNA. In this situation, curating and

expanding interaction information around known pathway components successfully identi-

fied novel genes for the ISD response. The authors also measured ISD-pathway induction

after treatment with chemical inhibitors against their novel genes and demonstrated a re-

duction in deleterious interferon production. These results show that integration is useful

for developing new hypotheses for therapeutic development and supports the Jones et

al perspective concerning efficacy of designing therapeutic options around downstream

pathway physiology[69].

Data integration within a network framework also added depth to understanding metabolic

disorders using SNP and genetic linkage data [19]. In this investigation, researchers

created a network where interactions depended upon significant co-expression and link-

age data between genes. Using optimization, they selected highly connected gene sub-

modules and then used these modules for further hypothesis generation. Many sub-

modules were enriched for genetic features that were significantly associated with disease

traits (fat mass, weight, plasma insulin levels etc) and one sub-module was significantly en-

riched for genetic features with significant correlation to all disease traits. They expanded

this module, and created a macrophage-driven superior module from which they selected

and further perturbed genetic loci. From these perturbations, they were able to demon-

25

strate the sub-network's contribution to the observed disease traits and classify genetic

features previously not associated with metabolic traits. This collective understanding of

genetic interactions ultimately created a more comprehensive view of their dataset and

novel hypotheses distinct from those that would be identified from looking at gene-trait

associations independently.

Recently, our own efforts investigating in vivo mediators of Acute Lymphoblastic Leukemia

(ALL) have employed a data integration approach to ascertain GO biological function en-

richment rather than to looking at screening targets independently (unpublished). A B-cell

model of ALL was infected with a genome-scale shRNA library and after infection, cells

were plated in vitro or tail-vein injected into syngeneic recipient mice. After disease de-

veloped, cells were harvested and sequenced for final shRNA representation. Simulta-

neous Analysis of Multiple Networks (SAMNet), is a flow-based formalism which relates

screening hits to downstream expression data using the interactome as a guide for pos-

sible connections among the data [32]. The method generated a network enriched for

functional pathways, such as developmental processes, that are known to play a role in

ALL - whereas these were not identified when analyzing experimental data independently.

This enrichment increases confidence that RNAi hits identified within the network are true

positives. Further, SAMNet adds targets, or nodes, to the network that were not present in

the original high-scoring target set, making it possible to hypothesize about potential false

negatives in the data. In these examples, data analysis in isolation was insufficient for

discovering novel regulators and targets for therapeutic intervention. Instead, a concerted

network approach, integrating multiple data sets or experimental results, improved target

identification and created testable hypotheses for therapeutic development.

Reciprocal engineering for future gene-interference investigations

Understanding and modulating cancer requires a concerted understanding of gene func-

tion and appreciation for each gene's pathway membership. Much like an orchestra, the

performance of the group depends on the collective group effort rather than the ability of

any one player. Auditioning players individually is important for assessing skills and musi-

26

cality, yet their full potential depends on their ability to contribute to the sound of the group.

Gene-interference studies are the experimental parallel of 'auditioning', yet their interpreta-

tion is limited if each player is considered in isolation. Instead, the conductor must observe

the player within his section to see if deficiencies affect the overall sound or if the sound

of his peers compensate for his weaknesses. In the same way, building biological net-

works using RNAi experimental data puts the player in his section, and uses his pathway

membership to assess his effect on the sound of the orchestra.

'Network Filtering' techniques will increasingly become a secondary post-processing

step to statistical analyses for gene-interference studies. We have conceptualized how

identifying network clusters may complement existing statistical approaches in 1. The net-

work provides context for any RNAi reagent's performance and leverages a gene target's

pathway membership to identify true and false positives. This approach assumes that if

multiple RNAi target genes enrich within a pathway, they are more likely contributing to

an experimental phenotype. Similarly, if an RNAi-targeted gene stands out experimentally,

but does not have interactions with other targets, this target may be a false positive. In this

mindset, networks offer a filter for removing false positives from entering the final 'hit-list'.

Where RNAi study results are affected by unintended effects, noise-reduction techniques

will have a large impact on data interpretation.

However, there is still much to be learned about gene pathways and their modulation for

cancer systems. While pathways provide context for gene function, pathways themselves

are context specific and require systematic perturbation[1 53]. For instance, melanoma pa-

tients with BRAF mutations have drastically different sensitivity to therapy than colorectal

cancer patients with similar mutations[1 53]. As such, many considerations remain about

interpreting pathways and using them to refine experimental evidence. Conceptualizing

pathways instead of individual molecules changes the hypotheses generated as well as

the experimental validations that follow[9]. For instance, because of the increased level

of interconnectedness, disease modules are computationally identifiable by graph theory

parameters such as clustering coefficients, and shorter path lengths. Further, designing

validation experiments around these modules may provide novel insight into understand-

ing disease and also improve correlation between predicted perturbation and experimen-

27

1. High-throughput Measurements

Y V2. Statistical FiltersSelect with Significance Thres?

< 0.05

3. Experimental Validatior

Traditional'TargetSelection

NetworkMotif BasedTargetSelection2a. Network FilterSelect Based on) Relationshipsto Other Experimental Data

Possible Methods:Correlation MethodsInteractome ScaffoldingPathway RecoveryNetwork Flow

Figure 1: Network Filtering Uses Functional Relations to Identify Candidate Targets.Traditional methods for target selection collect high-throughput data, then use statisticalmethods to filter data for top candidates and further experimental validation. A network fil-tering approach would collect high-throughput data, perform statistical analyses, but wouldadd an additional network-construction step to find functional consensus among top targetsbefore identifying candidate regulators or predicted pharmacological targets and moving toexperimental validation.

28

n

tal phenotype[151, 154]. This conceptualization further requires consideration of network

properties instead of only experimental phenotype and may instead prioritize candidates

based on number of connections (network degree) or enrichment against randomized net-

works. Yet, it is unclear which graphical parameters are most predictive of false positives or

false negatives and whether these parameters are consistently predictive across multiple

systems.

As network motif discovery becomes more common, we envision an accompanying

shift in approach to these methods. This shift incorporates reverse engineering principles

through a desire to find models that explain system behaviors as well as forward engineer-

ing principles in which the investigator designs the system to control a particular pheno-

type. We propose that future efforts to construct and manipulate cancer networks will use

an 'Reciprocal Engineering' approach (Figure 2). In this 'Reciprocal Engineering' mindset,

researchers balance motivation to explain a system with motivation to design a controllable

system. Already researchers have alluded to the value of 'Integrated Interactomics' and

how future data integration which balances motivations for understanding and prediction,

changes our investigation of cancer systems[1 0].

Recent commentaries in the area underscore the potential impact of this paradigm shift.

These articles concur with the notion that signaling pathways drive cancer progression, and

are a rich source of targets for therapeutic development [153, 164]. Both biological network

models and gene-interference studies are cutting edge techniques that have greatly added

to our understanding of cancer systems. As such, future endeavors merging these growing

fields will enhance understanding of cancer systems and improve ability to manipulate a

complicated disease.

29

'Adivabng'

r- -A.w nn ..1ritfty"

Different gene targets showdifferential effects ondownstream phenotype,

0-

o 00

Inference Methods and NetworkConstruction IdentifyRelationships BetweenActivating and InhibitoryTargets.

Experimental results reveal howsome targets arc interrelated basedon treatment or condition

0-0O 0

The interactome provides a scaffoldfor connecting experimental resultsand identifying the pathway contextfor targets. This balances discoveryfor mechanism and desire formanipulation.

/010

00G0o\0 O

OO

Using identified and relevantpathways, select the targetslikely to affect the down-stream phenotype.

Wild-type

}Sigle Tre.-nwiilcombination

Use pathway modulationto manipulate the downstreamphenotype.

Mechanism-Motivated Design-Motivated

Figure 2: Reciprocal Engineering Balances Forward andReciprocal Engineering balances the mechanism-driven

Reverse Engineering Principles.paradigm of reverse engineering

with the design-motivated paradigm of forward engineering. In the graphical representa-tion, reverse engineering is exemplified by the desire to uncover mechanisms explainingrelationships among the data - for example using correlation to infer relationships amongexpression. On the other hand, forward engineering is using a known network to ma-nipulate the system - in this example, inhibiting node functions to alter downstream geneactivation. Blending both paradigms, Reciprocal Engineering balances relationship discov-ery with design motives to create networks that infer biological function as well as discoverpoints for manipulation.

30

I

Part III

Pathway-based network modeling findshidden genes in shRNA screen forregulators of Acute LymphoblasticLeukemia

Abstract

Data integration stand to improve interpretation of RNAi screens which, as a re-sult of off-target effects, typically yield numerous gene hits, of which, only a few vali-date. These off-target effects can result from seed matches to unintended gene targets(reagent-based) or cellular pathways which can compensate for gene perturbations(biology-based). We focus on the biology-based effects and use network modelingtools to discover pathways de novo around RNAi hits. By looking at hits in a functionalcontext, we can uncover novel biology not identified from any individual 'omics mea-surement. We leverage multiple 'omic measurements using the Simultaneous Analy-sis of Multiple Networks (SAMNet) computational framework to model a genome scaleshRNA screen investigating Acute Lymphoblastic Leukemia (ALL) progression in vivo.Our network model is enriched for cellular processes associated with hematapoeticdifferentiation and homeostasis even though none of the individual 'omic sets showedthis enrichment. The model identifies genes associated with the TGF-beta pathwayand predicts a role in ALL progression for many genes without functional annotation.We further experimentally validate the hidden genes - Wwpl, a ubiquitin ligase, andHgs, a multi-vesicular body associated protein - for their role in ALL progression. OurALL pathway model includes genes with roles in multiple leukemias and roles in hema-tological development. We identify a tumor suppressor role for Wwpl in ALL progres-sion. This work demonstrates that network integration approaches can compensate foroff-target effects, and that these methods can uncover novel biology retroactively onexisting screening data. We anticipate that this framework will be valuable to multiplefunctional genomic technologies -siRNA, shRNA, and CRISPR - generally, and willimprove the utility of functional genomic studies.

31

Insight, innovation, integration

This work integrates multiple 'omic data sets to better understand leukemia pathology. The

most striking finding in this work is that we can identify biological annotations from these

data when these datasets are integrated in a network model even though these annotations

were not present in either dataset alone. Further, we use RNAi screening data as the

model's foundation; this screening technology is a popular tool for probing gene function

that is criticized for noise and off-target effects. Though, with integration, we derive novel

insights in spite of this noise and are able to test our theoretical findings through dedicated

validation.

Introduction

Functional genomic screens are a powerful tool for systematically probing gene function in

the context of many biological systems[97, 142, 38, 98, 161, 12]. Shortly after its adapta-

tion to experimental work, RNAi gained popularity as the technology is relatively easily and

quickly adaptable to multiple biological systems[142, 38, 12]. However, off-target effects

(OTEs) which can result from seed matches between the individual RNAi and unintended

genes[65] are a widely criticized limitation of this technology. These effects complicate

validation and pursuit of further hypotheses[142, 71, 135] and thus, RNAi screens require

analysis methods that can more efficiently identify true targets for validation[1 42, 65]. Many

studies have focused on improving the stability and specificity of the RNAi reagents them-

selves[1 0], or have developed algorithms for predicting real effects by considering unin-

tended seed matches[1 35, 122, 13]. Other studies have considered alternative platforms,

such as TALENS or CRISPR, for probing gene function[12]. CRISPR systems can ex-

hibit stronger loss of function phenotypes than RNAi and seem a promising alternative[48].

These systems enable precision genome engineering by more efficiently blocking gene

function, and have the added capacity of being able to activate and inactivate genes[48].

CRISPR systems still suffer from OTEs and are sensitive to mismatches in the 12 bases

proximal to the guide strand[14]. Overall, these approaches only address the technical

challenges of gene interference and do not consider the biological consequences.

32

Few investigations have considered how a cell may compensate for a gene perturba-

tion. An RNAi reagent may perfectly block an mRNA transcript, but a redundant protein

may compensate for the loss of a particular gene or the targeted protein may be stable

beyond the duration of knockdown, leading to false negatives. Further reasons for false

negatives include the inability for RNAi to sufficiently diminish expression of highly stable

proteins (e.g. enzymes which may retain function after knockdown) or the use of small, tar-

geted libraries. To compensate for the possibility of false positives and false negatives, one

group used GO analysis to find consensus among three different siRNA screens for HIV

replication factors. The group saw little overlap between the specific hits from each screen,

but saw that all three screens had top hits enriched for the same GO functions[14]. Given

the propensity for OTEs, it is not surprising that three independent screens identified dif-

ferent candidate hits, though it is striking that the individual hits fall in similar pathways[1 4].

This work foreshadows the value of using pathways to provide context around any individ-

ual hit. Though, our current definitions of cellular pathways are incomplete and there is a

real need for discovering pathways and attributing new genes to existing pathways.

Here we pursued an integrated, pathway-based approach to identify in vivo specific reg-

ulators of Acute Lymphoblastic Leukemia (ALL) progression. Development of treatments

for Acute Lymphoblastic Leukemia (ALL) has had mixed success and improvements in

patient overall survival is still unchanging[1 26]. For childhood ALL patients, 10% suffer

remissions and further, treatments have high toxicity[64]. We already know that the tumor

microenvironment affects how cancers progress and respond to therapies in a complex

manner. Paracrine signaling in the bone-marrow microenvironment can confer resistance

to therapy in myeloma[58] and local cytokines can promote cancer development in the con-

text of specific genetic lesions[47]. More thorough disease characterization in the native

microenvironment would facilitate development of new treatment strategies. Further, ALL

is just one of many types of cancers which arises from incomplete hematopoieitic differ-

entiation. Given the similar origin of these disesases, it is possible that we can learn and

re-purpose molecular studies from other hematopoeitc cancers to accelerate development

for ALL. To explore the role of the microenvironment in determining genetic mediators of

ALL, we have used a genome-wide shRNA screen for mediators of pre-B-cell ALL progres-

33

sion in vivo, and demonstrated the ability to identify micro-environment-specific genes[93].

Recognizing the likelihood of significant OTEs and acknowledging the significant mer-

its of pathways analysis methods, we developed a network model to discover missed tar-

gets, or predicted genes, from the initial shRNA screen. This network model builds on the

pathway-based approach described earlier by incorporating diverse experimental datasets

to better model the genes contributing to ALL progression. We assume that many pathway

annotations are incomplete, and that modeling multiple experimental datasets will uncover

novel pathways. We construct our network model using shRNA, ChlPseq, and mRNA

expression data and use this model to understand and validate features of the in vivo sys-

tem. We experimentally validate novel roles for Hgs and Wwpl: Hgs is a gene that is

generally deleterious to B-cell ALL viability, and Wwpl is an in vivo specific regulator of

disease progression. We perform this analysis using screeing data that was not designed

for further computational modeling in mind - the screen did not contain redundant shRNAs

or non-targeting controls. Taken together these results demonstrate the ability of network

models to select candidate targets from an shRNA screen and discover novel pathways

from disparate datasets. Biologically, the model makes specific predictions about gene

targets that affect ALL progression by affecting the tumor microenvironment, illuminating

multiple pathways that are relevant for therapeutic development in ALL.

Results

A network-based data integration scheme

To identify pathways that mediate ALL progression, using multiple experimental data- we

introduce a network-based approach, described in Figure 3. Conceptually, this approach

uses published protein-protein interaction data (Figure 3A) alongside computational de-

rived protein-DNA interactions to construct a set of all possible interactions that can relate

experimental measurements from shRNA screening and mRNA expression. This larger

network will then be reduced (Figure 3B, C) to identify biological pathways, either known

or unknown, that are implicated by the experimental data, described below.

34

A 0

B C

Figure 3: Constructing a network model from multiple 'omic measurements.A. We start with a probabilistic interactome that includes protein-protein interactions scoredby the confidence of their interaction. This confidence score reflects the strength of evi-dence across multiple interaction databases and this score constrains the edge's capac-ity within our flow-based model. Higher confidence leads to higher capacity. Some ofthese proteins are transcription factors (triangles). We complement these edges withtranscription-factor (triangles) to DNA (octagons) binding interactions. We predict theseinteractions and their edge probabilities by measuring active and open chromatin via ChIP-seq and looking for enrichment of transcription factor binding motifs. Conceptually, this isour available road map for creating pathways where the capacities are akin to speed limits.B. We connect an artificial source node to all proteins that have corresponding shRNAsthat were considered hits in the screen. These edge capacities reflect the strength of theshRNA effect. In our model, these edges reflect how strongly an shRNA depletes frominput to morbidity. We connect an artificial sink node to differentially expressed mRNAs.These edges reflect the fold-change in expression. The algorithm introduces flow into thenetwork and looks for an optimal route from the source to the sink, selecting edges basedon available capacity. C. The final path through the interactome becomes the de novopathway. This pathway may or may not include all of the original inputs (e.g. differentiallyexpressed mRNA or depleted shRNAs). Further, SAMNet allows the simultaneous con-struction of pathways for multiple conditions. In our investigation we treated the parallelin vitro and in vivo screens as separate conditions. D. Screening design interrogates invivo specific regulators of ALL progression. A genome-scale library was introduced to ALLcells in vitro. Representative samples were either maintained in culture or transplantedinto mouse models. At time of morbidity, blood and culture samples were re-sequenced tomeasure shRNA representation.

35

shRNA screen and mRNA expression data identify distinct and incomplete gene sets

We collected previously-published shRNA screens and mRNA expression data from a

mouse model of ALL, one in vitro and the other in vivo [93] (Figure 3D). Together this

data represents the direct and indirect effects of RNA knockdown in both environments.

To identify the direct effects of shRNA screens, we calculated a fold-change in shRNA

representation, comparing sequencing reads for each shRNA at input (time of transplant)

and post disease burden (at morbidity). We ranked genes based on the greatest depletion

from input to post disease burden and considered the top 1% (84/87 genes for the in

vivo/in vitro screens) for further investigation (Table 1). GO enrichment of the ranked list,

using GOrilla [29, 30] of targets from the in vivo dataset did not identify any enrichment

of GO terms. GO enrichment of the ranked in vitro targets found enrichment of cellular

homeostasis, and cation regulation; however, these terms were enriched merely based

on the presence of TMEM165, KCNA5, and CEBPA, without contribution from any other

targets in the ranking (Table 2). Overall, we found little functional enrichment or information

from shRNA data alone, suggesting that we had an incomplete picture of genes regulating

ALL progression.

36

In vivo In vitroGene Abs(f.c.)I Gene Abs(f.c.) I Gene Abs(f.c.) I Gene Abs(f.c.)ASRGL1FNDC5MRPL37ATP6V1CI1DHX37NRG2ATP1OBHOXD3HDHD1PCBP4DHDDSSERF1ASLC29A2STK32BPOLOGTF3C5SPC24IPo11lGMPSUBE2HZNF416CASP14SCDMAPRE3PLEKHA5SMARCD1DENND1CTMEM156TMEM176ALPPR5SRP72ATP6AP2FOXO6WISP3LCE3CZBTB4RCFXYD6UBE2L6SNTB1TIMD4DVL1COLEC12DHRS7CCTSCCECR5IOCHMTRF1L

TGS1

TIGD5SPATA13

10.81810.2339.6709.4789.2449.1558.9408.7658.7608.7448.6338.4748.3908.2968.2418.1648.1408.1028.1028.0738.0258.0148.0007.8987.8797.8567.8407.8217.8007.7517.7047.5957.5887.5647.5627.5127.5127.5127.5057.4987.4807.4527.4407.4407.4357.4257.4257.3667.3377.3277.306

Table 1: Top 1% of depleting shRNAs in vitro and in vivo.We calculated the absolute values of fold-changes for all genes that depleted from inputto end point. We mapped all genes to their human homologues for use with SAMNet.Fold-changes were calculated using DESeq2.

37

CPSF6NDUFS1ARMCX6OR13C4BCMO1SF3A1DIAPH3PGLYRP2ABHD12BDRD1

ODF2POGZ

TDRD7

C9orf69

ALKBH2

GRHL3C17orf97

SUZ12

TMEM79

LAG3

IDH3A

C4oif32

PNO1

FUT10VCPIP1TFDP1

RPL7

RNF11SLC6A2

YEATS2

LAP3

ADAMTS18TRIM33

7.2967.2927.2817.2757.2707.2687.2667.2497.2427.2417.2387.2157.2067.1937.1437.1197.1117.1117.1067.0627.0427.0397.0187.0056.9906.9746.9666.9566.9516.9326.9286.9276.916

ClOorf71

KCNA5TMEM165C10,185

KLF3CEBPATPSG1ZNF367GTPBP10MFSD3GLB1L

DNTTMYT 1ANKS6KRT77NRARPPPP3CBMAD2L1CRNKL1ACTC1AHCYL1FCRL3SURF4RCOR2

FCGR3A

ZNF616BAG5SASS6SMYD5TRIM59ZNF347NFAT5MATR3

MED8CCT2RA833AMCPH1KCTD1POLR3GL

MPZL1INTS1

DNMT3AARL9DOK2

SYNJ2

MBTPS2

C4orf17NHLRC1ATXN3

MID2

CHM

10.0389.9659.7519.6719.6339.3338.9758.8268.7748.7548.5588.4988.2858.2628.160

8.1038.0998.0558.0168.0168.0167.9967.9967.9627.9557.9137.9027.8037.7607.7457.7237.7217.6497.6437.5507.5457.4407.4077.3807.3657.3417.3347.3047.2857.2857.2287.1837.1077.0907.0907.022

MEIS1ZNF583LATZNF217TPRKBPOLR2BSPIRE1ENO3ZKSCAN2C10r106

REG1AZBTB24TTLL9ITGB1BP1IL5RASNTB2AKR1D1ZFYVE26MRPL13TDGF1LGALS7GCH1DSELEIF2S3S100A4MROHMGA2RASLIOAGIT2PPARGSCLT1ASXL2TM7SF3NOS3

BICD1NR3C1

6.9976.9226.9096.9086.8936.8816.8496.8266.722&B6946.6756.6676.6636.6336.6076.598&.5606.5546.4846.4576.4286.3786.3016.3006.2996.2796.1576.1546.1486.1316.1046.0916.0786.0446.040

8.035

cellular homeostasis 3.79E-02ion homeostasis 4.07E-02cellular divalent inorganiccation homeostasis 4.38E-02divalent inorganic cationhomeostasis 4.74E-02cellular chemicalhomeostasis 5.17E-02cation homeostasis 5.69E-02calcium ion homeostasis 6.32E-02metal ion homeostasis 7.12E-02monovalent inorganiccation homeostasis 8.13E-02chemical homeostasis 9.32E-02inorganic ion homeostasis 9.49E-02metal ion transport 1.1 4E-01cellular cation homeostasis 1.42E-01cellular metal ionhomeostasis 1.90E-01cellular ion homeostasis 2.85E-01cellular calcium ionhomeostasis 5.69E-01

Table 2: GO enrichment of shRNA targets from the in vitro screen.Enrichment used a single ranked list against the whole genome via the GOrilla web tool.

Using microarrays, we also collected differential mRNA expression data to compare

between cellular contexts at morbidity. Genes were considered based on their fold-change

relative to the in vivo context, and ranked based on fold-change. Again, we investigated

the top 1% of genes up-regulated in vivo (77 genes) and up-regulated in vitro (66 genes)

(Table 3). Using GOrilla, we found few enriched GO terms in the in vivo data genes as

compared to the whole genome; these terms included regulation of transport, and actin

cytoskeleton organization (Table 4). There were no enriched GO terms in the set of genes

up regulated in the in vitro context. We would expect that for any biological process, not

all relevant genes will be differentially expressed or be sensitive to shRNA perturbation.

Given these circumstances, we would not expect differential expression analysis or shRNA

screening to uncover all relevant genes or expect these measurements to identify the same

genes. The fact that there was little functional enrichment in these top candidates reaffirms

the incomplete nature of individual high-throughput measurements[1 66], and suggests an

integrated approach could find hidden information.

38

In vivo, genes with fold-change values12.10 1 LOC671894 // LOC674

RTP4

MS4A14732416N19RIK2900041A09RIK

CASP4IFITM3

SLFN45830431A10RIKZBP1

B230343A10RIKCCL54921525009RIKC5AR1

7.75 1 MYOM2

7.637.617.577.567.557.537.537.49

7.447.417.377.367.33

MX1TIAM1D430019H16RIKDOCK9 /// LOC670309SPARCFPR-RS2PLF /// PLF2 /// MRP

MYH6 /// LOC671894/

CCL3S100A5TIMM8A2LGMNDLGH3

HBA-A1

S100A9S100A8

IIGP1

RHOJ

NKG7AQP1ANXA2CSF1R

IL18

NRP1FOSSAA3P

CH13L3

TCRB-J ///

TCRB-V13LOC240327IGL-V1 /I

2010309G2ENPP3

PGLYRP1

SLC9A3R2

LCN2

BLR1

HYDIN

EPPK1

NGP

MPA2L /LOC626578

8.29 C1QB8.27 HTRA3

LAMB2PLXNB1

IL2RAA33010 2K04RIKGPRC5A

TCRB-V13 /// LOC6655TYROBP1100001G20RIKLMNA

7.80 1 XDH

7.267.257.247.197.167.127.107.057.04

7.04

LGALS3BP

CD97DIRAS2KLF4

GZMA2010300C02RIK /// LOADCY6

A930013B1 0RIKTLR1

In vitro, genes with fold-changesIL21RBEX61190002F15RIKFETUBNUPRIREEP1GLRP1

SENP8

CCR2PAX7POLH

2610019103RIKFADS2DKK3

PKP2AXIN2KIF2C /// LOC631653NAP1L32610021K21 RIKBARD1CHAC14731417B20RIK

-4.82

-4.57-4.39-4.34

-4.33-4.29-4.25

-4.17-4.15-4.13-4.10

-4.05-4.03-3.99-3.95-3.92-3.88-3.84-3.83-3.83-3.80-3.79

5730442G03RIK1700025G04RIKB230107K20RIKJDP2NETO2

PRG3RTN4RL2

SLC6A13

D19ERTD652E

ORCIL

CD248

DPM3

GM129

USP2

EVA1

ABCG1

AMMECR1CMAHPLK1

ZDHHC2GCM2MAP6

Table 3: Genes selected as top candidates from mRNA expression data.The table shows the top 1% of genes up-regulated in vivo (top) and in vitro (bottom).

39

11.9310.929.919.809.769.759.648.94

8.858.688.528.478.45

7.30 ITGA57.28 KLK3

7.037.017.006.976.966.966.936.916.816.756.736.726.696.65

6.656.64

6.64

6.636.606.556.526.516.516.496.48

8.268.24

8.19

8.11

8.007.907.897.887.82

TGFB3HBB-BH1VLDLRHS3ST1PLA2G2F1700097N02RIKNKX1-2ACTR3BPPP1R3BUBQLN2ANKRD15SOX6CDH118100llHllRIKCD28

LOC433844A1427515ART4PTGS1GFI1BSELENBP1CTH

-9.74

-8.50-7.05-6.99-6.02

-5.97-5.77-5.69-5.52-5.49-5.42

-5.38-5.37-5.35-5.20-5.13-5.09-5.01-4.95-4.95-4.93

-4.93

-3.78-3.77-3.77-3.77-3.76-3.69-3.57-3.57

-3.53-3.51-3.51-3.48-3.48-3.48-3.45

-3.44-3.43-3.42

-3.40-3.39-3.39-3.38

GO Function FDR q-value

regulation of transport 1.24E-01response to transitionmetal nanoparticle 1.96E-01positive regulation oftransport 2.15E-01actin filament-basedprocess 2.46E-01actin cytoskeletonorganization 3.27E-01

Table 4: GO enrichment for genes up-regulated in vivo.Enrichment used a single ranked list against the whole genome via the GOrilla web tool.

There were no enriched GO terms for genes up-regulated in vitro.

Measuring histone activation for model specificity

When designing this model, we wanted to find interaction sets that connected the genes

identified from the shRNA screen and from differential expression analysis. This model re-

quired protein to DNA interactions in addition to the existing protein to protein interactions

in our interactome (Figure 3A). These interactions were not publically available so we used

a combined computational and experimental approach. Specifically, we used measure-

ments of open and active chromatin to identify regions of putative transcriptional activity

and predicted transcription factor/DNA binding interactions using computational modeling

techniques.

We collected ChIP-seq data for the activating histone markers H3K27Ac and H3K4me3

from our ALL model in culture. Compared to an IgG control, we identified 29468 and

18142 peaks in the H3K27Ac and H3K4me3 datasets respectively. To identify regions of

possible transcription factor binding, we searched within these peak regions for local min-

ima, or valleys, between histones; from this analysis we found 70894 and 24617 valley

sequences in the H3K27Ac and H3K4me3 datasets. Of these valleys, 15178 regions over-

lapped between the two datasets (21% of the H3K27Ac and 62% of the H3K4me3). Figure

4 shows example ChIP-seq reads, MACS peaks, valley regions, and IgG controls for 3

control genes: Trim27, E2f3, and Hist1h1b. We selected these example genes because

they had relevant histone activation marks in mouse B-cell samples that were previously

40

published in the UCSC Genome Browser. For these genes, the H3K4me3 and H3K27Ac

ChIP-seq and MACs peaks were well aligned, although the valleys varied slightly. In the

case of Histi hi b, the H3K27Ac data had a larger MACS peak and thus more valley re-

gions. Given these trends, we used the union of all valley regions to identify possible

regions of transcription factor binding.

From these active valley regions, we used our own software suite, Garnet, to identify

transcription factor-DNA binding interactions. Garnet uses a weighted scoring approach to

quantify the probability that a transcription factor occupies a region based on the strength

of the transcription factor motif [50, 106]. The analysis created a list of 262705 interac-

tions with scores distributed from 0.3 to 0.99. We append these interactions to an existing

protein-protein interaction network derived from iRefWeb (interactome described in meth-

ods).

H3K27Ac -- - --

H3K4me3 -A-. ______

Figure 4: ChIP-seq with valley-finding identifies regions for transcription-factor binding.Genome viewer tracks for Trim27 (chr1 3:21,267,345-21,277,316), E2f3 (chr1 3:30,071,171-30,083,320), and Histi hi b (chr1 3:21,868,763-21,874,488), showing ChIP-seq reads (top),MACs peaks (middle), valley regions (lower, orange), and IgG control (grey, lower) forH3K27Ac (top 4 rows) and H3K4me3 (bottom 4 rows).

SAMNet identifies a network of genes affecting ALL progression

As we already knew that shRNA and mRNA measurements capture distinct pathway components[1 66],

we pursued a data integration approach to predict new genes in the ALL progression path-

way. We used the Simultaneous Analysis of Multiple Networks (SAMNet)[49] to construct

such a pathway because this was the most powerful tool given the data we had collected.

The algorithm integrates diverse perturbation and response datasets and maps them to a

41

physical interaction network comprising all possible ways by which the perturbed species

(e.g. shRNA hits) can give rise to the observed response (e.g. differentially expressed

transcripts). SAMNet then applies a network flow-based paradigm to select a subset of

interactions that relates the experimentally-perturbed genes from the shRNA screen to

the differentially expressed mRNA through the specified interactome. We conceptually

depict how SAMNet integrates shRNA and mRNA data with our interaction network, or

interactome in Figure 3. In the context of the resulting model, RNAi genes are connected

upstream to transcription factors and differentially-expressed mRNAs; sometimes, RNAi

genes represent transcription factors themselves and are directly connected to differential

mRNAs. Mathematically, the algorithm selects an interaction sub-network by pushing flow

through a probabilistically-weighted interactome (described in methods). Flow initiates at

the RNAi hits and terminates on differentially expressed mRNAs. Fold-change values from

RNAi and mRNA expression data constrain the amount of flow any experimental gene hit

can capture in the network. Genes are able to capture flow from either commodity. Flow

sharing is constrained by the edge capacity (set by the edge confidence); common nodes

force the algorithm to select interaction edges beyond the common nodes. and the algo-

rithm Flow is shared between data from the in vivo and in vitro experiments through the

multi-commodity abstraction in the underlying SAMNet implementation.

The foundational model consisted of 311 nodes and 480 edges (Supplementary Figure

8). We proceeded from this foundation by removing the 'source' and 'sink' nodes which are

topological formalities for creating the network model. This created an interaction network

where upstream protein-protein interactions converge on transcription factor/DNA binding

interactions. To first focus on a protein interaction sub-network, we omitted the 43 mRNA

nodes and transcription factors that are not directly connected to the protein-protein inter-

actome (see below). The resulting sub-network contains 258 nodes and 259 edges (Figure

5). This network contained 91 targets from the RNAi inputs (i.e. experimental genes), and

167 predicted genes. Of the 167 predicted genes, 33 are transcription factors.

A predicted transcription factor/DNA binding sub-network contains 84 nodes with 79

edges (Figure 6); 41 nodes are transcription factors, and 43 nodes are differentially ex-

pressed mRNAs. This sub-network includes an additional 7 experimental genes from the

42

RNAi input set; this yields a total of 40 transcription factors in the whole network, combin-

ing the two sub-networks. These 7 targets are transcription factors which had predicted

connections to altered mRNAs, but did not have further connections to the protein-protein

interaction network. This set of transcription factors comprises Tfdpl, Foxo6, Hmga2,

Meis1, Myt1, Cebpa, and Klf3. Another identified transcription factor, Pparg, had con-

nectivity both to the protein-protein interactome and directly to differential mRNA (Figures

5&6).

43

4 zS4 c C

4C

a

Figure 5: SAMNet identifies integrated network for ALL progression.The magenta/green edges represent interactions from the in vivo/in vitro screens. RNAihits are represented by a shaded square; the shading refers to the extent of depletion inthe original screen. A diamond is a transcription factor selected by SAMNet; those thatare shaded are also hits from the shRNA screen. All white-face nodes are hidden targetsselected by the algorithm. Node border color represents fractional representation in afamily of 100 random networks. Downstream mRNA not pictured.where Wwpl, Hgs, Lmo2, and Pogz exist within the network.

Red arrows indicate

44

mRlNA o d change-- 9.97

gene0 d n -2euae Cn v hi5 J

8Q, 85V *AeIu-rgltN nvv

Figure 6: SAMNet selects transcription factors that explain genes with greatest differentialexpression.The transcription factors and differentially expressed genes are represented as trianglesand octagons respectively. Grey shading on the transcription factors represents the extentof depletion in the original screen (legend shown in Figure 3). Shading on the differentiallyexpressed genes reflects either down-regulation (green) or up-regulation (pink) in the invivo screen relative to the in vitro screen. The thickness of the interaction line representsthe amount of flow captured by that interaction. Node shapes and border colors are thesame as explained in Figure 3.

Integrated approach finds genes connecting disparate data sets

This network further specifies interactions specific to either the in vivo or in vitro screen, as

well as interactions common to both screens (Figures 3&4). The network identified genes

hits that connected the in vitro data (e.g. HGS, ARR, CASP1) or the in vivo data (e.g.

CPSF6, WWP1, TCF4, KLF5, HOXA9), and genes that connected data from both screens

(e.g. IKBKB, NFKBIA, RELA).

We can also use the predicted genes identified by the algorithm to enhance the abil-

ity to identify known pathways using Gene Ontology (GO) functional enrichment statistics.

Functional enrichment of the network genes compared to the whole genome identified GO

processes distinct from those identified in the RNAi or mRNA expression data alone (Ta-

ble 5). The processes include many associated with hematopoiesis including leukocyte

homeostasis (q-value: 9.76e-1o), lymphocyte homeostasis (q-value: 1 .67e-9), regulation

of leukocyte differentiation (q-value: 2.05e-8), hemopoiesis (q-value: 4.16e-8), and positive

regulation of lymphocyte proliferation (q-value: 1.97e-4). These processes reflected B-cell

45

specific functions such as negative regulation of cell differentiation (q-value: 4.92e-10)

which reflect the undifferentiated state of B-cell leukemias[20]. This set of GO processes

also included functions associated with other hematopoietic progenitors, namely leuko-

cytes. This enrichment occurs because genes with these annotations are shared among

these two biological functions. 12 of the 13 genes annotated as having a role in leukocyte

homeostasis are also involved in lymphocyte homeostasis.

The enrichment analysis also identified pathways with known relationships to leukemias,

if not specifically acute lymphoblastic leukemia. We address the potential signifigance of

these pathways in the discussion.

transforming growth factorbeta receptor signaling 1376 Fos, Parpi, Skil, Smad. Tgfb3, Smad2, Smad9, Ptk2, Smad3 Trp53pathway 1 27E-12 9.29E-11 (20822,69,307.14) Creb1. Jun Map3kl, Src

Med1. Hdac2, Vhl, Lmo2, TWt712, Pparg, Hoxa9 Ptk2, 1118, Trp53,Foxo . Jdp2. ApIs, Pkp2, Xdh, ltgbl, Vim, Ezh2, Erbb2, E2f1. Skil

negative regulation of cell 389 Erbb4, Myc. Smad3. Hmga2, Meisi. Gsk3b, Itgbibpl, Mapki, Pax6,differentiation 7 07E-12 4.92E-10 (20822,610,307,35) Suzl2. Trp73. Ctnnbl. Stat5a, NfktAa

Mcphl Jak2, Med1, Plscri. Pcna, Bag5, Map2k1, Vhl, Brcal, Pparg,Fadd, Hifla CmkIl, Trp53, Foxo, Tnm28, Mdfi, Traf2, Gch, Mecom.Atxn3 Ubqln2, Traf6, A4. Bag6, Xdh, Nck1. Zbpt. Kf4. Ccr2 Parpi.

positive regulation of protein 300 Skil Epm2a Dv12. My Smad3 Mapkl Ppp4c. Trp73. Rela Cd28,import into nucleus 8 16E-12 565E-10 (20822,1084,307.48) fl2ra Map3k3, Stat5a. Map3kl Map2k4, Nfkbia. Rab33a

13.18 Ahr, Sos1, Fas, Lat, Skil. Ppp3cb, Hif ta, Fadd, Casp3. lkbkb. If2ra,leukocyte homeostasis 1 50E-11 9 76E-10 20822.67,307,13) Stat5a Mecom

14. 3 Sos1. fkbkb. Ahr. Lat, Fas, lf2ra, Skil, Ppp3cb. Stat5a, Fadd, HifIa.lymphocyte homeostasis 2 66E-11 1,67E-09 (20822.56.307.12) Casp3

regulation of leukocyte 575 Sost. Fas. Ccr2, Erbb2, Fos. Fadd. Myc.Tal1, Creb1, Gfib, Jun. Apms.differentiation 3 73E-10 2.05E-08 (20822,236,307 20) Asxl2. Cd28, ff2ra, Ctnnb1. Rb1, Stat5a 1 raf. Tyrubp

regulation of myeloid 9 13 Fos, Fadd. Myc, Tal, Crebt, Gfitb, Jun, Asxl2. Apcs, Ctnnbl, Rbt,Ieukocyte differentiation 4 13E-10 2.24E-08 (20822.104,307.14) Stat5a. Traf6. Tyrobp

8.71 Jak2, Med1, Klf4, Ahr, Ccr2 Lmo2, Hitla. Hoxa9, Meisi, Tall, Sox6hemopoeisis 7 79E-10 4.16E-08 (20822,109,307,14) Ctnnbl. Sp3. Sp1positive regulation of 8 48 Brmaf, Smad4. Eed, Trp53, Tall. Plk1, Ctbp1, Jdp2, Gfilb Asxl2,chromosome organization 4406-09 2.13E-07 (20822,104,307,13) Ctnnbl, Rb, Ep300positive regulation of myeloid 11.52leukocyte differentiation 7 35E-08 3.06E-06 (20822.53.307.9) Gfllb, Jun, Fos. Asxl2. nb1, Stat5a Fadd, Trafe. Creb

positive regulation of 4.21 Jak2, 2Atp6ap2, Ccr2, Brcal, Fadd, Hifla, Smad3, ff18, Crebi, Rela,cytokine production 7.47E-08 3.10E-06 (20822,322,307,20) Cd28, Rel. Traf2. Stat5a, Arn, Hdacl, Traf6, Src, At14,regulation of apoptotic 531 Jak2, Pfscrt. Nck1, Fas, Skil. Fadd, Mapk8. Smacd. Myc, Trp53.signaling pathway 1 16E-06 3.79E-05 (20822,166 307.13) Gsk3b Trp73, Traf2

10 32myeloid cell development 4 62E-06 1 31E-04 (20822,46 307.7) Sox6, Med1, Ptpn11, Tal, Meis1, Ep300, Sr

positive regulation of 595lymphocyte proliteration 7.38E-06 1.97E-04 (20822,114,307,10) Nckt, Ccr2. Cdknla. Cd28, Stat5a, Fadd, Traf6

Hdac2 Cdhl Atp6ap2, Tcf712. Dy12. Smad33, Dkk3Regulation of Wnt signaling 3.88 Dvl, Foxol, Mdi. Hdac1, Src. Xiappathway 3 49E-05 7 84-04 l20810,227,307,13 )I

Table 5: GO enrichment of network gene identifies processes associated with B-cellleukemia.GOrilla identified enriched GO processes using the network nodes as the foregroundagainst a background of the whole genome.

46

Robustness metrics prioritize and predict genes relevant for ALL progression

Toward determining targets for dedicated validation, we quantified the robustness of each

gene selected in the network using randomization. This routine involved running SAM-

Net 100 times with randomly selected input protein and mRNA targets. The number and

value of prizes remained constant, but were assigned to random genes, and the under-

lying interactome was unchanged. We counted a gene's fractional representation in the

family of random networks and display this as a specificity score (see Figures 5 & 6); a

lower score represents a greater specificity. This randomization mitigated against potential

targets arising artificially due simply to high connectivity within the interactome.

To prioritize targets for further investigation, we leveraged information contained in the

network to create a ranking score metric. The ranking reflects the gene's robustness to

randomization (1-specificity) as calculated above, an authority index (reflecting the number

of out-going connections from a gene), and aflow quantity (the amount of 'weight' upstream

of a gene) (Table 6 and Supplemental Table 1). The aggregated ranking score is a weighted

sum of these three factors, where each is weighted by the standard deviation of that factor;

the sum is normalized to the max possible score (the total of the standard deviations). The

specificity varied the most across all network nodes, and was the most discerning factor in

contribution to the aggregate ranking metric.

47

Phenotypic HitsNode Acoreaate ScoieRp17 0.811Lap3 0.811St3al 0.803Pogz 0.803Lat 0.836T prkb 0.835ZbIh24 0.835htgb IbpI 0835Sc8t1 0.835_Ppp3cb 0.828Fcr15 0.828

Tsgl _ 0.40

Hr2 0.80Ubpd4 08007Tre2i 0.265

Arnt 0.261_-K-f6 0.840Map3k14 0.840T g101 0.840Smur12 0.840TrIrn24 0.840Pfscr1 0 840WWp1 0.838SkI[ 0.832Lmo2 0.824T0f4 0.8232810408M09Rlik 0.833Mid1 0.833Rabacl 0.816

Hgs 0 800Tbc1d4 0,800

Transcription FactorsNode Aggregate Score

Hinga2 0897

TIdp1_ 0 883Foxo6 0. 85r

Slat5a 0.496

Pei 0.457

Gabpbi 0.832Foxol 0.831

Sp3 0.750Alt4 0.749

Table 6: Aggregate scoring of top network nodes.Genes are grouped by type (transcription factor, phenotypic, and hidden) and contextual

effect. The shading in the right columns refers to colors from the network key in Figures

3&4.

Predicted pathway contains genes contributing to ALL progression specifically in

vivo and genes affecting B-cell viability

As initial evidence supporting the effective capabilities of our integrative pathways-based

modeling approach, we note that the model predicts in vivo specific effects for Lmo2 and

Pogz. These two targets have, in fact, been previously validated successfully[93]. For

additional evidence, we undertook here new dedicated experimental tests of predicted

hidden genes selected from the model-determined ranking list (Table 6). From the ranked

gene list (Table 6), we used in vivo competition assays (Figure 7) to measure the effects of

loss of these predicted genes on ALL progression.

Hgs is a vesicular body-associated protein and was not included in our initial library

48

screen. The model predicts that shRNAs targeting Hgs will deplete in vitro. In competition

assays, an shRNA against Hgs depleted in culture, the blood, bone marrow, and spleen.

The depletion in culture was greater than in all organs and was consistent with our model

predictions (Figure 7). These findings suggest that while Hgs confers a growth disadvan-

tage to a pre-B-cell ALL, the in vivo environment mitigates the effect of Hgs loss. This

may be due to growth factors or other paracrine signals that are relevant to hematological

malignancies[47], direct cell-to-cell contact, disease compartmentalization, or many other

factors unique to the in vivo environment[93].

Wwpl, is a ubiquitin conjugating enzyme that was in our initial library, though the se-

quencing reads were below the limit of detection, suggesting a bad shRNA reagent. The

model predicts that Wwpl knowckdown has an in vivo specific effect, and competition as-

says confirm this phenotype. shWwpl enriches in the blood, bone marrow, and spleen

(Figure 7).

49

Input

Input

Hgs (sh1), normalized

0.0001< 0.0001

17-0 0002

*g

0,0078 l

0

- 0

0

0 0 0-'\0 \0 0 e 4

V, llf' '0 4 "q e10'0

In Vitro

In Vivo

Wwpl (sh5), normalized

00001 <00001 0.0046

1.5- ns -Z

1.0-1b- A t1

n n 1 I

0. 0

C:'

Figure 7: Validation shows in vitro and in vivo effects for Hgs and Wwpl.In the competition assays, we measure the relative abundances of pre-B-cells with andwithout an shRNA against our gene of interest either in culture or transplanted into mice.We measure relative proportions at the time of morbidity using FACS. All plots are mean+/- S.D. For all samples, n=3, except for Hgs, and Wwpl tissue samples where n=4.

50

I

1.5

1.0

C00.1.

0CLo 0

MM16=

SInfect*g

Author Contributions

All authors: Douglas A. Lauffenburger (D.A.L.), Ernest Fraenkel (E.F), Michael Hemann

(M.H.), Sara Gosline (S.G.), Simona Dalin (S.D.), and Jennifer L. Wilson (J.L.W.)

Contributions: D.A.L., E.F., J.L.W., and S.G. designed the project; J.L.W. and S.D.

performed research; E.F., D.A.L., M.H., and S.G., contributed new reagents/analytic tools;

J.L.W. analyzed data; J.L.W. wrote the paper; and E.F., and D.A.L. supervised all aspects

of the work.

Discussion

Discovering pathways de novo by leveraging multiple 'omics measurements can find latent

information from functional genomic screens. We confirm the relevance of our model first

through GO analysis. GO enrichment of the input RNAi and mRNA expression sets found

no enrichment of relevant functions. This reaffirmed the disparate nature of individual data

sets and suggested the possible value of an integrated approach. Further, we validated

specific gene predictions from our network model. The model predicted an in vivo specific

effect for Lmo2 and Pogz. They were not within the top 1% of ranked shRNAs, but were

functionally related to the very top gene targets as predicted by the pathway model, and

were found to have in vivo specific effects through further validation. Hgs and Wwpl were

not included in the original screen analysis due to low sequencing coverage, yet the model

predicted their relevance. Without such a model, they may never have been considered.

Through focused validation, they both proved relevant to ALL progression.

The model makes many predictions about genes that have unclear roles in ALL and

this underscores our incomplete knowledge of biological pathways. Some of these genes

have mixed roles in development. Hemopoeisis is governed by stage- and lineage- specific

transcription factor regulation [136, 107]; finding these relationships is valuable to under-

standing disease progression. Some transcription factors, such as TCF3, PAX5, IKZF1,

and EBF1, are involved during many hematopoietic stages[l 36]. Other network genes are

characteristic of non-ALL leukemias. For instance, our network selects the transcription

factor TCF4. This factor is up-regulated in the solid tumors of adult T-cell ALL patients[1 07]

51

but currently has an unknown roll in B-cell ALL, though it is highly expressed in this pre-

B-cell ALL system. Additionally, functional enrichment identifies that network genes are

enriched for the TGF-beta growth factor pathway process. These pathways have been

identified in leukemias other than ALL, and have been characterized for clinical utility. In-

vestigations following on the utility of these findings could be fruitful paths forward in the

context of ALL. Multiple leukemias, including chronic myelogenous (CML)[1 04], acute T-cell

leukemia (ATL)[1 09], and ALL[28], express TGF-beta associated genes. Though, expres-

sion of TGF-beta components was higher in the T-cell leukemias than in the other leukemic

cell lines[1 09] and the ATL samples responded to exogenous TGF-beta. Further, TGF-beta

and Foxo3a activity both promote the maintenance of leukemia-initiating cells in CML[1 04]

and loss of Foxo3a and TGF-beta inhibition better sensitized cells to Imatinib treatment.

Clinically, patients with higher levels of TGF-beta are considered high-risk, as they often

harbor additional mutations that prevent the tumor-suppressor roles of TGF-beta[28]. The

network also identifies the Wnt signaling pathway which is relevant in multiple hemato-

logical malignancies; specifically, stabilized beta-catenin is associated with differentiation

arrest, is highly expressed in childhood T-cell ALL, and Wnt signaling can promote drug

resistance[107, 39, 7, 43]. Understanding the role of TGF-beta and Wnt in ALL will in-

form research in therapeutics as these pathways have already been characterized in other

leukemic contexts.

These findings confirm the relevance of network gene selections and highlight the simi-

larity of hematopoietic malignancies; many of the same genes are involved, though degree

of expression and directionality of their effects can be context dependent. Further, B-cell

development pathways have been identified as hallmarks of drug-resistant leukemias and

thus, therapeutic strategies promoting B-cell maturation show promise[64]. Previous stud-

ies have also demonstrated the role of histone modification pathways in relapsed ALL;

specifically CREBBP and CTCF, are mutated in these patients and may affect treatment

response[1 02]. The network confirms the relevance of CREBBP to pre-B-cell ALL, though,

the model indicates that this gene is deleterious to B-cell viability and does not specifically

modulate the response to the in vivo environment. Identifying developmental pathways in

ALL is not novel, but there is novelty in specifying which of these genes are deleterious for

52

pre-B-cell growth and those that are deleterious in vivo.

We develop our model to select for context-specific interactions and to identify genes

relevant to the in vitro or in vivo settings, or genes that are common to both. In the case of

Hgs, the model predicts an in vitro specific effect. We demonstrate that Hgs loss confers

a growth disadvantage in culture, blood, spleen, and bone marrow, but that this loss is

most drastic in culture. This trend emphasizes the ability to distinguish genes relevant to

the pre-B-cell model and those specifically responding to the cancer microenvironment.

In the case of Hgs, loss of this gene is significant in culture, blood, spleen, and bone

marrow, though our model predicts and confirms that this is not an effect specific to the

tumor microenvironment. This framework is valuable as screening context (e.g. treatments

or experimental environment) can affect which parts of a pathway are relevant. In our

example, all genes in the model are predicted as part of the ALL pathway, but only a

subset are relevant to ALL progression in vivo.

Hgs (Hrs) is a member of the ESCRT-0 family of proteins involved with multivesicular

body (MVB) formation [31]. This protein is involved with the internalization and degradation

of activated cell-surface receptors[1 44, 103] and loss of Hrs is associated with increased

accumulation of E-cadherin and decreased cell proliferation[144]. Hrs is also essential

for the termination of 1L-6 signals by sorting gpl 30, a transducer of IL-6 stimulation, for

endosomal degradation[141]. Further, Hrs deficiency in mice decreased B-cell receptor

expression; BCR expression is necessary for pre-B-cell expansion[1 03, 170]. The conse-

quences of Hgs loss suggest that deficiencies in receptor internalization are deleterious to

this type of pre-B-cell ALL.

The network predicts a novel role for Wwpl in pre-B-cell ALL. Even though the network

model was constructed using shRNAs that conferred a growth disadvantage, the model

does not predict the directionality of effects for predicted genes. Validation experiments

confirmed that Wwpl had an in vivo specific effect, and that Wwp1 loss conferred a growth

advantage. This enrichment is surprising because so far, most evidence suggests that

Wwpl is an oncogene; our competition assays suggests that in ALL, Wwpl may have a

tumor suppressor role. WWP1 is a member of the Nedd4 family of ubiquitin ligases, many

of which are over-expressed in cancer[18]. Many results suggest that WWP1 acts an

53

oncogene by targeting tumor-suppressing proteins, such as LATS, for degradation. WWP1

is over-expressed in breast and prostate cancers, and seems to function in this oncogenic

manner[17, 167]. Though, WWP1 also participates in a unique feedback loop with p53.

Wwpl stabilizes p53 protein in the cytoplasm, but decreases its expression, and in turn,

p53 reduces the expression of Wwpl[82]. This feedback loop is dependent upon p53

mutation status; mutated p53 abrogates this feedback dynamic leading to increased Wwpl

expression. In our pre-B-cell ALL, Wwpl has relatively low expression and intact p53

suggesting an intact feedback loop. Loss of Wwpl could lead to decreased p53 protein

stability and enable interactions with other proteins. For instance, these cells express high

levels of survivin (Birc5), which can prevent p53-mediated apoptosis in pediatric ALL[1 47].

Off target effects are one of the largest criticisms of RNAi screens, yet even with these

effects, we construct and validate a model for genes regulating ALL progression. We cre-

ate this model with less than 100 input targets from each screen and find GO enrichment

of genes related to leukemia and hematopoietic development. This demonstrates that the

network filter is sufficiently powerful for analysis of screening data and that even stringent

thresholds are sufficient for identifying genes relevant to a pathway. Further, we perform

this analysis retroactively without controlling for OTEs or requiring specific negative con-

trols, showing that RNAi analyses do not have to specifically compensate for OTEs. This

suggests that even with limited and imperfect screening results, we can learn more from

these screens, and that pathway discovery approaches are a viable path forward for the

gene-interference community.

There are limitations associated with this approach. We have not yet made an estimate

of validation rate for these network-based filters. To make this estimate, we would need

to validate all genes in the network and determine the false positive rate. This method

requires multiple datasets to create a possible network whereas other approaches require

fewer input datasets. Some examples of tools that require a single input dataset include the

Prize Collecting Steiner Forest (PCSF) [61, 146], TieDIE[1 16], and HOTNET [149, 150].

For a biologist, these tools could be easier to implement, but they lack the perspective

gained by integrating multiple 'omic measurements. Each 'omic measurement captures a

slightly different aspect of the cellular process under investigation, and so there are trade-

54

offs between completeness and simplicity when selecting these tools.

Application of integration methods will improve understanding of gene-interference screens

regardless of improvements in reagent design. CRISPR systems are more sensitive than

RNAi and are better suited for discovering essential human genes[52, 53]. Though, both

technologies are limited by biological compensation - alternative pathways or redundant

proteins. Thus, adapting and applying interaction-based methods for data integration will

continue to be of importance for functional genomic investigations. Many have promoted

the value of data integration analysis methods[98, 161, 135, 166], however, these analysis

approaches remain under utilized in the gene interference community.

55

Materials and Methods

shRNA screen, and data processing

The original shRNA screen and mRNA expression data are described previously[93]. DE-

SEQ2 size-factor normalized and calculated fold-changes for shRNA sequencing data. We

ranked these fold-change values by p-value from DESEQ2[5]. The GOrilla tool (http://cbl-

gorilla.cs.technion.ac.il/) determined GO functional enrichment using the top 1 % of dif-

ferentially depleted shRNAs against the background of the whole mouse genome. GO-

rilla also determined functional enrichment of the top 1% of differentially expressed (up-

regulated/down-regulated mRNAs for the in vivo/in vitro contexts) mRNAs against the

background of the whole mouse genome.

Cell culture

Pre-B-cell ALL cells[1 60, 159] were cultured as published[93].

ChIP-SEQ

We performed ChIP assays as previously described[90]. Briefly, 8.5x106 pre-B-cells were

cross-linked with 11% formaldehyde. Pellets were lysed and sonicated using the BioRup-

tor. Following sonication, activated DNA regions were immunoprecipitated using the fol-

lowing antibodies: K4me3 (Millipore Lot#1974075), K27Ac (Abcam Lot#GR1048521), and

IgG (Millipore Lot#JBC17938060).

Peak Finding

We used the MACS algorithm [37]to identify peak regions 10K kb upstream of the transcrip-

tion start site. Within these peak regions, we searched for valleys (local minima within peak

regions) and then used these regions as inputs for potential transcription factor binding lo-

cations. The ChIP-seq data are available: GSE77570 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?ac

56

Interactome construction

We used Garnet to create transcription factor to DNA binding interactions. The Garnet

package uses a weighted scoring function[106] to determine which transcription factor

motifs were the most likely to bind a set of Fasta sequences. For input, we used the valleys

identified from our previously mentioned ChIP-seq data. The software is part of a suite of

'omics integration tools (http://fraenkel.mit.edu/omicsintegrator).

For our starting interaction network, we downloaded interactions from iRefWeb version

9 and we only kept interactions that mapped to UNIPROT reviewed human proteins. These

interactions were scored using the miscore framework[1 52] to create a probabilistically-

weighted interactome. This score considers the number of publications, the type of inter-

action, and the evidence supporting the interaction. We discarded interactions with a score

below 0.3. This yielded a starting network of 88117 interactions. The interactome is hosted

on the Fraenkel Lab Website (http://fraenkel-nsf.csbi.mit.edu/psiquic/).

SAMNet

We built the model using SAMNet[49]. We used the top 1% of depleted shRNAs (for in

vivo and in vitro), the top 1% differentially expressed mRNAs (up-regulated genes were

assigned prizes for the in vivo context and down-regulated genes were assigned prizes for

the in vitro context), the transcription factor to DNA binding interactions, and the weighted

interactome as inputs to the algorithm. For simplicity, we mapped all mouse data to human

gene symbols using homology. We used a gamma parameter of 17 as this was found to

maximize the number of shRNA-targeted genes in the network. We again used GOrilla

to determine the functional enrichment of the network, this time using the set of network

nodes as a foreground against the background of the whole mouse genome.

Randomizations and Robustness Analysis

For randomizations, we completed 100 runs of the algorithm at the same gamma parame-

ter, but with the depletion and differential expression scores distributed to random sets of

genes. We calculated node enrichment fraction, or specificity, by counting the number of

57

times a node from the real network was selected in a random network. For further scoring

and ranking, we used 1-specificity to keep all network parameters on a 0-1 scale. We used

networkx in Python to determine the authority scores for network genes. SAMNet outputs

the amount of flow converging on a gene. Our aggregate scoring method weighted each of

these normalized values by the standard deviation of these metrics across all genes and

was normalized to the sum of all metric standard deviations:

R = (1-specificty)*o( 1-specifcit-y)+authority*Oauthaority+Iow*O1fIow0(1- specifcity) +01authority +aflow

We completed this calculation separately for genes unique to in vivol in vitro contexts

and common between both contexts because common genes had two authority and flow

scores. Because we designed the model to allow for multiple out-going connections from

transcription factors but not from hidden genes and because phenotypic (shRNA) hits did

not have incoming flow in the mode, we created separate rankings for transcription factors

and phenotypic hits.

GFP Competition Assays

For model validation, we conducted parallel in vitro and in vivo GFP competition as-

says. We created shRNA constructs for Hgs and Wwpl, and infected pure populations

of mCherry positive ALL cells as described[931. At the time of experiment, infected pre-

B-cells were mixed 50:50 with mCherry positive ALL cells. From these populations, 100K

cells were tail-vein injected into four 8-week-old, female C57BL6 mice, and 100K cells

were plated in triplicate. Cultured populations were split 1:5 daily until mice reached

morbidity (10-12 days following injection). At the time of morbidity, we collected blood,

spleen, and bone marrow. For all in vivo and in vitro samples, we measure the %GFP

of mCherry-labeled cells and calculated a fold-change relative to the 50:50 ratio at input.

We normalized %GFP-fold-change to the MLS control for the respective tissue and used a

t-test with Welch's correction to determine significance. The sequences used for Hgs and

Wwpl shRNAs are CCAGAAACCACTTATATGTCTA and CTCCCTATTTTATACAGAGCAA.

We tested shRNA knockdown using qPCR relative to Gapdh using Taqman expression

assays. shHgs and shWwpl left 23.28% and 67.64% mRNA remaining (Supplementary

58

Figure 9).

59

Supplemental Figures and Tables

__ iA

V I

_4A

Figure 8: The whole network, unformatted as created by SAMNet.The Source and Sink nodes have been colored blue and orange respectively to demon-strate their connectivity to the original inputs. Labels remain in Human Uniprot identifierswith the exception of mRNA targets which are mapped to gene symbol. We show the fullnetwork for completeness, but do not use the network in the unformatted version for anyinterpretation. For better viewing, an interactive format is available on the Fraenkel Labwebsite (http://fraenkel-nsf.csbi.mit.edu/psiquic/).

60

qPnnr

Figure 9: Percent reduction in mRNA after knockdown as measured via qPGR.

61

-- ------ 061M -- - . - - -%mom.

Part IV

Network modeling identifies commonregulators of NFxB and TGFa ectodomainshedding

Abstract

Studying pathway dynamics affected by growth factors has implications for under-standing cancer and designing therapeutics, yet few studies have investigated regu-lation of the growth factors themselves. Relevant growth factors, such as the Epider-mal growth factor ligand pro-Transforming-growth-factor-alpha (pro-TGFc), can haveboth autocrine and paracrine effects. Sheddases, notably the metalloprotease A-Disintegrin-And-Metalloprotease-1 7 (ADAM1 7), can cleavage-activate multiple cell sur-face pro-ligands on the cell surface, in a process termed ectodomain shedding. In thiswork we investigate the role of select signaling genes that specifically affect the cleav-age of TGFa. We use an shRNA screen and network optimization to identify a putativepathway regulating shedding of TGFa. This network contains experimental genes fromthe shRNA screen and predicts additional "hidden" genes not previously analyzed. Inour analysis we identify a TGFa cleavage regulatory pathway that contains a cluster ofgenes with known NFxB regulatory functions - PEF1, PEF2, CALM1, OBSCN, IRAK1,XIAP, and TAB1. We validate that the hidden genes XIAP and TAB1 have roles inTGFt shedding. Further, we provide evidence of NFxB and EGFR pathway crosstalk,and demonstrate for the first time that regulators common to both pathways affectectodomain shedding of TGFL.

62

Introduction

Growth factor pathways, and specifically receptor tyrosine kinases (RTKs) are the focus

of multiple therapeutic efforts in cancer [51, 157, 95], though even with well-designed in-

hibitors, resistant cancers arise [34, 33, 108, 148]. For instance, tumors can evolve point

mutations that render the therapies ineffective, or the cell can activate alternative pathways

to overcome inhibition[33, 114, 163]. Alternative pathway activation, sometimes referred to

as 'RTK switching'[44], results from bypass signaling mechanisms such as feedback loops

or the presence of growth ligands for other growth receptors. For instance, exogenous EGF

can confer full resistance to the cMet inhibitor, Crizotinib, treatment in gastric cancers, and

partial resistance to this therapy in lung cancer[1 62]. Wilson et al further characterize this

phenomenon in 41 cells lines, with six different RTK ligands in the presence of nine RTK

inhibitors. Further examples of growth factor-induced resistance include NRG1 resistance

to HER2 inhibition, HGF- and FGF-conferred resistance to NRG1 inhibition, and EGF-

conferred resistance to FGFR inhibition[1'62]. These findings suggest that these pathways

are redundant in their ability to actuate growth programs. Though, it is unclear if these

pathways converge on the same downstream signaling molecules or if they use unique

signaling paths to actuate growth. To understand growth factor induced mechanisms for

resistance, we need a thorough understanding of the pathways that govern regulation and

cleavage of these factors.

The growth factor TGFcx has already been studied in many disease contexts, yet, com-

prehensive understanding of the signaling network surrounding this ligand is lacking. In

breast cancer, TGFc expression is negatively correlated with progesterone receptor ex-

pression, but positively associated with tumor aggressiveness[87]. In gastric cancer, sig-

nal peptidase 18 (SECi 1 A) loss increased tumor invasion and tumor volume, and also

induced increased TGFa cleavage[1 13]. In pancreatic cancers, cells with increased TGFX

expression were more sensitive to gefitinib than those with lower TGFX expression. The

authors found that gefitinib sensitivity was dependent on basal EGFR pathway activation,

suggesting a feedback mechanism between activated EGFR and TGFa [118]. While TGFi

expression has been characterized in cancer and response to therapy, there is an incom-

63

plete understanding of how regulation of the shedding of mature TGFa relates to this pro-

cess. We know that activation of the RTKs FGFR, PDGFR, stimulates TGF shedding and

that the Erk MAP pathway is necessary for this receptor-mediated stimulation[35]. When

compared to shedding of other ligands, MAP kinase signaling was identified as a general

mechanism for shedding of TGFc, TNFcx, and L-selectin[35], though in other work, using

hypertonic stress, LPA, or TPA stimulation could selectively shed HB-EGF, EGF, NRG, and

TGFa[57]. Further work extended this result and showed that different PRKC isoforms

were required for shedding depending on the stimulus[25].

The EGFR pathway and its related growth ligands are becoming interesting molec-

ular targets for gastric cancer which has limited targeted therapies. Currently, gastric

cancer is one of the largest causes of cancer-associated death worldwide[8] and surgi-

cal resection remains the frontline treatment[132] The disease is associated with obesity,

and H.pylori infection, and these lifestyle factors have been associated with molecular

level phenotypes[132, 70]. Specifically, H. pylori infection increased expression of EGF,

and EGFR, and these expression patterns persisted after eradication of the infection[70].

EGFR inhibition blocked tumor growth and invasion in gastric cancer cells[174]. EGFR

status had an association with short overall survival whereas HER3 expression had a

borderline association with longer overall survival in an immunohistochemical analysis of

esophageal and gastric cancer patient samples[54].

Some gastric cancer samples harbor c-Met amplifications, thus making both EGFR

and c-Met pathways the focus of therapeutic interventions[1 12, 8, 67]. In lung cancer

patients, c-Met amplifications were associated with resistance to gefinitib and erlotinib,

and combined inhibition of both c-Met and EGFR was necessary and sufficient for block-

ing downstream MEK/ERK activity[33]. However, targeting c-Met and EGFR has conse-

quences beyond reducing activity of the RTKs themselves. First, pEGFR, and the ma-

trix metalloproteinases, MMP7 and MMP9, were significantly elevated in gastric cancer

resections[165]and TGFa, and the matrix metalloproteinases, MMP-7, and MMP-9 were

associated with poor overall survival in gastric cancer patient biopsies[38]. Further experi-

ments in gastric cancer cell lines showed that blocking EGF-induced EGFR activation could

prevent MMP7 and MMP9 activation[165].Because the metalloproteinases are responsible

64

for shedding multiple surface-bound growth factors, this finding suggests that targeting the

EGFR pathway could alter growth factor dynamics for multiple pathways. Further, targeting

both c-Met and EGFR had limited efficacy in MKN45 cell lines, a c-Met amplified gastric

cancer cell model. Specifically, siRNA knockdown of c-Met decreased activation of the im-

portant downstream signaling molecules p-AKT and p-pl3K in this cell line, though these

cells did not show an altered sensitivity to gefinitib treatment[1 75].

From a molecular perspective, NFxB appears to be an ideal co-therapy for EGFR in-

hibitors in epithelial cancers. NFxB has been associated with resistance to EGFR therapies

in cancer[1 33], and inhibition of NFxB has also sensitized cells to erlotinib treatment. Anti-

EGFR therapies showed reduction in NFxB activation, but did not drastically alter patient

outcome and survival[133]. The EGFR pathway can activate NFxB through proteasome-

mediated degradation of IkBot, yet proteasome inhibition with Bortezomib was not suc-

cessful because non-canonical NFkB activation remained intact[3, 4]. Unfortunately, IkK

inhibitors have not been effective co-therapies for EGFR inhibition[133]. Further, NFxB

targeted therapies have high toxicity [26].

The sheddase enzymes ADAM 10 and ADAM 17 are a major focus of growth factor reg-

ulation because they cleave and activate these factors. Though, these enzymes serve a

multitude of functions: they shed growth factors, cytokines, growth factor receptors, and ad-

hesion molecules, and as a result of these activities, ADAMs can actuate cell growth, death,

differentiation, and angiogensis, among others[22]. The ADAMs have widespread effects

that can either inhibit or activate phenotypes/cellular processes; e.g. ADAM17 cleavage of

EGF family ligands is activating, ADAM10 cleavage of Ephrin ligands is inhibitory. These

qualities make them unideal candidates for therapeutic intervention[1 19, 2], and thus it is

important to consider other means for modulating growth factor levels in the presence of

targeted therapies.

Here we endeavored to discover a pathway regulating TGFa cleavage and further char-

acterize novel regulators of this pathway. We first pursue an shRNA screen for kinases and

phosphatases affecting TGFx release. Knowing that RNAi screens are incomplete and bi-

ased due to off-target effects, we pursued an integrated network approach for discovering

the true TGFa pathway from this type of data. We have previously demonstrated the power

65

of integrated network tools for the de novo discovery of pathways from gene interference

screens, and believed this would be a powerful approach for the discovery of the poorly

characterized pathway affecting TGFa shedding. From this pathway, we discover a novel

connection between NFxB signaling and TGFa shedding; specifically that these pathways

share common regulators. Using this information, we explore IRAK1 as a specific modu-

lator of both NFxB and TGFx and postulate that therapies against this enzyme could have

therapeutic potential.

66

Results

A kinome/phosphatome screen for regulators of TGFa shedding

We conducted a screen for signaling regulators of TGFa shedding using a FACS based

assay (10A). We measured the relative extracellular TGFa (APC) to intracellular TGFa

(GFP) and normalize this red:green ratio for all shRNAs to the shlacZ controls (distribution

in 10 C). Most individual shRNAs fall in the region of no effect, or have a normalized ratio

with an absolute value less than 1 z-score from the distribution mean. 1 OB represents the

distribution of redundant shRNAs against a given gene; most genes have 4 or 5 redundant

shRNAs, others have 2,3,8,9, or 10 redundant shRNAs. In further interpretations, we

use the shRNA family size to represent the number of redundant shRNAs against a given

gene. This screen was originally published and validated in[25], based on a two-best

shRNAs scoring metric (7). We briefly compare different scoring methods and the gene

hits selected by these methods in 10. The two-best shRNA scoring method requires a

gene's top two shRNAs contain one hairpin with a score 2 z-scores above the mean and

one hairpin above 1.5 z-scores above the mean; this method selects 5 genes. The median

score method simply takes the median of all redundant shRNAs and finds 22 genes with

an effect 1 z-score above the mean (in the positive and negative direction). The shEnrich

method selects the most targets of all the methods. Across the three methods, PRKCA

scores in all methods. TRPV5 scores in the two-best and shEnrich method (explained

below).

67

A

TGra H'- mH acelular

.racellularGFP GFP

B

01)

E

500

400

300

200

100

02 3 4 5 8

num. shRNAs

C

4

a Soo 1000 1h00 noC 3'00 000 15W

rank

_5 4 j 2 0 1 2 3zscore

0 500 10 110 2000 2509 IAX

9 10

Figure 10: An shRNA screen measures kinase and phosphatase effects on TGFCt shedding(A) The Jurkat cell line expressed a GFP-HA tagged TGFa. After staining with an APC-coupled anti-HA antibody, we measured relative TGFot shedding by red:green ratio. (B)shRNA coverage per gene; for most genes in the library, we have 4-5 shRNAs against thesame gene. (C) Z-score normalized red:green ratios for all independent shRNAs relativeto shlacZ controls plotted as a ranked distribution and as a histogram of scores. Z-scores< 0 correspond to shRNAs which had a low red:green ratio and induced shedding, and z-scores > 0 correspond to shRNAs which had a relatively high red:green ratio and preventedshedding. The image below shows where all lacZ shRNAs (20 total) and all AXL shRNAs(10 total) fell within this distribution, and demonstrates the difficulty in determining whichgenes are consistently affecting shedding.

An shEnrich method for selecting gene candidates

To score the genes from the screen, we used an shEnrich method to measure consis-

tency of redundant shRNAs and strength of their effect. The enrichment scoring method

is conceptually modeled after the original GSEA method, and is representative of other

rank-order methods for RNAi scoring (e.g. RIGER and dRIGER)[140, 89, 171, 72, 121].

Our method most closely resembles that of Kampmann et al [72] in that our screening

readout was a single measurement (i.e. z-score normalized red:green ratios) and that we

derived our null distributions from a cohort of non-targeting controls. Our approach differed

68

3500

in that we calculated expected distributions for each gene based on the shRNA family size;

e.g. for a gene with 4 shRNAs, we calculated 1000 permutations using 4 shLacZs from

our control set. The final result of our enrichment method is a projected z-score and a

maximal enrichment score. This represents the rank at which maximal enrichment occurs

for a gene's shRNA family.

We demonstrate this selection process in 11. 10C depicts a conceptual motivation for

AXL relative to the lacZ controls. The dot plots represent where redundant shRNAs against

the same gene score in the distribution (AXL in green, lacZ in black). 11 A represents

the shEnrich score for Axi in the forward direction (highest rank reflects that an shRNA

increases shedding) and for PPP1 R1 4D in the reverse direction (highest rank indicates an

shRNA decreases shedding or maintains TGFa at the surface). Both AXL and PPP1 R1 4D

show high enrichment scores relative to the lacZ controls indicating high consistency. We

repeated this process for all genes of all shRNA family sizes in both the forward and reverse

directions. Within each family size, we compared gene shEnrich scores against 1000

permutations calculated using subsets of the shlacZ controls. 10C shows the distribution

of shEnrich scores for all genes relative to the distribution of shlacZ scores for shRNA

family size 5 (all other family sizes plotted in 14&15). Most genes did not show an effect

that was as consistent or more consistent than the shlacZ controls.

To additionally filter gene candidates for strength of effect, we plotted genes' shEnrich

score against the projected z-score (explained above) (11 D). Many genes had a low shEn-

rich score and a low projected z-score (low referring to a score with a value >-1 in the

forward direction or <1 in the reverse direction). Only a few genes had a strong shEnrich

score and a relatively high projected z-score; these genes fell in the upper-left/upper-right

quadrants (highlighted in orange) in the forward/reverse directions (scatter plots for all other

family sizes shown in supplementary 12). For our example genes, AXL did not make the

projected z-score cut-off of -1, but PPP1 R1 4D did make the 1 z-score cut-off. The full list

of genes that passed these stringent criteria is shown in 7.

We validated a portion of these gene targets in an MDA-231 cell line, measuring TGFo

cleavage via ELISA assay. Effects for PPP1 R1 4D, and PRKCA are shown in 11 D. We

have already published additional validation for PRKCA and PPP1 R1 4D[25]. Combined,

69

these experiments show that the shEnrich method can identify gene targets with an effect

on TGFa cleavage and that these effects are consistent in new cellular contexts. However,

not all targets were consistent with the shEnrich predictions; DUSP9 showed an effect on

TGFc cleavage in MDA-231 cells, but the directionality of this effect was reversed (11 D,

bottom) as compared to the effects observed in Jurkat cells. This switch in directionality

suggests that contextual factors affect the shedding phenotype, making it difficult to identify

pathways components from enrichment scores alone. This further suggests the need for

additional analysis to identify the true TGFa pathway.

For comparison, we also created tables for a "two-best" and median scoring approach.

The median scoring approach took the median value of all shRNAs against a given gene.

10 contains the results of these scoring methods. The "two-best" approach selects only

five genes as hits. The median approach identifies 22 genes as hits. While the most strin-

gent, the "two-best" approach selects genes identified with the shEnrich method, where

the median approach selects distinct hits from the other two methods.

70

A

1.2 J

B

Co sme*"

D TGFa cleavage in MDA-231

44

00

TGFa cleavage in MDA-231

4 .

Figure 11: shEnrich selects genes for consistency and effect size(A) Lightening plots show the enrichment score for AXL in the forward direction (red, left)and PPP1 R1 4D (gold, right) in the reverse direction. Grey lines represent 100/1000 enrich-ment scores calculated for shlacZ. Dashed line represents position of maximal enrichmentfor the gene of interest. (B) Normalized enrichment scores (NES) for all genes with 5redundant shRNAs plotted against 1000 NES for lacZ in the forward direction (left) andreverse direction (right). (C) Maximum NES score for all genes and lacZ controls with 5redundant shRNAs plotted against the z-score at which maximum enrichment occurs ('pro-jected z-score'). Normalized distributions and maximal enrichment plots for all other genefamily sizes are included in the supplement. (D) ELISA for TGFa cleavage in MDA-231 cellsfor PPP1 R1 4D and PRKCA (top), and DUSP9 (bottom). Each point represents redundanttests of the same shRNA; error bars are standard errors of the ratio of shRNA:shlacZ.

71

m piatelA

plate1B40 pla*38

p4Me3A

Splate2Aplate2B

mpIatelA3'0 pate3B

plale3A-l Plate2A

plIabo2B

shEmich Method

shRNA famdy size 2 DDR2 PPP5CPPPR1A DKFZP566K0524 PRG-3PII3P1 DUSP13 PRKWNK3RIOK2 EAT2 PTPRMshRNA family size 3 EEF2K RAF1CDC25A EIF2AK3 RIMS4PTPN18 EPM2A SGPP2TP53RK FBP2 SH2DIAshRNA family size 4 FLJ25449 SMG1BLK Gp 109b SNRKCKM GUCY2C SPAPIDUSP5 IMPA2 SYT14LEGLN1 INPP5F WWP2FLJ16518 IRAK1 shRNA family size 8GALK2 ITPKA CIB2GMFB LO283871 ERBB4HYPB LOC401313INPP5B LPP84KIAA2002 MAP3K1 1 vssnc

LOC400687 MAP3K71P1 shRNA family size 2LOC441868 MAPK4 CALM1LOC90353 NRI13 PPP2RIALOC91461 NT5E DUSP9NEK7 NUDT11 shRNA lamilysire 3

NUOTlO OBSCN PPAP2APHKA2 PCTK3 SSH3PIN1 PIB5PA shRNA family size 4PMVK PIP5K1A GPR109APTPN22 PLCB4 LCKPTPN5 PPAPDC1 PPP1R14DPTPRE PPAPDC1A SIK2RIPK5 PPEF1 TRPV5RXRB PPEF2 shRNA family size 5SACMIL PPFIA1 CDC14ATEX14 PPM1J ENPP1TRPM7 PPPIR12B EPB41L4AshRNA family size 6 PPP2CB NMESABLI PPP3CA PIM1AKAPI1 PPP4R1 PKIBCKS2 PPP4R1L PRKCA

Table 7: Gene hits selected by shEnrich

PCSF identifies a TGFu shedding pathway

We have previously articulated the role of off-target effects (OTEs) in RNAi screens[1 61];

specifically that both reagent-based and biology-based effects affect if a gene can be iden-

tified as part of a pathway from RNAi data. Our previous work demonstrated that data in-

tegration in a network context could construct pathways de novo despite noise from RNAi

screens. Given these results, we used the Prize-Collecting Steiner Forest (PCSF) to con-

struct a pathway for TGFax cleavage; in this process we predict additional pathway genes

not identified in the initial experimental set. Functionally, the Prize Collecting Steiner Forest

(PCSF) acts as a filter, requiring genes to have functional associations with other hits from

the screen to be included in the final pathway. To define all sets of possible associations,

we derived an interactome from iRefWeb and add predicted kinase- and phophatase- site-

specific interactions from Networkin and DEPOD (interactome construction explained in

methods). The algorithm then constructs an optimal sub-network from this interactome

by selecting edges that capture as many genes from the shEnrich ("experimental genes")

72

method as possible. In this process, the algorithm adds predicted genes that connect

genes without prizes, but pays a cost for adding these edges (12, top). We use a set of

parameters to tune the algorithm to ensure an appropriate selection of experimental and

predicted genes and explain this process in methods.

The network contains 86 genes, and 97 edges (12, bottom; full network in 18). Of the

original 108 genes selected from the shEnrich method, 43 are retained in the final network.

The remaining 43 genes are predicted by the algorithm to be relevant to TGFX cleavage.

These predicted genes include proteins that are refractory to shRNA knockdown and those

that were not in the original screen. We hypothesized that the network gene list contained

subsets of functionally related genes. To find these subsets, we leveraged edges in the

network solution; genes with a high number of inter-neighbor edges are more likely to

have functional similarities. We used the GLay Community clustering Cytoscape plug-in

to subdivide the final target set into 9 sub-modules. This process removed 14 edges (12,

bottom).

73

CASP9 $183 XIAP

CAR3

WNK3

PPPtf2BAKIO 11

Ef2K

EEF2K S78

ITP-RK F K

TP5L"15

SIK2 5 S 73

S*2

P (Rl E

BCL_ Sr87 P"C

P

/ PP1A

T '5 Nrl

TRPV5 654 3

\ NR1I3 T38

VASP 9157 PTB P1 16

Pti

SLAME11

SLAMW Y307 S

Pow- 9

- - LCK ITyr394

RAF1 S499

BCL2 S70

CASPS Y153

C(02

TS(:2gD4

PINt $138

MAl 11

lA C A

CDC25A Sor116Ser32'

*2 CD64A

Closeness Sensitiviq

AAB1 Insensitive

A Sensitive

Af1 0,668

SWId 0.394

ilCA Experimental gene, non specific

iR15i Experimental gene. specific

( Predicted gene, non specific, sensitive

BCL2 Predicted gene, specific, insensitive, high closeness centrality

Figure 12: PCSF identifies the TGFa shedding regulatory network(top) The cartoon schematic represents how PCSF selects an optimal sub-network. In-teraction edges are aggregated across multiple databases and all edges are scored (twoshown for simplicity). Prizes are assigned from experimental data; in our case prizeswere assigned to genes selected by shEnrich. The algorithm weights the cost of addingedges with capturing prizes in selecting an optimal network. (bottom)The optimal networkcontains 86 nodes (genes) and 83 edges. Grey face coloring indicates genes selectedfrom the original experimental data set; white face represents hidden nodes (genes andphospho-sites) selected by the algorithm. Node border represents specificity to random-ization (pink<=0.1; orange<=0.05), and node size represents closeness centrality (largernodes are more central and more robust). Grey node labels indicate nodes that are sen-sitive to edge noise. Nodes on the far right are annotated examples to help interpret all oftheir network properties.

74

1 -Specificity

I0 950.950.99

Robustness analysis prioritizes predicted pathway genes

To vet the candidates selected by PCSF, we performed a series of robustness and con-

nectivity analyses. We measure the sensitivity of each gene in the solution by counting

representation in 100 runs of PCSF with noise added to the edges. Genes that are insen-

sitive to noise show up in all networks and are assigned a score of 0.00, whereas genes

that are sensitive appear in a few networks and are assigned a score equal to 1-fraction of

network in which they are represented. A score of 0.99 indicates that the gene was present

in only one network. None of the experimental genes were sensitive to edge noise, though

12 of the predicted genes (e.g. MYH9, ING4, PPP1 CB) were highly sensitive and showed

up only in the solution without edge noise (Node label color in 12; quantification in8&9).

Randomization using random inputs from the initial library tests specificity of the final

solution. This process involves selecting random sets of genes from the original library,

rerunning the PCSF routine at the optimal parameters and counting gene representation

in this family of 100 random networks. Gene border color indicates fractional represen-

tation in these 100 random networks (Border color in 12; quantification in 8). Of the 43

experimental genes, 30 of these have low specificity, indicating general association among

targets from the original library; these targets may still have a role in shedding regulation

(such as PRKCA), though, we cannot distinguish these effects from general library asso-

ciation. Of the 47 predicted genes, 13 of these - PHKA2, PTPRM, ABL1, NR1 13, RXRB,

GMFB, PTBP1, CKS2, TRPM7, WWP2, PPEF2, PPEF1, PPP2R1 A, SNRK- are robust to

randomization and have high specificity.

We further use centrality metrics to quantify the robustness of our network selection.

Centrality does not imply a biological centrality per se, but reflects how the genes were

selected in this network solution. Having a high centrality implies a greater resilience to

experimental noise from the input set; mathematically, centrality reflects how interaction

edges contribute to a gene's presence in the final network. We calculated the degree,

betweenness, page-rank, and closeness centrality, and an aggregate score (1 9A) based

on interactions in the original, unclustered network. We created a normalized histogram

of centrality scores to look at the relative distribution of values (1 9B). Closeness centrality

75

best discerns connectivity, where other metrics are dominated by a few high-valued nodes

(i.e. Page-Rank, Betweenness, and Degree centrality). Using closeness centrality, we can

visualize these genes within the network solution (Node size inl 2; quantification in 8).

Lastly, we create a normalized, weighted sum reflecting an experimental gene's perfor-

mance across these three metrics. We adjust specificity/sensitivity to be (1 -specificity)/(1 -

sensitivity), and normalize closeness centrality to the max value to scale all metrics from

0-1. We weight each metric by the standard deviation for that metric and normalize to the

maximum score possible (8).

W = (1-sens) *Osens +(1 -spec)*sipec+normd*OnormciOsens +OSpec+Ofnormcl

We complete a similar analysis for the predicted genes, and again measure sensi-

tivity to edge noise, specificity using random input terminals, and closeness centrality

as a metric for connectivity (9; all nodes in Supplemental Table 2). From this analysis,

RAF1_S499, CASP9, XIAP, CASP9_S183, PIN1_S16, VASP_S157, CASP3, NR113_T38,

and PTBP1_S16 are ranked highly.

76

LCK 0.00 0.891 0.920 0.963RAF1 0.00 0.723 1.000 0.956PHKA2 0.00 0.941 0.780 0.936PIN1 0.00 0.871 0.821 0.935PRKCA 0.00 0.713 0.923 0.935PTPRM 0.00 0.921 0.730 0.921PPP3CA 0.00 0.851 0.774 0.920BLK 0.00 0.861 0.758 0.918SH2D1A 0.00 0.851 0.738 0.911PTPN22 0.00 0.832 0.732 0.907ABL1 0.00 0.901 0.668 0.902NR113 0.00 0.960 0.609 0.897RXRB 0.00 0.911 0.635 0.895GMFB 0.00 0.950 0.609 0.895PIM1 0.00 0.881 0.654 0.895PPP2CB 0.00 0.842 0.673 0.894PTBP1 0.00 0.901 0.609 0.887TRPV5 0.00 0.891 0.609 0.886IRAK1 0.00 0.911 0.587 0.883CKS2 0.00 0.881 0.563 0.873CDC14A 0.00 0.891 0.554 0.872DUSP5 0.00 0.891 0.554 0.872TRPM7 0.00 0.911 0.540 0.872CDC25A 0.00 0.505 0.801 0.872MAP3K11 0.00 0.871 0.563 0.871WWP2 0.00 0.931 0.524 0.871PPEF2 0.00 0.950 0.509 0.870PPEF1 0.00 0.941 0.509 0.869PPP2R1A 0.00 0.822 0.574 0.866ERBB4 0.00 0.802 0.562 0.860OBSCN 0.00 0.881 0.509 0.859AKAP11 0.00 0.891 0.497 0.858CALM1 0.00 0.733 0.594 0.857PPP5C 0.00 0.891 0.489 0.856PPP4R1 0.00 0.881 0.493 0.855PPP1R12B 0.00 0.871 0.497 0.855SNRK 0.00 0.911 0.441 0.847PPFIA1 0.00 0.802 0.497 0.844EEF2K 0.00 0.851 0.415 0.831SIK2 0.00 0.871 0.393 0.829EGLN1 0.00 0.851 0.390 0.825TP53RK 0.00 0.822 0.394 0.821SMG1 0.00 0.802 0.394 0.818

Table 8: Experimental genes ranked by multiple robustness metricsSensitivity represents fractional representation in a family of 100 networks created with10% noise added to the interactome edges. Specificity represents the fractional repre-sentation in a family of 100 networks created with random inputs; 1 -Specificity is used forranking as a low specificity indicates robustness to this type of randomization. Closenesscentrality reflects a node's robustness as this metric indicates association to other nodeswithin the network. Shading highlights the highest scoring nodes from this analysis.

77

BCL2 0.00 0 901 0 964 0 975RAF1 S499 000 0 950 0 876 0 961BCL2_ S70 0.00 0.970 0.849 0.958XIAP 0 00 0 950 0.851 0.955GASP9 0.00 0 941 0.835 0 950CA SP3 0 00 0 931 0 821 0.945CASP9_ S183 0.00 0.960 0.799 0.944BCL2_Ser87 0.00 0.950 0.795 0,941PIN1 S16 0.00 0.941 0.786 0.938SLAMF1 .Y307 0 00 0,990 0738 0.933NR113_T38 0.00 0.990 0,736 0.933PTBP1 S16 000 0,970 0.736 0.930LCK Tyr394 0.00 0 970 0-732 0.929VASP S157 0.00 0950 0.739 0.928TRPV5_3S654 0.00 0.950 0.736 0.927CASP9_.Y153 0.00 0.990 0.688 0.921PIN1 S138 0.00 0.950 0.670 0.910WNK3 0.00 0.950 0.667 0.909CDC25ASer16Ser321 0.00 0.931 0.656 0.904TAB1 0.00 0.842 0.703 0.901TSC22D4 0.00 0.871 0.670 0.898VASP 0.00 0.941 0.614 0.895SLAMF1 0.00 0.891 0.616 0.887PPP4C 0.00 0.881 0.572 0.875SH2D1B 0.00 0.891 0.525 0.865PTEN 0.00 0.683 0.656 0.864EEF2K_S78 0.00 0.950 0.470 0.861SIK2 T175 0.00 0.921 0.442 0.849SNRKT173 0.00 0.911 0.441 0.847TP53_S15 0.00 0.644 0.443 0.805STK11 0.00 0.535 0.503 0.803LCKS42 0.99 0.990 0.842 0.373PTENY240 0.99 0.970 0.768 0.351MYH9 S1916 0.99 0.980 0.751 0.348GMFBS72 0.99 0.980 0.736 0.345CASP9 Thrl25 0.99 0.960 0.685 0.329MYH9 0.99 0.980 0.630 0.318MAPT Ser 98Serl 99Ser202Thr2O5Thr212Ser214Ser235Ser262Ser396Ser404Ser409 0.99 0.950 0.568 0.298MYH9_T1800 0.99 0.990 0.538 0.297PTENT383 0.99 0.921 0.571 0.294MAPK3_na 0.99 0.782 0.656 0.293ING4 0.99 0.960 0.438 0.268PPP1CB 0.99 0.723 0.579 0.265

Table 9: Predicted genes ranked by multiple robustness metricsSensitivity represents fractional representation in a family of 100 networks created with10% noise added to the interactome edges. Specificity represents the fractional repre-sentation in a family of 100 networks created with random inputs; 1-Specificity is used forranking as a low specificity indicates robustness to this type of randomization. Closenesscentrality reflects a node's robustness as this metric indicates association to other nodeswithin the network. Shading highlights the highest scoring nodes from this analysis.

78

:111 11, 1111 A, 111311 1T, 11,!!

NFxB activation and TGFa shedding share common regulators

Our network contains a 15-gene cluster with high centrality, specificity, and low sensitivity

(12). This cluster contains 7 experimental genes from the original screen: PPEF1, PPEF2,

OBSCN, CALM1, IRAK1, PPP1 R1 2B, and AKAP1 1. shRNAs against all of these genes

decreased shedding except for AKAP1 1 where shRNAs induced shedding. We show the

original screening data as lightening plots in Figure 20.

Many of these genes have roles in NF-xB and apoptotic signaling; Loss of obscurins

increased cell survival and is associated with decreased caspase 3 and 9 activity[1 17].

TAB1 interacts with TAK1 which is an IKK kinase[23]. Calmodulin is needed for IKK acti-

vation during hypoxic conditions[23]. IRAK1 is an intermediate in the IL-1p induced NF-xB

pathway[27]. In a meta-analysis of late onset Alzheimer's disease, PPEF1 was associated

with disease pathology and NF-xB was one of the most perturbed pathways[85]. PPEF2

is a negative regulator of ASK1, which can disrupt NF-xB activation[96, 80]. There is ev-

idence that the NF-xB and EGFR signaling pathways positively regulate each other, but

there is not evidence of NF-xB regulators affecting TGFa cleavage.

The cluster also predicts that TAB1 and XIAP are closely related to TGFX shedding, yet

they were not included in our original analysis. To validate these predicted genes, we used

3 shRNAs to test each gene target, and measured total TGFa, surface-bound TGFot, and

plot the relative amount of TGFe at the surface (13); we made these measurements with

and without PMA stimulation. TAB1 shRNAs consistently reduce the proportion of TGFa

at the surface, and show little effect on total TGFa. shRNAs against XIAP showed mixed

effects on the total TGFa and the proportion at the surface. However, the amount of TGF

at the surface track with XIAP knockdown, and this suggests a dosing relationship between

XIAP levels and TGFa. XIAP and TAB1 also have roles in regulating the NFxB pathway

and cell death. This identifies a novel link between regulators of the NFxB pathway and

ectodomain shedding of the EGFR ligand, TGFa.

79

I-A

t

Relative TGFa at the surface (TABI)nsa

0.4- --7

unstim -- -

.1

stimulated

TGFa total (TABI)

20000- s

m stimI7 77- -

stimulated

TGFa at the surface (TAB1)i C, M

400 -

unstim - -

Mmm

100-

0-

Figure 13: TAB1 increases and XIAP decreases TGFa at the surfaceWe measure the proportion of TGFI at the surface (A,B), total TGFY (CD), and TGFa at

the surface (E, F) for 3 shRNAs against TABI (left column) and XIAP (right column). qPCRresults for all shRNAs shown for shRNAs against TAB1 (G) and XIAP (H).

80

B Relative TGFa at the surface (XIAP)

Ls-n Ia 4

stimulated

D TGFa total (XIAP)

m

atim - -- --- unStim ---- - - ---

a .. I 4 . a .a

stirrulated

F TGFa at the surface (XIAP)

4000- __ --

Unsns

aiim *

stimulated

Ii XIAP PCR

100.

50 AE

C

0.

EF

G

M

E


All authors: Douglas A. Lauffenburger (D.A.L.), Ernest Fraenkel (E.F), Andreas Herrlich

(A.H.), Eirini Kefaloyianni (E.K.), Christina Harrison (C.H.), and Jennifer L. Wilson (J.L.W.)

Contributions: J.L.W., A.H., E.F., and D.A.L. designed the project. J.L.W., C.H., E.K.

and A.H. acquired the data. J.L.W. analyzed the data. J.L.W. wrote the manuscript. E.K.,

A.H., E.F., and D.A.L. provided materials and reagents. A.H. and D.A.L. supervised the

research.

Discussion

Here, we started exploring regulators of TGFa shedding in an effort to better explain the

relationship between the EGFR pathway and its relevance to gastric cancer. Through a

combined computational and experimental approach, we identify NFxB pathway regula-

tors as affecting TGFa shedding. We validated the relevance of these gene components

through experimental validation. Our work demonstrates interplay between the NFkB and

EGFR pathways and suggests novel consequences for modulating these pathways, sin-

gularly or in combination. Though, there exists a link between NFxB and resistance to

EGFR therapies, no one has explored whether growth factor shedding is involved with this

form of resistance. It's possible that dual inhibition of NFkB and EGFR was not success-

ful because upstream modulators of the NFkB pathway affect TGFa cleavage. Targeting

the NFkB pathway downstream of these common regulators could result in more growth

factor in the environment, competing with EGFR inhibitors. We lastly hypothesize a path

forward using existing IRAK1 inhibitors, as this target sits at the nexus of the NFkB and

TGFa pathways.

Current literature has characterized interplay between NF-xB and EGFR, but has not

yet identified ectodomain shedding as a mechanistic link between these pathways. The

NF-xB pathway can promote epithelial to mesenchymal transition in cancer cells through

up-regulation of matrix metalloproteinases[63, 59]. EGFR family growth factors can initi-

ate and activate NF-xB signaling using pathways that are distinct from cytokines[133]. In

head and neck squamous cell carcinoma, EGFR and TGFt genes are up-regulated con-

81

currently with NF-xB signaling components[1 10]. The NF-xB activators, IKKat and IKKp

enhance EGFR signaling, and activation of both lKKs can induce EGFR, TGFa, and Jun

expression[1 10].

Previously, IRAK1 had limited connections to gastric cancer, but has been better char-

acterized in other pathologies. This gene has been identified as a prognostic factor in gas-

tric cancer through its seed match with miR-146a which targets both EGFR and IRAK1.

miR-1 46a was significantly under expressed in clinical gastric cancer samples, and ectopic

expression reduced invasion and down-regulated both EGFR and IRAK1[77]. IRAK1 has

been characterized in breast cancer where it is over-expressed in triple negative breast

cancer (TNBC) samples and is involved in resistance to microtubule-targeting agents. Pa-

clitaxel induces IRAK1 phosphorylation and subsequent release of pro- NFkB cytokines

preventing apoptosis[1 55]; this effect is also observed with vincristine. The group also

saw that using IKKB inhibition could not fully abrogate effects of IRAK1 - NFkB activation,

implying that an alternative mechanism may exist. Lastly, the group explored the use

of traditional Chinese medicines, Ginseng products ginsenoside Rbl and its metabolite

compound K (CK), to modulate IRAK1. The metabolite CK could recapitulate the IRAK1 -

inhibitor phenotype. IRAK1 has also already been explored for its ability to modulate NFkB

activity in myelodysplastic syndromes (MDs) [125].

Modulating the NFxB pathway could have other effects on EGFR dynamics beyond

affecting TGFa shedding. In the case of CAPE, another modulator of NFxB activation,

administration of this compound increased the anti-tumor effects of tamoxifen in breast

cancer[ll01], and sensitized ovarian cancer cells to chemotherapy[46]. In gastric cancer,

CAPE inhibited proliferation of gastric cancer cells on multiple substrates and reduced

the expression of MMP9[78]. While, we know that sheddase enzymes are involved in

ectodomain shedding, we have not characterized their role in NFxB modulation and re-

sponse to targeted therapies, the effects of this network on the sheddases themselves

remains undetermined.

While we focus on one potential therapeutic route forward, our model also predicts new

mechanistic information about XIAP a gene that is currently a focus for therapeutic devel-

opment. XIAP-TAB1 binding is an upstream regulator of NFxB signaling[21] and over ex-

82

pression of XIAP is associated with loss of apoptotic signaling in cancer[1 11]. XIAP binds

and inhibits caspases 3 and 9, and also TAK1; binding TAK1 causes ubiquitination and

degradation of TAK1 to prevent its activity with JNK[73]. NF023, a new compound, inhibits

XIAP-TAB1 binding, and thus could be a novel pro-apoptotic therapeutic in cancer[21 ]. Our

work suggests that XIAP targeting can affect EGFR signaling by modulating TGFa cleav-

age. Reducing TGFt at the surface may diminish the effects of the autocrine loop between

TGF and EGFR[79] and may be another justification for why XIAP targeting could be an

effective cancer therapy.

Our network approach is useful for identifying how signaling pathways intersect and

can affect targeted therapies. Already, single RTK-targeted therapies have had success

in circulating tumors and mixed success in solid tumors. The idea that solid tumors are

genetically more complex than circulating tumors suggests that multi-targeted therapies

may be more successful, and further, that instead of targeting multiple RTKs, a successful

approach may involve the incorporation of signaling intermediates, such as the case of ABL

as a down-stream mediator of MET[44]. While discovering how these pathways intersect

remains a difficult challenge, our application of network methods to understand growth

factor shedding demonstrates a path forward.

Methodologically, we offer a few considerations for future work modeling RNAi data: We

compared three methods based on different RNAi assumptions and find that methods that

equally weight all shRNAs diminish any possible real effects. The median method is simple

and accessible, but assumes that all shRNAs are creating an on-target effect and that the

cell cannot compensate for these perturbations. Though, given literature about seed-driven

effects, and many publications about off-target effects, we are unconvinced that the median

approach is appropriate[97, 71,161, 42]. Ranking methods such as shEnrich (e.g. RIGER)

offer the richest path forward.

The shEnrich method is robust to redundant shRNA performance and is useful for se-

lecting targets from the initial screen. This is unsurprising, as many other groups have

also shown that similar enrichment methods reduce the role of off-target effects on screen

interpretation[140, 89, 171, 72, 121]. Though, even this approach lead to inconsistent

validation results, indicating residual complexity in RNAi effects. However, using the net-

83

work approach as a filter accommodates this type of complexity and can confirm a gene's

membership in the final target set.

The two-best shRNAs method is also simple, but yields few targets. In using shEnrich,

it appeared that the enrichment score trended with the two strongest shRNAs (e.g. for

PRKCA and TRPV5). So while simplistic but not comprehensive, the two-best shRNA

approach may have merits as a first pass analysis technique.

It's unclear as to what number of redundant shRNAs are necessary and sufficient for

conducting pooled shRNA screens. The field has moved to screens with increasing num-

bers of RNAi reagents, sometimes using more than 20 shRNAs against a given gene. This

approach does increase statistical power for analyses, though it disregards the richness

of pathways-based approaches for sorting out off-target effects. Our network analysis and

scoring comparison suggests that smaller numbers (-5 shRNAs) can be sufficient when

analyzing screens with network-filtering approaches.

The network approach may also minimize seed-driven RNAi effects. In a screen for host

factors responsible for viral and bacterial infection, the authors demonstrated that specific

seed sequences were more predictive of phenotypic outcome than their gene targets[42].

In our screen, seed-driven effects may be responsible for measured TGFa cleavage. In

the case of the PRKC family, redundant shRNAs against the same gene had varying phe-

notypes in the initial shEnrich analyses, yet almost every family member had at least one

shRNA with a relatively strong effect. Second, subsequent validation of PRKCA with mul-

tiple shRNAs showed both decreased and non-affected levels of TGFX cleavage. The

network identified just PRKCZ and PRKCA as the most likely candidates; thus, network-

based approaches may be sufficient filters for reducing the tendency for seed-driven RNAi

effects.

For pathways discovery, interactome-based methods offer a means to discover path-

ways that are tissue non-specific. When conducting pathways discovery based solely on

experimental data, one is constrained by the elements expressed within the experimen-

tal tissue. By interpreting these experimental results in the context of an interactome,

we create a larger possible pathway landscape. It's possible that all components of an

interactome-derived pathway are not expressed in all tissues. In this case, filtering the

84

pathway after creation maybe appropriate. However, the initial pathway model stands as a

general reference pathway and is likely more comprehensive than a pathway derived from

any single tissue or cell model.

85

Methods

shRNA screen for regulators of TGFx shedding

The initial screen was conducted in Jurkats and is fully described in [25]. An important

feature of this screen is that it uses a Jurkat-TE cell line which expresses TGFc with an

intracellular GFP and extracellular HA tag.

shEnrich method

The shEnrich method analyzes redundant shRNAs against a given gene to select for con-

sistency and strength of biological effect (in our case, projected effect on TGFL shedding).

The method rank-orders all shRNAs and calculates a moving enrichment score (ES) based

on summing the probability of finding an shRNA within the gene's family (probability of a

hit, "phit") and the probability of finding an shRNA outside the family (probability of a miss,

"pmiss"). For each gene's shRNA cohort, there is a total effect represented by the sum of

all shRNAs z-scores; the phit is a fractional representation of how much of this total effect

is captured at each rank. The p_miss is a fractional penalty equivalent to the inverse of the

number of nom-family shRNAs in the screen.

ELISA assays in MDA-231

Validation of screen hits was conducted using MDA-231 cells plated in 96 well plates. 90%

confluent cells were stimulated with 100 nM PMA for 2 hours after which time supernatant

was collected. A microsphere-based Luminex Technology and ELISA kits (DY239, R&D

Systems) were used to measure a panel of shed growth factors, as previously described

[129].

A phosphorylation-site specific interactome

We used the set of human interactions contained in version 13 of the iRefIndex database

[124] as our source for protein-protein interactions, which consolidates information from

a variety of source databases. We used the Mlscore system [152] to assign confidence

86

scores (ranging from 0 to 1) to these interactions, which considers the number of publica-

tions (publication score), the type of interaction (type score), and the experimental method

used to find the interaction (method score). We extracted the relevant scoring information

for interactions from the iRefindex MITAB2.6 file, using the redundant interaction group

identifier (RIGID) to consolidate interactions between the same two proteins reported by

multiple databases, and used a java implementation of Mlscore (version 1.3.2, available

at https://github.com/EBI-IntAct/miscore/blob/wiki/api.md) with default parameters for indi-

vidual score weights. We only considered interactions between two human proteins (i.e.

we excluded human-viral interactions) and converted protein identifiers, generally provided

as UniProt or RefSeq accessions, to valid HUGO gene nomenclature committee (HGNC)

symbols. Once converted, we removed redundant interactions (generally arising from

isoform-specific interactions that map to the same protein/gene symbols) and retained

the maximum observed score. This produced a total of 175,854 unique protein-protein

interactions.

To this interaction set, we added predicted and experimental interactions for kinase and

phosphatase sites. We collected the kinase-site interactions from Phosphosite [60] and

phosphatase-site interactions from DEPOD[86]. Where site-specific information existed,

we created edges in the interactome, first from the kinase/phosphatase to a substrate-site

node (represented in the network as 'PROTEIN_SITE') and second from the substrate-site

to the substrate (represented as 'PROTEIN'). The kinase/phosphatase to substrate-site

interactions were scored using a modified miscore framework[152]. This scoring method

is a normalized, weighted sum of an interaction type score(Sp), evidence type score(St),

and publication value score(Sp):

S KpSp+KmSm+KtStKp+Km+Kt

Using the miscore framework as a guide, all interaction type scores (St) were set to

1 (because this interaction information was 'direct'). Publication scores were set using

the miscore scale (Sp=0.00/0.33/0.53/0.67/0.77/0.86/0.94/1.00 for 0/1/2/3/4/5/6/7 publica-

tions). The method scores (Sm) were set based on the evidence available for each data

set:

87

DEPOD Phosphosite

in vitro reaction or in vivo in one lab 0.33 in vitro evidence 0.33

in vitro and in vivo or evidence in multiple labs 0.66 in vivo evidence 0.66

in vitro and in vivo in multiple labs 1.0 both 1.0

All K values were set to 1. The interactome uses Uniprot identifiers.

Prize-collecting Steiner Forest (PCSF)

We used the genes selected by the shEnrich method and the interactome network de-

scribed above as inputs to the Prize Collecting Steiner Forest algorithm[1 46]. The genes

from shEnrich were all assigned prize values of 1 and converted to Uniprot identifiers. We

used the parameters P=10, D=10, w=1, and p=0.006 (parameter optimization explained

in supplemental methods). After finding an optimal network of genes, we augmented this

forest and added back all possible interactions among these genes from our initial interac-

tome. We converted to gene identifiers back to gene symbol and visualized the network in

Cytoscape. The PCSF algorithm is available online as part of our 'Omics Integrator suite

of tools (http://fraenkel-nsf.csbi.mit.edu/omicsintegrator/).

Sensitivity, Specificity, and Centrality Metrics

To determine a gene's sensitivity within our network solution, we create a family of 100

networks using the original prize values and genes selected from the shEnrich method,

and add 0.5% noise to the interaction edge scores in our interactome. For a given gene, we

measure sensitivity by counting the gene's representation in the family of noisy networks.

A gene that shows up in all of the networks is insensitive to noise and thus robust to

our method. Because the prize nodes are constant inputs to the algorithm, we expect

them to be insensitive to edge noise. To measure specificity, we again create a family

of 100 networks. This time, we hold the interactome constant and randomly select gene

targets from our initial library as inputs. We run the algorithm at the same parameters as

above and count a gene's representation in the family of random networks. We calculate

1-specificity to rank genes that are specific to the real experimental data and eliminate

88

biases in selection due to the initial library construction. For all centrality metrics we use

the networkX module in Python.

Measuring surface bound TGFot in Jurkat cells

For all measurements in Jurkat-TE cells, we used the following staining and washing pro-

cedure: Jurkat-TEs were spun out of media and re-suspended in 25 uL of 1:100 goat

anti-HA (lot #) on ice for 1 hour. They were washed 3 times with 200 uL cold PBS with 3%

FCS, and resuspended with 25 uL 1:100 APC-coupled anti-goat (lot#) on ice for 2 hours.

After secondary staining, they spun down, and were resuspended with propidium iodide

(PI). Relative TGFx at the surface was quantified using FACS. We analyzed all FACS data

using FlowJo. We measured the FITC and APC-A geometric means in the PI-negative

(live) population. Relative TGFa is determined from the ratio of red:green fluorescence

and represents the relative amounts of extracellular:intracellular TGF.

shRNA knockdown experiments

We ordered three Mission SHRNA Transduction Particles from Sigma Aldrich for TAB1

(clone IDs: TRCN0000381913, TRCN0000380746, TRCN0000380345) and XIAP (clone

IDs: TRCN0000231575, TRCN0000231576, TRCN0000231577). Jurkat cells were plated

at 2.5x105 cells per well in 96 well v-bottom plates, and infected with 1 uL of virus in

media with 4ug/mL polybrene. After 24 hours (day 1), cells were switched to media with

2.5ug/mL of puromycin. On day 3, fresh media with puromycin was added. On day 5, cells

were switched to regular growth media and allowed to recover before stimulation. On day

7, cells were either stimulated with 100 nM PMA or left in media. At 5 minutes cells were

spun down, media was removed, and we measured surface bound TGFa as described

previously.

89

Supplemental Methods

Parameter Optimization

We first optimized the algorithm and parameters to minimize selection of "hub" nodes (or

genes that are highly connected due to publication bias). We set these as our "grey listed"

targets and quantify their representation in 100 runs of the algorithm at various degree

penalty values (controlled by the mu parameter; data now shown).

We further optimize the remaining parameters - P, D, and w. These parameters con-

trol the algorithm's inclusion of experimental genes ("terminal nodes") relative to non-

experimental genes ("hidden nodes"), network size, and relative density. We ran PCSF

over 36 parameter configurations, using all combinations of the following sets: P [1,10,100],

o [0.1, 1, 10, 100], and D [10, 20, 30]. Briefly, p sets the relative cost of capturing gene

(node) prizes with adding edges, o tunes the initial cost associated with connecting the

dummy node to all terminal prizes in the input set, and the depth, D, controls how long a

given pathway goes in the final network. We look for a balance of efficiency (ratio of ter-

minal nodes:hidden nodes), network size, and tree composition (rejecting parameter sets

that contain any sub-trees with 3 or fewer nodes.

We find the optimal network parameters by looking for a high efficiency (high termi-

nal:hidden node ratio), reasonable network size relative to the input set, and exclusion of

grey-listed nodes (Figure 17). After vetting the sub-network creation, we select an optimal

parameter setting of p=1, D=30, co=10, and ji = 0.006 (Figure 12).

We also consider representation of hub nodes, or nodes that have extremely high con-

nectivity in the interactome, and consider how often a parameter set includes these hub

nodes in the final solution. Prior to running PCSF simulations, we identify a set of grey-

listed nodes by ranking all nodes in our interaction by degree and taking those with a

degree > 200 to add to the grey list. At each parameter set, we count the fractional rep-

resentation of these grey nodes in 100 runs of PCSF with random input prizes. Each

heatmap shows this fractional representation across the previously mentioned 36 param-

eter configurations. The final optimal parameters were P=10, D=1 0, co=1, and pL=0.006.

90

Supplemental Figures and Tables

forw~ard NLS for al,

71 fAnily size 2

tl1~

forward NES fu, )Ifamily size 4

genes

I

A

forward NES for afarnily size 9

ag41,1 laZ

genres

LiEn' I I ~ I I06 08 10 12 14 1f 18

,nr o Nt 1

F1

1150

for ward NIS for alfArrily size 10

im 18C

0.8 20 1.2 14 1 b 8 2.0 2e

Figure 14: histograms for shEnrich (forward)Forward normalized enrichment scores (NES) histograms for shRNA families with

2/3/4/8/9/10 (A/B/C/D/E/F) shRNAs.Histograms for 5-shRNA family shown in 11. In all

panels, the dark purple distributions represent a distribution of shRNA scores calculated

from lacZ controls and the light pink distributions represent scores calculated using gene-

targeting shRNAs.

91

acZgenes

B

C100

'orward NLS for allfarriiy siZe:3

genes

foi ward NES for adfamily size: 8

genes

200

I1D

E

MRL

L&kL

'I

10D

L

2i 3 0.8 1 C 1,

AE B,I t'

NFIC

E F[UY-:7

Figure 15: histograms for shEnrich (reverse)Reverse normalized enrichment scores (NES) histograms for all shRNA families with2/3/4/8/9/10 (A/B/C/D/E/F) shRNAs. Histograms for 5-shRNA family shown in 11. In allpanels, the teal distributions represent a distribution of shRNA scores calculated fromlacZ controls and the yellow distributions represent scores calculated using gene-targetingshRNAs.

92

BA NEI 1 1 CY - 3

A B

A*I,

E

H

:i.

/

K

F

*11~L

L.

Figure 16: Scatter plots for shEnrich methodScatter plots for all shRNA family sizes in the forward (A-F) and reverse directions (G-L).

In all plots, each individual point represents an shEnrich score (y-axis) and max effect size

(x-axis) for each gene (colored points) with a particular shRNA family size or a random

sampling of lacZ controls (black points). Orange rectangles define a region for selecting

gene targets for further modeling. This region is defined as having an shEnrich score

above all lacZ controls and a max effect size < -1.0 z-scores (for forward calculations, A-F)

or an shEnrich score above lacZ controls and a max effect size >1.0 z-scores (for reverse

calculations, G-L). For most genes, they were no more consistent than the lacZ controls

and did not make the threshold for further modeling.

93

D0

-~G

J

B C

J.34I1

I

A D

B EE 60.20

'60r

C F

- d

Figure 17: Optimization of network node selection

We optimized the PCSF parameters, beta, omega, and depth. (A-C) show fractional rep-

resentation of 'grey nodes' (the nodes with the highest connectivity in our starting inter-

actome) in 100 random networks created with depth=1 0 (A), depth=20 (B), and depth=30

(C). Each row represents an individual grey node, and the columns represent different

combinations of omega (0.1,1,10, and 100) and beta (1,10,100). We additionally measure

network efficiency (ratio of terminal:hidden nodes) (D), the relative network size (E), and

the number of 1/2/3-node trees (F) at each of these parameter combinations.

94

ii

P*c

/ Ts ?D4 Ty

PIN _S138 PTBP1 S16

-- 1

EVAsp 8157 TRPV5 654

Ei1iBCL2 SaL7

EPZ2 $72

cAS#9 Y153 ASP9 183 4*3qNK3

RA1 Si199

PPPi12B OLN

A A

CDC25A Der 18Ssr321

CD 4A

jyt394 TPqXRK

'2 TP 1i5

SNII Ti 73

RKCA Experimental gene, non specific

tj:1 Experimental gene, specific

Predicted gene, non specific. sensitive

IBCL2j Predicted gene, specific, insensitive, high closeness centrality

Figure 18: The full network selected by PCSFGrey face coloring indicates genes selected from the original experimental data set; whiteface represents hidden nodes (genes and phospho-sites) selected by the algorithm. Nodeborder represents specificity to randomization (pink<=0.1; orange<=0.05), and node sizerepresents closeness centrality (larger nodes are more central and more robust). Greynode labels indicate nodes that are sensitive to edge noise. Nodes on the far right areannotated examples to help interpret all of their network properties.

95

EEF2v S78

E10K

SH211B

-jSLAJJF1 Y307

Normalized Closeness

1.00

A1 0.668

Sensitivity

1 Insensitive5/ Sensitive

1 -specificity

0.950 99 SW31 0394

_141111911111- - - - -

Genes oinneo oy centrahity metrics

L1

LFi

Uj Ki3

IS

CentralwC' tyv"

Figure 19: Centrality metrics test gene robustness.(A) Network genes ranked by Degree, Betweenness, Page-Rank, and Closeness centrality.The Aggregate represents the multiplicative sum of all ranks. (B) Gene membership incentrality metric bins. Bin 1 contains genes with low centrality scores and bin 15 containsgenes with high centrality scores.

96

C 1

CASP9 183 LAP

CASPt. t W NA~3

PF 3 H

PPPO 2BAK 11

1 K

NJ

--------- K

V -i:-- ;~h~

Figure 20: Lightening plots for NFxB regulators.The cluster of genes is shown in the upper left. Below, lightening plots for all experimentalgenes are as in 11: the enrichment score (ES) is plotted against the ranked shRNAs. TheES increases each time that an shRNA at a ranking targets the gene of interest. Otherwise,the ES decreases linearly. The grey lines are 100 representative plots of subsets of thenon-targeting shlacZ controls. Genes which are enriched in the forward/reverse directioncorrespond to genes whose shRNAs increase/decrease shedding.

97

ORM I S ft, USI-N

Two best ShONA;Gene shRNA_ I shRNA-.2PHKCA 3 1/2 1 430I HPV, 2042 1.530PPP1R14D 2.455 2.220MAST2 2.684 2.621PPEF1 2.676 1.608

MAT? 1 790

PKIB 1 472

DUSP6 1.292SSH1 -1.275FASTK 1.225PTPRE -1 219PTP4A1 1.178GUCY2C -1.121INPPSA -1101PPAP2A 1.090SSH3 1.068GMFG 1.059CDC14A 1.052DUSP9 1 050IRAK4 1 048SGKL -1,042STK33 1.042ERBB3 1.035INPPSD 1.030PPP4R1 -1.029CHP -1.024TPS3RK -1.008

Table 10: Comparison of alternative scoring metrics for redundant shRNAs.The median score reflects the median value of all shRNAs against a given gene. Medianvalues that are above or below +/- 1 z-score are shown. The two-best approach requiresthat a gene have one shRNA with an abs(z-score) > 2.0 and a second shRNA with anabs(z-score) > 1.25 to be considered a hit.

98

Part V

Mutual information identifies gene cohortspredictive of response to Dasatinibtreatment in Acute Lymphoblastic Leukemia

Abstract

Many patients suffer relapse after treatment with Dasatinib, a second-line tyrosinekinase inhibitor, yet, little is know about what genetic factors govern this process. Wesystematically measure a 20K pool of shRNAs in vivo to identify genetic mediatorsof Dasatinib response in an pre-B-cell Acute Lymphoblastic Leukemia (ALL) model.Mouse to mouse variability greatly affects the ability to easily measure shRNA effects inthis environment. We develop a mutual information heuristic to find cohorts of shRNAsgoverning response to treatment and identify a gene set that confers a growth advan-tage to ALL prior to treatment and a disadvantage following Dasatinib treatment. Theset in includes AF366264, Art4, Fati, GtpbplO, KIf13, Nxphl, and Ube2ql. Thesegenes have known associations with leukemic progression but have not been exploredas prognostic markers for predicting response to Dasatinib.

99

Introduction

Dasatinib is a second-line tyrosine kinase inhibitor (TKI) for treating hematopoietic can-

cers, yet many patients still experience relapse after this secondary treatment[137]. Point

mutations in the drug target, BCR-ABL fusion protein, are one source of resistance. In

one mutational analysis, 78% of patients harbored single mutations in the BCR-ABL, and

58% had multiple mutations in this protein[137]. Therefor it is desirable to identify which

genes modulate response to Dasatinib and thus infer which genetic lesions could abrogate

or enhance response to this therapy.

As understanding of complexity in cancer genetics increases, the field has started

adopting a systems-wide approach. It's becoming more clear that single mutations or sin-

gle sensitivities are individually insufficient for explaining disease progression, response to

treatment, and other complex phenotypes. Rather, whole pathways or groups of genes,

mutations or proteins are more informative for understanding these phenotypes, such as

drug resistance(161, 551. High-throughput screens are a tool which can uncover such

pathways.

High-throughput rna-interference (RNAi) screens in vivo are one tool for probing ge-

netic pathways and their contribution to disease. These screens have identified novel

targets in hematopoietic malignancies[97, 92]. One screen used stably integrated short-

hairpin RNAs (shRNAs) to identify genes relevant to Lymphoma progression in vivo. This

shRNA library tested thousands of targets simultaneously in mice, and identified compo-

nents of cytoskeletal dynamics specifically as in vivo regulators of the disease [92]. RNAi

experiments in mice can now accommodate tens of thousands of shRNAs and multiple

treatment conditions, expanding the use of these loss of function techniques.

Here we sought to discover genes contributing to Dasatinib response in Acute Lym-

phoblastic Leukemia (ALL). To understand this effect in vivo, we introduced an shRNA

library into mice using a B-cell ALL model and tested whether or not cells representing

shRNAs remained in the circulation before and after Dasatinib treatment. However, due

to mouse to mouse variation, there is often differential shRNA response across mice un-

der similar conditions. This consequence masks the effect of some shRNAs that may be

100

relevant to disease and/or treatment effects, and necessitates novel methods for extract-

ing these potentially informative shRNAs. Here, we employ a modified mutual information

heuristic which analyzes sub-sets of shRNAs in each mouse and looks for patterns of

co-acting shRNAs across mice. From these co-acting subgroups, we hypothesize new

pathways relevant to disease progression, and response to treatment.

101

Methods

shRNA screen and Dasatinib treatment

We infected B-cell acute lymphoblastic leukemia cells (B-ALL) in culture with a 20K library

of shRNAs at a multiplicity of infection < 1. After integration, we FACS-sorted cells for

shRNA integration and tail-vein injected 1x106 cells per mouse. At the time of injection,

we reserved 1x106 cells as the input sample. Leukemia developed over 11 days, at which

time, we sampled pre-B-cell ALL cells from the blood as the pre-treatment sample. After

sampling, mice were treated with Dasatinib at 10 mg/kg for 3 days. Mice were then allowed

to relapse, at which time, we sampled B-ALL cells as the post treatment sample. All

samples were then sequenced to determine shRNA representation.

Data Processing, clustering, and mutual information

We filtered out all shRNA counts below 100 reads and performed size-factor normalization

on raw sequencing counts using DESeq2[5]. To further reduce noise across samples, we

calculated the natural logarithm of the fold-change between pre and input samples (early

change, "pre-treatment") and post and pre samples (late change, "post-treatment"). We

collapse these fold-changes into a single (x,y) variable for clustering and plotting. In the

end, we have one data vector for each mouse (5 in total: "mouse 2", "mouse 3", "mouse 5",

"mouse 8", "mouse 9"), where each data point is a (x,y) coordinate representing the (pre-

treatment,post-treatment) log-transformed fold changes. We used hierarchical clustering

to compare the pre-treatment and post-treatment data across the five mice that survived

the screen using the Pycluster module in Python. We next clustered data points within

each mouse vector using Pycluster's K-means clustering algorithm. We used Euclidean

distance and k = 2,3,4,5,6,9,10 initially. We determined that 9 clusters represented the

'elbow' on the intra-cluster distance vs. k plot and selected 9 clusters as ideal divisions

among the data (explained in results). The selection of 9 clusters was also consistent with

our anticipated number of shRNA groups given the possible combination of responses at

each time point. We employed a mutual information metric to compare the overlap between

mice. The mutual information metric is given by the following:

102

Equation M.I. = ijp(a, b) * log (ab))

Where p(a,b) = (no. of shRNAs in both cluster a and cluster b)/(total shRNAs); p(a) M.I.

= (number of shRNAs in cluster a)/(total shRNAs); and p(b) = number of shRNAs in cluster

b)/(total shRNAs); summed over all clusters ij in the two mice under comparison(similar

to[138]).

From this analysis, we were interested in the extent to which each of the inter-cluster

components were contributing to the overall score between mice. Identifying these sub-

groups is consistent with our aim to find cohorts of shRNAs responding to disease and

treatment. To identify these clusters, we separately plotted each component comparison

in a heat map. The components are equivalent to p(a,b) * log(p(a,b)/p(a)p(b)) for a two

cluster comparison.

To correct for the amount of overlap expected by chance we background-corrected the

component scores using randomized data. For each mouse, we created random clusters

by dividing the shRNA data into 9 bins of the same size as the original clusters, and re-

peated the M.1. component calculation 100 times. We averaged these random M.l. scores

and subtracted this 'background' value from the original value. This correction method also

accounts for differences in cluster size as set by the k-means algorithm.

We used a standard hypergeometric test to compute the significance of overlap be-

tween clusters and used the scipy package in python for this calculation.

Bi-clustering

We used the clustergram function in MATLAB to bicluster the normalized, fold-change data.

In vitro validation of shRNA cohort

We designed 97mer sequences using the Euphrates website (http://Euphrates.mit.edu/cgi-

bin/shRNA/index.pl). We ordered the following oligos (forward and reverse sequences

represented for each gene):

103

AF366264AF366264Art4Art4FatiFatiGtpbpl0Gtpbpl0KIfM3KIf13Nfat5Nfat5NxphlNxphlUbe2q1Ube2ql

V2MM191661V2MM19166_2V2MM_17020_1V2MM_17020_2V2MM_109794_1V2MM_109794_2V2MM35901V2MM35902V2MM1766071V2MM_176607_2V2MM_2869_1V2MM_2869_2V2MM_981121V2MM981122V2MM_9292_1V2MM92922

TGCTGTTGACAGTGAGCGCGGGCAGAGAAATGTATCAAATTAGTGAAGCCACATGAAGCCACAGATGTAATTTGATACATTTCTCTGCCCTTGCCTACTGCCTCGGATGCTGTTGACAGTGAGCGCGGAGACTATTCATAAAGGAATAGTGAAGCCACATGAAGCCACAGATGTATTCCTrATGAAATAGTCTCCTTGCCTACTGCCTCGGATGCTGTTGACAGTGAGCGACCTGTCTGAAGCATCTGTAATTAGTGAAGCCACATGAAGCCACAGATGTAATACAGATGCTTCAGACAGGGTGCCTACTGCCTCGGATGCTGTrGACAGTGAGCGACGCACTCTTAGCAATTAATAATAGTGAAGCCACATGAAGCCACAGATGTATTATTAATTGCTAAGAGTGCGGTGCCTACTGCCTCGGATGCTGTrGACAGTGAGCGCGGCATAGATTCTATATTGTAATAGTGAAGCCACATGAAGCCACAGATGTATTACAATATAGAATCTATGCCTTGCCTACTGCCTCGGATGCTGTTGACAGTGAGCGACCTGTATCAGTGGGAATATATTAGTGAAGCCACATGAAGCCACAGATGTAATATATTCCCACTGATACAGGCTGCCTACTGCCTCGGATGCTGTTGACAGTGAGCGACCCTTCAAGGTGATCTGTATTrAGTGAAGCCACATGAAGCCACAGATGTAAATACAGATCACCTTGAAGGGCTGCCTACTGCCTCGGATGCTGTTGACAGTGAGCGCCCAGAGGCAAGATTACTTAAATAGTGAAGCCACATGAAGCCACAGATGTATrAAGTAATCTTGCCTCTGGTTGCCTACTGCCTCGGA

We PCR amplified these sequences, extracted the 138bp product and ligated into an

MLS + GFP vector. With this vector we transformed competent cells and verified liga-

tion products by sequencing. We maxi prepped DNA from transformed competent cells,

cleaned DNA using a zymoclean kit, and used this DNA to transfect Phoenix cells. We

collected virus from Phoenix cells and used this to infect tomato+ pre-B-cells.

We plated 2.0 x106pre-B-cell ALL cells into 6-well plates. For each shRNA construct

(including MLS) we tested 3 volumes of viral media (1.5, 2.0, 2.5 mL). After 24 hours, all

wells were sampled via FACS and the sample with the greatest infection was considered

the input (day 0) measurement for each construct. This sample was split 1:3 and main-

tained separately in culture for the duration of the experiment. The cells were split every 24

hours, we used FACS to measure the GFP percentage. On day 6, the cells were switched

to media with 1 nM Dasatinib. FACs analysis was conducted using FlowJo.

104

4 pre-B-cells representing20K shRNA pool

Input Pre Post

Amplify and sequence for shRNA representation

Figure 21: A longitudinal screen for mediators of ALL and response to Dasatinib

We introduce a 20K pool of shRNAs into pre-B-cells in culture (top). We take representa-

tive samples of this population and transplant them into syngeneic mice recipients. At the

time of input we take a representative sample and sequence for shRNA representation.At disease representation, we sample blood from leukemic mice and again sequence for

shRNA representation. At this time, we administer Dasatinib treatment. Following treat-

ment, we again sample blood and sequence for shRNA representation. We can charac-terize shRNA trends by looking at the fold-change of any shRNA at the early time point

(pre-treatment/input) and the late time point (post-treatment/pre-treatment).

Results

An shRNA screen measures shRNA representation before and after treatment

We conducted a longitudinal shRNA screen to identify genetic mediators of ALL and re-

sponse to treatment. We represent the work-flow for this experiment in Figure 21.

We normalized sequencing reads across mice, filtered reads for high variance, and

calculated early (pre-treatment/input) and late (post-treatment/pre-treatment) fold-changes

(Figure 22). We used hierarchical clustering to identify distinct groups between the early

and late change data points (Figure 22).

A mutual information heuristic selects cohorts of co-acting shRNAs

We chose to look for trends across pairwise comparisons of mice because we expected

variability across mice due to differences in library representation. The longitudinal na-

ture of the screen and the added effects of treatment make it difficult to identify a strict

105

I I S VI.H 14-

-i 4 s% .- Tt /.

I .

%I~

i,,n

. am ia

UI

r'i

L aetJ"'.iU o m

I M[4G-U. a'J

0.37 -0.25 1&

0.12-0.00-0.12-0.2-0.371

Figure 22: Hierarchical clustering of 'early' and 'late' fold-change dataBlue/red indicates shRNAs which deplete/enrich from input -> pre-treatment or from pre-treatment -> post-treatment. Color bar shows the scale of fold-changes. An enlargeddendrogram is included on the right.

106

fold-change threshold for identifying shRNAs related to disease progression and treatment

response. We were interested to see if mutual information could be extended to this prob-

lem because we saw it as a means for avoiding the need to define fold-change thresholds.

To identify cohorts within mice, we first binned the shRNAs using k-means clustering

on the log-transformed fold-change data. For each mouse, we sampled k=2,3,4,5,6,9,and

12 clusters. At each value of k, we measured the intra-cluster distance and looked for the

inflection point at which this distance stopped decreasing (Figure 23). For all mice, the

inflection point for intra-cluster distance was found at k=9. This is fitting because we hy-

pothesized 9 possible groupings based on the possible shRNA response at (early/late) time

points: (no-change/no-change), (no-change/increase), (no-change/decrease), (increase/no-

change), (increase/increase), (increase/decrease), (decrease/no-change), (decrease/increase),

and (decrease/decrease). We plot all of the shRNAs clustered per mice (Figure 24). K-

means clustering does not cleaning separate the data and it's possible that other clustering

metrics, such as fuzzy k-means, would be better suited to this data set. It is also the case

that many of the shRNAs have subtle effects making it difficult to cleaning separate them

into distinct bins. However, this approach was strongly motivated to reduce the necessity

of setting specific fold-change cut-offs. K-means clustering allowed for an unsupervised

means to create these bins within mice without having to manually set thresholds for each

mouse.

We first look across mice by asking by looking at how much overlap existed between

all pairwise combinations of inter-mouse clusters. We use Equation I and plot the results

as a heatmap in Figure 25. Because we expect the self-comparisons to have the highest

possible mutual information, we compute the highest possible mutual information score

for all comparisons and subtract this background score to more clearly compare between

mice. As expected, self-comparisons drop to near zero, but the comparisons between

other mice remain positive. Mouse 2 and mouse 9 were the most similar, and mouse 8/9

and mouse 2/3 also having relatively high mutual information.

107

ise 2

mouse 5

mouse 9W4ltha 1uster sun 0 d starte,

mouse 3

mouse 8

Figure 23: Intra-cluster distances for all miceFor each mouse, we used a k-means clustering algorithm to cluster shRNAs and calculatethe intra-cluster distance based on the early and late fold-change values.

108

I

MOMEM116-

mou

-2 4 6

mouse 2

t a... .I

mouse 5

mouse 9

*-- -.-

mouse 3

mouse 8

I 4~SO

Figure 24: K-means clustering for all mice with k=9Each point is an shRNAs where (x,y) represents (early, late) fold-changes.

109

0.9

0.8

0.7

0.6

0.5

0.4

-0 3

0.2

0. 1

00

0.105

0.090

0075

0.060

- 0,045

- 0.030

0.0 15

0.000

Figure 25: Mutual information between mice shows some mice are more similar than oth-ersWe used Equationi to determine similarity across mice and plot the raw scores (top) andthe background-subtracted scores (bottom) as a heatmap.

110

mouse 2 c 0mouse-2 c I

mouse 2_ 0,120mouise 2 e-mouse 2mouse 2mouse_3- 0.105mouise 3moose 3mnouse 3 e-moujse 3mouse-3 0.090mouse 3mouse 3mouse 3

0

,mouse-5-,mose 5 -0.075

mouse c -nouse S ,

mouse 5 1mouse 5 0moure-80

mouse E

mouje 9

moc-se-8-

mouse 0mouse s 0.015mouse 9mouse 8

mouse_0mouse 2mouse-9 0.015

mouse 9mouse 9

mouse' 0.0150

mouse 3 . -0mouse 2 -

mouse_4

mou e_ , - 0.0100mojse J

mouSe 5

Mouse2

mouse 5 2noss.5 8 .

mous:3 0.0175mousez

5

nooe910.00250

mousenouse .4

mc, Se~s, 0.0125

mouse5 - 7mouse 5 c

ose 5 e -c

mouse_ 7 e0.000mouse5

S

mcous e 5 4 -

mouse_5 c BI,)- se 6 0.0050mouse 8 8mou-,se-9 emous~e 9

monse_892 0.00250

m e'9 4-

mouse9~-6 c -7 0.0000mouse 9_c:_8I

Figure 26: Component mutual information scores show that few clusters have high mutual

information.We plot all pairwise comparisons in raw scores (A) and background-subtracted scores (B).

111

m

0 r

I I *.

" %*I .*; *

- * U U

UE U

* dl

If MA-

I1 -

m Na ICI

M= r,

0 OI I

mm 1&m.

I In. I IN m 0 v I N

a ~*iu161 U* 71

%~~ %= -_%, - . 1

U ii

Figure 27: Hypergeometric testing of component comparisons.We show the raw Fisher's exact p-values (A) and the bonferroni-corrected p-values (B).

112

* U

mu -Ua

Ue

U -

U.

UaQU

I i f I i i , a | i i i 6 a i r i a a i ] ieU

-

I A

I

-7

shRNA cohorts identify gene targets affecting disease progression and response to

treatment

We then wanted to know which groupings of shRNAs were contributing to the mutual in-

formation observed across mice and if any of these groupings were statistically signifi-

cant. To test this hypothesis, we plotted the "component" mutual information scores for all

cluster-cluster comparisons. In this context, a "component" refers to any individual shRNA

cluster. We plot a heatmap of all component comparisons in Figure 26. As expected, self-

comparisons had the highest mutual information scores in the raw values, and intra-mouse

comparisons had a score of zero (because no shRNA could be assigned to two different

clusters). To mitigate the strong effect of self-comparisons, we again calculated an average

expected mutual information score for all cluster comparisons, and background subtracted

this score from the raw values (Figure 26). The background-subtracted heatmap highlights

clusters that have more shRNA overlap than expected by chance. Clusters which have

relatively high overlap include mouse 2 cluster 0 and mouse 8 cluster 1, mouse 9 cluster

0 and mouse 3 cluster 1, mouse 9 cluster 0 and mouse 5 cluster 0, mouse 5 cluster 4 and

mouse 8 cluster 1, and mouse 9 cluster 4 and mouse 8 cluster 1. This analysis suggests

that there is relatively high overlap of shRNAs in each of these clusters, and we can infer

that the shRNAs are showing similar changes at early and late time points across mice.

Before investigating which shRNAs belonged to which cluster and the direction of their

effects, we wanted to vet our method statistically, and with other machine learning ap-

proaches. In our adaptation of mutual information, we used background subtraction to

correct for expected overlaps. Fisher's exact test is a standard, well-recognized method

for looking at statistical significance between gene groups. To compare results from our

method, we calculated a Fisher's exact test for each component comparison and use a

bonferroni mutilple hypothesis correction to filter component scores that remain significant

(Figure 27). As expected, self comparisons were the most statistically significant (because

they had 100% overlap in shRNAs). Only three other cluster comparisons remained sig-

nificant - mouse 8 cluster 1 and mouse 2 cluster 0, mouse 9 cluster 0 and mouse 3 cluster

1, and mouse 5 cluster 4 and mouse 8 cluster 1 (Figure 27).

113

Mcaa Fbxll? LelDoc*3 RimkItP L21 (rid ,Olank LrriqiD2Eting s 70002B 1 RikgynLJgt Ia2 6,3G407111ikOllr8S4 [inda

Gm51011AF366264(deliMatsUgtla2I rim37

Table 11: Gene-targeting shRNAs in significantly overlapping clusters

We plot these clusters and look at shRNA fold-changes at pre/input and post/pre (Figure28).

Group 1 shows mouse 8 cluster 1 (m8c1) and mouse 2 cluster 0 (m2cO), between which

there are 12 overlapping shRNAs (Figure 28, top). M8cl contains shRNAs that enrich from

input to pre-treatment and then deplete from pre-treatment to post-treatment and m2cO

identifies shRNAs that enrich both from input to pre-treatment and from pre-treatment to

post-treatment. Of these 12 overlapping shRNAs, there are 5 targeting known genes (Ta-

ble 11). Group 2 compares mouse 9 cluster 0 (m9cO) and mouse 3 cluster 1 (m3c1). Both

of these clusters identify shRNAs that enrich from input to pre-treatment and from pre-

treatment to post-treatment. Within this comparison there are 19 shRNAs represented in

both clusters; 13 of these shRNAs target annotated parts of the genome (Table 11). Group

3 contains mouse 5 cluster 4 (m5c4) and mouse 8 cluster 1 (m8cl) (Figure 28, bottom).

Between these clusters there are 6 overlapping shRNAs, 2 of which target genome regions

with known annotations (Table 11). M5c4 identifies shRNAs with enrich at both early and

late time points.

M8cl had significant overlaps with both m2cO and m5c4. Across all three clusters 2

shRNAs are represented in all three clusters; only one of these genes targets a gene with

known annotation: D2Ertd750e.

Group 2 (Figure 28, middle) contains the highest number of overlapping shRNAs with

known gene annotations. If we also consider overlap of gene targets (instead of consider-

ing overlap of individual shRNAs), the number of gene targets identified as overlapping in

this cluster increases to 32 (Figure 29). The full list of gene targets from this cluster is in

Table 12.

114

G ru :l 1? qhRNA

Mouse 8, cluster 1 (50 shRNAs) Mouse 2, cluster 0 (67 shRNAs)

Mouse 9, cluster 0 (50 shRNAs)0.8

,0 00.6 0

0000. 0 0

r~ %

06

00

0.8-

0.

0.8

-0.2

O

000.4

-0.2-' pre/input

Group 2: 19 shRNA overlap

Mouse 3, cluster 1 (64 shRNAs)

0.4

0L

-0.2-0.2-

0.4 0.6

0.4-0

0

8 0

A9M a a noel0.4 00.6

pre/input


0.2 0

0.8Odm 0.4 0 0.6

0 0

bpre/input


0

-0.5 0.0

0

(D0

,0*' _ 00S6b 0

0

Figure 28: shRNA fold-changes for clusters with significant overlapGroup 1 (top, red) contains mouse 8 cluster 1 and mouse 2 cluster 0. Group 2 (middle,blue) contains mouse 9 cluster 0 and mouse 3, cluster 1. Group 3 (bottom, orange) con-tains mouse 5 cluster 4, and mouse 8 cluster 1. For all figures, open circles represent allshRNAs in the cluster and red/blue/yellow indicates shRNAs which overlapped betweenthe two clusters.

115

0.2-

-0.2

-0.6-

00

pre/input

0

-0.2 0.0 0.2

pre/input

-0.2 D

-0.6-0.5 1.0

pre/input

Mouse 9, cluster 0 (50 shRNAs) Mouse 3, cluster 1 (64 shRNAs)

0.8- 0.850 * Fat1 .8Fat1

0010 * GtpbplO * GtpbplO

0 0 KI13 * KIfM3

0 0-4 0 0 Nxphl

SUbe2ql 0 Ube2ql0

0 0 post/pre

0 post/pre

overlapping genes -0.2 -. 0.4 00.6 overlapping genes

-0.2 0.0 0.2 0.4 0.6 2 prelnputpreoinput

Figure 29: M9cO and m3cl identify overlapping gene targetsScatter plots represent all shRNA fold-changes (open circles) in m9cO (top) and m3cl (bot-tom) at early (x-axis) and late (y-axis) time points. Grey circles highlight shRNAs targetinggenes that are represented in both clusters. Coloring highlights specific genes and theshRNAs targeting those genes in each cluster.

Gem tP% onnd in mo4 and mnu11190020J1290 GmOSdO OXLad41700025811 Rik GtpbplO Fhoa

633040 7011 Fik K01113 HImpIsAF386264 I.,aia SI(A Ila

B u rck3 b kql Imnes3L.1l Ncapg mrmdFall Nats Tinm3Fball 1 Nfe?12 lUhe2ql1(i'li Nxph1 [Jgtla3

Gll 01~11419 Zip7iSGmS11)1 01111290

Table 12: Gene targets overlapping between m9cO and m3cl

Biclustering benchmarks mutual information heuristic

We used biclustering to bench mark against an alternative, and widely-adopted, unsuper-

vised machine learning approach3O. Bi-clustering separates all early fold-changes from

late fold-changes. We can manually select subsets of the data and notice trends that were

also selected in the mutual information heuristic. Branch '931 " in the bi-clustered heatmap

overlaps with gene selections identified by the m9cO and m3cl overlapping cluster. This

section of the clustergram contains shRNAs against KIM 1, Eeal, Nxphl, Zfp71 9, and

Gm5546 (Figure 31).

In vitro validation measures shRNA cohort depleting in response to treatment

As a first pass at method validation, we used in vitro assays to test the response of gene

targets selected by our mutual information heuristic. We anticipated that the fold-change

values from the original screen were affected by mouse to mouse variability and wanted

116

I

I

1.q

-0.50

-0.5

-L C L L C C) C) 0) C) C

V) U) tr) W) U) C) )1 0 ) 0 0 0 0 0

CO CNJ m~ kl ob 00 M L CW) CNJE E E E E E E E E E

Figure 30: Biclustering of shRNAs across mice.The heatmap represents all shRNAs (rows) and their fold-changes in either early or latetime points (columns).

117

MEMOMMEMM-_

I

- -

NV1I 131/5/- 2 MM 1241025";MM 215301

,MM 2051b9V MMI 96112

M V2MMA 128:)bV2MM 1400b8V2MM 1 b948b

rl V2MM 11660i

V2MM 128193V2MMA 1 / Z33V2MM 13:35/1

~ o c ~ Q~ 0 0 0 0

m~ (VI o) In 00 0 IE E E E E E E E E

Figure 31: Branch 931 recapitulates trends in m9cO/m3cl clusters.As in Figure 30, rows represent individual shRNAs and columns are fold-changes at earlyand late time points. Branch 931 contains a cohort of shRNAs that target similar genes asidentified in m9cO and m3cl.

to use follow-up assays to test their response. Using this assay we could see that our

cohort set of shRNAs enriched relative to and MLS control,yet depleted after Dasatinib

treatment. This suggests that loss of AF366264, Art4, Fati, Gtpbpl0, KIfM1, Nxphl, and

Ube2ql confer a growth advantage to pre-B-celI ALL. Depletion of these shRNAs following

treatment suggests that loss of these genes confers sensitivity to treatment.

118

Mm_ 7 __M

Percent GFP /oGFP relative to MLS

MLS MLSAF366264. AF366264A04 Art4

IL FatT FatGtpbpl0 GtpbplO

-0- KIfM3 -0- K~fl3_ Nxphl MLS- Nxph

10 + Ube2ql * 10 + Ube2ql

t t+Dasatinib +Dasabnib

Figure 32: In vitro competition assay shows shRNA cohort sensitizes ALL to DasatinibtreatmentWe measure the percentage of cells expressing an shRNA ("percent GFP", top) for eachsingle infected population, and track this percentage for the duration of the competitionassay. At day 6 we introduce Dasatinib treatment. Each point represents 3 biological repli-cates. We additionally calculated the fold-change relative to a non-targeting MLS control.


All authors: Douglas A. Lauffenburger (D.A.L.), Ernest Fraenkel (E.F.), Michael Hemann

(M.H.), Eleanor (Cameron) Fiedler (E.C.F.), Christina Harrison (C.H.), and Jennifer L. Wil-

son (J.L.W.)

Contributions: J.L.W. designed the project; E.C.F, J.L.W. and C.H. performed research;

D.A.L., M.H., and E.C.F., contributed new reagents/analytic tools; J.L.W. analyzed data;

J.L.W. wrote the paper; and D.A.L. supervised all aspects of the work.

Discussion

Here we present a mutual information heuristic for identifying cohorts of co-acting shRNAs.

The approach was conceptually motived by the fact that complex biological phenotypes are

governed by pathways, and at a simplistic level we can begin to discern these pathways by

looking at co-acting genes. We derived a heuristic for selecting cohorts of these shRNAs

based on mutual information. The method identified a set of genes that conferred growth

advantages to ALL and to Dasatinib treatment. Among this cohort we validate AF366264,

Art4, Fat1, Gtpbpl0, Klf13, Nxphl, and Ube2q1 and show that this cohort enriches from

input to pre-treatment and then depletes following Dasatinib treatment.

There are many factors affecting shRNA performance in screens with high library com-

119

plexity that are longitudinal or subject to treatments. Mouse to mouse variability can greatly

skew results for an individual shRNA. In our screen, if an shRNA did not integrate prop-

erly, we would not measure its effects at all. On the other hand, if an shRNA is inserted

behind a highly active promoter, shRNA representation can be drastically inflated at any

time point and create inflated fold-changes at either time point. Further, preparations prior

to sequencing can greatly affect shRNA representation during the experiment. In the end,

these effects make it difficult to use standard approaches such as averaging fold-changes

across mice.

Given these confounding effects, it is necessary to think about novel approaches for an-

alyzing the data to find signal among these measurements. We variance filtered the data

set to remove the most extreme of insertion effects and to reduce computational time. Fur-

ther, initial clustering and later biclustering saw more similarity between treatment groups

than between individual mice, suggesting that inter-mice comparisons would be viable for

looking at trends between early and late time points. This heuristic also eliminated the

need to define fold-change thresholds for the individual shRNAs. A disadvantage is that

our heuristic creates an arbitrary score for all component comparisons. We used this score

to rank and select clusters for further validation. Bi-clustering identified similar trends, but

this method requires manual selection to find gene sets for further validation. The Fisher's

exact test creates the most interpretable score for these comparisons by identifying a p-

value. Experimental validation verified the utility of our approach for data sets with low

signal-to-noise. As a proof of principle, the mutual information heuristic is viable as a novel

means for considering these data.

We computed this method only looking at individual shRNA performance but could have

considered other pre-processing before using our heuristic. For instance, when looking at

the overlap between mouse 9 cluster 4 and mouse 3 cluster 1, we find overlaps in the gene

targets within these clusters even though the mutual information heuristic only calculated

overlap based on individual shRNA performance. We could have also calculated scores

per gene based on redundant shRNAs and then looked for trends in gene performance.

However, we pursued this approach to minimize data processing and chose to rely on the

heuristic to look for trends.

120

Many of the gene targets in this shRNA cohort are associated with leukemia, and none

are associated with response to Dasatinib. Art4, which encodes Ecto-ADP-ribosyltransferase

4, is associated with chromosomal translocation events in pediatric ALL patients at re-

lapse [16]. Fat1 encodes a development-associated protein which acts as a tumor sup-

pressor. However, Fat1 is aberrantly expressed, frequently mutated, and associated with

advanced disease in solid tumors [105]. Recently, NGS of adult ALL patients confirmed

a high occurrence of FAT1 mutations in T-cell ALL patients, though, Fat1 did not corre-

late directly with any treatment response phenotype [105]. KIf13 is a zinc-finger family

transcription factor that had increased expression in dexamethasone sensitive ALL patient

derived xenografts as compared to insensitive xenografts [68]. Neurexophilin-1 is a ligand

for the alpha-neurexin cell adhesion molecules. In addition to affecting synapse signaling,

Nxphl can suppress hematopoietic cell differentiation [76]. Nxphl is also hyper-methylated

in low grade breast carcinoma suggesting that this gene would be a viable biomarker for

disease [36]. Ube2q1 is a ubiquitin-conjugating enzyme that shows decreased expression

in pediatric ALL patients when compared to non-leukemic controls [131]. Gtpbp1 0 is a

small G-protein is a signaling protein without any annotated function in cancer. Though, it

trended with genes that have roles in ALL development and disease. Generally, published

evidence confirms that loss of cohort genes is associated with advanced disease. This

work confirms that finding, but also predicts that low expression of these genes could sen-

sitize these leukemias to Dasatinib treatment. This cohort may also serve as a prognostic

marker for predetermining if patients would respond to second-line Dasatinib treatment.

Further molecular biology and in vivo validation would improve the predictions made by

this method. In vitro characterization was a relatively quick and useful first step, but does

not tell the whole story as to why or how these gene targets are affecting ALL progression

or response to Dasatinib. Further, there were complications with the MLS control leading

to large deviations in these measurements.

121

Part VI

Conclusions and Discussion

This thesis has demonstrated the utility and power of pathways-based modeling for RNAi

loss of function screens. The popularity of RNAi techniques waned when discovery of

off-target effects cast doubt on the tool's power {Jackson2003 Bickle}. Yet, compared to

other functional genomic techniques, RNAi is still the most cost-effective and most easily

adaptable to many experimental systems. For this reason, scientists have endeavored to

compensate for these limitations and improve the technology; this has caused a modest

and cautious return to RNAi technology {Bickle}. Yet, few investigations have considered

the use of biological networks for overcoming these limitations. This thesis explores these

network models and identifies viable hypotheses from existing RNAi studies. The success

of this application further supports the revival of RNAi investigations, and underscores the

value of 'network filtering' as a necessary companion to these investigations.

This work began with a thorough characterization of the limitations of RNA-interference.

Chapter 1 reviewed the role of off-target effects and the current approaches to mitigate

these effects. This review also highlighted the absence of pathway-based modeling for

RNAi screens. Chapter 2 then applied network-based integration to identify novel media-

tors of Acute Lymphoblastic Leukemia (ALL). This work demonstrated the utility of network-

based integration and identified a novel tumor suppressor, Wwpl, in the context of ALL.

Next, Chapter 3 modeled a novel pathway governing shedding of the growth factor TGFa

and used this pathway to explore mechanisms of growth-factor induced resistance to tar-

geted chemotherapy. This model identified common upstream regulators of NFxB and

TGFa, and suggests novel genes to target therapeutically. While this work focused specif-

ically on effects in gastric cancer, the network model is general to all tissue and cell types,

suggesting that these pathway components could be useful for other cancer indications.

Lastly, Chapter 4 explored a novel computational approach for selecting shRNA cohorts

from a pooled, longitudinal screen. The technique identified a set of co-acting shRNAs

that governed ALL progression and response to Dasatinib treatment. However, further

122

molecular characterization would improve the predictions made in this investigation. In ad-

dition to the highlighted stories, this thesis contributed to the tool set used to quantitatively

examine network model predictions. Specifically, this work explored multiple randomiza-

tion and robustness metrics for these network models. These metrics include routines for

minimizing the selection of hub nodes (genes within the interaction network that are highly

connected due to study bias), specificity (routines for selecting genes that are unique to

experimental data and not selected due to bias in the initial screening library), and sen-

sitivity (routines for minimizing bias in edge scoring). This thesis also explored multiple

network centrality metrics for discerning the number and nature of interaction edges that

lead to selection of an individual gene target. So far, network centrality has been explored

mostly for its proposed utility in selecting biologically important genes. But these metrics

are also valuable for scrutinizing the genes selected by network methods. These best

practices are essential for dissemination and use of these models in future investigations

and are described in the methods and supplemental chapter sections. The ability to adopt

these analyses into mainstream is supported by the maturity of the relevant computational

algorithms. Specifically, SAMNet Web and Omics Integrator are stable implementations of

the SAMNet, PCSF, and Garnet algorithms used in this work. These tools make it possi-

ble for researchers to adapt this network modeling framework to their investigations and

bring network-filtering of RNAi data into the mainstream. This work also opened many

unanswered questions.

- Chapter 2 demonstrated the ability to uncover hidden genes from RNAi studies. Yet,

without the ability to validate every prediction in the network model, it is unclear if and

how much we can improve validation rates by using these modeling approaches as

to compared to existing statistical methods.

- In both chapters 2 and 3, I applied a general interactome to develop the optimal sub-

network. I did not filter the network for tissue specific expression and instead con-

sidered all possible interactions. However, in practice, all documented interactions

are not relevant in every tissue and cell type, and this could bias gene candidate

selections that include genes not expressed in the system of interest.

123

- In chapter 3, I explored augmentation of the interaction network with novel biological

interactions. Specifically, I added predicted phosphorylation and de-phosphorylation

events from Phosphosite and DEPOD. Computationally, it is relatively easy to incor-

porate these predictions in the network and the approach is attractive for considering

any type of post-translational modification. Currently, though, the work lacks thor-

ough validation of these predictions for biological relevance.

- Currently, there is no consensus in the RNAi field as to how many redundant shRNAs

are sufficient for screening applications. Anecdotally, experts claim that at least 10

shRNAs are required to have confidence in a gene's selection. In practice, each gene

is differentially sensitive to RNAi perturbation, suggesting that even with increasing

number of redundant shRNAs, some genes may not show a phenotype. Further,

this work's emphasis on pathways-level effects suggests that very high coverage

may not be necessary for initial screening approaches. Thus, especially as network

approaches become mainstream, it may be more beneficial for scientists to focus on

post-processing and analysis instead of high-coverage libraries. Though, the validity

of this claim would drastically be improved if we could better estimate validation rates

from network-modeling approaches.

- These networks may offer a viable path for parsing the effects of gene homologues.

For instance, the shRNA screen in chapter 3 included redundant shRNAs for all mem-

bers of the PRKC family. Initial screening data for all members of this family had at

least one shRNA performing at the extreme end of the distribution, though our final

network solution only included PRKCa. It is widely accepted that there are seed-

matching effects that cause shRNAs to affect genes beyond the original target, so it

is possible that many of these extreme effects are due to seed-matching with gene

homologues. Using a network filter could reduce the selection of genes affected by

seed-matching by requiring connectivity to other experimental genes. Conceptually,

this approach could be modeled after work in the Fraenkel lab using network models

to make predictions from untargeted metabolomic data. These approaches allow an

unassigned m/z value to be connected to multiple nodes within the network and then

124

allow the algorithm to select the most likely proteomic candidate.

Lastly, I highlight two additional areas where I believe the combination of network mod-

els and loss of functions screens will have impact: integrated personalized medicine and

synthetic loss of function engineering.

Characterization of cancer pathways will affect personalized and precision medicine

Genome sequencing technology has matured such that it is feasible to sequence the can-

cers of individual patients and ideally define a molecular cause for disease. There were

early successes with this approach such as the discovery that BRAF is frequently mutated

at codon 600 in melanoma [40, 88, 6]. There are other similar successes in non-small

cell lung cancer (NSCLC) where scientists have identified EGFR and ALK mutations in

many patients [115, 128, 84]. These findings were great successes and motivators for the

field of molecular targeted therapeutics and spurred development of BRAF, EGFR, and

ALK inhibitors. Targeted molecular therapies showed increased survival and therapeutic

response. However, patients ultimately relapsed, suggesting that these initial molecular

vulnerabilities were insufficient for characterizing the disease[66]. In most cases patients

carry a heterogeneous set of genetic abnormalities and any cancer can contain subpopula-

tions with various combinations of mutations. Computational approaches have made great

strides in considering heterogeneous clones and demonstrating that optimal drug dosing

can prevent the outgrowth of these sub-clones [172]. Further, charting clonal evolution in

tumors can identify points of genetic vulnerability than can specifically sensitize cancers

to targeted therapies [173]. These findings highlight two distinctive features of the cur-

rent cancer treatment landscape: cancer is a molecularly-driven disease and single-agent

therapies are insufficient to prevent relapse.

Next generation sequencing technology has created a wealth of information about po-

tential molecular vulnerabilities, but it is not clear how these characterizations should guide

therapeutic development. Next generation sequencing can measure RNA expression and

epigenetic markers in addition to sequencing mutations. From these measurements, it is

desirable to have a molecular 'signature' or combination of features that can be used to

125

profile any individual patient and make clinical decisions. Frampton et al demonstrate one

implementation of this idea through the creation of a cancer genome profiling test [41]. The

test uses shotgun sequencing to identify actionable molecular features in cancer patients,

and they report a 76% rate of actionable items from this one assay[41]. Ultimately, creat-

ing these reduced assays is going to be invaluable to making clinical decisions. However,

there are unanswered questions related to what molecular entities these assays should

measure. Further, most investigations focus narrowly on a single 'omic measurement:

GWAS, mRNA, RNAi, etc. Some investigations make multiple 'omic measurements, but it

is unclear how to collectively interpret these disparate data sets.

Parallel development in understanding gene membership in pathways will influence

how these molecular assays succeed in the clinic. For instance, in pancreatic cancer,

individual mutations vary across patients. Jones et al. characterized this phenotype and

showed that these mutations occurred in a core set of signaling pathways[69]. In practice,

the cell is an integrated system that is highly interconnected; any individual component

exists in an ecosystem of interactions, and any of our experimental measurements can

only capture a subset of these processes. This realization isn't new, but its acceptance into

routine biological investigations is lacking. Data integration tools can solve these problems

and create novel representations for next generation sequencing data sets. As these tools

mature, data integration will guide molecular characterization and profiling for oncology.

Loss of function characterizations for forward engineering approaches

The field of synthetic biology is a growing discipline that could benefit from tools adapted

in this thesis. The synthetic biology field aims to engineer the cell to actuate a specific

program, and has applications for many societal problems, from medicine to green energy.

Early successes in the field, such as the creation of the bi-stable genetic switch in E. coi

[45] and the repressilator [32], demonstrated the possibility of engineering logical circuits

with biology. This field has blossomed as a result of systems biology approaches and

the ability to create recombinant DNA [15]. Synthetic biology has advanced beyond the

demonstration of simple circuitry and started tackling more complex biological problems.

126

For instance, engineered Notch receptors can tune the specificity of cell response [100]

and can be combined in engineered T-cells for selective cancer targeting[1 27]. However,

the success of many synthetic biology approaches depend on the fidelity of individual com-

ponents; circuit component expression depends on promoter activity, and leaky transcrip-

tion can prevent circuit specificity. Coordinated expression of these promoters is difficult

and can cause build up of components that are detrimental for cell growth[74]. As such,

many investigations have sought to define and characterize libraries of parts for rational

design [156]. In contrast to building large parts libraries, discovery and development of

CRISPR elements has increased the tools available by offering an alternative platform for

genome engineering. For example, a combination of constitutively active scRNAs and an

inducible Cas9 master regulator could selectively direct flow through the VioC pathway

in E. coli [168]. The ability of CRISPR activation and inhibition (CRISPRa and CRISPRi

respectively) holds additional promise for synthetic biology's toolbox[81].

However, this field depends on characterized cellular pathways to identify the parts

list associated with any biological phenotype and thus, is limited to controlling simple cir-

cuits. As synthetic biology matures and tackles more complex problems, pathway dis-

covery will become essential for more thoroughly defining the processes which we hope

to control. This thesis emphasized the uncharacterized nature of many biological path-

ways and demonstrated a means for annotating novel pathways from loss of function stud-

ies. Currently, RNAi screens have an unappreciated role in synthetic biology but could

be coupled with transcriptional synthetic technologies to actuate complex programs. For

instance, Zalatan et al's use of a Cas9 master regulator with multiple scRNAs is an attrac-

tive approach for modulating a cellular process. However, this system is limited to a sim-

ple, well-characterized pathway. For more complex systems, loss of function screens and

network models could identify molecular components amenable to RNA perturbation and

could eliminate the dependency on specific promoters for tuning component expression.

For the most part, synthetic biology has focused on the induction of required components,

but hasn't yet explored the use of RNA interference as a means of damping existing com-

ponents. This platform could become an alternative or orthogonal approach to increase

tenability of biological systems.

127

References

[1] Britt Adamson, Agata Smogorzewska, Frederic D Sigoillot, Randall W King, and

Stephen J Elledge. A genome-wide homologous recombination screen identifies the

RNA-binding protein RBMX as a component of the DNA-damage response. Nature

Cell Biology, 14(3):318-328, February 2012.

[2] C Adrain and M Freeman. Regulation of Receptor Tyrosine Kinase Ligand Process-

ing. Cold Spring Harbor Perspectives in Biology, 6(1 ):a008995-a008995, January

2014.

[3] Clint Allen, Kunal Saigal, Liesl Nottingham, Pattatheyil Arun, Zhong Chen, and

Carter Van Waes. Bortezomib-induced apoptosis with limited clinical response is

accompanied by inhibition of canonical but not alternative nuclear factor-kappaB

subunits in head and neck cancer. Clinical cancer research : an official journal

of the American Association for Cancer Research, 14(13):4175-4185, July 2008.

[4] Clint T Allen, Barbara Conley, John B Sunwoo, and Carter Van Waes. CCR 20th

anniversary commentary: Preclinical study of proteasome inhibitor bortezomib in

head and neck cancer. Clinical cancer research : an official journal of the American

Association for Cancer Research, 21(5):942-943, March 2015.

[5] Simon Anders and Wolfgang Huber. gb-2010-11-10-r106. Genome Biology,

11(1 0):R1 06, October 2010.

[6] P A Ascierto, J M Kirkwood, J J Grob, and E Simeone. The role of BRAF V600

mutation in melanoma. J Transl..., 2012.

[7] Eishi Ashihara, Tetsuya Takada, and Taira Maekawa. Targeting the canonical Wnt/-

catenin pathway in hematological malignancies. Cancer Science, 106(6):665-671,

April 2015.

[8] Sabarish Ayyappan, Dhivya Prabhakar, and Neelesh Sharma. Epidermal growth

factor receptor (EGFR)-targeted therapies in esophagogastric cancer. Anticancer

research, 33(10):4139-4155, October 2013.

128

[9] Albert-Ldszl6 Barabdsi, Natali Gulbahce, and Joseph Loscalzo. Network medicine:

a network-based approach to human disease. Nature Reviews Genetics, 12(1):56-

68, January 2011.

[101 Bonnie Berger, Jian Peng, and Mona Singh. Computational solutions for omics data.

Nature Reviews Genetics, 14(5):333-346, May 2013.

[11] Amanda Birmingham, Laura M Selfors, Thorsten Forster, David Wrobel, Caleb J

Kennedy, Emma Shanks, Javier Santoyo-Lopez, Dara J Dunican, Aideen Long,

Dermot Kelleher, Queta Smith, Roderick L Beijersbergen, Peter Ghazal, and Car-

oline E Shamu. Statistical methods for analysis of high-throughput RNA interference

screens. Nature Publishing Group, 6(8):569-575, August 2009.

[12] M Boettcher and M T McManus. Choosing the Right Tool for the Job: RNAi, TALEN,

or CRISPR. Molecular Cell, 2015.

[13] E Buehler, A A Khan, S Marine, M Rajaram, and A Bahl. siRNA off-target effects in

genome-wide screens identify signaling pathway members. Scientific reports, 2012.

[14] Frederic D Bushman, Nirav Malani, Jason Fernandes, lvdn D'Orso, Gerard Cagney,

Tracy L Diamond, Honglin Zhou, Daria J Hazuda, Amy S Espeseth, Renate K6nig,

Sourav Bandyopadhyay, Trey Ideker, Stephen P Goff, Nevan J Krogan, Alan D

Frankel, John A T Young, and Sumit K Chanda. Host Cell Factors in HIV Repli-

cation: Meta-Analysis of Genome-Wide Studies. PLoS Pathogens, 5(5):el 000437,

May 2009.

[15] D Ewen Cameron, Caleb J Bashor, and James J Collins. A brief history of synthetic

biology. Nature reviews. Microbiology, 12(5):381-390, May 2014.

[16] Cai Chen, Christoph Bartenhagen, Michael Gombert, Vera Okpanyi, Vera Binder,

Silja R6ttgers, Jutta Bradtke, Andrea Teigler-Schlegel, Jochen Harbott, Sebastian

Ginzel, Ralf Thiele, Peter Husemann, Pina F I Krell, Arndt Borkhardt, Martin Dugas,

Jianda Hu, and Ute Fischer. Next-generation-sequencing of recurrent childhood

129

high hyperdiploid acute lymphoblastic leukemia reveals mutations typically associ-

ated with high risk patients. Leukemia research, 39(9):990-1001, September 2015.

[17] Ceshi Chen. The Oncogenic Role of WWP1 E3 Ubiquitin Ligase in Prostate Cancer

Development. May 2011.

[18] Ceshi Chen and Lydia E Matesic. The Nedd4-like family of E3 ubiquitin ligases and

cancer. Cancer and Metastasis Reviews, 26(3-4):587-604, August 2007.

[19] Yanqing Chen, Jun Zhu, Pek Yee Lum, Xia Yang, Shirly Pinto, Douglas J Mac-

Neil, Chunsheng Zhang, John Lamb, Stephen Edwards, Solveig K Sieberts,

Amy Leonardson, Lawrence W Castellini, Susanna Wang, Marie-France Champy,

Bin Zhang, Valur Emilsson, Sudheer Doss, Anatole Ghazalpour, Steve Horvath,

Thomas A Drake, Aldons J Lusis, and Eric E Schadt. Variations in DNA elucidate

molecular networks that cause disease. Nature, 452(7186):429-435, March 2008.

[20] C Cobaleda and I Sdnchez Garcia. B-cell acute lymphoblastic leukaemia: towards

understanding its cellular origin. Bioessays, 2009.

[21] Federica Cossu, Mario Milani, Serena Grassi, Francesca Malvezzi, Alessandro

Corti, Martino Bolognesi, and Eloise Mastrangelo. NF023 binding to XIAP-BIR1:

searching drugs for regulation of the NF-rB pathway. Proteins, 83(4):612-620, April

2015.

[22] L M Coussens, B Fingleton, and L M Matrisian. Matrix metalloproteinase inhibitors

and cancer-trials and tribulations. Science, 2002.

[23] C Culver, A Sundqvist, S Mudie, A Melvin, D Xirodimas, and S Rocha. Mecha-

nism of Hypoxia-Induced NF-rB. Molecular and Cellular Biology, 30(20):4901--4921,

September 2010.

[24] M Dang, N Armbruster, M A Miller, E Cermeno, M Hartmann, G W Bell, D E

Root, D A Lauffenburger, H F Lodish, and A Herrlich. Regulated ADAM17-

dependent EGF family ligand release by substrate-selecting signaling pathways.

PNAS, 110(24):9776-9781, June 2013.

130

[251 Michelle Dang, Karen Dubbin, Antonio D'Aiello, Monika Hartmann, Harvey Lodish,

and Andreas Herrlich. Epidermal growth factor (EGF) ligand release by substrate-

specific a disintegrin and metalloproteases (ADAMs) involves different protein kinase

C (PKC) isoenzymes depending on the stimulus. Journal of Biological Chemistry,

286(20):17704-17713, May 2011.

[26] Joseph A DiDonato, Frank Mercurio, and Michael Karin. NF-rB and the link between

inflammation and cancer. Immunological reviews, 246(1):379-400, March 2012.

[27] D Dinour, L Ganon, L I Nomy, R Ron, and E J Holtzman. Wild-type uromodulin

prevents NFkB activation in kidney cells, while mutant uromodulin, causing FJHU

nephropathy, does not. Journal of nephrology, 2014.

[28] James R Downing. TGF-beta signaling, tumor suppression, and acute lymphoblastic

leukemia. The New England journal of medicine, 351(6):528-530, August 2004.

[29] Eran Eden, Doron Lipson, Sivan Yogev, and Zohar Yakhini. Discovering motifs in

ranked lists of DNA sequences. PLoS Computational Biology, 3(3):e39, March 2007.

[30] Eran Eden, Roy Navon, Israel Steinfeld, Doron Lipson, and Zohar Yakhini. GOrilla: a

tool for discovery and visualization of enriched GO terms in ranked gene lists. BMC

Bioinformatics, 10:48, 2009.

[31] J R Edgar, E R Eden, and C E Futter. Hrs- and CD36- Dependent Competing

Mechanisms Make Different Sized Endosomal Intraluminal Vesicles. Traffic, 2014.

[32] M B Elowitz and S Leibler. A synthetic oscillatory network of transcriptional regula-

tors. Nature, 403(6767):335-338, January 2000.

[33] J A Engelman and P A Janne. Mechanisms of Acquired Resistance to Epidermal

Growth Factor Receptor Tyrosine Kinase Inhibitors in Non-Small Cell Lung Cancer.

Clinical Cancer Research, 14(10):2895-2899, May 2008.

[34] Jeffrey A Engelman, Kreshnik Zejnullahu, Tetsuya Mitsudomi, Youngchul Song,

Courtney Hyland, Joon Oh Park, Neal Lindeman, Christopher-Michael Gale, Xiaojun

131

Zhao, James Christensen, Takayuki Kosaka, Alison J Holmes, Andrew M Rogers,

Federico Cappuzzo, Tony Mok, Charles Lee, Bruce E Johnson, Lewis C Cantley,

and Pasi A Jdnne. MET amplification leads to gefitinib resistance in lung cancer by

activating ERBB3 signaling. Science, 316(5827):1039-1043, May 2007.

[35] H Fan and R Derynck. Ectodomain shedding of TGF-alpha and other transmem-

brane proteins is induced by receptor tyrosine kinase activation and MAP kinase

signaling cascades. The EMBO journal, 18(24):6962-6972, December 1999.

[36] Marta Faryna, Carolin Konermann, Sebastian Aulmann, Justo Lorenzo Bermejo,

Markus Brugger, Sven Diederichs, Joachim Rom, Dieter Weichenhan, Rainer Claus,

Michael Rehli, Peter Schirmacher, Hans-Peter Sinn, Christoph Plass, and Clarissa

Gerhauser. Genome-wide methylation screen in low-grade breast cancer identi-

fies novel epigenetically altered genes as potential biomarkers for tumor diagnosis.

FASEB journal : official publication of the Federation of American Societies for Ex-

perimental Biology, 26(12):4937-4950, December 2012.

[37] J Feng, T Liu, and Y Zhang. Using MACS to identify peaks from ChIP-seq data.

Current Protocols in Bioinformatics, 2011.

[38] M Fennell, Q Xiang, A Hwang, and C Chen. Impact of RNA-guided technologies for

target identification and deconvolution. Journal of..., 2014.

[39] W Fiskus, S Sharma, S Saha, B Shah, S G T Devaraj, B Sun, S Horrigan, C Leveque,

Y Zu, S lyer, and K N Bhalla. Pre-clinical efficacy of combined therapy with novel

-catenin antagonist BC2059 and histone deacetylase inhibitor against AML cells.

Leukemia, 29(6):1267-1278, December 2014.

[40] S A Forbes, G Bhamra, and S Bamford. The catalogue of somatic mutations in

cancer (COSMIC). Current protocols in ... , 2008.

[41] Garrett M Frampton, Alex Fichtenholtz, Geoff A Otto, Kai Wang, Sean R Downing,

Jie He, Michael Schnall-Levin, Jared White, Eric M Sanford, Peter An, James Sun,

Frank Juhn, Kristina Brennan, Kiel Iwanik, Ashley Maillet, Jamie Buell, Emily White,

132

Mandy Zhao, Sohail Balasubramanian, Selmira Terzic, Tina Richards, Vera Ban-

ning, Lazaro Garcia, Kristen Mahoney, Zac Zwirko, Amy Donahue, Himisha Beltran,

Juan Miguel Mosquera, Mark A Rubin, Snjezana Dogan, Cyrus V Hedvat, Michael F

Berger, Lajos Pusztai, Matthias Lechner, Chris Boshoff, Mirna Jarosz, Christine Vi-

etz, Alex Parker, Vincent A Miller, Jeffrey S Ross, John Curran, Maureen T Cronin,

Philip J Stephens, Doron Lipson, and Roman Yelensky. Development and validation

of a clinical cancer genomic profiling test based on massively parallel DNA sequenc-

ing. Nature biotechnology, 31(11):1023-1031, November 2013.

[42] A Franceschini, R Meier, A Casanova, S Kreibich, N Daga, D Andritschke, S Dilling,

P Ramo, M Emmenlauer, A Kaufmann, R Conde-Alvarez, S H Low, L Pelkmans,

A Helenius, W D Hardt, C Dehio, and C von Mering. Specific inhibition of diverse

pathogens in human cells by synthetic microRNA-like oligonucleotides inferred from

RNAi screens. PNAS, 111(12):4548-4553, March 2014.

[431 T K Fung, AYH Leung, and CWE So. The Wnt/-Catenin Pathway as a Potential

Target for Drug Resistant Leukemic Stem Cells. Stem Cells and Cancer Stem Cells,

2013.

[44] A Furlan, V Stagni, A Hussain, S Richelme, F Conti, A Prodosmo, A Destro, M Ron-

calli, D BarilA, and F Maina. Abl interconnects oncogenic Met and p53 core pathways

in cancer cells. Cell Death and Differentiation, 18(10):1608-1616, April 2011.

[45] T S Gardner, C R Cantor, and J J Collins. Construction of a genetic toggle switch in

Escherichia coli. Nature, 2000.

[46] Claudia Gherman, Ovidiu Leonard Braicu, Oana Zanoaga, Anca Jurj, Valentina

Pileczki, Mahafarin Maralani, Flaviu Drigla, Cornelia Braicu, Liviuta Budisan, Patri-

ciu Achimas-Cadariu, and loana Berindan-Neagoe. Caffeic acid phenethyl ester

activates pro-apoptotic and epithelial-mesenchymal transition-related genes in ovar-

ian cancer cells A2780 and A2780cis. Molecular and cellular biochemistry, 413(1-

2):189-198, February 2016.

133

[47] L A Gilbert and M T Hemann. Context-specific roles for paracrine IL-6 in lymphoma-

genesis. Genes & development, 26(15):1758-1768, August 2012.

[48] Luke A Gilbert, Max A Horlbeck, Britt Adamson, Jacqueline E Villalta, Yuwen Chen,

Evan H Whitehead, Carla Guimaraes, Barbara Panning, Hidde L Ploegh, Michael C

Bassik, Lei S Qi, Martin Kampmann, and Jonathan S Weissman. Genome-Scale

CRISPR-Mediated Control of Gene Repression and Activation. Cell, 159(3):647-

661, October 2014.

[49] Sara Gosline, Sarah Spencer, and Ernest Fraenkel. SAMNET: a network-based

approach to integrate multi-dimensional high throughput datasets. Technical report.

[50] SJC Gosline, C Oh, and E Fraenkel. SAMNetWeb: identifying condition-specific

networks linking signaling and transcription. Bioinformatics, 2015.

[51] Andreas Gschwind, Oliver M Fischer, and Axel Ullrich. The discovery of receptor

tyrosine kinases: targets for cancer therapy. Nature Reviews Cancer, 4(5):361-370,

May 2004.

[52] T Hart, K R Brown, F Sircoulomb, R Rottapel, and J Moffat. Measuring error rates

in genomic perturbation screens: gold standards for human functional genomics.

Molecular Systems Biology, 10(7):733-733, July 2014.

[53] Traver Hart, Megha Chandrashekhar, Michael Aregger, Zachary Steinhart, Kevin R

Brown, Stephane Angers, and Jason Moffat. Systematic discovery and classification

of human cell line essential genes. Technical report, February 2015.

[54] Charlotta Hedner, David Borg, Bj6rn Nodin, Emelie Karnevi, Karin Jirstr6m, and

Jakob Eberhard. Expression and Prognostic Significance of Human Epidermal

Growth Factor Receptors 1 and 3 in Gastric and Esophageal Adenocarcinoma. PLoS

ONE, 11 (2):eOl 48101, February 2016.

[55] M T Hemann. From Breaking Bad to Worse: Exploiting Homologous DNA Repair

Deficiency in Cancer. Cancer Discovery, 4(5):516-518, May 2014.

134

[56] Michael T Hemann, Jordan S Fridman, Jack T Zilfou, Eva Hernando, Patrick J Pad-

dison, Carlos Cordon-Cardo, Gregory J Hannon, and Scott W Lowe. An epi-allelic

series of p53 hypomorphs created by stable RNAi produces distinct tumor pheno-

types in vivo. Nature Genetics, 33(3):396-400, March 2003.

[57] Andreas Herrlich, Eva Klinman, Jonathan Fu, Cameron Sadegh, and Harvey Lodish.

Ectodomain cleavage of the EGF ligands HB-EGF, neuregulini -beta, and TGF-alpha

is specifically triggered by different stimuli and involves different PKC isoenzymes.

FASEB journal : official publication of the Federation of American Societies for Ex-

perimental Biology, 22(12):4281-4295, December 2008.

[58] Teru Hideshima, Constantine Mitsiades, Giovanni Tonon, Paul G Richardson, and

Kenneth C Anderson. Understanding multiple myeloma pathogenesis in the bone

marrow to identify new therapeutic targets. Nature Reviews Cancer, 7(8):585-598,

August 2007.

[59] B Hoesel and J A Schmid. The complexity of NF-KB signaling in inflammation and

cancer. Molecular cancer, 2013.

[60] Peter V Hornbeck, Indy Chabra, Jon M Kornhauser, Elzbieta Skrzypek, and Bin

Zhang. PhosphoSite: A bioinformatics resource dedicated to physiological protein

phosphorylation. PROTEOMICS, 4(6):1551-1561, June 2004.

[61] S s C Huang and E Fraenkel. Integrating Proteomic, Transcriptional, and Interactome

Data Reveals Hidden Components of Signaling and Regulatory Networks. Science

Signaling, 2(81):ra4O-ra4O, July 2009.

[62] Shao-shan Carol Huang, David C Clarke, Sara J C Gosline, Adam Labadorf, Can-

dace R Chouinard, William Gordon, Douglas A Lauffenburger, and Ernest Fraenkel.

Linking Proteomic and Transcriptional Data through the Interactome and Epigenome

Reveals a Map of Oncogene-induced Signaling. PLoS Computational Biology,

9(2):el 002887, February 2013.

135

[63] M A Huber, N Azoitei, and B Baumann. NF-KB is essential for epithelial-

mesenchymal transition and metastasis in a model of breast cancer progression.

The Journal of..., 2004.

[64] Julie A E Irving. Towards an understanding of the biology and targeted treatment of

paediatric relapsed acute lymphoblastic leukaemia. British journal of haematology,

November 2015.

[65] Aimee L Jackson and Peter S Linsley. Recognizing and avoiding siRNA off-target

effects for target identification and therapeutic application. Nature reviews Drug dis-

covery, pages 1-11, January 2010.

[66] Filip Janku. Tumor heterogeneity in the clinic: is it a real problem? Therapeutic

advances in medical oncology, 6(2):43-51, March 2014.

[67] Lu Jiang, Yongchang Chen, Yueying Li, Ting Lan, Min Wu, Ying Wang, and Hai Qian.

Type 11 cGMP-dependent protein kinase inhibits ligand-induced activation of EGFR

in gastric cancer cells. Molecular Medicine Reports, February 2014.

[68] Duohui Jing, Vivek A Bhadri, Dominik Beck, Julie A I Thoms, Nurul A Yakob, Jason

W H Wong, Kathy Knezevic, John E Pimanda, and Richard B Lock. Opposing regu-

lation of BIM and BCL2 controls glucocorticoid-induced apoptosis of pediatric acute

lymphoblastic leukemia cells. Blood, 125(2):273-283, January 2015.

[69] S Jones, X Zhang, D W Parsons, J C H Lin, R J Leary, P Angenendt, P Mankoo,

H Carter, H Kamiyama, A Jimeno, S M Hong, B Fu, M T Lin, E S Calhoun,

M Kamiyama, K Walter, T Nikolskaya, Y Nikolsky, J Hartigan, D R Smith, M Hidalgo,

S D Leach, A P Klein, E M Jaffee, M Goggins, A Maitra, C lacobuzio-Donahue, J R

Eshleman, S E Kern, R H Hruban, R Karchin, N Papadopoulos, G Parmigiani, B Vo-

gelstein, V E Velculescu, and K W Kinzler. Core Signaling Pathways in Human Pan-

creatic Cancers Revealed by Global Genomic Analyses. Science, 321 (5897):1801-

1806, September 2008.

136

[70] Grazyna Jurkowska, Grazyna Piotrowska-Staworko, Katarzyna Guzihska-

Ustymowicz, Andrzej Kemona, Agnieszka Swidnicka-Siergiejko, Wiktor kaszewicz,

and Andrzej Dqbrowski. The impact of Helicobacter pylori on EGF, EGF receptor,

and the c-erb-B2 expression. Advances in Medical Sciences, 59(2):221-226,

September 2014.

[71] William G Kaelin. Molecular biology. Use and abuse of RNAi to study mammalian

gene function. Science, 337(6093):421-422, July 2012.

[72] M Kampmann, M C Bassik, and J S Weissman. Integrated platform for genome-wide

screening and construction of high-density genetic interaction maps in mammalian

cells. PNAS, 110(25):E2317-E2326, June 2013.

[73] S Kaur, F Wang, M Venkatraman, and M Arsura. X-linked inhibitor of apoptosis

(XIAP) inhibits c-Jun N-terminal kinase 1 (JNK1) activation by transforming growth

factor 31 (TGF-,31) through ubiquitin-mediated ....... journal of biological..., 2005.

[74] Jay D Keasling. Synthetic biology and the development of tools for metabolic engi-

neering. Metabolic engineering, 14(3):189-195, May 2012.

[75] Boris Kholodenko, Michael B Yaffe, and Walter Kolch. Computational approaches for

analyzing information flow in biological networks. Science Signaling, 5(220):rel-rel,

April 2012.

[76] John Kinzfogl, Giao Hangoc, and Hal E Broxmeyer. Neurexophilin 1 suppresses the

proliferation of hematopoietic progenitor cells. Blood, 118(3):565-575, July 2011.

[77] Ryunosuke Kogo, Koshi Mimori, Fumiaki Tanaka, Shizuo Komune, and Masaki Mori.

Clinical significance of miR-1 46a in gastric cancer cases. Clinical cancer research

: an officialjournal of the American Association for Cancer Research, 17(13):4277-

4284, July 2011.

[78] F Kosova, FQ Kurt, E Olmez, I Tuglu, and Z Ari. Effects of caffeic acid phenethyl

ester on matrix molecules and angiogenetic and anti-angiogenetic factors in gastric

137

cancer cells cultured on different substrates. Biotechnic and histochemistry: official

publication of the Biological Stain Commission, 91(1 ):38-47, jan 2016.

[79] Christoph KOper, Helmut Bartels, Maria-Luisa Fraek, Franz X Beck, and Wolfgang

Neuhofer. Ectodomain shedding of pro-TGF-alpha is required for COX-2 induction

and cell survival in renal medullary cells exposed to osmotic stress. AJP: Cell Phys-

iology, 293(6):C1 971-C1982, October 2007.

[80] M A Kutuzov, N Bennett, and A V Andreeva. Protein phosphatase with EF-hand

domains 2 (PPEF2) is a potent negative regulator of apoptosis signal regulating

kinase-1 (ASK1). The international journal of..., 2010.

[81] Marie F La Russa and Lei S Qi. The New State of the Art: Cas9 for Gene Activa-

tion and Repression. Molecular and Cellular Biology, 35(22):3800-3809, November

2015.

[82] A Laine and Z Ronai. Regulation of p53 localization and transcription by the HECT

domain E3 ligase WWP1. Oncogene, 26(10):1477-1483, August 2006.

[83] Mark N Lee, Matthew Roy, Shao-En Ong, Philipp Mertins, Alexandra-Chloe Villani,

Weibo Li, Farokh Dotiwala, Jayita Sen, John G Doench, Megan H Orzalli, Igor Kram-

nik, David M Knipe, Judy Lieberman, Steven A Carr, and Nir Hacohen. Identification

of regulators of the innate immune response to cytosolic DNA and retroviral infection

by an integrative approach. Nature Immunology, 14(2):179-185, December 2012.

[84] Taebum Lee, Boram Lee, YoonLa Choi, Joungho Han, Myung-Ju Ahn, and Sang-

Won Um. Non-small Cell Lung Cancer with Concomitant EGFR, KRAS, and ALK

Mutation: Clinicopathologic Features of 12 Cases. Journal of pathology and transla-

tional medicine, April 2016.

[85] X Li, J Long, T He, R Belshaw, and J Scott. Integrated genomic approaches identify

major pathways and upstream regulators in late onset Alzheimer's disease. Scientific

reports, 2015.

138

[86] X Li, M Wilmanns, J Thornton, and m Kohn. Elucidating human phosphotase-

substrate networks. Science Signaling, pages 1-15, May 2013.

[87] C Lindet, F R6villion, V Lhotellier, and L Hornez. Relationships between proges-

terone receptor isoforms and the HER/ErbB receptors and ligands network in 299

primary breast cancers. ... journal of biological..., 2011.

[88] Georgina V Long, Alexander M Menzies, Adnan M Nagrial, Lauren E Haydu,

Anne L Hamilton, Graham J Mann, T Michael Hughes, John F Thompson,

Richard A Scolyer, and Richard F Kefford. Prognostic and clinicopathologic asso-

ciations of oncogenic BRAF in metastatic melanoma. Journal of Clinical Oncology,

29(10):1239-1246, April 2011.

[89] Biao Luo, Hiu Wing Cheung, Aravind Subramanian, Tanaz Sharifnia, Michael

Okamoto, Xiaoping Yang, Greg Hinkle, Jesse S Boehm, Rameen Beroukhim, Bar-

bara A Weir, Craig Mermel, David A Barbie, Tarif Awad, Xiaochuan Zhou, Tuyen

Nguyen, Bruno Piqani, Cheng Li, Todd R Golub, Matthew Meyerson, Nir Hacohen,

William C Hahn, Eric S Lander, David M Sabatini, and David E Root. Highly par-

allel identification of essential genes in cancer cells. PNAS, 105(51):20380-20385,

December 2008.

[90] K D Maclsaac and E Fraenkel. Sequence analysis of chromatin immunoprecipita-

tion data for transcription factors. Computational Biology of Transcription Factor ... ,

2010.

[91] Howard L McLeod. Cancer pharmacogenomics: early promise, but concerted effort

needed. Science, 339(6127):1563-1566, March 2013.

[92] Corbin E Meacham, Emily E Ho, Esther Dubrovsky, Frank B Gertler, and Michael T

Hemann. In vivo RNAi screening identifies regulators of actin dynamics as key deter-

minants of lymphoma progression. Nature Genetics, 41(10):1133-1137, September

2009.

139

[93] Corbin E Meacham, Lee N Lawton, Yadira M Soto-Feliciano, Justin R Pritchard,

Brian A Joughin, Tobias Ehrenberger, Nina Fenouille, Johannes Zuber, Richard T

Williams, Richard A Young, and Michael T Hemann. A genome-scale in vivo loss-of-

function screen identifies Phf6as a lineage-specific regulator of leukemia cell growth.

Genes & development, 29(5):483-488, March 2015.

[94] JP Mesirov and LA Garraway. Systematic investigation of genetic vulnerabilities

across cancer cell lines reveals lineage-specific dependencies in ovarian cancer. In

Proceedings of the ... , pages 1-6, 2011.

[95] Aaron S Meyer, Miles A Miller, Frank B Gertler, and Douglas A Lauffenburger. The

receptor AXL diversifies EGFR signaling and limits the response to EGFR-targeted

inhibitors in triple-negative breast cancer cells. Science Signaling, 6(287):ra66, Au-

gust2013.

[96] Y Mochida, K Takeda, M Saitoh, and H Nishitoh. ASK1 inhibits interleukin-1 -induced

NF-KB activity through disruption of TRAF6-TAK1 interaction. ... journal of biological

... ,2000.

[97] Stephanie Mohr, Chris Bakal, and Norbert Perrimon. Genomic Screening with RNAi:

Results and Challenges. Annual Review of Biochemistry, 79(1):37-64, June 2010.

[98] Stephanie E Mohr, Jennifer A Smith, Caroline E Shamu, Ralph A NeumOller, and

Norbert Perrimon. RNAi screening comes of age: improved techniques and com-

plementary approaches. Nature Reviews Molecular Cell Biology, 15(9):591-600,

August 2014.

[99] M K Montgomery, S A Kostas, S E Driver, and C C Mello. Potent and specific genetic

interference by double-stranded RNA in Caenorhabditis elegans. Nature, 1998.

[100] Leonardo Morsut, Kole T Roybal, Xin Xiong, Russell M Gordley, Scott M Coyle,

Matthew Thomson, and Wendell A Lim. Engineering Customized Cell Sensing

and Response Behaviors Using Synthetic Notch Receptors. Cell, 164(4):780-791,

February 2016.

140

[101] Tarek K Motawi, Samy A Abdelazim, Hebatallah A Darwish, Eman M Elbaz, and

Samia A Shouman. Could Caffeic Acid Phenethyl Ester Expand the Antitumor Effect

of Tamoxifen in Breast Carcinoma? Nutrition and cancer, pages 1-11, March 2016.

[102] Charles G Mullighan, Jinghui Zhang, Lawryn H Kasper, Stephanie Lerach, Debbie

Payne-Turner, Letha A Phillips, Sue L Heatley, Linda Holmfeldt, J Racquel Collins-

Underwood, Jing Ma, Kenneth H Buetow, Ching-Hon Pui, Sharyn D Baker, Paul K

Brindle, and James R Downing. CREBBP mutations in relapsed acute lymphoblastic

leukaemia. Nature, 471(7337):235-239, March 2011.

[103] Takayuki Nagata, Kazuko Murata, Ryo Murata, Shu-lan Sun, Yutaro Saito, Shuhei

Yamaga, Nobuyuki Tanaka, Keiichi Tamai, Kunihiko Moriya, Noriyuki Kasai, Kazuo

Sugamura, and Naoto Ishii. Hepatocyte growth factor regulated tyrosine kinase sub-

strate in the peripheral development and function of B-cells. Biochemical and Bio-

physical Research Communications, 443(2):351-356, January 2014.

[104] Kazuhito Naka, Takayuki Hoshii, Teruyuki Muraguchi, Yuko Tadokoro, Takako

Ooshio, Yukio Kondo, Shinji Nakao, Noboru Motoyama, and Atsushi Hirao. TGF-,3-

FOXO signalling maintains leukaemia-initiating cells in chronic myeloid leukaemia.

Nature, 463(7281):676-680, February 2010.

[105] M Neumann, M Seehawer, C Schlee, and S Vosberg. FAT1 expression and muta-

tions in adult acute lymphoblastic leukemia. Blood cancer ... , 2014.

[106] Christopher W Ng, Ferah Yildirim, Yoon Sing Yap, Simona Dalin, Bryan J Matthews,

Patricio J Velez, Adam Labadorf, David E Housman, and Ernest Fraenkel. Extensive

changes in DNA methylation are associated with expression of mutant huntingtin.

2013.

[107] 0 H Ng, Y Erbilgin, S Firtina, T Celkan, and Z Karakas. Deregulated WNT signaling

in childhood T-cell acute lymphoblastic leukemia. Blood cancer ... , 2014.

141

[108] Matthew J Niederst and Jeffrey A Engelman. Bypass mechanisms of resistance

to receptor tyrosine kinase inhibition in lung cancer. Science Signaling, 6(294):re6,

September 2013.

[109] Y Niitsu, Y Urushizaki, Y Koshida, K Terui, and K Mahara. Expression of TGF-beta

gene in adult T cell leukemia. Blood, pages 1-5, November 2015.

[110] L K Nottingham, C H Yan, X Yang, H Si, J Coupar, Y Bian, T-F Cheng, C Allen,

P Arun, D Gius, L Dang, C Van Waes, and Z Chen. Aberrant IKKa and IKK 3 co-

operatively activate NF-B and induce EGFR/AP1 signaling to promote survival and

migration of head and neck cancer. Oncogene, 33(9):1135-1147, March 2013.

[111] Petra Obexer and Michael J Ausserlechner. X-linked inhibitor of apoptosis protein -

a critical death resistance regulator and therapeutic target for personalized cancer

therapy. Frontiers in oncology, 4:197, 2014.

[112] Wataru Okamoto, Isamu Okamoto, Tokuzo Arao, Kiyoko Kuwata, Erina Hatashita,

Haruka Yamaguchi, Kazuko Sakai, Kazuyoshi Yanagihara, Kazuto Nishio, and

Kazuhiko Nakagawa. Antitumor action of the MET tyrosine kinase inhibitor crizotinib

(PF-02341066) in gastric cancer positive for MET amplification. Molecular Cancer

Therapeutics, 11 (7):1557-1564, July 2012.

[113] N Oue, Y Naito, T Hayashi, M Takigahira, A Kawano-Nagatsuma, K Sentani,

N Sakamoto, H Zarni Oo, N Uraoka, K Yanagihara, A Ochiai, H Sasaki, and W Yasui.

Signal peptidase complex 18, encoded by SEC11 A, contributes to progression via

TGF-a secretion in gastric cancer. Oncogene, 33(30):3918-3926, September 2013.

[114] A Pallis, E Briasoulis, and H Linardou. Mechanisms of resistance to epidermal

growth factor receptor tyrosine kinase inhibitors in patients with advanced non-small-

cell lung cancer: clinical and molecular .... Current medicinal..., 2011.

[115] W Pao and J Chmielecki. Rational, biologically based treatment of EGFR-mutant

non-small-cell lung cancer. Nature Reviews Cancer, 2010.

142

[116] Evan 0 Paull, Daniel E Carlin, Mario Niepel, Peter K Sorger, David Haussler, and

Joshua M Stuart. Discovering causal pathways linking genomic events to transcrip-

tional states using Tied Diffusion Through Interacting Events (TieDIE). Bioinformat-

ics, 29(21):2757-2764, November 2013.

[117] Nicole A Perry, Marey Shriver, Marie G Mameza, Bryan Grabias, Eric Balzer, and

Aikaterini Kontrogianni-Konstantopoulos. Loss of giant obscurins promotes breast

epithelial cell survival through apoptotic resistance. FASEBjournal: officialpublica-

tion of the Federation of American Societies for Experimental Biology, 26(7):2764-

2775, July 2012.

[118] M S Pino, M Shrader, C H Baker, F Cognetti, H Q Xiong, J L Abbruzzese, and D J

McConkey. Transforming Growth Factor Expression Drives Constitutive Epidermal

Growth Factor Receptor Pathway Activation and Sensitivity to Gefitinib (Iressa) in

Human Pancreatic Cancer Cell Lines. Cancer Research, 66(7):3802-3812, April

2006.

[119] M E Pitulescu and R H Adams. Eph/ephrin molecules-a hub for signaling and en-

docytosis. Genes & development, 24(22):2480-2492, November 2010.

[120] Trevor J Pugh, Shyamal Dilhan Weeraratne, Tenley C Archer, Daniel A Pomer-

anz Krummel, Daniel Auclair, James Bochicchio, Mauricio 0 Carneiro, Scott L

Carter, Kristian Cibulskis, Rachel L Erlich, Heidi Greulich, Michael S Lawrence,

Niall J Lennon, Aaron McKenna, James Meldrim, Alex H Ramos, Michael G Ross,

Carsten Russ, Erica Shefler, Andrey Sivachenko, Brian Sogoloff, Petar Stojanov,

Pablo Tamayo, Jill P Mesirov, Vladimir Amani, Natalia Teider, Soma Sengupta,

Jessica Pierre Francois, Paul A Northcott, Michael D Taylor, Furong Yu, Gerald R

Crabtree, Amanda G Kautzman, Stacey B Gabriel, Gad Getz, Natalie Jager, David

T W Jones, Peter Lichter, Stefan M Pfister, Thomas M Roberts, Matthew Meyerson,

Scott L Pomeroy, and Yoon-Jae Cho. Medulloblastoma exome sequencing uncovers

subtype-specific somatic mutations. Nature, 488(7409):106-110, July 2012.

143

[121] J Rameseder, K Krismer, Y Dayma, T Ehrenberger, M K Hwang, E M Airoldi, S R

Floyd, and M B Yaffe. A Multivariate Computational Method to Analyze High-Content

RNAi Screening Data. Journal of Biomolecular Screening, 20(8):985-997, August

2015.

[122] Amy Rapkiewicz, Virginia Espina, Jo Anne Zujewski, Peter F. Lebowitz, Armando

Filie, Julia Wulfkuhle, Kevin Camphausen, Emanuel Petricoin, Lance Liotta, and An-

drea Abati. The Needle in the Haystack: Application of Breast Fine-needle Aspira-

tion Samples to Quantitative Protein Microarray Technology. Cancer, 111:1-12, May

2007.

[123] Dashnamoorthy Ravi, Amy M Wiles, Selvaraj Bhavani, Jianhua Ruan, Philip Leder,

and Alexander J R Bishop. A network of conserved damage survival pathways re-

vealed by a genomic RNAi screen. PLoS Genetics, 5(6):el 000527, June 2009.

[124] Sabry Razick, Antonio Mora, Katerina Michalickova, Paul Boddie, and Ian M Don-

aldson. iRefScape. A Cytoscape plug-in for visualization and data mining of protein

interaction data from iRef Index. BMC Bioinformatics, 12:388, 2011.

[125] Garrett W Rhyasen, Lyndsey Bolanos, Jing Fang, Andres Jerez, Mark Wunder-

lich, Carmela Rigolino, Lesley Mathews, Marc Ferrer, Noel Southall, Rajarshi Guha,

Jonathan Keller, Craig Thomas, Levi J Beverly, Agostino Cortelezzi, Esther N

Oliva, Maria Cuzzola, Jaroslaw P Maciejewski, James C Mulloy, and Daniel T Star-

czynowski. Targeting IRAK1 as a therapeutic approach for myelodysplastic syn-

drome. Cancer Cell, 24(1):90-104, July 2013.

[126] Jacob M Rowe. Reasons for optimism in the therapy of acute leukemia. Best practice

& research. Clinical haematology, 28(2-3):69-72, June 2015.

[127] Kole T Roybal, Levi J Rupp, Leonardo Morsut, Whitney J Walker, Krista A McNally,

Jason S Park, and Wendell A Lim. Precision Tumor Recognition by T Cells With

Combinatorial Antigen-Sensing Circuits. Cell, 164(4):770-779, February 2016.

144

[128] A Russo, T Franchina, GRR Ricciardi, and A Picone. A decade of EGFR inhibition

in EGFR-mutated non small cell lung cancer (NSCLC): Old successes and future

perspectives. Oncotarget, 2015.

[129] V S Sabbisetti, K Ito, C Wang, and L Yang. Novel assays for detection of urinary

KIM-1 in mouse models of kidney injury. Toxicological..., 2012.

[130] Claudia Scholl, Stefan Fr6hling, Ian F Dunn, Anna C Schinzel, David A Barbie,

So Young Kim, Serena J Silver, Pablo Tamayo, Raymond C Wadlow, Sridhar Ra-

maswamy, Konstanze DOhner, Lars Bullinger, Peter Sandy, Jesse S Boehm, David E

Root, Tyler Jacks, William C Hahn, and D Gary Gilliland. Synthetic Lethal Interaction

between Oncogenic KRAS Dependency and STK33 Suppression in Human Cancer

Cells. Cell, 137(5):821-834, May 2009.

[131] Atefeh Seghatoleslam, Farzaneh Bozorg-Ghalati, Ahmad Monabati, Mohsen

Nikseresht, and Ali Akbar Owji. UBE2Q1, as a Down Regulated Gene in Pedi-

atric Acute Lymphoblastic Leukemia. International journal of molecular and cellular

medicine, 3(2):95-101, 2014.

[132] Jing Shi. Pathogenetic mechanisms in gastric cancer. World Journal of Gastroen-

terology, 20(38):13804, 2014.

[133] K Shostak and A Chariot. EGFR and NF-rB: partners in cancer. Trends in molecular

medicine, 2015.

[134] Frederic D Sigoillot and Randall W King. Vigilance and validation: Keys to success

in RNAi screening. ACS Chemical Biology, 6(1):47-60, January 2011.

[135] Frederic D Sigoillot, Susan Lyman, Jeremy F Huckins, Britt Adamson, Eunah Chung,

Brian Quattrochi, and Randall W King. A bioinformatics method identifies prominent

off-targeted transcripts in RNAi screens. Nature Publishing Group, 9(4):363-366,

February 2012.

[136] R Somasundaram, M Prasad, and J Ungerbdck. Transcription factor networks in

B-cell differentiation link development to acute lymphoid leukemia. Blood, 2015.

145

[137] Simona Soverini, Caterina De Benedittis, Cristina Papayannidis, Stefania Paolini,

Claudia Venturi, Ilaria lacobucci, Mario Luppi, Paola Bresciani, Marzia Salvucci,

Domenico Russo, Simona Sica, Ester Orlandi, Tamara Intermesoli, Antonella

Gozzini, Massimiliano Bonifacio, Gian Matteo Rigolin, Fabrizio Pane, Michele Bac-

carani, Michele Cavo, and Giovanni Martinelli. Drug resistance and BCR-ABL

kinase domain mutations in Philadelphia chromosome-positive acute lymphoblas-

tic leukemia from the imatinib to the second-generation tyrosine kinase inhibitor

era: The main changes are in the type of mutations, but not in the fr. Cancer,

120(7):1002-1009, December 2013.

[138] R Steuer, J Kurths, C 0 Daub, J Weise, and J Selbig. The mutual information:

detecting and evaluating dependencies between variables. Bioinformatics, 18 Suppl

2:S231-40, 2002.

[139] Michael R Stratton, Peter J Campbell, and P Andrew Futreal. The cancer genome.

Nature, 458(7239):719-724, April 2009.

[140] A Subramanian and P Tamayo. Gene set enrichment analysis: a knowledge-based

approach for interpreting genome-wide expression profiles. In Proceedings of the

... 2005.

[141] Y Tanaka, N Tanaka, Y Saeki, K Tanaka, M Murakami, T Hirano, N Ishii, and K Sug-

amura. c-Cbl-Dependent Monoubiquitination and Lysosomal Degradation of gp130.

Molecular and Cellular Biology, 28(15):4805-4818, July 2008.

[142] J Taylor and S Woodcock. A Perspective on the Future of High-Throughput RNAi

Screening: Will CRISPR Cut Out the Competition or Can RNAi Help Guide the Way?

Journal of Biomolecular Screening, 20(8):1040-1051, August 2015.

[143] Andrea R Tentner, Michael J Lee, Gerry J Ostheimer, Leona D Samson, Douglas A

Lauffenburger, and Michael B Yaffe. Combined experimental and computational

analysis of DNA damage signaling reveals context-dependent roles for Erk in apop-

tosis and G1/S arrest after genotoxic stress. Molecular Systems Biology, 8:1-18,

January 2012.

146

[144] M Toyoshima, N Tanaka, J Aoki, Y Tanaka, K Murata, M Kyuuma, H Kobayashi,

N Ishii, N Yaegashi, and K Sugamura. Inhibition of Tumor Growth and Metastasis by

Depletion of Vesicular Sorting Protein Hrs: Its Regulatory Role on E-Cadherin and

-Catenin. Cancer Research, 67(11):5162-5171, June 2007.

[145] Linh M Tran, Bin Zhang, Zhan Zhang, Chunsheng Zhang, Tao Xie, John R Lamb,

Hongyue Dai, Eric E Schadt, and Jun Zhu. Inferring causal genomic alterations in

breast cancer using gene expression data. BMC systems biology, 5(1):121, August

2011.

[146] Nurcan Tuncbag, Alfredo Braunstein, Andrea Pagnani, Shao-shan Carol Huang,

Jennifer Chayes, Christian Borgs, Riccardo Zecchina, and Ernest Fraenkel. Simul-

taneous reconstruction of multiple signaling pathways via the prize-collecting steiner

forest problem. Journal of Computational Biology, 20(2):124-136, February 2013.

[147] J W Tyner, A M Jemal, M Thayer, B J Druker, and B H Chang. Targeting survivin and

p53 in pediatric acute lymphoblastic leukemia. Leukemia, 2012.

[148] Beth 0 Van Emburgh, Andrea Sartore-Bianchi, Federica Di Nicolantonio, Salvatore

Siena, and Alberto Bardelli. ScienceDirect. Molecular Oncology, 8(6):1084-1094,

September 2014.

[149] F Vandin, E Upfal, and B J Raphael. Algorithms for detecting significantly mutated

pathways in cancer. Research in Computational Molecular ... , 2010.

[150] Fabio Vandin, Eli Upfal, and Benjamin J Raphael. Algorithms for detecting signifi-

cantly mutated pathways in cancer. Journal of Computational Biology, 18(3):507-

522, March 2011.

[151] Marc Vidal, Michael E Cusick, and Albert-Lcszl6 Barabdsi. Interactome Networks

and Human Disease. Cell, 144(6):986-998, March 2011.

[152] J M Villaveces, R C Jimenez, P Porras, N del Toro, M Duesbury, M Dumousseau,

S Orchard, H Choi, P Ping, N C Zong, M Askenazi, B H Habermann, and H Her-

mjakob. Merging and scoring molecular interactions utilising existing community

147

standards: tools, use-cases and a case study. Database, 2015(0):bau131-bau131,

January 2015.

[153] B Vogelstein, N Papadopoulos, V E Velculescu, S Zhou, L A Diaz, and K W Kinzler.

Cancer Genome Landscapes. Science, 339(6127):1546-1558, March 2013.

[154] Joel Wagner, Alejandro Wolf-Yadlin, Mark Sevecka, Jennifer K Grenier, David E

Root, Douglas A Lauffenburger, and G MacBeath. Receptor Tyrosine Kinases Fall

Into Distinct Classes Based on Their Inferred Signaling Networks. Technical report,

March 2013.

[155] Zhen Ning Wee, Siti Maryam J M Yatim, Vera K Kohlbauer, Min Feng, Jian Yuan Goh,

Bao Yi, Puay Leng Lee, Songjing Zhang, Pan Pan Wang, Elgene Lim, Wai Leong

Tam, Yu Cai, Henrik J Ditzel, Dave S B Hoon, Ern Yu Tan, and Qiang Yu. IRAK1 is a

therapeutic target that drives breast cancer metastasis and resistance to paclitaxel.

Nature communications, 6:8746, 2015.

[156] Tim Weenink and Tom Ellis. Creation and characterization of component libraries for

synthetic biology. Methods in molecular biology (Clifton, N.J.), 1073:51-60, 2013.

[157] Scott Wilhelm, Christopher Carter, Mark Lynch, Timothy Lowinger, Jacques Dumas,

Roger A Smith, Brian Schwartz, Ronit Simantov, and Susan Kelley. Discovery and

development of sorafenib: a multikinase inhibitor for treating cancer. Nature reviews

Drug discovery, 5(10):835-844, October 2006.

[158] Lucia Wille, Melissa L Kemp, Peter Sandy, Christina L Lewis, and Douglas A Lauf-

fenburger. Epi-allelic Erk1 and Erk2 knockdown series for quantitative analysis of T

cell Erk regulation and IL-2 production. Molecular Immunology, 44(12):3085-3091,

May 2007.

[159] Richard T Williams, Willem den Besten, and Charles J Sherr. Cytokine-dependent

imatinib resistance in mouse BCR-ABL+, Arf-null lymphoblastic leukemia. Genes &

development, 21(18):2283-2287, September 2007.

148

[160] R.T. Williams, M.F. Roussel, and C.J. Sherr. Arf gene loss enhances oncogenicity

and limits imatinib response in mouse models of Bcr-Abl-induced acute lymphoblas-

tic leukemia. Proceedings of the National Academy of Sciences, 103(17):6688-

6693, 2006.

[161] Jennifer L Wilson, Michael T Hemann, Ernest Fraenkel, and Douglas A Lauffen-

burger. Integrated network analyses for functional genomic studies in cancer. Sem-

inars in Cancer Biology, 23(4):213-218, August 2013.

[162] Timothy R Wilson, Jane Fridlyand, Yibing Yan, Elicia Penuel, Luciana Burton, Emily

Chan, Jing Peng, Eva Lin, Yulei Wang, Jeff Sosman, Antoni Ribas, Jiang Li, John

Moffat, Daniel P Sutherlin, Hartmut Koeppen, Mark Merchant, Richard Neve, and

Jeff Settleman. Widespread potential for growth-factor-driven resistance to anti-

cancer kinase inhibitors. Nature, 487(7408):505-509, July 2012.

[163] Meng Xu, Yiqun Xie, Songshi Ni, and Hua Liu. The latest therapeutic strategies

after resistance to first generation epidermal growth factor receptor tyrosine kinase

inhibitors (EGFR TKIs) in patients with non-small cell lung cancer (NSCLC). Annals

of translational medicine, 3(7):96, May 2015.

[164] Michael B Yaffe. The scientific drunk and the lamppost: massive sequencing ef-

forts in cancer discovery and treatment. Science Signaling, 6(269):pel 3-pel 3, April

2013.

[165] Yinghai Ye, Xiaocong Zhou, Xiaoyang Li, Yinhe Tang, Yusheng Sun, and Jun Fang.

Inhibition of epidermal growth factor receptor signaling prohibits metastasis of gastric

cancer via downregulation of MMP7 and MMP13. Tumor Biology, 35(11):10891-

10896, August 2014.

[166] Esti Yeger-Lotem, Laura Riva, Linhui Julie Su, Aaron D Gitler, Anil G Cashikar,

Oliver D King, Pavan K Auluck, Melissa L Geddie, Julie S Valastyan, David R Karger,

Susan Lindquist, and Ernest Fraenkel. Bridging high-throughput genetic and tran-

scriptional data reveals cellular responses to alpha-synuclein toxicity. Nature Genet-

ics, 41(3):316-323, March 2009.

149

[167] Benjamin Yeung, King-Ching Ho, and Xiaolong Yang. WWP1 E3 Ligase Targets

LATS1 for Ubiquitin-Mediated Degradation in Breast Cancer Cells. PLoS ONE,

8(4):e61027, April 2013.

[168] Jesse G Zalatan, Michael E Lee, Ricardo Almeida, Luke A Gilbert, Evan H White-

head, Marie La Russa, Jordan C Tsai, Jonathan S Weissman, John E Dueber, Lei S

Qi, and Wendell A Lim. Engineering complex synthetic transcriptional programs with

CRISPR RNA scaffolds. Cell, 160(1-2):339-350, January 2015.

[169] Lars Zender, Wen Xue, Johannes Zuber, Camile P Semighini, Alexander Kras-

nitz, Beicong Ma, Peggy Zender, Stefan Kubicka, John M Luk, Peter Schirmacher,

W Richard McCombie, Michael Wigler, James Hicks, Gregory J Hannon, Scott Pow-

ers, and Scott W Lowe. An Oncogenomics-Based In Vivo RNAi Screen Identifies

Tumor Suppressors in Liver Cancer. Cell, 135(5):852-864, November 2008.

[170] Min Zhang, Gopesh Srivastava, and Liwei Lu. The pre-B cell receptor and its function

during B cell development. Cellular & molecular immunology, 1(2):89-94, April 2004.

[171] X D Zhang, F Santini, R Lacson, S D Marine, Q Wu, L Benetti, R Yang, A McCamp-

bell, J P Berger, D M Toolan, E M Stec, D J Holder, K A Soper, J F Heyse, and

M Ferrer. cSSMD: assessing collective activity for addressing off-target effects in

genome-scale RNA interference screens. Bioinformatics, 27(20):2775-2781, Octo-

ber2011.

[172] Boyang Zhao, Justin R Pritchard, Douglas A Lauffenburger, and Michael T Hemann.

Addressing genetic tumor heterogeneity through computationally predictive combi-

nation therapy. Cancer Discovery, 4(2):166-174, February 2014.

[173] Boyang Zhao, Joseph C Sedlak, Raja Srinivas, Pau Creixell, Justin R Pritchard,

Bruce Tidor, Douglas A Lauffenburger, and Michael T Hemann. Exploiting Temporal

Collateral Sensitivity in Tumor Clonal Evolution. Cell, 165(1):234-246, March 2016.

[174] Y Zhen, L Guanghui, and Z Xiefu. Knockdown of EGFR inhibits growth and invasion

of gastric cancer cells. Cancer Gene Therapy, 21(11):491-497, November 2014.

150

[175] D Zhou, W Zou, Y Wei, and Y Zhou. Downregulation of c-Met expression does not

enhance the sensitivity of gastric cancer cell line MKN-45 to gefitinib. Molecular...,

2015.

151

Documents

Network analyses for functional genomic screens in cancer