Upload
others
View
4
Download
0
Embed Size (px)
Citation preview
GENOMICS
Exploring genetic interactionmanifolds constructed from richsingle-cell phenotypesThomas M. Norman1,2,3*†, Max A. Horlbeck1,2,3*, Joseph M. Replogle1,2,3, Alex Y. Ge4,5,Albert Xu1,2,3, Marco Jost1,2,3, Luke A. Gilbert4,5†, Jonathan S. Weissman1,2,3†
How cellular and organismal complexity emerges from combinatorial expression of genes isa central question in biology. High-content phenotyping approaches such as Perturb-seq(single-cell RNA-sequencing pooled CRISPR screens) present an opportunity for exploring suchgenetic interactions (GIs) at scale. Here, we present an analytical framework for interpretinghigh-dimensional landscapes of cell states (manifolds) constructed from transcriptionalphenotypes.We applied this approach to Perturb-seq profiling of strong GIs mined from agrowth-based, gain-of-function GI map. Exploration of this manifold enabled ordering ofregulatory pathways, principled classification of GIs (e.g., identifying suppressors), andmechanistic elucidation of synergistic interactions, including an unexpected synergy betweenCBL and CNN1 driving erythroid differentiation. Finally, we applied recommender systemmachine learning to predict interactions, facilitating exploration of vastly larger GI manifolds.
The complexity of cell types in multicellularorganisms is driven not by a large increasein gene number but instead by the com-binatorial expression of a surprisingly smallnumber of components. Specific combina-
tions of genes exhibit emergent properties whenexpressed together, enabling the generation ofmany diverse cell types and behaviors. Search-ing for such emergent properties is enabled bythe quantitative study of genetic interactions(GIs), which compare the phenotypic consequencesof perturbing a pair of genes alone or in combina-tion, typically bymeasuring growth [althoughotherphenotypic readouts such as reporter gene expres-sion and transcriptional responses have beenexplored (1, 2)]. GIs can reveal synthetic lethalvulnerabilities in tumors, identify suppressors ofinherited and acquired disorders, and guide thedesign of cocktails of genes to drive differentia-tion between cell types (3, 4). Pioneering effortsin yeast to construct systematic GImaps betweenall gene pairs have enabled systematic determi-nation of gene function, identification of proteincomplexes, and definition of gene-regulatory net-works in a principled and unbiasedmanner (1, 2).Recent studies have extended such approaches
to mammalian and other metazoan systems, butthese efforts face twomajor challenges: scale andinformation content (1, 2). For example,mappingpairwise interactions among the ~10,000 tran-
scribed genes in a human cell would requiremeasuring ~50million doublemutants. This scalehas necessitated the use of highly parallelizablereadouts of phenotype, such as growth rate, thatobscure the mechanistic or molecular basis forany particular interaction. Put simply, there aremany ways for cells to appear equally “unfit”: thereprogramming of a pluripotent cell to a termi-nally differentiated neuron may affect growthas much as induction of apoptosis or cell cyclearrest. Furthermore, many metazoan cell typesare quiescent or postmitotic and as such are notamenable to growth-based screens. Finally, bulkmeasures of their properties may obscure impor-tant cell-to-cell variability.Emerging high-throughput approaches for
monitoring rich phenotypes of individual cells(e.g., imaging or droplet single-cell transcriptom-ics) present a potential solution to these problems.For example, Perturb-seq pairs CRISPR-basedscreens with single-cell RNA sequencing (5–8).Each individual cell is in effect an independentexperiment connecting a genetic perturbation toits transcriptional consequences, allowing hundredsof thousands of parallel measurements (9, 10).It has been suggested that the rich phenotypesenabled by Perturb-seq can be used to better in-terpret the impact of genetic interactions (7).In this study, we exploit the scalability and rich
transcriptome readout of Perturb-seq to imple-ment a principled approach for systematicallystudying mammalian GIs. Each transcriptionalmeasurement, and effectively each genetic per-turbation (single or combinatorial), can be viewedas defining a point in high-dimensional space. Bymeasuring many GIs, we therefore effectivelymap out a surface—a manifold in mathematicalterminology—that describes the transcriptionalstates that a cell can occupy upon perturbation(Fig. 1A). We argue that this GI manifold carriesmuch more information about interactions than
a traditional GI map and is intrinsically moreinterpretable in several ways, including the abil-ity to resolve the distinct outcomes underlyingGIs and to model the different ways genetic per-turbations combine to yield new phenotypes.
An overexpression strategy identifiesstrong genetic interactions
Most previous studies of GIs have focused onloss-of-function perturbations, but many impor-tant cellular processes such as differentiation arealso associated with activation of gene expres-sion (e.g., MYOD1 in muscle cells). Genes thatexhibit phenotypes when expressed alone havea higher rate of genetic interactions with othergenes (1, 2). To identify mechanistically diverseGIs arising from gene activation, we thereforeselected 112 “hit” genes whose activation en-hances or retards growth of K562 cells (Fig. 1Aand table S1) (11), including cell cycle regulators,transcription factors, kinases, phosphatases, andgenes of unknown function.To systematically measure gain-of-function
(overexpression) GIs, we adapted a technologydeveloped for constructing fitness GI maps in hu-man cells using CRISPR interference (CRISPRi)(12) (Fig. 1, A and B, and fig. S1). Each candidateinteraction was probed by constructing a libraryof vectors containing pairs of single-guide RNAs(sgRNAs) (table S2). As we included two distinctsgRNAs targeting each gene, a total of 28,680unique sgRNApairs were tested. K562 cells stablyexpressing the SunTagCRISPRactivation (CRISPRa)system (11) were transduced with the CRISPRa GIlibrary, and sgRNApair abundancewas comparedat the start of the screen and after 10 days ofgrowth to measure fitness phenotypes. GI scoreswere assigned by measuring deviation betweenthe observed fitness of overexpressing both genesfrom the expected fitness based on the averageimpact of each single gene [(12); see materialsand methods, fig. S2, and table S3]. Independentreplicate experiments showed high levels of con-cordance for sgRNA-level GI scores and GI profilecorrelations, and independent sgRNAs targetingthe same gene were much more similar than thebackground of all sgRNA GI correlations (me-dianR = 0.50 comparedwith 0.04; Fig. 1, C andD;fig. S3A; and table S4). The gene-level GI scoresobtained by averaging sgRNAs targeting the samegenewere alsowell correlated between replicates(gene-level GI R = 0.80, p < 10−300; Fig. 1E, fig.S3A, and table S5) and followed a bell-shapeddistribution but with GIs ranging well beyondthe expectation from negative control sgRNAs(Fig. 1F and fig. S3B).We then clustered genes according to the sim-
ilarity of their GI profiles to produce a GI map(Fig. 1B; a larger version with gene labels isprovided in fig. S4). Highly correlated genes wereenriched for genes with the same DAVID termannotations (13), allowing for unbiased annota-tion of clusters (Fig. 1B; fig. S5, A and B; and tableS6). In addition, the map contained fewer clustersthan would be expected in a random map (fig.S5, C andD), consistent with the typical low-rankstructure of GI maps (i.e., groups of genes interact
RESEARCH
Norman et al., Science 365, 786–793 (2019) 23 August 2019 1 of 8
1Department of Cellular and Molecular Pharmacology, Universityof California, San Francisco, CA 94158, USA. 2Howard HughesMedical Institute, University of California, San Francisco, CA94158, USA. 3California Institute for Quantitative BiomedicalResearch, University of California, San Francisco, CA 94158,USA. 4Department of Urology, University of California,San Francisco, CA 94158, USA. 5Helen Diller FamilyComprehensive Cancer Center, San Francisco, CA 94158, USA.*These authors contributed equally to this work.†Corresponding author. Email: [email protected] (T.M.N.);[email protected] (L.A.G.); [email protected] (J.S.W.)
on March 8, 2021
http://science.sciencem
ag.org/D
ownloaded from
similarly so that there are fewer overall degreesof freedom than total genes). Thus, the structureof the CRISPRa GI map, like past efforts basedon loss-of-function alleles, can assign function toindividual genes by the similarity of GI profiles(1, 2). However, although the GI map robustlyidentifiedmany strongGIs, the origins of specificinteractions were difficult to deduce as each GIwas characterized only by a single scalar value(i.e., deviation from expected growth rate).
A Perturb-seq platform formeasuring GIs
We reasoned that Perturb-seq, which allowssingle-cell, pooled transcriptional profiling ofCRISPR-mediated perturbations (5–8), wouldenable us to better resolve the mechanisms un-derlying GIs. We picked 132 gene pairs from theGI map, chosen both within and between blocksof genes with similar interaction profiles, andtargeted each with CRISPRa sgRNA pairs (Fig.1A, fig. S6A, table S2, andmaterials andmethods).Given the low-rank structure of the fitness GImap, we reasoned that we could broadly samplethe biological processes representedwithoutmea-suring all gene pairs, asmanyGIs that fell into the“blocks” in the GI map were likely explained bysimilar mechanisms (materials and methods).We also profiled all single-gene perturbations toenable direct comparison of individual and com-bined perturbations (i.e., single geneA, single geneB, and pair AB). In total, we obtained transcrip-tional readouts for 287 perturbations measuredacross ~110,000 single cells (median 273 cells percondition; materials and methods, fig S1, andtable S7) in one pooled experiment.The Perturb-seq profiles also allowed us to
directly assess the performance of our CRISPRareagents (table S7). Levels of target gene activa-tion spanned a broad range (Fig. 1G and fig. S6, Band C), with a general trend that poorly expressedtargets were more highly induced. The A and Bpositions of the sgRNA cassette performed sim-ilarly (Fig. 1G and fig. S6, D and E), and expressionof genes neighboring the target was generallyunperturbed exceptwhen transcripts shared pro-moter regions (materials and methods; fig. S7, Aand B; and table S8). Finally, there was minimalcorrelation between fold activation and the num-ber of differentially expressed genes, implying thateven small increases in themRNA abundance ofsome genes can strongly alter a cell’s state (R =0.07; Fig. 1H). The degree of fitness defect wasrelated to the number of differentially expressedgenes (fig. S7C).
Constructing a GI manifold revealsbiological processes driving GIs
Whereas GI maps assign a scalar score to eachGI, our Perturb-seq approach instead associatesa transcriptional phenotype. We viewed thisensemble of measurements as defining a high-dimensional analog of a GI map, here termed aGI manifold. In our manifold analogy, each pos-sible cellular transcriptional state defines a pointon a high-dimensional surface. By applying a di-verse set of perturbations and measuring the
Norman et al., Science 365, 786–793 (2019) 23 August 2019 2 of 8
MAP2K
ETS2/MAPK1
Ig d
omai
n co
ntai
ning
Histidi
ne p
hosp
hata
se
Tyro
sine
phos
phat
ase
Centriole
DNA binding
MAP4K
LIM-type zinc finger TF
Brachyury TF
Kinesin
BAK/BIM
bLZ TF
Blood coagulation
Forkhead TF
CDK inhibitor
N-terminal Forkhead TF
Response to peptide hormone
A
B
D
E F G H
Combinatorial overexpression library
Fitness-level geneticinteraction (GI) map
Perturb-seq profilingof GIs
Genetic interactionmanifold
Position B
Position ADoubles
C
−0.5 0.0 0.5 1.0
replicate 1
sgRNA-level GIprofile correlations
−0.5
0.0
0.5
1.0
repl
icat
e 2
All sgRNA pairsSame gene
R=0.88, p<10-300
≤-4.0
0.0
≥4.0
GI score
-0.4
0.0
0.4
Gene fitnessphenotype (γ)
100
10-3.3
10-6.5
Term enrichmentp-value
Fold activation of target
100
100 101 102 103
101
102
103N
umbe
r of
diff
eren
tially
exp
ress
ed g
enes
Fold activation in A position
Fol
d ac
tivat
ion
in B
pos
ition
100 101 102 103
100
101
102
103
Mean expression profilesabab
fitness more fitless fit
bbbab
cd
cd
ab
aaa
ccd
Gene a
Double ab
Gene b
replicate 1
−10
0
10
repl
icat
e 2
−10 0 10
R=0.80, p<10-300
>6 SD R=0.94, p<10-133
gene-level GI scores
−0.5 0.0 0.5 1.0
sgRNA-level GI profile correlation
0
50
100
% m
axim
um
same gene, median=0.50all genes, median=0.04
−10 0 10
GI score
num
ber
of g
ene
pairs
Estimated 5% FDR
10¹
10³
0
-2.77 2.83
• activate all pairs among 112 target genes • 2 sgRNAs per target gene• 28,680 total sgRNA pairs
• fitness GI scores for all sgRNA pairs and for all gene pairs (5%-95% range: -3.5 to 3.3)
• profile 287 gene pair and single gene perturbations• ~110,000 single-cell transcriptomes
Fig. 1. CRISPRa fitness-level GI map. (A) Experimental strategy. Pairs of genes were systemati-cally coactivated using dual sgRNA CRISPRa libraries and a GI map was generated from the fitnessmeasurements. A subset of GIs was then profiled transcriptionally using Perturb-seq. Thesehigh-dimensional measurements define a surface called a GI manifold. Distinct GIs that lie inmarkedly different parts of the GI manifold may result in similar outcomes when viewed only atthe level of fitness. (B) CRISPRa fitness-level GI map. Gene-level GI profiles were clustered byaverage linkage hierarchical clustering based on Pearson correlation. Clusters were annotatedby assigning DAVID annotations if a DAVID term was significantly enriched in that cluster[hypergeometric ln(p) ≤ –7.5; see materials and methods]. (C and D) GI profile correlation betweenpairs of sgRNAs targeting any genes (black) or the same gene (green). Data are displayed as scatterplot of replicates (C) and histogram of replicate-averaged GIs (D). (E and F) Gene-level GI scoresgenerated by averaging all sgRNA-level GIs for each gene pair. (E) Scatter plot of replicates.Red points indicate nontargeting control sgRNA pairs, and dashed line indicates a radius of 6 SDsfrom nontargeting controls. (F) Histogram of gene-level GI scores with estimated empirical 5% falsediscovery rate threshold. (G) Comparison of fold activation of target gene measured by Perturb-seq whenthe targeting sgRNA is in the A or B position in the dual sgRNA expression cassette. (H) Fold activation ofthe target gene compared with the total number of differentially expressed genes.
RESEARCH | RESEARCH ARTICLEon M
arch 8, 2021
http://science.sciencemag.org/
Dow
nloaded from
resulting states (as we have), it becomes possibleto infer the shape of this surface (Fig. 1A). More-over, as each GI is characterized by a rich pheno-type, we envisioned that this perspective wouldallow us to organize GIs by common features andglobally examine theirmechanistic underpinnings.
To visualize this GI manifold, we used UMAP[uniformmanifold approximation and projection(14)] to project the mean expression profiles forour 287 perturbations into a two-dimensionalplane (Fig. 2, A and B; a single-cell version isshown in fig. S8). This algorithm approximates
the shape of a high-dimensional surface in twodimensions by trying to conserve nearest neigh-bor relationships. Perturbations that inducesimilar transcriptional changes then naturallycluster close to each other; we defined stableclusters using the HDBSCAN algorithm [(15);
Norman et al., Science 365, 786–793 (2019) 23 August 2019 3 of 8
AHR / FEV
AHR / KLF1
AHR / ctrl
ARID1A / ctrl
ARRDC3 / ctrl
ATL1 / ctrl
BAK1 / ctrl
BCL2L11 / BAK1
BCL2L11 / ctrl
BCL2L11 / TGFBR2
BCORL1 / ctrl
BPGM / ctrlBPGM / SAMD1BPGM / ZBTB1
CBARP / ctrl
CBFA2T3 / ctrl
CBL / CNN1
CBL / ctrl
CBL / PTPN12
CBL / PTPN9CBL / TGFBR2
CBL / UBASH3A
CBL / UBASH3B
CDKN1A / ctrlCDKN1B / CDKN1A
CDKN1B / ctrlCDKN1C / CDKN1A
CDKN1C / CDKN1B
CDKN1C / ctrl
CEBPA / ctrlCEBPB / CEBPA
CEBPB / MAPK1 CEBPB / ctrl
CEBPB / OSR2
CEBPB / PTPN12
CEBPE / CEBPA
CEBPE / CEBPB
CEBPE / CNN1
CEBPE / KLF1
CEBPE / ctrl
CEBPE / PTPN12
CEBPE / RUNX1T1CEBPE / SPI1
CELF2 / ctrl
CITED1 / ctrl
CKS1B / ctrl
CLDN6 / ctrl
CNN1 / MAPK1
CNN1 / ctrl
CNN1 / UBASH3A
CNNM4 / ctrl
COL1A1 / ctrlCOL2A1 / ctrl
CSRNP1 / ctrl
DLX2 / ctrl
DUSP9 / ETS2
DUSP9 / IGDCC3
DUSP9 / KLF1DUSP9 / MAPK1
DUSP9 / ctrl
DUSP9 / PRTGDUSP9 / SNAI1
EGR1 / ctrl
ELMSAN1 / ctrl
ETS2 / CEBPE
ETS2 / CNN1
ETS2 / IGDCC3
ETS2 / IKZF3
ETS2 / MAP7D1
ETS2 / MAPK1
ETS2 / ctrl
ETS2 / PRTG
FEV / CBFA2T3
FEV / ISL2
FEV / MAP7D1
FEV / ctrl
FOSB / CEBPB
FOSB / CEBPE
FOSB / IKZF3
FOSB / ctrl
FOSB / OSR2
FOSB / PTPN12
FOSB / UBASH3B
FOXA1 / FOXF1
FOXA1 / FOXL2
FOXA1 / HOXB9
FOXA1 / ctrlFOXA3 / FOXA1
FOXA3 / FOXF1FOXA3 / FOXL2
FOXA3 / HOXB9FOXA3 / ctrl
FOXF1 / FOXL2FOXF1 / HOXB9
FOXF1 / ctrl
FOXL2NB / FOXL2
FOXL2NB / ctrl
FOXL2 / HOXB9
FOXL2 / MEIS1
FOXL2 / ctrl
FOXO4 / ctrl
GLB1L2 / ctrl
HES7 / ctrl
HK2 / ctrl
HNF4A / ctrl
HOXA13 / ctrl
HOXB9 / ctrl
HOXC13 / ctrl
IER5L / ctrl
IGDCC3 / MAPK1
IGDCC3 / ctrl
IGDCC3 / PRTG
IGDCC3 / ZBTB25
IKZF3 / ctrl
IRF1 / ctrl
IRF1 / SET
ISL2 / ctrl
JUN / CEBPA
JUN / CEBPB
JUN / ctrl
KIF18B / KIF2C
KIF18B / ctrl
KIF2C / ctrl
KLF1 / BAK1
KLF1 / CEBPA
KLF1 / CLDN6
KLF1 / COL2A1
KLF1 / FOXA1
KLF1 / MAP2K6KLF1 / ctrlKLF1 / TGFBR2
KMT2A / ctrl
LHX1 / ELMSAN1
LHX1 / ctrl
LYL1 / CEBPB
LYL1 / IER5L
LYL1 / ctrl
MAML2 / ctrl
MAP2K3 / ELMSAN1
MAP2K3 / IKZF3
MAP2K3 / MAP2K6
MAP2K3 / ctrl
MAP2K3 / SLC38A2
MAP2K6 / ELMSAN1
MAP2K6 / IKZF3
MAP2K6 / ctrl
MAP2K6 / SPI1
MAP4K3 / ctrl
MAP4K5 / ctrl
MAP7D1 / ctrl
MAPK1 / IKZF3
MAPK1 / ctrl
MAPK1 / PRTG
MAPK1 / TGFBR2
MEIS1 / ctrl
MIDN / ctrl
NCL / ctrl
NIT1 / ctrl
ctrl / BAK1
ctrl / CBARP
ctrl / CBFA2T3
ctrl / CDKN1A
ctrl / CDKN1B
ctrl / CEBPA
ctrl / CEBPB
ctrl / CEBPE
ctrl / CLDN6
ctrl / CNN1
ctrl / COL2A1
ctrl / DLX2
ctrl / ELMSAN1
ctrl / ETS2
ctrl / FEV
ctrl / FOXA1ctrl / FOXF1
ctrl / FOXL2
ctrl / HOXB9
ctrl / HOXC13
ctrl / IER5L
ctrl / IGDCC3
ctrl / IKZF3ctrl / ISL2
ctrl / KIF2C
ctrl / KLF1 ctrl / MAP2K6ctrl / MAP7D1
ctrl / MAPK1
ctrl / MEIS1
ctrl / ctrl
ctrl / OSR2
ctrl / PRTG
ctrl / PTPN12
ctrl / PTPN9
ctrl / RUNX1T1
ctrl / SAMD1
ctrl / SET
ctrl / SLC38A2
ctrl / SNAI1
ctrl / SPI1
ctrl / STIL
ctrl / TBX2ctrl / TBX3
ctrl / TGFBR2
ctrl / UBASH3A
ctrl / UBASH3B
ctrl / ZBTB1
ctrl / ZBTB25
ctrl10 / ctrl
ctrl11 / ctrl
ctrl1 / ctrl
OSR2 / ctrl
PLK4 / ctrl
PLK4 / STILPOU3F2 / CBFA2T3
POU3F2 / FOXL2
POU3F2 / ctrl PRDM1 / CBFA2T3
PRDM1 / ctrl
PRTG / ctrl
PTPN12 / ctrl
PTPN12 / OSR2
PTPN12 / PTPN9
PTPN12 / SNAI1
PTPN12 / UBASH3A
PTPN12 / ZBTB25
PTPN13 / ctrl
PTPN1 / ctrl
PTPN9 / ctrl
RHOXF2B / ctrl
RHOXF2B / SET
RHOXF2B / ZBTB25
RP5-862P8.2 / ctrl
RREB1 / ctrl
RUNX1T1 / ctrl
S1PR2 / ctrl
SAMD1 / ctrl
SAMD1 / PTPN12
SAMD1 / TGFBR2SAMD1 / UBASH3B
SAMD1 / ZBTB1
SET / CEBPE
SET / KLF1SET / ctrl SGK1 / ctrl
SGK1 / S1PR2
SGK1 / TBX2SGK1 / TBX3
SLC4A1 / ctrl
SLC6A9 / ctrl
SNAI1 / DLX2
SNAI1 / ctrl
SNAI1 / UBASH3B
SPI1 / ctrl
STIL / ctrl
TBX2 / ctrl
TBX3 / ctrlTBX3 / TBX2
TGFBR2 / CBARP
TGFBR2 / ETS2
TGFBR2 / IGDCC3
TGFBR2 / ctrl
TGFBR2 / PRTG
TMSB4X / BAK1
TMSB4X / ctrl
TP73 / ctrl
TSC22D1 / ctrl
UBASH3A / ctrl
UBASH3B / CNN1
UBASH3B / ctrl
UBASH3B / OSR2
UBASH3B / PTPN12
UBASH3B / PTPN9
UBASH3B / UBASH3A
UBASH3B / ZBTB25
ZBTB10 / DLX2
ZBTB10 / ELMSAN1
ZBTB10 / ctrlZBTB10 / PTPN12
ZBTB10 / SNAI1
ZBTB1 / ctrl
ZBTB25 / ctrl
ZC3HAV1 / CEBPA
ZC3HAV1 / CEBPE
ZC3HAV1 / HOXC13
ZC3HAV1 / ctrl
ZNF318 / FOXL2
ZNF318 / ctrl
BA
a ab
b
Reduce dimensions to visualize manifold and
cluster related phenotypes
G1 CELL CYCLEG1 CELL CYCLEARRESTARREST
GRANULOCYTEGRANULOCYTE/ APOPTOSIS/ APOPTOSIS
PIONEERPIONEERFACTORSFACTORS
ERYTHROIDERYTHROID
PRO-GROWTHPRO-GROWTH
MEGAKARYOCYTEMEGAKARYOCYTEMARKERMARKER
D Gene-level GI score
−4−2024
C Fitness phenotype (γ)
−0.4−0.20.00.20.4
E Deviation of cell cycledistribution from control cells
0.00.10.20.30.40.50.60.7
F Cell cycle position
M
M-G
1
G1-
S S
G2-
M
ctrl / CDKN1Actrl / CDKN1BCDKN1C / ctrlCDKN1C / CDKN1ACDKN1C / CDKN1BPLK4 / ctrlctrl / STILPLK4 / STILCKS1B / ctrlKIF18B / ctrlctrl / KIF2CKIF18B / KIF2C
−1.00.01.0
Fig. 2. Visualization of the GI manifold. (A) Using diverse geneticperturbations, the structure of the GI manifold can be inferred and thenvisualized by dimensionality reduction to a plane. (B) UMAP projection ofall single-gene and gene-pair Perturb-seq profiles. Each dot represents agenetic perturbation characterized by its mean expression profile. Clustersof transcriptionally similar perturbations are colored identically, whereasgray dots are perturbations that do not fall within stable clusters.
(C) Fitness measurements from the GI map expressed as gene-pair growthphenotypes (g). (D) GI scores from the fitness-level GI map. Single-geneperturbations are not included. (E) Cell cycle deviation scores. Strongerscores indicate alteration from the distribution of cell cycle positionsobserved in unperturbed cells. (F) Relative enrichment or depletion of cellcycle phases relative to unperturbed cells induced by selected geneticperturbations.
RESEARCH | RESEARCH ARTICLEon M
arch 8, 2021
http://science.sciencemag.org/
Dow
nloaded from
materials andmethods]. Both fitness andGI scoreswere distributed throughout the GI manifold (Fig.2, C andD), in accordancewith the idea that scalarfitness measurements collapse the much largerlandscape of transcriptional states (Fig. 1A). Bycontrast, overlaying markers derived from theunderlying transcriptional data allowed us to gaininsight into the mechanisms of individual GIsby looking at common features of perturbationswithin each cluster. For example, one cluster ofmean expression profiles (“G1 cell cycle arrest”)contained canonical regulators of the cell cycle.In this case, the underlying single-cell data con-firmed that these perturbations induced cell cyclearrest in the expected cell cycle stages (Fig. 2, Eand F), explaining the growth defect.We observed clusters of perturbations that
caused cells to induce erythroid, granulocyte,or megakaryocyte markers, which is consistentwith the known multilineage potential of theK562 model (Figs. 2B and 3, A and B, and fig. S9)(16, 17). These results suggested that cell differ-entiation or priming and a concomitant decreasein proliferation explained some of the structureof theGImanifold. For example,many interactionssurrounding CBL, its regulatorsUBASH3A/B, andseveralmultisubstrate tyrosine phosphatases (e.g.,PTPN9/12) induced erythroid markers, suggest-ing a common mechanism in the regulation ofreceptor tyrosine kinase signaling (18). By con-trast, the granulocyte cluster mostly containedperturbations of canonical regulators such asC/EBP-a, -b, and -e (CEBPA/B/E) and PU.1 (SPI1).Finally, a cluster of perturbations induced ex-pression of the canonical megakaryocyte markerCD41, but these cells did not adopt the charac-
teristicmorphological features ofmegakaryocytes,as seen by microscopy (fig. S9G), suggesting thatthey are at best primed toward megakaryocyticdifferentiation (16, 17).To test our ability to better resolve specific
interactions using Perturb-seq, we examined astrong synergistic interaction identified by ourfitness GI map between CBL and CNN1 (calpo-nin) that belonged to the erythroid cluster. CNN1is a poorly characterized gene that is annotatedas a smoothmuscle–specific protein, although itis expressed in many cell types (19, 20). Over-expressing either gene induced similar transcrip-tional changes, and single-cell analysis revealedan apparent progression of phenotypes from un-perturbed through singly perturbed to doublyperturbed CBL/CNN1 cells (Fig. 3, C and D). Con-sistent with an erythroid transcriptional program,overexpression of CBL and CNN1 caused stronginduction of canonicalmarkers: hemoglobin genes(sixfold to 39-fold), an iron importer involved inheme biosynthesis (SLC25A37, 13-fold), and theblood group antigen CD235a (GYPA, twofold)(Fig. 3, C and E, and fig. S10, A and B) (17).Furthermore, overexpression of CBL and CNN1transgenes in a human erythroid progenitormodel(HUDEP2 cells) individually and in combinationalso inducedmarkers of erythroid differentiation(Fig. 3, F and G, and fig. S10, C and D) (21).This example highlighted how Perturb-seq
analysis can directly lead to a hypothesis aboutthe biology underlying a GI evenwhen one of thecomponents is poorly understood. More gen-erally, because our approach was sensitive tosingle-cell phenotypes and incomplete differen-tiation and could simultaneously detect signa-
tures of multiple differentiation states, it couldfacilitate higher-order combinatorial perturba-tion screens aimed at improving protocols fordriving cells into distinct differentiation states.
Quantitative modeling of GIs definesmechanisms of interaction
Our large collection of matched single and doubleoverexpression transcriptional phenotypes pro-vides us with the opportunity to quantitativelymodel GIs directly from transcription profileswithout using the fitness GImap.We devised anapproach based on fitting a regression model,dab = c1da + c2db + e, which decomposes thetranscriptional changes observed in doubly per-turbed cells (dab) as a linear combination of thetranscriptional changes induced by the two singleperturbations of genes a and b (c1da + c2db) andan error term (e) that contains unmodeled ornonlinear effects. The coefficients c1 and c2 theneffectively measure how much of the phenotypeis accounted for by each single perturbation(Fig. 4A). This linear model of transcriptional GIsexplained, on average, more than 70% of the var-iance in gene expression (Fig. 4B; mean R2 = 0.71).GI maps traditionally classify interactions as
either buffering (indicating antagonism, GI scorepositive) or synthetic sick/lethal (SSL, indicat-ing synergy, GI score negative). We observed arobust anticorrelation (R = –0.72) between themagnitude of the coefficients c1 and c2 and thefitness-based GI score (Fig. 4C). An intuitive ex-planation was that buffering interactions travelless “far” (smaller coefficients) along the GI mani-fold, whereas SSL interactions travel farther (big-ger coefficients) (Fig. 4A).
Norman et al., Science 365, 786–793 (2019) 23 August 2019 4 of 8
Differentiation hierarchy
CMP
MEP
ErythrocytesPlatelets
Granulocytes
Macrophages
GMPK562s
Developmental markers
erythroid markers
granulocyte markers
CD41
A B
G
Contro
lCBL
CNN1
CBL/CNN1
CBL / CNN1 (GI score: -11.9)
differentially expressed genes
C
SLC
25A
37H
BZ
PR
ELI
D1
MIF
TM
SB
4XP
RS
S57
GM
FG
TE
X30
PK
MC
D99 XG
ZF
P36
L1R
EE
P6
EE
F1A
1G
ST
P1
GY
PA
NA
RF
HB
A2
ALA
S2
HB
A1
SM
IM1
GY
PB
TE
SC
CO
L18A
1C
HI3
L2A
PO
EA
PO
C1
HB
G2
HB
G1
BLV
RB
CBL / ctrl
ctrl / CNN1
CBL / CNN1−3.0−1.50.01.53.0
CBL / ctrlctrl / ctrl
ctrl / CNN1CBL / CNN1
ED
F
0 20 40Combined score
ERYTHROBLAST
CORD BLOOD
CD34+ CELL
PERIPHERAL BLOOD
ADRENAL GLAND
Contro
lCBL
CNN1
CBL/CNN1
0
1000
2000
3000
Hem
oglo
bin
(med
ian
fluor
esce
nce
inte
nsity
)
Fig. 3. Dissecting a genetic interaction using Perturb-seq. (A) Expres-sion of marker genes for different hematopoietic cell types in the GImanifold UMAP projection. Color is scaled by mean expression Z-score of amarker gene panel. (B) Hematopoietic differentiation hierarchy. K562 cellsare a poorly differentiated, erythroid-like cancer cell line. (C) Perturb-seqprofiling of the CBL/CNN1 GI. Average transcriptional profiles for the twoconstituent single perturbations are compared with the double perturba-tion. Heat maps show deviation in gene expression relative to unperturbed
cells. (D) UMAP projection of single-cell Perturb-seq data in the CBL/CNN1interaction. Each dot is a cell colored according to genetic background.(E) ARCHS4 (35) cell-type term enrichment for genes showing largeexpression changes in CBL/CNN1 doubly perturbed cells. (F) Expression ofhemoglobin in HUDEP2 cells upon cDNA overexpression of CBL or CNN1.Hemoglobin was labeled with anti–fetal hemoglobin antibody andmeasured by flow cytometry. (G) Pelleted HUDEP2 cells. Hemoglobinexpression appears red.
RESEARCH | RESEARCH ARTICLEon M
arch 8, 2021
http://science.sciencemag.org/
Dow
nloaded from
Norman et al., Science 365, 786–793 (2019) 23 August 2019 5 of 8
Fig. 4. Quantitative model for high-dimensional GIs. (A) Model oftranscriptional genetic interactions. Different transcriptional states define pointson the surface of the GI manifold, and genetic perturbations define vectors oftravel.The model decomposes double perturbations as a linear combination ofthe two constituent single perturbations. (B) Model fit across all GIs measuredwith Perturb-seq. (C) Magnitude of model coefficients compared with GI scorefrom the fitness-level GI map. (D and E) Application of the model to selectedGIs. For each GI, transcriptional profiles for the two constituent single
perturbations are compared with the double perturbation and the model fit.Heat maps show deviation in gene expression relative to unperturbed cells.(F) Visualization of all measured GIs in Perturb-seq experiment. Each GI wascharacterized using features derived from the model (x-axis) and by measuresof similarity among the transcriptional profiles (y-axis).These two viewpointswere each clustered and collapsed to a single dimension using UMAP to definethe two axes.The features defining the two axes are plotted next to them.Categories of GIs are annotated based on features shared within the clusters.
RESEARCH | RESEARCH ARTICLEon M
arch 8, 2021
http://science.sciencemag.org/
Dow
nloaded from
To explore the ability of Perturb-seq to betterresolve GIs [by analogy with past efforts in othersystems (1, 22, 23)], we examined two strongbuffering interactions that each had scores of+10.1 in our fitness GI map but appeared to be-have differently on a transcriptional level. Anal-ysis by Perturb-seq revealed that the GI betweenKLF1 and CEBPA resulted from genetic epistasis(i.e., one single overexpression phenotypemaskingthe other), resulting in smaller, asymmetrical co-efficients (c1 = 0.19, c2 = 0.72; Fig. 4D and fig.S11A). By contrast, the PTPN12/SNAI1 GI resultedfrom genetic suppression (i.e., when combinedoverexpression of two genes attenuated each other’sindividual phenotypes), resulting in two smallercoefficients (e.g., PTPN12/SNAI1, c1 = 0.60, c2 =0.57; Fig. 4E and fig. S11B). Finally, as discussedabove, synergistic or synthetic lethal interactionstended to result in two larger coefficients (e.g.,CBL/CNN1, c1 = 1.24, c2 = 0.8; Fig. 3C and fig. S11C).A central questionwhen consideringGIs is how
often new or unexpected (neomorphic) pheno-types emerge through the combined action ofgenes. A relatively small number of GIs (lowermode in Fig. 4B) deviated from the expectationgiven by the linear model (which we quantifiedby distance correlation d; materials andmethodsand fig. S11D). A common neomorphic behaviorsimilar to an example observed in yeast (22) oc-curred when a perturbation that had little tran-scriptional effect on its own appeared to enhancethe effects of a second perturbation (e.g., FEV/CBFA2T3, d = 0.74; fig. S11E). We also observed
relatively rare instances in which double pheno-types appeared to be completely unexpected [e.g.,the physical interactors PLK4/STIL (24) d = 0.53;fig. S11F]. The model’s parameters thus provideda simple, useful summary of how perturbationscombine (table S9).To look for structure among interactions, we
used a two-dimensional visualization and clus-tering technique (25) (table S9 and materialsand methods). One axis grouped interactionsaccording to properties derived from the modelcoefficients (c1, c2, d), whereas the other groupedinteractions according to how correlated the un-derlying transcriptional responses were (materialsand methods). The resulting figure (Fig. 4F) iden-tified numerous distinct categories of interaction,showing that the model can serve as a general-ization of the one-dimensional “buffering versussynthetic lethal” paradigm that has typically beenused to categorize genetic interactions.
Ordering genes into linear pathwaysusing Perturb-seq
The linear GI model allowed us to make hypo-theses about gene regulation because it can iden-tify, for example, which single perturbationphenotype better explains the double perturbationphenotype. We examined the strong GIs amongthegenesDUSP9,ETS2, andMAPK1. In theDUSP9/ETS2 interaction (Fig. 5A), theDUSP9 phenotypedominated, suggesting thatDUSP9overexpressionantagonized ETS2. Similarly,DUSP9 andMAPK1antagonized each other’s activities (Fig. 5B). Fi-
nally, ETS2 and MAPK1 induced similar pheno-types, andETS2 transcriptionwas activated in allbackgrounds (9.3-fold inMAPK1, 9.2-fold in ETS2,and 35.8-fold in MAPK1/ETS2 overexpression;Fig. 5C). This type of interaction, in which a per-turbation (MAPK1) acts at least partly by up-regulating its partner (ETS2), was uncommonin our dataset (fig. S11G). Taken together, theseresults suggested a linear regulatory pathway inwhichDUSP9 (a phosphatase) inhibitsMAPK1 (akinase) that activates ETS2 (a transcription fac-tor), which is consistent with the known biologyof these gene families (Fig. 5D) (26). Followingsimilar logic, the model allowed us to orient allthe buffering interactions in which one pertur-bation is epistatic to another (Fig. 5E).
Single-cell heterogeneity reveals thetrajectory of GIs
The single-cell resolution afforded by Perturb-seqcan reveal phenotypic heterogeneity for some GIsthat we reasoned could yield further insight intotheir mechanism (Fig. 5F; compare fig. S8). Forexample, cells overexpressing both DUSP9 andMAPK1 showed a range of phenotypes spanningthe transcriptional states observed in cells over-expressing either DUSP9 or MAPK1 alone (Fig.5G). In particular, we observed cells in whichDUSP9 and MAPK1 appeared to suppress eachother’s activity entirely.We reasoned that we could therefore identify
DUSP9 orMAPK1 regulatory targets that showeddiffering sensitivity to the levels of these proteins
Norman et al., Science 365, 786–793 (2019) 23 August 2019 6 of 8
DUSP9 / CtrlCtrl / MAPK1DUSP9 / MAPK1principal curve
F
H
GSingle-cell phenotypicheterogeneity
Exp
ress
ion
accu
mul
atio
n al
ong
prin
cipa
l cur
ve
DU
SP9
expr
essi
on
MAP
K1 e
xpre
ssio
n
Differentially expressed genes
DU
SP9
/ ctrl
DU
SP9
/ MAP
K1
ctrl
/ MAP
K1
identity of perturbation in cell
E
DUSP9 / ctrlCtrl / ETS2DUSP9 / ETS2model prediction
A
B
C
DDUSP9 ETS2
double = 0.75 DUSP9 + 0.25 ETS2 (GI = 5.7, d = 0.90)
ETS2 / ctrlCtrl / MAPK1ETS2 / MAPK1model prediction
MAPK1 ETS2double = 0.47 ETS2 + 1.07 MAPK1 (GI = -2.5, d = 0.87)
DUSP9 MAPK1 ETS2
DUSP9 / CtrlCtrl / MAPK1DUSP9 / MAPK1model prediction
DUSP9 MAPK1
double = 0.39 DUSP9 + 0.23 MAPK1 (GI = 11.0, d = 0.72)
KLF1
MAP2K3
MAP2K6
ZC3HAV1
IGDCC3PTPN12
SET
UBASH3B
ETS2
MAPK1
CEBPA
CEBPB
CEBPE
FOSB
SPI1DUSP9IKZF3
ZBTB10
OSR2
RUNX1T1
phosphatase kinase transcriptionfactor
Fig. 5. Inferring gene-regulatory logic underlying GIs. (A to C)Application of linear genetic interaction model to GIs among DUSP9,MAPK1, and ETS2. (D) Order of pathway inferred from model fits.(E) Epistatic buffering interactions oriented using the genetic interac-tion model. Each arrow denotes a genetic interaction originating in thegene that dominates when the two genes are simultaneously perturbed.Arrow size denotes the degree of dominance as measured byasymmetry of model coefficients. Genetic perturbations with similartranscriptional profiles are colored identically. (F) Stochastic heteroge-neity can cause individual cells (dots) bearing a given geneticperturbation to explore the space on the GI manifold surrounding the
average direction of travel (arrows). (G) UMAP projection of singlecells with overexpression of DUSP9 and/or MAPK1. Black line representsthe principal curve, which tracks the primary direction of variation inthe dataset that can be used to order all cells. (H) Gene expressionaveraged along the principal curve. Each row denotes a cell orderedaccording to position along the principal curve. The left three columnsindicate that cell’s genetic background. At each point, cells that areclose on the principal curve are averaged to produce a local estimateof median gene expression. The heat map shows normalized expressionof differentially expressed genes. The DUSP9 and MAPK1 expressioncolumns show the same data for the targeted genes.
RESEARCH | RESEARCH ARTICLEon M
arch 8, 2021
http://science.sciencemag.org/
Dow
nloaded from
by exploiting single-cell data. To order cells inan unbiased way by “phenotype,”we computed aprincipal curve measuring the path of maximumvariation in the dataset [(27); Fig. 5G and mate-rials andmethods]. Examining median-filteredgene expression (materials andmethods) alongthis curve revealed distinct classes of transcriptsregulated byDUSP9 orMAPK1 activity (e.g.,GYPAappeared to be more sensitive to DUSP9 activitythan did HBZ; Fig. 5H and fig. S12A). This varia-tion did not appear to be the result of stable dif-ferences in the expression of MAPK1 and DUSP9(fig. S12B), suggesting a possible role either forhistorical differences or stochastic gene expres-sion. Perturb-seq can therefore reveal gradedphenotypes resulting from antagonism betweentwo proteins.
Predicting GIs using arecommender system
One possible strategy to address the scale ofgenetic interactions is to pursue a split experi-
mental and computational approach, in whichonly a subset of interactions is sampled [eitherrandomly or through “compressed” experimen-tal designs (28)] and the remainder is predictedcomputationally (Fig. 6A) (7, 29, 30). Perturb-seqprovides a scalable means of constraining thesetypes of searches (7), enabling exploration of theGI landscape.There is substantial similarity between this
problem and that of predicting a person’s shop-ping preferences based on past buying behavior,which is commonly addressed using “recommen-der system” algorithms.Many of these approachescan exploit low-rank structure such as that seen inGI maps and leverage additional side informationobtained by other means to improve predictivepower. We examined the Perturb-seq profiles ofsingle-gene overexpression (fig. S13A) and foundthat therewas amodest concordance betweenGIprofile and Perturb-seq profile correlations (R =0.29, p < 10−103; fig. S13B), suggesting that thetranscriptional data provided a complementary,
scalable means of comparing genes that mightinform GI prediction.We constructed a matrix factorization model
for fitness GIs and then constrained this modelto encourage similar interaction profiles amonggenes that induced similar transcriptional changes(fig. S14A and materials and methods) (30, 31).We then predicted unobserved GIs using thismodel trained on different fractions of randomlysubsampled interactions (Fig. 6A and materialsand methods). The end result (Fig. 6, B to D)preserved much of the large-scale structure ofthe map as seen through block averaging of GIscores (Fig. 6, C and D; fig. S14B; and materialsand methods). Our approach was also substan-tially better than random sampling at predict-ing the top 10% of interactions and reasonablypreserved both the rank order of all interactions(Spearman r ≈ 0.5 at 10% sampling; Fig. 6, D andE and fig S14C) and the pairwise similaritiesbetween GI profiles (Fig. 6F). Notably, the use ofPerturb-seq–derived single perturbation profilesas side information substantially improved per-formance (fig. S14D). These results suggest thatthe hybrid approach can nominate blocks of GIsfor in-depth study.Finally, we used our data to model the mini-
mum number of cells that would be required toperform larger experiments. By downsamplingour measured perturbations and reperformingour analyses, we observed that as few as 50 cellsper perturbation could be sufficient,meaning that~106 cells would be needed to collect side infor-mation for the entire set of protein-coding se-quences (Fig. 6G).
Discussion
A central goal of genetics is to understand therelationship between the set of genes that a cellexpresses and its phenotype. However, this rela-tionship is challenging to study because manyphenotypes emerge only through the coordinatedaction of multiple genes. Here, we used Perturb-seq to manipulate a large number of gene pairsand then measure the resulting changes in cellstate. This ensemble of measurements describeda high-dimensional surface called a GI manifold.By interpreting and modeling the GI manifold,we can gain several insights into how complexphenotypes emerge.First, the transcriptional profiles can distin-
guish distinct outcomes such as cell death, slowgrowth, and differentiation to a variety of cellstates that would appear equivalent at the fitnesslevel. We also identified both canonical genes(e.g., KLF1, GATA1) and unexpected genes (e.g.,CNN1) that interacted to promote differentiationto a specific cell state (erythrogenesis). As our single-cell approach is sensitive to multiple outcomesor perturbations with incomplete penetrance,it is a natural strategy to pursue combinatorialsearches for factors driving (trans)differentiation(4). Second, the shape of the GI manifold can re-veal howGIs arise.We derived a simple, geometricGImodel and used it to identify the different waysin which genetic perturbations combine to yieldnew phenotypes, allowing us to, for example,
Norman et al., Science 365, 786–793 (2019) 23 August 2019 7 of 8
True GI map
Pre
dict
ed G
I map
LLLLLLLLLLLLLLLLLLLLLLL
Recommendersystem
Prioritize regions of interest
Perturb-seq side information about
gene a
Per
turb
-seq
sid
e in
form
atio
n ab
out
gene
b Sparse samplingof fitness
interactions
EstimatedGI map
A B C
ED
−0.2 0.0 0.2True GI score
−0.3
−0.2
−0.1
0.0
0.1
0.2
0.3
Pre
dict
ed G
I sco
re
FSimilarity of GI scoresSimilarity of GI scores G
Cop
hene
tic c
orre
latio
n
10 20 30 50 100Cells per perturbation
0.75
0.80
0.85
0.90
0.95
Similarity of Perturb-seq profiles
Similarity of GI profilecorrelations
5 10 15 20 25Sampling percentage
Spe
arm
an c
orre
latio
n
0.4
0.6
0.8
5 10 15 20 25Sampling percentage
0.4
0.5
0.6
0.7
0.8
Cop
hene
tic c
orre
latio
n
−0.06−0.030.000.030.06
Block-averagedtrue GI map
Block-averagedpredicted GI map −0.04
0.000.04
Fig. 6. Recommender system for exploring the GI landscape. (A) Schematic of predictionstrategy. Fitness phenotypes of a limited subset of GIs are measured. Each gene is characterized byits Perturb-seq transcriptional profile, and similarity among these profiles is used as side informationto constrain a recommender system model to impute remaining fitness GI scores and highlightregions of interest. (B) True versus predicted GI map obtained by prediction from 10% of randomlysampled fitness-level GIs. (C) Block-averaged true and predicted GI maps obtained by averaging GIscores within clusters. (D) Scatter plot of true and predicted GI scores (blue dots) from (B). Thedashed lines show 5 and 95% quantiles, which are used to designate strong GIs. Orange dots showequivalent scatter for block-averaged GI scores in (C). (E) Spearman correlation between trueand predicted GI scores at different levels of random sampling. Fifty random subsets were measuredfor each sampling level. Blue and orange denote individual and block-averaged GIs, respectively.(F) Cophenetic correlation of GI profiles as a function of sampling level, measuring the similarityof correlation structure in the true and predicted GI maps. (G) To assess scaling ability, therepresentation of each perturbation in the Perturb-seq experiment was randomly down-sampled todifferent levels of representation. Plot shows cophenetic correlation between down-sampled and truetranscriptional profiles used to construct the GI manifold visualization of Fig. 2.
RESEARCH | RESEARCH ARTICLEon M
arch 8, 2021
http://science.sciencemag.org/
Dow
nloaded from
order genes into linear pathways. We and othershave established that functionally related genesinteract similarly—in geometric terms, the GImanifold is therefore highly constrained, enablingimputation strategies (1, 2, 7, 12, 29). Our re-sults provide a strategy for exploring largespaces of combinatorial genetic interactionsby measuring only a subset of fitness-level GIs.This provides a complementary approach to ef-forts based on composite measurements, com-pressed sensing, and rich readouts to predictunmeasured GIs (7, 28). By intelligently mea-suring and exploring the GI manifold, one canstart to create a global view of the nonlinear map-ping between genotype and phenotype. Such ap-proaches should enable large-scale searches forsynthetic lethal interactions in cancer, the dis-covery of gene targets that lessen the severity ofgenetic disease, and, more generally, the under-standing of how complex, multigenic interactionsgovern biological traits and disease risk.
REFERENCES AND NOTES
1. M. Costanzo et al., Cell 177, 85–100 (2019).2. J. Domingo, P. Baeza-Centurion, B. Lehner, Annu. Rev. Genomics
Hum. Genet. 20, annurev-genom-083118-014857 (2019).3. J. L. Hartman 4th, B. Garvik, L. Hartwell, Science 291,
1001–1004 (2001).4. K. Takahashi, S. Yamanaka, Cell 126, 663–676 (2006).5. D. A. Jaitin et al., Cell 167, 1883–1896.e15 (2016).6. B. Adamson et al., Cell 167, 1867–1882.e21 (2016).7. A. Dixit et al., Cell 167, 1853–1866.e17 (2016).8. P. Datlinger et al., Nat. Methods 14, 297–301 (2017).9. M. Gasperini et al., Cell 176, 377–390.e19 (2019).10. J. M. Replogle et al., bioRxiv 503367, (2018).11. L. A. Gilbert et al., Cell 159, 647–661 (2014).12. M. A. Horlbeck et al., Cell 174, 953–967.e22 (2018).13. W. Huang, B. T. Sherman, R. A. Lempicki, Nucleic Acids Res.
37, 1–13 (2009).
14. L. McInnes, J. Healy, J. Melville, UMAP: Uniform manifoldapproximation and projection for dimension reduction.arXiv:1802.03426v2 [stat.ML] (6 December 2018).
15. L. McInnes, J. Healy, S. Astels, J. Open Source Softw. 2, 205 (2017).16. R. Alitalo, Leuk. Res. 14, 501–514 (1990).17. S. H. Orkin, L. I. Zon, Cell 132, 631–644 (2008).18. K. Kowanetz et al., J. Biol. Chem. 279, 32786–32795 (2004).19. S. J. Winder, M. P. Walsh, Cell. Signal. 5, 677–686 (1993).20. GTEx Consortium; Laboratory, Data Analysis & Coordinating
Center (LDACC)—Analysis Working Group; Statistical Methodsgroups—Analysis Working Group; Enhancing GTEx (eGTEx)groups; NIH Common Fund; NIH/NCI; NIH/NHGRI; NIH/NIMH;NIH/NIDA; Biospecimen Collection Source Site—NDRI;Biospecimen Collection Source Site—RPCI; Biospecimen CoreResource—VARI; Brain Bank Repository—University of MiamiBrain Endowment Bank; Leidos Biomedical—ProjectManagement; ELSI Study; Genome Browser Data Integration& Visualization—EBI; Genome Browser Data Integration& Visualization—UCSC Genomics Institute, University ofCalifornia Santa Cruz; Lead analysts; Laboratory, DataAnalysis & Coordinating Center (LDACC); NIH programmanagement; Biospecimen collection; Pathology; eQTLmanuscript working group; A. Battle, C. D. Brown,B. E. Engelhardt, S. B. Montgomery, Nature 550, 204–213 (2017).
21. R. Kurita et al., PLOS ONE 8, e59890 (2013).22. S. van Wageningen et al., Cell 143, 991–1004 (2010).23. N. Van Driessche et al., Nat. Genet. 37, 471–477 (2005).24. M. Ohta et al., Nat. Commun. 5, 5267 (2014).25. Y. Cheng, M. T. Wong, L. van der Maaten, E. W. Newell,
J. Immunol. 196, 924–932 (2016).26. C. J. Caunt, S. M. Keyse, FEBS J. 280, 489–504 (2013).27. C. Trapnell et al., Nat. Biotechnol. 32, 381–386 (2014).28. B. Cleary, L. Cong, A. Cheung, E. S. Lander, A. Regev, Cell 171,
1424–1436.e18 (2017).29. N. S. Madhukar, O. Elemento, G. Pandey, Front. Bioeng.
Biotechnol. 3, 172 (2015).30. M. Žitnik, B. Zupan, J. Comput. Biol. 22, 595–608 (2015).31. M. Xu, R. Jin, Z.-H. Zhou in Advances in Neural Information
Processing Systems, M. I. Jordan, Y. LeCun, S. A. Solla, Eds.(MIT Press, 2013), pp. 2301–2309.
ACKNOWLEDGMENTS
We thank B. Adamson, O. Chen, M. Cho, J. Corn, C. Forester,J. Hussmann, S. Weissman, the Innovative Genomics Institute, and
members of the Weissman and Gilbert labs for discussion orreagents. Funding: This work was funded by grants from theNational Institutes of Health (P50 GM102706, U01 CA168370, U01CA217882, RM1 HG009490, and R01 DA036858 to J.S.W.; K99/R00 CA204602 and DP2 CA239597 to L.A.G.; and F32 GM116331to M.J.) and DARPA (HR0011-19-2-0007 to J.S.W. and L.A.G.).J.S.W. is a Howard Hughes Medical Institute Investigator. L.A.G. isthe Goldberg-Benioff Endowed Professor in Prostate CancerTranslational Biology. T.M.N. is a fellow of the Damon RunyonCancer Research Foundation DRG-2211-15. M.A.H. is a Byers FamilyDiscovery Fellow and is supported by the UCSF Medical ScientistTraining Program and the School of Medicine. Authorcontributions: T.M.N., M.A.H., L.A.G., and J.S.W. were responsiblefor the conception, design, and interpretation of the experimentsand wrote the manuscript. M.J. and J.M.R. provided criticalfeedback on the manuscript. L.A.G., M.J., A.X., and J.M.R.constructed GI vectors, cDNA vectors, and libraries. T.M.N., J.M.R.,A.X., A.Y.G., M.A.H., and L.A.G. conducted the GI mapping andPerturb-seq experiments. M.A.H. performed the GI map dataanalysis. T.M.N. performed the Perturb-seq analysis, GI predictionanalysis, and modeling of transcriptional genetic interactions. L.A.G.,T.M.N., A.Y.G., and A.X. designed and conducted validationexperiments. Competing interests: T.M.N., M.A.H., L.A.G., M.J.,and J.S.W. have filed patent applications related to CRISPRi/ascreening, Perturb-seq, and GI mapping. J.S.W. consults for andholds equity in KSQ Therapeutics, Maze Therapeutics, and TenayaTherapeutics. J.S.W. is a venture partner at 5AM Ventures. T.M.N.,M.J., J.M.R., and M.A.H. consult for Maze Therapeutics. Theremaining authors declare no competing interests. Data andmaterials availability: All data are available in the manuscript, thesupplementary materials, or GEO (GSE133344). Notebooksimplementing analyses are available at https://github.com/mhorlbeck and https://github.com/thomasmaxwellnorman.
SUPPLEMENTARY MATERIALS
science.sciencemag.org/content/365/6455/786/suppl/DC1Materials and MethodsFigs. S1 to S14Tables S1 to S9References (32–35)
29 March 2019; accepted 29 July 2019Published online 8 August 201910.1126/science.aax4438
Norman et al., Science 365, 786–793 (2019) 23 August 2019 8 of 8
RESEARCH | RESEARCH ARTICLEon M
arch 8, 2021
http://science.sciencemag.org/
Dow
nloaded from
Exploring genetic interaction manifolds constructed from rich single-cell phenotypes
S. WeissmanThomas M. Norman, Max A. Horlbeck, Joseph M. Replogle, Alex Y. Ge, Albert Xu, Marco Jost, Luke A. Gilbert and Jonathan
originally published online August 8, 2019DOI: 10.1126/science.aax4438 (6455), 786-793.365Science
, this issue p. 786Scienceof genes in pathways and systematic classifications of GIs.cell. Each perturbation and GI projects the cell state to a particular position on this manifold, enabling unbiased ordering perturbation screens. Diverse transcriptomic phenotypes construct a ''manifold'' representing all possible states of theapproach leverages high-dimensional single-cell RNA sequencing data gathered from CRISPR-mediated, pooled
developed a framework for mapping and understanding GIs. Thiset al.the mechanistic origin of interactions. Norman Mapping of genetic interactions (GIs) is usually based on cell fitness as the phenotypic readout, which obscures
Manifold destiny
ARTICLE TOOLS http://science.sciencemag.org/content/365/6455/786
MATERIALSSUPPLEMENTARY http://science.sciencemag.org/content/suppl/2019/08/07/science.aax4438.DC1
CONTENTRELATED http://stm.sciencemag.org/content/scitransmed/11/503/eaaw3768.full
REFERENCES
http://science.sciencemag.org/content/365/6455/786#BIBLThis article cites 32 articles, 3 of which you can access for free
PERMISSIONS http://www.sciencemag.org/help/reprints-and-permissions
Terms of ServiceUse of this article is subject to the
is a registered trademark of AAAS.ScienceScience, 1200 New York Avenue NW, Washington, DC 20005. The title (print ISSN 0036-8075; online ISSN 1095-9203) is published by the American Association for the Advancement ofScience
Science. No claim to original U.S. Government WorksCopyright © 2019 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of
on March 8, 2021
http://science.sciencem
ag.org/D
ownloaded from