9
GENOMICS Exploring genetic interaction manifolds constructed from rich single-cell phenotypes Thomas M. Norman 1,2,3 *, Max A. Horlbeck 1,2,3 *, Joseph M. Replogle 1,2,3 , Alex Y. Ge 4,5 , Albert Xu 1,2,3 , Marco Jost 1,2,3 , Luke A. Gilbert 4,5 , Jonathan S. Weissman 1,2,3 How cellular and organismal complexity emerges from combinatorial expression of genes is a central question in biology. High-content phenotyping approaches such as Perturb-seq (single-cell RNA-sequencing pooled CRISPR screens) present an opportunity for exploring such genetic interactions (GIs) at scale. Here, we present an analytical framework for interpreting high-dimensional landscapes of cell states (manifolds) constructed from transcriptional phenotypes.We applied this approach to Perturb-seq profiling of strong GIs mined from a growth-based, gain-of-function GI map. Exploration of this manifold enabled ordering of regulatory pathways, principled classification of GIs (e.g., identifying suppressors), and mechanistic elucidation of synergistic interactions, including an unexpected synergy between CBL and CNN1 driving erythroid differentiation. Finally, we applied recommender system machine learning to predict interactions, facilitating exploration of vastly larger GI manifolds. T he complexity of cell types in multicellular organisms is driven not by a large increase in gene number but instead by the com- binatorial expression of a surprisingly small number of components. Specific combina- tions of genes exhibit emergent properties when expressed together, enabling the generation of many diverse cell types and behaviors. Search- ing for such emergent properties is enabled by the quantitative study of genetic interactions (GIs), which compare the phenotypic consequences of perturbing a pair of genes alone or in combina- tion, typically by measuring growth [although other phenotypic readouts such as reporter gene expres- sion and transcriptional responses have been explored (1, 2)]. GIs can reveal synthetic lethal vulnerabilities in tumors, identify suppressors of inherited and acquired disorders, and guide the design of cocktails of genes to drive differentia- tion between cell types (3, 4). Pioneering efforts in yeast to construct systematic GI maps between all gene pairs have enabled systematic determi- nation of gene function, identification of protein complexes, and definition of gene-regulatory net- works in a principled and unbiased manner (1, 2). Recent studies have extended such approaches to mammalian and other metazoan systems, but these efforts face two major challenges: scale and information content (1, 2). For example, mapping pairwise interactions among the ~10,000 tran- scribed genes in a human cell would require measuring ~50 million double mutants. This scale has necessitated the use of highly parallelizable readouts of phenotype, such as growth rate, that obscure the mechanistic or molecular basis for any particular interaction. Put simply, there are many ways for cells to appear equally unfit: the reprogramming of a pluripotent cell to a termi- nally differentiated neuron may affect growth as much as induction of apoptosis or cell cycle arrest. Furthermore, many metazoan cell types are quiescent or postmitotic and as such are not amenable to growth-based screens. Finally, bulk measures of their properties may obscure impor- tant cell-to-cell variability. Emerging high-throughput approaches for monitoring rich phenotypes of individual cells (e.g., imaging or droplet single-cell transcriptom- ics) present a potential solution to these problems. For example, Perturb-seq pairs CRISPR-based screens with single-cell RNA sequencing (58). Each individual cell is in effect an independent experiment connecting a genetic perturbation to its transcriptional consequences, allowing hundreds of thousands of parallel measurements (9, 10). It has been suggested that the rich phenotypes enabled by Perturb-seq can be used to better in- terpret the impact of genetic interactions (7). In this study, we exploit the scalability and rich transcriptome readout of Perturb-seq to imple- ment a principled approach for systematically studying mammalian GIs. Each transcriptional measurement, and effectively each genetic per- turbation (single or combinatorial), can be viewed as defining a point in high-dimensional space. By measuring many GIs, we therefore effectively map out a surfacea manifold in mathematical terminologythat describes the transcriptional states that a cell can occupy upon perturbation (Fig. 1A). We argue that this GI manifold carries much more information about interactions than a traditional GI map and is intrinsically more interpretable in several ways, including the abil- ity to resolve the distinct outcomes underlying GIs and to model the different ways genetic per- turbations combine to yield new phenotypes. An overexpression strategy identifies strong genetic interactions Most previous studies of GIs have focused on loss-of-function perturbations, but many impor- tant cellular processes such as differentiation are also associated with activation of gene expres- sion (e.g., MYOD1 in muscle cells). Genes that exhibit phenotypes when expressed alone have a higher rate of genetic interactions with other genes (1, 2). To identify mechanistically diverse GIs arising from gene activation, we therefore selected 112 hitgenes whose activation en- hances or retards growth of K562 cells (Fig. 1A and table S1) (11), including cell cycle regulators, transcription factors, kinases, phosphatases, and genes of unknown function. To systematically measure gain-of-function (overexpression) GIs, we adapted a technology developed for constructing fitness GI maps in hu- man cells using CRISPR interference (CRISPRi) (12) (Fig. 1, A and B, and fig. S1). Each candidate interaction was probed by constructing a library of vectors containing pairs of single-guide RNAs (sgRNAs) (table S2). As we included two distinct sgRNAs targeting each gene, a total of 28,680 unique sgRNA pairs were tested. K562 cells stably expressing the SunTag CRISPR activation (CRISPRa) system ( 11 ) were transduced with the CRISPRa GI library, and sgRNA pair abundance was compared at the start of the screen and after 10 days of growth to measure fitness phenotypes. GI scores were assigned by measuring deviation between the observed fitness of overexpressing both genes from the expected fitness based on the average impact of each single gene [(12); see materials and methods, fig. S2, and table S3]. Independent replicate experiments showed high levels of con- cordance for sgRNA-level GI scores and GI profile correlations, and independent sgRNAs targeting the same gene were much more similar than the background of all sgRNA GI correlations (me- dian R = 0.50 compared with 0.04; Fig. 1, C and D; fig. S3A; and table S4). The gene-level GI scores obtained by averaging sgRNAs targeting the same gene were also well correlated between replicates (gene-level GI R = 0.80, p < 10 300 ; Fig. 1E, fig. S3A, and table S5) and followed a bell-shaped distribution but with GIs ranging well beyond the expectation from negative control sgRNAs (Fig. 1F and fig. S3B). We then clustered genes according to the sim- ilarity of their GI profiles to produce a GI map (Fig. 1B; a larger version with gene labels is provided in fig. S4). Highly correlated genes were enriched for genes with the same DAVID term annotations (13), allowing for unbiased annota- tion of clusters (Fig. 1B; fig. S5, A and B; and table S6). In addition, the map contained fewer clusters than would be expected in a random map (fig. S5, C and D), consistent with the typical low-rank structure of GI maps (i.e., groups of genes interact RESEARCH Norman et al., Science 365, 786793 (2019) 23 August 2019 1 of 8 1 Department of Cellular and Molecular Pharmacology, University of California, San Francisco, CA 94158, USA. 2 Howard Hughes Medical Institute, University of California, San Francisco, CA 94158, USA. 3 California Institute for Quantitative Biomedical Research, University of California, San Francisco, CA 94158, USA. 4 Department of Urology, University of California, San Francisco, CA 94158, USA. 5 Helen Diller Family Comprehensive Cancer Center, San Francisco, CA 94158, USA. *These authors contributed equally to this work. Corresponding author. Email: [email protected] (T.M.N.); [email protected] (L.A.G.); [email protected] (J.S.W.) on March 8, 2021 http://science.sciencemag.org/ Downloaded from

GENOMICS Exploring genetic interaction manifolds ... · GENOMICS Exploring genetic interaction manifolds constructed from rich single-cell phenotypes Thomas M. Norman 1,2,3*†, Max

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: GENOMICS Exploring genetic interaction manifolds ... · GENOMICS Exploring genetic interaction manifolds constructed from rich single-cell phenotypes Thomas M. Norman 1,2,3*†, Max

GENOMICS

Exploring genetic interactionmanifolds constructed from richsingle-cell phenotypesThomas M. Norman1,2,3*†, Max A. Horlbeck1,2,3*, Joseph M. Replogle1,2,3, Alex Y. Ge4,5,Albert Xu1,2,3, Marco Jost1,2,3, Luke A. Gilbert4,5†, Jonathan S. Weissman1,2,3†

How cellular and organismal complexity emerges from combinatorial expression of genes isa central question in biology. High-content phenotyping approaches such as Perturb-seq(single-cell RNA-sequencing pooled CRISPR screens) present an opportunity for exploring suchgenetic interactions (GIs) at scale. Here, we present an analytical framework for interpretinghigh-dimensional landscapes of cell states (manifolds) constructed from transcriptionalphenotypes.We applied this approach to Perturb-seq profiling of strong GIs mined from agrowth-based, gain-of-function GI map. Exploration of this manifold enabled ordering ofregulatory pathways, principled classification of GIs (e.g., identifying suppressors), andmechanistic elucidation of synergistic interactions, including an unexpected synergy betweenCBL and CNN1 driving erythroid differentiation. Finally, we applied recommender systemmachine learning to predict interactions, facilitating exploration of vastly larger GI manifolds.

The complexity of cell types in multicellularorganisms is driven not by a large increasein gene number but instead by the com-binatorial expression of a surprisingly smallnumber of components. Specific combina-

tions of genes exhibit emergent properties whenexpressed together, enabling the generation ofmany diverse cell types and behaviors. Search-ing for such emergent properties is enabled bythe quantitative study of genetic interactions(GIs), which compare the phenotypic consequencesof perturbing a pair of genes alone or in combina-tion, typically bymeasuring growth [althoughotherphenotypic readouts such as reporter gene expres-sion and transcriptional responses have beenexplored (1, 2)]. GIs can reveal synthetic lethalvulnerabilities in tumors, identify suppressors ofinherited and acquired disorders, and guide thedesign of cocktails of genes to drive differentia-tion between cell types (3, 4). Pioneering effortsin yeast to construct systematic GImaps betweenall gene pairs have enabled systematic determi-nation of gene function, identification of proteincomplexes, and definition of gene-regulatory net-works in a principled and unbiasedmanner (1, 2).Recent studies have extended such approaches

to mammalian and other metazoan systems, butthese efforts face twomajor challenges: scale andinformation content (1, 2). For example,mappingpairwise interactions among the ~10,000 tran-

scribed genes in a human cell would requiremeasuring ~50million doublemutants. This scalehas necessitated the use of highly parallelizablereadouts of phenotype, such as growth rate, thatobscure the mechanistic or molecular basis forany particular interaction. Put simply, there aremany ways for cells to appear equally “unfit”: thereprogramming of a pluripotent cell to a termi-nally differentiated neuron may affect growthas much as induction of apoptosis or cell cyclearrest. Furthermore, many metazoan cell typesare quiescent or postmitotic and as such are notamenable to growth-based screens. Finally, bulkmeasures of their properties may obscure impor-tant cell-to-cell variability.Emerging high-throughput approaches for

monitoring rich phenotypes of individual cells(e.g., imaging or droplet single-cell transcriptom-ics) present a potential solution to these problems.For example, Perturb-seq pairs CRISPR-basedscreens with single-cell RNA sequencing (5–8).Each individual cell is in effect an independentexperiment connecting a genetic perturbation toits transcriptional consequences, allowing hundredsof thousands of parallel measurements (9, 10).It has been suggested that the rich phenotypesenabled by Perturb-seq can be used to better in-terpret the impact of genetic interactions (7).In this study, we exploit the scalability and rich

transcriptome readout of Perturb-seq to imple-ment a principled approach for systematicallystudying mammalian GIs. Each transcriptionalmeasurement, and effectively each genetic per-turbation (single or combinatorial), can be viewedas defining a point in high-dimensional space. Bymeasuring many GIs, we therefore effectivelymap out a surface—a manifold in mathematicalterminology—that describes the transcriptionalstates that a cell can occupy upon perturbation(Fig. 1A). We argue that this GI manifold carriesmuch more information about interactions than

a traditional GI map and is intrinsically moreinterpretable in several ways, including the abil-ity to resolve the distinct outcomes underlyingGIs and to model the different ways genetic per-turbations combine to yield new phenotypes.

An overexpression strategy identifiesstrong genetic interactions

Most previous studies of GIs have focused onloss-of-function perturbations, but many impor-tant cellular processes such as differentiation arealso associated with activation of gene expres-sion (e.g., MYOD1 in muscle cells). Genes thatexhibit phenotypes when expressed alone havea higher rate of genetic interactions with othergenes (1, 2). To identify mechanistically diverseGIs arising from gene activation, we thereforeselected 112 “hit” genes whose activation en-hances or retards growth of K562 cells (Fig. 1Aand table S1) (11), including cell cycle regulators,transcription factors, kinases, phosphatases, andgenes of unknown function.To systematically measure gain-of-function

(overexpression) GIs, we adapted a technologydeveloped for constructing fitness GI maps in hu-man cells using CRISPR interference (CRISPRi)(12) (Fig. 1, A and B, and fig. S1). Each candidateinteraction was probed by constructing a libraryof vectors containing pairs of single-guide RNAs(sgRNAs) (table S2). As we included two distinctsgRNAs targeting each gene, a total of 28,680unique sgRNApairs were tested. K562 cells stablyexpressing the SunTagCRISPRactivation (CRISPRa)system (11) were transduced with the CRISPRa GIlibrary, and sgRNApair abundancewas comparedat the start of the screen and after 10 days ofgrowth to measure fitness phenotypes. GI scoreswere assigned by measuring deviation betweenthe observed fitness of overexpressing both genesfrom the expected fitness based on the averageimpact of each single gene [(12); see materialsand methods, fig. S2, and table S3]. Independentreplicate experiments showed high levels of con-cordance for sgRNA-level GI scores and GI profilecorrelations, and independent sgRNAs targetingthe same gene were much more similar than thebackground of all sgRNA GI correlations (me-dianR = 0.50 comparedwith 0.04; Fig. 1, C andD;fig. S3A; and table S4). The gene-level GI scoresobtained by averaging sgRNAs targeting the samegenewere alsowell correlated between replicates(gene-level GI R = 0.80, p < 10−300; Fig. 1E, fig.S3A, and table S5) and followed a bell-shapeddistribution but with GIs ranging well beyondthe expectation from negative control sgRNAs(Fig. 1F and fig. S3B).We then clustered genes according to the sim-

ilarity of their GI profiles to produce a GI map(Fig. 1B; a larger version with gene labels isprovided in fig. S4). Highly correlated genes wereenriched for genes with the same DAVID termannotations (13), allowing for unbiased annota-tion of clusters (Fig. 1B; fig. S5, A and B; and tableS6). In addition, the map contained fewer clustersthan would be expected in a random map (fig.S5, C andD), consistent with the typical low-rankstructure of GI maps (i.e., groups of genes interact

RESEARCH

Norman et al., Science 365, 786–793 (2019) 23 August 2019 1 of 8

1Department of Cellular and Molecular Pharmacology, Universityof California, San Francisco, CA 94158, USA. 2Howard HughesMedical Institute, University of California, San Francisco, CA94158, USA. 3California Institute for Quantitative BiomedicalResearch, University of California, San Francisco, CA 94158,USA. 4Department of Urology, University of California,San Francisco, CA 94158, USA. 5Helen Diller FamilyComprehensive Cancer Center, San Francisco, CA 94158, USA.*These authors contributed equally to this work.†Corresponding author. Email: [email protected] (T.M.N.);[email protected] (L.A.G.); [email protected] (J.S.W.)

on March 8, 2021

http://science.sciencem

ag.org/D

ownloaded from

Page 2: GENOMICS Exploring genetic interaction manifolds ... · GENOMICS Exploring genetic interaction manifolds constructed from rich single-cell phenotypes Thomas M. Norman 1,2,3*†, Max

similarly so that there are fewer overall degreesof freedom than total genes). Thus, the structureof the CRISPRa GI map, like past efforts basedon loss-of-function alleles, can assign function toindividual genes by the similarity of GI profiles(1, 2). However, although the GI map robustlyidentifiedmany strongGIs, the origins of specificinteractions were difficult to deduce as each GIwas characterized only by a single scalar value(i.e., deviation from expected growth rate).

A Perturb-seq platform formeasuring GIs

We reasoned that Perturb-seq, which allowssingle-cell, pooled transcriptional profiling ofCRISPR-mediated perturbations (5–8), wouldenable us to better resolve the mechanisms un-derlying GIs. We picked 132 gene pairs from theGI map, chosen both within and between blocksof genes with similar interaction profiles, andtargeted each with CRISPRa sgRNA pairs (Fig.1A, fig. S6A, table S2, andmaterials andmethods).Given the low-rank structure of the fitness GImap, we reasoned that we could broadly samplethe biological processes representedwithoutmea-suring all gene pairs, asmanyGIs that fell into the“blocks” in the GI map were likely explained bysimilar mechanisms (materials and methods).We also profiled all single-gene perturbations toenable direct comparison of individual and com-bined perturbations (i.e., single geneA, single geneB, and pair AB). In total, we obtained transcrip-tional readouts for 287 perturbations measuredacross ~110,000 single cells (median 273 cells percondition; materials and methods, fig S1, andtable S7) in one pooled experiment.The Perturb-seq profiles also allowed us to

directly assess the performance of our CRISPRareagents (table S7). Levels of target gene activa-tion spanned a broad range (Fig. 1G and fig. S6, Band C), with a general trend that poorly expressedtargets were more highly induced. The A and Bpositions of the sgRNA cassette performed sim-ilarly (Fig. 1G and fig. S6, D and E), and expressionof genes neighboring the target was generallyunperturbed exceptwhen transcripts shared pro-moter regions (materials and methods; fig. S7, Aand B; and table S8). Finally, there was minimalcorrelation between fold activation and the num-ber of differentially expressed genes, implying thateven small increases in themRNA abundance ofsome genes can strongly alter a cell’s state (R =0.07; Fig. 1H). The degree of fitness defect wasrelated to the number of differentially expressedgenes (fig. S7C).

Constructing a GI manifold revealsbiological processes driving GIs

Whereas GI maps assign a scalar score to eachGI, our Perturb-seq approach instead associatesa transcriptional phenotype. We viewed thisensemble of measurements as defining a high-dimensional analog of a GI map, here termed aGI manifold. In our manifold analogy, each pos-sible cellular transcriptional state defines a pointon a high-dimensional surface. By applying a di-verse set of perturbations and measuring the

Norman et al., Science 365, 786–793 (2019) 23 August 2019 2 of 8

MAP2K

ETS2/MAPK1

Ig d

omai

n co

ntai

ning

Histidi

ne p

hosp

hata

se

Tyro

sine

phos

phat

ase

Centriole

DNA binding

MAP4K

LIM-type zinc finger TF

Brachyury TF

Kinesin

BAK/BIM

bLZ TF

Blood coagulation

Forkhead TF

CDK inhibitor

N-terminal Forkhead TF

Response to peptide hormone

A

B

D

E F G H

Combinatorial overexpression library

Fitness-level geneticinteraction (GI) map

Perturb-seq profilingof GIs

Genetic interactionmanifold

Position B

Position ADoubles

C

−0.5 0.0 0.5 1.0

replicate 1

sgRNA-level GIprofile correlations

−0.5

0.0

0.5

1.0

repl

icat

e 2

All sgRNA pairsSame gene

R=0.88, p<10-300

≤-4.0

0.0

≥4.0

GI score

-0.4

0.0

0.4

Gene fitnessphenotype (γ)

100

10-3.3

10-6.5

Term enrichmentp-value

Fold activation of target

100

100 101 102 103

101

102

103N

umbe

r of

diff

eren

tially

exp

ress

ed g

enes

Fold activation in A position

Fol

d ac

tivat

ion

in B

pos

ition

100 101 102 103

100

101

102

103

Mean expression profilesabab

fitness more fitless fit

bbbab

cd

cd

ab

aaa

ccd

Gene a

Double ab

Gene b

replicate 1

−10

0

10

repl

icat

e 2

−10 0 10

R=0.80, p<10-300

>6 SD R=0.94, p<10-133

gene-level GI scores

−0.5 0.0 0.5 1.0

sgRNA-level GI profile correlation

0

50

100

% m

axim

um

same gene, median=0.50all genes, median=0.04

−10 0 10

GI score

num

ber

of g

ene

pairs

Estimated 5% FDR

10¹

10³

0

-2.77 2.83

• activate all pairs among 112 target genes • 2 sgRNAs per target gene• 28,680 total sgRNA pairs

• fitness GI scores for all sgRNA pairs and for all gene pairs (5%-95% range: -3.5 to 3.3)

• profile 287 gene pair and single gene perturbations• ~110,000 single-cell transcriptomes

Fig. 1. CRISPRa fitness-level GI map. (A) Experimental strategy. Pairs of genes were systemati-cally coactivated using dual sgRNA CRISPRa libraries and a GI map was generated from the fitnessmeasurements. A subset of GIs was then profiled transcriptionally using Perturb-seq. Thesehigh-dimensional measurements define a surface called a GI manifold. Distinct GIs that lie inmarkedly different parts of the GI manifold may result in similar outcomes when viewed only atthe level of fitness. (B) CRISPRa fitness-level GI map. Gene-level GI profiles were clustered byaverage linkage hierarchical clustering based on Pearson correlation. Clusters were annotatedby assigning DAVID annotations if a DAVID term was significantly enriched in that cluster[hypergeometric ln(p) ≤ –7.5; see materials and methods]. (C and D) GI profile correlation betweenpairs of sgRNAs targeting any genes (black) or the same gene (green). Data are displayed as scatterplot of replicates (C) and histogram of replicate-averaged GIs (D). (E and F) Gene-level GI scoresgenerated by averaging all sgRNA-level GIs for each gene pair. (E) Scatter plot of replicates.Red points indicate nontargeting control sgRNA pairs, and dashed line indicates a radius of 6 SDsfrom nontargeting controls. (F) Histogram of gene-level GI scores with estimated empirical 5% falsediscovery rate threshold. (G) Comparison of fold activation of target gene measured by Perturb-seq whenthe targeting sgRNA is in the A or B position in the dual sgRNA expression cassette. (H) Fold activation ofthe target gene compared with the total number of differentially expressed genes.

RESEARCH | RESEARCH ARTICLEon M

arch 8, 2021

http://science.sciencemag.org/

Dow

nloaded from

Page 3: GENOMICS Exploring genetic interaction manifolds ... · GENOMICS Exploring genetic interaction manifolds constructed from rich single-cell phenotypes Thomas M. Norman 1,2,3*†, Max

resulting states (as we have), it becomes possibleto infer the shape of this surface (Fig. 1A). More-over, as each GI is characterized by a rich pheno-type, we envisioned that this perspective wouldallow us to organize GIs by common features andglobally examine theirmechanistic underpinnings.

To visualize this GI manifold, we used UMAP[uniformmanifold approximation and projection(14)] to project the mean expression profiles forour 287 perturbations into a two-dimensionalplane (Fig. 2, A and B; a single-cell version isshown in fig. S8). This algorithm approximates

the shape of a high-dimensional surface in twodimensions by trying to conserve nearest neigh-bor relationships. Perturbations that inducesimilar transcriptional changes then naturallycluster close to each other; we defined stableclusters using the HDBSCAN algorithm [(15);

Norman et al., Science 365, 786–793 (2019) 23 August 2019 3 of 8

AHR / FEV

AHR / KLF1

AHR / ctrl

ARID1A / ctrl

ARRDC3 / ctrl

ATL1 / ctrl

BAK1 / ctrl

BCL2L11 / BAK1

BCL2L11 / ctrl

BCL2L11 / TGFBR2

BCORL1 / ctrl

BPGM / ctrlBPGM / SAMD1BPGM / ZBTB1

CBARP / ctrl

CBFA2T3 / ctrl

CBL / CNN1

CBL / ctrl

CBL / PTPN12

CBL / PTPN9CBL / TGFBR2

CBL / UBASH3A

CBL / UBASH3B

CDKN1A / ctrlCDKN1B / CDKN1A

CDKN1B / ctrlCDKN1C / CDKN1A

CDKN1C / CDKN1B

CDKN1C / ctrl

CEBPA / ctrlCEBPB / CEBPA

CEBPB / MAPK1 CEBPB / ctrl

CEBPB / OSR2

CEBPB / PTPN12

CEBPE / CEBPA

CEBPE / CEBPB

CEBPE / CNN1

CEBPE / KLF1

CEBPE / ctrl

CEBPE / PTPN12

CEBPE / RUNX1T1CEBPE / SPI1

CELF2 / ctrl

CITED1 / ctrl

CKS1B / ctrl

CLDN6 / ctrl

CNN1 / MAPK1

CNN1 / ctrl

CNN1 / UBASH3A

CNNM4 / ctrl

COL1A1 / ctrlCOL2A1 / ctrl

CSRNP1 / ctrl

DLX2 / ctrl

DUSP9 / ETS2

DUSP9 / IGDCC3

DUSP9 / KLF1DUSP9 / MAPK1

DUSP9 / ctrl

DUSP9 / PRTGDUSP9 / SNAI1

EGR1 / ctrl

ELMSAN1 / ctrl

ETS2 / CEBPE

ETS2 / CNN1

ETS2 / IGDCC3

ETS2 / IKZF3

ETS2 / MAP7D1

ETS2 / MAPK1

ETS2 / ctrl

ETS2 / PRTG

FEV / CBFA2T3

FEV / ISL2

FEV / MAP7D1

FEV / ctrl

FOSB / CEBPB

FOSB / CEBPE

FOSB / IKZF3

FOSB / ctrl

FOSB / OSR2

FOSB / PTPN12

FOSB / UBASH3B

FOXA1 / FOXF1

FOXA1 / FOXL2

FOXA1 / HOXB9

FOXA1 / ctrlFOXA3 / FOXA1

FOXA3 / FOXF1FOXA3 / FOXL2

FOXA3 / HOXB9FOXA3 / ctrl

FOXF1 / FOXL2FOXF1 / HOXB9

FOXF1 / ctrl

FOXL2NB / FOXL2

FOXL2NB / ctrl

FOXL2 / HOXB9

FOXL2 / MEIS1

FOXL2 / ctrl

FOXO4 / ctrl

GLB1L2 / ctrl

HES7 / ctrl

HK2 / ctrl

HNF4A / ctrl

HOXA13 / ctrl

HOXB9 / ctrl

HOXC13 / ctrl

IER5L / ctrl

IGDCC3 / MAPK1

IGDCC3 / ctrl

IGDCC3 / PRTG

IGDCC3 / ZBTB25

IKZF3 / ctrl

IRF1 / ctrl

IRF1 / SET

ISL2 / ctrl

JUN / CEBPA

JUN / CEBPB

JUN / ctrl

KIF18B / KIF2C

KIF18B / ctrl

KIF2C / ctrl

KLF1 / BAK1

KLF1 / CEBPA

KLF1 / CLDN6

KLF1 / COL2A1

KLF1 / FOXA1

KLF1 / MAP2K6KLF1 / ctrlKLF1 / TGFBR2

KMT2A / ctrl

LHX1 / ELMSAN1

LHX1 / ctrl

LYL1 / CEBPB

LYL1 / IER5L

LYL1 / ctrl

MAML2 / ctrl

MAP2K3 / ELMSAN1

MAP2K3 / IKZF3

MAP2K3 / MAP2K6

MAP2K3 / ctrl

MAP2K3 / SLC38A2

MAP2K6 / ELMSAN1

MAP2K6 / IKZF3

MAP2K6 / ctrl

MAP2K6 / SPI1

MAP4K3 / ctrl

MAP4K5 / ctrl

MAP7D1 / ctrl

MAPK1 / IKZF3

MAPK1 / ctrl

MAPK1 / PRTG

MAPK1 / TGFBR2

MEIS1 / ctrl

MIDN / ctrl

NCL / ctrl

NIT1 / ctrl

ctrl / BAK1

ctrl / CBARP

ctrl / CBFA2T3

ctrl / CDKN1A

ctrl / CDKN1B

ctrl / CEBPA

ctrl / CEBPB

ctrl / CEBPE

ctrl / CLDN6

ctrl / CNN1

ctrl / COL2A1

ctrl / DLX2

ctrl / ELMSAN1

ctrl / ETS2

ctrl / FEV

ctrl / FOXA1ctrl / FOXF1

ctrl / FOXL2

ctrl / HOXB9

ctrl / HOXC13

ctrl / IER5L

ctrl / IGDCC3

ctrl / IKZF3ctrl / ISL2

ctrl / KIF2C

ctrl / KLF1 ctrl / MAP2K6ctrl / MAP7D1

ctrl / MAPK1

ctrl / MEIS1

ctrl / ctrl

ctrl / OSR2

ctrl / PRTG

ctrl / PTPN12

ctrl / PTPN9

ctrl / RUNX1T1

ctrl / SAMD1

ctrl / SET

ctrl / SLC38A2

ctrl / SNAI1

ctrl / SPI1

ctrl / STIL

ctrl / TBX2ctrl / TBX3

ctrl / TGFBR2

ctrl / UBASH3A

ctrl / UBASH3B

ctrl / ZBTB1

ctrl / ZBTB25

ctrl10 / ctrl

ctrl11 / ctrl

ctrl1 / ctrl

OSR2 / ctrl

PLK4 / ctrl

PLK4 / STILPOU3F2 / CBFA2T3

POU3F2 / FOXL2

POU3F2 / ctrl PRDM1 / CBFA2T3

PRDM1 / ctrl

PRTG / ctrl

PTPN12 / ctrl

PTPN12 / OSR2

PTPN12 / PTPN9

PTPN12 / SNAI1

PTPN12 / UBASH3A

PTPN12 / ZBTB25

PTPN13 / ctrl

PTPN1 / ctrl

PTPN9 / ctrl

RHOXF2B / ctrl

RHOXF2B / SET

RHOXF2B / ZBTB25

RP5-862P8.2 / ctrl

RREB1 / ctrl

RUNX1T1 / ctrl

S1PR2 / ctrl

SAMD1 / ctrl

SAMD1 / PTPN12

SAMD1 / TGFBR2SAMD1 / UBASH3B

SAMD1 / ZBTB1

SET / CEBPE

SET / KLF1SET / ctrl SGK1 / ctrl

SGK1 / S1PR2

SGK1 / TBX2SGK1 / TBX3

SLC4A1 / ctrl

SLC6A9 / ctrl

SNAI1 / DLX2

SNAI1 / ctrl

SNAI1 / UBASH3B

SPI1 / ctrl

STIL / ctrl

TBX2 / ctrl

TBX3 / ctrlTBX3 / TBX2

TGFBR2 / CBARP

TGFBR2 / ETS2

TGFBR2 / IGDCC3

TGFBR2 / ctrl

TGFBR2 / PRTG

TMSB4X / BAK1

TMSB4X / ctrl

TP73 / ctrl

TSC22D1 / ctrl

UBASH3A / ctrl

UBASH3B / CNN1

UBASH3B / ctrl

UBASH3B / OSR2

UBASH3B / PTPN12

UBASH3B / PTPN9

UBASH3B / UBASH3A

UBASH3B / ZBTB25

ZBTB10 / DLX2

ZBTB10 / ELMSAN1

ZBTB10 / ctrlZBTB10 / PTPN12

ZBTB10 / SNAI1

ZBTB1 / ctrl

ZBTB25 / ctrl

ZC3HAV1 / CEBPA

ZC3HAV1 / CEBPE

ZC3HAV1 / HOXC13

ZC3HAV1 / ctrl

ZNF318 / FOXL2

ZNF318 / ctrl

BA

a ab

b

Reduce dimensions to visualize manifold and

cluster related phenotypes

G1 CELL CYCLEG1 CELL CYCLEARRESTARREST

GRANULOCYTEGRANULOCYTE/ APOPTOSIS/ APOPTOSIS

PIONEERPIONEERFACTORSFACTORS

ERYTHROIDERYTHROID

PRO-GROWTHPRO-GROWTH

MEGAKARYOCYTEMEGAKARYOCYTEMARKERMARKER

D Gene-level GI score

−4−2024

C Fitness phenotype (γ)

−0.4−0.20.00.20.4

E Deviation of cell cycledistribution from control cells

0.00.10.20.30.40.50.60.7

F Cell cycle position

M

M-G

1

G1-

S S

G2-

M

ctrl / CDKN1Actrl / CDKN1BCDKN1C / ctrlCDKN1C / CDKN1ACDKN1C / CDKN1BPLK4 / ctrlctrl / STILPLK4 / STILCKS1B / ctrlKIF18B / ctrlctrl / KIF2CKIF18B / KIF2C

−1.00.01.0

Fig. 2. Visualization of the GI manifold. (A) Using diverse geneticperturbations, the structure of the GI manifold can be inferred and thenvisualized by dimensionality reduction to a plane. (B) UMAP projection ofall single-gene and gene-pair Perturb-seq profiles. Each dot represents agenetic perturbation characterized by its mean expression profile. Clustersof transcriptionally similar perturbations are colored identically, whereasgray dots are perturbations that do not fall within stable clusters.

(C) Fitness measurements from the GI map expressed as gene-pair growthphenotypes (g). (D) GI scores from the fitness-level GI map. Single-geneperturbations are not included. (E) Cell cycle deviation scores. Strongerscores indicate alteration from the distribution of cell cycle positionsobserved in unperturbed cells. (F) Relative enrichment or depletion of cellcycle phases relative to unperturbed cells induced by selected geneticperturbations.

RESEARCH | RESEARCH ARTICLEon M

arch 8, 2021

http://science.sciencemag.org/

Dow

nloaded from

Page 4: GENOMICS Exploring genetic interaction manifolds ... · GENOMICS Exploring genetic interaction manifolds constructed from rich single-cell phenotypes Thomas M. Norman 1,2,3*†, Max

materials andmethods]. Both fitness andGI scoreswere distributed throughout the GI manifold (Fig.2, C andD), in accordancewith the idea that scalarfitness measurements collapse the much largerlandscape of transcriptional states (Fig. 1A). Bycontrast, overlaying markers derived from theunderlying transcriptional data allowed us to gaininsight into the mechanisms of individual GIsby looking at common features of perturbationswithin each cluster. For example, one cluster ofmean expression profiles (“G1 cell cycle arrest”)contained canonical regulators of the cell cycle.In this case, the underlying single-cell data con-firmed that these perturbations induced cell cyclearrest in the expected cell cycle stages (Fig. 2, Eand F), explaining the growth defect.We observed clusters of perturbations that

caused cells to induce erythroid, granulocyte,or megakaryocyte markers, which is consistentwith the known multilineage potential of theK562 model (Figs. 2B and 3, A and B, and fig. S9)(16, 17). These results suggested that cell differ-entiation or priming and a concomitant decreasein proliferation explained some of the structureof theGImanifold. For example,many interactionssurrounding CBL, its regulatorsUBASH3A/B, andseveralmultisubstrate tyrosine phosphatases (e.g.,PTPN9/12) induced erythroid markers, suggest-ing a common mechanism in the regulation ofreceptor tyrosine kinase signaling (18). By con-trast, the granulocyte cluster mostly containedperturbations of canonical regulators such asC/EBP-a, -b, and -e (CEBPA/B/E) and PU.1 (SPI1).Finally, a cluster of perturbations induced ex-pression of the canonical megakaryocyte markerCD41, but these cells did not adopt the charac-

teristicmorphological features ofmegakaryocytes,as seen by microscopy (fig. S9G), suggesting thatthey are at best primed toward megakaryocyticdifferentiation (16, 17).To test our ability to better resolve specific

interactions using Perturb-seq, we examined astrong synergistic interaction identified by ourfitness GI map between CBL and CNN1 (calpo-nin) that belonged to the erythroid cluster. CNN1is a poorly characterized gene that is annotatedas a smoothmuscle–specific protein, although itis expressed in many cell types (19, 20). Over-expressing either gene induced similar transcrip-tional changes, and single-cell analysis revealedan apparent progression of phenotypes from un-perturbed through singly perturbed to doublyperturbed CBL/CNN1 cells (Fig. 3, C and D). Con-sistent with an erythroid transcriptional program,overexpression of CBL and CNN1 caused stronginduction of canonicalmarkers: hemoglobin genes(sixfold to 39-fold), an iron importer involved inheme biosynthesis (SLC25A37, 13-fold), and theblood group antigen CD235a (GYPA, twofold)(Fig. 3, C and E, and fig. S10, A and B) (17).Furthermore, overexpression of CBL and CNN1transgenes in a human erythroid progenitormodel(HUDEP2 cells) individually and in combinationalso inducedmarkers of erythroid differentiation(Fig. 3, F and G, and fig. S10, C and D) (21).This example highlighted how Perturb-seq

analysis can directly lead to a hypothesis aboutthe biology underlying a GI evenwhen one of thecomponents is poorly understood. More gen-erally, because our approach was sensitive tosingle-cell phenotypes and incomplete differen-tiation and could simultaneously detect signa-

tures of multiple differentiation states, it couldfacilitate higher-order combinatorial perturba-tion screens aimed at improving protocols fordriving cells into distinct differentiation states.

Quantitative modeling of GIs definesmechanisms of interaction

Our large collection of matched single and doubleoverexpression transcriptional phenotypes pro-vides us with the opportunity to quantitativelymodel GIs directly from transcription profileswithout using the fitness GImap.We devised anapproach based on fitting a regression model,dab = c1da + c2db + e, which decomposes thetranscriptional changes observed in doubly per-turbed cells (dab) as a linear combination of thetranscriptional changes induced by the two singleperturbations of genes a and b (c1da + c2db) andan error term (e) that contains unmodeled ornonlinear effects. The coefficients c1 and c2 theneffectively measure how much of the phenotypeis accounted for by each single perturbation(Fig. 4A). This linear model of transcriptional GIsexplained, on average, more than 70% of the var-iance in gene expression (Fig. 4B; mean R2 = 0.71).GI maps traditionally classify interactions as

either buffering (indicating antagonism, GI scorepositive) or synthetic sick/lethal (SSL, indicat-ing synergy, GI score negative). We observed arobust anticorrelation (R = –0.72) between themagnitude of the coefficients c1 and c2 and thefitness-based GI score (Fig. 4C). An intuitive ex-planation was that buffering interactions travelless “far” (smaller coefficients) along the GI mani-fold, whereas SSL interactions travel farther (big-ger coefficients) (Fig. 4A).

Norman et al., Science 365, 786–793 (2019) 23 August 2019 4 of 8

Differentiation hierarchy

CMP

MEP

ErythrocytesPlatelets

Granulocytes

Macrophages

GMPK562s

Developmental markers

erythroid markers

granulocyte markers

CD41

A B

G

Contro

lCBL

CNN1

CBL/CNN1

CBL / CNN1 (GI score: -11.9)

differentially expressed genes

C

SLC

25A

37H

BZ

PR

ELI

D1

MIF

TM

SB

4XP

RS

S57

GM

FG

TE

X30

PK

MC

D99 XG

ZF

P36

L1R

EE

P6

EE

F1A

1G

ST

P1

GY

PA

NA

RF

HB

A2

ALA

S2

HB

A1

SM

IM1

GY

PB

TE

SC

CO

L18A

1C

HI3

L2A

PO

EA

PO

C1

HB

G2

HB

G1

BLV

RB

CBL / ctrl

ctrl / CNN1

CBL / CNN1−3.0−1.50.01.53.0

CBL / ctrlctrl / ctrl

ctrl / CNN1CBL / CNN1

ED

F

0 20 40Combined score

ERYTHROBLAST

CORD BLOOD

CD34+ CELL

PERIPHERAL BLOOD

ADRENAL GLAND

Contro

lCBL

CNN1

CBL/CNN1

0

1000

2000

3000

Hem

oglo

bin

(med

ian

fluor

esce

nce

inte

nsity

)

Fig. 3. Dissecting a genetic interaction using Perturb-seq. (A) Expres-sion of marker genes for different hematopoietic cell types in the GImanifold UMAP projection. Color is scaled by mean expression Z-score of amarker gene panel. (B) Hematopoietic differentiation hierarchy. K562 cellsare a poorly differentiated, erythroid-like cancer cell line. (C) Perturb-seqprofiling of the CBL/CNN1 GI. Average transcriptional profiles for the twoconstituent single perturbations are compared with the double perturba-tion. Heat maps show deviation in gene expression relative to unperturbed

cells. (D) UMAP projection of single-cell Perturb-seq data in the CBL/CNN1interaction. Each dot is a cell colored according to genetic background.(E) ARCHS4 (35) cell-type term enrichment for genes showing largeexpression changes in CBL/CNN1 doubly perturbed cells. (F) Expression ofhemoglobin in HUDEP2 cells upon cDNA overexpression of CBL or CNN1.Hemoglobin was labeled with anti–fetal hemoglobin antibody andmeasured by flow cytometry. (G) Pelleted HUDEP2 cells. Hemoglobinexpression appears red.

RESEARCH | RESEARCH ARTICLEon M

arch 8, 2021

http://science.sciencemag.org/

Dow

nloaded from

Page 5: GENOMICS Exploring genetic interaction manifolds ... · GENOMICS Exploring genetic interaction manifolds constructed from rich single-cell phenotypes Thomas M. Norman 1,2,3*†, Max

Norman et al., Science 365, 786–793 (2019) 23 August 2019 5 of 8

Fig. 4. Quantitative model for high-dimensional GIs. (A) Model oftranscriptional genetic interactions. Different transcriptional states define pointson the surface of the GI manifold, and genetic perturbations define vectors oftravel.The model decomposes double perturbations as a linear combination ofthe two constituent single perturbations. (B) Model fit across all GIs measuredwith Perturb-seq. (C) Magnitude of model coefficients compared with GI scorefrom the fitness-level GI map. (D and E) Application of the model to selectedGIs. For each GI, transcriptional profiles for the two constituent single

perturbations are compared with the double perturbation and the model fit.Heat maps show deviation in gene expression relative to unperturbed cells.(F) Visualization of all measured GIs in Perturb-seq experiment. Each GI wascharacterized using features derived from the model (x-axis) and by measuresof similarity among the transcriptional profiles (y-axis).These two viewpointswere each clustered and collapsed to a single dimension using UMAP to definethe two axes.The features defining the two axes are plotted next to them.Categories of GIs are annotated based on features shared within the clusters.

RESEARCH | RESEARCH ARTICLEon M

arch 8, 2021

http://science.sciencemag.org/

Dow

nloaded from

Page 6: GENOMICS Exploring genetic interaction manifolds ... · GENOMICS Exploring genetic interaction manifolds constructed from rich single-cell phenotypes Thomas M. Norman 1,2,3*†, Max

To explore the ability of Perturb-seq to betterresolve GIs [by analogy with past efforts in othersystems (1, 22, 23)], we examined two strongbuffering interactions that each had scores of+10.1 in our fitness GI map but appeared to be-have differently on a transcriptional level. Anal-ysis by Perturb-seq revealed that the GI betweenKLF1 and CEBPA resulted from genetic epistasis(i.e., one single overexpression phenotypemaskingthe other), resulting in smaller, asymmetrical co-efficients (c1 = 0.19, c2 = 0.72; Fig. 4D and fig.S11A). By contrast, the PTPN12/SNAI1 GI resultedfrom genetic suppression (i.e., when combinedoverexpression of two genes attenuated each other’sindividual phenotypes), resulting in two smallercoefficients (e.g., PTPN12/SNAI1, c1 = 0.60, c2 =0.57; Fig. 4E and fig. S11B). Finally, as discussedabove, synergistic or synthetic lethal interactionstended to result in two larger coefficients (e.g.,CBL/CNN1, c1 = 1.24, c2 = 0.8; Fig. 3C and fig. S11C).A central questionwhen consideringGIs is how

often new or unexpected (neomorphic) pheno-types emerge through the combined action ofgenes. A relatively small number of GIs (lowermode in Fig. 4B) deviated from the expectationgiven by the linear model (which we quantifiedby distance correlation d; materials andmethodsand fig. S11D). A common neomorphic behaviorsimilar to an example observed in yeast (22) oc-curred when a perturbation that had little tran-scriptional effect on its own appeared to enhancethe effects of a second perturbation (e.g., FEV/CBFA2T3, d = 0.74; fig. S11E). We also observed

relatively rare instances in which double pheno-types appeared to be completely unexpected [e.g.,the physical interactors PLK4/STIL (24) d = 0.53;fig. S11F]. The model’s parameters thus provideda simple, useful summary of how perturbationscombine (table S9).To look for structure among interactions, we

used a two-dimensional visualization and clus-tering technique (25) (table S9 and materialsand methods). One axis grouped interactionsaccording to properties derived from the modelcoefficients (c1, c2, d), whereas the other groupedinteractions according to how correlated the un-derlying transcriptional responses were (materialsand methods). The resulting figure (Fig. 4F) iden-tified numerous distinct categories of interaction,showing that the model can serve as a general-ization of the one-dimensional “buffering versussynthetic lethal” paradigm that has typically beenused to categorize genetic interactions.

Ordering genes into linear pathwaysusing Perturb-seq

The linear GI model allowed us to make hypo-theses about gene regulation because it can iden-tify, for example, which single perturbationphenotype better explains the double perturbationphenotype. We examined the strong GIs amongthegenesDUSP9,ETS2, andMAPK1. In theDUSP9/ETS2 interaction (Fig. 5A), theDUSP9 phenotypedominated, suggesting thatDUSP9overexpressionantagonized ETS2. Similarly,DUSP9 andMAPK1antagonized each other’s activities (Fig. 5B). Fi-

nally, ETS2 and MAPK1 induced similar pheno-types, andETS2 transcriptionwas activated in allbackgrounds (9.3-fold inMAPK1, 9.2-fold in ETS2,and 35.8-fold in MAPK1/ETS2 overexpression;Fig. 5C). This type of interaction, in which a per-turbation (MAPK1) acts at least partly by up-regulating its partner (ETS2), was uncommonin our dataset (fig. S11G). Taken together, theseresults suggested a linear regulatory pathway inwhichDUSP9 (a phosphatase) inhibitsMAPK1 (akinase) that activates ETS2 (a transcription fac-tor), which is consistent with the known biologyof these gene families (Fig. 5D) (26). Followingsimilar logic, the model allowed us to orient allthe buffering interactions in which one pertur-bation is epistatic to another (Fig. 5E).

Single-cell heterogeneity reveals thetrajectory of GIs

The single-cell resolution afforded by Perturb-seqcan reveal phenotypic heterogeneity for some GIsthat we reasoned could yield further insight intotheir mechanism (Fig. 5F; compare fig. S8). Forexample, cells overexpressing both DUSP9 andMAPK1 showed a range of phenotypes spanningthe transcriptional states observed in cells over-expressing either DUSP9 or MAPK1 alone (Fig.5G). In particular, we observed cells in whichDUSP9 and MAPK1 appeared to suppress eachother’s activity entirely.We reasoned that we could therefore identify

DUSP9 orMAPK1 regulatory targets that showeddiffering sensitivity to the levels of these proteins

Norman et al., Science 365, 786–793 (2019) 23 August 2019 6 of 8

DUSP9 / CtrlCtrl / MAPK1DUSP9 / MAPK1principal curve

F

H

GSingle-cell phenotypicheterogeneity

Exp

ress

ion

accu

mul

atio

n al

ong

prin

cipa

l cur

ve

DU

SP9

expr

essi

on

MAP

K1 e

xpre

ssio

n

Differentially expressed genes

DU

SP9

/ ctrl

DU

SP9

/ MAP

K1

ctrl

/ MAP

K1

identity of perturbation in cell

E

DUSP9 / ctrlCtrl / ETS2DUSP9 / ETS2model prediction

A

B

C

DDUSP9 ETS2

double = 0.75 DUSP9 + 0.25 ETS2 (GI = 5.7, d = 0.90)

ETS2 / ctrlCtrl / MAPK1ETS2 / MAPK1model prediction

MAPK1 ETS2double = 0.47 ETS2 + 1.07 MAPK1 (GI = -2.5, d = 0.87)

DUSP9 MAPK1 ETS2

DUSP9 / CtrlCtrl / MAPK1DUSP9 / MAPK1model prediction

DUSP9 MAPK1

double = 0.39 DUSP9 + 0.23 MAPK1 (GI = 11.0, d = 0.72)

KLF1

MAP2K3

MAP2K6

ZC3HAV1

IGDCC3PTPN12

SET

UBASH3B

ETS2

MAPK1

CEBPA

CEBPB

CEBPE

FOSB

SPI1DUSP9IKZF3

ZBTB10

OSR2

RUNX1T1

phosphatase kinase transcriptionfactor

Fig. 5. Inferring gene-regulatory logic underlying GIs. (A to C)Application of linear genetic interaction model to GIs among DUSP9,MAPK1, and ETS2. (D) Order of pathway inferred from model fits.(E) Epistatic buffering interactions oriented using the genetic interac-tion model. Each arrow denotes a genetic interaction originating in thegene that dominates when the two genes are simultaneously perturbed.Arrow size denotes the degree of dominance as measured byasymmetry of model coefficients. Genetic perturbations with similartranscriptional profiles are colored identically. (F) Stochastic heteroge-neity can cause individual cells (dots) bearing a given geneticperturbation to explore the space on the GI manifold surrounding the

average direction of travel (arrows). (G) UMAP projection of singlecells with overexpression of DUSP9 and/or MAPK1. Black line representsthe principal curve, which tracks the primary direction of variation inthe dataset that can be used to order all cells. (H) Gene expressionaveraged along the principal curve. Each row denotes a cell orderedaccording to position along the principal curve. The left three columnsindicate that cell’s genetic background. At each point, cells that areclose on the principal curve are averaged to produce a local estimateof median gene expression. The heat map shows normalized expressionof differentially expressed genes. The DUSP9 and MAPK1 expressioncolumns show the same data for the targeted genes.

RESEARCH | RESEARCH ARTICLEon M

arch 8, 2021

http://science.sciencemag.org/

Dow

nloaded from

Page 7: GENOMICS Exploring genetic interaction manifolds ... · GENOMICS Exploring genetic interaction manifolds constructed from rich single-cell phenotypes Thomas M. Norman 1,2,3*†, Max

by exploiting single-cell data. To order cells inan unbiased way by “phenotype,”we computed aprincipal curve measuring the path of maximumvariation in the dataset [(27); Fig. 5G and mate-rials andmethods]. Examining median-filteredgene expression (materials andmethods) alongthis curve revealed distinct classes of transcriptsregulated byDUSP9 orMAPK1 activity (e.g.,GYPAappeared to be more sensitive to DUSP9 activitythan did HBZ; Fig. 5H and fig. S12A). This varia-tion did not appear to be the result of stable dif-ferences in the expression of MAPK1 and DUSP9(fig. S12B), suggesting a possible role either forhistorical differences or stochastic gene expres-sion. Perturb-seq can therefore reveal gradedphenotypes resulting from antagonism betweentwo proteins.

Predicting GIs using arecommender system

One possible strategy to address the scale ofgenetic interactions is to pursue a split experi-

mental and computational approach, in whichonly a subset of interactions is sampled [eitherrandomly or through “compressed” experimen-tal designs (28)] and the remainder is predictedcomputationally (Fig. 6A) (7, 29, 30). Perturb-seqprovides a scalable means of constraining thesetypes of searches (7), enabling exploration of theGI landscape.There is substantial similarity between this

problem and that of predicting a person’s shop-ping preferences based on past buying behavior,which is commonly addressed using “recommen-der system” algorithms.Many of these approachescan exploit low-rank structure such as that seen inGI maps and leverage additional side informationobtained by other means to improve predictivepower. We examined the Perturb-seq profiles ofsingle-gene overexpression (fig. S13A) and foundthat therewas amodest concordance betweenGIprofile and Perturb-seq profile correlations (R =0.29, p < 10−103; fig. S13B), suggesting that thetranscriptional data provided a complementary,

scalable means of comparing genes that mightinform GI prediction.We constructed a matrix factorization model

for fitness GIs and then constrained this modelto encourage similar interaction profiles amonggenes that induced similar transcriptional changes(fig. S14A and materials and methods) (30, 31).We then predicted unobserved GIs using thismodel trained on different fractions of randomlysubsampled interactions (Fig. 6A and materialsand methods). The end result (Fig. 6, B to D)preserved much of the large-scale structure ofthe map as seen through block averaging of GIscores (Fig. 6, C and D; fig. S14B; and materialsand methods). Our approach was also substan-tially better than random sampling at predict-ing the top 10% of interactions and reasonablypreserved both the rank order of all interactions(Spearman r ≈ 0.5 at 10% sampling; Fig. 6, D andE and fig S14C) and the pairwise similaritiesbetween GI profiles (Fig. 6F). Notably, the use ofPerturb-seq–derived single perturbation profilesas side information substantially improved per-formance (fig. S14D). These results suggest thatthe hybrid approach can nominate blocks of GIsfor in-depth study.Finally, we used our data to model the mini-

mum number of cells that would be required toperform larger experiments. By downsamplingour measured perturbations and reperformingour analyses, we observed that as few as 50 cellsper perturbation could be sufficient,meaning that~106 cells would be needed to collect side infor-mation for the entire set of protein-coding se-quences (Fig. 6G).

Discussion

A central goal of genetics is to understand therelationship between the set of genes that a cellexpresses and its phenotype. However, this rela-tionship is challenging to study because manyphenotypes emerge only through the coordinatedaction of multiple genes. Here, we used Perturb-seq to manipulate a large number of gene pairsand then measure the resulting changes in cellstate. This ensemble of measurements describeda high-dimensional surface called a GI manifold.By interpreting and modeling the GI manifold,we can gain several insights into how complexphenotypes emerge.First, the transcriptional profiles can distin-

guish distinct outcomes such as cell death, slowgrowth, and differentiation to a variety of cellstates that would appear equivalent at the fitnesslevel. We also identified both canonical genes(e.g., KLF1, GATA1) and unexpected genes (e.g.,CNN1) that interacted to promote differentiationto a specific cell state (erythrogenesis). As our single-cell approach is sensitive to multiple outcomesor perturbations with incomplete penetrance,it is a natural strategy to pursue combinatorialsearches for factors driving (trans)differentiation(4). Second, the shape of the GI manifold can re-veal howGIs arise.We derived a simple, geometricGImodel and used it to identify the different waysin which genetic perturbations combine to yieldnew phenotypes, allowing us to, for example,

Norman et al., Science 365, 786–793 (2019) 23 August 2019 7 of 8

True GI map

Pre

dict

ed G

I map

LLLLLLLLLLLLLLLLLLLLLLL

Recommendersystem

Prioritize regions of interest

Perturb-seq side information about

gene a

Per

turb

-seq

sid

e in

form

atio

n ab

out

gene

b Sparse samplingof fitness

interactions

EstimatedGI map

A B C

ED

−0.2 0.0 0.2True GI score

−0.3

−0.2

−0.1

0.0

0.1

0.2

0.3

Pre

dict

ed G

I sco

re

FSimilarity of GI scoresSimilarity of GI scores G

Cop

hene

tic c

orre

latio

n

10 20 30 50 100Cells per perturbation

0.75

0.80

0.85

0.90

0.95

Similarity of Perturb-seq profiles

Similarity of GI profilecorrelations

5 10 15 20 25Sampling percentage

Spe

arm

an c

orre

latio

n

0.4

0.6

0.8

5 10 15 20 25Sampling percentage

0.4

0.5

0.6

0.7

0.8

Cop

hene

tic c

orre

latio

n

−0.06−0.030.000.030.06

Block-averagedtrue GI map

Block-averagedpredicted GI map −0.04

0.000.04

Fig. 6. Recommender system for exploring the GI landscape. (A) Schematic of predictionstrategy. Fitness phenotypes of a limited subset of GIs are measured. Each gene is characterized byits Perturb-seq transcriptional profile, and similarity among these profiles is used as side informationto constrain a recommender system model to impute remaining fitness GI scores and highlightregions of interest. (B) True versus predicted GI map obtained by prediction from 10% of randomlysampled fitness-level GIs. (C) Block-averaged true and predicted GI maps obtained by averaging GIscores within clusters. (D) Scatter plot of true and predicted GI scores (blue dots) from (B). Thedashed lines show 5 and 95% quantiles, which are used to designate strong GIs. Orange dots showequivalent scatter for block-averaged GI scores in (C). (E) Spearman correlation between trueand predicted GI scores at different levels of random sampling. Fifty random subsets were measuredfor each sampling level. Blue and orange denote individual and block-averaged GIs, respectively.(F) Cophenetic correlation of GI profiles as a function of sampling level, measuring the similarityof correlation structure in the true and predicted GI maps. (G) To assess scaling ability, therepresentation of each perturbation in the Perturb-seq experiment was randomly down-sampled todifferent levels of representation. Plot shows cophenetic correlation between down-sampled and truetranscriptional profiles used to construct the GI manifold visualization of Fig. 2.

RESEARCH | RESEARCH ARTICLEon M

arch 8, 2021

http://science.sciencemag.org/

Dow

nloaded from

Page 8: GENOMICS Exploring genetic interaction manifolds ... · GENOMICS Exploring genetic interaction manifolds constructed from rich single-cell phenotypes Thomas M. Norman 1,2,3*†, Max

order genes into linear pathways. We and othershave established that functionally related genesinteract similarly—in geometric terms, the GImanifold is therefore highly constrained, enablingimputation strategies (1, 2, 7, 12, 29). Our re-sults provide a strategy for exploring largespaces of combinatorial genetic interactionsby measuring only a subset of fitness-level GIs.This provides a complementary approach to ef-forts based on composite measurements, com-pressed sensing, and rich readouts to predictunmeasured GIs (7, 28). By intelligently mea-suring and exploring the GI manifold, one canstart to create a global view of the nonlinear map-ping between genotype and phenotype. Such ap-proaches should enable large-scale searches forsynthetic lethal interactions in cancer, the dis-covery of gene targets that lessen the severity ofgenetic disease, and, more generally, the under-standing of how complex, multigenic interactionsgovern biological traits and disease risk.

REFERENCES AND NOTES

1. M. Costanzo et al., Cell 177, 85–100 (2019).2. J. Domingo, P. Baeza-Centurion, B. Lehner, Annu. Rev. Genomics

Hum. Genet. 20, annurev-genom-083118-014857 (2019).3. J. L. Hartman 4th, B. Garvik, L. Hartwell, Science 291,

1001–1004 (2001).4. K. Takahashi, S. Yamanaka, Cell 126, 663–676 (2006).5. D. A. Jaitin et al., Cell 167, 1883–1896.e15 (2016).6. B. Adamson et al., Cell 167, 1867–1882.e21 (2016).7. A. Dixit et al., Cell 167, 1853–1866.e17 (2016).8. P. Datlinger et al., Nat. Methods 14, 297–301 (2017).9. M. Gasperini et al., Cell 176, 377–390.e19 (2019).10. J. M. Replogle et al., bioRxiv 503367, (2018).11. L. A. Gilbert et al., Cell 159, 647–661 (2014).12. M. A. Horlbeck et al., Cell 174, 953–967.e22 (2018).13. W. Huang, B. T. Sherman, R. A. Lempicki, Nucleic Acids Res.

37, 1–13 (2009).

14. L. McInnes, J. Healy, J. Melville, UMAP: Uniform manifoldapproximation and projection for dimension reduction.arXiv:1802.03426v2 [stat.ML] (6 December 2018).

15. L. McInnes, J. Healy, S. Astels, J. Open Source Softw. 2, 205 (2017).16. R. Alitalo, Leuk. Res. 14, 501–514 (1990).17. S. H. Orkin, L. I. Zon, Cell 132, 631–644 (2008).18. K. Kowanetz et al., J. Biol. Chem. 279, 32786–32795 (2004).19. S. J. Winder, M. P. Walsh, Cell. Signal. 5, 677–686 (1993).20. GTEx Consortium; Laboratory, Data Analysis & Coordinating

Center (LDACC)—Analysis Working Group; Statistical Methodsgroups—Analysis Working Group; Enhancing GTEx (eGTEx)groups; NIH Common Fund; NIH/NCI; NIH/NHGRI; NIH/NIMH;NIH/NIDA; Biospecimen Collection Source Site—NDRI;Biospecimen Collection Source Site—RPCI; Biospecimen CoreResource—VARI; Brain Bank Repository—University of MiamiBrain Endowment Bank; Leidos Biomedical—ProjectManagement; ELSI Study; Genome Browser Data Integration& Visualization—EBI; Genome Browser Data Integration& Visualization—UCSC Genomics Institute, University ofCalifornia Santa Cruz; Lead analysts; Laboratory, DataAnalysis & Coordinating Center (LDACC); NIH programmanagement; Biospecimen collection; Pathology; eQTLmanuscript working group; A. Battle, C. D. Brown,B. E. Engelhardt, S. B. Montgomery, Nature 550, 204–213 (2017).

21. R. Kurita et al., PLOS ONE 8, e59890 (2013).22. S. van Wageningen et al., Cell 143, 991–1004 (2010).23. N. Van Driessche et al., Nat. Genet. 37, 471–477 (2005).24. M. Ohta et al., Nat. Commun. 5, 5267 (2014).25. Y. Cheng, M. T. Wong, L. van der Maaten, E. W. Newell,

J. Immunol. 196, 924–932 (2016).26. C. J. Caunt, S. M. Keyse, FEBS J. 280, 489–504 (2013).27. C. Trapnell et al., Nat. Biotechnol. 32, 381–386 (2014).28. B. Cleary, L. Cong, A. Cheung, E. S. Lander, A. Regev, Cell 171,

1424–1436.e18 (2017).29. N. S. Madhukar, O. Elemento, G. Pandey, Front. Bioeng.

Biotechnol. 3, 172 (2015).30. M. Žitnik, B. Zupan, J. Comput. Biol. 22, 595–608 (2015).31. M. Xu, R. Jin, Z.-H. Zhou in Advances in Neural Information

Processing Systems, M. I. Jordan, Y. LeCun, S. A. Solla, Eds.(MIT Press, 2013), pp. 2301–2309.

ACKNOWLEDGMENTS

We thank B. Adamson, O. Chen, M. Cho, J. Corn, C. Forester,J. Hussmann, S. Weissman, the Innovative Genomics Institute, and

members of the Weissman and Gilbert labs for discussion orreagents. Funding: This work was funded by grants from theNational Institutes of Health (P50 GM102706, U01 CA168370, U01CA217882, RM1 HG009490, and R01 DA036858 to J.S.W.; K99/R00 CA204602 and DP2 CA239597 to L.A.G.; and F32 GM116331to M.J.) and DARPA (HR0011-19-2-0007 to J.S.W. and L.A.G.).J.S.W. is a Howard Hughes Medical Institute Investigator. L.A.G. isthe Goldberg-Benioff Endowed Professor in Prostate CancerTranslational Biology. T.M.N. is a fellow of the Damon RunyonCancer Research Foundation DRG-2211-15. M.A.H. is a Byers FamilyDiscovery Fellow and is supported by the UCSF Medical ScientistTraining Program and the School of Medicine. Authorcontributions: T.M.N., M.A.H., L.A.G., and J.S.W. were responsiblefor the conception, design, and interpretation of the experimentsand wrote the manuscript. M.J. and J.M.R. provided criticalfeedback on the manuscript. L.A.G., M.J., A.X., and J.M.R.constructed GI vectors, cDNA vectors, and libraries. T.M.N., J.M.R.,A.X., A.Y.G., M.A.H., and L.A.G. conducted the GI mapping andPerturb-seq experiments. M.A.H. performed the GI map dataanalysis. T.M.N. performed the Perturb-seq analysis, GI predictionanalysis, and modeling of transcriptional genetic interactions. L.A.G.,T.M.N., A.Y.G., and A.X. designed and conducted validationexperiments. Competing interests: T.M.N., M.A.H., L.A.G., M.J.,and J.S.W. have filed patent applications related to CRISPRi/ascreening, Perturb-seq, and GI mapping. J.S.W. consults for andholds equity in KSQ Therapeutics, Maze Therapeutics, and TenayaTherapeutics. J.S.W. is a venture partner at 5AM Ventures. T.M.N.,M.J., J.M.R., and M.A.H. consult for Maze Therapeutics. Theremaining authors declare no competing interests. Data andmaterials availability: All data are available in the manuscript, thesupplementary materials, or GEO (GSE133344). Notebooksimplementing analyses are available at https://github.com/mhorlbeck and https://github.com/thomasmaxwellnorman.

SUPPLEMENTARY MATERIALS

science.sciencemag.org/content/365/6455/786/suppl/DC1Materials and MethodsFigs. S1 to S14Tables S1 to S9References (32–35)

29 March 2019; accepted 29 July 2019Published online 8 August 201910.1126/science.aax4438

Norman et al., Science 365, 786–793 (2019) 23 August 2019 8 of 8

RESEARCH | RESEARCH ARTICLEon M

arch 8, 2021

http://science.sciencemag.org/

Dow

nloaded from

Page 9: GENOMICS Exploring genetic interaction manifolds ... · GENOMICS Exploring genetic interaction manifolds constructed from rich single-cell phenotypes Thomas M. Norman 1,2,3*†, Max

Exploring genetic interaction manifolds constructed from rich single-cell phenotypes

S. WeissmanThomas M. Norman, Max A. Horlbeck, Joseph M. Replogle, Alex Y. Ge, Albert Xu, Marco Jost, Luke A. Gilbert and Jonathan

originally published online August 8, 2019DOI: 10.1126/science.aax4438 (6455), 786-793.365Science 

, this issue p. 786Scienceof genes in pathways and systematic classifications of GIs.cell. Each perturbation and GI projects the cell state to a particular position on this manifold, enabling unbiased ordering perturbation screens. Diverse transcriptomic phenotypes construct a ''manifold'' representing all possible states of theapproach leverages high-dimensional single-cell RNA sequencing data gathered from CRISPR-mediated, pooled

developed a framework for mapping and understanding GIs. Thiset al.the mechanistic origin of interactions. Norman Mapping of genetic interactions (GIs) is usually based on cell fitness as the phenotypic readout, which obscures

Manifold destiny

ARTICLE TOOLS http://science.sciencemag.org/content/365/6455/786

MATERIALSSUPPLEMENTARY http://science.sciencemag.org/content/suppl/2019/08/07/science.aax4438.DC1

CONTENTRELATED http://stm.sciencemag.org/content/scitransmed/11/503/eaaw3768.full

REFERENCES

http://science.sciencemag.org/content/365/6455/786#BIBLThis article cites 32 articles, 3 of which you can access for free

PERMISSIONS http://www.sciencemag.org/help/reprints-and-permissions

Terms of ServiceUse of this article is subject to the

is a registered trademark of AAAS.ScienceScience, 1200 New York Avenue NW, Washington, DC 20005. The title (print ISSN 0036-8075; online ISSN 1095-9203) is published by the American Association for the Advancement ofScience

Science. No claim to original U.S. Government WorksCopyright © 2019 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of

on March 8, 2021

http://science.sciencem

ag.org/D

ownloaded from