

The Plant Cell, Vol. 30: 1178–1198, June 2018, www.plantcell.org © 2018 ASPB.

LARGE-SCALE BIOLOGY ARTICLE

Comprehensive Discovery of Cell-Cycle-Essential Pathways in Chlamydomonas reinhardtii

Michal Breker, Kristi Lieberman, and Frederick R. Cross¹

Laboratory of Cell Cycle Genetics, The Rockefeller University, New York, New York 10065

ORCID IDs: 0000-0002-3210-9487 (K.L.); 0000-0002-4387-4133 (F.R.C.)

We generated a large collection of temperature-sensitive lethal mutants in the unicellular green alga Chlamydomonas reinhardtii, focusing on mutations specifically affecting cell cycle regulation. We used UV mutagenesis and robotically assisted phenotypic screening to isolate candidates. To overcome the bottleneck at the critical step of molecular identification of the causative mutation (“driver”), we developed MAPS-SEQ (meiosis-assisted purifying selection sequencing), a multiplexed genetic/bioinformatics strategy. MAPS-SEQ allowed us to perform multiplexed simultaneous determination of the driver mutations from hundreds of neutral “passenger” mutations in each member of a large pool of mutants. This method should work broadly, including in multicellular diploid genetic systems, for any scorable trait. Using MAPS-SEQ, we identified essential genes spanning a wide range of molecular functions. Phenotypic clustering based on DNA content analysis and cell morphology indicated that the mutated genes function in the cell cycle at multiple points and by diverse mechanisms. The collection is sufficiently complete to allow specific conditional inactivation of almost all cell-cycle-regulatory pathways. Approximately seventy-five percent of the essential genes identified in this project had clear orthologs in land plant genomes, a huge enrichment compared with the value of ∼20% for the Chlamydomonas genome overall. Findings about these mutants will likely have direct relevance to essential cell biology in land plants.

¹ Address correspondence to [email protected]. The author responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (www.plantcell.org) is: Frederick R. Cross ([email protected]). www.plantcell.org/cgi/doi/10.1105/tpc.18.00071

INTRODUCTION

The core cell cycle machinery is regulated by internal and external cues to appropriately modulate proliferation in various developmental stages and environmental conditions. Therefore, cell cycle regulation plays a pivotal role in plant development and growth (Inzé and De Veylder, 2006; Francis, 2007). Cell cycle control is well studied in Opisthokonts (animals and fungi) and is deeply conserved within that lineage; however, Opisthokonts diverged from the green algae/land plant lineage early in eukaryotic evolution (Rogozin et al., 2009), leaving open the possibility that much of the Opisthokont model might apply poorly or not at all to the plant kingdom. To the extent that the systems in plants are ultimately shown empirically to be similar or even identical to those in Opisthokonts, that is a very good result because it means that much hard-won knowledge can be transferred more or less wholesale from yeast and animals to plants. Already, though, there are clear examples of strong divergence in cell cycle control between the plant kingdom and Opisthokonts (Cools et al., 2011; Nowack et al., 2012; Kumar et al., 2015; Atkins and Cross, 2018). Therefore, simply observing orthologous sequences in a new genome does not by itself guarantee similar function; experimentation is required. Notably, these divergences are due both to the involvement of distantly related or unrelated gene sequences carrying out the same function and to orthologous sequences that have taken on very distinct functions in different lineages (Cross et al., 2011).

Green algae constitute a large and ecologically important class of photosynthetic organisms, of which land plants are an evolutionary offshoot. Chlamydomonas reinhardtii has conserved some important features found also in animals that were lost in the land plant lineage (for example, cilia and basal body/centrioles). However, Chlamydomonas is clearly a member of the broader plant kingdom; this leads to a very important (but perhaps underexploited) role as a microbial model for land plants (Bisova et al., 2005). The extraordinary utility of yeast models for animal cell biology clearly speaks to the value of this approach overall. Due to accelerated evolution in fungal lineages but not in plant and algal lineages, Chlamydomonas is a substantially better genomic match to land plants than yeasts are for animals.

Ancient genome duplications in the land plant lineage resulted in variable but frequently very high paralog levels (recent gene duplicates with very similar sequences). Paralogs frequently have overlapping functions, severely complicating loss-of-function genetic analysis. However, these genome duplications occurred after divergence from green algae, and most genes in Chlamydomonas are single copy (Bisova et al., 2005). Loss-of-function mutations in single-copy genes immediately reveal the


maximum phenotypic consequences, as opposed to the interesting but frequently subtle effects resulting from loss of single paralogs out of large families in land plant genetic studies.

A forward-genetic screen is not dependent on prior knowledge/annotations, allowing for unbiased detection of the critical genes. Random insertional disruptions have been strongly preferred for mutagenesis in Chlamydomonas (recently performed on a systematic scale; Zhang et al., 2014; Li et al., 2016) for three main reasons. First, they have a high probability of producing a knockout of all function, simplifying analysis. Second, insertion of foreign DNA provides a molecular tag for identification of the disrupted gene. Third, the number of insertions per clone is low (frequently only one), so the causative lesion is obvious. However, since insertional mutagenesis mainly produces null alleles, that method in principle cannot yield mutations in the essential gene set. Point mutant screens, with random mutations introduced by chemicals or UV, have the key positive feature, not shared by insertional disruptions, that essential genes can be identified by looking for temperature-sensitive (Ts) lethal point mutations. Ts mutants not only serve as tools for identifying essential functions, but they also provide convenient experimental handles for phenotyping to determine function. This is essential for previously completely unknown genes and is very important even for genes with identified sequence orthologs in other organisms, since sequence orthology suggests but does not prove functional equivalence (Cross et al., 2011).

A drawback to point mutagenesis is that identification of the causative lesion and mutated gene in Chlamydomonas was, until recently, nearly intractable. Extensive mapping was required, sequencing was slow and cumbersome, and random mutagenesis resulted in hundreds or thousands of lesions per clone, only one of which was truly causative. In our previous work (Tulin and Cross, 2014), we developed a method that combines bulked segregant sequence analysis with next generation sequencing to identify causative mutations. Using this method, a UV-induced mutant with a mutation causing a Ts− phenotype was put through a cross to the wild type, and a small pool of Ts− meiotic segregants was collected by tetrad analysis. A DNA sample from this pool was sequenced to high coverage. Reads at the position of the causative lesion were uniformly mutant, while reads at the position of irrelevant point mutations were on average 50% mutant. This method is effective but time-consuming, requiring individual crosses and library preparation for each mutant.

In another study (Onishi et al., 2016), we used a combinatorial sequencing strategy to identify all mutations in a large set of mutants by sequencing only a small number of pools with overlapping membership. Causative lesions were identified from the deconvolved lists of total mutations in each strain based on recurrent mutations of genes from independent strains previously determined to be in the same complementation group. This method is effective, but it is restricted to genetic screens identifying only a small number of complementation groups, since it requires at least two independent driver mutations in each complementation group to be present in the pool.

Here, we introduce innovative methodologies for multiplexed, robust identification of causative lesions for nearly 100 mutants in parallel. This methodology represents a major scale-up of our work developing and sequence-identifying a collection of Ts-lethal mutations in Chlamydomonas, with the aim of approaching saturation (complete identification) of cell-cycle-specific pathways. We started with a greatly expanded pool of Ts-lethal mutations (Supplemental Figure 1). We systematized and improved the combinatorial sequencing approach to determine the complete set of mutations in a large collection of mutants in parallel (Erlich et al., 2009; Onishi et al., 2016). Then, to identify driver mutations from these mutants, we developed a highly efficient multiplexed version of bulked segregant sequence analysis. Unlike the method by Onishi et al. (2016), this method does not require multiple independent alleles and thus is suitable for a screen with a large mutational target size. These methods required us to develop innovative upstream genetic procedures to produce maximally informative sequence libraries, as well as sensitive bioinformatics for sequence analysis to make efficient


use of even rare minority reads without confounding our results with inevitable sequencing/alignment errors. Importantly, these methodologies can in principle be applied to a range of meiotic systems, including multicellular diploids, with some modifications.

Using these methods, we broadened our recovery of lesions in genes required for cell cycle progression to an estimated ∼50% of the total; since all biological pathways are multicomponent, this implies that we likely have identified (and have multiple experimental handles to inactivate) nearly all essential pathways contributing to cell cycle regulation. We also identified mutations inactivating broad biosynthetic pathways including general transcription, ribosome biogenesis, and translation, indicating the utility of the methods for investigations beyond cell cycle regulation. The mutants represent a valuable resource for investigation into how cell cycle progression influences any cell behavior of interest. Over 75% of the genes identified have conserved sequence orthologs in land plant genomes; thus, their characterization is highly likely to be informative for essential cell biology across the plant kingdom.

RESULTS

We began the study by isolating thousands of mutants harboring Ts-lethal mutations in essential genes using a highly efficient robotically assisted pipeline (Tulin and Cross, 2014; Breker et al., 2016). To enrich for mutagenized genes with potential cell cycle involvement, we used time-lapse microscopy to focus on mutants that show few detectable problems with cell growth but fail to complete a division cycle (Figure 1; Supplemental Figure 1) (assays and logic previously described in detail in Tulin and Cross, 2014; Breker et al., 2016). This quality control step reduces the number of mutants to be analyzed by a factor of ∼10 to 100, but still leaves many more mutants than could practically be sequence identified by the rather laborious bulked segregant sequence analysis strategy of Tulin and Cross (2014).

A Combinatorial Strategy to Sequence Mutants in Overlapping Pools

These UV-mutagenized strains harbor hundreds of mutations scattered randomly across the genome. To determine the complete list of mutations in each individual strain, we employed a combinatorial pooling scheme in which each individual mutant occupies a unique pattern within a set of bar-coded pools (Figure 2) (Erlich et al., 2009; Onishi et al., 2016).

While straightforward in principle, this approach has challenges compared with simple sequence determination of a single clone, since it requires analysis of minority reads at high coverage (“minority reads” are reads with an alternative call potentially derived from an authentic mutation, at a position where a majority [in this application, >95%] of the reads correspond to the wild-type reference sequence). The error rate in Illumina sequencing is significant, so at high enough coverage, almost any position will have at least some erroneous calls; these must be distinguished from the minority reads representing a genuine mutation derived from some strain. Sequence coverage of each pool, the mutant occupancy within a set of pools, and number of mutants per pool were all optimized to ensure that minority reads are reliably detected and clearly distinguishable from Illumina/aligner errors (Figure 2A; see Methods). Additionally, we developed quality control measures that eliminated the vast majority of erroneous calls from consideration, based on Illumina quality scores for the position and for the read overall (independent of the position) and number of mismatches (excluding the position of interest) in the read overall. These measures are only partially correlated; combined, they reduce the error rate to ∼3 × 10⁻⁴ per base call, while eliminating only a small proportion of the input data.

We aimed for sufficient coverage to allow each mutation in each mutant to give a reliable signal and experimented with varying numbers of mutants and pools (Figure 2A). In the largest experiment, we analyzed 84 mutants in nine pools (each mutant represented in three pools). Statistical analysis (Supplemental Figure 2) indicated >95% accuracy of the assignment of mutations to a single specific mutant (details in Methods).

For direct confirmation, we tested numerous assigned mutations using an allele-specific PCR method (Onishi et al., 2016). In every case tested, this assay showed that the mutation of interest was present uniquely in the assigned strain.

Although the mutants contain only one driver mutation each, the procedure identifies thousands of “passenger” mutations, which can be useful. For example, we found candidate null alleles in two cyclin D genes as passengers. These cyclin D mutations were not lethal (other mutations in the strains were shown to be causative) but may be useful in synthetic lethal genetics and for other purposes. We include Supplemental Data Set 1, with a list of 706 candidate null alleles in passenger mutations, with the thought that they may be useful tools for other studies.
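The pooling logic can be illustrated with a minimal Python sketch. It assumes the simplest possible design, in which each mutant is assigned a distinct combination of pools; with nine pools and three pools per mutant there are exactly 84 such combinations, and with eight pools and two pools per mutant there are 28, matching the numbers quoted in the text and in Figure 2. The function names `design_pools` and `assign_mutation` are hypothetical, and the actual assignment scheme used in the study (see Methods; Erlich et al., 2009) may differ.

```python
from itertools import combinations
from math import comb


def design_pools(n_mutants, n_pools, pools_per_mutant):
    """Give each mutant a unique set of pools (its occupancy pattern).

    Assumption for illustration: patterns are simply the first n_mutants
    combinations of pool indices; the published design may differ.
    """
    if n_mutants > comb(n_pools, pools_per_mutant):
        raise ValueError("not enough distinct occupancy patterns")
    patterns = combinations(range(n_pools), pools_per_mutant)
    return {m: frozenset(p) for m, p in zip(range(n_mutants), patterns)}


def assign_mutation(pools_with_minority_reads, design):
    """Return the mutant whose occupancy pattern exactly matches the set of
    pools in which minority reads were seen, or None if nothing matches."""
    observed = frozenset(pools_with_minority_reads)
    matches = [m for m, pattern in design.items() if pattern == observed]
    return matches[0] if len(matches) == 1 else None


# Largest experiment described in the text: 84 mutants, nine pools,
# each mutant present in three pools.
design = design_pools(84, 9, 3)
print(comb(9, 3), comb(8, 2))               # number of available patterns
print(assign_mutation(design[14], design))  # pattern of mutant 14 decodes to 14
print(assign_mutation({0}, design))         # a single-pool hit matches no mutant
```

A call that appears in a set of pools matching no occupancy pattern (for example, in one pool only) is left unassigned in this sketch, which is one simple way to treat it as a likely sequencing or alignment error.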

MAPS-SEQ: Meiosis-Assisted Purifying Selection Sequencing

The overlapping pool sequencing yielded a reliable list of mutations present in each mutant. On average, ∼70 of the ∼200 mutations per clone alter coding sequences (missense, nonsense, or splice donor/acceptor site mutations, insertions, and deletions) at locations across the genome. However, tetrad analysis showed 2:2 segregation of Ts−:Ts+ progeny in crosses of ∼90% of the mutants to the wild type, indicating that Ts lethality is due to a single genetic lesion. Therefore, a method was needed to identify the “driver” (single mutation responsible for Ts lethality). Previously, we accomplished this by bulked segregant sequencing (Tulin and Cross, 2014), but this approach was unwieldy at the scale of this project. We therefore developed MAPS-SEQ, a technique that multiplexes driver discovery, providing a massive increase in efficiency.

A pool of up to 84 mutants, for which all mutations are known from the previous step, is mated (using equal numbers of mutants of each of the two Chlamydomonas mating types) (Figure 3). In this mixture, each mutant can mate at random with each mutant of the other mating type. The mated cells form zygospores committed to undergoing meiosis. Purified zygospores complete meiosis, and haploid products enter mitotic growth,


producing a single genetically complex pool of meiotic recombinants (Figure 3A). At each chromosomal position, this final pool should contain an approximately equal mixture of every possible version from the original pool (Figure 3B).

Mendelian inheritance means that 50% of haploid progeny will be Ts− lethal due to a single lesion, 25% Ts− due to two independent genetic lesions, and 25% Ts+ (Figure 3A); this was indeed observed by tetrad analysis of random zygospores from the pool (Figure 3C). After brief propagation of meiotic products at permissive temperature, we impose several rounds of high-temperature selection. All mutations from the original pool will survive in the selected pool, except for the causative mutations and passenger mutations that are closely linked to them (Supplemental Figure 1B).

Based on the number of zygospores produced by the mating mix (estimated microscopically) and the number of mutants that contributed equally to the mating pool, each mutant probably participated in several hundred independent mating events to various other mutants from the pool. Genetic recombination over the resulting hundreds of meioses will allow the separation of even tightly linked passengers. With 25× coverage per mutant in the pool, the limit of resolution is ∼4 cM between passengers and causative mutation (100 × 1/25), converting to a physical distance of ∼0.4 Mb (Merchant et al., 2007; Tulin and Cross, 2014). Higher sequence coverage should allow resolution down to sub-cM levels, given the large number of potentially recombinant chromosomes in the pool.

We sequence a single DNA preparation from this complex pool to ∼900× overall coverage. Each individual passenger mutation should be present at ∼3% (∼25× coverage). A key advantage of this step is that we can search the aligned sequences only for the mutations predefined as present in each original mutant. This restriction of hypotheses means that even a single mutant read can be informative, unlike the case with de novo sequencing, where multiple concordant reads are typically essential to overcome Illumina error rates over the whole genome. For example, an error rate of 1/3000 (the effective error rate after filtering low-quality positions/reads as described above) over the Chlamydomonas genome at 1000× will result in 4 × 10⁷ single-read “mutations” purely from sequencing errors. This same error rate applied to just 5700 predetermined mutations will result in no false-positive replicates of the mutation in ∼5000 cases and a single false-positive read in ∼700 cases.

Errors in this procedure can come from misassignment of a mutation to the wrong mutant or from stochastic lack of reads covering a passenger mutation. We estimate these to be rare events (<1%) (Supplemental Figure 3).
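The arithmetic behind these estimates can be approximated with a short Python sketch. Several inputs are assumptions rather than values stated in the text: a genome size of about 120 Mb, independent errors modeled as a Poisson process, and errors equally likely to produce any of the three alternative bases, so that only one third of erroneous calls at a site mimic the specific predefined mutation. The function names are hypothetical. Under these assumptions the counts come out in the same range as the figures quoted above, not identical to them.

```python
from math import exp


def expected_error_reads(genome_size, coverage, error_rate):
    """Expected number of erroneous base calls genome-wide."""
    return genome_size * coverage * error_rate


def false_positive_sites(n_sites, coverage, error_rate, alt_fraction=1 / 3):
    """Expected number of predefined sites with zero or exactly one
    erroneous read matching the specific mutant base (Poisson model).

    alt_fraction is an assumption: one of three possible wrong bases
    matches the predefined mutation.
    """
    lam = coverage * error_rate * alt_fraction
    p_zero = exp(-lam)
    p_one = lam * exp(-lam)
    return n_sites * p_zero, n_sites * p_one


def resolution_cm(coverage_per_mutant):
    """Limit of resolution in cM, following the 100 * 1/coverage rule
    used in the text."""
    return 100.0 / coverage_per_mutant


GENOME_SIZE = 1.2e8  # assumed approximate genome size in bp
ERROR_RATE = 1 / 3000

print(expected_error_reads(GENOME_SIZE, 1000, ERROR_RATE))  # about 4e7
zero, one = false_positive_sites(5700, 1000, ERROR_RATE)
print(round(zero), round(one))  # roughly 5100 sites clean, 570 with one read
print(resolution_cm(25))        # 4.0 cM
```

The comparison makes the point of the paragraph concrete: restricting the search to a few thousand predefined positions reduces the expected number of spurious single-read calls from tens of millions to a few hundred.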

Figure 1. Distinct Arrest Morphologies Differentiate Mutants.

Nitrogen-depleted synchronized samples were plated on TAP agar plates and were transferred to restrictive temperature (33°C). Images were taken every hour by light microscopy. (A) Wild-type sample. Cells grow over the course of the first 10 h and then enter rapid cycles of division, forming clusters of cells, as shown at the 12-h time point, followed by hatching of newborn cells. (B) Some mutants fail in cell growth (Non-growers). Other mutants (Large round) grow but fail to make cleavage planes (CP). Mutants making cleavage planes (arrows) frequently lose cell integrity a few hours later (Tulin and Cross, 2014); some mutants (Arrested clusters) stay morphologically intact. White arrows mark the initiation of CPs. *Due to differences in the focal planes of the images of ess123-1, cell size may erroneously appear to shrink.


Identifying Driver Mutations

After computational extraction of the mutations assigned to an individual mutant (Figure 3D), we examine the pattern of mutant read representation across all chromosomes for each mutant. In typical cases, all chromosomes but one show mutant read frequencies at a similar low frequency across the chromosome (the expected representation for passenger mutations), and a single chromosome will exhibit a V pattern of progressive depletion, centered at a region of zero coverage (Figures 3D and 4A). This is the pattern expected for absolute selection against the causative (Ts-lethal) mutation, along with decreasing coselection against linked passenger mutations (negative hitchhiking). A single-depleted-region pattern (Figure 4) was detected in ∼77% of mutants tested (Figure 5A). In most cases, one or a few candidate coding sequence (CDS)-changing mutations are identified at the base of the V shape (Figure 4B). Multiple depleted candidates most likely correspond to a causative mutation with several tightly linked passengers; this is also a limitation of the bulked segregant sequencing strategy employed previously (Tulin and Cross, 2014).
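As a rough illustration of how the base of the V might be located computationally, the Python sketch below scans the predefined mutations of one mutant and reports the longest run of adjacent sites with zero mutant reads. This is a simplified stand-in and not the authors' pipeline: the function name `find_depleted_region`, the input layout, and the toy read counts are all hypothetical, and a real analysis would also need to handle low overall representation and chance dropouts.

```python
from collections import defaultdict


def find_depleted_region(sites):
    """Locate the longest run of adjacent predefined mutations with zero
    mutant reads.

    sites: iterable of (chromosome, position, mutant_read_count) for the
    mutations assigned to a single mutant (hypothetical input layout).
    Returns (chromosome, first_position, last_position) or None.
    """
    by_chrom = defaultdict(list)
    for chrom, pos, reads in sites:
        by_chrom[chrom].append((pos, reads))

    best = None  # (run_length, chromosome, start, end)
    for chrom, entries in by_chrom.items():
        entries.sort()
        run = []
        # A sentinel with a nonzero count closes any run at the chromosome end.
        for pos, reads in entries + [(None, 1)]:
            if reads == 0:
                run.append(pos)
            else:
                if run and (best is None or len(run) > best[0]):
                    best = (len(run), chrom, run[0], run[-1])
                run = []
    return None if best is None else best[1:]


# Toy data: one chromosome shows a V-shaped dip to zero, the other is flat.
toy_sites = [
    ("chr12", 1_000_000, 24), ("chr12", 2_000_000, 15),
    ("chr12", 3_000_000, 7), ("chr12", 4_000_000, 0),
    ("chr12", 5_000_000, 0), ("chr12", 6_000_000, 6),
    ("chr12", 7_000_000, 16), ("chr12", 8_000_000, 25),
    ("chr16", 1_000_000, 26), ("chr16", 3_000_000, 23),
    ("chr16", 5_000_000, 27),
]
print(find_depleted_region(toy_sites))
```

In this sketch, the CDS-changing mutations falling inside the returned interval would be the driver candidates, and a result of `None` corresponds to the case in which no depleted region is present.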

For ∼14% of the single-depleted-region mutants, we detected no CDS-changing mutation within the depleted region, despite an unambiguous V shape implying the presence of a severely deleterious mutation at its base (Figure 4C). This indicates failure to detect the causative mutation by Illumina sequencing. Importantly, the frequency of this “unsequenceable” phenomenon is not notably different with the present multiplexed approach than with the mutant-by-mutant bulked segregant sequencing approach (Tulin and Cross, 2014) (see Methods).

There are two other possible outcomes. In rare cases, all mutations from the mutant are represented; there is no depleted region across the genome. In a few such cases we examined in detail, the source of this problem was shown to be a synthetic phenotype: Two unlinked mutations in the original mutant were required for Ts lethality. In other cases, few or no mutations from a given mutant are present in the pool. This is most likely due to a poorly mating mutant that therefore contributes very few of its mutations to the zygotic pool, neither causative nor passengers. This overall low representation can cause spurious regions of apparent depletion simply due to statistical fluctuations, but this is clearly recognizable, since mutations on all chromosomes

Figure 2. Combinatorial Framework to Efficiently and Accurately Sequence Pools of Mutants.

(A) A set of mutants is distributed in overlapping fashion across many pools (in the illustration, 28 mutants in eight pools, A–H). (B) Each mutant has a unique occupancy pattern. Mutant numbers occupying these pools (C and D) are represented in black and gray. Minority reads shared exclusively between two pools allow assignment of a mutation uniquely to the relevant mutant (marked in black). (C) Representative numbers of mutations identified in a set of eight pools. “All assigned” (single-nucleotide polymorphisms) are the number of total genetic lesions that were identified in the set. “CDS-involved” is the number of genetic lesions that change the annotated coding sequence. The complementary number is the “Non-CDS.” The number in parentheses is the average per mutant. (D) Example for a detected single-nucleotide polymorphism. Minority reads “C” were identified uniquely in C and D pools (compared with the majority reads “T”) and were assigned uniquely to mutant 14. Together with the adjacent mutation, the lesion (C>T) results in a conversion from leucine to serine in amino acid 151 of CDC20.


Figure 3. Mass Mating of Ts− Mutants Followed by Selection of Ts+ Meiotic Segregants: Behavior of Causative Mutations and of Linked and Unlinked Passengers.

(A) Sets of 42 Mat+ and 42 Mat– Ts− mutants were crossed in a mass mating (see Methods) and allowed to form zygospores (premeiotic diploids; ∼500 zygospores per mutant). Transfer to light triggers meiosis and the formation of four haploid progeny per zygospore. According to Mendelian inheritance for unlinked lesions, the expected result is that 50% of these haploids will inherit a single Ts-lethal mutation from one parent (mutant A or mutant B), 25% will inherit Ts-lethal mutations from both parents, and 25% will inherit no Ts-lethal mutations and will therefore be Ts+. (B) Passenger (neutral) mutations should appear in the Ts+ pool at a frequency of the reciprocal of the total number of mutants in the pool (since any neutral chromosome in the pool is equally likely). However, the rare causative lesions should be entirely depleted, and passenger mutations linked on the same chromosome with the causative lesion will be detected at progressively decreasing frequency as the mutation gets closer to the causative mutation. Rarely, very tightly linked passengers exhibit complete depletion and are not distinguishable from linked causative mutations. In the diagram, the red star represents a causative mutation from some mutant; green or orange indicates a chromosome lacking or containing a causative mutation. Mutations from the causative chromosome that are distant from the causative mutation can readily recombine onto a neutral chromosome. The closer the neutral mutation is to the causative mutation, the lower its frequency in the Ts+ pool (schematics at right). (C) Tetrad analysis of random zygospores of the pools confirms that most parental diploids contain two unlinked Ts− mutations, as expected. PD, parental ditype: 0 Ts+:4 Ts−; TT, tetratype: 1 Ts+:3 Ts−; NPD, nonparental ditype: 2 Ts+:2 Ts−. The net yield of Ts+ is 23% (expected 25%). 
(D) Plotting frequencies of all predefined lesions on two chromosomes (12 and 16) within the Ts+ pool (far left) and after computational extraction of


from such mutants are severely underrepresented. This problem will interfere with the identification of causative mutations that reduce the efficiency of mating or meiosis. We manually curated the lists of assigned mutations to identify candidate lesions of interest based on annotations, but this procedure was generally not fruitful due to the high number of candidates. We applied MAPS-SEQ to ∼350 mutants containing mutations blocking cell division and identified their causative lesions (Figure 5). Table 1 shows functional annotation analysis of the new genes detected in this study. These genes span a broad range of cell biological pathways (for a complete list of mutants and identified genes, see Supplemental Data Set 2).

Evaluation of Causality Identification

Identification of the characteristic V pattern provides prima facie evidence in favor of the causative mutation mapping to the base of the V, leading to the hypothesis that the CDS-changing mutation (or one of the CDS-changing mutations) found at the base is causative. We have tested this in multiple ways. We employed genetic tests for 27 mutants. First, for some, we tested cosegregation of the candidate mutation with Ts lethality in a cross to the wild type using allele-specific PCR (Gaudet et al., 2009). In all cases tested, we observed the expected cosegregation, implying close linkage of the causative mutation to the candidate mutation. Second, some candidate causative lesions are in genes previously identified as bearing independent causative mutations. In such cases, we test the new mutant for failure to complement the previous mutation in heterozygous diploids and for failure to recombine to produce Ts+ progeny in meiosis (Breker et al., 2016). Third, we can select for reversion of temperature sensitivity and determine whether the revertants alter the coding sequence at or near the site of the candidate mutation, implying that this mutation was causative. We isolated revertants for predicted mutants in CYCB1 (CYCLIN B) and in CDC20 (CELL DIVISION CYCLE20) (Figure 2D) and found intragenic reversion (Supplemental Figure 4), confirming causality. Information about gene validation is included in Supplemental Data Set 2. Finally, to determine the reliability of the whole population of assigned causative mutations, we examined statistical features of the candidates compared with the pool of passenger mutations. We showed previously (Tulin and Cross, 2014; Cross et al., 2017) that causative mutations leading to Ts lethality were systematically more likely than passengers to fall in regions of protein sequences conserved in the land plant Arabidopsis thaliana; additionally, causative mutations are on average more “severe” (using the Blosum62 scoring matrix; Henikoff et al., 2002) than passengers.

The large majority of mutations are sure to be passengers, which were therefore not depleted in the Ts+ pool. Consistently, undepleted mutations were preferentially found in higher numbered BLAST/Blosum categories due to less severe mutation and/or low sequence conservation (Figure 6). Among the depleted mutations (candidate drivers), some fell in genes that we identified previously as yielding Ts-lethal mutations (Tulin and Cross, 2014). We take these mutations as highly likely to be causative (as noted above, where tested, this was confirmed by complementation testing against the previous isolates). These mutations are highly enriched in the BLAST/Blosum categories that indicate more severe mutation and/or high sequence conservation, consistent with previous findings (Tulin and Cross, 2014; Cross et al., 2017) (Figure 6). This discrimination thus provides a statistical test for depleted mutations in genes not identified in our previous study. These showed a BLAST/Blosum distribution very similar to the mutations in the previously verified genes and very different from the bulk population of presumed passenger mutations. This suggests that true causative mutations in previously unidentified genes are highly enriched in the depleted set. Finally, in some cases, more than one closely linked CDS-changing mutation was depleted. Most likely, one of these is causative and the others are closely linked passengers. The depleted mutations from these mutants gave an apparently bimodal distribution among the BLAST/Blosum categories that suggested the presence of both causative and passenger mutations. For almost all such mutants, just one of these linked candidates was in one of the lower numbered (higher causation probability) categories; it is very likely that this one is causative. This idea was tested directly by genetic means (reversion or complementation testing; see above) for two such mutants and found to be correct.
From this statistical analysis, we conclude that the depleted mutations are very strongly enriched in the causative subset, and in occasional cases where more than one candidate is detected by depletion, analysis of BLAST/Blosum category will identify the causative one with high probability, consistent with previous results (Tulin and Cross, 2014).

Phenotypes of Gene Inactivation

Because the primary biological focus of the screen was cell cycle regulation, we used DNA flow cytometry of synchronized cells as a primary classifier of newly isolated mutants. Chlamydomonas has a long G1, during which cells can grow more than 10-fold, followed by multiple rapid cycles of DNA replication and mitosis. These division cycles occur within the mother cell wall; after approximately four to five divisions, the 16 to 32 daughters stop dividing, develop flagella, and hatch from

the mutations on these chromosomes specific to mutants A and B. For Chr. 12-Mutant B or Chr. 16-Mutant A, all mutations are passengers; Chr. 12-Mutant A and Chr. 16-Mutant B exhibit the pattern indicative of a causative genetic lesion: decreasing frequency along the chromosome, up to complete depletion (a V shape) (compared with expectation sketched in Figure 2B). The expected slope of this V shape is dependent on the mutant proportion within the pool and the conversion between physical and genetic distance. In Chlamydomonas, the conversion is ∼10 cM per 1 Mb (Merchant et al., 2007; Tulin and Cross, 2014), resulting in an expected slope of ∼0.6 for an average coverage of 3%, approximately as observed.
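The slope quoted above can be checked with a short calculation. The sketch below is illustrative and is not the authors' pipeline. It assumes that a passenger at recombination fraction r from the causative lesion appears in the Ts+ pool at a frequency of 2·p·r, where p is the pool frequency of an unlinked passenger (so r = 0.5 recovers p); it reads the "coverage of 3%" as p = 0.03; and it uses Haldane's map function to convert genetic distance into r. The function name and its parameters are hypothetical.

```python
import math

def expected_passenger_freq(p_unlinked, dist_mb, cm_per_mb=10.0):
    """Expected Ts+ pool frequency of a passenger mutation lying dist_mb
    megabases from the causative lesion (illustrative model, see lead-in)."""
    morgans = abs(dist_mb) * cm_per_mb / 100.0
    # Haldane map function (an assumption): recombination fraction saturates at 0.5.
    r = 0.5 * (1.0 - math.exp(-2.0 * morgans))
    # Unlinked passengers (r = 0.5) sit at p_unlinked; the causative site (r = 0) at 0.
    return 2.0 * p_unlinked * r

p = 0.03  # assumed pool frequency of an unlinked passenger
for d in (0.0, 0.5, 1.0, 2.0, 5.0, 20.0):
    print(f"{d:5.1f} Mb -> {100 * expected_passenger_freq(p, d):.2f}%")

# Slope of the V near its base, in percentage points per Mb.
slope_pct_per_mb = 100 * expected_passenger_freq(p, 0.001) / 0.001
print(f"slope near the base: {slope_pct_per_mb:.2f} percentage points per Mb")
```

Under these assumptions the slope near the base of the V is about 0.6 percentage points per Mb, in line with the value quoted in the caption; the shape farther from the base depends on the map function chosen.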

Figure 3. (continued).


Figure 4. Sequencing of Ts+ Meiotic Progenies Pool Allows Multiplexed Identification of Causative Genetic Lesions.

(A) In most (77%) mutants, such a V shape is observed in exactly one of the 17 chromosomes. Mutants were classified as “single-depleted region” mutants as shown. Once the depleted region was identified, close examination of the mutations within this region gave two possible results. In most cases, one or more CDS-changing mutations (marked in blue) were identified and scored as candidate causative mutations. (B) Mutant Seq4-AH chromosome 12. Arrow indicates candidate causative CDS-changing mutation. This is the only CDS-changing mutation that is completely depleted. (C) Rarely, no CDS-changing (marked in red circles) mutations were identified in the depleted region (Seq1-CE chromosome 7). The latter constitutes the “unsequenceable” class.


the mother cell wall to begin another growth cycle. Since the mutations are temperature conditional, phenotypic analysis of temperature-shifted cultures provides an immediate view of functional consequences of inactivation of these genes. Therefore, DNA flow cytometry on wild-type cells in the middle of a partially synchronous division phase results in a mix of large (undivided) cells with 1C, 2C, 4C, 8C, and 16C DNA content and small newborn hatched daughter cells with 1C DNA content (in the latter, cells have left-shifted apparent DNA content due to small cell size). We chose a single time point (14 h at restrictive temperature) since in a preliminary wild-type time course, the highest proportion of dividing cells was observed at that time. Two-parameter flow cytometry of DNA content by Sytox staining and measurement of forward scatter (reflecting cell size) gives characteristic patterns that with appropriate gating can yield a quantitative assessment of all these populations (Figure 7). These gatings, when applied to mutants that have just reached their arrest points, then provide a quantitative readout of the arrest. For example, the cdc20 (APC-activating subunit) mutant yields a predominant accumulation of large cells with 2C DNA content (Figures 7 and 8).

We used this vector of gated cell counts for hierarchical clustering of 187 independent Ts-lethal mutations (Figure 7C) and observed strong clustering based on a few simple patterns (discussed below; complete data for all mutants tested are available in Supplemental Data Set 3). Similar results were obtained with a different flow cytometry method (Supplemental Figure 5E).

In this data set, approximately half of the mutants were represented by two siblings from backcrosses to the wild type. These “pairs” allow an estimate of reproducibility for a given mutation to be obtained. As shown in Supplemental Figure 6, pairs of siblings are much closer to each other in Euclidean distance than are random pairs, supporting the reliability of the assay. This finding implies that despite the ∼70 coding sequence changes in an average mutant clone, in most cases, only one change has significant phenotypic consequences; the remainder are near neutral. In some cases, we do detect phenotypic alterations due to moderately deleterious background mutations. The most typical such result is a moderate slowing in cell cycle progression; the final phenotype is almost never affected. This slowing does result in some cases in mutants being clustered with lower replication levels than achieved in their final arrest or in the arrest of backcrossed strains; some examples are noted in Supplemental Data Set 1.

Different mutant alleles in the same genes were also closer in Euclidean distance than random pairs, indicating reproducible phenotypes of independent mutations in most target genes; however, these distances were greater than between siblings, suggesting (as is generally the case) that different alleles may inactivate gene function to different degrees (Supplemental Figure 6).

Overall, the hierarchical clustering yielded eight definitive clusters, as shown in Figure 7C. The three most occupied groups are the 1C, 2C, and >2C groups (Table 2). Additional groups are the “Intermediates,” with high frequency of cells with intermediate signal between the discrete DNA peaks: 1/2/4C and 1/2C, which arrest with a mixed population in a few DNA peaks (1C, 2C, 4C or 1C, 2C, respectively), and “Fragiles” (Db), for which much of the sample is cell fragments. (Separately, we determined that the fragile mutant cells on an agar plate were very easily disrupted by gentle manipulation with a dissecting needle.)
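The sibling comparison described above can be sketched in a few lines. This is a minimal illustration, not the authors' analysis: the profiles are made-up fractions of cells per gate, the sample names are hypothetical, and the per-gate z-score normalization is an assumption suggested by the color key of the clustergram.

```python
import math
import statistics

def zscore_columns(profiles):
    """Normalize each gate (column) to mean 0 and unit standard deviation."""
    columns = list(zip(*profiles.values()))
    means = [statistics.mean(c) for c in columns]
    sds = [statistics.pstdev(c) or 1.0 for c in columns]  # guard constant columns
    return {name: [(v - m) / s for v, m, s in zip(vec, means, sds)]
            for name, vec in profiles.items()}

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical fractions of cells per gate: (newborn, 1C, 2C, 4C, 8C).
profiles = {
    "mutA_sib1": [0.05, 0.85, 0.06, 0.03, 0.01],
    "mutA_sib2": [0.04, 0.88, 0.05, 0.02, 0.01],
    "mutB_sib1": [0.03, 0.10, 0.80, 0.05, 0.02],
    "mutB_sib2": [0.02, 0.12, 0.78, 0.06, 0.02],
    "mutC_sib1": [0.02, 0.08, 0.15, 0.45, 0.30],
    "mutC_sib2": [0.03, 0.07, 0.17, 0.43, 0.30],
}
normalized = zscore_columns(profiles)
names = sorted(normalized)

sibling, other = [], []
for i, a in enumerate(names):
    for b in names[i + 1:]:
        same_mutant = a.split("_")[0] == b.split("_")[0]
        (sibling if same_mutant else other).append(
            euclidean(normalized[a], normalized[b]))

mean_sibling = statistics.mean(sibling)
mean_other = statistics.mean(other)
print(f"sibling pairs: {mean_sibling:.2f}; non-sibling pairs: {mean_other:.2f}")
```

With real data, the non-sibling distances would come from random pairs drawn across the whole collection rather than from every pair in a six-sample toy set.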

Figure 5. Representative Yield from Applying MAPS-SEQ to a Large Collection of Ts-Lethal Mutants.

(A) Seventy-seven percent of 340 mutants tested with this procedure were identified with a single depleted (Dep.) region, and for 82% of them, MAPS-SEQ enabled the identification of the exact genetic lesion. Thirteen percent of these were “known genes” (KG), essential genes identified in our previous work (Tulin and Cross, 2014); 50% are “new genes” (NG), candidate essential genes identified in this study; 14% of mutants are “unsequenceable” (US), with a single depleted region, but with no identified CDS-changing mutation. The rest of the mutants (23%) represent poor maters, mutants without depletion, or mutants with multiple depleted regions (PM, ND, and MD, respectively). These are discussed in the text. (B) Within the new genes group, 90% have a single candidate causative lesion, and for 10%, we have detected more than one candidate lesion (usually two), likely representing a tightly linked passenger(s) (see Figure 3).
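The percentages in panel (A) are mutually consistent if the class percentages (KG, NG, US, and the remainder) are read as fractions of all 340 mutants. That reading is an interpretation, and the short check below only does the arithmetic; the per-class mutant counts it prints are rounded estimates, not reported values.

```python
total_mutants = 340

# Percentages of all tested mutants, as given in the caption.
pct = {"known genes": 13, "new genes": 50, "unsequenceable": 14, "PM/ND/MD": 23}

single_depleted_pct = pct["known genes"] + pct["new genes"] + pct["unsequenceable"]
identified_pct = pct["known genes"] + pct["new genes"]

# Share of single-depleted-region mutants for which a lesion was identified.
identified_share = 100 * identified_pct / single_depleted_pct

for label, value in pct.items():
    approx = round(total_mutants * value / 100)
    print(f"{label:15s} {value:3d}%  ~{approx} mutants")
print(f"single depleted region: {single_depleted_pct}%")
print(f"lesion identified within that class: {identified_share:.0f}%")
```

The three single-region classes sum to 77%, and 63/77 rounds to the 82% quoted for identified lesions.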


The 1C Cluster

The largest group by far accumulated almost exclusively cells with 1C (unreplicated) DNA at the blocked stage. Previously (Tulin and Cross, 2014), we noted that many G1-arresting mutants exhibit detectably slowed cell growth compared with the wild type, although the mutant cells fail to divide even after eventually achieving wild-type division size. This was true of many genes in the 1C cluster reported here as well. Thus, for some of these genes, the block in G1 could reflect a general monitoring of cell size/growth rate governing G1/S, so that slowing cell growth could cause G1 arrest in a relatively nonspecific manner. For other mutants, G1 arrest could reflect a specific involvement of the mutated gene in a novel regulatory pathway. A mutation inactivating the cyclin-dependent kinase homolog CDKA1 was shown previously to strongly delay (but not block) cell cycle initiation (Tulin and Cross, 2014). cdka1-1 cells grow at normal rates during this elongated G1, reaching very large cell size (much larger than the 1C class mutants described above). Intriguingly, cdka1-1 is synthetic lethal at permissive temperature with a subset of category 1 genes (Atkins, 2018), as well as with a null mutation in the MED6 mediator complex subunit gene (Tulin and Cross, 2014), suggesting a functional relationship between some general cell growth pathways and CDKA1 function.

The 1C class also contained mutants that could carry out cell cycle initiation, as reflected by the formation of incipient cytokinetic cleavage furrows (Tulin and Cross, 2014) (which occurs at the time of the first division in the wild type), but that showed a specific failure of the initiation of DNA replication. Eight genes annotated for a role in DNA replication fit this description (Table 2; e.g., the helicase gene RECQL4 and RIR1 [encoding the large subunit of ribonucleotide reductase]).

The sole Chlamydomonas cyclin B gene CYCB1 contributed to the 1C cluster, but in a complex manner. Different CYCB1 alleles were variably classified as 1C, 2C, or 4C (Supplemental Data Set 1). This probably reflects multiple roles of CYCB1 in promoting DNA replication and perhaps independently promoting mitosis, and perhaps also regulating DNA rereplication (Atkins and Cross, 2018). In yeast, the six B-type cyclins carry out all these roles, although with substantial division of labor (Bloom and Cross, 2007).

The 2C Cluster

The 2C cluster (most arrested cells with 2C DNA content) includes almost all mutations recovered in components of the anaphase-promoting complex (APC): mutations in APC2, 6, and 8 and CDC20, with multiple alleles in several cases (Figure 8). In a separate work, we found that when these mutants become arrested, they exhibit nearly uniform metaphase spindles (Atkins and Cross, 2018). In yeast, almost any mutation that prevents the assembly of a functional mitotic spindle results in metaphase arrest with once-replicated DNA because the spindle assembly checkpoint inactivates APC-CDC20. As discussed in the next section, such checkpoint controls appear much less effective in Chlamydomonas, accounting for the high specificity within our screen of 2C arrest to APC mutations. Some mutants in the 2C cluster are annotated for roles in DNA replication; we assume these are partially functional alleles that manage at least some replication before arrest. Additional genes in the 2C cluster lack annotations that are immediately informative as to their functional connection(s) to replication or mitosis; these are subjects for future work. Interestingly, all APC mutations, but no other mutations in the 2C cluster, strongly blocked cleavage furrow formation. This was noted previously for one such mutation (cdc27-6; Atkins and Cross, 2018). It is currently unknown what APC target might regulate furrowing; CYCB1 was suggested as a candidate (Atkins and Cross, 2018).

>2C Mutants and Ploidy Control

Some mutants accumulated >2C DNA content when blocked at nonpermissive temperature. In most eukaryotic cells, DNA replication occurs in strict alternation with nuclear division, whether cells are mono- or multinucleate, maintaining control of ploidy at the level of individual nuclei. (An exception is found in endoreduplicating tissues, with multiple rounds of DNA replication without nuclear division.) Strict alternation means that DNA content

Table 1. MAPS-SEQ Has Broad Applicability for the Discovery of Essential Genes Functioning in Diverse Biological Pathways

Functional Annotation | No. of Genes | Examples of Likely Orthologs Defined by BLAST
Cell cycle machinery | 41 | APC1, APC2, APC3, APC4, APC6, APC8, CDC20, AUR2, SMC1, SMC2, SMC3, SMC4, POLA, POLD, POLE
General transcription | 9 | POLII, SPT16, TFIIH, SPT6
Splicing | 4 | SRPK, PRP4, PRP8
Translation | 3 | GTS2, DOH1, TSK2
Protein modification | 3 | UBC12, SENP7, CLPP4
Stress/protein folding | 9 | IRE1, HSF, CCT1, SNF1
Glycosylation/secretion | 11 | UDP-galactose and -rhamnose epimerase, UDP-sugar transport, PGI, NSF
Transporters | 3 | Magnesium transporter, sulfate transporter
Other/unknown | 19 | GIF3, DUO3, clathrin, NUP205, Exportin-2, MIDASIN, BSL1

Classes of functional annotations of the genes newly discovered as sequence-identified Ts-lethal alleles in this work, the number of identified genes in each class, and examples of their land plant orthologs are shown.


per nucleus will range between 1C and 2C, independent of the number of nuclei in a cell; multiple rounds of replication without nuclear division will result in nuclear DNA content >2C. Since Chlamydomonas cells in the rapid division phase stay within the mother cell wall, a given cellular DNA content can correspond to one or many nuclei (e.g., a 4C cell could have four 1C nuclei, two 2C nuclei, or one 4C nucleus; only the last case provides evidence of loss of ploidy control). To resolve this ambiguity, we used the Amnis imaging flow cytometer, which captures fluorescent images of each individual Sytox-stained cell, allowing a count of nuclei in the cell while simultaneously quantifying total cellular DNA content (Figure 9; Supplemental Figures 5 and 7). For wild-type Chlamydomonas, we observed an average ratio of DNA/nucleus of <1.5 C, averaging over cells with total

Figure 6. Conservation and Evolutionary Constraints Confirm the Likelihood of Candidate Causative Lesions.

As described (Tulin and Cross, 2014), we classify mutations according to sequence conservation and severity. Category A: mutation falls within a segment of BLAST alignment to Arabidopsis (HSP) and alters a conserved residue (Blosum62 > 0 for Chlamydomonas versus Arabidopsis) within this segment. Category B: mutation falls within an overall conserved region but alters an unconserved residue (Blosum62 ≤ 0). Category C: mutation lies N-terminal or C-terminal to all detected HSPs. Category D: mutation in gene with no Arabidopsis BLAST hit. Bar graphs demonstrate the distribution of the identified mutations in this work according to the classified groups. Within each category, mutations are classified according to severity: severe mutation (Blosum62 <−1); less severe mutation (Blosum62 ≥−1). (A) Distribution of all mutations (including both the depleted ones from the Ts+ pool and presumably passenger mutations that appear in the pool). (B) Distribution of mutations depleted in the Ts+ pool (potential drivers) within genes that were previously verified to be essential (Tulin and Cross, 2014) (known genes). (C) Distribution of mutations depleted in the Ts+ pool that are in genes newly identified in this work (new genes). (D) Distribution of depleted mutations for mutants that were assigned with more than one candidate (in nearly every case these mutations are closely linked). The distribution in (D) suggests bimodality (a combination of the patterns in [A] and [B]), as would be expected for a mixture of driver and linked passenger mutations in this class (see text). (E) The essential genes set (middle) is strongly enriched (78%) for orthologous genes in Arabidopsis compared with the total Chlamydomonas proteome (left) (P < 0.00001). In contrast, the nonessential gene set (right; defined as genes with passenger mutations that are candidate null alleles; Supplemental Data Set 1) is depleted for orthologous gene content compared with the total proteome (left) (P < 0.00001).
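The category and severity rules in this caption translate directly into two small decision functions. The sketch below is a hypothetical illustration of those rules, not the authors' code: function names, argument names, and the example scores are invented, and it simplifies by assigning any mutation outside every HSP to category C.

```python
def conservation_category(has_arabidopsis_hit, in_hsp, aligned_blosum62=None):
    """Return 'A' to 'D' following the category definitions in the caption.

    aligned_blosum62 is the Blosum62 score of the Chlamydomonas residue
    against the aligned Arabidopsis residue; it is only used when in_hsp
    is True.
    """
    if not has_arabidopsis_hit:
        return "D"  # no Arabidopsis BLAST hit
    if not in_hsp:
        return "C"  # outside all detected HSPs (simplification, see lead-in)
    return "A" if aligned_blosum62 > 0 else "B"

def severity(substitution_blosum62):
    """Classify the amino acid change itself (wild-type versus mutant residue)."""
    return "severe" if substitution_blosum62 < -1 else "less severe"

# Hypothetical inputs: (has hit, in HSP, aligned score, substitution score).
examples = {
    "conserved residue, drastic change": (True, True, 5, -3),
    "unconserved residue, mild change": (True, True, -1, 1),
    "outside all HSPs": (True, False, None, -2),
    "no Arabidopsis hit": (False, False, None, 0),
}
calls = {
    label: (conservation_category(hit, hsp, aligned), severity(subst))
    for label, (hit, hsp, aligned, subst) in examples.items()
}
for label, (category, sev) in calls.items():
    print(f"{label}: category {category}, {sev}")
```

A depleted candidate in category A with a severe substitution is the profile the text associates with a higher probability of being causative.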


Figure 7. Functional Grouping of Mutants Based on Hierarchical Clustering of DNA Content Profiles.

Dot plots of DNA content (intensity of SYTOX green staining) and cell size measured by forward scatter (FSC). Gates are applied to extract the number of cells in each DNA peak. Gate colors: gray, newborns; cyan, 1C large (pre-S phase); red, 2C; orange, 4C; purple, 8C; brown, 16C; black (dotted circle), cell aggregates. (A) In a wild-type sample there are multiple peaks of DNA content, reflecting sequential rounds of replication. (B) The cdc20 mutant arrests with mostly 2C. (C) Numbers of cells in each gate are extracted; accordingly, each sample has a characteristic vector of numbers. (D) Hierarchical clustering of all samples based on the DNA content profiles. Letters on the right indicate the definitive groups that emerge from the clustering. Letters below the clustergram indicate the discrete peaks of DNA content. NB, newborn; Agg, aggregates; Db, debris; Int, intermediates. The color key on the left provides the standard deviations from the mean (=0) for the normalized values in the clustergram.
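Step (C), turning gated events into a per-sample vector, can be illustrated as follows. This is a rough sketch under strong assumptions: real gates are drawn on the dot plots and need not be rectangles, and every gate boundary, event value, and name below is hypothetical.

```python
# Hypothetical rectangular gates: name -> (dna_min, dna_max, fsc_min, fsc_max).
GATES = {
    "newborn": (0.0, 1.5, 0.0, 30.0),
    "1C": (0.5, 1.5, 30.0, 100.0),
    "2C": (1.5, 3.0, 30.0, 100.0),
    "4C": (3.0, 6.0, 30.0, 100.0),
    "8C": (6.0, 12.0, 30.0, 100.0),
    "16C": (12.0, 24.0, 30.0, 100.0),
}

def gate_vector(events, gates=GATES):
    """Count events per gate and return the fraction of gated events in each."""
    counts = {name: 0 for name in gates}
    for dna, fsc in events:
        for name, (d0, d1, f0, f1) in gates.items():
            if d0 <= dna < d1 and f0 <= fsc < f1:
                counts[name] += 1
                break  # gates are treated as mutually exclusive
    total = sum(counts.values())
    return {name: (n / total if total else 0.0) for name, n in counts.items()}

# Toy events as (DNA signal in C units, forward scatter); values are made up
# to resemble a sample arrested mostly at 2C.
arrested_2c = [(2.0, 60.0)] * 80 + [(1.0, 50.0)] * 15 + [(0.8, 20.0)] * 5
vector = gate_vector(arrested_2c)
print(vector)
```

Vectors of this kind, one per sample, are the input that a clustering step such as the one in panel (D) would operate on.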


cellular DNA contents of 1C, 2C, 4C, and 8C (Figure 9). This indicates strict alternation of DNA replication and nuclear division, as shown previously by microspectrophotometry (Coleman and Maguire, 1982). We tested 177 mutants in this assay and observed ploidy ratios <2 for almost all mutants (Figure 9D). However, mutations in 15 genes resulted in severely defective ploidy control; for example, the spc25 mutant (Figure 9C) goes through repeated rounds of DNA replication with no nuclear division, resulting in a ploidy ratio of 4.6. Most of these mutants had lesions in genes for which annotations suggest a function in the formation of mitotic structures (spindle and condensed chromosomes; for example, SPC25 has homology to a known kinetochore component) (Supplemental Data Set 1). We have checked a subset of these mutants and found that they lack normal spindle structures at their blocks (data not shown), confirming previous results that suggested a weak spindle checkpoint (Tulin and Cross, 2014). Most mutants in the high-ploidy cluster form one or more apparent cleavage furrows, as judged by light microscopy, indicating independence of initial cytokinesis from spindle formation. It is interesting that mutants with lesions in two subunits of condensin, SMC2 and SMC4, are in the high-ploidy class. This may reflect the involvement of the condensin complex in the construction or functioning of the spindle (Wignall et al., 2003).
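The ploidy ratio can be illustrated with a small calculation. The sketch below assumes the ratio is the cell-weighted mean of total DNA content divided by the average number of nuclei in each peak; the exact averaging used in the study is not stated here, and all counts are hypothetical values chosen only to contrast the two behaviors.

```python
def dna_per_nucleus(peaks):
    """peaks maps total DNA content (in C units) to a tuple of
    (number of cells, average nuclei per cell).
    Returns the per-peak DNA content per nucleus and the cell-weighted mean."""
    per_peak = {c: c / avg_nuclei for c, (_, avg_nuclei) in peaks.items()}
    n_cells = sum(n for n, _ in peaks.values())
    weighted = sum(per_peak[c] * n for c, (n, _) in peaks.items()) / n_cells
    return per_peak, weighted

# Hypothetical wild-type-like sample: nuclei divide after each replication round.
wild_type_like = {1: (400, 1.0), 2: (200, 1.4), 4: (200, 3.0), 8: (200, 6.0)}
# Hypothetical mutant that keeps replicating without nuclear division.
no_division = {1: (100, 1.0), 2: (200, 1.0), 4: (300, 1.0), 8: (400, 1.0)}

wt_peaks, wt_ratio = dna_per_nucleus(wild_type_like)
mut_peaks, mut_ratio = dna_per_nucleus(no_division)
print(f"wild-type-like ratio: {wt_ratio:.2f}")
print(f"no-division ratio: {mut_ratio:.2f}")
```

In the first sample, DNA per nucleus stays between 1C and 2C in every peak; in the second it doubles with each peak, which is the pattern the text describes for loss of ploidy control.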

Ploidy Control and Cyclin-CDK Regulation

Three other genes exhibited a suggestive high-ploidy signal, though less extreme than that observed with the spindle assembly mutants. These genes encode the plant-specific cyclin-dependent kinase CDKB, the CKS1 subunit of CDK-cyclin complexes, and the APC1 subunit of the anaphase-promoting complex (Supplemental Data Set 1). CDKB is the main mitosis-inducing CDK in Chlamydomonas (Tulin and Cross, 2014). The cdkb1-1 allele described previously arrests robustly with a single nucleus and once-replicated DNA (Tulin and Cross, 2014). The cdkb1-2 and cdkb1-3 mutants tested here show leakage

Table 2. Hierarchical Clustering Based on DNA Content Yields Distinct Functional Categories within the Chlamydomonas Cell Cycle Mutants

DNA Content | Typical Annotations | Gene (No.) | Orthologous Genes in Arabidopsis
1C | Growth related; DNA replication | 50 | 76%
2C | APC subunits; DNA replication; CDKB | 13 | 84%
4C/8C | Spindle formation | 25 | 80%

Clustering all tested mutants according to their DNA content profile provides three major functional groups, each highly occupied. Listed in the table are the groups, including the number of genes in each one, and their functions as suggested by characteristic annotations. Mutants that had an intermediate phenotype were counted in the nearest category that fits their DNA profile (1/2C were either included in 1C or 2C; 1/2/4C were included in 4C/8C).
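To give a feel for how far the ortholog percentages in Table 2 lie from the proteome-wide background, the sketch below computes an exact binomial upper-tail probability for each cluster. It is illustrative only: the 21% background is the proteome-wide figure quoted in the section on conservation in land plant genomes, the ortholog counts are reconstructed by rounding the reported percentages, and the binomial model is an assumption rather than the test used in the study.

```python
from math import comb

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

background = 0.21  # fraction of Chlamydomonas proteins with an Arabidopsis ortholog

# (genes in cluster, reported percentage with orthologs) from Table 2.
clusters = {"1C": (50, 76), "2C": (13, 84), "4C/8C": (25, 80)}

results = {}
for name, (n_genes, pct) in clusters.items():
    k = round(n_genes * pct / 100)  # approximate count reconstructed from the percentage
    results[name] = (k, binomial_upper_tail(k, n_genes, background))
    print(f"{name}: {k}/{n_genes} with orthologs, P(X >= k) = {results[name][1]:.2e}")
```

Even the smallest cluster lies far in the upper tail under this simple model, which is the qualitative point behind the enrichment described in the text.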

Figure 8. Mutants in APC Subunits and Some DNA Replication Factors Arrest after the First Round of Replication.

DNA-stained samples were analyzed by flow cytometry, and gates were applied as described in Figure 7. Hierarchical clustering of all samples tested reveals distinct functional groups, among them arrested samples at 2C that are mostly composed of APC subunits (A) and DNA replication factors (B). NB, newborn; Agg, aggregates; Db, debris; Int, intermediates. The color key on the left provides the standard deviations from the mean (=0) for the normalized values in the clustergram.


through this block: Some cells replicate a second time to 4C, and these 4C cells were mostly mononucleate, indicating partially defective ploidy control. Based on the imaging flow cytometry data, we examined all mutants with a significant 4C population and determined the proportion of those 4C cells that appeared mononucleate (as noted above, this is a hallmark of failure of ploidy control, since a 4C mononucleate cell must come from two rounds of replication without intervening nuclear division). This is a more sensitive quantitative assay for ploidy control than the overall ploidy ratio, since it focuses only on the subset of cells that get to 4C (and thus have the potential to exhibit ploidy control failure). In addition to the mutants already described, this sensitive assay detected sporadic failure in ploidy control in cells mutant for APC4 and APC6, since the rare 4C cells frequently contain only single nuclei (Figure 9F). As noted above, we isolated multiple alleles in the sole Chlamydomonas cyclin B. Several CYCB1 alleles clustered with the 1C-arresting mutants, suggesting a role for cyclin B in replication initiation (Supplemental Data Set 1; Atkins and Cross, 2018). However, other alleles accumulate higher DNA content (2C and 4C); these alleles also showed a low level of defective ploidy control (cycb1-1 and cycb1-8; Figure 9F). Thus, in addition to mutations affecting spindle morphogenesis, ploidy control defects were highly specific to cyclin B, CDKB, and the APC. The cks1-1 mutant also had a partial ploidy control defect; CKS1 is an essential subunit of CDK complexes in yeast and animals, and it was recently shown to play a direct role in activation of the APC by cyclin B/CDK1 (Fujimitsu et al., 2016; Qiao et al., 2016; Zhang et al., 2016). In yeast and animals, replication origin loading with the MCM helicase complex is a key step in controlling DNA replication.
CDK complexes both promote replication initiation from loaded replication origins and block reloading of replication origins after initiation (Bell and Dutta, 2002). This system may couple a single round of DNA replication to a single cycle of CDK activity. In animals, the APC also directly regulates origin reloading by promoting proteolysis of the loading inhibitor geminin (McGarry and Kirschner, 1998). It is unknown how replication origin reloading is controlled in the plant kingdom; our results suggest a complex role for CYCB1/CDKB1 (perhaps in complex with CKS1) and the APC in regulating replication and ploidy control.

Insights from Annotations: “Mystery Genes”

Most of the genes in the mutant collection are reasonably expected to directly affect cell cycle regulation based on likely function, as suggested by alignment and annotations, compared with results for similar or orthologous genes from fungi and animals (e.g., ORC and DNA polymerase subunits, cyclin B, γ-tubulin ring complex subunits). However, this is not the case for a minority of detected genes (Supplemental Data Set 4 lists 47 such genes, combining results from this work and from Tulin and Cross [2014]). Thirty-eight of these 47 genes are in the G1 cluster. Nine of these have little detectable growth deficit, but still arrest quantitatively with 1C DNA content (examples: Cre09.g400478 [phosphatidylinositol 4P-5 kinase PIP5K9 ortholog] and Cre06.g278197 [desumoylating protease SENP7 ortholog]).

The mutants we analyzed were collected based on Ts lethality, then screened for near-wild-type cell growth proficiency combined with failure to complete division. These simple criteria were chosen deliberately to avoid being too restrictive. However, as a consequence, some of the mutated genes may be needed for essential processes not directly related to the cell cycle, but pass the criteria, for example, due to phenotypic lag allowing significant cell growth before arrest. Further work will be required to evaluate how directly the listed genes are involved in cell cycle regulation. In any case, it is surprising that based on annotations, a large majority of genes identified by these simple criteria have been clearly implicated in cell cycle regulation or execution based on prior work in fungi, animals, and/or land plants. This global finding implies that the gene set for core cell cycle regulation is largely unchanged since the time of the last common ancestor of animals and plants, very early in eukaryotic evolution (Rogozin et al., 2009). Because our procedure for identification of causative mutations is annotation independent, plant-kingdom-specific cell cycle regulators should have been detected with equal efficiency to regulators conserved across eukaryotes. Plant-kingdom-specific regulators do exist (e.g., the serine/threonine-protein phosphatase BSL1 and CDKB1; Tulin and Cross, 2014; Atkins and Cross, 2018) but represent only a small proportion of the total. It is important to note that sequence homology, or even orthology, does not allow findings from yeast and animals to be directly imported to the plant kingdom: Orthology is neither necessary nor sufficient for equivalence of function (Cross et al., 2011). The conditional mutants obtained here will be important tools to evaluate this.
For example, while CDK1 is the proximal activator of mitosis in fungi and animals, its ortholog CDKA is dispensable for mitosis in Chlamydomonas and in land plants, instead functioning primarily in early cell cycle commitment; mitosis is instead promoted by the plant-kingdom-specific CDKB (Nowack et al., 2012; Tulin and Cross, 2014; Atkins and Cross, 2018).

Conservation of the Identified Genes in Land Plant Genomes

Chlamydomonas has the potential to serve as a powerful microbial model for multicellular land plant cell biology. For this to be an effective approach with the temperature-sensitive lethal mutations found here, it is important to have an understanding of the evolutionary conservation of the genes and their functions. While functional conservation is hard to address without detailed characterization, sequence conservation can be quantified. The Arabidopsis genome is the best understood among land plants, so we concentrated on this comparison. Best reciprocal BLAST analysis (Remm et al., 2001) identifies candidate orthologs, and the DOE Phytozome website has a genome-wide ortholog database generated with this method for Arabidopsis versus Chlamydomonas (https://phytozome.jgi.doe.gov/pz/portal.html). Only 21% of Chlamydomonas proteins are reported as having Arabidopsis orthologs; however, ∼75% of the genes identified here by Ts-lethal mutations encoded proteins in this subset. This striking enrichment confirms previous findings on a smaller sample of genes (Tulin and Cross, 2014; Cross et al., 2017) and strongly supports the idea that the essential gene set in Chlamydomonas provides a good model for cell-essential genes throughout the plant kingdom.

The frequent existence of paralogous sequences (gene duplicates) in land plant genomes but not in Chlamydomonas was noted in the Introduction. Consistently, almost half of the genes that we identified based on Ts-lethal mutations align by best reciprocal BLAST analysis to multigene families with high sequence conservation within the Arabidopsis genome (Supplemental Data Set 1). Genetic redundancy (very common with paralogous sequences) would then imply that in these cases, even complete null alleles of individual family members in Arabidopsis might have very mild or no phenotype (indeed, this is a very common observation in Arabidopsis, where combinations of multiple paralog knockouts are frequently needed for strong phenotypes suggesting function). Thus, approximately half of the cell-essential gene set overall might be extremely difficult to identify genetically in Arabidopsis solely for reasons of genetic redundancy, whereas this problem is almost nonexistent in Chlamydomonas for the orthologous gene set.

Figure 9. Imaging Flow Cytometry Used to Determine a Ploidy Ratio.

(A) Diagram of the Chlamydomonas cell cycle. (B) and (C) Wild-type (B) and spc25 (C) samples analyzed by image stream provide a similar pattern of discrete peaks, suggesting complete rounds of replication, as analyzed by conventional flow cytometry. The table provides the values for the average number of nuclei (AveNuc.) under each peak and the calculated DNA content per nucleus in each peak accordingly. Representative images for the wild type demonstrate the nuclei distribution and size within one cell before and after segregation (segregation was essentially unobserved in the spc25 mutant at any nuclear ploidy level). (D) Distribution of DNA content per nucleus calculated for all mutants. Black arrow marks wild-type value. (E) Mutants whose nuclei have average DNA content >2C were enriched for microtubule synthesis and spindle formation components (Supplemental Data Set 2; orange diamonds in [F]). Their DNA content per nucleus in each peak increases with each DNA replication round (colored lines), unlike the wild type, which maintains normal nuclear ploidy throughout the replication cycles (black lines). (F) For sensitive detection of defective ploidy control, we examined 71 mutant samples for which at least ∼10% of cells were in the 4C peak. We examined images captured by Amnis for these 4C cells specifically and determined the proportion of mononucleates. These images provide one orientation and one plane of focus from a cell; a binucleate cell will necessarily get called as mononucleate at some frequency. We attribute the large signal at ∼0.2 to this background; therefore, a signal above 0.25 is likely specific to a defect in ploidy control. Some of the discussed genes are marked. Orange diamonds correspond to mutants in (E) with average DNA content >2C per nucleus.

DISCUSSION

In this work, we screened many thousands of Ts-lethal mutations, selected ∼350 for detailed characterization, and applied the MAPS-SEQ method to identify 260 genomic regions and 150 candidate essential genes, many identified in multiple alleles. We also identified likely null alleles (premature stop codons or strong splicing mutations) in ∼700 nonessential genes.

In previous work with point mutants, identification of the causative mutation/gene has been a serious bottleneck. Compared with bulked segregant sequence analysis, which was the most efficient method from previous work (Tulin and Cross, 2014), MAPS-SEQ works with comparable accuracy to one-at-a-time bulked segregant sequence analysis, with (at least) dozens of mutants tested in a single experiment. The reason it is effective is that in bulked segregant sequence analysis, at every position the reads are either all wild type (almost always the case) or rarely will show some representation of a mutant sequence. Most positions are simply wild type, sequenced over and over for mutant after mutant. This is wasteful. In MAPS-SEQ, all mutants in the pool contribute representations of their mutations at each position, so many more positions are diverse and informative. In three other respects, MAPS-SEQ is better than bulked segregant analysis. First, mating and tetrad analysis are required for each individual mutant with bulked segregant analysis, while with MAPS-SEQ, one mating reaction and no tetrad analysis are required. Second, with 10 Ts− segregants pooled (the number we used previously), there is considerable variability in representation of any mutant region due to statistical variation in genetic composition of the ten pool members. With MAPS-SEQ, the total number of meiotic products contributing to the pool is many hundreds, allowing <1 cM resolution at sufficient sequence coverage; in addition, undepleted (neutral) mutations are present at a near-constant proportion (Figure 4). Finally, MAPS-SEQ only moderately increases needed sequence coverage, but drastically reduces labor-intensive upstream genetic manipulations and sample preparation (Supplemental Table 1). This means that the focus can be affordably placed on a broad mutant collection without highly specific criteria for inclusion (an advantage, since such criteria inevitably incorporate biases and incorrect ideas about how the “good” mutants should behave phenotypically).

It is of interest to ask how complete the current collection is with respect to the cell cycle. Zones et al. (2015) list 68 Chlamydomonas genes with annotations for DNA polymerases, the ORC and MCM complexes, the condensin and cohesin complexes, and cyclin-CDKs. We have Ts-lethal mutations in 32 of these genes (47%), distributed across these categories. We assume that this implies a similar ∼50% hit rate for all genes involved in cell cycle regulation, including for genes where annotations so far do not suggest such a connection. Since most biological pathways depend on multiple genes and proteins, availability of Ts-lethal mutations in 50% of cell cycle genes means that almost all such complexes can be inactivated conditionally by one or more independent mutations in the present collection. This should be a valuable resource for future work: The cell cycle impinges on nearly all aspects of cell biology, directly or indirectly, and this collection allows efficient determination of how this happens. Any assayable event can be monitored across a spectrum of Ts-lethal backgrounds blocking the cell cycle at diverse points, to obtain an informative first look at dependency relationships between cell cycle regulation and other events of interest (e.g., regulated transcription, photosynthesis, basal body/centriole/flagellar function, protein localization, and chloroplast division).

The MAPS-SEQ method is directly applicable to any haploid system with Mendelian genetics (Chlamydomonas, yeast, and fungi). It also should work with diploid multicellular organisms with modifications (conceptual mating and screening schemes are presented in Supplemental Figure 8). It works with any scorable trait (it is not restricted to essential genes) and will work with mutations in different pathways and different mutant phenotypes, provided these can be independently scored.

Phylogenetic analysis suggests plants diverged very early from nonphotosynthetic eukaryotes (Rogozin et al., 2009). Since the consensus models for most cell biology stem from yeast and animal studies, forward, unbiased screens are important to determine where plant kingdom-specific divergences might be found. A major limitation to most mutant hunts has been the expensive and labor-intensive sequence identification of the causative mutation. This led to the need for careful phenotypic selection of specific mutants for sequencing, meaning that most mutants remained unidentified; if phenotypic selection was biased in some way, then entire classes of mutants might be missed. The methods reported here can alter the balance, allowing the production of a large and unbiased collection of sequence-identified conditional mutants inactivating essential genes. Such a collection synergizes with recently developed systematic insertional disruption libraries knocking out nonessential genes (Zhang et al., 2014; Li et al., 2016) and, together with the developing ability for targeted genome editing in Chlamydomonas (Baek et al., 2016; Greiner et al., 2017), strongly enhances the utility of Chlamydomonas as a powerful genetic model for plant kingdom cell biology.
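The completeness argument in this Discussion (Ts-lethal alleles in 32 of 68 annotated genes, and the inference that most multi-gene complexes are therefore covered) can be illustrated with a short calculation. This is only a sketch under a simplifying assumption that is not stated in the text: each gene in a complex is hit independently and with the same probability. The function name and the example complex sizes are hypothetical.

```python
def prob_complex_hit(n_genes: int, per_gene_hit_rate: float) -> float:
    """Probability that at least one of n_genes carries a Ts-lethal allele.

    Assumes (for illustration only) independent, equal per-gene hit rates.
    """
    if n_genes < 1:
        raise ValueError("n_genes must be >= 1")
    return 1.0 - (1.0 - per_gene_hit_rate) ** n_genes


# Hit rate observed among the 68 annotated genes listed by Zones et al. (2015).
observed_rate = 32 / 68

# Hypothetical complex sizes, chosen only to show the trend.
for size in (1, 2, 4, 6):
    print(size, round(prob_complex_hit(size, observed_rate), 3))
```

Under these assumptions the chance of having at least one conditional allele rises quickly with the number of genes in a complex, which is the intuition behind the statement above; real hit rates are unlikely to be uniform across genes.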

METHODS

Strains, Mutant Isolation, and Genetic Characterization

Chlamydomonas reinhardtii congenic strains CC-4402 (mt+) and CC-4403 (mt−) were provided by Susan Dutcher (Department of Genetics, Washington University School of Medicine). Medium was prepared as described (Dutcher, 1995; Harris, 2008). Mutant isolation and complementation, linkage, and reversion analyses were as described (Breker et al., 2016).

Combinatorial Pooling

Individual mutants were grown as patches on agar TAP plates. Patches were scraped off, weighed, and suspended in liquid TAP (2:1 [w/w] TAP:pellet). Next, equal volumes from each mutant were pooled according to their unique occupancy pattern within the set of pools, resulting in ∼1 g of cell pellet for each pool.

Preparation of a Multiplexed Bulked-Segregant Pool

Fresh cultures of each mutant were suspended in nitrogen-free gamete-induction medium (Dutcher, 1995). Equal amounts of each mutant were pooled according to mating type. Following 5 h of gametogenesis in the light (using LED bulbs for illumination at light intensity of ∼4000 Lux on average), cells were mated for ∼1 h in liquid, plated on TAP plates, and incubated 12 to 16 h in light and 5 d in the dark for zygospore maturation. The plates were scraped with a sharp razor to remove most unmated cells, and meiosis was induced by 2 to 3 d of light incubation. Once haploid progeny began to proliferate, the cultures received three rounds of selection (2 d each) with replating at restrictive temperature. (Due to the mixture of mating types in the resulting Ts+ pool, it is important to keep replating before the plates are overgrown to avoid mating within the pool when nitrogen is depleted.) Genomic DNA was extracted from the final Ts+ pool.

Genomic Sample Preparation

Cell pellets were suspended in 8 mL TEN (10 mM Tris, 10 mM EDTA, and 150 mM NaCl, pH 8) with 0.5% SDS and 0.1 mg/mL proteinase K and heated to 50°C for 1 h. Samples were sonicated and incubated at 50°C for an additional 1 h. Lysate was extracted twice with 16 mL phenol-chloroform and was precipitated with 16 mL 100% ethanol. The precipitate was rinsed with 70% ethanol and resuspended in TE (10 mM Tris and 1 mM EDTA, pH 8). NaCl was added to 1 M, and the solution was phenol-chloroform-extracted and ethanol-precipitated. The pellet was washed in 70% ethanol and resuspended in 200 μL TE. Samples were treated with RNase A, and purified DNA was fragmented by sonication (Covaris S2 Focused Ultrasonicator) to 300 bp, followed by gel excision of a size range from 200 to 500 bp. Following gel extraction, DNA was end-repaired and dA-tailed. Illumina TruSeq sequencing adapters were ligated to dA-tailed DNA. All enzymatic reactions were performed using New England Biolabs kits. Adaptor-ligated DNA (350 to 600 bp) was purified using carboxylic acid-coated Dynabeads MyOne (Invitrogen). DNA concentration was estimated both by agarose gel electrophoresis and by conventional qPCR using seven dilutions (500×–32,000× in 2× dilution steps) of a standard sample.

Efficiency of end modification was tested as follows: (1) In a small test ligation, sonicated DNA self-ligates very inefficiently (most ends staggered). End repair results in fragments that self-ligate very efficiently, to concatemers of more than ∼5 fragments, as assayed by gel electrophoresis. (2) Conversely, successful A-tailing eliminates self-ligatability but allows ligation of a T-tailed Y-adapter to the fragments, causing a migration shift of ∼100 bp. Importantly, library preparation was PCR free. Samples with different indices were pooled with approximately equal representation. We estimate >80% conversion of genomic fragments to doubly end-ligated library fragments. These procedures result in significantly more even representation across the genome, although read count is still overdispersed compared with counting error expectations, due to reproducible (sequence-determined) variations in the number of reads detected across the genome (data not shown). Illumina sequencing was performed by Beijing Genomics Institute, New York Genome Center, or GENEWIZ.

Sequencing and Bioinformatics

Overlapping Pool Sequencing

Placing each mutant in two of n pools allows a unique pool pattern for n(n−1)/2 mutants. While other patterns of pool occupancy allow more mutants per pool (e.g., using all possible occupancy patterns allows 2^n − 1 mutants), the two-pools-per-mutant scheme has more favorable error-correcting characteristics (Supplemental Figure 2). For example, in the full pattern procedure, a mutation detected in pools A, B, and C could be genuinely AB and false positive in C, ABCD with a false negative in D, etc. With two pools per mutant, false positive or false negative instances always result in a detectable error, with a minimal number of possible corrections.

We aimed for ∼90 to 100× coverage of each pool and 14 to 16× coverage per mutant. We tried three levels of pooling: 28 and 36 mutants in sets of eight or nine pools, respectively (each mutant appears in two pools and accordingly each pool is composed of 7 or 8 mutants, respectively), and 84 mutants in a set of nine pools where each mutant appears in three pools and accordingly each pool is composed of 28 mutants (Figure 2A). Mutation assignment and the rate of successful identification of driver lesions were similar in all setups. Assignment of mutations to individual mutants is based on detection of a mutation in exactly the two pools to which the mutant contributed (Figures 1B and 2A; Supplemental Figure 2). In one experiment, we detected ∼5700 genetic lesions in 28 sequenced specimens and uniquely assigned an average of ∼200 to each specimen (Figure 2C). The number of lesions assigned per mutant varied from 87 to 386, due in part to diverse UV dosages during mutagenesis (Breker et al., 2016).
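The pool arithmetic above (28 mutants in 8 pools, 36 in 9 pools, 84 in 9 pools with three pools per mutant) follows directly from counting combinations. The sketch below enumerates occupancy patterns and decodes an observed pool set back to a mutant; it is an illustration in Python rather than the authors' code, and the function names are hypothetical.

```python
from itertools import combinations


def occupancy_patterns(n_pools, pools_per_mutant=2):
    """All distinct sets of pools a mutant could occupy (one pattern per mutant)."""
    return list(combinations(range(n_pools), pools_per_mutant))


def mutants_per_pool(patterns, n_pools):
    """Number of mutants contributing to each pool."""
    counts = [0] * n_pools
    for pools in patterns:
        for pool in pools:
            counts[pool] += 1
    return counts


def decode(observed_pools, patterns):
    """Index of the mutant whose pattern equals the observed pool set, else None.

    A single false-positive or false-negative pool changes the number of
    occupied pools, so the result is None rather than a wrong mutant.
    """
    key = tuple(sorted(observed_pools))
    return patterns.index(key) if key in patterns else None


patterns = occupancy_patterns(8)
print(len(patterns), mutants_per_pool(patterns, 8))
print(decode({2, 5}, patterns), decode({2}, patterns), decode({2, 5, 7}, patterns))
```

With a fixed number of pools per mutant, any single detection error yields a pool set of the wrong size, which is why it is always detectable in this scheme.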

Statistical analysis strongly supported the reliability of the assignments of mutations to strains. Eighty-five percent of the mutations were detected in just two pools as expected, with only 1.4% of the mutations being detected in a single pool. The remaining 13.6% of mutations were detected in three pools; however, in essentially every case, two of these pools had >90% of the total mutation read count, and the third pool typically had only a single read, deriving most likely from Illumina read error, barcode misassignment, or low-level cross-contamination of pooled cell or DNA samples. After discarding such stray single reads, the net result was assignment of >95% of mutations to a single specific mutant based on specific double pool occupancy.

In sequence data from a pool containing a mutant with a lesion at some position, the number of mutant reads should average the total read count at the position divided by the number of mutants contributing to the pool, distributed with Poisson (counting) variability. We found a good fit to this expectation (Supplemental Figure 2).
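The expectation stated above (mean mutant reads equal to total reads divided by the number of mutants in the pool, with Poisson variability) can be written out as a small calculation. The sketch uses illustrative numbers (100 reads at a position, 7 mutants in the pool) and a hypothetical detection threshold of 3 mutant reads; neither the threshold nor the function names come from the text.

```python
import math


def expected_mutant_reads(total_reads: float, mutants_in_pool: int) -> float:
    """Mean number of mutant reads at a position carrying one mutant's lesion."""
    return total_reads / mutants_in_pool


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def prob_at_least(k_min: int, lam: float) -> float:
    """P(X >= k_min) for X ~ Poisson(lam)."""
    return 1.0 - sum(poisson_pmf(k, lam) for k in range(k_min))


lam = expected_mutant_reads(total_reads=100, mutants_in_pool=7)
# Chance of seeing at least 3 mutant reads (hypothetical threshold) at this depth.
print(round(lam, 1), prob_at_least(3, lam))
```

Under this model, positions with substantially lower total read count are the ones where a genuine mutation is most likely to fall below any fixed read threshold.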

Overall, these statistical features indicate that mutations were reliably detected in just two pools at the expected frequencies, indicating high-quality assignments. Systematic, reproducible variation in total read count at individual positions, mainly due to variability in recovery of alignable reads, prevented effective detection over at most 10% of the genome (Supplemental Figure 2G).

Bioinformatics

After paired-end Illumina sequencing, reads were aligned to the reference genome using bowtie2 (Langmead et al., 2009) and processed with mpileup (Li et al., 2009) to determine reference and variant calls at each position. mpileup was run without the “base quality adjustment” feature (BAQ; disabled with the samtools option -B), in order to recover raw Illumina quality scores for each call in the output. mpileup output for each chromosome was processed by MATLAB code to a matrix form: six rows (A, C, G, T, deletion, and insertion) and one column for each chromosomal position (code is available upon request). Read count and average quality score for each call at each position were recovered in matrix form. Mutation detection was always performed by comparing such matrices derived from parental sequence data with those from mutant pool data. Thus, common errors (from sequencing, alignment, etc.) found in both “subjects” (parents) and “queries” (mutant pools) were disregarded. This matrix form makes detection, quantification, and initial evaluation of mutant reads very rapid using simple MATLAB matrix operations.
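The six-row matrix representation and the parent-versus-pool comparison described above can be sketched as follows. The authors' implementation is in MATLAB and is not reproduced here; this is a Python stand-in with hypothetical function names, toy data, and hypothetical thresholds (`min_pool_reads`, `max_parent_reads`).

```python
CALLS = ("A", "C", "G", "T", "del", "ins")


def count_matrix(observations, n_positions):
    """Build a 6 x n_positions read-count matrix from (position, call) pairs."""
    matrix = [[0] * n_positions for _ in CALLS]
    for pos, call in observations:
        matrix[CALLS.index(call)][pos] += 1
    return matrix


def candidate_mutations(parent, pool, min_pool_reads=2, max_parent_reads=0):
    """Calls present in the pool ("query") but absent from the parent ("subject")."""
    hits = []
    for row, call in enumerate(CALLS):
        for pos, n_pool in enumerate(pool[row]):
            if n_pool >= min_pool_reads and parent[row][pos] <= max_parent_reads:
                hits.append((pos, call, n_pool))
    return sorted(hits)


# Toy data: position 1 carries a recurrent error (T) seen in both parent and
# pool; position 2 carries a pool-specific minority call (A).
parent = count_matrix(
    [(0, "A")] * 20 + [(1, "C")] * 20 + [(1, "T")] * 3 + [(2, "G")] * 20, 3)
pool = count_matrix(
    [(0, "A")] * 90 + [(1, "C")] * 88 + [(1, "T")] * 4
    + [(2, "G")] * 80 + [(2, "A")] * 12, 3)

print(candidate_mutations(parent, pool))
```

Because the comparison is made call by call against the parental matrix, an error that recurs at the same position in both data sets never reaches the candidate list.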

This output provided an initial, highly error-prone list of candidate mutations. Errors must be carefully screened out because in these procedures even rare (low minority) calls must be considered. A low Illumina quality score for the called position is a major error predictor, as expected. Another predictive feature is consistency of calls at a position, measured by the coefficient of variation (CV) of read count at a position across the six possible calls (low CV correlates with error); another is total read count at a position (positions with low overall read count are prone to errors). We also found a number of indicators not accessible in pileup format: features of the read in which a candidate mutation call is embedded. These features include mapping quality, average Illumina quality score over the entire read, number of mismatches to reference (excluding the mutant call itself and a few positions to either side), and the proportion of repetitive sequence (indicated as “soft-masked repeat” in the reference genome from Phytozome). These error indicators were only partially correlated (data not shown); therefore, screening out individual reads with conservative thresholds for each indicator vastly decreases errors while removing only a small proportion of total reads. Determining these features requires computational reexamination of the bowtie2 .bam aligner output. For efficiency, we did this reexamination only on candidate positions detected using the matrix format derived from pileup output, screened for number and proportion of mutant calls and CV for subjects and queries.
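The CV-based consistency indicator can be illustrated with a minimal sketch. It assumes the population standard deviation is used (the text does not say which estimator was applied), and the thresholds in `passes_position_filters` are hypothetical placeholders, not values from this work.

```python
from statistics import mean, pstdev


def call_cv(counts):
    """Coefficient of variation of read counts across the six possible calls."""
    m = mean(counts)
    if m == 0:
        return 0.0
    return pstdev(counts) / m


def passes_position_filters(counts, min_total_reads=30, min_cv=1.0):
    """Keep positions with enough reads and a clearly dominant call.

    Both thresholds are illustrative assumptions.
    """
    return sum(counts) >= min_total_reads and call_cv(counts) >= min_cv


# Order of counts: A, C, G, T, deletion, insertion.
clean = [0, 0, 96, 0, 0, 4]   # one dominant call plus a minority call
noisy = [5, 4, 6, 5, 3, 4]    # reads scattered across all calls

print(call_cv(clean), call_cv(noisy))
print(passes_position_filters(clean), passes_position_filters(noisy))
```

A position dominated by one call gives a high CV, whereas reads scattered across several calls give a low CV, matching the statement that low CV correlates with error.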

Assignment of mutation candidates to specific mutants further requires that the call be made either in exactly the two pools the mutant was entered into, or at most one more pool, and that at least 90% of the reads be found in only the two relevant pools. These criteria screened out only a small number of candidates, and they were obvious false positives (a typical example would be a position with one to three mutant reads each in four or five different libraries).
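The assignment rule in the preceding paragraph (call present in the mutant's two pools, at most one additional pool, and at least 90% of mutant reads in the two relevant pools) can be expressed compactly. The sketch below is an illustration with hypothetical names and toy read counts; it is not the authors' MATLAB code.

```python
def assign_mutation(reads_by_pool, mutant_patterns,
                    min_fraction=0.90, max_extra_pools=1):
    """Return the single mutant whose pools explain the mutant reads, else None.

    reads_by_pool: {pool_id: mutant read count at this position}
    mutant_patterns: {mutant_name: (pool_id, pool_id)}
    """
    total = sum(reads_by_pool.values())
    if total == 0:
        return None
    observed = {pool for pool, n in reads_by_pool.items() if n > 0}
    matches = []
    for mutant, pattern in mutant_patterns.items():
        pools = set(pattern)
        if not pools <= observed:
            continue  # the call must appear in both of the mutant's pools
        extra = observed - pools
        in_pattern = sum(reads_by_pool[pool] for pool in pools)
        if len(extra) <= max_extra_pools and in_pattern / total >= min_fraction:
            matches.append(mutant)
    return matches[0] if len(matches) == 1 else None


patterns = {"m1": (0, 1), "m2": (0, 2), "m3": (1, 2)}
print(assign_mutation({0: 12, 1: 15, 2: 1}, patterns))        # stray read in pool 2
print(assign_mutation({0: 2, 1: 1, 2: 3, 3: 2}, patterns))    # scattered low counts
```

The second call mirrors the typical false positive described above: a few reads spread over four libraries match no mutant's pattern and are rejected.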

Candidates passing these criteria are almost all correct calls (that is, they represent genuine mutations present specifically in the assigned mutant), based on criteria discussed above; therefore, false positives are rare. Since we recovered the true causative mutation in most mutants, false negatives (true mutations excluded by the criteria) were also relatively rare.

Determination of the Causative Mutations

After construction and sequence analysis of the Ts+ pool, the final output of the pipeline is a MATLAB data structure containing (for each mutant) the occupancy pattern, assigned mutations, and frequency of these mutations in the Ts+ pool sequencing. Using this as an input for a plotting program, PlotMutSnps.m, the assigned mutations for each mutant were plotted as shown in Figure 4: the x axis is the position on each chromosome and the y axis is the fraction of mutation reads in the relevant Ts+ pool.

For each chromosome, least-squares errors were calculated for the hypothesis of no depletion, and for the hypotheses of depletion at some position on the chromosome, along with the expectation of single crossovers at random positions proximal to the site of depletion. These lines were graphed in red or green, respectively, depending on which had the better fit. This aided in the visual detection of clearly depleted genomic regions, allowing an initial classification of mutants based on number of depleted regions (0, synthetic lethal; 1, single gene; 2 or more, multiple depletion; all depleted, poor maters). Additionally, each chromosome was plotted separately, with the frequencies of mutations in each assayed Ts+ pool next to each other, to verify correct assignments. Mutations are represented as red circles (non-CDS-changing mutations) and blue circles (CDS-changing mutations). Identified depleted mutations were checked specifically by clicking on the relevant circle and obtaining a complete report on the position, reads, gene, and mutation effect. All CDS-changing mutations depleted in an identified depleted region were considered to be candidate causative mutations.

For a minority of mutants (∼15–25%), we detected a clear V pattern, with no CDS-changing mutation at its base. This can be explained in several ways. The first is failure to recover any aligned reads over the critical region, due either to failure of Illumina sequencing to produce reliable reads because of sequence characteristics or to failure of alignment (for example, if the mutation occurs in a repetitive [unmappable] sequence). Another possibility, for cases where a depleted region was identified and only non-CDS-changing mutations were detected, is that some of these changes actually interfere with gene function. For example, some apparently non-CDS-changing mutations might in fact alter coding sequences that are not recognized in the annotation. This was observed previously: a strong loss-of-function mutation mapped to a probable premature stop codon mutation that according to annotation was in the 5′-untranslated region; most likely this region was actually part of the CDS (Cross, 2016). In this work, the CYCB1 mutations were initially missed, as the mutations are in a region annotated as the 3′-untranslated region of the gene adjoining cyclin B1 in tail-to-tail orientation. CYCB1 was annotated as containing only an N-terminal fragment (https://phytozome.jgi.doe.gov/pz/portal.html); however, sequence alignment showed that this proposed 3′-untranslated region contained sequence highly homologous to the C terminus of cyclin B, and transcription data supported inclusion of these sequences in the cyclin B1 coding sequence (https://phytozome.jgi.doe.gov/pz/portal.html; Atkins and Cross, 2018). Another possibility is that some apparently silent mutations could change codon usage to a very rare/absent tRNA (Sørensen et al., 1989; Naya et al., 2001; Cannarozzi et al., 2010; Gingold and Pilpel, 2011; Salinas et al., 2012). A final potential cause of “unsequenceability” is a gene whose sequence is missing in part or in whole from the reference. This is difficult to evaluate at present; it is clear that there is at least some level of missing coding sequence from the current assembly (Tulin and Cross, 2016).
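The least-squares comparison between the no-depletion and depletion hypotheses described at the start of this section can be sketched as a simple model comparison. The exact functional form used by the authors is not given here, so the piecewise-linear "V" below (frequency zero at the depleted site, recovering linearly to a baseline over some width) is an assumption for illustration, as are the function names, the grid of widths, and the toy data.

```python
def sse(observed, predicted):
    """Sum of squared errors between two equal-length sequences."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


def v_model(positions, site, width, baseline):
    """Assumed V shape: 0 at the depleted site, back to baseline at distance >= width."""
    return [baseline * min(1.0, abs(x - site) / width) for x in positions]


def best_fit(positions, freqs, baseline, widths):
    """Compare the flat (no depletion) fit with V fits over a grid of sites and widths."""
    flat_error = sse(freqs, [baseline] * len(freqs))
    best = ("flat", None, None, flat_error)
    for site in positions:
        for width in widths:
            err = sse(freqs, v_model(positions, site, width, baseline))
            if err < best[3]:
                best = ("depleted", site, width, err)
    return best


positions = [0.5, 1, 2, 3, 4, 5, 6, 7]   # arbitrary chromosome coordinates
widths = [1, 2, 3, 4]                    # candidate recovery distances (assumed)
depleted = v_model(positions, site=4, width=3, baseline=0.04)
neutral = [0.04] * len(positions)

print(best_fit(positions, depleted, 0.04, widths)[:3])
print(best_fit(positions, neutral, 0.04, widths)[:3])
```

Here the baseline stands for the near-constant frequency of undepleted mutations in the Ts+ pool; in practice it would have to be estimated from the data, for example from chromosomes showing no depletion.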

BLAST-Blosum Scoring

We used BLAST-Blosum scoring as described previously (Cross et al., 2017). Briefly, mutations are categorized into four classes by BLAST analysis. Class A: Mutation falls within a segment of BLAST alignment (high-scoring pair [HSP]), and the mutation alters a conserved residue within this segment. Class B: Mutation falls within an overall conserved region but alters an unconserved residue, is BLAST-aligned across a small deletion in the Arabidopsis thaliana sequence in the HSP, or is found between two distinct HSPs. Class C: Mutation is N-terminal or C-terminal to all detected HSPs. Class D: No Arabidopsis BLAST hit. Blosum scoring reflects the likely severity of effect of substitutions to disrupt protein function. We used the Blosum62 score (Henikoff and Henikoff, 1993). Segregating the BLAST classes A to D according to the severity score (mutation has Blosum <−1 or ≥−1) yielded eight informative classes (Cross et al., 2017).
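The way four BLAST classes and a single Blosum62 cutoff combine into eight classes can be shown with a few lines of code. The class letters and the cutoff (Blosum < −1 versus ≥ −1) are taken from the text; the labels "severe" and "mild" and the function name are hypothetical conveniences.

```python
BLAST_CLASSES = ("A", "B", "C", "D")


def blast_blosum_class(blast_class: str, blosum62: int, cutoff: int = -1) -> str:
    """Combine a BLAST class (A to D) with a Blosum62 severity bin."""
    if blast_class not in BLAST_CLASSES:
        raise ValueError(f"unknown BLAST class: {blast_class!r}")
    severity = "severe" if blosum62 < cutoff else "mild"
    return f"{blast_class}-{severity}"


# Enumerate all combined classes using one score on each side of the cutoff.
all_classes = sorted({blast_blosum_class(c, s)
                      for c in BLAST_CLASSES for s in (-3, 1)})
print(len(all_classes), all_classes)
```

Determining the BLAST class itself requires the alignment against Arabidopsis and is not attempted in this sketch.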

Time-Lapse Microscopy Screens

Freshly growing arrayed mutants organized in blocks of 100 were plated on low-nitrogen TAP plates in the light at 21°C to allow synchronization. Following 2 d of growth, colonies were replicated on TAP plates, and light microscopy images were taken at 0, 10, 20, and 48 h at 33°C.

The “fragility” category is based on sensitivity of arrested cells on an agar pad to manipulation with a dissecting needle. Wild-type and most mutant cells could be moved without problems. Fragile mutant cells promptly exploded when touched with a needle.

Synchronization

Cells were plated on 0.1× nitrogen TAP plates in the light at 21°C. After 2 d, the cells had completed divisions and arrested as small newborns. Cells were then plated on TAP plates at 33°C with light. After 14 h, wild-type cells were in the middle of the division phase (mix of cells undergoing S phase and mitosis, and newborn cells).

Flow Cytometry

Samples were fixed in 3:1 ethanol:acetic acid (15 min), washed in PBS, incubated in PBS + RNase A (100 μg/mL) at 37°C for 2 h, and resuspended in PBS. DNA was stained using PBS + 500 nM SYTOX Green (Thermo Fisher Scientific). Samples were read (20,000 cells each) in the BD Accuri C6 platform using the 90% attenuator to avoid saturation in the FL1 channel (excitation, 488 nm; emission, 530 ± 15 nm). Data were processed and analyzed using a downloadable MATLAB program (Nitai Steinberg; https://www.mathworks.com/matlabcentral/fileexchange/38080-flow-cytometry-gui-for-matlab). Gates for each DNA peak were set according to the wild-type sample and were applied accordingly for each sample. Gates were slightly modified to account for minor variations in peak positions between samples; these were mainly due to a known effect whereby larger cells give a higher signal with the same amount of nuclear DNA (Tulin and Cross, 2014).

Image Stream Analysis

DNA-stained samples were analyzed in an ImageStream X Mark II Imaging Flow Cytometer (10,000 cells from each). Images were then analyzed using IDEAS6.2. In order to distinguish between single dividing cells and aggregates, many samples were examined manually, and two readouts were found to be efficient: area of the object, and the longitudinal/horizontal axes ratio. The gate for singlets was as follows: area <50 with ratio 0.3 to 1 (newborn cells), and 50 < area < 200 with ratio 0.75 to 1. A few independent samples were then examined with this gate, and the number of aggregates within the singlets gate and the number of single cells excluded were counted. Sample 1: 2/250 cells counted were aggregates (total number within gate: 6054 cells), 43/160 cells counted were single cells (total number outside the gate: 1213 cells). Sample 2: 1/240 cells counted were aggregates (total number within gate: 5973 cells), 47/160 cells counted were single cells (total number outside the gate: 1190 cells). Sample 3: 11/250 cells counted were aggregates (total number within gate: 6136 cells), 34/125 cells counted were single cells (total number outside the gate: 1859 cells). Sample 4: 1/250 cells counted were aggregates (total number within gate: 5310 cells), 51/160 cells counted were single cells (total number outside the gate: 1656 cells). Sample 5: 0/250 cells counted were aggregates (total number within gate: 4464 cells), 84/160 cells counted were single cells (total number outside the gate: 1285 cells). In conclusion, between 0 and ∼4% aggregates were included in the singlets gate, while only 10% of single cells were excluded from final analysis. The singlets gate was applied to all samples tested, and information was extracted and processed using an in-house MATLAB program (available upon request). Moreover, each sample was manually examined, and the number of nuclei per dividing cell was counted for 50 to 80 cells from each DNA peak.
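The singlet gate and the manual validation counts above can be turned into a short worked example. The gate boundaries are taken from the text (units as reported by the analysis software); the treatment of objects with area exactly 50 or 200, the function names, and the way manual counts are extrapolated to whole-sample fractions are assumptions for illustration and may differ from how the summary percentages in the text were derived.

```python
def in_singlet_gate(area: float, axis_ratio: float) -> bool:
    """Singlet gate: small newborn cells or larger, nearly round cells."""
    if area < 50:
        return 0.3 <= axis_ratio <= 1.0
    if 50 < area < 200:
        return 0.75 <= axis_ratio <= 1.0
    return False  # area exactly 50 or >= 200 falls outside the stated gate


def gate_error_estimates(aggregates_counted, counted_in_gate, total_in_gate,
                         singles_counted, counted_outside, total_outside):
    """Extrapolate manual counts to sample-wide fractions.

    Returns (fraction of gated objects that are aggregates,
             fraction of all single cells that fall outside the gate).
    """
    aggregate_fraction = aggregates_counted / counted_in_gate
    singles_inside = total_in_gate * (1.0 - aggregate_fraction)
    singles_outside = total_outside * singles_counted / counted_outside
    excluded_fraction = singles_outside / (singles_inside + singles_outside)
    return aggregate_fraction, excluded_fraction


# Sample 1 from the text: 2/250 aggregates in gate (6054 objects),
# 43/160 single cells outside the gate (1213 objects).
sample1 = gate_error_estimates(2, 250, 6054, 43, 160, 1213)
print(in_singlet_gate(40, 0.5), in_singlet_gate(120, 0.5))
print(sample1)
```

Applying `gate_error_estimates` to each of the five samples gives a per-sample view of the same trade-off summarized in the text: few aggregates admitted, at the cost of excluding a minority of genuine single cells.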

Accession Numbers

Accession numbers (Phytozome v.10; http://phytozome.jgi.doe.gov/pz/portal.html) of Chlamydomonas genes discussed in this article are as follows: CYCD2, Cre06.g289750.t1.1; CYCD3, Cre06.g284350.t1.1; CDC20, Cre09.g399911.t1.1; CYCB1, Cre08.g370401.t1.1; CDKA1, Cre10.g465900.t1.2; MED6, Cre11.g467659.t1.1; RECQL4, Cre15.g634701.t1.1; RIR1, Cre12.g492950.t1.2; SENP7, Cre06.g278197.t1.1; BSL1, Cre01.g050850.t1.2; APC1, Cre13.g579100.t1.1; APC2, Cre10.g460532.t1.1; APC4, Cre12.g556650.t1.2; APC6, Cre13.g562950.t1.1; APC8, Cre16.g681578.t1.1; SPC25, Cre16.g658600.t1.1; SMC2, Cre02.g086650.t1.2; SMC4, Cre12.g493400.t1.1; CDKB1, Cre08.g372550.t1.1; CKS1, Cre03.g180350.t1.2; and PIP5K9, Cre09.g400478.t1.1. Accession numbers for remaining genes in which we isolated mutations are available in Supplemental Data Sets 1 to 3. Sequence data (.fastq or .bam files) from this article and in-house software are available upon request.

Supplemental Data

Supplemental Figure 1. A schematic presentation of the mutant hunt and identification of causative mutations.

Supplemental Figure 2. Minority reads are almost exclusively assigned to occupied patterns within the set of libraries.

Supplemental Figure 3. Scrambling strategy confirms correct muta-tions assignments.

Supplemental Figure 4. Confirmation of causality identification by isolating revertants.

Supplemental Figure 5. High-throughput imaging of DNA-stained samples provides an efficient tool to exclude cell aggregates.

Supplemental Figure 6. High reproducibility of DNA profiling assay as demonstrated by pairs of siblings.

Supplemental Figure 7. Ploidy measurements are highly reproducible.

Supplemental Figure 8. Applicability of MAPS-SEQ to diploid meiotic organisms.

Supplemental Table 1. Comparison of MAPS-SEQ to bulked-segregant analysis.

Supplemental Data Set 1. Candidate null allele mutations identified within the set of passenger mutations.

Supplemental Data Set 2. All mutations in essential genes identified in this work.

Supplemental Data Set 3. Raw data of DNA content profiles of all mutants tested by flow cytometry.

Supplemental Data Set 4. Newly identified genes with potential involvement in cell division.

ACKNOWLEDGMENTS

We thank the Cross lab members for advice and useful discussion. We thank Kresti Pecani for development and application of PCR-free methods for DNA sample preparation. This work was supported by National Institutes of Health Grant GM07853 to F.R.C., by a Junior Fellow award from the Simons Foundation to M.B., and by The Rockefeller University.

AUTHOR CONTRIBUTIONS

M.B., K.L., and F.R.C. designed and performed research. M.B. and F.R.C. contributed new computational tools and wrote the article.

Received January 29, 2018; revised March 26, 2018; accepted May 8, 2018; published May 9, 2018.

REFERENCES

Atkins, K.C. (2018). Cell Cycle Control by Cyclin-CDKs in Chlamydomonas reinhardtii. PhD dissertation (New York: The Rockefeller University).

Atkins, K.C., and Cross, F. (2018). Interregulation of CDKA/CDK1 and the plant-specific cyclin-dependent kinase CDKB in control of the Chlamydomonas cell cycle. Plant Cell 30: 429–446.

Baek, K., Kim, D.H., Jeong, J., Sim, S.J., Melis, A., Kim, J.S., Jin, E., and Bae, S. (2016). DNA-free two-gene knockout in Chlamydomonas reinhardtii via CRISPR-Cas9 ribonucleoproteins. Sci. Rep. 6: 30620.

Bell, S.P., and Dutta, A. (2002). DNA replication in eukaryotic cells. Annu. Rev. Biochem. 71: 333–374.

Bisova, K., Krylov, D.M., and Umen, J.G. (2005). Genome-wide annotation and expression profiling of cell cycle regulatory genes in Chlamydomonas reinhardtii. Plant Physiol. 137: 475–491.

Bloom, J., and Cross, F.R. (2007). Multiple levels of cyclin specificity in cell-cycle control. Nat. Rev. Mol. Cell Biol. 8: 149–160.

Breker, M., Lieberman, K., Tulin, F., and Cross, F.R. (2016). High-throughput robotically assisted isolation of temperature-sensitive lethal mutants in Chlamydomonas reinhardtii. J. Vis. Exp. 118: 54831.

Cannarozzi, G., Schraudolph, N.N., Faty, M., von Rohr, P., Friberg, M.T., Roth, A.C., Gonnet, P., Gonnet, G., and Barral, Y. (2010). A role for codon order in translation dynamics. Cell 141: 355–367.

Coleman, A.W., and Maguire, M.J. (1982). A microspectrofluorometric analysis of nuclear and chloroplast DNA in Volvox. Dev. Biol. 94: 441–450.

Cools, T., Iantcheva, A., Weimer, A.K., Boens, S., Takahashi, N., Maes, S., Van den Daele, H., Van Isterdael, G., Schnittger, A., and De Veylder, L. (2011). The Arabidopsis thaliana checkpoint kinase WEE1 protects against premature vascular differentiation during replication stress. Plant Cell 23: 1435–1448.

Cross, F.R. (2016). Tying down loose ends in the Chlamydomonas genome: functional significance of abundant upstream open reading frames. G3 (Bethesda) 6: 435–446.

Cross, F.R., Buchler, N.E., and Skotheim, J.M. (2011). Evolution of networks and sequences in eukaryotic cell cycle control. Philos. Trans. R. Soc. Lond. B Biol. Sci. 366: 3532–3544.

Cross, F.R., Breker, M., and Lieberman, K. (2017). Validated Bayesian differentiation of causative and passenger mutations. G3 (Bethesda) 7: 2081–2094.

Dutcher, S.K. (1995). Mating and tetrad analysis in Chlamydomonas reinhardtii. Methods Cell Biol. 47: 531–540.

Erlich, Y., Chang, K., Gordon, A., Ronen, R., Navon, O., Rooks, M., and Hannon, G.J. (2009). DNA Sudoku: harnessing high-throughput sequencing for multiplexed specimen analysis. Genome Res. 19: 1243–1253.

Francis, D. (2007). The plant cell cycle: 15 years on. New Phytol. 174: 261–278.

Fujimitsu, K., Grimaldi, M., and Yamano, H. (2016). Cyclin-dependent kinase 1-dependent activation of APC/C ubiquitin ligase. Science 352: 1121–1124.

Gaudet, M., Fara, A.G., Beritognolo, I., and Sabatti, M. (2009). Allele-specific PCR in SNP genotyping. Methods Mol. Biol. 578: 415–424.

Gingold, H., and Pilpel, Y. (2011). Determinants of translation efficiency and accuracy. Mol. Syst. Biol. 7: 481.

Greiner, A., Kelterborn, S., Evers, H., Kreimer, G., Sizova, I., and Hegemann, P. (2017). Targeting of photoreceptor genes in Chlamydomonas reinhardtii via zinc-finger nucleases and CRISPR/Cas9. Plant Cell 29: 2498–2518.

Harris, E. (2008). The Chlamydomonas Sourcebook: Introduction into Chlamydomonas and Its Laboratory Use. (San Diego, CA: Elsevier Academic Press).

Henikoff, S., and Henikoff, J.G. (1993). Performance evaluation of amino acid substitution matrices. Proteins 17: 49–61.

Henikoff, J.G., Greene, E.A., Taylor, N., Henikoff, S., and Pietrokovski, S. (2002). Using the blocks database to recognize functional domains. Curr. Protoc. Bioinformatics 2: 2.2.

Inzé, D., and De Veylder, L. (2006). Cell cycle regulation in plant development. Annu. Rev. Genet. 40: 77–105.

Kumar, N., et al. (2015). Functional conservation in the SIAMESE-RELATED family of cyclin-dependent kinase inhibitors in land plants. Plant Cell 27: 3065–3080.

Langmead, B., Trapnell, C., Pop, M., and Salzberg, S.L. (2009). Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10: R25.

Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., and Durbin, R.; 1000 Genome Project Data Processing Subgroup (2009). The sequence alignment/map format and SAMtools. Bioinformatics 25: 2078–2079.

Li, X., Zhang, R., Patena, W., Gang, S.S., Blum, S.R., Ivanova, N., Yue, R., Robertson, J.M., Lefebvre, P.A., Fitz-Gibbon, S.T., Grossman, A.R., and Jonikas, M.C. (2016). An indexed, mapped mutant library enables reverse genetics studies of biological processes in Chlamydomonas reinhardtii. Plant Cell 28: 367–387.

McGarry, T.J., and Kirschner, M.W. (1998). Geminin, an inhibitor of DNA replication, is degraded during mitosis. Cell 93: 1043–1053.

Merchant, S.S., et al. (2007). The Chlamydomonas genome reveals the evolution of key animal and plant functions. Science 318: 245–250.

Naya, H., Romero, H., Carels, N., Zavala, A., and Musto, H. (2001). Translational selection shapes codon usage in the GC-rich genome of Chlamydomonas reinhardtii. FEBS Lett. 501: 127–130.

Nowack, M.K., Harashima, H., Dissmeyer, N., Zhao, X., Bouyer, D., Weimer, A.K., De Winter, F., Yang, F., and Schnittger, A. (2012). Genetic framework of cyclin-dependent kinase function in Arabidopsis. Dev. Cell 22: 1030–1040.

Onishi, M., Pringle, J.R., and Cross, F.R. (2016). Evidence that an unconventional actin can provide essential F-actin function and that a surveillance system monitors F-actin integrity in Chlamydomonas. Genetics 202: 977–996.

Qiao, R., et al. (2016). Mechanism of APC/CCDC20 activation by mitotic phosphorylation. Proc. Natl. Acad. Sci. USA 113: E2570–E2578.

Remm, M., Storm, C.E.V., and Sonnhammer, E.L.L. (2001). Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. J. Mol. Biol. 314: 1041–1052.

Rogozin, I.B., Basu, M.K., Csürös, M., and Koonin, E.V. (2009). Analysis of rare genomic changes does not support the unikont-bikont phylogeny and suggests cyanobacterial symbiosis as the point of primary radiation of eukaryotes. Genome Biol. Evol. 1: 99–113.

Salinas, T., Duby, F., Larosa, F., Coosemans, M., Bonnefoy, N., Motte, P., Maréchal-Drouard, L., and Remacle, C. (2012). Co-evolution of mitochondrial tRNA import and codon usage determines translational efficiency in the green alga Chlamydomonas. PLoS Genet. 8: e1002946.

Sørensen, M.A., Kurland, C.G., and Pedersen, S. (1989). Codon usage determines translation rate in Escherichia coli. J. Mol. Biol. 207: 365–377.

Tulin, F., and Cross, F.R. (2014). A microbial avenue to cell cycle control in the plant superkingdom. Plant Cell 26: 4019–4038.

Tulin, F., and Cross, F.R. (2016). Patching holes in the Chlamydomonas genome. G3 (Bethesda) 6: 1899–1910.

Wignall, S.M., Deehan, R., Maresca, T.J., and Heald, R. (2003). The condensin complex is required for proper spindle assembly and chromosome segregation in Xenopus egg extracts. J. Cell Biol. 161: 1041–1051.

Zhang, R., Patena, W., Armbruster, U., Gang, S.S., Blum, S.R., and Jonikas, M.C. (2014). High-throughput genotyping of green algal mutants reveals random distribution of mutagenic insertion sites and endonucleolytic cleavage of transforming DNA. Plant Cell 26: 1398–1409.

Zhang, S., Chang, L., Alfieri, C., Zhang, Z., Yang, J., Maslen, S., Skehel, M., and Barford, D. (2016). Molecular mechanism of APC/C activation by mitotic phosphorylation. Nature 533: 260–264.

Zones, J.M., Blaby, I.K., Merchant, S.S., and Umen, J.G. (2015). High-resolution profiling of a synchronized diurnal transcriptome from Chlamydomonas reinhardtii reveals continuous cell and metabolic differentiation. Plant Cell 27: 2743–2769.

Michal Breker, Kristi Lieberman, and Frederick R. Cross. Comprehensive Discovery of Cell-Cycle-Essential Pathways in Chlamydomonas reinhardtii. Plant Cell 2018;30;1178-1198; originally published online May 9, 2018; DOI 10.1105/tpc.18.00071
