Systematic Analysis of Suppressor Mutations in S. cerevisiae
Strains with Deleted Genome Integrity Genes
by
Takafumi Yamaguchi
A thesis submitted in conformity with the requirements
for the degree of Master of Science
Molecular Genetics
University of Toronto
© Copyright by Takafumi Yamaguchi, 2013
Systematic Analysis of Suppressor Mutations in S. cerevisiae Strains
with Deleted Genome Integrity Genes
Takafumi Yamaguchi
Master of Science
Molecular Genetics
University of Toronto
2013
Abstract
The effects of a mutation in one gene can occasionally be suppressed by a mutation in another
gene. Genetic suppression indicates functional relationships and provides clues about the
mechanism and order of action in genetic pathways. Here I explored the existing yeast deletion
collection to identify suppressor relationships. The collection was released in 2000, and some
strains in the collection are known to have acquired mutations. Whole-genome sequencing
of 48 yeast deletion strains corresponding to 26 genome integrity genes was performed. High-
throughput sequencing revealed a broad mutational spectrum including point mutations, indels,
and copy number variations. I identified and experimentally validated two new suppressor
mutations (sgs1 mutations in both top3∆ and rmi1∆ strains) corresponding to gene pairs with
previously known suppressor relationships. Thus, high-throughput sequencing and analysis of
yeast deletion strains can identify suppressor mutations. The resulting genome sequences also
provide a baseline for future laboratory evolution experiments.
Acknowledgments
First and foremost, I would like to express my sincere gratitude to my supervisor, Dr.
Frederick Roth, for providing patient guidance, supportive mentorship and an excellent work
environment during my M.Sc. studies. Under his thought-provoking supervision, I was able to
focus on my research, learn new techniques and gain knowledge. I am truly fortunate to have
had the opportunity to work with him right after he opened the Roth laboratory at the University
of Toronto.
I would also like to thank my committee members, Dr. Leah Cowen and Dr. Zhaolei
Zhang, for the friendly guidance and valuable suggestions to complete my project. Additionally,
I would like to acknowledge Dr. Daniel Durocher for helping us select interesting genome
integrity genes.
I am truly grateful for the help and support of my lab-mates and collaborators. I would
like to thank Joe Mellor, Dax Torti, and Anna Karkhanina for generating sequencing data to
analyze. I greatly benefited from Joe’s wide-ranging knowledge of molecular and computational
biology. Atina Coté also shared her broad knowledge of DNA repair and yeast biology with me.
I am really indebted to Kenny Chua, Nozomu Yachie, and Murat Taşan for their patient and
friendly advice. I would have had a much more difficult time developing the computational
pipeline without their help. Xinlei Shun, a visiting summer student, performed some of the
experiments in this thesis under my supervision. Theodore Pak developed a genome browser
called ChromoZoom, which allowed me to easily and intuitively visualize the results of the
sequencing data analyses. I am thankful to Siyang Li, J. Javier Díaz-Mejía, Mariana Babor, Paul
Bansal, and Jolanda van Leeuwen for many valuable discussions.
I would also like to show my appreciation to my best friend in Toronto, Kota Hatta, for
his warm friendship, encouragement and valuable advice.
Finally and most importantly, I would like to express my heartfelt gratitude to my
immediate family, my mother, Akiko Yamaguchi, and my sister, Mika Yamasaki. This thesis is
dedicated to my mother and sister for their understanding, support, patience, and love during my
studies in Canada.
Table of Contents
Acknowledgments
Table of Contents
List of Tables
List of Figures
List of Abbreviations
1 Introduction
1.1 Suppressor genetic interactions illuminate functional relationships between genes
1.2 Suppressor genetic interactions to complement the current genetic interaction networks of the cell
1.3 Importance of understanding the mechanism of genome integrity
1.4 Convenience of high-throughput sequencing technology to analyze the multiple genomes of the yeast strains
1.5 High-throughput sequencing helps to reveal suppressor mutations in strains
2 Materials and Methods
2.1 Media and Yeast strains
2.2 High-throughput sequencing
2.3 Computational pipeline for Illumina sequencing data analyses
2.4 Simulation of SNV distribution
2.5 Sanger sequencing
2.6 Confirmation of candidate suppressor mutations using MoBY-ORF 2.0 Transformation
3 Results
3.1 High-throughput sequencing of genome integrity gene deletion strains and its data analysis detects unique SNVs
3.2 CNV analysis detected large duplications and telomere aberration in the deletion strains
3.3 A sgs1 44bp deletion was detected in the rmi1∆ MATα strain through SV analyses
3.4 Attempt to validate promising suppressor candidates
3.5 Comparison of observed SNVs distribution with that of simulated distribution
3.6 The SNV-deletion strains network illuminated sgs1 mutations as promising suppressor candidates
3.7 MoBY-ORF 2.0 plasmid transformation for validation of the candidate suppressors
3.8 Tetrad analysis validated sgs1 suppressor mutations
4 Discussion
4.1 Mutation spectrum in the genome integrity deletion strains
4.2 Potential Molecular mechanisms of the validated sgs1 suppressor mutations
4.3 Have genome integrity gene deletion strains evolved?
5 Conclusions and future directions
5.1 Rationale
5.2 Specific aims
6 Reference
7 Appendices
7.1 Other relevant research
7.1.1 Large-Scale Identification of extragenic suppressor mutations in yeast
7.1.2 Identification of E0005 resistant mutations in yeast using next-generation sequencing
7.1.3 Generation of the lift-over genomes of BY4741 and BY4742 strains
7.2 Poster Presentation
List of Tables
Table 1. Categories of genome integrity genes for the simulation of SNV distribution
Table 2. A list of query strains and their notable structural variations
Table 3. A list of candidate small – large deletions
Table 4. List of SNVs and micro-indels tested using Sanger sequencing
Table 5. A list of candidate suppressor mutations
Table 6. The results of validations for the sgs1 suppressor mutations using Sanger sequencing in the top3∆ and rmi1∆ strains
List of Figures
Figure 1. Computational workflow for processing Illumina sequencing reads
Figure 2. Median coverage for each of the deletion strains
Figure 3. Fraction of genome covered at more than certain read depth
Figure 4. Numbers of total SNVs for each deletion strain
Figure 5. Numbers of total micro-insertions for each deletion strain
Figure 6. Numbers of total micro-deletions for each deletion strain
Figure 7. Numbers of non-synonymous mutations for each deletion strain
Figure 8. Numbers of deleterious non-synonymous mutations using BLOSUM80 matrix
Figure 9. Numbers of deleterious non-synonymous mutations using PolyPhen-2
Figure 10. Numbers of deleterious non-synonymous mutations using Provean
Figure 11. Correlation of deleterious SNVs between BLOSUM80, PolyPhen2, and Provean
Figure 12. Ratio of transitions to transversions in the seven groups
Figure 13. Fractions of SNVs and non-synonymous mutations in CDS are significantly lower than in the simulated data
Figure 14. Fractions of SNVs in CDS in seven groups
Figure 15. Fraction of non-synonymous mutations in CDS
Figure 16. Fraction of deleterious SNVs in CDS
Figure 17. A network describing the relationship between the deletion strains and SNV/micro-indels
Figure 18. Validation of the sgs1 suppressor mutations in top3∆ and rmi1∆ strains
List of Abbreviations
Abbreviation   Meaning
BER            Base Excision Repair
BWA            Burrows-Wheeler Aligner
CDS            Coding Sequence
CI             Confidence Interval
CIN            Chromosome Instability
CNV            Copy Number Variation
DAmP           Decreased Abundance by mRNA Perturbation
DEL            Deletion
dHJ            double Holliday junction
DUP            Tandem Duplication
EMS            Ethyl methanesulfonate
FP             False Positive
gDNA           genomic DNA
GO             Gene Ontology
INS            Insertion
INV            Inversion
MMS            Methyl methanesulfonate
MoBY           Molecular Barcoded Yeast
OMIM           Online Mendelian Inheritance in Man
ORF            Open Reading Frame
RFP            Replication Fork Progression
RPL            Replacement
SAMtools       Sequence Alignment/Map tools
SGA            Synthetic Genetic Array
SGD            Saccharomyces Genome Database
SNV            Single Nucleotide Variant
TP             True Positive
1 Introduction
1.1 Suppressor genetic interactions illuminate functional relationships between genes
Adaptation is the process by which a population of organisms evolves through natural selection to become fitter in a given external environment or in response to an internal genetic change. Adaptation is driven by beneficial mutations that increase fitness. This kind of adaptation can be used to identify functionally related genes and to study genetic pathways. A commonly used method is to start with a strain that already carries a known mutation and to wait until the population accumulates new mutations that ameliorate or counteract the effects of the existing mutation (Prelich, 1999). Distinct mutations that counteract the effects of another mutation are called suppressors, and they often indicate strong functional relationships between gene pairs that may not be easily detectable by other methods. Suppressors have therefore been used extensively as a valuable tool for determining the roles of genes.
Suppressors fall into three main classes, each particularly useful for investigating functional relationships between genes and the order of pathways. These classes are: Class 1 – alteration in the activity of the mutant protein; Class 2 – alteration in the activity of the mutant pathway; and Class 3 – alteration in the activity of a different pathway (Prelich, 1999). A primary mutation may alter the structure of a protein and decrease its ability to bind an interacting partner. This defect could be suppressed by specifically modifying the binding domain of the interacting partner or, more likely, by forming additional contact sites. For example, a suppressor mutation in the S. cerevisiae actin-binding protein Sac6 was previously identified in strains bearing a temperature-sensitive mutation in actin (Act1). The act1 mutation reduces the affinity of Act1 for Sac6, but the suppressor form of Sac6 restores binding to the mutant Act1 protein by forming another binding site (Hanein et al., 1998; Sandrock et al., 1997). This example of a Class 1 suppressor illustrates how suppressors can identify direct interactions between two gene products and also provide information about the protein structure involved in the interaction.
Class 2 suppressors are often extremely valuable because they not only identify other gene products in the pathway and their functional relationships but also indicate the pathway order (Avery and Wasserman, 1992). For example, a pde1 suppressor mutation was detected in a temperature-sensitive ras2 strain (Uno et al., 1983). Both Ras2 and Pde1 act in the RAS/cAMP pathway, which plays a major role in the control of metabolism, stress resistance, and cell proliferation (Broach, 1991). In brief, Ras2 is required to induce high cAMP levels and thereby activate the RAS/cAMP pathway, whereas Pde1 maintains basal cAMP levels by degrading cAMP. When Pde1 activity is disrupted by a loss-of-function mutation in PDE1, the mutant Pde1 protein can no longer catalyze cAMP degradation, so cAMP remains high enough to support normal cell proliferation despite the primary temperature-sensitive ras2 mutation; the pde1 mutation therefore suppresses ras2. Other suppressor relationships in this pathway have also been identified, such as those between RAS2 and CYR1 and between RAS2 and PDE2 (Mitsuzawa et al., 1989; Sass et al., 1986). Accumulating such relationships helps to order the pathway and to understand its function(s).
Class 3 suppressors can reveal relationships between the pathway that contains the primary mutation and a different pathway that carries the suppressor mutation. Suppressors in this class may modify the regulation of a pathway with a related function, or change the function of a pathway unrelated to the primary pathway. For instance, in Escherichia coli, a suppressor mutation in the lactose permease allows maltose transport in a maltose-permease-deficient strain, even though wild-type lactose permease cannot transport maltose. Thus, Class 3 suppressors can also provide insight into gene function.
In this study, we investigate suppressor genetic relationships in yeast haploid deletion strains generated by the Saccharomyces Genome Deletion Project (Winzeler et al., 1999). We sequenced the genomes of 48 yeast haploid deletion strains corresponding to 26 genome integrity genes (both mating types were sequenced for most of these genes) using the Illumina HiSeq 2000 platform. Importantly, high-throughput sequencing allows us to reveal the mutational spectrum produced by unintentional laboratory evolution, including any interesting structural variations that may have arisen in the strains. Because the genome-wide mutation spectrum of the deletion strains is largely unknown, we analyzed the comprehensive mutation spectrum, including structural variations, point mutations, and copy number variations (CNVs), as well as candidate suppressor mutations across the genome.
The possible sequencing outcomes fall along a continuum between two extremes. At one extreme, the deletion strains harbor many mutations and a diverse mutational spectrum, because each has lost a genome integrity gene and more than 10 years have passed since the strains were generated. In this case, biological insights derived from experiments with genome integrity gene deletion strains may need to be re-examined, because mutations accumulated in the strains could have affected the experimental data. However, the accumulated mutations would themselves be valuable for studying the effects of mutations, because investigating genome-wide suppressors that arise in a strain bearing a specific null mutation could reveal a new network of genetic interactions and novel functional relationships between genes.
At the other extreme, the strains carry few mutations and have a limited mutational spectrum. In that case, we might find no suppressor mutations in the strains. However, this second outcome would support the reliability of published data obtained using genome integrity gene deletion strains. It would also suggest that the cellular network in the deletion strains is robust and buffers the effects of the deletion. Furthermore, because genome integrity gene deletion strains are expected to have some of the most severely affected genomes in the deletion collection, this outcome would also generally support experimental data obtained using other yeast deletion strains.
1.2 Suppressor genetic interactions to complement the current genetic interaction networks of the cell
S. cerevisiae is one of the best-studied model organisms because it grows rapidly, has a small genome, and is easily manipulated under normal laboratory conditions. A wide range of genetic engineering techniques has been developed for yeast, and many reagents are available for studying specific gene functions, such as deletion strain collections and systematic functional annotations (Sherman, 2002). Taking advantage of these features and reagents, many high-throughput studies have sought to map the cellular network on the basis of genetic and physical interactions. For example, large-scale physical interaction networks for budding yeast have been constructed using different methodologies (Krogan et al., 2006; Tarassov et al., 2008; Yu et al., 2008). In addition to these protein-protein interaction networks, much of the genome-wide genetic interaction network has been established from the fitness of 5.4 million yeast double mutants (Costanzo et al., 2010). A dosage-suppression genetic interaction network has also been described, using high-copy plasmids that overexpress a wild-type gene to restore a mutant phenotype (Magtanong et al., 2011). These genetic and physical interaction data have illuminated many aspects of biological processes in yeast. However, despite this extensive mapping of genetic and physical interactions, the global functional network of the cell has not been fully mapped.
Interestingly, a dosage suppression study showed that many of the dosage suppression interactions it identified did not overlap with other types of genetic and protein interactions (Magtanong et al., 2011). Integrating multiple genetic and physical networks has also been shown to be a powerful way to obtain a more complete picture of the cellular network (Zhu et al., 2008). Despite the importance of suppressor genetic interactions, the genome-wide effects of suppressor mutations accumulated in the yeast deletion strains are largely unknown. Systematic analysis of suppressor genetic interactions in the yeast deletion strains could therefore complement the global understanding of functional relationships and pathways in the cell. Although deletion strains for only 26 genome integrity genes were sequenced and analyzed in this project, I have also been involved in a collaborative project to reveal the genome-wide suppressor network in the yeast deletion strains (Appendix 7.1.1).
1.3 Importance of understanding the mechanism of genome integrity
Genome integrity genes play a crucial role in maintaining genome stability during complex cellular processes such as DNA replication and DNA repair. Loss of genome integrity in humans can cause serious diseases, including cancer and immunodeficiency (Shiloh, 2003). Mutations that cause genomic instability have also been shown to cause additional changes in the yeast genome (Gasch et al., 2001; Lehner et al., 2007). For example, a frameshift mutation that truncates the Msh3 protein causes microsatellite instability (Lehner et al., 2007). In another example, chromosome IV is often duplicated in haploid mec1 null strains overexpressing RNR1, owing to the selective pressure of the gene deletion: the strains are not viable without the chromosomal duplication (Gasch et al., 2001). We therefore expected the genome integrity gene deletion strains to have accumulated mutations and chromosomal rearrangements, owing to their reduced genome stability and/or their inability to repair DNA damage efficiently.
Many cellular processes involved in genome integrity are highly conserved between yeast and multicellular organisms, including humans, so findings in yeast are potentially relevant to human diseases such as cancer (Rouse and Jackson, 2002). For example, human homologs of some yeast chromosome instability (CIN) genes are known to be mutated during tumor formation, and synthetic lethal genetic interactions among yeast CIN genes were used to kill human cancer cells (Yuen et al., 2007). Although extensive studies have sought to uncover the global functional relationships between genes involved in genome integrity, these relationships are not yet completely understood. Further studies are therefore needed to improve the global understanding of the genome integrity network. In this study, we use whole-genome high-throughput sequencing to test whether the selective pressure imposed by deleting a genome integrity gene leads to adaptation through natural selection.
1.4 Convenience of high-throughput sequencing technology to analyze the multiple genomes of the yeast strains
Modern (“next-generation”) sequencing technologies have made genome sequencing and re-sequencing fast, economical, and reliable. Identifying genomic variation among populations, assembling whole genomes de novo, and measuring gene expression levels by RNA sequencing are excellent examples of what high-throughput sequencing has achieved (Nielsen et al., 2011).
The yeast genome is approximately 12.2 Mb (S288C genome assembly produced by the Saccharomyces Genome Database (SGD)), one of the smallest genomes among well-studied eukaryotic model organisms, whereas the human genome is about 3.2 Gb according to the Genome Reference Consortium (GRCh37.p11). Sequencing a relatively small genome has three main advantages. First, many different samples can be sequenced in parallel in a single run through multiplexing, with the number of samples matched to the number of sequence reads that one run can generate. Fragments from each strain are tagged with a corresponding DNA barcode (or combination of barcodes), so that all strains can be pooled for sequencing and every read can be attributed to its strain of origin. Second, and relatedly, deep coverage across the whole genome is much easier to obtain than for organisms with larger genomes. Deep coverage of many samples allows accurate downstream analyses because high coverage supports confident base calling, reducing the error and uncertainty associated with downstream results.
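The coverage argument above can be made concrete with simple arithmetic. The sketch below is an idealized illustration, not part of the pipeline used in this thesis: it assumes reads land uniformly at random (the Lander–Waterman/Poisson model), ignores unmapped and duplicate reads, and uses the ~12.2 Mb genome size given above together with the 101 + 73 bp read pairs described in Section 2.3. The run output passed to `max_samples` is a hypothetical number chosen only for illustration, and all function names are hypothetical.

```python
import math

GENOME_SIZE_BP = 12_200_000  # ~12.2 Mb S288C assembly
BASES_PER_PAIR = 101 + 73    # forward + reverse read lengths (Section 2.3)


def mean_coverage(read_pairs, genome_size=GENOME_SIZE_BP, bases_per_pair=BASES_PER_PAIR):
    """Expected mean depth = total sequenced bases / genome size."""
    return read_pairs * bases_per_pair / genome_size


def max_samples(pairs_per_run, target_depth, genome_size=GENOME_SIZE_BP,
                bases_per_pair=BASES_PER_PAIR):
    """Number of equally represented samples that fit in one run at `target_depth`."""
    pairs_needed = target_depth * genome_size / bases_per_pair
    return math.floor(pairs_per_run / pairs_needed)


def fraction_at_least(depth, k):
    """Poisson estimate of the fraction of the genome covered by >= k reads."""
    p_below = sum(math.exp(-depth) * depth ** i / math.factorial(i) for i in range(k))
    return 1.0 - p_below


if __name__ == "__main__":
    print(f"yeast, 1M pairs: {mean_coverage(1_000_000):.1f}x")
    print(f"human, 1M pairs: {mean_coverage(1_000_000, genome_size=3_200_000_000):.3f}x")
    # Hypothetical run yielding 100 million read pairs, 50x target per sample.
    print("samples per run:", max_samples(100_000_000, 50))
    print(f"fraction >= 10x at 50x mean: {fraction_at_least(50, 10):.6f}")
```

Under these assumptions, one million read pairs give roughly 14x mean coverage of the yeast genome, whereas the same reads spread over the ~3.2 Gb human genome would give well under 0.1x. Real coverage is less uniform than the Poisson model predicts, so observed fractions (as in Figure 3) are expected to be somewhat lower.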
Third, the haploid reference genome is publicly available at SGD, and sequencing haploid strains avoids the base-calling ambiguity and alignment errors that arise from heterozygosity in diploid or polyploid genomes. Downstream analysis of large sequencing datasets is therefore much simpler for haploid yeast, which facilitates accurate identification of mutations and structural variations among strains. Together with its rapid growth and easy handling, this makes yeast an inexpensive and well-suited model organism for whole-genome re-sequencing.
1.5 High-throughput sequencing helps to reveal suppressor mutations in strains
Using high-throughput sequencing, I identified new mutations and structural variations, such as large deletions and duplications, in the deletion strains. Among these, the computational pipeline identified promising sgs1 suppressor candidates, and I experimentally confirmed that the candidate mutations in the top3∆ and rmi1∆ strains were true suppressors. Although suppressor relationships for these gene pairs were already known (Chang et al., 2005; Gangloff et al., 1994), the suppressor mutations identified were novel. This result shows that high-throughput sequencing of yeast deletion strains can provide strong indications of suppressor mutations present in the strains, without intentional in vitro evolution or prior knowledge and mapping of the suppressors. Importantly, the candidate suppressor mutations can be experimentally confirmed. Our data also shed light on the mutational spectrum of different classes of genome integrity genes. The mutational spectrum identified in this study shows that the deletion strains generally harbor few mutations, especially indels, which would have more deleterious effects on fitness. Moreover, only these two suppressor genetic interactions (corresponding to three novel suppressor mutations) could be identified among all 48 strains sequenced. Therefore, our results generally support the reliability of the experimental data that have been accumulated using yeast deletion strains.
2 Materials and Methods
2.1 Media and Yeast strains
The yeast haploid deletion strains used in this study (Open Biosystems) are listed in Table 2. Yeast strains were grown in YPD (1% yeast extract, 2% peptone, and 2% glucose, with or without 2% agar) or in SC-Leu. Without MMS, SC-Leu contained 0.2% drop-out mix, 2% glucose, and 0.67% yeast nitrogen base without amino acids (with ammonium sulfate); with MMS, it contained 0.2% drop-out mix, 2% glucose, 0.17% yeast nitrogen base without amino acids and without ammonium sulfate, and 0.1% L-glutamic acid monosodium salt. Sporulation medium contained 2% agar, 1% potassium acetate, 0.1% yeast extract, 0.05% glucose, and 0.01% amino acid supplement (histidine, leucine, and uracil). Pre-sporulation medium contained 5% glucose, 3% nutrient broth, 1% yeast extract, and 2% agar.
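Because every medium above is specified as percentages, preparing a batch reduces to simple arithmetic. The sketch below assumes that all percentages are weight/volume (grams per 100 mL), which is the usual convention for yeast media but is an assumption here; the two recipe dictionaries transcribe YPD plates and sporulation medium from the text, and `grams_needed` is a hypothetical helper.

```python
"""Convert percent (w/v) media recipes into grams for a chosen volume.

Assumption: every percentage is weight/volume, i.e. grams per 100 mL.
"""
from typing import Dict

# Recipes transcribed from Section 2.1 (percent w/v).
YPD_PLATES = {
    "yeast extract": 1.0,
    "peptone": 2.0,
    "glucose": 2.0,
    "agar": 2.0,
}

SPORULATION = {
    "agar": 2.0,
    "potassium acetate": 1.0,
    "yeast extract": 0.1,
    "glucose": 0.05,
    "amino acid supplement (His, Leu, Ura)": 0.01,
}


def grams_needed(recipe: Dict[str, float], volume_ml: float) -> Dict[str, float]:
    """Return grams of each component for `volume_ml` of medium.

    A p% (w/v) medium contains p grams per 100 mL, so the mass needed is
    p * volume_ml / 100.
    """
    if volume_ml <= 0:
        raise ValueError("volume_ml must be positive")
    return {name: pct * volume_ml / 100.0 for name, pct in recipe.items()}


if __name__ == "__main__":
    for name, grams in grams_needed(YPD_PLATES, 1000).items():
        print(f"{name:>15}: {grams:6.2f} g")
```

For example, 1 L of YPD plates needs 10 g yeast extract and 20 g each of peptone, glucose, and agar.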
2.2 High-throughput sequencing
S. cerevisiae genomic DNA (gDNA) from 48 strains (Table 2) was sheared to ~300 bp fragments with a Covaris S2 (Covaris, Inc., Waltham, MA) following the manufacturer's guidelines. Sheared DNA was then end-polished (End-it Repair Mix, Epicentre) and 3'-adenylated (Klenow exo-, NEB). Illumina-compatible, amplification-free adapters were designed with 8-bp multiplexed index tags. For each strain, 500 ng of library gDNA was ligated to the index-tagged adapters (Kozarewa et al., 2009), with two replicates per strain, yielding 98 total samples. The adapter-ligated libraries were quantified by qPCR (KAPA Library Quantification Kit) and pooled so that each sample was evenly represented. All sequencing was performed on an Illumina HiSeq 2000 instrument according to the manufacturer's recommended guidelines.
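Pooling works only because each read's 8-bp index can be mapped back to its sample. The sketch below shows one common way to do this, assuming (hypothetically) that an observed index is accepted when it lies within one mismatch of exactly one expected tag; the mismatch tolerance actually used for these libraries is not specified here, and the sample names and index sequences are made up for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("index sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_index(observed: str, sample_indexes: Dict[str, str],
                 max_mismatches: int = 1) -> Optional[str]:
    """Return the sample whose index tag is uniquely closest to `observed`.

    Returns None if no tag is within `max_mismatches`, or if two tags tie.
    """
    ranked = sorted((hamming(observed, tag), name) for name, tag in sample_indexes.items())
    best_dist, best_name = ranked[0]
    if best_dist > max_mismatches:
        return None
    if len(ranked) > 1 and ranked[1][0] == best_dist:
        return None  # ambiguous between two samples
    return best_name


def demultiplex(observed_indexes: Iterable[str], sample_indexes: Dict[str, str],
                max_mismatches: int = 1) -> Counter:
    """Tally reads per sample; reads that cannot be resolved go to 'unassigned'."""
    counts = Counter()
    for observed in observed_indexes:
        name = assign_index(observed, sample_indexes, max_mismatches)
        counts[name if name is not None else "unassigned"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical 8-bp tags for two replicate libraries of one strain.
    samples = {"strain1_rep1": "ACGTACGT", "strain1_rep2": "TGCATGCA"}
    reads = ["ACGTACGT", "ACGTACGA", "TGCATGCA", "GGGGGGGG"]
    print(demultiplex(reads, samples))
```

Rejecting ties matters when index tags are similar in sequence: a read that is equally close to two tags cannot be attributed safely to either sample, so discarding it is preferable to contaminating one replicate with the other.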
2.3 Computational pipeline for Illumina sequencing data analyses
Sequencing data from the Illumina HiSeq platform (paired-end reads of 101 bp forward and 73 bp reverse, with insert lengths of 200-400 bp) were processed using the computational pipeline that I developed (Figure 1).
First, the sequencing reads were aligned to the S288C reference genome from SGD using the Burrows-Wheeler Aligner (BWA) (Li and Durbin, 2010). Next, single nucleotide variants (SNVs) and micro-indels were identified using the Sequence Alignment/Map tools (SAMtools) (Li et al., 2009). Custom shell/Perl scripts were used to perform CNV and amino acid substitution analyses. CNV was estimated from the read coverage of each strain in 200 bp windows. Window coverage was first rescaled so that the median across all windows of each strain was 1, and then rescaled so that the median of each window across all strains was 1. In theory, a single-copy gene, a deleted gene, and a duplicated gene therefore have relative copy numbers of 1, 0, and 2, respectively.
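The two-step median normalization described above can be sketched as follows. This is a minimal Python illustration, not the custom shell/Perl scripts actually used; it assumes coverage has already been averaged into 200 bp windows and arranged as one list per strain, and it interprets the second step as dividing each window by its median across strains. Windows whose cross-strain median is zero are reported as NaN. The function name and demo data are hypothetical.

```python
from statistics import median


def relative_copy_number(coverage):
    """Two-step median normalization of windowed read depth.

    coverage: list of per-strain lists; coverage[s][w] is the mean read depth
    of strain s in 200 bp window w (all strains share the same windows).
    Returns a matrix of the same shape holding relative copy number.
    """
    n_windows = len(coverage[0])
    if any(len(row) != n_windows for row in coverage):
        raise ValueError("all strains must have the same number of windows")

    # Step 1: divide each strain by its own median, removing differences in
    # overall sequencing depth between strains.
    scaled = []
    for row in coverage:
        strain_median = median(row)
        if strain_median <= 0:
            raise ValueError("a strain has zero median coverage")
        scaled.append([depth / strain_median for depth in row])

    # Step 2: divide each window by its median across strains, removing
    # window-specific biases shared by all strains.
    result = [[0.0] * n_windows for _ in scaled]
    for w in range(n_windows):
        window_median = median(row[w] for row in scaled)
        for s, row in enumerate(scaled):
            result[s][w] = row[w] / window_median if window_median > 0 else float("nan")
    return result


if __name__ == "__main__":
    # Hypothetical depths: the third strain carries a duplication in the
    # fourth window and a deletion in the sixth window.
    demo = [
        [10, 20, 15, 30, 25, 12],
        [20, 40, 30, 60, 50, 24],
        [15, 30, 22.5, 90, 37.5, 0],
    ]
    for row in relative_copy_number(demo):
        print([round(x, 2) for x in row])
```

Using medians rather than means keeps a few strongly amplified or deleted windows, or a single aberrant strain, from shifting the baseline against which copy number is judged.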
Genome tracks for CNVs, per-base read coverage, genomic regions without coverage, structural variations, SNVs/micro-indels, and amino acid substitutions were also generated using custom Perl scripts and visualized using ChromoZoom (Table 2) (Pak and Roth, 2013). In addition to the default gene tracks, nine tracks were generated, listed here from top to bottom:
1. Relative CNV - CNV was calculated using the method above and colored in light green.
2. Read depth – This track shows actual read depth for each nucleotide in the genome.
Read depth is colored in dark green.
3. Genomic regions without coverage – Black bands in this track indicate regions with no coverage, i.e., regions to which no reads were mapped. The track may therefore indicate possible breakpoints and deletions. However, a band can also reflect sequencing or mapping bias rather than a true absence of DNA. Gene names are shown beside black bands that overlap genes.
4. Structural variants – Structural variants detected using Pindel (version 0.2.4t) (Ye et al., 2009). Pindel can detect insertions, deletions, inversions, and tandem duplications. Pindel detected some auxotrophy markers, inversions, and indels, although it missed some query gene deletions. Each detected mutation is described as "reference | alternate bases (mutation) | number of supporting reads | mutation type (deletion (DEL), insertion (INS), inversion (INV), replacement (RPL), or tandem duplication (DUP)) | gene name and Open Reading Frame (ORF) name, if applicable |". Reference and alternate bases are omitted for mutations longer than 100 bp.
5. SNVs/micro-indels passing the stringent threshold (Q > 17, read depth >= 5, and >= 90% of high-quality bases supporting the alternate base)
6. SNVs/micro-indels passing the permissive threshold (Q >= 10, read depth >= 3, and > 50% of high-quality bases supporting the alternate base)
7. Amino acid substitutions/frameshifts caused by SNVs/micro-indels passing the stringent threshold
8. Amino acid substitutions/frameshifts caused by SNVs/micro-indels passing the permissive threshold
9. Mutations identified in parental strains (BY4741/BY4742)
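The two SNV/micro-indel filters in tracks 5 and 6 can be expressed as a small function. The following is a minimal Python sketch, not the original Perl code. It assumes the quality, read depth, and fraction of high-quality bases supporting the alternate allele have already been extracted for each variant call. It uses the cut-offs exactly as listed in tracks 5 and 6, passed as parameters so they can be adjusted.

```python
def classify_variant(qual, depth, alt_fraction,
                     stringent=(17, 5, 0.90), permissive=(10, 3, 0.50)):
    """Assign a variant call to the stringent tier, the permissive tier, or neither.

    qual: Phred-scaled quality of the call
    depth: read depth at the position
    alt_fraction: fraction of high-quality bases supporting the alternate base
    Defaults follow tracks 5 and 6: Q > 17, depth >= 5, >= 90% (stringent);
    Q >= 10, depth >= 3, > 50% (permissive).
    """
    s_qual, s_depth, s_frac = stringent
    p_qual, p_depth, p_frac = permissive
    if qual > s_qual and depth >= s_depth and alt_fraction >= s_frac:
        return "stringent"
    if qual >= p_qual and depth >= p_depth and alt_fraction > p_frac:
        return "permissive"
    return None  # below the permissive threshold


if __name__ == "__main__":
    # 12 alternate vs 4 reference reads gives an alternate fraction of 0.75;
    # the quality value of 30 is hypothetical.
    print(classify_variant(qual=30, depth=16, alt_fraction=12 / 16))  # permissive
```

The stringent test is applied first, so a call that passes both filters is reported once as stringent. This matches how the permissive set is described elsewhere as calls passing the permissive but not the stringent threshold.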
Figure 1. Computational workflow for processing Illumina sequencing reads
This diagram describes the computational pipeline to process Illumina sequencing reads. First,
Illumina HiSeq 2000 generated billions of DNA short reads. Then, the reads were aligned to the
reference genome using BWA. Next, SNVs and micro-indels were identified using SAMtools
followed by CNV analysis and amino acid substitution analysis using custom Perl scripts. The
results were visualized with ChromoZoom and Cytoscape.
2.4 Simulation of SNV distribution
To investigate whether the SNVs detected in the strains showed any distributional trends, the distribution of the observed SNVs was compared with that of randomly generated SNVs.

First, the genome integrity genes were categorized into six functional groups based on Gene Ontology (GO) annotations: Replication Fork Progression (RFP), Junction Resolution, Mismatch Repair, Checkpoint Regulation, DNA Catabolism, and Base Excision Repair (BER). Together with a seventh group, All (all SNVs identified among the strains with the stringent threshold), this gave seven groups. Then, the number of SNVs of each type (A to T, C to T, etc.) was tallied for each group (Table 1).

Next, the same numbers of each type of SNV were randomly generated using custom Perl scripts and analyzed using the computational pipeline. For example, the Mismatch Repair group has 15 G to A mutations, so the simulated data contain 15 G to A mutations placed at random positions across chromosomes I to XVI. This step was repeated for A to T mutations, C to T mutations, and so on. The simulation was performed 1000 times for each group.

From the simulated and observed data, three quantities were calculated: the fraction of SNVs in coding sequence (CDS), the fraction of non-synonymous SNVs in CDS, and the fraction of deleterious non-synonymous mutations in CDS. In addition, the 95% confidence interval (CI) for each group was computed and is shown as error bars in Figures 12-15. A single p-value for each group was calculated from the simulated data and adjusted using the Bonferroni correction.
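The core of this simulation can be sketched in Python. This is a minimal illustration under stated assumptions, not the original Perl scripts, and it covers only the CDS-fraction statistic. Each simulated SNV is placed uniformly at random among positions whose reference base matches the observed type; the text does not say whether the reference base was matched, so this is an assumption. CDS intervals are given as hypothetical 0-based, half-open (chrom, start, end) tuples. The one-sided empirical p-value uses a +1 correction, which is also an assumption rather than the documented method.

```python
import random


def positions_by_base(genome):
    """Index every genomic position (0-based) by its reference base."""
    index = {}
    for chrom, seq in genome.items():
        for pos, base in enumerate(seq.upper()):
            index.setdefault(base, []).append((chrom, pos))
    return index


def in_cds(chrom, pos, cds):
    """True if (chrom, pos) lies in any half-open CDS interval [start, end)."""
    return any(c == chrom and start <= pos < end for c, start, end in cds)


def simulate_cds_fraction(genome, cds, type_counts, n_sim=1000, seed=0):
    """Fraction of randomly placed SNVs that fall in CDS, for each replicate.

    type_counts maps (ref, alt) -> number of observed SNVs of that type,
    e.g. {("G", "A"): 15, ...}. Each simulated SNV is placed at a uniformly
    random position whose reference base matches `ref` (an assumption).
    """
    rng = random.Random(seed)
    index = positions_by_base(genome)
    fractions = []
    for _ in range(n_sim):
        hits = total = 0
        for (ref, _alt), n in type_counts.items():
            for chrom, pos in rng.choices(index[ref], k=n):
                hits += in_cds(chrom, pos, cds)
                total += 1
        fractions.append(hits / total)
    return fractions


def empirical_p_lower(observed, simulated):
    """One-sided empirical p-value that `observed` is lower than expected."""
    k = sum(s <= observed for s in simulated)
    return (k + 1) / (len(simulated) + 1)


def bonferroni(p, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p * n_tests)


if __name__ == "__main__":
    # Toy genome: the first four bases (all G) are coding.
    genome = {"chrI": "GGGGAAAA"}
    cds = [("chrI", 0, 4)]
    sims = simulate_cds_fraction(genome, cds, {("G", "A"): 2, ("A", "T"): 2}, n_sim=100)
    print(sims[0])  # 0.5: both G>A SNVs land in CDS, both A>T SNVs do not
    print(empirical_p_lower(0.25, sims))
```

The linear scan in `in_cds` keeps the sketch short. For the full yeast genome, a sorted interval list with binary search, or a precomputed per-base CDS mask, would be much faster.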
Function Genes Number of SNVs
Base Excision Repair MAG1,NTG1,RAD2,XRS2 15
Checkpoint Regulation CSM3,ESC2,MRC1,SGS1 19
DNA Catabolism MAG1,MSH2,NTG1,RAD1,SGS1 67
Junction Resolution MSH2,RAD5,RMI1,SGS1,TOP3 67
Mismatch Repair MLH2,MSH2,MSH4,RAD1 61
Replication Fork Progression ASF1,CSM3,DIA2,RAD5 37
Table 1. Categories of genome integrity genes for the simulation of SNV distribution
16 genome integrity genes were categorized into 6 groups based on GO annotations. The DNA Catabolism, Junction Resolution, and Mismatch Repair groups all include MSH2. Because the msh2∆ strains carry many more SNVs than the other strains, this increased the number of SNVs in these three groups relative to the others.
2.5 Sanger sequencing
Sanger sequencing was used to confirm specific SNV/micro-indels that were identified
by Illumina sequencing and to validate suppressor mutations in dissected spores. BLAST and Primer3 were used to design specific PCR primers that avoid non-specific amplification. The remaining purified gDNA prepared for Illumina sequencing was used to amplify the target DNA fragments for spot-checking. To validate candidate suppressor mutations, colony PCR was performed on individual spores to amplify the DNA fragments containing the candidate mutations. The strain AB972 was used as the negative control for this experiment. In both cases, PCR products were cleaned up enzymatically (Exonuclease I and Antarctic Phosphatase, NEB), and DNA concentrations were measured using the Quant-iT PicoGreen assay (Invitrogen). The PCR samples were Sanger sequenced by the TCAG DNA Sequencing Facility.
2.6 Confirmation of candidate suppressor mutations using MoBY-
ORF 2.0 Transformation
Each candidate strain was transformed with the corresponding Molecular Barcoded Yeast Open Reading Frame (MoBY-ORF) 2.0 plasmid using standard methods (Magtanong et al., 2011). These high-copy plasmids carry the wild-type allele of the suppressor candidate gene. Spot assays (1:5 serial dilutions) were performed on SD –Leu plates incubated at 30 °C.
3 Results
3.1 High-throughput sequencing of genome integrity gene deletion
strains and its data analysis detects unique SNVs
Forty-eight strains were sequenced using the Illumina HiSeq 2000 platform and analyzed, comprising the MATa strain, the MATα strain, or both for each of 26 genome integrity gene deletions. The S288C sequence from SGD was used as the reference genome. After the reads for each strain were aligned to it, all 48 strains achieved more than 15X coverage (Figure 2 and Figure 3). Following read mapping, SNVs and micro-indels were identified for each strain and the results were summarized for the 48 strains (Figures 4-6). Because many of the deletion strains were derived from BY4741 and BY4742, any SNVs and micro-indels found in either parental strain were removed. In addition, SNVs and micro-indels identified in more than one deletion strain were removed. The SNVs and micro-indels considered here are therefore unique to individual strains.
In total, 235 SNVs passed a stringent threshold with three criteria:

- Phred score >= 17 (a logarithmic measure of the probability of a base-calling error)
- read depth >= 4 (the number of reads covering the position)
- >= 90% of high-quality reads supporting the mutation

A further 242 SNVs passed a permissive threshold but not the stringent one (Figure 4). The permissive threshold requires a Phred score of at least 10, a read depth of at least 3, and more than 50% of high-quality reads supporting the mutation. On average, 4.9 and 5.1 SNVs were found per strain with the stringent and permissive thresholds, respectively. Notably, 42 of the 235 stringent SNVs came from the msh2∆ MATα strain, which lacks the Msh2 mismatch repair protein (Z-score 6). Another 6 stringent mutations were found in the msh2∆ MATa strain (Z-score 0.18).
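As a reference point for these quality cut-offs, the standard Phred scale relates a quality score Q to the probability P_err that the call is wrong. The following worked conversion assumes that standard definition:

```latex
Q = -10 \log_{10} P_{\text{err}}
\quad\Longleftrightarrow\quad
P_{\text{err}} = 10^{-Q/10}

Q = 17:\ P_{\text{err}} = 10^{-1.7} \approx 0.020
\qquad
Q = 10:\ P_{\text{err}} = 10^{-1} = 0.10
```

Under this definition, the stringent cut-off corresponds to an estimated error probability of roughly 2% per call, compared with 10% for the permissive cut-off.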
Regarding micro-indels, 40 and 22 micro-insertions and 117 and 48 micro-deletions were identified with the stringent and permissive thresholds, respectively. Again, most of these were found in the msh2∆ strains (Figure 5 and Figure 6): approximately 53% and 45% of the micro-insertions, and 77% and 85% of the micro-deletions, identified with the stringent and permissive thresholds. Each of the other strains carried zero to four micro-insertions and zero to four micro-deletions. Excluding the msh2∆ strains, the mean numbers of micro-insertions and micro-deletions per strain (both thresholds combined) were 0.67 and 0.74, respectively. The deletion strains were presumably stored and propagated under the same environmental conditions. Given this, the high micro-indel rates in the msh2∆ strains suggest that the Msh2 protein plays a very important role in repairing or preventing micro-indels.
Next, 153 of the 235 stringent SNVs and 114 of the 242 permissive SNVs were found in CDS regions. Only 2 (stringent) and 3 (permissive) SNVs were detected in intronic regions. For the SNVs in CDS, the non-synonymous mutations in each strain were then enumerated (Figure 7). In total, 100 and 58 non-synonymous SNVs were found with the stringent and permissive thresholds, respectively.

To further narrow down candidate suppressor mutations, deleterious non-synonymous mutations were predicted using three tools:

- the BLOSUM80 score (Henikoff and Henikoff, 1992), a log-odds score that indicates the likelihood of a particular amino acid substituting for another at the corresponding position in a homologous protein
- PolyPhen-2 (Adzhubei et al., 2010)
- Provean (Choi et al., 2012)

Under BLOSUM80, non-synonymous mutations with a negative score were considered deleterious, as were start codon, nonsense, and read-through mutations. Non-synonymous mutations classified as deleterious by PolyPhen-2 and Provean were also considered (Figures 8-10). PolyPhen-2 calculated prediction scores for 132 (~83.5%) of the non-synonymous mutations, whereas Provean calculated scores for all of them. As a result, 89, 66, and 53 deleterious SNVs were predicted with BLOSUM80, PolyPhen-2, and Provean, respectively. This is equivalent to only 1.9, 1.4, and 1.1 deleterious SNVs per strain. Of these, 30 SNVs were predicted as deleterious by all three methods (Figure 11).

To test whether the frequency of deleterious mutations differs between unique and common non-synonymous mutations, PolyPhen-2 was applied to the 59 common non-synonymous mutations detected in the 48 strains. PolyPhen-2 predicted 13 of 51 non-synonymous mutations as deleterious; no prediction was made for the remaining 8. In summary, 66 of 132 (50%) unique and 13 of 51 (34%) common non-synonymous mutations were predicted as deleterious. There was no significant difference between the two groups (chi-squared test, P = 0.0855).

In addition, 12 nonsense mutations (9 with the stringent threshold and 3 with the permissive threshold) and 1 read-through mutation (permissive threshold) were identified among the strains. This enumeration of point mutations shows that the genome integrity gene deletion strains generally harbor only a small number of SNVs, and few or no micro-indels, that might affect gene function and fitness.
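The three deleterious-mutation rules and their intersection (the 30 SNVs shared by all methods) can be sketched as follows. This is a minimal Python illustration. It assumes each tool's score has already been computed and attached to the mutation. The PolyPhen-2 (> 0.5) and Provean (< -2.5) cut-offs are taken from the captions of Figures 9-11. A mutation without a score from a tool is simply not called by that tool, which is an assumption of this sketch. The mutation records in the usage example are hypothetical.

```python
LOF_EFFECTS = {"nonsense", "start_lost", "readthrough"}


def deleterious_calls(mutations):
    """Apply the three deleterious-mutation rules and intersect them.

    mutations: list of dicts with keys "id", "effect" (e.g. "missense",
    "nonsense", "start_lost", "readthrough") and optional numeric scores
    "blosum80", "pph2" and "provean" (None or absent if not available).
    Returns (calls, consensus): per-method sets of ids and their intersection.
    """
    calls = {"BLOSUM80": set(), "PolyPhen-2": set(), "Provean": set()}
    for m in mutations:
        blosum, pph2, provean = m.get("blosum80"), m.get("pph2"), m.get("provean")
        # BLOSUM80 rule: negative score, or start codon/nonsense/read-through change.
        if m["effect"] in LOF_EFFECTS or (blosum is not None and blosum < 0):
            calls["BLOSUM80"].add(m["id"])
        # PolyPhen-2 rule: probability of being damaging above 0.5.
        if pph2 is not None and pph2 > 0.5:
            calls["PolyPhen-2"].add(m["id"])
        # Provean rule: score below -2.5.
        if provean is not None and provean < -2.5:
            calls["Provean"].add(m["id"])
    consensus = calls["BLOSUM80"] & calls["PolyPhen-2"] & calls["Provean"]
    return calls, consensus


if __name__ == "__main__":
    example = [  # hypothetical mutations and scores, for illustration only
        {"id": "mut1", "effect": "missense", "blosum80": -3, "pph2": 0.92, "provean": -4.1},
        {"id": "mut2", "effect": "missense", "blosum80": 1, "pph2": 0.10, "provean": -0.4},
        {"id": "mut3", "effect": "nonsense"},
    ]
    calls, consensus = deleterious_calls(example)
    print(sorted(calls["BLOSUM80"]), sorted(consensus))  # ['mut1', 'mut3'] ['mut1']
```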
Query Strains Genome Tracks Query Deletion Structural Variations
asf1∆ MATα asf1∆ MATα Y chrI duplication
asf1∆ MATa asf1∆ MATa Y
csm3∆ MATα csm3∆ MATα Y Genes involved in cell wall maintenance are deleted/duplicated
csm3∆ MATa csm3∆ MATa Y
dia2∆ MATα dia2∆ MATα Y chrI duplication
dia2∆ MATa dia2∆ MATa Y
esc2∆ MATα esc2∆ MATα Y chrIII duplication, chrXII partial deletion
mag1∆ MATα mag1∆ MATα Y
mag1∆ MATa mag1∆ MATa Y
mgs1∆ MATα mgs1∆ MATα Y
mgs1∆ MATa mgs1∆ MATa Y
mlh2∆ MATα mlh2∆ MATα Y
mlh2∆ MATa mlh2∆ MATa Y
mrc1∆ MATα mrc1∆ MATα Y
mrc1∆ MATa mrc1∆ MATa Y
msh2∆ MATα msh2∆ MATα Y
msh2∆ MATa msh2∆ MATa Y
msh4∆ MATα msh4∆ MATα Y
msh4∆ MATa msh4∆ MATa Y
mus81∆ MATα mus81∆ MATα Y
mus81∆ MATa mus81∆ MATa Y
nej1∆ MATα nej1∆ MATα Y Telomeric region aberration
nej1∆ MATa nej1∆ MATa Y Telomeric region aberration
ntg1∆ MATα ntg1∆ MATα Y
ntg1∆ MATa ntg1∆ MATa Y
pso2∆ MATα pso2∆ MATα Y
pso2∆ MATa pso2∆ MATa Y Telomeric region aberration
rad1∆ MATα rad1∆ MATα Y
rad1∆ MATa rad1∆ MATa Y
rad2∆ MATα rad2∆ MATα Y Partial chrIV duplication (1182k-2182k)
rad2∆ MATa rad2∆ MATa Y Partial chrIV duplication (8840k-9820k)
rad30∆ MATα rad30∆ MATα Y
rad52∆ MATα rad52∆ MATα Y
rad52∆ MATa rad52∆ MATa Y
rad5∆ MATα rad5∆ MATα Y Telomeric region aberration
rad5∆ MATa rad5∆ MATa Y Telomeric region aberration
rmi1∆ MATα rmi1∆ MATα Y Partial chrIV duplication (1100k-1154k, 1163k-1205k)
rmi1∆ MATa rmi1∆ MATa Y
sgs1∆ MATα sgs1∆ MATα Contamination?
sgs1∆ MATa sgs1∆ MATa Y
shu1∆ MATα shu1∆ MATα Y
shu1∆ MATa shu1∆ MATa Y
slx5∆ MATa slx5∆ MATa Y
tdp1∆ MATα tdp1∆ MATα Y
tdp1∆ MATa tdp1∆ MATa Y
top3∆ MATα top3∆ MATα Y
top3∆ MATa top3∆ MATa Y
xrs2∆ MATa xrs2∆ MATa Y
Table 2. A list of query strains and their notable structural variations
All the strains sequenced using the Illumina HiSeq platform are listed. Their CNVs,
SNV/micro-indels, and non-synonymous mutations were visualized using ChromoZoom and the
hyperlinks are available in the second column. The third column indicates whether the query
deletion was confirmed based on the mapped reads. Notable structural variations identified in the strains are listed in the fourth column.
Figure 2. Median coverage for each of the deletion strains
For each of the 48 deletion strains, the median read depth across all genomic positions on chromosomes I to XVI was calculated and plotted. Among the strains, the minimum, mean, and maximum median coverages are 17, 26.8, and 54, respectively. The deletion strains on the x-axis are listed in alphabetical order.
Figure 3. Fraction of the genome covered at or above a given read depth
The fraction of the genome covered at a read depth of at least 10, 15, 20, and 25 was calculated for the 48 deletion strains and visualized as boxplots. For most deletion strains, nearly 100% of genomic positions had at least 10X coverage. About 25% of the strains achieved 25X for approximately 70 to 100% of genomic positions.
Figure 4. Numbers of total SNVs for each deletion strain
The number of SNVs was enumerated and plotted for each strain. 42 SNVs were found with the stringent threshold in the msh2∆ MATα strain, which lacks the mismatch repair gene MSH2. On average, approximately 10 SNVs (stringent and permissive combined) were detected per strain, as indicated by the black solid line.
Figure 5. Numbers of total micro-insertions for each deletion strain
The number of micro-insertions was enumerated and plotted for each strain. In total, 62 unique micro-insertions were identified in the strains. Among them, 18 and 7 micro-insertions with the stringent and permissive thresholds, respectively, were identified in the msh2∆ MATα strain. Also, 3 micro-insertions passed both permissive and stringent thresholds in the msh2∆ MATa strain.
Figure 6. Numbers of total micro-deletions for each deletion strain
The number of micro-deletions was counted and plotted for each strain. In total, 165 micro-deletions were identified among the strains. In the msh2∆ MATa strain, 17 and 19 micro-deletions were identified with the stringent and permissive thresholds, respectively. The corresponding numbers for the msh2∆ MATα strain were 73 and 22.
Figure 7. Numbers of non-synonymous mutations for each deletion strain
The number of non-synonymous mutations was plotted for each strain. 100 and 58 non-synonymous SNVs were detected with the stringent and permissive cut-offs, respectively. On average, 2.1 and 1.2 non-synonymous SNVs were detected per strain with the stringent and permissive cut-offs, respectively.
Figure 8. Numbers of deleterious non-synonymous mutations using BLOSUM80 matrix
The number of non-synonymous mutations with a negative BLOSUM80 score was plotted for each strain. In this graph, a deleterious mutation is defined as a non-synonymous mutation with a negative BLOSUM80 score. 63 and 39 deleterious mutations were detected with the stringent and permissive cut-offs, respectively. On average, only 1.31 and 0.81 deleterious non-synonymous mutations per strain were detected with the stringent and permissive cut-offs, respectively.
Figure 9. Numbers of deleterious non-synonymous mutations using PolyPhen-2
The number of deleterious non-synonymous mutations identified using PolyPhen-2 was plotted for each strain. In this graph, a deleterious mutation is defined as one with a PolyPhen-2 score above 0.5; the score is the classifier's estimated probability that the variant is damaging. 46 and 32 deleterious mutations were detected with the stringent and permissive cut-offs, respectively. On average, only 0.96 and 0.67 deleterious non-synonymous mutations per strain were detected with the stringent and permissive cut-offs, respectively.
Figure 10. Numbers of deleterious non-synonymous mutations using Provean
The number of deleterious non-synonymous mutations identified using Provean was plotted for each strain. In this graph, a deleterious mutation is defined as one with a Provean score below -2.5. 38 and 26 deleterious mutations were detected with the stringent and permissive thresholds, respectively. On average, only 0.80 and 0.54 deleterious non-synonymous mutations per strain were detected with the stringent and permissive thresholds, respectively.
Figure 11. Correlation of deleterious SNVs between BLOSUM80, PolyPhen2, and Provean
Pairs of prediction scores were plotted against each other:

(A) BLOSUM80 and PolyPhen-2 (PPH2) scores (r = -0.26)
(B) PPH2 and Provean scores (r = -0.70)
(C) BLOSUM80 and Provean scores (r = 0.33)

The shaded areas contain SNVs predicted as deleterious by both methods in each panel (BLOSUM80 <= -1, PPH2 > 0.5, and Provean < -2.5). (D) Venn diagram showing the numbers of deleterious mutations detected by each method. 30 SNVs were predicted as deleterious by all three methods.
3.2 CNV analysis detected large duplications and telomere aberration
in the deletion strains
CNV analysis based on the read coverage of each strain is very informative for validating the query gene deletions and detecting duplications. It indicates that the asf1∆ MATα, dia2∆ MATα, and esc2∆ MATα strains carry a whole-chromosome duplication. It also indicates that the rad2∆ MATa, rad2∆ MATα, and rmi1∆ MATα strains carry a partial chromosomal duplication (Table 2).

Both rad2∆ strains (MATa and MATα) showed a large duplication in chromosome IV, and in both cases the duplicated sequence was flanked by direct repeats. In the rad2∆ MATa strain, the duplicated region is approximately 1000 kb and is flanked by long terminal repeat (LTR) retrotransposons. The rad2∆ MATα strain showed a ~1000 kb duplication flanked by RPL35A and RPL35B, which encode paralogous components of the large (60S) ribosomal subunit.
All auxotrophy markers used in the deletion strains (his3∆, ura3∆, leu2∆, met17∆, and lys2∆) were detected. All query deletions were also confirmed except in the sgs1∆ MATα strain. In that strain, SGS1 was evenly covered but had a relative copy number of approximately 0.5, suggesting that the sample may have been mixed with another sample.
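The text does not specify an automated caller; the duplications, deletions, and intermediate copy numbers described above were read from the CNV tracks. As a hypothetical sketch of how such calls could be made from relative CNV values, the Python function below merges consecutive windows into segments. A window joins a segment if its value is at or above a duplication cut-off, or at or below a deletion cut-off. The cut-offs (1.5 and 0.25), the minimum run of three windows, and the 200 bp window default are illustrative assumptions.

```python
def call_cnv_segments(cnv, window=200, dup_min=1.5, del_max=0.25, min_windows=3):
    """Merge consecutive windows into duplication/deletion segments.

    cnv: relative copy-number values for consecutive windows on one chromosome.
    Returns (kind, start_bp, end_bp) tuples with 0-based, half-open coordinates
    for runs of at least `min_windows` windows. Cut-offs are illustrative.
    """
    def state(value):
        if value >= dup_min:
            return "DUP"
        if value <= del_max:
            return "DEL"
        return None

    segments = []
    run_state, run_start = None, 0
    # A trailing neutral value (1.0) acts as a sentinel that closes the last run.
    for i, value in enumerate(list(cnv) + [1.0]):
        s = state(value)
        if s != run_state:
            if run_state is not None and i - run_start >= min_windows:
                segments.append((run_state, run_start * window, i * window))
            run_state, run_start = s, i
    return segments


if __name__ == "__main__":
    track = [1, 1, 2, 2, 2, 1, 0, 0, 0, 0, 1]
    print(call_cnv_segments(track))  # [('DUP', 400, 1000), ('DEL', 1200, 2000)]
```

Intermediate values fall between the two cut-offs and are not called. An example is the ~0.5 seen for SGS1 in the sgs1∆ MATα strain. Such regions still need to be inspected in the genome tracks, since they can indicate a mixed sample rather than a clean deletion.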
In addition, CNV tracks in some strains, such as the nej1∆ and rad5∆ strains, show spikes at the telomeric regions. This may indicate that these regions lost stability and were elongated as a result of the gene deletion. Nej1, a regulator of non-homologous end joining, is involved in telomere maintenance, and deletion of NEJ1 has been reported to cause telomere repeat extension (Liti and Louis, 2003). Rad5 is a DNA helicase possibly involved in post-replication repair, and rad5∆ strains have also been reported to have longer telomeres (Gatbonton et al., 2006; Unk et al., 2010). This is consistent with the increased number of reads mapped to the telomeric regions and may indicate a telomere maintenance function for RAD5. The sequencing data therefore likely captured this phenomenon, demonstrating the usefulness of high-throughput sequencing for studying changes in telomere length.
3.3 A 44 bp deletion in SGS1 was detected in the rmi1∆ MATα strain
through SV analyses
To detect medium- to large-sized deletions, I computationally searched the coverage data files for chromosomal regions with zero coverage in each strain. Pindel was also used to detect possible structural variants, including deletions, insertions, inversions, and tandem duplications (Ye et al., 2009). The read depth analysis identified candidate medium- to large-sized deletions (Table 3), as well as the auxotrophy markers and query deletions. Excluding common mutations, Pindel detected 7 insertions and 6 deletions longer than 5 bp among the strains, although it failed to detect some known large deletions such as the query deletions.

Notably, both analyses found a 44 bp deletion in SGS1 in the rmi1∆ MATα strain (positions 642434-642477 on chromosome XIII), which was confirmed by PCR. The breakpoint was determined by de novo assembly of the sequence reads using ABySS (Simpson et al., 2009) and by Pindel. Because 44 bp is not a multiple of three, the deletion causes a frameshift in the SGS1 ORF that introduces a premature stop codon. The deletion is therefore a promising suppressor candidate, because sgs1 loss-of-function mutations are known to suppress the slow growth of rmi1∆ strains (Chang et al., 2005).
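The zero-coverage search can be sketched as a single pass over per-base depth records. This Python sketch assumes tab-separated chromosome, 1-based position, and depth lines that include zero-depth positions, as printed by `samtools depth -a`. The function name and the `min_len` parameter are illustrative, not the original scripts.

```python
def zero_coverage_runs(lines, min_len=1):
    """Yield (chrom, start, end) for runs of consecutive zero-depth positions.

    `lines` are tab-separated "chrom, pos, depth" records with 1-based
    positions (e.g. from `samtools depth -a`); start and end are inclusive.
    Runs shorter than `min_len` bases are skipped.
    """
    chrom = start = prev = None
    for line in lines:
        c, pos, depth = line.rstrip("\n").split("\t")[:3]
        pos, depth = int(pos), int(depth)
        if depth == 0 and c == chrom and prev is not None and pos == prev + 1:
            prev = pos  # extend the current zero-coverage run
            continue
        # Any other record closes the open run (if there is one).
        if start is not None and prev - start + 1 >= min_len:
            yield chrom, start, prev
        if depth == 0:
            chrom, start, prev = c, pos, pos  # open a new run
        else:
            chrom, start, prev = c, None, None
    if start is not None and prev - start + 1 >= min_len:
        yield chrom, start, prev


if __name__ == "__main__":
    records = [
        "chrXIII\t1\t5",
        "chrXIII\t2\t0",
        "chrXIII\t3\t0",
        "chrXIII\t4\t7",
        "chrI\t1\t0",
    ]
    print(list(zero_coverage_runs(records)))  # [('chrXIII', 2, 3), ('chrI', 1, 1)]
```

Because the input is streamed line by line, the whole genome never has to be held in memory. In practice, the output would then be intersected with ORF coordinates to produce a list like Table 3.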
Strain ORF Gene Non-covered bases PCR confirmation
rmi1∆ MATα YMR190C SGS1 47 Y
dia2∆ MATα YFR016C YFR016C 56 Not performed
top3∆ MATα YHR214C-E YHR214C-E 71 Not performed
esc2∆ MATα YLR157C-C YLR157C-C 132 Not performed
rmi1∆ MATa YLR159C-A YLR159C-A 132 Not performed
csm3∆ MATα YIR019C MUC1 171 Not performed
top3∆ MATα YHR216W IMD2 1400 Not performed
csm3∆ MATα YHR211W FLO5 2240 Not performed
Table 3. A list of candidate small to large deletions
Based on read depth, possible deletions were identified among the strains. Interestingly, the rmi1∆ MATα strain contains a deletion in SGS1. This deletion was detected by both Pindel and ABySS and was confirmed by PCR.
3.4 Attempt to validate promising suppressor candidates
Sanger sequencing was performed to validate SNVs and micro-indels detected by the computational pipeline (Table 4). I designed specific PCR primers with BLAST and Primer3 to avoid non-specific amplification.

Twelve SNVs and 1 micro-deletion that passed the stringent threshold were tested. All of these point mutations were identified in the corresponding deletion strains, and none was detected in the control strain.

Five SNVs that passed only the permissive threshold were tested, and one of them was ambiguous. This ambiguous C to A mutation was found at position 642939 on chromosome XIII in the top3∆ MATa strain, where C is the reference base and A is the mutated base. The Sanger chromatogram showed overlapping A and C peaks at this position. In the high-throughput sequencing data, 12 high-quality reads supported the mutated base while 4 supported the reference base at the corresponding position. The two sequencing results were consistent, suggesting that the sample was a mixture of two cell populations. The other 4 SNVs were unambiguous false positives, indicating base miscalls in the Illumina data.

In addition, 3 SNVs and 1 micro-insertion below the permissive threshold were tested, and all were false positives.
Strain Threshold Chr Position Ref Alt PCR
csm3∆ MATα stringent chr11 611659 T G TP
dia2∆ MATα stringent chr02 508369 A G TP
dia2∆ MATα stringent chr04 378336 T A TP
mlh2∆ MATα stringent chr13 151483 G C TP
msh2∆ MATα stringent chr11 543179 G A TP
rad30∆ MATα stringent chr04 818841 C A TP
rad52∆ MATα stringent chr04 1247936 C T TP
rad52∆ MATα stringent chr05 530552 C A TP
rmi1∆ MATα stringent chr13 642118 G T TP
shu1∆ MATα stringent chr11 407144 A T TP
tdp1∆ MATα stringent chr15 604080 C T TP
top3∆ MATα stringent chr05 478786 C A TP
top3∆ MATα stringent chr13 641702 ATTTTT ATTTT TP
csm3∆ MATα permissive chr07 180858 A C FP
esc2∆ MATα permissive chr11 392009 T G FP
mgs1∆ MATα permissive chr07 28243 C G FP
rad52∆ MATα permissive chr11 176265 G C FP
top3∆ MATα permissive chr13 642939 C A Ambiguous SNV
esc2∆ MATα below permissive chr15 554479 G T FP
mag1∆ MATα below permissive chr08 333427 G A FP
mag1∆ MATα below permissive chr14 48787 CGG CGGG FP
mgs1∆ MATα below permissive chr07 181125 G A FP
Table 4. List of SNVs and micro-indels tested using Sanger sequencing
12 SNVs and 1 micro-deletion that passed the stringent cut-off were tested by Sanger sequencing, and all were confirmed as true positives (TP). 5 SNVs that passed only the permissive cut-off were tested. One of them was ambiguous in the chromatogram, suggesting that the sample contained a mixture of cells with two different genotypes. The other four SNVs were confirmed as false positives (FP) due to homopolymer errors. Another 3 SNVs and one micro-insertion below the permissive threshold were tested and confirmed as FP.
3.5 Comparison of the observed SNV distribution with the simulated
distribution
To explore whether the SNVs identified in the deletion strains were neutral or selected, their distribution in the genome was compared with that of randomly distributed SNVs. All 235 SNVs were analyzed together. In addition, the 16 genome integrity genes were categorized into 6 groups based on gene function, to investigate whether gene function affects selection bias (Table 1). Only SNVs that passed the stringent threshold were considered, because SNVs that passed only the permissive threshold may have a high false positive rate. The total numbers of SNVs in the BER, Checkpoint Regulation, DNA Catabolism, Junction Resolution, Mismatch Repair, and RFP groups are 15, 19, 67, 67, 61, and 37, respectively (Table 1).
First, the ratio of transitions to transversions (Ts/Tv) was calculated for each group (Figure 12). The DNA Catabolism, Junction Resolution, and Mismatch Repair groups had lower Ts/Tv ratios than the other groups. Their ratios were very close to the value of 0.62 reported by Lynch et al. (2008), which is indicated by the dashed line. In contrast, the Ts/Tv ratio for all strains together is 1.18, approximately twice the reported value. Interestingly, the observed ratios for BER and Checkpoint Regulation appeared shifted from those of the other groups. However, their CIs were very broad, and the differences may not be significant after correction for multiple hypothesis testing.
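The Ts/Tv ratio counts substitutions between A and G or between C and T as transitions, and all other base substitutions as transversions. The following is a minimal Python sketch of the point estimate. It assumes SNVs are given as (reference, alternate) base pairs. The confidence intervals shown in Figure 12 are not reproduced here.

```python
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ts_tv_ratio(snvs):
    """Transition/transversion ratio for an iterable of (ref, alt) base pairs."""
    ts = tv = 0
    for ref, alt in snvs:
        ref, alt = ref.upper(), alt.upper()
        if ref == alt or {ref, alt} - set("ACGT"):
            raise ValueError(f"not a single-base substitution: {ref}>{alt}")
        if (ref, alt) in TRANSITIONS:
            ts += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            tv += 1  # purine<->pyrimidine
    return ts / tv if tv else float("inf")


if __name__ == "__main__":
    snvs = [("G", "A"), ("C", "T"), ("T", "C"), ("A", "T"), ("G", "C")]
    print(ts_tv_ratio(snvs))  # 1.5 (3 transitions / 2 transversions)
```

There are four possible transitions but eight possible transversions. An unbiased mutation process would therefore give a ratio of about 0.5, which is why values near 0.62 or above indicate a transition bias.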
Next, the fraction of SNVs detected in CDS regions was calculated for each group and for the simulated data (Figure 13 and Figure 14). Interestingly, the observed fraction of SNVs in coding regions was significantly lower than in the simulations (P = 0.014). For the six function-based groups, the observed fractions fell below the 95% CI of the simulated data in every group except RFP. However, the CIs for the observed data were broad, and the small number of SNVs limited our statistical power. No difference between the observed and simulated data was significant after Bonferroni correction (Figure 13 and Figure 15).

These results suggest that the distribution of mutations might have been biased by selective pressure. One possible reason is that non-synonymous mutations tend to have more deleterious effects on fitness, so cells harboring them are removed from the population. However, the function-based groups did not provide statistical support for this hypothesis after Bonferroni correction.

To investigate the hypothesis further, the fraction of deleterious mutations based on BLOSUM80 scores was calculated (Figure 16). Overall, the fraction of deleterious mutations was slightly lower than in the simulated data. Again, the small number of SNVs limited the statistical power to detect a significant difference. PolyPhen-2 and Provean were not used for the simulation because of their computational load. These results suggest that selection might have affected the distribution of SNVs in the genome integrity gene deletion strains at the level of non-synonymous mutations.
Figure 12. Ratio of transitions to transversions in the seven groups
The ratio of transitions to transversions was calculated for the seven groups. The error bars show the 95% CI for the ratio. The dotted line indicates the reported transition-to-transversion ratio (0.62). The ratios for genes in the DNA Catabolism, Junction Resolution, and Mismatch Repair groups are slightly lower than the reported value. The ratios for BER and Checkpoint Regulation are approximately 2.5, although their CIs are very broad. In total, the ratio is 1.18, approximately twice the reported value.
Figure 13. Fractions of SNVs and non-synonymous mutations in CDS are significantly lower than in the simulated data
The fractions of SNVs and non-synonymous mutations in CDS were plotted for the observed and simulated data (n = 1000). (A) The fraction of SNVs in CDS detected in the 48 strains was 0.65 (blue vertical line), significantly lower than the fraction in the simulated data (P = 0.014). (B) The observed fraction of non-synonymous mutations in CDS was 0.69, also significantly lower than in the simulated data (P = 0.018).
Figure 14. Fractions of SNVs in CDS in seven groups
The fraction of SNVs in CDS was calculated for the seven groups. The blue points and error bars show the observed data. The red points and error bars show the mean of 1000 simulations. The error bars show the 95% CI. The simulated data were generated by randomly distributing the observed mutations, so the number of each type of mutation is the same in the observed and simulated data. The p-value for the group containing all SNVs (All) was 0.014, as indicated by *; the fraction of SNVs in CDS was thus significantly lower than in the simulated data. No significant difference was observed for the six function-based groups after Bonferroni correction.
Figure 15. Fraction of non-synonymous mutations in CDS
The fraction of non-synonymous mutations in CDS was calculated for the seven groups. The blue error bars for the observed data were calculated from the proportion of non-synonymous mutations. The red error bars show the 95% CI of the mean over 1000 simulations. The fraction of non-synonymous mutations in CDS for the 48 strains was 0.69, significantly lower than in the simulated data (P = 0.018), as indicated by *. No significant difference was observed for the six function-based groups after Bonferroni correction.
Figure 16. Fraction of deleterious SNVs in CDS
The fraction of deleterious mutations (BLOSUM80 score <= -1) in CDS was calculated for the seven groups. The blue points and error bars show the observed data. The red points and error bars show the simulated data. The error bars show the 95% CI. The CI for the observed data was broad owing to the small number of SNVs. It is therefore not possible to conclude from the current dataset that there is selection bias.
3.6 The SNV-deletion strain network highlighted sgs1 mutations as
promising suppressor candidates
To find promising suppressor candidates, relationships between the deletion strains and
SNVs/micro-indels were visualized using Cytoscape (Figure 17) (Smoot et al., 2011). The
deleterious SNVs predicted by BLOSUM80, PolyPhen-2 and Provean, nonsense mutations and
frameshift mutations were visualized in the network. Also, I integrated genetic and physical
interaction data from the Biogrid database and GO slim annotations into the network.
Mutations in sgs1 within the top3∆ and rmi1∆ strains appear to be promising suppressor
candidates because the suppressor genetic interactions had been reported (Chang et al., 2005;
Gangloff et al., 1994). Although the gene relationships were known, the specific mutations
identified here had not been reported as suppressors. The pipeline detected a 44 bp (642434 –
642477 in ChrXIII) deletion in sgs1 that caused a truncation mutation due to a frameshift in the
rmi1∆ MATα deletion strain. Also, a nonsense mutation (G to T at position 642118) was
detected in sgs1 in the rmi1∆ MATa strain. Similarly, a T base deletion of one of five Ts at
position 641703 – 641707, which results in Sgs1 protein truncation due to a premature stop
codon, and a non-synonymous C to A mutation at position 642939, which causes a deleterious
amino acid substitution, were observed in the top3∆ MATa and MATα strains, respectively.
These mutations were all confirmed by PCR or Sanger sequencing although the mutation in the
top3∆ MATa strains showed ambiguity. This indicates that sgs1 mutation is a highly effective
suppressor to increase fitness in the rmi1∆ and top3∆ strains because both of the rmi1∆ and
top3∆ MATa and MATα strains have unique mutations in sgs1. The deletion of RAD51, RAD52,
RAD54 or RAD55 in the top3∆ strain are also known to increase the fitness but the doubling
time for sgs1∆ top3∆ are the shortest among the double deletion strains (Shor et al., 2002). The
44
high frequency of sgs1 mutations in these strains has been previously reported as well,
supporting the results in this study (Chang et al., 2005; Gangloff et al., 1994).
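The frameshift calls above rest on simple arithmetic: an indel within a CDS shifts the reading frame unless its length is a multiple of three. A minimal sketch, assuming inclusive 1-based coordinates as quoted in the text; the helper names are hypothetical.

```python
def indel_length(start, end):
    """Length of a deletion given inclusive 1-based start and end coordinates."""
    return end - start + 1


def is_frameshift(length):
    """An indel inside a CDS shifts the reading frame unless its length
    is a multiple of 3."""
    return length % 3 != 0


# The sgs1 deletion in the rmi1∆ MATα strain spans 642434-642477 on ChrXIII.
print(indel_length(642434, 642477), is_frameshift(indel_length(642434, 642477)))
```

Both the 44 bp deletion and the single-T deletion (length 1) are frameshifts by this rule; whether the new frame reaches a premature stop codon depends on the downstream sequence, which this sketch does not model.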
We identified 18 additional mutations as potential suppressors based on genetic interactions and biological process (Table 5). For example, the dia2∆ MATα strain has a point mutation in mec1, which encodes an essential protein kinase. Triple deletion of DIA2, MEC1 and SML1 has been shown to cause synthetic lethality in yeast (Blake et al., 2006). As another example, a point mutation in mus81 was identified in the rad52∆ MATα strain. The two genes have a negative genetic interaction (Pan et al., 2006). Interestingly, deletion of RAD52 rescues the lethality of a mus81 sgs1 double mutant (Ii et al., 2011). The presence of these two candidate suppressor mutations was confirmed by Sanger sequencing.
Figure 17. A network describing the relationship between the deletion strains and SNVs/micro-indels
This network shows the relationships between the deletion strains and the genes that carry SNVs/micro-indels. Light-blue circles indicate deletion strains; the MATa and MATα strains are placed side by side. Green squares indicate genes with SNVs/micro-indels, and orange diamonds indicate essential genes with SNVs/micro-indels. Edge width corresponds to the number of predictors (BLOSUM80, PolyPhen-2 and Provean) that classify the mutation as deleterious, so widths are one, two or three; frameshift and nonsense mutations are assigned a width of three. In two cases, different deletion strains carry point mutations in the same gene; such genes are placed in the middle of the figure. SGS1 acquired point mutations in both the top3∆ MATa and MATα strains and also in the rmi1∆ MATa strain.
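The edge-width rule in Figure 17 can be written as a small scoring function. This is a sketch under stated assumptions: the BLOSUM80 cutoff (score ≤ -1) is taken from Figure 16, while the PolyPhen-2 and Provean calls are assumed to arrive as precomputed booleans; the function names are hypothetical.

```python
def is_blosum80_deleterious(score, threshold=-1):
    """BLOSUM80 substitution scores at or below -1 are treated as deleterious."""
    return score <= threshold


def edge_width(blosum80_score, polyphen2_deleterious, provean_deleterious,
               consequence="missense"):
    """Number of predictors calling the variant deleterious (0-3).
    Frameshift and nonsense mutations get the maximum width of three."""
    if consequence in ("frameshift", "nonsense"):
        return 3
    return (int(is_blosum80_deleterious(blosum80_score))
            + int(polyphen2_deleterious)
            + int(provean_deleterious))
```

A width of zero would mean no predictor flagged the variant; presumably such variants are simply not drawn, since the caption lists widths of one to three only.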
3.7 MoBY-ORF 2.0 plasmid transformation for validation of the candidate suppressors
MoBY-ORF 2.0 is a high-copy plasmid library in which each plasmid carries a single yeast ORF with its upstream and downstream sequences (Magtanong et al., 2011). To test whether specific candidate mutations were in fact suppressors, we transformed MoBY-ORF 2.0 plasmids, which presumably carry the wild-type allele, into the strains harboring the candidate suppressor mutations. If the growth rate of a transformed strain decreased, we would conclude that the candidate mutation is a recessive suppressor. As a known example, the growth defect of tps1∆ cells can be suppressed by a loss-of-function mutation in the hxk2 gene (Hohmann et al., 1993). Transformation of the tps1∆ strain with a plasmid carrying wild-type HXK2 causes lethality, whereas transforming the same plasmid into the wild-type strain does not affect the growth rate. I used this pair as the positive control for this experiment. I performed spot assays to test whether the transformants differed in growth rate on YPD and on YPD + methylmethanesulfonate (MMS) (0.02% and 0.0025%). I did not observe any obvious difference in growth rate between the transformants and controls (Table 5). One possible explanation is that the tested mutations were not suppressors, or were only very weak suppressors. It is also possible that the MoBY-ORF 2.0 plasmids I used did not contain the wild-type alleles. A further biological explanation is that the suppressor mutations were dominant, so that over-expression of the wild-type gene (recessive in this case) did not affect the fitness of the strains.
Index | Strain | Candidate gene | Candidate ORF | Threshold | Sanger confirmation | MoBY-ORF 2.0 | Tetrad analysis | Reason to choose as candidate suppressor
1 | csm3 MATα | SRL3 | YKR091W | Stringent | Y | Y | Y | Multiple overlapping genes genetically/physically interact with both SRL3 and CSM3
2 | dia2 MATα | SIR2 | YDL042C | Stringent | Y | Y | Y | Multiple overlapping genes genetically/physically interact with both DIA2 and SIR2
3 | dia2 MATα | MEC1 | YBR136W | Stringent | Y | N/A | Y | dia2 mec1 sml1 triple mutant is inviable; MEC1 is an essential gene
4 | mlh2 MATa | PIF1 | YML061C | Stringent | Y | Y | Y | NAB2 physically interacts with both MLH2 and PIF1
5 | msh2 MATα | DYN1 | YKR054C | Stringent | Y | N/A | LSR | Multiple overlapping genes genetically/physically interact with both MSH2 and DYN1; both are involved in chromosome movement
6 | rad52 MATa | MUS81 | YDR386W | Stringent | Y | Y | Y | rad52∆ mus81∆: synthetic growth defect
7 | rad52 MATα | BRR2 | YER172C | Stringent | Y | Y | Y | Multiple overlapping genes genetically/physically interact with both RAD52 and BRR2
8 | shu1 MATa | HCS1 | YKL017C | Stringent | Y | N/A | Y | SHU1 is involved in post-replication repair; HCS1 is a DNA helicase
9 | tdp1 MATα | ELG1 | YOR144C | Stringent | Y | Y | Y | SMT3 physically interacts with both ELG1 and TDP1
10 | top3 MATα | BEM2 | YER155C | Stringent | Y | N/A | Y | Multiple overlapping genes genetically/physically interact with both TOP3 and BEM2
11 | asf1 MATα | IRA1 | YBR140C | Stringent | Not performed | Y | LSR | IRA1 is an essential gene and is truncated
12 | dia2 MATα | CST6 | YIL036W | Stringent | Not performed | Y | Y | rad53∆ dia2∆: synthetic lethality (by RSA); RAD53 physically interacts with CST6
13 | dia2 MATα | SYC1 | YOR179C | Stringent | Not performed | Y | Y | Multiple overlapping genes genetically interact with DIA2 and SYC1
14 | mgs1 MATα | VPS34 | YLR240W | Stringent | Not performed | Y | Y | Multiple overlapping genes genetically interact with MGS1 and VPS34
15 | msh4 MATa | ATG19 | YOL082W | Stringent | Not performed | Y | Y | NAB2 physically interacts with both MSH4 and ATG19
16 | ntg1 MATα | GPB1 | YOR371C | Stringent | Not performed | Y | Y | GPB1 is truncated
17 | rad1 MATα | RAS2 | YNL098C | Stringent | Not performed | Y | Y | Multiple overlapping genes genetically/physically interact with both DIA2 and RAS2
18 | xrs2 MATa | GET3 | YDL100C | Stringent | Not performed | Y | Y | XRS2 has a negative genetic interaction with GET3
Table 5. A list of candidate suppressor mutations
The candidate suppressors were chosen based on genetic/physical interactions, biological processes, BLOSUM80 score and essentiality. MEC1, DYN1, HCS1 and BEM2 are not in the MoBY-ORF 2.0 library and are indicated as N/A. In the tetrad analysis column, LSR indicates that the msh2∆ MATα and asf1∆ MATα strains showed a low sporulation rate.
3.8 Tetrad analysis validated sgs1 suppressor mutations
I performed tetrad analysis for the deletion strains that carry suppressor candidates (Table 5). The strains were backcrossed to the parental strain and sporulated. I dissected 10 tetrads for each strain, except for strains that showed low sporulation efficiency (Table 6). Spores were then incubated on YPD + G418 to select those carrying the KanMX deletion marker. When a suppressor mutation is present, the possible outcomes are two large colonies (parental ditype), two small colonies (non-parental ditype), or one large and one small colony (tetratype). Only the top3∆ and rmi1∆ MATa and MATα strains showed all three combinations (Figure 18). To validate the sgs1 candidate suppressor mutations, I tested whether colony size corresponded to the presence of the candidate suppressor mutation using colony PCR and Sanger sequencing. I tested 10 colonies (one colony picked from each tetrad) for each strain. For the rmi1∆ MATa strain, colony PCR showed that the large colonies carry the ~40 bp deletion identified by high-throughput sequencing, while the small colonies do not. For the top3∆ MATa and rmi1∆ MATa strains, Sanger sequencing of the PCR products showed that all of the large colonies carry the candidate suppressor mutations, while none of the small colonies do. Under random segregation, each colony carries the mutation with probability 0.5, so p-values were calculated from the binomial distribution; after Bonferroni correction, the p-values for three of the four candidate suppressors were below 0.008 (Table 6). For strains that did not show an obvious growth difference between colonies, I plated the colonies on YPD + MMS (0.0025% and 0.02%) and SC, or performed spot assays; no obvious differences in growth rate were observed. Using tetrad analysis and Sanger sequencing, the three sgs1 candidate suppressor mutations were validated as true suppressors.
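The p-values in Table 6 can be reproduced with a one-sided binomial test in which each colony is concordant (genotype matches colony size) with probability 0.5 under the null hypothesis. In this sketch the number of trials is taken to be the number of colonies successfully sequenced, so 9 for top3∆ MATα; this reading of the "9(10)" entry reproduces the tabulated 0.00195, but it and the helper names are assumptions.

```python
from math import comb


def binom_upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the chance of k or more concordant
    colonies if genotype and colony size were unrelated."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


def bonferroni(p, m):
    """Bonferroni-adjusted p-value for m tests, capped at 1."""
    return min(1.0, p * m)


# Observed successes and successfully sequenced colonies, as in Table 6.
table6 = {"top3∆ MATα": (9, 9), "rmi1∆ MATa": (10, 10), "rmi1∆ MATα": (10, 10)}
for strain, (k, n) in table6.items():
    p = binom_upper_tail(k, n)
    print(f"{strain}: p = {p:.5f}, Bonferroni (m=4) = {bonferroni(p, 4):.5f}")
```

When every sequenced colony is concordant, the test reduces to 0.5^n, so 10 of 10 gives 1/1024 ≈ 0.00098 and 9 of 9 gives 1/512 ≈ 0.00195. Multiplying by the four candidates gives the adjusted values in the table.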
Figure 18. Validation of the sgs1 suppressor mutations in top3∆ and rmi1∆ strains
Diploids heterozygous for top3 and sgs1, or for rmi1 and sgs1, were dissected for both mating types. The spores were incubated for 3 days on YPD + G418 to select the spores carrying the KanMX deletion marker. Two large colonies (parental ditype), two small colonies (non-parental ditype), and one large and one small colony (tetratype) were observed. One of the two colonies from each tetrad was picked to validate the candidate suppressor mutations by PCR and Sanger sequencing. A white rectangle indicates that Sanger sequencing or PCR detected the suppressor mutation, while a red circle denotes that the colony has the wild-type DNA sequence.
Strain | Suppressor gene | # Sanger successes (# tested colonies) | # Observed successes | P-value | # Candidates | P-adj (Bonferroni)
top3∆ MATa | SGS1 | 10 | 4 | Not significant | 4 | Not significant
top3∆ MATα | SGS1 | 9 (10) | 9 | 0.00195 | 4 | 0.00781
rmi1∆ MATa | SGS1 | 10 (10) | 10 | 0.00098 | 4 | 0.00391
rmi1∆ MATα | SGS1 | 10 (10) | 10 | 0.00098 | 4 | 0.00391
Table 6. Results of validating the sgs1 suppressor mutations by Sanger sequencing in the top3∆ and rmi1∆ strains
Ten colonies per strain were Sanger sequenced, and p-values for the four suppressor candidates were calculated from the observed successes. The number of observed successes is defined as the number of normally growing colonies that carry the sgs1 suppressor plus the number of slow-growing colonies that do not. Because there are four candidate suppressors, the p-values were adjusted using Dunn-Sidak and Bonferroni corrections. The p-values for the suppressors identified in the top3∆ MATα, rmi1∆ MATa and rmi1∆ MATα strains were statistically significant.
4 Discussion
4.1 Mutation spectrum in the genome integrity deletion strains
In this study, 48 genome integrity gene deletion strains were sequenced and analyzed. The analysis pipeline detected various types of mutations in the strains, including SNVs, micro-indels, deletions and duplications. Based on Sanger sequencing of point mutations, SNVs that passed the stringent threshold are likely to be reliable, because all of the SNVs tested by Sanger sequencing were confirmed as true positives. In contrast, SNVs that passed only the permissive threshold may often be false positives: 4 of the 5 such SNVs tested were absent by Sanger sequencing. This is because the sequencing data contain some homopolymer miscalling errors in GC-rich regions. However, the permissive threshold did capture an ambiguous SNV in the top3∆ MATα strain, which was detected by both high-throughput and Sanger sequencing. One potential explanation for the ambiguity is that the strain was mixed with other samples, but this is unlikely because no reads mapped to the deleted TOP3 query gene. Had the sample been contaminated with other samples, the deleted query gene should have been covered by reads derived from those samples. Another potential source of error is that both sequencing methods mis-called the base. A biological explanation is that there may be two populations within a single deletion strain, one of which harbors the SNV. The ambiguous SNV was found in sgs1 in the top3∆ MATα strain, and the mutated base causes a deleterious amino acid substitution. Because a loss-of-function mutation in sgs1 has been reported to suppress the slow growth of the top3∆ strain, it is possible that a new population carrying the sgs1 mutation was emerging and was 'caught in the act' of taking over the slow-growing population in the top3∆ MATα strain.
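The contamination argument above amounts to checking read depth over the deleted ORF: reads from a contaminating strain that still carries the gene would raise the depth there. A minimal sketch, assuming aligned reads have already been reduced to (start, end) half-open intervals (for example, extracted from a BAM file beforehand); the 5% cutoff and all names are hypothetical.

```python
def mean_depth(reads, start, end):
    """Mean per-base read depth over the half-open interval [start, end).
    `reads` holds (read_start, read_end) half-open alignment intervals."""
    covered = 0
    for r_start, r_end in reads:
        overlap = min(end, r_end) - max(start, r_start)
        if overlap > 0:
            covered += overlap
    return covered / (end - start)


def deletion_looks_clean(reads, del_start, del_end, genome_mean_depth,
                         max_ratio=0.05):
    """True if depth over the deleted ORF is below max_ratio (a hypothetical
    cutoff) times the genome-wide mean depth, i.e. no sign of reads from a
    strain that still carries the gene."""
    return mean_depth(reads, del_start, del_end) < max_ratio * genome_mean_depth
```

A ratio threshold rather than a strict zero allows for a few mismapped reads; the appropriate cutoff would depend on sequencing depth and on how much contamination one wants to detect.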
The msh2∆ MATa strain contains many SNVs, and both the msh2∆ MATa and MATα strains had an elevated frequency of micro-insertions and especially micro-deletions. Msh2 forms a mismatch repair heterodimer and plays a key role in repairing mismatched DNA bases in yeast (Earley and Crouse, 1998; Pochart et al., 1997). Therefore, loss of the Msh2 protein can cause a high mutation rate in the genome, as the sequencing data suggest. However, it is not clear why the msh2∆ MATa and MATα strains differ in their numbers of SNVs and micro-indels. They may have been subjected to different laboratory and experimental conditions (e.g. temperature, media), or they may have passed through different numbers of generations. It is also possible that one or more of the mutated genes in the msh2∆ MATα strain play an important role in DNA repair, resulting in an increased spontaneous mutation rate in that strain. One candidate gene is SWI1, which regulates transcription by remodeling chromatin. Another potential explanation is that suppressor mutations have occurred in the msh2∆ MATa strain, masking or alleviating the effects of the gene deletion.
The sequencing data analysis provides strong evidence for sgs1 suppressors in the top3∆ and rmi1∆ strains. Three of the four sgs1 suppressors were validated using tetrad analysis and Sanger sequencing. Although the suppressor interactions were known, the mutations identified in this study had not been reported. Therefore, this study shows that high-throughput sequencing, combined with appropriate downstream analysis, can identify suppressor mutations. However, when the analysis was broadened to look for novel suppressor interactions, none of the 18 mutations chosen as candidate suppressors was confirmed as a true suppressor. There are several possible explanations. First, the selected mutations may simply be unselected (presumably neutral) mutations. Because deleting one gene could inactivate a protein complex or downstream pathway, some genes functionally related to the deleted gene may have been released from selective constraint. As a result, mutations in these genes may be neutral in that strain, although the same mutations would be deleterious in another strain. It is also conceivable that the selected mutations are potentiating mutations, which do not increase fitness on their own but are required for a fitness increase in combination with another mutation. Another possibility is that the mutations are indeed suppressors but their effect on growth rate is too subtle for us to detect. Alternatively, the mutations could suppress defects in phenotypes other than growth rate. Long-term survival at low temperature is one possible example, because the deletion strains were stored at 4 °C in a cold room at various times.
4.2 Potential molecular mechanisms of the validated sgs1 suppressor mutations
The sequencing data show that the top3∆ and rmi1∆ strains of both mating types had mutations in sgs1. The tetrad analysis and PCR confirmation demonstrate that the sgs1 mutations are true suppressors, except in the top3∆ MATa strain. SGS1 (slow growth suppressor) encodes a 3′-5′ DNA helicase and was discovered as a suppressor of the growth defects of top3∆ and rmi1∆ mutants (Chang et al., 2005; Gangloff et al., 1994). Top3 is a type IA topoisomerase that unlinks the hemi-catenane structures formed when double Holliday junctions (dHJs) converge (Wang, 2002). Rmi1 stimulates the activity of the Top3-Sgs1 complex. Sgs1, Top3 and Rmi1 are likely to form a protein complex and are involved in the dHJ resolution pathway. In this pathway, Sgs1 has been shown in vitro to catalyze HJ migration, and Top3 to unlink the converged dHJ. In addition, dHJs were shown not to be processed in the absence of Sgs1 during mitosis in vivo (Bzymek et al., 2010). Rmi1 is also required to disentangle converged dHJs when the concentrations of Top3 and Sgs1 are low in vitro (Cejka et al., 2010). Based on these functions, in the top3∆ strain Sgs1 may catalyze HJ migration but be unable to fully unlink the dHJs. The converged HJs may be toxic intermediates that affect chromosome segregation. In the rmi1∆ strain, toxic converged dHJs may accumulate in the same manner as in the top3∆ strain, because the Top3-Sgs1 complex may not fully dissolve converged HJs without Rmi1. Therefore, our model of sgs1 suppression is that an SGS1 loss-of-function mutation in the top3∆ and rmi1∆ deletion strains prevents the convergence of double Holliday junctions, thereby preventing the accumulation of hemi-catenane structures. The dHJs can then be processed by alternative four-way junction resolution proteins such as Mus81-Mms4 or Yen1 (Cejka et al., 2010). The mutations that I validated were premature stop codons and a medium-sized deletion that also causes truncation, so the model is consistent with the validated suppressor mutations.
4.3 Have genome integrity gene deletion strains evolved?
We have described the mutation spectrum of various genome integrity gene deletion strains and identified suppressor mutations in several strains. The genome integrity gene deletion strains were expected to accumulate many mutations and structural variations owing to the loss of genome integrity. However, the number of SNVs identified with high confidence is only ~5 among the strains, with the msh2∆ MATα strain a notable exception. Consequently, only a few deleterious SNVs per strain were identified using the BLOSUM80 matrix, PolyPhen-2 and Provean. Moreover, most of the predicted deleterious mutations were likely to be neutral at the phenotypic level, because in most cases no obvious growth difference was observed in the tetrad analysis or the MoBY-ORF 2.0 transformations. This is probably because cells carrying mutations that are truly deleterious at the phenotypic level are quickly eliminated from the population.
Furthermore, fewer than one micro-indel was identified per strain, even including low-confidence calls, except in the msh2∆ strains. Interestingly, only two micro-deletions passing the stringent threshold were identified in the top3∆ MATα strain, and one of the two was validated as a suppressor mutation. For small to large structural variations such as inversions and insertions, Pindel detected just 6 unique deletions and 7 unique insertions among the strains, and only one deletion, identified in the rmi1∆ MATα strain, directly affected a CDS (SGS1). This mutation was validated as a suppressor. Pindel and read-depth analyses indicate a large inversion and deletion event involving the SRD1 gene on chromosome III. However, this event was observed in all the strains and is therefore not of interest as a suppressor candidate. Regarding CNVs, many of the strains maintained normal copy numbers, although several chromosome duplications and telomere aberrations were identified. Thus, our sequencing data analysis reveals that the genome integrity gene deletion strains did not harbor many mutations and that the mutation rate was generally similar across strains.
Interestingly, DNA repair and replication related genes are known to have significantly fewer negative genetic interactions. This suggests that these genes are more strongly buffered in the gene network, or that specific conditions are needed to reveal their genetic interactions (Costanzo et al., 2010). When the buffering effect in the gene network is strong, the deleterious effects of a gene deletion would be less severe. Our findings show that genome integrity is relatively well maintained in the deletion strains, and they also imply robustness and buffering in the genome integrity genetic network. Therefore, our results suggest that experimental data acquired from yeast deletion strains are not generally confounded by the presence of additional 'unintentionally evolved' mutations.
5 Conclusions and future directions
In conclusion, this study shows that analysis of high-throughput sequencing data alone allows us to narrow down the list of candidate suppressor mutations, and it identified novel sgs1 mutations in the top3∆ and rmi1∆ strains that were validated as true suppressors. By contrast, the other candidate suppressor mutations that were tested showed no suppressive effect under the conditions used in this study. The most challenging aspect of this study was that it was unknown whether a suppressor mutation was present in each starting deletion strain and, if so, where it might lie. Although the sequencing data, combined with genetic interaction data and GO annotations, point to possible suppressor mutations, it may not be reasonable to test every identified mutation under many experimental conditions by brute force. I suspect that most of the candidate mutations are neutral mutations or mild suppressors that are difficult to detect under normal experimental conditions. However, this problem can be overcome using laboratory-evolved strains that are known to have suppressor mutations. That is, strains of interest can be grown under a given condition until they show a fitness increase due to suppressor mutations, then sequenced to identify the mutations, followed by tetrad analysis and PCR confirmation. Therefore, this project can be extended to investigate functional relationships between genes by conducting laboratory evolution experiments with these strains. It would also be interesting to identify functional relationships among genes that have human homologs, especially disease-related genes. Such a project could help elucidate not only functional relationships between genes in yeast but also possible molecular mechanisms of human disease.
5.1 Rationale
In my M.Sc. project, I showed that the method for identifying suppressor mutations in yeast deletion strains works, and the suppressors were experimentally validated. Here, I propose a laboratory evolution project using yeast deletion strains in which the query deleted genes are related to human disease; that is, the human orthologs of the query genes are involved in human disease. Our initial targets are 54 genes involved in DNA repair (22) and translation (32).
The main goals of the proposed project are to reveal functional relationships between these genes and their suppressor genes, and to illustrate the roles of these genes in disease mechanisms.
We chose the target genes as follows. First, YKO, DAmP (Decreased Abundance by mRNA Perturbation) and temperature-sensitive strains with less than 90% of wild-type fitness when cultured in YPD or minimal media were selected (Breslow et al., 2008; Deutschbauer et al., 2005). Next, we selected the genes that have human orthologs according to InParanoid 7 and whose human orthologs are involved in human disease according to the OMIM (Online Mendelian Inheritance in Man) database (Ostlund et al., 2010). Finally, the selected genes were categorized by Gene Ontology (GO) terms using FuncAssociate 2.0; two categories, DNA repair and translation, had a suitable number of genes for laboratory evolution experiments. Both functions are also highly conserved between human and yeast, so these two categories are well suited to studying, through yeast laboratory evolution experiments, the mechanisms of diseases caused by the human homologs.
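The target-selection steps above can be sketched as a simple filter. This is an illustration under assumptions: fitness values, ortholog assignments (as InParanoid would provide) and GO categories (as FuncAssociate would provide) are assumed to be precomputed, and the `Candidate` record and all gene names below are hypothetical placeholders, not real data.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    gene: str
    relative_fitness: float  # fitness relative to wild type (1.0 = wild type)
    human_orthologs: list = field(default_factory=list)
    go_category: str = ""


def select_targets(candidates, disease_genes, max_fitness=0.9):
    """Keep strains with <90% of wild-type fitness whose human ortholog is
    disease-linked (e.g. listed in OMIM), grouped by GO category."""
    groups = {}
    for c in candidates:
        if c.relative_fitness >= max_fitness:
            continue  # growth defect too small to detect suppression
        if not any(h in disease_genes for h in c.human_orthologs):
            continue  # no disease-linked human ortholog
        groups.setdefault(c.go_category, []).append(c.gene)
    return groups


# Hypothetical example data.
cands = [
    Candidate("GENE_A", 0.7, ["HGENE1"], "DNA repair"),
    Candidate("GENE_B", 0.95, ["HGENE2"], "DNA repair"),
    Candidate("GENE_C", 0.6, [], "translation"),
    Candidate("GENE_D", 0.8, ["HGENE3"], "translation"),
]
print(select_targets(cands, {"HGENE1", "HGENE2", "HGENE3"}))
```

Applying the fitness filter first mirrors the order described in the text; since the filters are independent, the order does not change the result.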
Suppressor mutations in yeast have been used to study human disease (Menne et al., 2007). In that work, the suppressors were identified using synthetic genetic array (SGA) analysis, which requires genome-wide mapping with double deletion strains. Consequently, the SGA method requires extensive effort to generate double deletion strains to map a single suppressor locus. In contrast, high-throughput sequencing analysis can provide direct indications of suppressors based on the occurrence of mutations. In addition, the method could reveal multiple suppressors at once by sequencing many independently evolved strains, which would further accelerate the understanding of functional associations between genes. Furthermore, it is also possible to detect potentiating mutations, which enhance the effects of suppressors but do not by themselves affect fitness, although this would require backcrossing, tetrad analysis and re-sequencing of multiple spores. Thus, using this approach, we will test whether high-throughput sequencing of laboratory-evolved strains, combined with downstream analysis, can characterize functional relationships between genes in yeast and illuminate the biology of disease mechanisms.
5.2 Specific aims
Aim1. To evolve suppressor mutations in the deletion strains using laboratory evolution experiments
1) Laboratory evolution experiment
Laboratory evolution experiments under various controlled conditions are useful for studying spontaneous suppressor mutations (Blount et al., 2012; Gresham et al., 2008; Lenski and Travisano, 1994). In this proposed project, Song Sun, a postdoctoral fellow in the Roth lab, and I plan to conduct high-throughput laboratory evolution experiments to select spontaneous suppressor mutations that evolve in strains lacking a DNA repair or translation related gene.
Two studies highlight the feasibility of using laboratory evolution experiments with yeast deletion strains to describe functional relationships between genes. First, ~97% of deletion strains have been reported to show significant growth defects under certain conditions (Hillenmeyer et al., 2008). This indicates that most yeast deletion strains could serve as target strains for laboratory evolution experiments, because target strains must show measurable growth defects in order to distinguish deletion strains with and without suppressors and to characterize the importance of suppressor genes. However, it may be worthwhile to reconstruct the deletion strains, so that the growth rate of a fresh deletion strain can be measured accurately and the number of accumulated mutations is minimized.
Second, a recent large-scale laboratory evolution experiment using yeast deletion strains demonstrated that suppressor mutations often arise under normal laboratory conditions without extensive culturing. According to Dr. Balazs Papp's study (personal communication; unpublished data), more than 100 of 200 deletion strains needed only 400 generations of laboratory evolution to compensate for at least 50% of their initial fitness defects. Given a mutation rate of 0.33 × 10⁻⁹ per site per cell division (Lynch et al., 2008), which for the ~12 Mb yeast genome is equivalent to approximately 4 mutations per 1000 cell divisions, a lineage accumulates only about 1 or 2 mutations over 400 generations; this study therefore indicates that the deleterious effects of deleted genes can be suppressed by only 1 or 2 mutations. The ease of selecting suppressor mutations further supports the feasibility of the proposed experiment, although adding a mutagen such as ethyl methanesulfonate (EMS) to increase the mutation rate might help shorten the laboratory evolution experiments.
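The mutation-count estimate above is a short calculation. The sketch assumes a haploid genome size of about 12 Mb and uses the per-site rate and the 400-generation figure quoted in the text; it ignores selection and population structure, so it gives the expected number of mutations along a single line of descent.

```python
MUTATION_RATE = 0.33e-9  # per site per cell division (Lynch et al., 2008)
GENOME_SIZE = 12.0e6     # assumed S. cerevisiae haploid genome size, ~12 Mb
GENERATIONS = 400        # laboratory evolution duration cited in the text

per_division = MUTATION_RATE * GENOME_SIZE
expected_after = per_division * GENERATIONS

print(f"mutations per cell division: {per_division:.4f}")
print(f"mutations per 1000 divisions: {per_division * 1000:.1f}")
print(f"expected per lineage after {GENERATIONS} generations: {expected_after:.2f}")
```

About 0.004 mutations per division gives roughly 1.6 mutations per lineage over 400 generations, which is the basis for the "1 or 2 mutations" estimate.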
We plan to reconstruct and/or evolve at least 8 independent lineages per strain of interest to facilitate identification of candidate suppressor mutations, because suppressors of a specific loss-of-function mutation may often arise in the same gene or in functionally related genes. The strains will then be grown under defined conditions, with growth monitored using a Tecan plate reader, until they show an increase in fitness. In addition, archival glycerol stocks of the strains will be made during the laboratory evolution experiment so that I can validate the reproducibility of the experiment.
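The text does not specify how a fitness increase will be quantified from the plate-reader growth curves. One common approach, assumed here for illustration, is to take the maximum slope of ln(OD) against time over a sliding window as the growth rate and compare evolved lineages with their ancestor. The window size and function names are hypothetical choices.

```python
import math


def max_growth_rate(times, ods, window=5):
    """Maximum specific growth rate: the largest least-squares slope of
    ln(OD) versus time over any `window` consecutive readings."""
    logs = [math.log(od) for od in ods]
    best = float("-inf")
    for i in range(len(times) - window + 1):
        t = times[i:i + window]
        y = logs[i:i + window]
        t_mean = sum(t) / window
        y_mean = sum(y) / window
        sxx = sum((ti - t_mean) ** 2 for ti in t)
        sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
        best = max(best, sxy / sxx)
    return best


def doubling_time(times, ods, window=5):
    """Doubling time implied by the maximum growth rate."""
    return math.log(2) / max_growth_rate(times, ods, window)


def relative_fitness(ancestor, evolved, window=5):
    """Ratio of maximum growth rates (evolved / ancestor); values above 1
    suggest a fitness gain. Each argument is a (times, ods) pair."""
    return max_growth_rate(*evolved, window) / max_growth_rate(*ancestor, window)
```

Real plate-reader data would also need blank subtraction and a lower OD bound to exclude noisy early readings; those steps are omitted here.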
Aim2. Identification of suppressor mutations in the query strains
1) High-throughput sequencing and mutation calling for the query deletion strains
The laboratory-evolved deletion strains that are known to harbor suppressor mutations will be sequenced on the Illumina HiSeq platform, and the data will be analyzed using the computational pipeline that I have developed. Based on this analysis, I expect to be able to narrow down the candidate suppressors for each strain. When a strain has only one or two candidate suppressors, they will be validated using tetrad analysis and Sanger sequencing, as in this study. However, evolved strains may carry many mutations, which could make the suppressor mutations harder to identify. In that case, it would be laborious and not cost-effective to test each mutation by Sanger sequencing after sporulation. An alternative approach would be whole-genome sequencing of 10 or more spores (one colony, either normal or slow-growing, from each tetrad): the statistical significance of each candidate mutation could then be calculated from colony size and the presence or absence of the mutation, using a single sequencing run. This method would also be useful when suppression requires two or more mutations, including potentiating mutations, which are hard to detect because they do not cause an obvious fitness increase.
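The spore-sequencing strategy above could score every mutation with the same concordance logic used for the Sanger-based validation in Section 3.8. A minimal sketch, assuming one sequenced spore per tetrad, a binary large/small colony call, and mutation calls given as a set of hypothetical IDs per spore. Using a one-sided binomial test with Bonferroni correction across all scanned mutations is an illustrative choice, not a method specified in the proposal.

```python
from math import comb


def upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


def scan_mutations(spores, alpha=0.05):
    """spores: list of (is_large_colony, set_of_mutation_ids), one per tetrad.
    Returns {mutation: (concordant, n, bonferroni_p)} for mutations whose
    adjusted p-value is below alpha."""
    mutations = set().union(*(muts for _, muts in spores))
    m = len(mutations)
    hits = {}
    for mut in sorted(mutations):
        # Concordant spore: a large colony carries the mutation,
        # or a small colony lacks it.
        k = sum((mut in muts) == is_large for is_large, muts in spores)
        p_adj = min(1.0, upper_tail(k, len(spores)) * m)
        if p_adj < alpha:
            hits[mut] = (k, len(spores), p_adj)
    return hits


# Hypothetical example: "sup" tracks colony size, "neutral" does not.
spores = [(i % 2 == 0, ({"sup"} if i % 2 == 0 else set()) | ({"neutral"} if i < 5 else set()))
          for i in range(10)]
print(scan_mutations(spores))
```

With only 10 spores the smallest possible p-value is 1/1024, so after Bonferroni correction no mutation can reach α = 0.05 once more than about 51 mutations are scanned. This is one reason to sequence more spores when evolved strains carry many mutations.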
In summary, I propose to extend the M.Sc. project to laboratory-evolved deletion strains. In the laboratory-evolved strains, one or more suppressor mutations should have accumulated to compensate for the fitness defects. Our genes of interest have human homologs and are known to be involved in DNA repair and translation. Using the computational pipeline that I have developed together with experimental validation, it may be possible to find multiple yeast orthologs of disease genes through the analysis of suppressor genetic interactions. This project could reveal functional relationships between genes and, furthermore, could uncover the biology underlying diseases caused by defects in DNA repair and translation related genes.
6 References
Adzhubei, I.A., Schmidt, S., Peshkin, L., Ramensky, V.E., Gerasimova, A., Bork, P.,
Kondrashov, A.S., and Sunyaev, S.R. (2010). A method and server for predicting damaging
missense mutations. Nat. Methods 7, 248-249.
Avery, L., and Wasserman, S. (1992). Ordering gene function: the interpretation of epistasis in regulatory hierarchies. Trends Genet. 8, 312-316.
Blake, D., Luke, B., Kanellis, P., Jorgensen, P., Goh, T., Penfold, S., Breitkreutz, B.J.,
Durocher, D., Peter, M., and Tyers, M. (2006). The F-box protein Dia2 overcomes replication
impedance to promote genome stability in Saccharomyces cerevisiae. Genetics 174, 1709-1727.
Blount, Z.D., Barrick, J.E., Davidson, C.J., and Lenski, R.E. (2012). Genomic analysis of a key
innovation in an experimental Escherichia coli population. Nature 489, 513-518.
Breslow, D.K., Cameron, D.M., Collins, S.R., Schuldiner, M., Stewart-Ornstein, J., Newman,
H.W., Braun, S., Madhani, H.D., Krogan, N.J., and Weissman, J.S. (2008). A comprehensive
strategy enabling high-resolution functional analysis of the yeast genome. Nat. Methods 5, 711-
718.
Broach, J.R. (1991). RAS genes in Saccharomyces cerevisiae: signal transduction in search of a
pathway. Trends Genet. 7, 28-33.
Bzymek, M., Thayer, N., Oh, S., Kleckner, N., and Hunter, N. (2010). Double Holliday
junctions are intermediates of DNA break repair. Nature 464, 937-941.
Cejka, P., Plank, J.L., Bachrati, C.Z., Hickson, I.D., and Kowalczykowski, S.C. (2010). Rmi1
stimulates decatenation of double Holliday junctions during dissolution by Sgs1-Top3. Nat.
Struct. Mol. Biol. 17, 1377-1382.
Chang, M., Bellaoui, M., Zhang, C., Desai, R., Morozov, P., Delgado-Cruzata, L., Rothstein, R.,
Freyer, G., Boone, C., and Brown, G. (2005). RMI1/NCE4, a suppressor of genome instability,
encodes a member of the RecQ helicase/Topo III complex. EMBO J. 24, 2024-2033.
Choi, Y., Sims, G.E., Murphy, S., Miller, J.R., and Chan, A.P. (2012). Predicting the functional
effect of amino acid substitutions and indels. PLoS One 7, e46688.
Costanzo, M., Baryshnikova, A., Bellay, J., Kim, Y., Spear, E.D., Sevier, C.S., Ding, H., Koh,
J.L., Toufighi, K., Mostafavi, S., et al. (2010). The genetic landscape of a cell. Science 327,
425-431.
Costanzo, M., and Boone, C. (2009). SGAM: an array-based approach for high-resolution
genetic mapping in Saccharomyces cerevisiae. Methods Mol. Biol. 548, 37-53.
Deutschbauer, A.M., Jaramillo, D.F., Proctor, M., Kumm, J., Hillenmeyer, M.E., Davis, R.W.,
Nislow, C., and Giaever, G. (2005). Mechanisms of haploinsufficiency revealed by genome-
wide profiling in yeast. Genetics 169, 1915-1925.
Earley, M.C., and Crouse, G.F. (1998). The role of mismatch repair in the prevention of base
pair mutations in Saccharomyces cerevisiae. Proc. Natl. Acad. Sci. U. S. A. 95, 15487-15491.
Gangloff, S., McDonald, J.P., Bendixen, C., Arthur, L., and Rothstein, R. (1994). The yeast type
I topoisomerase Top3 interacts with Sgs1, a DNA helicase homolog: a potential eukaryotic
reverse gyrase. Mol. Cell. Biol. 14, 8391-8398.
Gasch, A., Huang, M., and Metzner…, S. (2001). Genomic expression responses to DNA-
damaging agents and the regulatory role of the yeast ATR homolog Mec1p. Molecular Biology
of …
Gatbonton, T., Imbesi, M., Nelson, M., Akey, J.M., Ruderfer, D.M., Kruglyak, L., Simon, J.A.,
and Bedalov, A. (2006). Telomere length as a quantitative trait: genome-wide survey and
genetic mapping of telomere length-control genes in yeast. PLoS Genet. 2, e35.
Gresham, D., Desai, M.M., Tucker, C.M., Jenq, H.T., Pai, D.A., Ward, A., DeSevo, C.G.,
Botstein, D., and Dunham, M.J. (2008). The repertoire and dynamics of evolutionary
adaptations to controlled nutrient-limited environments in yeast. PLoS Genet. 4, e1000303.
Hanein, D., Volkmann, N., Goldsmith, S., Michon, A., Lehman, W., Craig, R., DeRosier, D.,
Almo, S., and Matsudaira, P. (1998). An atomic model of fimbrin binding to F-actin and its
implications for filament crosslinking and regulation. Nat. Struct. Biol. 5, 787-792.
Henikoff, S., and Henikoff, J.G. (1992). Amino acid substitution matrices from protein blocks.
Proc. Natl. Acad. Sci. U. S. A. 89, 10915-10919.
Hillenmeyer, M.E., Fung, E., Wildenhain, J., Pierce, S.E., Hoon, S., Lee, W., Proctor, M., St
Onge, R.P., Tyers, M., Koller, D., et al. (2008). The chemical genomic portrait of yeast:
uncovering a phenotype for all genes. Science 320, 362-365.
Hohmann, S., Neves, M.J., de Koning, W., Alijo, R., Ramos, J., and Thevelein, J.M. (1993).
The growth and signalling defects of the ggs1 (fdp1/byp1) deletion mutant on glucose are
suppressed by a deletion of the gene encoding hexokinase PII. Curr. Genet. 23, 281-289.
Ii, M., Ii, T., Mironova, L.I., and Brill, S.J. (2011). Epistasis analysis between homologous
recombination genes in Saccharomyces cerevisiae identifies multiple repair pathways for Sgs1,
Mus81-Mms4 and RNase H2. Mutat. Res. 714, 33-43.
Kozarewa, I., Ning, Z., Quail, M.A., Sanders, M.J., Berriman, M., and Turner, D.J. (2009).
Amplification-free Illumina sequencing-library preparation facilitates improved mapping and
assembly of (G+C)-biased genomes. Nat. Methods 6, 291-295.
Krogan, N., Cagney, G., Yu, H., Zhong, G., Guo, X., Ignatchenko, A., Li, J., Pu, S., Datta, N.,
Tikuisis, A., et al. (2006). Global landscape of protein complexes in the yeast Saccharomyces
cerevisiae. Nature 440, 637-643.
Lehner, K.R., Stone, M.M., Farber, R.A., and Petes, T.D. (2007). Ninety-six haploid yeast
strains with individual disruptions of open reading frames between YOR097C and YOR192C,
constructed for the Saccharomyces genome deletion project, have an additional mutation in the
mismatch repair gene MSH3. Genetics 177, 1951-1953.
Lenski, R.E., and Travisano, M. (1994). Dynamics of adaptation and diversification: a 10,000-
generation experiment with bacterial populations. Proc. Natl. Acad. Sci. U. S. A. 91, 6808-6814.
Li, H., and Durbin, R. (2010). Fast and accurate long-read alignment with Burrows-Wheeler
transform. Bioinformatics 26, 589-595.
Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G.,
Durbin, R., and 1000 Genome Project Data Processing Subgroup. (2009). The Sequence
Alignment/Map format and SAMtools. Bioinformatics 25, 2078-2079.
Liti, G., and Louis, E. (2003). NEJ1 prevents NHEJ-dependent telomere fusions in yeast without
telomerase. Mol. Cell 11, 1373-1378.
Lynch, M., Sung, W., Morris, K., Coffey, N., Landry, C.R., Dopman, E.B., Dickinson, W.J.,
Okamoto, K., Kulkarni, S., Hartl, D.L., and Thomas, W.K. (2008). A genome-wide view of the
spectrum of spontaneous mutations in yeast. Proc. Natl. Acad. Sci. U. S. A. 105, 9272-9277.
Magtanong, L., Ho, C.H., Barker, S.L., Jiao, W., Baryshnikova, A., Bahr, S., Smith, A.M.,
Heisler, L.E., Choy, J.S., Kuzmin, E., et al. (2011). Dosage suppression genetic interaction
networks enhance functional wiring diagrams of the cell. Nat. Biotechnol. 29, 505-511.
Menne, T., Goyenechea, B., Sánchez-Puig, N., Wong, C., Tonkin, L., Ancliff, P., Brost, R.,
Costanzo, M., Boone, C., and Warren, A. (2007). The Shwachman-Bodian-Diamond syndrome
protein mediates translational activation of ribosomes in yeast. Nat. Genet. 39, 486-495.
Mitsuzawa, H., Uno, I., Oshima, T., and Ishikawa, T. (1989). Isolation and characterization of
temperature-sensitive mutations in the RAS2 and CYR1 genes of Saccharomyces cerevisiae.
Genetics 123, 739-748.
Nielsen, R., Paul, J.S., Albrechtsen, A., and Song, Y.S. (2011). Genotype and SNP calling from
next-generation sequencing data. Nat. Rev. Genet. 12, 443-451.
Ostlund, G., Schmitt, T., Forslund, K., Kostler, T., Messina, D.N., Roopra, S., Frings, O., and
Sonnhammer, E.L. (2010). InParanoid 7: new algorithms and tools for eukaryotic orthology
analysis. Nucleic Acids Res. 38, D196-203.
Pak, T.R., and Roth, F.P. (2013). ChromoZoom: a flexible, fluid, web-based genome browser.
Bioinformatics 29, 384-386.
Pan, X., Ye, P., Yuan, D., Wang, X., Bader, J., and Boeke, J. (2006). A DNA integrity network
in the yeast Saccharomyces cerevisiae. Cell 124, 1069-1081.
Pochart, P., Woltering, D., and Hollingsworth, N.M. (1997). Conserved properties between
functionally distinct MutS homologs in yeast. J. Biol. Chem. 272, 30345-30349.
Prelich, G. (1999). Suppression mechanisms: themes from variations. Trends Genet. 15,
261-266.
Rouse, J., and Jackson, S. (2002). Interfaces between the detection, signaling, and repair of
DNA damage. Science 297, 547-551.
Sandrock, T., O'Dell, J., and Adams, A. (1997). Allele-specific suppression by formation of new
protein-protein interactions in yeast. Genetics 147, 1635-1642.
Sass, P., Field, J., Nikawa, J., Toda, T., and Wigler, M. (1986). Cloning and characterization of
the high-affinity cAMP phosphodiesterase of Saccharomyces cerevisiae. Proc. Natl. Acad. Sci.
U. S. A. 83, 9303-9307.
Sherman, F. (2002). Getting started with yeast. Methods Enzymol. 350, 3-41.
Shiloh, Y. (2003). ATM and related protein kinases: safeguarding genome integrity. Nat. Rev.
Cancer 3, 155-168.
Shor, E., Gangloff, S., Wagner, M., Weinstein, J., Price, G., and Rothstein, R. (2002). Mutations
in homologous recombination genes rescue top3 slow growth in Saccharomyces cerevisiae.
Genetics 162, 647-662.
Simpson, J.T., Wong, K., Jackman, S.D., Schein, J.E., Jones, S.J., and Birol, I. (2009). ABySS:
a parallel assembler for short read sequence data. Genome Res. 19, 1117-1123.
Smoot, M.E., Ono, K., Ruscheinski, J., Wang, P.L., and Ideker, T. (2011). Cytoscape 2.8: new
features for data integration and network visualization. Bioinformatics 27, 431-432.
Tarassov, K., Messier, V., Landry, C., Radinovic, S., Serna Molina, M., Shames, I., Malitskaya,
Y., Vogel, J., Bussey, H., and Michnick, S. (2008). An in vivo map of the yeast protein
interactome. Science 320, 1465-1470.
Tong, A.H., Lesage, G., Bader, G.D., Ding, H., Xu, H., Xin, X., Young, J., Berriz, G.F., Brost,
R.L., Chang, M., et al. (2004). Global mapping of the yeast genetic interaction network. Science
303, 808-813.
Unk, I., Hajdu, I., Blastyak, A., and Haracska, L. (2010). Role of yeast Rad5 and its human
orthologs, HLTF and SHPRH in DNA damage tolerance. DNA Repair (Amst) 9, 257-267.
Uno, I., Matsumoto, K., and Ishikawa, T. (1983). Characterization of a cyclic nucleotide
phosphodiesterase-deficient mutant in yeast. J. Biol. Chem. 258, 3539-3542.
Wang, J. (2002). Cellular roles of DNA topoisomerases: a molecular perspective. Nat. Rev.
Mol. Cell Biol. 3, 430-440.
Winzeler, E.A., Shoemaker, D.D., Astromoff, A., Liang, H., Anderson, K., Andre, B.,
Bangham, R., Benito, R., Boeke, J.D., Bussey, H., et al. (1999). Functional characterization of
the S. cerevisiae genome by gene deletion and parallel analysis. Science 285, 901-906.
Ye, K., Schulz, M.H., Long, Q., Apweiler, R., and Ning, Z. (2009). Pindel: a pattern growth
approach to detect break points of large deletions and medium sized insertions from paired-end
short reads. Bioinformatics 25, 2865-2871.
Yu, H., Braun, P., Yildirim, M., Lemmens, I., Venkatesan, K., Sahalie, J., Hirozane-Kishikawa,
T., Gebreab, F., Li, N., Simonis, N., et al. (2008). High-quality binary protein interaction map of
the yeast interactome network. Science 322, 104-110.
Yuen, K.W., Warren, C.D., Chen, O., Kwok, T., Hieter, P., and Spencer, F.A. (2007).
Systematic genome instability screens in yeast and their potential relevance to cancer. Proc.
Natl. Acad. Sci. U. S. A. 104, 3925-3930.
Zhu, J., Zhang, B., Smith, E., Drees, B., Brem, R., Kruglyak, L., Bumgarner, R., and Schadt, E.
(2008). Integrating large-scale functional genomic data to dissect the complexity of yeast
regulatory networks. Nat. Genet. 40, 854-861.
7 Appendices
7.1 Other relevant research
7.1.1 Large-scale identification of extragenic suppressor mutations in yeast
Jolanda van Leeuwen, Joseph Mellor, Anastasia Baryshnikova, Takafumi Yamaguchi, Atina
Cote, Michael Costanzo, Brenda Andrews, Frederick Roth, Charles Boone
CCBR - University of Toronto, Toronto, Ontario, Canada
My contribution to this project
I analyzed ~300 candidate suppressor strains and background strains using the computational
pipeline that I developed. As a result, I detected promising suppressor candidates, including
SNVs and small to large structural variations, in more than 200 of the strains. In addition, I
visualized all of the genomes that I analyzed using ChromoZoom.
The detailed project description below was written by Jolanda van Leeuwen, with minor
adaptations by me.
Abstract
Genetic mutations can adversely affect biological pathways, often resulting in cellular
damage that may lead to human disease. In some instances, secondary mutations accumulating
elsewhere in the genome may compensate for the deleterious effects of the primary mutation.
This phenomenon is referred to as genetic suppression, and it provides a powerful tool for
identifying novel functional relationships between genes and their corresponding pathways.
While individual suppressors have been identified previously, large-scale suppressor
identification and the mapping of a suppressor genetic network have never been performed.
Introduction
The Boone lab established Synthetic Genetic Array (SGA) analysis, which allows for the
automated construction of high-density arrays of double mutants and the identification of
genetic interactions in yeast (Tong et al., 2004). Because SGA analysis can be applied to any
genetic element linked to a selectable marker, it can also be adapted for a variety of different
genetic screens. In particular, SGA analysis enables high-resolution genetic mapping of loci
associated with spontaneously arising suppressor mutations (Costanzo and Boone, 2009). Using
SGA mapping, we identified spontaneous extragenic mutations that suppress the fitness defects
associated with ~200 yeast deletion and conditional temperature-sensitive mutants. We used
next-generation sequencing to generate whole-genome sequences of these strains to pinpoint the
precise mutation within each locus outlined by SGA mapping.
Results
Whole-genome sequencing identified a single mutation within the suppressor gene
linkage group in 80% of the strains. More than 90% of these mutations are non-synonymous
single nucleotide variants, whereas the remaining ~10% are larger deletion or insertion
mutations. These experiments yield a high-confidence list of candidate mutations that suppress
growth defects associated with deletion alleles of non-essential genes.
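The step of pinpointing a mutation within an SGA-mapped locus can be sketched as an interval filter over variant calls. This is a minimal illustration, assuming the calls have already been annotated with a gene and a coding effect and that SGA mapping yields a single chromosomal interval; the `Variant` and `Interval` types, the effect labels, and the coordinates below are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int     # 1-based position
    gene: str
    effect: str  # hypothetical labels, e.g. "missense", "synonymous", "frameshift"


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive


def candidates_in_linkage_group(
    variants: List[Variant], interval: Interval, drop_synonymous: bool = True
) -> List[Variant]:
    """Keep variant calls that fall inside the SGA-mapped interval."""
    hits = [
        v for v in variants
        if v.chrom == interval.chrom and interval.start <= v.pos <= interval.end
    ]
    if drop_synonymous:
        # Synonymous changes do not alter the encoded protein sequence.
        hits = [v for v in hits if v.effect != "synonymous"]
    return hits


calls = [
    Variant("chrIV", 120_500, "geneA", "missense"),
    Variant("chrIV", 131_200, "geneB", "synonymous"),
    Variant("chrIV", 900_000, "geneC", "missense"),
    Variant("chrX", 125_000, "geneD", "frameshift"),
]
linkage = Interval("chrIV", 100_000, 160_000)
hits = candidates_in_linkage_group(calls, linkage)
print([v.gene for v in hits])
```

A strain in which exactly one variant survives the filter corresponds to the "single mutation within the linkage group" case described above.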
Future directions
We are now confirming the genetic suppression interactions identified in our SGA
screens using three assays: [a] Plasmid-based complementation, where we introduce a wild-type
copy of the suppressor gene into the query strain carrying the suppressor mutation; in this
assay, we expect the wild-type allele to reduce growth of the query strain if the suppressor
mutation is recessive. [b] Genetic analysis, where we cross the query strain carrying the
suppressor mutation to a wild-type strain and perform traditional tetrad analysis to ask whether
the suppression phenotype is associated with a single gene. [c] Strain reconstruction, where we
will use standard allele replacement to introduce the putative suppressor mutation identified by
sequencing into a diploid strain heterozygous for the query gene deletion allele and conduct
another round of tetrad analysis to test for a suppression phenotype.
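As a rough illustration of the single-gene question in assay [b]: if suppression is caused by one locus unlinked to the query gene and is fully penetrant, about half of the deletion-carrying spores from the cross should form normal-sized colonies (about a quarter if two unlinked mutations were both required). The sketch below applies an exact two-sided binomial test to such counts. The function name and the counts are hypothetical, and the unlinked, fully penetrant model is an assumption rather than part of the project's stated analysis.

```python
from fractions import Fraction
from math import comb


def binom_two_sided(k: int, n: int, p: Fraction = Fraction(1, 2)) -> float:
    """Exact two-sided binomial test: sum the probabilities of all outcomes
    that are no more likely than observing k successes out of n."""
    p = Fraction(p)
    # Exact rational probabilities avoid floating-point ties at the comparison.
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    return float(sum(q for q in pmf if q <= observed))


# Hypothetical counts: 19 of 40 deletion spores formed normal-sized colonies.
print(binom_two_sided(19, 40))                  # one unlinked suppressor: ~1/2
print(binom_two_sided(19, 40, Fraction(1, 4)))  # two unlinked genes both required: ~1/4
```

A large p-value against 1/2 together with a small one against 1/4 would be consistent with a single suppressor gene under these assumptions.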
This collaborative project is well on its way and should map the first large-scale
suppressor genetic interaction network.
7.1.2 Identification of E0005 resistant mutations in yeast using next-
generation sequencing
Siyang Li’s project
My contribution to this project
I assisted Siyang in analyzing 15 strains resistant to E0005, a natural compound, using the
computational pipeline that I developed. We identified various lem3, pdr1 and pdr3 mutations
that are likely to confer the resistance in these strains.
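One simple way to see why particular genes stand out across independently isolated resistant strains is to count, for each gene, how many strains carry a mutation in it. The sketch below does this, assuming per-strain lists of mutated genes are already available; the function name and the toy input are hypothetical and do not represent the pipeline or data actually used.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def recurrently_mutated_genes(
    calls: Dict[str, Iterable[str]], min_strains: int = 2
) -> List[Tuple[str, int]]:
    """Count, per gene, the number of strains carrying at least one mutation in it."""
    counts: Counter = Counter()
    for genes in calls.values():
        counts.update(set(genes))  # count each gene at most once per strain
    return [(gene, n) for gene, n in counts.most_common() if n >= min_strains]


# Toy input (hypothetical strain and gene names, not data from this project).
calls = {
    "strain_1": ["geneA", "geneX"],
    "strain_2": ["geneA"],
    "strain_3": ["geneB", "geneA", "geneA"],
    "strain_4": ["geneB"],
}
print(recurrently_mutated_genes(calls))
```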
7.1.3 Generation of the lift-over genomes of BY4741 and BY4742 strains
Available at the Saccharomyces Genome Database
BY4741 genome - http://downloads.yeastgenome.org/sequence/strains/BY4741/
BY4742 genome - http://downloads.yeastgenome.org/sequence/strains/BY4742/
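Conceptually, a strain lift-over genome is the reference sequence with the strain's variants applied, together with a mapping from reference to strain coordinates. The sketch below illustrates that idea for a single sequence, assuming 0-based positions and non-overlapping variants given as hypothetical (position, ref, alt) tuples; it is not the procedure used to build the files hosted at SGD.

```python
from typing import List, Tuple

# (0-based reference position, reference allele, alternate allele)
Variant = Tuple[int, str, str]


def apply_variants(reference: str, variants: List[Variant]) -> str:
    """Return the reference sequence with non-overlapping variants applied."""
    pieces = []
    cursor = 0
    for pos, ref, alt in sorted(variants):
        if pos < cursor:
            raise ValueError(f"variant at {pos} overlaps a previous variant")
        if reference[pos:pos + len(ref)] != ref:
            raise ValueError(f"reference allele mismatch at {pos}")
        pieces.append(reference[cursor:pos])  # unchanged stretch
        pieces.append(alt)                    # substituted allele
        cursor = pos + len(ref)
    pieces.append(reference[cursor:])
    return "".join(pieces)


def lift_position(pos: int, variants: List[Variant]) -> int:
    """Map a 0-based reference position outside any variant to strain coordinates."""
    shift = 0
    for vpos, ref, alt in sorted(variants):
        if vpos + len(ref) <= pos:
            shift += len(alt) - len(ref)  # upstream indels shift the coordinate
        elif vpos <= pos:
            raise ValueError(f"position {pos} lies inside a variant")
    return pos + shift


ref_seq = "ACGTACGT"
variants = [(1, "C", "T"), (4, "A", "AGG"), (6, "GT", "G")]
strain_seq = apply_variants(ref_seq, variants)
print(strain_seq, lift_position(5, variants))
```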
7.2 Poster Presentation
The 13th International Conference on Systems Biology, University of Toronto, Canada (Aug 20,
2012)
Systematic analysis of suppressor mutations in S. cerevisiae strains with deleted genome
integrity genes
Takafumi Yamaguchi (1,3,4), Joseph Mellor (3,4), Hon Nian Chua (3,4), Atina Cote (3,4), Anna
Karkhanina (3,4), Daniel Durocher (1,4), Frederick Roth (1,2,3,4)
1) Department of Molecular Genetics, 2) Department of Computer Science; and 3) Donnelly
Centre, University of Toronto, Toronto, Ontario M5S-3E1; and 4) Samuel Lunenfeld Research
Institute, Mt. Sinai Hospital, Toronto, Ontario M5G-1X5, Canada