
Systematic Analysis of Suppressor Mutations in S. cerevisiae

Strains with Deleted Genome Integrity Genes

by

Takafumi Yamaguchi

A thesis submitted in conformity with the requirements

for the degree of Master of Science

Molecular Genetics

University of Toronto

© Copyright by Takafumi Yamaguchi, 2013


Systematic Analysis of Suppressor Mutations in S. cerevisiae Strains

with Deleted Genome Integrity Genes

Takafumi Yamaguchi

Master of Science

Molecular Genetics

University of Toronto

2013

Abstract

The effects of a mutation in one gene can occasionally be suppressed by mutation in another

gene. Genetic suppression indicates functional relationships and provides clues about the

mechanism and order of action in genetic pathways. Here I explored the existing yeast deletion

collection to identify suppressor relationships. The collection was released in 2000 and it is

known that some strains in the collection have acquired mutations. Whole genome sequencing

of 48 yeast deletion strains corresponding to 26 genome integrity genes was performed. High-

throughput sequencing revealed a broad mutational spectrum including point mutations, indels,

and copy number variations. I identified and experimentally validated two new suppressor

mutations (sgs1 mutations in both top3∆ and rmi1∆ strains) corresponding to gene pairs with

previously known suppressor relationships. Thus, high-throughput sequencing and analysis of

yeast deletion strains can identify suppressor mutations. The resulting genome sequences also

provide a baseline for future laboratory evolution experiments.


Acknowledgments

First and foremost, I would like to express my sincere gratitude to my supervisor, Dr.

Frederick Roth, for providing patient guidance, supportive mentorship and an excellent work

environment during my M.Sc. studies. Under his thought-stimulating supervision, I was able to

focus on my research, learn new techniques and gain knowledge. I am truly fortunate to have

had the opportunity to work with him right after he opened the Roth laboratory at the University

of Toronto.

I would also like to thank my committee members, Dr. Leah Cowen and Dr. Zhaolei

Zhang, for the friendly guidance and valuable suggestions to complete my project. Additionally,

I would like to acknowledge Dr. Daniel Durocher for helping us select interesting genome

integrity genes.

I am truly grateful for the help and support of my lab-mates and collaborators. I would

like to thank Joe Mellor, Dax Torti, and Anna Karkhanina for generating sequencing data to

analyze. I greatly benefited from Joe’s wide-ranging knowledge of molecular and computational

biology. Atina Coté also shared her broad knowledge of DNA repair and yeast biology with me.

I am really indebted to Kenny Chua, Nozomu Yachie, and Murat Taşan for their patient and

friendly advice. I would have had a much more difficult time developing the computational

pipeline without their help. Xinlei Shun, a visiting summer student, performed a part of the

experiments in this thesis under my supervision. Theodore Pak developed a genome browser

called ChromoZoom, which allowed me to easily and intuitively visualize the results of the

sequencing data analyses. I am thankful to Siyang Li, J. Javier Díaz-Mejía, Mariana Babor, Paul

Bansal, and Jolanda van Leeuwen for many valuable discussions.

I would also like to show my appreciation to my best friend in Toronto, Kota Hatta, for

his warm friendship, encouragement and valuable advice.

Finally and most importantly, I would like to express my heartfelt gratitude to my

immediate family, my mother, Akiko Yamaguchi, and my sister, Mika Yamasaki. This thesis is

dedicated to my mother and sister for their understanding, support, patience, and love during my

studies in Canada.


Table of Contents

Contents

Acknowledgments ......................................................................................................................... iii

Table of Contents ............................................................................................................................ iv

List of Tables .................................................................................................................................. vi

List of Figures ................................................................................................................................vii

List of Abbreviations ................................................................................................................... viii

1 Introduction ................................................................................................................................. 1

1.1 Suppressor genetic interactions illuminate functional relationships between genes ........... 1

1.2 Suppressor genetic interactions to complement the current genetic interaction

networks of the cell .............................................................................................................. 4

1.3 Importance of understanding the mechanism of genome integrity ..................................... 5

1.4 Convenience of high-throughput sequencing technology to analyze the multiple

genomes of the yeast strains ................................................................................................ 7

1.5 High-throughput sequencing helps to reveal suppressor mutations in strains ..................... 8

2 Materials and Methods ................................................................................................................ 9

2.1 Media and Yeast strains ....................................................................................................... 9

2.2 High-throughput sequencing ............................................................................................... 9

2.3 Computational pipeline for Illumina sequencing data analyses ........................................ 10

2.4 Simulation of SNV distribution ......................................................................................... 13

2.5 Sanger sequencing ............................................................................................................. 15

2.6 Confirmation of candidate suppressor mutations using MoBY-ORF 2.0

Transformation .................................................................................................................. 15

3 Results ....................................................................................................................................... 16

3.1 High-throughput sequencing of genome integrity gene deletion strains and its data

analysis detects unique SNVs ............................................................................................ 16


3.2 CNV analysis detected large duplications and telomere aberration in the deletion

strains ................................................................................................................................. 31

3.3 A sgs1 44bp deletion was detected in the rmi1∆ MATα strain through SV analyses ........ 32

3.4 Attempt to validate promising suppressor candidates ....................................................... 34

3.5 Comparison of observed SNVs distribution with that of simulated distribution .............. 36

3.6 The SNV-deletion strains network illuminated sgs1 mutations as promising

suppressor candidates ........................................................................................................ 43

3.7 MoBY-ORF 2.0 plasmid transformation for validation of the candidate suppressors ...... 47

3.8 Tetrad analysis validated sgs1 suppressor mutations ........................................................ 50

4 Discussion ................................................................................................................................. 54

4.1 Mutation spectrum in the genome integrity deletion strains ............................................. 54

4.2 Potential Molecular mechanisms of the validated sgs1 suppressor mutations .................. 56

4.3 Have genome integrity gene deletion strains evolved? ..................................................... 57

5 Conclusions and future directions ............................................................................................. 59

5.1 Rationale ............................................................................................................................ 60

5.2 Specific aims ...................................................................................................................... 61

6 References ................................................................................................................................ 65

7 Appendices ................................................................................................................................ 72

7.1 Other relevant research ...................................................................................................... 72

7.1.1 Large-Scale Identification of extragenic suppressor mutations in yeast ............... 72

7.1.2 Identification of E0005 resistant mutations in yeast using next-generation

sequencing ............................................................................................................. 75

7.1.3 Generation of the lift-over genomes of BY4741 and BY4742 strains .................. 75

7.2 Poster Presentation............................................................................................................. 76


List of Tables

Table 1. Categories of genome integrity genes for the simulation of SNV distribution ............. 14

Table 2. A list of query strains and their notable structural variations ........................................ 20

Table 3. A list of candidate small – large deletions ..................................................................... 33

Table 4. List of SNVs and micro-indels tested using Sanger sequencing ................................... 35

Table 5. A list of candidate suppressor mutations ....................................................................... 49

Table 6. The results of validations for the sgs1 suppressor mutations using Sanger sequencing in

the top3∆ and rmi1∆ strains ......................................................................................................... 53


List of Figures

Figure 1. Computational workflow for processing Illumina sequencing reads ........................... 12

Figure 2. Median coverage for each of the deletion strains ......................................................... 21

Figure 3. Fraction of genome covered at more than certain read depth ...................................... 22

Figure 4. Numbers of total SNVs for each deletion strain .......................................................... 23

Figure 5. Numbers of total micro-insertions for each deletion strain .......................................... 24

Figure 6. Numbers of total micro-deletions for each deletion strain ........................................... 25

Figure 7. Numbers of non-synonymous mutations for each deletion strain ................................ 26

Figure 8. Numbers of deleterious non-synonymous mutations using BLOSUM80 matrix ........ 27

Figure 9. Numbers of deleterious non-synonymous mutations using PolyPhen-2 ...................... 28

Figure 10. Numbers of deleterious non-synonymous mutations using Provean ......................... 29

Figure 11. Correlation of deleterious SNVs between BLOSUM80, PolyPhen2, and Provean ... 30

Figure 12. Ratio of transitions to transversions in the seven groups ........................................... 38

Figure 13. Fraction of SNVs and non-synonymous mutations in CDS are significantly low

compared to the simulated data ................................................................................................... 39

Figure 14. Fractions of SNVs in CDS in seven groups ............................................................... 40

Figure 15. Fraction of non-synonymous mutations in CDS ........................................................ 41

Figure 16. Fraction of deleterious SNVs in CDS ........................................................................ 42

Figure 17. A network describing the relationship between the deletion strains and SNV/micro-

indels ............................................................................................................................................ 46

Figure 18. Validation of the sgs1 suppressor mutations in top3∆ and rmi1∆ strains .................. 52


List of Abbreviations

Abbreviation Meaning First occurrence (page)

BER Base Excision Repair 13

BWA Burrows-Wheeler Aligner 9

CDS CoDing Sequence 13

CI Confidence Interval 13

CIN Chromosome Instability 6

CNV Copy Number Variation 3

DAmP Decreased Abundance by mRNA Perturbation 60

DEL Deletion 11

dHJ double Holliday junction 56

DUP Tandem Duplication 11

EMS Ethyl methanesulfonate 62

FP False Positive 35

gDNA genomic DNA 9

GO Gene Ontology 13

INS Insertion 11

INV Inversion 11


MMS Methylmethanesulfonate 47

MoBY Molecular Barcoded Yeast 15

OMIM Online Mendelian Inheritance in Man 60

ORF Open Reading Frame 11

RFP Replication Fork Progression 13

RPL Replacement 11

SAMtools Sequence Alignment/Map tools 9

SGA Synthetic Genetic Array 60

SGD Saccharomyces Genome Database 7

SNV Single Nucleotide Variant 10

TP True Positive 35


1 Introduction

1.1 Suppressor genetic interactions illuminate functional relationships

between genes

Adaptation is a process where a population of organisms evolves through natural

selection to become more fit within the context of a given external environment or an internal

genetic change. The driving force of adaptation is beneficial mutations that result in increased

fitness. This kind of adaptation can be used to identify functionally related genes and to study

genetic pathways. A commonly used method is to start with a strain that already has a known

mutation and to wait until the population accumulates new mutations that ameliorate or

counteract the effects of the existing mutation (Prelich, 1999). Distinct mutations that counteract

the effects of another mutation are called suppressors and often indicate potentially strong

functional relationships between the gene pairs, which may not be easily detectable by other

methods. Therefore, suppressors have been extensively used as a very valuable tool to determine

the roles of genes.

There are three main classes of suppressors, which are particularly useful for investigating

functional relationships between genes and for determining pathway order. These classes

are: Class 1 – Alteration in the activity of the mutant proteins, Class 2 – Alteration in the

activity of the mutant pathway, and Class 3 – Alteration in the activity of a different pathway

(Prelich, 1999). A primary mutation may cause the structural change of a mutant protein and

decrease the protein’s ability to bind its interacting partner. This defect could be suppressed by

specifically modifying the binding domain structure of the interacting partner or, more likely, by

forming additional contact sites. For example, an S. cerevisiae actin-binding protein (Sac6)

suppressor was previously identified in strains bearing a temperature-sensitive mutation in actin


(Act1). The act1 mutation reduces the affinity of Act1 for Sac6 but the Sac6 suppressor protein

restores the affinity for Act1 mutant protein by forming another binding site (Hanein et al.,

1998; Sandrock et al., 1997). This example of a Class 1 suppressor illustrates the importance of

suppressors to identify direct interactions of two gene products and also to provide information

about the protein structure involved in the interaction.

Suppressors belonging to Class 2 are often extremely valuable because they not only

identify other gene products involved in the pathway and their functional relationships but also

provide indications of the pathway order (Avery and Wasserman, 1992). A pde1 suppressor

mutation, for example, was detected in the ras2 temperature sensitive strain (Uno et al., 1983).

Both Ras2 and Pde1 are involved in the RAS/cAMP pathway, which plays a major role in the

control of metabolism, stress resistance, and cell proliferation (Broach, 1991). In brief, Ras2 is

required for inducing high cAMP levels, activating the RAS/cAMP pathway whilst Pde1

maintains basal cAMP levels by degrading cAMP. When Pde1 activity is lost through a

loss-of-function mutation in PDE1, which suppresses the temperature-sensitive phenotype of the

primary ras2 mutation, the mutant Pde1 protein can no longer catalyze cAMP degradation, so the

cAMP concentration remains high enough to support normal cell proliferation. Other suppressor

relationships involved in the pathway

have been identified such as between RAS2 and CYR1, and RAS2 and PDE2 (Mitsuzawa et al.,

1989; Sass et al., 1986). The accumulation of such information can facilitate ordering the

pathway and understanding its function(s).

Suppressors in Class 3 may help find relationships between a pathway that contains a

primary mutation and a pathway that has a suppressor mutation against the primary mutation.

Suppressors in this class may modify the regulation of a pathway that has a related function or


change the function of a pathway that does not relate to the primary pathway. For instance, in

Escherichia coli, a suppressor mutation in the lactose permease allows maltose

transport in a maltose permease-deficient strain, although the wild-type lactose permease

cannot transport maltose. Thus, Class 3 suppressors can also give us some

insights about gene functions.

In this study, we investigate suppressor genetic relationships in yeast haploid deletion

strains that were generated by the Saccharomyces Genome Deletion Project (Winzeler et al.,

1999). We sequenced the genomes of 48 yeast deletion haploid strains corresponding to 26

genome integrity genes (both mating types for most of these genes were sequenced) using the

Illumina HiSeq 2000 Platform. Importantly, high-throughput sequencing allows us to reveal the

mutational spectrum of unintentional laboratory evolution, including interesting structural

variations, which may have arisen in the strains. As the genome-wide mutation spectrum in the

deletion strains is largely unknown, it is important to assess the comprehensive mutation

spectrum. Structural variations, point mutations, copy number variations (CNVs), and

suppressors in the genome were therefore analyzed.

Here, potential outcomes with respect to the sequencing results could fall along a

continuum between two extremes. One extreme outcome is that the deletion strains harbor many

mutations and have a diverse mutational spectrum as they have lost one genome integrity gene

and also it has been more than 10 years since they were generated. In this case, we may need to

question the biological insights that were derived from the data using genome integrity gene

deletion strains as mutations accumulated in the strain could affect experimental data. However,

mutations accumulated in the strains would be valuable information to study the effects of

mutations. This is because investigating genome-wide suppressors that occur in a strain bearing


a specific null mutant gene potentially allows us to reveal a new network of genetic interactions

and find novel functional relationships between genes.

A possible outcome on the other extreme is that the strains do not have many mutations

and the mutational spectrum is limited. This may result in finding no suppressor mutations in

the strains. However, the second outcome can potentially support the reliability of the data that

have been published using genome integrity gene deletion strains. In addition, the second

possible outcome would imply the robustness and buffering effects of cellular network in the

deletion strains. Furthermore, as the genome integrity gene deletion strains are expected to

harbor some of the most affected genomes among the deletion strains owing to a gene deletion,

the second outcome would generally support the experimental data obtained using other yeast

deletion strains.

1.2 Suppressor genetic interactions to complement the current genetic

interaction networks of the cell

S. cerevisiae is one of the most well studied model organisms because yeast grows

rapidly, has a small genome and can easily be manipulated in normal laboratory conditions. It

has allowed development of a wide range of genetic engineering techniques, and many reagents

are available for studying specific gene function, such as deletion strain collections and a set of

systematic functional annotations (Sherman, 2002). By taking advantage of these features and

reagents, many studies have been conducted in a high-throughput manner to reveal the entire

cellular network on the basis of the genetic and physical interactions. For example, large-scale

physical interaction networks for budding yeast were constructed using different methodologies

(Krogan et al., 2006; Tarassov et al., 2008; Yu et al., 2008). In addition to the protein-protein

interaction networks, much of the genome-wide genetic interaction network has been


established based on the fitness of 5.4 million yeast double deletion strains (Costanzo et al.,

2010). Also, a dosage-suppression genetic interaction network for yeast has been described

using high-copy number plasmids over-expressing a wild type gene, which can restore a mutant

phenotype (Magtanong et al., 2011). These genetic and physical interaction network data help us

to understand many aspects of the biological processes in yeast. However, despite the extensive

mapping of the genetic and physical interaction networks, mapping of the global functional

network of the cell has not been completed.

Interestingly, a dosage suppression genetic interaction study showed that many dosage

suppression interactions identified did not overlap with other types of genetic and protein

interactions (Magtanong et al., 2011). Also, it has been shown that the integration of multiple

genetic and physical networks is a powerful method to understand the more complete cellular

network (Zhu et al., 2008). Despite the importance of suppressor genetic interactions, genome-

wide effects of suppressor mutations accumulated in the yeast deletion strains are largely

unknown. Thus, the systematic analyses of suppressor genetic interactions in the yeast deletion

strains could provide valuable information to further complement the global understanding of

functional relationships and pathways in the cell. Although deletion strains for only 26 genome

integrity genes were sequenced and analyzed in this project, I have also been involved in a

collaborative project to reveal the genome-wide suppressor network in the yeast deletion strains

(Appendix 7.1.1).

1.3 Importance of understanding the mechanism of genome integrity

Genome integrity genes play a crucial role in maintaining genome stability during

complex cellular processes such as DNA replication and DNA repair. Losing genome integrity

in humans can cause serious diseases including cancer and immune deficiency (Shiloh, 2003).


Also, it has been shown that mutations resulting in genomic instability can cause additional

changes in the yeast genome (Gasch et al., 2001; Lehner et al., 2007). For example, a frame-

shift mutation that truncates the Msh3 protein causes microsatellite instability (Lehner et al.,

2007). Another example is chromosome IV, which is often duplicated in mec1 haploid null

strains overexpressing RNR1: the gene deletion imposes selective pressure because the

strains are not viable without the chromosomal duplication (Gasch et al., 2001). Therefore, we

expected that the genome integrity gene deletion strains would have accumulated mutations and

chromosomal rearrangements due to the decrease in stability of the deletion strains and/or their

inability to repair mutations efficiently.

Many cellular processes involved in genome integrity are highly conserved between

yeast and multicellular organisms including humans so that findings in yeast are potentially

relevant to human diseases such as cancer (Rouse and Jackson, 2002). For example, some

human homologs relating to chromosome instability (CIN) in yeast are known to mutate during

tumor formation and the synthetic lethal genetic interaction of yeast CIN genes was used to kill

cancer cells in human (Yuen et al., 2007). Although extensive studies have been conducted to

uncover the global functional relationships between genes involved in genome integrity, they

are not yet completely understood. Hence, further studies are necessary to improve the global

understanding of the genome integrity network. In this study, we use high-throughput

sequencing of entire genomes to test whether the selective pressure imposed by deleting a gene

involved in genome integrity leads to adaptation through natural selection.


1.4 Convenience of high-throughput sequencing technology to

analyze the multiple genomes of the yeast strains

Modern (“next-generation”) sequencing technologies have made genome sequencing and

re-sequencing fast, economical, and reliable. Identification of genomic variations among

populations, de novo assembly of a whole genome, and analyses of gene expression levels

through RNA sequencing are excellent examples to illustrate the achievements of high-

throughput sequencing (Nielsen et al., 2011).

The yeast genome is approximately 12.2 Mb (S288C genome assembly produced by

Saccharomyces Genome Database (SGD)), which is one of the smallest genomes among well-

studied eukaryotic model organisms, while the genome size of human is about 3.2 Gb according

to the Genome Reference Consortium (GRCh37.p11). There are three main advantages for

sequencing a relatively small genome. First, a large number of different samples can be

sequenced at once in parallel through multiplexing at a level that is appropriate to the number of

sequence reads that can be generated for a single sequencing run. Fragments from each strain

are tagged with a corresponding DNA barcode (or combination of barcodes) so that all strains

can be pooled for sequencing in parallel and every DNA sequence read is distinguishable.

Second, and relatedly, deeper coverage can be obtained easily throughout the

genome than for organisms with larger genomes. Sequencing many samples with

deep coverage allows accurate downstream analyses because high coverage supports confident

base-calling. Thus, this advantage can reduce error and uncertainty associated with the results of

downstream analyses.

Third, a haploid reference genome is publicly available at SGD. A haploid genome avoids

the ambiguity in base-calling and the alignment errors that arise from heterozygosity in diploid or

polyploid genomes. Therefore, downstream analysis of large sequencing datasets is much simpler

for a haploid yeast genome, which facilitates the accurate identification of

mutations and structural variations among strains. Thus, given its rapid growth and easy

handling, yeast is an inexpensive and suitable model organism for whole genome re-sequencing.
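
The multiplexing and coverage advantages described above can be illustrated with a back-of-the-envelope calculation. The sketch below is not part of the thesis pipeline. The function names and the run yield of 150 million read pairs are hypothetical values chosen only for illustration, whereas the genome sizes (12.2 Mb and 3.2 Gb) and the read lengths (101 bp forward, 73 bp reverse; Section 2.3) are taken from the text. It assumes that reads map uniformly across the genome.

```python
# Back-of-the-envelope coverage arithmetic (illustrative sketch, not the thesis pipeline).

YEAST_GENOME_BP = 12_200_000      # S288C assembly size quoted in the text
HUMAN_GENOME_BP = 3_200_000_000   # GRCh37 size quoted in the text
FWD_LEN, REV_LEN = 101, 73        # paired-end read lengths from Section 2.3


def expected_depth(read_pairs, genome_bp, fwd_len=FWD_LEN, rev_len=REV_LEN):
    """Mean read depth if all read pairs map uniformly across the genome."""
    return read_pairs * (fwd_len + rev_len) / genome_bp


def samples_per_run(total_read_pairs, genome_bp, target_depth,
                    fwd_len=FWD_LEN, rev_len=REV_LEN):
    """Number of barcoded samples that can share one run at the target depth."""
    pairs_per_sample = target_depth * genome_bp / (fwd_len + rev_len)
    return int(total_read_pairs // pairs_per_sample)


if __name__ == "__main__":
    run_yield = 150_000_000  # hypothetical read pairs per run (assumption)
    for name, size in (("yeast", YEAST_GENOME_BP), ("human", HUMAN_GENOME_BP)):
        print(name,
              samples_per_run(run_yield, size, target_depth=50),
              "samples at 50x;",
              round(expected_depth(run_yield, size), 1),
              "x if the whole run is spent on one genome")
```

Under these assumed numbers, a single run accommodates dozens of yeast genomes at a depth that would not cover even one human genome, which is the practical basis for pooling many barcoded strains.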

1.5 High-throughput sequencing helps to reveal suppressor mutations

in strains

Using high-throughput sequencing technology, I was able to identify new mutations and

structural variations such as large deletions and duplications in the deletion strains. Among these,

I was able to identify promising sgs1 suppressor candidates using the computational pipeline

and experimentally proved that the candidate mutations for the top3∆ and rmi1∆ strains were

true suppressors. Although suppressor relationships for these gene pairs were already known

(Chang et al., 2005; Gangloff et al., 1994), the suppressor mutations identified were novel. This

result shows that high-throughput sequencing of the yeast deletion strains can provide strong

indications about suppressor mutations existing in the strains, in the absence of intentional in

vitro evolution or prior knowledge and mapping of the suppressors. Importantly, the candidate

suppressor mutations can be experimentally confirmed. Our data also shed some light on the

mutational spectrum of different classes of genome integrity genes. The mutational spectrum

identified in this study shows that the deletion strains generally do not harbor many mutations,

especially indels, which would have more deleterious effects on fitness. Moreover, only these

two suppressor genetic interactions (corresponding to three novel suppressor mutations) could be

identified amongst all 48 strains sequenced. Therefore, our results generally support reliability

of the experimental data that have been accumulated using yeast deletion strains.


2 Materials and Methods

2.1 Media and Yeast strains

Yeast haploid deletion strains used in this study are listed in Table 2 (Open Biosystems).

Yeast strains were grown in YPD (1% yeast extract, 2% peptone, and 2% glucose, with or

without 2% agar) or SC-Leu. When MMS was not used, SC-Leu contained 0.2% drop-out mix, 2%

glucose, and 0.67% yeast nitrogen base without amino acids and with ammonium sulfate. When

MMS was used, SC-Leu contained 0.2% drop-out mix, 2% glucose, 0.17% yeast nitrogen base

without amino acids and without ammonium sulfate, and 0.1% L-glutamic acid monosodium salt.

Sporulation medium contained 2% agar, 1% potassium acetate, 0.1% yeast extract, 0.05% glucose,

and 0.01% amino acid supplement (histidine, leucine, and uracil).

Pre-sporulation medium contained 5% glucose, 3% nutrient broth,

1% yeast extract, and 2% agar.

2.2 High-throughput sequencing

S. cerevisiae genomic DNA (gDNA) from 48 strains (Table 2) was sheared to ~300 bp

fragments with a Covaris S2 (Covaris, Inc., Waltham, MA) following the manufacturer's guidelines.

Sheared DNA was then end-polished (End-it Repair Mix, Epicentre) and 3'-adenylated (Klenow

exo-, NEB). Illumina-compatible, amplification-free adapters were designed with 8-bp

multiplexed index tags. For each strain, 500 ng of library gDNA was then ligated to each of the

index tags (Kozarewa et al., 2009), with two replicates per strain, yielding 98 total samples. The

adapter-ligated libraries were quantified via qPCR (KAPA Library Quantification Kit) and

mixed to ensure even representation of each sample. All sequencing was performed on an

Illumina HiSeq 2000 instrument per the manufacturer’s recommended guidelines.


2.3 Computational pipeline for Illumina sequencing data analyses

Sequencing data generated by the Illumina HiSeq platform (paired-end reads of 101 bp (forward)

and 73 bp (reverse), with an insert length of 200-400 bp) were processed using the computational

pipeline that I developed (Figure 1).

First, the sequencing reads were aligned to the S288C reference genome from SGD using the Burrows-Wheeler Aligner (BWA) (Li and Durbin, 2010). Next, Single Nucleotide Variants (SNVs) and micro-indels were identified using Sequence Alignment/Map tools (SAMtools) (Li et al., 2009). Custom shell/Perl scripts were used to perform CNV and amino acid substitution analyses. CNV was calculated from the read coverage of each strain in 200 bp windows. All windows were rescaled so that the median coverage for each strain was 1, and were also rescaled so that the median coverage across the strains was 1. In theory, the resulting copy numbers for a single-copy gene, a deleted gene, and a duplicated gene are therefore 1, 0, and 2, respectively.
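
The two rescaling steps can be sketched in Python as follows. This is only an illustration and not the custom shell/Perl scripts used in this work: the function name `relative_copy_number` and the input layout (a dictionary mapping each strain to its list of per-window coverage values) are assumptions, and the second step is interpreted here as dividing each window by its median across strains.

```python
from statistics import median


def relative_copy_number(window_cov):
    """Rescale per-window coverage to relative copy number.

    window_cov (hypothetical layout): dict mapping strain name to a list of
    coverage values, one per 200 bp window, in the same order for all strains.
    """
    # Step 1: divide by each strain's own median so that its median becomes 1.
    per_strain = {}
    for strain, cov in window_cov.items():
        strain_median = median(cov)
        per_strain[strain] = [c / strain_median for c in cov]

    # Step 2 (assumed interpretation): divide each window by its median
    # across strains, which removes window-specific coverage bias.
    strains = list(per_strain)
    n_windows = len(per_strain[strains[0]])
    result = {s: [] for s in strains}
    for i in range(n_windows):
        window_median = median(per_strain[s][i] for s in strains)
        for s in strains:
            value = per_strain[s][i]
            # Windows whose median is zero (deleted in most strains) stay unscaled.
            result[s].append(value / window_median if window_median > 0 else value)
    return result


# Toy data, invented for illustration only.
toy = {
    "strainA": [20, 20, 20, 20, 20],  # no copy number change
    "strainB": [30, 30, 0, 30, 30],   # third window deleted
    "strainC": [10, 10, 10, 20, 10],  # fourth window duplicated
}
cnv = relative_copy_number(toy)
print(cnv["strainB"][2], cnv["strainC"][3])  # prints: 0.0 2.0
```

In this toy example the deleted window receives a relative copy number of 0 and the duplicated window a value of 2, matching the theoretical values stated above.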

Additionally, genome tracks for CNVs, per-base coverage, genomic regions without coverage, structural variations, SNV/micro-indels, and amino acid substitutions were generated using custom Perl scripts and visualized using ChromoZoom (Table 2) (Pak and Roth, 2013). In addition to the default gene tracks, nine tracks were generated, listed here from top to bottom:

1. Relative CNV - CNV was calculated using the method above and colored in light green.

2. Read depth – This track shows actual read depth for each nucleotide in the genome.

Read depth is colored in dark green.

3. Genomic regions without coverage – Black bands in this track mark regions to which no DNA reads were mapped. The track can therefore indicate possible break points and deletions, although a gap may also simply reflect biased sequencing or mapping that left no data for the region. Gene names are shown beside the black bands when the bands overlap genes.

4. Structural variants - Structural variants detected using Pindel (version 0.2.4t) (Ye et al., 2009), which can detect insertions, deletions, and inversions. Pindel detected some auxotrophy markers, inversions, and indels, although some query gene deletions were not detected. Detected mutations are described as “reference | alternate bases (mutation) | number of reads supporting the mutation | mutation type (deletion (DEL), insertion (INS), inversion (INV), replacement (RPL), or tandem duplication (DUP)) | gene name and Open Reading Frame (ORF) name, if applicable |”. Note that reference and alternate bases are omitted when the mutation is longer than 100 bp.

5. SNV/micro-indels with the stringent threshold (Q > 17, read depth >= 5, and >= 90 % of high-quality bases supporting the alternate base)

6. SNV/micro-indels with the permissive threshold (Q >= 10, read depth >= 3, and > 50 % of high-quality bases supporting the alternate base)

7. Amino acid substitution/frameshifts caused by SNV/micro-indels with the stringent

threshold

8. Amino acid substitution/frameshifts caused by SNV/micro-indels with the permissive

threshold

9. Mutations identified in parental strains (BY4741/BY4742)
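
The stringent and permissive thresholds used for tracks 5 and 6 can be expressed as a small classification function. The sketch below is illustrative only: the function name `classify_call` and the example calls are hypothetical, and the cut-offs are copied from the track descriptions above rather than from the pipeline scripts.

```python
def classify_call(qual, depth, alt_fraction):
    """Classify one SNV/micro-indel call by the thresholds of tracks 5 and 6.

    qual: Phred-scaled quality (Q); depth: read depth at the site;
    alt_fraction: fraction of high-quality bases supporting the alternate base.
    """
    # Stringent: Q > 17, read depth >= 5, >= 90 % supporting the alternate base.
    if qual > 17 and depth >= 5 and alt_fraction >= 0.90:
        return "stringent"
    # Permissive: Q >= 10, read depth >= 3, > 50 % supporting the alternate base.
    if qual >= 10 and depth >= 3 and alt_fraction > 0.50:
        return "permissive"
    return "below permissive"


# Hypothetical calls, invented for illustration.
print(classify_call(qual=40, depth=20, alt_fraction=0.95))  # prints: stringent
print(classify_call(qual=40, depth=16, alt_fraction=0.75))  # prints: permissive
print(classify_call(qual=12, depth=2, alt_fraction=1.0))    # prints: below permissive
```

A call supported by 75 % of the high-quality bases passes only the permissive threshold regardless of its quality score, because the stringent threshold requires at least 90 % support.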

Figure 1. Computational workflow for processing Illumina sequencing reads

This diagram describes the computational pipeline to process Illumina sequencing reads. First,

Illumina HiSeq 2000 generated billions of short DNA reads. Then, the reads were aligned to the

reference genome using BWA. Next, SNVs and micro-indels were identified using SAMtools

followed by CNV analysis and amino acid substitution analysis using custom Perl scripts. The

results were visualized with ChromoZoom and Cytoscape.

2.4 Simulation of SNV distribution

To investigate whether the SNVs detected in the strains show any trends in their distribution, the distribution of randomly generated SNVs was compared with that of the observed SNVs. First, genome integrity genes were categorized into 7 groups (All (all SNVs identified among the strains with the stringent threshold), Replication Fork Progression (RFP), Junction Resolution, Mismatch Repair, Checkpoint Regulation, DNA Catabolism, and Base Excision Repair (BER)) based on Gene Ontology (GO) annotations. Then, the number of SNVs of each type (A to T, C to T, etc.) was tallied for each group (Table 1). Next, the same number of SNVs of each type was randomly generated using custom Perl scripts and analyzed using the computational pipeline. For example, the Mismatch Repair group has 15 G to A mutations, so 15 G to A mutations were randomly generated between chrI and chrXVI in the simulated data, and I repeated this step for A to T mutations, C to T mutations, and so on. This simulation was performed 1000 times for each category. Based on the simulated and observed data, the fraction of SNVs in CoDing Sequence (CDS), the fraction of non-synonymous SNVs in CDS, and the fraction of deleterious non-synonymous mutations in CDS were calculated. In addition, the 95 % Confidence Interval (CI) for each category was computed and is shown as error bars in Figures 12 – 15. A single p-value for each category was calculated based on the simulated data and adjusted using the Bonferroni correction.
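
A minimal Python sketch of this kind of simulation is given below. It is not the custom Perl code used here. The function names, the toy genome, and the CDS intervals are hypothetical. Two further assumptions are made for illustration: each simulated SNV is placed on a position whose reference base matches the SNV type, and the p-value is an empirical one-sided p-value.

```python
import random


def simulate_cds_fraction(genome, cds, snv_counts, n_sim=1000, seed=0):
    """Simulate the fraction of randomly placed SNVs that fall in CDS.

    genome: dict chromosome -> sequence string (hypothetical toy input)
    cds: dict chromosome -> list of half-open (start, end) coding intervals
    snv_counts: dict (ref, alt) -> number of observed SNVs of that type
    """
    rng = random.Random(seed)
    # Index positions by reference base so that, e.g., G to A SNVs land on G only.
    sites = {}
    for chrom, seq in genome.items():
        for pos, base in enumerate(seq):
            sites.setdefault(base, []).append((chrom, pos))

    def in_cds(chrom, pos):
        return any(start <= pos < end for start, end in cds.get(chrom, []))

    total = sum(snv_counts.values())
    fractions = []
    for _ in range(n_sim):
        hits = 0
        for (ref, _alt), n in snv_counts.items():
            # Draw n distinct positions carrying the reference base.
            for chrom, pos in rng.sample(sites[ref], n):
                hits += in_cds(chrom, pos)
        fractions.append(hits / total)
    return fractions


def empirical_p_lower(simulated, observed):
    """One-sided empirical p-value: how often a simulation is <= the observed value."""
    return (sum(f <= observed for f in simulated) + 1) / (len(simulated) + 1)


def bonferroni(p, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p * n_tests)


# Toy example: a 200 bp "genome" whose first half is coding.
genome = {"chrI": "ACGT" * 50}
cds = {"chrI": [(0, 100)]}
counts = {("G", "A"): 5, ("C", "T"): 3}
sim = simulate_cds_fraction(genome, cds, counts, n_sim=1000)
p = empirical_p_lower(sim, observed=0.25)
print(bonferroni(p, n_tests=7))
```

Keeping the number of SNVs of each type fixed, as in the description above, means that the simulated and observed data differ only in where the SNVs fall.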

Function                       Genes                          Number of SNVs
Base Excision Repair           MAG1, NTG1, RAD2, XRS2         15
Checkpoint Regulation          CSM3, ESC2, MRC1, SGS1         19
DNA Catabolism                 MAG1, MSH2, NTG1, RAD1, SGS1   67
Junction Resolution            MSH2, RAD5, RMI1, SGS1, TOP3   67
Mismatch Repair                MLH2, MSH2, MSH4, RAD1         61
Replication Fork Progression   ASF1, CSM3, DIA2, RAD5         37

Table 1. Categories of genome integrity genes for the simulation of SNV distribution

16 genome integrity genes were categorized into 6 groups based on GO annotations. The DNA Catabolism, Junction Resolution, and Mismatch Repair groups all include MSH2, which increased the number of SNVs in these groups compared to the other groups.

2.5 Sanger sequencing

Sanger sequencing was used to confirm specific SNVs/micro-indels that were identified by Illumina sequencing and also to validate suppressor mutations in dissected spores. BLAST and Primer3 were used to design specific PCR primers to avoid non-specific amplification. For the spot-checking, the purified gDNA remaining from Illumina sequencing was used to amplify the target DNA fragments. For the validation of candidate suppressor mutations, colony PCR was performed to amplify the DNA fragments containing the candidate suppressor mutations from individual spores. The strain AB972 was used as the negative control for this experiment.

For both cases, PCR products were cleaned up using enzymatic reactions (Exonuclease I and

Antarctic Phosphatase, NEB) and the DNA concentrations were measured using the Quant-It

Picogreen Assay (Invitrogen). The PCR samples were Sanger sequenced by the TCAG DNA

Sequencing Facility.

2.6 Confirmation of candidate suppressor mutations using MoBY-

ORF 2.0 Transformation

Each candidate strain was transformed with the corresponding Molecular Barcoded Yeast Open Reading Frame (MoBY-ORF) 2.0 plasmid, a high-copy plasmid carrying the wild-type allele of the suppressor candidate gene, using standard methods (Magtanong et al., 2011). Spot assays (1:5 serial dilutions) were performed on SD –Leu plates, which were incubated at 30 °C.

3 Results

3.1 High-throughput sequencing of genome integrity gene deletion

strains and its data analysis detects unique SNVs

Forty-eight strains, comprising a MATa strain, a MATα strain, or both for each of 26 genome integrity gene deletions, were sequenced using the Illumina HiSeq 2000 platform and analyzed. All 48 strains achieved more than 15X coverage after the reads for each strain were aligned to the reference genome (Figure 2 and Figure 3). The S288C genome from SGD was used as the reference genome. Following read mapping, SNVs and micro-indels were identified for each strain and the results were summarized for the 48 strains (Figures 4-6). Any SNVs and micro-indels found in either BY4741 or BY4742 were removed because many of the deletion strains were derived from BY4741 and BY4742. In addition, any common SNVs and micro-indels identified in more than one of the deletion strains were removed. Therefore, the SNVs and micro-indels considered here were unique to individual strains.

In total, 235 SNVs were identified with a stringent threshold of Phred score >= 17 (a logarithmically scaled score reflecting the probability of a base-calling error), read depth >= 4 (the number of reads that align to known reference bases), and >= 90% of high-quality reads supporting the mutation. A further 242 SNVs were identified with a permissive threshold (but below the stringent threshold) (Figure 4). The permissive threshold is a Phred score of at least 10, a read depth of at least 3, and more than 50% of high-quality reads supporting the mutation. On average, 4.9 and 5.1 SNVs were found per strain with the stringent and permissive thresholds, respectively. Notably, of the 235 stringent SNVs identified, 42 came from the msh2∆ MATα strain, which lacks the Msh2 mismatch repair protein (Z-score 6). Another 6 stringent SNVs were found in the msh2∆ MATa strain (Z-score 0.18).

Regarding micro-indels, 40 and 22 micro-insertions and 117 and 48 micro-deletions were identified with the stringent and permissive thresholds, respectively. Again, it is notable that approximately 53 % and 45 % of micro-insertions and 77 % and 85 % of micro-deletions with the stringent and permissive thresholds, respectively, were found in the msh2∆ strains (Figure 5 and Figure 6). Each of the strains other than the msh2∆ MATa and MATα strains carried between 0 and 4 micro-insertions and between 0 and 4 micro-deletions. When the msh2∆ strains are excluded, the mean numbers of micro-insertions and micro-deletions identified per strain are 0.67 and 0.74, respectively. Thus, the high rate of micro-indels in the msh2∆ strains indicates that the Msh2 protein plays a very important role in repairing or preventing micro-indels, given that the deletion strains were presumably stored and propagated under the same environmental conditions.
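
The shares of micro-indels attributable to the msh2∆ strains can be recomputed from the counts given in the text and in the captions of Figures 5 and 6. The short Python sketch below does this; the variable and function names are hypothetical, and it assumes that the msh2∆ MATa strain carries 3 micro-insertions at each of the two thresholds.

```python
N_STRAINS = 48
N_MSH2_STRAINS = 2

# Totals over all 48 strains, as stated in the text.
total = {
    "micro-insertion": {"stringent": 40, "permissive": 22},
    "micro-deletion": {"stringent": 117, "permissive": 48},
}
# msh2∆ MATα + msh2∆ MATa counts, read from the captions of Figures 5 and 6.
msh2 = {
    "micro-insertion": {"stringent": 18 + 3, "permissive": 7 + 3},
    "micro-deletion": {"stringent": 73 + 17, "permissive": 22 + 19},
}


def msh2_share(kind, threshold):
    """Percentage of micro-indels of one kind found in the two msh2∆ strains."""
    return 100.0 * msh2[kind][threshold] / total[kind][threshold]


def mean_without_msh2(kind):
    """Mean count per strain (both thresholds combined) excluding the msh2∆ strains."""
    remaining = sum(total[kind].values()) - sum(msh2[kind].values())
    return remaining / (N_STRAINS - N_MSH2_STRAINS)


for kind in total:
    shares = [round(msh2_share(kind, t), 1) for t in ("stringent", "permissive")]
    print(kind, shares, round(mean_without_msh2(kind), 2))
```

With these counts, the remaining 46 strains carry on average roughly 0.67 micro-insertions and 0.74 micro-deletions each.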

Next, 153 out of 235 SNVs (stringent) and 114 out of 242 SNVs (permissive) were found in CDS regions. Only 2 (stringent) and 3 (permissive) SNVs were detected in intronic regions. Among the SNVs found in CDS, the non-synonymous mutations occurring in each strain were then enumerated (Figure 7): 100 and 58 SNVs were non-synonymous with the stringent and permissive thresholds, respectively. To further narrow down possible candidate suppressor mutations, deleterious non-synonymous mutations were predicted using the BLOSUM80 score (a log-odds score that indicates the likelihood of a particular amino acid substituting for another at the corresponding position in a homologous protein) (Henikoff and Henikoff, 1992), PolyPhen-2 (Adzhubei et al., 2010) and Provean (Choi et al., 2012). Non-synonymous mutations with a negative BLOSUM80 score, as well as start codon, nonsense, and read-through mutations, were considered deleterious SNVs. Non-synonymous mutations classified as deleterious by PolyPhen-2 and Provean were also considered (Figure 8 - 10). PolyPhen-2 was able to calculate prediction scores for the substitution effects of 132 (~83.5 %) non-synonymous mutations, while Provean calculated prediction scores for all non-synonymous mutations. As a result, 89, 66, and 53 deleterious SNVs were predicted with BLOSUM80, PolyPhen-2, and Provean, respectively. This is equivalent to only 1.9, 1.4, and 1.1 deleterious SNVs per strain. Among the deleterious SNVs, 30 were predicted by all three methods. To investigate whether the frequency of deleterious mutations differs significantly between the unique and common non-synonymous mutations, PolyPhen-2 was applied to the 59 common non-synonymous mutations that were detected in the 48 strains. PolyPhen-2 predicted 13 of 51 non-synonymous mutations as deleterious (no prediction was made for the remaining 8 non-synonymous mutations). As 66 out of 132 (50%) unique and 13 out of 51 (34%) common non-synonymous mutations were predicted as deleterious, there is no significant difference between the two groups (Chi-squared test, P = 0.0855). In addition, 12 nonsense mutations (9 with the stringent threshold and 3 with the permissive threshold) and 1 read-through mutation with the permissive threshold were identified among the strains. This enumeration of point mutations demonstrates that the genome integrity deletion strains generally harbor only small numbers of SNVs and few or no micro-indels that may affect gene function and fitness.

Query Strains Genome Tracks Query Deletion Structural Variations

asf1∆ MATα asf1∆ MATα Y chrI duplication

asf1∆ MATa asf1∆ MATa Y

csm3∆ MATα csm3∆ MATα Y Genes involved in cell wall maintenance are deleted/duplicated

csm3∆ MATa csm3∆ MATa Y

dia2∆ MATα dia2∆ MATα Y chrI duplication

dia2∆ MATa dia2∆ MATa Y

esc2∆ MATα esc2∆ MATα Y chrIII duplication, chrXII partial deletion

mag1∆ MATα mag1∆ MATα Y

mag1∆ MATa mag1∆ MATa Y

mgs1∆ MATα mgs1∆ MATα Y

mgs1∆ MATa mgs1∆ MATa Y

mlh2∆ MATα mlh2∆ MATα Y

mlh2∆ MATa mlh2∆ MATa Y

mrc1∆ MATα mrc1∆ MATα Y

mrc1∆ MATa mrc1∆ MATa Y

msh2∆ MATα msh2∆ MATα Y

msh2∆ MATa msh2∆ MATa Y

msh4∆ MATα msh4∆ MATα Y

msh4∆ MATa msh4∆ MATa Y

mus81∆ MATα mus81∆ MATα Y

mus81∆ MATa mus81∆ MATa Y

nej1∆ MATα nej1∆ MATα Y Telomeric region aberration

nej1∆ MATa nej1∆ MATa Y Telomeric region aberration

ntg1∆ MATα ntg1∆ MATα Y

ntg1∆ MATa ntg1∆ MATa Y

pso2∆ MATα pso2∆ MATα Y

pso2∆ MATa pso2∆ MATa Y Telomeric region aberration

rad1∆ MATα rad1∆ MATα Y

rad1∆ MATa rad1∆ MATa Y

rad2∆ MATα rad2∆ MATα Y Partial chrIV duplication (1182k-2182k)

rad2∆ MATa rad2∆ MATa Y Partial chrIV duplication (8840k-9820k)



rad52∆ MATa rad52∆ MATa Y

rad5∆ MATα rad5∆ MATα Y Telomeric region aberration

rad5∆ MATa rad5∆ MATa Y Telomeric region aberration

rmi1∆ MATα rmi1∆ MATα Y Partial chrIV duplication (1100k-1154k, 1163k-1205k)

rmi1∆ MATa rmi1∆ MATa Y

sgs1∆ MATα sgs1∆ MATα Contamination?

sgs1∆ MATa sgs1∆ MATa Y

http://chromozoom.org/?db=sacCer3&customTracks=http://dalai.mshri.on.ca/~galaxy/ucsc_tracks/GI_strains/ASF1-alpha.txt

http://chromozoom.org/?db=sacCer3&customTracks=http://dalai.mshri.on.ca/~galaxy/ucsc_tracks/GI_strains/ASF1-a.txt

http://chromozoom.org/?db=sacCer3&customTracks=http://dalai.mshri.on.ca/~galaxy/ucsc_tracks/GI_strains/CSM3-alpha.txt

http://chromozoom.org/?db=sacCer3&customTracks=http://dalai.mshri.on.ca/~galaxy/ucsc_tracks/GI_strains/CSM3-a.txt

http://chromozoom.org/?db=sacCer3&customTracks=http://dalai.mshri.on.ca/~galaxy/ucsc_tracks/GI_strains/DIA2-alpha.txt

http://chromozoom.org/?db=sacCer3&customTracks=http://dalai.mshri.on.ca/~galaxy/ucsc_tracks/GI_strains/DIA2-a.txt

http://chromozoom.org/?db=sacCer3&customTracks=http://dalai.mshri.on.ca/~galaxy/ucsc_tracks/GI_strains/ESC2-alpha.txt

http://chromozoom.org/?db=sacCer3&customTracks=http://dalai.mshri.on.ca/~galaxy/ucsc_tracks/GI_strains/MAG1-alpha.txt

http://chromozoom.org/?db=sacCer3&customTracks=http://dalai.mshri.on.ca/~galaxy/ucsc_tracks/GI_strains/MAG1-a.txt

http://chromozoom.org/?db=sacCer3&customTracks=http://dalai.mshri.on.ca/~galaxy/ucsc_tracks/GI_strains/MGS1-alpha.txt

http://chromozoom.org/?db=sacCer3&customTracks=http://dalai.mshri.on.ca/~galaxy/ucsc_tracks/GI_strains/MGS1-a.txt

http://chromozoom.org/?db=sacCer3&customTracks=http://dalai.mshri.on.ca/~galaxy/ucsc_tracks/GI_strains/MLH2-alpha.txt

http://chromozoom.org/?db=sacCer3&customTracks=http://dalai.mshri.on.ca/~galaxy/ucsc_tracks/GI_strains/MLH2-a.txt

http://chromozoom.org/?db=sacCer3&customTracks=http://dalai.mshri.on.ca/~galaxy/ucsc_tracks/GI_strains/MRC1-alpha.txt

http://chromozoom.org/?db=sacCer3&customTracks=http://dalai.mshri.on.ca/~galaxy/ucsc_tracks/GI_strains/MRC1-a.txt

http://chromozoom.org/?db=sacCer3&customTracks=http://dalai.mshri.on.ca/~galaxy/ucsc_tracks/GI_strains/MSH2-alpha.txt

http://chromozoom.org/?db=sacCer3&customTracks=http://dalai.mshri.on.ca/~galaxy/ucsc_tracks/GI_strains/MSH2-a.txt

http://chromozoom.org/?db=sacCer3&customTracks=http://dalai.mshri.on.ca/~galaxy/ucsc_tracks/GI_strains/MSH4-alpha.txt

http://chromozoom.org/?db=sacCer3&customTracks=http://dalai.mshri.on.ca/~galaxy/ucsc_tracks/GI_strains/MSH4-a.txt

http://chromozoom.org/?db=sacCer3&customTracks=http://dalai.mshri.on.ca/~galaxy/ucsc_tracks/GI_strains/MUS81-alpha.txt

http://chromozoom.org/?db=sacCer3&customTracks=http://dalai.mshri.on.ca/~galaxy/ucsc_tracks/GI_strains/MUS81-a.txt

http://chromozoom.org/?db=sacCer3&customTracks=http://dalai.mshri.on.ca/~galaxy/ucsc_tracks/GI_strains/NEJ1-alpha.txt

http://chromozoom.org/?db=sacCer3&customTracks=http://dalai.mshri.on.ca/~galaxy/ucsc_tracks/GI_strains/NEJ1-a.txt

http://chromozoom.org/?db=sacCer3&customTracks=http://dalai.mshri.on.ca/~galaxy/ucsc_tracks/GI_strains/NTG1-alpha.txt

http://chromozoom.org/?db=sacCer3&customTracks=http://dalai.mshri.on.ca/~galaxy/ucsc_tracks/GI_strains/NTG1-a.txt

http://chromozoom.org/?db=sacCer3&customTracks=http://dalai.mshri.on.ca/~galaxy/ucsc_tracks/GI_strains/PSO2-alpha.txt

http://chromozoom.org/?db=sacCer3&customTracks=http://dalai.mshri.on.ca/~galaxy/ucsc_tracks/GI_strains/PSO2-a.txt

http://chromozoom.org/?db=sacCer3&customTracks=http://dalai.mshri.on.ca/~galaxy/ucsc_tracks/GI_strains/RAD1-alpha.txt

http://chromozoom.org/?db=sacCer3&customTracks=http://dalai.mshri.on.ca/~galaxy/ucsc_tracks/GI_strains/RAD1-a.txt








http://chromozoom.org/?db=sacCer3&customTracks=http://dalai.mshri.on.ca/~galaxy/ucsc_tracks/GI_strains/RMI1-alpha.txt

http://chromozoom.org/?db=sacCer3&customTracks=http://dalai.mshri.on.ca/~galaxy/ucsc_tracks/GI_strains/RMI1-a.txt

http://chromozoom.org/?db=sacCer3&customTracks=http://dalai.mshri.on.ca/~galaxy/ucsc_tracks/GI_strains/SGS1-alpha.txt

http://chromozoom.org/?db=sacCer3&customTracks=http://dalai.mshri.on.ca/~galaxy/ucsc_tracks/GI_strains/SGS1-a.txt

shu1∆ MATα shu1∆ MATα Y

shu1∆ MATa shu1∆ MATa Y

slx5∆ MATa slx5∆ MATa Y

tdp1∆ MATα tdp1∆ MATα Y

tdp1∆ MATa tdp1∆ MATa Y

top3∆ MATα top3∆ MATα Y

top3∆ MATa top3∆ MATa Y

xrs2∆ MATa xrs2∆ MATa Y

Table 2. A list of query strains and their notable structural variations

All the strains sequenced using the Illumina HiSeq platform are listed. Their CNVs,

SNV/micro-indels, and non-synonymous mutations were visualized using ChromoZoom and the

hyperlinks are available in the second column. The third column indicates whether the query

deletion was confirmed based on the mapped reads. I was able to identify structural variations in

the strains and notable examples are listed in the fourth column.

http://chromozoom.org/?db=sacCer3&customTracks=http://dalai.mshri.on.ca/~galaxy/ucsc_tracks/GI_strains/SHU1-alpha.txt

http://chromozoom.org/?db=sacCer3&customTracks=http://dalai.mshri.on.ca/~galaxy/ucsc_tracks/GI_strains/SHU1-a.txt

http://chromozoom.org/?db=sacCer3&customTracks=http://dalai.mshri.on.ca/~galaxy/ucsc_tracks/GI_strains/SLX5-a.txt

http://chromozoom.org/?db=sacCer3&customTracks=http://dalai.mshri.on.ca/~galaxy/ucsc_tracks/GI_strains/TDP1-alpha.txt

http://chromozoom.org/?db=sacCer3&customTracks=http://dalai.mshri.on.ca/~galaxy/ucsc_tracks/GI_strains/TDP1-a.txt

http://chromozoom.org/?db=sacCer3&customTracks=http://dalai.mshri.on.ca/~galaxy/ucsc_tracks/GI_strains/TOP3-alpha.txt

http://chromozoom.org/?db=sacCer3&customTracks=http://dalai.mshri.on.ca/~galaxy/ucsc_tracks/GI_strains/TOP3-a.txt

http://chromozoom.org/?db=sacCer3&customTracks=http://dalai.mshri.on.ca/~galaxy/ucsc_tracks/GI_strains/XRS2-a.txt

Figure 2. Median coverage for each of the deletion strains

The median read depths across all genomic locations in chromosomes I to XVI were calculated for the 48 deletion strains and plotted in the figure. The minimum, average, and maximum median coverage among the strains are 17, 26.8, and 54, respectively. The deletion strains on the x-axis are listed in alphabetical order.

Figure 3. Fraction of the genome covered at or above a given read depth

The fraction of the genome covered at read depths of at least 10, 15, 20 and 25 was calculated and visualized for the 48 deletion strains using boxplots. For most of the deletion strains, nearly 100 % of the genomic locations were covered at 10X or more. About 25 % of the strains achieved 25X for approximately 70 to 100 % of the genomic locations.
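
For illustration, the per-strain quantity summarized in Figure 3 can be computed from per-base read depths as sketched below. The function name `fraction_covered` and the toy depth values are hypothetical and are not taken from the pipeline.

```python
def fraction_covered(depths, thresholds=(10, 15, 20, 25)):
    """Fraction of positions whose read depth is at least each threshold.

    depths: per-base read depths for one strain (hypothetical input layout).
    """
    n = len(depths)
    return {t: sum(d >= t for d in depths) / n for t in thresholds}


# Toy per-base depths for eight positions, invented for illustration.
toy_depths = [0, 9, 10, 15, 20, 25, 30, 12]
print(fraction_covered(toy_depths))
# prints: {10: 0.75, 15: 0.5, 20: 0.375, 25: 0.25}
```

Each boxplot in the figure would then summarize one such fraction over the 48 strains.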

Figure 4. Numbers of total SNVs for each deletion strain

The number of SNVs was enumerated and plotted for each strain. 42 SNVs were found with the stringent threshold in the msh2∆ MATα strain, which lacks the mismatch repair gene MSH2. On average, approximately 10 SNVs were detected per strain, as indicated by the black solid line.

Figure 5. Numbers of total micro-insertions for each deletion strain

The number of micro-insertions was enumerated and plotted for each strain. In total, 62 unique micro-insertions were identified in the strains. Among them, 18 and 7 micro-insertions were identified in the msh2∆ MATα strain with the stringent and permissive thresholds, respectively. In the msh2∆ MATa strain, 3 micro-insertions were identified with each of the two thresholds.

Figure 6. Numbers of total micro-deletions for each deletion strain

The number of micro-deletions was counted and plotted for each strain. In total, 165 micro-deletions were identified among the strains. 17 and 19 micro-deletions were identified in the msh2∆ MATa strain with the stringent and permissive thresholds, respectively. Likewise, 73 and 22 micro-deletions were found in the msh2∆ MATα strain.

Figure 7. Numbers of non-synonymous mutations for each deletion strain

The number of non-synonymous mutations was plotted for each strain. 100 and 58 non-synonymous SNVs were detected with the stringent and permissive cut-offs, respectively, corresponding to an average of 2.1 and 1.2 per strain.

Figure 8. Numbers of deleterious non-synonymous mutations using BLOSUM80 matrix

The number of non-synonymous mutations with negative BLOSUM80 score was plotted for

each strain. In this graph, deleterious is defined as a mutation that is non-synonymous and has a

negative BLOSUM80 score. 63 and 39 deleterious mutations were detected with the stringent

and permissive cut-off, respectively. On average, only 1.31 and 0.81 deleterious non-

synonymous mutations were detected with the stringent and permissive cut-off, respectively.

Figure 9. Numbers of deleterious non-synonymous mutations using PolyPhen-2

The number of deleterious non-synonymous mutations identified using PolyPhen-2 was plotted

for each strain. In this graph, deleterious is defined as a mutation that has PolyPhen-2 score

above 0.5. The score indicates classifier probability of the variation being damaging. 46 and 32

deleterious mutations were detected with the stringent and permissive cut-off, respectively. On

average, only 0.96 and 0.67 deleterious non-synonymous mutations were detected with the

stringent and permissive cut-off, respectively.

Figure 10. Numbers of deleterious non-synonymous mutations using Provean

The number of deleterious non-synonymous mutations identified using Provean was plotted for

each strain. In this graph, deleterious is defined as a mutation that has a Provean score below -2.5. 38 and 26 deleterious mutations were detected with the stringent and permissive threshold,

respectively. On average, only 0.80 and 0.54 deleterious non-synonymous mutations were

detected with the stringent and permissive threshold, respectively.

Figure 11. Correlation of deleterious SNVs between BLOSUM80, PolyPhen2, and Provean

(A) Scores for BLOSUM80 and PolyPhen2 (PPH2), (B) PPH2 and Provean and (C) BLOSUM80

and Provean were plotted, respectively. The shaded areas indicate that the SNVs in the area are

predicted as deleterious in both scores (BLOSUM80 <= -1, PPH2 > 0.5 and Provean < -2.5).

The correlation coefficient for (A) is r = -0.26, (B) is r = -0.70 and (C) is r = 0.33. (D) A Venn

diagram showing numbers of deleterious mutations detected by each method. 30 SNVs were

predicted as deleterious SNVs by the three methods.
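
The three cut-offs that define the shaded areas (BLOSUM80 <= -1, PPH2 > 0.5 and Provean < -2.5) can be combined into a simple consensus call, as in the Python sketch below. The function names and the example SNVs are hypothetical and only illustrate how a mutation is counted in the central region of the Venn diagram.

```python
def deleterious_calls(blosum80, pph2, provean):
    """Per-method deleterious calls using the cut-offs of the shaded areas.

    pph2 may be None when PolyPhen-2 returned no prediction.
    """
    return {
        "BLOSUM80": blosum80 <= -1,
        "PolyPhen-2": pph2 is not None and pph2 > 0.5,
        "Provean": provean < -2.5,
    }


def consensus(snvs):
    """Names of SNVs called deleterious by all three methods."""
    return [
        name
        for name, blosum80, pph2, provean in snvs
        if all(deleterious_calls(blosum80, pph2, provean).values())
    ]


# Hypothetical scores, invented for illustration.
example = [
    ("snv1", -3, 0.98, -4.1),  # deleterious by all three methods
    ("snv2", -1, 0.20, -3.0),  # not deleterious by PolyPhen-2
    ("snv3", 1, 0.90, -5.0),   # not deleterious by BLOSUM80
    ("snv4", -2, None, -2.6),  # no PolyPhen-2 prediction
]
print(consensus(example))  # prints: ['snv1']
```

In this sketch, a mutation without a PolyPhen-2 prediction is never counted in the three-way overlap, which is one possible convention rather than a documented choice.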

3.2 CNV analysis detected large duplications and telomere aberration

in the deletion strains

CNV analysis, calculated based on coverage for each strain, is very informative for validating the query gene deletions and detecting duplications. CNV analysis indicates that the asf1∆ MATα, dia2∆ MATα, and esc2∆ MATα strains have a whole-chromosome duplication, and that the rad2∆ MATa, rad2∆ MATα and rmi1∆ MATα strains have a partial chromosomal duplication (Table 2). The rad2∆ strains showed a large duplication in chromosome IV in both MATa and MATα. In both cases, the duplicated DNA sequences were flanked by direct repeats. In the rad2∆ MATa strain, the duplicated DNA region is approximately 1000 kb and flanked by Long Terminal Repeat retrotransposons. The rad2∆ MATα strain also showed a ~1000 kb duplication, flanked by RPL35A and RPL35B, which encode a protein component of the large (60S) ribosomal subunit.

All the auxotrophy markers that were used in the deletion strains (his3∆, ura3∆, leu2∆,

met17∆, and lys2∆) were detected. All of the query deletions were also confirmed except in the sgs1∆ MATα strain, in which SGS1 was evenly covered but had a relative copy number of approximately 0.5, indicating that the sample might have been mixed with other samples.

In addition, the CNV tracks of some strains, such as the nej1∆ and rad5∆ strains, show spikes at the telomeric regions, which may indicate that the telomeric regions lost their stability and elongated due to the gene deletion. Nej1, a regulator of non-homologous end joining, is involved in telomere maintenance, and it has been reported that deleting NEJ1 results in telomere repeat extension (Liti and Louis, 2003). Rad5 is a DNA helicase possibly involved in post-replication repair, and it has also been reported that a rad5∆ strain has longer telomeres (Gatbonton et al., 2006; Unk et al., 2010). This is consistent with the increased number of reads mapped to the telomeric regions and may indicate a possible telomere maintenance function for RAD5. The sequencing data therefore likely captured this phenomenon, which demonstrates the usefulness of high-throughput sequencing technology for studying telomere length alteration.

3.3 A sgs1 44bp deletion was detected in the rmi1∆ MATα strain

through SV analyses

To detect medium to large deletions, I computationally searched the coverage data files for chromosomal regions with zero coverage in each strain. Pindel was also used to detect possible structural variants including deletions, insertions, inversions and tandem duplications (Ye et al., 2009). The read depth analysis identified candidate medium to large deletions (Table 3), the auxotrophy markers, and the query deletions. Excluding common mutations, Pindel detected 7 insertions and 6 deletions longer than 5 bp among the strains, although it failed to detect some known large deletions such as the query deletions. Notably, both analyses found a 44 bp deletion (642434 – 642477 in ChrXIII) in sgs1 in the rmi1∆ MATα strain, which was confirmed by PCR. The breakpoint was determined by de novo assembly of the sequence reads using ABySS (Simpson et al., 2009) and by Pindel. This deletion causes a frameshift in the SGS1 ORF that introduces a premature stop codon. Thus, the deletion is a promising suppressor candidate because sgs1 loss-of-function mutations are known to suppress the slow growth of rmi1∆ strains (Chang et al., 2005).
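
The search for zero-coverage regions amounts to finding runs of consecutive positions with a read depth of zero. A minimal Python sketch is shown below. The function name `zero_coverage_regions`, its `min_length` parameter, and the toy depth values are hypothetical, and the sketch does not reproduce the scripts actually used.

```python
def zero_coverage_regions(depths, min_length=1):
    """Return 1-based inclusive (start, end) intervals where read depth is zero.

    depths: per-base read depths along one chromosome (hypothetical input layout).
    min_length: shortest run of zero-coverage bases to report.
    """
    regions = []
    start = None
    for pos, depth in enumerate(depths, start=1):
        if depth == 0:
            if start is None:
                start = pos  # a new zero-coverage run begins here
        elif start is not None:
            # The run ended at pos - 1; its length is pos - start.
            if pos - start >= min_length:
                regions.append((start, pos - 1))
            start = None
    # Handle a run that extends to the end of the chromosome.
    if start is not None and len(depths) - start + 1 >= min_length:
        regions.append((start, len(depths)))
    return regions


# Toy depths, invented for illustration.
toy_depths = [5, 0, 0, 0, 7, 0, 3, 0, 0]
print(zero_coverage_regions(toy_depths))                # prints: [(2, 4), (6, 6), (8, 9)]
print(zero_coverage_regions(toy_depths, min_length=2))  # prints: [(2, 4), (8, 9)]
```

A minimum run length helps to separate candidate deletions from isolated positions that simply received no reads.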

Strain        ORF          Gene         Non-covered bases   PCR confirmation
rmi1∆ MATα    YMR190C      SGS1         47                  Y
dia2∆ MATα    YFR016C      YFR016C      56                  Not performed
top3∆ MATα    YHR214C-E    YHR214C-E    71                  Not performed
esc2∆ MATα    YLR157C-C    YLR157C-C    132                 Not performed
rmi1∆ MATa    YLR159C-A    YLR159C-A    132                 Not performed
csm3∆ MATα    YIR019C      MUC1         171                 Not performed
top3∆ MATα    YHR216W      IMD2         1400                Not performed
csm3∆ MATα    YHR211W      FLO5         2240                Not performed

Table 3. A list of candidate small – large deletions

Based on read depth, possible deletions were identified among the strains. Interestingly, the rmi1∆ MATα strain contains a deletion in SGS1. This deletion was detected by both Pindel and ABySS and was confirmed by PCR.

3.4 Attempt to validate promising suppressor candidates

Sanger sequencing was performed to validate SNVs and micro-indels detected using the computational pipeline (Table 4). I designed specific PCR primers using BLAST and Primer3 to avoid non-specific amplification. 12 SNVs and 1 micro-deletion that passed the stringent threshold were tested. All of these mutations were confirmed in the deletion strains, while none of them were detected in the control strain. 5 SNVs that only passed the permissive threshold were tested, and one of them was ambiguous. The ambiguous C to A mutation was found at position 642939 in chromosome XIII in the top3∆ MATa strain, where C is the reference base and A is the mutated base. The Sanger chromatogram showed peaks for both A and C at this position. In the high-throughput sequencing data, 12 high-quality reads supported the mutated base while 4 high-quality reads supported the reference base at the corresponding position. The two sequencing results were consistent, suggesting that two cell populations were mixed in the sample. The other 4 SNVs were unambiguous false positives, indicating miscalling of bases in the Illumina data. In addition, 3 SNVs and 1 micro-insertion below the permissive threshold were tested and all of them were false positives.

Strain Threshold Chr Position Ref Alt PCR

csm3∆ MATα stringent chr11 611659 T G TP

dia2∆ MATα stringent chr02 508369 A G TP

dia2∆ MATα stringent chr04 378336 T A TP

mlh2∆ MATα stringent chr13 151483 G C TP

msh2∆ MATα stringent chr11 543179 G A TP

rad30∆ MATα stringent chr04 818841 C A TP

rad52∆ MATα stringent chr04 1247936 C T TP

rad52∆ MATα stringent chr05 530552 C A TP

rmi1∆ MATα stringent chr13 642118 G T TP

shu1∆ MATα stringent chr11 407144 A T TP

tdp1∆ MATα stringent chr15 604080 C T TP

top3∆ MATα stringent chr05 478786 C A TP

top3∆ MATα stringent chr13 641702 ATTTTT ATTTT TP

csm3∆ MATα permissive chr07 180858 A C FP

esc2∆ MATα permissive chr11 392009 T G FP

mgs1∆ MATα permissive chr07 28243 C G FP

rad52∆ MATα permissive chr11 176265 G C FP

top3∆ MATα permissive chr13 642939 C A Ambiguous SNV

esc2∆ MATα below permissive chr15 554479 G T FP

mag1∆ MATα below permissive chr08 333427 G A FP

mag1∆ MATα below permissive chr14 48787 CGG CGGG FP

mgs1∆ MATα below permissive chr07 181125 G A FP

Table 4. List of SNVs and micro-indels tested using Sanger sequencing

12 SNVs and 1 micro-deletion that passed the stringent cut-off were tested by Sanger

sequencing and all of them were confirmed as true positives (TP). 5 SNVs that only passed the

permissive cut-off were tested. Only one of them showed ambiguity in the chromatographic data,

suggesting that two types of cells having different genotypes are mixed in the sample. The other

four SNVs were confirmed as false positives (FP) due to homo-polymer error. Another 3 SNVs

and one micro-insertion that were below the permissive threshold were tested and confirmed as

FP.

3.5 Comparison of observed SNVs distribution with that of simulated

distribution

To explore whether the SNVs identified in the deletion strains were neutral or under selection, the distribution of the SNVs in the genomes was compared with that of randomly distributed SNVs. In addition to analyzing all 235 SNVs identified in the strains, 16 genome integrity genes were categorized into 6 groups based on gene function to investigate whether gene function affects selection bias (Table 1). Only SNVs that passed the stringent threshold were considered because the SNVs that only passed the permissive threshold might have a high false positive rate. The total numbers of SNVs in the groups are 15, 19, 67, 67, 61, and 37, respectively (Table 1).

First, the ratio of transitions to transversions was calculated for each group (Figure 12).

The groups involved in DNA catabolism, junction resolution, and mismatch repair had lower

transition/transversion ratios than the other groups, and the ratios of these three groups were

very close to the value of 0.62 reported by Lynch et al. (2008), which is indicated by the

dashed line. In contrast, the transition/transversion ratio for all the strains combined is

1.18, which is approximately twice the reported value. Interestingly, the observed ratios for

BER and checkpoint regulation appeared shifted from those of the other groups, although the CIs

were very broad and the difference may not be statistically significant after correction for

multiple hypothesis testing.
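
The ratio described above can be illustrated with a minimal Python sketch. The function names and the example SNV list are hypothetical and are not part of the analysis pipeline used in this study; the sketch only assumes that each SNV is available as a (reference base, alternate base) pair.

```python
# Minimal sketch (hypothetical helper names): transition/transversion ratio
# from a list of (reference base, alternate base) pairs.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        raise ValueError("ref and alt must differ for an SNV")
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ts_tv_ratio(snvs):
    """Return (#transitions / #transversions) for an iterable of (ref, alt)."""
    snvs = list(snvs)
    transitions = sum(1 for ref, alt in snvs if is_transition(ref, alt))
    transversions = len(snvs) - transitions
    if transversions == 0:
        return float("inf")  # ratio is undefined without transversions
    return transitions / transversions


# Illustrative input only: two transitions (G>A, C>T), two transversions.
example_snvs = [("G", "A"), ("C", "T"), ("G", "T"), ("A", "C")]
print(ts_tv_ratio(example_snvs))
```

Because each base has one possible transition and two possible transversions, substitutions drawn uniformly at random would give a ratio of 0.5, which is a useful reference point when reading the group ratios.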

Next, the fraction of SNVs detected in CDS regions was calculated for each category

and for the simulated data (Figure 13 and Figure 14). Interestingly, the observed fraction of

SNVs in coding regions was significantly lower than that observed in simulations (P = 0.014).

For the groups divided by gene function, the observed fractions were below the 95% CI of the

simulated data in five of the six groups (all except the RFP group). However, the CIs for the

observed data were broad and the small number of SNVs limited the statistical power, so none of

the differences between the simulated and observed data for the individual groups was

significant after the Bonferroni correction (Figure 13 and Figure 15). These results indicate

that the distribution of mutations might have been biased by selective pressure. This may be

because non-synonymous mutations are more likely to have deleterious effects on fitness, so

that cells harboring such mutations are removed from the population. To further investigate

this hypothesis, the fraction of deleterious mutations based on BLOSUM80 scores was calculated

(Figure 16). Overall, the fraction of deleterious mutations was slightly lower than that of the

simulated data, although the small number of SNVs limited the statistical power to detect a

significant difference. PolyPhen-2 and Provean were not used for the simulation because of the

computational load. These results suggest that selection biases might have affected the

distribution of SNVs in the genome integrity gene deletion strains at the level of

non-synonymous mutations.
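
The comparison with simulated data can be sketched as follows. This is a simplified illustration under stated assumptions, not the simulation code used in this study: the function names, the toy genome length and the toy CDS intervals are hypothetical, positions are drawn uniformly, and the sketch does not preserve the number of each mutation type as the actual simulation did.

```python
# Simplified sketch (hypothetical names and toy coordinates): empirical
# p-value for the fraction of SNVs in CDS versus uniformly random placement.
import random
from bisect import bisect_right


def in_cds(pos, starts, ends):
    """True if 0-based pos lies in a half-open [start, end) CDS interval.

    starts and ends must describe sorted, non-overlapping intervals.
    """
    i = bisect_right(starts, pos) - 1
    return i >= 0 and pos < ends[i]


def simulate_cds_fractions(n_snvs, genome_length, cds_intervals,
                           n_sim=1000, seed=0):
    """Place n_snvs uniformly at random n_sim times; return CDS fractions."""
    rng = random.Random(seed)
    intervals = sorted(cds_intervals)
    starts = [s for s, _ in intervals]
    ends = [e for _, e in intervals]
    fractions = []
    for _ in range(n_sim):
        hits = sum(in_cds(rng.randrange(genome_length), starts, ends)
                   for _ in range(n_snvs))
        fractions.append(hits / n_snvs)
    return fractions


def empirical_p_lower(observed_fraction, simulated_fractions):
    """Fraction of simulations at or below the observed value (one-sided)."""
    n_low = sum(f <= observed_fraction for f in simulated_fractions)
    return n_low / len(simulated_fractions)


# Toy example: a 1,000 bp "genome" with two CDS intervals (assumed values).
toy_cds = [(0, 400), (500, 900)]
sims = simulate_cds_fractions(n_snvs=235, genome_length=1000,
                              cds_intervals=toy_cds)
print(empirical_p_lower(0.65, sims))
```

With real data, the CDS intervals would come from the genome annotation, and the observed fraction (0.65 for all strains in Figure 13) would be compared against the simulated fractions in the same way.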


Figure 12. Ratio of transitions to transversions in the seven groups

The transition to transversion ratios for the seven groups were calculated. The error bars show

the 95% CI for each ratio. The dotted line indicates the reported transition to transversion

ratio (0.62). The ratios for the genes in the DNA Catabolism, Junction Resolution, and Mismatch

Repair groups are slightly lower than the reported value. The ratios for BER and Checkpoint

Regulation are approximately 2.5, although the CIs are very broad. In total, the ratio is 1.18,

which is approximately twice the reported value.


Figure 13. Fractions of SNVs and non-synonymous mutations in CDS are significantly lower

than in the simulated data

The fractions of SNVs and of non-synonymous mutations in CDS were plotted for the observed and

simulated data (n=1000). (A) The fraction of SNVs in CDS that were detected in the

48 strains was 0.65 (blue vertical line), significantly lower than the fraction observed in

the simulated data (P = 0.014). (B) The observed fraction of non-synonymous mutations in CDS

was 0.69, which was significantly lower than in the simulated data (P = 0.018).


Figure 14. Fractions of SNVs in CDS in seven groups

The fraction of SNVs in CDS was calculated for the seven groups. The blue points and error

bars show the observed data, while the red points and error bars show the mean of 1000

simulations. The error bars show the 95% CI. The simulated data were generated by randomly

distributing the mutations that were observed. Thus, the numbers of each type of mutation are

the same in the observed and simulated data. The p-value of the group containing all the SNVs

(All) was 0.014, which is indicated by *. Thus, the fraction of SNVs in CDS was significantly

lower than that of the simulated data. No statistical significance was observed for the six

groups categorized by gene function after the Bonferroni correction.


Figure 15. Fraction of non-synonymous mutations in CDS

The fraction of non-synonymous mutations in CDS was calculated for the seven groups. The

blue error bars for the observed data were calculated from the proportion of non-synonymous

mutations, while the red error bars for the simulated data show the 95% CI of the mean of 1000

simulations. The fraction of non-synonymous mutations in CDS for the 48 strains was 0.69. This

is significantly lower than that of the simulated data (P = 0.018), which is indicated by *.

No statistical significance was observed for the six groups categorized by gene function after

the Bonferroni correction.


Figure 16. Fraction of deleterious SNVs in CDS

The fraction of deleterious mutations (BLOSUM80 score ≤ −1) in CDS was calculated for the

seven groups. The blue points and error bars show the observed data, while the red points and

error bars show the simulated data. The error bars show the 95% CI. The CI for the observed

data was broad because of the small number of SNVs. Thus, it is not possible to conclude that

there is selection bias in the current dataset.


3.6 The SNV-deletion strains network illuminated sgs1 mutations as

promising suppressor candidates

To find promising suppressor candidates, relationships between the deletion strains and

SNVs/micro-indels were visualized using Cytoscape (Figure 17) (Smoot et al., 2011). The

deleterious SNVs predicted by BLOSUM80, PolyPhen-2 and Provean, nonsense mutations and

frameshift mutations were visualized in the network. I also integrated genetic and physical

interaction data from the BioGRID database and GO slim annotations into the network.

Mutations in sgs1 within the top3∆ and rmi1∆ strains appear to be promising suppressor

candidates because the corresponding suppressor genetic interactions have been reported

(Chang et al., 2005; Gangloff et al., 1994). Although the gene relationships were known, the

specific mutations identified here had not been reported as suppressors. The pipeline detected

a 44 bp deletion (642434 – 642477 in ChrXIII) in sgs1 in the rmi1∆ MATα strain, which causes

a frameshift and thus a truncation. A nonsense mutation (G to T at position 642118) was also

detected in sgs1 in the rmi1∆ MATa strain. Similarly, a deletion of one of the five Ts at

positions 641703 – 641707, which truncates the Sgs1 protein through a premature stop

codon, and a non-synonymous C to A mutation at position 642939, which causes a deleterious

amino acid substitution, were observed in the top3∆ MATa and MATα strains, respectively.

These mutations were all confirmed by PCR or Sanger sequencing, although the mutation in the

top3∆ MATa strain showed ambiguity. Because the rmi1∆ and top3∆ strains of both mating types

each carry a unique mutation in sgs1, this indicates that sgs1 mutation is a highly effective

suppressor that increases fitness in the rmi1∆ and top3∆ strains. Deletion of RAD51, RAD52,

RAD54 or RAD55 in the top3∆ strain is also known to increase fitness, but the doubling

time of sgs1∆ top3∆ is the shortest among the double deletion strains (Shor et al., 2002). The

high frequency of sgs1 mutations in these strains has also been reported previously,

supporting the results of this study (Chang et al., 2005; Gangloff et al., 1994).
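
As a small worked check of why the 44 bp deletion shifts the reading frame, the arithmetic can be written out in Python. The helper names are hypothetical, and the coordinates are assumed to be inclusive, which is consistent with the stated length of 44 bp.

```python
# Worked check (hypothetical helper names; coordinates assumed inclusive).
def deletion_length(start, end):
    """Length in bp of a deletion spanning start..end inclusive."""
    return end - start + 1


def causes_frameshift(indel_length):
    """An indel within a CDS shifts the frame unless its length is a multiple of 3."""
    return indel_length % 3 != 0


length = deletion_length(642434, 642477)
print(length, causes_frameshift(length))
```

The same rule applies to the single T deletion in the run of five Ts: a length of 1 is not a multiple of 3, so the downstream codons are read out of frame until a premature stop codon is reached.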

We identified 18 additional mutations of interest as potential suppressors based on genetic

interactions and biological process (Table 5). For example, the dia2∆

MATα strain has a point mutation in MEC1, which encodes an essential protein kinase. It has

been shown that triple deletion of DIA2, MEC1 and SML1 causes synthetic lethality in yeast

(Blake et al., 2006). As another example, a point mutation in mus81 in the rad52∆ MATα strain

was identified. The two genes have a negative genetic interaction (Pan et al., 2006).

Interestingly, deletion of RAD52 rescues the lethality of a mus81/sgs1 double mutant (Ii et

al., 2011). The presence of these two candidate suppressor mutations was confirmed using

Sanger sequencing.


Figure 17. A network describing the relationship between the deletion strains and

SNV/micro-indels

This network shows the relationship between the deletion strains and the genes that have

SNVs/micro-indels. The light-blue circles indicate the deletion strains; the MATa and MATα

strains are placed side by side. The green squares indicate genes that have SNVs/micro-indels.

The orange diamonds indicate essential genes that have SNVs/micro-indels. The edge width

corresponds to the number of methods (BLOSUM80, PolyPhen-2 and Provean) that predict the

mutation to be deleterious. Therefore, the widths are one, two or three. Frameshift mutations

and nonsense mutations have a width of three. There are two cases where different deletion

strains have point mutations in the same gene. Such genes are placed in the middle of the

figure. SGS1 acquired point mutations in both the top3∆ MATa and MATα strains and also in the

rmi1∆ MATa strain.
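
The edge-width rule in this caption can be sketched as a small function. The function name, argument names and mutation-type labels are hypothetical and are not taken from the Cytoscape session used in this study; the sketch only encodes the rule as stated.

```python
# Sketch of the edge-width rule (hypothetical names and labels).
def edge_width(mutation_type, blosum80=False, polyphen2=False, provean=False):
    """Return the edge width for one strain-gene edge.

    Frameshift and nonsense mutations always get the maximum width of 3.
    Otherwise the width is the number of predictors calling the mutation
    deleterious; a result of 0 means the edge would not be drawn.
    """
    if mutation_type in {"frameshift", "nonsense"}:
        return 3
    return int(blosum80) + int(polyphen2) + int(provean)


print(edge_width("missense", blosum80=True, provean=True))
print(edge_width("nonsense"))
```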


3.7 MoBY-ORF 2.0 plasmid transformation for validation of the

candidate suppressors

MoBY-ORF 2.0 is a high-copy plasmid library in which each plasmid carries a single

yeast ORF with its upstream and downstream sequences (Magtanong et al., 2011). To test

whether specific candidate mutations were in fact suppressors, we transformed MoBY-ORF 2.0

plasmids, which presumably contain the wild type allele, into the strains carrying the

candidate suppressor mutations. If the growth rate of a transformed strain were reduced, we

would conclude that the candidate mutation is a recessive suppressor. As a known example, the

growth defects of tps1∆ cells can be suppressed by a loss of function mutation in the hxk2

gene (Hohmann et al., 1993). Transformation of the tps1∆ strain with a plasmid carrying wild

type HXK2 causes lethality, while transformation of the wild type with the same plasmid does

not affect the growth rate. I used this pair as the positive control for this experiment. I

performed a spot assay to see whether the transformants showed a difference in growth rate

when grown on YPD and on YPD + methylmethanesulfonate (MMS) (0.02 and 0.0025%). I was not able

to observe any obvious difference in growth rate between the transformants and controls

(Table 5). One possible explanation for this result is that the mutations tested were not

suppressor mutations or were very weak suppressors. It is also possible that the MoBY-ORF 2.0

plasmids that I used did not contain the wild type alleles. Another, biological explanation is

that the suppressor mutations were dominant, so that over-expression of the wild type allele,

which would be recessive in this case, did not affect the fitness of the strains.


Index | Strain | Candidate gene | Candidate ORF | Threshold | Sanger confirmation | MoBY-ORF 2.0 | Tetrad analysis | Reason to choose as candidate suppressor

1 | csm3 MATα | SRL3 | YKR091W | Stringent | Y | Y | Y | Multiple overlapping genes genetically/physically interact with both SRL3 and CSM3

2 | dia2 MATα | SIR2 | YDL042C | Stringent | Y | Y | Y | Multiple overlapping genes genetically/physically interact with both DIA2 and SIR2

3 | dia2 MATα | MEC1 | YBR136W | Stringent | Y | N/A | Y | dia2 mec1 sml1 triple mutant is inviable, MEC1 is an essential gene

4 | mlh2 MATa | PIF1 | YML061C | Stringent | Y | Y | Y | NAB2 physically interacts with both MLH2 and PIF1

5 | msh2 MATα | DYN1 | YKR054C | Stringent | Y | N/A | LSR | Multiple overlapping genes genetically/physically interact with both MSH2 and DYN1: both involved in chromosome movement

6 | rad52 MATa | MUS81 | YDR386W | Stringent | Y | Y | Y | rad52∆mus81∆ - synthetic growth defect

7 | rad52 MATα | BRR2 | YER172C | Stringent | Y | Y | Y | Multiple overlapping genes genetically/physically interact with both RAD52 and BRR2

8 | shu1 MATa | HCS1 | YKL017C | Stringent | Y | N/A | Y | SHU1 is involved in post-replication repair: HCS1 is a DNA helicase

9 | tdp1 MATα | ELG1 | YOR144C | Stringent | Y | Y | Y | SMT3 physically interacts with both ELG1 and TDP1

10 | top3 MATα | BEM2 | YER155C | Stringent | Y | N/A | Y | Multiple overlapping genes genetically/physically interact with both TOP3 and BEM2

11 | asf1 MATα | IRA1 | YBR140C | Stringent | Not performed | Y | LSR | IRA1 is an essential gene and truncated

12 | dia2 MATα | CST6 | YIL036W | Stringent | Not performed | Y | Y | rad53∆dia2∆ - synthetic lethality (by RSA): RAD53 physically interacts with CST6

13 | dia2 MATα | SYC1 | YOR179C | Stringent | Not performed | Y | Y | Multiple overlapping genes genetically interact with DIA2 and SYC1

14 | mgs1 MATα | VPS34 | YLR240W | Stringent | Not performed | Y | Y | Multiple overlapping genes genetically interact with MGS1 and VPS34

15 | msh4 MATa | ATG19 | YOL082W | Stringent | Not performed | Y | Y | NAB2 physically interacts with both MSH4 and ATG19

16 | ntg1 MATα | GPB1 | YOR371C | Stringent | Not performed | Y | Y | GPB1 is truncated

17 | rad1 MATα | RAS2 | YNL098C | Stringent | Not performed | Y | Y | Multiple overlapping genes genetically/physically interact with both DIA2 and RAS2

18 | xrs2 MATa | GET3 | YDL100C | Stringent | Not performed | Y | Y | XRS2 has a negative genetic interaction with GET3


Table 5. A list of candidate suppressor mutations

The candidate suppressors were chosen based on genetic/physical interactions, biological

processes, BLOSUM80 score and essentiality. MEC1, DYN1, HCS1, and BEM2 are not in the

MoBY-ORF 2.0 library and are indicated as N/A. In the tetrad analysis, the msh2∆

MATα and asf1∆ MATα strains showed a low sporulation rate (LSR).


3.8 Tetrad analysis validated sgs1 suppressor mutations

I performed tetrad analysis for the deletion strains that have suppressor candidates (Table

5). The strains were backcrossed to the parental strain and sporulated. I dissected 10 tetrads

for each strain, except for strains that showed low sporulation efficiency (Table 6). Spores

were then incubated on YPD + G418 to select those carrying the KanMX deletion marker.

When a suppressor mutation is present, the possible outcomes are two large colonies (parental

ditype), two small colonies (non-parental ditype), or one large and one small colony

(tetratype). Only the top3∆ and rmi1∆ MATa and MATα strains showed all three combinations

(Figure 18). To validate the sgs1 candidate suppressor mutations, I showed that colony size

corresponded to the presence of the candidate suppressor mutation using colony PCR and

Sanger sequencing. I tested 10 colonies (one colony picked from each tetrad) for each strain. For

the rmi1∆ MATα strain, colony PCR showed that the large colonies have the ~40 bp deletion

identified by high-throughput sequencing while the small colonies do not. For the top3∆ MATa

and the rmi1∆ MATa strains, Sanger sequencing of the PCR products showed that all the large

colonies have the candidate suppressor mutations while all the small colonies do not. Under

the null hypothesis that each colony carries the mutation at random with probability 0.5, the

p-values for the candidate suppressors from the binomial distribution, adjusted by the

Bonferroni correction, are below 0.008 (Table 6). For the strains that did not show an obvious

growth difference between colonies, I plated the colonies on YPD + MMS (0.0025% and 0.02%) and

SC or performed a spot assay. No obvious differences in growth rate were observed. Using

tetrad analysis and Sanger sequencing, three sgs1 candidate suppressor mutations were

validated as real suppressors.
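
The expected colony patterns on G418 can be illustrated with a short enumeration. This sketch assumes a backcross of a deletion strain carrying an unlinked recessive suppressor to a parental strain with neither mutation, which matches the labelling in the caption of Figure 18; the function and variable names are hypothetical.

```python
# Sketch (hypothetical names): colony sizes expected on YPD + G418 for each
# tetrad type. Each spore is (has_deletion, has_suppressor).
def g418_colonies(tetrad):
    """Return the sorted sizes of colonies that grow on G418.

    Only spores carrying the KanMX-marked deletion grow; those that also
    carry the suppressor form large colonies, the others small colonies.
    """
    return sorted("large" if suppressor else "small"
                  for deletion, suppressor in tetrad if deletion)


# Parental ditype: spores keep the parental combinations.
PARENTAL_DITYPE = [(True, True), (True, True), (False, False), (False, False)]
# Non-parental ditype: deletion and suppressor segregate to different spores.
NON_PARENTAL_DITYPE = [(True, False), (True, False), (False, True), (False, True)]
# Tetratype: all four combinations occur once.
TETRATYPE = [(True, True), (True, False), (False, True), (False, False)]

for name, tetrad in [("parental ditype", PARENTAL_DITYPE),
                     ("non-parental ditype", NON_PARENTAL_DITYPE),
                     ("tetratype", TETRATYPE)]:
    print(name, g418_colonies(tetrad))
```

Observing all three patterns for a strain is therefore what is expected when a single unlinked suppressor segregates independently of the deletion marker.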


Figure 18. Validation of the sgs1 suppressor mutations in top3∆ and rmi1∆ strains

Diploids heterozygous for top3 and sgs1, or for rmi1 and sgs1, were dissected for both mating

types. The spores were incubated for 3 days on YPD + G418 to select the spores that have the

KanMX deletion marker. Two large colonies (parental ditype), two small colonies (non-parental

ditype), and one large and one small colony (tetratype) were observed. One of the two colonies

from each tetrad was picked to validate the candidate suppressor mutations by PCR and Sanger

sequencing. A white rectangle indicates that Sanger sequencing or PCR detected the

suppressor mutation, while a red circle denotes that the colony has the wild type DNA

sequence.


Strains | Suppressor gene | # Sanger successes (# tested colonies) | # Observed successes | P-val | # candidates | P-adj (Bonferroni)

top3∆ MATa | SGS1 | 10 | 4 | Not Significant | 4 | Not Significant

top3∆ MATα | SGS1 | 9(10) | 9 | 0.00195 | 4 | 0.00781

rmi1∆ MATa | SGS1 | 10(10) | 10 | 0.00098 | 4 | 0.00391

rmi1∆ MATα | SGS1 | 10(10) | 10 | 0.00098 | 4 | 0.00391

Table 6. The results of validations for the sgs1 suppressor mutations using Sanger sequencing in the top3∆ and rmi1∆ strains

Ten samples for each strain were Sanger sequenced and p-values for the four suppressor candidates were calculated based on the

observed successes. The number of observed successes is defined as the sum of normal growing colonies that have the sgs1

suppressor and the number of slow growing colonies that do not have the sgs1 suppressor. As there are four candidate suppressors,

the p-values are adjusted using Dunn-Sidak and Bonferroni correction. The p-values for the suppressors identified in the top3∆

MATα, rmi1∆ MATa, and rmi1∆ MATα strains showed statistical significance.
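
The p-values in Table 6 can be reproduced with a short calculation. This is a sketch under stated assumptions rather than the original analysis script: it uses a one-sided binomial tail with a success probability of 0.5, and for the top3∆ MATα strain it assumes that the number of trials is the 9 successfully sequenced colonies, which reproduces the tabulated value. The function names are hypothetical.

```python
# Sketch (hypothetical names): one-sided binomial p-values and
# multiple-testing corrections for the colony genotyping results.
from math import comb


def binomial_upper_tail(successes, trials, p=0.5):
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(comb(trials, k) * p**k * (1 - p) ** (trials - k)
               for k in range(successes, trials + 1))


def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_tests)


def dunn_sidak(p_value, n_tests):
    """Dunn-Sidak-adjusted p-value."""
    return 1.0 - (1.0 - p_value) ** n_tests


N_CANDIDATES = 4
for strain, successes, trials in [("top3∆ MATα", 9, 9),
                                  ("rmi1∆ MATa", 10, 10),
                                  ("rmi1∆ MATα", 10, 10)]:
    p = binomial_upper_tail(successes, trials)
    print(strain, round(p, 5), round(bonferroni(p, N_CANDIDATES), 5),
          round(dunn_sidak(p, N_CANDIDATES), 5))
```

For the top3∆ MATa strain, 4 observed successes out of 10 give a tail probability far above any conventional threshold, consistent with the "Not Significant" entry.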


4 Discussion

4.1 Mutation spectrum in the genome integrity deletion strains

In this study, 48 genome integrity gene deletion strains were sequenced and analyzed.

The analysis pipeline detected various types of mutations in the strains, including SNVs,

micro-indels, deletions, and duplications. According to the Sanger sequencing assessment of

point mutations, the SNVs that passed the stringent threshold are likely to be reliable

because all of the SNVs tested by Sanger sequencing were confirmed as true positives. In

contrast, the SNVs that passed only the permissive threshold may often be false positives, as

Sanger sequencing showed that 4 out of 5 such SNVs were not present. This is because the

sequencing data contain some homopolymer miscalling errors in GC-rich regions. However, the

permissive threshold was able to capture an ambiguous SNV in the top3∆ MATα strain, which was

detected by both high-throughput and Sanger sequencing. One potential explanation for the

ambiguity is that the strain was mixed with other samples, but this is unlikely because no

reads covered the deleted top3 locus. If the sample had been contaminated with other samples,

the deleted query gene would have been covered by reads derived from those samples. Another

potential source of error is that both sequencing methods mis-called the base. A biological

explanation is that there may be two populations within a single deletion strain, only one of

which harbors the SNV. The ambiguous SNV was found in sgs1 in the top3∆ MATα strain, and the

mutated base caused a deleterious amino acid substitution. It has been reported that a loss of

function mutation in sgs1 can suppress the slow growth of the top3∆ strain; therefore, it is

possible that a new population with the sgs1 mutation was emerging and was ‘caught in the act’

of taking over the slow growing population in the top3∆ MATα strain.


The msh2∆ MATa strain contains many SNVs, and the msh2∆ MATa and MATα strains

had an elevated frequency of micro-insertions and especially micro-deletions. Msh2

forms a mismatch repair protein heterodimer and plays a key role in repairing mismatched DNA

bases in yeast (Earley and Crouse, 1998; Pochart et al., 1997). Therefore, the lack of Msh2

protein can cause a high mutation rate in the genome, as the sequencing data suggest. However,

it is not clear why the msh2∆ MATa and MATα strains differ in their numbers of SNVs and

micro-indels. Possible reasons are that they were subjected to different laboratory and

experimental conditions (e.g. temperature, media) or that they have undergone different

numbers of generations. Also, it is possible that one or more of the mutated genes in the

msh2∆ MATα strain play an important role in DNA repair mechanism(s), resulting in an

increased spontaneous mutation rate in the strain. One candidate gene is SWI1, which regulates

transcription by remodeling chromatin. Another potential explanation is that suppressor

mutations have occurred in the msh2∆ MATa strain, masking or alleviating the effects of the

gene deletion.

The sequencing data analysis provides strong indications of sgs1 suppressors in the

top3∆ and rmi1∆ strains. Three of the four sgs1 suppressors were validated using tetrad

analysis and Sanger sequencing. Although the suppressor interactions were known, the mutations

identified in this study have not been reported. Therefore, this study showed that high-

throughput sequencing technologies can identify suppressor mutations with proper downstream

analysis. However, when my analyses were broadened to look for novel suppressor interactions,

none of the 18 mutations that were chosen as candidate suppressor mutations was confirmed as a

true suppressor. There are multiple possible explanations for these results. First, it is

possible that the mutations selected here are simply unselected (presumably neutral)

mutations. As deleting one gene could inactivate a protein complex or downstream pathway, some

of the genes that are related to the deleted gene may have been released from purifying

selection by removal of the deleted gene. As a result, mutations in these genes may be neutral

in the strain although the same mutations are deleterious in another strain. It is also

conceivable that the selected mutations are potentiating mutations, which on their own do not

increase fitness but are required to increase fitness in combination with another mutation.

Another possibility is that the mutations are indeed suppressors but their effect on growth

rate is too subtle for us to detect. Alternatively, the mutations could suppress defects in

biological processes other than growth rate. Survival at low temperature over long periods may

be one example, because the deletion strains were stored at 4 °C in a cold room at various

times.

4.2 Potential molecular mechanisms of the validated sgs1 suppressor

mutations

The sequencing data show that the top3∆ and rmi1∆ strains of both mating types had

mutations in SGS1. The results of tetrad analysis and PCR confirmation demonstrate that the

SGS1 mutations are true suppressors, except in the top3∆ MATa strain. SGS1 (slow growth

suppressor) encodes a 3´ – 5´ DNA helicase and was discovered as a suppressor of the growth

defects of top3∆ and rmi1∆ mutants (Chang et al., 2005; Gangloff et al., 1994). Top3 is a type

IA topoisomerase that unlinks hemi-catenane structures formed as a result of converging double

Holliday junctions (dHJ) (Wang, 2002). Rmi1 stimulates the activity of the Top3-Sgs1 complex.

Sgs1, Top3, and Rmi1 are likely to form a protein complex and are involved in the dHJ

resolution pathway. In this pathway, it has been shown in vitro that Sgs1 catalyzes HJ

migration and Top3 unlinks the converged dHJ. In addition, it was demonstrated in vivo that

dHJs were not processed without Sgs1 in mitosis (Bzymek et al., 2010). Also,

Rmi1 is required to disentangle the converged dHJ in vitro when the concentrations of Top3 and

Sgs1 are low (Cejka et al., 2010). Based on the functions of these proteins, in the top3∆

strain Sgs1 may catalyze HJ migration but cannot fully unlink the dHJs. The converged

HJs may be toxic intermediates that affect chromosome segregation. In the rmi1∆ strain, the

toxic converged dHJs may accumulate in the same manner as in the top3∆ strain because the

Top3-Sgs1 complex may not fully dissolve the converged HJs without Rmi1.

Therefore, the model of sgs1 suppression is that an SGS1 loss of function mutation in the

top3∆ and rmi1∆ strains prevents the convergence of double Holliday junctions, thereby

preventing the accumulation of hemi-catenane structures. The dHJs can thus be processed by

alternative four-way junction resolution proteins such as Mus81-Mms4 or Yen1 (Cejka et al.,

2010). The mutations that I validated were premature stop codons and a medium-sized deletion

that also causes truncation. Thus, the validated suppressor mutations are consistent with the

model.

4.3 Have genome integrity gene deletion strains evolved?

We have described the mutation spectrum of various genome integrity gene deletion

strains and identified suppressor mutations in several strains. The genome integrity gene

deletion strains were expected to accumulate many mutations and structural variations owing to

the loss of genome integrity. However, the number of SNVs identified with high confidence is

only ~5 per strain, although the msh2∆ MATα strain is a notable exception.

Consequently, only a few deleterious SNVs per strain were identified using the BLOSUM80

matrix, PolyPhen-2, and Provean. Moreover, most of the predicted deleterious mutations are

likely to be neutral at the phenotypic level, because no obvious growth difference was

observed in most cases in the tetrad analysis or in the MoBY-ORF 2.0 transformations. This is

probably because cells carrying mutations that are truly deleterious at the phenotypic level

are readily eliminated from the population.


Furthermore, the number of micro-indels identified per strain was below 1, even

when the low-confidence calls are included, except in the msh2∆ strains. Interestingly, only

two micro-deletions that passed the stringent threshold were identified in the top3∆ MATα

strain, and one of the two was validated as a suppressor mutation. For small to large

structural variations such as inversions and insertions, Pindel detected just 6 unique

deletions and 7 unique insertions among the strains, and only one deletion, identified in the

rmi1∆ MATα strain, directly affected a CDS (SGS1). This mutation was validated as a

suppressor. Pindel and read-depth analyses indicate a large inversion and deletion event

involving the SRD1 gene on chromosome III. However, this event was observed in all the strains

and is therefore not of interest as a suppressor candidate. Regarding CNVs, many of the

strains maintained a normal copy number, although several chromosome duplications or telomere

aberrations were identified. Thus, our sequencing data analysis reveals that the genome

integrity gene deletion strains did not harbor many mutations and that the mutation rate was

generally similar across all strains.

Interestingly, DNA repair and replication related genes are known to have a significantly

lower number of negative genetic interactions. This suggests that these genes are more

buffered in the gene network or need specific or different conditions to reveal their genetic

interactions (Costanzo et al., 2010). When the buffering effect in the gene network is strong,

the deleterious effects of a gene deletion would be less severe. Our findings show that the

genome integrity of the deletion strains is relatively well maintained and also imply the

robustness of the genome integrity genetic network and its buffering effects. Therefore, our

results suggest that experimental data acquired from yeast deletion strains are not generally

confounded by the presence of additional ‘unintentionally evolved’ mutations.


5 Conclusions and future directions

In conclusion, this study shows that the analysis of high-throughput sequencing data

alone allows us to narrow down the list of candidate suppressor mutations, and it identifies

novel sgs1 mutations in the top3∆ and rmi1∆ strains as true suppressors. By

contrast, the other candidate suppressor mutations that were tested did not show any

suppressive effects under the conditions used in this study. The most challenging part of this

study was that it was unknown whether a suppressor mutation was present in the starting

deletion strains and, if so, where it might lie. Although the sequencing data, combined with

genetic interaction data and GO annotations, provide indications of possible suppressor

mutations, it might not be reasonable to test all identified mutations under various

experimental conditions by brute force. I suspect that most of the candidate mutations are

neutral mutations or mild suppressors that are difficult to detect under normal experimental

conditions. However, this problem can be overcome using laboratory-evolved strains that are

known to have suppressor mutations. That is, strains of interest can be grown under a given

condition until they show a fitness increase due to suppressor mutations, and then sequenced

to identify the mutations, followed by tetrad analysis and PCR confirmation. Therefore, this

project can be extended to investigate functional relationships between genes by conducting

laboratory evolution experiments with these strains. Also, it would be interesting to identify

functional relationships among genes that have human homologues, especially disease-related

genes. This project could potentially not only help to elucidate functional relationships

between genes in yeast but also provide clues about the possible molecular mechanisms of human

disease.


5.1 Rationale

In the M.Sc. project, I showed that the method for identifying suppressor mutations in

the yeast deletion strains worked, and the suppressors were experimentally validated. Here, I

propose a laboratory evolution project using yeast deletion strains in which the deleted query

genes are related to human disease; that is, the human orthologs of the query genes are

involved in human disease. Our first targets are 54 genes involved in DNA repair (22) and

translation (32).

The main goals of the proposed project are to reveal functional relationships between the

query genes and suppressor genes and to illustrate the roles of these genes in the mechanisms

of the diseases.

We chose the target genes as follows. First, we selected the YKO, DAmP (Decreased

Abundance by mRNA Perturbation) and temperature sensitive strains that have less than 90% of

wild type fitness when cultured in YPD or minimal media (Breslow et al., 2008;

Deutschbauer et al., 2005). Then, we selected the genes that have human orthologs according to

InParanoid 7 and whose human orthologs are involved in human diseases according to the

OMIM (Online Mendelian Inheritance in Man) database (Ostlund et al., 2010). Finally, the

selected genes were categorized by GO annotation using FuncAssociate 2.0; two

categories, DNA repair and translation, had a suitable number of genes for the lab

evolution experiments. Also, these two functions are highly conserved between human and yeast,

and therefore the two categories should be suitable for studying, based on the results of

yeast lab evolution experiments, the mechanisms of the diseases associated with the human

homologs.
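
The selection steps above can be summarized as a simple filter. This is an illustrative sketch only: the record fields, the gene names (GENE_A to GENE_D) and the values are hypothetical placeholders, and in practice the ortholog, disease and category annotations would come from InParanoid 7, OMIM and FuncAssociate 2.0 rather than from hand-written records.

```python
# Illustrative filter (hypothetical fields and placeholder records).
from dataclasses import dataclass
from typing import Optional


@dataclass
class StrainRecord:
    gene: str
    relative_fitness: float        # fitness relative to wild type (1.0 = equal)
    human_ortholog: Optional[str]  # None if no ortholog was found
    disease_associated: bool       # ortholog linked to a human disease
    go_category: str


def select_targets(records, max_fitness=0.9,
                   categories=("DNA repair", "translation")):
    """Keep slow-growing strains whose gene has a disease-linked human ortholog."""
    return [r.gene for r in records
            if r.relative_fitness < max_fitness
            and r.human_ortholog is not None
            and r.disease_associated
            and r.go_category in categories]


# Placeholder records; these are not real genes or measurements.
records = [
    StrainRecord("GENE_A", 0.75, "HUMAN_A", True, "DNA repair"),
    StrainRecord("GENE_B", 0.95, "HUMAN_B", True, "translation"),
    StrainRecord("GENE_C", 0.60, None, False, "translation"),
    StrainRecord("GENE_D", 0.80, "HUMAN_D", True, "translation"),
]
print(select_targets(records))
```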

Suppressor mutations in yeast have been used to study human disease (Menne et al.,

2007). The suppressors were identified using synthetic genetic array (SGA) analysis, which

requires genome-wide mapping of double deletion strains. Consequently, the SGA method

requires extensive effort to generate double deletion strains in order to map one suppressor

locus. In contrast, high-throughput sequencing analysis can provide direct indications of

suppressors based on the occurrence of mutations. In addition, the method could reveal

multiple suppressors at once by sequencing many independently evolved strains, which will

further accelerate the understanding of the functional associations between genes.

Furthermore, it is also possible to detect potentiating mutations, which enhance the effects

of suppressors but do not themselves affect fitness, although this would require

backcrossing, tetrad analysis and re-sequencing of multiple spores. Thus, using this powerful

method, we will test whether high-throughput sequencing of lab evolved strains and the

downstream analysis can characterize the functional relationships between genes in yeast and

illuminate the biology of disease mechanisms.

5.2 Specific aims

Aim 1. To develop suppressor mutations in the deletion strains using laboratory evolution

experiments

1) Laboratory evolution experiment

Laboratory evolution experiments under various controlled conditions are useful for studying

spontaneous suppressor mutations (Blount et al., 2012; Gresham et al., 2008; Lenski and

Travisano, 1994). In this proposed project, Song Sun, a postdoctoral fellow in the Roth lab, and

I plan to conduct lab evolution experiments in a high-throughput manner to select spontaneous

suppressor mutations that arise in strains lacking a DNA repair or translation-related

gene.

Two studies highlight the feasibility of using lab evolution experiments with yeast deletion

strains to describe functional relationships between genes. First, it has been

reported that ~97% of deletion strains show significant growth defects under certain

conditions (Hillenmeyer et al., 2008). This indicates that most yeast deletion strains can

be chosen as target strains for lab evolution experiments to develop suppressor mutations,

because target strains must show measurable growth defects in order to distinguish deletion strains

with and without suppressors and to characterize the importance of suppressor genes. However, it may

be worthwhile to reconstruct the deletion strains, both to measure the growth rate of the fresh

deletion strain accurately and to minimize the number of accumulated mutations.

Second, a recent large-scale laboratory evolution experiment using yeast deletion

strains demonstrated that suppressor mutations often arise under normal laboratory

conditions without extensive culturing. According to Dr. Balazs Papp’s study (personal

communication, unpublished data), more than 100 of 200 deletion strains needed only 400

generations of lab evolution to compensate for at least 50% of their initial fitness defects.

This study indicates that the deleterious effects of deleted genes could be suppressed by only 1 or 2

mutations, given that the mutation rate is 0.33 × 10⁻⁹ per site per cell division (Lynch et al.,

2008), which is equivalent to approximately 4 mutations per genome per 1000 cell divisions. The

ease of selecting suppressor mutations further supports the feasibility of the proposed

experiment, although adding a mutagen such as ethyl methanesulfonate (EMS) to increase the

mutation rate might be valuable for shortening the duration of lab evolution experiments.
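
The arithmetic behind this estimate can be checked directly. The sketch below assumes a haploid genome size of roughly 12 Mb, a value that is not stated in the text. The per-site mutation rate and the 400 generations are taken from the paragraph above.

```python
MUTATION_RATE = 0.33e-9  # per site per cell division (Lynch et al., 2008)
GENOME_SIZE = 12.0e6     # assumed haploid genome size in bp (not from the text)


def expected_mutations(divisions, rate=MUTATION_RATE, genome_size=GENOME_SIZE):
    """Expected number of new mutations per genome after `divisions` cell divisions."""
    return rate * genome_size * divisions


per_1000 = expected_mutations(1000)
per_400 = expected_mutations(400)
print(f"{per_1000:.2f} mutations per 1000 divisions")
print(f"{per_400:.2f} mutations per 400 generations")
```

Under this assumption the expectation is about 4 mutations per 1000 divisions. Over 400 generations it is between 1 and 2, which is consistent with suppression by only 1 or 2 mutations.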

We plan to reconstruct and/or evolve at least 8 independent lineages per strain of interest

to facilitate identification of candidate suppressor mutations, because suppressors of a specific

loss-of-function mutation may often arise in the same gene or in functionally related genes. Then,

the strains will be grown under certain conditions until they show an increase in fitness, as

measured using a Tecan instrument. In addition, during this lab evolution experiment, archival

glycerol stocks of the strains will be made so that I can validate the reproducibility of the experiment.
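
One way to exploit independent lineages is to count how many lineages carry a mutation in each gene, since a gene hit repeatedly is a stronger suppressor candidate. The sketch below is illustrative only. The function name `recurrent_genes`, the `min_lineages` threshold and the lineage data are hypothetical and are not part of the pipeline described in this thesis.

```python
from collections import Counter


def recurrent_genes(lineage_mutations, min_lineages=2):
    """Return (gene, count) pairs for genes mutated in >= min_lineages lineages.

    `lineage_mutations` maps a lineage name to the genes mutated in that lineage.
    """
    counts = Counter()
    for genes in lineage_mutations.values():
        counts.update(set(genes))  # count each gene at most once per lineage
    hits = [(gene, n) for gene, n in counts.items() if n >= min_lineages]
    # Most frequently hit genes first, ties broken alphabetically
    return sorted(hits, key=lambda item: (-item[1], item[0]))


# Invented example: four lineages evolved from one hypothetical deletion strain
lineages = {
    "lineage1": ["geneX", "geneY"],
    "lineage2": ["geneX"],
    "lineage3": ["geneZ", "geneX", "geneX"],
    "lineage4": ["geneY"],
}
print(recurrent_genes(lineages))
```

Counting each gene once per lineage prevents a lineage with several mutations in the same gene from inflating that gene's apparent recurrence.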

Aim 2. Identification of suppressor mutations in the query strains

1) High-throughput sequencing and mutation calling for the query deletion strains

The lab-evolved deletion strains that are known to harbor suppressor mutations will be

sequenced using the Illumina HiSeq platform. The sequencing data will be analyzed using the

computational pipeline that I have developed. Based on this analysis, I should

be able to narrow down candidate suppressors for the strains, as shown in this study. When a strain

presents only one or two candidate suppressors, they will be validated using tetrad analysis and

Sanger sequencing, as was done in this study. However, evolved strains may carry many mutations,

which could confound the identification of suppressor mutations. In this case, it would be laborious

and not cost-effective to test each mutation by Sanger sequencing after sporulation. Thus, as

another approach, it would be reasonable to perform whole-genome sequencing of 10 or more

spores (one normal or one slow-growing colony from each tetrad) to identify suppressor

mutations, because the statistical significance of each candidate mutation can be calculated

from the colony size and the presence or absence of the mutation in a single sequencing run. This

method would also be useful when suppression requires two or more mutations, including potentiating

mutations that are hard to detect because they do not produce an obvious fitness increase on their own.
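
The association between a candidate mutation and colony size across sequenced spores could be assessed, for example, with a one-sided Fisher's exact test. The text does not specify which test would be used, so this choice, the function name `fisher_one_sided` and the counts below are assumptions made for illustration.

```python
from math import comb


def fisher_one_sided(mut_normal, mut_slow, wt_normal, wt_slow):
    """One-sided Fisher's exact test on a 2x2 table of spore counts.

    Returns the probability of observing at least `mut_normal` mutation-carrying
    spores among normal-growth colonies if mutation and colony size were independent.
    """
    n_normal = mut_normal + wt_normal
    n_slow = mut_slow + wt_slow
    n_mut = mut_normal + mut_slow
    total = n_normal + n_slow
    denom = comb(total, n_mut)
    p = 0.0
    # Sum hypergeometric probabilities over tables as or more extreme
    for k in range(mut_normal, min(n_mut, n_normal) + 1):
        p += comb(n_normal, k) * comb(n_slow, n_mut - k) / denom
    return p


# Invented counts: 10 spores, 5 normal and 5 slow-growing colonies, with the
# candidate mutation present in every normal colony and in no slow colony.
p_value = fisher_one_sided(mut_normal=5, mut_slow=0, wt_normal=0, wt_slow=5)
print(f"p = {p_value:.4f}")
```

With 10 spores and a perfectly co-segregating mutation, the one-sided p-value is 1/252, or about 0.004. When many candidate mutations are tested in the same run, a multiple-testing correction would also be needed.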

In summary, I propose to extend the M.Sc. project to laboratory-evolved deletion

strains. In the lab-evolved strains, one or more suppressor mutations should accumulate to

compensate for the fitness defects. Our genes of interest have human homologs and are

known to be involved in DNA repair and translation. Using the computational pipeline that I

have developed together with experimental validation, it may be possible to find multiple yeast

orthologs that are involved in disease through the analysis of suppressor genetic interactions.

This project could reveal the functional relationships between genes and, furthermore,

could unveil the biology underlying the diseases caused by DNA repair and

translation-related genes.

6 References

Adzhubei, I.A., Schmidt, S., Peshkin, L., Ramensky, V.E., Gerasimova, A., Bork, P.,

Kondrashov, A.S., and Sunyaev, S.R. (2010). A method and server for predicting damaging

missense mutations. Nat. Methods 7, 248-249.

Avery, L., and Wasserman, S. (1992). Ordering gene function: the interpretation of epistasis in

regulatory hierarchies. Trends in Genetics : TIG 8, 312-316.

Blake, D., Luke, B., Kanellis, P., Jorgensen, P., Goh, T., Penfold, S., Breitkreutz, B.J.,

Durocher, D., Peter, M., and Tyers, M. (2006). The F-box protein Dia2 overcomes replication

impedance to promote genome stability in Saccharomyces cerevisiae. Genetics 174, 1709-1727.

Blount, Z.D., Barrick, J.E., Davidson, C.J., and Lenski, R.E. (2012). Genomic analysis of a key

innovation in an experimental Escherichia coli population. Nature 489, 513-518.

Breslow, D.K., Cameron, D.M., Collins, S.R., Schuldiner, M., Stewart-Ornstein, J., Newman,

H.W., Braun, S., Madhani, H.D., Krogan, N.J., and Weissman, J.S. (2008). A comprehensive

strategy enabling high-resolution functional analysis of the yeast genome. Nat. Methods 5, 711-

718.

Broach, J.R. (1991). RAS genes in Saccharomyces cerevisiae: signal transduction in search of a

pathway. Trends Genet. 7, 28-33.

Bzymek, M., Thayer, N., Oh, S., Kleckner, N., and Hunter, N. (2010). Double Holliday

junctions are intermediates of DNA break repair. Nature 464, 937-941.

Cejka, P., Plank, J.L., Bachrati, C.Z., Hickson, I.D., and Kowalczykowski, S.C. (2010). Rmi1

stimulates decatenation of double Holliday junctions during dissolution by Sgs1-Top3. Nat.

Struct. Mol. Biol. 17, 1377-1382.

Chang, M., Bellaoui, M., Zhang, C., Desai, R., Morozov, P., Delgado-Cruzata, L., Rothstein, R.,

Freyer, G., Boone, C., and Brown, G. (2005). RMI1/NCE4, a suppressor of genome instability,

encodes a member of the RecQ helicase/Topo III complex. The EMBO Journal 24, 2024-2033.

Choi, Y., Sims, G.E., Murphy, S., Miller, J.R., and Chan, A.P. (2012). Predicting the functional

effect of amino acid substitutions and indels. PLoS One 7, e46688.

Costanzo, M., Baryshnikova, A., Bellay, J., Kim, Y., Spear, E.D., Sevier, C.S., Ding, H., Koh,

J.L., Toufighi, K., Mostafavi, S., et al. (2010). The genetic landscape of a cell. Science 327,

425-431.

Costanzo, M., and Boone, C. (2009). SGAM: an array-based approach for high-resolution

genetic mapping in Saccharomyces cerevisiae. Methods Mol. Biol. 548, 37-53.

Deutschbauer, A.M., Jaramillo, D.F., Proctor, M., Kumm, J., Hillenmeyer, M.E., Davis, R.W.,

Nislow, C., and Giaever, G. (2005). Mechanisms of haploinsufficiency revealed by genome-

wide profiling in yeast. Genetics 169, 1915-1925.

Earley, M.C., and Crouse, G.F. (1998). The role of mismatch repair in the prevention of base

pair mutations in Saccharomyces cerevisiae. Proc. Natl. Acad. Sci. U. S. A. 95, 15487-15491.

Gangloff, S., McDonald, J.P., Bendixen, C., Arthur, L., and Rothstein, R. (1994). The yeast type

I topoisomerase Top3 interacts with Sgs1, a DNA helicase homolog: a potential eukaryotic

reverse gyrase. Mol. Cell. Biol. 14, 8391-8398.

Gasch, A., Huang, M., and Metzner…, S. (2001). Genomic expression responses to DNA-

damaging agents and the regulatory role of the yeast ATR homolog Mec1p. Molecular Biology

of …

Gatbonton, T., Imbesi, M., Nelson, M., Akey, J.M., Ruderfer, D.M., Kruglyak, L., Simon, J.A.,

and Bedalov, A. (2006). Telomere length as a quantitative trait: genome-wide survey and

genetic mapping of telomere length-control genes in yeast. PLoS Genet. 2, e35.

Gresham, D., Desai, M.M., Tucker, C.M., Jenq, H.T., Pai, D.A., Ward, A., DeSevo, C.G.,

Botstein, D., and Dunham, M.J. (2008). The repertoire and dynamics of evolutionary

adaptations to controlled nutrient-limited environments in yeast. PLoS Genet. 4, e1000303.

Hanein, D., Volkmann, N., Goldsmith, S., Michon, A., Lehman, W., Craig, R., DeRosier, D.,

Almo, S., and Matsudaira, P. (1998). An atomic model of fimbrin binding to F-actin and its

implications for filament crosslinking and regulation. Nat. Struct. Biol. 5, 787-792.

Henikoff, S., and Henikoff, J.G. (1992). Amino acid substitution matrices from protein blocks.

Proc. Natl. Acad. Sci. U. S. A. 89, 10915-10919.

Hillenmeyer, M.E., Fung, E., Wildenhain, J., Pierce, S.E., Hoon, S., Lee, W., Proctor, M., St

Onge, R.P., Tyers, M., Koller, D., et al. (2008). The chemical genomic portrait of yeast:

uncovering a phenotype for all genes. Science 320, 362-365.

Hohmann, S., Neves, M.J., de Koning, W., Alijo, R., Ramos, J., and Thevelein, J.M. (1993).

The growth and signalling defects of the ggs1 (fdp1/byp1) deletion mutant on glucose are

suppressed by a deletion of the gene encoding hexokinase PII. Curr. Genet. 23, 281-289.

Ii, M., Ii, T., Mironova, L.I., and Brill, S.J. (2011). Epistasis analysis between homologous

recombination genes in Saccharomyces cerevisiae identifies multiple repair pathways for Sgs1,

Mus81-Mms4 and RNase H2. Mutat. Res. 714, 33-43.

Kozarewa, I., Ning, Z., Quail, M.A., Sanders, M.J., Berriman, M., and Turner, D.J. (2009).

Amplification-free Illumina sequencing-library preparation facilitates improved mapping and

assembly of (G+C)-biased genomes. Nat. Methods 6, 291-295.

Krogan, N., Cagney, G., Yu, H., Zhong, G., Guo, X., Ignatchenko, A., Li, J., Pu, S., Datta, N.,

Tikuisis, A., et al. (2006). Global landscape of protein complexes in the yeast Saccharomyces

cerevisiae. Nature 440, 637-643.

Lehner, K.R., Stone, M.M., Farber, R.A., and Petes, T.D. (2007). Ninety-six haploid yeast

strains with individual disruptions of open reading frames between YOR097C and YOR192C,

constructed for the Saccharomyces genome deletion project, have an additional mutation in the

mismatch repair gene MSH3. Genetics 177, 1951-1953.

Lenski, R.E., and Travisano, M. (1994). Dynamics of adaptation and diversification: a 10,000-

generation experiment with bacterial populations. Proc. Natl. Acad. Sci. U. S. A. 91, 6808-6814.

Li, H., and Durbin, R. (2010). Fast and accurate long-read alignment with Burrows-Wheeler

transform. Bioinformatics 26, 589-595.

Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G.,

Durbin, R., and 1000 Genome Project Data Processing Subgroup. (2009). The Sequence

Alignment/Map format and SAMtools. Bioinformatics 25, 2078-2079.

Liti, G., and Louis, E. (2003). NEJ1 prevents NHEJ-dependent telomere fusions in yeast without

telomerase. Mol. Cell 11, 1373-1378.

Lynch, M., Sung, W., Morris, K., Coffey, N., Landry, C.R., Dopman, E.B., Dickinson, W.J.,

Okamoto, K., Kulkarni, S., Hartl, D.L., and Thomas, W.K. (2008). A genome-wide view of the

spectrum of spontaneous mutations in yeast. Proc. Natl. Acad. Sci. U. S. A. 105, 9272-9277.

Magtanong, L., Ho, C.H., Barker, S.L., Jiao, W., Baryshnikova, A., Bahr, S., Smith, A.M.,

Heisler, L.E., Choy, J.S., Kuzmin, E., et al. (2011). Dosage suppression genetic interaction

networks enhance functional wiring diagrams of the cell. Nat. Biotechnol. 29, 505-511.

Menne, T., Goyenechea, B., Sánchez-Puig, N., Wong, C., Tonkin, L., Ancliff, P., Brost, R.,

Costanzo, M., Boone, C., and Warren, A. (2007). The Shwachman-Bodian-Diamond syndrome

protein mediates translational activation of ribosomes in yeast. Nat. Genet. 39, 486-495.

Mitsuzawa, H., Uno, I., Oshima, T., and Ishikawa, T. (1989). Isolation and characterization of

temperature-sensitive mutations in the RAS2 and CYR1 genes of Saccharomyces cerevisiae.

Genetics 123, 739-748.

Nielsen, R., Paul, J.S., Albrechtsen, A., and Song, Y.S. (2011). Genotype and SNP calling from

next-generation sequencing data. Nat. Rev. Genet. 12, 443-451.

Ostlund, G., Schmitt, T., Forslund, K., Kostler, T., Messina, D.N., Roopra, S., Frings, O., and

Sonnhammer, E.L. (2010). InParanoid 7: new algorithms and tools for eukaryotic orthology

analysis. Nucleic Acids Res. 38, D196-203.

Pak, T.R., and Roth, F.P. (2013). ChromoZoom: a flexible, fluid, web-based genome browser.

Bioinformatics 29, 384-386.

Pan, X., Ye, P., Yuan, D., Wang, X., Bader, J., and Boeke, J. (2006). A DNA integrity network

in the yeast Saccharomyces cerevisiae. Cell 124, 1069-1081.

Pochart, P., Woltering, D., and Hollingsworth, N.M. (1997). Conserved properties between

functionally distinct MutS homologs in yeast. J. Biol. Chem. 272, 30345-30349.

Prelich, G. (1999). Suppression mechanisms: themes from variations. Trends in Genetics : TIG

15, 261-266.

Rouse, J., and Jackson, S. (2002). Interfaces between the detection, signaling, and repair of

DNA damage. Science (New York, N. Y. ) 297, 547-551.

Sandrock, T., O'Dell, J., and Adams, A. (1997). Allele-specific suppression by formation of new

protein-protein interactions in yeast. Genetics 147, 1635-1642.

Sass, P., Field, J., Nikawa, J., Toda, T., and Wigler, M. (1986). Cloning and characterization of

the high-affinity cAMP phosphodiesterase of Saccharomyces cerevisiae. Proc. Natl. Acad. Sci.

U. S. A. 83, 9303-9307.

Sherman, F. (2002). Getting started with yeast. Methods Enzymol. 350, 3-41.

Shiloh, Y. (2003). ATM and related protein kinases: safeguarding genome integrity. Nature

Reviews. Cancer 3, 155-168.

Shor, E., Gangloff, S., Wagner, M., Weinstein, J., Price, G., and Rothstein, R. (2002). Mutations

in homologous recombination genes rescue top3 slow growth in Saccharomyces cerevisiae.

Genetics 162, 647-662.

Simpson, J.T., Wong, K., Jackman, S.D., Schein, J.E., Jones, S.J., and Birol, I. (2009). ABySS:

a parallel assembler for short read sequence data. Genome Res. 19, 1117-1123.

Smoot, M.E., Ono, K., Ruscheinski, J., Wang, P.L., and Ideker, T. (2011). Cytoscape 2.8: new

features for data integration and network visualization. Bioinformatics 27, 431-432.

Tarassov, K., Messier, V., Landry, C., Radinovic, S., Serna Molina, M., Shames, I., Malitskaya,

Y., Vogel, J., Bussey, H., and Michnick, S. (2008). An in vivo map of the yeast protein

interactome. Science (New York, N. Y. ) 320, 1465-1470.

Tong, A.H., Lesage, G., Bader, G.D., Ding, H., Xu, H., Xin, X., Young, J., Berriz, G.F., Brost,

R.L., Chang, M., et al. (2004). Global mapping of the yeast genetic interaction network. Science

303, 808-813.

Unk, I., Hajdu, I., Blastyak, A., and Haracska, L. (2010). Role of yeast Rad5 and its human

orthologs, HLTF and SHPRH in DNA damage tolerance. DNA Repair (Amst) 9, 257-267.

Uno, I., Matsumoto, K., and Ishikawa, T. (1983). Characterization of a cyclic nucleotide

phosphodiesterase-deficient mutant in yeast. The Journal of Biological Chemistry 258, 3539-

3542.

Wang, J. (2002). Cellular roles of DNA topoisomerases: a molecular perspective. Nature

Reviews. Molecular Cell Biology 3, 430-440.

Winzeler, E.A., Shoemaker, D.D., Astromoff, A., Liang, H., Anderson, K., Andre, B.,

Bangham, R., Benito, R., Boeke, J.D., Bussey, H., et al. (1999). Functional characterization of

the S. cerevisiae genome by gene deletion and parallel analysis. Science 285, 901-906.

Ye, K., Schulz, M.H., Long, Q., Apweiler, R., and Ning, Z. (2009). Pindel: a pattern growth

approach to detect break points of large deletions and medium sized insertions from paired-end

short reads. Bioinformatics 25, 2865-2871.

Yu, H., Braun, P., Yildirim, M., Lemmens, I., Venkatesan, K., Sahalie, J., Hirozane-Kishikawa,

T., Gebreab, F., Li, N., Simonis, N., et al. (2008). High-quality binary protein interaction map of

the yeast interactome network. Science (New York, N. Y. ) 322, 104-110.

Yuen, K.W., Warren, C.D., Chen, O., Kwok, T., Hieter, P., and Spencer, F.A. (2007).

Systematic genome instability screens in yeast and their potential relevance to cancer. Proc.

Natl. Acad. Sci. U. S. A. 104, 3925-3930.

Zhu, J., Zhang, B., Smith, E., Drees, B., Brem, R., Kruglyak, L., Bumgarner, R., and Schadt, E.

(2008). Integrating large-scale functional genomic data to dissect the complexity of yeast

regulatory networks. Nat. Genet. 40, 854-861.

7 Appendices

7.1 Other relevant research

7.1.1 Large-Scale Identification of extragenic suppressor mutations in yeast

Jolanda van Leeuwen, Joseph Mellor, Anastasia Baryshnikova, Takafumi Yamaguchi, Atina

Cote, Michael Costanzo, Brenda Andrews, Frederick Roth, Charles Boone

CCBR - University of Toronto, Toronto, Ontario, Canada

My contribution to this project

I analyzed ~300 candidate suppressor strains and background strains using the

computational pipeline that I developed. As a result, I detected promising suppressor

candidates, including SNVs and small to large structural variations, in more than 200 of the

strains. In addition, all the genomes that I analyzed were visualized using

ChromoZoom.

The detailed description below was written by Jolanda van Leeuwen, with minor adaptations by

me.

Abstract

Genetic mutations can adversely affect biological pathways often resulting in cellular

damage that may lead to human disease. In some instances, accumulation of secondary

mutations elsewhere in the genome may compensate for the deleterious effects of the primary

mutation. This phenomenon is referred to as genetic suppression and it provides a powerful tool

for identifying novel functional relationships between genes and their corresponding pathways.

While isolated suppressors have been identified previously, large-scale suppressor identification

and the mapping of a suppressor genetic network have never been performed.

Introduction

The Boone lab established Synthetic Genetic Array (SGA) analysis that allows for the

automated construction of high-density arrays of double mutants and the identification of

genetic interactions in yeast (Tong et al., 2004). Because SGA analysis can be applied to any

genetic element linked to a selectable marker, it can also be adapted for a variety of different

genetic screens. In particular, SGA analysis enables high-resolution genetic mapping of loci

associated with spontaneously arising suppressor mutations (Costanzo and Boone, 2009). Using

SGA mapping, we identified spontaneous extragenic mutations that suppress the fitness defects

associated with ~200 yeast deletion and conditional temperature sensitive mutants. We used

next-generation sequencing to generate whole genome sequences of these strains to pinpoint the

precise mutation within the loci outlined by SGA mapping.

Results

Whole-genome sequencing identified a single mutation within the suppressor gene

linkage group in 80% of the strains. More than 90% of these mutations are non-synonymous

single nucleotide variants, whereas the remainder (less than 10%) represent larger deletion or insertion mutations. These

experiments identify a high-confidence list of candidate mutations that suppress growth defects

associated with deletion alleles of non-essential genes.

Future directions

We are now confirming the genetic suppression interactions identified in our SGA

screens using three assays: [a] Plasmid-based complementation, where we introduce a wild-type

copy of the suppressor gene into the query strain carrying the suppressor mutation – in this

assay, we expect the wild-type allele to reduce growth of the query strain if the suppressor

mutation is recessive; [b] Genetic analysis, where we cross the query strain carrying the

suppressor mutation to a wild-type strain and perform traditional tetrad analysis to ask if the

suppression phenotype is associated with a single gene; [c] Strain reconstruction – we will use

standard allele replacement to introduce the putative suppressor mutation identified by

sequencing into a diploid strain heterozygous for the query gene deletion allele and conduct

another round of tetrad analysis to test for a suppression phenotype.

This collaborative project is well on its way and should map the first large-scale

suppressor genetic interaction network.

7.1.2 Identification of E0005-resistant mutations in yeast using next-generation sequencing

Siyang Li’s project

My contribution to this project

I assisted Siyang in the analysis of 15 strains resistant to a natural compound

called E0005 using the computational pipeline that I have developed. We identified various

lem3, pdr1 and pdr3 mutations that could confer resistance in the strains.

7.1.3 Generation of the lift-over genomes of BY4741 and BY4742 strains

Available at the Saccharomyces Genome Database

BY4741 genome - http://downloads.yeastgenome.org/sequence/strains/BY4741/

BY4742 genome - http://downloads.yeastgenome.org/sequence/strains/BY4742/

7.2 Poster Presentation

The 13th International Conference on Systems Biology, University of Toronto, Canada (Aug 20,

2012)

Systematic analysis of suppressor mutations in S. cerevisiae strains with deleted genome

integrity genes

Takafumi Yamaguchi (1,3,4), Joseph Mellor (3,4), Hon Nian Chua (3,4), Atina Cote (3,4), Anna

Karkhanina (3,4), Daniel Durocher (1,4), Frederick Roth (1,2,3,4)

1) Department of Molecular Genetics, 2) Department of Computer Science; and 3) Donnelly

Centre, University of Toronto, Toronto, Ontario M5S-3E1; and 4) Samuel Lunenfeld Research

Institute, Mt. Sinai Hospital, Toronto, Ontario M5G-1X5, Canada
