32
100 Genomes in 100 Days: The Structural V ariant Landscape of Tomato Genomes Michael Schatz February 28, 2019 AGBT @mike_schatz

100 Genomes in 100 Days: The Structural Variant Landscape ...schatz-lab.org › presentations › 2019 › 2019.02.28.AGBT... · CSHL / HHMI Joyce Van Eck Boyce Thompson Esther van

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: 100 Genomes in 100 Days: The Structural Variant Landscape ...schatz-lab.org › presentations › 2019 › 2019.02.28.AGBT... · CSHL / HHMI Joyce Van Eck Boyce Thompson Esther van

100 Genomes in 100 Days:The Structural Variant Landscape of Tomato GenomesMichael Schatz

February 28, 2019AGBT

@mike_schatz

Page 2: 100 Genomes in 100 Days: The Structural Variant Landscape ...schatz-lab.org › presentations › 2019 › 2019.02.28.AGBT... · CSHL / HHMI Joyce Van Eck Boyce Thompson Esther van

Tomato Domestication & Agriculture

Tomatoes are one of the most valuable crops in the world Worldwide annual production >175 million tons & >$85B Major ingredient in many common foods:

– Sauces, salsa, ketchup, soups, salads, etc

Tomatoes are an important plant model systemOriginally from South America, transported to Europe by early explorers in the 17th century, and then back to North America in the 18th centuryExtensive phenotypic variation: >15,000 named varieties

– Model for studying fruiting and floweringMember of important Solanaceae family – Potato, pepper, eggplant, tobacco, petunia, etc

Page 3: 100 Genomes in 100 Days: The Structural Variant Landscape ...schatz-lab.org › presentations › 2019 › 2019.02.28.AGBT... · CSHL / HHMI Joyce Van Eck Boyce Thompson Esther van

Tomato Genomics and GeneticsTomato Reference Genome published in May 2012

International consortium from 14 countries requiring years of effort and tens of millions of dollars– Sanger + 454 + fosmids + BAC-ends + genetic map + FISH

‘Heinz 1706’ cultivar (v3)– 12 chromosomes, 950 Mbp genome, diploid– 22,707 contigs, 133kbp contig N50, 80M ’Ns’– 20Mb on “chromosome 0”

Resource for thousands of studies– Candidate SNPs for many traits identified through GWAS– Candidate genes and pathways through RNAseq– Extensive investment into agricultural traits:

– ripening, flavor, fruit size, color, morphology

Page 4: 100 Genomes in 100 Days: The Structural Variant Landscape ...schatz-lab.org › presentations › 2019 › 2019.02.28.AGBT... · CSHL / HHMI Joyce Van Eck Boyce Thompson Esther van

Structural Variations Are Drivers of Quantitative Variation

Recent results highlight structural variations to play a major role in phenotypic differencesSV are any variants >50bp: insertions, deletions, inversions, duplications, translocations, etcAdds, removes, and moves exons, binding sites, and other regulatory sequencesNotoriously difficult to identify using short reads: high false positive & false negative rate

Page 5: 100 Genomes in 100 Days: The Structural Variant Landscape ...schatz-lab.org › presentations › 2019 › 2019.02.28.AGBT... · CSHL / HHMI Joyce Van Eck Boyce Thompson Esther van

Structural Variation Landscapes in Tomato Genomes and their role in Natural Variation, Domestication, and Crop Improvement

Zach LippmanCSHL / HHMI

Joyce Van EckBoyce Thompson

Esther van der KnaapUniv. of Georgia

Fritz SedlazeckBaylor

Sara GoodwinCSHL

Project overview

1. Select diverse samples

2. Long read sequencing

3. Per-sample SV identification

4. Pan-tomato SV landscape

5. Identify and validate SVs associated with agricultural and phenotypic traits

Page 6: 100 Genomes in 100 Days: The Structural Variant Landscape ...schatz-lab.org › presentations › 2019 › 2019.02.28.AGBT... · CSHL / HHMI Joyce Van Eck Boyce Thompson Esther van

~ Part 1 ~Sample Selection

Page 7: 100 Genomes in 100 Days: The Structural Variant Landscape ...schatz-lab.org › presentations › 2019 › 2019.02.28.AGBT... · CSHL / HHMI Joyce Van Eck Boyce Thompson Esther van

Tomato Population Genetics

More than 900 varieties of tomato and related species have been sequenced with short reads

Most are between 20x and 40x coverageSNP-based phylogenetic tree shows 10 major clades

Initial examination of SVs Identify variants using an aggregation of three individual methods to improve sensitivity– Lumpy, Manta & DellyConsensus calls using SURVIVOR (Jeffares et al, Nature Communications, 2017)– Retain calls supported by ≥ 2 callers– Reduces false-positives while not worsening

false-negatives Tree produced by:Jose Jimenez-Gomez

Page 8: 100 Genomes in 100 Days: The Structural Variant Landscape ...schatz-lab.org › presentations › 2019 › 2019.02.28.AGBT... · CSHL / HHMI Joyce Van Eck Boyce Thompson Esther van

Optimized Sample SelectionOur goal is to select the 100 samples that collectively capture the most diversity

Short-read based SVs will under-sample variants but still represents relative diversitySelecting 100 at random only recovers about 40% of the total known diversityOptimal strategy is NP-hard using a set-cover algorithm, we approximate using a greedy approach

Ranking samples by number of variants picks diverse samples, although tends to pick siblings (nearly duplicate samples)

SVCollector: Optimized sample selection for validating and long-read resequencing of structural variantsSedlazeck et al (2018) bioRxiv doi: https://doi.org/10.1101/342386

Page 9: 100 Genomes in 100 Days: The Structural Variant Landscape ...schatz-lab.org › presentations › 2019 › 2019.02.28.AGBT... · CSHL / HHMI Joyce Van Eck Boyce Thompson Esther van

Optimized Sample SelectionOur goal is to select the 100 samples that collectively capture the most diversity

Short-read based SVs will under-sample variants but still represents relative diversitySelecting 100 at random only recovers about 40% of the total known diversityRanking samples by number of variants picks diverse samples, although tends to pick siblings (nearly duplicate samples)Optimal strategy is NP-hard using a set-cover algorithm, we approximate using a greedy approach

SVCollector: Optimized sample selection for validating and long-read resequencing of structural variantsSedlazeck et al (2018) bioRxiv doi: https://doi.org/10.1101/342386

Fritz SedlazeckBaylor College

of Medicine

Poster #314

Page 10: 100 Genomes in 100 Days: The Structural Variant Landscape ...schatz-lab.org › presentations › 2019 › 2019.02.28.AGBT... · CSHL / HHMI Joyce Van Eck Boyce Thompson Esther van

~ Part 2 ~ Long Read Sequencing

Page 11: 100 Genomes in 100 Days: The Structural Variant Landscape ...schatz-lab.org › presentations › 2019 › 2019.02.28.AGBT... · CSHL / HHMI Joyce Van Eck Boyce Thompson Esther van

0

10

20

30

8/24

/201

79/

6/20

1711

/3/2

017

11/3

0/20

1712

/29/

2017

12/2

9/20

171/

3/20

182/

9/20

183/

7/20

183/

20/2

018

3/20

/201

83/

23/2

018

3/23

/201

83/

30/2

018

4/9/

2018

4/30

/201

85/

3/20

185/

11/2

018

5/11

/201

85/

30/2

018

6/8/

2018

6/8/

2018

6/13

/201

86/

13/2

018

6/19

/201

87/

3/20

187/

9/20

187/

13/2

018

7/20

/201

87/

23/2

018

7/25

/201

88/

6/20

188/

10/2

018

8/24

/201

89/

4/20

189/

7/20

189/

11/2

018

9/26

/201

89/

26/2

018

10/2

6/20

1811

/1/2

018

11/2

/201

811

/6/2

018

11/9

/201

812

-Nov

11/3

0/20

1811

/30/

2018

12/7

/201

812

/7/2

018

12/1

2/20

1812

/12/

2018

12/1

3/20

1812

/17/

2018

12/1

7/20

18D

ec-1

812

/20/

2018

12/2

0/20

1812

/26/

2018

1/3/

2019

1/3/

2019

1/3/

2019

1/3/

2019

1/17

/201

91/

17/2

019

1/25

/201

91/

25/2

019

2/8/

2019

2/11

/201

9

140GB

Sequencing strategyInitial proposal called for mix of short, long, and linked read sequencing

MinION and GridION became feasible spring/summer 2018

Encouraging PromethION yields on test runs mid-summer 2018 motivated switch in strategy

MinION + GridION

Nanopore Performance at CSHLSara Goodwin

MinION + GridION

Page 12: 100 Genomes in 100 Days: The Structural Variant Landscape ...schatz-lab.org › presentations › 2019 › 2019.02.28.AGBT... · CSHL / HHMI Joyce Van Eck Boyce Thompson Esther van

0

20

40

60

80

100

120

140

160

8/24

/201

79/

6/20

1711

/3/2

017

11/3

0/20

1712

/29/

2017

12/2

9/20

171/

3/20

182/

9/20

183/

7/20

183/

20/2

018

3/20

/201

83/

23/2

018

3/23

/201

83/

30/2

018

4/9/

2018

4/30

/201

85/

3/20

185/

11/2

018

5/11

/201

85/

30/2

018

6/8/

2018

6/8/

2018

6/13

/201

86/

13/2

018

6/19

/201

87/

3/20

187/

9/20

187/

13/2

018

7/20

/201

87/

23/2

018

7/25

/201

88/

6/20

188/

10/2

018

8/24

/201

89/

4/20

189/

7/20

189/

11/2

018

9/26

/201

89/

26/2

018

10/2

6/20

1811

/1/2

018

11/2

/201

811

/6/2

018

11/9

/201

812

-Nov

11/3

0/20

1811

/30/

2018

12/7

/201

812

/7/2

018

12/1

2/20

1812

/12/

2018

12/1

3/20

1812

/17/

2018

12/1

7/20

18D

ec-1

812

/20/

2018

12/2

0/20

1812

/26/

2018

1/3/

2019

1/3/

2019

1/3/

2019

1/3/

2019

1/17

/201

91/

17/2

019

1/25

/201

91/

25/2

019

2/8/

2019

2/11

/201

9

Yie

ld G

bs

MinION + GridION

140GB

Sequencing strategyInitial proposal called for mix of short, long,

and linked read sequencing

MinION and GridION became feasible

spring/summer 2018

Encouraging PromethION yields on test

runs mid-summer 2018 motivated switch

in strategy

Nanopore Performance at CSHLSara Goodwin

Page 13: 100 Genomes in 100 Days: The Structural Variant Landscape ...schatz-lab.org › presentations › 2019 › 2019.02.28.AGBT... · CSHL / HHMI Joyce Van Eck Boyce Thompson Esther van

0

20

40

60

80

100

120

140

160

8/24

/201

79/

6/20

1711

/3/2

017

11/3

0/20

1712

/29/

2017

12/2

9/20

171/

3/20

182/

9/20

183/

7/20

183/

20/2

018

3/20

/201

83/

23/2

018

3/23

/201

83/

30/2

018

4/9/

2018

4/30

/201

85/

3/20

185/

11/2

018

5/11

/201

85/

30/2

018

6/8/

2018

6/8/

2018

6/13

/201

86/

13/2

018

6/19

/201

87/

3/20

187/

9/20

187/

13/2

018

7/20

/201

87/

23/2

018

7/25

/201

88/

6/20

188/

10/2

018

8/24

/201

89/

4/20

189/

7/20

189/

11/2

018

9/26

/201

89/

26/2

018

10/2

6/20

1811

/1/2

018

11/2

/201

811

/6/2

018

11/9

/201

812

-Nov

11/3

0/20

1811

/30/

2018

12/7

/201

812

/7/2

018

12/1

2/20

1812

/12/

2018

12/1

3/20

1812

/17/

2018

12/1

7/20

18D

ec-1

812

/20/

2018

12/2

0/20

1812

/26/

2018

1/3/

2019

1/3/

2019

1/3/

2019

1/3/

2019

1/17

/201

91/

17/2

019

1/25

/201

91/

25/2

019

2/8/

2019

2/11

/201

9

Yie

ld G

bs

MinION + GridION

140GB

Sequencing strategyInitial proposal called for mix of short, long,

and linked read sequencing

MinION and GridION became feasible

spring/summer 2018

Encouraging PromethION yields on test

runs mid-summer 2018 motivated switch

in strategy

Nanopore Performance at CSHLSara Goodwin

PromethION

74GB

Page 14: 100 Genomes in 100 Days: The Structural Variant Landscape ...schatz-lab.org › presentations › 2019 › 2019.02.28.AGBT... · CSHL / HHMI Joyce Van Eck Boyce Thompson Esther van

0

20

40

60

80

100

120

140

160

8/24

/201

79/

6/20

1711

/3/2

017

11/3

0/20

1712

/29/

2017

12/2

9/20

171/

3/20

182/

9/20

183/

7/20

183/

20/2

018

3/20

/201

83/

23/2

018

3/23

/201

83/

30/2

018

4/9/

2018

4/30

/201

85/

3/20

185/

11/2

018

5/11

/201

85/

30/2

018

6/8/

2018

6/8/

2018

6/13

/201

86/

13/2

018

6/19

/201

87/

3/20

187/

9/20

187/

13/2

018

7/20

/201

87/

23/2

018

7/25

/201

88/

6/20

188/

10/2

018

8/24

/201

89/

4/20

189/

7/20

189/

11/2

018

9/26

/201

89/

26/2

018

10/2

6/20

1811

/1/2

018

11/2

/201

811

/6/2

018

11/9

/201

812

-Nov

11/3

0/20

1811

/30/

2018

12/7

/201

812

/7/2

018

12/1

2/20

1812

/12/

2018

12/1

3/20

1812

/17/

2018

12/1

7/20

18D

ec-1

812

/20/

2018

12/2

0/20

1812

/26/

2018

1/3/

2019

1/3/

2019

1/3/

2019

1/3/

2019

1/17

/201

91/

17/2

019

1/25

/201

91/

25/2

019

2/8/

2019

2/11

/201

9

Yie

ld G

bs

MinION + GridION PromethION

74GB

140GB

Nanopore Performance at CSHLSara Goodwin

Page 15: 100 Genomes in 100 Days: The Structural Variant Landscape ...schatz-lab.org › presentations › 2019 › 2019.02.28.AGBT... · CSHL / HHMI Joyce Van Eck Boyce Thompson Esther van

0

20

40

60

80

100

120

140

160

8/24

/201

79/

6/20

1711

/3/2

017

11/3

0/20

1712

/29/

2017

12/2

9/20

171/

3/20

182/

9/20

183/

7/20

183/

20/2

018

3/20

/201

83/

23/2

018

3/23

/201

83/

30/2

018

4/9/

2018

4/30

/201

85/

3/20

185/

11/2

018

5/11

/201

85/

30/2

018

6/8/

2018

6/8/

2018

6/13

/201

86/

13/2

018

6/19

/201

87/

3/20

187/

9/20

187/

13/2

018

7/20

/201

87/

23/2

018

7/25

/201

88/

6/20

188/

10/2

018

8/24

/201

89/

4/20

189/

7/20

189/

11/2

018

9/26

/201

89/

26/2

018

10/2

6/20

1811

/1/2

018

11/2

/201

811

/6/2

018

11/9

/201

812

-Nov

11/3

0/20

1811

/30/

2018

12/7

/201

812

/7/2

018

12/1

2/20

1812

/12/

2018

12/1

3/20

1812

/17/

2018

12/1

7/20

18D

ec-1

812

/20/

2018

12/2

0/20

1812

/26/

2018

1/3/

2019

1/3/

2019

1/3/

2019

1/3/

2019

1/17

/201

91/

17/2

019

1/25

/201

91/

25/2

019

2/8/

2019

2/11

/201

9

Yie

ld G

bs

140GB

Nanopore Performance at CSHLSara Goodwin

Currently running 6 to 8 PromethION flow cells in parallel twice a week

Routinely achieve 80-90 Gb / flow cell, many over 100 Gb

First 80 samples have been sequenced to at least 40x

Now at 12 to 16 samples per week

– 100 Genomes in 100 Days

Page 16: 100 Genomes in 100 Days: The Structural Variant Landscape ...schatz-lab.org › presentations › 2019 › 2019.02.28.AGBT... · CSHL / HHMI Joyce Van Eck Boyce Thompson Esther van

Nanopore Read LengthsOptimized Sequencing strategy

Fragmentation at 30kbp using the Megarupter109 Ligation Sequencing Kit yields both long reads and high yield

Very long reads with PromethIONMean read length: 15kbp – 25kbpRead length N50: 25kbp – 30kbpOver 20x coverage of reads over 20kbp

Page 17: 100 Genomes in 100 Days: The Structural Variant Landscape ...schatz-lab.org › presentations › 2019 › 2019.02.28.AGBT... · CSHL / HHMI Joyce Van Eck Boyce Thompson Esther van

From squiggle to basepair: computational approaches for improving nanopore sequencing read accuracyRang et al (2018) Genome Biology. https://doi.org/10.1186/s13059-018-1462-9

Page 18: 100 Genomes in 100 Days: The Structural Variant Landscape ...schatz-lab.org › presentations › 2019 › 2019.02.28.AGBT... · CSHL / HHMI Joyce Van Eck Boyce Thompson Esther van

From squiggle to basepair: computational approaches for improving nanopore sequencing read accuracyRang et al (2018) Genome Biology. https://doi.org/10.1186/s13059-018-1462-9

Page 19: 100 Genomes in 100 Days: The Structural Variant Landscape ...schatz-lab.org › presentations › 2019 › 2019.02.28.AGBT... · CSHL / HHMI Joyce Van Eck Boyce Thompson Esther van

From squiggle to basepair: computational approaches for improving nanopore sequencing read accuracyRang et al (2018) Genome Biology. https://doi.org/10.1186/s13059-018-1462-9

Page 20: 100 Genomes in 100 Days: The Structural Variant Landscape ...schatz-lab.org › presentations › 2019 › 2019.02.28.AGBT... · CSHL / HHMI Joyce Van Eck Boyce Thompson Esther van

~ Part 3 ~ Structural Variation Identification

Page 21: 100 Genomes in 100 Days: The Structural Variant Landscape ...schatz-lab.org › presentations › 2019 › 2019.02.28.AGBT... · CSHL / HHMI Joyce Van Eck Boyce Thompson Esther van

Structural Variation Identification

Genome structural variation discovery and genotypingAlkan, C, Coe, BP, Eichler, EE (2011) Nature Reviews Genetics. May;12(5):363-76. doi: 10.1038/nrg2958.

Two major strategies for detectionAlignment-based detection– Split-read alignment to detect the breakpoints of events– Fast, accurately identifies most variants, including

heterozygous variants – Very long insertions may be incomplete

Assembly-based detection– De novo assembly followed by whole genome

alignment– Can capture novel sequences and other complex

variants– Slow, demanding analysis, limited by contig length,

heterozygous variants challenging

Page 22: 100 Genomes in 100 Days: The Structural Variant Landscape ...schatz-lab.org › presentations › 2019 › 2019.02.28.AGBT... · CSHL / HHMI Joyce Van Eck Boyce Thompson Esther van

Alignment Based Analysis

Accurate detection of complex structural variations using single molecule sequencing Sedlazeck, Rescheneder, et al (2018) Nature Methods. doi:10.1038/s41592-018-0001-7

NGMLR: Dual mode scoring to accommodate indel errors plus SVs CrossStitch: Local re-assembly across variants to improve breakpoints

BWA-MEM NGMLR

Page 23: 100 Genomes in 100 Days: The Structural Variant Landscape ...schatz-lab.org › presentations › 2019 › 2019.02.28.AGBT... · CSHL / HHMI Joyce Van Eck Boyce Thompson Esther van

De novo Assembly

Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separationKoren et al (2018) Genome Research. doi: 10.1101/gr.215087.116

Gold level assemblies with CanuWell-established, integrated correction & assemblyContig N50 sizes >10-fold better than referenceMain challenge is speed– ~2 weeks per assembly on ~320 cores

Exploring faster optionsMiniasm (Li, Bioinformatics, 2016) runs in ~72 core hours (+1.5 days for consensus)Wtdbg2 (https://github.com/ruanjue/wtdbg2) runs in ~8 core hours (+1.5 days for consensus) although mixed results depending on sampleDiscussing cloud-enabled pipelines with DNAnexus

Page 24: 100 Genomes in 100 Days: The Structural Variant Landscape ...schatz-lab.org › presentations › 2019 › 2019.02.28.AGBT... · CSHL / HHMI Joyce Van Eck Boyce Thompson Esther van

RaGOO: Fast and accurate reference-guided scaffolding

Minimap2alignments

Ref

eren

ceC

ontig

s

Ref

eren

ce-G

uide

d (R

aGO

O)

Ref

eren

ce-F

ree

(SAL

SA2)

Reference guided scaffoldingUse the reference genome as a “genetic map”Effective when sample is structurally similar to reference

Validation using Hi-CReference-guided scaffolding leads to more complete and more accurate chromosomes

Fast and accurate reference-guided scaffolding of draft genomesAlonge et al (2019) bioRxiv. https://www.biorxiv.org/content/early/2019/01/13/519637

Page 25: 100 Genomes in 100 Days: The Structural Variant Landscape ...schatz-lab.org › presentations › 2019 › 2019.02.28.AGBT... · CSHL / HHMI Joyce Van Eck Boyce Thompson Esther van

Assembly Based AnalysisRaGOO scaffolding yields essentially complete chromosomes

Final polishing using Bowtie2 + Pilon

– Substantially faster than Nanopolish, and modestly more accurate based on gene-analysis and alignment to reference

Gene annotations using MAKER

Assemblytics: a web analytics tool for the detection of variants from an assemblyNattestad, M, Schatz, MC (2016) Bioinformatics. doi: 10.1093/bioinformatics/btw369

M82

FLA

BGV

red: insertionsblue: deletions

Identify structural variants using AssemblyticsFinds variants within and between alignments

Especially important for large insertions of novel sequences

Tens of thousands of SVs, widely distributed across chromosomes

Page 26: 100 Genomes in 100 Days: The Structural Variant Landscape ...schatz-lab.org › presentations › 2019 › 2019.02.28.AGBT... · CSHL / HHMI Joyce Van Eck Boyce Thompson Esther van

~ Part 4 ~The Landscape of Structural Variation in Tomato Genomes

Page 27: 100 Genomes in 100 Days: The Structural Variant Landscape ...schatz-lab.org › presentations › 2019 › 2019.02.28.AGBT... · CSHL / HHMI Joyce Van Eck Boyce Thompson Esther van

The landscape of structural variations

Landscape of the first 20 accessionsSubstantial variation between samples– 15 to 50 thousand structural

variations each– Mostly insertions + deletions

Population GeneticsMost variants specific to 1 sampleMany variant shared by multiple samples, including some in all 20

Page 28: 100 Genomes in 100 Days: The Structural Variant Landscape ...schatz-lab.org › presentations › 2019 › 2019.02.28.AGBT... · CSHL / HHMI Joyce Van Eck Boyce Thompson Esther van

Identification of the ej2 Tandem DuplicationValidation of our first SV association

Crosses of tomato plants with a highly desirable breeding trait (j2: jointless2) and a desirable domestication trait (ej2: an enhancer for j2) are typically poorly producing plants – a negative epistatic interactionHowever some breeding lines carry both alleles and yet have good yields through unknown means

– One of our first samples was such a breeding line and revealed a 83kbp tandem duplication spanning ej2

– Validated the duplication using Sanger, RNA-seq and quantitative genetics to conclude the duplication of the locus causes stabilization of branching and flower production

Now able to use CRISPR/cas9 to overcome the negative epistatic interaction to improve fruit yields

GCAATCCTdeletion

ej2w copy#1

83 Kbp copy#166’173’373

F2 ej2w copy#2

83 Kbp copy#2

R1

66’256’371

F1

R1

F2

Fla.8924Soyk et al (2019) Under Review

Page 29: 100 Genomes in 100 Days: The Structural Variant Landscape ...schatz-lab.org › presentations › 2019 › 2019.02.28.AGBT... · CSHL / HHMI Joyce Van Eck Boyce Thompson Esther van

Identification of the ej2 Tandem DuplicationValidation of our first SV association

Crosses of tomato plants with a highly desirable breeding trait (j2: jointless2) and a desirable domestication trait (ej2: an enhancer for j2) are typically poorly producing plants – a negative epistatic interactionHowever some breeding lines carry both alleles and yet have good yields through unknown means

– One of our first samples was such a breeding line and revealed a 83kbp tandem duplication spanning ej2

– Validated the duplication using Sanger, RNA-seq and quantitative genetics to conclude the duplication of the locus causes stabilization of branching and flower production

Now able to use CRISPR/cas9 to overcome the negative epistatic interaction to improve fruit yields

GCAATCCTdeletion

ej2w copy#1

83 Kbp copy#166’173’373

F2 ej2w copy#2

83 Kbp copy#2

R1

66’256’371

F1

R1

F2

Fla.8924Soyk et al (2019) Under Review

Sebastian SoykCold Spring Harbor Laboratory

Page 30: 100 Genomes in 100 Days: The Structural Variant Landscape ...schatz-lab.org › presentations › 2019 › 2019.02.28.AGBT... · CSHL / HHMI Joyce Van Eck Boyce Thompson Esther van

Summary & Future WorkHigh throughput long read sequencing is unlocking the universe of structural variations

Discovering tens of thousands of variants previously missed, as well as clarifying tens of thousands of false positives per samplePossible to rapidly characterize pan-genomes with >100 samplesThroughput & accuracy rapidly improving, realtime direct alignment of nanopore signal data

Beyond mere structural variation identification ..... towards “Rules of Life” interaction maps and beyond

Identify the specific pathways for many important traitsDiscovery and dissection of cis-regulatory epistasisAnalysis of epigenetic modificationsEngineering domestication traits in “wild” plants

Expect to see similar results in all other plant and animal species~~~ Sergey Aganezov @ 7:50pm Cancer Session ~~~

Page 31: 100 Genomes in 100 Days: The Structural Variant Landscape ...schatz-lab.org › presentations › 2019 › 2019.02.28.AGBT... · CSHL / HHMI Joyce Van Eck Boyce Thompson Esther van

AcknowledgementsSchatz LabMike AlongeSrividya

RamakrishnanSergey AganezovCharlotte DarbyArun DasKatie Jenike

Michael KirscheSam KovakaT. Rhyker

Ranallo-BenavideRachel Sherman

*Your Name Here*

Lippman LabSebastian SoykXingang Wang

Zachary Lemmon

Cold Spring Harbor LaboratorySara GoodwinW. Richard McCombie

Baylor College of Medicine Fritz Sedlazeck

Boyce Thompson Joyce Van Eck

University of GeorgiaEsther van der Knaap

Page 32: 100 Genomes in 100 Days: The Structural Variant Landscape ...schatz-lab.org › presentations › 2019 › 2019.02.28.AGBT... · CSHL / HHMI Joyce Van Eck Boyce Thompson Esther van

Thank you!@mike_schatz

http://schatz-lab.org

~~~ Sergey Aganezov @ 7:50pm Cancer Session ~~~

Call For Papers: Graph Genome Methodshttps://www.biomedcentral.com/collections/graphgenomes