User friendly tools for the Arabidopsis thaliana 1001 Genomes

User-friendly web tools for the Arabidopsis thaliana 1001 genomes

Beth Rowan

Max Planck Institute for Developmental Biology

Plant and Animal Genomes XXIV

January 11, 2016

Why sequence 1001 Arabidopsis thaliana genomes?

Goals

-understand genome variation in the species

-reconstruct demographic history

-identify geographic and genetic subsets

-generate a powerful resource for genome-wide association studies

Brief history of variant discovery

[Figure: number of SNPs discovered (log scale, 10^3 to 10^7) by year (1995 to 2015), annotated with the milestones below]

First reference genome

Haplotype map with 20 strains

2 wild strains resequenced

>80 wild strains resequenced

1135 wild strains resequenced

1001 Arabidopsis genomes: an overview

1001 Genomes Consortium, in review

Final Set: 1135 Accessions

Highly divergent pairs

26 “relict” accessions

-Iberian peninsula

-Cape Verde & Canary Islands

Nearly-identical pairs

-North America & British Isles

ADMIXTURE analysis identifies 9 genetic groups

1001 Arabidopsis genomes web tools

Current tools: http://1001genomes.org/tools/

-easyGWAS

-GWAPP

-1001 Proteomes

-GBrowse

-POLYMORPH

-BLAST

-Alivie

-TAIR converter

-Col-0 DB

New tools: http://tools.1001genomes.org/

-Admixture map

-Pseudogenomes

-Strain ID

Admixture Map

http://1001genomes.github.io/admixture-map/

Pseudogenomes

http://tools.1001genomes.org/pseudogenomes

-Select all

-Filter on the fly

-Format check

-Autocomplete

-Multi-FASTA
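The slides do not say how the Pseudogenomes tool builds its sequences. As a rough illustration only, the sketch below assumes a pseudogenome is the reference sequence with a strain's SNP alleles substituted at their positions, and writes the results as multi-FASTA. The function names `build_pseudogenome` and `to_multi_fasta`, the ten-base reference, and the SNP calls are all invented for the example; only the accession IDs 6909 and 7373 appear in the slides.

```python
from typing import Dict, List, Tuple


def build_pseudogenome(reference: str, snps: Dict[int, str]) -> str:
    """Return a copy of `reference` with SNP alleles substituted.

    `snps` maps 1-based positions to the strain's allele (hypothetical input shape).
    """
    seq = list(reference)
    for pos, allele in snps.items():
        if not 1 <= pos <= len(seq):
            raise ValueError(f"position {pos} is outside the reference (length {len(seq)})")
        seq[pos - 1] = allele
    return "".join(seq)


def to_multi_fasta(records: List[Tuple[str, str]], width: int = 60) -> str:
    """Format (name, sequence) pairs as one multi-FASTA string, wrapped at `width`."""
    lines = []
    for name, seq in records:
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Toy data: a made-up reference fragment and made-up SNP calls per accession ID.
reference = "ACGTACGTAC"
strain_snps = {"6909": {}, "7373": {2: "T", 9: "G"}}

records = [(sid, build_pseudogenome(reference, snps)) for sid, snps in strain_snps.items()]
fasta = to_multi_fasta(records)
print(fasta, end="")
```

A real implementation would also have to handle uncalled positions and indels, which this sketch ignores.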

Strain ID

http://tools.1001genomes.org/strain_id

[Figure: cross diagram, Col-0 (6909) X Tsu-0 (7373), giving F1 and then F2]

Integrating tools with Araport

JBrowse

Extend to full dataset

Variant tracks for each strain

(Geographic location for each strain)

Hover over variant to get info

Left click on variant to see annotation and accession information

Future plans

1. Get all SNPs in region

2. Get all indels in region

3. Get SnpEff info for given SNP

4. Get VCF subset for given region

5. Get pseudogenomes

6. Helper function: Translate gene id to coordinates

7. Get allele frequencies for variants

8. Identify allele/haplotype groups

9. Find ADMIXTURE cluster membership

10. Experimental design tool for subsetting 1001 collection

examples:

-subset with greatest genetic diversity

-accessions with similar climates but from different geographical areas

-accessions with different population histories

https://www.surveymonkey.com/r/8DTCVQF
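Two of the planned functions, the VCF subset for a region (item 4) and allele frequencies for variants (item 7), can be sketched in a few lines to show what they might return. This is not the planned Araport API, which the slides do not specify. The names `parse_records`, `in_region`, and `alt_allele_frequency`, the sample columns s1 to s3, and the variant calls are all invented, and the sketch assumes biallelic sites with a `GT`-only FORMAT column.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Optional, Tuple

Record = Tuple[str, int, str, str, List[str]]  # chrom, pos, ref, alt, genotypes

# Toy VCF body with three made-up samples (s1, s2, s3); fields are tab-separated.
VCF_TEXT = """\
##fileformat=VCFv4.1
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\ts2\ts3
1\t100\t.\tA\tT\t.\tPASS\t.\tGT\t0|0\t1|1\t1|1
1\t250\t.\tG\tC\t.\tPASS\t.\tGT\t0|0\t0|0\t1|1
1\t900\t.\tC\tA\t.\tPASS\t.\tGT\t./.\t1|1\t0|0
2\t150\t.\tT\tG\t.\tPASS\t.\tGT\t1|1\t0|0\t0|0
"""


def parse_records(vcf_text: str) -> Iterator[Record]:
    """Yield one record per data line, skipping header lines that start with '#'."""
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        yield fields[0], int(fields[1]), fields[3], fields[4], fields[9:]


def in_region(records: Iterable[Record], chrom: str, start: int, end: int) -> List[Record]:
    """Keep records on `chrom` whose position lies in the closed interval [start, end]."""
    return [r for r in records if r[0] == chrom and start <= r[1] <= end]


def alt_allele_frequency(genotypes: List[str]) -> Optional[float]:
    """Fraction of called alleles equal to '1'; None if nothing was called."""
    counts: Counter = Counter()
    for gt in genotypes:
        for allele in gt.replace("|", "/").split("/"):
            if allele != ".":  # '.' marks a missing call
                counts[allele] += 1
    total = sum(counts.values())
    return counts["1"] / total if total else None


subset = in_region(parse_records(VCF_TEXT), "1", 50, 300)
freqs = {pos: alt_allele_frequency(gts) for _, pos, _, _, gts in subset}
print(freqs)
```

Missing calls are excluded from the denominator here; a production tool might report the call rate alongside the frequency.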

Acknowledgements

Web Tools

Joffrey Fitz

Ümit Seren

1001 Genomes Consortium

Project coordinators

Detlef Weigel, MPI for Developmental Biology

Magnus Nordborg, Gregor Mendel Institute

Joy Bergelson, University of Chicago

Joe R. Ecker, Salk Institute

Mitchell Sudkamp, Monsanto

Database creation

Congmao Wang, Zhejiang Acad. of Agri. Sciences

Alexander Platzer, Gregor Mendel Institute

+All Consortium Contributors