Data dialogue - Human Genomic Data Discovery


Human Genomic Data Discoverability

Fiona Nielsen – Data Dialogue, Cambridge – July 28th 2016

The surge of genomics data

• High throughput technologies – biology is moving from the lab to the computer

[Chart: genomes sequenced per year, 2006 to 2015. 80+ PB sequenced every year]

Population sequencing projects

• For example 100,000 Genomes project in the UK

Where is the data?

• A researcher in human genomics knows on average 4-5 data sources

The need to redefine data sharing: http://www.sciencedirect.com/science/article/pii/S2212066114000386

Hundreds of data sources

• Content overview of 163 data sources

[Charts: "Assay Types" and "Dedicated to…"]

Hundreds of data sources

• Sizes vary from tens to hundreds of thousands of samples

[Chart: sample count per data source, log10 scale]

Top 5:
• GEO (1.8M)
• PMI Cohort Program (1M)
• Auria Biopankki (1M)
• EGA (~0.6M)
• SRA (~0.5M)
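To illustrate why a log10 axis suits these sizes, here is a minimal Python sketch that converts the top-5 sample counts quoted above into log10 values. The 20-sample source is a hypothetical example of a "tens of samples" data source, not a figure from the talk.

```python
import math

# Approximate sample counts for the five largest sources, as quoted above.
top_sources = {
    "GEO": 1_800_000,
    "PMI Cohort Program": 1_000_000,
    "Auria Biopankki": 1_000_000,
    "EGA": 600_000,
    "SRA": 500_000,
}


def log10_size(samples: int) -> float:
    """Return the position of a sample count on a log10 axis."""
    return math.log10(samples)


for name, samples in top_sources.items():
    print(f"{name:<20} {samples:>9,} samples  log10 = {log10_size(samples):.2f}")

# Hypothetical small source with 20 samples, for comparison only.
small_source = 20
span = log10_size(max(top_sources.values())) - log10_size(small_source)
print(f"Orders of magnitude between smallest and largest: {span:.1f}")
```

On a linear axis the small sources would be invisible next to GEO; on a log10 axis each order of magnitude takes the same amount of space.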

Which populations are represented?

• Aboriginals
• African Americans
• Africans
• Australians
• Chinese
• Malays
• Indians
• Danish
• Dutch
• Estonian
• Russian
• European Ancestry
• Finnish
• Icelandic
• Japanese
• Korean
• Latin Americans
• Saudi
• Swedish

Where does the data come from?

[Figure: where the data comes from, with an "International" category]

Interesting site to look at: http://omicsmaps.com/stats

Why is some data not shared?

• Challenges for international research community: How to work across borders and silos?

Why is some data not shared?

• Additional challenges for biomedical: Data privacy, data governance, patient consent, medical legislation

Also consider: Community-led resources

• patient groups, academia, the general public

What needs to change?

• Increased data visibility and accessibility positively benefit both researchers and patients


Pain points

• FRAGMENTED: Poor visibility of available genomic data
• ADMIN BURDEN: Huge overhead to manage data access
• BAD CULTURE: Lack of data sharing habits in research culture

Best practices

MAKE DATA DISCOVERABLE

SIMPLIFY WORKFLOWS

CONTRIBUTE TO COMMUNITY

DNAdigest and Repositive – Connecting the world of genomic data: http://journals.plos.org/plosbiology/article?id=10.1371%2Fjournal.pbio.1002418

Panel discussion

• What are best practices for sharing difficult data?

FAIR data: Findable, Accessible, Interoperable, Reusable

Translating and Commercialising Genomic Research

7-9 December 2016 | Wellcome Genome Campus, Hinxton, Cambridge, UK

Applications open soon!

Scientific programme committee:
• Emmanuelle Astoul, Wellcome Trust Sanger Institute, UK
• Fiona Nielsen, Repositive/DNAdigest, UK
• Abel Ureta-Vidal, Eagle Genomics, UK
• Ross Rounsevell, Wellcome Trust Sanger Institute, UK

Full details at: www.wellcomegenomecampus.org/coursesandconferences

Topics will include:
• Commercial opportunities arising from data aggregation
• Exploiting bioinformatics tools
• Externalising bioinformatics pipelines
• Translating biomarkers, genetic signatures or gene panels

CEO Fiona Nielsen, fiona@repositive.io

Try our free platform for discovering human genomic data: http://repositive.io

Follow us on Twitter: @repositiveio
