We are always looking for data: Finding & Accessing Human Genomic Datasets. CRUK, 7th November 2016. Tweets welcome: #CamFindData @repositiveio

Finding and Accessing Human Genomics Datasets

Page 1: Finding and Accessing Human Genomics Datasets

We are always looking for data

Finding & Accessing

Human Genomic Datasets

CRUK, 7th November 2016

Tweets welcome #CamFindData @repositiveio

Page 2: Finding and Accessing Human Genomics Datasets

Outline of the day

- Data sources and data access
- Case study: University of Cambridge
- Coffee break
- Introduction to Repositive
- Hands-on session: searching for data
- Round up and closure

Page 3: Finding and Accessing Human Genomics Datasets

On-line tools used during the workshop

To ask questions during the presentation and answer questions:

go to slido.com

enter event code: 7315

Page 5: Finding and Accessing Human Genomics Datasets

• 2001: First Human Genome Sequence
• 2005: Personal Genome Project
• 2008: UK10K
• 2013: UK 100K Project
• 2015: 1M Precision Medicine US
• 2016: AstraZeneca–HLI 2M

• Many other national and international projects

Genome Technology Evolution

Page 6: Finding and Accessing Human Genomics Datasets

• Consensus among researchers, clinicians, politicians & the public that genomics will transform biomedical research, healthcare and lifestyle choices (Stephan Beck, UCL)

OPPORTUNITY

Page 7: Finding and Accessing Human Genomics Datasets

Data should be made available

Page 8: Finding and Accessing Human Genomics Datasets

• Required by funders
• Cannot publish unless accession number given

• Specialised
  • ENA
  • EGA
  • dbGaP
  • dbSNP…

• Generalist
  • Dryad
  • figshare

Public Repositories

Page 9: Finding and Accessing Human Genomics Datasets

• Open Access
  • E.g. PGP, CC0
  • Bermuda Accord

• Managed (Restricted or Controlled Access)
  • Data Access Committee
  • No effective agreement (policy vacuum)

• Global Alliance for Genomics & Health
  • enable compatible, readily accessible, and scalable approaches for sharing

GOVERNANCE Models

Page 10: Finding and Accessing Human Genomics Datasets

Open vs Managed Access

Open Access

75,000,000 per month

Managed Access

150 per month

500,000-fold difference (Stephan Beck, UCL)
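The fold difference quoted on this slide follows directly from the two monthly figures. A quick check, as a minimal sketch (the variable names are illustrative and not part of the original slides):

```python
# Monthly figures as quoted on the slide.
open_access_per_month = 75_000_000
managed_access_per_month = 150

# Ratio between open and managed access activity.
fold_difference = open_access_per_month // managed_access_per_month

print(f"{fold_difference:,}-fold difference")
```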

Page 11: Finding and Accessing Human Genomics Datasets

Large amounts of data, but not accessible

[Chart, 2006 to 2015: genome data sequenced (80+ PB, exponential growth rate) versus genome data available in public repos (≈0.5 PB Open Access)]

Under-utilised data has huge potential for medical research

Page 12: Finding and Accessing Human Genomics Datasets

Access to Managed Data

Benefits:
• Strict governance
• Individuals are protected
• Review of consent
• Applicant signs for full responsibility for governance

Disadvantages:
• No control of data once access is given
• High barrier for access – too high?

Page 13: Finding and Accessing Human Genomics Datasets

Often a long process

Bottlenecks:
• Finding relevant and usable data
• Getting authorisation to access data
• Formatting data
• Storing and moving data

We studied the problem with qualitative interviews followed by a survey of researchers in human genetics

T. A. van Schaik et al., The need to redefine genomic data sharing: a focus on data accessibility, Applied & Translational Genomics, 2014. http://tinyurl.com/schaik-dnadigest

Page 14: Finding and Accessing Human Genomics Datasets

Flowchart (recovered from the slide):

Prerequisite checks (Yes/No): NIH / eRA Commons login; Organisation registered with eRA; Organisation has DUNS number

Steps: Write research proposal, Submit proposal, Access granted, Find/Download/Decrypt data, Science…

Delays shown along the chart: + 2-3 days, + 1-2 weeks, + 1 week, + 1-2 days, + 1-4 weeks, + 1-2 days

PRO Tip: If you use human genomic data, apply for the GRU (General Research Use) datasets in dbGaP: one application gives access to all the GRU datasets.

dbGaP application process

Blog Post: http://blog.repositive.io/how-to-successfully-apply-for-access-to-dbgap/

Page 15: Finding and Accessing Human Genomics Datasets

Flowchart (recovered from the slide):

Prerequisite check (Yes/No): Sanger eDAM Account

Steps: Write research proposal, Submit proposal, Access granted, Find/Download/Decrypt data, Science…

Delays shown along the chart: + 1 hour, + 1-2 days, + 2-7 days, + 1-2 days

EGA application process

Blog Post: http://blog.repositive.io/how-to-successfully-apply-for-access-to-ega/
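The delays on the dbGaP and EGA flowcharts can be totalled for a rough end-to-end estimate. The sketch below is only illustrative and rests on stated assumptions: every delay shown on each chart is taken to apply in sequence (so the result overstates the wait for applicants who already hold the required accounts and registrations), one week is counted as 7 days, and one hour as 1/24 of a day. The names `DBGAP_DELAYS`, `EGA_DELAYS` and `total_range` are hypothetical.

```python
# Delay ranges in days as (shortest, longest), copied from the two flowcharts
# in the order they appear on the slides.
# Assumptions: 1 week = 7 days, 1 hour = 1/24 day, all delays apply in sequence.
DBGAP_DELAYS = [(2, 3), (7, 14), (7, 7), (1, 2), (7, 28), (1, 2)]
EGA_DELAYS = [(1 / 24, 1 / 24), (1, 2), (2, 7), (1, 2)]


def total_range(delays):
    """Sum the shortest and longest durations over all listed delays."""
    shortest = sum(low for low, _ in delays)
    longest = sum(high for _, high in delays)
    return shortest, longest


if __name__ == "__main__":
    for name, delays in (("dbGaP", DBGAP_DELAYS), ("EGA", EGA_DELAYS)):
        low, high = total_range(delays)
        print(f"{name}: {low:.1f} to {high:.1f} days")
```

Under these assumptions the dbGaP route adds up to roughly 25 to 56 days and the EGA route to roughly 4 to 11 days, before any time spent finding the datasets in the first place.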

Page 16: Finding and Accessing Human Genomics Datasets

• Finding specific relevant genomic data for research can take up to six months for an untrained researcher without dedicated tools

• Application & response time for data access committees can vary widely depending on
  • the type of dataset
  • consent regulations of the study

• => there is no consensus for the ‘contracts’ between each dataset

FACTS

Page 17: Finding and Accessing Human Genomics Datasets

Researchers often choose to not access data at all

Page 18: Finding and Accessing Human Genomics Datasets

WHY should we bother?

Page 19: Finding and Accessing Human Genomics Datasets

• Validate existing studies
• Avoid unnecessary duplication
• Compare to new studies
• Enhance new datasets

Why datasets are useful

Page 20: Finding and Accessing Human Genomics Datasets

Case studies

Raquel, PhD Student, London, UK.

Researching genes associated with rare eye disorders.

Problems:
- Doesn’t know where to look for data.
- Doesn’t know if data even exists.

“I gave up on finding the data - it was very time consuming and not proving fruitful – so I started focusing more on generating my own data.”

Page 21: Finding and Accessing Human Genomics Datasets

Case studies

Mahantesh, Academic Researcher, Taipei, Taiwan.

Studying pharmacogenomics in cardiovascular epidemiology.

Problems:
- Needs lots of data.
- Knows it exists but struggles with getting access to it.

“Often it’s very hard to get the required number of cases and controls to carry out research in public health and epidemiology.”

Page 22: Finding and Accessing Human Genomics Datasets

Case studies

Jana, Company Biocurator, Zurich, Switzerland.

Biocurating microarray and RNA-Seq data.

Problems:
- Needs lots of data.
- Lots of data out there but hard to filter down to ‘useful/relevant’ data.

“Many repositories don’t list the metadata details I need to know if a dataset is useful to me, I can waste a lot of time searching.”

Page 23: Finding and Accessing Human Genomics Datasets

How many data sources?

How many sources of human genomics data do you know about?

Page 24: Finding and Accessing Human Genomics Datasets

Data sources across the globe

Geolocation of 278 data sources analysed.

Found by tracking IP address of the source.

These include:
• Public Repositories
• Universities
• Companies
• BioBanks
• Research consortiums

Page 25: Finding and Accessing Human Genomics Datasets

Data source content

Assay Types

Dedicated to…

Page 26: Finding and Accessing Human Genomics Datasets

DATA is fragmented

Page 27: Finding and Accessing Human Genomics Datasets

Hundreds of data sources… but they aren’t easy to find!

First 30 data sources listed here: http://tinyurl.com/plos-biology-repositive

[Bar chart, y-axis 0 to 300. Values by date: Jan-15: 10, Mar-15: 25, Jun-15: 33, Sep-15: 35, Dec-15: 102, Mar-16: 174, Jun-16: 239]
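The growth shown on this chart can be quantified with a few lines of Python. This is a sketch under one stated assumption: the bar labels are read as 10, 25, 33, 35, 102, 174 and 239 for the seven dates on the axis (the extracted text runs the first two labels together as "1025"). The variable names are illustrative.

```python
# Dates from the chart axis and bar labels as read from the slide.
# Assumption: the fused label "1025" stands for the two values 10 and 25.
dates = ["Jan-15", "Mar-15", "Jun-15", "Sep-15", "Dec-15", "Mar-16", "Jun-16"]
counts = [10, 25, 33, 35, 102, 174, 239]

sources_by_date = dict(zip(dates, counts))

# Overall growth factor from the first to the last reading.
growth = counts[-1] / counts[0]

# Increase between consecutive readings, paired with the later date.
increases = [(d, b - a) for d, a, b in zip(dates[1:], counts, counts[1:])]
largest_jump = max(increases, key=lambda pair: pair[1])

print(f"Growth Jan-15 to Jun-16: {growth:.1f}x")
print(f"Largest increase: +{largest_jump[1]} by {largest_jump[0]}")
```

On this reading, most of the growth happened after September 2015, which is consistent with the slide's point that sources number in the hundreds but are hard to find.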

Page 28: Finding and Accessing Human Genomics Datasets

Cambridge specific Case Study

Page 29: Finding and Accessing Human Genomics Datasets

• Postdoctoral researcher at University of Cambridge Medical School

• Working on genetic inheritance and cancer
• Using NGS data and bioinformatics

• After searching for data online she decided to apply for:
  • 2 dbGaP datasets
  • 3 EGA datasets

Cambridge specific Case Study

Blog Post: Pending… will be on http://blog.repositive.io/

Page 30: Finding and Accessing Human Genomics Datasets

The Research Operations Office - will help you with the contracts (Data Transfer Agreements - DTAs) and signatures.

• Has a designated individual who processes all dbGaP applications as they all abide by NIH legal restrictions and regulations about how to handle the data once granted access

• For EGA applications, each DTA must be processed separately because there is no consensus for the ‘contracts’ between each dataset.

Cambridge specific Case Study

Blog Post: Pending… will be on http://blog.repositive.io/

Page 31: Finding and Accessing Human Genomics Datasets

The nominated IT director - will be specific to your department.

• They will need to confirm you can support the requirements of the DTA.

• If the head of your departmental IT is not happy to sign – the head of IT for the University will be able to sign it off.

Cambridge specific Case Study

Blog Post: Pending… will be on http://blog.repositive.io/

Page 32: Finding and Accessing Human Genomics Datasets

Top Tips:

• Think about your storage space!

• Think about what sort of analysis and processing you are going to do with the data once you do have it. After such a long process, the approval could be too quick.

• Understand what you need before you start the application process!

• You may have access for a limited period

Cambridge specific Case Study

Page 33: Finding and Accessing Human Genomics Datasets

COFFEE BREAK

Back in 10’

Page 34: Finding and Accessing Human Genomics Datasets

@repositiveio

Page 35: Finding and Accessing Human Genomics Datasets

1-click to human genomic data access

to make finding data as easy as finding a book on Amazon or booking a hotel on Expedia!

Page 36: Finding and Accessing Human Genomics Datasets

Simpler workflow for data access

Our expertise is data search platforms

Discover and access

Search, see related results

Find colleagues & their data interests

Co-annotate data & community feedback

Page 37: Finding and Accessing Human Genomics Datasets

We are enabling best practices

MAKE DATA DISCOVERABLE

SIMPLIFY WORKFLOWS

CONTRIBUTE TO COMMUNITY

DNAdigest and Repositive – Connecting the world of genomic data http://www.tinyurl.com/plos-biology-repositive

Page 38: Finding and Accessing Human Genomics Datasets

Connecting the world of genomic data

Page 39: Finding and Accessing Human Genomics Datasets
Page 40: Finding and Accessing Human Genomics Datasets

1. Form groups of 2-3 people
2. Select a leader & a spokesperson
3. Choose 1 data theme you are interested in
   1. E.g. colon cancer, prostate cancer, breast cancer
4. Sign up at https://discover.repositive.io/
5. Search Repositive with the selected theme

Hands on

Page 41: Finding and Accessing Human Genomics Datasets

Team presentation: 2 minutes

1. Introduction
What data did you try to find and why? Have you tried to search for this data before?

2. Methods
The 5 main steps you took on Repositive to try and find this data.

3. Results
Did you find the data on Repositive? What challenges did you encounter?

4. Conclusion
Sum up your experience in 1 sentence.

Page 42: Finding and Accessing Human Genomics Datasets

Feedback on the workshop

Bugs and feedback to: Charlotte at Repositive.io

Please leave your feedback on the workshop:

http://tinyurl.com/feedback280916

Page 43: Finding and Accessing Human Genomics Datasets

http://discover.repositive.io @repositive