
Data sharing

Mark Walport, 13 March 2009


Research is generating rapidly increasing volumes of data…

DNA sequencing: total gigabases by week (80 gigabases per week is 130,000 bases per second) [chart; axis ticks from 0 to 120]

Structural Biology: Structures deposited in the Protein Data Bank*

*graph extracted from the Interim Report of the Blue Ribbon Task Force on Sustainable Digital Preservation and Access (Dec 2008)

Courtesy of Julian Parkhill
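
The conversion quoted with the sequencing chart (80 gigabases per week is 130,000 bases per second) can be checked with a short calculation. This is only a sketch: it assumes 1 gigabase = 10^9 bases and a 7-day week, and the function name is illustrative.

```python
# Check the slide's conversion: 80 gigabases per week expressed in bases per second.
GIGABASE = 10**9                     # assumption: 1 gigabase = 1e9 bases
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800 seconds in a 7-day week


def bases_per_second(gigabases_per_week):
    """Convert a weekly sequencing output in gigabases to bases per second."""
    return gigabases_per_week * GIGABASE / SECONDS_PER_WEEK


rate = bases_per_second(80)
print(f"{rate:,.0f} bases per second")
```

The result is a little over 132,000 bases per second, which the slide rounds to 130,000.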


…in a diverse range of formats

Field descriptions:

1.
submitted name | description | MPD_name | MPD_description | intervention | units | age
sex | females only | | | | |
Pedigree Number | | mouse_id | pedigree number | | n |
Body Weight | Total body weight in gm | bw | body weight | high-fat diet 14wks | g | 20wks
Liver Weight | Weight of the liver in gm | liver_wt | liver weight | high-fat diet 14wks | g | 20wks
Week 14 HDL | HDL cholesterol after W14 on Atherogenic diet (mg/dl) | HDLC | HDL cholesterol (plasma) | high-fat diet 14wks | mg/dL | 20wks
Week 14 Non HDL Cho | Non HDL cholesterol after W14 on Atherogenic diet (mg/dl) | nonHDLC | non-HDL cholesterol (plasma) | high-fat diet 14wks | mg/dL | 20wks
Week 14 Total Cho | Total cholesterol after W14 on Atherogenic diet (mg/dl) | chol | total cholesterol (plasma) | high-fat diet 14wks | mg/dL | 20wks
Week 14 TG | Triglycerides cholesterol after W14 on Atherogenic diet (mg/dl) | TG | triglycerides (plasma) | high-fat diet 14wks | mg/dL | 20wks
BMD | Bone mineral density (g/cm2) | BMD | bone mineral density | high-fat diet 14wks | g/cm2 | 20wks
%Fat | Percent body fat (%) | pct_fat | percent fat | high-fat diet 14wks | % | 20wks
BMI | Body Mass Index (BMI = weight (gm) / (length (cm))^2) | BMI | body mass index (BMI) | high-fat diet 14wks | g/cm2 | 20wks
Atherosclerotic Lesion Size | Atherosclerotic Lesion Size (μm2) | aortic_lesion | fatty streak aortic lesion size | high-fat diet 14wks | μm2 | 20wks

2.
submitted name | description | MPD_name | MPD_description | intervention | units | age
Sex | sex (0=female, 1=male) | sex | | | |
Mouse-num | Mouse ID number | mouse_id | | | |
Pedigree | paternal grandmother [0 = (AxB)x(AxB) or (BxA)x(AxB); 1 = (AxB)x(BxA) or (BxA)x(BxA)] | pgm | paternal grandmother [0 = (AxB)x(AxB) or (BxA)x(AxB); 1 = (AxB)x(BxA) or (BxA)x(BxA)] | | |
BW | Body Weight [gm] | bw | body weight | high-fat diet 14wks | g | 20wks
HDL | HDL cholesterol [mg/dL] | HDLC | HDL cholesterol (plasma) | high-fat diet 14wks | mg/dL | 20wks
logHDL | log of HDL cholesterol | HDLC_log | HDL cholesterol (plasma) (log) | high-fat diet 14wks | mg/dL | 20wks
LogNon-HDL | log nonHDL cholesterol | nonHDL_log | non-HDL cholesterol (plasma) | high-fat diet 14wks | mg/dL | 20wks
TG | Triglycerides cholesterol [mg/dl] | TG | triglycerides (plasma) | high-fat diet 14wks | mg/dL | 20wks
Pnt-Fat | Percent body fat (%) | pct_fat | percent of body weight that is fat | high-fat diet 14wks | % | 20wks
LogBMI | log of Body Mass Index | BMI_log | body mass index (BMI) (log) | high-fat diet 14wks | g/cm2 | 20wks
lesion-binary | Aortic Lesions (binary coding) | aortic_lesion_presence | fatty streak aortic lesions (0=absent, 1=present) | high-fat diet 14wks | score | 20wks
lesion-lt0 | Aortic Lesions (no lesion = 0, lesion = 1) | | | | |
lesion | Atherosclerotic Lesion Size (μm2) | aortic_lesion | fatty streak aortic lesion size | high-fat diet 14wks | μm2 | 20wks
LogLesion-plus1 | Log of the Lesion Size | aortic_lesion_log | fatty streak aortic lesion size (log) | high-fat diet 14wks | μm2 | 20wks
Total-BMD | Bone mineral density [g/cm2] | BMD_total | bone mineral density (whole body) | high-fat diet 14wks | g/cm2 | 20wks
std-tBMD | Whole body areal Bone mineral density by DXA (PIXI) | BMD_areal | areal bone mineral density (whole body) | high-fat diet 14wks | g/cm2 | 20wks
Vertebral-BMD | Vertebral Bone Mineral Density [g/cm2] | BMD_vertebral | vertebral bone mineral density | high-fat diet 14wks | g/cm2 | 20wks
std-vBMD | Spinal Areal Bone mineral density by DXA (PIXI) | BMD_spinal | spinal areal bone mineral density | high-fat diet 14wks | g/cm2 | 20wks
Prin1 | Principal Component 1 (not defined) | PC1 | Principal Component 1 | high-fat diet 14wks | score | 20wks
Prin2 | Principal Component 2 (not defined) | PC2 | Principal Component 2 | high-fat diet 14wks | score | 20wks
PTH | Parathyroid hormone [pg/ml] | PTH | parathyroid hormone | high-fat diet 14wks | pg/mL | 20wks
Log PTH | log Parathyroid hormone | PTH_log | parathyroid hormone (log) | high-fat diet 14wks | pg/mL | 20wks

Data for this project were submitted in two data files, with some measurements duplicated. Data are collated in this data file; duplications have been purged.
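
The collation described on this slide (two submitted files, overlapping measurements, duplicates purged) might look roughly like the sketch below. It is a hypothetical illustration, not the procedure the database actually used: the name mappings are taken from the field descriptions, while the mouse ID, the sample values, and the function names are invented for the example.

```python
# Hypothetical sketch: collate two submitted files under shared MPD-style names.
# Name mappings come from the field descriptions; all sample values are invented.

FILE1_TO_MPD = {"Body Weight": "bw", "Week 14 HDL": "HDLC", "Week 14 TG": "TG"}
FILE2_TO_MPD = {"BW": "bw", "HDL": "HDLC", "logHDL": "HDLC_log"}


def rename(record, mapping):
    """Return a copy of record with submitted names replaced by MPD names."""
    return {mapping[name]: value for name, value in record.items() if name in mapping}


def collate(file1, file2):
    """Merge per-mouse records from both files, keeping one copy of each measurement."""
    merged = {}
    for records, mapping in ((file1, FILE1_TO_MPD), (file2, FILE2_TO_MPD)):
        for mouse_id, record in records.items():
            target = merged.setdefault(mouse_id, {})
            for name, value in rename(record, mapping).items():
                # setdefault keeps the first value seen, so a duplicate is dropped
                target.setdefault(name, value)
    return merged


# Invented records for one mouse, keyed by an invented mouse ID.
file1 = {101: {"Body Weight": 31.2, "Week 14 HDL": 88.0, "Week 14 TG": 140.0}}
file2 = {101: {"BW": 31.2, "HDL": 88.0, "logHDL": 1.94}}

collated = collate(file1, file2)
print(collated)
```

Body weight and HDL appear in both files under different submitted names, so each survives only once after renaming; measurements unique to one file are carried over unchanged.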


Through sharing data we can increase its power

Genome-wide association studies are revealing the genetic basis of common diseases through combining data across large patient cohorts…

[Figure: genes and loci associated with common diseases, labelled by year, 2000 to 2007]

Diseases shown: High Cholesterol, Obesity, Myocardial infarct, Arrhythmias, Type 2 Diabetes, Prostate cancer, Breast cancer, Colon cancer, Age Related Macular Degeneration, Crohns Disease, Type 1 Diabetes, Systemic Lupus Erythematosus, Asthma, Restless leg syndrome, Gallstone disease, Multiple sclerosis, Rheumatoid arthritis, Glaucoma

2000: PPAR
2001: IBD5, NOD2
2002: CTLA4
2003: KCNJ11
2004: PTPN22
2005 to 2007: CD25, IRF5, PCSK9, CFH, NOS1AP, IFIH1, CFB/C2, LOC387715, 8q24 (and 8q24 #2 to #6), IL23R, TCF7L2, CDKN2B/A, ATG16L1, 5p13, 10q21, IRGM, NKX2-3, IL12B, 3p21, 1q24, PTPN2, TCF2, IGF2BP2, CDKAL1, HHEX, SLC30A8, MEIS1, LBXCOR1, BTBD9, C3, ORMDL3, 4q25, GCKR, FTO, C12orf30, ERBB3, KIAA0350, CD226, 16p13, SH2B3, FGFR2, TNRC9, MAP3K1, LSP1, LOXL1, IL7R, TRAF1/C5, STAT4, ABCG8, GALNT2, PSRC1, NCAN, TBL2, TRIB1, KCTD10, ANGPTL3, GRIN3A

… and databases, such as DECIPHER at the Wellcome Trust Sanger Institute, enable researchers to share data to gain new insights

DECIPHER: Overview map of consortium members

Courtesy of Leena Peltonen


Data Sharing Review

• personal information is individual & precious to each one of us – it’s vital that we treat it properly

• there is a long-standing and healthy debate about the balance between the right to privacy and the necessity to hold and share data

• review announced by PM on 25/10/2007 during his speech on liberty


High profile disasters…


Data sharing

Public & private benefit

• law enforcement/public protection
  • national security
  • crime
  • protecting the vulnerable

• key issue: proportionality


Data sharing

Public & private benefit

• provision of services
  • targeted services
  • easier transactions
  • improved efficiency

• key issue: choice of terms of service


Data sharing

Public & private benefit

• research and statistics
  • enabling robust and evidence-led policy

• key issue: confidentiality vs consent

BMJ 2 2001; 323: p363

Br J Cancer 2002; 86: p1732

Prenat Diagn 2007; 27: p1191


Common issues highlighted

• general confusion/uncertainty about the law, including the complex interplay between different strands of the law

• a lack of public trust in data handling

• a lack of transparency in what is done with personal data

• inadequate regulatory powers, sanctions and resources

• lack of clear accountability in a complex shared data environment

• the subject access process is out of date in the internet era


Recommendations (1)

• Transform personal and organisational culture: clarify corporate governance; maximise transparency; improve training; and consider credentials

• Clarify and streamline legal framework: urge HMG to lead EU debate on reform; authoritative guidance through statutory CoP; and a new statutory fast-track procedure where there is a strong case for removal of a legal barrier

• Ensure effective enforcement: implement fine provisions quickly; provide new powers to inspect & audit; ensure adequate resources; and revise structure

• Safeguard and protect publicly available (online) information: call for new enquiry into online services that aggregate personal information; and ban the sale of the edited electoral register


Recommendations (2)

• develop mechanisms for safe and secure research and statistical analysis:
  • develop ‘safe havens’ as an environment for population-based research and statistical analysis
  • create a system for accrediting approved researchers
  • government departments wishing to develop, share and hold data should work with academic and other partners to create safe havens
  • call on HMG and NHS to maximise potential benefits made possible through safe havens
  • NHS should develop a system to allow approved researchers to work with healthcare providers to identify potential patients


Coroners and Justice Bill

• welcome the intention

• but essential to ensure that appropriate safeguards are in place to facilitate beneficial data sharing while maintaining public confidence

• order-making powers relate to Thomas/Walport recommendation 8 on overcoming legal obstacles and absent powers

• separate package of measures for research access (recommendations 15-17 Thomas/Walport)


Future challenges

• data security
  • management & governance of data
  • public trust

• infrastructure
  • rising volumes & complexity of data pose immense challenges for storage & curation
  • key data resources need coordinated and long-term sustainable funding
  • ELIXIR is aiming to build a sustainable infrastructure for biological information across Europe

• technical and cultural issues
  • coordination & advocacy from key communities
  • provision of information & guidance for researchers
  • appropriate incentives & recognition for researchers
  • development of key technical standards, metadata, etc.
  • nurturing skills in data management