Can Grids Deliver the Vision for Future Hypothesis Driven Life Science Research? Professor Richard Sinnott, Technical Director, National e-Science Centre, University of Glasgow, 9th May 2006

Page 1

Can Grids Deliver the Vision for Future Hypothesis Driven Life Science Research?

Professor Richard Sinnott
Technical Director, National e-Science Centre
University of Glasgow

9th May 2006

Page 2

Grids and e-Research

• Classical characteristics
  – HPC, data deluge, …
• More recent push
  – Security, virtual organisations, usability, …

[Figure labels: Grid; sensor nets; shared data archives; computers; software; colleagues; instruments]

Page 3

E-Health Future Drivers

• The big questions
  – Why do people who eat less tend to live longer?
  – Is there a genetic reason why Scotland has such a high incidence of cardiovascular disease? How significant are social, cultural and occupational factors in this?
  – …
• Tailored e-Health
  – Wouldn't it be wonderful to know what measures you could take to stave off/prevent the onset of disease?
  – Wouldn't it be a relief to know that you are not allergic to the drugs your doctor just prescribed?
  – Wouldn't it be a comfort to know that the treatment regimen you are undergoing has a good chance of success because it was designed just for you?

Page 4

The Big Picture…

[Figure labels: nucleotide sequences; nucleotide structures; gene expressions; protein structures; protein functions; protein-protein interaction (pathways); cell; cell signalling; tissues; organs; physiology; organisms; populations; epidemiology; + social, lifestyle, occupational, environmental, …; GRID; SECURITY]

Page 5

NeSC in the UK

[Figure labels: Cambridge; Newcastle; Edinburgh; Oxford; Glasgow; Manchester; Cardiff; Southampton; London; Belfast; Daresbury Lab; RAL; Hinxton; NeSC; Core National Grid Service; White Rose Grid; HPC(x); CSAR; ?]

Challenges/Opportunities

The next Grid software

• There are still issues to be resolved
  – OGSA definition and delivery
    • Standards: OGSI, WSRF, …
    • Technologies: GT2, GT3, GT4, EGEE, OMII, …
  – What about the science drivers?
    • What data sets, what services, accessed by whom, …
  – Longevity of systems…?
  – If I build a Grid infrastructure for you, do you promise not to change your requirements (completely!)?

Page 6

Grid Projects & Experiences

Page 7

BRIDGES Project

[Figure labels: CFG Virtual Organisation; Glasgow; Edinburgh; Leicester; Oxford; London; Netherlands; Private data (six instances); Publicly Curated Data: Ensembl, MGI, HUGO, OMIM, SWISS-PROT, RGD, …; DATA HUB; Information Integrator; OGSA-DAI; Synteny Service; Magna Vista Service; blast; VO Authorisation]

Page 8

www.nesc.ac.uk

MagnaVista

Page 13

BRIDGES Security

• Used PERMIS (www.permis.org) to provide fine-grained security (authorisation)
  – XML-based policies are digitally signed (tamperproof) and used to make authorisation decisions when users invoke services
    • (XACML-based policies coming…)
  – SAML callouts are used to transparently link Grid services and policies
• Data policies
  – Only members of CFG can access all public and local warehoused data
  – Other guest users can only access remote genome databases
    • Security at DB level!
• Computational policies
  – CFG members can run BLAST across NGS, Glasgow clusters and Condor pools
  – Guest users only get access to the Condor pool
• Users do not need their own X.509 certificates – all hidden behind portal!
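The data and computational policies above can be illustrated with a minimal role-based check. This is only a sketch in plain Python and does not use the PERMIS, XACML or SAML interfaces; the role names, resource names and the `is_permitted` function are hypothetical, and giving CFG members access to the remote genome databases is an assumption rather than something the slide states.

```python
from dataclasses import dataclass

# Hypothetical policy table loosely following the slide.
# Assumption: CFG members may also reach the remote genome databases.
POLICY = {
    "cfg_member": {
        "data": {"public", "local_warehouse", "remote_genome_db"},
        "compute": {"ngs", "glasgow_cluster", "condor_pool"},
    },
    "guest": {
        "data": {"remote_genome_db"},
        "compute": {"condor_pool"},
    },
}


@dataclass(frozen=True)
class Request:
    role: str      # role asserted for the user by the portal
    kind: str      # "data" or "compute"
    resource: str  # resource the user wants to reach


def is_permitted(request: Request, policy=POLICY) -> bool:
    """Default-deny check: unknown roles, kinds or resources are refused."""
    allowed = policy.get(request.role, {}).get(request.kind, set())
    return request.resource in allowed


if __name__ == "__main__":
    print(is_permitted(Request("cfg_member", "compute", "ngs")))
    print(is_permitted(Request("guest", "compute", "ngs")))
```

A real deployment would evaluate signed policies at the service rather than an in-memory dictionary; the sketch only shows the shape of the decision.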

Page 14

BRIDGES data

• Originally planned to have many different types of data with different security requirements
  – Public data: data from public sources
  – Processed public data: public data that has additional annotation or indexing to support the analyses needed by CFG
  – Sensitive data: data about individuals in the cohorts of patients and the data derived from animal experiments
  – Special experimental data: such as quantitative trait loci (QTL) or microarray data
  – Personal research data: data specific to a researcher as a result of experiments or analyses that that researcher is performing
  – Team research data: data shared by the team members at a site
  – Consortium research data: data produced by one site or a combination of sites that is now available for the whole consortium
  – Personalisation data: metadata collected and used by the bioinformatics tools pertinent to individual users
• …but scientists are reluctant to share their data!

Page 15

JDSS Project

• Public data resources openness
  – Often cannot query directly, nor is it easy/possible to find schemas (and they change… often!)
  – Joint Data Standards Study investigated this
    • Funded by MRC, BBSRC, Wellcome Trust, JISC, NERC, DTI and involved
      – Digital Archiving Consultancy
      – NeSC (Edinburgh and Glasgow)
      – Bioinformatics Research Centre (Glasgow)
    • Looked at technical, political, social, ethical etc. issues involved in accessing and using public life science resources
    • Final report completed September 2005 and available at:
      – www.mrc.ac.uk/prn/pdf-jdss_final_report.pdf
        » (to also appear as a NeSC technical report)

Page 16

Grid Enabled Microarray Expression Profile Search (GEMEPS)

• 1 year project (just) started 1st March 2006
  – Funded by BBSRC
    • Involves Glasgow, Cornell University (US) and Riken Institute (Japan)
  – Aim to provide tools for discovery, comparison and analysis of microarray data sets
    • How does my data compare to others?
    • How do these experiments compare?
    • Can we improve the way we establish how genes in different species are linked?
    • …
  – Microarrays are expensive and contain potentially important (valuable) data sets
  – Fine-grained security essential (and willingness of researchers to collaborate)!

Page 17

Grid Enabled Microarray Expression Profile Search (GEMEPS)

• Why bother…?
  – Major journals require experimental data to be published
  – Minimum Information About a Microarray Experiment (MIAME) standard
    • Does not provide sufficient information for a scientist to repeat an experiment, to compare results, …
  – Scientists often unwilling to spend time to provide additional metadata
    • …experiences from BRIDGES
  – Scientists also now questioning sensitivity of microarray data results
    • Gene names and expression values vs ordering of gene expression values
    • Initial prototypes support both of these, but there are issues of gene naming
      – Entrez, UniGene, GO, …
  – Work on searching/mining of public repositories is ongoing
    • including GEO, ArrayExpress, …
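The distinction between comparing expression values and comparing the ordering of expression values can be sketched as follows. This is an illustrative assumption, not the method used by the GEMEPS prototypes: it uses Pearson correlation for the value-based comparison and Spearman rank correlation for the order-based one, and it assumes both profiles already use the same gene identifiers (the gene naming issue noted above is not addressed). The function names and sample data are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def compare_profiles(profile_a, profile_b):
    """Compare two {gene: expression} profiles over their shared genes."""
    shared = sorted(set(profile_a) & set(profile_b))
    xs = [profile_a[g] for g in shared]
    ys = [profile_b[g] for g in shared]
    return {
        "genes": shared,
        "by_value": pearson(xs, ys),                 # sensitive to magnitudes
        "by_order": pearson(ranks(xs), ranks(ys)),   # Spearman: ordering only
    }


if __name__ == "__main__":
    a = {"g1": 1.0, "g2": 2.0, "g3": 3.0, "g4": 4.0}
    b = {"g1": 10.0, "g2": 100.0, "g3": 1000.0, "g4": 10000.0, "g5": 5.0}
    print(compare_profiles(a, b))
```

In the sample data the two profiles rank the shared genes identically, so the order-based score is maximal while the value-based score is lower.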

Page 18

Grid Enabled Occupational Data Environment (GEODE)

• GEODE
  – Funded by ESRC, led by University of Stirling with NeSC Glasgow
    • Two year project aiming to develop Grid enabled portal for occupational data
      – includes integration of various existing classification schemes
  – Many occupational classification schemes exist
    • Used by different researchers/sociologists
  – Linkage to national and international census data sets
  – When is a plumber not a plumber?
  – When they are a water transport technician…?
    • How many plumbers had a heart attack in Scotland in the last 2 years?
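The plumber question can be made concrete with a small crosswalk between classification schemes. This is a sketch only: the scheme names, job titles and harmonised codes are invented for illustration and are not taken from GEODE or from any real occupational classification.

```python
# Hypothetical crosswalk: (scheme, job title) -> harmonised occupation code.
CROSSWALK = {
    ("scheme_a", "plumber"): "OCC-PLUMB",
    ("scheme_b", "water transport technician"): "OCC-PLUMB",
    ("scheme_a", "electrician"): "OCC-ELEC",
}


def harmonise(scheme, title):
    """Return the harmonised code for a title, or None if it is unmapped."""
    return CROSSWALK.get((scheme, title.strip().lower()))


def count_events(records, target_code):
    """Count records whose occupation maps to the target harmonised code."""
    return sum(
        1 for record in records
        if harmonise(record["scheme"], record["title"]) == target_code
    )


if __name__ == "__main__":
    # Invented records, one per recorded event.
    records = [
        {"scheme": "scheme_a", "title": "Plumber"},
        {"scheme": "scheme_b", "title": "Water Transport Technician"},
        {"scheme": "scheme_a", "title": "electrician"},
    ]
    print(count_events(records, "OCC-PLUMB"))
```

Counting by harmonised code rather than by raw title is what lets the same occupation be found under two different scheme labels.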

Page 19

VOTES

• Virtual Organisations for Trials and Epidemiological Studies
  – 3 year MRC (£2.8M) funded project expected to start imminently
  – Plans to develop Grid infrastructure to address key components of clinical trial/observational study
    • Recruitment of potentially eligible participants
    • Data collection during the study
    • Study administration and coordination
  – Involves Glasgow, Oxford, Leicester, Nottingham, Manchester
  – Prototypes available now building on SCIStore, GPASS, consent DB, existing trials repositories

[Figure: Clinical Virtual Organisation Framework. Labels: Used to realise; CVO-1 (e.g. for data collection); CVO-2 (e.g. for recruitment); IMP; Lei- Nott; GLA; OX; GPs; Disease registries; Hospital databases; Clinical trial data sets; Transfer Grid]

Page 20

Distributed Data Framework

[Figure labels (the diagram text is duplicated in the source): Portal; Grid Server; Data Server; Globus Container; OGSA-DAI Service; Driving DB; SCI Store 1 (SQL Server); SCI Store 2 (SQL Server); Consent DB (Oracle 10g); RCB Test Trials DB (SQL Server); GPASS; User Authentication; Glasgow; Other Transfer Grid Nodes; Remote Trust Policies; Local Trust Policies; Authorisation Access Matrix Security Policies; Access Security Policies]
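One way to read the framework above is as a federated query that is filtered by an access matrix and by recorded consent. The sketch below is a plain Python illustration under that assumption; it does not use OGSA-DAI or Globus, and the roles, source names, field names, patient identifiers and values are all synthetic and hypothetical.

```python
# Hypothetical access matrix: role -> data source -> fields the role may see.
ACCESS_MATRIX = {
    "trial_coordinator": {
        "sci_store_1": {"patient_id", "cholesterol"},
        "trials_db": {"patient_id", "trial"},
    },
    "recruiter": {
        "sci_store_1": {"patient_id"},
    },
}

# Synthetic rows standing in for the remote databases.
SOURCES = {
    "sci_store_1": [
        {"patient_id": "p1", "cholesterol": 6.1, "postcode": "G12"},
        {"patient_id": "p2", "cholesterol": 5.2, "postcode": "G3"},
    ],
    "trials_db": [
        {"patient_id": "p1", "trial": "T-01"},
        {"patient_id": "p3", "trial": "T-02"},
    ],
}

# Synthetic stand-in for a consent database: patients who have opted in.
CONSENTED = {"p1", "p3"}


def federated_query(role, sources=SOURCES, matrix=ACCESS_MATRIX,
                    consented=CONSENTED):
    """Collect rows from every source the role may read.

    Rows for patients without recorded consent are dropped, and each row
    is reduced to the fields the access matrix allows for that source.
    """
    results = []
    for name, rows in sources.items():
        allowed = matrix.get(role, {}).get(name)
        if not allowed:
            continue  # default deny: no entry means no access
        for row in rows:
            if row["patient_id"] not in consented:
                continue
            visible = {k: v for k, v in row.items() if k in allowed}
            results.append({"source": name, **visible})
    return results


if __name__ == "__main__":
    for row in federated_query("trial_coordinator"):
        print(row)
```

Applying the field filter per source mirrors the idea of local policies staying with each data owner, although here everything runs in one process for brevity.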

Pages 21 to 25

VOTES Data Federation Portal Beta Prototype

Page 26

Generation Scotland Scottish Family Health Study

• Five (2+3) year proposal (£4.6M) started January 2006
  – Funded by Health Department and Department for Enterprise and Lifelong Learning
    • Involves Glasgow, Dundee, Edinburgh, Aberdeen
  – focus on genetics as applied to healthcare
  – first two years: emphasis on providing a platform for research into the genetic basis of common complex diseases in Scotland
    » Mental health, cardiovascular, …
    » Plan to establish 15,000 family-based intensively-phenotyped cohort recruited from the East and West of Scotland
  – basis for neutralising heritable (genetic) risk factors in disease surveillance, treatment optimisation, avoidance of adverse drug events and prediction of response to therapy, health care planning and drug discovery, …
  – Recruitment process has started already!

Page 27

Security Related Projects

• GLASS
  – JISC funded, started January 2006
    • Exploring early adoption of Shibboleth
      – Working with Computer Services directly
    • Scenarios based upon teaching and access to NHS resources/data
    • Builds upon university wide unified account management system being rolled out (based on Novell nSure technology)
• ESP-Grid
  – JISC/Oxford University funded
    • Developing demonstrator to show how Grid resources can be accessed and used via Shibboleth technology
      – Initial prototypes already available
• Grid Security Report
  – JISC/JCSR funded
    • Focus on Grid security practices, middleware and outlook
      – Contact me if you want a copy!

Page 28

DyVOSE Project

[Figure labels: PERMIS based Authorisation checks/decisions; Glasgow Education VO policies; Edinburgh Education VO policies; LDAP (two instances); Glasgow; Edinburgh; Condor pool; Grid-data Client; Grid BLAST Service; Grid BLAST Data Service; Nucleotide + Protein Sequence DB; Implemented by Students; data input; Job scheduling/data management; Protein/nucleotide sequence data returned based on student team and Edinburgh policy; Glasgow SoA using Edinburgh DIS: Create new ACs for Glasgow users/roles]

Page 29

Future

• The Grid is not a magic wand
  – Your data quality issues won’t go away
  – We can however identify what these are
    • SCIStore schema incompatibilities
• Ethics and legal aspects essential
  – Working closely with NHS
• Consent crucial
  – Scenarios now implemented looking at patient consent via GPASS

Page 30

The Future…

[Figure labels: nucleotide sequences; nucleotide structures; gene expressions; protein structures; protein functions; protein-protein interaction (pathways); cell; cell signalling; tissues; organs; physiology; organisms; populations; GRID + Security]
