Dr Richard Sinnott
Technical Director, National e-Science Centre
Deputy Director (Technical), Bioinformatics Research Centre
University of Glasgow
18th March 2004
BRIDGES Status Report
Overview
Review goals of Bridges project
Briefly summarise technical approach
Outline achievements thus far
Demonstration
Plans for the future
Bridges Goals
High blood pressure affects 25% of adults in western societies
Cardiovascular Functional Genomics (CFG) project investigating this through physiological models of hypertension in rat
Bridges is a supporting project to CFG and will provide Grid infrastructure to facilitate scientific research
CFG project partners are distributed but need to access and integrate various software and especially data resources
Main aims of BRIDGES are to develop re-useable infrastructure that provides data federation while addressing the associated security concerns
CFG Partner Distribution
[Figure: diagram of the partner sites Glasgow, Edinburgh, Leicester, Oxford, London and the Netherlands, with labels for shared data, public curated data, and private data (six instances)]
Problems to be addressed
BRIDGES will address the following problems facing CFG biologists
How to integrate data with multiple levels of security, including public data, project-only data and private data?
How to search multiple distributed databases through single optimised queries?
How to use multiple tools in a coordinated (and automated) manner, e.g. how to develop re-useable workflows for the CFG scientists?
Integration of a range of bioinformatics analysis and visualisation tools, e.g. BLAST, genome browsers, etc.
How to deal with inconsistencies of online databases and possible “dirty data”?
How to get more “up to date” data?
Make it all user friendly… portals, hidden infrastructure, e.g. security authorisation
Planned Approach
BRIDGES will address these problems through
Development of re-useable Grid services based upon GT3 technologies
Virtualisation of multiple distributed data sets to provide a single virtual data set for use by the biologists – exploiting IBM’s DiscoveryLink
Developing a collection of data on a well-managed platform, including copies of extracts of relevant public data, all project data, and the required software tools (administered using DB2 and DiscoveryLink)
Access to and integration of multiple distributed data sets in a Grid environment using results from the OGSA-DAI/DAIT projects
A secure environment offering authentication and authorisation will build on results of the PERMIS security authorisation project
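The combination of data federation and layered access described above can be sketched roughly as follows. This is a minimal illustration only, not the BRIDGES implementation: in-memory SQLite databases stand in for DB2/DiscoveryLink, the access rule stands in for PERMIS authorisation, and every name here (`can_read`, `federated_query`, the `genes` table, the source names and the sample rows) is a hypothetical placeholder.

```python
import sqlite3


def make_source(rows):
    """Create an in-memory database standing in for one data source."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE genes (symbol TEXT, note TEXT)")
    conn.executemany("INSERT INTO genes VALUES (?, ?)", rows)
    return conn


def can_read(level, owner, user_site, is_member):
    """Assumed access rule: public for all, project for members, private for the owning site."""
    if level == "public":
        return True
    if level == "project":
        return is_member
    return level == "private" and owner == user_site


def federated_query(sources, user_site, is_member, symbol):
    """Run one query against every readable source and merge the results."""
    results = []
    for name in sorted(sources):
        level, owner, conn = sources[name]
        if not can_read(level, owner, user_site, is_member):
            continue
        cur = conn.execute(
            "SELECT symbol, note FROM genes WHERE symbol = ? ORDER BY note",
            (symbol,),
        )
        results.extend((name, sym, note) for sym, note in cur)
    return results


# Made-up placeholder data: (security level, owning site, database).
sources = {
    "public_copy": ("public", None, make_source([("geneA", "public annotation")])),
    "cfg_shared": ("project", None, make_source([("geneA", "shared note")])),
    "glasgow_private": ("private", "Glasgow", make_source([("geneA", "Glasgow note")])),
    "oxford_private": ("private", "Oxford", make_source([("geneA", "Oxford note")])),
}

glasgow_view = federated_query(sources, "Glasgow", True, "geneA")
outsider_view = federated_query(sources, "Elsewhere", False, "geneA")
```

With these assumptions, a Glasgow project member sees the public, shared and Glasgow-private rows through one call, while a non-member sees only the public copy. The caller never needs to know which underlying database a row came from, which is the "single virtual data set" idea.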
Bridges team
Project Management: Richard Sinnott, Dave Berry
Database Design/Development: Derek Houghton
Grid Services Developer: Micha Bayer, Magnus Ferrier
Technical Input: David White, Jean-Christophe Mestres, Andy Knox, Emmanuel Guyonnet (IBM), Ela Hunt (Glasgow), Neil Hanlon (Glasgow)
Profs David Gilbert, Malcolm Atkinson, Anna Dominiczak
Achievements
Web site and project portal established
http://europa.nesc.gla.ac.uk/wps/portal
Engaged with CFG consortia
Staff trained in relevant technologies
GT3, DiscoveryLink, Condor
Initial version of local repository developed
Populated with data that cannot be federated e.g. public data sets with no programmatic interface
– Ensembl/EMBL-EBI, NCBI (GenBank, RefSeq, Gene Expression Omnibus), UCSC, SwissProt/TrEMBL, UniSTS/dbSTS, UniGene, LocusLink, GenMAPP, OMIM, Sanger, dbSNP, dbEST, InterPro, Pfam, PRINTS, CATH, SCOP, PROSITE, Weizmann Institute, PDB, RIKEN, Rat Genome DB, Mouse Atlas, Affymetrix, …
Includes shared data sets of CFG scientists, e.g. QTL DB, …
Achievements …ctd
GT3-based Grid services offered that allow use of these local data sets
Grid-enabled BLAST services produced
Offer access to large e-Science infrastructures at Glasgow (ScotGrid)
SyntenyVista tool extended to allow Grid enabled visual navigation of genomic data sets
Planned front end for many other tools
Externally
Poster at AHM 2003
Tutorial submitted to ISMB/ECCB (the major bioinformatics conference)
Liaising with other projects
eDIKT, myGrid, GeneGrid, PERMIS, ...
Achievements …ctd
Demonstration of some of the achievements
Plans
Refine and extend requirements
Further refinement of use cases & scenarios
More data sets (public, shared, private, …)
Implementation and realisation of further use cases
e.g. extended query services for microarray data interpretation, workflows for probe set mapping, …
Security realisation and roll-out
We can only help share CFG data sets if we can get SECURE access to them – following up with CFG sites
Authorisation with PERMIS coming
GSI-based authentication
Investigate application of replication manager (RLS)
Should support illusion of data from each site being available to all other sites
Further Grid-based data visualisation services accessible via SyntenyVista
Ensure that we keep track of relevant developments (WSRF, GT4, …)
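The replication item in the plans above rests on one idea: a logical data set name is resolved to whichever physical copy is available, so each site appears to hold every data set. A minimal sketch of that lookup follows. It is not the Globus RLS API; the `ReplicaCatalog` class, its methods, the data set name and the `example.org` locations are all hypothetical.

```python
from collections import defaultdict


class ReplicaCatalog:
    """Toy catalogue mapping a logical data set name to its physical copies."""

    def __init__(self):
        # logical name -> {site: physical location}
        self._replicas = defaultdict(dict)

    def register(self, logical_name, site, location):
        """Record that `site` holds a copy of `logical_name` at `location`."""
        self._replicas[logical_name][site] = location

    def locate(self, logical_name, preferred_site=None):
        """Return (site, location), using the preferred site's copy if it has one."""
        copies = self._replicas.get(logical_name)
        if not copies:
            raise KeyError(f"no replica registered for {logical_name!r}")
        if preferred_site in copies:
            return preferred_site, copies[preferred_site]
        # Fall back to a deterministic choice among the remaining copies.
        site = sorted(copies)[0]
        return site, copies[site]


catalog = ReplicaCatalog()
catalog.register("qtl_db", "Glasgow", "db://glasgow.example.org/qtl")
catalog.register("qtl_db", "Leicester", "db://leicester.example.org/qtl")

local_copy = catalog.locate("qtl_db", preferred_site="Leicester")
remote_copy = catalog.locate("qtl_db", preferred_site="Oxford")
```

A request from Leicester resolves to the Leicester copy, while a request from Oxford, which holds no copy in this example, is redirected to another site transparently. A real deployment would also have to keep the copies consistent, which this sketch ignores.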
Future Vision of Tools via Portal
[Figure: portal drill-down functions: to sequence, to multiple alignment, to tabular summaries]
Questions?