Chinese Delegation Visit
High Performance Computer Mission
UK e-Science & The National e-Science Centre
Prof. Malcolm Atkinson
Director
www.nesc.ac.uk
22nd October 2003
Outline
What is e-Science?
Our Role in Enabling e-Science
Information Grids & Data Dominates
Foundation for e-Science
sensor nets
Shared data archives
computers
software
colleagues
instruments
Grid
e-Science methodologies will rapidly transform science, engineering, medicine and business
driven by exponential growth (×1000/decade) enabling a whole-system approach
Diagram derived from Ian Foster’s slide
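The ×1000/decade growth figure can be restated as a doubling time. The following is a minimal sketch, assuming the growth is steady and exponential across the decade; the function name is illustrative and does not come from the slides.

```python
import math


def doubling_time_months(factor_per_decade: float) -> float:
    """Months needed to double, assuming steady exponential growth."""
    # Number of doublings that fit into one decade (120 months).
    doublings_per_decade = math.log2(factor_per_decade)
    return 120.0 / doublings_per_decade


months = doubling_time_months(1000.0)
print(f"x1000 per decade is a doubling roughly every {months:.1f} months")
```

Under that assumption the result is close to one doubling per year, which is in line with the "Doubling every 12 months" quoted for astronomy data later in the talk.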
e-Science and SR2002

Research Council        2004-6        2001-4
Medical                 £13.1M        (£8M)
Biological              £10.0M        (£8M)
Environmental           £8.0M         (£7M)
Eng & Phys              £18.0M        (£17M)
HPC                     £2.5M         (£9M)
Core Prog.              £16.2M + ?    (£15M) + £20M
Particle Phys & Astro   £31.6M        (£26M)
Economic & Social       £10.6M        (£3M)
Central Labs            £5.0M         (£5M)
Cambridge
Newcastle
Edinburgh
Oxford
Glasgow
Manchester
Cardiff
Southampton
London
Belfast
Daresbury Lab
RAL
Hinxton
NeSC in the UK
National e-Science Centre
HPC(x)
Directors’ Forum
Helped build a community
Engineering Task Force
Grid Support Centre
Architecture Task Force
UK Adoption of OGSA
OGSA Grid Market
Workflow Management
Database Task Force
OGSA-DAI
GGF DAIS-WG
Globus Alliance
Events held in the 2nd Year (from 1 Aug 2002 to 31 Jul 2003)
We have had 86 events: (Year 1 figures in brackets)
11 project meetings (4)
11 research meetings (7)
25 workshops (17 + 1)
2 “summer” schools (0)
15 training sessions (8)
12 outreach events (3)
5 international meetings (1)
5 e-Science management meetings (7)
(though the definitions are fuzzy!)
> 3600 Participant Days
Suggestions always welcome
Establishing a training team
Investing in community building, skill generation & knowledge development
Workshops
Space for real work
Crossing communities
Creativity: new strategies and solutions
Written reports

Scientific Data Mining, Integration and Visualisation
Grid Information Systems
Portals and Portlets
Virtual Observatory as a Data Grid
Imaging, Medical Analysis and Grid Environments
Grid Scheduling
Provenance & Workflow
GeoSciences & Scottish Bioinformatics Forum
http://www.nesc.ac.uk/events/
Duties of a NeSC Research Leader
encouraging the uptake of Grid technologies in Astronomy and related fields
encouraging visitors, with whom you have a research overlap, to visit Edinburgh and work with you and other local colleagues
organising and running research workshops
assisting with the development of new core Grid and scientific database technologies
promoting NeSC within the Universities of Edinburgh and Glasgow through, for example, personal presentations and more widely at conferences and workshops.
…and all that in 0.5 FTE!
Slide from Dr Bob Mann: 30 September 2003
Three-way Alliance
Computing Science: Systems, Notations & Formal Foundation → Process & Trust
Theory: Models & Simulations → Shared Data
Experiment & Advanced Data Collection → Shared Data
Multi-national, Multi-discipline, Computer-enabled Consortia, Cultures & Societies
Requires Much Engineering, Much Innovation
Changes Culture, New Mores, New Behaviours
New Opportunities, New Results, New Rewards
Global Drivers of e-Science
Collaboration
Data Deluge
Digital Technology
Ubiquity
Cost reduction
Performance increase
Consequential Investment
UK e-Science £240 million + 80 companies
EU e-Infrastructure
USA cyberinfrastructure
…
Derived from Ian Foster’s slide at SSDBM July 03
It’s Easy to Forget How Different 2003 is From 1993
Enormous quantities of data: Petabytes
For an increasing number of communities, gating step is not collection but analysis
Ubiquitous Internet: >100 million hosts
Collaboration & resource sharing the norm
Security and Trust are crucial issues
Ultra-high-speed networks: >10 Gb/s
Global optical networks
Bottlenecks: last kilometre & firewalls
Huge quantities of computing: >100 Top/s
Moore’s law gives us all supercomputers
Organising their effective use is the challenge
Moore’s law everywhere
Instruments, detectors, sensors, scanners, …
Organising their effective use is the challenge
Global Knowledge Communities Often Driven by Data: E.g., Astronomy
No. & sizes of data sets as of mid-2002, grouped by wavelength
• 12 waveband coverage of large areas of the sky
• Total about 200 TB data
• Doubling every 12 months
• Largest catalogues near 1B objects
Data and images courtesy Alex Szalay, Johns Hopkins
Tera → Peta Bytes

                     Terabyte                  Petabyte
RAM time to move     15 minutes                2 months
1Gb WAN move time    10 hours ($1000)          14 months ($1 million)
Disk Cost            7 disks = $5000 (SCSI)    6800 Disks + 490 units + 32 racks = $7 million
Disk Power           100 Watts                 100 Kilowatts
Disk Weight          5.6 Kg                    33 Tonnes
Disk Footprint       Inside machine            60 m²

May 2003 Approximately Correct
See also Distributed Computing Economics, Jim Gray, Microsoft Research, MSR-TR-2003-24
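The WAN move times can be sanity-checked with simple arithmetic. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 Gb/s = 10^9 bits/s) and, for the ideal case, a fully utilised link; the slide states neither, so treat the numbers as a rough cross-check rather than a reconstruction of how the original figures were derived.

```python
TB = 10**12        # bytes in a decimal terabyte (assumption)
PB = 10**15        # bytes in a decimal petabyte (assumption)
LINK_BPS = 10**9   # 1 Gb/s link, assumed fully utilised in the ideal case


def ideal_transfer_seconds(size_bytes: int, link_bps: int = LINK_BPS) -> float:
    """Transfer time with no protocol overhead or contention."""
    return size_bytes * 8 / link_bps


def implied_throughput_bps(size_bytes: int, seconds: float) -> float:
    """Effective throughput implied by an observed transfer time."""
    return size_bytes * 8 / seconds


ideal_tb_hours = ideal_transfer_seconds(TB) / 3600
ideal_pb_days = ideal_transfer_seconds(PB) / 86400

# The slide quotes 10 hours per terabyte; work out what rate that implies.
implied_mbps = implied_throughput_bps(TB, 10 * 3600) / 1e6

# Scale the quoted 10 hours by 1000 and convert to months (30.44 days each).
scaled_pb_months = 10 * 1000 / 24 / 30.44

print(f"Ideal 1 TB move: {ideal_tb_hours:.1f} hours")
print(f"Ideal 1 PB move: {ideal_pb_days:.0f} days")
print(f"Throughput implied by 10 hours/TB: {implied_mbps:.0f} Mb/s")
print(f"10 hours/TB scaled to 1 PB: {scaled_pb_months:.1f} months")
```

Under these assumptions the quoted 10 hours per terabyte corresponds to an effective rate well below the nominal 1 Gb/s, and scaling it by 1000 lands near the 14 months quoted for a petabyte.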
Three Pillars of e-Science Research
Foundations, Technology, Applications
Apply known results, Focus for new work, Enable new science, Steering of development
Informatics, DCS, Physics & Astronomy, EPCC, ETF & Testbeds, edikt, Repositories
Computing industry, Research Departments, Research Institutes, Scottish universities, Commercial customers
Deployment, L2G, OGSA, UK TestGrid, Local Grid, IBM, Contract, Funding?, Testbed?
Research Applications: NanoCMOSGrid, AstroGrid, Physics, FireGrid, ScotGrid, GridPP, QCDGrid
Proposals: BioInf, ODD-Genes?, NeuroInf, e-Health
CS Research: Projects, Mobile Code, CoAKTinG, Scientific Data, DIRC, Engage, Automated Synthesis, Ontologies, Papers, Staff, EQUATOR, AMUSE
Middleware, etc.: Grid Core Programme, OGSA-DAI, DAIT, edikt, OMII, eldas, FirstDIG, BinX, PGPGrid, MS.NETGrid, BRIDGES, SunDCG, GridWeaver, ODD-Genes
Theme: DAIPAC (Data Access, Integration, Publication, Annotation and Curation)
Linguistic Corpora, ArkDB, edikt, Mouse Atlas, Repositories, OGSA-DAI & DAIT, DCC Proposal, ODD-Genes, Mobile Code, Scientific Data, CS Research, GTI, BRIDGES, SuperCOSMOS, AstroGrid, ScotGrid, GridPP, QCDGrid, Research Applications
OGSA
Infrastructure Architecture
OGSI: Interface to Grid Infrastructure
Data Intensive Applications for Science X
Compute, Data & Storage Resources
Distributed
Simulation, Analysis & Integration Technology for Science X
Data Intensive Users
Virtual Integration Architecture
Generic Virtual Data Access and Integration Layer
Structured Data
Integration
Structured Data Access
Structured Data: Relational, XML, Semi-structured
Transformation
Registry
Job Submission
Data Transport
Resource Usage
Banking
Brokering
Workflow
Authorisation
Data Services
GGF Data Access and Integration Svcs (DAIS)
OGSI-compliant interfaces to access relational and XML databases
Needs to be generalized to encompass other data sources (see next slide…)
Generalized DAIS becomes the foundation for:
Replication: Data located in multiple locations
Federation: Composition of multiple sources
Provenance: How was data generated?

“OGSA Data Services” (Foster, Tuecke, Unger, eds.)
Describes conceptual model for representing all manner of data sources as Web services
Database, filesystems, devices, programs, …
Integrates WS-Agreement
Data service is an OGSI-compliant Web service that implements one or more of base data interfaces:
DataDescription, DataAccess, DataFactory, DataManagement
These would be extended and combined for specific domains (including DAIS)
OGSA-DAI Approach
Reuse existing technologies and standards
OGSA, Query languages, Java, transport
Build portTypes and services which will enable:
controlled exposure of heterogeneous data resources on an OGSI-compliant grid
access to these resources via common interfaces using existing underlying query mechanisms
(ultimately) data integration across distributed data resources
OGSA-DAI (the software) seeks to be a reference implementation of the GGF DAIS WG standard
Can’t keep up with frequent standard changes, so software releases track specific drafts
See http://www.ogsadai.org.uk/ for details.
1a. Request to Registry for sources of data about “x”
1b. Registry responds with Factory handle
2a. Request to Factory for access to database
2b. Factory creates GridDataService to manage access
2c. Factory returns handle of GDS to client
3a. Client queries GDS with XPath, SQL, etc
3b. GDS interacts with database
3c. Results of query returned to client as XML
SOAP/HTTP
service creation
API interactions
Registry
Factory
Grid Data Service
Client
XML / Relational database
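The Registry, Factory and Grid Data Service interaction in steps 1a to 3c can be sketched in a few lines. This is an in-process illustration only: the class and method names (`Registry.find`, `Factory.create_service`, `GridDataService.perform`) are hypothetical, SQLite stands in for the database, and nothing here is the OGSA-DAI API, which exposes these roles as OGSI services over SOAP/HTTP.

```python
import sqlite3
import xml.etree.ElementTree as ET


class GridDataService:
    """Hypothetical stand-in for a GDS: runs a query, returns XML."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self._connection = connection

    def perform(self, sql: str) -> str:
        cursor = self._connection.execute(sql)         # 3b. GDS talks to the database
        columns = [d[0] for d in cursor.description]
        root = ET.Element("results")
        for row in cursor:
            row_el = ET.SubElement(root, "row")
            for name, value in zip(columns, row):
                ET.SubElement(row_el, name).text = str(value)
        return ET.tostring(root, encoding="unicode")   # 3c. results as XML


class Factory:
    """Hypothetical factory that creates a GDS for one database."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self._connection = connection

    def create_service(self) -> GridDataService:       # 2b and 2c
        return GridDataService(self._connection)


class Registry:
    """Hypothetical registry mapping a topic to a factory."""

    def __init__(self) -> None:
        self._factories: dict[str, Factory] = {}

    def register(self, topic: str, factory: Factory) -> None:
        self._factories[topic] = factory

    def find(self, topic: str) -> Factory:             # 1a and 1b
        return self._factories[topic]


# Demo data is made up purely for illustration.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE samples (name TEXT, reading REAL)")
db.execute("INSERT INTO samples VALUES ('sample-1', 1.5)")

registry = Registry()
registry.register("x", Factory(db))

factory = registry.find("x")                            # step 1
gds = factory.create_service()                          # step 2
xml_result = gds.perform("SELECT name, reading FROM samples")  # step 3
print(xml_result)
```

The point of the indirection is that the client only ever holds handles: it never opens the database itself, so access can be controlled at the service that the factory creates.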
Data Access & Integration Services
Third Party Delivery
[Diagram labels: steps 1 to 4; Data Set; Requestor Stub; Consumer Stub; Client API; dr]
1a. Request to Registry for sources of data about “x” & “y”
1b. Registry responds with Factory handle
2a. Request to Factory for access and integration from resources Sx and Sy
2b. Factory creates GridDataServices network
2c. Factory returns handle of GDS to client
3a. Client submits a sequence of scripts, each with a set of queries, to GDS with XPath, SQL, etc
3b. Client tells analyst
3c. Sequences of result sets returned to analyst as formatted binary described in a standard XML notation
SOAP/HTTP
service creation
API interactions
[Diagram labels: Data Registry; Data Access & Integration master; Client; Analyst; XML database; Relational database; GDS1, GDS2, GDS3; GDTS1, GDTS2; Sx; Sy]
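Integration across two resources, as in the Sx and Sy scenario, can be illustrated with a small sketch. It assumes that "integration" means joining rows from both sources on a shared identifier, which is only one of several possible forms; the function names, table layout and rows are hypothetical, and two in-memory SQLite databases stand in for the real resources.

```python
import sqlite3


def make_source(rows: list[tuple[str, str]]) -> sqlite3.Connection:
    """Build an in-memory database standing in for one data resource."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE facts (item_id TEXT, val TEXT)")
    db.executemany("INSERT INTO facts VALUES (?, ?)", rows)
    return db


def federated_lookup(source_x: sqlite3.Connection,
                     source_y: sqlite3.Connection) -> list[tuple[str, str, str]]:
    """Query both sources and combine rows that share an item_id."""
    sql = "SELECT item_id, val FROM facts"
    x_rows = dict(source_x.execute(sql).fetchall())
    y_rows = dict(source_y.execute(sql).fetchall())
    shared = sorted(x_rows.keys() & y_rows.keys())
    return [(item, x_rows[item], y_rows[item]) for item in shared]


# Made-up rows: only item "b" appears in both sources.
sx = make_source([("a", "x1"), ("b", "x2")])
sy = make_source([("b", "y2"), ("c", "y3")])

combined = federated_lookup(sx, sy)
print(combined)
```

In the scenario described by the steps, this combining role would sit behind the services that the factory creates, so the client and analyst see one result stream rather than two separate databases.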
Future DAI Services
“scientific” Application
coding scientific insights
Problem Solving Environment
Semantic Meta data
Application Code
Take Home Message
Data is a Major Source of Challenges
AND an Enabler of New Science, Engineering, Medicine, Planning, …
Information Grids
Support for collaboration
Support for computation and data grids
Structured data fundamental
Integrated strategies & technologies needed
OGSA-DAI is here now
Join in making better DAI services & standards
Bioinformatics is a Priority Application Area
There are many opportunities for International collaboration