Upload
horatio-hopkins
View
212
Download
0
Tags:
Embed Size (px)
Citation preview
Lessons learnt building OGSA-
DAIEGC 2005
Malcolm Atkinson
Director
www.nesc.ac.uk
15th February 2005
Contents
Why invest in shared softwareFacilitating ApplicationsFacilitating Production useImproving code quality
The Data BonanzaOGSA-DAIInternational Collaboration
Foundation for economic high-quality e-Infrastructure
Summary & Take Home Message
Eternal Triangle
Applications
OperationsDevelopers
allwant reliability,dependability,
security, performancelong-term stability
How do you balance innovation and safe engineering
Eternal Triangle
Applications
OperationsDevelopers
allwant reliability,dependability,
security, performancelong-term stability
Many e-Infrastructures
Mix & Match model
Prefer just one e-Infrastructure
Testing & maintenance limit
innovation
One e-InfrastructureStable with good
management tools
Compromise: One e-Infrastructure – Select services & libraries
Agreed & Simple APIs / mappings wrap all common functions
Generating & Storing Data gets Easier
More, faster & cheaper digital devicesHigher resolutionGreater deploymentFaster duty cyclesDo they produce required metadata?
Larger, faster & cheaper storage technologyEconomic to store (multiple copies of) primary dataEconomic to store derived dataCrucial to store sufficient good quality metadata
Digital Communications – higher bandwidth & cheaper
Practical to access and copy remote dataLatency not decreasing – get what you need in a few trips
Multi-dimensional Growth
The Number of Data CollectionsGrows rapidly
The Size of each Data CollectionGrows rapidly
The Complexity of each Data CollectionGrows rapidly & autonomously
The Interdependencies between Collections
Grow rapidly
The User communitiesGrow rapidly – dispersed, diverse & mingling
Data Integration is Everything
MotivationNo business or research team is satisfied with one data resource
Data Curation Expertise Human CentredIntegration Human centredDomain-specialist driven
Dynamic specification of combination functionIterative processes
Revised request minutes later Revised request after months of thought
Sources inevitably heterogeneousTime-varying content, structure & policiesRobust, stable steerable integration services
Higher-level services over multiple resourcesFundamental requirements for (re)negotiation
Federation or Virtualisation
preceding integration
or kit of integration tools to be interwoven
with an application?
Scientific Insightneeded here
Data Integration is Everything
MotivationNo business or research team is satisfied with one data resource
Data Curation Expertise Human CentredIntegration Human centredDomain-specialist driven
Dynamic specification of combination functionIterative processes
Revised request minutes later Revised request after months of thought
Sources inevitably heterogeneousTime-varying content, structure & policiesRobust, stable steerable integration services
Higher-level services over multiple resourcesFundamental requirements for (re)negotiation
Federation or Virtualisation
preceding integration
or kit of integration tools to be interwoven
with an application?
Scientific Insightneeded here
This is the motivation for & home of OGSA-DAIIdentify the recurrent requirements
Provide one infrastructure that meets themWide use enables a robust, reliable and
supported set of facilitiesSteadily increase power of facilities
Steadily raise the level of abstractionStandardise & achieve multi-national
investment
OGSA-DAI Project
OGSA-DAI is one of the Grid Middleware Centre Projects Collaboration between:
– EPCC– IBM (+ Oracle in phase 1)– National e-Science Centre– Manchester University– Newcastle University
Project funding:– OGSA-DAI, 2002-03,
• £3.3 million from the UK Core e-Science funding programme
– DAIT (DAI Two), 2003-06• £1.3 million from the UK e-Science Core Programme II
"OGSA-DAI" is a trade mark
Funded by UK’s Department of Trade & Industry + Engineering
& Physical Sciences Research Council as part of the e-Science
Core Programme
Thanks to Mario Antonioletti for these EPCC slides
Geographically Distributed Team
EPCC Team, Edinburgh
IBM Development Team, Hursley
ESNW, Manchester
Neresc, Newcastle
NeSC, Edinburgh
Communication vital
Web Site
Support Desk
Web Site
Training
Telephone conferences
Face-to-face meetings
Access Grid meetings
Bugzilla
Twiki
IRC
Mailing Lists
Infrastructure
IBM
EPCC
Test Machines & Databases
Test Machines & Databases
IRC
Mailing Lists
Access Grid
Telephone
NeSC
CVS Repository
Twiki
Basic Operational Model
Data Resource
Container
DAISGR
Client GDSF
GDS
More Complex Behaviour
Data Resource
Container
Client GDSGDT
Data Resource
Container
GDS
GDT
Deliver data back to the client.
Data Resource
Deliver data to
a third
party.
Deliver data another GDS.
And there's a lot more that you can do …
GridData
Service
DataResource
PerformDocument
ResponseDocument
ResultData
gzipCompression
zipArchive
xslTransform
Predefined Activities
sqlQueryStatement
sqlStoredProcedure
sqlUpdateStatement
sqlBulkLoadRowset
xPathStatement
xUpdateStatement
xQueryStatement
xmlResourceManagement
xmlCollectionManagementrelationalResourceManager
inputStream
outputStream
DeliverFromURL
DeliverToURL
DeliverToGFTP
DeliverFromGFTP
DeliverToStream
DeliverFromGDT DeliverToGDT
DeliverToFileDeliverFromFile
fileWritingdirectoryAccess
fileAccessfileManipulation
Developers encouraged to roll their own – many do
INFOD
GridFTP
DT
TMBoFADFBoF
GGF
Arch Data ISP SRM
OGSA
CMM GIR
GSM
GFS DAIS
CGS GRAAP
Policy
IETF
SNMP
Other Standards Bodies
????W3C
XQuery
ANSI
SQL
DMTF
CIM
OASIS
WS-DM
WS-RF
WS-N
WSPolicy
WSAddress
OREP
JDBC
JCP
DFDL
Standardisation is important
http://forge.gridforum.org/projects/dais-wg
Example Projects Using OGSA-DAI
OGSA-DAI(http://www.ogsadai.org.uk)
AstroGrid(http://www.astrogrid.org/)
BioSimGrid(http://www.biosimgrid.org/)
BioGrid(http://www.biogrid.jp/)
Bridges(http://www.brc.dcs.gla.ac.uk/projects/bridges/)
eDiaMoND (http://www.ediamond.ox.ac.uk/)
FirstDig(http://www.epcc.ed.ac.uk/~firstdig/)
GeneGrid(http://www.qub.ac.uk/escience/projects.php#genegrid)
GEON(http://www.geongrid.org/)
IU RGRBench(http://www.cs.indiana.edu/~plale/projects/RGR/OGSA-DAI.html)
myGrid(http://www.mygrid.org.uk/)
N2Grid(http://www.cs.univie.ac.at/institute/index.html?project-80=80)
ODD-Genes(http://www.epcc.ed.ac.uk/oddgenes/)
OGSA-WebDB(http://www.gtrc.aist.go.jp/dbgrid/)
INWA(http://www.epcc.ed.ac.uk/)
OGSA-DAI User Project classification
OGSA-DAI
BiologicalSciences
PhysicalSciences
Commercial Applications
ComputerSciences
• FirstDig
• INWA
• Bridges • AstroGrid
• BioSimGrid• BioGrid
• eDiamond• myGrid
• ODD-Genes
• N2Grid
• GEON
• MCS
• IU RGBench
• OGSA Web-DB
• GeneGrid
• GridMiner
OGSA-DAI Downloads690 downloads since May 04Actual user downloads not search engine crawlersDoes not include downloads as part of GT3.2 releasesData from 13 December 04
Total of 966 registered users
R1.0 (Jan 03) 107R1.5 (Feb 03) 110R2.0 (Apr 03) 254R2.5 (Jun 03) 294R3.0 (Jul 03) 792R3.1 (Feb 04) 655R4.0 (May 04) 939R5.0 (Dec 04) 138
Total 3323
Total Downloads
China25%
United States14%
Japan8%
United Kingdom22%Unknown
8%
Germany5%
Taiwan2%
Italy3%
Austria2%
Further Information
The OGSA-DAI Project Site:– http://www.ogsadai.org.uk
The DAIS-WG site:– http://forge.gridforum.org/projects/dais-wg/
OGSA-DAI Users Mailing list– [email protected]– General discussion on grid DAI matters
Formal support for OGSA-DAI releases– http://www.ogsadai.org.uk/support– [email protected]
OGSA-DAI training courses
Platforms & Users
Currently on OGSI (GT3)Discontinue support when ~ R6
Currently on WS-I+ (OMII1)Will be in next releaseWithout asynchronous & Third-party data transfers
Currently in Preview on WSRF (GT4)Not yet a supported release ~GT4 release
Users about equally dividedSome still use R3!
Re-designed architectureLong list of requested featuresMany projects want long-term support commitments
DS (Mobius)DS (DAIS)
DS (OGSA-DAI)
DRAM
initiateDataService( )
Registry DSDL
Request TADD
DR
Initiates/Manages
0
n
Id – UUIDperformRequest()
Single Service SessionId - UUID
DIDTypeFormat
Txn
Response TADD
Compute&
storageresources
Registry2
Local Store
DRs
DRs
LoggingService
33
OGSA-DAI & Triangle
Applications
OperationsDevelopers
allwant reliability,dependability,
security, performancelong-term stability
One client libraryIncreasingly important
More abstraction needed
Mostly use client library
Some use protocolsNo extra tools yet
No tools or interfaces yetMotivation for new
architecture
Compromise: One e-Infrastructure – Select services & librariesDevelop: Higher-level Client Library & Tools, more Integration, Operations support
OGSA-DAI team needs
Agreed data naming systemOGSA effort – 3-level: human, abstract & physical address
Addresses of state & data resourcesWS-Addressing
Life Time ManagementWS-RF Resource LifeTime – “imported into OGSA-DAI”
PropertiesWS-Resource Properties – easily implemented look alike
Error reportingWS-BaseFaults
Agreed Data Transport AbstractionOGSA-Data Design + InfoD meeting at GGF13 SeoulMost of all we need these standard with APIs
only one of each!
The Ultimate Challenge
Testing Large-scale, always on, distributed persistent infrastructureProduct space of platforms and external components{Oracle, DB2, MySQL, Postgres, …}{Xindice, eXist, …}{files, DFDL, semi-structured, indexed, text-mined, …} {java, J2EE, .NET, …} {OGSI, WS-I+, WSRF, …} {client libraries in: java, C, C++, C#, …} …Growing proportion of team effort – though mechanised
MaintenanceFixing bugs (<20%), Dealing with context changes (>20%)Providing new required functions (~60%)Better coding and testing can at best save 20%Maintenance is a life sentenceNo remission for good behaviour!
Grows to dominate costs and
limit development
To Meet the Challenge 1
Agree an Architecture: OGSA + NextGRID + …
To partition the problem spaceTo raise the level of abstraction & discourseTo provide a framework for collaboration
Environments in which alternative solutions can perform roles
Incremental progress to agreement Profiles
Invest in APIsProtect Application Developer investmentProtect Middleware subsystem investmentClarify requirementsSpecify semantics
To Meet the Challenge 2
Raise the Level of AbstractionGreater benefit for Application DevelopersGreater benefit for other Middleware DevelopersEasier comprehension for designers, implementers & exploitersOpportunity for implementation improvement increases
Form ≤ 2 International Alliances / ConsortiaAgree on target e-Infrastructure function and propertiesSafety of Open Source – future maintenance always possible
But only affordable through alliances
Agree partitioned R&D task: Country X delivers A and Country Y delivers B
Incremental development of relationship Compete partnership trust mutual dependence
Avoid brittle dependency Minimum functionality base platform in which subsystems can work OGSA base profile a good candidate
To Meet the Challenge 3
Desist from Starting from Scratch“pencil sharpening” auto-distractionSort-term illusion of progress and successLong-term – another body of software to maintainDivision of effortYour legacy: Transition and translation problemsYour legacy: Another body of software to maintain
Darwinian survival of the fittestDoesn’t result in “best”Expensive & slow
Some diversity and competition valuableBut don’t let it split users, developers, operations, training, …
Discard ego trips, “nationalism” & excess
competition – wasteful and harm
ful
Observations 1
E-InfrastructureDisruptive technology
Will change what we do and how we do it
Opportunity to reap major benefits Is Europe prepared? Education, Education, Education Will we focus effort? Critical mass. Don’t divide & conquer ourselves
Education essential
Must Collaborate InternationallyTo agree, build and operate e-InfrastructureTo give adequate support to our usersTo afford maintenance and operationsTo facilitate international research, business & decisions
Observations 2 – OGSA-DAI
>30 staff years of effort, Release 5, coming soon R63 platforms, >1000 users, world-wide use, diversityBacked by standards effort – hard work!User community & User groupMajor investment in client-side API
Beneficial for users, application developers & trainingProvides implementation optionsUndergoing re-design
Flexible and Extensible frameworkEssential for applicationsContributors build using this – webDB, streaming data, …No contributor code shipped with releases yet
Diverse demand for new featuresDiverse & multi-platformLooking for reassurance about future support and maintenance
Perhaps 25% of original OGSA-DAI vision built so far
Reserve slides for questions
End of slide show
From OGSA Status and Future, Hiro Kishimoto and Ian Foster, GGF12slide originally from Michael Behrens, DISA consultant
GRID COMPUTING
UTILITY COMPUTING
DISTRIBUTED COMPUTING
Core Services
Base Profile
WS-Addressing
Privacy
WS-BaseNotificati
on
CIM/JSIM
WSRF-RAP
WSDM
WS-Security
Naming
OGSA-EMS OGSA Self Mgmt
GFD-C.16
GGF-UR
Data Model
HTTP(S)/SOAP
Discovery
SAML/XACML
WSDL
WSRF-RL
Trust
WS-DAI
VO Management
Information
Distributed query processing
ASP
Data Centre
Use Cases & Applications
Collaboration Multi Media Persistent Archive
Data Transport
WSRF-RP
X.509
Standard Evolving Gap Hole
Provided by David Snelling (Fujitsu) and Mark Linesch (GGF & HP).
47
Why Invest in OGSArchitecture 4
IntegrationCompletenessAbstraction
CooperationOGSA partitions the e-Infrastructure implementationEncourages independent concurrent & coordinated
Development or evaluation of each part’s standards R&D on implementation of each part
Promises assembly of the partsBasic profile provides context for concurrent R&D
Context for each M/W developer to build against Reduced interdependence – each can deliver if others don’t
Focus effort on reaching
minimum threshold
that makes this work
48
Back OGSA more
Invest effort in OGSAInvestigating, evaluating, contributing, commentingImplementing profilesUsing it
It is the ONLY show in townWhich offers
integration, completeness, abstraction A foundation for collaboration
UK focus on Data Design TeamUK efforts in other design teams
EMS, Grid markets, JSDL, GSM, mySpace, …
49
Use OGSA for Collaboration
Big push to Reach OGSA Basic ProfileSufficient platform, context & frameworkFor safely partitioning further R&D
Agree a division of workUpgrading / alternative trade-off componentsNew componentsHigher level facilities
Minimise duplicationMaximise combined efforts to deliver
Function, Stability, Quality & Abstraction
Bury the egos,
project competition & national pride silos