Neil Chue Hong, Project Manager, EPCC
[email protected], +44 131 650 5957
OGSA-DAI: data access and integration
NERC GridGIS workshop
eSI, 1 February 2006
NERC GridGIS workshop - 1 February 2006 2
Overview
• The Data Deluge
– challenges of increasing data availability
– benefits of bringing data together
• OGSA-DAI
– overview
– use as a data integration base layer
Data Services: challenges to management
• Scale
– Many sites, large collections, many uses
• Longevity
– Research requirements outlive technical decisions
• Diversity
– No “one size fits all” solutions will work
– Primary Data, Data Products, Meta Data, Administrative data, …
• Many Data Resources
– Independently owned & managed
– No common goals
– No common design
– Work hard for agreements on foundation types and ontologies
– Autonomous decisions change data, structure, policy, …
– Geographically distributed
• and I haven’t even mentioned security yet!
What is a data service?
• An interface to a stored collection of data
– e.g. Google and Amazon
– web services
• But the data could be:
– replicated
– shared
– federated
– virtual
– incomplete
• Don’t care about the underlying representation
– do care about the information it represents
• Adding a service layer to existing data sources can improve composability
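The idea that clients care about the information and not its representation can be sketched as follows. This is a minimal illustration in Python, not OGSA-DAI code: the `DataService` interface, both backends, and the sample records are all hypothetical names invented for the example.

```python
import sqlite3
from typing import Iterable, Protocol


class DataService(Protocol):
    """Hypothetical interface: callers see records, not storage details."""

    def find(self, species: str) -> list[dict]: ...


class InMemoryService:
    """Backs the interface with a plain Python list."""

    def __init__(self, records: Iterable[dict]) -> None:
        self._records = list(records)

    def find(self, species: str) -> list[dict]:
        return [r for r in self._records if r["species"] == species]


class SqliteService:
    """Backs the same interface with a relational table."""

    def __init__(self, records: Iterable[dict]) -> None:
        self._db = sqlite3.connect(":memory:")
        self._db.execute("CREATE TABLE sightings (species TEXT, site TEXT)")
        self._db.executemany(
            "INSERT INTO sightings VALUES (:species, :site)", list(records)
        )

    def find(self, species: str) -> list[dict]:
        rows = self._db.execute(
            "SELECT species, site FROM sightings WHERE species = ? ORDER BY rowid",
            (species,),
        )
        return [{"species": s, "site": t} for s, t in rows]


def sites_for(service: DataService, species: str) -> list[str]:
    """Client code depends only on the interface, so backends are swappable."""
    return [r["site"] for r in service.find(species)]


# Invented sample data used by both backends.
SAMPLE = [
    {"species": "otter", "site": "Tay"},
    {"species": "heron", "site": "Forth"},
    {"species": "otter", "site": "Tweed"},
]
```

Calling `sites_for(InMemoryService(SAMPLE), "otter")` and `sites_for(SqliteService(SAMPLE), "otter")` gives the same answer, which is the composability benefit the slide points at.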
Use Cases for Data Services
• Data Filtering:
– Single source producing large amounts of data distributed to many sites downstream
• Data Discovery:
– many sources, many query entry points in a linked system
• Data Translation:
– source to sink, conversion of data model / structure
• Data Federation:
– many sources, linked to provide view as a single source
• Data Replication
– full or partial copies to improve throughput
• Data Integration (model aggregation)
– e.g. integration of time variant data, streams, files
• Data Integration (knowledge expansion)
– forming links between databases to increase knowledge
OGSA-DAI In One Slide
• An extensible framework for data access and integration.
• Expose heterogeneous data resources to a grid through web services.
• Interact with data resources:
– Queries and updates
– Data transformation / compression
– Data delivery
• Customise for your project using
– Additional Activities
– Client Toolkit APIs
– Data Resource handlers
• A base for higher-level services
– federation, mining, visualisation, …
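The query, transformation, compression, and delivery steps listed above can be pictured as a chain of small activities, each consuming the previous one's output. The sketch below is a generic Python illustration of that pattern and does not use the real OGSA-DAI activity API; `sql_query`, `to_csv`, `gzip_compress`, `deliver_to`, `run_request`, and the `station` table are all assumptions made for the example.

```python
import gzip
import sqlite3
from typing import Any, Callable

# An "activity" here is simply a function from one value to the next.
Activity = Callable[[Any], Any]


def sql_query(db: sqlite3.Connection, sql: str) -> Activity:
    """Query activity: ignores its input and returns the query's rows."""
    def run(_ignored: Any) -> list:
        return db.execute(sql).fetchall()
    return run


def to_csv(rows: list) -> str:
    """Transformation activity: rows to comma-separated text."""
    return "\n".join(",".join(str(v) for v in row) for row in rows)


def gzip_compress(text: str) -> bytes:
    """Compression activity: text to gzip-compressed bytes."""
    return gzip.compress(text.encode("utf-8"))


def deliver_to(sink: list) -> Activity:
    """Delivery activity: append the payload to an in-memory sink."""
    def deliver(payload: Any) -> Any:
        sink.append(payload)
        return payload
    return deliver


def run_request(activities: list) -> Any:
    """Run the activities in order, feeding each output to the next."""
    value = None
    for activity in activities:
        value = activity(value)
    return value


def make_demo_db() -> sqlite3.Connection:
    """Build a small in-memory database with invented rows."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE station (id INTEGER, name TEXT)")
    db.executemany("INSERT INTO station VALUES (?, ?)", [(1, "Leith"), (2, "Oban")])
    return db
```

A request is then just a list such as `[sql_query(db, "SELECT id, name FROM station ORDER BY id"), to_csv, gzip_compress, deliver_to(sink)]`; adding a new capability means writing one more function, which is the extensibility argument in miniature.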
The OGSA-DAI Framework
[Figure: layered architecture diagram. Labels: Application, Client Toolkit; OGSA-DAI service, Engine; Activities: SQLQuery, XPath, XSLT, readFile, GZip, GridFTP; Data Resources: JDBC, XMLDB, File; Databases: MySQL, DB2, SQLServer, XIndice, SWISSPROT.]
Intermediary
• Simple intermediary
– potential to accelerate development, logging, or filtering
• Persistent intermediary
– e.g. to allow efficient local indexing
[Figure: three Client / OGSA-DAI / Data Resource chains, labelled "Request & Response" between Client and OGSA-DAI and "DR messages" between OGSA-DAI and the Data Resource; the third chain adds an OGSA-DAI Private Store.]
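The two intermediary roles on this slide can be sketched as wrappers around a service function. This is an illustrative Python sketch under stated assumptions, not OGSA-DAI code: `backend`, `logging_intermediary`, and `caching_intermediary` are hypothetical names, and a dictionary stands in for the private store.

```python
from typing import Callable

Request = str
Response = list
Service = Callable[[Request], Response]


def backend(request: Request) -> Response:
    """Stand-in for a service sitting in front of a data resource."""
    data = {"rivers": ["Tay", "Forth", "Tweed"], "lochs": ["Ness", "Lomond"]}
    return data.get(request, [])


def logging_intermediary(service: Service, log: list) -> Service:
    """Simple intermediary: records each request, then forwards it unchanged."""
    def handle(request: Request) -> Response:
        log.append(request)
        return service(request)
    return handle


def caching_intermediary(service: Service) -> Service:
    """Persistent intermediary: keeps a private store of earlier answers."""
    store: dict = {}

    def handle(request: Request) -> Response:
        if request not in store:
            store[request] = service(request)  # only the first call goes downstream
        return store[request]
    return handle
```

Stacking them, for example `caching_intermediary(logging_intermediary(backend, log))`, shows the difference: the simple intermediary keeps no state about the data, while the persistent one answers repeated requests from its own store.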
Redirector, Coordinator, Network
• Allowing composition and decentralisation
[Figure: several topologies built from Client, OGSA-DAI and Data Resource nodes (DR1, DR2, DR3). Link labels: "Request & Response", "DR messages", "Data delivery" to a separate consumer, and "Request, Response & Data Transport" between OGSA-DAI services.]
Extensibility Example
[Figure: OGSA-DAI service and Engine with a "MultipleSQL GDS" component; an SQLQuery is shown alongside several JDBC / SQL connections to MySQL databases.]
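One way to read the "MultipleSQL" label is as an extension that sends the same SQL query to several databases and merges the rows. The Python sketch below illustrates that reading only; it is an assumption about the diagram and not the actual OGSA-DAI extension. The `counts` table, `make_db`, and `multiple_sql_query` are hypothetical, and in-memory SQLite stands in for the JDBC connections.

```python
import sqlite3


def make_db(rows: list) -> sqlite3.Connection:
    """Create an in-memory database with an invented 'counts' table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE counts (species TEXT, n INTEGER)")
    db.executemany("INSERT INTO counts VALUES (?, ?)", rows)
    return db


def multiple_sql_query(databases: dict, sql: str, params: tuple = ()) -> list:
    """Run the same SQL on every database and tag each row with its source."""
    results = []
    for name, db in databases.items():
        for row in db.execute(sql, params):
            results.append((name, *row))
    return results
```

With two databases registered as `site_a` and `site_b`, `multiple_sql_query(dbs, "SELECT species, n FROM counts WHERE species = ?", ("otter",))` returns one merged list in which each row carries the name of the database it came from.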
Map Retrieval: Current
[Figure labels: browser, Internet, OGC Service, GIS, Oracle, EDINA.]
Map Retrieval: Grid Prototype
Basic client to demonstrate proof of concept
[Figure labels: Client, OGSA-DAI 1, SO-OGC, OGC, GIS, Oracle, EDINA.]
Map Retrieval: Security
• Exploit NGS infrastructure to provide secure access layer
[Figure labels: Portlet, NGS Authentication, Allowed users dn, ODS 1, SO-OGC, OGC, GIS, Oracle, EDINA.]
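The "Allowed users dn" label suggests an allow-list of certificate distinguished names checked before a request reaches the data service. The Python sketch below shows that idea only. It assumes authentication has already verified the caller's DN, and `normalise_dn`, `make_guard`, and the example DN strings are hypothetical rather than taken from the NGS or OGSA-DAI.

```python
from typing import Callable, Iterable


def normalise_dn(dn: str) -> str:
    """Trim whitespace around each '/'-separated component of a DN string."""
    return "/" + "/".join(p.strip() for p in dn.split("/") if p.strip())


def make_guard(service: Callable[[str], str], allowed_dns: Iterable[str]):
    """Wrap a service so only callers whose DN is on the allow-list get through."""
    allowed = {normalise_dn(dn) for dn in allowed_dns}

    def handle(caller_dn: str, request: str) -> str:
        # Assumes caller_dn was already authenticated by an earlier layer.
        if normalise_dn(caller_dn) not in allowed:
            raise PermissionError(f"DN not authorised: {caller_dn}")
        return service(request)

    return handle
```

Keeping the check in a wrapper means the underlying map service needs no changes, which matches the slide's point about adding a secure access layer on top of existing infrastructure.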
Map Retrieval: Integration
• Exploit OGSA-DAI extensibility to add e.g. overlay
[Figure labels: Portlet, NGS Authentication; ODS 1, Oracle, Census; ODS 2, OGC, GIS, Oracle; ODS 3, Application data; connectors: SO-OGC, JDBC, SQL/XML.]
OGSA-DAI / EDINA prototyping work
• Stage 1: Using existing OGSA-DAI technology
• Stage 2: Extending OGSA-DAI
[Figure labels: GIS Client, OGSA-DAI service, GIS Activities, DeliverFromURL, HTTP Data Resource, WMS Server; flows: URL Input Parameters, HTTP Request, HTTP Response, Image/XML File.]
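The diagram shows URL input parameters being turned into an HTTP request to a WMS server. As a rough illustration, the Python sketch below builds such a request URL using the standard OGC WMS 1.1.1 GetMap parameter names. The WMS version, the function `build_getmap_url`, the host, the layer names, and the default coordinate system are assumptions for the example and do not come from the slides.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_getmap_url(
    base_url: str,
    layers: list,
    bbox: tuple,
    width: int,
    height: int,
    srs: str = "EPSG:27700",          # assumed: British National Grid
    image_format: str = "image/png",
) -> str:
    """Build a WMS 1.1.1 GetMap URL from the input parameters."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),
        "STYLES": "",                  # empty value requests default styles
        "SRS": srs,
        "BBOX": ",".join(str(v) for v in bbox),  # minx,miny,maxx,maxy
        "WIDTH": str(width),
        "HEIGHT": str(height),
        "FORMAT": image_format,
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical server and layers, used only to show the call shape.
EXAMPLE_URL = build_getmap_url(
    "http://wms.example.org/map",
    ["coast", "rivers"],
    (0, 0, 700000, 1300000),
    400,
    600,
)
```

A delivery step would then fetch this URL and pass the returned image or XML on to the client; the fetch itself is left out so the sketch runs without network access.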
Distributed Query Processing
• Higher level services building on OGSA-DAI
– specialised metadata extraction
• Execute queries in parallel over multiple data resources
• Queries mapped to algebraic expressions for evaluation
• Parallelism represented by partitioning queries
– Use exchange operators
• Equality based joins in current release
– supported types: long, integer, string, double and float
[Figure: example query plan. Operators: table_scan(protein), table_scan(proteinTerm) with termID=S92, reduce, hash_join(proteinId), op_call(Blast), exchange; partitions numbered 1, 2 and 3,4.]
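The combination of exchange operators and equality-based hash joins can be sketched in a few lines. The Python below is a toy illustration of the general technique, not the DQP implementation: `exchange`, `hash_join`, `parallel_join`, and the sample rows are hypothetical, and threads stand in for remote evaluators, so no real speed-up should be expected from it.

```python
from concurrent.futures import ThreadPoolExecutor


def exchange(rows: list, key_index: int, n_partitions: int) -> list:
    """Route each row to a partition chosen by hashing its join key."""
    partitions = [[] for _ in range(n_partitions)]
    for row in rows:
        partitions[hash(row[key_index]) % n_partitions].append(row)
    return partitions


def hash_join(left: list, right: list, left_key: int, right_key: int) -> list:
    """Equality join: build a hash table on the left input, probe with the right."""
    table: dict = {}
    for row in left:
        table.setdefault(row[left_key], []).append(row)
    return [l + r for r in right for l in table.get(r[right_key], [])]


def parallel_join(left, right, left_key, right_key, n_partitions=2) -> list:
    """Partition both inputs on the join key, then join each pair of partitions."""
    left_parts = exchange(left, left_key, n_partitions)
    right_parts = exchange(right, right_key, n_partitions)
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        joined = pool.map(
            hash_join,
            left_parts,
            right_parts,
            [left_key] * n_partitions,
            [right_key] * n_partitions,
        )
        return [row for part in joined for row in part]


# Invented rows loosely mirroring the protein / proteinTerm tables in the plan.
PROTEINS = [("P1", "kinase"), ("P2", "ligase"), ("P3", "protease")]
TERMS = [("P1", "S92"), ("P3", "S92"), ("P9", "S92")]
```

Because both inputs are partitioned with the same hash of the join key, matching rows always land in the same partition, so the per-partition joins together produce the same rows as a single `hash_join` over the whole inputs (possibly in a different order).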
DQP architecture
[Figure labels: Co-ordinator; three Evaluators; four OGSA-DAI services. Annotations: "Query: SQL & OQL", "OGSA-DAI activity", "WS-I only", "Using client toolkit", "All interfaces that are supported by toolkit".]
Contributing to OGSA-DAI
• Additional functionality:
– Provide activities which implement specific functionality
– Provide extra client functionality
– Provide different security mechanisms
– Provide higher level components and applications
• Different levels of contributions
– Based on OGSA-DAI?
– Works with OGSA-DAI?
– Part of OGSA-DAI?
In the near future
• A new version of the OGSA-DAI Engine
– should look mostly the same externally
– better support for concurrency, sessions and monitoring
• Implementing new versions of specifications
– DAIS Specifications
• Key things that we will be addressing:
– Performance
– A Security Model which can be applied across platforms
– Full Transactions framework, distributed transactions
– More data integration facilities
– Better abstraction over DBMS variation
• Application centric queries
– collaborating with other projects
• Research projects looking at:
– schema mapping
– extended data resources
Associated Meetings and Workshops
• DIALOGUE Workshops (http://www.datagrids.org)
– Data Integration Applications: Linking Organisations to Gain Understanding and Experience
– Bringing together Data Integration middleware and application providers with users
– Next one at NeSC: 9-10th February 2006
– http://www.nesc.ac.uk/esi/events/636/
• Next Generation Distributed Data Management (HPDC15, Paris)
– http://www.isi.edu/~annc/distributedDataWorkshop.html
• Data Management on Grids (VLDB’06, Seoul)
Conclusions
• The benefits of trying to integrate data are hindered by challenges such as heterogeneity, scale and distribution
• A common data service layer should make data integration easier
• OGSA-DAI provides an extensible, data service based framework which makes it easier to implement data integration
• GIS data is amenable to integration using data services
Further information
• The OGSA-DAI Project Site:
– http://www.ogsadai.org.uk
• The DAIS-WG site:
– http://forge.gridforum.org/projects/dais-wg/
• OGSA-DAI Users Mailing list
– [email protected]
– General discussion on grid DAI matters
• Formal support for OGSA-DAI releases
– http://bugs.ogsadai.org.uk/
• OGSA-DAI training courses