Using OGSA-DAI in a commercial environment

Terry Sloan, EPCC

Telephone: +44 131 650 5155

Email: tsloan@epcc.ed.ac.uk

Overview

FirstDIG
INWA
Outstanding issues raised by these projects

First Data Investigation on the Grid: FirstDIG

http://www.epcc.ed.ac.uk/firstdig/

Motivation

Few UK e-Science projects involve service companies such as First plc

First plc
– Operates worldwide in a variety of transport sectors
– Over 10000 vehicles in the UK, 23% of the market
– UK’s largest operator

The challenge for First
– Meeting the needs of the travelling public whilst making money
– Data integration and mining may assist, but there is a huge range of fragmented data sources

Data Sources in the Bus Industry

Many different kinds of data involved with running a bus company
– Mileage, revenue, customer contact, schedule, fuel consumption, vehicle maintenance, routes…

Many means to collect data
– Manually entered data at depot
– Data collected on buses from ticket machines
– Data collected on buses from GPS systems
– GPS system notes when bus passes through a predefined “footprint” and records the time at which this happens

Answering Business Questions

Want to combine data from more than one source:
– Complaints versus Lateness
– Revenue versus Lost Miles
– Complaints versus Lost Miles

Want data aggregated in some way:
– By Service
– By Day

Want to consider subsets of the data
– e.g. weekdays only
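
As an illustration of the kind of question listed above, the following Python sketch combines two small in-memory record sets (complaints and lost miles), keeps weekdays only, and aggregates by service and day. All records, field names and service numbers are invented for illustration; none of them are FirstDIG data, and the real project worked against databases rather than Python lists.

```python
from collections import defaultdict
from datetime import date

# Hypothetical records; none of these values come from First's systems.
complaints = [
    {"service": "52", "day": date(2004, 3, 1)},    # Monday
    {"service": "52", "day": date(2004, 3, 1)},
    {"service": "120", "day": date(2004, 3, 2)},   # Tuesday
    {"service": "52", "day": date(2004, 3, 6)},    # Saturday, filtered out
]
lost_miles = [
    {"service": "52", "day": date(2004, 3, 1), "miles": 14.5},
    {"service": "120", "day": date(2004, 3, 2), "miles": 3.0},
    {"service": "52", "day": date(2004, 3, 6), "miles": 8.0},
]


def is_weekday(d):
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    return d.weekday() < 5


def complaints_versus_lost_miles(complaints, lost_miles):
    """Return {(service, day): (complaint_count, lost_miles)} for weekdays."""
    counts = defaultdict(int)
    miles = defaultdict(float)
    for row in complaints:
        if is_weekday(row["day"]):
            counts[(row["service"], row["day"])] += 1
    for row in lost_miles:
        if is_weekday(row["day"]):
            miles[(row["service"], row["day"])] += row["miles"]
    # Keep every (service, day) pair seen in either source.
    keys = sorted(set(counts) | set(miles))
    return {k: (counts.get(k, 0), miles.get(k, 0.0)) for k in keys}


summary = complaints_versus_lost_miles(complaints, lost_miles)
for (service, day), (n, m) in summary.items():
    print(service, day.isoformat(), n, m)
```

The same three steps (combine, aggregate, subset) apply whichever pair of sources is being compared; only the key used for matching rows changes.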

Disparate Databases

Data is typically stored in disparate databases
– Various reasons for this, e.g. incremental construction of systems
– Not a problem for day-to-day running and querying but…

Introduces challenges for Data Analysis
– Systems introduced at different times
– Different database engines
– Different front-ends
– Different operating systems
– Different physical locations
– Different ways of representing data

These issues are NOT unique to buses

OGSA-DAI

OGSA-DAI
– Open Grid Services Architecture: Data Access and Integration
– Potentially provides a solution
– Need business users to make transition from science to commerce

Grid middleware:
– Assists with the access and integration of data from separate data sources via the Grid
– Represents databases as Grid Services
– Enables access from other machines in a secure manner

FirstDIG Achievements

Deployment at First South Yorkshire

Combined two databases to answer real business questions
– The Customer Contact System
• Microsoft Access
• Information on customer complaints e.g. time, service, nature
– The Mileage database
• dBASE IV
• Information on bus mileage e.g. lost miles

Produced generic Grid Data Service Browser
– SQL access including joins across the databases
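
To show the shape of a join across two separate databases, here is a minimal Python sketch. It does not use OGSA-DAI, Microsoft Access or dBASE IV: SQLite's `ATTACH DATABASE` stands in for the two independent sources, and the table names, column names and rows are all hypothetical.

```python
import sqlite3

# One connection, two separate in-memory databases. "main" stands in for the
# customer contact system and "mileage" for the mileage database.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS mileage")

conn.execute("CREATE TABLE main.complaints (service TEXT, day TEXT, nature TEXT)")
conn.execute("CREATE TABLE mileage.lost_miles (service TEXT, day TEXT, miles REAL)")

# Hypothetical rows, invented for this sketch.
conn.executemany(
    "INSERT INTO main.complaints VALUES (?, ?, ?)",
    [
        ("52", "2004-03-01", "late"),
        ("52", "2004-03-01", "did not stop"),
        ("120", "2004-03-02", "late"),
    ],
)
conn.executemany(
    "INSERT INTO mileage.lost_miles VALUES (?, ?, ?)",
    [
        ("52", "2004-03-01", 14.5),
        ("120", "2004-03-02", 3.0),
    ],
)

# Complaints versus lost miles, by service and day. This assumes at most one
# lost_miles row per (service, day); otherwise the count would be inflated.
query = """
SELECT c.service, c.day, COUNT(*) AS complaints, m.miles
FROM main.complaints AS c
JOIN mileage.lost_miles AS m
  ON m.service = c.service AND m.day = c.day
GROUP BY c.service, c.day, m.miles
ORDER BY c.service, c.day
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
conn.close()
```

In this stand-in a single engine sees both databases, which sidesteps the differences in engine, platform and location that middleware has to deal with.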

First Grid Data Service Browser

Informing Business & Regional Policy: Grid-enabled fusion of global data & ‘local’ knowledge

INWA

http://www.epcc.ed.ac.uk/~inwa/

INWA

An e-Social Science demonstrator
– Demonstrates how grid technologies can improve business
– Combining private and public data sources
– Finance and Telecommunications

Uses many grid technologies
– TOG from Sun DCG provides access to remote HPC resource
– OGSA-DAI provides access control and discovery of distributed heterogeneous data resources
– FirstDIG grid data service browser provides SQL access to OGSA-DAI enabled resources
– Globus Toolkit 2 and 3

INWA Grid Infrastructure

[Diagram. Labels: EPCC; Curtin; User@Edinburgh; User@Curtin; UK Property data service; Australian Property data service; Globus Grid; FirstDIG; Grid Engine; TOG; Bank; Telco; Bank data; Telco data]

References

EPCC – http://www.epcc.ed.ac.uk/

FirstDIG – http://www.epcc.ed.ac.uk/firstdig/

OGSA-DAI – http://www.ogsadai.org.uk

INWA – http://www.epcc.ed.ac.uk/~inwa

Sun Data & Compute Grids – http://www.epcc.ed.ac.uk/sungrid/

Transfer-queue Over Globus (TOG) – http://gridengine.sunsource.net/project/gridengine/tog.html

Outstanding issues raised by FirstDIG & INWA

Outstanding Issues: Usability

OGSA-DAI is middleware, client toolkit helps

Incorporation of demo First browser helpful’ish

But really want …

Interfaces to real data analysis & DBMS packages e.g. SPSS

Otherwise users could end up building applications that replicate these e.g. the First Grid Data Service Browser

Want to be able to point Access, Excel, etc. at a grid data source and examine it

Outstanding Issues: Data

CSV (comma-separated value) data sources
– Are common but current JDBC-ODBC drivers do not have sufficient functionality (NOT an OGSA-DAI issue per se)

No support for BIT type field
– And others e.g. BOOLEAN, BINARY, etc.

Certain characters (e.g. &, >) are not handled by the OGSA-DAI XML parser
– Company names often have & in them

Dates from certain sources not handled properly
– First Grid Data Service has to handle this internally
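
The special-character issue above is an instance of a general XML rule: a bare & in element content makes a document ill-formed, so such values have to be escaped before they are embedded. The Python sketch below illustrates that rule with the standard library only. It says nothing about how the OGSA-DAI parser works internally or how the issue was resolved there, and the company name is hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

company = "Smith & Sons > Transport"  # hypothetical value

# Embedded as-is, the bare ampersand makes the document ill-formed,
# so a conforming parser rejects it.
try:
    ET.fromstring("<name>" + company + "</name>")
    parsed_raw = True
except ET.ParseError:
    parsed_raw = False

# escape() replaces &, < and > with entity references.
safe_xml = "<name>" + escape(company) + "</name>"

# Parsing the escaped form gives back the original text.
round_tripped = ET.fromstring(safe_xml).text

print(parsed_raw)
print(safe_xml)
print(round_tripped)
```

Escaping at the point where values are written into XML means the data itself, such as company names containing &, does not have to be altered at source.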

Outstanding Issues: Miscellaneous

Security
– Rolemap file is not encrypted
– If one GDS accesses another GDS the user security credentials are not passed on so it does not work

Installation & Testing
– Install & Set-up
• Well-explained but still a fair amount of user effort involved
– Lack of an example OGSA-DAI site to point at to test that your OGSA-DAI installation works

Outstanding Issues: Miscellaneous

Installation & Testing
– Lack of an example OGSA-DAI site to point at to test that your OGSA-DAI installation works

Large result sets
– Can increase JVM size but this is not scalable
– This occurred on most datasets

Integration
– DQP is a start … (Linux, OQL)

Why use OGSA-DAI?
– Easysoft etc.
– http://www.easysoft.com/products/2001/main.phtml
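
On the large result sets issue above: one general way to avoid holding a whole result set in memory is to consume it in fixed-size batches. The Python sketch below shows that pattern with the standard DB-API `fetchmany` call against an in-memory SQLite table. It is a generic illustration, not a description of how OGSA-DAI delivers results, and the table name, row count and batch size are hypothetical.

```python
import sqlite3


def iter_batches(cursor, batch_size):
    """Yield lists of at most batch_size rows until the cursor is exhausted."""
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            return
        yield batch


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE journeys (id INTEGER, miles REAL)")
# Hypothetical data: 1000 journeys of 1.5 miles each.
conn.executemany(
    "INSERT INTO journeys VALUES (?, ?)",
    [(i, 1.5) for i in range(1000)],
)

cursor = conn.execute("SELECT id, miles FROM journeys")
total_miles = 0.0
batch_count = 0
for batch in iter_batches(cursor, 100):
    # Only one batch of rows is held by this loop at a time.
    batch_count += 1
    total_miles += sum(miles for _id, miles in batch)

print(batch_count, total_miles)
conn.close()
```

With this pattern the memory needed by the consumer depends on the batch size rather than on the size of the result set.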

Why use OGSA-DAI ?

‘a RDBMS engine that appears to client apps as a fully conformant ODBC 3.5 data source….can be used to provide real-time, heterogeneous access to multiple target data sources.’