Japanese & UK N+N Data, Data everywhere and … Prof. Malcolm Atkinson Director www.nesc.ac.uk 3rd October 2003


Page 1

Japanese & UK N+N

Data, Data everywhere and …
Prof. Malcolm Atkinson
Director
www.nesc.ac.uk
3rd October 2003

Page 2

Discovery is a wonderful thing

Page 3

Web Hits - Domain

[Pie chart of web hits by domain. Legend: .ac.uk, .uk (other), unresolved, .ibm.com, .com (other), .net, .edu, .jp, .de, other. Slice labels shown: 47%, 4%, 15%, 17%, 4%, 9%.]

Page 4

Our job: Make the Party a Success every time

Computing Science: Systems, Notations & Formal Foundation → Process & Trust
Theory: Models & Simulations → Shared Data
Experiment & Advanced Data Collection → Shared Data

Multi-national, Multi-discipline, Computer-enabled Consortia, Cultures & Societies

Requires Much Engineering, Much Innovation
Changes Culture, New Mores, New Behaviours

Page 5

Integration is our Focus

Supporting Collaboration
  Bring together disciplines
  Bring together people engaged in shared challenge
  Inject initial energy
  Invent methods that work

Supporting Collaborative Research
  Integrate compute, storage and communications
  Deliver and sustain integrated software stack
  Operate dependable infrastructure service
  Integrate multiple data sources
  Integrate data and computation
  Integrate experiment with simulation
  Integrate visualisation and analysis

High-level tools and automation essential
Fundamental research as a foundation

Page 6

It’s Easy to Forget How Different 2003 is From 1993

Enormous quantities of data: Petabytes
  For an increasing number of communities
  Gating step is not collection but analysis
Ubiquitous Internet: >100 million hosts
  Collaboration & resource sharing the norm
  Security and Trust are crucial issues
Ultra-high-speed networks: >10 Gb/s
  Global optical networks
  Bottlenecks: last kilometre & firewalls
Huge quantities of computing: >100 Top/s
  Moore’s law gives us all supercomputers
  Ubiquitous computing
(Moore’s law)^2 everywhere
  Instruments, detectors, sensors, scanners, …

Derived from Ian Foster’s slide at SSDBM, July 03

Page 7

Tera → Peta Bytes

                     Terabyte                  Petabyte
RAM time to move     15 minutes                2 months
1Gb WAN move time    10 hours ($1000)          14 months ($1 million)
Disk Cost            7 disks = $5000 (SCSI)    6800 Disks + 490 units + 32 racks = $7 million
Disk Power           100 Watts                 100 Kilowatts
Disk Weight          5.6 Kg                    33 Tonnes
Disk Footprint       Inside machine            60 m²

May 2003 Approximately Correct
See also Distributed Computing Economics, Jim Gray, Microsoft Research, MSR-TR-2003-24
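
As a rough cross-check of the WAN move times quoted on this slide, the following sketch compares them with the ideal time at full line rate. It assumes decimal units (1 TB = 10^12 bytes, 1 Gb/s = 10^9 bits/s) and 30-day months; these assumptions, and the function names, are illustrative and are not stated on the slide.

```python
# Assumptions (not from the slide): decimal units and 30-day months.
TERABYTE = 10**12          # bytes
PETABYTE = 10**15          # bytes
LINK_BPS = 10**9           # 1 Gb/s line rate, in bits per second


def ideal_seconds(num_bytes, bits_per_second=LINK_BPS):
    """Transfer time at full line rate, ignoring all protocol overhead."""
    return num_bytes * 8 / bits_per_second


def implied_rate_mbps(num_bytes, quoted_seconds):
    """Effective throughput (Mb/s) implied by a quoted transfer time."""
    return num_bytes * 8 / quoted_seconds / 10**6


tb_ideal_hours = ideal_seconds(TERABYTE) / 3600
pb_ideal_days = ideal_seconds(PETABYTE) / 86400

# Quoted times from the slide: 10 hours per terabyte, 14 months per petabyte.
tb_rate = implied_rate_mbps(TERABYTE, 10 * 3600)
pb_rate = implied_rate_mbps(PETABYTE, 14 * 30 * 86400)

print(f"Ideal 1 TB move: {tb_ideal_hours:.1f} hours")
print(f"Ideal 1 PB move: {pb_ideal_days:.1f} days")
print(f"Implied rate for 1 TB in 10 hours: {tb_rate:.0f} Mb/s")
print(f"Implied rate for 1 PB in 14 months: {pb_rate:.0f} Mb/s")
```

Under these assumptions both quoted times correspond to an effective throughput of roughly a fifth of the nominal 1 Gb/s, which is consistent with the slide's "approximately correct" caveat.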

Page 8

Dynamically Move computation to the data

Assumption: code size << data size
Develop the database philosophy for this?
  Queries are dynamically re-organised & bound
Develop the storage architecture for this?
  Compute closer to disk? System on a Chip using free space in the on-disk controller
  Data Cutter a step in this direction
Develop the sensor & simulation architectures for this?
Safe hosting of arbitrary computation
  Proof-carrying code for data and compute intensive tasks + robust hosting environments
Provision combined storage & compute resources
Decomposition of applications
  To ship behaviour-bounded sub-computations to data
Co-scheduling & co-optimisation
  Data & Code (movement), Code execution
Recovery and compensation

Dave Patterson, Seattle, SIGMOD 98
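
The assumption "code size << data size" can be illustrated with a toy cost model that counts only the bytes crossing the network. This is a minimal sketch under stated assumptions: the function names and the example sizes are hypothetical, and a real planner would also weigh compute capacity, trust and scheduling, as the slide lists.

```python
def bytes_moved(code_bytes, data_bytes, result_bytes):
    """Network bytes transferred under each strategy (toy model)."""
    return {
        # Ship the data to where the code already runs.
        "move_data": data_bytes,
        # Ship the code to the data, then ship only the result back.
        "move_code": code_bytes + result_bytes,
    }


def choose_strategy(code_bytes, data_bytes, result_bytes):
    """Pick the strategy that transfers fewer bytes."""
    costs = bytes_moved(code_bytes, data_bytes, result_bytes)
    return min(costs, key=costs.get)


# Hypothetical sizes: 1 MB of code, 1 TB of data, 10 MB of results.
print(choose_strategy(10**6, 10**12, 10**7))
```

When the data set is small relative to the code and its results, the same model picks "move_data", which is why the slide states the size assumption explicitly.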

Page 9

[Layered architecture diagram; labels grouped by layer]

Data Intensive X Scientists
Data Intensive Applications for Science X
Simulation, Analysis & Integration Technology for Science X
Virtual Integration Architecture
  Generic Virtual Data Access and Integration Layer
    Structured Data Integration
    Structured Data Access
    Structured Data: Relational, XML, Semi-structured
    Transformation
OGSA
  Registry, Job Submission, Data Transport, Resource Usage, Banking, Brokering, Workflow, Authorisation
OGSI: Interface to Grid Infrastructure
Infrastructure Architecture
  Distributed Compute, Data & Storage Resources

Page 10

Data Access & Integration Services

[Diagram: Client, Registry, Factory, Grid Data Service (GDS) and an XML / Relational database, linked by SOAP/HTTP, service creation and API interactions.]

1a. Request to Registry for sources of data about “x”
1b. Registry responds with Factory handle
2a. Request to Factory for access to database
2b. Factory creates GridDataService to manage access
2c. Factory returns handle of GDS to client
3a. Client queries GDS with XPath, SQL, etc
3b. GDS interacts with database
3c. Results of query returned to client as XML
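
The numbered steps on this slide can be sketched in a few lines of Python. This is a minimal in-memory illustration, not the OGSA-DAI API: the class names `Registry`, `Factory` and `GridDataService`, their methods, and the `stars` table with its contents are all hypothetical, SQLite stands in for the database, and plain method calls stand in for SOAP/HTTP.

```python
import sqlite3
import xml.etree.ElementTree as ET


class GridDataService:
    """Hypothetical stand-in for a GDS managing access to one database."""

    def __init__(self, connection):
        self._conn = connection

    def perform(self, sql):
        # 3b: the service interacts with the database.
        cursor = self._conn.execute(sql)
        columns = [description[0] for description in cursor.description]
        # 3c: results are returned to the client as XML.
        root = ET.Element("results")
        for row in cursor:
            row_element = ET.SubElement(root, "row")
            for name, value in zip(columns, row):
                ET.SubElement(row_element, name).text = str(value)
        return ET.tostring(root, encoding="unicode")


class Factory:
    """Hypothetical factory that creates data services on request."""

    def __init__(self, connection):
        self._conn = connection

    def create_grid_data_service(self):
        # 2b and 2c: create the service and hand it back to the client.
        return GridDataService(self._conn)


class Registry:
    """Hypothetical registry mapping a topic to a factory."""

    def __init__(self):
        self._factories = {}

    def register(self, topic, factory):
        self._factories[topic] = factory

    def find_factory(self, topic):
        # 1a and 1b: look up sources of data about a topic.
        return self._factories[topic]


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stars (name TEXT, magnitude REAL)")
    conn.execute("INSERT INTO stars VALUES ('Sirius', -1.46)")

    registry = Registry()
    registry.register("x", Factory(conn))

    factory = registry.find_factory("x")          # steps 1a, 1b
    gds = factory.create_grid_data_service()      # steps 2a to 2c
    return gds.perform("SELECT name, magnitude FROM stars")  # steps 3a to 3c


print(demo())
```

The client only ever holds the handle returned by the factory, so the database connection stays behind the service boundary, mirroring the separation shown in the diagram.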

Page 11

Future DAI Services

[Diagram: Client, Analyst, Data Registry, Data Access & Integration master, services GDS1, GDS2, GDS3, GDTS1 and GDTS2, resources Sx and Sy, XML database, Relational database, Problem Solving Environment, Semantic Meta data, Application Code, and “scientific” Application coding, scientific insights. Links: SOAP/HTTP, service creation, API interactions.]

1a. Request to Registry for sources of data about “x” & “y”
1b. Registry responds with Factory handle
2a. Request to Factory for access and integration from resources Sx and Sy
2b. Factory creates GridDataServices network
2c. Factory returns handle of GDS to client
3a. Client submits a sequence of scripts, each with a set of queries, to GDS with XPath, SQL, etc
3b. Client tells analyst
3c. Sequences of result sets returned to analyst as formatted binary described in a standard XML notation
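
The idea of a network of services that accesses, transforms and integrates data from Sx and Sy can be sketched as a small pipeline. This is an illustration under stated assumptions only: the functions `gds`, `gdts` and `integrate`, the table layouts and the sample rows are hypothetical, two in-memory SQLite databases stand in for the resources, and nothing here reflects an actual DAI service interface.

```python
import sqlite3


def make_source(table, rows):
    """Stand-in for a data resource (Sx or Sy) as an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE {table} (id INTEGER, value REAL)")
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)
    return conn


def gds(conn, sql):
    """Hypothetical data service: run one query against one resource."""
    return conn.execute(sql).fetchall()


def gdts(rows, scale):
    """Hypothetical transformation service: rescale the value column."""
    return [(key, value * scale) for key, value in rows]


def integrate(rows_x, rows_y):
    """Join the two result sets on their shared id."""
    y_by_id = dict(rows_y)
    return [(key, vx, y_by_id[key]) for key, vx in rows_x if key in y_by_id]


def run_script():
    sx = make_source("x", [(1, 2.0), (2, 4.0)])
    sy = make_source("y", [(2, 10.0), (3, 30.0)])
    # Access Sx, transform its values, then integrate with Sy.
    rows_x = gdts(gds(sx, "SELECT id, value FROM x ORDER BY id"), scale=0.5)
    rows_y = gds(sy, "SELECT id, value FROM y ORDER BY id")
    return integrate(rows_x, rows_y)


print(run_script())
```

Each stage consumes and produces plain row lists, so stages can be chained in any order, which is one simple way to picture a network of access and transformation services.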

Page 12

A New World

What Architecture will Enable Data & Computation Integration?
  Common Conceptual Models
  Common Planning & Optimisation
  Common Enactment of Workflows
  Common Debugging
  …
What Fundamental CS is needed?
  Trustworthy code & Trustworthy evaluators
  Decomposition and Recomposition of Applications
  …
Is there an evolutionary path?

Page 13

Take Home Message

Information Grids
  Support for collaboration
  Support for computation and data grids
  Structured data fundamental
    Relations, XML, semi-structured, files, …
  Integrated strategies & technologies needed
OGSA-DAI is here now
  A first step
  Try it
  Tell us what is needed to make it better
  Join in making better DAI services & standards

Page 14

NeSC in the UK

[Map of UK sites: Cambridge, Newcastle, Edinburgh, Oxford, Glasgow, Manchester, Cardiff, Southampton, London, Belfast, Daresbury Lab, RAL, Hinxton. Also labelled: National e-Science Centre, HPC(x).]

Directors’ Forum
  Helped build a community
Engineering Task Force
Grid Support Centre
Architecture Task Force
  UK Adoption of OGSA
  OGSA Grid Market
  Workflow Management
Database Task Force
  OGSA-DAI
  GGF DAIS-WG
GridNet
e-Storm
Globus Alliance

Page 15

www.nesc.ac.uk