Dynamically Adaptive Weather Analysis and Forecasting in LEAD: Issues in Data Management, Metadata, and Search

Beth Plale
Director, Center for Data and Search Informatics
School of Informatics, Indiana University, US

May 29, 2007


Introduction

• Linked Environments for Atmospheric Discovery (LEAD) makes meteorological data, forecast models, and analysis and visualization tools available to anyone who wants to interactively explore the weather as it evolves.
• In this talk we describe key data management aspects of the project, namely the work being carried out in the Center for Data and Search Informatics at Indiana University.

Infrastructure is portal based: that is, all services are available through a web server.

e-Science Gateway Architecture

• User’s Grid Desktop
• Grid Portal Server
• Gateway Services
  • Proxy Certificate Server (Vault)
  • Events & Messaging
  • Resource Broker
  • Community & User Metadata Catalog
  • Workflow engine
  • Resource Registry
  • Application Deployment
• Core Grid Services
  • Execution Management
  • Information Services
  • Self Management
  • Data Services
  • Resource Management
  • Security Services
• Resource Virtualization (OGSA)
• Compute Resources, Data Resources, Instruments & Sensors

[1] Service Oriented Architectures for Science Gateways on Grid Systems, Gannon, D., et al.; ICSOC, 2005

Typical weather forecast runs as workflow

• Stages: Pre-Processing, Assimilation, Forecast, Visualization
• Workflow components: arpssfc, arpstrn, Ext2arps-ibc, 88d2arps, mci2arps, ADAS assimilation, arps2wrf, nids2arps, WRF, Ext2arps-lbc, wrf2arps, arpsplot, IDV viz
• Input data:
  • Terrain data files
  • Surface data files
  • ETA, RUC, GFS data
  • Radar data (level II)
  • Radar data (level III)
  • Satellite data
  • Surface, upper air mesonet & wind profiler data
• ~400 data products consumed & produced (transformed) during workflow lifecycle

To set up a workflow experiment, we select a workflow (not shown), then set model parameters here.

Supported community data collections

Data Integration

Data sources:
• CASA radar collection, months (ftp)
• Latest 3 days Unidata IDD distribution (XML web server)
• Level II and III radar, latest 3 days (XML web server)
• ETA, NCEP, NAM, METAR, etc. (XML web server)

Sites: Oklahoma, Indiana, Colorado

Local view: crosswalk point of presence supports crawling, publishes difference list as LEAD Metadata Schema (LMS) documents

Globally integrated view: Data Catalog Service
• Crawler crawls catalogs
• Builds index of results
• Index: XMLDB native XML database and Lucene for index
• Web service API
• Boolean search query with spatial/temporal support
• List of results returned as LEAD Metadata Schema documents
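A Boolean query with spatial and temporal constraints can be pictured with a small in-memory sketch. This is only an illustration: the `CatalogEntry` fields, the `search` function, and the sample entries are hypothetical and do not reflect the Data Catalog Service API or the LEAD Metadata Schema.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CatalogEntry:
    """One indexed data product (hypothetical fields, not the LMS schema)."""
    name: str
    keywords: frozenset
    lat: float
    lon: float
    time: datetime


def search(entries, all_of=(), any_of=(), none_of=(), bbox=None, time_range=None):
    """Return names of entries matching a Boolean keyword query.

    all_of are AND terms, any_of are OR terms, none_of are NOT terms.
    bbox is (min_lat, min_lon, max_lat, max_lon); time_range is (start, end).
    """
    hits = []
    for e in entries:
        if not set(all_of) <= e.keywords:
            continue
        if any_of and not (set(any_of) & e.keywords):
            continue
        if set(none_of) & e.keywords:
            continue
        if bbox is not None:
            min_lat, min_lon, max_lat, max_lon = bbox
            if not (min_lat <= e.lat <= max_lat and min_lon <= e.lon <= max_lon):
                continue
        if time_range is not None:
            start, end = time_range
            if not (start <= e.time <= end):
                continue
        hits.append(e.name)
    return hits


# Illustrative entries only; names, keywords, and coordinates are made up.
CATALOG = [
    CatalogEntry("radar-ok", frozenset({"radar", "level2"}), 35.2, -97.4, datetime(2007, 5, 28, 12)),
    CatalogEntry("radar-in", frozenset({"radar", "level3"}), 39.2, -86.5, datetime(2007, 5, 28, 18)),
    CatalogEntry("metar-co", frozenset({"metar", "surface"}), 40.0, -105.3, datetime(2007, 5, 20, 6)),
]
```

For example, `search(CATALOG, all_of=["radar"], none_of=["level3"])` keeps only the radar entry that is not tagged level3. A real service would answer such queries from the index rather than by scanning every entry.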

LEAD Personal Workspace

• CyberInfrastructure (CI) extends the user’s desktop to incorporate a vast data analysis space.
• As users go about doing scientific experiments, the CI manages back-end storage and compute resources.
• Portal provides ways to explore, search, and discover this data.
• Metadata about experiments is largely automatically generated, and highly searchable.
• Metadata describes each data object (the file) in application-rich terms, and provides a URI to a data service that can resolve an abstract unique identifier to a real, on-line data “file”.


Searching for experiments using model configuration parameters: 2 attributes selected


Searching for experiments based on model parameters: 4 returned experiments; one displayed

How forecast model configuration parameters are stored in personal catalog

Forecast model configuration file is handed off to a plugin that shreds the XML document into queryable attributes associated with the experiment
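The shredding step can be sketched as flattening an XML document into path/value pairs that are then matched during search. This is a minimal sketch using Python's standard `xml.etree.ElementTree`; the `shred` and `find_experiments` functions, the element names in `CONFIG`, and the `exp-1` identifier are all hypothetical and are not the plugin's actual behaviour or the real model configuration format.

```python
import xml.etree.ElementTree as ET


def shred(xml_text):
    """Flatten an XML document into (path, value) attribute pairs."""
    root = ET.fromstring(xml_text.strip())
    attributes = []

    def walk(element, path):
        here = f"{path}/{element.tag}" if path else element.tag
        text = (element.text or "").strip()
        if text:  # only elements carrying a value become attributes
            attributes.append((here, text))
        for child in element:
            walk(child, here)

    walk(root, "")
    return attributes


def find_experiments(catalog, wanted):
    """Return ids of experiments whose attributes contain every wanted pair."""
    return [exp_id for exp_id, attrs in catalog.items() if set(wanted) <= set(attrs)]


# Hypothetical configuration; element names are illustrative only.
CONFIG = """
<config>
  <grid><dx>1000</dx><nx>53</nx></grid>
  <physics><scheme>kessler</scheme></physics>
</config>
"""

catalog = {"exp-1": shred(CONFIG)}
```

Searching with two selected attributes then amounts to `find_experiments(catalog, [("config/grid/dx", "1000"), ("config/grid/nx", "53")])`.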

What & Why of Provenance

• Derivation history of a data product
  • What (when, where) application created the data
  • Its parameters & configuration
  • Other input data used by application
• Workflow is composed from building blocks like these. So provenance for data used in workflow gives workflow trace

Building block: Application A reads Data.In.1, Data.In.2, and Config.A, and produces Data.Out.1

Data Provenance::Data.Out.1
Process: Application_A
Timestamp: 2006-06-23T12:45:23
Host: tyr20.cs.indiana.edu
…
Input: Data.In.1, Data.In.2
Config: Config.A
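The idea that per-product provenance chains into a workflow trace can be sketched as follows. The first record mirrors the example above; the second record (`Data.Out.2`, `Application_B`, its timestamp and config) and the `ProvenanceRecord` and `derivation_history` names are assumptions added purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class ProvenanceRecord:
    """Provenance of one data product (fields follow the example record)."""
    product: str
    process: str
    timestamp: str
    host: str
    inputs: tuple = ()
    config: str = ""


def derivation_history(product, records):
    """Walk input links backwards; return processes in dependency order."""
    by_product = {r.product: r for r in records}
    trace, seen = [], set()

    def visit(name):
        record = by_product.get(name)
        if record is None or name in seen:
            return  # raw input with no recorded producer, or already visited
        seen.add(name)
        for parent in record.inputs:
            visit(parent)
        trace.append(record.process)

    visit(product)
    return trace


RECORDS = [
    ProvenanceRecord("Data.Out.1", "Application_A", "2006-06-23T12:45:23",
                     "tyr20.cs.indiana.edu", ("Data.In.1", "Data.In.2"), "Config.A"),
    # Hypothetical downstream step, not from the original example.
    ProvenanceRecord("Data.Out.2", "Application_B", "2006-06-23T13:10:00",
                     "tyr20.cs.indiana.edu", ("Data.Out.1",), "Config.B"),
]
```

Asking for the history of `Data.Out.2` yields the applications in the order they ran, which is the workflow trace for that product.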

The What & Why of Provenance

• Trace Workflow Execution
  • What services were used during workflow execution?
  • Were all steps of execution successful?
• Audit Trail
  • What resources were used during workflow execution?
• Data Quality & Reuse
  • What applications were used to derive data products?
  • Which workflows use a certain data product?
• Attribution
  • Who performed the experiment?
  • Who owns the workflow & data products?
• Discovery
  • Locate data generated by a workflow
  • Locate workflows containing App-X that succeeded
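Several of these questions reduce to simple filters over recorded activities. The sketch below assumes a flat list of activity dictionaries with hypothetical keys (`workflow`, `type`, `app`, `data`) and reads "App-X that succeeded" as App-X finishing successfully; it is not the Karma query API.

```python
# Hypothetical activity records; keys and values are illustrative only.
ACTIVITIES = [
    {"workflow": "wf-1", "type": "appFinishedSuccess", "app": "App-X"},
    {"workflow": "wf-1", "type": "fileProduced", "app": "App-X", "data": "File2"},
    {"workflow": "wf-2", "type": "appFinishedFailed", "app": "App-X"},
    {"workflow": "wf-2", "type": "fileConsumed", "app": "App-Y", "data": "File2"},
    {"workflow": "wf-3", "type": "fileConsumed", "app": "App-Z", "data": "File2"},
]


def workflows_where_app_succeeded(activities, app):
    """Discovery: workflows in which the given application finished successfully."""
    return sorted({a["workflow"] for a in activities
                   if a["type"] == "appFinishedSuccess" and a["app"] == app})


def workflows_using(activities, data):
    """Data reuse: workflows that consumed a given data product."""
    return sorted({a["workflow"] for a in activities
                   if a["type"] == "fileConsumed" and a.get("data") == data})


def data_generated_by(activities, workflow):
    """Discovery: data products produced within a given workflow."""
    return sorted({a["data"] for a in activities
                   if a["type"] == "fileProduced" and a["workflow"] == workflow})
```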

Karma Provenance Service

Collection Framework

• Workflow Engine
• Workflow instance: Service 1, Service 2, … Service 9, Service 10
  • 10 data products consumed & produced by each service (10P/10C)
• WS-Messenger Notification Broker: message bus, WS-Eventing service API
• Karma Provenance Service: Provenance Listener, Activity DB, Provenance Query API
• Provenance Browser Client

Interactions:
• Publish provenance activities as notifications
  • Application–Started & –Finished, Data–Produced & –Consumed activities
  • Workflow–Started & –Finished activities
• Subscribe & listen to activity notifications
• Query for workflow, process, & data provenance

A Framework for Collecting Provenance in Data-Centric Scientific Workflows, Simmhan, Y., et al., ICWS Conference, 2006
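The publish/subscribe pattern in this architecture can be sketched with an in-memory broker. This is a simplified stand-in under stated assumptions: `NotificationBroker`, `ProvenanceListener`, the `provenance` topic, and the activity dictionaries are hypothetical and do not represent the WS-Messenger, WS-Eventing, or Karma interfaces.

```python
from collections import defaultdict


class NotificationBroker:
    """In-memory stand-in for a notification broker (not WS-Messenger's API)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, activity):
        # Deliver the activity to every subscriber of the topic.
        for callback in self._subscribers[topic]:
            callback(activity)


class ProvenanceListener:
    """Subscribes to a topic and stores every activity it hears."""

    def __init__(self, broker, topic):
        self.activity_db = []  # stand-in for the activity database
        broker.subscribe(topic, self.activity_db.append)

    def query(self, workflow_id):
        """Return the activity types recorded for one workflow, in arrival order."""
        return [a["type"] for a in self.activity_db if a["workflow"] == workflow_id]


broker = NotificationBroker()
listener = ProvenanceListener(broker, topic="provenance")

# Publishers do not know who is listening; they only talk to the broker.
broker.publish("provenance", {"workflow": "wf-1", "type": "workflowStarted"})
broker.publish("provenance", {"workflow": "wf-1", "type": "applicationStarted"})
broker.publish("provenance", {"workflow": "wf-2", "type": "workflowStarted"})
broker.publish("provenance", {"workflow": "wf-1", "type": "workflowFinished"})
```

Decoupling publishers from the listener in this way means provenance collection can be added without the services depending on the provenance store directly.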

Generating Karma Provenance Activities

• Instrument applications to publish provenance
• Simple Java Library available to
  • Create provenance activities
  • Publish activities as messages
• Jython “wrapper” scripts use library to publish provenance & invoke application
• Generic Factory toolkit easily converts applications to web service
  • Built-in provenance instrumentation

Sample Sequence of Activities

appStarted(App1)
info(‘App1 starting’)
fileReceiveStarted(File1)
-- do gridftp get to stage input file File1 --
fileReceiveFinished(File1)
fileConsumed(File1)
computationStarted(Code1)
-- call Fortran code Code1 to process input files --
computationFinished(Code1)
fileProduced(File2)
fileSendStarted(File2)
-- do gridftp put to save output file File2 --
fileSendFinished(File2)
publishURL(File2)
appFinishedSuccess(App1, File2) | appFinishedFailed(App1, ERR)
flush()
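A wrapper script that emits this kind of sequence around an application run might look like the sketch below. It is written in plain Python rather than against the Karma Java library: `run_wrapped` and its parameters are hypothetical, the staging and compute steps are caller-supplied callables, and the `info`, `publishURL`, and `flush` calls from the sequence above are omitted for brevity.

```python
def run_wrapped(app, input_file, output_file, stage_in, compute, stage_out, publish):
    """Publish an activity sequence around one application run.

    stage_in, compute, and stage_out are caller-supplied callables;
    publish receives (activity_name, argument) pairs.
    Returns True on success and False if any step raised an exception.
    """
    publish("appStarted", app)
    try:
        publish("fileReceiveStarted", input_file)
        stage_in(input_file)               # e.g. fetch the input file
        publish("fileReceiveFinished", input_file)
        publish("fileConsumed", input_file)
        publish("computationStarted", app)
        compute(input_file, output_file)   # e.g. invoke the model code
        publish("computationFinished", app)
        publish("fileProduced", output_file)
        publish("fileSendStarted", output_file)
        stage_out(output_file)             # e.g. store the output file
        publish("fileSendFinished", output_file)
        publish("appFinishedSuccess", app)
        return True
    except Exception as error:
        publish("appFinishedFailed", f"{app}: {error}")
        return False


def failing_compute(input_file, output_file):
    raise RuntimeError("model crashed")


# Successful run: every step is a no-op and activity names are collected.
log = []
ok = run_wrapped("App1", "File1", "File2",
                 stage_in=lambda f: None, compute=lambda i, o: None,
                 stage_out=lambda f: None,
                 publish=lambda name, arg: log.append(name))

# Failed run: the compute step raises, so the failure activity is published.
failed_log = []
failed_ok = run_wrapped("App1", "File1", "File2",
                        stage_in=lambda f: None, compute=failing_compute,
                        stage_out=lambda f: None,
                        publish=lambda name, arg: failed_log.append(name))
```

Catching the exception in one place guarantees that exactly one of the two terminal activities is published for every run.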

Performance perturbation

[Figure: cumulative execution time with and without provenance, and provenance overhead for each script. Horizontal axis: workflow application scripts in execution sequence (Start, Terrain PreProc, Surface PreProc, 3D Interp, ARPS2WRF, WRF, WRF2ARPS, ARPS Plot, PS2Image). Left axis: Cumulative Time for Execution (Secs), 0 to 3000. Right axis: Provenance Overhead for Each Script (Secs), -15 to 15. Series: Cumulative Time w/o Provenance (Secs), Cumulative Time w/ Provenance (Secs), Provenance Overhead.]

Scalability Study [4]

[4] Performance Evaluation of the Karma Provenance Framework for Scientific Workflows, Simmhan, Y., et al.; IPAW Workshop, 2006

Resource monitoring as two planes of control

Resource adaptation illustrated (1)

Components:
• Portal
• LEAD BPEL Workflow Engine
• Workflow Configuration Service
• Event Broker
• Application Service (per task)
• App. Factory
• Resource Management Services
• Sensor
• Actuator
• myLEAD (subscribes to messages from the broker and knows what magic to do with input/output files and talks to RLS/DRS)

Diagram labels: Workflow; DAG; Workflow and File Status; Run workflow one step at a time; Run job; Job notification; Create Services; Launch Services; Resource Changes

Scenario:
• Resource has failed, need to reschedule remaining parts of workflow
• Stop the earlier workflow
• Replan the workflow
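The stop-and-replan reaction to a failed resource can be sketched as reassigning only the unfinished steps. This is a toy illustration, not LEAD's scheduler: the `replan` function, the round-robin choice, and the `cluster-a`/`cluster-b`/`cluster-c` resource names are assumptions.

```python
def replan(steps, completed, assignment, failed_resource, available):
    """Reassign unfinished steps that were placed on a failed resource.

    steps: ordered step names; completed: set of finished steps;
    assignment: step -> resource; available: list of healthy resources.
    Returns a new assignment covering only the remaining steps.
    """
    if not available:
        raise ValueError("no healthy resource left to replan onto")
    remaining = [s for s in steps if s not in completed]
    new_plan = {}
    moved = 0
    for step in remaining:
        resource = assignment[step]
        if resource == failed_resource:
            resource = available[moved % len(available)]  # simple round-robin
            moved += 1
        new_plan[step] = resource
    return new_plan


# Hypothetical placement of workflow steps on made-up resources.
STEPS = ["Terrain PreProc", "3D Interp", "ARPS2WRF", "WRF", "ARPS Plot"]
ASSIGNMENT = {
    "Terrain PreProc": "cluster-a",
    "3D Interp": "cluster-a",
    "ARPS2WRF": "cluster-b",
    "WRF": "cluster-a",
    "ARPS Plot": "cluster-b",
}

new_plan = replan(STEPS, completed={"Terrain PreProc", "3D Interp"},
                  assignment=ASSIGNMENT, failed_resource="cluster-a",
                  available=["cluster-b", "cluster-c"])
```

Completed steps are left out of the new plan, so work already done on the failed resource is not repeated.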

Resource adaptation illustrated (2)

(Architecture diagram repeated from (1).)

Scenario:
• Implement strict deadline scheduling
• Weather change
• Plan resources for sub-components
• Change priorities for users, e.g. Lavanya’s workflow gets lower priority
• Implement Adverse Weather Policy

Resource adaptation illustrated (3)

(Architecture diagram repeated from (1).)

Scenario:
• “Replicate Service”
• “Service Overloaded”

Recent LEAD Highlight

Spring 2007 Weather Challenge Forecast contest, February to March 2007
• Students ran …..

Statistics from the Challenge
• Approximately 50 participants
• 6696 jobs submitted to TeraGrid (52925 TG SUs)
• Generated about 2.6 TB of data, which is archived at Indiana University and available through each participating user’s personal workspace catalog.
• Computational models run on TeraGrid resources. Portal and persistent back-end services run at Indiana University. Data storage resources (45 TB) for user-generated data products provided by Indiana University.

Future Work

• Optimizations and refinements: file movement, revisit metadata schema, improve crosswalks with an eye to reduced maintenance
• Personal predictor: packaging the LEAD framework into a single 8-16 core multicore machine for individual purchase


Thanks to the whole LEAD team, and the National Science Foundation for their support.

For more information, feel free to contact me at [email protected] or go to http://www.leadportal.org