
Cyberinfrastructure for Coastal Forecasting and Change Analysis


Ohio State University Department of Computer Science and Engineering

Gagan Agrawal
Hakan Ferhatosmanoglu
Xutong Niu
Ron Li
Keith Bedford

The Ohio State University

Context

• New Award from Office of Cyberinfrastructure (OCI)
  – Under Cyberinfrastructure for Environmental Observatories Program
  – September 2006 – August 2009, total amount $1,400,000
• Involves 2 Computer Scientists and 2 Environmental Scientists
  – G. Agrawal (PI): Grid Middleware
  – H. Ferhatosmanoglu: Databases
  – K. Bedford: Great Lakes Now/Forecasting
  – R. Li: Coastal Erosion Analysis

Coastal Forecasting and Change Detection (Lake Erie)

Project Premise

• Limitations of Current Environmental Observation Systems
  – Tightly coupled systems
    » No reuse of algorithms
    » Very hard to experiment with new algorithms
  – Closely tied to existing resources
• Our claim
  – Emerging trends towards web services and grid services can help

Challenges

• Existing Grid Middleware Systems have not considered
  – Processing of Streaming Data
  – Data Integration Issues
• The applications involved need techniques for multi-modal data fusion, query planning, and data mining
  – Need to implement them as grid or web services

Proposed Infrastructure and Collaboration

Application Details: Great Lakes Now/Forecasting

• GLOS: Great Lakes Observing System
  – Co-designer/project manager: K. Bedford, a co-PI on this project
  – Collaboration with NOAA
• Limitations: Hard-wired
  – Cannot incorporate new streams or algorithms
• Create an Implementation using our Middleware for Streaming Data

Application Details: Coastal Erosion Prediction and Analysis

• Focus: Erosion along Lake Erie Shore
  – Serious problem
  – Substantial Economic Losses
• Prediction requires data from
  – Variety of Satellites
  – In-situ sensors
  – Historical Records
• Challenges
  – Analyzing distributed data
  – Data Integration/Fusion

Middleware Developed at Ohio State

• Automatic Data Virtualization Framework
  – Enabling processing and integration of data in low-level formats
• GATES (Grid-based AdapTive Execution on Streams)
  – Processing of distributed data streams
• FREERIDE-G (FRamework for Rapid Implementation of Datamining Engines in Grid)
  – Supporting scalable data analysis on remote data

Automatic Data Virtualization: Motivation

• Access mechanisms for remote repositories
  – Complex low-level formats make accessing and processing of data difficult
  – Main desired functionality
    » Ability to select, download, and process a subset of data
• Sensor Data
  – Again, low-level data
  – Need to convert formats
  – Need a flexible architecture

Data Virtualization

[Figure: dataset, Data Service, Data Virtualization (an abstract view of data)]

By Global Grid Forum’s DAIS working group:
• A Data Virtualization describes an abstract view of data.
• A Data Service implements the mechanism to access and process data through the Data Virtualization.

Our Approach: Automatic Data Virtualization

• Automatically create data services
  – A new application of compiler technology
• A metadata descriptor describes the layout of data on a repository
• An abstract view is exposed to the users
• Two implementations:
  – Relational/SQL-based
  – XML/XQuery-based
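
The descriptor-to-view idea can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the descriptor format, the field names, and the `make_reader` and `select` helpers are hypothetical and do not reproduce the project's descriptor language or its generated data services. The sketch only shows how a layout description can drive both the decoding of low-level records and a relational-style selection over them.

```python
import struct
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical metadata descriptor: (field name, struct format code),
# listed in the order the fields appear in each binary record.
DESCRIPTOR: List[Tuple[str, str]] = [
    ("station_id", "i"),   # 4-byte signed integer
    ("timestamp", "d"),    # 8-byte float, seconds
    ("water_level", "f"),  # 4-byte float, meters
]


def make_reader(descriptor: List[Tuple[str, str]]) -> Callable[[bytes], Iterator[dict]]:
    """Build a record reader from the descriptor (stands in for generated code)."""
    names = [name for name, _ in descriptor]
    # "<" selects little-endian layout with no padding between fields.
    record = struct.Struct("<" + "".join(code for _, code in descriptor))

    def read(raw: bytes) -> Iterator[dict]:
        for values in record.iter_unpack(raw):
            yield dict(zip(names, values))

    return read


def select(rows: Iterable[dict], columns: List[str],
           where: Callable[[dict], bool]) -> List[tuple]:
    """Relational-style view: SELECT columns FROM rows WHERE where(row)."""
    return [tuple(row[c] for c in columns) for row in rows if where(row)]


# Two made-up records packed in the layout the descriptor describes.
raw = struct.pack("<idf", 1, 100.0, 1.5) + struct.pack("<idf", 2, 101.0, 2.5)
read = make_reader(DESCRIPTOR)
high = select(read(raw), ["station_id", "water_level"],
              lambda row: row["water_level"] > 2.0)
print(high)  # [(2, 2.5)]
```

The user-facing code touches only column names and a predicate; the byte layout is confined to the descriptor.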

Streaming Data Model

• Continuous data arrival and processing
• Emerging model for data processing
  – Sources that produce data continuously: sensors, long-running simulations
  – Critical in Environmental Observatories
• Active topic in many computer science communities
  – Databases
  – Data Mining
  – Networking ...

Need for a Grid-Based Stream Processing Middleware

• Application developers interested in data stream processing
  – Would like the following abstracted away
    » Grid standards and interfaces
    » Adaptation function
  – Would like to focus on algorithms only
• GATES is a middleware for grid-based, self-adapting data stream processing

Adaptation for Real-time Processing

• Analysis on streaming data is approximate
• Accuracy and execution rate trade-off can be captured by certain parameters (adaptation parameters)
  – Sampling Rate
  – Size of summary structure
• Application developers can expose these parameters and a range of values
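
A minimal Python sketch of an exposed adaptation parameter follows. The `AdaptationParameter` class, the `adapt` policy, and the backlog thresholds are all hypothetical; the halve-or-grow rule is a stand-in chosen for brevity, not the adaptation algorithm used by GATES. The sketch shows a sampling rate with a developer-supplied range being tuned by a runtime in response to load.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class AdaptationParameter:
    """A tunable value plus the range the developer exposes (hypothetical type)."""
    value: float
    low: float
    high: float

    def adjust(self, factor: float) -> None:
        # Scale the value, then clamp it to the exposed range.
        self.value = min(self.high, max(self.low, self.value * factor))


def adapt(rate: AdaptationParameter, backlog: int, target: int) -> None:
    """Assumed policy: halve the rate when overloaded, grow it when there is slack."""
    if backlog > target:
        rate.adjust(0.5)
    elif backlog < target // 2:
        rate.adjust(1.25)


def sample(items: Sequence[float], rate: AdaptationParameter,
           rng: random.Random) -> List[float]:
    """Keep each item with probability equal to the current sampling rate."""
    return [x for x in items if rng.random() < rate.value]


rate = AdaptationParameter(value=1.0, low=0.1, high=1.0)
adapt(rate, backlog=500, target=100)   # overloaded: 1.0 -> 0.5
adapt(rate, backlog=500, target=100)   # still overloaded: 0.5 -> 0.25
adapt(rate, backlog=10, target=100)    # slack: 0.25 -> 0.3125
kept = sample(list(range(1000)), rate, random.Random(0))
print(rate.value, len(kept))
```

A lower sampling rate processes fewer items per unit time at the cost of accuracy, which is the trade-off the slide describes; the clamp keeps the runtime inside the range the developer declared acceptable.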

FREERIDE-G: Supporting Distributed Data-Intensive Science

[Figure: User, Data Repository Cluster, Compute Cluster]

Challenges for Application Development

• Analysis of large amounts of disk-resident data
• Incorporating parallel processing into analysis
• Processing needs to be independent of other elements and easy to specify
• Coordination of storage, network and computing resources required
• Transparency of data retrieval, staging and caching is desired

FREERIDE-G Goals

• Support High-End Processing
  – Enable efficient processing of large-scale data mining computations
• Ease Use of Parallel Configurations
  – Support shared and distributed memory parallelization starting from a common high-level interface
• Hide Details of Data Movement and Caching
  – Data staging and caching (when feasible/appropriate) needs to be transparent to application developer
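
One way to picture a common high-level interface is the Python sketch below. It is not the FREERIDE-G API: `run_analysis`, `count_and_sum`, and `combine` are hypothetical names, and a local thread pool stands in for the shared and distributed memory back ends. The assumption is that the application supplies only a per-chunk processing function and a merge function, and the runtime decides how the chunks are executed.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import Callable, List, Sequence, Tuple, TypeVar

C = TypeVar("C")  # one chunk of input data
R = TypeVar("R")  # one partial result


def run_analysis(chunks: Sequence[C], process: Callable[[C], R],
                 merge: Callable[[R, R], R], workers: int = 1) -> R:
    """Apply `process` to every chunk and fold the partial results with `merge`.

    `chunks` must be non-empty. The caller's code is identical for the
    serial and the parallel configuration; only `workers` changes.
    """
    if workers <= 1:
        partials: List[R] = [process(chunk) for chunk in chunks]
    else:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            partials = list(pool.map(process, chunks))
    return reduce(merge, partials)


# Application code: per-chunk count and sum, merged to compute a mean.
def count_and_sum(chunk: List[float]) -> Tuple[int, float]:
    return (len(chunk), sum(chunk))


def combine(a: Tuple[int, float], b: Tuple[int, float]) -> Tuple[int, float]:
    return (a[0] + b[0], a[1] + b[1])


chunks = [[1.0, 2.0], [3.0], [4.0, 6.0]]
serial = run_analysis(chunks, count_and_sum, combine)
parallel = run_analysis(chunks, count_and_sum, combine, workers=3)
print(serial, parallel, serial[1] / serial[0])
```

Because `combine` is associative and each chunk is processed independently, the result does not depend on how the runtime schedules the chunks, which is what lets data staging and parallel configuration stay hidden from the application developer.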

Data Analysis Services

• Multi-modal Multi-Sensor Data Integration
  – Built on our Data Virtualization Framework
• Query Planning Service
  – Feature Extraction: Integration with Grid Metadata Catalogs
• Remote Mining of Spatio-Temporal Data
  – Built using FREERIDE-G
• Mining algorithms for Data Streams
  – Built using GATES

Recap

Looking For

• Feedback on our approach
• Synergy with other efforts
• Lessons learnt by others