
Page 1: Service Oriented Architecture for  Geographic Information Systems  Supporting Real Time Data Grids

Service Oriented Architecture for Geographic Information Systems Supporting Real Time Data Grids

Galip Aydin
Department of Computer Science
Indiana University

11/15/2007

Page 2: Service Oriented Architecture for  Geographic Information Systems  Supporting Real Time Data Grids

Geographic Information Systems

• A Geographic Information System (GIS) is a system for creating, storing, sharing, analyzing, manipulating and displaying spatial data and associated attributes.
• GIS history saw the evolution from mainframe GIS to desktop GIS to distributed GIS.
• Modern GIS require:
    • Distributed data access for spatial databases
    • Utilizing remote analysis, simulation or visualization tools
• Problems with traditional distributed GIS approaches:
    • Distributed nature of the geo-data: various client-server models, databases, HTTP, FTP, RDBs, XML DBs, etc.
    • Data format problems, conversion overheads
    • Data processing issues: hardware and software requirements, COM+/ActiveX and CORBA/IIOP frameworks

Page 3: Service Oriented Architecture for  Geographic Information Systems  Supporting Real Time Data Grids

Open Geographic Standards

• Open GIS standards bodies aim to make geographic information and services neutral and available across any network, application, or platform.
• Two major standards bodies: OGC and ISO/TC211, the former being the more popular.
• OGC specifications are widely accepted:
    • Data format specs: GML, SensorML, O&M
    • Service specs: WFS, WMS, WCS
• OGC services are HTTP GET/POST based, with limited data transport capabilities.
• They are request-response type services, suited to centralized, synchronous applications.

Page 4: Service Oriented Architecture for  Geographic Information Systems  Supporting Real Time Data Grids

PBO and CRTN GPS Stations

[Figures: Plate Boundary Observatory (PBO) GPS stations in North America; California Real-Time GPS Network (CRTN).]

Page 5: Service Oriented Architecture for  Geographic Information Systems  Supporting Real Time Data Grids

Requirements for a GIS/Sensor Grid

• Requirements of service orchestration capabilities
    • Complex problems require GIS applications to collaborate.
    • Coupling data sources to scientific applications
• Data transport requirements
    • Proliferation of sensors: ability to analyze data on-the-fly, continuous streaming support, scalable systems for addition of new sensors.
• High performance and high rate messaging
    • Real-time data access, rapid response systems, crisis management etc.
• From the Grids perspective the motivations are
    • To apply general Grid/distributed computing principles to GIS
    • To investigate how to integrate geophysical and other scientific applications with data sources

Page 6: Service Oriented Architecture for  Geographic Information Systems  Supporting Real Time Data Grids

Motivating Use Cases

• Very successful and highly acclaimed earthquake science applications
    • Pattern Informatics (PI) - UC Davis
        • Earthquake forecasting code developed by Prof. John Rundle (UC Davis) and collaborators; uses seismic archives.
    • Regularized Dynamic Annealing Hidden Markov Method (RDAHMM) - NASA/JPL
        • Time series analysis code that can be applied to GPS and seismic archives, for both real-time and archival data.
    • SOPAC GPS Networks provide real-time messages - UCSD/SIO
        • 8 networks with 80 stations produce 1 Hz high resolution data. The signatures of GPS sensors are used in earthquake forecasting.
• Interdependent Energy Infrastructure Simulation System (IEISS) - LANL
    • Models infrastructure networks (e.g. electric power systems and natural gas pipelines) and simulates their physical behavior and the interdependencies between systems.

Page 7: Service Oriented Architecture for  Geographic Information Systems  Supporting Real Time Data Grids

Research Issues 1

• Applying Web Service principles to GIS data services
    • Orchestration of services, workflows.
    • We need services suitable for large data sets and where quick response is required.
• High performance support in GIS services
    • The performance problem must be addressed in a complete and general framework supporting different data requirements.
• Interoperability
    • The system should bridge the GIS and Web Service communities by adapting standards from both.
    • Other GIS applications should be able to consume data without having to do costly format conversions.

Page 8: Service Oriented Architecture for  Geographic Information Systems  Supporting Real Time Data Grids

Research Issues 2

• Scalability
    • The system should be able to handle high volume and high rate data transport and processing.
    • Plugging in new sensors, data sources or geo-processing applications should not degrade the system's overall performance.
• Flexibility and extensibility
    • How to develop real-time services to process sensor data on the fly.
    • Ability to add new filters without system failures.
• Quality of Service issues
    • Is the latency introduced by services in processing real-time sensor data acceptable?

Page 9: Service Oriented Architecture for  Geographic Information Systems  Supporting Real Time Data Grids

SOA for GIS – Geophysical Data Grid

• To create a GIS Data Grid (Geophysical Grid) architecture we utilize
    • Web Services to realize a Service Oriented Architecture
    • OGC data formats and application interfaces to achieve interoperability at both data and service levels.
• GIS Data Grid features
    • Depending on the source, geospatial data can be archival or real-time. The architecture provides standard control and access interfaces for both types.
    • Supports alternate transport and representation schemes; uses a topic based messaging infrastructure for data and message exchange.
    • Streaming and non-streaming services to access archived data.
    • Real-time and near real-time filter services for accessing sensor metadata and sensor measurements.

Page 10: Service Oriented Architecture for  Geographic Information Systems  Supporting Real Time Data Grids

GIS Grid Usage Model – Earthquake Science

• Supporting geophysical repositories and real-time sensors is essential
    • To analyze a typical earthquake it is important to have access to precise measurements of the initial earthquakes and aftershocks.
    • To support earthquake forecasting and the time and spatial positions of the forecasts:
        • PI can be used with existing data
        • RDAHMM can be used with the real-time data
• Earth Science is moving from a previously data poor field to a data rich world. We will have thousands of sensors spread around the world (e.g. GPS sensors, InSAR satellites).

Page 11: Service Oriented Architecture for  Geographic Information Systems  Supporting Real Time Data Grids

GIS Grid Components

• Filter Services for real-time data support
• OGC Web Feature Service (WFS) for archival data support
    • Web Service version
    • Streaming version, which introduces data and control channel separation
    • All control goes through SOAP messages; data is transferred by a variety of transport mechanisms, which are implied by the control message.
• Publish-Subscribe system for message and data exchange
• UDDI based service registry (by Mehmet Aktas)

Page 12: Service Oriented Architecture for  Geographic Information Systems  Supporting Real Time Data Grids

Geophysical Data Grid Architecture

[Figure: architecture diagram with two parts, labeled Archival Data Grid and Real-Time Data Grid.]

Page 13: Service Oriented Architecture for  Geographic Information Systems  Supporting Real Time Data Grids

GIS Grid Part 1 - Real-Time Data Services

Sensors and sensor networks are being deployed for measuring various geo-physical entities.

Sensors and GIS are closely related. Sensor measurements are used by GIS for statistical or analytical purposes.

With the proliferation of the sensors, data collection and processing paradigms are changing.

Most scientific geo-applications are designed to work with archived data.

Critical Infrastructure Systems and Crisis Management environments require
    • fast and accurate access to real-time sources
    • a flexible/pluggable architecture for coupling geo-processing applications with the data.

Page 14: Service Oriented Architecture for  Geographic Information Systems  Supporting Real Time Data Grids

SensorGrid Architecture

• Major components:
    • Real-Time filters
    • Publish-Subscribe System
    • Information Service
• Filters can be run as Web Services to create workflows.
• Filter Chains can be deployed for complex processing.
• Streaming messaging provides high-performance transfer options.

Page 15: Service Oriented Architecture for  Geographic Information Systems  Supporting Real Time Data Grids

Real-Time Filters

Real-time data processing is supported by employing filters around a publish/subscribe messaging system.

The filters are extended from a generic class to inherit publish and subscribe capabilities.

They can be connected in parallel or in series as chains to solve complex problems.

[Figures: filters connected in Parallel Operation and in Serial Operation.]
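The filter idea above can be illustrated with a minimal sketch. It assumes a tiny in-memory broker in place of the real messaging system; the class names (Broker, Filter, ToAscii, ToUpper, Length) and topic names are hypothetical and are not taken from the presented implementation.

```python
from collections import defaultdict


class Broker:
    """Tiny in-memory stand-in for a topic-based publish/subscribe broker."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Deliver synchronously, in subscription order.
        for callback in list(self._subscribers[topic]):
            callback(message)


class Filter:
    """Generic filter: subscribes to one topic, publishes results to another."""

    def __init__(self, broker, in_topic, out_topic):
        self.broker = broker
        self.out_topic = out_topic
        broker.subscribe(in_topic, self._on_message)

    def process(self, message):
        # Subclasses override this; returning None drops the message.
        return message

    def _on_message(self, message):
        result = self.process(message)
        if result is not None:
            self.broker.publish(self.out_topic, result)


class ToAscii(Filter):
    def process(self, message):
        return message.decode("ascii")


class ToUpper(Filter):
    def process(self, message):
        return message.upper()


class Length(Filter):
    def process(self, message):
        return len(message)


def run_demo():
    broker = Broker()
    serial_out, parallel_out = [], []
    # Serial chain: "raw" -> ToAscii -> "ascii" -> ToUpper -> "upper"
    ToAscii(broker, "raw", "ascii")
    ToUpper(broker, "ascii", "upper")
    # Parallel branch: a second filter also consumes the "ascii" topic.
    Length(broker, "ascii", "length")
    broker.subscribe("upper", serial_out.append)
    broker.subscribe("length", parallel_out.append)
    broker.publish("raw", b"station-1 x=1.0")
    return serial_out, parallel_out
```

Because every filter only knows its input and output topics, adding a new processing step means subscribing one more filter to an existing topic; the upstream filters are not modified.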

Page 16: Service Oriented Architecture for  Geographic Information Systems  Supporting Real Time Data Grids

Use Case - GPS Sensors

• GPS is used to identify long-term tectonic deformation and static displacements.
• SCIGN has 250 real-time GPS stations.
• SOPAC GPS networks:
    • 8 networks with 80 stations produce 1 Hz high resolution data.
    • Socket based real-time access in the binary RYO format is available.
    • We developed filters to provide multiple format (RYO, ASCII, GML) real-time streaming access.
    • OHIO principle (a general principle required by DOD) and chain of filters.
• Our architecture
    • Uses publish/subscribe based NaradaBrokering for managing real-time GPS streams
    • Utilizes topics for hierarchical organization of the sensors
    • Deploys successive data filters ranging from format translators to data analysis codes
    • Could potentially be used to run RDAHMM clones to monitor state changes in the entire GPS network
• We are a partner in a pioneering project to use the real-time GPS data for the first time in this context.
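One way to picture the hierarchical organization of sensors by topic is the sketch below. The slash-separated naming scheme, the root "SOPAC/GPS" and the helper names are assumptions made for illustration only; they do not describe the topic syntax of the actual broker.

```python
def topic_for(network, fmt, root="SOPAC/GPS"):
    """Build a slash-separated topic name (the naming scheme is hypothetical)."""
    return f"{root}/{network}/{fmt}"


def matches(subscription, topic):
    """True if `topic` equals `subscription` or lies beneath it in the hierarchy."""
    sub_parts = subscription.strip("/").split("/")
    topic_parts = topic.strip("/").split("/")
    # Compare whole path components, so "CRTN" does not match "CRTN_01".
    return topic_parts[:len(sub_parts)] == sub_parts


def select(subscription, topics):
    """Return the topics a subscriber to `subscription` would receive."""
    return [t for t in topics if matches(subscription, t)]
```

With such a scheme a client interested in one network in every format subscribes at the network level, while a client interested only in the ASCII stream subscribes to the leaf topic.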

Page 17: Service Oriented Architecture for  Geographic Information Systems  Supporting Real Time Data Grids

Processing Real-Time GPS Streams

[Figure: A Complete Sensor Message Processing Path, including a data analysis application. Diagram labels: GPS Networks, Scripps RTD Server, RYO Ports (7010, 7011, 7012), Raw Data, NB Server.]

Page 18: Service Oriented Architecture for  Geographic Information Systems  Supporting Real Time Data Grids

Application Integration with Real-Time Filters

• Station Monitor Filter records real-time positions for 10 minutes and calculates position changes.
    • Graph Plotter Application creates a visual representation of the positions.
• RDAHMM Filter records real-time positions for 10 minutes and invokes the RDAHMM application, which determines state changes in the XYZ signal.
    • Graph Plotter Application creates a visual representation of the RDAHMM output.
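A minimal sketch of the windowing step behind a station monitor is given below. It assumes each message has already been decoded into a timestamp in seconds and XYZ coordinates, and it measures the position change as the straight-line distance between the oldest and newest sample in the window; the class name and this particular definition of "position change" are illustrative assumptions.

```python
from collections import deque
import math


class StationMonitor:
    """Keep the last `window_s` seconds of (t, x, y, z) samples for one station."""

    def __init__(self, window_s=600):
        self.window_s = window_s
        self.samples = deque()

    def add(self, t, x, y, z):
        self.samples.append((t, x, y, z))
        # Drop samples older than the window, measured from the newest sample.
        while self.samples and t - self.samples[0][0] > self.window_s:
            self.samples.popleft()

    def displacement(self):
        """Distance between the oldest and newest sample in the window."""
        if len(self.samples) < 2:
            return 0.0
        _, x0, y0, z0 = self.samples[0]
        _, x1, y1, z1 = self.samples[-1]
        return math.dist((x0, y0, z0), (x1, y1, z1))
```

At 1 Hz a 10 minute window holds about 600 samples per station, so keeping the window in memory is cheap.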

Page 19: Service Oriented Architecture for  Geographic Information Systems  Supporting Real Time Data Grids

Recording and Replaying Sensor Streams

• Filters can be used to record and replay scenarios, such as earthquakes in the GPS case.
• We developed RYO Recorder and RYO Publisher filters.
• The RYO Recorder creates daily archives of the GPS streams.
• The RYO Publisher can be used to play back whole days or selected segments of the records.
• We replayed the 2004 Southern California Earthquake using the Parkfield GPS network archive.
• These filters are used in the performance and scalability tests.
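The record and replay idea can be sketched as follows, assuming messages arrive as (timestamp, payload) pairs with Unix timestamps in seconds. The in-memory archive, the Recorder class and the replay function are hypothetical stand-ins for the RYO Recorder and RYO Publisher, whose actual storage format is not described here.

```python
from collections import defaultdict
from datetime import datetime, timezone


class Recorder:
    """Append (timestamp, payload) records into one archive per UTC day."""

    def __init__(self):
        self.archives = defaultdict(list)

    def record(self, ts, payload):
        day = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")
        self.archives[day].append((ts, payload))


def replay(archive, start_ts=None, end_ts=None):
    """Yield (delay_seconds, payload) for records inside [start_ts, end_ts].

    delay_seconds is the gap since the previously yielded record, so a caller
    can sleep for that long to reproduce the original message rate.
    """
    previous = None
    for ts, payload in sorted(archive, key=lambda record: record[0]):
        if start_ts is not None and ts < start_ts:
            continue
        if end_ts is not None and ts > end_ts:
            break
        delay = 0.0 if previous is None else ts - previous
        previous = ts
        yield delay, payload
```

A publisher loop would call time.sleep(delay) before publishing each payload; skipping the sleep replays an archive as fast as possible, which is convenient for load tests.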

Page 20: Service Oriented Architecture for  Geographic Information Systems  Supporting Real Time Data Grids

SensorGrid Performance Tests

• Two major goals: system stability and scalability
    • Ensuring stability of the distributed Filter Services for continuous operation.
    • Finding the maximum number of publishers (sensors) and clients that can be supported with a single broker.
    • Investigating whether the system scales for larger numbers of sensors and clients.

Page 21: Service Oriented Architecture for  Geographic Information Systems  Supporting Real Time Data Grids

Test Methodology

• The test system consists of a NaradaBrokering server and a three-filter chain for publishing, converting and receiving RYO messages.
• We take 4 timings to calculate mean end-to-end delivery times of GPS measurements.
• The tests were run for at least 24 hours.
• GridFarm001-008 servers are used in these tests.

$T_{\text{transfer}} = (T_2 - T_1) + (T_4 - T_3)$

[Figure: test setup showing the NB Server and the timing points 1 to 4.]
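As a worked example of the formula above, the sketch below computes the transfer time for each message and the mean over a run. It assumes every message carries its four timestamps in the same unit (milliseconds here); the function names are illustrative. Note that the formula leaves out the interval between T2 and T3.

```python
from statistics import mean


def transfer_time(t1, t2, t3, t4):
    """T_transfer = (T2 - T1) + (T4 - T3), all in the same time unit."""
    return (t2 - t1) + (t4 - t3)


def mean_transfer_time(samples):
    """Mean T_transfer over an iterable of (t1, t2, t3, t4) tuples."""
    return mean(transfer_time(*sample) for sample in samples)
```

For two messages with timings (0, 3, 10, 14) and (100, 102, 110, 113) the transfer times are 7 and 5, giving a mean of 6.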

Page 22: Service Oriented Architecture for  Geographic Information Systems  Supporting Real Time Data Grids

1 - System Stability Test

• The basic system with three filters and one broker.
• The figure shows average results for every 30 minutes.
• The average transfer time shows that continuous operation does not degrade the system performance.

Page 23: Service Oriented Architecture for  Geographic Information Systems  Supporting Real Time Data Grids

2 – Multiple Publishers Test

• We add more GPS networks by running more publishers.
• The results show that 1000 publishers can be supported with no performance loss. The cap of 1000 is an operating system limit.

[Figure: diagram labels: Topic 1A, Topic 1B, Topic 2, Topic n.]

Page 24: Service Oriented Architecture for  Geographic Information Systems  Supporting Real Time Data Grids

3 – Multiple Clients Test

We add more clients by running multiple Simple Filters which subscribe to the same ASCII topic.

The system can support as many as 1000 clients with a very small performance decrease.

[Figure: diagram labels: Topic 1A, Topic 1B, 1000 Clients.]

Page 25: Service Oriented Architecture for  Geographic Information Systems  Supporting Real Time Data Grids

Extending Scalability

• The limit of the basic system appears to be 1000 clients or publishers.
    • This is due to an operating system restriction on open file descriptors (1024 for Red Hat Linux), which can be increased by changing OS parameters.
• To overcome this limit we create NaradaBrokering networks by linking multiple brokers. NB supports scalable linkage of brokers for building tree-like architectures.
• We run 2 brokers to support 1500 clients.
• The number of brokers can be increased indefinitely, so we can potentially support any number of publishers and subscribers.
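Two small calculations sit behind this slide: reading the per-process limit on open file descriptors, and working out how many brokers a given client population needs. The sketch below uses Python's standard resource module (available on Unix-like systems only); the helper names and the idea of a fixed per-broker client budget are assumptions for illustration.

```python
import resource


def open_file_limit():
    """Return the (soft, hard) per-process limit on open file descriptors."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def brokers_needed(total_clients, clients_per_broker):
    """Smallest number of brokers so that no broker exceeds its client budget."""
    if clients_per_broker <= 0:
        raise ValueError("clients_per_broker must be positive")
    # Integer ceiling division avoids floating point rounding.
    return -(-total_clients // clients_per_broker)
```

With a budget of 750 clients per broker, 1500 clients need brokers_needed(1500, 750) = 2 brokers, which matches the two-broker configuration described above.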

Page 26: Service Oriented Architecture for  Geographic Information Systems  Supporting Real Time Data Grids

4 – Multiple Brokers Test

NaradaBrokering allows creation of Broker networks.

• We create a two-broker network.
• Messages published to the first broker can be received from the second broker.
• We take timings on each broker.
• We connect 750 clients to each broker and run for 24 hours.
• We chose 750 clients to stay well below the saturation limit.
• The results show that the performance is very good and similar to the single broker test.

[Figure: diagram labels: NB Server 1, NB Server 2, Topic 1A, Topic 1B.]

Page 27: Service Oriented Architecture for  Geographic Information Systems  Supporting Real Time Data Grids

4 – Multiple Brokers Test


Page 28: Service Oriented Architecture for  Geographic Information Systems  Supporting Real Time Data Grids

Real-Time Filters Test Results

• The RYO Publisher filter runs at 1 Hz and publishes a 24-hour archive of the CRTN_01 GPS network, which contains 9 GPS stations.
• The single broker configuration can support 1000 clients or publishers (1000 GPS networks of 9 stations each, that is 9000 individual stations).
• The system can be scaled up by creating NaradaBrokering broker networks.
• Message order was preserved in all tests.

Page 29: Service Oriented Architecture for  Geographic Information Systems  Supporting Real Time Data Grids

GIS Grid Part 2 - Archival Data Grid

• Web Feature Service is the default OGC specification for vector data.
• We have built a Web Service version of WFS for accessing geospatial data on distributed databases.
• Requirements
    • Various feature data should be stored in the databases
    • Queries are in OGC Common Query Language (GML) format
    • Results are GML Feature Collections
    • Operations to support are GetCapabilities, DescribeFeatureType, GetFeature
• To connect to multiple databases we have implemented a DB federation scheme
    • Adding features is easy using XML configuration files
    • We have implemented OGC Filter Encoding for query translation
    • Dynamic capability generation allows federation of the services
• The first Web Service version of WFS has been successfully used in several scientific workflows with other services (WMS, HPSearch, UDDI).
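Query translation from OGC Filter Encoding to SQL can be sketched as below. This is an illustrative subset only (three comparison operators plus And/Or), not the translator used in the presented WFS; the property names and the "?" placeholder style (as used by sqlite3; MySQL drivers typically use "%s") are assumptions.

```python
import xml.etree.ElementTree as ET

OGC = "{http://www.opengis.net/ogc}"
COMPARISONS = {
    "PropertyIsEqualTo": "=",
    "PropertyIsLessThan": "<",
    "PropertyIsGreaterThan": ">",
}
LOGICAL = {"And": "AND", "Or": "OR"}

EXAMPLE = """
<ogc:Filter xmlns:ogc="http://www.opengis.net/ogc">
  <ogc:And>
    <ogc:PropertyIsEqualTo>
      <ogc:PropertyName>network</ogc:PropertyName>
      <ogc:Literal>CRTN_01</ogc:Literal>
    </ogc:PropertyIsEqualTo>
    <ogc:PropertyIsGreaterThan>
      <ogc:PropertyName>magnitude</ogc:PropertyName>
      <ogc:Literal>5</ogc:Literal>
    </ogc:PropertyIsGreaterThan>
  </ogc:And>
</ogc:Filter>
"""


def translate(node, allowed_columns):
    """Return (sql_fragment, params) for one Filter Encoding element."""
    name = node.tag.replace(OGC, "")
    if name == "Filter":
        return translate(node[0], allowed_columns)
    if name in COMPARISONS:
        column = node.find(OGC + "PropertyName").text.strip()
        # Column names cannot be bound as parameters, so check them explicitly.
        if column not in allowed_columns:
            raise ValueError(f"unknown property: {column}")
        literal = node.find(OGC + "Literal").text.strip()
        return f"{column} {COMPARISONS[name]} ?", [literal]
    if name in LOGICAL:
        parts, params = [], []
        for child in node:
            sql, child_params = translate(child, allowed_columns)
            parts.append(sql)
            params.extend(child_params)
        return "(" + f" {LOGICAL[name]} ".join(parts) + ")", params
    raise ValueError(f"unsupported filter element: {name}")


def filter_to_where(filter_xml, allowed_columns):
    """Translate a Filter document into a parameterized WHERE clause body."""
    return translate(ET.fromstring(filter_xml.strip()), allowed_columns)
```

Literals are returned as bound parameters rather than spliced into the SQL string, which keeps the generated query safe when the filter comes from an untrusted client.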

Page 30: Service Oriented Architecture for  Geographic Information Systems  Supporting Real Time Data Grids

WFS Performance Improvements: Streaming WFS

• Issues with the Web Service version of the WFS
    • Synchronous request-response style
    • Handling non-trivial data transfers, large data requests, SOAP overhead.
    • XML encoding: the size of the geospatial data increases with GML encoding, which increases transfer times or may cause exceptions.
• To improve performance of the WFS:
    • Utilized a publish/subscribe messaging system for high performance data transfer. Similar to WFS but introduces data and control channel separation, which allows one-to-many data distribution.
    • Used a streaming database connection (MySQL) for faster retrieval of the query results and lower GML creation overhead.
    • Binary XML frameworks are integrated to reduce XML payload size, which improves transfer times. We used the BNUX and Fast Infoset frameworks in our tests.
    • Binding data transfer to the publish-subscribe messaging system reduces SOAP overhead.
    • Database processing, GML creation and data transport are all streaming.
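The separation of control and data channels can be sketched as follows. The control call returns immediately with the name of a topic, and the results are then published feature by feature on that topic, so any number of subscribers can receive them. The in-memory broker, the plain dictionary used in place of a SOAP response, the topic naming and the simplified feature markup are all assumptions for illustration.

```python
import itertools
from collections import defaultdict

_subscribers = defaultdict(list)  # topic -> callbacks (in-memory stand-in for the broker)
_request_ids = itertools.count(1)


def subscribe(topic, callback):
    _subscribers[topic].append(callback)


def publish(topic, message):
    for callback in list(_subscribers[topic]):
        callback(message)


def handle_get_feature(type_name):
    """Control channel: answer at once with the topic the data will arrive on."""
    topic = f"wfs/results/{type_name}/{next(_request_ids)}"
    return {"status": "accepted", "topic": topic}


def stream_features(topic, rows):
    """Data channel: publish one feature fragment per row, then an end marker."""
    for row in rows:
        publish(topic, f"<feature id='{row['id']}'>{row['name']}</feature>")
    publish(topic, None)  # None marks the end of the stream in this sketch
```

Because features are published as soon as each row is available, a client can start processing before the full result set has been built, and a second client subscribed to the same topic receives the same stream without a second database query.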

Page 31: Service Oriented Architecture for  Geographic Information Systems  Supporting Real Time Data Grids

GIS Grid Example –IEISS Integration (LANL)

[Figure: IEISS integration diagram. Labels: NB Server, MySQL Feature Database, WMS User Interface.]

WMS – Ahmet Sayar; UDDI, Context Service – Mehmet Aktas

Page 32: Service Oriented Architecture for  Geographic Information Systems  Supporting Real Time Data Grids

Streaming WFS + AJAX

Real-time positions on Google Maps

Page 33: Service Oriented Architecture for  Geographic Information Systems  Supporting Real Time Data Grids

Streaming WFS Performance Tests

• The goal is to measure the performance of the Streaming WFS with and without the Binary XML integration.
• We test the system performance against message size with up to 10,000 features by changing the number of features per request.
• We use the BNUX and Fast Infoset Binary XML frameworks for compressing the GML FeatureCollection documents.
• The BNUX and FI timings include encoding and decoding costs.

[Figure: test setup. Diagram labels: WSDL, Request Handler, DB Query Builder, DB Manager, GML Builder, Binary XML Encoder, NB Publisher, NB Server, NB Subscriber, Binary XML Decoder, Client App.]

Page 34: Service Oriented Architecture for  Geographic Information Systems  Supporting Real Time Data Grids

Streaming WFS Performance Tests

Page 35: Service Oriented Architecture for  Geographic Information Systems  Supporting Real Time Data Grids

Contributions

• Proposed and implemented a Service Oriented Architecture to provide a common platform supporting both archival and real-time geospatial data in data-centric Grids.
• Integrated Web Services with Open Geographic Standards for supporting interoperability at both data and application levels.
• Shown that the GIS services can be implemented as streaming services.
• Integration of Binary XML frameworks with the streaming services shows performance gains for long network distances.
• We have shown that Sensor Grids can be built on top of publish/subscribe middleware.
• Continuous real-time data support is achieved in a Service Architecture.
• Scalable architecture implementation for large numbers of sensor networks.
• Detailed investigation of the scalability and performance of the system.

Page 36: Service Oriented Architecture for  Geographic Information Systems  Supporting Real Time Data Grids

Acknowledgement

• Mehmet Aktas: UDDI and WS-Context
• Ahmet Sayar: WMS Server and Client
• ZhiGang Qi: SensorGrid Performance Tests
• We thank Prof. Yehuda Bock and his group at SIO for their help with real-time GPS data streams.
• The work described in this presentation is part of the QuakeSim project, which is supported by the Advanced Information Systems Technology Program of NASA's Earth-Sun System Technology Office.
• This collaboration is part of the NASA ACCESS ROSES funded project, Modeling and On-the-fly Solutions in Solid Earth Science.

Page 37: Service Oriented Architecture for  Geographic Information Systems  Supporting Real Time Data Grids

Additional Slides


Page 38: Service Oriented Architecture for  Geographic Information Systems  Supporting Real Time Data Grids

Future Work

• Exploring the use of UDP transport for sensor streams, which could potentially increase the NB related performance.
• Investigating real-time sensor workflows with Grid workflow tools such as Taverna.
• A smart selection tool for choosing the best Binary XML format for particular geographic features. This could be based on a Case Based Reasoning (CBR) approach.

Page 39: Service Oriented Architecture for  Geographic Information Systems  Supporting Real Time Data Grids

Related Work

• Linked Environments for Atmospheric Discovery (LEAD) addresses fundamental IT and meteorology research challenges to create an integrated framework for analyzing and predicting the atmosphere.
• Open-source Project for a Network Data Access Protocol (OPeNDAP) is a framework that aims to simplify all aspects of scientific networking and allows access to scientific data over the internet from applications that were not specifically designed for that purpose.
• The Real-time Observatories, Applications, and Data management Network (ROADNet) focuses on resolving challenges related to building wireless sensor networks for various types of observations and the information management system that will deliver these sensor observations to users in real time.
• The Laboratory for Advanced Information and Technology Standards (LAITS) at George Mason University researches Grid technology (based on Globus) in Earth and Space Science.

Page 40: Service Oriented Architecture for  Geographic Information Systems  Supporting Real Time Data Grids

Processing Real-Time GPS Streams

[Figure: A Complete Sensor Message Processing Path, including a data analysis application. Diagram labels: RTD Server, RYO Ports (7010, 7011, 7012), Raw Data, NB Server.]