
Ilya Zaslavsky, Ashraf Memon - Esri · 2004-09-09 · Ilya Zaslavsky, Ashraf Memon Abstract Integrating spatial information from multiple grid-enabled sources of geologic data is


GEON: Assembling Maps on Demand From Heterogeneous Grid Sources

Ilya Zaslavsky, Ashraf Memon

Abstract

Integrating spatial information from multiple grid-enabled sources of geologic data is an important component of the NSF-funded Geosciences Network (GEON) project. The spatial integration is orchestrated by Geo-GEMS (Grid-Enabled Mediation Services), a collection of grid services that support spatial data mediation, ontology and schema conflict resolution, and composite map assembly. Geologic data are served by distributed ArcIMS and WMS servers, each wrapped in WSDL/SOAP wrappers. Capabilities of each source are registered at the mediator, so that the latter can plan and orchestrate query execution. To produce a composite result, query results retrieved from individual sources are either merged at the mediator or overlapped at the client. A comprehensive GEON map assembly service represents a temporary mediator-level ArcIMS service that is created on demand to merge individual raster and vector fragments from distributed servers into a composite map and generate answers to follow-up requests without re-querying the sources.

Introduction

Spatial information is maintained in multiple disparate databases. While integrating information from diverse spatial sources is at the heart of GIS, serious integration challenges arise from differences in spatial data types and file formats, data schemas and access mechanisms, projections and spatial scales, data quality and precision, query and transformation capabilities of the sources, and the ontological frameworks the datasets subscribe to. Methods of formal standards-based reconciliation of these differences become increasingly important with the advancement of spatial data infrastructures and the proliferation of Internet map services. Federating data on demand, instead of materializing data replicas in a data warehouse, appears to be the preferred strategy given the exponential growth of spatial data collections.

Data grids have emerged as a popular strategy for managing heterogeneities across distributed scientific datasets. A data grid is a data federation environment where multiple distributed heterogeneous resources share a common logical name space and can be accessed and queried seamlessly. Open Grid Services Architecture (OGSA) is an outline and specification of grid services that combines Web services technology (W3C, 2001) with principles and experiences of computational grid development [Foster et al. 1999, 2001, 2002; GLOBUS 2003, GGF 2003]. The Geosciences Network (GEON) is a large NSF-funded Information Technology Research (ITR) project being developed collaboratively by geoscientists and computer scientists. GEON develops grid services that support discovery, integration and analytical use of multiple geosciences databases. The GEON grid architecture is shown in Figure 1.


The GEON portal is accessed at http://www.geongrid.org. It provides user-specific interfaces to several application services combined into the GeonSearch and GeoWorkbench environments. The application services let users register spatial and non-spatial datasets (registration services), situate the data in spatial, temporal and conceptual contexts (indexing services), execute complex queries (mediation services) and multi-step processing chains (workflow services), and visualize intermediate and final results as tables, maps, and graphs (visualization and mapping services). The application services execute on top of core grid services, which implement basic data grid capabilities: mechanisms for data movement, user access control and security, load balancing and network weather service, and communication with the physical grid infrastructure.

Figure 1. The GEON services infrastructure (Source: GEON poster, 2004). Components shown: Portal (login, myGEON); GeonSearch and GeoWorkbench; Registration Services, Indexing Services, Data Mediation Services, Workflow Services, Visualization and Mapping Services; Core Grid Services (authentication, monitoring, scheduling, catalog, data transfer, replication, collection management, databases); Physical Grid (RedHat Linux, ROCKS, Internet, I2, OptIPuter).

This paper describes our implementation of GEON mapping services, with specific emphasis on services that enable automatic generation of composite maps from heterogeneous grid sources. Below, they are referred to as map assembly services. The structure of the paper is as follows. We start by presenting a motivating example of geologic map integration. The following section outlines potential strategies for assembling composite maps given different outputs generated by individual grid sources. Finally, we describe implementation details of the map assembly service, and conclude with an outline of future work.

The motivating example

This example derives from a practical need to select geologic formations with a given set of attributes for an area that is covered by several map services. The Rocky Mountains area, one of the GEON test beds, covers 8 states. Geologic maps for each state have different schemas, and are served by one or more servers, including ArcIMS, WMS and WFS servers. Handling schema and semantic heterogeneities across multiple map sources within a data grid environment has been considered in [Lin and Ludäscher 2003, Zaslavsky et al. 2003]; therefore this issue is not discussed here. Instead, we focus on techniques for merging different types of query results into a composite map. In particular, ArcIMS sources may return images of maps (on GET_IMAGE requests), zipped shapefiles (on GET_EXTRACT requests), or feature coordinates in ArcXML (on GET_FEATURES requests). At the same time, WMS servers generate map images (on getmap requests), while WFS sources produce GML documents. In the case of ArcIMS servers, for example, the preferred type of result would depend on such factors as the size of the result set, the target level of interactivity at the client, etc.

To handle source heterogeneity, all spatial sources in GEON are “wrapped” in WSDL/SOAP wrappers. The role of wrappers is two-fold: they convert grid service calls generated by the mediator into a system-specific language (for example, translating a mediator request into one or several equivalent ArcXML requests to be posted to an ArcIMS server), and convert query responses (an ArcXML response with a path to a generated image or shapefiles, or containing feature geometry) into a common format handled by results presentation services. The WSDL/SOAP layer serves as an additional level of abstraction which provides a standard access interface to different sources while hiding source peculiarities from the user. For example, the user (or client application) doesn’t have to differentiate between ArcIMS and WMS servers as long as both can be invoked via a standard GetImage grid request. Depending on the type of output generated by each source, we consider several mechanisms for merging them into composite maps.

Map assembly mechanisms

Merging the results of a mediated query into a composite result is fairly well understood when the result fragments are homogeneous. For example, XML mediators [e.g., Baru et al. 1999, Gupta et al. 1999] developed in the course of our previous work on MIX (Mediation of Information using XML) combined XML fragments into a single XML tree and presented it to the client.

Merging spatial result fragments represents a more serious challenge, as the fragments need to be spatially arranged and/or overlapped following cartographic design principles (possibly requiring re-projection and alignment, and adding various elements of geographic context), and certain data/format transformations may be required to create a composite map (raster to vector, XML to shapefile, etc.). In addition, since creating a composite map document may be more time-consuming than merging XML trees, such a map should be able to support additional requests without re-querying individual sources and re-assembly. The composite map creation problems are resolved by a special map assembly service which is presented below. It is described using three generic map assembly scenarios, in order of increasing complexity.


1. Spatial services return map fragments as images; the images are overlaid at the client or fused at the mediator.

This is the simplest scenario for creating a composite map. Within the GEON environment, grid services exposed for spatial servers include a GifService interface whose methods translate (at source wrappers) into GetCapabilities and GetMap requests of the WMS specification (for ArcIMS: GET_SERVICE_INFO and GET_IMAGE requests respectively). In response to these requests, each server generates a GIF or PNG8 file with a transparent background, within a common map envelope supplied by the mediator. The list of paths to the generated image files is communicated to the mediator and either used by an Image Fusion service to merge them into a single image displayed in a standard ArcIMS client, or passed on to a custom ArcIMS or AxioMap-based [Zaslavsky, 2000] web client where the images in the stack are displayed in the same coordinate space, one on top of the other (Figure 2). Note that before map fragments can be meaningfully merged into a single map, a common legend must be generated, and uniform rendering requests submitted to the spatial servers.
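The translation performed at a source wrapper can be illustrated with a minimal Python sketch. It is not the GEON wrapper code: the function names, the `source` dictionary layout, and the example URL are hypothetical, and only the basic WMS 1.1.1 GetMap parameters and the ArcXML GET_IMAGE envelope and image size are shown (transparency and rendering settings for ArcIMS are omitted).

```python
from urllib.parse import urlencode
from xml.etree import ElementTree as ET


def wms_getmap_url(base_url, layers, envelope, width, height, srs="EPSG:4326"):
    """Build a WMS 1.1.1 GetMap URL for a transparent image in the shared envelope."""
    minx, miny, maxx, maxy = envelope
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),
        "STYLES": "",
        "SRS": srs,
        "BBOX": f"{minx},{miny},{maxx},{maxy}",
        "WIDTH": str(width),
        "HEIGHT": str(height),
        "FORMAT": "image/png",
        "TRANSPARENT": "TRUE",
    }
    return base_url + "?" + urlencode(params)


def arcxml_get_image(envelope, width, height):
    """Build a bare ArcXML GET_IMAGE request for the same envelope and size."""
    minx, miny, maxx, maxy = envelope
    root = ET.Element("ARCXML", version="1.1")
    request = ET.SubElement(root, "REQUEST")
    get_image = ET.SubElement(request, "GET_IMAGE")
    props = ET.SubElement(get_image, "PROPERTIES")
    ET.SubElement(props, "ENVELOPE", minx=str(minx), miny=str(miny),
                  maxx=str(maxx), maxy=str(maxy))
    ET.SubElement(props, "IMAGESIZE", width=str(width), height=str(height))
    return ET.tostring(root, encoding="unicode")


def translate_get_image(source, envelope, width, height):
    """Turn one mediator-level GetImage call into a source-specific request.

    `source` is a hypothetical registry entry, e.g.
    {"type": "wms", "url": "...", "layers": [...]} or {"type": "arcims"}.
    """
    if source["type"] == "wms":
        return wms_getmap_url(source["url"], source["layers"], envelope, width, height)
    if source["type"] == "arcims":
        return arcxml_get_image(envelope, width, height)
    raise ValueError(f"unsupported source type: {source['type']}")
```

Because every source receives the same envelope and image size, the returned images line up pixel for pixel, which is what makes stacking or fusing them possible.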

Figure 2. Displaying a stack of image fragments at the client, to produce a composite map. An example from GEON Rocky Mountains test bed.
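The stacking of transparent fragments can be sketched in a few lines of Python. This is an illustration of the idea only, not the GEON Image Fusion service: each layer is modelled as a grid of RGB tuples with `None` standing in for a transparent pixel, and `fuse_layers` is a hypothetical name.

```python
def fuse_layers(layers):
    """Overlay equally sized layers; later layers are drawn on top.

    Each layer is a list of rows; each pixel is an (r, g, b) tuple,
    or None where the source image is transparent.
    """
    if not layers:
        raise ValueError("at least one layer is required")
    height, width = len(layers[0]), len(layers[0][0])
    for layer in layers:
        if len(layer) != height or any(len(row) != width for row in layer):
            raise ValueError("all layers must share the same pixel dimensions")

    fused = [[None] * width for _ in range(height)]
    for layer in layers:
        for y, row in enumerate(layer):
            for x, pixel in enumerate(row):
                if pixel is not None:  # opaque pixels replace what is below
                    fused[y][x] = pixel
    return fused
```

The dimension check reflects the requirement that all fragments be rendered within the common map envelope; fragments of different sizes cannot be overlaid without resampling.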


2. Spatial services return coordinate information for result fragments, which is rendered as an acetate layer in an ArcIMS service.

In this scenario, ArcIMS servers (with the GET_FEATURES request allowed) and WFS servers are wrapped to expose the GetCoordinates grid interface. Query results are returned as either GML or ArcXML (inside the <FEATURES> element). The map assembly service uses the coordinate information to generate a single multi-part feature from all features returned. This feature is added as an acetate layer to an ArcIMS service which already contains a collection of base map elements. The scenario is shown schematically in Figure 3. This approach is efficient when the result set is fairly small, due to limitations on the size of acetate layers. As a variation, the coordinate results may be sent to an SVG client for rendering, subject to a similar limitation on the size of the result set. If the amount of coordinate information returned by each spatial server is large, then the map assembly service converts result fragments into one or more shapefiles and adds them, as layers, to an existing ArcIMS service. In practice, however, transmitting large volumes of XML-formatted coordinate information between sources during query execution takes significant time, leading to poor scalability of this approach.

Figure 3. The second strategy for map assembly: combining coordinates into an acetate or similar layer.
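The second strategy amounts to two decisions: combine all returned features into one multi-part feature, then pick a rendering path according to its size. The Python sketch below illustrates this under stated assumptions: coordinates are taken as already parsed from GML or ArcXML, the class and function names are hypothetical, and the vertex threshold is an arbitrary placeholder rather than a documented ArcIMS limit.

```python
from dataclasses import dataclass, field

# Hypothetical cut-off; the actual acetate layer limit is not given in the text.
MAX_ACETATE_VERTICES = 5000


@dataclass
class MultiPartFeature:
    """A single feature whose parts are rings of (x, y) vertices."""
    parts: list = field(default_factory=list)

    def vertex_count(self):
        return sum(len(ring) for ring in self.parts)


def merge_features(fragments):
    """Combine features from all sources into one multi-part feature.

    `fragments` holds one entry per source; each entry is a list of
    features, and each feature is a list of rings.
    """
    merged = MultiPartFeature()
    for fragment in fragments:
        for feature in fragment:
            merged.parts.extend(feature)
    return merged


def choose_rendering(feature, max_vertices=MAX_ACETATE_VERTICES):
    """Small results go to an acetate layer, large ones to shapefile layers."""
    return "acetate" if feature.vertex_count() <= max_vertices else "shapefile"
```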

3. Spatial services return both coordinate and image result fragments, in a variety of formats; a complete ArcIMS service is generated from the fragments.

When query result fragments are of arbitrary types, including raster images, [compressed] shapefiles, GML, ArcXML with coordinate information for each feature, etc., the map assembly service retrieves them into a local staging area (using GridFTP, in particular), and generates a new ArcIMS image service. The result fragments are then added as layers into the ArcXML service configuration file. The mapping client then interacts with this dynamically generated service, ideally without re-querying individual sources on each user request. A schematic of the service is shown in Figure 4, and its main internal components, as implemented in GEON middleware, are shown in Figure 5.
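Adding staged fragments as layers of a configuration file can be sketched as follows. This is a simplified illustration, not the GEON implementation: the fragment dictionary keys (`kind`, `path`, `name`, `geometry`) are hypothetical, the paths are placeholders, and the element and attribute names follow ArcXML map configuration conventions in outline only; they should be verified against the ArcXML reference, and a working service configuration would also need renderers and environment settings.

```python
import posixpath
from xml.etree import ElementTree as ET


def build_map_config(envelope, fragments):
    """Write staged fragments as layers of an ArcXML-style map configuration."""
    minx, miny, maxx, maxy = envelope
    root = ET.Element("ARCXML", version="1.1")
    config = ET.SubElement(root, "CONFIG")
    map_el = ET.SubElement(config, "MAP")
    props = ET.SubElement(map_el, "PROPERTIES")
    ET.SubElement(props, "ENVELOPE", minx=str(minx), miny=str(miny),
                  maxx=str(maxx), maxy=str(maxy))
    workspaces = ET.SubElement(map_el, "WORKSPACES")

    for index, frag in enumerate(fragments):
        directory, filename = posixpath.split(frag["path"])
        workspace = f"ws-{index}"
        if frag["kind"] == "shapefile":
            ET.SubElement(workspaces, "SHAPEWORKSPACE",
                          name=workspace, directory=directory)
            layer_type, dataset_type = "featureclass", frag["geometry"]
            dataset_name = posixpath.splitext(filename)[0]  # drop ".shp"
        elif frag["kind"] == "image":
            ET.SubElement(workspaces, "IMAGEWORKSPACE",
                          name=workspace, directory=directory)
            layer_type, dataset_type = "image", "image"
            dataset_name = filename
        else:
            raise ValueError(f"fragment must be converted first: {frag['kind']}")

        layer = ET.SubElement(map_el, "LAYER", type=layer_type,
                              name=frag["name"], id=str(index), visible="true")
        ET.SubElement(layer, "DATASET", name=dataset_name,
                      type=dataset_type, workspace=workspace)

    return ET.tostring(root, encoding="unicode")
```

Fragments in other formats (GML, ArcXML coordinates, compressed archives) are rejected here on purpose: in this sketch they are expected to pass through conversion before they can be listed as layers.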


The service is implemented using the OGSA grid service Factory interface, and supports lifetime management via the SoftStateDestruction and ExplicitDestruction interfaces. The Factory interface creates a new Grid service instance and returns a Grid service handle, which in turn can be used to retrieve the service WSDL description from the Grid Service Reference for subsequent querying. The service must be destroyed when additional user requests exceed the capabilities of the service (say, if the user zooms out beyond the area covered by the transient ArcIMS service), or after a certain period of inactivity. Explicit lifetime management is thus an important component of this service specification.
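The two destruction conditions described above can be modelled with a short Python sketch. It imitates the behaviour only and does not use any grid toolkit API: the class, its methods, and the `"serve"`/`"requery"` return values are hypothetical, and the injectable clock exists purely to make the timeout easy to exercise.

```python
import time


class TransientMapService:
    """Toy model of a transient map service with soft-state and explicit destruction."""

    def __init__(self, envelope, ttl_seconds, clock=time.monotonic):
        self.envelope = envelope          # (minx, miny, maxx, maxy) covered by the service
        self._ttl = ttl_seconds
        self._clock = clock
        self._expires_at = clock() + ttl_seconds
        self._destroyed = False

    def keep_alive(self):
        """Soft state: each served request pushes the termination time forward."""
        self._expires_at = self._clock() + self._ttl

    def destroy(self):
        """Explicit destruction, e.g. when a request exceeds the service's capabilities."""
        self._destroyed = True

    def is_alive(self):
        return not self._destroyed and self._clock() < self._expires_at

    def covers(self, requested):
        minx, miny, maxx, maxy = self.envelope
        rminx, rminy, rmaxx, rmaxy = requested
        return minx <= rminx and miny <= rminy and rmaxx <= maxx and rmaxy <= maxy

    def handle_request(self, requested):
        """Return "serve" if the transient service can answer, else "requery"."""
        if not self.is_alive():
            self.destroy()
            return "requery"
        if not self.covers(requested):
            self.destroy()   # user left the covered area; sources must be re-queried
            return "requery"
        self.keep_alive()
        return "serve"
```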

Figure 4. The third strategy for map assembly: combining raster and vector fragments into a new dynamic ArcIMS service.

Let us consider the internal operation of the service step by step (see Figure 5). In the first step (1), the user formulates a request against distributed resources, which includes the desired extent of the output map, the query expression, and the type of client (the latter is used by the mediator to determine an appropriate type of output map: one or more images, XML coordinate information for SVG rendering, or both). The mediator orchestrates query execution at each source by issuing grid service calls against source wrappers (2). Handles (including data types and paths) to the result fragments generated by each service, along with the initial query expression and map extent, are transferred to the map assembly service (3). Using these handles and the query, the command module organizes them into a map configuration file similar to an ArcXML Config document, but with paths pointing to remote resources as specified in the handles (4). Additionally, it selects (5) one of several map assembly templates stored in command.xml, which bind together available processing components (file transfer service, uncompress service, image fusion service, data conversion service, etc.) into a map assembly workflow. At the next step, the workflow is executed (6) with the help of the File Transfer Service (file transfer over HTTP or via a GridFTP web service), the Uncompress Service (for efficiency, data are shipped in compressed format), the Image Fusion Service (responsible for merging raster fragments into a composite map image), and the Data Conversion Service (responsible for converting result fragments into formats that ArcIMS can import). Finally, the Image Assembly Service rewrites the map configuration file into a valid ArcXML configuration file, and uses it to start a new ArcIMS service (7).
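Steps (5) and (6), template selection and workflow execution, can be sketched in Python as follows. Everything here is a stand-in chosen for illustration: the `TEMPLATES` dictionary takes the place of command.xml, whose actual format is not described in this paper, and the component functions only relabel fragment records instead of moving or converting any data.

```python
IMAGE_TYPES = {"gif", "png"}

# Hypothetical templates: ordered lists of processing component names.
TEMPLATES = {
    "fuse_images": ["transfer", "fuse"],
    "build_service": ["transfer", "uncompress", "convert"],
}


def transfer(fragments):
    """Stand-in for the File Transfer Service: mark fragments as staged locally."""
    return [dict(f, staged=True) for f in fragments]


def uncompress(fragments):
    """Stand-in for the Uncompress Service."""
    return [dict(f, type="shapefile") if f["type"] == "zipped_shapefile" else f
            for f in fragments]


def convert(fragments):
    """Stand-in for the Data Conversion Service: XML coordinates become shapefiles."""
    return [dict(f, type="shapefile") if f["type"] in {"gml", "arcxml"} else f
            for f in fragments]


def fuse(fragments):
    """Stand-in for the Image Fusion Service: all images become one composite."""
    return [{"name": "composite", "type": "png", "staged": True}]


COMPONENTS = {"transfer": transfer, "uncompress": uncompress,
              "convert": convert, "fuse": fuse}


def select_template(fragments):
    """Pick a template from the types of the result fragments."""
    kinds = {f["type"] for f in fragments}
    return "fuse_images" if kinds <= IMAGE_TYPES else "build_service"


def run_workflow(fragments, steps, components=COMPONENTS):
    """Apply each component in order; return the final fragments and a step log."""
    state, log = list(fragments), []
    for step in steps:
        state = components[step](state)
        log.append(step)
    return state, log
```

Keeping templates as data rather than code mirrors the role the text assigns to command.xml: new assembly workflows can be described by listing existing components in a different order.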


Figure 5. Internal organization of the map assembly services within a spatial wrapper-mediator system.

Conclusion

Techniques for assembling query results from spatial fragments into a composite map document are quite different from merging XML fragments into a single XML tree, which is a common component of XML-based information mediation. This paper reported our experiences developing grid services for assembling composite maps from heterogeneous fragments, which are retrieved from distributed grid sources. We outlined a range of scenarios of increasing complexity for handling different combinations of result fragments. Ultimately, the scenario requiring the most flexibility in map assembly leads to automatic generation of a grid service instance based on a transient ArcIMS service.

Acknowledgments

Support under US National Science Foundation grants #0121269 “ITR/IM: Enabling the Creation and Use of GeoGrids for Next Generation Geospatial Information” and #0205049 “ITR: GEON: The Geosciences Network: A Research Project to Develop Cyberinfrastructure for the Geosciences”, is gratefully acknowledged.

References

Baru, C., Gupta, A., Ludäscher, B., Marciano, R., Papakonstantinou, Y., Velikhov, P. and Chu, V. (1999). "XML Based Information Mediation with MIX". Proc. of the ACM SIGMOD 1999, pp. 597-599.

Foster, I., Kesselman, C. and Tuecke, S. (2001). “The Anatomy of the Grid: Enabling Scalable Virtual Organizations”. International J. Supercomputer Applications, 15(3).

Foster, I., Kesselman, C., Nick, J. and Tuecke, S. (2002). “The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration” (www.globus.org/research/papers/ogsa.pdf).

GGF (2003). Global Grid Forum (http://www.gridforum.org/).

GEON, the Geosciences Network (2003). www.geongrid.org

GLOBUS (2003). The GLOBUS Project (http://www.globus.org/).

Gupta, A., Marciano, R., Zaslavsky, I., and Baru, C. (1999). “Integrating GIS and Imagery through XML-Based Information Mediation”. In P. Agouris and A. Stefanidis (Eds.) Integrated Spatial Databases: Digital Images and GIS, Lecture Notes in Computer Science, Vol. 1737, pp. 211-234.

Lin, K., and Ludäscher, B. (2003). A System for Semantic Integration of Geologic Maps via Ontologies. In Semantic Web Technologies for Searching and Retrieving Scientific Data (SCISW), Sanibel Island, Florida, 2003.

Open GIS Consortium (2000). OpenGIS Web Map Server Interfaces Implementation Specification.

Open GIS Consortium (2001). Geography Markup Language (GML) 2.0.

Open GIS Consortium (2002). OpenGIS Web Feature Service Implementation Specification.

W3C (2001). Scalable Vector Graphics (SVG) 1.0 Specification, W3C Recommendation, 04 September 2001.

W3C (2003a). Web Services Description Language (WSDL) Version 1.2. W3C Working Draft 24 January 2003.

W3C (2003b). Simple Object Access Protocol, W3C Proposed Recommendation, 07 May 2003.

Zaslavsky, I. (2000). "A New Technology for Interactive Online Mapping with Vector Markup and XML". Cartographic Perspectives, # 37 (Fall 2000), pp. 65-77.

Zaslavsky, I., Memon, A., Petropoulos, M., and Baru, C. (2003). Online Querying of Heterogeneous Distributed Spatial Data on a Grid. Proceedings of Digital Earth’2003 Conference.

Author information:

Dr. Ilya Zaslavsky
Director, Spatial Information Systems Lab
San Diego Supercomputer Center, University of California San Diego
9500 Gilman Drive, La Jolla, CA 92093-0505, USA
Phone: 858 534 8342
Fax: 858 534 5113
E-mail: [email protected]

Ashraf Memon
Programmer/Analyst
San Diego Supercomputer Center, University of California San Diego
9500 Gilman Drive, La Jolla, CA 92093-0505, USA
Phone: 858 822 0017
Fax: 858 534 5113
E-mail: [email protected]