Spatial Query Broker in a Grid Environment

Author: Wladimir S. Meyer

Advisors: Jano M. Souza, Milton R. Ramirez

FEDERAL UNIVERSITY OF RIO DE JANEIRO

PESC - Programa de Engenharia de Sistemas e Computação



Outline

● Motivation and Goal

● The Problem

● Related Work

● The Proposal

● SQB Architecture

● Preliminary Tests

● Remarks


Motivation

The use of GIS, together with improvements in channel bandwidth, is spreading quickly, and the interactions between data producers and consumers are becoming more frequent, complex and dynamic.

Some key points in these relationships:

● Huge amounts of data spread across many different geographic locations

● Complexity of spatial data

● Demand for sophisticated services delivered over the web

● The high price that shared resources may have in some federations (CPU time, storage space, ...)

● Integration problems (many levels of heterogeneity)

Distributed spatial operations, and methods to improve their efficiency, play an important role in this context.

There are many works on spatial operations in a centralized context, but few in a distributed context.

The Grid computing paradigm brings together many characteristics that can improve the execution of distributed spatial operations.


Goal

This work aims to improve the efficiency of distributed spatial joins by means of an architecture that allows non-specialized computers to take part in the execution of the operation, reducing the overall response time.

The focus is on the spatial join because it is a very common operation in GIS and has a high processing cost.

The architecture also provides the conditions for experimenting with new algorithms (filter/refine, scheduler, ...).

[Figure labels: SDBMS, Spatial Join]


The Problem

How can a spatial join be carried out over a pool of data providers that share a huge amount of spatial data, while keeping the response time below a limit set by some quality criterion?

The data fragmentation may be spatial and/or thematic (i.e. a hybrid schema), and there are local spatial indexes on each dataset.

This scenario could be illustrated by a pool of regional government agencies that are responsible for generating cartographic data and that offer query services over their data by means of the Internet.

Themes related to:

● Transport

● Hydrography

● Infrastructure, ...


Related Work

Many important works in spatial query processing are based on the filter/refine strategy [5]. Some of them are listed below:

● Multi-step processing of spatial joins, Brinkhoff et al. [6]

● Raster signatures in spatial joins (4CRS), Zimbrão et al. [30]

● Multi-step processing with remote indexes (MR2), Ramirez and Souza [26]

On the other hand, the execution of the query plan in a distributed context may exploit parallelism as a way to reduce the overall response time.

● MR2, Ramirez [26]

● Grid Greedy Node, Porto et al. [25]

● OGSA-DQP, Smith et al. [27]

Some of these strategies need a scheduler module, which should guarantee an adequate load balance among the selected local SDBMSs.


The Proposal

In this work, the grid's ability to offer resources on demand is used to reduce the overall response time of distributed spatial join operations in databases.

In previous works, the parallelism involves only the nodes that store the spatial data mentioned in the query. Our proposal is to also involve generic computational resources in the most expensive step of the filter/refine strategy: the exact geometry processing.

[Figure: multi-step filter/refine strategy [6]. Dataset 1 and Dataset 2 go through the MBR join, then geometric filtering, then exact processing, which produces the results. Filtering step: SDBMSs. Exact processing step: generic computational resources.]
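The two-step idea can be illustrated with a minimal Python sketch. It is not the SQB implementation: the function names are hypothetical, the filter is a plain nested loop over MBRs (no R-Tree, no 4CRS signatures), and the exact test assumes convex polygons (such as triangles) so that a separating-axis check is sufficient.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Polygon = Sequence[Point]                # vertices in order; assumed convex
MBR = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def mbr(poly: Polygon) -> MBR:
    """Minimum bounding rectangle of a polygon."""
    xs = [x for x, _ in poly]
    ys = [y for _, y in poly]
    return (min(xs), min(ys), max(xs), max(ys))


def mbrs_overlap(a: MBR, b: MBR) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def filter_step(r: List[Polygon], s: List[Polygon]) -> List[Tuple[int, int]]:
    """Cheap step: keep the index pairs whose MBRs overlap."""
    boxes_r = [mbr(p) for p in r]
    boxes_s = [mbr(q) for q in s]
    return [(i, j)
            for i, br in enumerate(boxes_r)
            for j, bs in enumerate(boxes_s)
            if mbrs_overlap(br, bs)]


def convex_overlap(p: Polygon, q: Polygon) -> bool:
    """Exact test for convex polygons (separating axis theorem).
    Polygons that merely touch are reported as overlapping."""
    for poly in (p, q):
        n = len(poly)
        for k in range(n):
            (x1, y1), (x2, y2) = poly[k], poly[(k + 1) % n]
            ax, ay = y1 - y2, x2 - x1  # normal to the current edge
            proj_p = [x * ax + y * ay for x, y in p]
            proj_q = [x * ax + y * ay for x, y in q]
            if max(proj_p) < min(proj_q) or max(proj_q) < min(proj_p):
                return False  # a separating axis exists
    return True


def spatial_join(r: List[Polygon], s: List[Polygon]) -> List[Tuple[int, int]]:
    """Filter on MBRs first, then refine only the surviving candidates."""
    candidates = filter_step(r, s)
    return [(i, j) for i, j in candidates if convex_overlap(r[i], s[j])]


tri_a = [(0, 0), (4, 0), (0, 4)]
tri_b = [(1, 1), (5, 1), (1, 5)]            # really overlaps tri_a
tri_c = [(3.5, 3.5), (6, 3.5), (3.5, 6)]    # MBR overlaps tri_a, geometry does not
tri_d = [(10, 10), (11, 10), (10, 11)]      # rejected by the MBR filter

print(filter_step([tri_a], [tri_b, tri_c, tri_d]))   # candidate pairs
print(spatial_join([tri_a], [tri_b, tri_c, tri_d]))  # exact results
```

The pair (tri_a, tri_c) survives the filter but is discarded by the exact test; it is this second, more expensive step that the proposal moves to generic computational resources.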


The Proposal

The following picture gives an overview of the context:

[Figure: overview of the context. Labels: User; WEB Portal (authentication, proxy generation, query); meta-schedulers (Spatial Query Broker, Resource Broker); auxiliary services (Information Service (MDS), Replica Location Service); API Information; Virtual Organization with computing elements CE1 ... CEn, divided into specialized CEs (OGSA-DAI, three instances) and generic CEs. Annotation: "Receives the global query and checks the user rights".]


The Proposal

A specialized meta-scheduler, named Spatial Query Broker (SQB), is proposed to handle all spatial query processing, in a way similar to conventional Resource Brokers in grid environments.

Item                 SQB         OGSA-DQP    GridWay        WMS
Unit of work         Query       Query       Job            Job
App domain           Databases   Databases   Generic jobs   Generic jobs
Dynamic scheduling   Yes         No          Yes            No
Spatial queries?     Yes         No          -              -
Use generic nodes?   Yes         No          -              -


SQB Architecture

[Figure: SQB architecture. The Spatial Query Broker (Coordinator, Query Decomp., Locator, Optimizer, Scheduler, Execution Monitor) works with the MDS and with the grid resources shared by organizations: SDBMS 1 ... SDBMS n and CE n ... CE m.]

The SQB is composed of the following modules:

● Coordinator: manages all data flow and the sequence of events

● Query Decomp.: analysis and simplification of the query

● Locator: finds the data providers that store the needed data and acquires the status of the CEs

● Optimizer: selects the SDBMSs and manages the filtering steps

● Scheduler: manages the exact geometry step over the CEs

● Execution Monitor: is the interface with the CEs, submitting and monitoring tasks

● MDS: delivers information about resources and data partitioning


SQB Architecture

[Figure: steps managed by the optimizer. For a region r, Theme 1 (T1) and Theme 2 (T2) are stored in SDBMSs (shown with CE1 and CE2); MBRs + 4CRS are transferred, and an MBR join plus geometric filtering is executed in an SDBMS. The inconclusive pairs and some positive hits (ids + # vertices) are delivered to the Execution Monitor.]


SQB Architecture

● The Execution Monitor builds two queues to store the inconclusive pairs in order to deliver them to the CEs.

● One of the queues is shared among the faster CEs, the other among the slower ones.

● The total number of vertices is adopted as an indicator of the complexity of the processing.

● A throughput indicator is collected beforehand from the CEs and registered in the Information Service (MDS).

● It isn't necessary to sort the pairs.

[Figure: the Execution Monitor splits the pairs (Pair 1, Pair 2, ..., Pair j, Pair j+1, Pair j+2) into Queue 1 and Queue 2 according to #vertices >= threshold or #vertices < threshold. The Scheduler serves the queues with the CEs (CE 1, CE 2, ..., CE n, CE n+1), grouped by Throughput(CEi) >= t or Throughput(CEi) < t.]
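A minimal Python sketch of the two-queue idea follows. The names are hypothetical, and two points are assumptions rather than something the slides state: that the pairs with more vertices go to the faster CEs, and that pairs are handed out round-robin inside each group (a real broker would more likely let idle CEs pull work).

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Pair:
    id_r: int
    id_s: int
    n_vertices: int  # total number of vertices of the two geometries


def build_queues(pairs: Iterable[Pair],
                 vertex_threshold: int) -> Tuple[Deque[Pair], Deque[Pair]]:
    """Split the inconclusive pairs by complexity; no sorting is needed."""
    heavy: Deque[Pair] = deque()
    light: Deque[Pair] = deque()
    for p in pairs:
        (heavy if p.n_vertices >= vertex_threshold else light).append(p)
    return heavy, light


def assign(pairs: Iterable[Pair], throughput: Dict[str, float],
           vertex_threshold: int, t: float) -> Dict[str, List[Pair]]:
    """Assumption: heavy pairs go to CEs with throughput >= t, light pairs
    to the others; round-robin inside each group."""
    if not throughput:
        raise ValueError("at least one CE is required")
    heavy, light = build_queues(pairs, vertex_threshold)
    fast = [ce for ce, v in throughput.items() if v >= t]
    slow = [ce for ce, v in throughput.items() if v < t]
    plan: Dict[str, List[Pair]] = {ce: [] for ce in throughput}
    for queue, group in ((heavy, fast), (light, slow)):
        workers = group or list(throughput)  # fall back if a group is empty
        i = 0
        while queue:
            plan[workers[i % len(workers)]].append(queue.popleft())
            i += 1
    return plan


pairs = [Pair(1, 1, 500), Pair(2, 3, 20), Pair(4, 5, 900), Pair(6, 7, 10)]
plan = assign(pairs, {"ce1": 10.0, "ce2": 2.0}, vertex_threshold=100, t=5.0)
for ce, work in plan.items():
    print(ce, [p.n_vertices for p in work])
```

Because each pair is only compared against a threshold, building the queues is a single pass over the pairs, which is consistent with the remark that sorting is unnecessary.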


SQB Architecture

[Figure: simplified sequence diagram with four phases: acquire information, prepare query, filtering step, refining step.]


Preliminary Tests

Although the prototype is still under construction, a few tests were run with synthetic spatial datasets consisting of polygons, in order to obtain some relative parameters to guide our work on spatial joins among polygons (overlap predicate).

Spatial join operations were performed on servers that have both datasets indexed with R-Trees.

The original datasets were partitioned into four and nine regular parts, and the response time (RT) in each situation was measured:

RT = TMSG * #messages + TTX * #bytes + TCPU + TI/O

Objects that cross partition boundaries were replicated in the datasets involved (they were not split).

The tests were executed in three situations:

● The whole query at once in a single SDBMS

● The query over the same region broken into four parts and executed by four identical machines

● The query over the same region broken into nine parts and executed by nine identical machines
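As a worked example of the RT expression, the sketch below plugs in the figures reported for the single-SDBMS (FULL) case in the results table of these slides. The units are not stated in the slides: reading the costs as milliseconds, the result set size as bytes, and 256 kbps as 256 * 1024 bit/s happens to reproduce the reported communication cost, but this is an inference. Neglecting the per-message term (TMSG * #messages) is also an assumption.

```python
def response_time(n_messages: int, n_bytes: int, t_cpu: float, t_io: float,
                  t_msg: float, t_tx: float) -> float:
    """RT = TMSG * #messages + TTX * #bytes + TCPU + TI/O (one time unit)."""
    return t_msg * n_messages + t_tx * n_bytes + t_cpu + t_io


# Assumption: 256 kbps read as 256 * 1024 bit/s, times in milliseconds.
T_TX_MS_PER_BYTE = 8 * 1000 / (256 * 1024)

# FULL case from the results table: result set size, CPU cost and I/O cost.
# Assumption: the per-message term is neglected (t_msg = 0).
comm_full = T_TX_MS_PER_BYTE * 2538685
rt_full = response_time(n_messages=0, n_bytes=2538685,
                        t_cpu=2213416, t_io=735,
                        t_msg=0.0, t_tx=T_TX_MS_PER_BYTE)

print(round(comm_full))  # 77475, the reported communication cost
print(round(rt_full))    # 2291626, the reported total cost
```

Under these assumptions the CPU term accounts for well over 90% of the total, which matches the observation that the operation is CPU bound.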


Preliminary Tests

[Figure: Theme 1 and Theme 2, each partitioned into nine regions (1 to 9) and into four regions (NW, NE, SW, SE), with the window W. Labels: "WP = intersection" and "WQuery =" (the expressions are not recoverable).]


Preliminary Tests

# of SDBMS   Region   # obj   Resultset size   CPU cost   I/O cost   Comm cost*   Total cost   Speedup
1 SDBMS      FULL     10912   2538685          2213416    735        77475        2291626      1.00
4 SDBMS      NW       2385    533445           133680     172        16279        150131       11.94
             NE       2706    628885           147175     194        19192        166561
             SW       3154    734406           169332     233        22412        191977
             SE       2747    640873           148509     209        19558        168276
             Total    10992 (+ T remove replicas)
9 SDBMS      1        1065    246074           24794      65         7510         32369        45.07
             2        922     212832           24820      64         6495         31379
             3        1167    269244           29273      85         8217         37575
             4        1140    266527           28396      81         8134         36611
             5        1338    312329           34590      97         9532         44219
             6        1252    292853           29057      90         8937         38084
             7        1440    336147           35398      96         10258        45752
             8        1547    360394           39730      113        10998        50841
             9        1138    265508           27332      81         8103         35516
             Total    11009 (+ T remove replicas)

* Communication cost based on a 256 kbps bandwidth

RT = TMSG * #messages + TTX * #bytes + TCPU + TI/O

This operation is CPU bound, and the communication cost has a low impact on the final response time.


Preliminary Tests

[Figure: chart with the number of servers (1, 4, 9) on the x-axis and a y-axis from 0 to 50; legend: Proc/Comm, speedup.]

Speedup = RT(single) / RT(parallel)

The processing cost and the communication cost tend toward the same order of magnitude as the number of servers increases.

The superlinear speedup means, in this case, that the computational resources available in a single machine were insufficient to reach a good response time.
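The reported speedups can be reproduced from the total costs in the results table of these slides with a short Python sketch. It assumes that the parallel response time is the total cost of the slowest partition and that the time to remove replicas is ignored; the slides do not state this explicitly, but the assumption yields the reported values.

```python
from typing import Iterable

RT_SINGLE = 2291626  # total cost of the FULL query in a single SDBMS

# Total cost per partition, as reported in the results table.
RT_FOUR = {"NW": 150131, "NE": 166561, "SW": 191977, "SE": 168276}
RT_NINE = {1: 32369, 2: 31379, 3: 37575, 4: 36611, 5: 44219,
           6: 38084, 7: 45752, 8: 50841, 9: 35516}


def speedup(rt_single: float, rt_partitions: Iterable[float]) -> float:
    """Speedup = RT(single) / RT(parallel).
    Assumption: RT(parallel) is the cost of the slowest partition."""
    return rt_single / max(rt_partitions)


print(round(speedup(RT_SINGLE, RT_FOUR.values()), 2))  # 11.94
print(round(speedup(RT_SINGLE, RT_NINE.values()), 2))  # 45.07
```

Both values exceed the number of servers (4 and 9), which is what makes the speedup superlinear.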


Test conditions

The preliminary tests were executed under the following conditions:

● Spatial Database: Secondo

● Grid Middleware: Globus GT4

● Datasets: two indexed datasets composed of 10060 triangles

● Hardware: Sempron 2800, 1 GB RAM, 80 GB HD

● OS: Fedora Linux

The overall architecture is under construction and is based on web services (WSRF).


Remarks

This work presents an architecture, based on grid infrastructure, tailored to cover some of the needs of a distributed geographic information system.

The focus was on offering a strategy to execute spatial queries over spatial databases managed by several organizations gathered in a federation.

The filter/refine approach was adopted, trying to take advantage of the spatial indexes that already exist on the datasets.

A global ID structure must be proposed in order to:

● Easily reduce the repeated processing of objects that cross boundaries after the filtering step (avoiding moving them unnecessarily to the CEs)

● Isolate the processing in the SQB from local IDs, improving scalability
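One way to picture the role of a global ID is the Python sketch below. The structure of the ID (a source name plus an object number) is purely hypothetical, since the slides leave the global ID structure as an open item. The point illustrated is only that, once replicated objects carry the same global ID in every partition, duplicate candidate pairs can be dropped before they are sent to the CEs.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class GlobalId(NamedTuple):
    source: str     # hypothetical: dataset or theme that owns the object
    object_id: int  # hypothetical: identifier unique within the source


CandidatePair = Tuple[GlobalId, GlobalId]


def dedupe_pairs(
        pairs_by_partition: Iterable[Iterable[CandidatePair]]
) -> List[CandidatePair]:
    """Keep each candidate pair once, even when replicated objects made it
    appear in the filtering results of several partitions."""
    seen: Set[CandidatePair] = set()
    unique: List[CandidatePair] = []
    for partition_pairs in pairs_by_partition:
        for pair in partition_pairs:
            if pair not in seen:
                seen.add(pair)
                unique.append(pair)
    return unique


a = (GlobalId("theme1", 7), GlobalId("theme2", 3))  # crosses the NW/NE border
b = (GlobalId("theme1", 8), GlobalId("theme2", 3))
c = (GlobalId("theme1", 9), GlobalId("theme2", 4))

nw_pairs = [a, b]
ne_pairs = [a, c]
print(dedupe_pairs([nw_pairs, ne_pairs]))  # a appears only once
```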

Next steps:

● Specify new cost models to help the optimizer and the scheduler, taking into account the dynamics of the environment

● Research the scheduling process in order to improve the reliability of the architecture

● Compare the response time of a join executed over a benchmark dataset with that of the same join executed in similar distributed environments


References

1. Adzigogov, L., Soldatos, J., and Polymenakos, L. (2005). "EMPEROR: An OGSA Grid Meta-Scheduler based on Dynamic Resource." Journal of Grid Computing, 3, 19-37.

2. Afgan, E. (2004). "Role of the Resource Broker in the Grid." ACM, Huntsville, Alabama, USA.

3. Andretto, P., et al. (2004). "Practical approaches to Grid workload and resource management in the EGEE project."

4. Azevedo, L. G., Monteiro, R. S., Zimbrão, G., and Souza, J. M. (2004). "Approximate Spatial Query Processing Using Raster Signature."

5. Brinkhoff, T., Kriegel, H., and Seeger, B. (1993). "Efficient Processing of Spatial Joins Using R-Trees." In: Proceedings of the 1993 ACM SIGMOD, Washington, DC.

6. Brinkhoff, T., Kriegel, H., and Schneider, R. (1994). "Multi-Step Processing of Spatial Joins." Washington, DC - USA, 237-246.

7. Buyya, R., and Venugopal, S. (2004). "The Gridbus Toolkit for Service Oriented Grid and Utility Computing: An Overview and Status Report."

8. Câmara, G., and Queiroz, G. (2002). "GeoBR: Intercâmbio Sintático e Semântico de Dados Espaciais."

9. Di, L., Chen, A., Yang, W., and Zhao, P. (2003). "The Integration of Grid Technology with OGC Web Services (OWS) in NWGISS for NASA EOS Data."

10. EGEE (2006). "GLite - Installation and Configuration Guide v 3.0 (rev 2)." European Union.

11. Egenhofer, M. J., and Herring, J. R. (1994). "Categorizing Binary Topological Relations Between Regions, Lines and Points in Geographical Databases." NCGIA.

12. "Globus Toolkit 4." (2005). www.gridbus.org/escience/051205GlobusTutorialeScience.ppt, July/2006.

13. Foster, I., and Kesselman, C. (1999). "Computational grids." The Grid: Blueprint for a New Computing Infrastructure, Morgan Kaufmann.

14. Foster, I., Kesselman, C., and Tuecke, S. (2001). "The Anatomy of the Grid: Enabling Scalable Virtual Organizations." Lecture Notes in Computer Science, 2150.

15. Gustafson, J. L. (1990). "Fixed Time, Tiered Memory, and Superlinear Speedup."


16. GridWay Team (2006). "GridWay 5 Documentation: User Guide." Madrid, Spain, Universidad Complutense de Madrid.

17. Güting, R. H., Behr, T., Almeida, V., Ding, Z., Hoffmann, F., and Spiekermann, M. (2004). "Secondo: An Extensible DBMS Architecture and Prototype." Hagen, Germany, Fernuniversität Hagen.

18. Hanssen, G. (2005). "The Filter/Refine Strategy: A Study on the Land-Use Resource Dataset in Norway."

19. Ilya, Z., Memon, A., Petropoulos, M., and Baru, C. (2003). "Online Querying of Heterogeneous Distributed Spatial Data on a Grid." Brno, Cz, 813-823.

20. Kang, M.-S., and Choy, Y.-C. (2002). "Deploying parallel spatial join algorithm for network environment." IEEE, 177-181.

21. Meyer, W. S., and Souza, J. M. (2006). "Overlapped Regions with Distributed Spatial Databases in a Grid Environment." Rio de Janeiro, Brazil.

22. Meyer, W. S., Souza, J. M., and Ramirez, M. R. (2005). "Secondo-grid: An Infrastructure to Study Spatial Databases in Computational Grids." Campos do Jordão, SP, Brazil.

23. Mondal, A., Goda, K., and Kitsuregawa, M. (2003). "Effective Load-Balancing via Migration and Replication in Spatial Grids." Lecture Notes in Computer Science, 2736, 202-211.

24. Özsu, M. T., and Valduriez, P. (2001). "Principles of Distributed Database Systems." Prentice-Hall.

25. Porto, F., Silva, V. F. V., Dutra, M. L., and Schulze, B. (2005). "An adaptive distributed query processing grid service." Trondheim, Norway.

26. Ramirez, M. R. (2001). "Spatial Distributed Query Processing." Rio de Janeiro, RJ, COPPE/UFRJ.

27. Smith, J., Gounaris, A., Watson, P., Paton, N. W., Fernandes, A. A. A., and Sakellariou, R. (2002). "Distributed Query Processing on the Grid."

28. "OGSA-DQP 3.1 User's Documentation." (2006). http://www.ogsadai.org.uk/documentation/ogsa-dqp_3.1/, July/2006.

29. Venugopal, S., Buyya, R., and Winton, L. (2004). "A Grid Service Broker for Scheduling Distributed Data-Oriented Applications on Global Grids."

30. Zimbrão, G., and Souza, J. M. (1998). "A Raster Approximation for the Processing of Spatial Joins." New York - USA, 558-569.


Thank you!