Spatial databases Introduction

Introduction - Universitatea Babeş-Bolyaihorea/CABD/Lectures/Lecture8.pdf · A spatial database system is a database system A DBMS with additional capabilities for handling spatial


Spatial databases

Introduction

Spatial Data

● “Also known as geospatial data or geographic information, it represents the data or information that identifies the geographic location of features and boundaries on Earth, such as natural or constructed features, oceans, and more. Spatial data is usually stored as coordinates and topology, and is data that can be mapped. Spatial data is often accessed, manipulated or analyzed through Geographic Information Systems” (Webopedia)

Spatial Data Applications

● GIS applications (maps):
  ○ urban planning
  ○ route optimization
  ○ fire or pollution monitoring
  ○ utility networks

● Other applications:
  ○ VLSI design
  ○ CAD/CAM
  ○ models of the human brain

● Traditional applications:
  ○ multidimensional records

What Is GIS?

● “GIS is a computer system capable of assembling, storing, manipulating, and displaying geographically referenced information, i.e. data identified according to their locations.”

● “A GIS is an organized collection of computer hardware, software, geographic data, and personnel to efficiently capture, store, update, manipulate, analyze, and display all forms of geographically referenced information.”

● “An information system that is designed to work with data referenced by spatial or geographic coordinates. In other words, a GIS is both a system with specific capabilities for spatially-referenced data, as well as a set of operations for working [analysis] with the data.”

● “Automated systems for the capture, storage, retrieval, analysis, and display of spatial data.”

GIS

● Computer-based systems for the management, display, and analysis of geographic data

● Things a GIS can do:
  ○ Locate geographic features based on their properties (e.g., locate cities within a region with a population greater than X).
  ○ Identify properties of geographic features based on their location.
  ○ Determine a good location for a mall, based on demographics and land use/availability.
  ○ Generate optimal routing and scheduling for a delivery truck.
  ○ Determine the ground area covered by a new cellular phone network, and locate "holes" in cellular coverage.
  ○ Make colorful, interesting, and informative maps.

Principle

● Data Capture
  ○ Data are obtained by manual digitization and by scanning aerial photographs, paper maps, and existing digital data sets.

● Database Management and Update
  ○ Data storage and retrieval, data security, data integrity, and data maintenance.

● Geographic Analysis
  ○ The collected information is analyzed and interpreted qualitatively and quantitatively.

● Preparing Results
  ○ The information can be presented in a variety of ways.

Components

● Hardware
  ○ Computer system, scanner, printer, plotter, flat board

● Software
  ○ GIS software (MapInfo, ARC/Info, AutoCAD Map). The available software tends to be application-specific.

● Data
  ○ A GIS integrates spatial data with other data resources and can even use a DBMS, which most organizations already use to maintain their data, to manage spatial data.

● People

● Method
  ○ Maps can be created either by automated raster-to-vector conversion or by manual vectorization of scanned images.

Advantages of GIS

● Improves organizational integration: GIS integrates hardware, software, and data for capturing, managing, analyzing, and displaying all forms of geographically referenced information.

● Helps you answer questions and solve problems by looking at your data in a way that is quickly understood and easily shared.

● GIS allows us to view, understand, question, interpret, and visualize data in many ways that reveal relationships, patterns, and trends in the form of maps, globes, reports, and charts.

● GIS technology can be integrated into any enterprise information system framework.

● More employment opportunities

Disadvantages of GIS

● Expensive software
● Requires large amounts of input data to be practical for some tasks
● The Earth is round, so geographic error increases as the mapped area gets larger.
● GIS layers may lead to costly mistakes when property agents interpret a GIS map, or engineers design around GIS utility lines.

Spatial DBMS

● A spatial RDBMS is an RDBMS that can process spatial data. Popular RDBMSs offer their own spatial features or add-ons so that spatial data can be processed.

● A spatial RDBMS lets you use standard SQL data types, such as int and varchar, alongside spatial data types, such as Point, Linestring and Polygon, for geometric calculations like distance or relationships between shapes.

● Some spatial databases handle more complex structures such as 3D objects, topological coverages, linear networks, and triangular irregular networks.

● Spatial RDBMSs are not the only spatial database management systems available. Other databases, such as MongoDB, and search engines, such as Lucene or Solr, also provide spatial data processing features.

SDBMS

● A spatial database system is a database system
  ○ A DBMS with additional capabilities for handling spatial data

● Offers spatial data types (SDTs) in its data model and query language
  ○ Structure in space: e.g., POINT, LINE, REGION
  ○ Relationships among them: (l intersects r)

● Supports SDTs in its implementation, providing at least
  ○ spatial indexing (retrieving objects in a particular area without scanning the whole space)
  ○ efficient algorithms for spatial joins (not simply filtering the Cartesian product)

SDBMS Features

● Spatial Measurements: compute line length, polygon area, and the distance between geometries
● Spatial Functions: modify existing features to create new ones
● Spatial Predicates: allow true/false queries about spatial relationships between geometries
● Geometry Constructors: create new geometries, usually by specifying the vertices (points or nodes) which define the shape
● Observer Functions: queries which return specific information about a feature
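
The feature categories above can be illustrated with a minimal Python sketch. It assumes planar coordinates and plain tuples/lists as geometries; the function names (`line_length`, `polygon_area`, `contains_point`) are illustrative and do not belong to any particular SDBMS.

```python
import math

Point = tuple[float, float]

def line_length(vertices: list[Point]) -> float:
    """Spatial measurement: total length of a polyline."""
    return sum(math.dist(a, b) for a, b in zip(vertices, vertices[1:]))

def polygon_area(ring: list[Point]) -> float:
    """Spatial measurement: area of a closed ring (first == last), shoelace formula."""
    twice = sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(ring, ring[1:]))
    return abs(twice) / 2

def contains_point(ring: list[Point], p: Point) -> bool:
    """Spatial predicate: ray-casting test; points exactly on the boundary are not handled specially."""
    x, y = p
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line through p
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Geometry constructor in its simplest form: list the vertices of the shape.
square = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (0.0, 0.0)]
```

A real system would expose these as SQL functions over its own geometry types; the sketch only shows what each category computes.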

Spatial Data Types

● Data consists of observations we make of the real world.

● Spatial data:
  ○ consists of observations with locations
  ○ identifies features and positions on the Earth’s surface
  ○ is how we put our observations on the map

● Vector and raster data are the two primary data types used in GIS. Both are tied to a spatial reference system, typically latitudes and longitudes that represent positions on Earth.

Vector Spatial Data Type

● Vector graphics are composed of vertices and paths.
● The three basic symbol types for vector data are points, lines and polygons (areas).
● Points as XY Coordinates
  ○ Vector points are simply XY coordinates. When features are too small to be represented as polygons, points are used.
  ○ Vector data are stored as pairs of XY coordinates (longitude and latitude) represented as a point.

Vector Spatial Data Type

● Lines As Connected Points
  ○ Vector lines connect vertices with paths. If you were to connect the dots in a particular order, you would end up with a vector line feature.
  ○ Lines usually represent features that are linear in nature.
  ○ They can exist in the real world, such as roads or rivers, or they can be artificial divisions such as regional borders or administrative boundaries.
  ○ Points are simply pairs of XY coordinates (longitude and latitude). When you connect each point or vertex with a line in a particular order, they become a vector line feature.

Vector Spatial Data Type

● Polygons As Closed Lines
  ○ When a set of vertices is joined in a particular order and closed, it becomes a vector polygon feature. To form a polygon, the first and last coordinate pairs must be the same and all other pairs must be unique.
  ○ Polygons represent features that have a two-dimensional area. Examples of polygons are buildings, agricultural fields and discrete administrative areas.
  ○ Cartographers use polygons when the map scale is large enough for features to be represented as areas.
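
The closure rule for polygons can be checked mechanically. The sketch below tests only the rule stated on this slide (first and last pair equal, all other pairs unique); it does not detect self-intersecting edges, and the function name `is_valid_ring` is an assumption made for illustration.

```python
def is_valid_ring(coords: list[tuple[float, float]]) -> bool:
    """Check the slide's polygon rule on a list of XY pairs."""
    if len(coords) < 4:            # a triangle needs 3 distinct vertices plus the closing one
        return False
    if coords[0] != coords[-1]:    # the ring must be closed
        return False
    interior = coords[:-1]         # drop the repeated closing vertex
    return len(set(interior)) == len(interior)
```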

Raster Spatial Data Type

● Raster data is made up of pixels (also referred to as grid cells). They are usually regularly spaced and square, but they don’t have to be. Rasters often look pixelated because each pixel is associated with a single value or class.

● Example: each pixel in a digital photograph is associated with a red, green and blue value, and each cell in a digital elevation model represents an elevation value.

● Raster models are useful for storing data that varies continuously, as in an aerial photograph, an elevation surface or a satellite image. Spatial accuracy, however, depends on the cell size.

● Raster data models can be discrete or continuous.

Raster Spatial Data Type

● Discrete rasters
  ○ Also referred to as thematic or categorical raster data. They have distinct themes or categories. For example, one grid cell represents a land cover class or a soil type.
  ○ In a discrete raster land cover/use map, you can distinguish each thematic class. Each class has a discretely defined beginning and end. The class fills the entire area of the cell.
  ○ Discrete data usually consists of integers that represent classes. For example, the value 1 might represent urban areas, the value 2 forest, etc.

Raster Spatial Data Type

● Continuous rasters
  ○ Grid cells with gradually changing data such as elevation, temperature or an aerial photograph. Continuous data is also known as non-discrete or surface data.
  ○ A surface can be derived from a fixed registration point. For example, a digital elevation model is measured from sea level. Each cell represents a value above or below sea level.
  ○ Phenomena can vary gradually along a continuous raster from a specific source. For example, in a raster depicting an oil spill, concentration is highest at the source and diffuses outwards with diminishing values as a function of distance.

Raster VS Vector

Vector: Advantages/Disadvantages

● Advantages:
  ○ Vector data is composed of paths, so graphical output is generally more aesthetically pleasing. It also gives higher geographic accuracy because the data isn’t dependent on grid size.
  ○ Topology rules can help maintain data integrity with vector data models. The vector data structure is the model of choice for efficient network analysis and proximity operations.

● Disadvantages:
  ○ Continuous data is poorly stored and displayed as vectors. Displaying continuous data as vectors would require substantial generalization.
  ○ Although topology is useful for vector data, it is often processing-intensive. Any feature edit requires an update of the topology. With many features, vector manipulation algorithms become complex.

Raster: Advantages/Disadvantages

● Advantages:
  ○ The raster grid format is the natural output format of satellite data. Raster positions are simple: given the cell size and a bottom-left coordinate, each cell position can be inferred.
  ○ Data analysis with raster data is usually quick and easy to perform. With map algebra, quantitative analysis is equally intuitive for discrete and continuous rasters.

● Disadvantages:
  ○ Graphic output and quality depend on cell size, and the result can have a pixelated look and feel. Linear features and paths are difficult to display, and their quality depends on spatial resolution.
  ○ Networks are difficult to establish. Storing multiple attribute fields is difficult, and maps are often restricted to displaying a single attribute field.
  ○ Raster datasets can become very large.
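
Two of the raster advantages above are easy to show in code: inferring a cell position from the bottom-left coordinate and cell size, and cell-by-cell map algebra. This is a minimal sketch that assumes square cells and rasters stored as lists of rows; `cell_center` and `local_op` are hypothetical helper names.

```python
from typing import Callable

Raster = list[list[float]]

def cell_center(col: int, row: int, origin_x: float, origin_y: float,
                cell_size: float) -> tuple[float, float]:
    """Center of cell (col, row); (origin_x, origin_y) is the bottom-left corner,
    col 0 is the leftmost column and row 0 the bottom row (assumed convention)."""
    return (origin_x + (col + 0.5) * cell_size,
            origin_y + (row + 0.5) * cell_size)

def local_op(a: Raster, b: Raster, op: Callable[[float, float], float]) -> Raster:
    """Local map algebra: apply op to each pair of aligned cells of two equal-shape rasters."""
    return [[op(x, y) for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]
```

For example, `local_op(elevation, offset, lambda x, y: x + y)` adds two aligned grids cell by cell.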

Vector or Raster?

● Deciding which spatial data type should be used to model real-world features is not always straightforward.

● The choice depends on how you conceptualize the feature:
  ○ Do you want to work with pixels or coordinates? Raster data works with pixels. Vector data consists of coordinates.
  ○ Do you want to scale your features? Vectors can scale objects up to the size of a billboard. You don’t get that type of flexibility with raster data.
  ○ Do you have restrictions on file size? Raster files can be larger than vector data sets covering the same phenomenon and area.

Spatial Reference

● Spatial DBMSs are most widely used in areas such as transportation (air/shipping) and astronomy. The calculations used for oil-pump construction and for setting flight courses can be explained in geometric terms as calculating the length of a line that connects one point to another on the Earth’s surface.

● Locations on the Earth’s surface are usually given as longitude and latitude.

● As seen in the figure, the distance spanned by one degree of longitude differs greatly depending on the latitude. One degree of longitude is 111.321 km at the equator, but only 55.802 km at latitude 60°.
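
The figures for one degree of longitude can be reproduced approximately with the standard formula for the radius of a parallel on an ellipsoid. The sketch below assumes the WGS84 ellipsoid constants; the slide's values appear to come from slightly different constants, so the results agree only to within a few metres.

```python
import math

WGS84_A = 6378137.0            # semi-major axis in metres (assumed ellipsoid)
WGS84_E2 = 0.00669437999014    # first eccentricity squared (assumed ellipsoid)

def km_per_degree_longitude(lat_deg: float) -> float:
    """Length in km of one degree of longitude at the given latitude."""
    phi = math.radians(lat_deg)
    radius_of_parallel = WGS84_A * math.cos(phi) / math.sqrt(1 - WGS84_E2 * math.sin(phi) ** 2)
    return math.radians(1.0) * radius_of_parallel / 1000.0

print(round(km_per_degree_longitude(0.0), 2))    # about 111.32
print(round(km_per_degree_longitude(60.0), 2))   # about 55.80
```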

Spatial Reference

● The three-dimensional surface of the Earth must be transformed (by projection) into a two-dimensional map for easy calculation of widths and distances. However, it is impossible to show a round Earth in two dimensions in a way that is both precise and simple.

● The most famous map projection in the world, the Mercator projection, is suitable for nautical purposes. Angles on the round Earth are portrayed identically on the flat surface; as a consequence, regions near the poles appear larger than they actually are.

● The Plate Carrée projection (also known as the equirectangular projection) is better than Mercator when calculating lengths or widths.
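
The two projections named above have simple closed forms on a sphere. The sketch below uses the textbook spherical formulas with an assumed sphere radius; it is meant only to show why Mercator stretches high latitudes while Plate Carrée keeps north-south spacing uniform.

```python
import math

R = 6378137.0  # assumed sphere radius in metres (illustrative choice)

def plate_carree(lon_deg: float, lat_deg: float) -> tuple[float, float]:
    """x and y are proportional to longitude and latitude."""
    return R * math.radians(lon_deg), R * math.radians(lat_deg)

def mercator(lon_deg: float, lat_deg: float) -> tuple[float, float]:
    """Spherical Mercator: y grows without bound towards the poles."""
    phi = math.radians(lat_deg)
    return R * math.radians(lon_deg), R * math.log(math.tan(math.pi / 4 + phi / 2))
```

Both functions give the same x for a given longitude; only the y coordinate differs, and the difference grows with latitude.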

Spatial Reference

● As the Plate Carrée projection figure shows, locations are projected onto a square grid of longitude and latitude. Distances along north-south lines are therefore preserved, although east-west distances are stretched away from the equator. The Plate Carrée projection is currently the de facto method in the GIS area and has been adopted by the Google Earth service.

● However, we still have to find the longitude and latitude. As shown in the Geographical Coordinate System figure, we need to know where the center of the Earth is in order to calculate the longitude and latitude using the meridian.

Spatial Reference

● The Earth is neither a perfect sphere nor a perfect ellipsoid. Moreover, positions on the Earth’s surface shift because of movement of the Earth’s crust: the surface moves about 1 cm every year even without a severe earthquake. The center of the Earth also changes over time.

● The difference between the center of a virtual globe and the center of the changing ellipsoidal globe is called the geocentric datum, or just the datum.

● Although a location on the surface stays the same, its longitude and latitude values can change depending on the datum. The datum currently most widely used as a “standard” is WGS84 (World Geodetic System 1984), which is the standard used in GPS.

Spatial Reference

● A spatial reference is a combination of a coordinate system, a projection and a datum. The EPSG (European Petroleum Survey Group) created an ID system for coordinate systems, projections and datums.

● When we say that EPSG:4326 is used as a spatial reference, we mean that the Plate Carrée projection is used with longitude/latitude values based on WGS84. EPSG:4326 is the spatial reference used for GPS satellites and NATO military purposes, and is the most widely used in the world.

Modeling

● Assuming a 2-D GIS application, two basic things need to be represented:
  ○ Objects in space: cities, forests, rivers - they represent single objects
  ○ Coverage/Field: says something about every point in space (e.g., partitions, thematic maps); spatially related collections of objects

Modeling

● Spatial primitives for objects:
  ○ Point: object represented only by its location in space (center of a town)
  ○ Line (actually a curve or polyline): representation of moving through or connections in space (road, river)
  ○ Region: representation of an extent in 2-D space (lake, city)

● Coverages
  ○ Partition: set of region objects that are required to be disjoint (adjacency or region objects with common boundaries) - thematic maps
  ○ Networks: graph embedded in the plane, consisting of a set of point (vertex) and line (edge) objects - highways, power supply lines, rivers

Modeling

● A sample spatial type system: EXT = {lines, regions}, GEO = {points, lines, regions}
  ○ Spatial predicates for topological relationships:
      inside: geo x regions → bool
      intersect, meets: ext1 x ext2 → bool
      adjacent, encloses: regions x regions → bool
  ○ Operations returning atomic spatial data types:
      intersection: lines x lines → points
      intersection: regions x regions → regions
      plus, minus: geo x geo → geo
      contour: regions → lines

Modeling

  ○ Spatial operators returning numbers:
      dist: geo1 x geo2 → real
      perimeter, area: regions → real

● Spatial operations on sets of objects
  ○ A spatial aggregate function: the geometric union of all attribute values, e.g. the union of a set of provinces determines the area of the country
      sum: set(obj) x (obj → geo) → geo
  ○ Determines, within a set of objects, those whose spatial attribute value has minimal distance from a geometric query object
      closest: set(obj) x (obj → geo1) x geo2 → set(obj)
  ○ Other complex operations: overlay, buffering, …
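
The signature of `closest` reads naturally as a higher-order function: a set of objects, a function that extracts each object's spatial attribute, and a query geometry. The sketch below restricts both geometries to points for brevity, which is an assumption made here and not part of the type system above.

```python
import math
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")
Point = tuple[float, float]

def closest(objs: Iterable[T], geo_of: Callable[[T], Point], query: Point) -> list[T]:
    """Return every object whose spatial attribute has minimal distance to query.
    A list is returned because several objects may tie for the minimum."""
    items = list(objs)
    if not items:
        return []
    best = min(math.dist(geo_of(o), query) for o in items)
    return [o for o in items if math.dist(geo_of(o), query) == best]

# Hypothetical objects: dictionaries with a "center" attribute.
towns = [
    {"name": "A", "center": (0.0, 0.0)},
    {"name": "B", "center": (3.0, 4.0)},
    {"name": "C", "center": (0.0, 10.0)},
]
nearest = closest(towns, lambda t: t["center"], (0.0, 5.0))
```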

Modeling

● Topological relationships (adjacent, inside, disjoint): invariant under topological transformations such as translation, scaling and rotation
● Direction relationships: above, below, or north_of, southwest_of
● Metric relationships: distance
● 6 valid topological relationships between two simple regions (no holes, connected): disjoint, in, touch, equal, cover, overlap
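
For the simplest kind of simple region, an axis-aligned rectangle, the six relationships can be decided by coordinate comparisons alone. The sketch below is an illustration under that assumption; the function `relate` describes the unordered pair, so "in" and "cover" do not say which rectangle is the inner one.

```python
Rect = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), with xmin < xmax and ymin < ymax

def relate(a: Rect, b: Rect) -> str:
    """Classify two rectangles as disjoint, touch, equal, in, cover or overlap."""
    if a == b:
        return "equal"
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # No common point at all.
    if ax2 < bx1 or bx2 < ax1 or ay2 < by1 or by2 < ay1:
        return "disjoint"
    # Common points exist, but the interiors do not intersect: only boundaries meet.
    if ax2 <= bx1 or bx2 <= ax1 or ay2 <= by1 or by2 <= ay1:
        return "touch"
    for inner, outer in ((a, b), (b, a)):
        strictly = (outer[0] < inner[0] and outer[1] < inner[1]
                    and inner[2] < outer[2] and inner[3] < outer[3])
        weakly = (outer[0] <= inner[0] and outer[1] <= inner[1]
                  and inner[2] <= outer[2] and inner[3] <= outer[3])
        if strictly:
            return "in"      # inner lies in the interior of outer
        if weakly:
            return "cover"   # inner lies in outer and shares part of its boundary
    return "overlap"
```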

Querying

● Two main issues:
  1. Connecting the operations of a spatial algebra (including predicates for spatial relationships) to the facilities of a DBMS query language. The fundamental spatial algebra operators are:
     ○ Spatial selection
     ○ Spatial join
     ○ Other (overlay, fusion)
  2. Providing graphical presentation of spatial data (i.e. results of queries), and graphical input of SDT values used in queries.

Querying

● Spatial selection: returns those objects that satisfy a spatial predicate with the query object

  All cities in Cluj county:

    SELECT sname FROM cities c
    WHERE c.center inside Cluj.area

  All rivers intersecting a query window:

    SELECT * FROM rivers r
    WHERE r.route intersects Window

  All big cities no more than 250 km from Cluj:

    SELECT cname FROM cities c
    WHERE dist(c.center, Cluj.center) < 250 and c.pop > 50k
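
The third query combines a spatial predicate with an ordinary attribute predicate. An in-memory Python sketch of the same selection is shown below; it assumes planar coordinates in kilometres and a hypothetical `City` record, and it scans every row, which is exactly the work a spatial index is meant to avoid.

```python
import math
from dataclasses import dataclass

@dataclass
class City:
    cname: str
    center: tuple[float, float]   # planar coordinates in km (assumption)
    pop: int

def big_cities_near(cities: list[City], query_center: tuple[float, float],
                    max_km: float = 250.0, min_pop: int = 50_000) -> list[str]:
    """Names of cities closer than max_km to query_center with more than min_pop inhabitants."""
    return [c.cname for c in cities
            if math.dist(c.center, query_center) < max_km and c.pop > min_pop]

# Made-up sample rows, for illustration only.
sample = [
    City("near_big", (100.0, 0.0), 120_000),
    City("near_small", (50.0, 50.0), 10_000),
    City("far_big", (400.0, 300.0), 900_000),
]
```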

Querying

● Spatial join: a join that compares any two joined objects based on a predicate on their spatial attribute values.

  For each river passing through Cluj, find all cities less than 250 km away:

    SELECT r.rname, c.cname, length(intersection(r.route, c.area))
    FROM rivers r, cities c
    WHERE r.route intersects Cluj.area and dist(r.route, c.area) < 250

Data Structures & Algorithms

● Implementation of spatial algebra in a manner integrated with DBMS query processing.
● This means not simply implementing atomic operations using computational geometry algorithms, but considering the use of the predicates within set-oriented query processing:
  ○ spatial indexing or access methods
  ○ spatial join algorithms

Data Structures

● The representation of a value of an SDT must be compatible with two different views:
  ○ DBMS perspective:
    ■ Same as attribute values of other types with respect to generic operations
    ■ Can have varying and possibly large size
    ■ Resides permanently on disk
    ■ Can efficiently be loaded into memory
    ■ Offers a number of type-specific implementations for generic operations needed by the DBMS (transformation functions from/to ASCII or graphic)

Data Structures

  ○ Spatial algebra implementation perspective:
    ■ Is a value of some programming language data type
    ■ Is some arbitrary data structure which is possibly quite complex
    ■ Supports efficient computational geometry algorithms for spatial algebra operations
    ■ Is not built to suit only one particular algorithm but is balanced to support many operations well enough

Spatial Indexing

● To make spatial selection (as well as other operations such as spatial joins, …) fast, a spatial index organizes space and the objects in it so that only a subset of the objects needs to be considered to answer a query.

● Two main approaches:
  ○ 1. Dedicated spatial data structures (e.g. R-tree)
  ○ 2. Spatial objects mapped to a 1-D space to utilize standard indexing techniques (e.g. B-tree)

Spatial Indexing

● Spatial data structures store either points or rectangles (for line or region values)
● Operations on those structures: insert, delete, member
● Query types for points:
  ○ Range query: all points within a query rectangle
  ○ Nearest neighbor: point closest to a query point
  ○ Distance scan: enumerate points in increasing distance from a query point
● Query types for rectangles:
  ○ Intersection query
  ○ Containment query

Spatial Indexing

● To keep the structures simple, approximations are used as keys:
  ○ 1. Continuous (e.g. bounding box)
  ○ 2. Grid (a geometric entity as a set of cells)

● Filter and refine strategy for query processing:
  ○ 1. Filter: returns a set of candidate objects, which is a superset of the objects fulfilling a predicate
  ○ 2. Refine: for each candidate, the exact geometry is checked
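
The filter and refine strategy can be sketched for the query "which polygons contain this point?". The filter step compares against bounding boxes only; the refine step runs an exact ray-casting test on the surviving candidates. The function names and the dictionary-of-rings layout are assumptions made for illustration.

```python
Point = tuple[float, float]
Ring = list[Point]  # closed ring: first vertex repeated at the end

def bbox(ring: Ring) -> tuple[float, float, float, float]:
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return min(xs), min(ys), max(xs), max(ys)

def contains_point(ring: Ring, p: Point) -> bool:
    """Exact test by ray casting (boundary points are not handled specially)."""
    x, y = p
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > y) != (y2 > y):
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside

def filter_and_refine(polygons: dict[str, Ring], p: Point) -> tuple[list[str], list[str]]:
    """Return (candidates after the filter step, hits after the refine step)."""
    candidates = []
    for name, ring in polygons.items():
        xmin, ymin, xmax, ymax = bbox(ring)
        if xmin <= p[0] <= xmax and ymin <= p[1] <= ymax:   # cheap approximation test
            candidates.append(name)
    hits = [name for name in candidates if contains_point(polygons[name], p)]
    return candidates, hits
```

In a real system the bounding boxes would be stored in the index, so the filter step would not touch the exact geometries at all.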

Spatial Indexing - 1 D

● One-dimensional embedding: z-order or bit-interleaving
  ○ Find a linear order for the cells of the grid while maintaining “locality” (i.e., cells close to each other in space are also close to each other in the linear order)
  ○ Define this order recursively for a grid that is obtained by hierarchical subdivision of space
● Any shape (approximated as a set of cells) over the grid can now be decomposed into a minimal number of cells
● For each spatial object, we can obtain a set of “spatial keys”
● Index: can be a B-tree over the lexicographically ordered list of the union of these spatial keys
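
Bit-interleaving can be written in a few lines. The sketch below computes the z-order key of a single grid cell from its integer column and row; placing the x bits at even positions and the y bits at odd positions is one common convention, chosen here as an assumption.

```python
def z_order(x: int, y: int, bits: int = 16) -> int:
    """Interleave the lowest `bits` bits of x and y into one integer key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)        # bit i of x goes to position 2i
        key |= ((y >> i) & 1) << (2 * i + 1)    # bit i of y goes to position 2i + 1
    return key

# Sorting the cells of a 2 x 2 grid by key traces the characteristic "Z" shape.
cells = sorted([(0, 0), (1, 0), (0, 1), (1, 1)], key=lambda c: z_order(*c))
```

Keys produced this way are plain integers, so they can be stored in an ordinary one-dimensional index such as a B-tree.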

Spatial indexing: 2-D points

● Data structures representing points have a much longer tradition:
  ○ Kd-tree and its extensions (KDB-tree and LSD-tree)
  ○ Grid file (organizing buckets into an irregular grid of pointers)
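
As an illustration of a point data structure, here is a minimal in-memory kd-tree with a range query. It is a sketch of the basic kd-tree only, built once from a static list of points; the disk-oriented extensions named above (KDB-tree, LSD-tree) are not covered.

```python
from dataclasses import dataclass
from typing import Optional

Point = tuple[float, float]

@dataclass
class Node:
    point: Point
    axis: int                  # 0 = split on x, 1 = split on y
    left: Optional["Node"]
    right: Optional["Node"]

def build(points: list[Point], depth: int = 0) -> Optional[Node]:
    """Build a kd-tree by splitting at the median, alternating the axis per level."""
    if not points:
        return None
    axis = depth % 2
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return Node(pts[mid], axis, build(pts[:mid], depth + 1), build(pts[mid + 1:], depth + 1))

def range_query(node: Optional[Node], lo: Point, hi: Point,
                out: Optional[list[Point]] = None) -> list[Point]:
    """Collect all points p with lo <= p <= hi in both coordinates."""
    if out is None:
        out = []
    if node is None:
        return out
    p, a = node.point, node.axis
    if lo[0] <= p[0] <= hi[0] and lo[1] <= p[1] <= hi[1]:
        out.append(p)
    if lo[a] <= p[a]:          # the query rectangle reaches into the left half-space
        range_query(node.left, lo, hi, out)
    if p[a] <= hi[a]:          # the query rectangle reaches into the right half-space
        range_query(node.right, lo, hi, out)
    return out
```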

Spatial Indexing: 2-D rectangles

● Spatial index structures for rectangles: unlike points, rectangles don’t fall into a unique cell of a partition and might intersect partition boundaries
  ○ Transformation approach: instead of k-dimensional rectangles, 2k-dimensional points are stored using a point data structure
  ○ Overlapping regions: partitioning of space is abandoned and bucket regions may overlap (e.g. R-tree and R*-tree)
  ○ Clipping: partitioning is kept; a rectangle that intersects partition boundaries is clipped and represented within each intersecting cell (e.g. R+-tree)
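
The transformation approach can be made concrete for k = 2. A rectangle (xmin, ymin, xmax, ymax) is treated as a single 4-dimensional point, and a rectangle intersection query becomes a range query over those points. The sketch below shows only this mapping; the helper names are hypothetical, and any point structure supporting range queries could evaluate the resulting box.

```python
import math

Rect = tuple[float, float, float, float]   # (xmin, ymin, xmax, ymax), also read as a 4-D point

def intersection_query_box(q: Rect) -> tuple[Rect, Rect]:
    """4-D range (lo, hi) containing exactly the rectangles that intersect q:
    xmin <= q.xmax, ymin <= q.ymax, xmax >= q.xmin, ymax >= q.ymin."""
    qx1, qy1, qx2, qy2 = q
    lo = (-math.inf, -math.inf, qx1, qy1)
    hi = (qx2, qy2, math.inf, math.inf)
    return lo, hi

def in_box(point4: Rect, lo: Rect, hi: Rect) -> bool:
    """Membership of a 4-D point in the axis-parallel box [lo, hi]."""
    return all(l <= v <= h for v, l, h in zip(point4, lo, hi))
```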

Spatial Join

● Traditional join methods such as hash join or sort/merge join are not applicable.
● Filtering the Cartesian product is expensive.
● Two general classes:
  ○ Grid approximation/bounding box
  ○ None/one/both operands are represented in a spatial index structure
● Grid approximations and overlap predicate:
  ○ A parallel scan of two sets of z-elements corresponding to two sets of spatial objects is performed
  ○ Too fine a grid: too many z-elements per object (inefficient)
  ○ Too coarse a grid: too many “false hits” in a spatial join

Spatial Join

● Bounding boxes: for two sets of rectangles R and S, find all pairs (r, s), r in R, s in S, such that r intersects s:
  ○ No spatial index on R or S: bb_join, which uses a computational geometry algorithm to detect rectangle intersections, similar to external merge sorting
  ○ Spatial index on either R or S: index join, which scans the non-indexed operand and, for each object, uses the bounding box of its SDT attribute as a search argument on the indexed operand (only efficient if the non-indexed operand is not too big; otherwise bb_join might be better)
  ○ Both R and S are indexed: synchronized traversal of both structures, so that pairs of cells of their respective partitions covering the same part of space are encountered together.
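
For the case without any index, a sort-based sweep gives the flavor of a bounding-box join. The sketch below is an in-memory forward-scan plane sweep over rectangles sorted by xmin; it is not the external-memory bb_join the slide refers to, only an illustration of the idea under the assumption that both inputs fit in memory.

```python
Rect = tuple[float, float, float, float]   # (xmin, ymin, xmax, ymax)

def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def bb_join(R: list[Rect], S: list[Rect]) -> list[tuple[Rect, Rect]]:
    """Report every pair (r, s) with r in R and s in S whose rectangles intersect."""
    R, S = sorted(R), sorted(S)            # tuples sort by xmin first
    out = []
    i = j = 0
    while i < len(R) and j < len(S):
        if R[i][0] <= S[j][0]:
            r, k = R[i], j
            # Scan forward in S while the x-intervals can still overlap.
            while k < len(S) and S[k][0] <= r[2]:
                if r[1] <= S[k][3] and S[k][1] <= r[3]:    # y-intervals overlap
                    out.append((r, S[k]))
                k += 1
            i += 1
        else:
            s, k = S[j], i
            while k < len(R) and R[k][0] <= s[2]:
                if s[1] <= R[k][3] and R[k][1] <= s[3]:
                    out.append((R[k], s))
                k += 1
            j += 1
    return out
```

Each intersecting pair is reported once, at the moment the rectangle with the smaller xmin is processed, so no duplicate elimination is needed.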