Transcript
Page 1: Extending a Relational Database Intelligent storage of geospatial data John.Pickford@uk.ibm

21/04/23 © 2004 IBM Corporation

21/04/23

Extending a Relational DatabaseIntelligent storage of geospatial data

[email protected]

Page 2: Extending a Relational Database Intelligent storage of geospatial data John.Pickford@uk.ibm

© 2004 IBM CorporationPage 2

Agenda

History

Current methods

Problems with current methods

Solution – extend database to support new data types

Timeseries

NAG

Spatial including Geodetic and GRID

Conclusion

Page 3: Extending a Relational Database Intelligent storage of geospatial data John.Pickford@uk.ibm

© 2004 IBM CorporationPage 3

IBM - Informix Dynamic Server

It was the Informix-IDS Database Informix database business bought by IBM in 2001

IDS and DB2 UDB are similar and are getting closer

Extensibility in Informix came from Stonebraker and Brown Ingres -> Postgres -> Illustra added to Informix

DB2 UDB extensibility the same in theory, DB2 Extenders equivalent to IDS DataBlades

Page 4: Extending a Relational Database Intelligent storage of geospatial data John.Pickford@uk.ibm

© 2004 IBM CorporationPage 4

Current Methods

Data held in a RDBMS Easy to manage

• SQL Easy to access from applications

• C- ESQL/C, CLI

• ODBC• JDBC

Easy to administer

• Standard tools• Trained staff

Many 3rd party products use RDBMS

Page 5: Extending a Relational Database Intelligent storage of geospatial data John.Pickford@uk.ibm

© 2004 IBM CorporationPage 5

Problems with RDBMS

Not good with complex data

A row is just a row, no concept of

an ordered set of rows

Restricted set of data types

Not good with “unusual” functions

What is unusual ?

• Seasonal averaging

• Multivariate analysis

• VWAP

Not good at loading lots of data

items

OK with high volume of data

Not so good with lots of small

rows, e.g. ticks, readings

High latency (time between data

arriving and being visible through

SQL queries)

Page 6: Extending a Relational Database Intelligent storage of geospatial data John.Pickford@uk.ibm

© 2004 IBM CorporationPage 6

Standard client-server

CLIENT

SERVER

DISK

CLIENT

MEMORY

Supports common data types Character

Numeric

• Integer• Float• Decimal

Datetime/interval

Large object

• Binary

• Text Collections

• List, set, multiset

Page 7: Extending a Relational Database Intelligent storage of geospatial data John.Pickford@uk.ibm

© 2004 IBM CorporationPage 7

Database can be extended – new data types

Distinct e.g. lengths, mass etc

Row e.g. address = houseNumber + road + town + postCode

Opaque User defines how the data is stored

• In row• Binary large object• External (file or some external process)

Once created it is like an additional built-in type

Page 8: Extending a Relational Database Intelligent storage of geospatial data John.Pickford@uk.ibm

© 2004 IBM CorporationPage 8

Opaque data type implementation(1)

Opaque types are implemented by a series of functions Cast functions

• lvarchar -> myType• myType -> lvarchar

Comparison function

• B-tree index support• R-tree index support

Mathematical operators

• + - x /

Page 9: Extending a Relational Database Intelligent storage of geospatial data John.Pickford@uk.ibm

© 2004 IBM CorporationPage 9

Opaque data type implementation(2)

Functions implemented in C

Java

SPL Made known to the server by an SQL statement Functions usually execute as part of the main database server

process Types handled correctly by SQL layer

Optimiser

Indexing

Mathematical operations

Aggregates

Page 10: Extending a Relational Database Intelligent storage of geospatial data John.Pickford@uk.ibm

© 2004 IBM CorporationPage 10

Opaque data type implementation(3)

Simple example, complex numbers Stored as 2 double precision numbers

Function to convert the string “12.34 + 56.78i” to a C structure containing 2 floats

Function to convert a C structure with 2 floats into the string “12.34 + 56.78i”

Functions to add, subtract, multiply and divide complex numbers

• complex_t *Plus(complex_t, complex_t) Can now be used in SQL statements:

• create table (u complex_t);• insert into table values (“1 + 2i”);• select u * u from tab;

- “-3 + 4i” Comparisons OK, what about ordering ?

Page 11: Extending a Relational Database Intelligent storage of geospatial data John.Pickford@uk.ibm

© 2004 IBM CorporationPage 11

Virtual Table/Index Interface

Tables can be replaced by functions (VTI) Insert function

Select function

• First, next, previous, nth, last Delete function

Can create own indexing methods Specialist area

Example of VTI – can make Google look like a database table:

select * from google

where search = “IBM+IDS+TimeSeries”;

Page 12: Extending a Relational Database Intelligent storage of geospatial data John.Pickford@uk.ibm

© 2004 IBM CorporationPage 12

Problems storing TimeSeries data in RDBMS

“Tall-thin” tables, primary key roughly same size as data Each row selected can cause a disk read Extra indexing often added for index only reads

Solutions include storing more than one element per row - makes

management difficult Using a separate timeseries database – expensive, difficult

to manage

Page 13: Extending a Relational Database Intelligent storage of geospatial data John.Pickford@uk.ibm

© 2004 IBM CorporationPage 13

Timeseries – the problem Ordinary RDBMs store the values in rows, these rows are stored

randomly in tables even though access is usually in time order. You need to create an index to store the key value for each row in time order. This takes more space, data access can be slow with over complicated SQL statements.

Stock Timestamp Price Vol

ABC 2002-06-14 12:00:01 13.80 1000

XYZ 2002-06-14 12:00:01 98.76 50

ABC 2002-06-14 12:00:00 13.45 100

ABC 2002-06-14 12:00:00 4

ABC 2002-06-14 12:00:01 1

XYZ 2002-06-14 12:00:01 2

ABC 2002-06-14 12:00:00 13.45 100 4

ABC 2002-06-14 12:00:01 13.80 1000 1

XYZ 2002-06-14 12:00:01 98.76 50 2

Page 14: Extending a Relational Database Intelligent storage of geospatial data John.Pickford@uk.ibm

© 2004 IBM CorporationPage 14

Timeseries – the solution

Store elements as vectors in time order

Stock Time series

ABC (2002-06-14 12:00:00 13.45 100),(2002-06-14 12:00:01 13.80 1000),…

XYZ (2002-06-14 12:00:01 98.76 50),…

Page 15: Extending a Relational Database Intelligent storage of geospatial data John.Pickford@uk.ibm

© 2004 IBM CorporationPage 15

ORDBMS - Extended to Support TimeSeries

CLIENT

DISK

CLIENT

MEMORY

SERVER

TimeSeries

Supports common data types Supports extended data types and functionality

TimeSeries

Page 16: Extending a Relational Database Intelligent storage of geospatial data John.Pickford@uk.ibm

© 2004 IBM CorporationPage 16

Advantages of TimeSeries

Performance (time/space)

Write SQL functions that work on TimeSeries

Space Time

Traditional (1) 516 2.39

Traditional (2) 903 0.43

TimeSeries 269 0.09

Page 17: Extending a Relational Database Intelligent storage of geospatial data John.Pickford@uk.ibm

© 2004 IBM CorporationPage 17

TimeSeries

Create a type to hold the data:create row type rt_t (

ts datetime year to fraction(5),

minTemp temperature,

maxTemp temperature);

Create the table:

create table tab1 (

sensorId integer,

temps timeseries(rt_t));

Use the data:select temps, getLastElem(temps), getnelems(temps),

clip(temps, t1, t2), func(temps, t1, t2)

from tab1

where sensorId = 99;

Page 18: Extending a Relational Database Intelligent storage of geospatial data John.Pickford@uk.ibm

© 2004 IBM CorporationPage 18

Timeseries functions

Built in SQL functions Used in SQL statements and SPL functions

Server and Client C API Common way of writing TimeSeries functions

Server and Client Java Class Library

Page 19: Extending a Relational Database Intelligent storage of geospatial data John.Pickford@uk.ibm

© 2004 IBM CorporationPage 19

ORDBMS - Extended to Support Complex Analysis

Supports common data types Supports extended data types and functionality

TimeSeries

NAG types and functions

• Moves processing closer to the data• Data reduction at earliest opportunity• Functionality common to very different clients

CLIENT

DISK

CLIENT

MEMORY

SERVER

TimeSeries

NAG

Page 20: Extending a Relational Database Intelligent storage of geospatial data John.Pickford@uk.ibm

© 2004 IBM CorporationPage 20

NAG DataBlade

Vector and Matrix data types

NAG Functions

Roots of equations

Curve and surface fitting

Matrix factorisation

Eigenvalues and Eigenvectors

Linear algebra support

Simple correlation on statistical data

Random number generation

Linear statistical modeling

Smoothing

Time series analysis

Sorting

Optimisation

PDE

Black-Scholes

Page 21: Extending a Relational Database Intelligent storage of geospatial data John.Pickford@uk.ibm

© 2004 IBM CorporationPage 21

Timeseries data can be analysed using NAG functions in the server (1)

Timeseries and NAG functions can be combined:

select tstovec(clip(temp, t1, t2), “maxTemp”)

from tab1

where sensorid = 99;

select f001(tstovec(clip(temp, t1, t2), “maxTemp”))

from tab1

where sensorid = 99;

f001(timeseries, time, time) could be a stored procedure that calls the NAG function G13AAF() that carries out seasonal differencing of a timeseries.

Page 22: Extending a Relational Database Intelligent storage of geospatial data John.Pickford@uk.ibm

© 2004 IBM CorporationPage 22

Timeseries data can be analysed using NAG functions in the server (2)

The function f001() could be a stored procedure calling NAG functions:

create function f001(x vecDblType) returning vecDblType;

define ifail, integer;

define xd vecDblType;

  -- The output vector xd needs to be the same size

-- as the input vector

let xd = copy(x);

  -- Call g13aaf()

let ifail = udrg13aaf(x, 2, 1, 4, xd, ‘0’);

  return xd;

end function;

Page 23: Extending a Relational Database Intelligent storage of geospatial data John.Pickford@uk.ibm

© 2004 IBM CorporationPage 23

ORDBMS - Extended to Support RealTime Data

Supports common data types Supports extended data types and functionality

TimeSeries

NAG

RTL

• Ticks made available to client programs in < .01s• High load rates, up to 100,000 s-1

TimeSeries

NAG

RealTime Loader

CLIENT

DISK

CLIENT

MEMORY

RTL MEMORY

RTL CLIENTData Feed

SERVER

Page 24: Extending a Relational Database Intelligent storage of geospatial data John.Pickford@uk.ibm

© 2004 IBM CorporationPage 24

RealTime Loader

Ticks in the RealTime Loader memory appear as part of the ORDBS time series

RealTime Loader Memory

ORDBMS Disk

SQL access to the RTL data

Page 25: Extending a Relational Database Intelligent storage of geospatial data John.Pickford@uk.ibm

© 2004 IBM CorporationPage 25

ORDBMS - Extended with Customer Functionality

Supports common data types Supports extended data types and functionality

TimeSeries

NAG

RTL

Customer types and functionality

TimeSeries

NAG

RealTime Loader

Custom

CLIENT

DISK

CLIENT

MEMORY

RTL MEMORY

RTL CLIENTData Feed

SERVER

Page 26: Extending a Relational Database Intelligent storage of geospatial data John.Pickford@uk.ibm

© 2004 IBM CorporationPage 27

Spatial extensions

Spatial DataBlade – Implements the Open GIS Consortium types Spatial datatypes

• Point

- X, Y, Z, M

• Line, Polygon

- Set of points

• MultiPoint, MultiLineString, MultiPolygon Spatial functions

• Area, BoundingBox, Contains, Crosses, Distance, Overlap, Union, Within R-Tree index

Geodetic DataBlade Uses longitude, latitude, height, time, measure

Grid DataBlade

Page 27: Extending a Relational Database Intelligent storage of geospatial data John.Pickford@uk.ibm

© 2004 IBM CorporationPage 28

Spatial query

Spatial data types can be added to the table:create table tab1 (

sensorId integer,

location ST_Point,

temps timeseries(rt_1));

and used in queries:

select sensorId, location, temps, getLastElem(temps), getnelems(temps),

clip(temps, t1, t2), func(temps, t1, t2)

from tab1, counties

where ST_Within(counties.border, location)

and counties.name = “Cheshire”;

Page 28: Extending a Relational Database Intelligent storage of geospatial data John.Pickford@uk.ibm

© 2004 IBM CorporationPage 29

GRID Datablade (1)

Written, supplied and supported by an IBM partner, Barrodale Computing Services Ltd, Victoria, Canada

Stores data in “Smart Binary Large Objects” Provides functions to load data from common data formats Provides functions to extra data from the grids Can store:

1D: timeseries, vectors

2D: raster images, arrays

3D: spatial volumes, images at different times

4D: volumes at different times

5D: 4D grids with a set of variables at each 4D point

Page 29: Extending a Relational Database Intelligent storage of geospatial data John.Pickford@uk.ibm

© 2004 IBM CorporationPage 30

GRID Datablade (2)

Operations: Interpolation

Affine transformation

Projection

Windowing

Update Storage formats:

CDF

GRIB

HDF

NetCDF

SDTS

Page 30: Extending a Relational Database Intelligent storage of geospatial data John.Pickford@uk.ibm

© 2004 IBM CorporationPage 31

BCS GRID Slicer Demo

Page 31: Extending a Relational Database Intelligent storage of geospatial data John.Pickford@uk.ibm

© 2004 IBM CorporationPage 32

Other DataBlades

Node Stores tree structures

Ancestor, parent, sibling, descendants Period

Uses R-Tree index

Same problem as spatial search N-dimensional index

Just the R-tree index (up to 5 dimensions) GMP extended precision floats and integers Text Image

Page 32: Extending a Relational Database Intelligent storage of geospatial data John.Pickford@uk.ibm

© 2004 IBM CorporationPage 33

DataBlade Tools

BladeSmith Generates templates for the functions

Maintains SQL to manage types/functions Blade Packager Blade Manager

Page 33: Extending a Relational Database Intelligent storage of geospatial data John.Pickford@uk.ibm

© 2004 IBM CorporationPage 34

References

“Object-relational DBMs: Tracking the Next Great Wave”, Michael Stonebraker, Paul Brown, Dorothy Moore

ISBN 1-55-860452-9 “Informix Dynamic Server.2000 – Server-Side Programming in C”,

Jacques Roy ISBN 0-13-013709-X

“Storing and Manipulating Gridded Data in Databases” http://www.barrodale.com/grid_Demo/gridInfo.pdf

IBM website http://www-306.ibm.com/software/data/informix/blades/spatial/

Page 34: Extending a Relational Database Intelligent storage of geospatial data John.Pickford@uk.ibm

© 2004 IBM CorporationPage 35

Conclusion ORDBMS extensibility

Better performance

• Faster• Less space• More functionality

Different packages can be combined into complex systems

Already supports useful types and functions:

• Geodetic• GRID (from BCS)• Image• NAG• Spatial• Text• Timeseries

User additions

Easy to do


Recommended