21/04/23 © 2004 IBM Corporation
Extending a Relational Database
Intelligent storage of geospatial data
Agenda
History
Current methods
Problems with current methods
Solution – extend database to support new data types
Timeseries
NAG
Spatial including Geodetic and GRID
Conclusion
IBM - Informix Dynamic Server
Originally Informix Dynamic Server (IDS): the Informix database business was bought by IBM in 2001
IDS and DB2 UDB are similar and are getting closer
Extensibility in Informix came from Stonebraker and Brown: Ingres -> Postgres -> Illustra, which was added to Informix
DB2 UDB extensibility is the same in theory; DB2 Extenders are equivalent to IDS DataBlades
Current Methods
Data held in an RDBMS
Easy to manage
• SQL
Easy to access from applications
• C: ESQL/C, CLI
• ODBC
• JDBC
Easy to administer
• Standard tools
• Trained staff
Many third-party products use an RDBMS
Problems with RDBMS
Not good with complex data
A row is just a row; there is no concept of an ordered set of rows
Restricted set of data types
Not good with “unusual” functions
What is unusual?
• Seasonal averaging
• Multivariate analysis
• VWAP (volume-weighted average price)
Not good at loading lots of data items
OK with a high volume of data
Not so good with lots of small rows, e.g. ticks, readings
High latency (time between data arriving and being visible through SQL queries)
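VWAP (volume-weighted average price) is an example of the “unusual” functions listed above: the total of price × volume over a set of ticks, divided by the total volume. A minimal C sketch of the calculation, assuming the ticks are already available as parallel arrays; the function name vwap and its signature are illustrative, not part of any product:

```c
#include <stddef.h>

/* Volume-weighted average price over n ticks:
 *   VWAP = sum(price[i] * volume[i]) / sum(volume[i])
 * Returns 0.0 when the total volume is zero (no trades). */
double vwap(const double *price, const long *volume, size_t n)
{
    double turnover = 0.0;   /* running sum of price * volume */
    double total_vol = 0.0;  /* running sum of volume */

    for (size_t i = 0; i < n; i++) {
        turnover += price[i] * (double)volume[i];
        total_vol += (double)volume[i];
    }
    return total_vol > 0.0 ? turnover / total_vol : 0.0;
}
```

For the two ABC ticks used in the timeseries example (13.45 × 100 and 13.80 × 1000) this gives 15145 / 1100, about 13.77.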
Standard client-server
[Diagram: two clients connected to a database server, which uses memory and disk]
Supports common data types:
Character
Numeric
• Integer
• Float
• Decimal
Datetime/interval
Large object
• Binary
• Text
Collections
• List, set, multiset
Database can be extended – new data types
Distinct, e.g. lengths, mass, etc.
Row, e.g. address = houseNumber + road + town + postCode
Opaque: the user defines how the data is stored
• In row
• Binary large object
• External (file or some external process)
Once created, a new type behaves like an additional built-in type
Opaque data type implementation (1)
Opaque types are implemented by a series of functions:
Cast functions
• lvarchar -> myType
• myType -> lvarchar
Comparison functions
• B-tree index support
• R-tree index support
Mathematical operators
• +, -, *, /
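A B-tree index needs only a three-way comparison function, but for an opaque type that function has to encode the type's meaning. As a hypothetical illustration, the C sketch below assumes a temperature type that stores a value together with its unit, so that 10 °C and 50 °F compare as equal; the type and function names are invented for this example and are not DataBlade API calls:

```c
/* Hypothetical opaque type: a temperature stored with its unit. */
typedef enum { CELSIUS, FAHRENHEIT, KELVIN } temp_unit;

typedef struct {
    double    value;
    temp_unit unit;
} temperature_t;

/* Normalise to kelvin so values held in different units can be compared. */
static double to_kelvin(const temperature_t *t)
{
    switch (t->unit) {
    case CELSIUS:    return t->value + 273.15;
    case FAHRENHEIT: return (t->value - 32.0) * 5.0 / 9.0 + 273.15;
    default:         return t->value;  /* already kelvin */
    }
}

/* Three-way comparison of the kind a B-tree needs:
 * negative if a < b, zero if equal, positive if a > b. */
int temperature_compare(const temperature_t *a, const temperature_t *b)
{
    double ka = to_kelvin(a);
    double kb = to_kelvin(b);
    return (ka > kb) - (ka < kb);
}
```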
Opaque data type implementation (2)
Functions implemented in:
• C
• Java
• SPL
Made known to the server by an SQL statement
Functions usually execute as part of the main database server process
Types handled correctly by the SQL layer:
• Optimiser
• Indexing
• Mathematical operations
• Aggregates
Opaque data type implementation (3)
Simple example, complex numbers:
Stored as 2 double precision numbers
Function to convert the string “12.34 + 56.78i” to a C structure containing 2 doubles
Function to convert a C structure with 2 doubles into the string “12.34 + 56.78i”
Functions to add, subtract, multiply and divide complex numbers
• complex_t *Plus(complex_t, complex_t)
Can now be used in SQL statements:
• create table tab (u complex_t);
• insert into tab values ('1 + 2i');
• select u * u from tab;
- returns “-3 + 4i”
Comparisons OK, but what about ordering?
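The complex number functions described above can be sketched in plain C. This is a minimal sketch under stated assumptions: it does not use the DataBlade API, the names complex_from_text, complex_to_text, complex_plus, complex_times and complex_compare are hypothetical, and the ordering chosen in complex_compare (real part, then imaginary part) is just one arbitrary answer to the ordering question, since complex numbers have no natural order:

```c
#include <stdio.h>

/* Complex number stored as two double-precision values. */
typedef struct {
    double re;
    double im;
} complex_t;

/* Input cast: parse text such as "12.34 + 56.78i" or "1 - 2i".
 * Returns 1 on success, 0 if the text is not in that form. */
int complex_from_text(const char *s, complex_t *out)
{
    double re, im;
    char sign;
    int used = 0;  /* set by %n only if the trailing 'i' matched */

    if (sscanf(s, " %lf %c %lf i%n", &re, &sign, &im, &used) != 3 || used == 0)
        return 0;
    if (sign != '+' && sign != '-')
        return 0;
    out->re = re;
    out->im = (sign == '-') ? -im : im;
    return 1;
}

/* Output cast: format as "re + imi" or "re - imi". */
void complex_to_text(complex_t z, char *buf, size_t len)
{
    char sign = (z.im < 0) ? '-' : '+';
    double mag = (z.im < 0) ? -z.im : z.im;
    snprintf(buf, len, "%g %c %gi", z.re, sign, mag);
}

complex_t complex_plus(complex_t a, complex_t b)
{
    complex_t r = { a.re + b.re, a.im + b.im };
    return r;
}

/* (a + bi)(c + di) = (ac - bd) + (ad + bc)i */
complex_t complex_times(complex_t a, complex_t b)
{
    complex_t r = { a.re * b.re - a.im * b.im,
                    a.re * b.im + a.im * b.re };
    return r;
}

/* Arbitrary total order for B-tree use: real part first, then imaginary. */
int complex_compare(complex_t a, complex_t b)
{
    if (a.re != b.re)
        return (a.re > b.re) - (a.re < b.re);
    return (a.im > b.im) - (a.im < b.im);
}
```

Parsing “1 + 2i”, multiplying it by itself and formatting the result gives “-3 + 4i”, matching the SQL example.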
Virtual Table/Index Interface
Tables can be replaced by functions (VTI):
• Insert function
• Select function: first, next, previous, nth, last
• Delete function
Own indexing methods can also be created (a specialist area)
Example of VTI: Google can be made to look like a database table:
select * from google
where search = 'IBM+IDS+TimeSeries';
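The essence of VTI is that the server drives a scan by calling functions instead of reading its own storage. A minimal C sketch of that pattern, assuming a hypothetical row_source structure of function pointers with only first and next (the actual IDS interface is defined by its own purpose functions, which are not reproduced here):

```c
#include <stddef.h>

/* A row as seen by the hypothetical query engine. */
typedef struct {
    int         id;
    const char *text;
} row_t;

/* Hypothetical access interface: the engine never touches the storage,
 * it only calls these functions. Each returns 1 if a row was produced. */
typedef struct row_source {
    void *state;
    int (*first)(struct row_source *src, row_t *out);
    int (*next)(struct row_source *src, row_t *out);
} row_source;

/* One concrete source: rows served from an in-memory array.
 * Another source could call a web service or read a file instead. */
typedef struct {
    const row_t *rows;
    size_t       count;
    size_t       pos;
} array_state;

static int array_next(row_source *src, row_t *out)
{
    array_state *st = src->state;
    if (st->pos >= st->count)
        return 0;
    *out = st->rows[st->pos++];
    return 1;
}

static int array_first(row_source *src, row_t *out)
{
    array_state *st = src->state;
    st->pos = 0;  /* restart the scan */
    return array_next(src, out);
}

row_source make_array_source(array_state *st)
{
    row_source src = { st, array_first, array_next };
    return src;
}

/* Engine side: a full scan counting rows that pass an optional filter,
 * written only against the function-pointer interface. */
size_t scan_count(row_source *src, int (*keep)(const row_t *))
{
    row_t r;
    size_t n = 0;
    for (int ok = src->first(src, &r); ok; ok = src->next(src, &r))
        if (keep == NULL || keep(&r))
            n++;
    return n;
}
```

Replacing the array-backed source with one that calls a search service is, in spirit, how a table such as google could be served.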
Problems storing TimeSeries data in RDBMS
“Tall-thin” tables: the primary key is roughly the same size as the data
Each row selected can cause a disk read
Extra indexing is often added to allow index-only reads
Solutions include:
• Storing more than one element per row, which makes management difficult
• Using a separate timeseries database, which is expensive and difficult to manage
Timeseries – the problem
Ordinary RDBMSs store the values in rows, and these rows are stored randomly in tables even though access is usually in time order. You need to create an index to hold the key value for each row in time order. This takes more space, and data access can be slow, with overcomplicated SQL statements.
Table rows, stored in arrival order:
Stock  Timestamp            Price  Vol
ABC    2002-06-14 12:00:01  13.80  1000
XYZ    2002-06-14 12:00:01  98.76  50
ABC    2002-06-14 12:00:00  13.45  100

Index on (Stock, Timestamp), with a pointer to each row:
ABC    2002-06-14 12:00:00  4
ABC    2002-06-14 12:00:01  1
XYZ    2002-06-14 12:00:01  2

Rows read back in time order through the index:
ABC    2002-06-14 12:00:00  13.45  100   4
ABC    2002-06-14 12:00:01  13.80  1000  1
XYZ    2002-06-14 12:00:01  98.76  50    2
Timeseries – the solution
Store elements as vectors in time order
Stock Time series
ABC (2002-06-14 12:00:00 13.45 100),(2002-06-14 12:00:01 13.80 1000),…
XYZ (2002-06-14 12:00:01 98.76 50),…
ORDBMS - Extended to Support TimeSeries
[Diagram: clients connected to a database server with memory and disk; the server now includes a TimeSeries extension]
Supports common data types
Supports extended data types and functionality:
• TimeSeries
Advantages of TimeSeries
Performance (time/space)
Write SQL functions that work on TimeSeries
Method           Space  Time
Traditional (1)  516    2.39
Traditional (2)  903    0.43
TimeSeries       269    0.09
TimeSeries
Create a type to hold the data:
create row type rt_t (
ts datetime year to fraction(5),
minTemp temperature,
maxTemp temperature);
Create the table:
create table tab1 (
sensorId integer,
temps timeseries(rt_t));
Use the data (t1 and t2 are start and end times; func() stands for any user-written function):
select temps, getLastElem(temps), getnelems(temps),
clip(temps, t1, t2), func(temps, t1, t2)
from tab1
where sensorId = 99;
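Because the elements of a time series are stored as a vector in time order, operations such as clip() reduce to a binary search rather than an index lookup per row. A minimal C sketch, assuming timestamps simplified to integer seconds; series_clip, series_nelems and series_last are hypothetical analogues of clip(), getnelems() and getLastElem(), not the TimeSeries DataBlade's code:

```c
#include <stddef.h>

/* One element, like a row of rt_t: a timestamp plus two readings.
 * Timestamps are simplified to integer seconds. */
typedef struct {
    long   ts;
    double min_temp;
    double max_temp;
} elem_t;

/* A time series: elements stored contiguously in ascending ts order. */
typedef struct {
    const elem_t *elems;
    size_t        n;
} series_t;

/* Index of the first element with ts >= t (binary search). */
static size_t first_at_or_after(const series_t *s, long t)
{
    size_t lo = 0, hi = s->n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (s->elems[mid].ts < t)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Index of the first element with ts > t (binary search). */
static size_t first_after(const series_t *s, long t)
{
    size_t lo = 0, hi = s->n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (s->elems[mid].ts <= t)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Analogue of getnelems(): number of elements in the series. */
size_t series_nelems(const series_t *s)
{
    return s->n;
}

/* Analogue of getLastElem(): latest element, or NULL if empty. */
const elem_t *series_last(const series_t *s)
{
    return s->n > 0 ? &s->elems[s->n - 1] : NULL;
}

/* Analogue of clip(): elements with t1 <= ts <= t2. Because the data is
 * already in time order, the result is a view of a contiguous slice. */
series_t series_clip(const series_t *s, long t1, long t2)
{
    series_t out = { s->elems, 0 };
    if (t1 > t2)
        return out;
    size_t begin = first_at_or_after(s, t1);
    size_t end = first_after(s, t2);
    out.elems = s->elems + begin;
    out.n = end - begin;
    return out;
}
```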
Timeseries functions
Built-in SQL functions: used in SQL statements and SPL functions
Server and client C API: the common way of writing TimeSeries functions
Server and client Java class library
ORDBMS - Extended to Support Complex Analysis
Supports common data types
Supports extended data types and functionality: TimeSeries, NAG types and functions
• Moves processing closer to the data
• Reduces data at the earliest opportunity
• Provides functionality common to very different clients
[Diagram: clients connected to a database server with memory and disk; the server includes TimeSeries and NAG extensions]
NAG DataBlade
Vector and Matrix data types
NAG Functions
Roots of equations
Curve and surface fitting
Matrix factorisation
Eigenvalues and Eigenvectors
Linear algebra support
Simple correlation on statistical data
Random number generation
Linear statistical modelling
Smoothing
Time series analysis
Sorting
Optimisation
Partial differential equations (PDE)
Black-Scholes
Timeseries data can be analysed using NAG functions in the server (1)
Timeseries and NAG functions can be combined:
select tstovec(clip(temps, t1, t2), 'maxTemp')
from tab1
where sensorid = 99;
select f001(tstovec(clip(temps, t1, t2), 'maxTemp'))
from tab1
where sensorid = 99;
f001() could be a stored procedure that takes the vector produced by tstovec() and calls the NAG function G13AAF(), which carries out seasonal differencing of a time series.
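Seasonal differencing removes trend and seasonal patterns by replacing each value with its difference from an earlier one: ordinary differencing uses lag 1, seasonal differencing uses lag s (the season length). A minimal C sketch of the operation itself, not of NAG's G13AAF; the parameter names d (ordinary order), D (seasonal order) and s are assumptions, as is reading the arguments 2, 1, 4 passed to udrg13aaf() as d = 2, D = 1, s = 4:

```c
#include <stddef.h>

/* One differencing pass with the given lag, in place:
 * x[i] = x[i + lag] - x[i]. Returns the new length n - lag (0 if too short). */
static size_t difference(double *x, size_t n, size_t lag)
{
    if (n <= lag)
        return 0;
    for (size_t i = 0; i + lag < n; i++)
        x[i] = x[i + lag] - x[i];  /* x[i + lag] is not yet overwritten */
    return n - lag;
}

/* d ordinary differences and D seasonal differences with period s.
 * out must hold n values; returns the output length n - d - D*s
 * (0 if the series is too short). s == 0 skips seasonal differencing. */
size_t seasonal_difference(const double *x, size_t n,
                           unsigned d, unsigned D, size_t s, double *out)
{
    size_t m = n;
    for (size_t i = 0; i < n; i++)
        out[i] = x[i];
    for (unsigned k = 0; k < D && s > 0 && m > 0; k++)
        m = difference(out, m, s);
    for (unsigned k = 0; k < d && m > 0; k++)
        m = difference(out, m, 1);
    return m;
}
```

The differencing passes commute, so the order in which they are applied does not change the result. As in the SPL function, the output buffer is the same size as the input, although only the first n - d - D × s values are meaningful.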
Timeseries data can be analysed using NAG functions in the server (2)
The function f001() could be a stored procedure calling NAG functions:
create function f001(x vecDblType) returning vecDblType;
define ifail integer;
define xd vecDblType;
-- The output vector xd needs to be the same size
-- as the input vector
let xd = copy(x);
-- Call g13aaf()
let ifail = udrg13aaf(x, 2, 1, 4, xd, '0');
return xd;
end function;
ORDBMS - Extended to Support RealTime Data
Supports common data types
Supports extended data types and functionality: TimeSeries, NAG, RTL (RealTime Loader)
• Ticks made available to client programs in under 0.01 s
• High load rates, up to 100,000 per second
[Diagram: a database server with TimeSeries, NAG and RealTime Loader extensions, server memory, RTL memory and disk; clients query the server and an RTL client is connected to a data feed]
RealTime Loader
Ticks held in RealTime Loader memory appear as part of the ORDBMS time series
[Diagram: RealTime Loader memory and ORDBMS disk]
SQL access to the RTL data
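The slides do not say how the RealTime Loader combines its memory with the data on disk. One way to picture “ticks in RTL memory appear as part of the time series” is a cursor that reads two time-ordered sources as one; the C sketch below assumes exactly that, and merged_cursor, merged_open and merged_next are hypothetical names:

```c
#include <stddef.h>

/* Simplified tick: timestamp (e.g. milliseconds) and price. */
typedef struct {
    long   ts;
    double price;
} tick_t;

/* Cursor over two time-ordered sources: ticks already on disk
 * and recent ticks still held in loader memory. */
typedef struct {
    const tick_t *disk; size_t ndisk, i;
    const tick_t *mem;  size_t nmem,  j;
} merged_cursor;

void merged_open(merged_cursor *c, const tick_t *disk, size_t ndisk,
                 const tick_t *mem, size_t nmem)
{
    c->disk = disk; c->ndisk = ndisk; c->i = 0;
    c->mem = mem;   c->nmem = nmem;   c->j = 0;
}

/* Return the next tick in time order across both sources; 0 when done.
 * On equal timestamps the disk tick is returned first. */
int merged_next(merged_cursor *c, tick_t *out)
{
    int have_disk = c->i < c->ndisk;
    int have_mem = c->j < c->nmem;

    if (!have_disk && !have_mem)
        return 0;
    if (have_disk && (!have_mem || c->disk[c->i].ts <= c->mem[c->j].ts))
        *out = c->disk[c->i++];
    else
        *out = c->mem[c->j++];
    return 1;
}
```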
ORDBMS - Extended with Customer Functionality
Supports common data types
Supports extended data types and functionality: TimeSeries, NAG, RTL
• Customer types and functionality
[Diagram: a database server with TimeSeries, NAG, RealTime Loader and custom extensions, server memory, RTL memory and disk; clients, and an RTL client connected to a data feed]
Spatial extensions
Spatial DataBlade: implements the Open GIS Consortium types
Spatial data types:
• Point
- X, Y, Z, M
• Line, Polygon
- Set of points
• MultiPoint, MultiLineString, MultiPolygon
Spatial functions:
• Area, BoundingBox, Contains, Crosses, Distance, Overlap, Union, Within
R-tree index
Geodetic DataBlade: uses longitude, latitude, height, time, measure
Grid DataBlade
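An R-tree indexes bounding boxes rather than exact shapes: searches prune any branch whose box does not overlap the query box, and each parent node stores the union of its children's boxes. A generic C sketch of those bounding-box operations, assuming 2D axis-aligned boxes; this is the textbook technique, not the Spatial DataBlade's code, and all names are hypothetical:

```c
/* Axis-aligned bounding box (minimum bounding rectangle). */
typedef struct {
    double xmin, ymin, xmax, ymax;
} bbox_t;

/* Bounding box of n >= 1 points given as x[] and y[]. */
bbox_t bbox_of_points(const double *x, const double *y, int n)
{
    bbox_t b = { x[0], y[0], x[0], y[0] };
    for (int i = 1; i < n; i++) {
        if (x[i] < b.xmin) b.xmin = x[i];
        if (x[i] > b.xmax) b.xmax = x[i];
        if (y[i] < b.ymin) b.ymin = y[i];
        if (y[i] > b.ymax) b.ymax = y[i];
    }
    return b;
}

/* Overlap test used to prune R-tree branches during a search. */
int bbox_overlaps(bbox_t a, bbox_t b)
{
    return a.xmin <= b.xmax && b.xmin <= a.xmax &&
           a.ymin <= b.ymax && b.ymin <= a.ymax;
}

/* Union: the box a parent node stores for its children. */
bbox_t bbox_union(bbox_t a, bbox_t b)
{
    bbox_t u = {
        a.xmin < b.xmin ? a.xmin : b.xmin,
        a.ymin < b.ymin ? a.ymin : b.ymin,
        a.xmax > b.xmax ? a.xmax : b.xmax,
        a.ymax > b.ymax ? a.ymax : b.ymax
    };
    return u;
}

/* Area, used when choosing which subtree grows least on insert. */
double bbox_area(bbox_t b)
{
    return (b.xmax - b.xmin) * (b.ymax - b.ymin);
}
```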
Spatial query
Spatial data types can be added to the table:
create table tab1 (
sensorId integer,
location ST_Point,
temps timeseries(rt_t));
and used in queries:
select sensorId, location, temps, getLastElem(temps), getnelems(temps),
clip(temps, t1, t2), func(temps, t1, t2)
from tab1, counties
where ST_Within(location, counties.border)
and counties.name = 'Cheshire';
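The ST_Within condition in the query asks whether the sensor's point lies inside a county's border polygon. A common way to test this for a simple polygon is even-odd ray casting: count how many polygon edges a horizontal ray from the point crosses. The C sketch below shows only that technique (it is not the Spatial DataBlade's implementation and ignores holes, multipolygons and points on the boundary), and point_in_polygon is a hypothetical name:

```c
/* Even-odd ray-casting test: is point (px, py) inside the simple polygon
 * with n vertices (vx[i], vy[i])? Points exactly on the boundary may be
 * classified either way. Returns 1 if inside, 0 otherwise. */
int point_in_polygon(double px, double py,
                     const double *vx, const double *vy, int n)
{
    int inside = 0;
    for (int i = 0, j = n - 1; i < n; j = i++) {
        /* Does edge j->i straddle the horizontal line y = py? */
        if ((vy[i] > py) != (vy[j] > py)) {
            /* x where the edge meets that line (vy[i] != vy[j] here) */
            double x_cross = vx[j] + (py - vy[j]) * (vx[i] - vx[j]) / (vy[i] - vy[j]);
            if (px < x_cross)   /* crossing lies to the right of the point */
                inside = !inside;
        }
    }
    return inside;
}
```

Typically an R-tree index first discards counties whose bounding box cannot contain the point, so the exact test only runs on a few candidates.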
GRID DataBlade (1)
Written, supplied and supported by an IBM partner, Barrodale Computing Services Ltd, Victoria, Canada
Stores data in “Smart Binary Large Objects”
Provides functions to load data from common data formats
Provides functions to extract data from the grids
Can store:
1D: timeseries, vectors
2D: raster images, arrays
3D: spatial volumes, images at different times
4D: volumes at different times
5D: 4D grids with a set of variables at each 4D point
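Grids of one to five dimensions can all be kept in a single flat large object if each element's position is computed from its indices. The slides do not describe the GRID DataBlade's storage layout (it may well tile or compress the data), so as an assumption the C sketch below shows plain row-major addressing, with grid_offset a hypothetical name:

```c
#include <stddef.h>

/* Row-major offset of element idx[0..ndims-1] in a grid whose extent in
 * each dimension is dims[0..ndims-1]; the last dimension varies fastest.
 * Returns (size_t)-1 if any index is out of range. */
size_t grid_offset(const size_t *dims, const size_t *idx, int ndims)
{
    size_t off = 0;
    for (int k = 0; k < ndims; k++) {
        if (idx[k] >= dims[k])
            return (size_t)-1;
        off = off * dims[k] + idx[k];
    }
    return off;
}
```

Multiplying the offset by the element size gives the byte position within the large object; for example, a 4D grid of 2 times × 3 levels × 4 rows × 5 columns has its last element at offset 119.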
GRID DataBlade (2)
Operations:
Interpolation
Affine transformation
Projection
Windowing
Update
Storage formats:
CDF
GRIB
HDF
NetCDF
SDTS
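Interpolation is the first operation listed. The method the GRID DataBlade uses is not stated here, so the C sketch below assumes bilinear interpolation, one common way to estimate a value between the points of a regular 2D grid; grid_bilinear is a hypothetical name:

```c
/* Bilinear interpolation on a regular 2D grid stored row-major
 * (ny rows of nx values). (x, y) are fractional grid coordinates.
 * Returns 1 and sets *out on success, 0 if the point is outside the grid. */
int grid_bilinear(const double *g, int nx, int ny,
                  double x, double y, double *out)
{
    if (nx < 2 || ny < 2 || x < 0 || y < 0 || x > nx - 1 || y > ny - 1)
        return 0;
    int i = (int)x, j = (int)y;
    if (i == nx - 1) i--;   /* keep the 2x2 cell inside the grid */
    if (j == ny - 1) j--;
    double fx = x - i, fy = y - j;

    /* the four surrounding grid values */
    double v00 = g[j * nx + i],       v10 = g[j * nx + i + 1];
    double v01 = g[(j + 1) * nx + i], v11 = g[(j + 1) * nx + i + 1];

    *out = v00 * (1 - fx) * (1 - fy) + v10 * fx * (1 - fy)
         + v01 * (1 - fx) * fy       + v11 * fx * fy;
    return 1;
}
```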
BCS GRID Slicer Demo
Other DataBlades
Node: stores tree structures
• Ancestor, parent, sibling, descendants
Period: uses R-tree index
• Same problem as spatial search
N-dimensional index
• Just the R-tree index (up to 5 dimensions)
GMP: extended precision floats and integers
Text
Image
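The slides do not describe how the Node DataBlade encodes a position in a tree. Assuming dotted path labels such as 1.4.2 (whose ancestors are 1.4 and 1), ancestor and parent queries reduce to prefix tests, as in this hypothetical C sketch:

```c
#include <string.h>

/* Is node a a proper ancestor of node d, with positions encoded as dotted
 * paths such as "1.4" and "1.4.2"? True when a's path is a prefix of d's
 * path that ends exactly at a '.' separator (so "1.4" is not an ancestor
 * of "1.40.2"). */
int node_is_ancestor(const char *a, const char *d)
{
    size_t la = strlen(a);
    return strncmp(a, d, la) == 0 && d[la] == '.';
}

/* Parent test: a is an ancestor of d and the rest of d's path
 * contains no further '.' separator. */
int node_is_parent(const char *a, const char *d)
{
    size_t la = strlen(a);
    return node_is_ancestor(a, d) && strchr(d + la + 1, '.') == NULL;
}
```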
DataBlade Tools
BladeSmith: generates templates for the functions and maintains the SQL to manage types/functions
Blade Packager
Blade Manager
References
“Object-Relational DBMSs: Tracking the Next Great Wave”, Michael Stonebraker, Paul Brown, Dorothy Moore, ISBN 1-55860-452-9
“Informix Dynamic Server.2000: Server-Side Programming in C”, Jacques Roy, ISBN 0-13-013709-X
“Storing and Manipulating Gridded Data in Databases”, http://www.barrodale.com/grid_Demo/gridInfo.pdf
IBM website, http://www-306.ibm.com/software/data/informix/blades/spatial/
Conclusion
ORDBMS extensibility:
Better performance
• Faster
• Less space
• More functionality
Different packages can be combined into complex systems
Already supports useful types and functions:
• Geodetic
• GRID (from BCS)
• Image
• NAG
• Spatial
• Text
• Timeseries
User additions are easy to do