© 2010 IBM Corporation April 6, 2011 TimeSeries Technical Presentation Jacques Roy

Ugif 04 2011 france ug04042011-jroy_ts


Page 1: Ugif 04 2011   france ug04042011-jroy_ts

© 2010 IBM Corporation

April 6, 2011

TimeSeries Technical Presentation

Jacques Roy

Page 2: Ugif 04 2011   france ug04042011-jroy_ts

Agenda

■ What is TimeSeries
■ Why TimeSeries
■ Components
■ Usage

Page 3: Ugif 04 2011   france ug04042011-jroy_ts

“Give me the Jan 1st element from time series ‘X’”

■ Most useful when a range of data is normally read

“Give me the Jan 1st through Jan 10th elements from time series ‘X’”

■ Access to one time series is usually completed before moving to the next time series.

Page 4: Ugif 04 2011   france ug04042011-jroy_ts

Challenges Managing Time Series Data

■ Slow Performance
– Extremely slow data access, especially for ordered sets of rows, due to the data layout and disk I/O
– Operations hard or impossible to do in standard SQL

■ High Storage Requirements
– Time series are usually stored as "tall, thin" tables with a very large number of rows
– May need one index to enforce uniqueness and another for index-only reads; more space is used for indexes than for data
– Huge space requirements in a standard relational layout, due to the volume of data

■ Complex Querying
– Can be difficult to write SQL to work with the data

Page 5: Ugif 04 2011   france ug04042011-jroy_ts

Informix Solution

● TimeSeries Data Type: native time series support

■ Store time series elements as an ordered set of elements
– Uses less space because the "key" is factored out and the time field takes either 0 (for regular) or 11 (for irregular) bytes
– Access is faster than an index-only read
– SQL can be made much simpler

■ Freedom to manage time series data:
– Freedom to choose what is stored and how
– Freedom to choose the time series interval
– Freedom to choose where the time series is stored

MeterId  Reading
1001     2010-01-01,daily,{(12.34,12567),(12.56,9000),(12.34,55567),..}
2011     2010-09-01,daily,{(9.34,8067),(9.56,9000),(9.40,10780),..}
2001     2010-05-05,daily,{(199.08,6780),(198.55,3400),(198.12,250),..}

No other RDBMS has native time series support

Page 6: Ugif 04 2011   france ug04042011-jroy_ts

Key Strengths of Informix TimeSeries

■ Performance
– Extremely fast data access: data is clustered on disk to reduce I/O
– Provides a very high degree of parallelism on reads and writes
– Provides continuous loading of data with minimal impact on concurrent queries

■ Space Savings
– Provides a high level of compression
– Can be over 50% space savings over a standard relational layout

■ Usability
– Time series tool kit allows custom analytics to be written
– Handles operations hard or impossible to do in standard SQL
– Conceptually closer to how users think of time series
– No other RDBMS has native time series support

Page 7: Ugif 04 2011   france ug04042011-jroy_ts

Smart Meters Data: Schema Example

Relational Schema (primary key: mtr_id, date)

mtr_id  date  Col1     Col2     ...  ColN
1       Mon   Value 1  Value 2  ...  Value N
1       Tue   Value 1  Value 2  ...  Value N
1       Wed   Value 1  Value 2  ...  Value N
...     ...   ...      ...      ...  ...
13      Mon   Value 1  Value 2  ...  Value N
13      Tue   Value 1  Value 2  ...  Value N
13      Wed   Value 1  Value 2  ...  Value N
...     ...   ...      ...      ...  ...

Above schema using Informix TimeSeries

mtr_id (int)  Series timeseries(mtr_data)
1             [(Mon, v1, ...)(Tue,v1…)]
2             [(Mon, v1, ...)(Tue,v1…)]
3             [(Mon, v1, ...)(Tue,v1…)]
4             [(Mon, v1, ...)(Tue,v1…)]

Save space and increase performance with faster data access with Informix

Page 8: Ugif 04 2011   france ug04042011-jroy_ts

TimeSeries Space Savings Example

● The TimeSeries data type takes much less space than traditional relational storage
– Proof of concept example:
  • Regular TimeSeries, 15-minute interval
  • Relational database used ~1 TB (1000 GB)
  • Informix used ~340 GB

■ The reason for this is:
– The TimeSeries does not repeat data
  • MeterID: 4 bytes per reading
  • TimeStamp: could be 12 bytes per reading
  • Assuming an 8-byte reading, that is ~66% savings
  • 3X less storage!

Data Storage Comparison for 1 million meters
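
The per-reading arithmetic behind the savings estimate can be checked with a short Python sketch. The byte sizes and the proof-of-concept totals are the ones quoted on the slide; the function name is only illustrative, and the calculation ignores page and row overhead, so it is an approximation rather than a sizing tool.

```python
def savings_fraction(key_bytes: int, timestamp_bytes: int, reading_bytes: int) -> float:
    """Fraction of each relational row that a regular TimeSeries does not repeat."""
    relational_row = key_bytes + timestamp_bytes + reading_bytes
    repeated = key_bytes + timestamp_bytes
    return repeated / relational_row


# Figures from the slide: 4-byte MeterID, 12-byte timestamp, 8-byte reading.
per_reading = savings_fraction(4, 12, 8)

# Proof-of-concept totals from the slide, in GB.
relational_gb, informix_gb = 1000, 340
poc_ratio = relational_gb / informix_gb

print(f"per-reading savings: {per_reading:.1%}")
print(f"proof-of-concept ratio: {poc_ratio:.1f}x")
```

Sixteen of every 24 bytes are the repeated key and timestamp, which is where the roughly two-thirds figure comes from; the measured 1000 GB versus 340 GB is consistent with it.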

Page 9: Ugif 04 2011   france ug04042011-jroy_ts

TimeSeries Performance

Performance Comparison for Data Loads and Reports for 1 Million Meters

■ Performance
– Faster access to sets of data
  • Ordered data
– Much faster combining of time series
– For data loading into TimeSeries, Informix outperforms the nearest competition by more than 30x
– For report generation from TimeSeries, Informix outperforms the nearest competition by more than 90x

Page 10: Ugif 04 2011   france ug04042011-jroy_ts

© 2007 IBM Corporation. Informix Dynamic Server, TimeSeries DataBlade Module class

Who’s Interested in TimeSeries

■ Energy: smart meters

■ Capital Markets:
– Arbitrage opportunities, breakout signals, risk/return optimization, portfolio management, VaR calculations, simulations, backtesting...

■ Telecommunications:
– Network monitoring, load prediction, blocked calls (lost revenue) from load, phone usage, fraud detection and analysis...

■ Manufacturing:
– Machinery going out of spec; process sampling and analysis

■ Logistics:
– Location of a fleet (e.g. GPS); route analysis

■ Scientific research:
– Temperature over time...

Page 11: Ugif 04 2011   france ug04042011-jroy_ts

TimeSeries: Key Concepts

■ Containers
– Specialized storage for TimeSeries

    EXECUTE PROCEDURE TSContainerCreate('raw_container', 'rootdbs',
        'meter_data', 100, 50);

■ TimeSeries data element: row type
– Flexibility to define as many parts as needed

    CREATE ROW TYPE meter_data (
        tstamp datetime year to fraction(5),
        value decimal(14,3)
    );

■ TimeSeries types: regular, irregular
– Cover regular intervals and sparse data distribution

■ Calendar
– Defines business patterns

Page 12: Ugif 04 2011   france ug04042011-jroy_ts

Features Unique to Regular TimeSeries

■ Only one element per “on” interval

■ Value "persists" to end of interval

■ An element for an “on” interval may be missing; the entire element will be NULL

■ Calendar determines offset in TimeSeries of given time point

■ Elements can be accessed by offset or time point

■ Time point not stored; calculated from header + date/time arithmetic
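
As a rough illustration of why a regular time series needs no stored time stamps, the sketch below converts between offsets and time points for a calendar in which every interval is "on". It is a Python model of the idea, not Informix code: the 15-minute interval and the origin are borrowed from examples elsewhere in this deck, and the function names are hypothetical.

```python
from datetime import datetime, timedelta

ORIGIN = datetime(2010, 11, 10, 0, 0, 0)   # origin kept in the series header
INTERVAL = timedelta(minutes=15)           # calendar interval, every interval "on"


def offset_to_time(offset: int) -> datetime:
    """Time point of the element stored at a given offset."""
    return ORIGIN + offset * INTERVAL


def time_to_offset(ts: datetime) -> int:
    """Offset of the interval containing ts (a value persists to the end of its interval)."""
    if ts < ORIGIN:
        raise ValueError("time point precedes the series origin")
    return (ts - ORIGIN) // INTERVAL


print(offset_to_time(4))                                # one hour after the origin
print(time_to_offset(datetime(2010, 11, 10, 1, 7, 0)))  # falls in the 01:00 interval
```

A calendar with "off" intervals would need the pattern to be taken into account as well; this sketch deliberately leaves that out.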

Page 13: Ugif 04 2011   france ug04042011-jroy_ts

Features Unique to Irregular TimeSeries

■ Data can be entered at any time point within a valid "on" interval

■ Element persists until next element

■ No NULL elements

■ Elements can only be accessed by time

■ No duplicate time points allowed

■ If an element already exists at the given time point, either an error is raised or a unique time point is found:
– round time point up to nearest second
– search back for first element
– add 10 microseconds; this is the new time point
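
The following Python sketch models one possible reading of the collision steps above. The slide does not spell out the details, so several points are assumptions: "round up" is taken to mean the start of the next whole second, "search back" is taken to mean finding the latest existing element before that boundary, and the function name and list-based storage are purely illustrative. It is not a description of the actual Informix implementation.

```python
from bisect import bisect_left
from datetime import datetime, timedelta
from typing import Sequence

TICK = timedelta(microseconds=10)  # smallest step of a fraction(5) time stamp


def unique_time_point(existing: Sequence[datetime], wanted: datetime) -> datetime:
    """Return `wanted`, or a nearby free time point if `wanted` is already taken.

    `existing` must be sorted in ascending order.
    """
    i = bisect_left(existing, wanted)
    if i == len(existing) or existing[i] != wanted:
        return wanted  # no collision

    # Step 1: round up to the next whole second (assumed interpretation).
    boundary = wanted.replace(microsecond=0) + timedelta(seconds=1)
    # Step 2: search back from the boundary for the latest existing element.
    last = existing[bisect_left(existing, boundary) - 1]
    # Step 3: the new time point is 10 microseconds after that element.
    return last + TICK


# Time stamps taken from the insert example in this deck (.00003 and .00119 seconds).
points = [
    datetime(2007, 4, 3, 6, 30, 3, 30),
    datetime(2007, 4, 3, 6, 30, 3, 1190),
]
resolved = unique_time_point(points, datetime(2007, 4, 3, 6, 30, 3, 30))
print(resolved)
```

The sketch does not handle the edge case where the adjusted time point would spill into the next second.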

Page 14: Ugif 04 2011   france ug04042011-jroy_ts

Accessing TimeSeries

■ Access through standard tabular view
– Makes TimeSeries look like a standard relational table

■ SQL Functions
– 103 functions

■ Customized functions
– Written in Stored Procedure Language (SPL), “C”, Java
– 65 “C” functions

Page 15: Ugif 04 2011   france ug04042011-jroy_ts

TimeSeries Header

■ A TimeSeries needs information that sets its context:
– Calendar: time period where data is found
– Origin: time origin of the TimeSeries
– Threshold: in-row storage threshold
– Container: where to store the out-of-row data
– Metadata: optional data added by the TimeSeries creator

Page 16: Ugif 04 2011   france ug04042011-jroy_ts

Calendar and Calendar Patterns

■ A calendar pattern is needed before we can create a calendar:

    INSERT INTO CalendarPatterns
    VALUES('day', '{1 on, 2 off, 4 on}, day');

■ A calendar defines the set of valid times at which the TimeSeries can record data. (July 8, 2005 is a Friday.)

    INSERT INTO CalendarTable(c_name, c_calendar)
    VALUES('calday', 'startdate(2005-07-08 00:00:00.00000), pattstart(2005-07-08 00:00:00.00000), pattname(day)');

■ You can also provide the pattern explicitly:

    INSERT INTO CalendarTable(c_name, c_calendar)
    VALUES('weekcal', 'startdate(2005-07-08 00:00:00.00000), pattstart(2005-07-08 00:00:00.00000), pattern({1 on, 2 off, 4 on}, day)');
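
To make the pattern semantics concrete, here is a small Python model (not Informix code) that expands the pattern {1 on, 2 off, 4 on} from the pattern start date and lists which days accept data. The parser handles only this simple "N on/off" form, and all function names are hypothetical.

```python
import re
from datetime import date, timedelta
from itertools import cycle, islice
from typing import List


def parse_pattern(spec: str) -> List[bool]:
    """Turn '{1 on, 2 off, 4 on}' into one boolean per interval of the pattern."""
    flags: List[bool] = []
    for count, state in re.findall(r"(\d+)\s+(on|off)", spec):
        flags.extend([state == "on"] * int(count))
    return flags


def valid_days(pattern: str, start: date, n_days: int) -> List[date]:
    """Days among the first n_days from `start` (inclusive) that are 'on'."""
    flags = islice(cycle(parse_pattern(pattern)), n_days)
    return [start + timedelta(days=i) for i, on in enumerate(flags) if on]


# Pattern and start date from the slide; July 8, 2005 is a Friday.
days = valid_days("{1 on, 2 off, 4 on}", date(2005, 7, 8), 7)
print([d.isoformat() for d in days])
```

Starting on a Friday, the pattern marks Friday "on", the weekend "off", and Monday through Thursday "on", so the repeating seven-day pattern describes a working-week calendar.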

Page 17: Ugif 04 2011   france ug04042011-jroy_ts

TimeSeries: Table

■ A TimeSeries resides in a table:

    CREATE TABLE ts_data (
        loc_esi_id char(20) NOT NULL,
        measure_unit varchar(10) NOT NULL,
        direction char(1) NOT NULL,
        multiplier TimeSeries(meter_data),
        raw_reads TimeSeries(meter_data),
        PRIMARY KEY(loc_esi_id, measure_unit, direction)
    ) LOCK MODE ROW;

Page 18: Ugif 04 2011   france ug04042011-jroy_ts

Populating a TimeSeries

■ A TimeSeries must first be created:

    INSERT INTO taqtrade_day
    VALUES("IBM.N", TSCreate('calday', '2005-07-08 00:00:00.00000', 20, 0, 0, 'taqtrade_day'));

■ It can also be created through the input function:

    INSERT INTO taqtrade
    VALUES("AA.N", 'irregular, container(taqtrade),
        origin(2007-04-03 06:30:00.00000), calendar(calsec),
        [(4.48, . . .)@2007-04-03 06:30:03.00003, (4.50,. . .)@2007-04-03 06:30:03.00119, . . .]');

Page 19: Ugif 04 2011   france ug04042011-jroy_ts

The Virtual Table Interface

■ Makes a TimeSeries look like a table:

    EXECUTE PROCEDURE TSCreateVirtualTab('ts_data_v', 'ts_data',
        'origin(2010-11-10 00:00:00.00000), calendar(cal15min), container(raw_container), threshold(0), regular',
        0, 'raw_reads');

■ Virtual table created:

    CREATE TABLE ts_data_v (
        loc_esi_id char(20),
        measure_unit varchar(10,0),
        direction char(1),
        tstamp datetime year to fraction(5),
        value decimal(14,3)
    );
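
Conceptually, the virtual table repeats the key columns of the base table once per element of the series. The Python sketch below models that flattening on in-memory data. It says nothing about how Informix implements the interface, the class and function names are hypothetical, and the sample values are made up for illustration.

```python
from datetime import datetime
from decimal import Decimal
from typing import Iterator, List, NamedTuple


class Element(NamedTuple):
    tstamp: datetime
    value: Decimal


class BaseRow(NamedTuple):
    loc_esi_id: str
    measure_unit: str
    direction: str
    raw_reads: List[Element]  # stands in for the TimeSeries column


def virtual_rows(table: List[BaseRow]) -> Iterator[tuple]:
    """Yield one flat (key columns..., tstamp, value) row per series element."""
    for row in table:
        for element in row.raw_reads:
            yield (row.loc_esi_id, row.measure_unit, row.direction,
                   element.tstamp, element.value)


# Made-up sample data: one meter with two 15-minute readings.
ts_data = [
    BaseRow("meter-0001", "kWh", "P", [
        Element(datetime(2010, 11, 10, 0, 0), Decimal("1.250")),
        Element(datetime(2010, 11, 10, 0, 15), Decimal("1.375")),
    ]),
]

flat = list(virtual_rows(ts_data))
for r in flat:
    print(r)
```

Each output tuple has the same column order as the ts_data_v definition above, which is what lets standard SQL treat the series as ordinary rows.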

Page 20: Ugif 04 2011   france ug04042011-jroy_ts

Quick Review

■ A TimeSeries resides in a container
– The container resides in a dbspace
– The container is for a specific element type (row type)
– A container is for either regular or irregular TimeSeries (not both)
– A container can contain multiple TimeSeries

■ A TimeSeries requires a calendar
– Defines when the data starts, defines a pattern of valid values

■ TimeSeries data is defined as a row type
– Defines the values tracked

■ You can operate on TimeSeries through special SQL functions or use the virtual table interface and standard SQL

Page 21: Ugif 04 2011   france ug04042011-jroy_ts


DEMO