What Can Temporal Analytics Do For My Business?
Alison Torres, Director, Data Warehouse Consulting
Technical Conference 2011
TURNING YOUR DATA WAREHOUSE INTO A TIME MACHINE
Many Need It… Only a Few Try It Today
• Temporal processing is attempted by only a few ambitious organizations
• Complexity keeps it out of reach for most
> Representing temporal data has been difficult -- until now
> Databases handle time instants, but not time intervals
– No temporal query language
– No temporal DDL or DML
– No temporal constraints
• Elaborate and complex query code is required
> Complex qualifications based on multiple effective dates
The Dawn of the Time-Aware Analytics Database
• Add a time dimension to data analysis
• Understand the evolution of business facts
• Report how things were
• Don’t let changes over time distort analyses
• Update data as they change; let the database organize the history
• Questions become simple
> Database understands time relationships and how things changed
> Eliminates complex condition clauses in queries
Detailed Data vs. Comprehensive History
• “We keep 3 years of data in our data warehouse”
> Sales transactions
> Call detail records
> Insurance claims
> Web, social media contacts
> Bank loan activity
> Tax payments
• But is that a comprehensive picture of the business?
> Product category hierarchy changes and reclassifications
> Price plan and sales territory changes
> Policy terms (e.g., face amount, deductibles, dates) changes
> Manufacturing bills of material changes
> Employee hire/departure dates
> Customer contact information changes
> Business activities over time (e.g., equipment updates)
> Non-compliant taxpayers
> Tax code changes
Insurance Policy Processing
July 1, 2008
• New hire Frank starts work
• Chooses premium insurance policy
September 18, 2009
• Frank gets hurt on job
• Starts rehab program
January 1, 2010
• Frank back on job
• Changes to basic insurance policy for 2010 at open enrollment to save money
February 12, 2010
• Injury rehab claim submitted
• Frank currently shows basic insurance policy
• Which policy terms are used in claims processing?
Temporal history and query processing are required to handle the claim based on the policy terms as of September 18, 2009, when the injury occurred.
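The claim scenario above comes down to an "as of" lookup over validity periods. The following is a minimal Python sketch of that idea, not the database's implementation; the `POLICY_HISTORY` layout and the `policy_as_of` helper are hypothetical, and the periods are assumed to be half-open (start inclusive, end exclusive).

```python
from datetime import date

# Hypothetical policy history for Frank: (plan, valid_from, valid_to).
# Periods are half-open [valid_from, valid_to); None means "until changed".
POLICY_HISTORY = [
    ("premium", date(2008, 7, 1), date(2010, 1, 1)),
    ("basic", date(2010, 1, 1), None),
]


def policy_as_of(history, when):
    """Return the plan whose validity period contains `when`, or None."""
    for plan, valid_from, valid_to in history:
        if valid_from <= when and (valid_to is None or when < valid_to):
            return plan
    return None


injury_date = date(2009, 9, 18)
claim_date = date(2010, 2, 12)

# The claim must be processed under the terms in force on the injury date,
# not the terms that are current when the claim is submitted.
print(policy_as_of(POLICY_HISTORY, injury_date))  # premium
print(policy_as_of(POLICY_HISTORY, claim_date))   # basic
```

Without the history rows, only the current "basic" plan would be visible and the claim would be evaluated against the wrong terms.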
Sales Territory Realignment
January 1, 2010 Sales Regions: West, East
February 28, 2010 YTD Sales
Region   YTD 2010 Sales   Q1 Goal
West     $124M            $160M
East     $180M            $215M
March 1, 2010 Sales Regions: West, East (realigned)
March 31, 2010 YTD Sales
Region   YTD Sales   YTD Sales (old areas)
West     $215M       $170M
East     $175M       $220M
Revenue Per Customer Analysis
[Chart: # of Customers by Revenue Growth Yr/Yr (-10%, -7%, 4%, 7%, 10%); two callouts labeled “Moved During the Year”]
Revenue Management and Crew Scheduling
• Airlines, railroads, trucking lines analyze sales patterns and pricing plans
> Maximize revenue
> Efficiently schedule crews
> Maximize equipment utilization
• Seat sales at intervals leading up to flight date
> At several points in time
> Sequence and pattern of sales and pricing actions
• Similar analyses involve “daily balance”
> E.g., Average daily inventory
> Values often stored only when changed, not daily
> Average and other statistical functions can’t use incomplete data patterns
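The “daily balance” point can be made concrete: when values are stored only on change, averaging the stored rows ignores how long each value was in force. The Python sketch below uses hypothetical inventory figures and a hypothetical `time_weighted_average` helper; it assumes the first record falls on or before the start of the reporting window.

```python
from datetime import date

# Hypothetical change-only records: (effective_date, inventory_level).
# Each level stays in force until the next record's effective date.
CHANGES = [
    (date(2010, 1, 1), 100),
    (date(2010, 1, 29), 40),
    (date(2010, 1, 30), 10),
]


def time_weighted_average(changes, start, end):
    """Average level over [start, end), weighting each value by days in force."""
    ordered = sorted(changes)
    total = 0
    for i, (effective, level) in enumerate(ordered):
        next_effective = ordered[i + 1][0] if i + 1 < len(ordered) else end
        # Clip the in-force period to the reporting window.
        lo = max(effective, start)
        hi = min(next_effective, end)
        if lo < hi:
            total += level * (hi - lo).days
    return total / (end - start).days


# Averaging the stored rows treats each change as equally important.
naive = sum(level for _, level in CHANGES) / len(CHANGES)
weighted = time_weighted_average(CHANGES, date(2010, 1, 1), date(2010, 2, 1))

print(naive)               # 50.0
print(round(weighted, 2))  # 92.26
```

The level was 100 for 28 of the 31 days, so the daily average is far above the simple average of the three stored rows.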
Manufacturing Bill of Material
• Components and source vendors for parts within complex manufactured products frequently change
• Understanding the true content of a specific item and the impact of changes is critical
• What components from which vendor were used when a product returned for service was manufactured?
• How does the maintenance history compare for units built with parts from Vendor A vs. Vendor B?
• What is the current name and address for each registered customer who purchased a finished product which includes the fastener supplied by vendor x?
Equipment Update Example – Gas Pipeline
• Hole in disk in meter calibrates flow measurement
• Field operations often replace the disk with one that has a different size opening
> Lag between field operation action and data entry
> Bills to gas producers are incorrect until system reflects new disk in meter
• Must “restate history” when system is updated back to actual field equipment change date
• Accurate reports require field change date AND system update date
Compliance Reporting Examples
• Reports must be produced showing state of business at an earlier time
> Report all insurance members as of January 15, 2008
• Audit may require reproducing reports as of previous filing date
> Provide a list of members who were reported as covered on January 15, 2008 in the February 1, 2008 report with names as known then
> The report could also be run with current contact information; a third date used in the same query
• Other situations require audit trail showing all changes to key data
> Temporal maintains history and prevents untracked user modifications
Audience Ideas …
• Are you using temporal analysis today in your business?
• How could you use temporal analysis in your business?
• What information do you need to capture?
• What information is currently available?
• What application changes will have to be made?
• What other thoughts are running through your head right now?
Techniques to Build Your Own Solution
• Many names used for handling historical records
> Effective dates or Begin/End dates
> As-Is/As-Was processing
> Slowly Changing Dimensions
> Database snapshot capture
• All recognized as difficult to implement and inefficient
Product Category Change
2009 Sales and 2010 Sales Goal for Bonus
Category       2009 Sales   2010 Sales Goal for Bonus
Frozen Foods   $124M        $136M
Dairy          $ 75M        $ 90M
February 13, 2010: Item Category Change (Ice Cream moves from Frozen Foods to Dairy)
2010 Ice Cream Sales $12M
2010 Sales
Frozen Foods   $126M   LOST BONUS
Dairy          $ 92M   BONUS
2010 Sales (Categories as of Dec. 31, 2009)
Frozen Foods   $138M   BONUS
Dairy          $ 80M
The Time Dimension Enables Insight
Current Standard Capability
• What is the current manufacturing bill of material for the part?
• Compare sales and profitability for product class x for last quarter with the same quarter last year (based on products currently in the class)
• What was the number of seats and profitability of flight 999 for the departure last Tuesday?
With Temporal Support
• How does the bill of material for the part differ between current manufacturing, the date when the failed part was manufactured, and when quality testing for the product was conducted?
• Compare sales/profitability for product class x for last quarter with the same quarter last year three ways: with the products in class now, with the products in the class last year, and with the products in the class at the time.
• What was the number of seats and profitability by segment of Flight 999 three, seven, and fourteen days before departure?
Temporal Support – Product Category Example
2009 Sales as reported
Category      2009
Frozen Food   124
Dairy         75
Product Category table without temporal support
Product          Category
Pizza            1
Peas             1
Orange Juice     1
Lasagna Dinner   1
Yogurt           2
Ice Cream        1
On 2/13 Updated Category from 1 to 2 where Product=Ice Cream
Product          Category
Pizza            1
Peas             1
Orange Juice     1
Lasagna Dinner   1
Yogurt           2
Ice Cream        2
July 2010 Sales without temporal: July 2009 vs. 2010 Sales As Reported (Period Categories)
Category      2009   2010
Frozen Food   124    126
Dairy         75     92
Need additional row inserted in Product Category table to keep category for each period
Product          Category
Pizza            1 from January 1, 2009 until changed
Peas             1 from January 1, 2009 until changed
Orange Juice     1 from January 1, 2009 until changed
Lasagna Dinner   1 from January 1, 2009 until changed
Yogurt           2 from January 1, 2009 until changed
Ice Cream        1 from January 1, 2009 to Feb. 13, 2010
Ice Cream        2 from Feb. 13, 2010 until changed
July 2010 with categories as of Dec. 31, 2009
Category      2009   2010
Frozen Food   124    138
Dairy         75     80
Uses category before Feb 13 despite more recent update
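The product category example can be sketched in Python as a join between sales and a category history, evaluated as of a chosen date. This is an illustrative sketch only: the data layout and helper names are hypothetical, and the two "other" rows are aggregates derived from the slide totals (138 - 12 and 92 - 12), not product-level figures from the source.

```python
from datetime import date

FROZEN, DAIRY = 1, 2  # category codes as read from the example's table

# Category history: product -> list of (category, valid_from, valid_to).
# Periods are half-open; valid_to=None means "until changed".
CATEGORY_HISTORY = {
    "Ice Cream": [
        (FROZEN, date(2009, 1, 1), date(2010, 2, 13)),
        (DAIRY, date(2010, 2, 13), None),
    ],
    "Other frozen items": [(FROZEN, date(2009, 1, 1), None)],
    "Other dairy items": [(DAIRY, date(2009, 1, 1), None)],
}

# 2010 sales in $M. Ice Cream ($12M) is from the example; the "other" rows
# are derived aggregates used only for illustration.
SALES_2010 = {"Ice Cream": 12, "Other frozen items": 126, "Other dairy items": 80}


def category_as_of(product, when):
    """Return the category that applied to `product` on the date `when`."""
    for category, valid_from, valid_to in CATEGORY_HISTORY[product]:
        if valid_from <= when and (valid_to is None or when < valid_to):
            return category
    raise LookupError(f"no category for {product} on {when}")


def sales_by_category(sales, as_of):
    """Total sales per category, using the categories in force on `as_of`."""
    totals = {}
    for product, amount in sales.items():
        category = category_as_of(product, as_of)
        totals[category] = totals.get(category, 0) + amount
    return totals


as_reported = sales_by_category(SALES_2010, date(2010, 7, 1))
as_of_2009 = sales_by_category(SALES_2010, date(2009, 12, 31))

print(as_reported[FROZEN], as_reported[DAIRY])  # 126 92
print(as_of_2009[FROZEN], as_of_2009[DAIRY])    # 138 80
```

The same sales rows produce both sets of totals; only the date used to resolve the category changes.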
Data Warehouse Support for Temporal Processing
• Period data types
• Validtime and transactiontime column attributes
• Automated temporal data management
> Automatic transaction time
> Time period rows created, effective date insertion
• Intelligent temporal query processing
> Temporal query semantics
– Temporal qualifiers on constraints, queries, or sessions
> Performance
– Optimizer logic
– PPI enhancements
> Time series expansion
– Dynamically define time series on period data
– Produce results at periodic time points
> Backward compatibility
– Existing applications run without change
– Looking at the current data is the default and works as before
Bi-Temporal History Table Example
• Bi-Temporal Table Definition (DDL):
CREATE MULTISET TABLE PolicyInfo (
   Client_ID INTEGER
  ,Policy_ID INTEGER
  ,Insured_Amount DECIMAL(10,2)
  ,Effective_Time PERIOD(DATE) AS VALIDTIME
  ,DBMS_Time PERIOD(TIMESTAMP(6) WITH TIME ZONE) AS TRANSACTIONTIME
);
• Client calls on Dec 15 and requests $100K policy starting on January 1 for 1 year:
Client ID   Policy ID   Insured Amount   Effective Time            DBMS Time
100         10          $100,000         01/01/2010 – 01/01/2011   12/15/2009 – UNTIL_CLOSED
Bi-Temporal History Table Example
• Client calls on Feb 15 for an adjustment to the existing policy with immediate effect:
1. Increase coverage to $200,000 starting immediately and ending on 06/01/2010.
2. Leave the remainder of the policy unchanged.
• This can be accomplished using a single Temporal UPDATE with Period of Applicability as follows:
SEQUENCED VALIDTIME PERIOD(DATE '2010-02-15', DATE '2010-06-01')
UPDATE PolicyInfo
SET Insured_Amount = 200000.00
WHERE Client_ID = 100;
> SEQUENCED VALIDTIME: Indicates Temporal Table Usage
> PERIOD(DATE '2010-02-15', DATE '2010-06-01'): Temporal Period of Applicability
Bi-Temporal History Table Example
• The SEQUENCED UPDATE statement results in:
> 3 SQL INSERTS
> 1 SQL UPDATE
Original Row
Client ID   Policy ID   Insured Amount   Effective Time            DBMS Time
100         10          $100,000         01/01/2010 – 01/01/2011   12/15/2009 – UNTIL_CLOSED
Temporal Rows (I = inserted, U = updated)
    Client ID   Policy ID   Insured Amount   Effective Time            DBMS Time
I   100         10          $200,000         02/15/2010 – 06/01/2010   02/15/2010 – UNTIL_CLOSED
I   100         10          $100,000         01/01/2010 – 02/15/2010   02/15/2010 – UNTIL_CLOSED
I   100         10          $100,000         06/01/2010 – 01/01/2011   02/15/2010 – UNTIL_CLOSED
U   100         10          $100,000         01/01/2010 – 01/01/2011   12/15/2009 – 02/15/2010
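The row splitting shown above can be sketched in Python to make the bookkeeping explicit. This is an illustration of the logic under simplifying assumptions, not the database's algorithm: `sequenced_update` and the dict layout are hypothetical, periods are half-open, and transaction time is tracked as a date rather than a timestamp.

```python
from datetime import date

UNTIL_CLOSED = None  # stand-in for an open-ended transaction time


def sequenced_update(row, pa_start, pa_end, new_amount, now):
    """Apply an update over the period [pa_start, pa_end) to one open row.

    Returns the original row with its transaction time closed (the UPDATE),
    followed by the rows that replace it (the INSERTs).
    """
    v_start, v_end = row["valid"]
    lo, hi = max(v_start, pa_start), min(v_end, pa_end)
    if lo >= hi:
        return [row]  # period of applicability does not overlap this row

    closed = dict(row, tx=(row["tx"][0], now))
    open_tx = (now, UNTIL_CLOSED)
    inserted = []
    if v_start < lo:  # part of the validity period before the change
        inserted.append(dict(row, valid=(v_start, lo), tx=open_tx))
    inserted.append(dict(row, amount=new_amount, valid=(lo, hi), tx=open_tx))
    if hi < v_end:  # part of the validity period after the change
        inserted.append(dict(row, valid=(hi, v_end), tx=open_tx))
    return [closed] + inserted


original = {
    "client_id": 100,
    "policy_id": 10,
    "amount": 100000,
    "valid": (date(2010, 1, 1), date(2011, 1, 1)),
    "tx": (date(2009, 12, 15), UNTIL_CLOSED),
}

rows = sequenced_update(
    original, date(2010, 2, 15), date(2010, 6, 1), 200000, now=date(2010, 2, 15)
)
for r in rows:
    print(r["amount"], r["valid"], r["tx"])
```

For the policy example this yields four rows: the closed original plus three open rows covering the periods before, during, and after the coverage increase.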
Moving Current Date in PPI
• Provides the ability to define ‘current’ and ‘history’ partitions
• Partition that contains the recent data can be as small as possible for efficient access
• Supports use of CURRENT_DATE and CURRENT_TIMESTAMP built-in functions in Partitioning Expression
• Ability to reconcile the values of these built-in functions to a newer date or timestamp using ALTER TABLE
> Users can define the PPI with ‘moving’ dates and timestamps instead of redefining the PPI expression using constants
Time Series Expansion Support
• One row per period often saved
• “Holes” in sequence values hinder analysis
• New EXPAND ON clause added to SELECT to expand row with a period column into multiple rows
• Permits time-based analysis on period values
> Allows business questions such as ‘Get the month-end average inventory cost during the last quarter of the year 2006’
> Allows OLAP analysis on period data
• Allows charting of period data in an Excel format
Row Expansion
Expand On – By Interval
Item   Applicable_Period        Inv_Pos
217    2010-05-01, 2010-05-05   15
217    2010-05-05, 2010-05-08   10
217    2010-05-08, 2010-05-12   7
SELECT Item
      ,BEGIN(pd_expand) AS PostDate
      ,Inv_Pos
FROM Inventory_Position
EXPAND ON Applicable_Period AS pd_expand
   BY INTERVAL '1' DAY
   FOR PERIOD(DATE '2010-05-01', DATE '2010-05-12');
Item   PostDate     Inv_Pos
217    2010-05-01   15
217    2010-05-02   15
217    2010-05-03   15
217    2010-05-04   15
217    2010-05-05   10
217    2010-05-06   10
217    2010-05-07   10
217    2010-05-08   7
217    2010-05-09   7
217    2010-05-10   7
217    2010-05-11   7
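The effect of the row expansion can be reproduced in a few lines of Python, which may help when checking results by hand. This sketch mimics the outcome only; `expand_by_day` is a hypothetical helper and the periods are treated as half-open, matching the expanded rows shown above.

```python
from datetime import date, timedelta

# Source rows: (item, period_begin, period_end, inv_pos), end exclusive.
ROWS = [
    (217, date(2010, 5, 1), date(2010, 5, 5), 15),
    (217, date(2010, 5, 5), date(2010, 5, 8), 10),
    (217, date(2010, 5, 8), date(2010, 5, 12), 7),
]


def expand_by_day(rows, window_start, window_end):
    """Emit one (item, post_date, inv_pos) row per day of each period,
    restricted to the window [window_start, window_end)."""
    expanded = []
    for item, begin, end, inv_pos in rows:
        day = max(begin, window_start)
        stop = min(end, window_end)
        while day < stop:
            expanded.append((item, day, inv_pos))
            day += timedelta(days=1)
    return expanded


expanded = expand_by_day(ROWS, date(2010, 5, 1), date(2010, 5, 12))
for item, post_date, inv_pos in expanded:
    print(item, post_date.isoformat(), inv_pos)

# With one row per day, ordinary aggregates become meaningful.
average_inv_pos = sum(r[2] for r in expanded) / len(expanded)
```

Once every day has a row, a plain average over `expanded` is a true daily average of the inventory position.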
Temporal Query
• Provide a list of members who were reported as covered on Jan. 15, 2000, in the Feb. 1, 2000, NCQA report, with names as accurate as our best data shows today
With Temporal Support
SELECT member.member_id, member.member_nm
FROM edw.member_x_coverage
       VALIDTIME AS OF DATE '2000-01-15' AND
       TRANSACTIONTIME AS OF DATE '2000-02-01'
    ,edw.member
WHERE member_x_coverage.member_id = member.member_id;
Without Temporal Support
SELECT member.member_id
      ,member.member_nm
FROM edw.member_x_coverage coverage
    ,edw.member
WHERE coverage.member_id = member.member_id
  AND coverage.observation_start_dt <= '2000-02-01'
  AND (coverage.observation_end_dt > '2000-02-01'
       OR coverage.observation_end_dt IS NULL)
  AND coverage.effective_dt <= '2000-01-15'
  AND (coverage.termination_dt > '2000-01-15'
       OR coverage.termination_dt IS NULL)
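The two date conditions in this query are independent: one asks when coverage was in effect, the other asks what the database had recorded at the time of the report. A small Python sketch with hypothetical rows and names shows how the combined filter behaves; `OPEN` stands in for an open-ended period, and all member data here is invented for illustration.

```python
from datetime import date

OPEN = date.max  # stand-in for "until changed" / "until closed"

# Hypothetical coverage rows:
# (member_id, valid_from, valid_to, recorded_from, recorded_to), ends exclusive.
COVERAGE = [
    # In effect on the coverage date and recorded before the report date.
    (1, date(1999, 6, 1), OPEN, date(1999, 5, 20), OPEN),
    # In effect on the coverage date, but entered after the report was filed.
    (2, date(2000, 1, 1), OPEN, date(2000, 3, 10), OPEN),
    # Recorded in time, but terminated before the coverage date.
    (3, date(1998, 1, 1), date(1999, 12, 31), date(1997, 12, 15), OPEN),
]

# Names as known today (the join target is read at current time).
MEMBER_NAMES = {1: "A. Example", 2: "B. Example", 3: "C. Example"}


def covered_as_reported(coverage, valid_on, known_on):
    """Members covered on `valid_on` according to what was recorded on `known_on`."""
    return [
        member_id
        for member_id, v_from, v_to, r_from, r_to in coverage
        if v_from <= valid_on < v_to and r_from <= known_on < r_to
    ]


report = [
    (member_id, MEMBER_NAMES[member_id])
    for member_id in covered_as_reported(
        COVERAGE, valid_on=date(2000, 1, 15), known_on=date(2000, 2, 1)
    )
]
print(report)  # [(1, 'A. Example')]
```

Member 2 was in fact covered on the coverage date, but is correctly left out because the original report could not have included a row that had not yet been entered.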
Temporal Update – BiTemporal Table
• Current valid time, current transaction time query
> Jeans (125,102) are sold today (2005-08-30)
With Temporal Support
UPDATE objectlocation
SET location = 'External'
WHERE item_id = 125
  AND item_serial_num = 102;
Without Temporal Support
INSERT INTO objectlocation
SELECT item_id, item_serial_num, 'External', CURRENT_TIME, END(vt), CURRENT_TIME, 'Until_Closed'
FROM objectlocation
WHERE item_id = 125 AND item_serial_num = 102
  AND BEGIN(vt) <= CURRENT_TIME
  AND END(vt) > CURRENT_TIME
  AND END(tt) = 'Until_Closed';
INSERT INTO objectlocation
SELECT item_id, item_serial_num, location, BEGIN(vt), CURRENT_TIME, CURRENT_TIME, 'Until_Closed'
FROM objectlocation
WHERE item_id = 125 AND item_serial_num = 102
  AND BEGIN(vt) <= CURRENT_TIME
  AND END(vt) > CURRENT_TIME
  AND END(tt) = 'Until_Closed';
UPDATE objectlocation
SET END(tt) = CURRENT_TIME
WHERE item_id = 125 AND item_serial_num = 102
  AND BEGIN(vt) <= CURRENT_TIME
  AND END(vt) > CURRENT_TIME
  AND END(tt) = 'Until_Closed';
INSERT INTO objectlocation
SELECT item_id, item_serial_num, 'External', BEGIN(vt), END(vt), CURRENT_TIME, 'Until_Closed'
FROM objectlocation
WHERE item_id = 125 AND item_serial_num = 102
  AND BEGIN(vt) > CURRENT_TIME
  AND END(tt) = 'Until_Closed';
UPDATE objectlocation
SET END(tt) = CURRENT_TIME
WHERE item_id = 125 AND item_serial_num = 102
  AND BEGIN(vt) > CURRENT_TIME
  AND END(tt) = 'Until_Closed';
Temporal Support Provides
• Reduced IT cost and complexity
> Reduced query development and data maintenance costs
> Diminished effort monitoring and maintaining temporal data chains
• Increased business intelligence breadth and depth
> ‘Chain of events’ and ‘point-in-time’ analyses
> Easily reconstruct historical transaction details
Some U.S. Government Customers
New Jersey, Arizona, Utah, Minnesota, Illinois, Iowa, Michigan, Texas, Missouri, Maryland, Oklahoma, California, New York, Ohio
Tax Solutions Results
• Discovery of thousands of non-compliant taxpayers
• Optimized use of enforcement resources
• Improved access to information to provide better service on taxpayer contacts
• Analytical capabilities to answer tax agency’s most difficult business questions
• Less intrusion and reduced burden for compliant citizens
Hundreds of millions of dollars in recovered tax revenue