Active Data Warehousing: Five Years of Progress

Agenda

• The Six Active Elements
  > Active Access
  > Active Load
  > Active Events
  > Active Integration
  > Active Workload Management
  > Active Availability

• What have we done for you lately?
  > Active innovations 2001-2010

Six Active Elements

[Figure: the six active elements (Active Access, Active Load, Active Events, Active Enterprise Integration, Active Workload Management, Active Availability) arranged around the Active Data Warehouse]

Active Access

Web-speed inquiries by front-line employees, consumers, and partners

Writing Web-Speed Queries

• Use frequently
  > Single-AMP accesses
  > Parameterized SQL / cached plans
  > Join indexes
  > Macros
  > Stored procedures
  > Row-level and access locks

• Minimize
  > NUSI access
  > Joins
  > Value-ordered indexes
  > Complex conditions
  > Lengthy CASE statements
  > Table-level locks

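• Avoid
  > All-AMP steps
  > Large answer sets
  > Table scans

[Figure: response-time budget for a web-speed request to the Active Data Warehouse, with individual hops of 0.005 to 0.02 seconds]

Response-time budget:
  Network       0.02 s
  Application   0.02 s
  Query time    0.02 s
  Total         0.06 s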
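A minimal sketch of two items from the "use frequently" list: a lookup by primary key (on Teradata, an equality condition on a unique primary index is a single-AMP access) issued as parameterized SQL, so every call sends identical SQL text and the database can reuse a cached plan. Python's built-in `sqlite3` module stands in for the Teradata driver, and the `orders` table and its columns are hypothetical.

```python
import sqlite3

# Hypothetical table. On Teradata, order_id would be the unique primary
# index, so an equality lookup on it is a single-AMP access.
DDL = """
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    status   TEXT NOT NULL,
    amount   REAL NOT NULL
)
"""

# One SQL text with a '?' placeholder: every call sends identical SQL, so
# the database can reuse a cached plan instead of parsing a new literal.
LOOKUP_SQL = "SELECT order_id, status, amount FROM orders WHERE order_id = ?"


def order_status(conn: sqlite3.Connection, order_id: int):
    """Return (order_id, status, amount) for one order, or None if absent."""
    return conn.execute(LOOKUP_SQL, (order_id,)).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute(DDL)
conn.executemany(
    "INSERT INTO orders (order_id, status, amount) VALUES (?, ?, ?)",
    [(1, "shipped", 19.99), (2, "pending", 5.00)],
)
print(order_status(conn, 2))  # (2, 'pending', 5.0)
```

The value is bound separately from the SQL text, which also keeps the request small and avoids building SQL strings from user input.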

Multi-Statement SQL in Call Center

• 2-minute response time!
  > Get buyer share of wallet
  > Five 10-page SQL statements: ranking, windowing, OLAP
  > BI tool middle-tier server
  > Aggregate indexes

• Solution: more parallelism
  > Web services
  > 5 simultaneous SELECTs
  > 2-second response time

[Figure: Before (2 minutes): web server, BI server, and ADW running 5 SELECTs. After (2 seconds): web server issuing the 5 SELECTs directly to the ADW.]
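The fix on this slide amounts to issuing the five SELECTs concurrently and waiting for all of them, so total latency approaches that of the slowest query rather than the sum of all five. A minimal sketch with Python's `concurrent.futures`; `fake_query` is a hypothetical stand-in for code that runs one statement on its own database connection.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def run_all(queries: Sequence[str], run_query: Callable[[str], object],
            max_workers: int = 5) -> List[object]:
    """Run every query concurrently; return results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() submits all queries up front and yields results in order.
        return list(pool.map(run_query, queries))


def fake_query(sql: str) -> str:
    """Hypothetical stand-in for one SELECT that takes about 0.2 s."""
    time.sleep(0.2)
    return f"result of {sql}"


selects = [f"query {i}" for i in range(1, 6)]
start = time.perf_counter()
results = run_all(selects, fake_query)
elapsed = time.perf_counter() - start
# Five 0.2 s calls overlap, so elapsed is close to 0.2 s rather than 1.0 s.
print(len(results), f"{elapsed:.1f}s")
```

In a real web tier each worker would take its own connection from a pool, since a database session generally processes one request at a time.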

Active Load and Web Services for Self-Service

[Figure: hourly mini-batch ETL/ELT from SQL Server, mainframe, and WebSphere MQ sources through staging tables into the Active Data Warehouse, which answers consumers' and CSRs' “Where’s my check?” inquiries]

Applying for Credit: Before

[Figure: paper-based process: customer data gathering and a paper application go to a Credit Committee (decision makers 1 and 2), whose paper decisions set the decision and terms; a contract is prepared and sent as a paper contract; the customer accepts the proposal; the signed contract goes to the document archives and the money is released]

Result: 6 working days, 240 Euros

Applying for Credit: After

[Figure: customers apply through any channel (call center, branch, Internet, ATM); real-time data gathering and automated scoring, pricing, and special conditions draw on the data warehouse and external data providers, with result monitoring]

Response time: retail customers 30 seconds, corporate customers 120 seconds

Active Load

Data Loading throughout the Business Day

Streams, Mini-Batch, & Replication

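[Figure: three load paths into the Active Data Warehouse:
  > Mini-batch: FastLoad into staging tables, then ELT jobs run by a job scheduler; hybrid FastLoad adds a transform step
  > Streams: TPump with access modules and INMOD routines
  > Replication: changes applied directly]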
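The mini-batch path lands each batch in a staging table and then applies it to the target with set-based SQL in one transaction (the "ELT jobs" step). A minimal sketch of that apply step, assuming hypothetical `stg_payment` and `payment` tables keyed on `payment_id`, with `sqlite3` standing in for Teradata. On Teradata, FastLoad would typically fill the staging table, and the apply could use MERGE or the atomic UPSERT from the V2R4.1 release instead of the separate UPDATE and INSERT shown here.

```python
import sqlite3
from typing import Iterable, Tuple

# Hypothetical target and staging tables keyed on payment_id.
SCHEMA = """
CREATE TABLE payment     (payment_id INTEGER PRIMARY KEY, status TEXT NOT NULL);
CREATE TABLE stg_payment (payment_id INTEGER PRIMARY KEY, status TEXT NOT NULL);
"""


def apply_mini_batch(conn: sqlite3.Connection,
                     rows: Iterable[Tuple[int, str]]) -> int:
    """Stage one mini-batch, apply it to the target, and clear staging.

    Everything runs in one transaction, so a failure leaves the target
    unchanged. Returns the number of staged rows applied.
    """
    with conn:  # commits on success, rolls back on any exception
        conn.execute("DELETE FROM stg_payment")
        conn.executemany(
            "INSERT INTO stg_payment (payment_id, status) VALUES (?, ?)", rows)
        # Set-based apply, step 1: update rows that already exist.
        conn.execute("""
            UPDATE payment
               SET status = (SELECT s.status FROM stg_payment s
                              WHERE s.payment_id = payment.payment_id)
             WHERE payment_id IN (SELECT payment_id FROM stg_payment)
        """)
        # Step 2: insert rows that are new.
        conn.execute("""
            INSERT INTO payment (payment_id, status)
            SELECT s.payment_id, s.status
              FROM stg_payment s
             WHERE NOT EXISTS (SELECT 1 FROM payment p
                                WHERE p.payment_id = s.payment_id)
        """)
        (applied,) = conn.execute("SELECT COUNT(*) FROM stg_payment").fetchone()
        conn.execute("DELETE FROM stg_payment")
    return applied


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO payment VALUES (1, 'issued')")
conn.commit()
print(apply_mini_batch(conn, [(1, "cashed"), (2, "issued")]))  # 2
print(conn.execute("SELECT * FROM payment ORDER BY payment_id").fetchall())
# [(1, 'cashed'), (2, 'issued')]
```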

JMS Access Modules

[Figure: a message queue feeds Teradata through the JMS Access Module, with TPump handling continuous feeds and FastLoad/MultiLoad handling batch feeds]
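A continuous feed does not have to write one row per message: a loader such as TPump groups rows into multi-statement requests. A minimal sketch of that micro-batching idea, assuming Python's `queue.Queue` stands in for the JMS queue and a hypothetical `flush` callback performs the database write. A batch is flushed when it reaches `max_rows` or when `max_wait` seconds have passed.

```python
import queue
import time
from typing import Callable, List

STOP = object()      # put this on the queue to drain and stop the consumer
_NOTHING = object()  # internal marker: the wait timed out with no message


def consume(q: "queue.Queue", flush: Callable[[List[object]], None],
            max_rows: int = 100, max_wait: float = 1.0) -> None:
    """Read messages from q and pass them to flush() in micro-batches.

    A batch is flushed when it holds max_rows messages or when max_wait
    seconds have passed since the current batch window opened.
    """
    batch: List[object] = []
    deadline = time.monotonic() + max_wait
    while True:
        try:
            msg = q.get(timeout=max(0.0, deadline - time.monotonic()))
        except queue.Empty:
            msg = _NOTHING
        if msg is STOP:
            if batch:
                flush(batch)
            return
        if msg is not _NOTHING:
            batch.append(msg)
        if len(batch) >= max_rows or time.monotonic() >= deadline:
            if batch:
                flush(batch)
            batch = []
            deadline = time.monotonic() + max_wait


q = queue.Queue()
for i in range(250):
    q.put(i)
q.put(STOP)
batches = []
consume(q, batches.append, max_rows=100, max_wait=5.0)
print([len(b) for b in batches])  # [100, 100, 50]
```

The time limit bounds how stale a row can get on a quiet feed, while the size limit keeps requests efficient on a busy one.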

Near Real-Time Data Load

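[Figure: claims administration sends about 800 claims/hour through WebSphere MQ to staging, with hourly MultiLoad and nightly batch loads into the data warehouse; a 24/7 CSS mart serves claims-payment inquiries for the admin department through web macros: 250K requests/day at 0.01 second each]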

Active Load Best Practices

• Start with hourly mini-batch
  > Streams need justification
    – Queues have many quirks
  > Replication = stream + license

• Locking and indexes reduce throughput

• 60% of the project is data cleansing and transformation
  > Same as ETL, only faster

• Do surge testing
  > At least 20X normal volumes

• It's not expensive: mostly labor
  > Costs 2-8% of the machine
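The surge-testing rule is simple arithmetic. A small sketch, using the 800/hour claims feed from the near real-time load slide as the example input; the function name is hypothetical.

```python
def surge_target(normal_per_hour: float, factor: float = 20.0) -> dict:
    """Per-hour and per-second rates a surge test should sustain."""
    per_hour = normal_per_hour * factor
    return {"per_hour": per_hour, "per_second": per_hour / 3600}


target = surge_target(800)  # the 800/hour claims feed at 20X
print(f"{target['per_hour']:.0f} per hour, {target['per_second']:.1f} per second")
# 16000 per hour, 4.4 per second
```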

Active Events

Sense-and-Respond, Event-Driven Applications

Active Events

• Simple to complex events
  > Unique sales order, truck breakdown, flight cancellation, huge price drop, fraud detection, daily settlement, etc.

• Inside the database
  > Triggers, queue tables, stored procedures

• Outside the database
  > Business Activity Monitoring (BAM)
  > Complex Event Processing (CEP)

[Figure: event-processing stack spanning the ADW and a BAM or CEP engine: event detection, event filtering, business rules, event processing/workflow, and alerts/decisions]
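The layered flow on this slide (detect, filter, apply business rules, then raise alerts or decisions) can be sketched as a small pipeline. This is a minimal sketch under stated assumptions: events are plain dictionaries, and the price-drop rule with its 30% threshold is hypothetical. Inside Teradata, detection would more likely be a trigger writing to a queue table.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Alert:
    sku: str
    message: str


Rule = Callable[[dict], Optional[Alert]]


def is_relevant(event: dict) -> bool:
    """Event filtering: keep only complete price-change events."""
    return (event.get("type") == "price_change"
            and all(k in event for k in ("sku", "old", "new")))


def huge_price_drop(event: dict, threshold: float = 0.30) -> Optional[Alert]:
    """Hypothetical business rule: alert when a price falls >= threshold."""
    old, new = event["old"], event["new"]
    if old > 0 and (old - new) / old >= threshold:
        return Alert(event["sku"], f"price dropped {100 * (old - new) / old:.0f}%")
    return None


def process(events: Iterable[dict], rules: List[Rule]) -> List[Alert]:
    """Detect -> filter -> apply business rules -> collect alerts."""
    alerts: List[Alert] = []
    for event in events:              # detection: each incoming event
        if not is_relevant(event):    # filtering
            continue
        for rule in rules:            # business rules
            alert = rule(event)
            if alert is not None:
                alerts.append(alert)  # alerts/decisions
    return alerts


events = [
    {"type": "price_change", "sku": "A1", "old": 100.0, "new": 60.0},
    {"type": "price_change", "sku": "B2", "old": 100.0, "new": 90.0},
    {"type": "truck_breakdown", "truck": 7},
]
print(process(events, [huge_price_drop]))
# [Alert(sku='A1', message='price dropped 40%')]
```

Keeping filtering separate from the rules means cheap checks discard most events before any business logic runs.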

HappyTime Casino

[Figure: gaming events (card in/out, win/loss, etc.) from the loyalty card flow over the TIBCO Enterprise Service Bus to an event-driven application that uses the iLog rules engine for event filtering and business logic. It draws on the local CMS (current transactions), the ODS (customer profile), and the ADW (customer history, segment, profitability), and returns a prescribed action (offer, message) or an alert.]

Active Workload Management

The Key to Active Applications

Active Workload Priorities

Workload        Day weight %  Night weight %  Classification                Exception
Front lines     60            35              Web/call center applications  None
Short-reports   10            15              CPU ≤ 20 sec                  CPU > 20 seconds
Medium-reports  6             9               CPU ≤ 120 sec                 CPU > 200 seconds
Long-reports    3             6               CPU > 120 sec                 CPU > 1000 seconds
DBA-Rush        12            12              Specific usernames            CPU > 2000 seconds
Load-High       8             8               GoldenGate, TPump             None
Load-Long       1             15              Batch loading                 None
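The classification column reads like an ordered rule list in which the first match picks the workload. A minimal sketch of that logic, assuming each request carries a user name, client application, load utility, and estimated CPU seconds. The field names, the DBA user names, and the "any other utility is batch loading" rule are hypothetical, this is not TASM's configuration interface, and the short/medium criteria are read as CPU ≤ 20 s and CPU ≤ 120 s.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical values; a real TASM setup defines these in its rule set.
DBA_USERS = {"dba_admin", "dba_oncall"}
FRONT_LINE_APPS = {"web", "call_center"}
STREAM_UTILITIES = {"GoldenGate", "TPump"}

# Exception thresholds (actual CPU seconds) from the table; None = none.
EXCEPTION_CPU = {
    "Front lines": None,
    "Short-reports": 20,
    "Medium-reports": 200,
    "Long-reports": 1000,
    "DBA-Rush": 2000,
    "Load-High": None,
    "Load-Long": None,
}


@dataclass
class Request:
    user: str
    app: str = ""
    utility: Optional[str] = None
    est_cpu_sec: float = 0.0


def classify(req: Request) -> str:
    """Return the workload for a request: the first matching rule wins."""
    if req.utility in STREAM_UTILITIES:
        return "Load-High"
    if req.utility is not None:  # assumption: any other utility is batch
        return "Load-Long"
    if req.user in DBA_USERS:
        return "DBA-Rush"
    if req.app in FRONT_LINE_APPS:
        return "Front lines"
    if req.est_cpu_sec <= 20:
        return "Short-reports"
    if req.est_cpu_sec <= 120:
        return "Medium-reports"
    return "Long-reports"


def is_exception(workload: str, actual_cpu_sec: float) -> bool:
    """True when a running query has exceeded its workload's CPU limit."""
    limit = EXCEPTION_CPU[workload]
    return limit is not None and actual_cpu_sec > limit
```

Putting the load and DBA rules first keeps those requests out of the CPU-based report tiers regardless of their estimates.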

Before and After TASM

Measurement                     Before      After       Change
Concurrency                     ~145        ~45         ~69% lower
AWT (AMP worker tasks)          ~72         ~35         ~51% lower
CPU                             ~70%        ~70%        Same
Average query duration          724 sec     521 sec     28% lower
Web service average duration    8.29 sec    3.40 sec    59% lower
ETL average duration            175 sec     159 sec     9% lower
Reporting average duration      1395 sec    1020 sec    27% lower

Source: Simplifying Workload Management, Partners 2008
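The change column is the relative reduction, (before - after) / before. A quick check of the table's figures, with values copied from the slide and the approximate "~" values used as-is:

```python
MEASUREMENTS = {
    # name: (before, after), from the Before and After TASM table
    "Concurrency": (145, 45),
    "AWT": (72, 35),
    "Average query duration (s)": (724, 521),
    "Web service average duration (s)": (8.29, 3.40),
    "ETL average duration (s)": (175, 159),
    "Reporting average duration (s)": (1395, 1020),
}


def reduction_pct(before: float, after: float) -> float:
    """Relative reduction in percent: 100 * (before - after) / before."""
    return 100.0 * (before - after) / before


for name, (before, after) in MEASUREMENTS.items():
    print(f"{name}: {reduction_pct(before, after):.0f}%")
```

Rounding to whole percent reproduces the slide's 69%, 51%, 28%, 59%, 9%, and 27%.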

Active Enterprise Integration

Connecting the Data Warehouse to Middleware

What is a Data Access Object?

• Called by a business service
  > Java beans, web services, POJOs, etc.

• Bridges OO to relational
  > A Java/.NET best practice
  > Creates business objects
  > May use data transfer objects

• Persists business objects
  > SQL insert, update, delete

• Maps RDBMS data to/from business objects
  > Maps database column values to/from object properties

[Figure: layered architecture: application logic in the business layer, a business objects layer, and a persistence/data access layer whose customer DAO and Cust-history DAO reach the Active Data Warehouse over JDBC/ODBC; a web service is also shown]
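A minimal data access object in the spirit of this slide: it maps table columns to and from a business object's properties and exposes find/save/update/delete, so the business layer never writes SQL. The slide's examples are Java; this sketch uses Python with `sqlite3` standing in for JDBC/ODBC, and the `customer` table, its columns, and the class names are hypothetical.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Customer:
    """Business object: plain properties, no SQL."""
    customer_id: int
    name: str
    segment: str


class CustomerDAO:
    """Maps rows of the hypothetical customer table to Customer objects."""

    COLUMNS = "customer_id, name, segment"

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn

    def find(self, customer_id: int) -> Optional[Customer]:
        row = self.conn.execute(
            f"SELECT {self.COLUMNS} FROM customer WHERE customer_id = ?",
            (customer_id,)).fetchone()
        # Column values -> object properties.
        return Customer(*row) if row else None

    def save(self, c: Customer) -> None:
        # Object properties -> column values.
        self.conn.execute(
            f"INSERT INTO customer ({self.COLUMNS}) VALUES (?, ?, ?)",
            (c.customer_id, c.name, c.segment))

    def update(self, c: Customer) -> None:
        self.conn.execute(
            "UPDATE customer SET name = ?, segment = ? WHERE customer_id = ?",
            (c.name, c.segment, c.customer_id))

    def delete(self, customer_id: int) -> None:
        self.conn.execute(
            "DELETE FROM customer WHERE customer_id = ?", (customer_id,))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer "
             "(customer_id INTEGER PRIMARY KEY, name TEXT, segment TEXT)")
dao = CustomerDAO(conn)
dao.save(Customer(1, "Ann", "gold"))
print(dao.find(1))  # Customer(customer_id=1, name='Ann', segment='gold')
```

The DAO deliberately leaves commit and rollback to its caller, so several DAO calls can share one transaction.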

Sessions, DAOs, and Transactions

[Figure: class diagram of the layers between the business service and Teradata:
  > Business Service Layer: Business Service (session, daoObject1, daoObject2; func1(), func2())
  > Business objects layer: Business object (attr1, attr2; getattr1(), setattr1(), getattr2(), setattr2())
  > Data access layer:
    – DAO (session; find(), findByxxxx(), save(), update(), delete())
    – SessionManager (datasource, connection, transaction, queryBand; beginSession(), endSession(), getConnection(), beginXctn(), commit(), rollback(), setqueryband())
    – TransactionManager (beginXctn(), commit(), rollback())
    – Data source / JDBC driver (getConnection(), closeConnection())
  > Teradata]
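The point of the session manager is that one business-service call runs all of its DAO operations on one connection and in one transaction: commit if every step succeeds, roll back if any fails. A minimal Python sketch of that boundary, with `sqlite3` standing in for the JDBC data source. `SessionManager`, the two DAOs, `change_segment`, and the table names are hypothetical simplifications of the diagram, and Teradata-specific query banding (setqueryband) is left out.

```python
import sqlite3
from contextlib import contextmanager


class SessionManager:
    """Owns one connection and its transaction boundary (hypothetical API)."""

    def __init__(self, database: str = ":memory:"):
        # isolation_level=None puts sqlite3 in autocommit mode, so the
        # explicit BEGIN/COMMIT/ROLLBACK below are the only boundaries.
        self.conn = sqlite3.connect(database, isolation_level=None)

    @contextmanager
    def transaction(self):
        self.conn.execute("BEGIN")
        try:
            yield self.conn
        except Exception:
            self.conn.execute("ROLLBACK")
            raise
        else:
            self.conn.execute("COMMIT")


class CustomerDAO:
    def __init__(self, session: SessionManager):
        self.session = session  # shares the session's connection

    def update_segment(self, customer_id: int, segment: str) -> None:
        cur = self.session.conn.execute(
            "UPDATE customer SET segment = ? WHERE customer_id = ?",
            (segment, customer_id))
        if cur.rowcount != 1:
            raise LookupError(f"no customer {customer_id}")


class CustHistoryDAO:
    def __init__(self, session: SessionManager):
        self.session = session

    def save(self, customer_id: int, event: str) -> None:
        self.session.conn.execute(
            "INSERT INTO cust_history (customer_id, event) VALUES (?, ?)",
            (customer_id, event))


def change_segment(session: SessionManager, customer_id: int, segment: str) -> None:
    """Business service: the history row and the update commit together."""
    with session.transaction():
        CustHistoryDAO(session).save(customer_id, f"segment set to {segment}")
        CustomerDAO(session).update_segment(customer_id, segment)


s = SessionManager()
s.conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, segment TEXT)")
s.conn.execute("CREATE TABLE cust_history (customer_id INTEGER, event TEXT)")
s.conn.execute("INSERT INTO customer VALUES (1, 'silver')")
change_segment(s, 1, "gold")
```

If the customer update fails, for example for an unknown customer ID, the history row written earlier in the same call is rolled back with it.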

Eclipse and Teradata IDE Plug-in

Teradata Eclipse Plug-in Features

• Initial release
  > Connection management
  > Data Source Explorer
  > Create schema, table, view, trigger, macro, stored procedure (SP), user-defined functions
  > Run SP/macro
  > Display DDL
  > Sample contents
  > Table edit
  > Ad hoc SQL editor
  > Drop, delete, rename, row

• Teradata 12
  > JavaBean Wrapper Wizard
  > Java SP Wizard
  > DSE menu option
  > View framework
  > JXSP JAR management
  > Execution plan support
  > Compare and statistics
  > XML Services
  > SP result sets
  > Modify schema dialog
  > Enhanced Create/Modify View
  > SQL parser & formatter
  > Preferences

• Teradata 13
  > Spring DAO Wizard
  > JavaBean called from a Spring DAO
  > Create Java UDF Wizard
  > Run UDF functions
  > Auto-generated Apache Ant build file
  > Copy object
  > Compare object definition
  > Explain in XML
  > Find objects
  > Interdependency browser
  > JavaBean wrapper enhanced
  > Java SP enhancements
  > Teradata compare view

http://developer.teradata.com/tools/articles/teradata-plug-in-for-eclipse-13-02-00-now-available

Track and Trace Architecture

[Figure: postal-system data (item collection, posting, and delivery information from 27 province centers and 64 city centers), sort/dispatch centers, and transportation systems flow through the TIBCO Enterprise Message Service and a TPump access module as a continuous load into the data warehouse; an application server serves an external portal (Internet) and an internal portal (intranet)]

Active Availability

Active Requires High Availability (HA); Some Add Fault Tolerance (FT) or Disaster Recovery (DR)

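[Figure: three configurations: High Availability (an MPP system and application servers at the primary site), Fault Tolerant (a second, synchronized DW at the primary site), and Disaster Recovery (a synchronized DW off site)]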

When to Invest in DR or FT Availability

[Chart: applications plotted by $ lost during downtime, from High Availability (the majority of active applications) to Fault Tolerant or Disaster Recovery (many cash transactions). Applications shown: call centers, partner report portals, track and trace, fraud detection, eCommerce, passenger rebooking, labor scheduling, out of stock, online mortgages, IVR routing, eBanking, claims triage, defect monitoring alerts.]
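The chart's axis is money lost during downtime. One way to put numbers on the HA versus FT/DR decision is to compare the downtime loss an upgrade would avoid each year with its yearly cost. A back-of-the-envelope sketch; the availability levels, hourly loss, and cost figures are hypothetical inputs, not data from this deck.

```python
HOURS_PER_YEAR = 24 * 365


def downtime_hours(availability: float) -> float:
    """Expected downtime per year for an availability such as 0.999."""
    return (1.0 - availability) * HOURS_PER_YEAR


def worth_upgrading(loss_per_hour: float, current: float, upgraded: float,
                    extra_cost_per_year: float) -> bool:
    """True if the downtime loss avoided by the upgrade exceeds its cost."""
    avoided = (downtime_hours(current) - downtime_hours(upgraded)) * loss_per_hour
    return avoided > extra_cost_per_year


print(round(downtime_hours(0.999), 2))                   # 8.76
print(worth_upgrading(100_000, 0.999, 0.9999, 500_000))  # True
```

With these hypothetical inputs, moving from 99.9% to 99.99% avoids about 7.9 hours of downtime a year. At $100,000 per hour that outweighs a $500,000 yearly cost; at $10,000 per hour it does not.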

What Have We Done for You Lately

Friday Night Project on Developer Exchange

Teradata Labs Active Data Warehousing, 2001-2006

• V2R4.0 (July 2000): stored procedures, aggregate join indexes, priorities in file system utilities
• V2R4.1: atomic UPSERT, trigger performance, reduced restart time
• V2R5.0: reduced restart time, TDQM enhancements, roles and profiles
• V2R5.1: enhanced triggers, enhanced Teradata Priority Scheduler
• V2R6.0: improved TPump performance, TDWM, TWA, triggers calling stored procedures, queue tables, external table functions, reduced restart time, Priority Scheduler enhancements, Type 4 JDBC driver
• V2R6.1: read access dictionary locks, ARC with JI, utility concurrency limit
• Also on the timeline: Teradata TPump

Teradata Labs Active Data Warehousing, 2007-2010

• Teradata 12: query banding, parameterized caching improvements, online archive, replication scalability, TASM load utility management, Java external stored procedures, dispatcher fault isolation
• Teradata 13: Teradata trusted sessions, Java UDFs for push-down, no-primary-index tables for ELT, file system fault isolation, lock manager fault isolation, restart time reduction, replication for DDL, replication table copy
• Eclipse plug-in: connection management, Data Source Explorer, create schema/table/view/trigger/macro/stored procedure/UDFs, table edit, ad hoc SQL editor
• Eclipse plug-in TD 12: JavaBean Wrapper Wizard, XML Services, SQL parser & formatter, Java SP Wizard, DSE menu option, JXSP JAR management, execution plan support
• Eclipse plug-in TD 13: Spring DAO Wizard, Create Java UDF Wizard, auto-generated Apache Ant build file, copy object, compare object definition, Explain in XML
• Also: TPT streams, JMS access module, TRM REST APIs, MDM web services, BTEQ/FastExport write to JMS, Hibernate dialect

Summary – Teradata Best Practices Advice

• Active Access
  > High-speed SQL techniques
  > Avoid report-style queries

• Active Load
  > Start with mini-batch
  > Use replication or streams where speed is critical

• Active Events
  > Event-driven applications
  > BAM or CEP
  > Triggers

• Active Workload Management
  > Categorize and prioritize
  > Favor active accesses

• Active Integration
  > Web services + data access objects

• Active Availability
  > HA must be built into the application
  > Thorough review of operational procedures