Business Analytics with IDS

Fred Ho, IDS Development



© Copyright IBM Corporation 2009. All rights reserved. U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY. WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. IN ADDITION, THIS INFORMATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE. IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION. NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, NOR SHALL HAVE THE EFFECT OF, CREATING ANY WARRANTIES OR REPRESENTATIONS FROM IBM (OR ITS SUPPLIERS OR LICENSORS), OR ALTERING THE TERMS AND CONDITIONS OF ANY AGREEMENT OR LICENSE GOVERNING THE USE OF IBM PRODUCTS AND/OR SOFTWARE.

IBM, the IBM logo, ibm.com, Informix, solid, DataMirror, Optim, Cognos are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries.

Other company, product, or service names may be trademarks or service marks of others.

Disclaimer

Contents

  Definition of BI/DW/BA

  Types of IDS BI Users

  OLTP vs. Data Warehousing

  Informix Warehouse

  IDS Storage Optimization

  Your Feedback and Requirements

Business Intelligence

•  A set of concepts and methodologies to improve decision making in business through use of facts and fact-based systems (Howard Dresner, The Gartner Group)

•  The processes, technologies, and tools needed to turn data into information, information into knowledge, and knowledge into plans that drive profitable business actions (David Loshin, Business Intelligence: The Savvy Manager's Guide)

The foundation that enables BI is the enterprise architecture – business, data, and technology. A well-implemented data warehousing program provides much of that foundation.

Data Warehousing

•  A data warehouse is a subject-oriented, integrated, non-volatile, time-variant collection of data organized to support management needs (W. H. Inmon)

•  The Data Warehouse is nothing more than the union of all the constituent data marts (Ralph Kimball, et al., The Data Warehouse Lifecycle Toolkit)

The data warehousing process turns raw data into potentially valuable information usable by people and systems. Warehousing enhances the value of data assets by:

–  Applying standards and consistency to the data
–  Organizing the data into subject areas that cross business functional lines
–  Integrating the data
–  Enforcing data consistency over time to provide meaningful history
–  Acting as a stable and reliable source
–  Providing easy access to data

Business Analytics

The process of using information to enhance knowledge and apply that knowledge to help a business achieve its objectives. Analytic applications provide tools to facilitate the business analytics process.

•  Business Metrics and Business Management
–  Business Process Management
–  Business Performance Management
–  Business Activity Monitoring
–  Customer Relationship Management
–  Supply Chain Management
•  Performance Dashboards for Information Delivery
–  Real-time (or near real-time) monitoring
•  Scorecards for Information Delivery
–  Monitoring history & trends
•  Analytic Applications for Information Delivery
–  Customer Analysis, Marketplace Analysis, Sales Channel Analysis, …

Range of Business Analytics

[Figure: business value (low to high) plotted against complexity (low to high). Source: TDWI]

•  Reporting: using query, reporting and search tools
•  Analysis: using OLAP & visualization tools
•  Monitoring: using dashboards & scorecards
•  Prediction: using predictive analysis tools

IDS in BI/Warehousing

•  Given the IDS characteristics of reliability, high availability, performance and ease of use, why isn't IDS in this space?
–  IDS has traditionally been viewed as an OLTP solution

•  However, there are a lot more warehousing users on IDS than one realizes!
–  Some customers have implemented IDS warehouses at terabyte levels
–  There are a lot of features already in IDS that make it suitable for BI/Warehousing
–  BI tools have become very sophisticated over the years

•  We recognize the need to provide better warehousing capabilities for IDS users

What’s Available? IDS Warehousing Features

•  Performance & Scalability
–  Inherent SMP multi-threading
–  Parallel Data Query (PDQ)
–  Light Scan for fast table scans
–  Online index build
–  Efficient hash joins
–  Auto fragment elimination
–  Memory Grant Manager (MGM)
–  High Performance Loader
–  Optimistic concurrency

•  Ease of Management
–  Time-cyclic data management using range partitioning
–  OPTCOMPIND optimization
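Range partitioning and automatic fragment elimination work together: when a table is fragmented by date ranges, a query restricted to a date range only needs to read the fragments whose ranges overlap it, and time-cyclic maintenance drops the oldest fragment as a new one is added. The sketch below is a simplified Python model of that idea under stated assumptions, not IDS fragmentation syntax or the optimizer's actual algorithm; the fragment names and monthly ranges are hypothetical.

```python
from datetime import date

# Hypothetical monthly fragments: (name, inclusive start, exclusive end).
Fragment = tuple[str, date, date]

FRAGMENTS: list[Fragment] = [
    ("frag_2009_01", date(2009, 1, 1), date(2009, 2, 1)),
    ("frag_2009_02", date(2009, 2, 1), date(2009, 3, 1)),
    ("frag_2009_03", date(2009, 3, 1), date(2009, 4, 1)),
]


def fragments_to_scan(fragments: list[Fragment], lo: date, hi: date) -> list[str]:
    """Return the fragments a query with lo <= order_date < hi must read.

    A fragment is eliminated when its [start, end) range cannot overlap [lo, hi).
    """
    return [name for name, start, end in fragments if start < hi and lo < end]


def roll_window(fragments: list[Fragment], new: Fragment) -> list[Fragment]:
    """Time-cyclic maintenance: drop the oldest fragment and append a new one."""
    return fragments[1:] + [new]


if __name__ == "__main__":
    # A query for 10 Feb to 5 Mar only touches two of the three fragments.
    print(fragments_to_scan(FRAGMENTS, date(2009, 2, 10), date(2009, 3, 5)))
    april = ("frag_2009_04", date(2009, 4, 1), date(2009, 5, 1))
    print([name for name, _, _ in roll_window(FRAGMENTS, april)])
```

The overlap test `start < hi and lo < end` is the whole trick: any fragment failing it provably holds no qualifying rows, so it is never read.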

BI Users Classification

1. BI on Existing OLTP Schema (Operational BI)
2. BI on Star Schema (Data Mart)
3. BI in a Mixed-Workload Environment
4. Enterprise BI

Type 1: BI/Analytics on OLTP Schema

•  The majority of today's IDS customers need to do BI/Analytics on their existing IDS (OLTP) database.

•  They currently use a combination of 4GL programs, Excel, and BI tools (Business Objects, Cognos, Crystal Reports)

•  Custom code and maintenance are required by the customer
•  Performance may be acceptable even on an OLTP schema
•  Allows for “operational BI”

OLTP vs. Data Warehousing Workload

OLTP Workload
•  Short Transactions: relatively simple SQL
•  Random Updates: few rows accessed
•  Sub-second response time
•  ER Modeling: minimizes redundancy
•  Normalized data (5NF): minimizes duplicates
•  Few indexes: avoids index maintenance
•  Pre-compiled queries: repeated execution of queries

Data Warehousing Workload
•  Longer Transactions: complex SQL with analytics
•  Sequential Updates: many rows accessed
•  Seconds to minutes response time
•  Dimensional Modeling: OK to have redundancy
•  De-normalized data (3NF): duplicates are OK
•  OK to have more indexes: mostly read only
•  Ad-hoc queries: unpredictable load

Type 2: BI/Analytics on IDS on a Star Schema

•  Transform the OLTP database into a star schema database

•  Better performance for data warehousing and dimensional queries

•  The star schema database may be on a separate machine/domain

•  Suitable for customers building a separate data mart

•  Use IDS as-is against the star schema
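The first bullet, transforming an OLTP database into a star schema, can be illustrated with a small Python sketch: normalized order rows are split into dimension tables keyed by surrogate keys and a fact table that holds only those keys plus measures. The column names (order_date, customer, product, qty, price) and the in-memory dictionaries are hypothetical; a real load would run as SQL or through an ETL tool.

```python
from typing import Any


def build_star(orders: list[dict[str, Any]]):
    """Split order rows into date/customer/product dimensions and a sales fact table."""
    # Each dimension maps a natural value to a surrogate key (1, 2, 3, ...).
    dims: dict[str, dict[Any, int]] = {"date": {}, "customer": {}, "product": {}}

    def surrogate(dim: str, value: Any) -> int:
        keys = dims[dim]
        if value not in keys:
            keys[value] = len(keys) + 1  # next unused surrogate key
        return keys[value]

    facts = []
    for o in orders:
        facts.append({
            "date_key": surrogate("date", o["order_date"]),
            "customer_key": surrogate("customer", o["customer"]),
            "product_key": surrogate("product", o["product"]),
            "quantity": o["qty"],
            "revenue": o["qty"] * o["price"],  # pre-computed measure
        })
    return dims, facts


if __name__ == "__main__":
    orders = [
        {"order_date": "2009-03-01", "customer": "Acme", "product": "bolt", "qty": 10, "price": 2},
        {"order_date": "2009-03-01", "customer": "Zenith", "product": "nut", "qty": 5, "price": 1},
        {"order_date": "2009-03-02", "customer": "Acme", "product": "nut", "qty": 4, "price": 1},
    ]
    dims, facts = build_star(orders)
    print(dims["customer"])  # {'Acme': 1, 'Zenith': 2}
    print(facts[2])
```

Repeated customers, products and dates collapse into one dimension row each, which is what makes dimensional queries cheaper to join and group.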

What’s Available? BI Tools

The Cognos Performance Management Framework identifies best-practice decision areas, or information sweet spots, by business function:

Cognos 8 provides a comprehensive set of BI tools for:

  Reporting

  Analysis

  Dashboards

  Scorecards

Performance Management Framework for:

  Solutions for different areas of the organization

Cognos Business Intelligence and Performance Management One Platform, One Architecture

Industry and Functional Solutions

Complete Coverage of all capabilities

Enterprise-Class SOA Platform

Data Warehouse Architecture

SQL Warehousing Tool Overview

–  Warehousing Process
–  Design Studio
–  Admin Console
–  Summary

SQL Warehousing Tools Overview

•  Typical process
–  Identify requirements (Data Architect)
–  Define the data transformation (ETL/ELT) process (SQL/ETL developer)
–  Develop SQL/shell scripts (SQL/ETL developer)
–  Deploy in the production system (Application Architect, DBA)
–  Reporting (Business user)
–  Refine requirements

•  SQW Solution
–  Data Modeling: Physical Data Model (reverse engineering, new from scratch, generate DDL), compare & sync
–  Data Flows: visual design, optimized SQL code generation; control flow supports programming logic
–  Admin Console: schedule, monitor, parameterized values
–  Free Eclipse reporting tool, e.g. BIRT
–  Reusable flows: easy refinement, copy & paste, refactor

•  Challenges
–  Dynamic requirements: constant refinement
–  Multiple roles and tools: each has a different perspective, with communication cost and information loss
–  Unreadable, hard-to-debug scripts: poor productivity

•  Values
–  Easy to design & reuse: increased productivity
–  Integrated tools: seamless integration inside Eclipse
–  Auto-generated code from visualized flows: optimized SQL code
–  Impact analysis for any data model change

SQW Architecture

[Figure: SQW architecture in three stages. DESIGN: Design Studio (Eclipse-based Design Center) for data flows and control flows. DEPLOY: deployment preparation produces a deployment package (code units, build profile, user scripts). RUNTIME: the SQW runtime runs as an HTTP service in WebSphere Application Server (WAS) and is managed through the Admin Console, with the SQW control DB and execution DB on IDS. Data sources: IDS, DB2, Oracle, SQL Server; other servers: DataStage; warehouse DB: IDS.]

SQW: Design Studio

•  Design Studio
–  Eclipse-based IDE: integrated tools, shell sharing
–  Team development: CVS, ClearCase for check-in/check-out of projects and flows
•  Data Warehousing Project
–  Data Models
–  Data Flows
–  Control Flows
–  Warehouse Applications (deployment packages)
–  Subflow & Subprocess (reusable flow modules)
–  Variables
•  Data Source Explorer
–  Database connections to multiple vendors, e.g. Informix, DB2 LUW, Oracle, SQL Server, MySQL, DB2 z/OS
•  DataStage Servers
–  Integration with IBM DataStage

SQW: Data Modeling

•  Physical Data Model
•  Visualized data modeling
•  Impact analysis
•  Reverse engineering or new from scratch
•  Compare & sync
•  Generate DDL
•  Overview diagram
•  Shell sharing with Rational Data Architect & other Data Studio products
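"Compare & sync" and "Generate DDL" boil down to diffing two versions of a physical model and emitting the statements that move one to the other. The Python sketch below shows the idea for a single table, assuming a model is just a column-name-to-type mapping; it is a hypothetical illustration, not how SQW or Rational Data Architect implement the comparison, and it ignores constraints, indexes and data-preserving type conversions.

```python
def sync_ddl(table: str, current: dict[str, str], target: dict[str, str]) -> list[str]:
    """Return Informix-style ALTER TABLE statements that turn `current` into `target`."""
    statements = []
    # Columns that are new in the model, or whose type changed.
    for column, col_type in target.items():
        if column not in current:
            statements.append(f"ALTER TABLE {table} ADD ({column} {col_type});")
        elif current[column] != col_type:
            statements.append(f"ALTER TABLE {table} MODIFY ({column} {col_type});")
    # Columns that exist in the database but were removed from the model.
    for column in current:
        if column not in target:
            statements.append(f"ALTER TABLE {table} DROP ({column});")
    return statements


if __name__ == "__main__":
    deployed = {"cust_id": "INTEGER", "cust_name": "CHAR(20)", "fax": "CHAR(18)"}
    modeled = {"cust_id": "INTEGER", "cust_name": "VARCHAR(40)", "email": "VARCHAR(80)"}
    for stmt in sync_ddl("customer", deployed, modeled):
        print(stmt)
```

The same diff is also the raw material for impact analysis: every changed column is a candidate for breaking the data flows that read it.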

SQW: Data Flows

Data Flow Operators:
–  Source & target operators (table, file)
–  SQL transformation operators
–  Warehousing operators

[Figure: example data flow built from File source, Table source, Table join, aggregation and Table target operators]
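The operators in the example flow compose like the stages of a pipeline. As a rough, hypothetical illustration (plain in-memory Python, whereas SQW generates SQL that runs inside the database), the sketch below reads a file source, joins it to a table source, aggregates, and appends the result to a table target; all column names and data are made up.

```python
import csv
import io
from collections import defaultdict


def file_source(text: str) -> list[dict[str, str]]:
    """File source operator: parse CSV text into rows."""
    return list(csv.DictReader(io.StringIO(text)))


def table_join(left: list[dict], right: list[dict], key: str) -> list[dict]:
    """Inner join: index the right input by key (assumed unique), then probe with each left row."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


def aggregation(rows: list[dict], group_by: str, measure: str) -> list[dict]:
    """Sum `measure` per distinct `group_by` value, sorted by group."""
    totals: dict[str, float] = defaultdict(float)
    for row in rows:
        totals[row[group_by]] += float(row[measure])
    return [{group_by: k, measure: v} for k, v in sorted(totals.items())]


def table_target(target: list[dict], rows: list[dict]) -> int:
    """Table target operator: append rows and report how many were loaded."""
    target.extend(rows)
    return len(rows)


if __name__ == "__main__":
    sales_file = "product_id,amount\np1,10\np2,5\np1,2.5\n"
    products = [
        {"product_id": "p1", "category": "tools"},
        {"product_id": "p2", "category": "paint"},
    ]
    sales_by_category: list[dict] = []
    joined = table_join(file_source(sales_file), products, "product_id")
    loaded = table_target(sales_by_category, aggregation(joined, "category", "amount"))
    print(loaded, sales_by_category)
```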

SQW: Data Flows

[Figure: a simple flow and the SQL code generated from it]

The generated SQL code provides:
–  Optimization across SQL statements
–  An optimized staging strategy
–  In-database transformation
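In-database transformation means a flow is turned into set-based SQL that the database itself executes, rather than moving rows through an external engine. The hypothetical Python generator below shows what a join-plus-aggregation flow might collapse into: a single INSERT ... SELECT. It is not SQW's code generator, the table and column names are made up, and it assumes trusted identifiers (no quoting or validation).

```python
def generate_insert_select(
    target: str, source: str, lookup: str, join_key: str, group_by: str, measure: str
) -> str:
    """Collapse source -> join -> aggregation -> target into one set-based statement."""
    return (
        f"INSERT INTO {target} ({group_by}, total_{measure}) "
        f"SELECT l.{group_by}, SUM(s.{measure}) "
        f"FROM {source} s JOIN {lookup} l ON s.{join_key} = l.{join_key} "
        f"GROUP BY l.{group_by}"
    )


if __name__ == "__main__":
    print(generate_insert_select(
        target="sales_by_category",
        source="sales_stage",
        lookup="product",
        join_key="product_id",
        group_by="category",
        measure="amount",
    ))
```

Emitting one statement instead of several staged ones is what lets the database optimize across the whole flow.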

SQW: Control Flows

Control flow

•  Common utility operators
•  Control logic, parallel execution, loop iteration
•  Error handling
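The three control-flow capabilities listed above (parallel execution, loop iteration, error handling) can be pictured with a short Python sketch. The step functions, partition names and notification message are hypothetical, and SQW control flows are assembled graphically rather than coded like this.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def run_control_flow(
    parallel_steps: list[Callable[[], str]],
    loop_items: Iterable[str],
    loop_step: Callable[[str], str],
    on_error: Callable[[Exception], str],
) -> list[str]:
    """Run steps in parallel, then loop over items; route any failure to on_error."""
    log: list[str] = []
    try:
        # Parallel execution: results come back in submission order, and the
        # first exception raised by a step is re-raised while collecting them.
        with ThreadPoolExecutor() as pool:
            log.extend(pool.map(lambda step: step(), parallel_steps))
        # Loop iteration: e.g. one load per partition or input file.
        for item in loop_items:
            log.append(loop_step(item))
    except Exception as exc:  # error-handling branch
        log.append(on_error(exc))
    return log


if __name__ == "__main__":
    def load(partition: str) -> str:
        if partition == "bad":
            raise RuntimeError(f"load failed for {partition}")
        return f"loaded {partition}"

    print(run_control_flow(
        [lambda: "extracted customers", lambda: "extracted orders"],
        ["2009_01", "2009_02"],
        load,
        lambda exc: f"notify DBA: {exc}",
    ))
```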

SQW Overview

[Figure: Design Studio (Eclipse-based design environment) creates warehouse applications and deploys them as an application package (zip file) containing a deployment profile (database connections, machine resources, variable definitions, DDL files, etc.) and the generated code. The Admin Console (production environment in WebSphere) manages warehouse applications: schedule and monitor.]

Admin Console

•  Flex RIA-based Warehouse Admin Console

•  The Admin Console manages common resources (e.g. database connections, FTP servers, DataStage servers)

•  Schedule & monitor warehouse processes

XPS Customers Looking to Migrate to IDS

•  External Tables – XPS-style loader for easy migration

•  Partitioning Strategies
–  Auto fragmentation
–  Fragment Advisor
–  Fragment stats update
–  Truncate fragments

•  Primary Storage Manager (PSM)
–  For simpler, easier management of backups (replacing ISM)

•  Merge
–  UpSert capabilities

* Features to be included in the next release(s)
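Merge ("UpSert") combines insert and update in one pass: source rows whose key already exists in the target update that row, and the rest are inserted. A minimal in-memory Python sketch of those semantics follows; it assumes a single-column key that is unique in the target and says nothing about the SQL syntax IDS will use.

```python
def upsert(target: list[dict], source: list[dict], key: str) -> dict[str, int]:
    """Apply MERGE-style semantics: update matched rows, insert unmatched ones."""
    position = {row[key]: i for i, row in enumerate(target)}
    stats = {"updated": 0, "inserted": 0}
    for row in source:
        if row[key] in position:  # matched: update the existing row
            target[position[row[key]]].update(row)
            stats["updated"] += 1
        else:  # not matched: insert a copy of the source row
            position[row[key]] = len(target)
            target.append(dict(row))
            stats["inserted"] += 1
    return stats


if __name__ == "__main__":
    customers = [{"id": 1, "city": "Reston"}, {"id": 2, "city": "Dallas"}]
    changes = [{"id": 2, "city": "Austin"}, {"id": 3, "city": "Denver"}]
    print(upsert(customers, changes, "id"))  # {'updated': 1, 'inserted': 1}
    print(customers)
```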

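Using MACH 11 for OLTP/Warehousing in IDS

[Figure: two deployment options. Use separate boxes: OLTP apps run against an OLTP database, which feeds a separate data warehouse database (via ETL) used by SQW. OR use MACH 11: on a blade server with shared disk, a Connection Manager routes OLTP apps to an "OLTP" node group (the primary and SDS nodes) and SQW to an "SQW" node group (SDS nodes), giving user transparency and a single database view.]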
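Conceptually, in the MACH 11 option the Connection Manager classifies each incoming connection and sends it to a node in the matching node group, so OLTP and warehousing work do not compete on the same server. The Python sketch below models that with a least-busy choice inside each group; the group names, node names and selection rule are hypothetical and do not reflect the Connection Manager's actual configuration or algorithm.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    sessions: int = 0  # connections currently routed to this node


def route(groups: dict[str, list[Node]], workload: str) -> Node:
    """Pick the least-busy node in the group that serves this workload."""
    if workload not in groups or not groups[workload]:
        raise ValueError(f"no node group for workload {workload!r}")
    node = min(groups[workload], key=lambda n: n.sessions)  # ties: first node wins
    node.sessions += 1
    return node


if __name__ == "__main__":
    groups = {
        "oltp": [Node("primary"), Node("sds_1")],
        "sqw": [Node("sds_2"), Node("sds_3")],
    }
    print([route(groups, "oltp").name for _ in range(3)])  # ['primary', 'sds_1', 'primary']
    print(route(groups, "sqw").name)  # sds_2
```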

IDS Storage Optimization

  Now Available as of 11.50xC4

  Deep Compression + Storage Optimization

Row Compression Concepts

•  Compression looks for repeating patterns across the entire table

–  When a pattern is found, the string is replaced by a 12-bit symbol

–  Symbols are stored in a dictionary for fast lookup

•  Data resides compressed on pages (both on-disk and in bufferpool)

–  Significant I/O bandwidth savings – better performance

–  Significant memory savings – more efficient memory utilization

–  Some CPU overhead costs

•  Rows must be uncompressed before being processed for evaluation
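A 12-bit code space holds 2^12 = 4,096 distinct symbols, and two 12-bit symbols fit exactly in three bytes. The Python sketch below packs and unpacks 12-bit codes to make that arithmetic concrete; it is an illustration only and does not describe how IDS actually lays out compressed rows on a page.

```python
SYMBOL_BITS = 12
MAX_SYMBOLS = 1 << SYMBOL_BITS  # 4096 distinct codes


def pack_symbols(symbols: list[int]) -> bytes:
    """Pack 12-bit symbols into a big-endian bit stream, zero-padding the last byte."""
    acc, nbits, out = 0, 0, bytearray()
    for s in symbols:
        if not 0 <= s < MAX_SYMBOLS:
            raise ValueError(f"symbol {s} does not fit in {SYMBOL_BITS} bits")
        acc = (acc << SYMBOL_BITS) | s
        nbits += SYMBOL_BITS
        while nbits >= 8:  # emit every complete byte
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1  # keep only the bits not yet emitted
    if nbits:
        out.append((acc << (8 - nbits)) & 0xFF)
    return bytes(out)


def unpack_symbols(data: bytes, count: int) -> list[int]:
    """Read `count` 12-bit symbols back from a packed byte string."""
    acc, nbits, symbols = 0, 0, []
    it = iter(data)
    while len(symbols) < count:
        while nbits < SYMBOL_BITS:
            acc = (acc << 8) | next(it)
            nbits += 8
        nbits -= SYMBOL_BITS
        symbols.append((acc >> nbits) & (MAX_SYMBOLS - 1))
        acc &= (1 << nbits) - 1
    return symbols


if __name__ == "__main__":
    packed = pack_symbols([1, 2])
    print(len(packed), packed.hex())  # 3 001002
    print(unpack_symbols(packed, 2))  # [1, 2]
```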

Row Compression Using a Compression Dictionary

•  Dictionary contains repeated information from the rows in the table
–  Compression candidates can be across column boundaries or within columns

PartCode   SPart  Quantity  LotNum  BinLoc  Aisle
ANCPRPLT   220J   200       Z165-3  NE132   6157
SNCPRPLT   580T   132       Z165-3  NE132   6157

Dictionary
01   NCPRPLT
02   Z165-3NE1326157
…    …

Uncompressed row data: ANCPRPLT 220J 200 Z165-3 NE132 6157 SNCPRPLT 580T 132 Z165-3 NE132 6157 …
Compressed row data:   A (01) 220J 200 (02) S (01) 580T 132 (02) …
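The substitution in this example can be reproduced with a small Python sketch. It assumes the column values are stored back to back (so dictionary entry 02 spans the LotNum, BinLoc and Aisle columns) and uses a simple greedy longest-match scan; IDS's real dictionary-building and matching algorithm is not described here.

```python
from typing import Union

Token = Union[str, int]  # literal text or a dictionary symbol

# Dictionary from the example above: symbol -> repeated pattern.
DICTIONARY: dict[int, str] = {1: "NCPRPLT", 2: "Z165-3NE1326157"}


def compress(row: str, dictionary: dict[int, str]) -> list[Token]:
    """Greedy longest-match substitution of dictionary patterns in a row."""
    patterns = sorted(dictionary.items(), key=lambda kv: len(kv[1]), reverse=True)
    tokens: list[Token] = []
    literal: list[str] = []
    i = 0
    while i < len(row):
        for symbol, pattern in patterns:
            if row.startswith(pattern, i):
                if literal:
                    tokens.append("".join(literal))
                    literal = []
                tokens.append(symbol)
                i += len(pattern)
                break
        else:  # no pattern starts here: keep the character as a literal
            literal.append(row[i])
            i += 1
    if literal:
        tokens.append("".join(literal))
    return tokens


def decompress(tokens: list[Token], dictionary: dict[int, str]) -> str:
    """Expand symbols back to their patterns, as must happen before rows are evaluated."""
    return "".join(dictionary[t] if isinstance(t, int) else t for t in tokens)


def render(tokens: list[Token]) -> str:
    return " ".join(f"({t:02d})" if isinstance(t, int) else t for t in tokens)


if __name__ == "__main__":
    # PartCode, SPart, Quantity, LotNum, BinLoc, Aisle stored contiguously.
    rows = ["ANCPRPLT" "220J" "200" "Z165-3" "NE132" "6157",
            "SNCPRPLT" "580T" "132" "Z165-3" "NE132" "6157"]
    for row in rows:
        tokens = compress(row, DICTIONARY)
        assert decompress(tokens, DICTIONARY) == row
        print(render(tokens))
    # A (01) 220J200 (02)
    # S (01) 580T132 (02)
```

Each 30-character row shrinks to 8 literal characters plus two symbols, which is the kind of saving the next slide reports at table scale.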

Storage savings

•  Tables will often compress in the range of 60% to 80%

•  Overall database storage savings will typically be between 40% and 50%

•  That is up to 50% less disk space needed to support an IDS 11 database!

[Figure: compression results for a Sales table and a Product table: 81% smaller and 78% smaller]
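The gap between the table-level and database-level figures follows from simple arithmetic: only the share of storage held by compressible table data shrinks, while indexes, out-of-row data and other objects that cannot be compressed keep their size. The inputs in this sketch (60% of storage in compressible tables, 70% table compression) are hypothetical, chosen only to show how a 70% table saving becomes roughly a 42% overall saving.

```python
def overall_savings(compressible_share: float, table_savings: float) -> float:
    """Fraction of total storage saved when only `compressible_share` of it compresses."""
    if not (0.0 <= compressible_share <= 1.0 and 0.0 <= table_savings <= 1.0):
        raise ValueError("both inputs must be fractions between 0 and 1")
    return compressible_share * table_savings


if __name__ == "__main__":
    saved = overall_savings(compressible_share=0.60, table_savings=0.70)
    print(f"overall savings: {saved:.0%}")  # overall savings: 42%
```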

Performance Benefit

•  Performance can be improved using compression

•  Many queries will benefit from compression through fewer I/Os

•  Compression consumes more CPU, but most customers are not 100% CPU-bound
–  Lab tests show I/O-bound workloads improve by 30-40% [Figure: 40% faster]

•  Many utilities (backup and recovery, for example) will be faster
–  Up to 2x as fast in some cases, as the database may now be half the size

IDS 11 Compression Operations

• estimate_compression – Estimates the compression ratio of a table

• create_dictionary – Creates a compression dictionary for a table

• compress – Implicitly performs create_dictionary, then compresses all existing data

• uncompress – Uncompresses the table and deactivates compression

• uncompress_offline – Places an exclusive lock (XLOCK) on the table and uncompresses it; also deactivates compression

• purge_dictionary – Deletes old, inactive dictionaries

Storage Optimization Operations

• repack – Moves rows within a table or fragment to consolidate free space

• repack_offline – Places an exclusive lock (XLOCK) on the table, then moves rows within a table or fragment to consolidate free space

• shrink – Returns free space at the end of a table or fragment to the dbspace
–  Normally done after a repack
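The interplay of repack and shrink can be modeled in a few lines of Python: repack fills free slots near the start of the table with rows taken from its end, which leaves the free space consolidated at the end, where shrink can hand it back to the dbspace. This is a simplified model under stated assumptions (a fixed number of rows per page, rows freely movable), not the server's actual page-management logic.

```python
Page = list[str]  # row identifiers stored on one data page


def repack(pages: list[Page], capacity: int) -> list[Page]:
    """Move rows from the last non-empty pages into free slots in earlier pages."""
    pages = [list(p) for p in pages]  # work on a copy
    front, back = 0, len(pages) - 1
    while front < back:
        if len(pages[front]) >= capacity:
            front += 1  # this page is full, look further right
        elif not pages[back]:
            back -= 1  # nothing left to move from this page
        else:
            pages[front].append(pages[back].pop())
    return pages


def shrink(pages: list[Page]) -> list[Page]:
    """Release trailing empty pages (returned to the dbspace in IDS terms)."""
    end = len(pages)
    while end > 0 and not pages[end - 1]:
        end -= 1
    return pages[:end]


if __name__ == "__main__":
    table = [["r1"], [], ["r2", "r3"], ["r4"]]  # sparse pages after compress
    packed = repack(table, capacity=3)
    print(packed)          # [['r1', 'r4', 'r3'], ['r2'], [], []]
    print(shrink(packed))  # [['r1', 'r4', 'r3'], ['r2']]
```

Running shrink without a prior repack would release nothing here, because the last page still holds a row; that is why shrink is normally done after a repack.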

Compression On Data Page With Multiple Rows

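[Figure: data pages holding multiple rows pass through compress (uncompressed rows become compressed rows, and a compression dictionary is created), repack (compressed rows are consolidated onto fewer pages) and shrink (the resulting empty data pages are released)]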

Admin API Interface

•  All compression and storage optimization operations are invoked through the IDS Admin API built-in UDRs, which are defined in the sysadmin database:
–  execute function task(…);
–  execute function admin(…);

•  Example:

execute function task(
    "table compress repack shrink",
    "table_name", "database_name", "owner_name"
);

Features That Cannot Be Compressed

•  Out-of-row data (e.g. blobs)

•  Indexes

•  Temp tables

•  Catalog tables (Data Dictionary)

•  Partition tables (Tablespace Tablespace)

•  Dictionary Partitions

•  Tables in the following databases: sysmaster, sysutils, sysuser, syscdr, syscdcv1

HDR, ER, CDC (DataMirror) and Compression

•  All are supported on compressed tables

•  HDR
–  Tables will be compressed on the secondary if and only if they are compressed on the primary

•  ER
–  The compression status of tables is independent between source and target, as specified by the user

•  CDC
–  Compression of targets depends on what the target database supports and what the user specifies

Summary

•  Storage optimization through IDS 11 compression can save 40-50% of your database storage requirements

•  For I/O-bound workloads, compression can also improve performance

•  Not only does your online database shrink but, often more importantly, your backup storage and disaster recovery storage can be cut in half as well

•  In real customer examples, storage savings have been realized and performance benefits are apparent

•  Add in the time saved in utility processing (database backup and recovery time in particular can be cut in half) and you can see the benefits of IDS 11 compression