
12/9/2013 © 2013 Real-Time Technology Solutions, Inc. 1

Enterprise Business Intelligence & Data Warehousing:

The Data Quality Conundrum

A study for Business and IT Professionals

Introduction

Objective: To understand current trends in Business Intelligence, Data Warehousing and Data Quality

Industries: Pharma/Healthcare, Retail, Media, TelCo, Financial, Software/Hardware, Aerospace/Defense, Services, Higher Education

Target Audience: CxO, Director, Vice President, Architect, Project Manager

Company Size: 500 to 10,000+ employees

Total Surveys: 207 responses


Data Warehouses are everywhere…

1) Oracle (+ MySQL, Exadata)
2) Teradata (+ Aster Data)
3) everyone else

“projected 10% growth in the database management system market and a significant increase in organizations seeking to deploy data warehouses for the first time.”

- analyst firm Gartner


Business Intelligence is big on CxO’s List…

1) IBM (Cognos)
2) Microsoft (surprising!!!)
3) Oracle

More Analytics for Executives

“The (BI) market is now forecast to continue to grow at a 9.8% compound annual growth rate through 2016. The media attention on Big Data has put Business Analytics on the agenda of more senior executives.”

- analyst firm IDC


Companies Invest in Enterprise ETL…

1) Microsoft (surprise #2)
2) Informatica PowerCenter
3) IBM’s combined offering

Companies Upgrade to Packaged ETL Tools

“The enterprise ETL market continues to grow as more enterprises replace manual scripts with packaged ETL solutions.”

- analyst firm Forrester


Data Warehouse Size is on the Rise…

Size of Data Warehouses

• 4% now greater than 1 Petabyte
• 8% now greater than 100 Terabytes
• 33% between 1 TB & 100 TB

Significant Increase in Size

“There is a significant increase in data size when the largest sector in our poll 2 years ago was measured in Gigabytes.”

- RTTS report


Current Test Strategies Are Flawed…

Current Testing Strategy

• 33% test across all ETL legs
• 30% only verify row counts
• 15% use minus queries

Only 7% implemented all 3 above

Lack of Automated Testing Causes Pain

“Many singled out a lack of automated testing and a lack of testing resources as reasons they did not deploy a more rigorous testing strategy.”

- RTTS report
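
To make the difference between a row-count check and a minus query concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. The table names (src_orders, dwh_orders) and the sample rows are hypothetical, and both tables are assumed to sit in the same database so that one query can reach them. SQLite spells the MINUS operator as EXCEPT.

```python
import sqlite3

# Hypothetical source and warehouse tables in one in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_orders (order_id INTEGER, amount REAL);
    CREATE TABLE dwh_orders (order_id INTEGER, amount REAL);
    INSERT INTO src_orders VALUES (1, 10.0), (2, 20.0), (3, 30.0);
    -- Same number of rows in the target, but order 3 was loaded incorrectly.
    INSERT INTO dwh_orders VALUES (1, 10.0), (2, 20.0), (3, 99.0);
""")


def row_count(table):
    # Table names are fixed in this script; never build SQL from user input.
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


def minus(left, right):
    # Rows present in `left` but absent from `right` (MINUS, written EXCEPT in SQLite).
    sql = (
        f"SELECT order_id, amount FROM {left} "
        f"EXCEPT SELECT order_id, amount FROM {right} "
        "ORDER BY 1"
    )
    return conn.execute(sql).fetchall()


counts_match = row_count("src_orders") == row_count("dwh_orders")
missing_in_target = minus("src_orders", "dwh_orders")
unexpected_in_target = minus("dwh_orders", "src_orders")

print("row counts match:", counts_match)
print("in source, not in target:", missing_in_target)
print("in target, not in source:", unexpected_in_target)
```

In this sample the row counts agree even though one value is wrong, so a count-only check passes while the minus query, run in both directions, exposes the bad row.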


Current Test Execution Method

Companies are checking data by eye

Top Execution Methods

1) Manual testing
2) Vendor tools
3) Home grown tools

Typical Manual Testing Process

• extract data from the source databases, files, XML
• extract data from the Data Warehouse after ETL process
• compare millions of data sets by eye (impossible to scale)
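
The comparison step of the manual process is the part that a short script can take over. The sketch below assumes both extracts have already been loaded into lists of dictionaries that share a key column. The function name diff_extracts, the column names, and the sample rows are all hypothetical.

```python
def diff_extracts(source_rows, target_rows, key):
    """Compare two extracts row by row, matching rows on the `key` column."""
    source = {row[key]: row for row in source_rows}
    target = {row[key]: row for row in target_rows}

    # Keys that exist on only one side.
    missing_in_target = sorted(source.keys() - target.keys())
    unexpected_in_target = sorted(target.keys() - source.keys())

    # For keys on both sides, list the columns whose values differ.
    mismatched = {}
    for k in source.keys() & target.keys():
        columns = sorted(c for c in source[k] if source[k][c] != target[k].get(c))
        if columns:
            mismatched[k] = columns

    return missing_in_target, unexpected_in_target, mismatched


# Hypothetical extracts: one from the source system, one from the warehouse.
source_extract = [
    {"id": 1, "name": "Ann", "total": 10},
    {"id": 2, "name": "Bob", "total": 20},
    {"id": 3, "name": "Cy", "total": 30},
]
target_extract = [
    {"id": 1, "name": "Ann", "total": 10},
    {"id": 2, "name": "Bob", "total": 25},
    {"id": 4, "name": "Dee", "total": 40},
]

missing, unexpected, mismatched = diff_extracts(source_extract, target_extract, "id")
print("missing in target:", missing)
print("unexpected in target:", unexpected)
print("mismatched columns by key:", mismatched)
```

Because every row is checked by key rather than sampled by eye, the same function covers three rows or three million, limited only by memory in this simple in-memory form.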


Data Coverage is Dangerously Low…

Percent of Current Data Coverage

• 84% had less than 50 percent data coverage
• 58% had less than 25 percent coverage
• 33% had less than 5 percent coverage
• 29% had less than 1 percent coverage


Lack of Automated Testing is #1 problem…

Biggest Challenges When Testing Data


Bad Data is Wreaking Havoc…

Effects of Bad Data

100% of respondents experienced bad data in their data warehouses

Top Answers

• “Incorrect business intelligence reports”
• “Poor delivery quality & customer dissatisfaction”
• “Missing revenue opportunities”
• “Critical business decisions made on bad data”
• “SLA issue with our customers”
• “Major embarrassments to our team”
• “Long working hours to fix bad data”

Bad Data Causes Companies to Lose Up to $100 million

“the average organization loses $8.2 million annually through poor data quality, with 22% estimated their annual losses at $20 million and 4% put that figure as high as an astounding $100 million”

- Gartner


RTTS’ Solution…

Ensuring Data Warehouse Quality

built by

QuerySurge is the premier test tool built to automate ETL Testing and ensure data warehouse quality

QuerySurge™ Modules

• Design Library: Create Query Pairs (source & target SQLs)
• Scheduling: Build groups of Query Pairs; schedule Test Runs
• Run Dashboard: View execution in real-time; analyze real-time results
• Deep-Dive Reporting: Examine and automatically email test results
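
The idea of a Query Pair (one SQL statement against the source, one against the target, with the two result sets compared) can be illustrated with a small sketch. This is not QuerySurge's API or implementation. The QueryPair class, its run method, and the tables are hypothetical, and two in-memory SQLite databases stand in for separate source and target systems.

```python
import sqlite3
from collections import Counter
from dataclasses import dataclass


@dataclass
class QueryPair:
    """Hypothetical pairing of a source query with a target query."""
    name: str
    source_sql: str
    target_sql: str

    def run(self, source_conn, target_conn):
        # Counters keep duplicate rows, so this is a multiset comparison.
        source = Counter(source_conn.execute(self.source_sql).fetchall())
        target = Counter(target_conn.execute(self.target_sql).fetchall())
        return {
            "only_in_source": sorted((source - target).elements()),
            "only_in_target": sorted((target - source).elements()),
        }


# Two separate databases stand in for the source system and the warehouse.
source_db = sqlite3.connect(":memory:")
source_db.executescript("""
    CREATE TABLE customers (id INTEGER, first TEXT, last TEXT);
    INSERT INTO customers VALUES (1, 'Ada', 'Lovelace'), (2, 'Alan', 'Turing');
""")
target_db = sqlite3.connect(":memory:")
target_db.executescript("""
    CREATE TABLE dim_customer (id INTEGER, full_name TEXT);
    INSERT INTO dim_customer VALUES (1, 'Ada Lovelace'), (2, 'Alan Turin');
""")

pair = QueryPair(
    name="customer full name",
    # The source query applies the transformation the ETL is expected to perform.
    source_sql="SELECT id, first || ' ' || last FROM customers",
    target_sql="SELECT id, full_name FROM dim_customer",
)
result = pair.run(source_db, target_db)
print(result)
```

Unlike a minus query, this comparison happens outside the databases, so the source and target do not need to be reachable from a single SQL statement.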

QuerySurge™ Architecture

[Architecture diagram: Sources, Target]


QuerySurge…

• automates the testing effort: the kickoff, the tests, the comparison, emailing the results
• speeds up testing: up to 1,000 times faster than manual testing
• schedules test runs: run now, every Tuesday at 11pm, or right after the ETL process
• tests across different platforms: any JDBC-compliant database, DWH, D-Mart, flat file, XML
• views & shares results: through automated emailing of reports
• verifies more data: verifies up to 100% of all data


QuerySurge ROI

Value Add

• Increase in data testing coverage from < 10% to up to 100%
• Decrease in testing time by as much as 1,000x over manual tests
• Increased coverage combined with decreased testing time

Return on Investment (ROI)

• Redeployment of testing head count by upwards of 75%
• Time savings over manual testing
• Better data due to a shorter, more thorough testing cycle

Case Study: Fortune 50 Firm

ROI of QuerySurge vs. manual testing

• 522 query pairs executed per release

• Manual execution: 21.75 days (174 hours)

• QuerySurge : 0.35 days (2.75 hours)

• 12 major releases per year

• 3-year ROI: $883,000 or 1,660% return
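
The time figures in the case study can be checked with a few lines of arithmetic. The sketch below assumes an 8-hour working day, which is what 174 hours = 21.75 days implies. It covers hours only, because the slide gives neither the labor rate nor the tool cost behind the dollar figure. At 8 hours per day, 2.75 hours works out to about 0.34 days, which the slide shows as 0.35.

```python
HOURS_PER_DAY = 8  # assumption implied by "21.75 days (174 hours)"

query_pairs_per_release = 522
manual_hours = 174.0
automated_hours = 2.75
releases_per_year = 12

# Convert both execution times to working days.
manual_days = manual_hours / HOURS_PER_DAY
automated_days = automated_hours / HOURS_PER_DAY

# How much faster the automated run is for this workload.
speedup = manual_hours / automated_hours

# Average manual effort per query pair, in minutes.
manual_minutes_per_pair = manual_hours * 60 / query_pairs_per_release

# Testing hours saved over one year and over three years of releases.
hours_saved_per_year = (manual_hours - automated_hours) * releases_per_year
hours_saved_three_years = 3 * hours_saved_per_year

print(f"manual: {manual_days:.2f} days, automated: {automated_days:.2f} days")
print(f"speedup: {speedup:.1f}x")
print(f"manual effort per query pair: {manual_minutes_per_pair:.0f} minutes")
print(f"hours saved per year: {hours_saved_per_year:.0f}")
print(f"hours saved over 3 years: {hours_saved_three_years:.0f}")
```

For this workload the speedup is roughly 63x, and the saving is about 2,055 testing hours per year across the 12 releases.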


Conclusion

Companies are not providing an acceptable level of data quality

Most firms test < 10% of their data

Manual testing and sampled comparisons are widely used

Automation, which can increase coverage and speed up testing, is used on less than 1/3 of projects

As data grows exponentially, there is a great risk of decisions being made based on bad data

A more complete solution is needed to keep enterprise-level data clean

QuerySurge provides that complete solution


Thank you for your time and attention

RTTS, the developers of QuerySurge