12/9/2013 © 2013 Real-Time Technology Solutions, Inc. 1
Enterprise Business Intelligence & Data Warehousing:
The Data Quality Conundrum
A study for
Business and IT Professionals
Introduction
Objective: To understand current trends in Business Intelligence,
Data Warehousing and Data Quality
Industries: Pharma/Healthcare, Retail, Media, TelCo, Financial,
Software/Hardware, Aerospace/Defense, Services, Higher Education
Target Audience: CxO, Director, Vice President, Architect,
Project Manager
Company Size: 500 to 10,000+ employees
Total Surveys: 207 responses
Data Warehouses are everywhere…
“projected 10% growth in the database management system
market and a significant increase in organizations seeking to deploy data
warehouses for the first time.”
- analyst firm Gartner
Top Data Warehouse Vendors
1) Oracle (+ MySQL, Exadata)
2) Teradata (+ Aster Data)
3) everyone else
Business Intelligence is big on CxOs’ lists…
More Analytics for Executives
“The (BI) market is now forecast to continue to grow at a 9.8% compound
annual growth rate through 2016. The media attention on Big Data has put
Business Analytics on the agenda of more senior executives.”
- analyst firm IDC
Top BI Vendors
1) IBM (Cognos)
2) Microsoft (surprising!)
3) Oracle
Companies Invest in Enterprise ETL…
Companies Upgrade to Packaged ETL Tools
“The enterprise ETL market continues to grow as more enterprises replace
manual scripts with packaged ETL solutions.”
- analyst firm Forrester
Top ETL Vendors
1) Microsoft (surprise #2)
2) Informatica PowerCenter
3) IBM’s combined offering
Data Warehouse Size is on the Rise…
Significant Increase in Size
“There is a significant increase in data size when the largest sector in our
poll 2 years ago was measured in Gigabytes.”
- RTTS report
Size of Data Warehouses
• 4% now greater than 1 Petabyte
• 8% now greater than 100 Terabytes
• 33% between 1 TB & 100 TB
Current Test Strategies Are Flawed…
Lack of Automated Testing Causes Pain
“Many singled out a lack of automated testing and a lack of testing resources
as reasons they did not deploy a more rigorous testing strategy.”
- RTTS report
Current Testing Strategy
• 33% test across all ETL legs
• 30% only verify row counts
• 15% use minus queries
Only 7% implemented all 3 of the above
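To make the difference between a row-count check and a minus query concrete, here is a minimal sketch using Python's built-in sqlite3 module. It assumes, for illustration only, that source and target tables sit in one database under the hypothetical names `src_customers` and `dw_customers`; SQLite spells Oracle's MINUS operator as EXCEPT.

```python
import sqlite3

def row_counts_match(conn, source, target):
    """Row-count check: cheap, but blind to wrong values."""
    # Table names are interpolated into SQL, so pass trusted identifiers only.
    (src_n,) = conn.execute(f"SELECT COUNT(*) FROM {source}").fetchone()
    (tgt_n,) = conn.execute(f"SELECT COUNT(*) FROM {target}").fetchone()
    return src_n == tgt_n

def minus_query(conn, left, right, cols):
    """Rows in `left` absent from `right` (Oracle MINUS, standard SQL EXCEPT)."""
    col_list = ", ".join(cols)
    sql = f"SELECT {col_list} FROM {left} EXCEPT SELECT {col_list} FROM {right}"
    return conn.execute(sql).fetchall()

# Hypothetical demo tables: identical row counts, one corrupted value.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_customers (id INTEGER, name TEXT);
    CREATE TABLE dw_customers  (id INTEGER, name TEXT);
    INSERT INTO src_customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO dw_customers  VALUES (1, 'Ada'), (2, 'Grace H');
""")

cols = ["id", "name"]
print(row_counts_match(conn, "src_customers", "dw_customers"))  # True
print(minus_query(conn, "src_customers", "dw_customers", cols))  # [(2, 'Grace')]
print(minus_query(conn, "dw_customers", "src_customers", cols))  # [(2, 'Grace H')]
```

The row counts agree even though the data is wrong, which is why row counts alone give weak coverage. Running the minus query in both directions catches both missing and unexpected rows. Note that plain EXCEPT returns distinct rows, so it will not flag a row that was duplicated during the load.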
Current Test Execution Method
Companies are checking data by eye
Top Execution Methods
1) Manual testing
2) Vendor tools
3) Home-grown tools
Typical Manual Testing Process
• extract data from the source databases, files, XML
• extract data from the Data Warehouse after the ETL process
• compare millions of data sets by eye (impossible to scale)
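The "compare by eye" step of the manual process above is the part that fails to scale, and it is also the easiest to automate. The following is a minimal sketch, assuming both extracts are CSV files with a header row and identical column order; the sample data is hypothetical. Rows are treated as a multiset with `collections.Counter`, so a duplicated or dropped row is reported as well as a changed value.

```python
import csv
import io
from collections import Counter

def load_rows(f):
    """Read a CSV extract into a multiset of row tuples (header skipped)."""
    reader = csv.reader(f)
    next(reader, None)  # skip the header row
    return Counter(tuple(row) for row in reader)

def compare_extracts(source_rows, target_rows):
    """Return (rows missing from target, rows unexpected in target)."""
    missing = source_rows - target_rows     # Counter subtraction keeps positive counts only
    unexpected = target_rows - source_rows
    return missing, unexpected

# Hypothetical extracts; in practice these would be files exported from the
# source systems and from the data warehouse after the ETL run.
source_csv = io.StringIO("id,amount\n1,10.00\n2,25.50\n2,25.50\n3,7.25\n")
target_csv = io.StringIO("id,amount\n1,10.00\n2,25.50\n3,7.20\n")

missing, unexpected = compare_extracts(load_rows(source_csv), load_rows(target_csv))
print(sorted(missing.elements()))     # [('2', '25.50'), ('3', '7.25')]
print(sorted(unexpected.elements()))  # [('3', '7.20')]
```

Values are compared as raw strings here, so real extracts would need consistent formatting (for example, `7.2` versus `7.20`) before comparison.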
Data Coverage is Dangerously Low…
Percent of Current Data Coverage
• 84% had less than 50 percent data coverage
• 58% had less than 25 percent coverage
• 33% had less than 5 percent coverage
• 29% had less than 1 percent coverage
Lack of Automated Testing is the #1 Problem…
Chart: Biggest Challenges When Testing Data
Bad Data is Wreaking Havoc…
Effects of Bad Data
100% of respondents experienced bad data in their data warehouses
Top Answers
• “Incorrect business intelligence reports”
• “Poor delivery quality & customer dissatisfaction”
• “Missing revenue opportunities”
• “Critical business decisions made on bad data”
• “SLA issue with our customers”
• “Major embarrassments to our team”
• “Long working hours to fix bad data”
Bad Data Causes Companies to Lose Up to $100 Million
“the average organization loses $8.2 million annually through poor data
quality, with 22% estimated their annual losses at $20 million and 4% put
that figure as high as an astounding $100 million”
- Gartner
RTTS’ Solution…
Ensuring Data Warehouse Quality
QuerySurge is the premier test tool built to automate ETL testing and ensure
data warehouse quality.
QuerySurge™ Modules
• Design Library: create Query Pairs (source & target SQLs)
• Scheduling: build groups of Query Pairs and schedule test runs
• Run Dashboard: view execution in real time and analyze real-time results
• Deep-Dive Reporting: examine test results and email them automatically
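The Query Pair idea (a source SQL and a target SQL whose results should agree) can be illustrated generically. The sketch below is not QuerySurge's API: the `QueryPair` class, `run_pair` function, and the `orders` and `fact_sales` tables are all hypothetical, and two in-memory SQLite databases stand in for a source system and a warehouse. The source query applies the expected transformation (here an aggregation) so its output can be compared row for row with the target.

```python
import sqlite3
from collections import Counter
from dataclasses import dataclass

@dataclass
class QueryPair:
    """Hypothetical model of a source/target SQL pair (not QuerySurge's API)."""
    name: str
    source_sql: str
    target_sql: str

def run_pair(pair, source_conn, target_conn):
    """Run both queries on their own connections and compare result multisets."""
    src = Counter(source_conn.execute(pair.source_sql).fetchall())
    tgt = Counter(target_conn.execute(pair.target_sql).fetchall())
    return {
        "name": pair.name,
        "passed": src == tgt,
        "source_only": sorted((src - tgt).elements()),
        "target_only": sorted((tgt - src).elements()),
    }

# Hypothetical source system and warehouse, as separate databases.
source = sqlite3.connect(":memory:")
source.executescript("""
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    INSERT INTO orders VALUES (1, 10.0), (1, 5.0), (2, 8.0);
""")
target = sqlite3.connect(":memory:")
target.executescript("""
    CREATE TABLE fact_sales (customer_id INTEGER, total REAL);
    INSERT INTO fact_sales VALUES (1, 15.0), (2, 9.0);
""")

pair = QueryPair(
    name="sales totals by customer",
    source_sql="SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id",
    target_sql="SELECT customer_id, total FROM fact_sales",
)
result = run_pair(pair, source, target)
print(result["passed"])       # False
print(result["source_only"])  # [(2, 8.0)]
print(result["target_only"])  # [(2, 9.0)]
```

Exact equality works for these sample values; real monetary or floating-point columns would usually need rounding or a tolerance before comparison.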
QuerySurge…
• automates the testing effort: the kickoff, the tests, the comparison, emailing the results
• speeds up testing: up to 1,000 times faster than manual testing
• schedules test runs: run now, every Tuesday at 11pm, or right after the ETL process
• tests across different platforms: any JDBC-compliant database, data warehouse, data mart, flat file, or XML
• views & shares results: through automated emailing of reports
• verifies more data: up to 100% of all data
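A recurring slot such as "every Tuesday at 11pm" comes down to computing the next matching timestamp. Here is a minimal sketch with Python's `datetime`; the `next_weekly_run` helper is hypothetical and says nothing about how QuerySurge's scheduler works. It uses naive local times and ignores daylight-saving transitions.

```python
from datetime import datetime, timedelta

def next_weekly_run(now, weekday=1, hour=23, minute=0):
    """Hypothetical helper: next weekly slot after `now` (Monday=0, so 1 is Tuesday)."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    candidate += timedelta(days=(weekday - now.weekday()) % 7)
    if candidate <= now:  # this week's slot has already passed
        candidate += timedelta(days=7)
    return candidate

print(next_weekly_run(datetime(2013, 12, 9, 9, 30)))    # 2013-12-10 23:00:00
print(next_weekly_run(datetime(2013, 12, 10, 23, 30)))  # 2013-12-17 23:00:00
```

A run triggered "right after the ETL process" is event-driven rather than time-based, so it would be wired to the ETL job's completion instead of a clock.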
QuerySurge ROI
Value Add
• Increase in data testing coverage from < 10% to up to 100%
• Decrease in testing time by as much as 1,000x over manual tests
• Higher coverage achieved together with shorter testing time
Return on Investment (ROI)
• Redeployment of upwards of 75% of testing head count
• Time savings over manual testing
• Better data due to a shorter, more thorough testing cycle
Case Study: Fortune 50 Firm
ROI of QuerySurge vs. manual testing
• 522 query pairs executed per release
• Manual execution: 21.75 days (174 hours)
• QuerySurge: 0.35 days (2.75 hours)
• 12 major releases per year
• 3-year ROI: $883,000 or 1,660% return
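The time side of the case study can be checked from the figures listed. This sketch covers hours only: the dollar ROI also depends on labor rates and tool costs that the case study does not give, so those are not reproduced. The 8-hour working day is implied by 174 hours / 21.75 days.

```python
# Figures from the case study; an 8-hour day is implied by 174 h / 21.75 days.
query_pairs_per_release = 522
manual_hours = 174.0
automated_hours = 2.75
releases_per_year = 12
years = 3

hours_saved_per_release = manual_hours - automated_hours
hours_saved_total = hours_saved_per_release * releases_per_year * years
speedup = manual_hours / automated_hours
manual_minutes_per_pair = manual_hours * 60 / query_pairs_per_release

print(hours_saved_per_release)   # 171.25
print(hours_saved_total)         # 6165.0
print(round(speedup, 1))         # 63.3
print(manual_minutes_per_pair)   # 20.0
```

For this particular firm, the listed times correspond to roughly a 63x speed-up and about 20 minutes of manual effort per query pair, saving 6,165 hours over three years.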
Conclusion
• Companies are not providing an acceptable level of data quality
• Most firms test < 10% of their data
• Manual testing and sampled comparisons are widely used
• Automation, which can increase coverage and speed up testing, is used
on less than 1/3 of projects
• As data grows exponentially, there is a great risk of decisions being made
based on bad data
• A more complete solution is needed to keep enterprise-level data clean
• QuerySurge provides that complete solution