Ajay Nalabhatla , QA Lead
Srihari Gopisetty , Technology Manager
Wells Fargo India Solutions
Data Warehouse Testing: Best Practices to Improve and Sustain Data Quality – Getting Ready for Serious DevOps
Abstract
In the age of digital disruption, every organization wants to transform its technology arm toward advanced practices such as DevOps to fulfill the continuous demand from the business.
However, all organizations are data driven, and they need to realize that success relies not only on faster throughput and speed, but also on the ability to access diverse, large volumes of complex data in real time to make strategic decisions.
The important question is: does the organization have 'Quality' data?
"On average, U.S. organizations believe 32% of their data is inaccurate." – Gartner
"The average organization loses $8.2 million annually due to poor Data Quality." – Experian
"Less than 0.5% of all data is ever analyzed." – Forrester
Even as many organizations establish Data Warehouse Testing as a specialized service, recent surveys indicate that much more improvement is needed. It is a call to action for organizations to address Data Quality gaps.
Setting the context
Data are of high quality "if they are fit for their intended uses in operations, decision making and planning." – J. M. Juran (Source: dqglossary.com)
Data Quality Dimensions:
- Validity
- Completeness
- Timeliness
- Integrity
- Accuracy
- Consistency

Issue Drivers:
- Unavailability of complete data
- ETL transformation defects
- Delayed batch SLA
- Batch performance
- Obsolete jobs & records

QA Key Causes:
- No exhaustive validation
- Missing defined test strategies
- Lack of tools / accelerators
- Incomplete DB objects validation
- Missing end-to-end QA framework
- Missing standard process

REMEMBER: Poor Data Quality = Use of Less Information for Decision Making
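The dimensions above lend themselves to simple profiling checks. The sketch below is a minimal illustration of the completeness and validity dimensions; the records, field names and email rule are hypothetical examples, not from the deck:

```python
from datetime import date

# Hypothetical staging-table records (illustrative only).
records = [
    {"cust_id": "C001", "email": "a@example.com", "load_date": date(2018, 1, 5)},
    {"cust_id": "C002", "email": None,            "load_date": date(2018, 1, 5)},
    {"cust_id": "C003", "email": "c@example",     "load_date": date(2018, 1, 6)},
]

def completeness(rows, field):
    """Completeness dimension: share of rows where the field is populated."""
    return sum(1 for r in rows if r.get(field)) / len(rows)

def validity(rows, field, rule):
    """Validity dimension: share of rows where the field passes a business rule."""
    return sum(1 for r in rows if rule(r.get(field))) / len(rows)

# A deliberately simple email rule for illustration.
is_email = lambda v: bool(v) and "@" in v and "." in v.split("@")[-1]

print(round(completeness(records, "email"), 2))        # 0.67
print(round(validity(records, "email", is_email), 2))  # 0.33
```

In practice these checks would run as SQL against the staging or warehouse tables; the point is that each dimension reduces to a measurable ratio that can be tracked per load.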
Where, What, Why

[Architecture diagram: heterogeneous internal and external sources (XML, ASCII, EBCDIC, DB extracts) feed ETL into the Staging/ODS layer; a second ETL pass loads the Data Warehouse (tables, views and other DB objects), which serves BI and extracts consumed by downstream applications.]
[Test coverage overlay: Static Testing, ETL Transformation Testing, Staging/ODS Validation, Data Warehouse Validation, Data Quality & Objects Validation, Batch Performance, and BI Testing / Extracts are applied at High/Medium/Low levels across views, tables, reports and applications, on both the OLTP and OLAP sides.]
QA Framework – For High Quality
- Exhaustive validation at every intermediate check point
- Data integrity validation (RI checks etc.)
- Heterogeneous sources validation (XML / ASCII / EBCDIC)
- Database privileges validation at table/view/report level
- Runbook and scheduler/dependency validations
- Database objects validation (partitions, synonyms, flashback etc.)
- Batch performance execution
- BI reports validation (UI, data & performance)
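The first framework item, exhaustive validation at every intermediate check point, typically reduces to reconciling row counts and row-level fingerprints between a source and a target layer. A minimal sketch, with hypothetical rows standing in for real source and target extracts:

```python
import hashlib

def row_fingerprint(row):
    """Order-insensitive fingerprint of one row: hash of sorted key=value pairs."""
    payload = "|".join(f"{k}={row[k]}" for k in sorted(row))
    return hashlib.md5(payload.encode()).hexdigest()

def reconcile(source_rows, target_rows):
    """Compare source vs. target at a check point: counts plus per-row hashes."""
    src = {row_fingerprint(r) for r in source_rows}
    tgt = {row_fingerprint(r) for r in target_rows}
    return {
        "count_match": len(source_rows) == len(target_rows),
        "missing_in_target": len(src - tgt),
        "unexpected_in_target": len(tgt - src),
    }

source = [{"id": 1, "amt": 10.0}, {"id": 2, "amt": 20.0}]
target = [{"id": 1, "amt": 10.0}, {"id": 2, "amt": 25.0}]  # amt drifted during ETL
print(reconcile(source, target))
# {'count_match': True, 'missing_in_target': 1, 'unexpected_in_target': 1}
```

Note that a count match alone hides the drifted amount; the fingerprint comparison is what makes the validation exhaustive rather than superficial.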
Specialized Testing

Database Object Validation – Partition

[Diagram: a database table is day-wise partitioned (Day 1 … Day 31), shown from both the users' view and the DBA's view. The first-time load merges source data into the table; daily loads append day partitions; a monthly merge consolidates them; and partitions older than 13 months are purged (e.g. Jan '17 is purged once Feb '18 loads). The test strategy and test flow cover the first-time load, daily loads, interval & merge, purge (>13 months) and regression.]
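The partition validation described above can be framed as two checks: the set of expected day-wise partitions for a load window, and the set of partitions past the 13-month retention. A sketch under assumed conventions (the P_YYYYMMDD naming is hypothetical, and calendar months are clamped to day 1 for simplicity):

```python
from datetime import date, timedelta

def expected_daily_partitions(start, end):
    """Day-wise partition names (assumed P_YYYYMMDD) expected between two load dates."""
    names, d = [], start
    while d <= end:
        names.append(f"P_{d:%Y%m%d}")
        d += timedelta(days=1)
    return names

def months_back(d, n):
    """Calendar-month subtraction, clamped to the first of the month."""
    total = d.year * 12 + (d.month - 1) - n
    return date(total // 12, total % 12 + 1, 1)

def purge_candidates(partition_dates, as_of, months=13):
    """Partitions older than the retention window (>13 months per the strategy above)."""
    cutoff = months_back(as_of, months)
    return [d for d in partition_dates if d < cutoff]

print(expected_daily_partitions(date(2017, 1, 1), date(2017, 1, 3)))
# ['P_20170101', 'P_20170102', 'P_20170103']
print(purge_candidates([date(2017, 1, 31), date(2017, 2, 1)], date(2018, 3, 1)))
# [datetime.date(2017, 1, 31)]
```

A QA job would compare these expected lists against the actual partition metadata (e.g. the database catalog views) and flag any partition that survived past the purge window.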
Automation Possibility

Parameter                                                    | Home Grown Tools           | External Tools
Static Testing                                               | Limited                    | Limited
Source File – Metadata / Layout / Fields Order               | Yes – Macro / UNIX Shell   | Yes
Exhaustive Validation – Diff Server DBs                      | Yes – Macro / UNIX Shell   | Yes for Both
Batch Job Execution in Sequence                              | Yes – UNIX                 | No
Heterogeneous File Load & Comparison (ASCII / XML / EBCDIC)  | Yes – ASCII / XML only     | Yes
Regression Testing – Tables / Extracts                       | Yes – Macro / UNIX         | Yes
Data Quality Checks                                          | Yes – Macro / UNIX         | Yes
Table Metadata Validation                                    | Yes – Macro / UNIX         | Yes
BI Reports Validation (Data / Graphs)                        | Yes – Only Data            | Yes
Batch Performance Testing                                    | Yes – UNIX                 | Yes
Views Validation                                             | Yes – Macro using ODBC / UNIX | Yes
Partition / Index Validation                                 | No                         | No
Test Cases Batch Execution                                   | Yes – UNIX / Excel         | Yes
Automated Test Execution Scheduler                           | Yes – UNIX                 | Yes

Market tools = automation possibility
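One of the rows above, heterogeneous file load & comparison, is worth illustrating: a homegrown tool typically parses a fixed-width (mainframe-style) extract against a copybook layout and diffs it field by field against a delimited extract. The layout and sample records below are hypothetical:

```python
import csv, io

# Hypothetical copybook-style layout for a fixed-width extract: (field, width).
LAYOUT = [("cust_id", 5), ("amount", 8), ("ccy", 3)]

def parse_fixed_width(line, layout):
    """Slice one fixed-width record into named, stripped fields."""
    row, pos = {}, 0
    for name, width in layout:
        row[name] = line[pos:pos + width].strip()
        pos += width
    return row

def compare_extracts(fixed_lines, csv_text, layout):
    """Field-by-field diff of a fixed-width extract against a CSV extract."""
    fixed = [parse_fixed_width(l, layout) for l in fixed_lines]
    delim = list(csv.DictReader(io.StringIO(csv_text)))
    return [(i, f, d) for i, (f, d) in enumerate(zip(fixed, delim)) if f != d]

fixed = ["C0001  100.50USD", "C0002   99.00EUR"]
csv_text = "cust_id,amount,ccy\nC0001,100.50,USD\nC0002,98.00,EUR\n"
diffs = compare_extracts(fixed, csv_text, LAYOUT)
print(len(diffs))  # 1 mismatched row (the amount in the second record)
```

True EBCDIC support would additionally need a codepage decode (e.g. `bytes.decode("cp037")`) before slicing, which is why the deck notes homegrown tools often cover ASCII/XML only.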
Integrate the ETL Testing in DevOps

[Diagram: the usual architecture for ETL, with Static Testing, ETL/DW Testing and Batch Performance Testing stages integrated into the delivery pipeline.]
DevOps Readiness

Pre-Requisites:
- Thorough knowledge of the line-of-business data flow
- Runbook availability covering predecessors & successors
- Identification of a suitable test approach based on project type
- Availability of test data that represents all needs
- Alternative analysis to avoid table-unusable-state issues
- Ensure table referential integrity is addressed
- Availability of a TDM team to refresh tables to the previous state in case of failures
- Batch performance SLA prediction
- Focus on culture, process and tools
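The runbook pre-requisite above is mechanically checkable: model each job's predecessors as a dependency graph and verify a valid execution order exists (no circular dependencies). A minimal sketch with hypothetical job names:

```python
from graphlib import TopologicalSorter, CycleError  # Python 3.9+

# Hypothetical runbook: job -> set of predecessor jobs that must finish first.
runbook = {
    "load_staging": set(),
    "transform_dw": {"load_staging"},
    "refresh_views": {"transform_dw"},
    "bi_extracts": {"refresh_views"},
}

def validate_runbook(deps):
    """Return a valid batch execution order, or raise if dependencies form a cycle."""
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError as e:
        raise ValueError(f"Runbook has a circular dependency: {e.args[1]}")

print(validate_runbook(runbook))
# ['load_staging', 'transform_dw', 'refresh_views', 'bi_extracts']
```

Running this check in the pipeline before each batch deployment catches broken predecessor/successor definitions early, which is exactly the kind of gate a DevOps-ready QA process needs.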
Benefits
- Improved business confidence in the quality blueprint, helping companies gear up
- Faster iterations, quick feedback and greater collaboration
- Low-priced automation possibilities
- Insights into various database object validations
- Early defect detection
Author Biography

Ajay Nalabhatla works as a Data Management QA Lead at Wells Fargo for a line of business. He has over 11 years of experience in quality assurance for ETL, DWT & BI testing. Over the past decade, he has been involved in various DWT projects for different banking & securities clients and delivered them successfully. He has also conducted due diligence for various DWT clients and suggested many improvements in both process and technical competencies. Ajay holds a Bachelor of Engineering in Electronics & Communication from Anna University.

Srihari Gopisetty manages the Data Management and Digital Advisory teams for a line of business at Wells Fargo India Solutions. He has more than 17 years of experience leading teams in the BFSI domain and on Microsoft products. Prior to Wells Fargo, he worked with Microsoft and First Advantage. Srihari holds a Bachelor of Engineering in Mechanical Engineering from Gulbarga University.
Appendix
Test Approach – Data Migration

Pre-Migration:
- Analyze the DB and identify the objects – tables/indexes/views
- Segregate a wave-wise plan – forklift, consolidation, static and dynamic tables/views
- Take snapshots for post-migration comparison
- Prioritize/create batches & collect pre-run stats

Post-Migration:
- Compare all DB objects between legacy & new
- Validate data for all tables identified for migration between legacy & new
- Validate that new tables' transformation rules are as per specs
- Parallel load comparison for tables between legacy & new
- Execute batch performance testing & compare stats with legacy

Steady State:
- Steady-state validations for monthly loads
- Downstream apps support
- Validate the purging process is as expected for the new system
- Performance monitoring for data loads
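The "snapshot for post comparison" step in the migration approach can be sketched as table-level statistics captured before migration and re-checked afterwards. The tables and columns below are hypothetical placeholders:

```python
def table_snapshot(rows, numeric_cols):
    """Pre-migration snapshot: row count plus sums of key numeric columns."""
    snap = {"row_count": len(rows)}
    for col in numeric_cols:
        snap[f"sum_{col}"] = round(sum(r[col] for r in rows), 2)
    return snap

def compare_snapshots(legacy, new):
    """Post-migration check: report every statistic that drifted between systems."""
    return {k: (legacy[k], new.get(k)) for k in legacy if legacy[k] != new.get(k)}

legacy_rows = [{"id": 1, "bal": 100.0}, {"id": 2, "bal": 250.5}]
new_rows    = [{"id": 1, "bal": 100.0}, {"id": 2, "bal": 250.5}]

pre  = table_snapshot(legacy_rows, ["bal"])
post = table_snapshot(new_rows, ["bal"])
print(compare_snapshots(pre, post))  # {} -> no drift, migration check passes
```

Counts and column sums are a coarse first gate; the full approach above still requires the row-level data validation between legacy and new for the migrated tables.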
Infrastructure Upgrade Test Strategy – ETL Tool / Scheduler Tool Upgrade

Job Types:
- Direct SQL processing
- ETL jobs
- Stored procedure jobs
- File watcher / legacy jobs using C/C++

Test Strategy

Pre-Migration:
- Identify the various types of jobs
- Identify the priority jobs for phase-wise execution
- Build a runbook for the phases with all dependencies
- Collect the batch statistics
- Take snapshots of relevant table data

Post-Migration:
- Execute the identified jobs on the upgraded system
- Validate batch/job performance against the pre-upgrade stats
- Regression testing for tables
- Parallel load processing & data validation
- Validate job dependencies and predecessors

Steady State:
- Batch performance monitoring
- Analyse failures to understand whether they are upgrade-related
- Steady-state support for downstream applications
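The post-migration step of validating job performance against pre-upgrade statistics amounts to flagging any job whose runtime regressed past a tolerance. A sketch with hypothetical job names and runtimes in seconds (the 10% tolerance is an assumption, not from the deck):

```python
def performance_regressions(pre_stats, post_stats, tolerance=0.10):
    """Jobs whose post-upgrade runtime exceeds the pre-upgrade baseline by > tolerance."""
    flagged = {}
    for job, baseline in pre_stats.items():
        current = post_stats.get(job)
        if current is not None and current > baseline * (1 + tolerance):
            flagged[job] = (baseline, current)
    return flagged

# Hypothetical batch statistics collected pre- and post-upgrade (seconds).
pre  = {"etl_daily": 3600, "sp_summary": 900, "file_watcher": 120}
post = {"etl_daily": 3500, "sp_summary": 1200, "file_watcher": 125}

print(performance_regressions(pre, post))
# {'sp_summary': (900, 1200)} -> stored-procedure job regressed after the upgrade
```

Feeding each phase's collected batch statistics through a check like this turns "compare stats with pre-upgrade" from a manual review into a repeatable steady-state monitor.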