Test Automation for data teams with Tosca BI
By Daina Dirmaitė | Nov 13, 2018
Data migration / DWH / BI testing
Data Testing Challenges
1. Data models and data mapping documents in many ways represent project requirements and, as such, are unique to data testing.
2. The environments to be tested are complex and heterogeneous. Multiple programming languages, databases, data sources, data targets, and reporting environments are often all integral parts of the solution.
3. A good understanding of SQL queries, data profiling methods, Excel, and DB editors is essential.
4. Typically, data integrity issues go undetected unless domain experts discover a discrepancy in a report. At this late stage, it's difficult and time-consuming to unravel and remediate the problem.
TEST TYPES:
• Constraint testing;
• Source-to-target count comparisons;
• Source-to-target data validation;
• Transformations of source data and application of business rules;
• Duplicate testing;
• Performance;
• …and so on.
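To make "transformations of source data and application of business rules" concrete, here is a minimal Python sketch that re-applies a business rule to every source row and compares the result with the target row that has the same key. The rule (`net = gross - discount`, country code uppercased), the column names, and the sample rows are all hypothetical, and this is plain Python, not Tosca BI functionality.

```python
from typing import Dict, List

# Hypothetical source and target rows, both keyed by "id".
SOURCE = [
    {"id": 1, "gross": 100.0, "discount": 10.0, "country": "de"},
    {"id": 2, "gross": 50.0, "discount": 0.0, "country": "at"},
]
TARGET = [
    {"id": 1, "net": 90.0, "country": "DE"},
    {"id": 2, "net": 45.0, "country": "AT"},  # deliberately wrong: rule gives 50.0
]


def expected_target(row: Dict) -> Dict:
    """Hypothetical business rule: net = gross - discount, country uppercased."""
    return {
        "id": row["id"],
        "net": row["gross"] - row["discount"],
        "country": row["country"].upper(),
    }


def validate_transformation(source: List[Dict], target: List[Dict]) -> List[str]:
    """Return one message per source row whose target row breaks the rule."""
    target_by_id = {r["id"]: r for r in target}
    errors = []
    for src in source:
        exp = expected_target(src)
        actual = target_by_id.get(exp["id"])
        if actual is None:
            errors.append(f"id={exp['id']}: missing in target")
        elif actual != exp:
            errors.append(f"id={exp['id']}: expected {exp}, got {actual}")
    return errors


for err in validate_transformation(SOURCE, TARGET):
    print(err)
```

Encoding the rule once as a function keeps the expected value independent of the ETL code under test, which is the point of this kind of check.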
BENEFITS OF AUTOMATED DWH / BI TESTING:
• Identify data acquisition (ETL) errors;
• Reduce test creation and maintenance time;
• Expose defects earlier, when they're faster and easier to resolve;
• Improve data quality;
• Eliminate delays from manual testing.
Tricentis Tosca: Test Automation @ DevOps Speed
Tosca is a Continuous Testing platform that accelerates testing to keep pace with Agile and DevOps. With the industry's most innovative functional testing technologies, Tricentis Tosca breaks through the barriers experienced with conventional software testing tools. Using Tricentis Tosca, leading companies such as HBO, Toyota, Allianz, BMW, Starbucks, Deutsche Bank, Lexmark, Orange and UBS achieve 90%+ test automation rates.
Pre-screening Testing
Vital Checks
1. Completeness tests: enable count comparisons between source and target.
2. Uniqueness tests: check the uniqueness constraint defined in the database.
3. Referential integrity tests: check that complete records have been copied and that technical and logical integrity is maintained.
If the database changes (e.g., a table is removed), Tricentis Tosca identifies the impacted test cases for review.
Vital Checks: Table test
Test type | Description
Metadata | Compares predefined table definitions against the current table definition.
Completeness | Tests row counts on file or table level.
Uniqueness | Tests if there are primary key violations in target databases. Use this test if the target data source does not enforce constraints.
Referential Integrity | Tests the primary and foreign key relationship in target databases.
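The four table-level checks can be approximated with plain SQL. The sketch below is an illustration only, not how Tosca BI implements them: it uses Python's built-in `sqlite3` module with an in-memory database, and the tables `src_customer`, `tgt_customer` and `tgt_order` are hypothetical.

```python
import sqlite3

# In-memory stand-ins for a source system and a target DWH (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE src_customer (id INTEGER, name TEXT);
CREATE TABLE tgt_customer (id INTEGER, name TEXT);  -- no PK constraint enforced
CREATE TABLE tgt_order (order_id INTEGER, customer_id INTEGER);
INSERT INTO src_customer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
INSERT INTO tgt_customer VALUES (1, 'Ann'), (2, 'Bob'), (2, 'Bob');
INSERT INTO tgt_order VALUES (10, 1), (11, 4);
""")

# Table and column names are trusted constants here, so f-string SQL is acceptable.


def scalar(sql: str):
    return conn.execute(sql).fetchone()[0]


def metadata_check(table, expected_columns):
    """Metadata: compare a predefined (name, type) list with the current definition."""
    actual = [(row[1], row[2]) for row in conn.execute(f"PRAGMA table_info({table})")]
    return actual == expected_columns


def completeness_check(source, target):
    """Completeness: row counts must match between source and target."""
    return scalar(f"SELECT COUNT(*) FROM {source}") == scalar(f"SELECT COUNT(*) FROM {target}")


def uniqueness_violations(table, key):
    """Uniqueness: key values that occur more than once (primary key violations)."""
    sql = f"SELECT {key}, COUNT(*) FROM {table} GROUP BY {key} HAVING COUNT(*) > 1"
    return conn.execute(sql).fetchall()


def orphan_foreign_keys(child, fk, parent, pk):
    """Referential integrity: child foreign keys with no matching parent row."""
    sql = (f"SELECT c.{fk} FROM {child} c LEFT JOIN {parent} p ON c.{fk} = p.{pk} "
           f"WHERE p.{pk} IS NULL")
    return [row[0] for row in conn.execute(sql)]


print(metadata_check("tgt_customer", [("id", "INTEGER"), ("name", "TEXT")]))  # True
print(completeness_check("src_customer", "tgt_customer"))  # True
print(uniqueness_violations("tgt_customer", "id"))  # [(2, 2)]
print(orphan_foreign_keys("tgt_order", "customer_id", "tgt_customer", "id"))  # [4]
```

Note that the completeness check passes even though customer 3 never arrived, because the duplicated row 2 keeps the counts equal. This is why count comparisons are run together with uniqueness tests rather than on their own.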
Vital Checks: File test
Test type | Description
Has No Empty Values | Verifies that the column has no empty values.
Field Type | Checks if a field has numeric values, depending on your selection from the drop-down menu.
Min Value | Returns the smallest value of the selected column.
Max Value | Returns the largest value of the selected column.
Sum | Checks if the sum of this field matches the specified value. You can use the relational operators equals (=), smaller than (<) or greater than (>) to compare the current sum with the specified value.
Value Range | Checks if the values in this field match one of the specified values. Separate several values with a comma.
Min Length | Checks if the number of characters in this field is greater than the specified value.
Max Length | Checks if the number of characters in this field is smaller than the specified value.
Exact Length | Checks if the number of characters in this field is equal to the specified value.
Is Unique | Checks if the values in this field are unique.
Row Count | Checks if the current row count is equal to the specified value.
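The column-level file checks listed above are simple predicates over one column. Here is a minimal Python sketch of several of them applied to a small comma-separated file, assuming the file fits in memory. The sample data, the column names and the function names are hypothetical; the length comparisons follow the table's wording (strictly greater or smaller).

```python
import csv
import io
import operator

# Hypothetical delimited file; in practice it would be opened from disk.
DATA = """id,country,amount
1,DE,10.5
2,AT,4.5
3,CH,5
"""

rows = list(csv.DictReader(io.StringIO(DATA)))


def column(name):
    return [r[name] for r in rows]


def has_no_empty_values(name):
    return all(v.strip() != "" for v in column(name))


def is_numeric(name):
    """Field Type (numeric): every value must parse as a number."""
    def ok(v):
        try:
            float(v)
            return True
        except ValueError:
            return False
    return all(ok(v) for v in column(name))


def is_unique(name):
    values = column(name)
    return len(values) == len(set(values))


def value_range(name, allowed):
    """Value Range: every value must be one of a comma-separated list."""
    allowed_set = {a.strip() for a in allowed.split(",")}
    return all(v in allowed_set for v in column(name))


def min_length(name, n):
    # As worded in the table: character count strictly greater than n.
    return all(len(v) > n for v in column(name))


def max_length(name, n):
    # As worded in the table: character count strictly smaller than n.
    return all(len(v) < n for v in column(name))


def exact_length(name, n):
    return all(len(v) == n for v in column(name))


OPS = {"=": operator.eq, "<": operator.lt, ">": operator.gt}


def sum_check(name, op, expected):
    """Sum: compare the column total with a value using =, < or >."""
    total = sum(float(v) for v in column(name))
    return OPS[op](total, expected)


def row_count(expected):
    return len(rows) == expected


print(value_range("country", "DE, AT, CH"))  # True
print(value_range("country", "DE, AT"))      # False
print(sum_check("amount", "=", 20.0))        # True
```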
Reconciliation Testing
The main goal of reconciliation testing is to confirm that the source data matches the target data. Tosca BI helps to find mismatches where none are expected. Possible mismatches are:
1. a row in the source dataset is not present in the target dataset;
2. a row in the target dataset is not present in the source dataset;
3. a source row matches a target row by RowKey, but not on all other columns.
Reconciliation tests include a complete row-by-row comparison of two datasets in two separate systems.
These datasets can be one of the following types:
1. database tables;
2. files, including files on a Hadoop system or on a Linux/Unix environment connected via SSH;
3. other sources, such as MS Excel files, if you have installed an appropriate ODBC driver.
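The three mismatch categories map directly onto set operations over the RowKey. Below is a minimal in-memory Python sketch, not Tosca BI's implementation; it assumes the RowKey is unique in each dataset (duplicates should be caught by a uniqueness test first), and the `left`/`right` rows are hypothetical.

```python
from typing import Dict, List

Row = Dict[str, object]


def reconcile(source: List[Row], target: List[Row], row_key: str):
    """Row-by-row comparison of two datasets matched on row_key.

    Returns keys only in the source, keys only in the target, and
    (key, differences) for rows present in both whose other columns differ.
    Assumes row_key is unique within each dataset.
    """
    src = {r[row_key]: r for r in source}
    tgt = {r[row_key]: r for r in target}
    missing_in_target = sorted(src.keys() - tgt.keys())
    missing_in_source = sorted(tgt.keys() - src.keys())
    column_mismatches = []
    for key in sorted(src.keys() & tgt.keys()):
        diffs = {c: (src[key].get(c), tgt[key].get(c))
                 for c in src[key].keys() | tgt[key].keys()
                 if src[key].get(c) != tgt[key].get(c)}
        if diffs:
            column_mismatches.append((key, diffs))
    return missing_in_target, missing_in_source, column_mismatches


# Hypothetical source table Left and target table Right.
left = [{"id": 1, "name": "Ann", "city": "Riga"},
        {"id": 2, "name": "Bob", "city": "Oslo"},
        {"id": 3, "name": "Cid", "city": "Bern"}]
right = [{"id": 1, "name": "Ann", "city": "Riga"},
         {"id": 2, "name": "Bob", "city": "Rome"},
         {"id": 4, "name": "Dan", "city": "Kyiv"}]

only_left, only_right, changed = reconcile(left, right, "id")
print(only_left)   # [3]
print(only_right)  # [4]
print(changed)     # [(2, {'city': ('Oslo', 'Rome')})]
```

For datasets too large for memory, the same logic is usually pushed into the database as a full outer join on the key, but the classification of results stays the same.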
Example: you want to match source table Left against target table Right.
[Figure: source table Left, target table Right, and the result of the row-by-row comparison]
BI Report Testing
1. Report testing verifies report creation and content from the end-user perspective.
2. Tests can also check access restrictions and report generation performance.
3. Report tests might involve a combination of UI and API tests, depending on how the reports are accessed.
4. For example, a test might open a Cognos report in a web browser, retrieve a value from a table in the report, and then compare it with a result retrieved from a database query.
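The report-versus-database comparison in point 4 can be sketched with the Python standard library. In a real test the HTML would come from a browser automation tool; here a static, hypothetical HTML fragment stands in for the rendered report (it is not actual Cognos output), and the `sales` table is also hypothetical.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical HTML fragment standing in for a rendered report page.
REPORT_HTML = """
<table id="revenue">
  <tr><th>Region</th><th>Revenue</th></tr>
  <tr><td>North</td><td>1200</td></tr>
  <tr><td>South</td><td>800</td></tr>
</table>
"""


class TableCells(HTMLParser):
    """Collect the text of every <td>/<th> cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows, self._row, self._in_cell = [], [], False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr":
            self.rows.append(self._row)

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


def report_value(html, region):
    """Value shown in the report for one region (header row skipped)."""
    parser = TableCells()
    parser.feed(html)
    parser.close()
    lookup = {r[0]: r[1] for r in parser.rows[1:]}
    return float(lookup[region])


# Hypothetical DWH table holding the figures the report should show.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE sales (region TEXT, amount REAL);
INSERT INTO sales VALUES ('North', 700), ('North', 500), ('South', 800);
""")


def db_value(region):
    """Expected value, computed independently with a database query."""
    sql = "SELECT SUM(amount) FROM sales WHERE region = ?"
    return db.execute(sql, (region,)).fetchone()[0]


for region in ("North", "South"):
    ok = report_value(REPORT_HTML, region) == db_value(region)
    print(region, "OK" if ok else "MISMATCH")
```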
Data Profiling
1. Profiling tests validate data for logical consistency and correctness from a business perspective.
2. For example, you could automatically check that insurance contracts can only be cancelled if all outstanding invoices have been paid. Or you could validate whether a certain business process completes within a specified period of time.
3. The profiling functionality can also be used to monitor how many data values of a certain type exist at any given point; it can alert you to "out of range" values and use the results to create a trend profile over time.
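The insurance example in point 2 can be written as a query that returns only the exceptions to the business rule. This is a plain-SQL sketch using Python's `sqlite3`, not Tosca BI's profiling feature; the `contract` and `invoice` tables and their columns are hypothetical. The last two queries illustrate the monitoring idea in point 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contract (contract_id INTEGER PRIMARY KEY, status TEXT);
CREATE TABLE invoice (invoice_id INTEGER PRIMARY KEY, contract_id INTEGER, paid INTEGER);
INSERT INTO contract VALUES (1, 'CANCELLED'), (2, 'CANCELLED'), (3, 'ACTIVE');
INSERT INTO invoice VALUES (100, 1, 1), (101, 2, 1), (102, 2, 0), (103, 3, 0);
""")

# Rule: a contract may only be cancelled if all of its invoices are paid.
# The query returns the exceptions: cancelled contracts with open invoices.
RULE_VIOLATIONS = """
SELECT c.contract_id, COUNT(*) AS open_invoices
FROM contract c
JOIN invoice i ON i.contract_id = c.contract_id
WHERE c.status = 'CANCELLED' AND i.paid = 0
GROUP BY c.contract_id
"""
violations = conn.execute(RULE_VIOLATIONS).fetchall()
print(violations)  # [(2, 1)]

# Monitoring snapshot: how many open invoices exist right now, and how many
# rows carry an out-of-range "paid" flag (anything other than 0 or 1).
open_count = conn.execute("SELECT COUNT(*) FROM invoice WHERE paid = 0").fetchone()[0]
out_of_range = conn.execute(
    "SELECT COUNT(*) FROM invoice WHERE paid NOT IN (0, 1)").fetchone()[0]
print(open_count, out_of_range)  # 2 0
```

Storing each snapshot with a timestamp would give the trend profile over time mentioned above.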
Dependency testing focuses on the relationships between data, both within tables and, even more importantly, across wider ranges of the database by joining tables. For instance, if we believe that a combination of two field values [x, y] should always map to specific values in three other fields [a, b, c], we are above all interested in any exceptions.
Dependency testing can be used as a powerful early alert system for both data-quality and data-processing issues.
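A dependency of this kind holds when every distinct [x, y] combination carries exactly one [a, b, c] combination. The sketch below, in plain Python with hypothetical rows, returns only the exceptions, which is what the paragraph above says we are interested in.

```python
from collections import defaultdict


def dependency_exceptions(rows, determinant, dependent):
    """Return determinant values that map to more than one dependent combination.

    If (x, y) is expected to determine (a, b, c), every group of rows sharing
    the same (x, y) must carry exactly one distinct (a, b, c).
    """
    seen = defaultdict(set)
    for row in rows:
        key = tuple(row[c] for c in determinant)
        seen[key].add(tuple(row[c] for c in dependent))
    return {k: v for k, v in seen.items() if len(v) > 1}


# Hypothetical rows; the last one breaks the expected dependency.
rows = [
    {"x": 1, "y": "A", "a": "k1", "b": "EUR", "c": 10},
    {"x": 1, "y": "A", "a": "k1", "b": "EUR", "c": 10},
    {"x": 1, "y": "B", "a": "k2", "b": "EUR", "c": 20},
    {"x": 2, "y": "B", "a": "k3", "b": "CHF", "c": 30},
    {"x": 2, "y": "B", "a": "k3", "b": "EUR", "c": 30},
]

for key, combos in dependency_exceptions(rows, ["x", "y"], ["a", "b", "c"]).items():
    print(key, sorted(combos))
```

Across joined tables, the same check is typically written in SQL as `GROUP BY x, y HAVING COUNT(DISTINCT ...) > 1` over the join result.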
Key and join testing is a vital step in any DWH/BI testing project. Whilst the uniqueness of keys has already been covered in column profiling, joins between primary and foreign keys now need to be checked and invalid references need to be detected.
As with the checks above, these can also be conducted via plain SQL statements. Tosca's predefined framework also contains a toolset for join testing.
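One plain-SQL way to test a primary/foreign key join is to count, for every fact row, how many dimension rows it joins to: 0 means an invalid (orphaned or NULL) reference, and more than 1 means the join fans out because of duplicate keys. A sketch using Python's `sqlite3`, with hypothetical `fact_sales` and `dim_product` tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER, name TEXT);
CREATE TABLE fact_sales (sale_id INTEGER, product_id INTEGER, amount REAL);
INSERT INTO dim_product VALUES (1, 'Pen'), (2, 'Ink'), (2, 'Ink refill');
INSERT INTO fact_sales VALUES (10, 1, 5.0), (11, 2, 3.0), (12, 9, 1.0), (13, NULL, 2.0);
""")

# Every fact row should join to exactly one dimension row.
# COUNT(d.product_id) ignores the NULLs produced by the LEFT JOIN,
# so unmatched fact rows get 0 and duplicated dimension keys give > 1.
JOIN_CARDINALITY = """
SELECT f.sale_id, f.product_id, COUNT(d.product_id) AS matches
FROM fact_sales f
LEFT JOIN dim_product d ON d.product_id = f.product_id
GROUP BY f.sale_id, f.product_id
HAVING COUNT(d.product_id) <> 1
ORDER BY f.sale_id
"""
for sale_id, product_id, matches in conn.execute(JOIN_CARDINALITY):
    print(sale_id, product_id, matches)
# 11 2 2     -> fan-out: product 2 exists twice in the dimension
# 12 9 0     -> orphan: product 9 does not exist
# 13 None 0  -> NULL foreign key
```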
After basic checks, such as comparing elementary table-level key performance indicators (KPIs such as the number of records and sums/means/standard deviations of numeric fields) against expected values, column profiling provides the first cut at understanding the data in the data warehouse:
• A meaningful subset of the database tables' attributes (columns) is examined.
• Basic features to be checked are percentage population, uniqueness (distinctness), value ranges, and field lengths.
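The four basic column-profiling features can be computed in a few lines. This plain-Python sketch treats `None` and empty strings as unpopulated and assumes the populated values are mutually comparable (all strings or all numbers). The function name and the sample ZIP codes are hypothetical.

```python
def profile_column(values):
    """Basic column profile: population, distinctness, value range, field lengths."""
    populated = [v for v in values if v not in (None, "")]
    n = len(values)
    return {
        "population_pct": 100.0 * len(populated) / n if n else 0.0,
        "distinct_pct": 100.0 * len(set(populated)) / len(populated) if populated else 0.0,
        "min": min(populated) if populated else None,
        "max": max(populated) if populated else None,
        "min_len": min(len(str(v)) for v in populated) if populated else None,
        "max_len": max(len(str(v)) for v in populated) if populated else None,
    }


zip_codes = ["10115", "80331", None, "10115", "", "50667"]
print(profile_column(zip_codes))
# 4 of 6 values populated (about 66.7%), 3 of those 4 distinct (75.0%),
# min '10115', max '80331', every populated value 5 characters long
```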
Data Profiling: Column Profiling, Key and Join Testing, Dependency Testing
Supported Technologies
File Support
1. XML, JSON;
2. Excel;
3. Fixed-width and comma-separated files.
Sample Operations
1. File to File / DB;
2. DB to File / DB;
3. WebHDFS to DB / File;
4. DB / File to WebHDFS.
Supported Databases
1. All ODBC databases, e.g. Oracle, DB2, Teradata, MS SQL, Hive, HBase;
2. Hadoop through WebHDFS.
Test Examples
1. Straight data move;
2. Report validation;
3. Data transformations.
1. Data in its final state is the driving force behind organizational decision making;
2. Raw data is often changed and processed to reach a usable format for BI reports; data integrity practices ensure that this DWH/BI information is attributable and accurate;
3. Data can easily become compromised if proper measures are not taken to verify it as it moves from each environment to become available to DWH/BI projects;
4. Errors with data integrity commonly arise through human error, noncompliant operating procedures, data transfers, software defects, and compromised hardware.
OTHER DWH / BI TESTING TOOLS:
1. DbFit: open-source database testing tool;
2. iCEDQ, QuerySurge, Zuzena: test automation tools designed specifically for data warehousing and related projects;
3. Informatica Data Validation: accelerates and automates Informatica ETL testing in both production and development/test environments;
4. Analytix Data Services, WhereScape, TimeXtender: DW automation tools that include test automation capabilities.
DATA WAREHOUSING PROJECTS CAN FAIL FOR MANY REASONS:
1. poor data architecture;
2. inconsistently defined data;
3. inability to relate data from different data sources;
4. missing and inaccurate data values;
5. inconsistent use of data fields;
6. unacceptable query performance, and so forth.
Key Takeaways
What to expect from Tosca BI?
1. Automated data quality testing;
2. Automated testing of the entire DWH/BI processing;
3. Performance boost in test execution;
4. Business-based test case definition without hyper-complex SQL;
5. Reduced test case maintenance effort;
6. An accelerated path (3 to 6 months) to test sets with high coverage of business risk (> 90%).