Test Automation for data teams with Tosca BI

By Daina Dirmaitė | Nov 13, 2018

Data migration / DWH / BI testing

Data Testing Challenges

1. Data models and data mapping documents in many ways represent project requirements and, as such, are unique to data testing.

2. The environments to be tested are complex and heterogeneous. Multiple programming languages, databases, data sources, data targets, and reporting environments are often all integral parts of the solution.

3. A good understanding of SQL queries, data profiling methods, Excel, and DB editors is essential.

4. Typically, data integrity issues go undetected unless domain experts discover a discrepancy in a report. At this late stage, it is difficult and time-consuming to unravel and remediate the problem.

TEST TYPES:

• Constraint testing;
• Source-to-target count comparisons;
• Source-to-target data validation;
• Transformations of source data and application of business rules;
• Duplicate testing;
• Performance;
• …and so on.

BENEFITS OF AUTOMATED DWH / BI TESTING:

• Identify data acquisition (ETL) errors;
• Reduce test creation and maintenance time;
• Expose defects earlier, when they are faster and easier to resolve;
• Improve data quality;
• Eliminate delays from manual testing.

Tricentis Tosca

Test Automation @ DevOps Speed

Tosca is a Continuous Testing platform that accelerates testing to keep pace with Agile and DevOps. With the industry's most innovative functional testing technologies, Tricentis Tosca breaks through the barriers experienced with conventional software testing tools. Using Tricentis Tosca, leading companies such as HBO, Toyota, Allianz, BMW, Starbucks, Deutsche Bank, Lexmark, Orange and UBS achieve 90%+ test automation rates.

Testing Challenges

Effective Test Planning

Automated E2E Testing with Tosca BI


Pre-screening Testing

Vital Checks

1. Completeness tests: Enable count comparisons between source and target.

2. Uniqueness tests: Check for the uniqueness constraint defined in the database.

3. Referential integrity tests: Check that complete records have been copied and that technical and logical integrity is maintained.

If the database changes (e.g., a table is removed), Tricentis Tosca identifies the impacted test cases for review.

Vital Checks: Table test

Test type | Description
--- | ---
Metadata | Compares predefined table definitions against the current table definition.
Completeness | Tests row counts on file or table level.
Uniqueness | Tests if there are primary key violations in target databases. Use this test if the target data source does not enforce constraints.
Referential Integrity | Tests the primary and foreign key relationship in target databases.
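The table-level vital checks can be approximated with plain SQL. The following is a minimal sketch, not Tosca BI's own implementation: it uses Python's built-in sqlite3 module, and the table and column names (src_orders, tgt_orders, tgt_customers, order_id, customer_id) are hypothetical demo names.

```python
import sqlite3


def vital_checks(conn):
    """Run completeness, uniqueness and referential integrity checks."""
    cur = conn.cursor()
    # Completeness: compare row counts between source and target.
    source_rows = cur.execute("SELECT COUNT(*) FROM src_orders").fetchone()[0]
    target_rows = cur.execute("SELECT COUNT(*) FROM tgt_orders").fetchone()[0]
    # Uniqueness: key values that occur more than once in the target.
    duplicate_keys = [row[0] for row in cur.execute(
        "SELECT order_id FROM tgt_orders "
        "GROUP BY order_id HAVING COUNT(*) > 1 ORDER BY order_id")]
    # Referential integrity: target rows whose foreign key has no parent row.
    orphans = [row[0] for row in cur.execute(
        "SELECT DISTINCT o.order_id FROM tgt_orders o "
        "LEFT JOIN tgt_customers c ON c.customer_id = o.customer_id "
        "WHERE c.customer_id IS NULL ORDER BY o.order_id")]
    return {
        "source_rows": source_rows,
        "target_rows": target_rows,
        "duplicate_keys": duplicate_keys,
        "orphans": orphans,
    }


def build_demo():
    """Create an in-memory database with deliberately flawed target data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE src_orders (order_id INTEGER, customer_id INTEGER);
        CREATE TABLE tgt_orders (order_id INTEGER, customer_id INTEGER);
        CREATE TABLE tgt_customers (customer_id INTEGER);
        INSERT INTO src_orders VALUES (1, 10), (2, 11), (3, 12);
        INSERT INTO tgt_orders VALUES (1, 10), (2, 11), (2, 11), (3, 99);
        INSERT INTO tgt_customers VALUES (10), (11), (12);
    """)
    return conn


result = vital_checks(build_demo())
print(result)
```

In the demo data the target holds one duplicated order and one order pointing at a customer that does not exist, so all three checks report a finding.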

Vital Checks: File test

Test type | Description
--- | ---
Has No Empty Values | Verifies that the column has no empty values.
Field Type | Checks if a field has Numeric values, depending on your selection from the drop-down menu.
Min Value | Returns the smallest value of the selected column.
Max Value | Returns the largest value of the selected column.
Sum | Checks if the sum of this field matches the specified value. You can use the relational operators equals =, smaller than < or greater than > to compare the current sum with the specified value.
Value Range | Checks if the values in this field match one of the specified values. Separate several values with a comma.
Min Length | Checks if the amount of characters in this field is greater than the specified value.
Max Length | Checks if the amount of characters in this field is smaller than the specified value.
Exact Length | Checks if the amount of characters in this field is equal to the specified value.
Is Unique | Checks if values in this field are unique.
Row Count | Checks if the current row count is equal to the specified value.
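To make the file-level checks concrete, here is a small sketch of a few of them applied to a comma-separated file with Python's standard csv module. The function names and the sample data are assumptions made for illustration; they are not Tosca BI identifiers.

```python
import csv
import io


def load_rows(text):
    """Parse comma-separated text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def has_no_empty_values(rows, col):
    """True if no row has an empty (or whitespace-only) value in `col`."""
    return all(row[col].strip() != "" for row in rows)


def is_unique(rows, col):
    """True if every value in `col` occurs exactly once."""
    values = [row[col] for row in rows]
    return len(values) == len(set(values))


def max_length_ok(rows, col, limit):
    """True if every value in `col` has fewer than `limit` characters."""
    return all(len(row[col]) < limit for row in rows)


def row_count_is(rows, expected):
    """True if the number of data rows equals `expected`."""
    return len(rows) == expected


def column_sum(rows, col):
    """Sum of `col`, assuming every value parses as a number."""
    return sum(float(row[col]) for row in rows)


# Hypothetical sample file: the third row has an empty name.
SAMPLE = "id,name,amount\n1,Anna,10.5\n2,Ben,4.5\n3,,5.0\n"
rows = load_rows(SAMPLE)

print(has_no_empty_values(rows, "name"))  # False
print(is_unique(rows, "id"))              # True
print(max_length_ok(rows, "name", 5))     # True
print(row_count_is(rows, 3))              # True
print(column_sum(rows, "amount") == 20.0) # True
```

Each function returns a plain boolean or number, so a test case can compare it with the expected value in the same way the table describes for Sum and Row Count.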

Reconciliation Testing

The main goal of reconciliation testing is to confirm that the source data matches the target data. Tosca BI then helps to find mismatches where they are not expected. Possible mismatches are:

1. a row in the source dataset is not present in the target dataset;
2. a row in the target dataset is not present in the source dataset;
3. a source row matches a target row by RowKey, but not on all other columns.

Reconciliation tests include a complete row-by-row comparison of two datasets in two separate systems.

These datasets can be one of the following types:

1. database tables;
2. files, including files on a Hadoop system or on a Linux/Unix environment connected via SSH;
3. other sources, such as MS Excel files, if you have installed an appropriate ODBC driver.
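The three mismatch classes listed above can be illustrated with a short row-by-row comparison. This is only a sketch under simplifying assumptions: both datasets fit in memory as lists of dicts, the row key is unique within each dataset, and the `reconcile` function and the sample rows are hypothetical.

```python
def reconcile(source, target, key):
    """Compare two datasets row by row, matching rows on `key`.

    Returns (keys only in source, keys only in target,
    {key: [columns that differ]} for rows present on both sides).
    """
    src = {row[key]: row for row in source}
    tgt = {row[key]: row for row in target}

    only_in_source = sorted(src.keys() - tgt.keys())
    only_in_target = sorted(tgt.keys() - src.keys())

    mismatched = {}
    for k in sorted(src.keys() & tgt.keys()):
        columns = src[k].keys() | tgt[k].keys()
        differing = sorted(c for c in columns if src[k].get(c) != tgt[k].get(c))
        if differing:
            mismatched[k] = differing
    return only_in_source, only_in_target, mismatched


left = [
    {"id": 1, "name": "A", "city": "Vilnius"},
    {"id": 2, "name": "B", "city": "Kaunas"},
    {"id": 3, "name": "C", "city": "Klaipeda"},
]
right = [
    {"id": 1, "name": "A", "city": "Vilnius"},
    {"id": 2, "name": "B", "city": "Riga"},
    {"id": 4, "name": "D", "city": "Tallinn"},
]

only_left, only_right, diffs = reconcile(left, right, "id")
print(only_left)   # [3]
print(only_right)  # [4]
print(diffs)       # {2: ['city']}
```

For datasets that live in two separate systems and do not fit in memory, the same classification is usually done on sorted streams or hashed rows; the dictionary lookup here only keeps the example short.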

Reconciliation Testing

You want to match source table Left and target table Right:

The row by row comparison delivers the following result:

BI Report Testing

1. Report testing verifies report creation and content from the end-user perspective.

2. Tests can also check access restrictions and report generation performance.

3. Report tests might involve a combination of UI and API tests, depending on how the reports are accessed.

4. For example, a test might open a Cognos report in a web browser, retrieve a value from a table in the report, and then compare it with a result retrieved from a database query.

Data Profiling

1. Profiling tests validate data for logical consistency and correctness from a business perspective.

2. For example, one could automatically check that insurance contracts can only be cancelled if all outstanding invoices have been paid. Or, you could validate whether a certain business process completes within a specified period of time.

3. The profiling functionality can also be used to monitor how many data values of a certain type exist at any given point; it can alert you to "out of range" values as well as use results to create a trend profile over time.
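The insurance example can be expressed as a query that returns every record breaking the business rule. The sketch below uses Python's built-in sqlite3 module; the schema (contracts, invoices, status, paid) is invented for the illustration and does not come from the original slides.

```python
import sqlite3

# Business rule: a contract may only be cancelled once all its invoices are paid.
# The query returns the contracts that break the rule.
RULE_SQL = """
SELECT DISTINCT c.contract_id
FROM contracts c
JOIN invoices i ON i.contract_id = c.contract_id
WHERE c.status = 'cancelled' AND i.paid = 0
ORDER BY c.contract_id
"""


def find_rule_violations(conn):
    """Return the ids of cancelled contracts that still have unpaid invoices."""
    return [row[0] for row in conn.execute(RULE_SQL)]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contracts (contract_id INTEGER PRIMARY KEY, status TEXT);
    CREATE TABLE invoices (
        invoice_id INTEGER PRIMARY KEY,
        contract_id INTEGER,
        paid INTEGER
    );
    INSERT INTO contracts VALUES (1, 'active'), (2, 'cancelled'), (3, 'cancelled');
    INSERT INTO invoices VALUES (100, 1, 0), (101, 2, 1), (102, 3, 1), (103, 3, 0);
""")

violations = find_rule_violations(conn)
print(violations)  # [3]
```

A profiling test would then assert that the list of violations is empty, or track its length over time to build a trend profile.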

Column Profiling

After basic checks, such as comparing elementary table-related key performance indicators (KPIs such as number of records, sums/means/standard deviation of numeric fields) against expected values, column profiling provides the first cut on understanding the data in the data warehouse:

• A meaningful extract of the database tables' attributes (columns) is examined;
• Basic features to be checked are percentage population, uniqueness (distinction), value ranges and field lengths.

Key and Join Testing

Key and join testing is a vital step in any DWH/BI testing project. Whilst the uniqueness of keys has already been covered in column profiling, joins between primary and foreign keys now need to be checked and invalid references need to be detected.

In line with the arguments above, these checks can also be conducted via plain SQL statements. Tosca's predefined framework also contains a tool-set for join testing.

Dependency Testing

Dependency testing focuses on the relationships of data, both within tables and, even more importantly, across wider ranges of the database by joining tables. For instance, if we believe that a combination of two field values [x, y] should always match specific values in three other fields [a, b, c], we are above all interested in any exceptions.

Dependency testing can be used as a powerful early alert system for both data quality and data processing issues.
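As a rough illustration of column profiling and dependency testing, the sketch below computes the basic column features named above (percentage population, uniqueness, value range, field lengths) and lists rows that break an expected field dependency. The function names, the sample rows and the country-to-currency rule are assumptions made for the example, not part of Tosca BI.

```python
def profile_column(rows, col):
    """Basic column profile: population, uniqueness, value range, lengths."""
    values = [row.get(col) for row in rows]
    filled = [v for v in values if v not in (None, "")]
    lengths = [len(str(v)) for v in filled]
    return {
        "population_pct": 100.0 * len(filled) / len(values) if values else 0.0,
        "distinct": len(set(filled)),
        "is_unique": len(set(filled)) == len(filled),
        "min": min(filled) if filled else None,
        "max": max(filled) if filled else None,
        "min_length": min(lengths) if lengths else None,
        "max_length": max(lengths) if lengths else None,
    }


def dependency_exceptions(rows, determinant, dependent, expected):
    """Rows whose `determinant` values do not map to the expected `dependent` values.

    `expected` maps a tuple of determinant values to a tuple of dependent values.
    Rows whose determinant values are not listed in `expected` are ignored.
    """
    exceptions = []
    for row in rows:
        lhs = tuple(row[c] for c in determinant)
        rhs = tuple(row[c] for c in dependent)
        if lhs in expected and expected[lhs] != rhs:
            exceptions.append(row)
    return exceptions


rows = [
    {"country": "LT", "currency": "EUR", "code": "A1"},
    {"country": "LT", "currency": "EUR", "code": "A2"},
    {"country": "US", "currency": "EUR", "code": "A3"},
    {"country": "US", "currency": "USD", "code": ""},
]

profile = profile_column(rows, "code")
print(profile["population_pct"], profile["is_unique"])  # 75.0 True

rule = {("LT",): ("EUR",), ("US",): ("USD",)}
bad = dependency_exceptions(rows, ["country"], ["currency"], rule)
print([r["code"] for r in bad])  # ['A3']
```

The determinant and dependent arguments are lists, so the same function covers the [x, y] to [a, b, c] case from the text by passing two and three column names respectively.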

Supported Technologies

File Support

1. XML, JSON;
2. Excel;
3. Fixed & Comma Separated.

Sample Operations

1. File to File / DB;
2. DB to File / DB;
3. WebHDFS to DB / File;
4. DB / File to WebHDFS.

Supported Databases

1. All ODBC databases, e.g. Oracle, DB2, Teradata, MS SQL, Hive, HBase;
2. Hadoop through WebHDFS.

Test Examples

1. Straight data move;
2. Report validation;
3. Data transformations.

1. Data in its final state is the driving force behind organizational decision making;

2. Raw data is often changed and processed to reach a usable format for BI reports; data integrity practices ensure that this DWH/BI information is attributable and accurate;

3. Data can easily become compromised if proper measures are not taken to verify it as it moves from each environment to become available to DWH/BI projects;

4. Errors with data integrity commonly arise through human errors, noncompliant operating procedures, data transfers, software defects, and compromised hardware.

OTHER DWH / BI TESTING TOOLS:

1. DbFit: Open source database testing tool;

2. iCEDQ, QuerySurge, Zuzena: Test automation tools designed specifically for Data Warehousing and related projects;

3. Informatica Data Validation: Accelerate and automate Informatica ETL testing in both production environments and dev/test;

4. Analytix Data Services, WhereScape, TimeXtender: DW automation tools that include test automation capabilities.

DATA WAREHOUSING PROJECTS CAN FAIL FOR MANY REASONS:

1. poor data architecture;
2. inconsistently defined data;
3. inability to relate data from different data sources;
4. missing and inaccurate data values;
5. inconsistent use of data fields;
6. unacceptable query performance and so forth.

Key Takeaways

What to expect from Tosca BI?

1. Automated data quality testing;
2. Automated testing of the entire DWH/BI processing;
3. Performance boost in test execution;
4. Business-based test case definition with no hyper-complex SQL;
5. Reduced test case maintenance effort;
6. Accelerated path (3–6 months) to achieve test sets with high coverage of business risk (> 90%).