DEALING WITH LARGE DATA SET AND COMPLEXITY IN YOUR TESTING

Jae-Jin Lee

Search results (Facts)

Google/Bing
- The number of possible inputs is close to infinite
- There is a huge number of data sources
- Algorithms (placement) are very complex

Expedia
- Possible inputs are not as numerous as Google's, but the same input can return different results based on dates, traveler info, and other factors
- There is a huge number of data sources (world, inventories)
- Algorithms are very complex and have a direct impact on the business

Testing Challenges

Test input selection
- Data is not organized in a way that makes it easy to test
- Equivalence partitioning is hard
- Randomness? Coverage?

Verifying mechanism
- How do we get the expected result?
- Re-implement the algorithm?
- 474,000,000 results for a "Seattle" search

Good news

- Algorithms are complex but defined
- We have full access to the data source
- Historical data/statistics are available
- Not all results are equally important

Risk analysis / assessment



- Question the project (Is it feasible to do it?)
- Practical risk analysis
  - Understand the risk from the business perspective
  - Understand the likelihood of faults from the development perspective
  - Test cases
  - Validations to be done
- Summarize it and have the entire project team review it
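
One way to turn the two perspectives above into a reviewable list is a simple impact times likelihood score. This is only a sketch: the 1 to 5 scales, the `RiskItem` type, and the example areas and scores are hypothetical and would come from the project team in practice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskItem:
    area: str        # hypothetical feature or scenario name
    impact: int      # business impact, 1 (low) to 5 (high)
    likelihood: int  # likelihood of faults per development, 1 to 5

    @property
    def score(self) -> int:
        # Simple product; a team may prefer a different weighting.
        return self.impact * self.likelihood


def prioritize(items):
    """Return items ordered from highest to lowest risk score."""
    return sorted(items, key=lambda item: item.score, reverse=True)


# Illustrative values only, not real project data.
items = [
    RiskItem("sort by price", impact=5, likelihood=3),
    RiskItem("misspelled destination", impact=2, likelihood=4),
    RiskItem("date-dependent availability", impact=5, likelihood=4),
]
ranked = prioritize(items)
for item in ranked:
    print(item.score, item.area)
```

The ranked list is what goes to the team review; the scores are a conversation starter, not a measurement.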

Understand data source

- The data source is the trusted basis for test case validation / verification
- Modifying the data source should be a piece of cake
  - Insert, delete, update rows, or execute stored procedures (sprocs)
  - Setup and teardown
- Make no assumptions about the data source
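
A minimal sketch of setup and teardown around a controlled data source, assuming an in-memory SQLite database stands in for the real one. The `hotels` table, its columns, and `cheapest_in_city` (a stand-in for the system under test) are hypothetical.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def seeded_inventory(rows):
    """Set up a throwaway data source with known rows, then tear it down."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hotels (name TEXT, city TEXT, price REAL)")
    conn.executemany("INSERT INTO hotels VALUES (?, ?, ?)", rows)
    conn.commit()
    try:
        yield conn
    finally:
        conn.close()  # teardown: the in-memory database is discarded


def cheapest_in_city(conn, city):
    # Hypothetical stand-in for the code under test.
    row = conn.execute(
        "SELECT name FROM hotels WHERE city = ? ORDER BY price LIMIT 1",
        (city,),
    ).fetchone()
    return row[0] if row else None


rows = [("A", "Seattle", 120.0), ("B", "Seattle", 95.0), ("C", "Portland", 80.0)]
with seeded_inventory(rows) as conn:
    result = cheapest_in_city(conn, "Seattle")
    missing = cheapest_in_city(conn, "Boise")
```

Because the test inserts every row itself, the expected result follows from the test data and not from an assumption about what the data source happens to contain.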

Test input selection

- Historical data and statistics
- Priority from risk analysis
- Creativity and product knowledge to break the product
- Random valid inputs from bucketing (do as much as you can and log the useful details)
- Hard-coded data
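
Random selection from buckets can be sketched as below, assuming the valid inputs have already been grouped. The bucket names and values are hypothetical. The seed and every pick are logged so that a failing run can be reproduced.

```python
import logging
import random

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("input_selection")

# Hypothetical buckets of valid inputs.
BUCKETS = {
    "major_city": ["Seattle", "Tokyo", "Paris"],
    "small_town": ["Forks", "Leavenworth"],
    "airport_code": ["SEA", "NRT", "CDG"],
}


def pick_inputs(buckets, per_bucket, seed):
    """Pick up to per_bucket random inputs from each bucket, logging each pick."""
    rng = random.Random(seed)  # private generator, so the seed fully determines the picks
    log.info("seed=%d per_bucket=%d", seed, per_bucket)
    picked = []
    for name, values in sorted(buckets.items()):
        for value in rng.sample(values, min(per_bucket, len(values))):
            log.info("bucket=%s input=%s", name, value)
            picked.append((name, value))
    return picked


inputs = pick_inputs(BUCKETS, per_bucket=2, seed=42)
```

Rerunning with the logged seed yields the same inputs, which keeps randomness from making failures impossible to investigate.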

Decompose the algorithm

- Exercise each piece of logic separately by controlling the data source and dependencies
- Work with dev on testability or hooks (architecture, logs, etc.)
- If possible, implement the algorithms for the happy path in your test automation
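
The idea of exercising each step on its own can be sketched as follows, assuming the search is composed of a filter step and a ranking step and that the data source can be injected. All names here (`FakeInventory`, `filter_available`, `rank_by_price`, `search`) are hypothetical.

```python
class FakeInventory:
    """Test double for the data source; returns exactly the rows it is given."""

    def __init__(self, rows):
        self._rows = rows

    def hotels_in(self, city):
        return [r for r in self._rows if r["city"] == city]


def filter_available(hotels):
    # Step 1: drop hotels with no rooms left.
    return [h for h in hotels if h["rooms"] > 0]


def rank_by_price(hotels):
    # Step 2: cheapest first.
    return sorted(hotels, key=lambda h: h["price"])


def search(inventory, city):
    # The full algorithm is the composition of the steps above.
    return rank_by_price(filter_available(inventory.hotels_in(city)))


rows = [
    {"name": "A", "city": "Seattle", "price": 120, "rooms": 2},
    {"name": "B", "city": "Seattle", "price": 95, "rooms": 0},
    {"name": "C", "city": "Seattle", "price": 80, "rooms": 1},
    {"name": "D", "city": "Portland", "price": 60, "rooms": 5},
]

# Each step is checked in isolation, then the composition on controlled data.
available = [h["name"] for h in filter_available(rows)]
ranked = [h["name"] for h in rank_by_price(rows)]
found = [h["name"] for h in search(FakeInventory(rows), "Seattle")]
```

When a composed search fails, the per-step checks show which piece of logic is at fault.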

Heuristic approach helps

- Is there a place where a good-enough result is acceptable?
- "Seatttle" (three ‘t’s)
  - Is the intended result in the list?
  - Is it in the first 10 results?
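
A heuristic check of this kind can be sketched as below. It does not verify the exact ranking, only that the intended result shows up near the top. `fake_search` and its canned results are hypothetical stand-ins for a call to the real search.

```python
# Canned results, for illustration only.
CANNED = {
    "Seatttle": ["Seattle, WA", "Seattle-Tacoma Intl Airport", "Seaside, OR"],
}


def fake_search(query):
    # Hypothetical stand-in for the system under test.
    return CANNED.get(query, [])


def in_top_n(results, expected, n=10):
    """Good-enough check: is the expected item among the first n results?"""
    return expected in results[:n]


results = fake_search("Seatttle")
typo_ok = in_top_n(results, "Seattle, WA")       # anywhere in the first 10
strict_ok = in_top_n(results, "Seaside, OR", n=1)  # must be the top result
```

The threshold `n` is the knob: the higher the priority from risk analysis, the stricter it can be.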

Hybrid approach (manual + automation)

- Integration environment
- Combine human intuition/product knowledge with the machine's powerful diligence
- Execute manually and validate using test validation code (turning on logs)
- Requires decoupled class design in your automation
- UI (JavaScript) broke the functionality
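
A sketch of validation code that is decoupled from test execution, so it can check logs produced during a manual session. The log format (one JSON object per line with `city` and `results` fields) and the two checks are assumptions made for illustration.

```python
import json


def validate_search_event(event):
    """Return a list of problems found in one logged search event."""
    problems = []
    prices = [r["price"] for r in event["results"]]
    if prices != sorted(prices):
        problems.append("results not sorted by price")
    if any(r["city"] != event["city"] for r in event["results"]):
        problems.append("result from wrong city")
    return problems


def validate_log(lines):
    """Map 1-based log line numbers to the problems found on that line."""
    report = {}
    for number, line in enumerate(lines, start=1):
        problems = validate_search_event(json.loads(line))
        if problems:
            report[number] = problems
    return report


# Two hypothetical log lines captured while a tester used the UI by hand.
lines = [
    '{"city": "Seattle", "results": [{"city": "Seattle", "price": 80}, {"city": "Seattle", "price": 120}]}',
    '{"city": "Seattle", "results": [{"city": "Seattle", "price": 120}, {"city": "Seattle", "price": 80}]}',
]
report = validate_log(lines)
```

Because the validator takes plain log lines and knows nothing about how the searches were driven, the same code serves both manual sessions and automated runs.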

Your thoughts?

Documents
