
SIGiST Specialist Interest Group in Software Testing, 15 Sep 2016

1

Test Data, Information, Knowledge, Wisdom: the past, present & future of standing, running, driving & flying

Neil Thompson @neilttweet, Thompson information Systems Consulting Ltd

© Thompson information Systems Consulting Ltd

v1.2

(v1.0 was the handout, v1.1 was presented on the day. This v1.2 has an erratum & appendix)


2

Agenda

• Part A (past & present): The basics of test data
• Part B (past & present): Data structures & Object Orientation
• Part C (the present): Agile & Context-Driven
• Part D (present & future): Cloud, Big Data, Internet of Things / “Everything” (oh, and Artificial Intelligence, still)

• Summary, takeaways etc


3

Part A (past & present): The basics of test data


4

The poor relation of the artefact family?


Image credits: (extracted from) slideshare.net/softwarecentral (repro from ieeeexplore.ieee.org) thenamiracleoccurs.wordpress.com


5

ISO 29119 on test data

• Definition: data created or selected to satisfy the input requirements for executing one or more test cases, which may be defined in the Test Plan, test case or test procedure

• Note: could be stored within the product under test (e.g. in arrays, flat files, or a database), or could be available from or supplied by external sources, such as other systems, other system components, hardware devices, or human operators

• Status of each test data requirement may be documented in a Test Data Readiness Report

• Hmm... so, what about data during and after a test?


6

ISO 29119 on test data (continued)

• “Actual results”:

– Definition – set of behaviours or conditions of a test item, or set of conditions of associated data or the test environment, observed as a result of test execution

– Example: Outputs to screen, outputs to hardware, changes to data, reports and communication messages sent

• Overall processes: (test data information builds through three of these in particular...)


Source: ISO/IEC/IEEE 29119-2


7

ISO 29119 on test data (continued)

• Test Planning (process, in Test Management) identifies strategy, test environment, test tool & test data needs:
  – Design Test Strategy (activity, which contributes to the Test Plan) includes:
    • “identifying” test data
    • Example: factors to consider include regulations on data confidentiality (it could require data masking or encryption), volume of data required, and data clean-up upon completion (see the sketch below)
    • test data requirements could identify the origin of test data and state where specific test data is located, whether it has to be disguised for confidentiality reasons, and/or the role responsible for the test data
    • test input data and test output data may be identified as deliverables
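The strategy factors above mention data masking for confidentiality. Purely as an illustration (not something defined in ISO 29119), here is a minimal Python sketch of one common approach: deterministic pseudonymisation of personal fields with a keyed hash, so a masked value stays consistent wherever it appears; the field names and the MASK_KEY are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret: makes the masking hard to reverse for outsiders,
# but repeatable, so the same customer masks to the same value in every table.
MASK_KEY = b"replace-with-a-managed-secret"

def mask_value(value: str, prefix: str = "CUST") -> str:
    """Deterministically pseudonymise a sensitive value."""
    digest = hmac.new(MASK_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{prefix}-{digest[:10]}"

def mask_customer(record: dict) -> dict:
    """Mask the confidential fields of a (hypothetical) customer record."""
    masked = dict(record)
    masked["name"] = mask_value(record["name"], "NAME")
    masked["email"] = mask_value(record["email"], "MAIL") + "@example.test"
    return masked

if __name__ == "__main__":
    live_row = {"id": 42, "name": "Ada Lovelace", "email": "ada@example.com"}
    print(mask_customer(live_row))
```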


8

ISO 29119 on test data (continued)

• Within Test Design & Implementation (process):
  – Derive Test Cases (activity):
    • preconditions include existing data (e.g. databases)
    • inputs are the data/information used to drive test execution – may be specified by value or name, eg constant tables, transaction files, databases, files, terminal messages, memory-resident areas, and values passed by the operating system
  – Derive Test Procedures (activity) includes:
    • identifying any test data not already included in the Test Plan
    • note: although test data might not be finalized until test procedures are complete, it could often start far earlier, even as early as when test conditions are agreed


9

ISO 29119 on test data (continued)

• Within the Test Design & Implementation process (continued):
  – Test Data Requirements describe the properties of the test data needed to execute the test procedures:
    • eg simulated / anonymised production data, such as customer data and user account data
    • may be divided into elements reflecting the data structure of the test item, eg defined in a class diagram or an entity-relationship diagram
    • specific name and required values or ranges of values for each test data element
    • who is responsible, resetting needs, period needed, archiving / disposal
• Test Environment Set-Up & Maintenance (process) produces an established, maintained & communicated test environment:
  – Establish Test Environment (activity) includes:
    • Set up test data to support the testing (where appropriate)
  – Test Data Readiness Report documents:
    • status wrt Test Data Requirements, eg if & how the actual test data deviates from the requirements, e.g. in terms of values or volume (see the sketch below)
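As a sketch only (the field names are mine, not taken from the standard), one lightweight way to make those properties reviewable, and to derive a readiness report from them, is to hold each requirement as structured data:

```python
from dataclasses import dataclass

@dataclass
class TestDataRequirement:
    """One test data requirement, loosely following the properties listed
    above; names are illustrative, not from ISO 29119."""
    element: str                 # data element, eg "customer.date_of_birth"
    values_or_range: str         # required values or ranges of values
    owner: str                   # who is responsible for providing it
    anonymised: bool = True      # simulated / anonymised production data?
    reset_between_runs: bool = False
    needed_until: str = ""       # period needed, then archiving / disposal
    status: str = "not ready"    # feeds the Test Data Readiness Report

requirements = [
    TestDataRequirement("customer.date_of_birth", "ages 17, 18, 65, 66",
                        owner="Test analyst", needed_until="end of SIT",
                        status="ready"),
    TestDataRequirement("account.balance", "-0.01, 0.00, 0.01, 99999.99",
                        owner="DBA", status="deviates: volume too low"),
]

# A minimal "readiness report": the status of each requirement.
for r in requirements:
    print(f"{r.element:<25} {r.status}")
```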


10

My thoughts: Standing & Running data; CRUD build-up


• Consider a new system under test: the system Creates new data (software checks validity), building up Standing data, then Creates transactions (checking validity wrt the reference data), building up Running data
• Moving on to test an “in-use” system: Running test data, entered directly or via tools / interfaces, drives Create, Read, Update, Delete (C,R,U,D) against the standing & running data, selecting from / using all of it, and producing outputs
• Now, what about test coverage?...
[Diagram: standing & running data with CRUD build-up]
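To make the CRUD build-up concrete, here is a minimal sketch; the System class is a hypothetical stand-in for any system under test, not anything from the deck. Standing data is created first, then running test data exercises Create, Read, Update and Delete against it.

```python
class System:
    """Toy stand-in for a system under test: reference (standing) data
    plus transactional (running) data, with simple validity checks."""

    def __init__(self, standing_data):
        self.standing = dict(standing_data)   # eg a product catalogue
        self.running = {}                     # eg orders keyed by id

    def create(self, order_id, product, qty):
        if product not in self.standing:      # check validity wrt reference data
            raise ValueError(f"unknown product {product!r}")
        self.running[order_id] = {"product": product, "qty": qty}

    def read(self, order_id):
        return self.running[order_id]

    def update(self, order_id, qty):
        self.running[order_id]["qty"] = qty

    def delete(self, order_id):
        del self.running[order_id]

# Build up test data in CRUD order, as on the slide.
sut = System({"P1": "widget", "P2": "gadget"})   # standing data first
sut.create("ORD-1", "P1", qty=3)                 # C
assert sut.read("ORD-1")["qty"] == 3             # R
sut.update("ORD-1", qty=5)                       # U
sut.delete("ORD-1")                              # D
```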


11

A little industrial archaeology: 1993!

• “Organisation before Automation”


12

... and 1999...


• “Zen and the Art of Object-Oriented Risk Management”

• Added the concept of input & output data spreads


13

... oh, and 2007!


• “Holistic Test Analysis & Design” (with Mike Smith)

[Diagram: “Holistic Test Analysis & Design” worksheet, part 1 – for each test level (Component, Integration, System, Systems Integration, Acceptance) it maps test basis references (Functional Requirements, Functional Spec, Interface Spec, Technical Design, Module Specs, Programming Standards, Workshops, Service Levels, Non-Functional Requirements), product risks addressed (functional & non-functional behaviour, eg validation, navigation, performance, security; online & batch), test items (modules, pairs/clusters of modules, threads, streams, service to stakeholders), behavioural / structural test features, test conditions, and the data sets (Data A–D) used at each level]


14

... (2007 part 2 of 2)


• “Holistic Test Analysis & Design” (with Mike Smith)

[Diagram: “Holistic Test Analysis & Design” worksheet, part 2 – for each test level (Component, Integration, System, Systems, Acceptance) it records: test suite / test / test case objectives; manual/automated verification/validation result check methods (eg screen images viewed with a hand-written test log; examining interface log prints; database spot-checks, viewing screens & audit prints; auto regression via an in-house harness or approved tools, per component or per thread); test data constraints / indications (eg tailored to each component; ad-hoc data with unpredictable content, check early with system contacts; sanitised live data extracts, update with care as documentation is out of date; copy of live data, timing important, users all have access; arrange data separation between teams); whether test data is in scripts; test case design techniques (eg Use Cases: main success scenario plus extensions; State Transitions: all transitions, Chow 0-switch; Boundary Value Analysis: 1.0 over, 0.1 over, on, 0.1 under, 1.0 under); and test case / script / procedure identifiers (eg CT-2.4.1–5, AT-8.5.1–4, ST-9.7.1–3)]
• If you use an informal technique, state so here.
• You may even invent new techniques!
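The Boundary Value Analysis column above uses the probes “1.0 over / 0.1 over / on / 0.1 under / 1.0 under”. A minimal sketch of generating those five values for a numeric boundary; the increments 0.1 and 1.0 come from the table, everything else is illustrative.

```python
from decimal import Decimal

def boundary_values(boundary, small=Decimal("0.1"), large=Decimal("1.0")):
    """Return the five classic probes around a boundary:
    1.0 over, 0.1 over, on, 0.1 under, 1.0 under."""
    b = Decimal(str(boundary))
    return {
        "1.0 over":  b + large,
        "0.1 over":  b + small,
        "on":        b,
        "0.1 under": b - small,
        "1.0 under": b - large,
    }

# eg an upper limit of 100.0 on a payment amount
for label, value in boundary_values(100).items():
    print(f"CT-2.4.x  {label:<9} -> {value}")
```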


15

Now, a data-oriented view of test coverage: but relation to techniques?


[Diagram: the standing / running data picture again, now asking which techniques cover what: BLACK-BOX techniques for the running test data entered via tools / interfaces, GLASS-BOX techniques for the system’s C,R,U,D processing (etc)... with input transactions / input data spread, processing transactions / stored data spread, and output transactions / output data spread]
• However: glass-box techniques still need data to drive them!


16

More about test data sources


[Table, adapted from Craig & Jaskiel 2002 (Table 6-2 and associated text) – “OTHER BOOKS ARE AVAILABLE”!]
• Potential sources of test data: manually created; captured (by tool); tool/utility generated; random; production
• Test data may arrive as eg messages, transactions, records, files/tables, whole databases – by direct input by the tester or via tools / interfaces
• Characteristics & handling to compare for each source: volume (controllable vs too much / too little), variety (good, varies, mediocre), validation (calibration), change, and acquisition (ranging from easy to very difficult)
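As a small illustration of the “tool/utility generated” source, with volume controllable and, thanks to a fixed seed, runs that are reproducible, here is a hedged Python sketch; the record shape and file name are invented for the example.

```python
import csv
import random
import string

def generate_customers(n, seed=29119):
    """Generate n synthetic customer records; the seed makes the
    'random' data repeatable between test runs."""
    rng = random.Random(seed)
    for i in range(1, n + 1):
        surname = "".join(rng.choices(string.ascii_lowercase, k=6))
        yield {
            "id": f"CUST{i:06d}",
            "surname": surname.title(),
            "credit_limit": rng.choice([0, 100, 500, 1000, 10000]),
            "country": rng.choice(["GB", "IE", "FR", "DE", "AU"]),
        }

with open("customers.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["id", "surname", "credit_limit", "country"])
    writer.writeheader()
    writer.writerows(generate_customers(1000))
```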


17

Test data and the V/W-model


[Diagram: V/W-model showing levels of testing (Unit, Integration, System, Acceptance, then Pilot / progressive rollout and Live), levels of specification & review (Requirements; Functional & NF specifications; Technical spec / hi-level design; Detailed designs), levels of stakeholders (Business, Users, Business Analysts, Acceptance Testers; Architects, “independent” testers; Designers, integration testers; Developers, unit testers), levels of integration (+ business processes), and the corresponding levels of test data: contrived; contrived then/and live-like; maybe live, else live-like; functional & non-functional]

Remember: this applies not only to waterfall or V-model SDLCs; iterative / incremental lifecycles also go down & up through the layers of stakeholders, specifications & system integrations


18

The V-model and techniques


• GLASS-BOX → STRUCTURE-BASED; BLACK-BOX → BEHAVIOUR-BASED (source: BS 7925-2)
[Diagram: the V-model with test data levels as before (contrived; contrived then/and live-like; maybe live, else live-like; live) – techniques contrived for coverage at the lower levels, techniques for being live-like at the higher levels]
• From ISO/IEC/IEEE 29119-4: Cause-Effect Graphing, Combinatorial (All, Pairs, Choices), Classification Tree, Decision Table, Boundary Value Analysis, Equivalence Partitioning, Random, Scenarios, State Transitions; experience-based: Error Guessing
• From BBST Test Design: Domain testing (input & output; primary & secondary; filters & consequences; multiple variables); tester-based (eg α, β, Paired); coverage-based (eg Functions, Tours); [para-functional] risks (eg Stress, Usability); activity-based (eg Use Cases, All-pairs); evaluation-based (eg Math oracle); desired-result (eg Build verification) – may be exploratory, risk-oriented
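To connect the combinatorial techniques above back to data, here is a minimal sketch that measures 2-wise (pairwise) coverage of a set of test cases over some parameters; the parameters, values and cases are invented for the example. Dedicated all-pairs tools go further and generate the cases for you.

```python
from itertools import combinations

parameters = {                      # illustrative test parameters
    "browser": ["Chrome", "Firefox", "Safari"],
    "account": ["new", "existing"],
    "payment": ["card", "invoice"],
}

def pairs_required(params):
    """All parameter-value pairs that 2-wise coverage must hit."""
    required = set()
    for p1, p2 in combinations(sorted(params), 2):
        for v1 in params[p1]:
            for v2 in params[p2]:
                required.add(((p1, v1), (p2, v2)))
    return required

def pairs_covered(test_cases):
    """Parameter-value pairs actually exercised by the given test cases."""
    covered = set()
    for case in test_cases:
        for p1, p2 in combinations(sorted(case), 2):
            covered.add(((p1, case[p1]), (p2, case[p2])))
    return covered

tests = [
    {"browser": "Chrome", "account": "new", "payment": "card"},
    {"browser": "Firefox", "account": "existing", "payment": "invoice"},
]
required = pairs_required(parameters)
print(f"pairwise coverage: {len(pairs_covered(tests) & required)}/{len(required)}")
```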


19

Test data for Unit Testing


[Diagram: Unit Testing (with drivers & stubs, below Integration Testing); standing data plus running test data, input / stored / output data spreads, and outputs]
• Test data: contrived, to drive all techniques in use
• Measure GLASS-BOX coverage (manually? / by instrumentation)
• Functional, eg validity checks: intra-field & inter-field
• Any non-functional wanted & feasible, eg: local performance, usability


20

Test data for Integration Testing


[Diagram: Integration Testing (drivers, stubs, other units when ready; below System Testing); standing data plus running test data across input & output interfaces, input / stored / output data spreads; some interfaces UNIDIRECTIONAL, some BIDIRECTIONAL]
• Test data: contrived
• Functional, eg boundary conditions: null transfer, single-record, duplicates
• Any non-functional wanted & feasible, eg: local performance, security


21

Test data for System Testing


[Diagram: System Testing (below Acceptance Testing); standing data plus running test data, input / stored / output data spreads, and outputs]
• Test data: contrived, then live-like... surprise?!
• Some FUNCTIONAL, some NON- / PARA-FUNCTIONAL:
  – live-like, eg: performance (eg response times), peak business volumes, usability, user access security
  – all (believed) contrivable, eg: stress, volumes over-peak, contention, anti-penetration security


22

Test data for Acceptance Testing etc


[Diagram: Acceptance Testing (above System Testing), then pilot / progressive rollout into full live running; standing data plus running test data, input / stored / output data spreads, and outputs]
• Test data: maybe live, else live-like
• Some FUNCTIONAL, some NON- / PARA-FUNCTIONAL; NF acceptance criteria, eg: performance, volume, security
• Live-like: any surprises here won’t cause failures?
• Pilot / progressive rollout, then live: follow up any issues – but any surprises here may cause failures!


23

Test data often needs planning & acquiring earlier than you may think!


• Data to check validity? (esp. inter-field)
• Who writes stubs & drivers, and when?
• How to get enough data to test performance early?
• How to select location, scope etc of the pilot
• Planning the rollout sequence
• Planning containment & fix of any failures
• Deciding whether live and/or live-like
• For a new system, how much live data exists yet?
• Rules, permissions & tools for obfuscation
• Top-down / bottom-up affects data coordination?
• Still stubs & drivers, but also harnesses, probes, analysers?
• Any live-like data available yet? How do we know it is like live?
• Tools to replicate data to volumes, while keeping referential integrity (see the sketch below)
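For that last point (subsetting or replicating data while keeping referential integrity), here is a minimal sqlite3 sketch of the idea: choose a subset of parent rows, then copy only the child rows that reference them, so no foreign key is left dangling. The table and column names are invented; real test data tools do this across whole schemas.

```python
import sqlite3

live = sqlite3.connect(":memory:")      # stands in for a copy of live
live.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders   (id INTEGER PRIMARY KEY, customer_id INTEGER
                           REFERENCES customer(id), amount REAL);
    INSERT INTO customer VALUES (1,'A'),(2,'B'),(3,'C');
    INSERT INTO orders   VALUES (10,1,9.99),(11,1,5.00),(12,2,7.50),(13,3,1.25);
""")

test = sqlite3.connect(":memory:")      # the subsetted test database
test.execute("PRAGMA foreign_keys = ON")
test.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders   (id INTEGER PRIMARY KEY, customer_id INTEGER
                           REFERENCES customer(id), amount REAL);
""")

# Subset rule: keep some customers, then only *their* orders, so every
# orders.customer_id still resolves in the test database.
keep = [row[0] for row in live.execute("SELECT id FROM customer WHERE id % 2 = 1")]
marks = ",".join("?" * len(keep))
test.executemany(
    "INSERT INTO customer VALUES (?,?)",
    live.execute(f"SELECT id, name FROM customer WHERE id IN ({marks})", keep))
test.executemany(
    "INSERT INTO orders VALUES (?,?,?)",
    live.execute(f"SELECT id, customer_id, amount FROM orders WHERE customer_id IN ({marks})", keep))
test.commit()
print(list(test.execute("SELECT * FROM orders")))   # only orders for kept customers
```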

[Diagram: the V-model levels of testing (Unit, Integration, System, Acceptance, Pilot / progressive rollout, Live) against the corresponding levels of test data (contrived; contrived then/and live-like; maybe live, else live-like), functional & NF]


24

“Experience report”: general advice, and pitfalls / concerns

• Start tests with empty data stores, then progressively add data
• Standing data can/should be complete, but running (and transactional) data need only be a representative subset – until volume etc & acceptance testing
• If different teams can own data, pre-agree codings & value ranges
• If able to automate tests (usual caveats applying), use a data-driven framework (more later about targeted test data tools; see the sketch below)
• Keep backups, and/or use database checkpointing facilities, to be able to refresh back to known data states (but remember this is not live-like!)
• Regression testing should occur at all levels, and needs stable data baselines
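For the data-driven point, a minimal illustration using pytest's parametrize (any keyword/table-driven framework would do); the postage function under test is invented for the example. The tabulated inputs & expected outputs are the test data; the single test function is the reusable execution logic.

```python
import pytest

def postage(weight_g: int) -> float:
    """Hypothetical function under test."""
    if weight_g <= 0:
        raise ValueError("weight must be positive")
    return 1.50 if weight_g <= 100 else 3.00

# Tabulated inputs & expected outputs: easy to review and extend.
CASES = [
    (1, 1.50),
    (100, 1.50),    # on the boundary
    (101, 3.00),    # just over
    (5000, 3.00),
]

@pytest.mark.parametrize("weight_g,expected", CASES)
def test_postage(weight_g, expected):
    assert postage(weight_g) == expected

def test_invalid_weight_rejected():
    with pytest.raises(ValueError):
        postage(0)
```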


• Pitfalls:
  – not having planned test data carefully / early enough
  – insufficiently rich, traceable, and/or embarrassingly silly test data values
  – difficulties with referential integrity
  – despite huge efforts, not getting permission to use live data (even obfuscated)
  – if actual live data is too large, subsetting is not trivial (eg integrity)
  – test data leaking into live!!
• Concerns:
  – (from personal experience, also anecdotally) real projects we meet don’t have time/expertise to craft with techniques – at least, not very explicitly


25

Summary of “old school” approach to test data


[Diagram: the V-model with test data levels as before, annotated:]
1. Poor relation of the artefact family
2. (In theory) driven by techniques: mainly black-box
3. Different styles/emphases at different levels
4. May be thought of as building CRUD usage... across not only inputs, but also processing & outputs (eg via domain testing)
5. Some tool use, eg for functional test automation (eg TestFrame), replicating data for performance/volume tests


26

Part B (past & present): Data structures & Object Orientation


27

Test data in object-oriented methods

• The famous 1191-page book:
  – no “test data” in index! (nor data, nor persistence – but maybe because data is/are “encapsulated”)
  – emphasis on automated testing, though “manual testing, of course, still plays a role”
  – structure of book is Models, Patterns & Tools:
    • applying combinational test models (decision/truth tables, Karnaugh-Veitch matrices, cause-effect graphs) to Unified Modelling Language (UML) diagrams (eg state transitions)
    • Patterns – “results-oriented” (*not* glass/black box) test design at method, class, component, subsystem, integration & system scopes
    • Tools – assertions, oracles & test harnesses


28


Diagram examples: agilemodeling.com

• A UML V-model (more industrial archaeology!)
[Diagram: UML diagram types mapped onto a V-model – behavioural: Use case, Activity, Sequence, State, Collaboration (now Communication); structural: Class, Object, Component, Deployment – against Method, Class, Component, Subsystem, Integration & System test scopes]

• Since then, UML (now v2) has expanded to 14 diagram types, but anyway how many of these 9 do you typically see?


29

If functional / data / OO diagrams not provided: you could build your own

• Note: parts of UML extend/develop concepts from older-style models, eg SSADM (Structured Systems Analysis and Design Method):
  – Use Cases built on Business Activity Model, BAM
  – Class diagrams built on Logical Data Model, LDM (entity relationships)
  – Activity diagrams on Data Flow Model, DFM
  – Interaction diagrams on Entity Life History, ELH (entity event modelling)
• And (according to Beizer and many since) testers may build their own models – even potentially invent new ones

Diagram examples: visionmatic.co.uk, umsl.edu, paulherber.co.uk, jacksonworkbench.co.uk


30

And, if you have much time / a desire for selecting carefully...


Version of Zachman framework from icmgworld.com. See also modified expansion in David C. Hay, Data Model Patterns – a Metadata Map.


31

Part C (the present): Agile & Context-Driven


32

Analogy with scientific experiment: hypothesis “all swans are white”

• So far, we have been setting out conditions (→ cases) we want to test, then contriving/procuring test data to trigger those conditions


[Diagram: the top-down chain – Test Strategy → Test Plan → Test Conditions → Test Cases → Test Procedures/Scripts → Test Data]
• This is like a scientific hypothesis, then an experiment to confirm/falsify it:
  – test whiteness of swans (hmm: cygnets are grey, adult birds may be dirty)
  – but only by going to Australia could early observers have found real, wild black swans

• See also “Grounded Theory”


33

From top-down to bottom-up: what if data triggers unplanned conditions?

• “Old-school” methods don’t seem to consider this? Neither do OO & UML in themselves?

• But agile can, and Context-Driven does...

[Diagram: the top-down chain (Test Strategy → Test Plan → Test Conditions → Test Cases → Test Procedures/Scripts → Test Data) alongside a bottom-up flow: Test Data → Testing]


34

Agile on test data

• The Agile/agile methods/“ecosystems”?
  – Prototyping, Spiral, Evo, RAD, DSDM
  – USDP, RUP, EUP, AUP, EPF-OpenUP, Crystal, Scrum, XP, Lean, ASD, AM, ISD, Kanban, Scrumban
• The “Drivens”:
  – A(T)DD, BDD, CDD, DDD, EDD, FDD, GDD, HDD, IDD, JSD, KDS, LFD, MDD, NDD, ODD, PDD, QDD, RDD, SDD, TDD, UDD, VDD, WDD, XDD, YDD, ZDD*
• Scalable agile frameworks (SAFe, DAD, LeSS etc)?
• Lisa Crispin & Janet Gregory: Agile Testing; More Agile Testing

* I fabricated only four of these – can you guess which?


35

Driven by test data?


• ATDD / SBE examples:
  – eg ACCEPTANCE TESTS: As a <Role> I want <Feature> so that <Benefit> (source: Gojko Adzic via neuri.co.uk)
  – ACCEPTANCE CRITERIA: Given <Initial context> when <Event occurs> then <ensure some Outcomes> (sources: Aaron Kromer via github.com; Gojko Adzic via gojko.net, “a good acceptance test”)
  – SPECIFICATION: When <Executable example 1> and <Executable example 2> then <Expected behaviour> (source: The RSpec Book, David Chelimsky, Dan North etc)
  – eg UNIT TESTS: Setup, Execution, Validation, Cleanup (source: Wikipedia [!])
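Those templates can be written directly against test data. Since the deck names no particular tool here, this is a plain Python/pytest sketch of the unit-test shape (Setup, Execution, Validation, Cleanup) expressed in Given/When/Then terms; the Account class is invented for the example.

```python
class Account:
    """Hypothetical class under test."""
    def __init__(self, balance=0):
        self.balance = balance

    def withdraw(self, amount):
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount

def test_withdrawal_reduces_balance():
    # Given (Setup): an account with an initial balance of 100
    account = Account(balance=100)
    # When (Execution): the event occurs
    account.withdraw(30)
    # Then (Validation): ensure the outcome
    assert account.balance == 70
    # Cleanup: nothing to do here; fixtures would normally handle it
```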


36

Crispin & Gregory on test data


• These books were written partly because so many agile methods say (often deliberately) so little about testing

• Within “Strategies for writing tests” – test genesis/design patterns include:
  – Build-Operate-Check, using multiple input data values
  – Data-Driven testing

• Much on TDD, BDD & ATDD (mentioning Domain Specific Languages, Specification By Example etc)

• Guest article by Jeff Morgan on test data management


37

Beyond agile to DevOps


• DevOps extends (“Shifts Right”) agile concepts into operations, ie live production

• Multiple aspects, but especially needs more specialised tools, eg:

• This includes test data generation & management tools, eg:

continuousautomation.com

techbeacon.com

techarcis.com

(formerly Grid Tools)


38

Context-Driven on test data

• The books:
  – actually there *are* more than one!
• BBST courses (BBST is a registered trademark of Kaner, Fiedler & Associates, LLC):
  – Foundations
  – Bug Advocacy
  – Test Design
• Rapid Software Testing course (James Bach & Michael Bolton)


39

Context-Driven on test data: books

• Jerry Weinberg:
  – rain gauge story
  – composition & decomposition fallacies
• Kaner, Falk & Nguyen:
  – [static] testing of data structures & access
• Kaner, Bach (James) & Pettichord:
  – Lessons 103 & 129: use automated techniques to extend the reach of test inputs, eg models, combinations, random, volume
  – Lessons 127 & 130: data-driven automation separating generation & execution:
    • tabulate inputs & expected outputs
    • easier to understand, review & control


40

Context-Driven on test data: courses

• BBST:
  – Foundations:
    • a typical context includes creating test data sets with well-understood attributes, to be used in several tests
  – Bug Advocacy:
    • may need to analyse & vary test data during the “Replicate, Isolate, Generalise, Externalise” reporting elements
  – Test Design:
    • significant emphasis on Domain Testing
• Rapid Software Testing:
  – use diversified, risk-based strategy, eg:
    • “sample data” is one of the tours techniques
  – “easy input” oracles include:
    • populations of data which have distinguishable statistical properties
    • data which embeds data about itself
    • where output=input but state may have changed
  – if “repeating” tests, exploit variation to find more bugs, eg:
    • substitute different data
    • vary state of surrounding system(s)


NB this illustration is from csestudyzone.blogspot.co.uk


41

Random testing & Hi-Volume Automation

• Random testing:
  – counts as a named technique in many taxonomies
  – may be random / pseudo-random (advantages of no bias)...
  – ...wrt data values generated, selection of data from pre-populated tables, sequence of functions triggered etc
  – may be guided by heuristics (partial bias)
  – “monkey testing” does not do it justice, but it is difficult to specify oracles/expected results
• But... James Bach & Patrick J. Schroeder paper:
  – empirical studies found no significant difference in the defect detection efficiency of pairwise test sets and same-size randomly selected test sets, however...
  – several factors need consideration in such comparisons
• And... Cem Kaner on HiVAT (High-Volume Automated Testing):
  – “automated generation, execution and evaluation of arbitrarily many tests. The individual tests are often weak, but taken together, they can expose problems that individually-crafted tests will miss” – examples:
    • inputs-focussed: parametric variation, combinations, fuzzing, hostile datastream
    • exploiting an oracle: function equivalence, constraint checks, inverse operations, state models, diagnostic
    • exploiting existing tests/tools: long-sequence regression, hi-volume protocol, load-enhanced functional (see the sketch below)
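As a toy illustration of the high-volume idea, and of random testing made repeatable with a seed, the sketch below generates many individually weak tests and uses an “inverse operations” oracle: serialise then parse, and check we get the original back. The functions exercised here are just json from the standard library; the test shape, counts and seed are illustrative.

```python
import json
import random
import string

def random_record(rng):
    """One arbitrary, individually weak test input."""
    return {
        "".join(rng.choices(string.printable, k=rng.randint(0, 8))):
            rng.choice([None, True, rng.randint(-10**9, 10**9),
                        rng.random(),
                        "".join(rng.choices(string.printable, k=20))])
        for _ in range(rng.randint(0, 5))
    }

def run_hivat(n=10_000, seed=20160915):
    rng = random.Random(seed)          # seeded, so failures can be replayed
    failures = 0
    for i in range(n):
        record = random_record(rng)
        # Oracle: inverse operations should round-trip the data unchanged.
        if json.loads(json.dumps(record)) != record:
            failures += 1
            print(f"case {i}: round-trip mismatch for {record!r}")
    print(f"{n} generated tests, {failures} oracle failures")

if __name__ == "__main__":
    run_hivat()
```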


42

Part D (present & future): Cloud, Big Data, IoT/E (oh, and AI, still)


Beyond driving: now let’s fly!

43

• Test data for cloud systems...


44

(Test) data in cloud systems

• In the olden days, the data was in one known place
• But in cloud computing:
  – system specifications & physical implementations are abstracted away from users, data replicated & stored in unknown locations
  – resources virtualised, pooled & shared with unknown others
• Differing deployment models: private, community, public, hybrid
• Non/para-functional tests prioritised & complicated (eg “elasticity”, service levels & portability) even more than plain internet systems; huge data sizes, incl. auto-generated for mgmt
• From my own experience with Twitter etc:
  – no single source of truth at a time; notifications ≠ web or mobile app view; updates prioritised & cascaded, sequence unpredictable
  – testing extends into live usage (eg “test on New Zealand first”)
• Particular considerations for data when migrating an in-house system out to cloud (IaaS / PaaS / SaaS)
• Dark web not indexable by search engines (eg Facebook!)
• So, testing & test data more difficult?

Sources: Barrie Sosinsky, Cloud Computing “Bible”; Blokland, Mengerink & Pol – Testing Cloud Services

See Erratum slide 68


45

Big data

• An extension of data warehouse → business intelligence concepts, enabled by:
  – vastness of data (some cloud, some not), Moore’s Law re processing power
  – new data, eg GPS locations & biometrics from mobile devices & wearables
  – tools which handle diverse, unstructured data (not just neat files / tables / fields) – importance of multimedia & metadata
  – convergences: Social, Mobile, Analytics & Cloud; Volume, Variety, Velocity
• Not just data for a system to create/add value: but value from data itself
• Exact → approximate; need not be perfect for these new purposes
• Away from rules & hypotheses, eg language translation by brute inference
• This extends the “bottom-up” emphasis I have been developing
• A key aim is to identify hitherto unknown (or at least unseen) patterns, relationships & trends – again a testing challenge, because not predictable, what are “expected results”?
• So contrived test data may be no use – need real or nothing?
• (And beware, not all correlations are causations – but users may still be happy to use for decision-making)

Sources: Mayer-Schönberger & Cukier – Big Data; Minelli, Chambers & Dhiraj – Big Data, Big Analytics


46

When Data gets “big”, does it grow up into something else?


Sources: above – Matthew Viel as in US Army CoP, via Wikipedia; below – Karim Vaes blog; below right – Bellinger, Castro & Mills at systems-thinking.org

• (two axes but no distinction?)
• other perspectives (respectively): David McCandless & Malcolm Pritchard (informationisbeautiful.net), cademy.isf.edu.hk, fluks.dvrlists.com
[Diagrams: variants of the Data → Information → Knowledge → Wisdom hierarchy, eg signals/know-nothing → symbols/senses → meaning/memory → experience/reflection/understanding → values/virtues/vision; discrete → linked → organised → applied; useful/organised/structured → contextual/synthesised/learning → understanding/integrated/actionable; + DECISION!]


47

Another of the many views available


Source(s): Avinash Kaushik (kaushik.net/avinash/great-analyst-skills-skepticism-wisdom), quoting a graphic by David Somerville, based on a two-pane version by Hugh McLeod


48

No, it did not have to be a pyramid: plus here are several extra resonances


• like Verification & Validation?

• T & E also quoted (reversed) elsewhere as Explicit & Tacit*

Source: Michael Ervick, via systemswiki.org

* Harry Collins after Michael Polanyi; quotation by “Omegapowers” on Wikipedia


49

Data Science

• A new term arising out of Big Data & Analytics? How much more than just statistics?
[Venn diagram: “Data Science”, modified after Steven Geringer]
• Data used to be quite scientific already?
  – sets, categories & attributes
  – types, eg strings, integers, floating-point
  – models & schemas, names & representations, determinants & identifiers, redundancy & duplication, repeating groups
  – flat, hierarchical, network, relational, object databases
  – normalisation, primary & foreign keys, relational algebra & calculus
  – distribution, federation, loose/tight coupling, commitment protocols
  – data quality rules

• But now... (this is only one of several available alternatives)

• And blobs hide hypothesising, pattern recognition, judgement, prediction skills


50

Internet of Things


collaborative.com

pubnub.com

• Extends “Social, Mobile, Analytics & Cloud”

• Even more data – and more diverse

• Identifying & using signals amid “noise”

• So, new architectures suggested

• Maybe nature can help

• Yet more testing difficulty!

Francis daCosta: Rethinking the IoT

eg see Paul Gerrard’s articles


51

Entropy & information theory

• Cloud, Big Data & IoT are all bottom-up disrupters of the old top-down methods
• Are there any bottom-up theories which might help here?
[Diagrams: the “temperature” of a gas vs the energies of its individual molecules (BOLTZMANN etc, after hyperphysics.phy-astr.gsu.edu); information entropy (SHANNON etc; Ito & Sagawa, nature.com)]
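Shannon's measure can even be turned back on test data itself, as a rough check of how varied a field's values are: all-identical values give zero entropy, a uniform spread gives the maximum. A minimal sketch, purely illustrative:

```python
import math
from collections import Counter

def shannon_entropy(values):
    """H = -sum(p * log2 p) over the distinct values observed in a field."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

uniform_field = ["GB", "IE", "FR", "DE"] * 250        # varied test data
skewed_field = ["GB"] * 999 + ["IE"]                  # nearly constant

print(f"uniform: {shannon_entropy(uniform_field):.3f} bits")  # 2.000
print(f"skewed:  {shannon_entropy(skewed_field):.3f} bits")   # ~0.011
```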


52

Information grows where energy flows

Image from http://www.aaas.org/spp/dser/03_Areas/cosmos/perspectives/Essay_Primack_SNAKE.GIF

Sources: Daniel Dennett, “Darwin’s Dangerous Idea”; the “cosmic Ouroboros” (Sheldon Glashow, Primack & Abrams, Rees etc)

[Diagram: evolution & “emergence” – Mathematics; Physics (Quantum Theory end, General Relativity end, String Theories & rivals); Chemistry (inorganic, organic); Biology; Humans; Tools; Languages; Books; Information Technology; Artificial Intelligence; plus Astronomy, Geology, Geography. Cf. Neil Thompson: Value Flow ScoreCards; Daniel Dennett: platforms & cranes]


53

Evolution & punctuated equilibria

• Here’s another bottom-up theory

“Punctuated equilibria” idea originated by Niles Eldredge & Stephen Jay Gould; images from www.wikipedia.org
[Charts: “gradual” Darwinism vs punctuated equilibria – sophistication, diversity & number of species over time, with “explosions” in species (eg Cambrian), spread into new niches (eg Mammals), mass extinctions (eg Dinosaurs), and periods of equilibrium in between]


54


Punctuated equilibria in information technology?

[Chart: punctuated equilibria in IT – Computers (1GL, 2GL, 3GL, 4GL), Object Orientation, Internet & mobile devices, Artificial Intelligence?!]

• Are we ready to test AI??


55

Artificial Intelligence

• Remember this?............... But there’s much more!
• I keep treating it as the future, but much is already here, or imminent sooner than you may think?
• Again there is the “oracle problem”:
  – what are the “expected results”?
  – how can we predict emergent things?
  – who will determine whether good, or bad, or...?

[Images: the Data Science Venn diagram again; legaltechnology.com]


56


More about Emergence: progress along order-chaos edge?

• For best innovation & progress, need neither too much order nor too much chaos
• “Adjacent Possible”
[Diagram: Physics → Chemistry → Biology → Social sciences, each progressing along the edge between ORDER and CHAOS]

Extrapolation from various sources, esp. Stuart Kauffman, “The Origins of Order”, “Investigations”; jurgenappelo.com


57


So, back to test data

• Ross Ashby’s Law of Requisite Variety
• Make your test data “not too uniform, not too random” (see the sketch below)
[Diagram: Computers → Object Orientation → Internet & Mobile → Artificial Intelligence, each progressing along the ORDER–CHAOS edge]
• SMAC + IoT + AI will be an ecosystem?
• Which needs Data Science to manage it??
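One way to act on “not too uniform, not too random” is simply to combine the two: deterministically cover the partitions and boundaries you know about, then top up with seeded random values for variety. A minimal sketch; the field and its partitions are invented for the example.

```python
import random

def age_test_values(extra_random=10, seed=7):
    """Deterministic core: one value per known partition/boundary.
    Random top-up: seeded, so the 'chaos' is still reproducible."""
    core = [-1, 0, 17, 18, 65, 66, 120, 121]      # partitions & boundaries
    rng = random.Random(seed)
    top_up = [rng.randint(-5, 130) for _ in range(extra_random)]
    return core + top_up

print(age_test_values())
```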


58

Summary

• No, maybe we do indeed need rocket science!
[Diagram: DATA → INFORMATION → KNOWLEDGE → WISDOM → INSIGHT → DECISIONS, surrounded by Social, Mobile, Analytics, Cloud, IoT, Artificial Intelligence and Emergence]


59

The key points – test data is (are):

• More fundamental than you probably think, because:
  – test conditions & cases don’t “really” exist until triggered by data
  – whether data is standing or running can affect design of tests
• More interesting, because:
  – considerations (eg contrived, live-like) vary greatly at different levels in the V-model
  – data was always a science (you may have missed that) in many respects
• Changing, through:
  – agile & context-driven paradigms (both of which are still evolving)
  – cloud, big data, Internet of Things and Artificial Intelligence
  – these changes are arguably moving from a top-down approach (via test conditions & cases) to a more bottom-up / holistic worldview


60

Takeaway messages

• Despite its apparent obscurity & tedium, test data actually unifies a strand through key concepts of test techniques, top-down v bottom-up approaches, SDLC methods and even (arguably) “emergence”

• Whatever your context, think ahead – much more needs to be decided than the literature makes clear, and much of it is non-trivial

• Don’t think only test data, think test information, knowledge, wisdom – and insight & decisions!

• This embraces the distinctions between:
  – tacit / explicit knowledge
  – verification / validation (≈ checking / testing)

• The “future” is already here, in many respects – honest inquiry and research can cut through much of the hype – don’t get left behind


61

Main references

• Standards:
  – IEEE 829-1998
  – BS 7925-2
  – ISO/IEC/IEEE 29119-2:2013 & 4:2015
• Industrial archaeology from my own past:
  – EuroSTAR 1993: Organisation before Automation
  – EuroSTAR 1999: Zen & the Art of Object-Oriented Risk Management
  – Book 2002: Risk-Based E-Business Testing (Paul Gerrard lead author)
  – STARWest 2007: Holistic Test Analysis & Design (with Mike Smith)
• Testing textbooks:
  – Craig & Jaskiel: Systematic Software Testing (2002)
  – Binder: Testing Object-Oriented Systems (2000)
  – Weinberg: Perfect Software and Other Illusions about Testing (2008)
  – Kaner, Falk & Nguyen: Testing Computer Software (2nd ed, 1999)
  – Kaner, Bach & Pettichord: Lessons Learned in Software Testing (2002)
  – Crispin & Gregory: Agile Testing (2009) & More Agile Testing (2015)
• Testing training courses:
  – BBST Foundations, Bug Advocacy & Test Design (Kaner et al.)
  – Rapid Software Testing (Bach & Bolton)


62

Main references (continued)

• Data / keyword / table driven test automation:
  – Buwalda, Janssen & Pinkster: Integrated Test Design & Automation using the TestFrame method (2002)
• Methods:
  – Structured Systems Analysis & Design Method (SSADM)
  – Unified Modelling Language (UML)
  – The Zachman framework (eg as in Hay: Data Model Patterns – a Metadata Map)
• Data generally:
  – Kent: Data & Reality (1978 & 1998)
  – Howe: Data Analysis for Database Design (1983, 1989 & 2001)
• Agile methods textbooks:
  – Highsmith: Agile Software Development Ecosystems (2002)
  – Boehm & Turner: Balancing Agility & Discipline (2004)
  – Adzic: Bridging the Communication Gap (2009)
  – Gärtner: ATDD by Example (2013)
  – Appelo: Management 3.0 (2010) – maybe post-agile?


63

Main references (continued)

• Cloud, Big Data, Internet of Things:
  – Sosinsky: Cloud Computing “Bible” (2011)
  – Blokland, Mengerink & Pol: Testing Cloud Services (2013)
  – Mayer-Schönberger & Cukier: Big Data (2013)
  – Minelli, Chambers & Dhiraj: Big Data, Big Analytics (2013)
  – daCosta: Rethinking the Internet of Things (2013)
• Entropy, emergence, Artificial Intelligence etc:
  – Kauffman: The Origins of Order (1993) & Investigations (2000)
  – Dennett: Darwin’s Dangerous Idea (1995)
  – Taleb: Fooled by Randomness (2001) & The Black Swan (2007)
  – Gleick: The Information (2011)
  – Morowitz: The Emergence of Everything (2002)
  – Birks & Mills: Grounded Theory (2011)
  – Kurzweil: The Singularity is Near (2005)
• Websites:
  – many (see individual credits annotated on slides)


64

Thanks for listening (and looking)!

Neil Thompson @neilttweet
[email protected]
linkedin.com/in/tiscl
Thompson information Systems Consulting Ltd
© Thompson information Systems Consulting Ltd

Questions?

Contact information:


65

Answers to *DD quiz (ie the four I fabricated)

• XDD: eXtremely Driven Development (ie micromanaged)
• LFD: Laissez Faire Development
• WDD: Weakly Driven Development
• NDD: Not Driven Development


66

Appendix: the most convincing *DD examples which I didn’t fabricate

• A(T)DD: Acceptance (Test) Driven Development

• B: Behaviour Driven Development
• C: Context Driven Design
• D: Domain Driven Design
• E: Example Driven Development
• F: Feature Driven Development
• G: Goal Driven Process
• H: Hypothesis Driven Development
• I: Idea Driven Development
• J: Jackson System Development
• K: Knowledge Driven Software
• M: Model Driven Development
• O: Object Driven Development
• P: Process Driven Development
• Q: Quality Driven Development
• R: Result Driven Development
• S: Security Driven Development
• T: Test Driven Development
• U: Usability Driven Development
• V: Value Driven Development
• Y: YOLO (You Only Live Once) Development
• Z: Zero Defects Development


67

Appendix: best *DD runner-up

Quantum Driven Development:
• works only on hardware that hasn't yet been invented
• works; oh no it doesn’t; oh now it does...
• works & doesn’t work at the same time
• uncertain whether or not it works
• there’s a probability function that...
• [That’s enough QDD: Ed.]


secretgeek.net


68

Erratum

• Slide 44: Sorry, Facebook is no longer “dark web”. I now see that it was, sort-of, only before 2007, and I read in a 2011 book that it still was, but this seems wrong and I didn’t test it carefully enough!
  – Obviously much depends on specific privacy settings
  – Maybe deep content is still not externally crawlable?

© Thompson information Systems Consulting Ltd