
TAROT summerschool slides 2013 - Italy


Page 1: TAROT summerschool slides 2013 - Italy

The FITTEST project is funded by the European Commission (FP7-ICT-257574)

Tanja E.J. Vos Centro de Investigación en Métodos de Producción de Software

Universidad Politecnica de Valencia Valencia, Spain

[email protected]


European initiatives where academia and industry get together.  

Page 2: TAROT summerschool slides 2013 - Italy

Overview

• EU projects (with SBST) that I have been coordinating:
  – EvoTest (2006–2009)
  – FITTEST (2010–2013)
• What it means to coordinate them and how they are structured
• How we evaluate their results through academia–industry projects

Page 3: TAROT summerschool slides 2013 - Italy

EvoTest
• Evolutionary Testing for Complex Systems
• September 2006 – September 2009
• Total costs: 4,300,000 euros
• Partners:
  – Universidad Politecnica de Valencia (Spain)
  – University College London (United Kingdom)
  – DaimlerChrysler (Germany)
  – Berner & Mattner (Germany)
  – Fraunhofer FIRST (Germany)
  – Motorola (UK)
  – Rila Solutions (Bulgaria)
• The project website no longer works.

Page 4: TAROT summerschool slides 2013 - Italy

EvoTest objectives/results

• Apply evolutionary search-based testing techniques to solve testing problems from a wide spectrum of complex real-world systems in an industrial context (a minimal sketch of the core idea follows below).
• Improve the power of evolutionary algorithms for searching important test scenarios, hybridising with other techniques:
  – other general-purpose search techniques,
  – other advanced software engineering techniques, such as slicing and program transformation.
• An extensible and open Automated Evolutionary Testing Architecture and Framework will be developed. This will provide general components and interfaces to facilitate the automatic generation, execution, monitoring and evaluation of effective test scenarios.

 

 

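To illustrate the core idea behind evolutionary (search-based) testing, here is a minimal sketch, entirely ours and not the EvoTest framework: a genetic algorithm evolves integer inputs to minimise a branch-distance fitness until a hard-to-reach branch of a toy SUT is covered.

    import random

    def branch_distance(x, y):
        # Fitness for covering the branch `if x == 2 * y:` in a toy SUT.
        # 0 means the branch is taken; larger values are further away.
        return abs(x - 2 * y)

    def evolve(pop_size=50, generations=200, lo=-1000, hi=1000):
        # Random initial population of candidate inputs (x, y).
        pop = [(random.randint(lo, hi), random.randint(lo, hi))
               for _ in range(pop_size)]
        for _ in range(generations):
            pop.sort(key=lambda ind: branch_distance(*ind))
            if branch_distance(*pop[0]) == 0:
                return pop[0]                    # target branch covered
            parents = pop[:pop_size // 2]        # truncation selection
            children = []
            while len(children) < pop_size - len(parents):
                (x1, y1), (x2, y2) = random.sample(parents, 2)   # crossover
                child = [random.choice((x1, x2)), random.choice((y1, y2))]
                if random.random() < 0.2:                        # mutation
                    child[random.randrange(2)] += random.randint(-10, 10)
                children.append(tuple(child))
            pop = parents + children
        return min(pop, key=lambda ind: branch_distance(*ind))

    print(evolve())   # e.g. (42, 21): an input pair that takes the branch

EvoTest hybridises this kind of search with the techniques listed above (slicing, program transformation) to make such fitness landscapes easier to search.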

Page 5: TAROT summerschool slides 2013 - Italy

FITTEST
• Future Internet Testing
• September 2010 – December 2013
• Total costs: 5,845,000 euros
• Partners:
  – Universidad Politecnica de Valencia (Spain)
  – University College London (United Kingdom)
  – Berner & Mattner (Germany)
  – IBM (Israel)
  – Fondazione Bruno Kessler (Italy)
  – Universiteit Utrecht (The Netherlands)
  – Softeam (France)
• http://www.pros.upv.es/fittest/

Page 6: TAROT summerschool slides 2013 - Italy

FITTEST objectives/results

• Future Internet applications
  – Characterized by an extremely high level of dynamism
  – Adaptation to the usage context (context awareness)
  – Dynamic discovery and composition of services
  – Etc.
• Testing these applications becomes extremely important
  – Society depends more and more on them
  – They support critical activities such as social services, learning, finance and business
• Traditional testing is not enough
  – Testwares are fixed
• Continuous testing is needed
  – Testwares that automatically adapt to the dynamic behavior of the Future Internet application
  – This is the objective of FITTEST

Page 7: TAROT summerschool slides 2013 - Italy

[Diagram: the FITTEST continuous testing cycle. LOGGING: instrument the SUT, collect & preprocess logs. TEST-WARE GENERATION: analyse the logs to infer models and oracles (model-based, properties-based, pattern-based and log-based oracles, plus free and human oracles), with optional manual extensions by domain experts and domain input specifications. TEST EXECUTION: generate, automate and execute test cases. TEST EVALUATION: evaluate test cases against the oracles to produce test results.]

How does it work?

LOGGING

1. Run the target system under test (SUT).
2. Collect the logs it generates (see the sketch below). This can be done by:
   • real usage by end users of the application in the production environment;
   • test case execution in the test environment.
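As a concrete illustration of this step, here is a minimal sketch (ours, not the FITTEST logging component; event names and format are made up) of the kind of instrumentation it relies on: the SUT emits one structured record per user event, which the later analysis steps can parse.

    import json, logging, time

    logging.basicConfig(filename="sut_events.log", level=logging.INFO,
                        format="%(message)s")

    def log_event(event, **payload):
        # Append one structured record per SUT event (hypothetical format).
        logging.info(json.dumps({"ts": time.time(),
                                 "event": event,
                                 "payload": payload}))

    # Events produced while a user (or a test case) exercises the SUT:
    log_event("login", user="u42")
    log_event("search", query="red sofa")
    log_event("add_to_cart", item_id=1138)
    log_event("logout", user="u42")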

Page 8: TAROT summerschool slides 2013 - Italy

How does it work?

[Diagram repeated: the FITTEST continuous testing cycle, as on Page 7.]

GENERATION

1. Analyse the logs.
2. Generate different testwares:
   • Models
   • Domain Input Specifications
   • Oracles
3. Use these to generate and automate a test suite consisting of (see the sketch below):
   • Abstract test cases
   • Concrete test cases
   • Pass/fail evaluation criteria
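To make step 3 concrete, here is a small sketch (ours; the FITTEST tool chain is more elaborate) of turning one abstract test case, a path of events from an inferred model, into concrete test cases by filling in values from a hypothetical Domain Input Specification.

    import itertools

    # Abstract test case: a path of events from the inferred model.
    abstract_test = ["login", "search", "add_to_cart", "logout"]

    # Domain Input Specification: candidate concrete values per event
    # (illustrative content only).
    dis = {
        "login":       {"user": ["u42", "guest"]},
        "search":      {"query": ["red sofa", ""]},
        "add_to_cart": {"item_id": [1138, 999999]},   # valid and boundary id
        "logout":      {},
    }

    def concretize(path, dis):
        # Yield concrete test cases: one (event, parameters) sequence per
        # combination of DIS values along the path.
        options = []
        for event in path:
            params = dis.get(event, {})
            keys = sorted(params)
            combos = list(itertools.product(*(params[k] for k in keys)))
            options.append([(event, dict(zip(keys, c))) for c in combos])
        for combo in itertools.product(*options):
            yield list(combo)

    for test_case in concretize(abstract_test, dis):
        print(test_case)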

Page 9: TAROT summerschool slides 2013 - Italy

How does it work?

[Diagram repeated: the FITTEST continuous testing cycle, as on Page 7.]

Execute the test cases and start a new test cycle for continuous testing and adaptation of the testwares!

Page 10: TAROT summerschool slides 2013 - Italy

What does it mean to be an EU project coordinator

• Understand what the project is about, what needs to be done and what is most important.
• Do NOT be afraid to get your hands dirty.
• Do NOT assume people are working as hard on the project as you are (or are as enthusiastic as you ;-)
• Do NOT assume that the people that ARE responsible for some tasks TAKE this responsibility.
• Get CC-ed in all emails and deal with it.
• Stalk people (email, SMS, WhatsApp, Skype, voicemail messages).
• Be patient when explaining the same things over and over again.

Page 11: TAROT summerschool slides 2013 - Italy

EU project structures
• We have: Work Packages (WP)
• These are composed of: Tasks
• These result in: Deliverables

[Diagram: typical project structure, with work packages side by side:
  WP: Project Management
  WP: Exploitation and Dissemination
  WP: Do some research / WP: Do more research / WP: And some more / WP: And more……
  WP: Integrate it all together in a superduper solution that industry needs
  WP: Evaluate Your Results through Case Studies]

Page 12: TAROT summerschool slides 2013 - Italy

EU project: how to evaluate your results

• You need to do studies that evaluate the resulting testing tools/techniques within a real industrial environment.

Page 13: TAROT summerschool slides 2013 - Italy

WE NEED MORE THAN…

Glenford Myers, 1979: the Triangle Problem

Test a program which returns the type of a triangle based on the lengths of its 3 sides.
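For reference, the program under test is tiny; a minimal sketch of it (our own version):

    def triangle_type(a, b, c):
        # Classify a triangle given its three side lengths.
        if a <= 0 or b <= 0 or c <= 0:
            return "invalid"
        if a + b <= c or b + c <= a or a + c <= b:
            return "not a triangle"
        if a == b == c:
            return "equilateral"
        if a == b or b == c or a == c:
            return "isosceles"
        return "scalene"

    assert triangle_type(3, 3, 3) == "equilateral"
    assert triangle_type(3, 4, 5) == "scalene"
    assert triangle_type(1, 2, 3) == "not a triangle"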

For the evaluation of testing tools we need more!

[Figure: a triangle with sides a, b, c and angles α, β, γ.]

Page 14: TAROT summerschool slides 2013 - Italy

WE NEED…

Empirical studies with:
• Real people
• Real faults
• Real systems
• Real testing environments
• Real testing processes

Page 15: TAROT summerschool slides 2013 - Italy

What are empirical studies?
• [Wikipedia] Empirical research is a way of gaining knowledge by means of direct and indirect observation or experience.
• [PPV00] An empirical study is really just a test that compares what we believe to what we observe. Such tests, when wisely constructed and executed, play a fundamental role in software engineering, helping us understand how and why things work.
• Collecting data:
  – Quantitative data -> numeric data
  – Qualitative data -> observations, interviews, opinions, diaries, etc.
• Different kinds are distinguished in the literature [WRH+00, RH09]:
  – Controlled experiments
  – Surveys
  – Case studies
  – Action research

Page 16: TAROT summerschool slides 2013 - Italy

Controlled experiments
What
Experimental investigation of a hypothesis in a laboratory setting, in which conditions are set up to isolate the variables of interest ("independent variables") and test how they affect certain measurable outcomes (the "dependent variables").

Good for
– Quantitative analysis of the benefits of a testing tool or technique
– We can use methods showing statistical significance
– We can demonstrate how scientific we are! [EA06]

Disadvantages
– Limited confidence that the laboratory set-up reflects the real situation
– Ignores contextual factors (e.g. social/organizational/political factors)
– Extremely time-consuming

See: [BSH86, CWH05, PPV00, Ple95]

Page 17: TAROT summerschool slides 2013 - Italy

Surveys
What
Collecting information to describe, compare or explain knowledge, attitudes and behaviour over large populations using interviews or questionnaires.

Good for
– Quantitative and qualitative data
– Investigating the nature of a large population
– Testing theories where there is little control over the variables

Disadvantages
– Difficulties of sampling and selection of participants
– Collected information tends to be subjective opinion

See: [PK01-03]

Page 18: TAROT summerschool slides 2013 - Italy

Case Studies
What
A technique for detailed exploratory investigations that attempt to understand and explain phenomena or test theories within their context.

Good for
– Quantitative and qualitative data
– Investigating the capability of a tool within a specific real context
– Gaining insights into chains of cause and effect
– Testing theories in complex settings where there is little control over the variables (companies!)

Disadvantages
– Hard to find good appropriate case studies
– Hard to quantify findings
– Hard to build generalizations (only context)

See: [RH09, EA06, Fly06]

Page 19: TAROT summerschool slides 2013 - Italy

Action Research
What
Research initiated to solve an immediate problem (or a reflective process of progressive problem solving), involving a process of actively participating in an organization's change situation whilst conducting research.

Good for
– Quantitative and qualitative data
– When the goal is solving a problem
– When the goal of the study is change

Disadvantages
– Hard to quantify findings
– Hard to build generalizations (only context)

See: [RH09, EA06, Fly06, Rob02]

Page 20: TAROT summerschool slides 2013 - Italy

EU project: how to evaluate your results

• You need to do empirical studies that evaluate the resulting testing tools/techniques within a real industrial environment.
• The empirical study that best fits our purposes is the case study:
  – Evaluate the capability of our testing techniques/tools
  – In a real industrial context
  – Comparing to current testing practice

Page 21: TAROT summerschool slides 2013 - Italy

Case studies: not only for justifying research projects

• We need to apply our results in industry to understand the problems industry has and whether we are going in the right direction to solve them.
• There is a real need in the software engineering industry for general guidelines on which testing techniques and tools to use for different testing objectives, and on how usable these techniques are.
• To date, these guidelines do not exist.
• What is missing is a body of documented experiences and knowledge from which the needed guidelines can be extracted.
• With these guidelines, testing practitioners could make informed decisions about which techniques to use and estimate the time/effort that is needed.

Page 22: TAROT summerschool slides 2013 - Italy

The challenge

TO DO THIS WE HAVE TO:
• Perform more evaluative empirical case studies in industrial environments.
• Carry out these studies following the same methodology, to enhance the comparison among the testing techniques and tools.
• Involve realistic systems, environments and subjects (and not toy programs and students, as is the case in most current work).
• Do the studies thoroughly, to ensure that any benefit identified during the evaluation study is clearly derived from the testing technique studied, and also to ensure that different studies can be compared.

FOR THIS WE NEED:
• A general methodological evaluation framework that can simplify the design of case studies for comparing software testing techniques and make the results more precise, reliable, and easy to compare.

The goal: to create a body of evidence consisting of evaluative studies of testing techniques and tools that can be used to understand the needs of industry and derive general guidelines about their usability and applicability in industry.

Page 23: TAROT summerschool slides 2013 - Italy

The idea

Obtain answers to general questions about adopting different software testing techniques and/or tools.

[Diagram: a General Framework for Evaluating Testing Techniques and Tools is instantiated into primary case studies (CS1, CS2, …, CSn) for each testing technique or tool (TT1, TT2, …, TTn). Together these form a Body of Evidence, over which a secondary study (EBSE) answers the general questions.]

Page 24: TAROT summerschool slides 2013 - Italy

Very brief… what is EBSE (Evidence-Based Software Engineering)?

• The essence of the evidence-based paradigm is that of systematically collecting and analysing all of the available empirical data about a given phenomenon in order to obtain a much wider and more complete perspective than would be obtained from an individual study, not least because each study takes place within a particular context and involves a specific set of participants.
• The core tool of the evidence-based paradigm is the Systematic Literature Review (SLR):
  – A secondary study
  – Gathering and analysing primary studies
• See: http://www.dur.ac.uk/ebse/about.php

Page 25: TAROT summerschool slides 2013 - Italy

[Slide repeated: "The idea" diagram from Page 23.]

Page 26: TAROT summerschool slides 2013 - Italy

Empirical research with industry = difficult

• Not "rocket science"-difficult

Page 27: TAROT summerschool slides 2013 - Italy

Empirical research with industry = difficult

• But "communication science"-difficult

Page 28: TAROT summerschool slides 2013 - Italy

"Communication science"-difficult

[Cartoon: a researcher and a practitioner in a company estimate the same tasks very differently ("Only takes 1 hour", "Not so long", "5 minutes").]

Page 29: TAROT summerschool slides 2013 - Italy

Communication is extremely difficult

Page 30: TAROT summerschool slides 2013 - Italy


Page 31: TAROT summerschool slides 2013 - Italy

Examples…

Academia:
• Wants to empirically evaluate T
• What techniques/tools can we compare with?
• Why don't you know that?
• That does not take so much time!
• Finding real faults would be great!
• Can we then inject faults?
• How many people can use it?
• Is there historical data?
• But you do have that information?

Industry:
• Wants to execute and use T to see what happens
• We use intuition!
• You want me to know all that?
• That much time!?
• We cannot give this information.
• Not artificial ones, we really need to know if this would work for real faults.
• We can assign 1 person.
• That is confidential.
• Oh…, I thought you did not need that.

Page 32: TAROT summerschool slides 2013 - Italy

To improve the situation and reduce some of these barriers:

• Use a general methodological framework
  – As a vehicle of communication
  – To simplify the design
  – To make sure that studies can be compared and aggregated

Page 33: TAROT summerschool slides 2013 - Italy

Existing work

• By Lott & Rombach, Eldh, Basili, Do et al., Kitchenham et al.
• They describe organizational frameworks, i.e.:
  – General steps
  – Warnings when designing
  – Confounding factors that should be minimized
• We aimed to define a methodological framework that defines how to evaluate software testing techniques, i.e.:
  – The research questions that can be posed
  – The variables that can be measured
  – Etc.

Page 34: TAROT summerschool slides 2013 - Italy

The methodological framework
• Imagine a company C wants to evaluate a testing technique/tool T to see whether it is useful and worthwhile to incorporate in the company.
• Components of the framework (each case study will be an instantiation):
  – Objectives: effectiveness, efficiency, satisfaction
  – Cases or treatments (= the testing techniques/tools)
  – The subjects (= practitioners that will do the study)
  – The objects or pilot projects: selection criteria
  – The variables and metrics: which data to collect?
  – Protocol that defines how to execute and collect data
  – How to analyse the data
  – Threats to validity
  – Toolbox

Page 35: TAROT summerschool slides 2013 - Italy

Definition of the framework – research questions

• RQ1: How does T contribute to the effectiveness of testing when it is being used in real testing environments of C and compared to the current practices of C?
• RQ2: How does T contribute to the efficiency of testing when it is being used in real testing environments of C and compared to the current practices of C?
• RQ3: How satisfied (subjective satisfaction) are testing practitioners of C during the learning, installing, configuring and usage of T when used in real testing environments?

Page 36: TAROT summerschool slides 2013 - Italy

Definition of the framework – the cases

• Reused a taxonomy from Vegas and Basili, adapted to software testing tools and augmented with results from Tonella.
• The case, i.e. the testing technique or tool, should be described by:
  – Prerequisites: type, life-cycle, environment (platform and languages), scalability, input, knowledge needed, experience needed
  – Results: output, completeness, effectiveness, defect types, number of generated test cases
  – Operation: interaction modes, user guidance, maturity, etc.
  – Obtaining the tool: license, cost, support

Page 37: TAROT summerschool slides 2013 - Italy

Definition of the framework – subjects

• Workers of C that normally use the techniques and tools with which T is being compared.

Page 38: TAROT summerschool slides 2013 - Italy

Definition of the framework – the objects/pilot projects

• In order to see how we can compare and what type of study we will do, we need answers to the following questions:
  A. Will we have access to a system with known faults? What information is present about these faults?
  B. Are we allowed/able to inject faults into the system?
  C. Does your company gather data from projects as standard practice? What data is this? Can this data be made available for comparison? Do you have a company baseline? Do we have access to a sister project?
  D. Does company C have enough time and resources to execute various rounds of tests? Or, more concretely:
     • Is company C willing to make a new test suite TSna with some technique/tool Ta already used in company C?
     • Is company C willing to make a new test suite TSnn with some technique/tool Tn that is also new to company C?
     • Can we use an existing test suite TSe to compare against? (Do we know the techniques that were used to create that test suite, and how much time it took?)

Page 39: TAROT summerschool slides 2013 - Italy

Definition of the framework – Protocol

[Diagram: decision tree for selecting the study protocol. It starts at "Can we inject faults?" (if yes: training on T, inject faults, apply T to make TST and collect data) and then branches on: "Can we compare with an existing test suite from company C (i.e. TSC)?", "Can company C make another test suite TSN for comparison using Tknown or Tunknown?" (if yes: apply Tknown or Tunknown to make TSN and collect data), "Do we have a company baseline?" and "Do we have a version with known faults?". The leaves are the seven numbered scenarios (1–7) defined on the next slide.]
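Read as code, the decision tree might look like the following sketch; the branching is our reading of the diagram, and the mapping to scenarios 1–7 follows the scenario definitions on the next slide.

    def choose_scenario(existing_suite, new_suite_possible,
                        baseline, known_faults):
        # Our reconstruction of the protocol decision tree; fault injection
        # counts as having a version with known faults.
        if existing_suite:                  # compare T against TSe/TSC
            return 5 if known_faults else 4
        if new_suite_possible:              # build TSN with Tknown/Tunknown
            return 7 if known_faults else 6
        if baseline:                        # quantitative company baseline
            return 3 if known_faults else 2
        return 3 if known_faults else 1     # otherwise qualitative only

    print(choose_scenario(existing_suite=True, new_suite_possible=False,
                          baseline=False, known_faults=False))   # -> 4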

Page 40: TAROT summerschool slides 2013 - Italy

Definition of the framework – Scenarios
• Remember:
  – Does company C have enough time and resources to execute various rounds of tests? Or, more concretely:
    • Is company C willing to make a new test suite TSna with some technique/tool Ta already used in company C?
    • Is company C willing to make a new test suite TSnn with some technique/tool Tn that is also new to company C?
    • Can we use an existing test suite TSe to compare against? (Do we know the techniques that were used to create that test suite, and how much time it took?)
• Scenario 1: qualitative assessment only (Qualitative Effects Analysis)
• Scenario 2: Scenario 1 /\ quantitative analysis based on a company baseline
• Scenario 3: (Scenario 1 \/ Scenario 2) /\ quantitative analysis of the fault detection rate (FDR)
• Scenario 4: (Scenario 1 \/ Scenario 2) /\ quantitative comparison of T and TSe
• Scenario 5: Scenario 4 /\ FDR of T and TSe
• Scenario 6: (Scenario 1 \/ Scenario 2) /\ quantitative comparison of T and (Ta or Tn)
• Scenario 7: Scenario 6 /\ FDR of T and (Ta or Tn)

Page 41: TAROT summerschool slides 2013 - Italy

If we can inject faults, take care that:

1. The artificially seeded faults are similar to real faults that naturally occur in real programs due to mistakes made by developers.
   – To identify realistic fault types, a history-based approach can be used: "real" faults can be fetched from the bug tracking system, making sure that these reported faults are a good representative of faults introduced by developers during implementation.
2. The faults should be injected in code that is covered by an adequate number of test cases,
   – e.g. they may be seeded in code that is executed by more than 20 percent and less than 80 percent of the test cases (see the sketch below).
3. The faults should be injected "fairly", i.e. an adequate number of instances of each fault type is seeded.
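Criterion 2 is easy to operationalise once per-line coverage counts are available (e.g. from a coverage tool); a minimal sketch, with names of our own choosing:

    def injectable_lines(coverage, n_tests, lo=0.2, hi=0.8):
        # coverage: {line_number: number_of_test_cases_executing_it}.
        # Keep lines executed by more than 20% and less than 80% of the
        # suite, the band suggested above for seeding faults.
        return [line for line, hits in sorted(coverage.items())
                if lo < hits / n_tests < hi]

    # Toy data for a suite of 10 test cases:
    cov = {3: 1, 7: 5, 12: 9, 21: 4, 30: 10}
    print(injectable_lines(cov, n_tests=10))   # -> [7, 21]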

Page 42: TAROT summerschool slides 2013 - Italy

Definition of the framework – data to collect

• Effectiveness
  – Number of test cases designed or generated
  – How many invalid test cases are generated
  – How many repeated test cases are generated
  – Number of failures observed
  – Number of faults found
  – Number of false positives (the test is marked as Failed when the functionality is working)
  – Number of false negatives (the test is marked as Passed when the functionality is not working)
  – Type and cause of the faults that were found
  – Estimation (or, when possible, measurement) of the coverage reached
• Efficiency
• Subjective satisfaction

Page 43: TAROT summerschool slides 2013 - Italy

Definition of the framework – data to collect

• Effectiveness
• Efficiency
  – Time needed to learn the testing method
  – Time needed to design or generate the test cases
  – Time needed to set up the testing infrastructure (install, configure, develop test drivers, etc.) (quantitative). (Note: if software is to be developed, or other major configuration/installation efforts are expected, it might be a good idea to maintain working diaries.)
  – Time needed to test the system and observe the failures (i.e. planning, implementation and execution), in hours (quantitative)
  – Time needed to identify the fault type and cause for each observed failure (i.e. time to isolate) (quantitative)
• Subjective satisfaction

Page 44: TAROT summerschool slides 2013 - Italy

Definition of the framework – data to collect

• Effectiveness
• Efficiency
• Subjective satisfaction: a mixture of methods
  – SUS score (10-question questionnaire with 5-point Likert scales and a total score)
  – 5 reactions (through reaction cards) that will be used to create a word cloud and Venn diagrams
  – Emotional face reactions during semi-structured interviews (faces will be evaluated on a 5-point Likert scale from "not at all like this" to "very much like this")
  – Subjective opinions about the tool

Page 45: TAROT summerschool slides 2013 - Italy

System Usability Scale (SUS)
(© Digital Equipment Corporation, 1986; www.usability.serco.com/trump/documents/Suschapt.doc)

Items rated from "Strongly disagree" to "Strongly agree":
1. I think that I would like to use this system frequently
2. I found the system unnecessarily complex
3. I thought the system was easy to use
4. I think that I would need the support of a technical person to be able to use this system
5. I found the various functions in this system were well integrated
6. I thought there was too much inconsistency in this system
7. I would imagine that most people would learn to use this system very quickly
8. I found the system very cumbersome to use
9. I felt very confident using the system
10. I needed to learn a lot of things before I could get going with this system

Page 46: TAROT summerschool slides 2013 - Italy

Why SUS?
• Studies (e.g. [TS04, BKM08]) have shown that this simple questionnaire gives the most reliable results.
• SUS is technology agnostic, making it flexible enough to assess a wide range of interface technologies.
• The survey is relatively quick and easy to use by both study participants and administrators.
• The survey provides a single score on a scale that is easily understood by the wide range of people (from project managers to computer programmers) who are typically involved in the development of products and services and who may have little or no experience in human factors and usability.
• The survey is nonproprietary, making it a cost-effective tool as well.

Page 47: TAROT summerschool slides 2013 - Italy

SUS Scoring

• SUS yields a single number representing a composite measure of the overall usability of the system being studied. Note that scores for individual items are not meaningful on their own.
• To calculate the SUS score, first sum the score contributions from each item:
  – Each item's score contribution will range from 0 to 4.
  – For items 1, 3, 5, 7 and 9, the score contribution is the scale position minus 1.
  – For items 2, 4, 6, 8 and 10, the contribution is 5 minus the scale position.
  – Multiply the sum of the scores by 2.5 to obtain the overall value of SUS (see the sketch below).
• SUS scores have a range of 0 to 100.
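The scoring rule is mechanical enough to write down directly; a minimal sketch:

    def sus_score(responses):
        # responses: the 10 answers in SUS item order, each 1..5.
        # Odd-numbered items contribute (answer - 1); even-numbered items
        # contribute (5 - answer); the sum is scaled by 2.5 to 0..100.
        assert len(responses) == 10 and all(1 <= r <= 5 for r in responses)
        total = sum((r - 1) if i % 2 == 0 else (5 - r)
                    for i, r in enumerate(responses))
        return total * 2.5

    print(sus_score([3] * 10))   # all-neutral answers -> 50.0, the midpoint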

Page 48: TAROT summerschool slides 2013 - Italy

SUS is not enough…

• In a literature review of 180 published usability studies, Hornbæk [Horn06] concludes that measures of satisfaction should be extended beyond questionnaires.
• So we add more…

Page 49: TAROT summerschool slides 2013 - Italy

Reaction Cards

Developed by and © 2002 Microsoft Corporation. All rights reserved.

Page 50: TAROT summerschool slides 2013 - Italy

Emotional face reactions
• The idea is to elicit feedback about the product, particularly emotions that arose for the participants while talking about the product (e.g. frustration, happiness).
• We will videotape the users when they respond to the following two questions during a semi-structured interview:
  – Would you recommend this tool to other colleagues?
    • If not, why?
    • If yes, what arguments would you use?
  – Do you think you can persuade your management to invest in a tool like this?
    • If not, why?
    • If yes, what arguments would you use?

[Rating scale used: 1–7, from "Not at all like this" to "Very much like this".]

Page 51: TAROT summerschool slides 2013 - Italy

Analysing and interpreting the data
• Depends on the amount of data we have.
• If we only have 1 value for each variable, no analysis techniques are available and we just present and interpret the data.
• If we have sets of values for a variable, then we need to use statistical methods:
  – Descriptive statistics
  – Statistical tests (or significance testing). In statistics a result is called statistically significant if it has been predicted as unlikely to have occurred by chance alone, according to a pre-determined threshold probability, the significance level (see the sketch below).
• Evaluating SBST tools, we will always have sets of values for most of the variables, to deal with randomness.
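For example, when comparing the coverage achieved by two randomised SBST tools over repeated runs, a non-parametric test is a common choice; a minimal sketch using SciPy (the numbers are made up for illustration):

    from scipy.stats import mannwhitneyu

    # Branch coverage (%) over 10 runs of tool T and of a baseline tool.
    tool_T   = [81, 79, 85, 83, 80, 84, 82, 86, 78, 83]
    baseline = [74, 76, 73, 77, 75, 72, 78, 74, 76, 75]

    # Mann-Whitney U makes no normality assumption, which suits small
    # samples of stochastic runs.
    stat, p = mannwhitneyu(tool_T, baseline, alternative="two-sided")
    print(f"U={stat}, p={p:.4f}")   # p below the chosen level => significant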

Page 52: TAROT summerschool slides 2013 - Italy

Descriptive statistics
• Mean, median, mode, standard deviation, frequency, correlation, etc.
• Graphical visualisation: scatter plots, box plots, histograms, pie charts

Page 53: TAROT summerschool slides 2013 - Italy

Statistical tests

Most important is finding the right method.

[Figure: decision chart for choosing a statistical test; source: http://science.leidenuniv.nl/index.php/ibl/pep/people/Tom_de_Jong/Teaching]

Page 54: TAROT summerschool slides 2013 - Italy

Definition of the framework – Threats

• Threats to validity (of confounding factors) have to be minimized.
• These are the effects or situations that might jeopardize the validity of your results…
• Those that cannot be prevented have to be reported.
• When working with people we have to consider many sociological effects. Let us just look at a couple of well-known ones to give you an idea:
  – The learning curve effect
  – The Hawthorne effect
  – The placebo effect
  – Etc.

Page 55: TAROT summerschool slides 2013 - Italy

The learning curve effect
• When using new methods/tools, people gain familiarity with their application over time (= learning curve).
  – Initially they are likely to use them more ineffectively than they might after a period of familiarisation/learning.
• Thus the learning curve effect will tend to counteract any positive effects inherent in the new method/tool.
• In the context of evaluating methods/tools there are two basic strategies to minimise the learning curve effect (not mutually exclusive):
  1. Provide appropriate training before undertaking an evaluation exercise;
  2. Separate pilot projects aimed at gaining experience of using a method/tool from pilot projects that are part of an evaluation exercise.

Page 56: TAROT summerschool slides 2013 - Italy

The Hawthorne effect
• When an evaluation is performed, staff working on the pilot project(s) may have the perception that they are working under more management scrutiny than normal and may therefore work more conscientiously.
• The name comes from the Hawthorne factory experiments (lights low or high?).
• The Hawthorne effect would tend to exaggerate positive effects inherent in a new method/tool.
• A strategy to minimise the Hawthorne effect is to ensure that a similar level of management scrutiny is applied to the control projects in your case study (i.e. project(s) using the current method/tool) as is applied to the projects that are using the new method/tool.

Page 57: TAROT summerschool slides 2013 - Italy

The placebo effect
• In medical research, patients who are deliberately given ineffectual treatments recover if they believe that the treatment will cure them.
• Likewise, a software engineer who believes that adopting some practice (e.g. wearing a pink t-shirt) will improve the reliability of his code may succeed in producing more reliable code.
• Such a result could not be generalised.
• In medicine, the placebo effect is minimized by not informing the subjects.
• This cannot be done in the context of testing tool evaluations.
• When evaluating methods and tools, the best you can do is to:
  – Assign staff to pilot projects using your normal project staffing methods and hope that the actual selection of staff includes the normal mix of enthusiasts, cynics and no-hopers that normally comprise your project teams.
  – Make a special effort to avoid staffing pilot projects with staff who have a vested interest in the method/tool (i.e. staff who developed or championed it) or a vested interest in seeing it fail (i.e. staff who really hate change rather than just resent it).
• This is a bit like selecting a jury. Initially the selection of potential jurors is at random, but there is additional screening to avoid jurors with identifiable bias.

Page 58: TAROT summerschool slides 2013 - Italy

Definition of the framework – Threats

• There are many more factors that all need to be identified.
• Not only the people but also the technology:
  – Did the measurement tools (e.g. for coverage) really measure what we thought?
  – Are the injected faults really representative?
  – Were the faults injected fairly?
  – Are the pilot project and software representative?
  – Were the used oracles reliable?
  – Etc.

Page 59: TAROT summerschool slides 2013 - Italy

Definition of the framework – Toolbox

• Demographic questionnaire: answered by the testers before performing the test; it aims to capture the testers' characteristics: level of experience in using the tool, years, job, knowledge of similar tools, …
• Satisfaction questionnaire (SUS): to extract the testers' satisfaction when they perform the evaluation.
• Reaction cards
• Questions for (taped) semi-structured interviews
• Process for investigating the face reactions in videos
• A fault taxonomy to classify the found faults
• A software testing technique and tool classification/taxonomy
• Working diaries
• A fault template to describe each fault detected by means of the tests. The template contains the following information:
  – Time spent to detect the fault
  – Test case that found the fault
  – Cause of the fault: mistake in the implementation, mistake in the design, mistake in the analysis
  – Manifestation of the fault in the code

Page 60: TAROT summerschool slides 2013 - Italy

Applying or instantiating the framework

• Search-Based Structural Testing tool [Vos et al. 2012]
• Search-Based Functional Testing tool [Vos et al. 2013]
• Web testing techniques for AJAX applications [MRT08]
• Commercial combinatorial testing tool at Sulake (to be presented at an ISSTA workshop next week)
• Automated Test Case Generation at IBM (has been sent to ESEM)
• We have finished other instances:
  – Commercial combinatorial testing tool at SOFTEAM (has been sent to ESEM)
• Currently we are working on more instantiations:
  – Regression testing prioritization technique
  – Continuous testing tool at SOFTEAM
  – Rogue User Testing at SOFTEAM
  – …

Page 61: TAROT summerschool slides 2013 - Italy

Combinatorial Testing example case study Sulake: Context

• Sulake is a Finnish company
• Develops social entertainment games
• Main product: Habbo Hotel
  – World's largest virtual community for teenagers
  – Millions of teenagers a week all over the world (ages 13–18)
  – Access directly through the browser or Facebook
  – Available in 11 languages
  – 218,000,000 registered users
  – 11,000,000 visitors/month
• The system can be accessed through a wide variety of browsers and Flash players (and their versions) that run on different operating systems!
• Which combinations to use when testing the system?!

Page 62: TAROT summerschool slides 2013 - Italy

Can a tool help?
• What about the CTE XL Professional, or CTE for short?
• Classification Tree Editor:
  – Model your combinatorial problem in a tree
  – Indicate the priorities of the combinations
  – The tool automatically generates the best test cases (see the pairwise sketch below)
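The underlying combinatorial idea is pairwise (all-pairs) coverage: rather than testing every browser/OS/Flash combination, cover every pair of parameter values at least once. A minimal greedy sketch (ours, not the CTE algorithm; the parameter values are made up):

    from itertools import combinations, product

    params = {
        "browser": ["Firefox", "Chrome", "IE"],
        "os":      ["Windows", "macOS", "Linux"],
        "flash":   ["10", "11"],
    }

    def all_pairs(params):
        # Greedily pick whole configurations until every value pair across
        # two different parameters is covered at least once.
        names = sorted(params)
        uncovered = {((a, va), (b, vb))
                     for a, b in combinations(names, 2)
                     for va in params[a] for vb in params[b]}
        tests = []
        while uncovered:
            best = max(product(*(params[n] for n in names)),
                       key=lambda cfg: len(uncovered & set(
                           combinations(list(zip(names, cfg)), 2))))
            tests.append(dict(zip(names, best)))
            uncovered -= set(combinations(list(zip(names, best)), 2))
        return tests

    suite = all_pairs(params)
    print(len(suite), "configurations instead of", 3 * 3 * 2)   # e.g. 9 vs 18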

Page 63: TAROT summerschool slides 2013 - Italy

Combinatorial Testing example case study Sulake: Research Questions

• What do we want to find out?
• Research questions:
  – RQ1: Compared to the current test suites used for testing at Sulake, can the test cases generated by the CTE contribute to the effectiveness of testing when it is used in real testing environments at Sulake?
  – RQ2: How much effort would be required to introduce the CTE into the testing processes currently implanted at Sulake?
  – RQ3: How much effort would be required to add the generated test cases into the testing infrastructure currently used at Sulake?
  – RQ4: How satisfied are Sulake testing practitioners during the learning, installing, configuring and usage of the CTE when it is used in a real testing environment?

Page 64: TAROT summerschool slides 2013 - Italy

The testing tools evaluated (the cases or treatments)

• Current combinatorial testing practice at Sulake:
  – Exploratory testing with feature coverage as the objective
  – Based on real user information (i.e. browsers, OS, Flash, etc.), combinatorial aspects are taken into account
COMPARED TO
• Classification Tree Editor (CTE):
  – Classify the combinatorial aspects as a classification tree
  – Generate prioritized test cases and select

Page 65: TAROT summerschool slides 2013 - Italy

Who is doing the study (the subjects)

• Subjects: 1 senior tester from Sulake (6 years of software development, 8 years of testing experience)

Page 66: TAROT summerschool slides 2013 - Italy

Systems Under Test (objects or pilot projects)

• Objects:
  – 2 nightly builds of Habbo
  – The existing test suite that Sulake uses (TSsulake), with 42 automated test cases
  – No known faults, no injection of faults

Page 67: TAROT summerschool slides 2013 - Italy

The protocol (scenario 4)

[Diagram repeated: the protocol decision tree from Page 39. For this study the answers (an existing suite TSsulake to compare against, no fault injection, no known faults) lead to scenario 4.]

Page 68: TAROT summerschool slides 2013 - Italy

Collected data

Variables                                         TSsulake   TSCTE
Measuring effectiveness:
  Number of test cases                            42         68 (selected 42 high-priority)
  Number of invalid test cases                    0          0
  Number of repeated test cases                   0          26
  Number of failures observed                     0          12
  Number of faults found                          0          2
  Type and cause of the faults                    N/A        1. critical, browser hang; 2. minor, broken UI element
  Feature coverage reached                        100%       40%
  All-pairs coverage reached                      N/A        80%
Measuring efficiency:
  Time to learn the CTE testing method            N/A        116 min
  Time to design and generate the CTE suite       N/A        95 min (62 for tree, 33 for removing duplicates)
  Time to set up CTE-specific infrastructure      N/A        74 min
  Time to automate the CTE-generated suite        N/A        357 min
  Time to execute the test suite                  114 min    183 min (both builds)
  Time to identify fault types and causes         0 min      116 min
Measuring subjective satisfaction:
  SUS                                             N/A        50
  Reaction cards                                  N/A        Comprehensive, Dated, Old, Sterile, Unattractive
  Informal interview                              N/A        video

Page 69: TAROT summerschool slides 2013 - Italy

Descriptive statistics for efficiency (1) [chart]

Page 70: TAROT summerschool slides 2013 - Italy

Descriptive statistics for efficiency (2) [chart]

Page 71: TAROT summerschool slides 2013 - Italy

Emotional face reactions

[Figure: four video stills of the tester's reactions, on the topics: duplicate test cases, readability of the CTE trees, technical support and user manuals, and the appearance of the tool.]

Page 72: TAROT summerschool slides 2013 - Italy

Combinatorial Testing example case study Sulake: Conclusions

• RQ1: Compared to the current test suites used for testing at Sulake, can the test cases generated by the CTE contribute to the effectiveness of testing when it is used in real testing environments at Sulake?
  – 2 new faults were found!!
  – The need for more structured combinatorial testing was confirmed by Sulake.
• RQ2: How much effort would be required to introduce the CTE into the testing processes currently implanted at Sulake?
  – The effort for learning and installing is medium, but can be justified within Sulake since it is needed only once.
  – Designing and generating test cases suffers from duplicates that cost time to remove. The total effort can be accepted within Sulake because critical faults were discovered.
  – Executing the test suite generated by the CTE takes 1 hour more than the Sulake test suite (due to more combinatorial aspects being tested). This means that Sulake cannot include these tests in the daily build, but will have to add them to nightly builds only.
• RQ3: How much effort would be required to add the generated test cases into the testing infrastructure currently used at Sulake?
  – There is effort for automating the generated test cases, but it can be justified within Sulake.
• RQ4: How satisfied are Sulake testing practitioners during the learning, installing, configuring and usage of the CTE when it is used in a real testing environment?
  – The CTE seems to have everything that is needed, but looks unattractive.

Page 73: TAROT summerschool slides 2013 - Italy

Automated Test Case Generation example case study IBM: Context

• IBM Research Labs in Haifa, Israel
• They develop a system (denoted IMP ;-) for resource management in a networked environment (servers, virtual machines, switches, storage devices, etc.)
• The IBM Lab has a designated team that is responsible for testing new versions of the product.
• The testing is done within a simulated testing environment developed by this testing team.
• The IBM Lab is interested in evaluating the Automated Test Case Generation tools developed in the FITTEST project!

Page 74: TAROT summerschool slides 2013 - Italy

Automated Test Case Generation example case study IBM: the research questions

• What do we want to find out?
• Research questions:
  – RQ1: Compared to the current test suite used for testing at IBM Research, can the FITTEST tools contribute to the effectiveness of testing when they are used in real testing environments at IBM?
  – RQ2: Compared to the current test suite used for testing at IBM Research, can the FITTEST tools contribute to the efficiency of testing when they are used in real testing environments at IBM?
  – RQ3: How much effort would be required to deploy the FITTEST tools within the testing processes currently implanted at IBM Research?

Page 75: TAROT summerschool slides 2013 - Italy

The testing tools evaluated (the cases or treatments)

• Current test case design practice at IBM:
  – Exploratory test case design
  – The objective of the test cases is to maximise the coverage of the system use cases
COMPARED TO
• FITTEST Automated Test Case Generation tools:
  – Only part of the whole continuous testing approach of FITTEST

Page 76: TAROT summerschool slides 2013 - Italy

FITTEST Automated Test Case Generation

Logs2FSM
• Infers FSM models from the logs (see the sketch below).
• Applies an event-based model inference approach (read more in [MTR08]).
• The model-based oracles that also result from this tool refer to the use of the paths generated from the inferred FSM as oracles: if these paths, when transformed to test cases, cannot be fully executed, then the tester needs to inspect the failing paths to see whether that is due to some fault, or whether the paths themselves are infeasible.

FSM2Tests
• Takes FSMs and a Domain Input Specification (DIS) file, created by a tester for the IBM Research SUT, to generate concrete test cases.
• This component implements a technique that combines model-based and combinatorial testing (see [NMT12]):
  1. Generate test paths from the FSM (using various simple and advanced graph-visit algorithms);
  2. Transform these paths into classification trees in the CTE XL format, enriched with the DIS (data types and partitions);
  3. Generate test combinations from those trees using combinatorial criteria.
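To give a feel for Logs2FSM's first step, here is a minimal sketch (ours, far simpler than the real event-based inference of [MTR08]): treat each event as a state and add a transition for every consecutive event pair observed in the logs.

    from collections import defaultdict

    # Event traces extracted from the preprocessed logs (toy data).
    traces = [
        ["start", "login", "list_vms", "logout"],
        ["start", "login", "create_vm", "list_vms", "logout"],
        ["start", "login", "list_vms", "create_vm", "logout"],
    ]

    def infer_fsm(traces):
        # 1-history inference: state = last seen event; edge a -> b for
        # every consecutive pair (a, b) in some trace.
        fsm = defaultdict(set)
        for trace in traces:
            for a, b in zip(trace, trace[1:]):
                fsm[a].add(b)
        return fsm

    for state, successors in sorted(infer_fsm(traces).items()):
        print(state, "->", sorted(successors))
    # Paths through this FSM become abstract test cases; whether they can
    # be executed to completion is the model-based oracle.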

Page 77: TAROT summerschool slides 2013 - Italy

Who is doing the study (the subjects)

• Subjects:
  – 1 senior tester from IBM (10 years of software development, 5 years of testing experience, of which 4 years with the IMP system)
  – 1 researcher from FBK (10 years of experience with software development, 5 years of experience with research in testing)

Page 78: TAROT summerschool slides 2013 - Italy

Pilot project

• Objects:
  – SUT: a distributed application for managing system resources in a networked environment
    • A management server that communicates with multiple managed clients (= physical or virtual resources)
    • An important product for IBM with real customers
    • The case study will be performed on a new version of this system
  – The existing test suite that IBM uses (TSibm)
    • Selected from what they call System Validation Tests (SVT) -> tests for high-level complex customer use cases
    • Manually designed
    • Automatically executed through activation scripts
  – 10 representative faults to inject into the system

Page 79: TAROT summerschool slides 2013 - Italy

The protocol (scenario 5)

[Diagram repeated: the protocol decision tree from Page 39. For this study the answers (an existing suite TSibm to compare against, plus injected known faults) lead to scenario 5.]

Page 80: TAROT summerschool slides 2013 - Italy

More detailed steps
1. Configure the simulated environment and create the logs [IBM]
2. Select test suite TSibm [IBM]
3. Write activation scripts for each test case in TSibm [IBM]
4. Generate TSfittest [FBK subject]:
   a. Instantiate the FITTEST components for the IMP
   b. Generate the FSM with Logs2FSM
   c. Define the Domain Input Specification (DIS)
   d. Generate the concrete test data with FSM2Tests
5. Select and inject the faults [IBM]
6. Develop a tool that transforms the concrete test cases generated by the FITTEST tool FSM2Tests into an executable format [IBM]
7. Execute TSibm [IBM]
8. Execute TSfittest [IBM]

Page 81: TAROT summerschool slides 2013 - Italy

Collected Measures

Table IV: Descriptive measures for the test suites TSibm and TSfittest

Size:
  TSibm: 4 concrete test cases, 1814 commands (events); abstract test cases: N/A
  TSfittest: 84 abstract test cases, 3054 concrete test cases, 18520 commands (events)
Construction:
  TSibm: test cases designed manually (cf. Section III-B1 of the paper)
  TSfittest: test cases generated automatically (cf. Section III-B2)
Effort to create the test suite:
  TSibm: design, 5 hours; activation scripts, 4 hours
  TSfittest: set up FITTEST tools, 8 hours; generate the FSM: automated, less than 1 second of CPU time; specify the DIS, 2 hours; generate concrete tests: automated, less than 1 minute of CPU time; transform into executable format, 20 hours

what the researcher have in mind and what is investigatedaccording to the research questions. This type of threat ismainly related to the use of injected faults to measure the fault-finding capability of our testing strategies. This is because thetypes of faults seeded may not be enough representative of realfaults. In order to mitigate this threat, the IBM team identifiedrepresentative faults that were based on real faults, identifiedin earlier time of the development. This identification althoughwas realized by a senior tester, the list was revised by all IBMteam that participated in this case study.

V. CONCLUSIONS

We have presented a “which is better” [9] case study forevaluating FITTEST testing tools with a real user and real taskswithin a realistic industrial environment of IBM Research.The design of the case study has been done according to themethodological framework for defining case studies presentedin [14]. Although a one-subject case study will never providegeneral conclusions with statistical significance, the obtainedresults can be generalized to other similar software in similartesting environments of IBM Research [15], [8]. Moreover, thestudy was very useful for technology transfer purposes: someremarks during the study indicate that the FITTEST techniqueswould not have been evaluated in so much depth if it wouldnot have been backed up by our case study design. Finally,having only limited number of subjects available, this studytook several weeks to complete and hence we overcame theproblem of getting too much information too late.

The objective of this research was to examine the advancements of the FITTEST tools and to validate their potential to improve current testing practices at IBM Research. The case study produced the following results:

• The FITTEST tools can increase the effectiveness of the current practice of the IBM Research team for testing the IMP within the simulated environment.

• The efficiency of the FITTEST tools is found acceptable by IBM Research for testing the IMP within the simulated environment.

• The test cases automatically generated by the FITTEST tools better support identifying the source of the faults when testing the IMP within the simulated environment.

• The effort for deploying the FITTEST tools within a real industrial case has been found reasonable by IBM Research.

Moreover, from the FITTEST project's point of view we have the following results:

• The FITTEST tools have been shown to be useful within the context of a real industrial case.

• The FITTEST tools have the ability to automate the testing process within a real industrial case.

ACKNOWLEDGMENT

This work was financed by the FITTEST project, ICT-2009.1.2 no. 257574. We would also like to thank Alon Aradi for his great help in conducting the experiments.

REFERENCES

[1] http://pic.dhe.ibm.com/infocenter/director/pubs/index.jsp?topic=%2Fcom.ibm.director.vim.helps.doc%2Ffsd0 vim main.html.

[2] B. Beizer. Software Testing Techniques. International Thomson Computer Press, 1990.

[3] A. Biermann and J. Feldman. On the synthesis of finite-state machines from samples of their behavior. IEEE Trans. on Computers, 21(6), 1972.

[4] A. C. Dias Neto, R. Subramanyan, M. Vieira, and G. H. Travassos. A survey on model-based testing approaches: a systematic review. In 1st ACM Int. Workshop on Empirical Assessment of Software Engineering Languages and Technologies (WEASELTech '07), held in conjunction with the 22nd IEEE/ACM ASE 2007, pages 31–36, New York, NY, USA, 2007. ACM.

[5] M. Fewster and D. Graham. Software Test Automation. Addison-Wesley, 1999.

[6] M. Grochtmann and J. Wegener. Test case design using classification trees and the classification-tree editor CTE. In Proceedings of the 8th International Software Quality Week, San Francisco, USA, May 1995.

[7] M. Harman and B. F. Jones. Search-based software engineering. Information and Software Technology, 43(14):833–839, 2001.

[8] W. Harrison. Editorial (N=1: an alternative for software engineering research). Empirical Software Engineering, 2(1):7–10, 1997.

[9] B. Kitchenham, L. Pickard, and S. Pfleeger. Case studies for method and tool evaluation. IEEE Software, 12(4):52–62, July 1995.

[10] G. J. Myers and C. Sandler. The Art of Software Testing. John Wiley & Sons, 2004.

[11] C. D. Nguyen, A. Marchetto, and P. Tonella. Combining model-based and combinatorial testing for effective test case generation. In Proceedings of the 2012 International Symposium on Software Testing and Analysis, pages 100–110. ACM, 2012.

[12] C. Nie and H. Leung. A survey of combinatorial testing. ACM Comput. Surv., 43:11:1–11:29, February 2011.

[13] T. Vos, P. Tonella, J. Wegener, M. Harman, W. Prasetya, and S. Ur. Testing of future internet applications running in the cloud. In S. Tilley and T. Parveen, editors, Software Testing in the Cloud: Perspectives on an Emerging Discipline, pages 305–321. 2013.

[Figure: the FSM inferred from the logs and used to generate TSfittest. The flattened diagram lists a START state plus states S1–S50, with transitions labeled by REST calls against the SUT, e.g. GET /VMControl/virtualAppliances, GET /VMControl/virtualAppliances/{id}, GET /VMControl/virtualAppliances/{id}/progress, GET /VMControl/virtualAppliances/{id}/targets, GET /VMControl/virtualAppliances/{id}/targets/{id}/customization, GET /VMControl/virtualServers/{id}, GET /VMControl/virtualServers/{id}/customization, PUT /VMControl/virtualServers/{id}, GET /VMControl/workloads, GET /VMControl/workloads/{id}, GET /VMControl/workloads/{id}/progress, GET /VMControl/workloads/{id}/virtualServers, POST /VMControl/workloads, POST /VMControl/virtualAppliances, GET /isdsettings/restcompatibilities, and GET /resources/... queries.]
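Each concrete test case in TSfittest is essentially one path through this model, executed as a sequence of REST calls against the SUT. A minimal sketch of such an executor follows; the base URL, the {id} values, and the non-2xx pass/fail oracle are illustrative assumptions (the endpoint paths are taken from the figure):

import requests

BASE = "https://sut.example.com/rest"  # hypothetical base URL for the SUT

def run_test_case(session, steps):
    """Execute one concrete test case: a list of (method, path) pairs
    forming a path through the inferred FSM. Here a non-2xx status is
    treated as a failure; the real oracle may be richer."""
    for method, path in steps:
        response = session.request(method, BASE + path)
        if not 200 <= response.status_code < 300:
            return False, (method, path, response.status_code)
    return True, None

# One path through the FSM; the {id} values (42, 7) are made up and
# would come from the Domain Input Specification in the real pipeline.
test_case = [
    ("GET",  "/VMControl/virtualAppliances"),
    ("GET",  "/VMControl/virtualAppliances/42/progress"),
    ("POST", "/VMControl/workloads"),
    ("GET",  "/VMControl/workloads/7"),
]
with requests.Session() as session:
    ok, failure = run_test_case(session, test_case)
    print("PASS" if ok else f"FAIL at {failure}")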

TABLE II. DESCRIPTIVE MEASURES FOR THE FSM THAT IS USED TO GENERATE TSfittest

Variable                                  Value
Number of traces used to infer FSM        6
Average trace length                      100
Number of nodes in generated FSM          51
Number of transitions in generated FSM    134

RQ1: Compared to the current test suite used for testing at IBM Research, can the FITTEST technologies contribute to the effectiveness of testing when used in the testing environments at IBM Research?

            IF1  IF2  IF3  IF4  IF5  IF6  IF7  IF8  IF9  IF10
TSibm        1    0    0    0    1    1    0    0    1    1
TSfittest    1    1    1    0    0    1    0    1    1    1

Fig. 4. Effectiveness measures for both test suites with respect to the 10 injected faults. "0" means that the corresponding fault was not detected, while "1" means it has been detected.

As can be seen from Table IV, TSibm is substantially smaller than TSfittest on all parameters; this is one of the evident results of automation. However, TSfittest is not only bigger: its effectiveness, measured by injected-fault coverage (see Figure 4), is also significantly higher (70% vs. 50%). Moreover, combining the TSibm and TSfittest suites would increase the effectiveness to 80%. Therefore, within the context of the studied environment, the FITTEST technologies can contribute to the effectiveness of testing at IBM Research, and IBM Research has decided that, to optimize fault-finding capability, the two techniques are best combined.
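These percentages follow directly from the detection matrix in Fig. 4; as a quick check in plain Python (data copied from the figure):

# Detection vectors copied from Fig. 4 (1 = injected fault detected).
ts_ibm     = [1, 0, 0, 0, 1, 1, 0, 0, 1, 1]
ts_fittest = [1, 1, 1, 0, 0, 1, 0, 1, 1, 1]

def coverage(detected):
    return sum(detected) / len(detected)

combined = [max(a, b) for a, b in zip(ts_ibm, ts_fittest)]

print(coverage(ts_ibm))      # 0.5 -> 50%
print(coverage(ts_fittest))  # 0.7 -> 70%
print(coverage(combined))    # 0.8 -> 80%; IF4 and IF7 remain undetected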

RQ2: Compared to the current test suite used for testing at IBM Research, can the FITTEST technologies contribute to the efficiency of testing when used in the testing environments at IBM Research?

TABLE III. EFFICIENCY MEASURES FOR THE EXECUTION OF BOTH TEST SUITES (IN MINUTES).

Variable                                  TSibm    normalized    TSfittest    normalized
                                                   by size                    by size
Execution time with fault injection       36.75    9.18          127.87       1.52
Execution time without fault injection    27.97    6.99          50.72        0.60

It can be seen from Table III that the time to execute TSibm is smaller than the time to execute TSfittest. This is due to the number of concrete tests in TSfittest. When we normalize the execution time by the number of tests in each suite, we see that, per test, the TSfittest execution time is much smaller (1.52 vs. 9.18 minutes with the injected faults, and 0.60 vs. 6.99 minutes without). This is because TSfittest consists of much shorter tests. The execution time is acceptable for IBM Research, considering that the effectiveness of the tests can be improved and more faults can be detected in an efficient way (as discussed under RQ1). Moreover, the shorter tests of which TSfittest is composed can help identify the faults faster.
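The normalized figures in Table III are consistent with dividing each suite's total execution time by its test-case count from Table IV (4 concrete tests for TSibm, 84 abstract tests for TSfittest); this basis is an inference, not stated explicitly in the source. A quick check under that assumption:

# Total execution times in minutes, from Table III.
total_minutes = {
    "with fault injection":    {"TSibm": 36.75, "TSfittest": 127.87},
    "without fault injection": {"TSibm": 27.97, "TSfittest": 50.72},
}
# Assumed normalization basis (inferred from Table IV):
suite_size = {"TSibm": 4, "TSfittest": 84}

for condition, row in total_minutes.items():
    for suite, total in row.items():
        print(f"{suite}, {condition}: {total / suite_size[suite]:.2f} min/test")
# Output: 9.19 (Table III rounds to 9.18), 1.52, 6.99, 0.60 -- matching
# the normalized columns of Table III.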

RQ3: How much effort would be required to deploy the FITTEST technologies within the testing processes in place at IBM Research?

As can be seen from Table IV, setting up the FITTEST components for the SUT and specifying the Domain Input Specification took the FBK subject 10 hours of effort (8 hours for the set-up plus 2 hours for the DIS). Generating the FSM and the concrete test cases was automated by the tools; the total CPU time needed was about one minute on a modest personal computer. Converting the concrete tests produced by the FITTEST tools into executable tests for IBM Research and writing the automated activation scripts took the experienced IBM Research subject about 2.5 days.

The amount of effort needed to deploy and execute the FITTEST tools is found reasonable by IBM Research, considering that these tasks need to be done only once, during deployment. Moreover, since the tools and the format of the tests are new to the team, some learning is required. Once everything has been set up, generating a new FITTEST test suite when new logs become available is fully automatic.

B. Threats to validity

Internal validity. This is of concern when causal relations are examined. In our case study, an internal-validity threat relates to the logs generated by the IBM simulation environment and used to automatically construct the test models, because the quality of the models can be affected by the content of the input logs. We are aware of this threat and have asked IBM for a diverse set of logs. A similar threat is that the quality of the concrete test cases can be affected by the completeness of the Domain Input Specification (DIS) file, since an incomplete specification will weaken the efficiency of TSfittest. This threat might affect the overall number of faults detected by TSfittest; if the specification were improved, that number could be greater. Therefore, the conclusion about the effectiveness of TSfittest remains unchanged. Regarding the subjects involved from IBM, although they had a high level of expertise and experience working in industry as testers, they had no previous knowledge of the FITTEST tools. This threat was reduced by close collaboration between FBK and IBM, complementing their competences in order to avoid possible mistakes or misunderstandings.

External validity. This concerns to what extent it is possible to generalize the findings, and to what extent the findings are of interest to people outside the investigated case. Our results rely on one industrial case study using a given set of artificial faults. Although running such studies is time-consuming, we plan to replicate the study in order to obtain more generalizable conclusions. However, as discussed earlier, the system under test is typical of a broad category of industrial systems that communicate with multiple managed clients and with users of the management system.

Construct validity. This aspect of validity reflects to what extent the operational measures that are studied really represent what the researchers have in mind and what is investigated according to the research questions; the main such threat here, the representativeness of the injected faults, and its mitigation are discussed above.

Page 82: TAROT summerschool slides 2013 - Italy


Collected measures

87


Page 83: TAROT summerschool slides 2013 - Italy


•  RQ1: Compared to the current test suite used for testing at IBM Research, can the FITTEST tools contribute to the effectiveness of testing when used in real testing environments at IBM?

–  TSibm finds 50% of injected faults, TSfittest finds 70%

–  Together they find 80%! -> IBM will consider combining the techniques

•  RQ2: Compared to the current test suite used for testing at IBM Research, can the FITTEST tools contribute to the efficiency of testing when used in real testing environments at IBM?

–  FITTEST test cases execute faster because they are smaller

–  Shorter tests were good for IBM -> easier to identify faults

•  RQ3: How much effort would be required to deploy the FITTEST tools within the testing processes currently in place at IBM Research?

–  Found reasonable by IBM, considering that the manual tasks need to be done only once and more faults were found.

Automated Test Case Generation example case study IBM: Conclusions

Page 84: TAROT summerschool slides 2013 - Italy


Threats to validity

•  The learning curve effect
•  (and basically all the other human factors ;-)
•  Missing information in the logs leads to weak FSMs.
•  Incomplete specification of the DIS leads to weak concrete test cases.
•  The representativeness of the injected faults to real faults.

89

Page 85: TAROT summerschool slides 2013 - Italy


Final Things…

•  As researchers, we should concentrate on the future problems that industry will face. Right?

•  How can we claim to know future needs without understanding current ones?

•  Go to industry and evaluate your results!

90

Page 86: TAROT summerschool slides 2013 - Italy


Need any help? More information? Want to do an instantiation?

Contact: Tanja E. J. Vos
email: [email protected]
skype: tanja_vos
web: http://tanvopol.webs.upv.es/
project: http://www.facebook.com/FITTESTproject
telephone: +34 690 917 971

Page 87: TAROT summerschool slides 2013 - Italy


References: empirical research in software engineering

[BSH86] V. R. Basili, R. W. Selby, and D. H. Hutchens. Experimentation in software engineering. IEEE Trans. Softw. Eng., 12:733–743, July 1986.
[BSL99] Victor R. Basili, Forrest Shull, and Filippo Lanubile. Building knowledge through families of experiments. IEEE Trans. Softw. Eng., 25(4):456–473, 1999.
[CWH05] C. Wohlin, M. Höst, and K. Henningsson. Empirical research methods in software and web engineering. In E. Mendes and N. Mosley, editors, Web Engineering - Theory and Practice of Metrics and Measurement for Web Development, pages 409–429, 2005.
[Fly04] Bent Flyvbjerg. Five misunderstandings about case-study research. In Clive Seale, Giampietro Gobo, Jaber F. Gubrium, and David Silverman, editors, Qualitative Research Practice, pages 420–434. London and Thousand Oaks, CA, 2004.
[KLL97] B. Kitchenham, S. Linkman, and D. Law. DESMET: a methodology for evaluating software engineering methods and tools.
[KPP+02] Barbara A. Kitchenham, Shari Lawrence Pfleeger, Lesley M. Pickard, Peter W. Jones, David C. Hoaglin, Khaled El Emam, and Jarrett Rosenberg. Preliminary guidelines for empirical research in software engineering. IEEE Trans. Softw. Eng., 28(8):721–734, 2002.
[LSS05] Timothy C. Lethbridge, Susan Elliott Sim, and Janice Singer. Studying software engineers: Data collection techniques for software field studies. Empirical Software Engineering, 10:311–341, 2005. 10.1007/s10664-005-1290-x.
[TTDBS07] Paolo Tonella, Marco Torchiano, Bart Du Bois, and Tarja Systä. Empirical studies in reverse engineering: state of the art and future trends. Empirical Softw. Engg., 12(5):551–571, 2007.
[WRH+00] Claes Wohlin, Per Runeson, Martin Höst, Magnus C. Ohlsson, Björn Regnell, and Anders Wesslén. Experimentation in Software Engineering: An Introduction. Kluwer Academic Publishers, Norwell, MA, USA, 2000.
[Rob02] Colin Robson. Real World Research. Blackwell Publishing Limited, 2002.
[RH09] Per Runeson and Martin Höst. Guidelines for conducting and reporting case study research in software engineering. Empirical Softw. Engg., 14(2):131–164, 2009.
[PPV00] Dewayne E. Perry, Adam A. Porter, and Lawrence G. Votta. Empirical studies of software engineering: a roadmap. In Proceedings of the Conference on The Future of Software Engineering (ICSE '00), pages 345–355. ACM, New York, NY, USA, 2000.
[EA06] http://www.cs.toronto.edu/~sme/case-studies/case_study_tutorial_slides.pdf
[Ple95] S. L. Pfleeger. Experimental design and analysis in software engineering. Annals of Software Engineering, 1:219–253, 1995.
[PK01-03] S. L. Pfleeger and B. A. Kitchenham. Principles of survey research. Software Engineering Notes (6 parts), Nov 2001 – Mar 2003.
[Fly06] B. Flyvbjerg. Five misunderstandings about case study research. Qualitative Inquiry, 12(2):219–245, April 2006.
[KPP95] B. Kitchenham, L. Pickard, and S. L. Pfleeger. Case studies for method and tool evaluation. IEEE Software, 12(4):52–62, July 1995.

92

Page 88: TAROT summerschool slides 2013 - Italy


[BS87] Victor R. Basili and Richard W. Selby. Comparing the effectiveness of software testing strategies. IEEE Trans. Softw. Eng., 13(12):1278–1296, 1987.
[DRE04] Hyunsook Do, Gregg Rothermel, and Sebastian Elbaum. Infrastructure support for controlled experimentation with software testing and regression testing techniques. In Proc. Int. Symp. on Empirical Software Engineering, ISESE '04, 2004.
[EHP+06] Sigrid Eldh, Hans Hansson, Sasikumar Punnekkat, Anders Pettersson, and Daniel Sundmark. A framework for comparing efficiency, effectiveness and applicability of software testing techniques. Testing: Academic & Industrial Conference on Practice And Research Techniques (TAIC PART), 0:159–170, 2006.
[JMV04] Natalia Juristo, Ana M. Moreno, and Sira Vegas. Reviewing 25 years of testing technique experiments. Empirical Softw. Engg., 9(1-2):7–44, 2004.
[RAT+06] Per Runeson, Carina Andersson, Thomas Thelin, Anneliese Andrews, and Tomas Berling. What do we know about defect detection methods? IEEE Softw., 23(3):82–90, 2006.
[VB05] Sira Vegas and Victor Basili. A characterisation schema for software testing techniques. Empirical Softw. Engg., 10(4):437–466, 2005.
[LR96] Christopher M. Lott and H. Dieter Rombach. Repeatable software engineering experiments for comparing defect-detection techniques. Empirical Software Engineering, 1:241–277, 1996. 10.1007/BF00127447.

93

References: empirical research in software testing

Page 89: TAROT summerschool slides 2013 - Italy


[Vos et al. 2010] T. E. J. Vos, A. I. Baars, F. F. Lindlar, P. M. Kruse, A. Windisch, and J. Wegener. Industrial scaled automated structural testing with the evolutionary testing tool. In ICST, pages 175–184, 2010.
[Vos et al. 2012] Tanja E. J. Vos, Arthur I. Baars, Felix F. Lindlar, Andreas Windisch, Benjamin Wilmes, Hamilton Gross, Peter M. Kruse, and Joachim Wegener. Industrial case studies for evaluating search based structural testing. International Journal of Software Engineering and Knowledge Engineering, 22(8):1123–, 2012.
[Vos et al. 2013] T. E. J. Vos, F. Lindlar, B. Wilmes, A. Windisch, A. Baars, P. Kruse, H. Gross, and J. Wegener. Evolutionary functional black-box testing in an industrial setting. Software Quality Journal, 21(2):259–288, 2013.
[MRT08] A. Marchetto, F. Ricca, and P. Tonella. A case study-based comparison of web testing techniques applied to Ajax web applications. International Journal on Software Tools for Technology Transfer (STTT), 10:477–492, 2008.

94

References: descriptions of empirical studies done in software testing

Page 90: TAROT summerschool slides 2013 - Italy


Other references

[TS04] T. S. Tullis and J. N. Stetson. A comparison of questionnaires for assessing website usability. In Proceedings of the Usability Professionals Association Conference, 2004.
[BKM08] Aaron Bangor, Philip T. Kortum, and James T. Miller. An empirical evaluation of the System Usability Scale. International Journal of Human-Computer Interaction, 24:574–594, 2008.
[Horn06] K. Hornbæk. Current practice in measuring usability: challenges to usability studies and research. International Journal of Human-Computer Studies, 64(2):79–102, February 2006.
[MTR08] Alessandro Marchetto, Paolo Tonella, and Filippo Ricca. State-based testing of Ajax web applications. In Software Testing, Verification, and Validation, 2008 1st International Conference on, pages 121–130. IEEE, 2008.
[NMT12] C. D. Nguyen, A. Marchetto, and P. Tonella. Combining model-based and combinatorial testing for effective test case generation. In Proceedings of the 2012 International Symposium on Software Testing and Analysis, pages 100–110. ACM, 2012.

95