A Framework for Empirical Evaluation of Model Comprehensibility
Jorge Aranda, Neil Ernst, Jennifer Horkoff, & Steve Easterbrook
University of Toronto
MiSE 2007, Minneapolis, MN


Page 1:

A Framework for Empirical Evaluation of Model Comprehensibility

Jorge Aranda, Neil Ernst, Jennifer Horkoff, & Steve Easterbrook
University of Toronto

MiSE 2007, Minneapolis, MN

Page 2:

IMAGINE DILBERT STRIP HERE

Page 3:

If there are so many modeling languages...

• Static modeling languages
• Dynamic modeling languages
• Intentional modeling languages
• Argumentation modeling languages
• Belief modeling languages
• Meta-modeling languages
• Unified modeling languages
• ...

Page 4:

...why does nobody use them?

• OK, “nobody” is too strong
  – But from what I see, when faced with real projects...
    • my colleagues do not use them
    • my professors do not use them
    • my tutored students do not use them
      – unless I force them to
    • most small software houses I know of do not use them
      – 86% in a soon-to-appear field study of small software companies
    • most big software corporations I know of do not use them
      – The figure appears to range from 10-25%, depending on the survey
      – The highest usage number (~50%, for use case and class diagrams) comes from companies contacted through the OMG
    • Dilbert does not seem to use them either
• There is a community that, though it may not use them, certainly talks about them

Page 5:

Why does nobody use my modeling language? (my stuff?)

Page 6:

Possibility 1: Kial neniu uzi Esperanto?

• Esperanto for “why does nobody use Esperanto?”
  – I think...
• Reason: Unnecessary invention
  – English is universal enough
  – Klingon is geekier
  – Only useful at Esperanto conventions!
• Remedy?
  – Mostly hopeless

Page 7:

Possibility 2: Why does nobody use my typewriter?

• Looks complicated
• Not intuitive
• I’m used to handwriting
• Reason: Complexity and unfamiliarity
  – At first sight it doesn’t seem to be worth the effort
  – But all it takes is seeing an expert do wonders with it for us to want to learn it as well
• Remedy?
  – Training and publicity

Page 8:

Possibility 3: Why does nobody use my bicycle?

• Very uncomfortable!
  – Nicknamed “bone breaker”
• Painful landings
• Inappropriate roads
• Reason: Needs refinement
  – Evolution
  – Trial and error
  – Context (roads) evolves along with artifact (bike)
• Remedy?
  – Evaluation and refinement

Page 9:

So which is it?

• Why does nobody use my modeling language?
  – A useless proposal?
    • Certainly true in some cases
      – (but not for any of the members of this distinguished audience!)
  – Lack of training?
    • Less true than we’d like it to be
    • Our favourite excuse
      – “Software developers don’t know what they’re doing”
      – “This is why we’re going through a software crisis”
    • The “adapt the user” approach
      – Or, often, “blame the user”...
  – Needs refinement?
    • The “adapt the tool” approach
      – Evaluate, identify, and eliminate weaknesses
    • True of almost every proposal

Page 10:

Why would we use models?

• Exploration and reflection
• Model-driven development
• Model checking
• RUP asked me to do it
• Communication:
  – Explaining a domain to developers
  – Explaining a system to clients
  – Documenting for future maintainers
  – Memory aids
  – Minimizing ambiguity
  – Simplified, abstracted terms
• Models are primarily communication artifacts!

Page 11:

Communication Artifacts

• Models are communication artifacts...
  – ...so let’s study them as such!
• Some qualities of communication artifacts:
  – Codification effort
  – Learning curve
  – Obsolescence
  – Comprehensibility
• We decided (for now) to focus on comprehensibility
  – Why?
    • It has bitten us in the past
    • “It is not enough to preach; one must be heard”

Page 12:

Challenges of Evaluating Comprehensibility

• Comprehensibility is a tricky construct
  – Affected variables:
    • Correctness of understanding
    • Time
    • Confidence
    • Perceived difficulty
  – Affecting variables:
    • Type of task
    • Language expertise
    • Domain expertise
    • Problem size
  – It is infeasible to evaluate them all in a single empirical study (one way to record them is sketched below)
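As an aside, here is a minimal sketch (in Python; all names are hypothetical, not from the paper) of how the affected variables could be recorded per comprehension task, with the affecting variables kept as experimental factors:

from dataclasses import dataclass

@dataclass
class TaskMeasurement:
    # Affecting variables (experimental factors, fixed per condition)
    task_type: str             # e.g. "lookup" or "inference"
    language_expertise: str    # e.g. "novice" or "expert"
    domain_expertise: str      # e.g. "familiar" or "unfamiliar"
    problem_size: int          # e.g. number of model elements
    # Affected variables (measured for each comprehension task)
    correctness: float         # fraction of questions answered correctly
    time_seconds: float        # time taken to complete the task
    confidence: int            # self-reported, say on a 1-5 scale
    perceived_difficulty: int  # self-reported, say on a 1-5 scale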

Page 13:

Challenges of Evaluating Comprehensibility

• Accessibility of participants
  – It’s hard enough to find participants for standard software engineering studies
  – Requiring language/domain expertise makes the task much harder
• Ensuring “fair” comparisons
  – It is practically impossible to guarantee that two different representations transmit the same meaning to a human reader
  – Informal semantics play a large role in human comprehension

[Figure: the same elements (A-E) drawn in two different arrangements]

Page 14:

A Framework for Empirical Evaluation...

• Most modeling languages are never evaluated
  – And third-party evaluation is almost non-existent
  – Popular languages do get their share of studies (ER, DFDs, some UML)
    • But for most proposals we’re stuck with version 1.0
• We designed a framework to run empirical studies of model comprehensibility
  – Based on our survey of the (scarce) past comprehensibility papers...
  – ...and on our struggle to design appropriate evaluations
  – I am not going to explain it in full here
    • No time!
    • I will only cover it superficially and refer you to our paper
  – Warning: the framework itself has not been evaluated!

Page 15:

The Framework

• Step 1: Select the modeling notation (these decisions are sketched as a configuration below)
  – Which version will be studied?
  – Are we including language extensions?
  – Can we tweak the rules of the notation (as often happens in practice), or are we implementing the rules strictly?
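A minimal sketch of how Step 1’s decisions might be recorded explicitly, so that the study is reproducible (the notation name and fields are invented for illustration):

# Step 1 decisions, written down so the study can be replicated.
notation_config = {
    "notation": "i*",           # hypothetical choice of language
    "version": "1.0",           # which version will be studied?
    "extensions_included": [],  # e.g. ["temporal annotations"]
    "strict_rules": True,       # strictly as defined, or tweaked as in practice?
}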

Page 16:

Page 17:

The Framework (cont.)

• Step 2: Articulate the underlying theory of the language
  – What is the language useful for?
  – Who should be writing in it?
  – Who should be reading it?
  – When in the software process should the language be used?
• Step 3: Formulate the claims of the notation (see the sketch below)
  – Re-express the underlying theory as a set of claims regarding comprehension
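To illustrate Steps 2 and 3, a sketch that records a hypothetical theory and re-expresses it as comprehension claims; the content is invented, not taken from the paper:

# Step 2: the underlying theory of a hypothetical notation.
theory = {
    "useful_for": "early requirements exploration",
    "writers": "trained analysts",
    "readers": "developers and clients",
    "process_stage": "requirements",
}

# Step 3: the theory re-expressed as claims about comprehension.
claims = [
    "Readers answer questions about dependencies more correctly from the "
    "model than from an equivalent textual description.",
    "Clients with no notation training can still identify the main actors.",
]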

Page 18:

Page 19:

The Framework (cont.)

• Step 4: Choose a control
  – It should be a sensible alternative to the notation
  – It does not need to be diagrammatic
  – It is risky to compare a language extension against the bare language
• Step 5: Turn the claims into hypotheses (see the sketch below)
  – Consider the affected/affecting comprehensibility variables
  – From a language evolution perspective, it is more important to discover which elements and concepts work well, and which do not, than to make general claims about the language
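As an illustration of Step 5’s per-element emphasis, a sketch that compares correctness scores element by element against the control, rather than delivering one aggregate verdict (all data and element names are invented):

from statistics import mean

# Correctness scores (0 to 1) per notation element: (notation, control).
scores = {
    "actor":      ([0.90, 0.80, 0.95], [0.70, 0.75, 0.80]),
    "dependency": ([0.50, 0.40, 0.60], [0.80, 0.70, 0.75]),
}

for element, (notation, control) in scores.items():
    diff = mean(notation) - mean(control)
    print(f"{element}: mean difference {diff:+.2f} vs. control")

# Here "actor" fares well but "dependency" does not: a concrete
# refinement target, which is more useful for language evolution
# than a single verdict on the whole language.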

Page 20:

The Framework (cont.)

• Step 6: Inform the hypotheses
  – Bring in insights from other areas that study comprehension
    • External cognition
    • The cognitive dimensions framework
    • ...
• Step 7: Design and execute the study (a design sketch follows below)
  – Suggestions:
    • Natural domains
    • Explicit participant roles
    • Expert modelers
    • Two or more domains
    • Collect data on all affected variables
• Step 8: Improve these guidelines
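Finally, a sketch of Step 7’s suggestions as a counterbalanced assignment: every participant sees both the notation and the control, across two domains, with the starting condition rotated so order effects do not favour either side (participant and domain names are invented):

from itertools import cycle, product

conditions = ["notation under study", "control"]
domains = ["library system", "meeting scheduler"]  # two natural domains

# Rotate through all (condition, domain) pairs so that neither the
# notation nor a domain is systematically seen first.
starting_points = cycle(product(conditions, domains))
participants = [f"P{i}" for i in range(1, 9)]

for participant, start in zip(participants, starting_points):
    print(participant, "starts with", start)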

Page 21:

The Framework (summary)

• Step 1: Select the modeling notation
• Step 2: Articulate the underlying theory
• Step 3: Formulate the claims of the notation
• Step 4: Choose a control
• Step 5: Turn the claims into hypotheses
• Step 6: Inform the hypotheses
• Step 7: Design and execute the study
• Step 8: Improve these guidelines

Page 22:

Questions?