
WCRE 2008, Antwerp

Jane Huffman Hayes, Giuliano (Giulio) Antoniol and Yann-Gaël Guéhéneuc

PREREQIR: Recovering Pre-Requirements via Cluster Analysis


Content

• Problem Statement
• PREREQIR Idea
• PREREQIR Process
• Technologies
• Web Browser Requirements
• Case Study Results
• Conclusions


The Challenge

• A few years after deployment, the requirements specification (RS) may no longer exist.
• If it does exist, it is almost surely outdated.
• My customers may want new functionalities or technologies that my system may or may not implement.
• I poll my stakeholders:
  - programmers, managers, testing team members, marketing personnel, and end users;
  - to find out what they believe the system should do.


PREREQIR in Essence

• We need a pre-requirements document describing:
  - what the competitor systems do;
  - what our customer base needs.
• Obtain and vet a list of requirements from diverse stakeholders.
• Structure the requirements by mapping them into a representation suitable for grouping via pattern recognition and similarity-based clustering.
• Analyze the clustered requirements to divide them into a set of essential and a set of optional requirements.


The PREREQIR Process

[Figure: textual documents (requirements, e.g., r_p: "browser support zoom unzoom page detail" and r_i: "Browser support print") go through split, stop-word removal, stemming, tokenization, and TF-IDF indexing into a vector space (term vectors such as 101110 and 001010); they are then clustered with PAM/AGNES and the clusters are labelled, yielding the recovered PRI r_i and o_j.]


PREREQIR Technology

• Standard information retrieval vector space model.
• Indexing process:
  - stopper;
  - stemmer;
  - thesaurus (not vital, but helps);
  - TF-IDF indexing.
• Clustering: PAM and AGNES.
• Labeling: still an open question.
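The indexing chain above can be sketched as follows. This is a minimal illustration, not the PREREQIR tooling: the stop list, the crude suffix-stripping stemmer (a stand-in for a real stemmer such as Porter's), and the one-entry thesaurus are all hypothetical.

```python
import re

# Hypothetical, deliberately tiny stop list; real stoppers use much longer lists.
STOP_WORDS = {"a", "an", "and", "be", "it", "must", "of", "or", "should", "the", "to", "with"}

# Hypothetical thesaurus mapping alternative stems to a canonical stem.
THESAURUS = {"reload": "refresh"}

# Crude suffix stripping, standing in for a real stemmer (e.g., Porter's).
SUFFIXES = ("ing", "ed", "s")


def stem(word: str) -> str:
    for suffix in SUFFIXES:
        # Only strip when at least three characters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_terms(requirement: str) -> list[str]:
    """Split, stop, stem, and thesaurus-expand one requirement."""
    tokens = re.findall(r"[a-z0-9]+", requirement.lower())
    terms = []
    for token in tokens:
        if token in STOP_WORDS:
            continue
        stemmed = stem(token)
        terms.append(THESAURUS.get(stemmed, stemmed))
    return terms


print(index_terms("The browser should support reloading pages"))
# prints ['browser', 'support', 'refresh', 'page']
```

Applying the thesaurus after stemming lets a single entry cover "reload", "reloads", "reloaded", and "reloading".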


Step 1 – Collect Stakeholders' RS

• By means of questionnaires, collect the stakeholders' requirements.
• We favor a non-intrusive, lightweight approach such as a Web-based questionnaire.
• Minimize the risk of influencing the stakeholders.
• There is a risk that:
  - a respondent did not really understand the task;
  - the granularity and level of detail vary widely between respondents;
  - the respondent population is not heterogeneous enough;
  - the sample size is small.


Step 2 – Vector Space Mapping

• The goal is to group individual requirements from different users into clusters representing the same functionality/concept.
• Using standard IR tools, map the collected requirements into a vector space.
• Stopper, stemmer, and TF-IDF, plus thesaurus expansion:
  - certain stakeholders may use cryptic terms such as RFC or test/benchmark acronyms.
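A minimal sketch of the vector space mapping, assuming the requirements are already stopped and stemmed, and assuming the common tf × log(N/df) weighting with cosine similarity; the slides do not say which TF-IDF variant PREREQIR uses. The last lines show one common similarity-to-distance transformation (distance = 1 - similarity), which is also an assumption here.

```python
import math
from collections import Counter


def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """Map each tokenized requirement to a sparse tf-idf vector (term -> weight)."""
    n = len(docs)
    # Document frequency: in how many requirements each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {term: count * math.log(n / df[term]) for term, count in Counter(doc).items()}
        for doc in docs
    ]


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


docs = [
    ["browser", "support", "zoom", "page"],
    ["browser", "support", "print"],
    ["browser", "support", "zoom", "detail"],
]
vectors = tfidf_vectors(docs)
similarity = cosine(vectors[0], vectors[2])
distance = 1.0 - similarity  # assumed similarity-to-distance transformation
print(round(similarity, 3), round(distance, 3))
```

Terms that occur in every requirement ("browser", "support") get an idf of zero, so the first two requirements end up with similarity 0 even though they share words; this is one reason a thesaurus and careful stop-listing matter.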


Step 3 – Clustering

• Transform similarity into a distance.
• Apply robust Partitioning Around Medoids (PAM).
• Estimate the number of clusters (distinct requirements) with the silhouette:
  - a(i) is the average distance from PRI i to the other PRI in its cluster;
  - b(i) is the average distance from PRI i to the PRI in the nearest other cluster.
• Take the inflection point (flex) close to the maximum of the average silhouette.

$$ s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}} $$

Average silhouette:
  - > 0.70: very strong structure
  - 0.50 to 0.70: reasonable structure
  - 0.25 to 0.50: weak structure
  - < 0.25: no structure
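The silhouette formula and the interpretation scale translate directly into code. A minimal sketch, assuming a precomputed distance matrix (for instance 1 - cosine similarity) and at least two clusters; following the usual convention, a PRI in a singleton cluster gets s(i) = 0. The toy matrix is hypothetical.

```python
def silhouette(dist: list[list[float]], labels: list[int]) -> list[float]:
    """s(i) = (b(i) - a(i)) / max{a(i), b(i)} for every PRI i."""
    n = len(labels)
    scores = []
    for i in range(n):
        own = [dist[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        if not own:  # singleton cluster: s(i) = 0 by convention
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # average distance within its own cluster
        b = min(  # average distance to the nearest other cluster
            sum(dist[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in set(labels)
            if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return scores


def interpret(average: float) -> str:
    """Map an average silhouette onto the scale above."""
    if average > 0.70:
        return "very strong structure"
    if average >= 0.50:
        return "reasonable structure"
    if average >= 0.25:
        return "weak structure"
    return "no structure"


dist = [
    [0.0, 0.1, 0.9, 0.8],
    [0.1, 0.0, 0.8, 0.9],
    [0.9, 0.8, 0.0, 0.2],
    [0.8, 0.9, 0.2, 0.0],
]
scores = silhouette(dist, [0, 0, 1, 1])
average = sum(scores) / len(scores)
print(round(average, 3), interpret(average))
```

The ranges on the scale share their endpoints; assigning exactly 0.50 or 0.25 to the higher band is an arbitrary choice made here.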


Step 3 Bis – Tree Structure

• If the structure is weak, check whether the requirements are organized as a tree.
• Re-cluster with AGNES.
• Compute the Agglomerative Coefficient (AC).
• AC measures the strength of the hierarchical structure discovered.
• AC > 0.9 indicates a very strong hierarchical structure.
• Impose a threshold on the average similarity to avoid grouping "too different" items.
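AGNES and the agglomerative coefficient can be sketched as below. This is a naive, cubic-time illustration that assumes average linkage (the default method of R's `cluster::agnes`; the slides do not state which linkage PREREQIR used) and the Kaufman-Rousseeuw definition of AC: for each object i, m(i) is its dissimilarity to the first cluster it is merged with, divided by the height of the final merge, and AC is the mean of 1 - m(i). The distance matrix is hypothetical.

```python
def agnes_average(dist: list[list[float]]) -> tuple[list, float]:
    """Average-linkage agglomerative nesting on a distance matrix (n >= 2).

    Returns the merge list [(members_a, members_b, height)] and the
    agglomerative coefficient AC.
    """
    n = len(dist)
    clusters = {i: [i] for i in range(n)}
    first_merge = [0.0] * n  # height of each object's first merge
    merges = []
    next_id = n
    while len(clusters) > 1:
        ids = list(clusters)
        best = None
        for x in range(len(ids)):
            for y in range(x + 1, len(ids)):
                a, b = clusters[ids[x]], clusters[ids[y]]
                # Average linkage: mean distance over all cross-cluster pairs.
                d = sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))
                if best is None or d < best[0]:
                    best = (d, ids[x], ids[y])
        height, p, q = best
        for members in (clusters[p], clusters[q]):
            if len(members) == 1:  # this object is merged for the first time
                first_merge[members[0]] = height
        merges.append((clusters[p], clusters[q], height))
        clusters[next_id] = clusters.pop(p) + clusters.pop(q)
        next_id += 1
    final_height = merges[-1][2]
    ac = sum(1.0 - m / final_height for m in first_merge) / n
    return merges, ac


dist = [
    [0.0, 0.1, 0.9, 0.8],
    [0.1, 0.0, 0.8, 0.9],
    [0.9, 0.8, 0.0, 0.2],
    [0.8, 0.9, 0.2, 0.0],
]
merges, ac = agnes_average(dist)
print([round(h, 2) for _, _, h in merges], round(ac, 3))
```

An AC close to 1 means that most objects join a cluster at a height far below that of the final merge, i.e., tight groups that are well separated from each other. Production implementations (R's `cluster::agnes`, SciPy's `linkage`) are far more efficient than this sketch.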


Step 4 – Label Clusters

• Process each PRI of a cluster:
  - stopping and stemming;
  - build a cluster-specific dictionary;
  - weight each word by its frequency in the cluster: if a word appears in all the PRI of a cluster, its weight is 1.00; if it appears in half of the PRI, its weight is 0.50.
• For a given stemmed PRI, calculate:
  - a positive weight: the sum of the weights of its stems present in the cluster dictionary;
  - a negative weight: the number of words in the cluster-specific dictionary that are absent from the current PRI.
• Assign the PRI a score equal to the ratio positive weight / negative weight.
• Label the cluster with the PRI that has the highest score.
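The labeling heuristic can be sketched as follows, assuming each PRI is already stopped and stemmed into a list of terms. Two details are assumptions not stated on the slide: the positive weight sums the dictionary weights of the PRI's distinct stems, and a PRI containing every dictionary word (negative weight 0) gets an infinite score instead of a division by zero. The example cluster is hypothetical.

```python
from collections import Counter


def label_cluster(cluster: list[list[str]]) -> int:
    """Return the index of the PRI chosen as the cluster label."""
    n = len(cluster)
    # Cluster-specific dictionary: weight = fraction of PRI containing the stem.
    doc_freq = Counter(term for pri in cluster for term in set(pri))
    weights = {term: count / n for term, count in doc_freq.items()}

    def score(pri: list[str]) -> float:
        present = set(pri)
        positive = sum(w for term, w in weights.items() if term in present)
        negative = sum(1 for term in weights if term not in present)
        # Assumption: a PRI that misses no dictionary word scores infinity.
        return positive / negative if negative else float("inf")

    return max(range(n), key=lambda i: score(cluster[i]))


cluster = [
    ["browser", "support", "zoom"],
    ["zoom", "page"],
    ["browser", "zoom", "page", "detail"],
]
print(label_cluster(cluster))
# prints 2
```

Here the third PRI wins: its positive weight is about 2.67 and it misses only one dictionary word ("support"), whereas the first PRI scores 2.0 / 2 = 1.0.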


Case Study

• Mimic the recovery process for a Web browser.
• Poll a set of about 200 users via e-mail.
• 25 answers, of which we kept 22, for an overall total of 433 user needs:
  - mostly male (20); ages vary, with an average of 36 and a standard deviation of 9.5;
  - respondents: 10 researchers, five lecturers/professors, four students, one programmer, and two project managers.


PAM - AGNES

• With PAM, we did not find a strong or evident cluster structure:
  - silhouette about 0.26, i.e., a weak structure;
  - region between 167 and 170 clusters: say, 170 clusters or fewer.
• AGNES reports a strong structure:
  - AC above 0.9.
• Grouping via AGNES:
  - grows a tree starting from the leaves.


Outliers

• Setting a threshold on the internal similarity of clusters determines:
  - the top-level clusters;
  - the singleton clusters (outliers);
  - the inner nodes.
• The "non-kept" PRI are also important:
  - single-user needs;
  - more expert users may use acronyms, e.g., "must comply with ACID2";
  - "too generic" sentences, e.g., "it should be fast".
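A sketch of how a similarity threshold splits an AGNES tree into multi-member clusters and singleton outliers. It assumes the merge list comes from average linkage over distances defined as 1 - similarity, so cutting at height 1 - threshold keeps only merges whose average cross-cluster similarity reaches the threshold; this approximates, but is not necessarily identical to, the internal-similarity criterion above. The merge list is hypothetical; 0.36 is the threshold reported in the conclusions.

```python
def cut_tree(n: int, merges: list[tuple[list[int], list[int], float]], sim_threshold: float):
    """Split n objects into clusters (size >= 2) and outliers (singletons).

    merges: (members_a, members_b, height) in merge order, with height =
    average distance = 1 - average similarity. Heights are assumed to be
    non-decreasing, which holds for average linkage.
    """
    max_height = 1.0 - sim_threshold
    parent = list(range(n))  # union-find forest

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for members_a, members_b, height in merges:
        if height > max_height:
            break  # every later merge is at least as high
        parent[find(members_a[0])] = find(members_b[0])

    groups: dict[int, list[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    clusters = sorted(g for g in groups.values() if len(g) > 1)
    outliers = sorted(g[0] for g in groups.values() if len(g) == 1)
    return clusters, outliers


merges = [([0], [1], 0.1), ([2], [3], 0.2), ([0, 1], [2, 3], 0.85), ([4], [0, 1, 2, 3], 0.95)]
print(cut_tree(5, merges, 0.36))
# prints ([[0, 1], [2, 3]], [4])
```

Raising the threshold produces more, smaller clusters and more outliers; lowering it to 0 collapses everything into the single top-level cluster.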


AGNES Clusters

[Figure: number of AGNES clusters (y-axis, 0 to 1000) versus similarity threshold (x-axis, 0.035 to 0.915), with series for Tops, Intermediate, Overall, Outliers, and Leaves.]


Manual Verification

• Two people reviewed the clusters and the cluster labels.
• IR measures: precision and recall.
• Precision measures the quality of the clusters.
• A conservative approach:
  - "Yes" was assigned if both authors said "Yes";
  - "No" was assigned if at least one of the authors said "No";
  - "Maybe" was assigned in all other cases.
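The conservative rule for combining the two reviewers' verdicts can be written as a small function; the function name, the string encoding of verdicts, and the sample pairs are illustrative choices.

```python
from collections import Counter


def combine(verdict_a: str, verdict_b: str) -> str:
    """Conservatively merge two verdicts, each "Yes", "No", or "Maybe"."""
    if verdict_a == "Yes" and verdict_b == "Yes":
        return "Yes"
    if "No" in (verdict_a, verdict_b):
        return "No"
    return "Maybe"


# Hypothetical verdict pairs, one per cluster.
pairs = [("Yes", "Yes"), ("Yes", "Maybe"), ("No", "Yes"), ("Maybe", "Maybe")]
print(Counter(combine(a, b) for a, b in pairs))
```

A single "No" overrides any "Yes", so disagreement can only lower the combined verdict, never raise it.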


Precision and Recall at a Similarity Threshold of 0.36

128 Common User Needs, 181 Outliers

[Figure: precision, recall, and percentage of outliers (y-axis, 0 to 1) versus similarity threshold (x-axis, 0.225 to 0.885).]


Traceability Task

• PRI for a Web browser are provided by the Web site www.learnthenet.com (LtN).
• There are 20 LtN PRI:
  - textual PRI ranging from 5 to 73 words, with 23.5 words on average.
• LtN10: "The toolbar should include a Reload or Refresh button to load the web page again."
• Trace via vector space retrieval with tf-idf.
• Similarity threshold of 0.20.
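The tracing rule can be sketched as below, assuming the similarities between each LtN PRI (rows) and each respondent PRI (columns) have already been computed with the tf-idf vector space model, and assuming an LtN PRI counts as traced when at least one similarity reaches the 0.20 threshold. The matrix is made up; in the study, the candidate traces were then checked manually.

```python
def traced(similarity: list[list[float]], threshold: float = 0.20) -> tuple[list[int], float]:
    """Indices of rows with at least one similarity >= threshold, and their share."""
    hits = [i for i, row in enumerate(similarity) if max(row, default=0.0) >= threshold]
    return hits, len(hits) / len(similarity)


# Hypothetical similarities: 4 LtN PRI (rows) against 3 respondent PRI (columns).
similarity = [
    [0.05, 0.31, 0.12],
    [0.10, 0.08, 0.19],
    [0.45, 0.02, 0.20],
    [0.00, 0.00, 0.00],
]
print(traced(similarity))
# prints ([0, 2], 0.5)
```

With 14 of 20 LtN PRI traced, the same ratio gives 0.70.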


Manual Evaluation by Two Authors

• 14 of the 20 LtN PRI are traced:
  - all 14 were marked "Yes" by both authors.
• If we also include the two marked "Maybe", 16 of the 20 LtN PRI are traced.
• Overall, between 70% ("Yes" only) and 80% ("Yes" and "Maybe") of the LtN PRI are also found in the PRI obtained from the respondents.


Threats to Validity

• External validity: only one system and 22 answers out of about 200; the impact of vocabulary is unknown.
• Construct validity: computations were performed with widely adopted toolsets; other tools may produce different results.
• Reliability validity: the material will be made available.
• Internal validity: subjectivity introduced by the experts, mitigated by assigning "Yes" if and only if both agree.


Conclusion

• AGNES clusters PRI with an accuracy of 70%.
• With a similarity threshold of about 0.36, about 55% of the PRI were common to two or more stakeholders and 42% were outliers:
  - 128 common user needs, 181 outliers.
• We automatically label the common and outlier PRI, with 82% of the labels being correct.
• The method achieves roughly 70% recall and 70% precision when compared to a ground truth.