REVERE: Recovering Legacy Requirements
an EPSRC-SEBPC project
REFSQ’99: Paul Rayson, Roger Garside, Pete Sawyer
Positioning
[Diagram labels: User/Customer; Requirements Engineer; Software Architect; Environment; Needs; Specification; Design / Architecture]
Acronyms
REVERE: Reverse engineering of requirements to support business process change
SEBPC: Systems Engineering for Business Process Change
CSEG: Co-operative Systems Engineering Group
UCREL: University Centre for Computer Corpus Research on Language
Who?
Supervised by Roger Garside and Pete Sawyer
A joint CSEG & UCREL project
Adelard consultancy providing technical advice, documentary data, evaluation of the integrated method and piloting of the toolset resulting from the project.
What?
Improve the requirements analysis for legacy system evolution where the underlying business process (BP) has already changed.
[Diagram labels: Pre-change organisation; Post-change organisation; Existing operational software; Target operational software; De-facto organisation change; Required software change; Motivating requirements; New requirements]
Proposal
Reverse engineering of requirements documents by the novel integration of techniques for the textual analysis of documentation; modelling of business processes; and modelling the organisational structures serving the business processes.
Project started in May 1998
Review other applications of NLP
Rule-based (Goldin and Berry, 1997: Abstfinder) or sub-language examples (Cyre 1995)
Why?
BP change means redesigning support systems, operating procedures and documentation.
High cost of recovering the motivating requirements.
Key people who possess the knowledge may be unavailable.
Information is often implicit in documents such as requirements specifications, operating manuals and data models.
[Diagram nodes: Business processes; Organisational structures; Support systems; Requirements; Roles; Documents. Edge labels: models; have; enact; support; described in; have stakes in; impose; define]
What next?
UCREL tools provide robust analysis over unrestricted domains
Mainly statistically based with template analysis components
Layered: part-of-speech (POS) tagging, lemmatisation, anaphor resolution, semantic analysis
Corpus annotation is a fast and accurate way of improving information extraction from text
Porting from UNIX to Linux & PC
Integrate with Adelard’s Claviar
CLAWS POS tagging
Grammatical tagging, is the commonest form of corpus annotation, and was the first form of annotation to be developed by UCREL at Lancaster. Our POS tagging software for English text, CLAWS (the Constituent Likelihood Automatic Word-tagging System), has been continuously developed since the early 1980s. The latest version of the tagger, CLAWS4, was used to POS tag c.100 million words of the British National Corpus (BNC).
CLAWS POS tagging
Grammatical_JJ tagging_NN1@ ,_, is_VBZ the_AT commonest_JJT form_NN1 of_IO corpus_NN1 annotation_NN1 ,_, and_CC was_VBDZ the_AT first_MD form_NN1 of_IO annotation_NN1 to_TO be_VBI developed_VVN by_II UCREL_NP1 at_II Lancaster_NP1 ._. Our_APPGE POS_NN2 tagging_VVG software_NN1 for_IF English_JJ text_NN1 ,_, CLAWS_NN2 (_( the_AT Constituent_NN1 Likelihood_NN1 Automatic_JJ Word-tagging_JJ System_NN1 )_) ,_, has_VHZ been_VBN continuously_RR developed_VVN since_CS the_AT early_JJ 1980s_MC2 ._. The_AT latest_JJT version_NN1 of_IO the_AT tagger_NN1 ,_, CLAWS4_FO ,_, was_VBDZ used_JJ to_II POS_NN2 tag_VV0 c.100_FO million_NNO words_NN2 of_IO the_AT British_JJ National_JJ Corpus_NN1 ._.
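Output in this `word_TAG` form is easy to post-process. The following is a minimal sketch, assuming tokens are separated by whitespace and that the last underscore in each token separates the word from its tag; the function name `parse_tagged` is hypothetical and is not part of CLAWS.

```python
def parse_tagged(text):
    """Split whitespace-separated 'word_TAG' tokens into (word, tag) pairs.

    Assumption: the LAST underscore separates word and tag, so
    punctuation tokens such as ',_,' or '._.' are handled as well.
    """
    pairs = []
    for token in text.split():
        word, sep, tag = token.rpartition("_")
        if not sep:
            continue  # token carries no tag; skip it
        pairs.append((word, tag))
    return pairs


# Opening words of the tagged sentence on this slide.
sample = "Grammatical_JJ tagging_NN1@ ,_, is_VBZ the_AT commonest_JJT form_NN1"
pairs = parse_tagged(sample)

# Any trailing marker (such as '@') is kept as part of the tag string,
# so noun tags are selected by prefix.
nouns = [word for word, tag in pairs if tag.startswith("NN")]
```

Selecting by tag prefix, as with `nouns` here, is one simple way to pull candidate terms out of tagged documentation.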
Semantic tagging
Grammatical_Q3 tagging_Z99 ,_PUNC is_A3+ the_Z5 commonest_A6.2+++ form_A4.1 of_Z5 corpus_Q3 annotation_Q1.2 ,_PUNC and_Z5 was_A3+ the_Z5 first_P1c[i1.2.1 form_P1c[i1.2.2 of_Z5 annotation_Q1.2 to_Z5 be_Z5 developed_A2.1+ by_Z5 UCREL_Z99 at_Z5 Lancaster_Z2 ._PUNC Our_Z8 POS_I2.2 tagging_Q1.1 software_Y2 for_Z5 English_Z2 text_Q1.2 ,_PUNC CLAWS_L2 (_PUNC the_Z5 Constituent_G1.2/S2mf Likelihood_A7 Automatic_A1.1.1 Word-tagging_Z99 System_X4.2 )_PUNC ,_PUNC has_Z5 been_Z5 continuously_T2++ developed_A2.1+ since_Z5 the_Z5 early_T1.3[i2.2.1 1980s_T1.3[i2.2.2 ._PUNC The_Z5 latest_T3--- version_A4.1 of_Z5 the_Z5 tagger_Z99 ,_PUNC CLAWS4_Z99 ,_PUNC was_A3+ used_T1.1.1[i3.2.1 to_T1.1.1[i3.2.2 POS_I2.2 tag_Q1.1 c.100_Z99 million_N1 words_Q3 of_Z5 the_Z5 British_Z2 National_Z3c Corpus_Q3 ._PUNC
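Semantic tag frequencies are the raw material for comparing one corpus against another. The sketch below counts tags in `word_TAG` text. It rests on two assumptions that the slide does not spell out: that a `[i…` suffix is a marker attached to the tag and can be split off, and that a slash joins two tags assigned to the same word. The function name `semantic_tag_counts` is hypothetical.

```python
from collections import Counter


def semantic_tag_counts(text):
    """Count semantic tags in whitespace-separated 'word_TAG' text.

    Assumptions (not confirmed by the slide):
    - a '[i...' suffix is a marker appended to the tag and is dropped;
    - 'A/B' means the word carries both tag A and tag B.
    """
    counts = Counter()
    for token in text.split():
        word, sep, tag = token.rpartition("_")
        if not sep:
            continue  # token carries no tag; skip it
        base = tag.split("[", 1)[0]  # drop any '[i...' suffix
        for part in base.split("/"):
            counts[part] += 1
    return counts


# Opening words of the semantically tagged sentence on this slide.
sample = (
    "Grammatical_Q3 tagging_Z99 ,_PUNC is_A3+ the_Z5 commonest_A6.2+++ "
    "form_A4.1 of_Z5 corpus_Q3 annotation_Q1.2 ,_PUNC and_Z5 was_A3+ the_Z5 "
    "first_P1c[i1.2.1 form_P1c[i1.2.2"
)
counts = semantic_tag_counts(sample)
```

Dividing each count by the total number of tokens would give a relative frequency per tag.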
Statistical analysis
Build training & test corpus and separate normative corpus for vocabulary norms:
  requirements documents, operating manuals
  IBM manuals corpus (800K)
  Subcorpus of BNC (applied science, 11 million words)
  CSEG technical reports?
  Transcripts of ethnographic studies of technical workplaces
  Public domain IT standards documents
Retrain CLAWS probability matrix, vocabulary and idiom usage and investigate frequency distributions for text norms
Preliminary results
Semantic comparison of LIBSYS and BNC IT corpus

Semantic  Semantic category            Example items                      LIBSYS     Log-        BNC IT
tag                                                                       relative   likelihood  relative
                                                                          frequency              frequency
Q1.2      Paper documents and writing  documents, records, prints         4.74       717.5       1.02
T1.1.3    Time future                  will, shall                        3.70       483.8       0.91
A1.5.1    Using                        user, end-user                     2.64       260.1       0.81
I2.1      Business                     agents, commercial                 0.12       208.8       1.44
S7.1+     Power, organising            administrator, management, order   2.31       159.5       0.89
Q4.1      The media                    author, catalogues, librarian      0.98       144.6       0.22
X9.1+     Ability, intelligence        be-able-to                         0.75       129.9       0.14
X2.4      Investigate                  search                             0.04       119.0       0.74
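The slides do not state which form of the log-likelihood statistic was used. As a hedged illustration, the sketch below assumes the common two-corpus form, in which the observed frequency of a tag in each corpus is compared with the frequency expected if the tag were spread evenly across both. The function names and all counts are hypothetical, and treating relative frequency as a percentage of corpus size is also an assumption.

```python
import math


def log_likelihood(freq1, freq2, size1, size2):
    """Two-corpus log-likelihood for one item (assumed form).

    freq1, freq2: observed frequency of the item in corpus 1 and corpus 2
    size1, size2: total number of tokens in corpus 1 and corpus 2
    """
    total = freq1 + freq2
    expected1 = size1 * total / (size1 + size2)
    expected2 = size2 * total / (size1 + size2)
    ll = 0.0
    for observed, expected in ((freq1, expected1), (freq2, expected2)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2.0 * ll


def relative_frequency(freq, size):
    """Frequency as a percentage of corpus size (assumed definition)."""
    return 100.0 * freq / size


# Hypothetical counts: equal usage gives 0, unequal usage a positive score.
same = log_likelihood(10, 10, 1000, 1000)
different = log_likelihood(20, 10, 1000, 1000)
```

A larger value indicates a bigger difference in usage between the two corpora; the sign of the difference has to be read from the two relative frequencies.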
Objects and operations
Discussion for paper 1 (Carroll & Swatman)
Which quality features are addressed by the paper?
  Quality management of the early phases of the RE process in order to target business problems correctly.
What is the main novelty/contribution of the paper?
  “RE is opportunistic not deterministic”.
How will this novelty/contribution improve RE practice/research?
  Avoid focussing on one methodology.
What are the main problems with the novelty/contribution and/or paper?
  Case study may be unrepresentative in terms of composition of the team.
Can the proposed approach be expected to scale to real-life problems?
  If a company has invested and trained in one methodology then they will probably use it, whether it fits the problem or not.
Discussion for paper 3 (Claus et al)
Which quality features are addressed by the paper?
  Quality assurance: establishing organisational procedures and standards.
What is the main novelty/contribution of the paper?
  Demonstrates practical ‘management’ problems of introducing requirements management.
How will this novelty/contribution improve RE practice/research?
  Emphasise involvement of stakeholders from an early stage.
What are the main problems with the novelty/contribution and/or paper?
  Technical problems were trivial in this case study.
Can the proposed approach be expected to scale to real-life problems?
  Don’t underestimate the difficulty of making the change happen.