15
The REVERE project: experiments with the application of probabilistic NLP to systems engineering Paul Rayson 1 , Luke Emmet 2 , Roger Garside 1 and Pete Sawyer 1 1 Lancaster University and 2 Adelard

The REVERE project: experiments with the application of probabilistic NLP to systems engineering Paul Rayson 1, Luke Emmet 2, Roger Garside 1 and Pete

Embed Size (px)

Citation preview

The REVERE project: experiments with the

application of probabilistic NLP to systems engineering

Paul Rayson1, Luke Emmet2, Roger Garside1 and Pete Sawyer1

1 Lancaster University and 2Adelard

Abstract

Despite natural language's well-documented shortcomings as a medium for precise technical description, its use in software-intensive systems engineering remains inescapable. This poses many problems for engineers who must derive problem understanding and synthesise precise solution descriptions from free text. This is true both for the largely unstructured textual descriptions from which system requirements are derived, and for more formal documents, such as standards, which impose requirements on system development processes. This paper describes experiments that we have carried out in the REVERE project to investigate the use of probabilistic natural language processing techniques to provide systems engineering support.

Natural language products of the systems engineering process

SystemsEngineering

Process

RequirementsSpecification

Test plans

Natural language inputs to the systems engineering process

SystemsEngineering

Process

User interviewtranscripts

Standardsdocuments

Usermanuals

The REVERE project

Robust NLP

Largefree textcorpora

NLPtoolbox

Training data

REVERE

Syntactic and Semantic analysis

POS analysis using CLAWS Hybrid tagger using HMM Error rate of 1.5%

Semantic analysis General sense field of words and idioms Investigating applicability to technical

domainWmatrix retrieval tool

Frequency profiling and KWIC

Air Traffic Control

Ethnographic studies at ATC centre Verbatim transcripts of observations

and interviews with controllers Unstructured reports 103 pages

Role Analysis

Corpus analysis

Log-likelihood Semantic Word sense (and examples from the text)

tag

3366 S7.1 power, organising (‘controller’, ‘chief’)

2578 M5 flying (‘plane’, ‘flight’, ‘airport’)

988 O2 general objects (‘strip’, ‘holder’, ‘rack’)

643 O3 electrical equipment (‘radar’, ‘blip’)

535 Y1 science and technology (‘PH’)

449 W3 geographical terms (‘Pole Hill’, ‘Dish Sea’)

432 Q1.2 paper documents and writing (‘writing’, ‘written’, ‘notes’)

372 N3.7 measurement (‘length’, ‘height’, ‘distance’, ‘levels’, ‘1000ft’)

318 L1 life and living things (‘live’)

310 A10 indicating actions (‘pointing’, ‘indicating’, ‘display’)

306 X4.2 mental objects (‘systems’, ‘approach’, ‘mode’, ‘tactical’, ‘procedure’)

290 A4.1 kinds, groups (‘sector’, ‘sectors’)

Corpus Analysis (2)

A Standards Document

National standard for procurement of safety-critical military systems 21,000 words Strongly structured and highly stylised

POS analysis

Role Analysis

Conclusions

Rapid analyses of complex and voluminous free text

Support tools for system engineersShallow analysis to help focus

attentionTwo experiments: ATC & Standards

Doc