Integrating Visual Search with Visual Memory in a Knowledge directed Image ... · 2013-02-28 · Integrating Visual Search with Visual Memory in a Knowledge directed Image Interpretation

Integrating Visual Search with Visual Memory in a Knowledge directed Image Interpretation System

T.P. Pridmore and S.H. Joseph

Dept. of Mechanical and Process Engineering, University of Sheffield

Mappin St., Sheffield S1 3JD

We describe an improved implementation of ANON, a schema-driven image analysis system capable of producing high level interpretations of greyscale images of mechanical engineering drawings. The system has been extended to incorporate a drawing memory into which completed schema instances are placed and which may be accessed as an integral part of ANON's knowledge directed visual search. This memory provides a basis for the integration into a coherent whole of the piecewise interpretations previously supplied by the system. It leads to a richer description of drawing content, improves efficiency and allows ANON to deal with partially complete interpretations. The new system's structure and operation are discussed and examples shown of its interpretation of real mechanical drawings.¹

ANON [1,2] is a knowledge directed image analysis system capable of producing high level interpretations of greyscale images of mechanical engineering drawings. These documents (figure 1) contain the rich mixture of graphical, textual and symbolic information needed to provide a means of communication between trained engineers. As with many visual tasks, the complexity of the drawing interpretation problem is such that a systematic and extensible architecture is essential if image analysis is to remain tractable and efficient. The identification of such an architecture is the prime motivation for the present work. However, the interpretation of mechanical drawings is not just a useful testbed for experiments in vision system design; there is a real need for automatic drawing interpretation now.

Computer aided draughting (CAD) systems are widespread in the manufacturing industries. However, while new design problems are now usually addressed within CAD environments, many engineering tasks involve the modification of designs which were developed and presented on paper. In such cases a choice must be made between reworking the design using appropriate CAD tools, a lengthy and largely unproductive task, or finding some method of conversion from the old format to the new. Much thought has been given to the problem of vectorisation, the description of a drawing image in terms of an unstructured collection of line segments [3], but few systems attempt to process real images of real

¹ This research is supported by SERC ACME grant GRE 41092. T.P. Pridmore is now with the School of Engineering Information Technology, Sheffield City Polytechnic, Pond St., Sheffield S1 1WB.


drawings any further. CAD descriptions normally comprise fewer, larger constructs and, consequently, most CAD/CAM applications assume a higher level of representation.

Figure 1. A drawing image (128 grey levels, 10 pixels/mm), thresholded for display only.

ANON uses schemata to describe drawing content. The system (figure 2) is modelled on the human "cycle of perception" [4]; the basis of the approach is a continuous loop in which a constantly changing world model directs perceptual exploration, determines how its findings are to be interpreted and is modified as a result. In ANON this role is taken by an instance of one of a number of schema classes. On each cycle the controlling, or "current", instance invokes appropriate members of ANON's library of image analysis routines and informs a higher level control module of the results of its actions. The system presently maintains classes corresponding to solid, dashed and chained lines, solid and

BMVC 1990 doi:10.5244/C.4.65

dashed curves, cross hatching, physical outlines, junctions, letters, words, witness and leader lines and certain restricted forms of dimensioning. Each schema instance therefore represents a particular example of some prototypical drawing construct.

Figure 2. ANON's cycle of perception. [Diagram: the control system modifies the current schema; the current schema directs and interprets the image analysis library and informs the control system.]

The basic operation of the image analysis library is sequential tracking; routines are available for following straight lines [5], circles [6] and area outlines. The grey level corresponding to white paper background is first measured and its noise level assessed. These figures are then used as the basis of dynamic thresholding procedures which improve tracking on poor and variable contrast images. It should be stressed here that all of ANON's image analysis is carried out under the control of some schema instance, that is, in the context of a particular hypothesis regarding the local content of the drawing. The image analysis library therefore provides an extendible tool box of procedures whose application varies according to context.
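The background-based thresholding described above might be sketched as follows. This is an illustration only: the rule `mean - k * sigma`, the value of `k`, and the assumption that ink is darker than paper are ours and are not taken from ANON's implementation.

```python
from statistics import mean, pstdev

def ink_threshold(background_samples, k=3.0):
    """Return a grey level below which a pixel is treated as ink.

    Assumption (not from the paper): paper is brighter than ink, and a
    pixel counts as ink when it is more than k noise standard deviations
    darker than the measured background level.
    """
    level = mean(background_samples)    # grey level of white paper
    noise = pstdev(background_samples)  # spread of the background samples
    return level - k * noise

def is_ink(grey, threshold):
    """Classify a single grey value against the derived threshold."""
    return grey < threshold

# Hypothetical background samples taken from a blank part of the sheet.
samples = [118, 120, 122, 120, 119, 121]
threshold = ink_threshold(samples)
```

Because the threshold is derived from the measured background rather than fixed in advance, the same procedure adapts to sheets of differing contrast.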

ANON's control module comprises a set of strategy rules written in the form of an LR(1) grammar and applied by a parser generated using the Unix utility yacc [7]. These rules define methods by which high level drawing entities may be obtained by hierarchical combination of lower level constructs. On each cycle the control system determines an appropriate modification to the current schema. Modification may mean updating an internal variable, adding new sub-parts, or replacing the instance with a new one representing a different type of construct.

Strategy rules, like string grammars, describe acceptable sequences of events; a parser is therefore a natural vehicle for their application. Moreover, LR parsing is a well understood method which provides a "simple, expressive rule format", "formal precision and interpretability" and "guaranteed consistency"; attributes noted by Rao and Jain [8] as among the advantages of logic as a representation in computer vision. The existence of well established software tools based on the technique is an added bonus.

The human perceptual cycle is thought to continue throughout life. ANON, however, needs some procedure for initiating the system given a fresh drawing and terminating it when all the relevant constructs have been found. A book-keeping module therefore directs attention towards unexplained ink marks and terminates the

system when no significantly black areas remain unexamined.
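The book-keeping role can be pictured as an outer loop around the perceptual cycle. The sketch below is a simplification under stated assumptions: ink marks are reduced to integer identifiers, and `run_cycle` and `toy_cycle` are hypothetical stand-ins for the development of one schema, not ANON routines.

```python
def interpret(ink_marks, run_cycle):
    """Seed the perceptual cycle from unexplained ink until none remains.

    run_cycle(seed, unexplained) is assumed to return a pair
    (schema_or_None, marks_explained_by_that_schema).
    """
    completed = []
    unexplained = set(ink_marks)
    while unexplained:
        seed = min(unexplained)  # arbitrary but deterministic choice of seed
        schema, explained = run_cycle(seed, unexplained)
        # The seed is always marked as examined, so the loop must terminate.
        unexplained -= set(explained) | {seed}
        if schema is not None:
            completed.append(schema)
    return completed

def toy_cycle(seed, unexplained):
    """Hypothetical cycle: marks sharing a tens digit form one construct."""
    group = {m for m in unexplained if m // 10 == seed // 10}
    return ("construct-%d" % (seed // 10), group)

result = interpret({1, 2, 3, 11, 12}, toy_cycle)
```

Removing the seed from the unexplained set even when no schema results is what guarantees termination in this sketch.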

ANON has been successful in extracting high-level entities from images of piece-part drawings [1,2]. The present work centres on the problems of integrating these descriptions within a hierarchical drawing memory, dealing with multiple interpretations of a single image object, merging consistent descriptions and flagging conflicts.

PROBLEMS & APPROACHES

Although at a much higher level than previous vector descriptions, ANON's schemata still provide only a piecewise representation of the drawing: if a complete interpretation is to be obtained they must be integrated into a coherent whole. Several difficulties must be faced.

ANON's basic activity is to match image primitives against schema classes in an attempt to identify appropriate schema instances. The context sensitive nature of this task, however, raises the possibility of a given group of pixels contributing to several different contexts. As contexts may intersect, so may the system's schema-based representations of drawing content.

A brief examination of Figure 1 clarifies the problem. While there exist clearly identifiable entities that are the building blocks from which drawings are constructed (hatched areas, chains, etc), these components can be combined in very many different ways. Four basic forms of dimensioning, for example, have been identified [9] and could be described by schemata. They may, however, be combined quite arbitrarily; witness lines are often shared, leaders run together, etc. Under such circumstances overlapping interpretations are unavoidable.

Overlaps may represent local ambiguities. Although each schema covers a significant number of pixels, the constructs involved are still quite small relative to the drawing. Situations therefore arise in which this comparatively local processing is unable to determine the correct interpretation.

An obvious approach would be to continue to develop ANON in the same vein: to attempt to define higher level schemata which can describe the present overlapping, consistent interpretations while rejecting those that are inconsistent. We feel, however, that it is naive to expect ever higher schemata to be able to capture the variety and complexity of interaction that is found in even the simpler mechanical drawings. McDermott [10] argues that a knowledge-based system will display erroneous behaviour for either of two reasons: (1) inadequacies in its knowledge or (2) inadequacies in its problem solving behaviour. We believe that ANON's overlapping interpretations are best addressed by considering the second possibility. In the longer term the system will need to adopt a fundamentally different problem solving technique if full integration of schemata is to be achieved. For the present we are concerned only with merging similar instances and identifying situations in which further work is necessary.

A continuum of approaches presents itself: at one extreme one could simply accumulate schemata, then delegate the integration problem to a separate post-process. This mirrors the standard separation of segmentation


and interpretation adopted by, for example, VISIONS [11]. A middle road would be to have the integration process act as a demon that is invoked whenever necessary, much as frames of similar type are merged by the "exception handlers" of SIGMA [12]. The opposite extreme is to implement continuous interaction between the image interpretation and schema integration processes (cf. MAPSEE [13]), making them both part of the perceptual cycle. From our experience of ANON it would appear that much of the system's strength lies in its ability to integrate segmentation with interpretation [1,2]; we have therefore chosen an approach at the latter end of the above continuum.

In its original form, the system merely displayed completed schemata to the user. Hence, as interpretation progressed, a considerable amount of information was accrued but not exploited in any way. The goal of the work reported here is to make use of this knowledge by retaining complete schemata in a "drawing memory" (figure 3) which may be examined, along with the image, on subsequent perceptual cycles. During this examination, interactions are noted between the developing schema and those already completed.

Figure 3. Modified perceptual cycle. [Diagram: the cycle of figure 2 (control system, current schema, image analysis library; labels: modifies, informs, directs and interprets) extended with a drawing memory.]

SEARCHING IMAGE AND MEMORY

On each cycle ANON's current schema performs three distinct tasks [1]. First, appropriately placed ink marks are sought by schema specific combinations of circular and linear search patterns. These black marks are then used as seed points for the development of some low level descriptive primitive: line, arc or area outline. The final step is to interpret that primitive in the prevailing context; this results in a symbolic label or "token" which is attached to the primitive before it is passed to the control module. How then can this sequence of events be altered or extended to incorporate drawing memory? What is needed is some point of access and some way of "tokenising" the instance(s) returned. There is clearly no need to develop schemata; complete constructs will be supplied directly from memory.

Consider access first. Since drawing memory contains schemata extracted from the grey level array, the latter provides a spatial index to the former. Furthermore, as schemata only arise from inked areas of the drawing, the ink marks identified early in each cycle provide points

of access that are both natural and efficient. The current schema's knowledge of where to look for the next construct can therefore be used to initiate analysis of both the drawing image and the drawing memory. As several marks typically arise on each cycle a schema specific function must decide which candidate(s) to pursue first.

ANON's descriptions are organised hierarchically (e.g. figure 4). Each schema instance contains both a geometrical summary of the construct it represents and links to other schemata representing its components. Dimensions, for example, comprise one text, one leader and two witness instances. Each witness in turn contains a root line, an (optional) chain and a junction instance marking the dimensioned point. At the leaves of the tree are instances of the system's primitives.
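A minimal sketch of such a hierarchy is given below. The class name `Schema`, the bounding-box form of the geometrical summary and the helper `make_witness` are assumptions made for illustration; text and leader instances are left unexpanded here, whereas in ANON they have components of their own.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Schema:
    cls: str                          # schema class name, e.g. "dimension"
    bbox: Tuple[int, int, int, int]   # assumed geometrical summary: x0, y0, x1, y1
    parts: List["Schema"] = field(default_factory=list)  # links to components

    def leaves(self):
        """Return the instances at the leaves of this schema tree."""
        if not self.parts:
            return [self]
        found = []
        for part in self.parts:
            found.extend(part.leaves())
        return found

def make_witness(bbox, with_chain=False):
    """A witness holds a root line, an optional chain and a junction."""
    parts = [Schema("line", bbox)]
    if with_chain:
        parts.append(Schema("chain", bbox))
    parts.append(Schema("junction", bbox))
    return Schema("witness", bbox, parts)

# One text, one leader and two witness instances make up a dimension.
dimension = Schema("dimension", (0, 0, 40, 10), [
    Schema("text", (15, 0, 25, 4)),
    Schema("leader", (0, 5, 40, 6)),
    make_witness((0, 0, 1, 10)),
    make_witness((39, 0, 40, 10), with_chain=True),
])
```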

Drawing memory is naturally searched top down; successively lower level components are examined until an instance is encountered which the current schema interprets as worthy of consideration by the strategy grammar. Suppose, for example, that a given ink mark overlays the leader line of a previously completed dimension schema. On first examination drawing memory reveals the entire dimension set. If this is not deemed suitable the next level sub-component is considered. This is the leader instance, which comprises a line with arrowheads. Should that also be found lacking attention passes to the next level down: the line. Having reached a leaf of the schema tree, search terminates: with success if the line is acceptable, with failure otherwise. As interpretations often overlap, a given ink mark may lie on more than one schema. Drawing memory therefore supplies an ordered list of instances at each level, precedence being given (somewhat arbitrarily in this context) to those most recently stored.
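The top down descent just described can be sketched as a level-by-level search. The representation is assumed for illustration: each `Node` records the identifiers of the ink marks it covers and a storage time `stamp`, and `acceptable` stands in for the current schema's judgement; none of these names come from ANON.

```python
class Node:
    """A stored schema instance (hypothetical representation)."""
    def __init__(self, cls, marks, parts=(), stamp=0):
        self.cls = cls
        self.marks = set(marks)   # ink marks lying on this instance
        self.parts = list(parts)  # sub-components, one level down
        self.stamp = stamp        # larger means more recently stored

def search_memory(roots, mark, acceptable):
    """Descend from whole constructs towards primitives, one level at a time."""
    level = [n for n in roots if mark in n.marks]
    while level:
        # Precedence at each level goes to the most recently stored instance.
        level.sort(key=lambda n: n.stamp, reverse=True)
        for node in level:
            if acceptable(node):
                return node
        # Nothing suitable here: consider the next level of sub-components.
        level = [c for n in level for c in n.parts if mark in c.marks]
    return None  # leaves reached without an acceptable instance

# The example from the text: a mark on the leader line of a stored dimension.
line = Node("line", {7})
leader = Node("leader", {7, 8, 9}, [line, Node("arrow", {8}), Node("arrow", {9})])
dimension = Node("dimension", {7, 8, 9}, [leader], stamp=1)
memory = [dimension]
```

A schema seeking a whole dimension is satisfied at the first level, while one seeking a bare line descends through the leader to the leaf.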

This top down search for appropriate instances in drawing memory neatly complements ANON's bottom-up interpretation of the drawing image [1]; the current schema develops bottom up while incorporating the most significant acceptable constructs from memory.

During image interpretation only primitives need to be tokenised; each schema therefore incorporates a function which characterises the relative geometric and other properties of the current instance and the latest line, arc or area outline. Memory search, however, may return instances of any class. A combinatorial problem arises; each schema class needs some way of tokenising every other. In the worst case this suggests an N x N array of tokenisation functions (N=16 at present). As such difficulties place restrictions on system growth, some coherent method of avoiding the problem must be found.

One of the issues addressed during the initial implementation of ANON was how an image interpretation system which essentially operates bottom up could hypothesise and test for the existence of high level constructs in top down fashion. The solution adopted [1,2] was to employ directed bottom up processing; that is, to have the current schema predict a primitive which should be part of the desired structure. If found, this primitive seeds bottom up analysis. Should an instance of the hypothesised class be constructed the test is positive.

Recall that drawing memory is accessed through ink marks which would otherwise be used as the basis of descriptive primitives. If, having hypothesised some high level entity, an instance of the appropriate class can be retrieved from memory via these marks it seems fair to assume that a similar instance would have been extracted from the image had bottom up interpretation continued. Hence only class is considered when tokenising any high level instances returned from memory. For these constructs at least tokenisation is trivial.

Greater care is needed when tokenising retrieved primitives. These lowest level entities are typically detected by more exploratory search patterns which generate a number of possibilities at once. Under such circumstances explicit geometrical and other tests are clearly required if the correct interpretation is to be identified. The tokenisation functions used during image interpretation can, however, be applied without change to primitives extracted from drawing memory. No extension of schema classes is therefore necessary.
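The two tokenisation routes, and the saving they bring, can be sketched as below. The token spellings, the set of primitive class names and the length test in `witness_line_tokeniser` are hypothetical; only the split between class-only and geometric tokenisation, and the figure N = 16, come from the text.

```python
# Classes treated as primitives in this sketch (line, arc, area outline).
PRIMITIVE_CLASSES = {"line", "arc", "outline"}

N_CLASSES = 16
pairwise_functions = N_CLASSES * N_CLASSES  # worst case: every class tokenises every other
per_class_functions = N_CLASSES             # one primitive tokeniser per schema class

def tokenise(instance, primitive_tokeniser):
    """Return a token for an instance returned by image or memory search."""
    if instance["cls"] in PRIMITIVE_CLASSES:
        # Primitives, whatever their source, go through the geometric test
        # already used by the current schema during image interpretation.
        return primitive_tokeniser(instance)
    # High level instances from memory are tokenised by class alone.
    return instance["cls"].upper()

def witness_line_tokeniser(primitive):
    """Hypothetical geometric test a witness schema might apply to a line."""
    if primitive["cls"] == "line" and primitive["length"] >= 5:
        return "line"
    return "REJECT"
```

In this sketch the memory-derived token is spelt in capitals simply to distinguish it from the lower case symbols built bottom up.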

Our method of tokenising instances retrieved from memory is directly analogous to that employed during the examination of high-level predictions of image content by the original ANON. As noted then [1], only a limited amount of the knowledge embodied in the prediction/retrieved instance is actually exploited by the method. So far, however, this limitation does not appear to restrict the application of the technique.

In the current version of ANON constructs retrieved from drawing memory are given precedence over those extracted from the drawing image. Image analysis only proceeds when nothing worthy of further attention is supplied by memory search. One drawback of the approach is that the system can be misled by its memory. Situations arise in which a better construct is apparent in the image but not considered because the partially complete memory contains an acceptable, though suboptimal, construct. A particularly interesting aspect of this work lies in exploring the extent to which such errors disrupt interpretation. At the present level of development no significant difficulties have been encountered, though we are currently examining alternative ways of allocating priority.
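The precedence rule, and the way it can mislead, might be sketched as follows. The function names and the dictionary lookups are hypothetical, and it is our assumption that all candidate marks are tried against memory before any image analysis begins.

```python
def next_construct(marks, from_memory, from_image):
    """Prefer anything acceptable in memory; analyse the image only otherwise.

    from_memory(mark) and from_image(mark) are assumed to return a
    construct, or None when nothing worthy of attention is found.
    """
    for mark in marks:
        found = from_memory(mark)
        if found is not None:
            return ("memory", found)
    for mark in marks:
        found = from_image(mark)
        if found is not None:
            return ("image", found)
    return None

# Toy data: memory holds an acceptable construct at mark 3, although the
# image would have yielded a better one at the same place.
stored = {3: "acceptable witness"}
image = {3: "better witness", 5: "line"}

choice = next_construct([3, 5], stored.get, image.get)
```

Here `choice` comes from memory, so the better image construct is never considered, which is exactly the drawback noted above.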

STRATEGY RULES

An interesting feature of ANON's three layer structure is its separation of geometric and strategic knowledge; the former is confined to schema classes, the latter to the strategy grammar. There is no principled difference between instances built bottom up by image interpretation and those constructed previously and now extracted from drawing memory. We have therefore been able to use the strategy rules, without change, to manage the insertion of previously stored instances into the current schema. For example, the strategy grammar includes the rule

witness : line BREAK junction {wit.instantiate()};

which states that a witness instance may be instantiated and made current by location of suitably related [1,2] line and junction instances. All that need be done to incorporate data returned from drawing memory is to

add an extra clause;

witness : WITNESS;

Together, these allow a witness to be made current either bottom up or via the extraction from memory of a complete witness instance. All subsequent rules dealing with the non-terminal 'witness', e.g.

dimension : leader ABUTS witness {dim.instantiate()};

apply regardless of how any given witness was obtained. ANON's interpretation of image and memory is therefore very closely integrated, not only during search and tokenisation, but at the strategy level as well.
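The effect of the two alternatives for 'witness' can be illustrated with a toy shift-reduce loop over the three rules quoted above. This is not the yacc-generated LR parser used by ANON: it is a naive matcher written for illustration, and the semantic actions (`wit.instantiate()`, `dim.instantiate()`) are omitted.

```python
# Right-hand sides copied from the rules in the text; actions are omitted.
RULES = {
    "witness": [("line", "BREAK", "junction"), ("WITNESS",)],
    "dimension": [("leader", "ABUTS", "witness")],
}

def reduce_once(stack):
    """Apply the first rule whose right-hand side matches the top of the stack."""
    for lhs, alternatives in RULES.items():
        for rhs in alternatives:
            if tuple(stack[-len(rhs):]) == rhs:
                return stack[:-len(rhs)] + [lhs]
    return None

def parse(tokens):
    """Shift each token, then reduce for as long as some rule applies."""
    stack = []
    for token in tokens:
        stack.append(token)
        reduced = reduce_once(stack)
        while reduced is not None:
            stack = reduced
            reduced = reduce_once(stack)
    return stack

bottom_up = parse(["leader", "ABUTS", "line", "BREAK", "junction"])
via_memory = parse(["leader", "ABUTS", "WITNESS"])
```

Both token sequences reduce to a single 'dimension', showing that the rule for dimensions need not know how its witness was obtained.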

COINCIDENCE LINKS

The practical goal of this research is the identification of interactions between the current schema and those already located and now held in drawing memory. Given the closely linked image and memory searches described above this is easily achieved. Whenever a previously stored instance is reactivated, drawing memory simply returns a copy of that instance augmented with a "coincidence link" to the original, which remains in memory. The copy may then be incorporated into whatever construct is developing at that time while retaining an explicit link to its previous, possibly conflicting, interpretation.
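A coincidence link of this kind might be represented as sketched below. The names `Instance`, `DrawingMemory`, `reactivate` and `coincides_with` are hypothetical, and the use of a shallow copy (so that the copy shares its sub-components with the original) is our assumption.

```python
import copy
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass(eq=False)  # instances are compared by identity, not by content
class Instance:
    cls: str
    parts: List["Instance"] = field(default_factory=list)
    coincides_with: Optional["Instance"] = None  # the coincidence link

class DrawingMemory:
    def __init__(self):
        self.stored = []

    def store(self, instance):
        """Place a completed schema instance in memory."""
        self.stored.append(instance)

    def reactivate(self, instance):
        """Return a copy linked to the original, which stays in memory."""
        duplicate = copy.copy(instance)  # shallow copy: an assumption
        duplicate.coincides_with = instance
        return duplicate

memory = DrawingMemory()
witness1 = Instance("witness", [Instance("line"), Instance("junction")])
dimension1 = Instance("dimension", [Instance("text"), Instance("leader"),
                                    witness1, Instance("witness")])
memory.store(dimension1)

# A second dimension reuses the shared witness through memory.
shared = memory.reactivate(witness1)
dimension2 = Instance("dimension", [Instance("text"), Instance("leader"),
                                    Instance("witness"), shared])
```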

Figure 4. a) ANON's interpretation of the upper left dimensioning of figure 1 and b) the underlying hierarchical representation.

Consider the two nested dimension sets sharing a witness line at the top left of figure 1. Figure 4a depicts ANON's interpretation of the uppermost construct while figure 4b shows the hierarchical description underlying this display. Once the dimension instance is complete it is placed in drawing memory and so becomes available to the current schema on subsequent perceptual cycles. When attention falls on the second piece of dimensioning its leader, text and one of its witness instances are

extracted from the image. The second, leftmost witness, however, is obtained, complete, from memory. Figure 5 shows the content of drawing memory after the second dimension has been completed. A coincidence link (figure 5b) is in place between the two overlapping witness instances, which are marked bold in figure 5a.

Figure 5. a) ANON's output after completion of a second dimension instance. b) The corresponding (summarised) schema-based description. [Diagram b: dimension1 (witness1, witness2, word1, leader1) and dimension2 (witness3, witness4, word2, leader2), with a coincidence link between two of the witness instances.]

Figure 6 shows ANON's final interpretation of the drawing of figure 1; coinciding instances are drawn bold. The system's description of the original drawing corresponds well to human interpretation; high level dimensioning, chain, text and physical outline schemata account for most of the image. Note the presence of coincidence links between all shared witnesses and between the junction instances appearing in both witness and physical outline schemata (marked by black dots). This association of dimension schema with dimensioned object is particularly informative, raising the possibility of computing a more complete description by propagating the given dimensions around the physical outline. Overlaps between chained line schemata contributing to centreline and witness instances are also visible to the right of the figure. Despite being only a step on the way to full schema integration, coincidence links already provide a richer description of drawing content.

Figure 6. ANON's interpretation of figure 1. Coinciding schemata are drawn bold.

PARTIAL INTERPRETATIONS AND SCHEMA FUSION

During image interpretation, the current schema seeks to extend itself by searching for further, or missing, sub-components. As these are located they are incorporated into the current schema under the control of the strategy grammar. Only constructs at a lower level in the schema hierarchy than the current instance may be added in this way, for two reasons. First, one of our most important design goals is that ANON should exploit local context whenever possible during its analysis of the drawing image. By only allowing the current schema to accumulate lower level instances, we ensure that the schema with the greatest available context is in control. Second, we believe that the sequential processing style that arises from this restriction is natural given the task at hand. Mechanical drawings are man-made artifacts intended to communicate information according to a predefined convention: they are read rather than perceived.

The addition of drawing memory, however, provides ANON with the opportunity to extend this operation somewhat. Top down examination of memory may result in an instance of the same class as the current schema being discovered during a search intended to locate only the next sub-component. Under these circumstances the two overlapping schemata may clearly be merged (cf. [12]) to form a single, usually larger, construct.

Suppose, for example, that a developing cross hatching instance overlaps one already held in memory. The (new) strategy rule

hatch : hatch PARALLEL HATCH {hatch.fuse()};

combines these two schemata to create a single instance representing the full extent of the known hatched region. As hatching accounts for arbitrary areas of the drawing it is impractical to expect entire regions to be extracted, reliably, in one pass. The level of strategic knowledge required would appear prohibitive. ANON's cross hatching strategies are therefore limited to seeking convex regions which may be fused together later to form more general hatched areas. This ability to locate entities piecemeal and assemble a more complete

description in memory greatly improves ANON's final interpretation; schema fusion leads to fewer instances, each covering a larger portion of the image. The system is currently able to fuse hatching, physical outlines and centrelines. It should be stressed here that the strategies added to control fusion of like schemata are the only extensions made to the grammar during the course of the present work.
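Fusion of like schemata can be sketched with hatched regions reduced to sets of grid cells. This representation, the function name `fuse_hatching` and the use of simple overlap as the fusion test are assumptions; ANON's rule additionally requires the hatching to be parallel, which is not modelled here.

```python
def fuse_hatching(current, stored_regions):
    """Merge the developing hatch region with every stored region it touches.

    Regions are sets of (row, column) cells. Fusion is repeated until no
    further stored region overlaps the growing result, so chains of
    overlapping pieces collapse into one instance.
    """
    fused = set(current)
    pending = [set(region) for region in stored_regions]
    changed = True
    while changed:
        changed = False
        for region in list(pending):
            if fused & region:
                fused |= region
                pending.remove(region)
                changed = True
    return fused, pending

current = {(0, 0), (0, 1)}
stored = [{(0, 1), (0, 2)}, {(5, 5)}, {(0, 2), (0, 3)}]
fused, untouched = fuse_hatching(current, stored)
```

Three of the four pieces collapse into one region in this example, giving fewer instances that each cover a larger portion of the image.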

A similar wait-and-see approach is applied to partial interpretations. One of the drawbacks of schema based systems is that instances are frequently created but not completed. Such partially filled schemata can represent significant processing effort, but are often discarded immediately. ANON now defers its rejection of (some) partial instances until processing terminates. Those partly filled schemata that are large enough to warrant further attention (e.g. dimensions lacking only text) are held in drawing memory, but labelled as partial. This gives the system every opportunity to exploit such constructs later in the interpretation. Any schemata which remain partial on termination are discarded.
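The deferred rejection of partial instances might look like the sketch below. The dictionary representation, the part names in `EXPECTED` and the `max_missing` criterion for "large enough" are all hypothetical; the paper does not state how that judgement is made.

```python
# Parts a complete dimension is expected to hold (names are illustrative).
EXPECTED = {"dimension": {"text", "leader", "witness_a", "witness_b"}}

def missing_parts(schema):
    return EXPECTED[schema["cls"]] - set(schema["parts"])

def store(memory, schema, max_missing=1):
    """Hold a schema in memory, labelled partial if anything is missing."""
    missing = missing_parts(schema)
    if len(missing) > max_missing:
        return False                   # too incomplete: rejected at once
    schema["partial"] = bool(missing)
    memory.append(schema)
    return True

def add_part(schema, part):
    """Fill in a part found later and refresh the partial label."""
    schema["parts"].append(part)
    schema["partial"] = bool(missing_parts(schema))

def terminate(memory):
    """On termination, discard whatever is still partial."""
    return [s for s in memory if not s["partial"]]

memory = []
d1 = {"cls": "dimension", "parts": ["leader", "witness_a", "witness_b"]}
d2 = {"cls": "dimension", "parts": ["leader"]}
d3 = {"cls": "dimension", "parts": ["leader", "witness_a", "witness_b"]}
kept = [store(memory, d) for d in (d1, d2, d3)]
add_part(d1, "text")                   # the missing text is located later
final = terminate(memory)
```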

CONCLUSION

Our ability to incorporate a drawing memory into ANON's perceptual cycle with only minimal extensions to the system's knowledge base is encouraging. One can characterise drawing interpretation as the search for acceptable structures in the presence of distractions; when pursuing a given hypothesis it is important not to be led astray by the extraneous linework and symbols which typically disrupt the target entity. Although it supplies much useful information, drawing memory is also a clear source of additional distractions. That this increase in potentially misleading data does not cause ANON to fail provides further evidence of the success of the basic design.

The extension of ANON's visual search to include access to a visual memory has substantially improved the performance of the system. A richer description of drawing content is now produced which comprises fewer, larger schemata and makes explicit any (potentially) conflicting interpretations. Partial interpretations are stored and retrieved when necessary, reducing the amount of wasted interpretation effort and so improving efficiency.

Despite this, further extensions are necessary. Considerable work remains in identifying suitable methods for the resolution of conflicting interpretations. ANON's explicit representation of overlapping schemata, however, represents a significant step towards the construction of the "scene constraint graph" [cf. 9] which will form the basis of such work. The development of conflict resolution facilities appropriate to ANON is the subject of ongoing research.

In its present form ANON is perhaps closest to the aerial image interpretation system SIGMA [12]; both are schema driven, both use the same knowledge base to mediate access to the image and the developing interpretation, both give previously stored instances precedence over new constructs. The major difference in emphasis is that Matsuyama and Hwang explore decentralisation as a route to domain independent control, but keep to a comparatively restricted hierarchy of schemata, whereas

ANON confines control issues to an explicit, domain dependent strategic knowledge base, but explores the problems inherent in a substantial and extendable hierarchy. This differing balance is largely due to the disparate nature of the two application areas.

REFERENCES

1. S.H. Joseph and T.P. Pridmore "Knowledge directed interpretation of mechanical engineering drawings" IEEE Trans. PAMI, in press.

2. T.P. Pridmore and S.H. Joseph "Using Schemata to Interpret Images of Mechanical Engineering Drawings" Proc. ECAI-90, Stockholm (1990).

3. V. Nagasamy and N.A. Langrana "Engineering drawing processing and vectorisation system" Computer Vision, Graphics and Image Processing 49 pp 379-397 (1990).

4. U. Neisser Cognition and Reality: Principles and Implications of Cognitive Psychology W.H. Freeman, San Francisco (1976).

5. S.H. Joseph "Processing of engineering line drawings for automatic input to CAD" Pattern Recognition 22 pp 1-11 (1989).

6. S.J. Cheetham The automatic extraction and classification of curves from conventional line drawings PhD Thesis, University of Sheffield (1988).

7. S.C. Johnson "Yacc - yet another compiler compiler" Comp. Sci. Tech. Rep. No. 32, Bell Laboratories, New Jersey (1975).

8. A.R. Rao and A.K. Jain "Knowledge representation and control in computer vision systems" IEEE Expert pp 64-79 (1988).

9. D. Dori "A syntactic/geometric approach to recognition of dimensions in engineering machine drawings" Computer Vision, Graphics and Image Processing 47 pp 271-291 (1989).

10. J. McDermott "Making Expert Systems Explicit" in Information Processing '86 ed. H.J. Kugler, Elsevier Science Publishers (North-Holland) (1986).

11. A.R. Hanson and E.M. Riseman "VISIONS: a computer system for interpreting scenes" in Computer Vision Systems ed. A. Hanson and E. Riseman, Academic Press (1978).

12. T. Matsuyama and V. Hwang "SIGMA: a framework for image understanding - integration of bottom-up and top-down analyses" Proc. 9th IJCAI pp 908-915 (1985).

13. J.A. Mulder, A.K. Mackworth and W.S. Havens "Knowledge structuring and constraint satisfaction: the MAPSEE approach" IEEE Trans. PAMI 10 6 pp 866-879 (1988).
