44
Bard

Bard. ASLs An Application-Specific Language (ASL) is a programming language that supports: –a particular class of applications –in a highly idiosyncratic

Embed Size (px)

Citation preview

Bard

ASLs

• An Application-Specific Language (ASL) is a programming language that supports:– a particular class of applications– in a highly idiosyncratic domain

• In order to understand a particular ASL, it is usually necessary to understand the domain

• An ASL can be extremely effective when used for its intended purpose

Bard

• Bard is designed to perform pattern matching and manipulation of Abstract Syntax Trees (ASTs) in the reengineering domain.

• In order to understand Bard, it is necessary to understand a little about ASTs and reengineering

Outline of the talk

• This talk will cover:– What reengineering is all about – What an Abstract Syntax Tree (AST) is – What an "idiom" is – What the Idiom Tool does – How Bard and the Idiom Tool work together– The design decisions behind Bard

Legacy code

• The legacy code problem:– About 80% of programmer time goes to

maintenance – New software methodologies are the preferred

solution (that is, do it over from scratch!)

• BUT– 40 years of working systems can't be

abandoned.

Software reengineering

• Software reengineering is any activity that– improves one's understanding of software– improves the software documentation – improves the software itself – brings legacy code up to more current standards– aids in a new implementation of the software

Why reengineer?

• We reengineer code for many reasons:– add functionality– ease maintenance – move to new platforms and/or new languages – facilitate reuse – improve performance – decrease cost of keeping obsolete systems– combat software rot

The reengineering process

• Reverse engineering moves from "software assets" (code, documentation, etc.) to a higher level of abstraction

• Transformations modify the system at some level of abstraction

• Forward engineering moves from a higher level of abstraction back to code

Reengineering levels

• Reengineering can be carried out at many levels:

Idiom tool

• Idiom Tool is the part of the reengineering suite that performs transformations on the AST

Creating the Abstract Syntax Tree

• Parsers convert programs written in various languages into their AST representations

• This AST can be manipulated by Idiom Tool, or used as input for various other tools (to create flowcharts, data flow diagrams, concordances, etc.)

• Idiom Tool is a programmable subsystem

• The programming language used is Bard

Example of an AST (simplified)

• In C: if (n < 0) return -n;

Gray

• The ASTs can expressed in a language called Gray

• Gray is itself an ASL (Application-Specific Language)

• Since ASTs are trees, Gray borrows the syntax (but not the operators) of LISP

Example Gray code

• (If_Op (Less Identifier:n Number_Literal:0) (Return_Value (Negate Identifier:n)) Epsilon) )

Idioms

• An idiom is a conventional way of expressing an idea

• Programming languages, like natural languages, have idioms

• Examples:– for (i = 0; i < n; i++) A[i] = 0;– temp = x; x = y; y = temp;

Idioms change

• Idioms vary over time and across languages– Arrays may be zero-based or one-based– FORTRAN used parallel arrays to simulate structs– Languages may use array indices to simulate

pointers– C lacks multiply-dimensioned arrays– No early languages were object-oriented– COBOL typically represented years as two digits

Recap

• To summarize, – ReEngineer is a tool that uses – parsers to translate programs in various languages

into– Abstract Syntax Trees, which can be represented in

a linear fashion by – Gray code, and which can be manipulated by

– Idiom Tool, which uses programs written in– the Bard language.

The reengineering flow

Regular expressions

• Recognizing idioms in a program is a matter of recognizing certain kinds of patterns

• Regular expressions (regexps) are a standard, well-developed mechanism for doing pattern recognition on strings

• Regular expressions are used by grep, Perl, Tcl, awk, Python, vi, emacs, and a number of other languages and tools

Common regexp operators

Expression Meaningabc The literal characters abc. Any single character [ x̂] Negation: any character but xx? Optional xx* Zero or more x's

More regexp operators

Expression Meaningx+ One or more x's (x| y) Either x or yxy x followed by yx̂ x at beginning of string

\1 Variable

Regular expressions in Bard

• Regular expressions in Bard are modelled after regular expressions in grep

• Because we are doing pattern matching on ASTs rather than strings, – Literals are represented by Gray expressions,– Regular expression syntax must be modified to

accomodate arbitrary Gray expressions, and– Regular expressions may occur within Gray

expressions, and vice versa

Some regexps in Bard

Expression Meaning(assignment identifier:x number_literal:0)

The literal assignment statementx=0

? Any node or subtree{not x} Negation: any node or subtree but

x{opt x} Optional x{* x} Zero or more x's

More regexps in Bard

Expression Meaning{+ x} One or more x's{x | y} Either x or yx y x followed by y?x Variable x$x Variable sequence x

Inner and outer languages

• Regular expressions are not enough -- we need some way to control pattern matching

• We also need to manipulate the parts of the AST found by pattern matching

• Solution: wrap a more-or-less conventional "outer language" around the "inner language" of regular expressions

• This is also the approach taken in Perl

Conventional statements in Bard

• Conventional statements in Bard include: – assignment statements – if statements – while loops – print statements – function and procedure calls – calls to native (Ada) code – tracing and debugging facilities

Special-purpose statements in Bard

• Examining and manipulating the AST: – match pattern [at position] – insert subtree as relative-position

• for example, insert identifier:x as first child;

– delete position – replace position with subtree – find pattern [up to position] – go to position

Efficiency

• ReEngineer has been used to process hundreds of thousands of lines of code

• Pattern-matching languages are not efficient enough for this purpose

• Bard procedures are inefficient

• Bard procedures are called only when there is a high likelihood that pattern matching will succeed

Fast pre-screening of nodes

• Each Bard routine specifies the types of nodes at which it might be applicable

• Idiom Tool walks the AST and calls Bard when it finds such a node

• Since Idiom Tool is written in Ada, it can screen nodes very, very fast

The structure of Bard procedures

• A "procedure" in Bard, like a "rule" in an expert system, consists of two parts:– The test part determines whether this particular

procedure is applicable at this point in the tree• Tests must not have have side effects

– The action part consists of arbitrary Bard code• Actions may modify the AST and/or may collect

and store information about the AST

Skeleton of a Bard procedure

• procedure name (pass, node_types) tests; -- may not have side effects commit; -- separates tests from actions actions; -- manipulate the ASTend procedure;

Bard example

idiom remove_null_statements; -- Remove null statements, except empty loop bodies. procedure remove_null_statements (1, Null_Statement); go to first child of parent; not match While_Op; not match Do_While; not match C_For_Loop; commit; go to @trigger; delete; end procedure;end idiom;

Diffuse patterns

• A pattern is compact if it is a single recognizable unit

• A pattern is diffuse if it consists of multiple pieces scattered throughout the AST

• Diffuse patterns may overlap in complex ways

• The problem is to collect the various pieces

• This is not easily solved with only simple data structures such as arrays

Dealing with diffuse patterns

• Idiom Tool makes multiple passes over the data base– Each Bard procedure indicates the pass for which it

is active– This allows some procedures to collect information

and other procedures to process the information.

• Idiom tool uses a fact base to store information between passes

Modifying the fact base

• Bard's fact base is modeled after that of Prolog

• Bard allows negative assertions as well as positive assertions

• assert fact;• deny fact;• retract fact;

Interrogating the fact base

• Because Bard does not used the Closed World Assumption, facts may be "true", "false", or (unlike Prolog) "unknown"

• true fact; • false fact; • known fact; • unknown fact;

Multiple occurrences

• It is sometimes necessary to recognize multiple occurrances of the same subtree. – For example, to recognize Pascal statements of

the form x:=x+1, you have to realize that the identifier x is repeated

• (Assignment Identifier:?id (Integer_Addition Identifier:?id Numeric_Literal:1))

Unification in Bard

• When pattern matching variable ?id against value v, – If ?id has no prior value, it is unified with v– If id? has a prior value, then the match succeeds

if and only if id? == v– If a test succeeds, all unifications performed

during the test are retained

– If a test fails, all unifications performed during the test are discarded

Summary I

• Idiom tool does multiple preorder traversals of the AST; at each node it may trigger one or more Bard procedures

• A Bard procedure consists of – a header that specifies appropriate node types– a test part that performs any additional tests– the keyword commit, and – an action part

Summary II

• Special features of Bard include– an "inner language" for pattern matching– procedures modeled after rules in expert systems– powerful tests that can comprise arbitrary actions– unification and unification unwinding– an integrated fact base

• These features make Bard far better for its purpose than a general language such as Ada

Summary III

• Bard has been used– in a Pascal to C translation system– to assist in converting CMS-2 to C– in a prototype Y2K system

Current status of Bard

• Currently owned by either Lockheed or Unisys

• No longer under active development

• Lost and presumed dead as a result of corporate mergers and reorganizations

The End