MetaLexer : A Modular Lexical Specification Language

Preview:

DESCRIPTION

Andrew Casey Laurie Hendren McGill University. MetaLexer : A Modular Lexical Specification Language. www.sable.mcgill.ca/metalexer. TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: A A A A A A A A A A A A A A A A A A A A A. Why MetaLexer ? - PowerPoint PPT Presentation

Citation preview

MetaLexer: A Modular Lexical

Specification LanguageAndrew CaseyLaurie Hendren McGill University

www.sable.mcgill.ca/metalexer

2

Why MetaLexer?

Why is it relevant to AOSD?

What are the challenges?

3

Structure of a Compiler Front-End

Scanner (Lexical Analysis)

Parser and Semantic Checks

Context-free grammars + actions/attributes (yacc, bison, Polyglot, JastAdd, ...)

Regular Expressions +State (flex, jflex, ...)

4

Given a front-end specification for a

language (i.e. Java), current method to

implement a front-end for an extension of that language (i.e.

AspectJ)?Lexical specification for original language

Grammar and actions for original language

Modified lexical specification for extended language

Grammar and actions for original languageGrammar

rules for extension

5

Desired Modular MetaLexer Approach

Lexical specification for original language

Grammar and actions for original language

Lexical specification for original language

Grammar and actions for original language

Grammar rules for

extension

Lexical rules for

extension

6

We also want to be able to combine lexical

specifications for diverse languages.

• Java + HTML• Java + Aspects (AspectJ)• Java + SQL• MATLAB + Aspects (AspectMatlab)

7

Would like to be able to reuse and

extend lexical

specification modules• Nested C-style comments

• Javadoc comments• Floating-point constants• URL• regular expressions• …

8

Scanning AspectJ

9

Scanning Java/Javadoc

10

First, let’s understand

the traditional lexer tools (lex, flex,

jflex). • programmer specifies regular expressions + actions• tools generate a finite automaton-based implementation• states are used to handle different language contexts

11

1 %%2 %class Lexer3 Identi¯er = [: jletter :] [: jletterdigit :]¤4 ...5 %state STR ING6 %%7 <Y Y IN IT IA L> f8 "abstract" f return symbol(sym.ABSTRACT ); g9 f Identi¯er g f return symbol(sym.IDENT IF IER ); g10 n" f string.setLength(0); yybegin(ST R ING); g11 ...12 g13

14 <STR ING> f15 n" f yybegin(Y Y IN IT IA L ); return ...; g16 [̂ nnnrn"nn]+ f string.append( yytext() ); g17 nnt f string.append('nt '); g18 ...19 g

12

Current (ugly) method for extending jflex specifications -

copy&modify

Principled way of weaving new rules into existing rules.

Modular and abstract notion of state and changing between states.

Copy jflex specification. Insert new scanner rules into copy.

Order of rules matters! Introduce new states and action logic for converting

between states.

13

Jflex Lexing Structure

Lexing rules associated with a state. Changing states associated with action

code.

Specification in one file.

14

MetaLexer Structure

Components define lexing rules associated with a state and produce meta-tokens.

Layout defines transitions between components, state changes by meta-lexer.

Each component specified in its own file.

Layout specified in its

own file.

15

Structure of a MetaLexer Specification for Matlab

16

Extending a MetaLexer Specification for Matlab

17

Sharing component specifications with MetaLexer

18

Scanning a properties file1 #some properties2 name=properties3 date=2009/ 09/ 214

5 #some more properties6 owner=root

Properties

Key Value

Util_Patterns

19

util_properties.mlc helper component

1 %component util patterns2 %helper3

4 lineTerminator = [nrnn] j "nrnn"5 otherWhitespace = [ ntnfnb]6 identi¯er = [a¡ zA ¡ Z][a¡ zA ¡ Z0¡ 9 ]¤7 comment = #[̂ nrnn]¤

20

key.mlc component1 %component key2 %extern "Token symbol(int)"3 %extern "Token symbol(int, String)"4 %extern "void error(String) throws LexerException"5

6 %%7

8 %%inherit util patterns9 f lineTerminatorg f : / ¤ignore¤/ :g10 f otherWhitespaceg f : / ¤ignore¤/ :g11 "=" f : return symbol(A SSIGN); :g ASSIGN12 %:13 f identi¯er g f : return symbol(K EY , yytext()); :g14 f commentg f : / ¤ignore¤/ :g15 %:16 <<ANY >> f : error("Unexpected char '"+yytext()+" '" ); :g17 <<EOF>> f : return symbol(EOF ); :g

21

value.mlc component

1 %component value2 %extern "Token symbol(int, String, int, int , int , int )"3 %appendf4 return symbol(VA LUE, text, startL ine, startCol,5 endL ine, endCol);6 %appendg7

8 %%9

10 %%inherit util patterns11 f lineTerminatorg f : :g L INE T ERM INATOR12 %:13 %:14 <<ANY >> f : append(yytext()); :g15 <<EOF>> f : :g L INE T ERM INATOR

22

properties.mll layout1 package properties;2 %%3 import static properties.TokenTypes.¤;4 %%5 %layout properties6 %option public "%public"7 ...8 %lexthrow "LexerException"9 %component key10 %component value11 %start key12 %%13 %%embed14 %name key value15 %host key16 %guest value17 %start A SSIGN18 %end LINE T ERM INATOR

23

MetaLexer is implemented and available:www.sable.mcgill.ca/metalexer

properties.mll

key.mlc

value.mlc

util_patterns.mlc

MetaLexerproperties.jflex

24

Key problems to solve: How to implement the meta-token lexer?

How to allow for insertion of new components, replacing of components, adding new embeddings (metalexer transitions).

How to insert new patterns into components as specific points.

25

Implementing the meta-token lexer

Recognize the

matching suffix.

Recognize a meta-

pattern, i.e. when to go

to a new component and when to

return.

26

Implementing inheritance (structured weaving).

27

Implementing MetaLexer layout inheritance

• Layouts can inherit other layouts

• %inherit directive put at the location at which the inherited transition rules (embeddings) should be placed.

• each %inherit directive can be followed by:• %unoption• %replace• %unembed• new embeddings

28

Implementing MetaLexer component inheritance

29

O

Weaving in inherited component

Original Component

New Component adds some rules and inherits original component.

Woven output

30

Results:Applied to three projects with complex scanners:

• AspectJ (abc and extensions)

• Matlab (Annotations and AspectMatlab extensions)

• MetaLexer

31

1 %%embed2 %name perclause3 %host aspect decl4 %guest pointcut5 %start [P ERCFLOW PERCFLOWBELOW PERTARGET6 P ERTHIS] LPAREN7 %end RPAREN8 %pair LPAREN , RPAREN9

10 %%embed11 %name pointcut12 %host java, aspect13 %guest pointcut14 %start POINTCUT15 %end SEM ICOLON

AspectJ and Extensions

32

Using MetaLexer for an extensible front end for

McLab

PLDI 2011 Tutorial on McLab!!!!!

33

MetaLexer scanner implemented in MetaLexer

1st version of MetaLexer written in JFlex, one for components and one for layouts.

2nd version implemented in MetaLexer, many shared components between the component lexer and the layout lexer.

34

Ad-hoc systems with separate scanner/ LALR parser Polyglot JastAdd abc

Recursive-descent scanner/parser ANTLR and systems using ANTLR

Scannerless systems Rats! (PEGs)

Integrated systems Copper (modified LALR parser which communicates with DFA-based

scanner)

Related Work

35

Conclusions MetaLexer allows one to specify modular

and extensible scanners suitable for any system that works with JFlex.

Two main ideas: meta-lexing and component/layout inheritance.

Used in large projects such as abc, McLab and MetaLexer itself.

Available at: www.sable.mcgill.ca/metalexer

Recommended