35
MetaLexer: A Modular Lexical Specification Language Andrew Casey Laurie Hendren McGill University www.sable.mcgill.ca/ metalexer

MetaLexer : A Modular Lexical Specification Language

  • Upload
    malha

  • View
    60

  • Download
    3

Embed Size (px)

DESCRIPTION

Andrew Casey Laurie Hendren McGill University. MetaLexer : A Modular Lexical Specification Language. www.sable.mcgill.ca/metalexer. TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: A A A A A A A A A A A A A A A A A A A A A. Why MetaLexer ? - PowerPoint PPT Presentation

Citation preview

Page 1: MetaLexer :  A Modular Lexical  Specification Language

MetaLexer: A Modular Lexical

Specification LanguageAndrew CaseyLaurie Hendren McGill University

www.sable.mcgill.ca/metalexer

Page 2: MetaLexer :  A Modular Lexical  Specification Language

2

Why MetaLexer?

Why is it relevant to AOSD?

What are the challenges?

Page 3: MetaLexer :  A Modular Lexical  Specification Language

3

Structure of a Compiler Front-End

Scanner (Lexical Analysis)

Parser and Semantic Checks

Context-free grammars + actions/attributes (yacc, bison, Polyglot, JastAdd, ...)

Regular Expressions +State (flex, jflex, ...)

Page 4: MetaLexer :  A Modular Lexical  Specification Language

4

Given a front-end specification for a

language (i.e. Java), current method to

implement a front-end for an extension of that language (i.e.

AspectJ)?Lexical specification for original language

Grammar and actions for original language

Modified lexical specification for extended language

Grammar and actions for original languageGrammar

rules for extension

Page 5: MetaLexer :  A Modular Lexical  Specification Language

5

Desired Modular MetaLexer Approach

Lexical specification for original language

Grammar and actions for original language

Lexical specification for original language

Grammar and actions for original language

Grammar rules for

extension

Lexical rules for

extension

Page 6: MetaLexer :  A Modular Lexical  Specification Language

6

We also want to be able to combine lexical

specifications for diverse languages.

• Java + HTML• Java + Aspects (AspectJ)• Java + SQL• MATLAB + Aspects (AspectMatlab)

Page 7: MetaLexer :  A Modular Lexical  Specification Language

7

Would like to be able to reuse and

extend lexical

specification modules• Nested C-style comments

• Javadoc comments• Floating-point constants• URL• regular expressions• …

Page 8: MetaLexer :  A Modular Lexical  Specification Language

8

Scanning AspectJ

Page 9: MetaLexer :  A Modular Lexical  Specification Language

9

Scanning Java/Javadoc

Page 10: MetaLexer :  A Modular Lexical  Specification Language

10

First, let’s understand

the traditional lexer tools (lex, flex,

jflex). • programmer specifies regular expressions + actions• tools generate a finite automaton-based implementation• states are used to handle different language contexts

Page 11: MetaLexer :  A Modular Lexical  Specification Language

11

1 %%2 %class Lexer3 Identi¯er = [: jletter :] [: jletterdigit :]¤4 ...5 %state STR ING6 %%7 <Y Y IN IT IA L> f8 "abstract" f return symbol(sym.ABSTRACT ); g9 f Identi¯er g f return symbol(sym.IDENT IF IER ); g10 n" f string.setLength(0); yybegin(ST R ING); g11 ...12 g13

14 <STR ING> f15 n" f yybegin(Y Y IN IT IA L ); return ...; g16 [̂ nnnrn"nn]+ f string.append( yytext() ); g17 nnt f string.append('nt '); g18 ...19 g

Page 12: MetaLexer :  A Modular Lexical  Specification Language

12

Current (ugly) method for extending jflex specifications -

copy&modify

Principled way of weaving new rules into existing rules.

Modular and abstract notion of state and changing between states.

Copy jflex specification. Insert new scanner rules into copy.

Order of rules matters! Introduce new states and action logic for converting

between states.

Page 13: MetaLexer :  A Modular Lexical  Specification Language

13

Jflex Lexing Structure

Lexing rules associated with a state. Changing states associated with action

code.

Specification in one file.

Page 14: MetaLexer :  A Modular Lexical  Specification Language

14

MetaLexer Structure

Components define lexing rules associated with a state and produce meta-tokens.

Layout defines transitions between components, state changes by meta-lexer.

Each component specified in its own file.

Layout specified in its

own file.

Page 15: MetaLexer :  A Modular Lexical  Specification Language

15

Structure of a MetaLexer Specification for Matlab

Page 16: MetaLexer :  A Modular Lexical  Specification Language

16

Extending a MetaLexer Specification for Matlab

Page 17: MetaLexer :  A Modular Lexical  Specification Language

17

Sharing component specifications with MetaLexer

Page 18: MetaLexer :  A Modular Lexical  Specification Language

18

Scanning a properties file1 #some properties2 name=properties3 date=2009/ 09/ 214

5 #some more properties6 owner=root

Properties

Key Value

Util_Patterns

Page 19: MetaLexer :  A Modular Lexical  Specification Language

19

util_properties.mlc helper component

1 %component util patterns2 %helper3

4 lineTerminator = [nrnn] j "nrnn"5 otherWhitespace = [ ntnfnb]6 identi¯er = [a¡ zA ¡ Z][a¡ zA ¡ Z0¡ 9 ]¤7 comment = #[̂ nrnn]¤

Page 20: MetaLexer :  A Modular Lexical  Specification Language

20

key.mlc component1 %component key2 %extern "Token symbol(int)"3 %extern "Token symbol(int, String)"4 %extern "void error(String) throws LexerException"5

6 %%7

8 %%inherit util patterns9 f lineTerminatorg f : / ¤ignore¤/ :g10 f otherWhitespaceg f : / ¤ignore¤/ :g11 "=" f : return symbol(A SSIGN); :g ASSIGN12 %:13 f identi¯er g f : return symbol(K EY , yytext()); :g14 f commentg f : / ¤ignore¤/ :g15 %:16 <<ANY >> f : error("Unexpected char '"+yytext()+" '" ); :g17 <<EOF>> f : return symbol(EOF ); :g

Page 21: MetaLexer :  A Modular Lexical  Specification Language

21

value.mlc component

1 %component value2 %extern "Token symbol(int, String, int, int , int , int )"3 %appendf4 return symbol(VA LUE, text, startL ine, startCol,5 endL ine, endCol);6 %appendg7

8 %%9

10 %%inherit util patterns11 f lineTerminatorg f : :g L INE T ERM INATOR12 %:13 %:14 <<ANY >> f : append(yytext()); :g15 <<EOF>> f : :g L INE T ERM INATOR

Page 22: MetaLexer :  A Modular Lexical  Specification Language

22

properties.mll layout1 package properties;2 %%3 import static properties.TokenTypes.¤;4 %%5 %layout properties6 %option public "%public"7 ...8 %lexthrow "LexerException"9 %component key10 %component value11 %start key12 %%13 %%embed14 %name key value15 %host key16 %guest value17 %start A SSIGN18 %end LINE T ERM INATOR

Page 23: MetaLexer :  A Modular Lexical  Specification Language

23

MetaLexer is implemented and available:www.sable.mcgill.ca/metalexer

properties.mll

key.mlc

value.mlc

util_patterns.mlc

MetaLexerproperties.jflex

Page 24: MetaLexer :  A Modular Lexical  Specification Language

24

Key problems to solve: How to implement the meta-token lexer?

How to allow for insertion of new components, replacing of components, adding new embeddings (metalexer transitions).

How to insert new patterns into components as specific points.

Page 25: MetaLexer :  A Modular Lexical  Specification Language

25

Implementing the meta-token lexer

Recognize the

matching suffix.

Recognize a meta-

pattern, i.e. when to go

to a new component and when to

return.

Page 26: MetaLexer :  A Modular Lexical  Specification Language

26

Implementing inheritance (structured weaving).

Page 27: MetaLexer :  A Modular Lexical  Specification Language

27

Implementing MetaLexer layout inheritance

• Layouts can inherit other layouts

• %inherit directive put at the location at which the inherited transition rules (embeddings) should be placed.

• each %inherit directive can be followed by:• %unoption• %replace• %unembed• new embeddings

Page 28: MetaLexer :  A Modular Lexical  Specification Language

28

Implementing MetaLexer component inheritance

Page 29: MetaLexer :  A Modular Lexical  Specification Language

29

O

Weaving in inherited component

Original Component

New Component adds some rules and inherits original component.

Woven output

Page 30: MetaLexer :  A Modular Lexical  Specification Language

30

Results:Applied to three projects with complex scanners:

• AspectJ (abc and extensions)

• Matlab (Annotations and AspectMatlab extensions)

• MetaLexer

Page 31: MetaLexer :  A Modular Lexical  Specification Language

31

1 %%embed2 %name perclause3 %host aspect decl4 %guest pointcut5 %start [P ERCFLOW PERCFLOWBELOW PERTARGET6 P ERTHIS] LPAREN7 %end RPAREN8 %pair LPAREN , RPAREN9

10 %%embed11 %name pointcut12 %host java, aspect13 %guest pointcut14 %start POINTCUT15 %end SEM ICOLON

AspectJ and Extensions

Page 32: MetaLexer :  A Modular Lexical  Specification Language

32

Using MetaLexer for an extensible front end for

McLab

PLDI 2011 Tutorial on McLab!!!!!

Page 33: MetaLexer :  A Modular Lexical  Specification Language

33

MetaLexer scanner implemented in MetaLexer

1st version of MetaLexer written in JFlex, one for components and one for layouts.

2nd version implemented in MetaLexer, many shared components between the component lexer and the layout lexer.

Page 34: MetaLexer :  A Modular Lexical  Specification Language

34

Ad-hoc systems with separate scanner/ LALR parser Polyglot JastAdd abc

Recursive-descent scanner/parser ANTLR and systems using ANTLR

Scannerless systems Rats! (PEGs)

Integrated systems Copper (modified LALR parser which communicates with DFA-based

scanner)

Related Work

Page 35: MetaLexer :  A Modular Lexical  Specification Language

35

Conclusions MetaLexer allows one to specify modular

and extensible scanners suitable for any system that works with JFlex.

Two main ideas: meta-lexing and component/layout inheritance.

Used in large projects such as abc, McLab and MetaLexer itself.

Available at: www.sable.mcgill.ca/metalexer