Upload
malha
View
60
Download
3
Embed Size (px)
DESCRIPTION
Andrew Casey Laurie Hendren McGill University. MetaLexer : A Modular Lexical Specification Language. www.sable.mcgill.ca/metalexer. TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: A A A A A A A A A A A A A A A A A A A A A. Why MetaLexer ? - PowerPoint PPT Presentation
Citation preview
MetaLexer: A Modular Lexical
Specification LanguageAndrew CaseyLaurie Hendren McGill University
www.sable.mcgill.ca/metalexer
2
Why MetaLexer?
Why is it relevant to AOSD?
What are the challenges?
3
Structure of a Compiler Front-End
Scanner (Lexical Analysis)
Parser and Semantic Checks
Context-free grammars + actions/attributes (yacc, bison, Polyglot, JastAdd, ...)
Regular Expressions +State (flex, jflex, ...)
4
Given a front-end specification for a
language (i.e. Java), current method to
implement a front-end for an extension of that language (i.e.
AspectJ)?Lexical specification for original language
Grammar and actions for original language
Modified lexical specification for extended language
Grammar and actions for original languageGrammar
rules for extension
5
Desired Modular MetaLexer Approach
Lexical specification for original language
Grammar and actions for original language
Lexical specification for original language
Grammar and actions for original language
Grammar rules for
extension
Lexical rules for
extension
6
We also want to be able to combine lexical
specifications for diverse languages.
• Java + HTML• Java + Aspects (AspectJ)• Java + SQL• MATLAB + Aspects (AspectMatlab)
7
Would like to be able to reuse and
extend lexical
specification modules• Nested C-style comments
• Javadoc comments• Floating-point constants• URL• regular expressions• …
8
Scanning AspectJ
9
Scanning Java/Javadoc
10
First, let’s understand
the traditional lexer tools (lex, flex,
jflex). • programmer specifies regular expressions + actions• tools generate a finite automaton-based implementation• states are used to handle different language contexts
11
1 %%2 %class Lexer3 Identi¯er = [: jletter :] [: jletterdigit :]¤4 ...5 %state STR ING6 %%7 <Y Y IN IT IA L> f8 "abstract" f return symbol(sym.ABSTRACT ); g9 f Identi¯er g f return symbol(sym.IDENT IF IER ); g10 n" f string.setLength(0); yybegin(ST R ING); g11 ...12 g13
14 <STR ING> f15 n" f yybegin(Y Y IN IT IA L ); return ...; g16 [̂ nnnrn"nn]+ f string.append( yytext() ); g17 nnt f string.append('nt '); g18 ...19 g
12
Current (ugly) method for extending jflex specifications -
copy&modify
Principled way of weaving new rules into existing rules.
Modular and abstract notion of state and changing between states.
Copy jflex specification. Insert new scanner rules into copy.
Order of rules matters! Introduce new states and action logic for converting
between states.
13
Jflex Lexing Structure
Lexing rules associated with a state. Changing states associated with action
code.
Specification in one file.
14
MetaLexer Structure
Components define lexing rules associated with a state and produce meta-tokens.
Layout defines transitions between components, state changes by meta-lexer.
Each component specified in its own file.
Layout specified in its
own file.
15
Structure of a MetaLexer Specification for Matlab
16
Extending a MetaLexer Specification for Matlab
17
Sharing component specifications with MetaLexer
18
Scanning a properties file1 #some properties2 name=properties3 date=2009/ 09/ 214
5 #some more properties6 owner=root
Properties
Key Value
Util_Patterns
19
util_properties.mlc helper component
1 %component util patterns2 %helper3
4 lineTerminator = [nrnn] j "nrnn"5 otherWhitespace = [ ntnfnb]6 identi¯er = [a¡ zA ¡ Z][a¡ zA ¡ Z0¡ 9 ]¤7 comment = #[̂ nrnn]¤
20
key.mlc component1 %component key2 %extern "Token symbol(int)"3 %extern "Token symbol(int, String)"4 %extern "void error(String) throws LexerException"5
6 %%7
8 %%inherit util patterns9 f lineTerminatorg f : / ¤ignore¤/ :g10 f otherWhitespaceg f : / ¤ignore¤/ :g11 "=" f : return symbol(A SSIGN); :g ASSIGN12 %:13 f identi¯er g f : return symbol(K EY , yytext()); :g14 f commentg f : / ¤ignore¤/ :g15 %:16 <<ANY >> f : error("Unexpected char '"+yytext()+" '" ); :g17 <<EOF>> f : return symbol(EOF ); :g
21
value.mlc component
1 %component value2 %extern "Token symbol(int, String, int, int , int , int )"3 %appendf4 return symbol(VA LUE, text, startL ine, startCol,5 endL ine, endCol);6 %appendg7
8 %%9
10 %%inherit util patterns11 f lineTerminatorg f : :g L INE T ERM INATOR12 %:13 %:14 <<ANY >> f : append(yytext()); :g15 <<EOF>> f : :g L INE T ERM INATOR
22
properties.mll layout1 package properties;2 %%3 import static properties.TokenTypes.¤;4 %%5 %layout properties6 %option public "%public"7 ...8 %lexthrow "LexerException"9 %component key10 %component value11 %start key12 %%13 %%embed14 %name key value15 %host key16 %guest value17 %start A SSIGN18 %end LINE T ERM INATOR
23
MetaLexer is implemented and available:www.sable.mcgill.ca/metalexer
properties.mll
key.mlc
value.mlc
util_patterns.mlc
MetaLexerproperties.jflex
24
Key problems to solve: How to implement the meta-token lexer?
How to allow for insertion of new components, replacing of components, adding new embeddings (metalexer transitions).
How to insert new patterns into components as specific points.
25
Implementing the meta-token lexer
Recognize the
matching suffix.
Recognize a meta-
pattern, i.e. when to go
to a new component and when to
return.
26
Implementing inheritance (structured weaving).
27
Implementing MetaLexer layout inheritance
• Layouts can inherit other layouts
• %inherit directive put at the location at which the inherited transition rules (embeddings) should be placed.
• each %inherit directive can be followed by:• %unoption• %replace• %unembed• new embeddings
28
Implementing MetaLexer component inheritance
29
O
Weaving in inherited component
Original Component
New Component adds some rules and inherits original component.
Woven output
30
Results:Applied to three projects with complex scanners:
• AspectJ (abc and extensions)
• Matlab (Annotations and AspectMatlab extensions)
• MetaLexer
31
1 %%embed2 %name perclause3 %host aspect decl4 %guest pointcut5 %start [P ERCFLOW PERCFLOWBELOW PERTARGET6 P ERTHIS] LPAREN7 %end RPAREN8 %pair LPAREN , RPAREN9
10 %%embed11 %name pointcut12 %host java, aspect13 %guest pointcut14 %start POINTCUT15 %end SEM ICOLON
AspectJ and Extensions
32
Using MetaLexer for an extensible front end for
McLab
PLDI 2011 Tutorial on McLab!!!!!
33
MetaLexer scanner implemented in MetaLexer
1st version of MetaLexer written in JFlex, one for components and one for layouts.
2nd version implemented in MetaLexer, many shared components between the component lexer and the layout lexer.
34
Ad-hoc systems with separate scanner/ LALR parser Polyglot JastAdd abc
Recursive-descent scanner/parser ANTLR and systems using ANTLR
Scannerless systems Rats! (PEGs)
Integrated systems Copper (modified LALR parser which communicates with DFA-based
scanner)
Related Work
35
Conclusions MetaLexer allows one to specify modular
and extensible scanners suitable for any system that works with JFlex.
Two main ideas: meta-lexing and component/layout inheritance.
Used in large projects such as abc, McLab and MetaLexer itself.
Available at: www.sable.mcgill.ca/metalexer