PEG GrammarExplorer


  • 7/30/2019 PEG GrammarExplorer


    Introduction

    This is the first part of a series of articles covering the parsing technique Parsing
    Expression Grammars. This part introduces a support library and a parser generator for C# 3.0.
    The support library consists of the classes PegCharParser and PegByteParser, which parse
    text and binary sources respectively and which support user-defined error handling, direct
    evaluation during parsing, parse tree generation and abstract syntax tree generation. Using
    these base classes results in parsers that are fast, easy to understand and to extend, and
    well integrated into the hosting C# program.

    The underlying parsing methodology, called Parsing Expression Grammar [1][2][3], is
    relatively new (first described in 2004), but already has many implementations. Parsing
    Expression Grammars (PEG) can be implemented easily in any programming language, but fit
    especially well into languages with a rich expression syntax, such as functional languages and
    functionally enhanced imperative languages (like C# 3.0), because PEG concepts are closely
    related to mutually recursive function calls, short-circuit boolean expressions and in-place
    defined functions (lambdas).

    A new trend in parsing is the integration of parsers into a host language so that the semantic
    gap between grammar notation and implementation in the host language is as small as possible
    (Perl 6 and boost::spirit are forerunners of this trend). Parsing Expression Grammars are
    especially well suited to this goal. Earlier grammars were not so easy to implement, and one
    grammar rule could result in dozens of lines of code. In some parsing strategies, the
    relationship between grammar rule and implementation code was lost altogether. This is the
    reason why, until recently, generators were used to build parsers.

    This article shows how the C# 3.0 lambda facility can be used to implement a support library for
    Parsing Expression Grammars, which makes parsing with the PEG technique easy. When using
    this library, a PEG grammar is mapped to a C# grammar class which inherits basic functionality
    from a PEG base class, and each PEG grammar rule is mapped to a method in the C# grammar
    class. Parsers implemented with this library should be fast (provided the C# compiler inlines
    methods whenever possible), easy to understand and easy to extend. Error diagnosis, generation
    of a parse tree and addition of semantic actions are also supported by this library.

    The most striking property of PEG and especially this library is the small footprint and the lack

    of any administrative overhead.

    The main emphasis of this article is on explaining the PEG framework and on studying concrete
    application samples. One of the sample applications is a PEG parser generator, which generates
    C# source code. The PEG parser generator is the only sample parser which has been written
    manually; all other sample parsers were generated by the parser generator.

    Contents

    Introduction


    Parsing Expression Grammar Tutorial
        o Parsing Expression Grammars Basics
        o Parsing Expression Grammars particularities and idioms
        o Integrating semantic actions into a PEG framework
        o Parsing Expression Grammars exposed

    Parsing Expression Grammar Implementations
        o General implementation strategy for PEG
        o Parsing Expression Grammars mapped to C#1.0
        o Parsing Expression Grammars mapped to C#3.0

    Parsing Expression Grammar Examples
        o Json Checker (Recognize only)
        o Json Tree (Build Tree)
        o Basic Encode Rules (Direct Evaluation + Build Tree)
        o Scientific Calculator (Build Tree + Evaluate Tree)

    PEG Parser Generator
        o A PEG Generator implemented with PEG
        o PEG Parser Generator Grammar
        o The PEG Parser Generator's handling of semantic blocks

    Parsing Expression Grammars in perspective
        o Parsing Expression Grammars Tuning
        o Comparison of PEG parsing with other parsing techniques
        o Translating LR grammars to PEG grammars

    Future Developments

    Parsing Expression Grammar Tutorial

    Parsing Expression Grammars are a kind of executable grammar. Executing a PEG grammar
    means that grammar patterns matching the input string advance the current input position
    accordingly. Mismatches are handled by going back to a previous input string position, where
    parsing eventually continues with an alternative. The following subchapters explain PEGs in
    detail and introduce the basic PEG constructs, which have been extended by the author in order
    to support error diagnosis, direct evaluation and tree generation.

    Parsing Expression Grammars Basics

    The following PEG grammar rule


    EnclosedDigits: [0-9]+ / '(' EnclosedDigits ')' ;

    introduces a so-called nonterminal EnclosedDigits and a right-hand side consisting of two
    alternatives. The first alternative ([0-9]+) describes a sequence of digits, the second
    ('(' EnclosedDigits ')') something enclosed in parentheses. Executing EnclosedDigits with the
    string ((123))+5 as input results in a match and moves the input position to just before +5.


    This sample also shows the potential for recursive definitions, since EnclosedDigits uses itself
    as soon as it recognizes an opening parenthesis. The following table shows the outcome of
    applying the above grammar to some other input strings. The | character is an artificial
    character which visualizes the input position before and after the match.

    Input         Match Position    Match Result
    |((123))+5    ((123))|+5        true
    |123          123|              true
    |5+123        5|+123            true
    |((1)]        |((1)]            false
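    The matching behaviour described above can be sketched as a small hand-written recognizer.
    This is only a minimal sketch and not the article's library: the class EnclosedDigitsParser
    and its helpers Char and Digits are hypothetical names introduced for illustration. Each
    alternative is tried in order, and the input position is restored when an alternative fails.

```csharp
using System;

// Hypothetical hand-written recognizer for the rule
//   EnclosedDigits: [0-9]+ / '(' EnclosedDigits ')' ;
// It does not use the article's PegCharParser class.
public sealed class EnclosedDigitsParser
{
    private readonly string src;   // input string
    public int Pos;                // current input position

    public EnclosedDigitsParser(string input) { src = input; Pos = 0; }

    // Matches a single character and advances the position on success.
    private bool Char(char c)
    {
        if (Pos < src.Length && src[Pos] == c) { Pos++; return true; }
        return false;
    }

    // [0-9]+ : one or more digits, matched greedily.
    private bool Digits()
    {
        int start = Pos;
        while (Pos < src.Length && src[Pos] >= '0' && src[Pos] <= '9') Pos++;
        return Pos > start;
    }

    // Tries the two alternatives in order; restores the position when both fail.
    public bool EnclosedDigits()
    {
        int start = Pos;
        if (Digits()) return true;
        Pos = start;
        if (Char('(') && EnclosedDigits() && Char(')')) return true;
        Pos = start;
        return false;
    }
}
```

    Calling EnclosedDigits() on a parser created for ((123))+5 returns true and leaves Pos at 7,
    just before +5. For ((1)] the call returns false and Pos stays at 0.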

    For people familiar with regular expressions, it may help to think of a parsing expression
    grammar as a generalized regular expression which always matches at the beginning of an input
    string (regexp prefixed with ^). Whereas a regular expression consists of a single expression, a
    PEG consists of a set of rules; each rule can use other rules to help in parsing. The starting
    rule matches the whole input and uses the other rules to match subparts of the input. During
    parsing there is always a current input position, and the input string starting at this position
    must match the rest of the PEG grammar. Like regular expressions, PEG supports the postfix
    operators * + ?, the dot . and character sets enclosed in [].

    Unique to PEG are the prefix operators & (peek) and ! (not), which are used to look ahead
    without consuming input. Alternatives in a PEG are not separated by | but by / to indicate that
    alternatives are strictly tried in sequential order. What makes PEG grammars powerful and at the
    same time a potential memory hog is unlimited backtracking, meaning that the input position
    can be set back to any of the previously visited input positions in case an alternative fails. A
    good and detailed explanation of PEG can be found in Wikipedia [2]. The following table
    gives an overview of the PEG constructs (and some homegrown extensions) which are supported
    by the library class described in this article. The following terminology is used:

    Notion            Meaning
    Nonterminal       Name of a grammar rule. In PEG, each nonterminal must appear on the
                      left-hand side of exactly one grammar rule. The right-hand side of the
                      grammar rule provides the definition of the grammar rule. A nonterminal
                      on the right-hand side of a grammar rule must reference an existing
                      grammar rule definition.
    Input string      String which is parsed.
    Input position    Indicates the next input character to be read.
    Match             A grammar element can match a stretch of the input. The match starts at
                      the current input position.
    Success/Failure   Possible outcome of matching a PEG element against the input.
    e, e1, e2         e, e1 and e2 each stand for arbitrary PEG expressions.


    The extended PEG constructs supported by this library are listed in the following table
    (| indicates the input position; italics, as in name, indicate a placeholder):

    PEG element: CodePoint
    Notation:    #32 (decimal)
                 #x3A0 (hex)
                 #b111 (binary)
    Meaning:     Match input against the specified Unicode character.
                 PEG      Success    Failure
                 #x25     %|1        |1%

    PEG element: Literal
    Notation:    'literal'
    Meaning:     Match input against quoted string.
                 PEG      Success    Failure
                 'for'    for|tran   |afordable
                 Escapes take the same form as in the "C" syntax family.

    PEG element: CaseInsensitive Literal
    Notation:    'literal'\i
    Meaning:     Same as for Literal but compares case-insensitively. \i must follow a Literal.
                 PEG      Success    Failure
                 'FOR'\i  FoR|TraN   |affordable

    PEG element: CharacterSet
    Notation:    [chars]
    Meaning:     Same meaning as in regular expressions. Supported are ranges as in
                 [A-Za-z0-9], single characters and escape sequences.

    PEG element: Any
    Notation:    .
    Meaning:     Increment the input position except when being at the end of the input.
                 PEG                   Success             Failure
                 'this is the end' .   this is the end!|   |this is the end

    PEG element: BITS
    Notation:    BITS
                 BITS
    Meaning:     Interprets the Bit bitNo/Bitsequence [low-high] of the current input byte as
                 integer which must match is used as input for the PegElement.
                 PEG      Success     Failure
                 &BITS    |11010101   |01010101

    PEG element: Sequence
    Notation:    e1 e2
    Meaning:     Match input against e1 and then, in case of success, against e2.
                 PEG        Success   Failure
                 '#'[0-9]   #5|       |#A

    PEG element: Sequentially executed alternatives
    Notation:    e1 / e2
    Meaning:     Match input against e1 and then, in case of failure, against e2.

    PEG Success Failure

    '


    '-'? -|42 |+42

    PEG element: Greedy repeat, zero or more occurrences
    Notation:    e*
    Meaning:     Match input repeatedly against e until the match fails.
                 PEG      Success   Success
                 [0-9]*   42|b      |-42

    PEG element: Greedy repeat, one or more occurrences
    Notation:    e+
    Meaning:     Shorthand for e e*
                 PEG      Success   Failure
                 [0-9]+   42|b      |-42

    PEG element: Greedy repeat between minimum and maximum occurrences
    Notation:    e{min}
                 e{min,max}
                 e{,max}
                 e{min,}
    Meaning:     Match input at least min times but not more than max times against e.
                 PEG                  Success           Failure
                 ('.'[0-9]*){2,3}     .12.36.42|.18b    |.42b

    PEG element: Peek
    Notation:    &e
    Meaning:     Match e without changing input position.
                 PEG      Success   Failure
                 &'42'    |42       |-42

    PEG element: Not
    Notation:    !e
    Meaning:     Like Peek, but with success and failure swapped.
                 PEG      Success   Failure
                 !'42'    |-42      |42

    PEG element: FATAL
    Notation:    FATAL
    Meaning:     Prints the message and error location to the error stream and quits the
                 parsing process afterwards (throws an exception).

    PEG element: WARNING
    Notation:    WARNING
    Meaning:     Prints the message and location to the error stream. Success:

    PEG element: Mandatory
    Notation:    @e
    Meaning:     Same as (e/FATAL )

    PEG element: Tree Node
    Notation:    ^^e
    Meaning:     If e is matched, a tree node will be added to the parse tree. This tree node
                 will hold the starting and ending match positions for e.

    PEG element: Ast Node
    Notation:    ^e
    Meaning:     Like ^^e, but the node will be replaced by the child node if there is only
                 one child.

    PEG element: Rule
    Notation:    N: e;
    Meaning:     N is the nonterminal; e is the right-hand side, which is terminated by a
                 semicolon.

    PEG element: Rule with id
    Notation:    [id] N: e;
    Meaning:     id must be a positive integer, e.g. [1] Int: [0-9]+; The id will be assigned
                 to the tree/ast node id.

    PEG element: Tree building rule
    Notation:    [id] ^ N: e;
    Meaning:     N will be allocated as tree node having the id

    PEG element: Ast building rule
    Notation:    [id] N: e;
    Meaning:     N will be allocated as tree node and is eventually replaced by a child if the
                 node for N has only one child which has no siblings.

    PEG element: Parametrized Rule
    Notation:    N: e;
    Meaning:     N takes the PEG expressions peg1, peg2 ... as parameters. These parameters
                 can then be used in e.

    PEG element: Into variable
    Notation:    e:variableName
    Meaning:     Set the host language variable (a string, byte[], int, double or PegBegEnd)
                 to the matched input stretch. The language variable must be declared either
                 in the semantic block of the corresponding rule or in the semantic block of
                 the grammar (see below).

    PEG element: Bits Into variable
    Notation:    BITS
                 BITS
    Meaning:     Interpret the Bit bitNo or the Bitsequence [low-high] as integer and store it
                 in the host variable.

    PEG element: Semantic Function
    Notation:    f_
    Meaning:     Call the host language function f_ in a semantic block (see below). A
                 semantic function has the signature bool f_();. A return value of true is
                 handled as success, whereas a return value of false is handled as failure.

    PEG element: Semantic Block (Grammar level)
    Notation:    BlockName{ //host
                   //language
                   //statements
                 }
    Meaning:     The BlockName can be missing, in which case a local class named _Top will be
                 created. Functions and data of a grammar-level semantic block can be accessed
                 from any other rule-level semantic block. Functions in the grammar-level
                 semantic block can be used as semantic functions at any place in the grammar.

    PEG element: CREATE Semantic Block (Grammar level)
    Notation:    CREATE{ //host
                   //language
                   //statements
                 }
    Meaning:     This kind of block is used in conjunction with customized tree nodes as
                 described at the very end of this table.

    PEG element: Semantic Block (Rule level)
    Notation:    RuleName { //host
                   //language
                   //statements
                 }: e;
    Meaning:     Functions and data of a rule-level semantic block are only available from
                 within the associated rule. Functions in the rule-associated semantic block
                 can be used as semantic functions on the right-hand side of the rule.

    PEG element: Using semantic block (which is elsewhere defined)
    Notation:    RuleName using NameOfSemanticBlock: e;
    Meaning:     The using directive supports reusing the same semantic block when several
                 rules need the same local semantic block.

    PEG element: Custom Node Creation
    Notation:    ^^CREATE N: e;
                 ^CREATE N: e;
    Meaning:     Custom Node creation allows the creation of a user-defined Node (which must
                 be derived from the library node PegNode). The CreaFunc must be defined in a
                 CREATE semantic block (see above) and must have the following overall
                 structure:

                 PegNode CreaFuncName(ECreatorPhase phase, PegNode parentOrCreated, int id)
                 {
                     if (phase == ECreatorPhase.eCreate
                         || phase == ECreatorPhase.eCreateAndComplete)
                     {
                         // create and return the custom node;
                         // if phase==ECreatorPhase.eCreateAndComplete
                         // this will be the only call
                     }
                     else
                     {
                         // finish the custom node and return parentOrCreated;
                         // one only gets here after successful parsing of the subrules
                     }
                 }

    Parsing Expression Grammars Particularities and Idioms

    PEGs behave in some respects like regular expressions: the application of a PEG to an
    input string can be explained as a pattern matching process which assigns matching parts of the
    input string to rules of the grammar (much like groups in regexes) and which backtracks in
    case of a mismatch. The most important difference between PEGs and regexes is that
    PEGs support recursion and that PEG patterns are greedy. Compared to most other
    traditional language parsing techniques, PEG is surprisingly different. The most striking
    differences are:

    Parsing Expression Grammars are deterministic and never ambiguous, thereby removing a
    problem of most other parsing techniques. Ambiguity means that the same input string
    can be parsed with different sets of rules of a given grammar and that there is no policy
    saying which of the competing rules should be used. This is in most cases a serious
    problem, since if it goes undetected it results in different parse trees for the same input.
    The lack of ambiguity is a big plus for PEG. But the fact that the order of alternatives in
    a PEG rule matters takes getting used to.

    The following PEG rule e.g. rel_operator: '='; will never
    succeed in recognizing > which can be a right shift operator or the closing of two template brackets.
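    To illustrate why the order of alternatives matters, the following sketch tries a list of
    literal alternatives strictly in the given order, as a PEG choice does. The helper MatchFirst
    and the relational operator alternatives used with it are hypothetical and chosen only for
    illustration; they are not part of the article's library.

```csharp
using System;

public static class OrderedChoiceDemo
{
    // Tries each literal in the given order at position pos and returns the
    // length of the first one that matches, or -1 if none matches.
    // This mirrors a PEG choice  a1 / a2 / ... / an  over literal alternatives.
    public static int MatchFirst(string input, int pos, params string[] alternatives)
    {
        foreach (string alt in alternatives)
        {
            if (pos + alt.Length <= input.Length
                && string.CompareOrdinal(input, pos, alt, 0, alt.Length) == 0)
            {
                return alt.Length;   // first success wins, later alternatives are never tried
            }
        }
        return -1;
    }
}
```

    With the shorter alternative listed first, MatchFirst("<=5", 0, "<", "<=", ">", ">=")
    consumes only one character, so the two-character operator is never recognized. Listing the
    longer alternatives first, as in MatchFirst("<=5", 0, "<=", ">=", "<", ">"), consumes both
    characters.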


    Parsing Expression Grammars can backtrack to an arbitrary earlier location, as far back as the
    beginning of the input string. PEG does not require that a file which has to be parsed be read
    completely into memory, but it prohibits releasing any part of the file which has
    already been parsed. This means that a file which foreseeably will be parsed to the end
    should be read into memory completely before parsing starts. Fortunately, memory is no
    longer a scarce resource. In a direct evaluation scenario (semantic actions are executed
    as soon as the corresponding syntax element is recognized), backtracking can also cause
    problems, since semantic actions that have already been executed are in most cases not easily
    undone. Semantic actions should therefore be placed at points where backtracking can no
    longer occur or where backtracking would indicate a fatal error. Fatal errors in PEG
    parsing are best handled by throwing an exception.

    For many common problems, idiomatic solutions exist within the PEG framework, as
    shown in the following table:

    Goal:               Avoid that white space scanning clutters up the grammar
    Idiomatic solution: White space scanning should be done immediately after reading a
                        terminal, but not in any other place.
    Sample:             //to avoid
                        [3]prod: val S ([*/] S val S)*;
                        [4]val : [0-9]+ / '(' S expr ')' S;
                        //to prefer
                        [3]prod: val ([*/] S val)*;
                        [4]val: ([0-9]+ / '(' S expr ')') S;

    Goal:               Reuse Nonterminal when only a subset is applicable
    Idiomatic solution: !oneOfExceptions reusedNonterminal
    Sample:             Java spec
                        SingleCharacter: InputCharacter but not ' or \
                        Peg spec
                        SingleCharacter: !['\\] InputCharacter

    Goal:               Test for end of input
    Idiomatic solution: !.
    Sample:             (!./FATAL )

    Goal:               Generic rule for quoting situation
    Idiomatic solution: GenericQuote: BegQuote QuoteContent EndQuote;
    Sample:             GenericQuote

    Goal:               Order alternatives having the same start
    Idiomatic solution: longer_choice / shorter_choice

    Goal:               expressive error messages
    Idiomatic solution: peeking at next symbol
    Sample:             //poor error handling
                        [4]object: '{' S members? @'}' S;
                        [5]members: (str/num)(',' S @(str/num))*;
                        //better error handling
                        [4]object: '{' S (&'}'/members) @'}' S;
                        [5]members: @(str/num)(',' S @(str/num))*;
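    Two of the idioms in the table, reusing a nonterminal with exceptions and testing for the end
    of the input, both rely on the Not operator. The following minimal sketch shows one way to
    express them in plain C#. The class IdiomDemo and its helper methods are hypothetical, and
    InputCharacter is simplified here to "any character except CR and LF", which is an assumption
    made for illustration.

```csharp
using System;

public sealed class IdiomDemo
{
    private readonly string src;
    public int Pos;

    public IdiomDemo(string input) { src = input; }

    // . : any character; fails only at the end of the input.
    private bool Any()
    {
        if (Pos < src.Length) { Pos++; return true; }
        return false;
    }

    // [chars] : one character out of the given set.
    private bool OneOf(string set)
    {
        if (Pos < src.Length && set.IndexOf(src[Pos]) >= 0) { Pos++; return true; }
        return false;
    }

    // !e : succeeds when e fails; never consumes input.
    private bool Not(Func<bool> e)
    {
        int saved = Pos;
        bool matched = e();
        Pos = saved;
        return !matched;
    }

    // Simplified stand-in for the Java InputCharacter rule (assumption).
    private bool InputCharacter()
    {
        return Not(() => OneOf("\r\n")) && Any();
    }

    // SingleCharacter: !['\\] InputCharacter
    public bool SingleCharacter()
    {
        int saved = Pos;
        if (Not(() => OneOf("'\\")) && InputCharacter()) return true;
        Pos = saved;
        return false;
    }

    // !. : succeeds only at the end of the input.
    public bool AtEnd()
    {
        return Not(Any);
    }
}
```

    For the input a, SingleCharacter() succeeds and a following AtEnd() call returns true. For an
    input starting with ' or \, SingleCharacter() fails without moving the input position.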

    PEG Idioms Applied to Real World Grammar Problems

    Most modern programming languages are based on grammars which can almost be parsed by
    the predominant parsing technique (LALR(1) parsing). The emphasis here is on almost, meaning
    that there are often grammar rules which require special handling outside of the grammar
    framework. The PEG framework can handle these exceptional cases far better, as will be shown
    for the C++ and C# grammars.

    The C# Language Specification V3.0, for example, has the following wording for its
    cast-expression/parenthesized-expression disambiguation rule:

        A sequence of one or more tokens (2.3.3) enclosed in parentheses is considered
        the start of a cast-expression only if at least one of the following are true:

        1) The sequence of tokens is correct grammar for a type,
           and the token immediately following the closing parentheses is
           the token ~, the token !, the token (, an identifier (2.4.1),
           a literal (2.4.4), or any keyword (2.4.3) except as and is.

        2) The sequence of tokens is correct grammar for a type, but not for an
           expression.

    This can be expressed in PEG with

        cast_expression:
            /*1)*/ ('(' S type ')' S &([~!(]/identifier/literal/!('as' B/'is' B) keyword B)
            /*2)*/ / !parenthesized_expression '(' S type ')' ) S unary_expression;
        B: ![a-zA-Z_0-9];
        S: (comment/whitespace/new_line/pp_directive )*;

    The C++ standard has the following wording for its expression-statement/declaration

    disambiguation rule


        An expression-statement ... can be indistinguishable from a declaration ...
        In those cases the statement is a declaration.

    This can be expressed in PEG with

        statement: declaration / expression_statement;

    Integrating Semantic Actions into a PEG Framework

    A PEG grammar can only recognize an input string, which gives you just two results: a boolean
    value indicating match success or match failure, and an input position pointing to the end of
    the matched string part. But in most cases, the grammar is only a means to give the input string
    a structure. This structure is then used to associate the input string with a meaning (a
    semantic) and to execute statements based on this meaning. These statements executed during
    parsing are called semantic actions. The executable nature of PEG grammars makes integration of
    semantic actions easy. Assuming a sequence of grammar symbols e1 e2 and a semantic action es_
    which should be performed after recognition of e1, we just get the sequence e1 es_ e2, where
    es_ is a function of the host language.

    From the grammar viewpoint, es_ has to conform to the same interface as e1 and e2 or any other
    PEG component, which means that es_ is a function returning a bool value as result, where true
    means success and false means failure. The semantic function es_ can be defined either local to
    the rule which uses (calls) es_ or in the global environment of the grammar. A bundle of
    semantic functions, into-variables, helper data values and helper functions then forms a
    semantic block.
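    A sequence of the form e1 es_ e2 can be sketched with short-circuit boolean expressions. The
    example below is a hand-written illustration under stated assumptions, not code produced by
    the article's generator: the class SemanticActionDemo, the rule Statement and the semantic
    function push_ are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical recognizer for the sequence  Digits push_ ';'
// where push_ is a semantic function with the signature bool f().
public sealed class SemanticActionDemo
{
    private readonly string src;
    private int pos;
    private int tokenStart;   // start of the most recent Digits match
    public readonly List<int> Values = new List<int>();

    public SemanticActionDemo(string input) { src = input; }

    // [0-9]+
    private bool Digits()
    {
        tokenStart = pos;
        while (pos < src.Length && src[pos] >= '0' && src[pos] <= '9') pos++;
        return pos > tokenStart;
    }

    private bool Char(char c)
    {
        if (pos < src.Length && src[pos] == c) { pos++; return true; }
        return false;
    }

    // Semantic function: converts the digits just matched and stores the value.
    // Returning false makes the enclosing sequence fail like any other PEG element.
    private bool push_()
    {
        int value;
        string text = src.Substring(tokenStart, pos - tokenStart);
        if (!int.TryParse(text, out value)) return false;
        Values.Add(value);
        return true;
    }

    // Statement: Digits push_ ';'
    public bool Statement()
    {
        int start = pos;
        if (Digits() && push_() && Char(';')) return true;
        pos = start;   // the input position is restored; the effect of push_ is not
        return false;
    }
}
```

    For the input 42; the call Statement() succeeds and Values contains 42. For the input 42
    without the semicolon, Statement() fails, yet Values still contains 42, because the semantic
    function ran before the sequence failed.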

    Semantic actions face one big problem in PEG grammars, namely backtracking. In most cases,
    backtracking should no longer occur after a semantic function (e.g. computation of the result
    of an arithmetic subexpression) has been performed. The simplest way to guard against
    backtracking in such a case is to handle any attempt to backtrack as a fatal error. The
    FATAL construct presented here aborts parsing (by raising an exception).

    Embedding semantic actions into the grammar enables direct evaluation of the parsed construct.
    A typical application is the stepwise computation of an arithmetical expression during the
    parse phase. Direct evaluation is fast but very limiting, since it can only use information
    present at the current parse point. In many cases, embedded semantic actions are therefore used
    to collect information during parsing for processing after parsing has completed.

    The collected data can have many forms, but the most important one is a tree. Optimizing
    parsers and compilers delay semantic actions until the end of the parsing phase and just create
    a physical parse tree during parsing (our PEG framework supports tree generation through the
    prefixes ^ and ^^). A tree walking process then checks and optimizes the tree. Finally, the
    tree is interpreted at runtime, or it is used to generate virtual or real machine code. The
    most important evaluation options are shown below:

        Parsing -> Direct Evaluation
                -> Collecting Information during Parsing
                       -> User defined datastructure
                              -> User defined evaluation
                       -> Tree Structure
                              -> Interpretation of generated tree
                              -> Generation of VM or machine code

    In a PEG implementation, tree generation must cope with backtracking by deleting tree parts
    which were built after the backtrack restore point. Furthermore, no tree nodes should be created
    when a Peek or Not production is active. In this implementation this is handled by tree
    generation aware code in the implementations for And, Peek, Not and ForRepeat productions.
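    A minimal sketch of such tree generation aware code is shown below. It is an illustration only:
    the Node type, the class name TreeDemoParser and the string based TreeNT signature are assumptions
    made for this example and differ from the library's real classes. And remembers how many children
    existed at the restore point and removes everything built afterwards when the sequence fails; Peek
    switches tree building off while it runs.

```csharp
using System;
using System.Collections.Generic;

delegate bool Matcher();

// Hypothetical tree node for this sketch.
class Node
{
    public string Name;
    public List<Node> Children = new List<Node>();
    public Node(string name) { Name = name; }
}

class TreeDemoParser
{
    readonly string s;
    int i;
    bool bTreeBuild = true;                 // false while a Peek is active
    public Node Current = new Node("root"); // node that receives new children

    public TreeDemoParser(string input) { s = input; }

    public bool Char(char c)
    {
        if (i < s.Length && s[i] == c) { i++; return true; }
        return false;
    }

    // Creates a node for a rule, but only while tree building is enabled.
    public bool TreeNT(string name, Matcher m)
    {
        if (!bTreeBuild) return m();
        Node parent = Current;
        Node node = new Node(name);
        Current = node;
        bool ok = m();
        Current = parent;
        if (ok) parent.Children.Add(node);
        return ok;
    }

    // Sequence: on failure restore the input position and delete the
    // nodes which were built after the restore point.
    public bool And(Matcher m)
    {
        int i0 = i;
        int childCount = Current.Children.Count;
        if (m()) return true;
        i = i0;
        Current.Children.RemoveRange(childCount, Current.Children.Count - childCount);
        return false;
    }

    // Peek: match without consuming input and without creating nodes.
    public bool Peek(Matcher m)
    {
        int i0 = i;
        bool saved = bTreeBuild;
        bTreeBuild = false;
        bool ok = m();
        bTreeBuild = saved;
        i = i0;
        return ok;
    }
}
```

    With the input "ab", a sequence that first builds a node for 'a' and then fails on the next
    character leaves no node behind, because And truncates the child list on the way back.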

    Parsing Expression Grammars Exposed

    The following sample grammar is also taken from the Wikipedia article on PEG [2] (but with a
    slightly different notation).

    Expr:    S Sum;
    Sum:     Product ([+-] S Product)*;
    Product: Value ([*/] S Value)*;
    Value:   [0-9]+ ('.' [0-9]+)? S / '(' S Sum ')' S;
    S:       [ \n\r\t\v]*;

    During the application of a grammar to an input string, each grammar rule is called from some
    parent grammar rule and matches a subpart of the input string which is matched by the parent
    rule. This results in a parse tree. The grammar rule Expr would associate the arithmetical
    expression 2.5 * (3 + 5/7) with the following parse tree:

    Expr

    The above parse tree is not a physical tree but an implicit tree which only exists during the parse
    process. The natural implementation for a PEG parser associates each grammar rule with a
    method (function). The right hand side of the grammar rule corresponds to the function body and
    each nonterminal on the right hand side of the rule is mapped to a function call. When a rule
    function is called, it tries to match the input string at the current input position against the right
    hand side of the rule. If it succeeds, it advances the input position accordingly and returns true;
    otherwise the input position is unchanged and the result is false. The above parse tree can

    therefore be regarded as a stack trace. The location marked with [*] in the above parse tree

    corresponds to the function stackValue


             /'/' S Value div_)* store_ ;
    Value:   Number S / '(' S Sum ')' S ;
    Number
    { //semantic rule related block using C# as host language
        string sNumber;
        bool store_(){double.TryParse(sNumber,out result);return true;}
    }:       ([0-9]+ ('.' [0-9]+)?):sNumber store_ ;
    S:       [ \n\r\t\v]* ;

    In many cases on the fly evaluation during parsing is not sufficient and one needs a physical
    parse tree or an abstract syntax tree (abbreviated AST). An AST is a parse tree reduced to the
    essential nodes, thereby saving space and providing a view better suited for evaluation. Such
    physical trees typically need at least 10 times the memory space of the input string and reduce
    the parsing speed by a factor of 3 to 10.

    The following PEG grammar uses the symbol ^ to indicate an abstract syntax node and the
    symbol ^^ to indicate a parse tree node. The grammar presented below is furthermore enhanced
    with the error handling item Fatal<errMsg>. Fatal leaves the parsing process immediately with
    the result fail, but with the input position set to the place where the fatal error occurred.

    [1] ^^Expr:    S Sum (!./FATAL) ;
    [2] ^Sum:      Product (^[+-] S Product)* ;
    [3] ^Product:  Value (^[*/] S Value)* ;
    [4] Value:     Number S / '(' S Sum ')' S / FATAL;
    [5] ^^Number:  [0-9]+ ('.' [0-9]+)? ;
    [6] S:         [ \n\r\t\v]* ;

    With this grammar the arithmetical expression 2.5 * (3 + 5/7) would result in the following
    physical tree:

    Expr>

    With a physical parse tree, many more options for evaluation are possible, e.g. one can generate
    code for a virtual machine after first optimizing the tree.

    Parsing Expression Grammar Implementation


    In this chapter I first show how to implement all the PEG constructs one by one. This will be
    expressed in pseudo code. Then I will try to find the best interface for these basic PEG functions
    in C#1.0 and C#3.0.

    General Implementation Strategy for PEG

    The natural representation of a PEG is a top down recursive parser with backtracking. PEG rules
    are implemented as functions/methods which call each other when needed and return true in case
    of a match and false in case of a mismatch. Backtracking is implemented by saving the input
    position before calling a parsing function and restoring the input position to the saved one in case
    the parsing function returns false.

    Backtracking can be limited to the PEG sequence construct and the bounded repetition construct
    (e{low,high}) if, in all other cases, the input position is only moved forward after a successful
    match. In the following pseudo code we use strings and integer variables, short circuit conditional
    expressions (using && for AND and || for OR) and exceptions. s stands for the input string and i
    refers to the current input position. bTreeBuild is an instance variable which inhibits tree build
    operations when set to false.
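    The save and restore strategy described above can be sketched in a few lines of C#. The class name
    BacktrackDemo and the rule names are hypothetical and only serve this illustration; s and i have
    the same meaning as in the pseudo code. Terminals move the position forward only on success, so
    just the sequence rules need to save and restore it.

```csharp
using System;

class BacktrackDemo
{
    readonly string s;  // input string
    public int i;       // current input position

    public BacktrackDemo(string input) { s = input; }

    // Terminal: advances the position only after a successful match.
    bool Char(char c)
    {
        if (i < s.Length && s[i] == c) { i++; return true; }
        return false;
    }

    // Seq1: 'a' 'b' 'c' ;
    bool Seq1()
    {
        int i0 = i;                                        // save
        if (Char('a') && Char('b') && Char('c')) return true;
        i = i0;                                            // restore on mismatch
        return false;
    }

    // Seq2: 'a' 'b' 'd' ;
    bool Seq2()
    {
        int i0 = i;
        if (Char('a') && Char('b') && Char('d')) return true;
        i = i0;
        return false;
    }

    // R: Seq1 / Seq2 ;   ordered choice maps to the short circuit operator ||
    public bool R()
    {
        return Seq1() || Seq2();
    }
}
```

    For the input "abd", Seq1 consumes "ab", fails on 'c' and resets i to 0, so that Seq2 starts again
    at the beginning and succeeds.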

    PEG construct sample pseudo code to implement sample

    CodePoint#

    #x

    #x

    #32 (decimal)

    #x3A0 (hex)

    (binary)

    if i

    else {return false;}

    BITS BITS

    if i=low

    message > PrintMsg(message); return true;

    Into e :variableName

    int i0=i;
    bool b= e();
    variableName= s.substring(i0,i-i0);
    return b;

    Bits Into variable BITS

    int i0=i;if i

            //[0-9]+
            if( !In('0', '9') ){return false;}
            while (In('0', '9')) ;
            for(;;){//(S [+-] S [0-9]+)*
                int pos= pos_;
                if( S() && OneOfChars('+','-') && S() ){
                    //[0-9]+
                    if( !In('0', '9') ){pos_=pos; break;}
                    while (In('0', '9')) ;
                }else{
                    pos_= pos; break;
                }
            }
            S();
            return true;
        }
        bool S()//S: [ \n\r\t\v]* ;
        {
            while (OneOfChars(' ', '\n', '\r', '\t', '\v')) ;
            return true;
        }
    }

    To execute the grammar we just have to call the method Sum of an object of the above class. But
    we cannot be satisfied with this solution. Compared with the original grammar rule,
    the method Sum in the above class InSum_C1 is large and, in its use of loops and helper variables,
    quite confusing. But it is perhaps the best that is possible in C#1.0. Many traditional parser
    generators produce even worse code.

    Parsing Expression Grammars Mapped to C#3.0

    PEG operators like Sequence, Repeat, Into, Tree Build, Peek and Not can be regarded as
    operators or functions which take a function as parameter. In C# this maps to a method with a
    delegate parameter. The PEG Sequence operator, for example, can be implemented as a function
    with the interface public bool And(Matcher pegSequence); where Matcher is the
    delegate public delegate bool Matcher();.

    In older C# versions, passing a function as a parameter required several lines of code, but
    C#3.0 changed this. C#3.0 supports lambdas, which are anonymous functions with a very low
    syntactical overhead. Lambdas enable a functional implementation of PEG in C#. The PEG
    Sequence e1 e2 can now be mapped to the C# term And(()=>e1() && e2()). ()=>e1() && e2()
    looks like a normal expression, but is in effect a full-fledged function with zero parameters
    (hence ()=>) and the function body {return e1() && e2();}. With this facility, the grammar
    for integer sums

    Sum: S [0-9]+ (S [+-] S [0-9]+)* S ;
    S:   [ \n\r\t\v]* ;


    results in the following C#3.0 implementation (PegCharParser is a base class, not shown here,
    with the methods And, PlusRepeat, OptRepeat, In and OneOfChars):

    class IntSum_C3 : PegCharParser
    {
        public IntSum_C3(string s) : base(s) { }
        public bool Sum()//Sum: S [0-9]+ (S [+-] S [0-9]+)* S ;
        {
            return
              And(()=>
                   S()
                && PlusRepeat(()=>In('0','9'))
                && OptRepeat(()=> S() && OneOfChars('+','-') && S() && PlusRepeat(()=>In('0','9')))
                && S());
        }
        public bool S()//S: [ \n\r\t\v]* ;
        {
            return OptRepeat(()=>OneOfChars(' ', '\n', '\r', '\t', '\v'));
        }
    }

    Compared to the C#1.0 implementation this parser class is a huge improvement. We have

    eliminated all loops and helper variables. The correctness (accordance with the grammar rule) is

    also much easier to check. The methods And, PlusRepeat, OptRepeat, In and OneOfChars are

    all implemented in both the PegCharParser and PegByteParser base classes.
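    To make the mapping concrete, here is a self-contained sketch of how such base class methods
    could look. It is not the library's PegCharParser: MiniCharParser and IntSumDemo are hypothetical
    names, and the bodies are simplified assumptions (no tree building, no error handling). The
    repetition methods restore the input position when an iteration fails part way through, which is
    what allows the rule body to be written as a plain && chain.

```csharp
using System;

delegate bool Matcher();

// Hypothetical, simplified stand-in for a character parser base class.
class MiniCharParser
{
    protected readonly string src_;
    protected int pos_;

    protected MiniCharParser(string s) { src_ = s; }
    public int Position { get { return pos_; } }

    // Sequence: restore the position if the whole sequence fails.
    protected bool And(Matcher m)
    {
        int pos = pos_;
        if (m()) return true;
        pos_ = pos;
        return false;
    }

    // e*: repeat as long as e matches; a failed iteration is undone.
    protected bool OptRepeat(Matcher m)
    {
        for (;;)
        {
            int pos = pos_;
            if (!m()) { pos_ = pos; return true; }
            if (pos_ == pos) return true; // no progress: avoid an endless loop
        }
    }

    // e+: one mandatory match followed by e*.
    protected bool PlusRepeat(Matcher m)
    {
        int pos = pos_;
        if (!m()) { pos_ = pos; return false; }
        return OptRepeat(m);
    }

    // Character range [low-high].
    protected bool In(char low, char high)
    {
        if (pos_ < src_.Length && src_[pos_] >= low && src_[pos_] <= high) { pos_++; return true; }
        return false;
    }

    // Character set.
    protected bool OneOfChars(params char[] chars)
    {
        if (pos_ < src_.Length && Array.IndexOf(chars, src_[pos_]) >= 0) { pos_++; return true; }
        return false;
    }
}

class IntSumDemo : MiniCharParser
{
    public IntSumDemo(string s) : base(s) { }

    // Sum: S [0-9]+ (S [+-] S [0-9]+)* S ;
    public bool Sum()
    {
        return And(() =>
               S()
            && PlusRepeat(() => In('0', '9'))
            && OptRepeat(() => S() && OneOfChars('+', '-') && S() && PlusRepeat(() => In('0', '9')))
            && S());
    }

    // S: [ \n\r\t\v]* ;
    public bool S()
    {
        return OptRepeat(() => OneOfChars(' ', '\n', '\r', '\t', '\v'));
    }
}
```

    A caller would create new IntSumDemo("12 + 3 - 4 ") and call Sum(); comparing Position with the
    input length afterwards tells whether the whole input was consumed.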

    The following table shows most of the PEG methods available in the base library delivered with
    this article.

    PEG element                       C# methods                                        sample usage

    CodePoint                         Char(char)                                        Char('\u0023')
    Literal                           Char(char c0,char c1,...)                         Char("ab")
                                      Char(string s)
    CaseInsensitive Literal           IChar(char c0,char c1,...)                        IChar("ab")
                                      IChar(string s)
    Char Set []                       OneOf(char c0,char c1,...)                        OneOf("ab")
                                      OneOf(string s)
    Char Set []                       In(char c0,char c1,...)                           In('A','Z','a','z','0','9')
                                      In(string s)
    Any .                             Any()                                             Any()
    BITS                              Bits(char cLow,char cHigh,byte toMatch)           Bits(1,5,31)
    Sequence e1 e2 ...                And(MatcherDelegate m)                            And(() => S() && top_element())
    Alternative e1 / e2 / ...         e1 || e2 || ...                                   @object() || array()
    Greedy Option e?                  Option(MatcherDelegate m)                         Option(() => Char('-'))
    Greedy repeat 0+ e*               OptRepeat(MatcherDelegate m)                      OptRepeat(() => OneOf(' ', '\t', '\r', '\n'))
    Greedy repeat 1+ e+               PlusRepeat(MatcherDelegate m)                     PlusRepeat(() => In('0', '9'))
    Greedy repeat n0..n1 e{low,high}  ForRepeat(int low,int high,MatcherDelegate m)     ForRepeat(4, 4, () => In('0', '9', 'A', 'F', 'a', 'f'))
    Peek &e                           Peek(MatcherDelegate m)                           Peek(() => Char('}'))
    Not !e                            Not(MatcherDelegate m)                            Not(()=>OneOf('"','\\'))
    FATAL                             Fatal("")                                         Fatal(" expected")
    WARNING                           Warning("")                                       Warning("non-json stuff before end of file")
    Into e :variableName              Into(out string varName,MatcherDelegate m)        Into(out top.n, () => Any())
                                      Into(out int varName,MatcherDelegate m)
                                      Into(out PegBegEnd varName,MatcherDelegate m)
    Bits Into BITS variableName       BitsInto(int lowBitNo, int highBitNo,             BitsInto(1, 5,out top.tag)
                                               out int varName)
    Build Tree Node [id]^^RuleName:   TreeNT(int nRuleId,                               TreeNT((int)Ejson_tree.json_text,()=>...)
                                             PegBaseParser.Matcher toMatch);
    Build Ast Node [id]^RuleName:     TreeAST(int id,PegBaseParser.MatcherDelegate m)   TreeAST((int)EC_KernighanRitchie2.external_declaration,()=>...)
    Parametrized Rule RuleName        RuleName(MatcherDelegate a,                       binary(()=> relational_expression(),
                                               MatcherDelegate b,...)                          ()=>TreeChars(()=>Char('=','=') || Char('!','=') ))

    Expression Grammar Examples

    The following examples show uses of the PegGrammar class for all supported use cases:

    1. Recognition only: The result is just match or does not match, in which case an error
       message is issued.
    2. Build of a physical parse tree: The result is a physical tree.
    3. Direct evaluation: Semantic actions executed during parsing.
    4. Build tree, interpret tree: The generated tree is traversed and evaluated.

    JSON Checker (Recognize only)

    JSON (JavaScript Object Notation) [5][6] is an exchange format suited for
    serializing/deserializing program data. Compared to XML it is featherweight and therefore a
    good testing candidate for parsing techniques. The JSON Checker presented here gives an error
    message and error location in case the file does not conform to the JSON grammar. The
    following PEG grammar is the basis of json_check.

    [1]json_text:       S top_element expect_file_end ;
    [2]expect_file_end: !./ WARNING;
    [3]top_element:     object / array / FATAL ;
    [4]object:          '{' S (&'}'/members) @'}' S ;
    [5]members:         pair S (',' S pair S)* ;
    [6]pair:            @string S @':' S value ;
    [7]array:           '[' S (&']'/elements) @']' S ;
    [8]elements:        value S (','S value S)* ;
    [9]value:           @(string / number / object / array / 'true' / 'false' / 'null') ;
    [10]string:         '"' char* @'"' ;
    [11]char:           escape / !(["\\]/control_chars)unicode_char ;
    [12]escape:         '\\' ( ["\\/bfnrt] / 'u' ([0-9A-Fa-f]{4}/FATAL)/FATAL);
    [13]number:         '-'? int frac? exp? ;
    [14]int:            '0'/ [1-9][0-9]* ;
    [15]frac:           '.' [0-9]+ ;
    [16]exp:            [eE] [-+] [0-9]+ ;
    [17]control_chars:  [#x0-#x1F] ;
    [18]unicode_char:   [#x0-#xFFFF] ;
    [19]S:              [ \t\r\n]* ;


    The translation of the above grammar to C#3.0 is straightforward and results in the following
    code (only the translation of the first 4 rules is reproduced).

    public bool json_text()
    {
        return And(()=> S() && top_element() && expect_file_end() );
    }
    public bool expect_file_end()
    {
        return
              Not(()=> Any() )
           || Warning("non-json stuff before end of file");
    }
    public bool top_element()
    {
        return
              @object()
           || array()
           || Fatal("json file must start with '{' or '['");
    }
    public bool @object()
    {
        return And(()=>
              Char('{')
           && S()
           && ( Peek(()=> Char('}') ) || members())
           && ( Char('}') || Fatal(" expected"))
           && S() );
    }

    JSON Tree (Build Tree)

    With a few changes to the JSON checker grammar we get a grammar which generates a physical
    tree for a JSON file. In order to have unique nodes for the JSON values true, false, null we
    add corresponding rules. Furthermore, we add a rule which matches the content of a string (the
    string without the enclosing double quotes). This gives us the following grammar:

    [1]^^json_text:      (object / array) ;
    [2]^^object:         S '{' S (&'}'/members) S @'}' S ;
    [3]members:          pair S (',' S @pair S)* ;
    [4]^^pair:           @string S ':' S value ;
    [5]^^array:          S '[' S (&']'/elements) S @']' S ;
    [6]elements:         value S (','S @value S)* ;
    [7]value:            @(string / number / object / array / true / false / null) ;
    [8]string:           '"' string_content '"' ;
    [9]^^string_content: ( '\\'
                             ( 'u'([0-9A-Fa-f]{4}/FATAL)
                             / ["\\/bfnrt]/FATAL
                             )
                         / [#x20-#x21#x23-#xFFFF]
                         )* ;
    [10]^^number:        '-'? '0'/ [1-9][0-9]* ('.' [0-9]+)? ([eE] [-+] [0-9]+)?;
    [11]S:               [ \t\r\n]* ;
    [12]^^true:          'true' ;
    [13]^^false:         'false' ;
    [14]^^null:          'null' ;

    The following table shows on the left hand side a JSON input file and on the right hand side the

    tree generated by the TreePrint helper class of our parser library.

    JSON Sample File TreePrint Output

    {
        "ImageDescription": {
            "Width": 800,
            "Height": 600,
            "Title": "View from 15th Floor",
            "IDs": [116, 943, 234, 38793]
        }
    }

    json_text>

    >>

    Basic Encode Rules (Direct Evaluation + Build Tree)

    BER (Basic Encoding Rules) is the most commonly used format for encoding ASN.1 data. Like
    XML, ASN.1 serves the purpose of representing hierarchical data, but unlike XML, ASN.1 is
    traditionally encoded in compact binary formats and BER is one of these formats (albeit the
    least compact one). The Internet standards SNMP and LDAP are examples of ASN.1 protocols
    using BER as encoding. The following PEG grammar for reading a BER file into a tree
    representation uses semantic blocks to store information necessary for further parsing. This kind
    of dynamic parsing, which uses data read during the parsing process to decode data further
    downstream, is typical for the parsing of binary formats. The grammar rules for BER [4] as shown
    below express the following facts:

    1. BER nodes consist of the triple Tag Length Value (abbreviated as TLV) where Value is
       either a primitive value or a list of TLV nodes.
    2. The Tag identifies the element (like the start tag in XML).
    3. The Tag contains a flag indicating whether the element is primitive or constructed.
       Constructed means that there are children.
    4. The Length is either the length of the Value in bytes or it is the special pattern 0x80 (only
       allowed for elements with children), in which case the sequence of children ends with
       two zero bytes (0x0000).
    5. The Value is either a primitive value or, if the constructed flag is set, a sequence of
       Tag Length Value triples. The sequence of TLV triples ends when the length given in the
       Length part of the TLV triple is used up or, in the case where the length is given as 0x80,
       when the end marker 0x0000 has been reached.

    {
        int tag,length,n,@byte;
        bool init_()     {tag=0;length=0; return true;}
        bool add_Tag_()  {tag*=128;tag+=n; return true;}
        bool addLength_(){length*=256;length+=@byte;return true;}
    }
    [1] ProtocolDataUnit: TLV;
    [2] ^^TLV: init_
           ( &BITS Tag ( #x80 CompositeDelimValue #0#0 / Length CompositeValue )
           / Tag Length PrimitiveValue
           );
    [3] Tag: OneOctetTag / MultiOctetTag / FATAL;
    [4] ^^OneOctetTag: !BITS BITS;
    [5] ^^MultiOctetTag: . (&BITS BITS add_Tag_)* BITS add_Tag_;
    [6] Length : OneOctetLength / MultiOctetLength / FATAL;
    [7] ^^OneOctetLength: &BITS BITS;
    [8]^^MultiOctetLength: &BITS BITS ( .:byte addLength_){:n};
    [9]^^PrimitiveValue: .{:length} / FATAL;
    [10]^^CompositeDelimValue: (!(#0#0) TLV)*;
    [11]^^CompositeValue
    {
        int len;
        PegBegEnd begEnd;
        bool save_() {len= length;return true;}
        bool at_end_(){return len=0;}
    }: save_
       (!at_end_ TLV:begEnd (decr_/FATAL))*;
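    The handling of the Length part, as listed in fact 4 and in the rules OneOctetLength and
    MultiOctetLength, can also be sketched as ordinary C# code. The helper below is a hypothetical
    illustration and is not generated from the grammar: the names BerLengthDemo and ReadLength are
    assumptions, and the limit of seven length octets is a simplification chosen here so that the
    result fits into a long.

```csharp
using System;

static class BerLengthDemo
{
    // Reads a BER length field starting at data[pos] and advances pos past it.
    // Returns the length of the value in bytes, or -1 for the special pattern
    // 0x80 (the children are then terminated by two zero bytes).
    public static long ReadLength(byte[] data, ref int pos)
    {
        if (pos >= data.Length) throw new FormatException("length octet expected");
        byte first = data[pos++];

        // One octet length: highest bit clear, the value 0..127 is the length.
        if ((first & 0x80) == 0) return first;

        // Otherwise the low 7 bits give the number of following length octets.
        int n = first & 0x7F;
        if (n == 0) return -1;   // 0x80: delimited by the end marker 0x0000
        if (n > 7 || pos + n > data.Length)
            throw new FormatException("unsupported or truncated length");

        // Same accumulation as addLength_ in the grammar: length = length*256 + byte.
        long length = 0;
        for (int k = 0; k < n; k++) length = length * 256 + data[pos++];
        return length;
    }
}
```

    For example, the octets 0x82 0x01 0x00 announce two length octets and therefore a value of 256
    bytes, while the single octet 0x05 directly gives the length 5.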

    Scientific Calculator (Build Tree + Evaluate Tree)


    This calculator supports the basic arithmetic operations + - * /, built in functions taking one
    argument like 'sin', 'cos', ... and assignments to variables. The calculator expects line separated
    expressions and assignments. It works as a two step interpreter which first builds a tree, then
    evaluates the tree. The PEG grammar for this calculator can be translated to a PEG parser by the
    parser generator coming with the PEG Grammar Explorer. The evaluator must be written by
    hand. It works by walking the tree and evaluating the results as it visits the nodes.

    The grammar for the calculator is:

    [1]^^Calc:  ((^'print' / Assign / Sum)
                 ([\r\n]/!./FATAL)
                 [ \r\n\t\v]* )+
                (!./FATAL);
    [2]^Assign: S ident S '=' S Sum;
    [3]^Sum:    Prod (^[+-] S @Prod)*;
    [4]^Prod:   Value (^[*/] S @Value)*;
    [5] Value:  (Number/'('S Sum @')'S/Call/ident) S;
    [6]^Call:   ident S '(' @Sum @')' S;
    [7]^Number: [0-9]+ ('.' [0-9]+)?([eE][+-][0-9]+)?;
    [8]^ident:  [A-Za-z_][A-Za-z_0-9]*;
    [9] S:      [ \t\v]*;
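    A hand written evaluator of the kind described above could look roughly like the following sketch.
    It does not use the tree classes of the library: CalcNode and CalcEvaluator are hypothetical, and
    the sketch assumes that a Sum or Prod node holds its operands and operators as a flat, alternating
    list of children, which may differ from the tree the generated parser actually builds.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical tree node for this sketch.
class CalcNode
{
    public string Kind;   // "Number", "ident", "Sum", "Prod", "Assign", "Call" or an operator
    public string Text;   // matched source text of leaves and operators
    public List<CalcNode> Children = new List<CalcNode>();
    public CalcNode(string kind, string text, params CalcNode[] children)
    {
        Kind = kind; Text = text; Children.AddRange(children);
    }
}

class CalcEvaluator
{
    readonly Dictionary<string, double> variables = new Dictionary<string, double>();

    // Walks the tree and computes the value of each node from its children.
    public double Eval(CalcNode n)
    {
        switch (n.Kind)
        {
            case "Number": return double.Parse(n.Text, CultureInfo.InvariantCulture);
            case "ident":  return variables[n.Text];
            case "Assign": return variables[n.Children[0].Text] = Eval(n.Children[1]);
            case "Call":   return Apply(n.Children[0].Text, Eval(n.Children[1]));
            case "Sum":
            case "Prod":
                // children alternate: operand, operator, operand, operator, ...
                double acc = Eval(n.Children[0]);
                for (int k = 1; k + 1 < n.Children.Count; k += 2)
                    acc = Combine(n.Children[k].Text, acc, Eval(n.Children[k + 1]));
                return acc;
            default: throw new InvalidOperationException("unexpected node " + n.Kind);
        }
    }

    static double Combine(string op, double a, double b)
    {
        switch (op)
        {
            case "+": return a + b;
            case "-": return a - b;
            case "*": return a * b;
            case "/": return a / b;
            default: throw new InvalidOperationException("unknown operator " + op);
        }
    }

    static double Apply(string function, double x)
    {
        switch (function)
        {
            case "sin":  return Math.Sin(x);
            case "cos":  return Math.Cos(x);
            case "sqrt": return Math.Sqrt(x);
            default: throw new InvalidOperationException("unknown function " + function);
        }
    }
}
```

    Because the evaluator only sees the finished tree, variables assigned on one line are available
    to all later lines through the dictionary kept in the evaluator.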

    PEG Parser Generator

    A PEG Generator Implemented with PEG

    The library classes PegCharParser and PegByteParser are designed for the manual
    construction of PEG parsers. But it is highly recommended in any case to first write the grammar
    on paper before implementing it. I wrote a little parser generator (using PegCharParser) which
    translates a 'paper' PEG grammar to a C# program. The current version of the PEG parser
    generator just generates a C# parser. It uses optimizations for huge character sets and for big sets
    of literal alternatives. Future versions will generate source code for C/C++ and other languages
    and furthermore support debugging, tracing and direct execution of the grammar without the
    need to translate it to a host language. But even the current version of the PEG parser generator
    is quite helpful.

    All the samples presented in the chapter Expression Grammar Examples were generated with it.

    The PEG Parser Generator is an example of a PEG parser which generates a syntax tree. It takes
    a PEG grammar as input, validates the generated syntax tree and then writes a set of C# code
    files, which implement the parser described by the PEG grammar.

    PEG Parser Generator Grammar


    The PEG Parser Generator coming with this article expects a set of grammar rules written as
    described in the chapter Parsing Expression Grammars Basics. These rules must be preceded by
    a header and terminated by a trailer as described in the following PEG Grammar:

    peg_module:               peg_head peg_specification peg_tail;
    peg_head:                 S '';
    attribute:                attribute_key S '=' S attribute_value S;
    attribute_key:            ident;
    attribute_value:          "attribute value in single or double quotes";
    peg_specification:        toplevel_semantic_blocks peg_rules;
    toplevel_semantic_blocks: semantic_block*;
    semantic_block:           named_semantic_block / anonymous_semantic_block;
    named_semantic_block:     sem_block_name S anonymous_semantic_block;
    anonymous_semantic_block: '{' semantic_block_content '}' S;
    peg_rules:                S peg_rule+;
    peg_rule:                 lhs_of_rule ':'S rhs_of_rule ';' S;
    lhs_of_rule:              rule_id? tree_or_ast? create_spec?
                              rule_name_and_params
                              (semantic_block/using_sem_block)?;
    rule_id:                  (![A-Za-z_0-9^] .)* [0-9]+ (![A-Za-z_0-9^] .)*;
    tree_or_ast:              '^^'/'^';
    create_spec:              'CREATE' S '' S;
    create_method:            ident;
    ident:                    [A-Za-z_][A-Za-z_0-9]*;
    rhs_of_rule:              "right hand side of rule as described in
                               Parsing Expression Grammars Basics";
    semantic_block_content:   "semantic block content as described in
                               Parsing Expression Grammars Basics";
    peg_tail:                 '';

    The header of the grammar contains HTML/XML-style attributes which are used to determine
    the name of the generated C# file and the input file properties. The following attributes are used
    by the C# code generator:

    Attribute Key        Optionality   Attribute Value

    Name                 Mandatory     Name for the generated C# grammar file and namespace
    encoding_class       Optional      Encoding of the input file. Must be one of binary, unicode,
                                       utf8 or ascii. Default is ascii
    encoding_detection   Optional      Must only be present if the encoding_class is set to unicode.
                                       In this case one of the values FirstCharIsAscii or BOM is
                                       expected.

    All further attributes are treated as comments. The attribute reference in the following sample
    header

    is treated as comment.

    The PEG Parser Generator's Handling of Semantic Blocks

    Semantic blocks are translated to local classes. The code inside semantic blocks must be C#
    source text as expected in a class body, except that access keywords can be left out. The parser
    generator prepends an internal access keyword when necessary. Top level semantic blocks are
    handled differently than local semantic blocks.

    A top level semantic block is created in the grammar's constructor, whereas a local semantic block
    is created each time the associated rule method is called. There is no need to define a constructor
    in a local semantic block, since the parser generator creates a constructor with one parameter, a
    reference to the grammar class. The following sample shows a grammar excerpt with a top level
    and a local semantic block and its translation to C# code.

    Top{ // semantic top level block
        double result;
        bool print_(){Console.WriteLine("{0}",result);return true;}
    }
    ...
    Number
    { //local semantic block
        string sNumber;
        bool store_(){double.TryParse(sNumber,out result);return true;}
    } : ([0-9]+ ('.' [0-9]+)?):sNumber store_ ;

    These semantic blocks will be translated to the following C# source code

    class calc0_direct : PegCharParser
    {
        class Top{ // semantic top level block
            internal double result;
            internal bool print_(){Console.WriteLine("{0}",result);return true;}
        }
        Top top;

        #region Constructors
        public calc0_direct(): base(){ top= new Top();}
        public calc0_direct(string src,TextWriter FerrOut): base(src,FerrOut){top= new Top();}
        #endregion Constructors
        ...
        class _Number{ //local semantic block
            internal string sNumber;
            internal bool store_(){double.TryParse(sNumber,out parent_.top.result);return true;}
            internal _Number(calc0_direct grammarClass){parent_ = grammarClass; }
            calc0_direct parent_;
        }
        public bool Number()
        {
            var _sem= new _Number(this);
            ...
        }

    Quite often, several grammar rules must use the same local semantic block. To avoid code
    duplication, the parser generator supports the using SemanticBlockName clause. The semantic
    block named SemanticBlockName should be defined before the first grammar rule at the same
    place where the top level semantic blocks are defined. But because such a block is referenced in
    the using clause of a rule, it is treated as a local semantic block.

    Local semantic blocks also support destructors. A destructor is translated to an implementation
    of the IDisposable interface and the destructor code is placed into the corresponding Dispose()
    function. The grammar rule function which is generated by the parser generator will be enclosed
    in a using block. This allows cleanup code to be executed at the end of the rule even in the
    presence of exceptions. The following sample is taken from the Python 2.5.2 sample parser.

    Line_join_sem_{
        bool prev_;
        Line_join_sem_ (){set_implicit_line_joining_(true,out prev_);}
        ~Line_join_sem_(){set_implicit_line_joining_(prev_);}
    }
    ...
    [8] parenth_form: '(' parenth_form_content @')' S;
    [9] parenth_form_content
        using Line_join_sem_: S expression_list?;
    ...
    [17]^^generator_expression: '(' generator_expression_content @')' S;
    [18]generator_expression_content
        using Line_join_sem_: S expression genexpr_for;

    The Line_join_sem_ semantic block turns Python's implicit line joining on and off (Python is
    line oriented except that line breaks are allowed inside constructs which are parenthesized as in
    (...) {...} [...]). The Line_join_sem_ semantic block and rule [9] of the above grammar
    excerpt are translated to

    class Line_join_sem_ : IDisposable
    {
        bool prev_;
        internal Line_join_sem_(python_2_5_2_i parent)
        {
            parent_= parent;
            parent_._top.set_implicit_line_joining_(true,out prev_);
        }
        python_2_5_2_i parent_;
        public void Dispose(){parent_._top.set_implicit_line_joining_(prev_);}
    }
    public bool parenth_form_content() /*[9] parenth_form_content
                                         using Line_join_sem_: S expression_list?;*/
    {
        using(var _sem= new Line_join_sem_(this)){
            return And(()=> S() && Option(()=> expression_list() ) );
        }
    }

    Parsing Expression Grammars in Perspective

    Parsing Expression Grammars narrow the semantic gap between formal grammar and
    implementation of the grammar in a functional or imperative programming language. PEGs are
    therefore particularly well suited for manually written parsers as well as for attempts to integrate
    a grammar very closely into a programming language. As stated in [1], the elements which form
    the PEG framework are not new, but are well known and commonly used techniques when
    implementing parsers manually. What makes the PEG framework unique is the selection and
    combination of the basic elements, namely

    PEG Feature / Advantage / Disadvantage

    Scannerless parsing
        Advantage:    Only one level of abstraction. No scanner means no scanner worries.
        Disadvantage: Grammar slightly cluttered up. Recognition of overlapping tokens
                      might be inefficient (e.g. identifier token overlaps with keywords ->
                      ident: !keyword [A-Z]+; )

    Lack of ambiguity
        Advantage:    There is only one interpretation for a grammar. The effect is that PEG
                      grammars are "executable".
        Disadvantage: ---

    Error handling by using FATAL alternative
        Advantage:    Total user control over error diagnostics
        Disadvantage: Bloats the grammar.

    Excessive backtracking possible
        Advantage:    Backtracking adds to the power of PEG. If the input string is in memory,
                      backtracking just means resetting the input position.
        Disadvantage: Potential memory hog. Interferes with semantic actions. Solution:
                      issue a fatal error in case backtracking cannot succeed anymore.

    Greedy repetition
        Advantage:    Greedy repetition conforms to the "maximum munch rule" used in
                      scanning and therefore allows scannerless parsing.
        Disadvantage: Some patterns are more difficult to recognize.

    Ordered Choice
        Advantage:    The author of the grammar determines the selection strategy.
        Disadvantage: Potential error source for the inexperienced. R: '

    operators & and !
        Advantage:    cost/gain ratio if backtracking is anyway supported. Lookahead e.g.
                      allows better reuse of grammar rules and supports more expressive
                      error diagnostics.
        Disadvantage: parser slow.

    A PEG grammar can incur a serious performance penalty when backtracking occurs frequently.
    This is the reason that some PEG tools (so called packrat parsers) memoize already read input
    and the associated rules. It can be proven that appropriate memoization guarantees linear parse
    time even when backtracking and unlimited lookahead occur. Memoization (saving information
    about already taken paths), on the other hand, has its own overhead and impairs performance in
    the average case. The far better approach to limiting backtracking is to rewrite the grammar in a
    way which reduces backtracking. How to do this will be shown in the next chapter.
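    For illustration, memoization in the packrat style can be sketched as follows. This is not part of
    the library presented in this article; PackratDemo, Memoized and the rule numbering are assumptions
    made for this example. The outcome of a rule at a given input position is stored in a table, so
    that a second attempt at the same position is answered from the table instead of being parsed
    again.

```csharp
using System;
using System.Collections.Generic;

delegate bool Matcher();

class PackratDemo
{
    readonly string s;
    int i;
    public int Evaluations;   // counts how often a rule body really ran

    // (rule id, start position) -> end position, or -1 if the rule failed there
    readonly Dictionary<long, int> memo = new Dictionary<long, int>();

    public PackratDemo(string input) { s = input; }

    bool Memoized(int ruleId, Matcher body)
    {
        long key = ((long)ruleId << 32) | (uint)i;
        int end;
        if (memo.TryGetValue(key, out end))
        {
            if (end < 0) return false;   // known failure at this position
            i = end;                     // known success: jump to the stored end
            return true;
        }
        int start = i;
        Evaluations++;
        bool ok = body();
        if (!ok) i = start;
        memo[key] = ok ? i : -1;
        return ok;
    }

    bool Char(char c)
    {
        if (i < s.Length && s[i] == c) { i++; return true; }
        return false;
    }

    bool Seq(Matcher m)
    {
        int i0 = i;
        if (m()) return true;
        i = i0;
        return false;
    }

    // Digits: [0-9]+ ;   (rule id 1, memoized)
    bool Digits()
    {
        return Memoized(1, () =>
        {
            int start = i;
            while (i < s.Length && char.IsDigit(s[i])) i++;
            return i > start;
        });
    }

    // Expr: Digits '+' Digits / Digits '-' Digits / Digits ;
    public bool Expr()
    {
        return Seq(() => Digits() && Char('+') && Digits())
            || Seq(() => Digits() && Char('-') && Digits())
            || Digits();
    }
}
```

    The grammar in this sketch is deliberately not left factored: each alternative starts with Digits,
    so without the table the same digits would be scanned once per alternative. The table lookup and
    its memory are the overhead mentioned above.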

    The ideas underlying PEG grammars are not entirely new and many of them are regularly used to
    construct parsers manually. PEG deviates from most earlier parsing techniques only in its support
    and encouragement for backtracking and unlimited lookahead. The simplest
    implementation of unlimited lookahead and backtracking requires that the input file be
    read into internal memory before parsing starts. This is not a problem nowadays, but was not
    acceptable earlier when memory was a scarce resource.

    Parsing Expression Grammars Tuning

    A set of grammar rules can recognize a given language. But the same language can be described
    by many different grammars even within the same formalism (e.g. PEG grammars). Grammar
    modifications can be used to meet the following goals:

    Goal: More informative tree nodes [1]

    Before modification:

        [1]^^string: '"' ('\\' ["'bfnrt\\] / !'"' .)* '"';

    After modification:

        [1]^^string: '"' (escs / chars)* '"';
        [3]^^escs:   ('\\' ["'bfnrt\\])+;
        [4]^^chars:  (!["\\] .)+;

    Goal: Better placed error indication [2]

    Before modification:

        '/*' (!'*/' .)* ('*/' / FATAL);

    After modification:

        '/*' ( (!'*/' .)* '*/' / FATAL );

    Goal: Faster grammar (reduce calling depth) [3]

    Before modification:

        [10]string:  '"' char* '"';
        [11]char:    escape / !(["\\] / control) unicode / !'"' FATAL;
        [12]escape:  '\\' ["\\/bfnrt];
        [17]control: [#x0-#x1F];
        [18]unicode: [#x0-#xFFFFF];

    After modification:

        [10]string: '"'
                    ( '\\' ["\\/bfnrt]
                    / [#0x20-#0x21#0x23-#0x5B#0x5D-#0xFFFF]
                    / !'"' FATAL
                    )* '"';

    Goal: Faster grammar, less backtracking (left factoring)

    Before modification:

        [1] if_stat: 'if' S '(' expr ')' stat 'else' S stat
                   / 'if' S '(' expr ')' stat;

    After modification:

        [1] if_stat: 'if' S '(' expr ')' stat ('else' S stat)?;

    Remarks:

    [1] More informative tree nodes can be obtained by grouping grammar elements syntactically so
    that postprocessing is easier. In the above example, access to the content of the string is
    improved by grouping consecutive non-escape characters into one syntactical unit.

    [2] The source position reported by an error message is important. In the example of a C
    comment which is not closed before the end of the input, the error message should be given
    where the comment opens.

    [3] Reducing calling depth means inlining function calls, since each rule corresponds to one
    function call in our PEG implementation. Such a transformation should only be carried out for
    hot spots, otherwise the expressiveness of the grammar gets lost. Furthermore, some aggressively
    inlining compilers may do this inlining for you. Reducing calling depth may be questionable, but
    left factorization is certainly not. It not only improves performance but also eliminates
    potentially disruptive backtracking. When semantic actions are embedded into a PEG parser,
    backtracking should in many cases no longer occur, because undoing semantic actions may be
    tedious.
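    The effect of left factoring can be made visible by counting how many terminals a recognizer
    has to test. The following sketch is an illustration under simplifying assumptions, not code
    from the support library: the class LeftFactoringDemo is hypothetical, and the single
    characters 'i', 'c', 's' and 'e' stand in for 'if', the condition, a statement and 'else'.
    The unfactored rule tries the longer alternative first and falls back to the shorter one.

```csharp
using System;

// Illustrative sketch only (hypothetical names and a toy grammar):
//   unfactored: if_stat: 'i' 'c' 's' 'e' 's' / 'i' 'c' 's';
//   factored:   if_stat: 'i' 'c' 's' ('e' 's')?;
public sealed class LeftFactoringDemo
{
    private readonly string input;

    // How many terminal matches were attempted so far.
    public int CharTests { get; private set; }

    public LeftFactoringDemo(string input) { this.input = input; }

    // Match one terminal at pos; returns the next position or -1.
    private int Ch(int pos, char c)
    {
        CharTests++;
        return (pos < input.Length && input[pos] == c) ? pos + 1 : -1;
    }

    // Match a sequence of terminals; stops at the first failure.
    private int Seq(int pos, string terminals)
    {
        foreach (char c in terminals)
        {
            pos = Ch(pos, c);
            if (pos < 0) return -1;
        }
        return pos;
    }

    public int Unfactored(int pos)
    {
        int r = Seq(pos, "icses");
        // Backtrack: the common prefix is matched a second time.
        return r >= 0 ? r : Seq(pos, "ics");
    }

    public int Factored(int pos)
    {
        int r = Seq(pos, "ics");
        if (r < 0) return -1;
        int opt = Seq(r, "es");
        // Optional tail; the common prefix is never read again.
        return opt >= 0 ? opt : r;
    }
}

public static class LeftFactoringProgram
{
    public static void Main()
    {
        var unfactored = new LeftFactoringDemo("ics");
        unfactored.Unfactored(0);
        var factored = new LeftFactoringDemo("ics");
        factored.Factored(0);
        Console.WriteLine("unfactored tests: {0}, factored tests: {1}",
                          unfactored.CharTests, factored.CharTests);
    }
}
```

    In a real grammar the repeated prefix contains whole rules such as expr and stat, so the cost
    of matching it twice, and of undoing any semantic actions executed on the way, is
    correspondingly higher.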

    Comparison of PEG Parsing with other Parsing Techniques

    Most parsing strategies currently in use are based on the notion of a context free grammar. (The
    following explanations closely follow the material in the Wikipedia article on context free
    grammars [3].) A context free grammar consists of a set of rules similar to the set of rules for
    a PEG parser. But context free grammars are interpreted quite differently from PEG grammars. The
    main difference is that context free grammars are nondeterministic, meaning that

    1. Alternatives in context free grammars can be chosen arbitrarily.
    2. Nonterminals can be substituted in an arbitrary order (substitution means replacing a
       nonterminal on the right hand side of a rule by the definition of the nonterminal).

    By starting with the start rule and choosing alternatives and substituting nonterminals in all
    possible orders, we can generate all the strings which are described by the grammar (also called
    the language described by the grammar).

    With the context free grammar

        S : 'a' S 'b' | 'ab';

    we can e.g. generate the following language strings:

        ab, aabb, aaabbb, aaaabbbb, ...

    With PEG we cannot generate a language; we can only recognize an input string. The same
    grammar interpreted as a PEG grammar

        S: 'a' S 'b' / 'ab';

    would recognize any of the following input strings:

        ab
        aabb
        aaabbb
        aaaabbbb
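    How such a PEG rule is executed can be sketched as one recursive function per rule. The code
    below is a minimal hand-written recognizer for the rule above; the names AnBnRecognizer,
    MatchS and Recognize are made up for this illustration and are not part of the support
    library.

```csharp
using System;

// Illustrative sketch only (hypothetical names).
// PEG rule:  S: 'a' S 'b' / 'ab';
public static class AnBnRecognizer
{
    // Returns the position after a match of S starting at pos, or -1 on failure.
    public static int MatchS(string input, int pos)
    {
        // First alternative: 'a' S 'b'
        if (pos < input.Length && input[pos] == 'a')
        {
            int afterS = MatchS(input, pos + 1);
            if (afterS >= 0 && afterS < input.Length && input[afterS] == 'b')
                return afterS + 1;
        }
        // Ordered choice: the second alternative 'ab' is tried at the same
        // position only after the first alternative has failed.
        if (pos + 2 <= input.Length && input[pos] == 'a' && input[pos + 1] == 'b')
            return pos + 2;
        return -1;
    }

    // The input is recognized if S matches and consumes the whole string.
    public static bool Recognize(string input)
    {
        return MatchS(input, 0) == input.Length;
    }
}

public static class AnBnProgram
{
    public static void Main()
    {
        foreach (string s in new[] { "ab", "aabb", "aaabbb", "aab", "abb" })
            Console.WriteLine("{0}: {1}", s, AnBnRecognizer.Recognize(s));
    }
}
```

    The function never enumerates strings of the language. It only answers, for one concrete
    input and one start position, whether the rule matches and where the match ends.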

    It turns out that the nondeterministic nature of context free grammars, while indispensable for
    generating a language, can be a problem when recognizing an input string. If an input string can
    be parsed in two different ways, we have the problem of ambiguity, which parsers must avoid. A
    further consequence of nondeterminism is that a context free input string recognizer (a parser)
    must choose a strategy for substituting nonterminals on the right hand side of a rule. To
    recognize the input string

        1+1+a

    with the context free rule

        S: S '+' S | '1' | 'a';

    we can e.g. use the following substitutions:

        S '+' S ->
        (S '+' S) '+' S ->
        (('1') '+' S) '+' S ->
        (('1') '+' ('1')) '+' S ->
        (('1') '+' ('1')) '+' ('a')

    This is called a leftmost derivation. Or we can use the substitutions:

        S '+' S ->
        S '+' (S '+' S) ->
        S '+' (S '+' ('a')) ->
        S '+' (('1') '+' ('a')) ->
        ('1') '+' (('1') '+' ('a'));

    This is called a rightmost derivation. A leftmost derivation parsing strategy is called LL,
    whereas a rightmost derivation parsing strategy is called LR, because an LR parser reconstructs
    a rightmost derivation in reverse (the first L in LL and LR stands for "parse the input string
    from Left", but who would try it from the right?). Most parsers in use are either LL or LR
    parsers. Furthermore, grammars used for LL parsers and LR parsers must obey different rules. A
    grammar for an LL parser must never use left recursive rules, whereas a grammar for an LR
    parser prefers immediate left recursive rules over right recursive ones. The C# grammar e.g.
    is written for an LR parser. The rule for a list of local variables is therefore:

        local-variable-declarators:
            local-variable-declarator
          | local-variable-declarators ',' local-variable-declarator;

    If we want to use this grammar with an LL parser, then we must rewrite this rule to:

        local-variable-declarators:
            local-variable-declarator (',' local-variable-declarator)*;

    Coming back to the original context free rule

        S: S '+' S | '1' | 'a';

    and interpreting it as a PEG rule

        S: S '+' S / '1' / 'a';

    we do not have to do substitutions or choose a parsing strategy to recognize the input string

        1+1+a

    We only have to follow the execution rules for a PEG, which translate to the following steps:

    1. Set the input position to the start of the input string.
    2. Choose the first alternative of the start rule (here: S '+' S).
    3. Match the input against the first component of the sequence S '+' S.
    4. Since the first component is the nonterminal S, call this nonterminal.

    This obviously results in infinite recursion. The rule S: S '+' S / '1' / 'a'; is therefore not a

    valid PEG rule. But almost any context free rule can be transformed into a PEG rule. The context

    free rule S: S '+' S | '1' | 'a'; translates to the valid PEG rule

        S: ('1'/'a')('+' S)*;

    One of the following chapters shows how to translate a context free rule into a PEG rule.
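    As an illustration of why the transformed rule terminates, here is a minimal hand-written
    recognizer for it. The names SumRecognizer, MatchS and Recognize are assumptions for this
    sketch and do not come from the support library. Every call of MatchS consumes at least one
    character before it calls itself again, so the recursion cannot run forever.

```csharp
using System;

// Illustrative sketch only (hypothetical names).
// PEG rule:  S: ('1' / 'a') ('+' S)*;
public static class SumRecognizer
{
    // Returns the position after a match of S starting at pos, or -1 on failure.
    public static int MatchS(string input, int pos)
    {
        // ('1' / 'a'): one terminal must be consumed first.
        if (pos >= input.Length || (input[pos] != '1' && input[pos] != 'a'))
            return -1;
        pos++;

        // ('+' S)*: repeat as long as the sequence '+' S matches.
        while (pos < input.Length && input[pos] == '+')
        {
            int next = MatchS(input, pos + 1);
            if (next < 0) break;   // failed iteration: keep the position before '+'
            pos = next;
        }
        return pos;
    }

    public static bool Recognize(string input)
    {
        return MatchS(input, 0) == input.Length;
    }
}

public static class SumProgram
{
    public static void Main()
    {
        foreach (string s in new[] { "1+1+a", "a", "1+", "+1" })
            Console.WriteLine("{0}: {1}", s, SumRecognizer.Recognize(s));
    }
}
```

    Written as a function for the left recursive rule, the first statement would be a call of
    MatchS at the same position, which is exactly the infinite recursion described in the steps
    above.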

    The following table compares the prevailing parser types.

    Parser Type   Sub Type        Scanner  Lookahead  Generality  Implementation        Examples
    Context Free  LR-Parser       yes      -          -           table driven
    Context Free  SLR-Parser      yes      1          medium      table driven          hand computed table
    Context Free  LALR(1)-Parser  yes      1          high        table driven          YACC, Bison
    Context Free  LL-Parser       yes      -          -           code or table driven
    Context Free  LL(1)-Parser    yes      1          low         code or table driven  Predictive parsing
    Context Free  LL(k)-Parser    yes      k          high        code or table driven  ANTLR, Coco-R
    Context Free  LL(*)-Parser    yes      unlimited  high+       code or table driven  boost::spirit
    PEG-Parser    PEG-Parser      no       unlimited  very high   code preferred        Rats, Packrat, Pappy

    The above table rates the generality and power of PEG as very high because of the PEG operators
    & (peek) and ! (not). It is not difficult to implement these operations, but heavy use of them
    can impair parser performance, and earlier generations of parser writers carefully avoided such
    features because of the implied costs.

    When it comes to runtime performance, the differences between the above parser strategies are
    not so clear. LALR(1) parsers can be very fast. The same is true for LL(1) parsers (predictive
    parsers). With LL(*) and PEG parsers, runtime performance depends on the amount of lookahead
    actually used by the grammar. Special versions of PEG parsers (packrat parsers) can guarantee
    linear runtime behaviour (meaning that doubling the length of the input string just doubles the
    parsing time).

    An important difference between LR parsers and LL or PEG parsers is that LR parsers are always
    table driven. A manually written parser is therefore in most cases either an LL parser or a PEG
    parser. Table driven parsing puts parsing into a black box which allows only limited user
    interaction. This is not a problem for a one-time, clearly defined parsing task, but it is not
    ideal if one frequently corrects, improves and extends the grammar, because for a table driven
    parser every grammar change means a complete regeneration of tables and code.

    Translating LR grammars to PEG Grammars

    Most specifications for popular programming languages come with a grammar suited for an LR
    parser. LL and PEG parsers cannot use such grammars directly because of left recursive rules.
    Left recursive rules are forbidden in LL and PEG parsers because they result in infinite
    recursion. Another problem with LR grammars is that they often use alternatives with the same
    beginning. This is legal in PEG but results in unwanted backtracking. The following table shows
    the necessary grammar transformations when going from an LR grammar to a PEG grammar.

    Transformation category: Immediate left recursion => factor out non recursive alternatives

    LR rule:

        // s1, s2 are terms which are not left recursive and not empty
        A: A t1 | A t2 | s1 | s2;

    PEG rule (result of transformation):

        A: (s1 / s2) (t1 / t2)*;

    Transformation category: Indirect left recursion => transform to immediate left recursion =>
    factor out non recursive alternatives

    LR rule:

        A: B t1 | s1;
        B: A t2 | s3;

    PEG rule (result of transformation):

        // we substitute B by its right hand side
        A: (A t2 | s3) t1 | s1;
        // eliminate immediate left recursion
        A: (s3 t1 / s1) (t2 t1)*...;

    Transformation category: Alternatives with same beginning => merge alternatives using left
    factorization

    LR rule:

        A: s1 t1 | s1 t2;

    PEG rule (result of transformation):

        A: s1 (t1 / t2);

    The following sample shows the transformation of part of the "C" grammar from the LR grammar
    as presented in Kernighan and Ritchie's book on "C" to a PEG grammar (the symbol S is used to
    denote scanning of white space).

    LR grammar ("C" snippet, declarator stuff):

        declarator: pointer? direct_declarator;

    PEG grammar:

        declarator: pointer? direct_declarator;

    LR grammar:

        direct_declarator:
            identifier
          | '(' declarator ')'
          | direct_declarator '[' constant_expression? ']'
          | direct_declarator '(' parameter_type_list ')'
          | direct_declarator '(' identifier_list? ')';

    PEG grammar:

        direct_declarator:
            (identifier / '(' S declarator ')' S)
            ( '[' S constant_expression? ']' S
            / '(' S parameter_type_list ')' S
            / '(' S identifier_list? ')' S
            )*;

    LR grammar:

        pointer:
            '*' type_qualifier_list?
          | '*' type_qualifier_list? pointer;

    PEG grammar:

        pointer:
            '*' S type_qualifier_list? pointer?;

    LR grammar:

        parameter_type_list:
            parameter_list
          | parameter_list ',' '...';

    PEG grammar:

        parameter_type_list:
            parameter_list (',' S '...')?;

    LR grammar:

        type_qualifier_list:
            type_qualifier
          | type_qualifier_list type_qualifier;

    PEG grammar:

        type_qualifier_list:
            type_qualifier+;

    LR grammar:

        parameter_declaration:
            declaration_specifiers declarator
          | declaration_specifiers abstract_declarator?;

    PEG grammar:

        parameter_declaration:
            declaration_specifiers (declarator / abstract_declarator?);

    LR grammar:

        identifier_list:
            identifier
          | identifier_list ',' identifier;

    PEG grammar:

        identifier_list:
            identifier S
            (',' S identifier)*;
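    The PEG version of identifier_list maps directly to a loop. The following sketch shows this
    for a hand-written recognizer. It is an illustration only: the class
    IdentifierListRecognizer and its helpers are hypothetical, identifier is simplified to a
    letter or underscore followed by letters, digits or underscores, and S is taken to be
    optional white space.

```csharp
using System;

// Illustrative sketch only (hypothetical names, simplified identifier rule).
// PEG rule:  identifier_list: identifier S (',' S identifier)*;
public static class IdentifierListRecognizer
{
    // S: optional white space.
    private static int SkipWhite(string s, int pos)
    {
        while (pos < s.Length && char.IsWhiteSpace(s[pos])) pos++;
        return pos;
    }

    // identifier (simplified); returns the position after it or -1.
    private static int Identifier(string s, int pos)
    {
        if (pos >= s.Length || !(char.IsLetter(s[pos]) || s[pos] == '_'))
            return -1;
        pos++;
        while (pos < s.Length && (char.IsLetterOrDigit(s[pos]) || s[pos] == '_'))
            pos++;
        return pos;
    }

    // Returns the position after the list or -1 if no identifier is found.
    public static int IdentifierList(string s, int pos)
    {
        pos = Identifier(s, pos);
        if (pos < 0) return -1;
        pos = SkipWhite(s, pos);

        // (',' S identifier)*: the loop replaces the left recursion of the LR rule.
        while (pos < s.Length && s[pos] == ',')
        {
            int next = Identifier(s, SkipWhite(s, pos + 1));
            if (next < 0) break;   // failed iteration: stay before the ','
            pos = next;
        }
        return pos;
    }
}

public static class IdentifierListProgram
{
    public static void Main()
    {
        foreach (string s in new[] { "a, b, c", "x ,y1", "a,", ",a" })
            Console.WriteLine("\"{0}\" -> {1}", s,
                              IdentifierListRecognizer.IdentifierList(s, 0));
    }
}
```

    The sketch follows the rule literally, so white space is skipped after the first identifier
    and after each comma, but not after the later identifiers.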

    Future Developments

    The planned series of articles consists of the following parts

    Subject                      Planned release date  Description
    Parser and Parser Generator  October 2008          C# classes to support work with Parsing Expression Grammars
    PEG-Debugger/Interpreter     January 2009          Direct interpretation and debugging of PEG grammars without need to generate a C# module
    Sample Applications          June 2009             More grammar samples with postprocessing

    History

    2008 October: initial version
    2008 October: minor update
        o improved semantic block support (using clause, IDisposable interface)
        o added new sample parser Python 2.5.2

    References

    [1] Parsing Expression Grammars, Bryan Ford, MIT, January 2004. http://www.brynosaurus.com/pub/lang/peg-slides/img0.html
    [2] Parsing expression grammar, Wikipedia.en. http://en.wikipedia.org/wiki/Parsing_expression_grammar
    [3] Context-free grammar, Wikipedia.en. http://en.wikipedia.org/wiki/Context-free_grammar
    [4] ITU-T X.690: Specification of BER, CER and DER, International Telecommunication Union. http://www.itu.int/ITU-T/studygroups/com17/languages/X.690-0207.pdf
    [5] RFC 4627: The application/json Media Type for JavaScript Object Notation (JSON). http://www.ietf.org/rfc/rfc4627.txt
    [6] Introducing JSON. http://www.json.org/

    License

    This article, along with any associated source code and files, is licensed under The Code Project

    Open License (CPOL)

    About the Author

    Martin.Holzherr

    Switzerland

    Comments and Discussions

    Great work! Pedro J. Molina 17:15 8 Feb '11

    Congratulations Martin!

    I enjoyed a lot reading your educational article. Amazing work!

    Are you continuing with the roadmap on PEGs and creating the debugger and the interpreter?

    I am looking forward to see more...

    Pedro J. Molina, PhD

    http://pjmolina.com/metalevel

    Boost.Spirit OvermindDL1 7:53 18 Aug '09

    Actually Boost.Spirit (now known as Classic Spirit) is a PEG parser (regardless of what the
    docs say) and it does have the & and ! operators in exactly the format you specified. They
    did not call it a PEG parser because the creator of the library did not know that PEG parsers
    existed. Boost.Spirit2.1 is worlds faster than Classic Spirit, is even more capable and
    powerful, and the docs actually call it what it is now, a PEG parser (which it has been all
    along).

    I have to say that your code looks a *lot* like Boost.Spirit2.1, just a lot longer due to the
    lack of overloading the operators, although Boost.Spirit2.1 handles semantic actions a lot
    better and more powerfully. You should probably look at Boost.Spirit2.1 and you could take some
    ideas from it (if C# is capable of such power). Since C# templates do not have the full power
    of C++ templates it may be more verbose, but you should still be able to emulate much of the
    power, which would be useful for the C# programmers since they lack the power of the Boost
    versions otherwise.

    Great Stuff ! LarsAC 7:14 27 Jul '09

    Dear Martin,

    this is really a great article and even comes along with excellent code.

    The code gave me a good jump start into PEGs and has permitted me to successfully tackle

    the context-dependent grammar of the legacy software I'm working on.
