Class OPParser


  • public class OPParser
    extends Object
    OPParser is the parser class for the Top Down Operator Presidence Parsing framework. Top-down Operator Precedence Parsing is a technique invented by Vaughan Pratt in the 1970's. The fundamental idea is to associate semantics with tokens embodied in objects rather than using grammar rules embodied in procedures. The token's job is to transform pieces of syntax into Abstract Syntax Trees (AST). Each token has a "binding power" that determines operator precedence and two functions, "nud" (null denotation) and "led" (left denotation). The "nud" function is typically used to process tokens that have a role of "literal" or "variable". The "led" function is used for tokens that are typically in the role of an operator such as "+" or "*". The parsing process is driven by a very simple algorithm embodied in a method named "expression" which takes a right binding power as argument. The expression method starts by processing the first token (usually a literal or a variable) by invoking its "nud" method. The expression method then loops fetching the next token and building an AST by dispatching to the current token's "led" methods so long as the passed right binding power is less than the next token's left binding power. These "led" methods may eat as many tokens as is warranted by their semantics and they may make recursive calls back to the parsers expression method using their binding power as the right binding power argument. For example, to parse the expression "1 + 2", the parser's expression method would call "nud" on the token for the literal "1". The "nud" of a literal simply returns an atomic term. Then the parser checks if the right binding power is less than the left binding power of the next token and if true, it calls "led" on that next token. In this case the right binding power is 0 and the binding power of the "+" operator token is 50. So the parser calls "led" on the "+" operator token passing itself and the left AST node term ("1"). The "led" implementation of the "+" operator recursively calls the parser's expression method with a right binding power of 50. The expression calls "nud" on the next token, the literal token "2" which returns an atomic term for "2". The loop in expression is avoided because the next token is a special token used to represent the end of stream. It has a left binding power of -1 which is lower than any possible right binding power. Therefore, the parser returns the atomic term "2". The caller ("+" operator "led") returns an AST term for the "+" operator that has left and right terms. In this case the left term is the atomic "1" and the right is the atomic "2". Had we been parsing a slightly more complicated example such as "1 + 2 * 3" when the "led" for "+" recursively called expression with its binding power of 50 as the right binding power then the three tokens "2", "*", and "3" will be consumed because "*" has a left binding power of 60 which lets expression's loop calls the "led" of "*" ultimately returning an AST that represents "*(2,3) "to be used as the right side term for "+".
    Author:
    djl 2009.03.14
    See Also:
    OPToken
    • Constructor Detail

      • OPParser

        public OPParser​(String sExper,
                        TokenTable table,
                        Set<CharSequence> setOperators)
        Construct a new OPParser that parses the given expression in the language defined by the given TokenTable.
        Parameters:
        sExper - string expression to parse
        table - the TokenTable that defines the language to parse
        setOperators - the operators this parser will use
      • OPParser

        public OPParser​(Reader reader,
                        TokenTable table,
                        Set<CharSequence> setOperators)
        Construct a new OPParser that parses the given Reader in the language defined by the given TokenTable.
        Parameters:
        reader - Reader with the chars to parse
        table - the TokenTable that defines the language to parse
        setOperators - the operators this parser will use
      • OPParser

        public OPParser​(OPScanner scanner)
        Construct a new OPParser initialized to gthe given Scanner
        Parameters:
        scanner - OPScanner that is already initialized
    • Method Detail

      • getScanner

        public OPScanner getScanner()
        Obtain the OPScanner that backs this parser
        Returns:
        the OPScanner that acts as a token source
      • parse

        public <T extends Term> T parse()
        Parse the next expression into an Abstract Syntax Tree.
        Type Parameters:
        T - the term type
        Returns:
        an ASTNode that represents the structured meaning of the expression being parsed
      • expression

        public Term expression​(int nRightBindingPower)
        Parse the next expression into an Abstract Syntax Tree using the given right binding power. Parsing will continue so long as the right binding power is less than the current tokens left binding power. This is the central algorithm of Vaughan Pratt's Top Down Operator Precedence Parser.
        Parameters:
        nRightBindingPower - defines how strong token must bind to the left before halting parsing
        Returns:
        an ASTNode that represents the structured meaning of the expression being parsed
      • nodeList

        public Term[] nodeList​(String sEndMarker)
        Parse a comma separated sequence of expressions upto the given end marker. Return an array of ASTNodes
        Parameters:
        sEndMarker - defines the symbol that ends the sequence
        Returns:
        an ASTNode array that represents the sequence
      • nodeList

        public Term[] nodeList​(String sEndMarker,
                               boolean fEndStreamAllowed)
        Parse a comma separated sequence of expressions upto the given end marker. Given flag controll if end of stream overides the end marker. Return an array of ASTNodes
        Parameters:
        sEndMarker - defines the symbol that ends the sequence
        fEndStreamAllowed - flag to overide testing for sEndMarker if at the end of stream
        Returns:
        an ASTNode array that represents the sequence
      • nodeList

        public Term[] nodeList()
        Parse a comma separated sequence of expressions to the of the tokens. Return an array of ASTNodes
        Returns:
        an ASTNode array that represents the sequence
      • readNestedCommaSeparatedList

        public Term[] readNestedCommaSeparatedList​(NestedBaseTokens nest)
        Build an array of ASTNodes by processing the this tokens nest as a comma separated list. of BaseTokens.
        Parameters:
        nest - the nest of BaseTokens to process
        Returns:
        an array of AstNodes