Class OPParser
java.lang.Object
com.tangosol.coherence.dsltools.precedence.OPParser
OPParser is the parser class for the Top Down Operator Presidence Parsing
framework. Top-down Operator Precedence Parsing is a technique invented by
Vaughan Pratt in the 1970's.
The fundamental idea is to associate semantics with tokens embodied in
objects rather than using grammar rules embodied in procedures. The
token's job is to transform pieces of syntax into Abstract Syntax Trees (AST).
Each token has a "binding power" that determines operator precedence and
two functions, "nud" (null denotation) and "led" (left denotation).
The "nud" function is typically used to process tokens that have a role of
"literal" or "variable". The "led" function is used for tokens that are
typically in the role of an operator such as "+" or "*". The parsing
process is driven by a very simple algorithm embodied in a method named
"expression" which takes a right binding power as argument. The expression
method starts by processing the first token (usually a literal or a
variable) by invoking its "nud" method. The expression method then loops
fetching the next token and building an AST by dispatching to the current
token's "led" methods so long as the passed right binding power is less
than the next token's left binding power. These "led" methods may eat as
many tokens as is warranted by their semantics and they may make recursive
calls back to the parsers expression method using their binding power as
the right binding power argument.
For example, to parse the expression "1 + 2", the parser's expression
method would call "nud" on the token for the literal "1". The "nud" of a
literal simply returns an atomic term. Then the parser checks if the right
binding power is less than the left binding power of the next token and
if true, it calls "led" on that next token. In this case the right binding
power is 0 and the binding power of the "+" operator token is 50. So the
parser calls "led" on the "+" operator token passing itself and the left
AST node term ("1").
The "led" implementation of the "+" operator recursively calls the parser's
expression method with a right binding power of 50. The expression calls
"nud" on the next token, the literal token "2" which returns an atomic term
for "2". The loop in expression is avoided because the next token is a
special token used to represent the end of stream. It has a left binding
power of -1 which is lower than any possible right binding power.
Therefore, the parser returns the atomic term "2". The caller
("+" operator "led") returns an AST term for the "+" operator that has
left and right terms. In this case the left term is the atomic "1" and
the right is the atomic "2".
Had we been parsing a slightly more complicated example such as "1 + 2 * 3"
when the "led" for "+" recursively called expression with its binding power
of 50 as the right binding power then the three tokens "2", "*", and "3"
will be consumed because "*" has a left binding power of 60 which lets
expression's loop calls the "led" of "*" ultimately returning an AST that
represents "*(2,3) "to be used as the right side term for "+".
- Author:
- djl 2009.03.14
- See Also:
-
Constructor Summary
ConstructorDescriptionConstruct a new OPParser initialized to gthe given ScannerOPParser
(Reader reader, TokenTable table, Set<CharSequence> setOperators) Construct a new OPParser that parses the given Reader in the language defined by the given TokenTable.OPParser
(String sExper, TokenTable table, Set<CharSequence> setOperators) Construct a new OPParser that parses the given expression in the language defined by the given TokenTable. -
Method Summary
Modifier and TypeMethodDescriptionexpression
(int nRightBindingPower) Parse the next expression into an Abstract Syntax Tree using the given right binding power.Obtain the OPScanner that backs this parserTerm[]
nodeList()
Parse a comma separated sequence of expressions to the of the tokens.Term[]
Parse a comma separated sequence of expressions upto the given end marker.Term[]
Parse a comma separated sequence of expressions upto the given end marker.<T extends Term>
Tparse()
Parse the next expression into an Abstract Syntax Tree.Term[]
Build an array of ASTNodes by processing the this tokens nest as a comma separated list.
-
Constructor Details
-
OPParser
Construct a new OPParser that parses the given expression in the language defined by the given TokenTable.- Parameters:
sExper
- string expression to parsetable
- the TokenTable that defines the language to parsesetOperators
- the operators this parser will use
-
OPParser
Construct a new OPParser that parses the given Reader in the language defined by the given TokenTable.- Parameters:
reader
- Reader with the chars to parsetable
- the TokenTable that defines the language to parsesetOperators
- the operators this parser will use
-
OPParser
Construct a new OPParser initialized to gthe given Scanner- Parameters:
scanner
- OPScanner that is already initialized
-
-
Method Details
-
getScanner
Obtain the OPScanner that backs this parser- Returns:
- the OPScanner that acts as a token source
-
parse
Parse the next expression into an Abstract Syntax Tree.- Type Parameters:
T
- the term type- Returns:
- an ASTNode that represents the structured meaning of the expression being parsed
-
expression
Parse the next expression into an Abstract Syntax Tree using the given right binding power. Parsing will continue so long as the right binding power is less than the current tokens left binding power. This is the central algorithm of Vaughan Pratt's Top Down Operator Precedence Parser.- Parameters:
nRightBindingPower
- defines how strong token must bind to the left before halting parsing- Returns:
- an ASTNode that represents the structured meaning of the expression being parsed
-
nodeList
Parse a comma separated sequence of expressions upto the given end marker. Return an array of ASTNodes- Parameters:
sEndMarker
- defines the symbol that ends the sequence- Returns:
- an ASTNode array that represents the sequence
-
nodeList
Parse a comma separated sequence of expressions upto the given end marker. Given flag controll if end of stream overides the end marker. Return an array of ASTNodes- Parameters:
sEndMarker
- defines the symbol that ends the sequencefEndStreamAllowed
- flag to overide testing for sEndMarker if at the end of stream- Returns:
- an ASTNode array that represents the sequence
-
nodeList
Parse a comma separated sequence of expressions to the of the tokens. Return an array of ASTNodes- Returns:
- an ASTNode array that represents the sequence
-
readNestedCommaSeparatedList
Build an array of ASTNodes by processing the this tokens nest as a comma separated list. of BaseTokens.- Parameters:
nest
- the nest of BaseTokens to process- Returns:
- an array of AstNodes
-