One
weakness in theparser1
program is that it embeds expression evaluation logic in the parsing
logic: the result is computed while the string is being parsed. This
makes evaluation quick, but it can also make it difficult to modify the
code, especially in larger systems. To simplify, we could restructure
the program to keep expression parsing and evaluation separate. Instead
of evaluating the string, the parser can build up an intermediate
representation of it that can be evaluated later. As an added incentive,
building the representation separately makes it available to other
analysis tools (e.g., optimizers, viewers, and so on)—they can be run as
separate passes over the tree.
Example 19-16
shows a
variant ofparser1
that implements
this idea. The parser analyzes the string and builds up a
parse tree
—that is, a tree of class instances that
represents the expression and that may be evaluated in a separate step.
The parse tree is built from classes that “know” how to evaluate
themselves: to compute the expression, we just ask the tree to evaluate
itself. Root nodes in the tree ask their children to evaluate
themselves, and then combine the results by applying a single operator.
In effect, evaluation in this version is simply a recursive traversal of
a tree of embedded class instances constructed by the parser.
Example 19-16. PP4E\Lang\Parser\parser2.py
"""
Separate expression parsing from evaluation by building an explicit parse tree
"""
TraceDefault = False
class UndefinedError(Exception): pass
if __name__ == '__main__':
from scanner import Scanner, SyntaxError, LexicalError # if run here
else:
from .scanner import Scanner, SyntaxError, LexicalError # from PyTree
#################################################################################
# the interpreter (a smart objects tree)
#################################################################################
class TreeNode:
def validate(self, dict): # default error check
pass
def apply(self, dict): # default evaluator
pass
def trace(self, level): # default unparser
print('.' * level + '')
# ROOTS
class BinaryNode(TreeNode):
def __init__(self, left, right): # inherited methods
self.left, self.right = left, right # left/right branches
def validate(self, dict):
self.left.validate(dict) # recurse down branches
self.right.validate(dict)
def trace(self, level):
print('.' * level + '[' + self.label + ']')
self.left.trace(level+3)
self.right.trace(level+3)
class TimesNode(BinaryNode):
label = '*'
def apply(self, dict):
return self.left.apply(dict) * self.right.apply(dict)
class DivideNode(BinaryNode):
label = '/'
def apply(self, dict):
return self.left.apply(dict) / self.right.apply(dict)
class PlusNode(BinaryNode):
label = '+'
def apply(self, dict):
return self.left.apply(dict) + self.right.apply(dict)
class MinusNode(BinaryNode):
label = '-'
def apply(self, dict):
return self.left.apply(dict) - self.right.apply(dict)
# LEAVES
class NumNode(TreeNode):
def __init__(self, num):
self.num = num # already numeric
def apply(self, dict): # use default validate
return self.num
def trace(self, level):
print('.' * level + repr(self.num)) # as code, was 'self.num'
class VarNode(TreeNode):
def __init__(self, text, start):
self.name = text # variable name
self.column = start # column for errors
def validate(self, dict):
if not self.name in dict.keys():
raise UndefinedError(self.name, self.column)
def apply(self, dict):
return dict[self.name] # validate before apply
def assign(self, value, dict):
dict[self.name] = value # local extension
def trace(self, level):
print('.' * level + self.name)
# COMPOSITES
class AssignNode(TreeNode):
def __init__(self, var, val):
self.var, self.val = var, val
def validate(self, dict):
self.val.validate(dict) # don't validate var
def apply(self, dict):
self.var.assign( self.val.apply(dict), dict )
def trace(self, level):
print('.' * level + 'set ')
self.var.trace(level + 3)
self.val.trace(level + 3)
#################################################################################
# the parser (syntax analyser, tree builder)
#################################################################################
class Parser:
def __init__(self, text=''):
self.lex = Scanner(text) # make a scanner
self.vars = {'pi':3.14159} # add constants
self.traceme = TraceDefault
def parse(self, *text): # external interface
if text:
self.lex.newtext(text[0]) # reuse with new text
tree = self.analyse() # parse string
if tree:
if self.traceme: # dump parse-tree?
print(); tree.trace(0)
if self.errorCheck(tree): # check names
self.interpret(tree) # evaluate tree
def analyse(self):
try:
self.lex.scan() # get first token
return self.Goal() # build a parse-tree
except SyntaxError:
print('Syntax Error at column:', self.lex.start)
self.lex.showerror()
except LexicalError:
print('Lexical Error at column:', self.lex.start)
self.lex.showerror()
def errorCheck(self, tree):
try:
tree.validate(self.vars) # error checker
return 'ok'
except UndefinedError as instance: # args is a tuple
varinfo = instance.args
print("'%s' is undefined at column: %d" % varinfo)
self.lex.start = varinfo[1]
self.lex.showerror() # returns None
def interpret(self, tree):
result = tree.apply(self.vars) # tree evals itself
if result != None: # ignore 'set' result
print(result) # ignores errors
def Goal(self):
if self.lex.token in ['num', 'var', '(']:
tree = self.Expr()
self.lex.match('\0')
return tree
elif self.lex.token == 'set':
tree = self.Assign()
self.lex.match('\0')
return tree
else:
raise SyntaxError()
def Assign(self):
self.lex.match('set')
vartree = VarNode(self.lex.value, self.lex.start)
self.lex.match('var')
valtree = self.Expr()
return AssignNode(vartree, valtree) # two subtrees
def Expr(self):
left = self.Factor() # left subtree
while True:
if self.lex.token in ['\0', ')']:
return left
elif self.lex.token == '+':
self.lex.scan()
left = PlusNode(left, self.Factor()) # add root-node
elif self.lex.token == '-':
self.lex.scan()
left = MinusNode(left, self.Factor()) # grows up/right
else:
raise SyntaxError()
def Factor(self):
left = self.Term()
while True:
if self.lex.token in ['+', '-', '\0', ')']:
return left
elif self.lex.token == '*':
self.lex.scan()
left = TimesNode(left, self.Term())
elif self.lex.token == '/':
self.lex.scan()
left = DivideNode(left, self.Term())
else:
raise SyntaxError()
def Term(self):
if self.lex.token == 'num':
leaf = NumNode(self.lex.match('num'))
return leaf
elif self.lex.token == 'var':
leaf = VarNode(self.lex.value, self.lex.start)
self.lex.scan()
return leaf
elif self.lex.token == '(':
self.lex.scan()
tree = self.Expr()
self.lex.match(')')
return tree
else:
raise SyntaxError()
#################################################################################
# self-test code: use my parser, parser1's tester
#################################################################################
if __name__ == '__main__':
import testparser
testparser.test(Parser, 'parser2') # run with Parser class here
Notice the way we handle undefined name exceptions inerrorCheck
. When exceptions are derived from
the built-inException
class, their
instances automatically return the arguments passed to the exception
constructor call as a tuple in theirargs
attribute—convenient for use in string
formatting here.
Also notice that the new parser reuses the same scanner module as
well. To catch errors raised by the scanner, it also imports the
specific classes that identify the scanner’s exceptions. Both the
scanner and the parser can raise exceptions on errors (lexical errors,
syntax errors, and undefined name errors). They’re caught at the top
level of the parser, and they end the current parse. There’s no need to
set and check status flags to terminate the recursion. Since math is
done using integers, floating-point numbers, and Python’s operators,
there’s usually no need to trap numeric overflow or underflow errors.
But as is, the parser doesn’t handle errors such as division by
zero—such Python exceptions make the parser system exit with a Python
stack trace and message. Uncovering the cause and fix for this is left
as suggested exercise.
Whenparser2
is run as a
top-level program, we get the same test code output as
forparser1
.
In fact, it reuses the very
same test code—both parsers pass in their parser class object totestparser.test
. And since classes
are also objects, we can also pass
this
version
of the parser totestparser
’s
interactive loop:testparser
.
interact
(
parser2
.
Parser
)
.
The new parser’s external behavior is identical to that of the
original, so I won’t repeat all its output here (run this live for a
firsthand look). Of note, though, this parser supports both use as a
top-level script, and package imports from other directories, such as
the PyTree viewer we’ll use in a moment. Python 3.X no longer searches a
module’s own directory on the import search path, though, so we have to
use package-relative import syntax for the latter case, and import from
another directory when testing
interactively
:
C:\...\PP4E\Lang\Parser>parser2.py
parser2
5.0
...rest is same as for parser1...
C:\...PP4E\Lang\Parser>python
>>>import parser2
from .scanner import Scanner, SyntaxError, LexicalError # from PyTree
ValueError: Attempted relative import in non-package
C:\...\PP4E\Lang\Parser>cd ..
C:\...\PP4E\Lang>Parser\parser2.py
parser2
5.0
...rest is same as for parser1...
C:\...\PP4E\Lang>python
>>>from Parser import parser2
>>>x = parser2.Parser()
>>>x.parse('1 + 2 * 3 + 4')
11
>>>import Parser.testparser
>>>Parser.testparser.interact(parser2.Parser)
Enter=>4 * 3 + 5
17
Enter=>stop
>>>
Using full package import paths inparser2
instead of either package-relative or
unqualified imports:
from PP4E.Lang.Parser import scanner
would suffice for all three use cases—script, and both same and
other directory
imports—
but
requires the path to be set properly, and seems overkill for importing a
file in the same directory as the
importer.
Really, the
only tangible difference with this latest parser is that
it builds and uses trees to evaluate an expression internally instead of
evaluating as it parses. The intermediate representation of an
expression is a tree of class instances, whose shape reflects the order
of operator evaluation. This parser also has logic to print an indented
listing of the constructed parse tree if thetraceme
attribute is set toTrue
(or1
). Indentation gives the nesting of subtrees,
and binary operators list left subtrees first. For example:
C:\...\PP4E\Lang>
>>>from Parser import parser2
>>>p = parser2.Parser()
>>>p.traceme = True
>>>p.parse('5 + 4 * 2')
[+]
...5
...[*]
......4
......2
13
When this tree is evaluated, theapply
method recursively evaluates subtrees
and applies root operators to their results. Here,*
is evaluated before+
, since it’s lower in the tree. TheFactor
method consumes the*
substring before returning a right subtree
toExpr
. The next tree takes a
different shape:
>>>p.parse('5 * 4 - 2')
[-]
...[*]
......5
......4
...2
18
In this example,*
is evaluated
before-
. TheFactor
method loops through a substring of*
and/
expressions before returning the resulting
left subtree toExpr
. The next
example is more complex, but follows the same rules:
>>>p.parse('1 + 3 * (2 * 3 + 4)')
[+]
...1
...[*]
......3
......[+]
.........[*]
............2
............3
.........4
31
Trees are made of nested class instances. From an OOP perspective,
it’s another way to use composition. Since tree nodes are just class
instances, this tree could be created and evaluated manually,
too:
PlusNode( NumNode(1),
TimesNode( NumNode(3),
PlusNode( TimesNode(NumNode(2), NumNode(3)),
NumNode(4) ))).apply({})
But we might as well let the parser build it for us (Python is not
that much like Lisp, despite what you may have heard).