Abstract Syntax Tree Unparsing
The computation of IdemPtg in a given context can be decomposed into
two tasks: collecting the IdemPtg attribute values from the
children, and combining those values into a representation of the
current context.
Methods for attribute value collection depend on the tree grammar,
and are embodied in LIDO computations.
Methods for combining values, on the other hand, depend on the desired
form of the unparsed text.
They are embodied in PTG patterns.
There are two ways to override the output defined by the IdemPtg
attribute at a given node:
- Override the PTG pattern associated with that node
- Override the computation of the IdemPtg attribute in the
  associated rule
The first method should be used when the change is simply one of format
(adding constant strings, changing the order of the components, or omitting
components).
When it is necessary to add significant content to the unparsed
representation of a node, then the second method should be used.
Any arbitrary computation yielding an object of type PTGNode can be
carried out, using any information at the processor's disposal.
(Such a solution usually also requires overriding the pattern.)
The generated unparser specification contains a PTG pattern for each
non-literal terminal symbol and each LIDO rule in the definition of the
tree grammar.
Each pattern name consists of a prefix, an underscore, and the name of the
construct (non-literal terminal or rule).
The default prefix is Idem.
All of the non-literal terminal symbols are represented by patterns of the
following form (`name' is the non-literal terminal symbol):
Idem_`name': [PtgOutId $ int]
This pattern is a single function call insertion
(see Function Call Insertion of PTG: Pattern-based Text Generator).
PtgOutId is a function exported by the PtgCommon module
(see Commonly used Output patterns for PTG of Tasks related to generating output).
Its argument is assumed to be a
string table index
(see Character String Storage of Library Reference Manual)
and it outputs the indexed string.
This default pattern for a non-literal terminal symbol assumes that
the value of that symbol is, in fact, a string table index.
If the internal representation of the symbol was created by either the
token processor mkidn (see Available Processors of Lexical Analysis)
or the token processor mkstr, this will be the case.
In the expression language specification, mkidn is used to
establish the internal representation of an Identifier, and
mkstr is used to establish the internal representation of an Integer.
Suppose, however, that the internal representation of an Integer
was created by the token processor mkint.
In that case, the value of the symbol is the integer itself rather than a
string table index, so the user would have to provide the following PTG
pattern to override the normal pattern generation:
Idem_Integer: $ int
It is vital to ensure that the PTG pattern associated with a non-literal
terminal symbol is
compatible with the token processor creating the
internal representation of that symbol.
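The compatibility requirement can be illustrated with a small Python model. This is only a sketch: the helper names below are hypothetical stand-ins for the roles played by mkstr, mkint, PtgOutId and the "$ int" pattern, and the list-based string table is an assumption made for illustration, not Eli's implementation.

```python
# Model only: these helpers mimic the roles of mkstr, mkint, PtgOutId and
# a "$ int" pattern. They are not Eli's implementations.
string_table = []

def model_mkstr(lexeme):
    """Store the lexeme and return its string table index."""
    string_table.append(lexeme)
    return len(string_table) - 1

def model_mkint(lexeme):
    """Return the integer value denoted by the lexeme."""
    return int(lexeme)

def model_out_id(index):
    """Like [PtgOutId $ int]: output the string stored at the index."""
    return string_table[index]

def model_int_pattern(value):
    """Like the pattern "$ int": output the integer value itself."""
    return str(value)

index = model_mkstr("42")
value = model_mkint("42")
print(model_out_id(index))       # processor and pattern agree
print(model_int_pattern(value))  # processor and pattern agree
```

Passing the result of model_mkint to model_out_id would treat an integer value as a table index, which is the kind of mismatch the override is meant to prevent.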
The only differences between the infix and postfix representations of an
expression tree are in the literal terminal symbols reconstructed by the
textual unparser (parentheses appear in an infix representation but not in
a postfix representation) and in the order in which values are combined
(operators between operands in an infix representation but following them
in a postfix representation).
Thus we can override the PTG patterns generated from the expression
language definition to produce a postfix unparser:
Idem_PlusExp: $1 $2 "+" [Separator]
Idem_StarExp: $1 $2 "*" [Separator]
Idem_Parens: $1
Idem_CallExp: $1 $2 [Separator]
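The idea that only the "combine" step changes between the two representations can be sketched in Python. This is a simplified model, not generated Eli code: trees are nested tuples, each pattern is modeled as a function, and plain spaces stand in for the Separator invocations.

```python
# Model only: each dictionary plays the role of a set of PTG patterns.
INFIX = {
    "PlusExp": lambda left, right: f"{left} + {right}",
    "StarExp": lambda left, right: f"{left} * {right}",
    "Parens": lambda inner: f"({inner})",
}
POSTFIX = {
    "PlusExp": lambda left, right: f"{left} {right} +",
    "StarExp": lambda left, right: f"{left} {right} *",
    "Parens": lambda inner: inner,  # parentheses are not reconstructed
}

def unparse(node, patterns):
    """Collect the children's text, then combine it with the rule's pattern."""
    if isinstance(node, str):      # leaf: an identifier or integer
        return node
    rule, *children = node
    parts = [unparse(child, patterns) for child in children]  # collect
    return patterns[rule](*parts)                             # combine

tree = ("StarExp", ("Parens", ("PlusExp", "a", "b")), "c")
print(unparse(tree, INFIX))    # (a + b) * c
print(unparse(tree, POSTFIX))  # a b + c *
```

The traversal in unparse is identical for both outputs; swapping the pattern table is the only change, which mirrors overriding the PTG patterns while leaving the LIDO computations alone.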
Earlier
(see Using an Unparser),
we used a LIDO computation to ensure that a textual unparser generated from
the expression language definition separated the arguments of a call with
commas.
The same effect can be achieved by simply overriding the PTG pattern that
defines the "combine" function of the computation inherited by Arguments:
Idem_2ArgList: $ { "," [Separator] } $
As usual, an invocation of Separator follows the terminal symbol ",".
In some situations, it is necessary to omit one or more children of a node.
This cannot be done simply by omitting indexed insertion points from the
appropriate PTG pattern, because PTG determines the number of arguments to
the generated function from the set of insertion points.
An invocation of the generated function, with one argument per child,
already appears in the computation for the node.
Thus any change in the number of insertion points would result in a
mismatch between the number of parameters to the function and the number of
arguments to the call.
A child can be omitted from the unparsed tree by "wrapping" the
corresponding indexed insertion point in the PTG pattern
(`i' is the integer index):
[ IGNORE $`i' ]
IGNORE is a macro defined in the generated FunnelWeb file.
It does nothing, so the effect is that the indexed sub-tree does not
appear in the unparsed output.
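The argument-count problem and the IGNORE workaround can be modeled in a few lines of Python. The sketch makes simplifying assumptions that are not taken from PTG itself: patterns are plain strings without quoted literals, the arity of the generated function is taken to be the highest index used, and make_pattern_function is a hypothetical helper.

```python
import re

def make_pattern_function(pattern):
    """Build a function whose arity is derived from the pattern's $i points."""
    indices = [int(i) for i in re.findall(r"\$(\d+)", pattern)]
    arity = max(indices, default=0)

    def generated(*args):
        if len(args) != arity:
            raise TypeError(f"expected {arity} arguments, got {len(args)}")
        # A wrapped insertion point still counts as a parameter above,
        # but contributes nothing to the output.
        text = re.sub(r"\[\s*IGNORE\s+\$\d+\s*\]", "", pattern)
        text = re.sub(r"\$(\d+)", lambda m: args[int(m.group(1)) - 1], text)
        return " ".join(text.split())

    return generated

full = make_pattern_function("$1 $2")
wrapped = make_pattern_function("$1 [IGNORE $2]")
print(full("f", "x"))     # f x
print(wrapped("f", "x"))  # f
```

In this model, a function built from the pattern "$1" alone would reject the two-argument call, whereas the wrapped form accepts both arguments and drops the second from the output.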
The unparser generator implements the computation of the IdemPtg
attribute as a class symbol computation.
This class symbol computation can be overridden either by a
tree symbol computation or by a rule computation
(see Inheritance of Computations of LIDO - Reference Manual).
When overriding the default
computation for an IdemPtg value, it is
often convenient to be able to write the new computation in terms of
the overridden value.
Thus the unparser generator actually produces two class symbol
computations:
The IdemOrigPtg attribute of the class symbol is first computed
by applying the appropriate PTG function to the IdemPtg attributes
of the children.
Then the IdemPtg attribute of the class symbol is assigned the value
of the IdemOrigPtg attribute of the class symbol.
To see how IdemPtg and IdemOrigPtg could be used when an
unparser's behavior must be changed, suppose that the Parens rule
were omitted from the definition of the expression language.
In that case, the unparser has no information about parentheses present in
the original input text.
Thus a pretty-printer would fail to output parentheses that were necessary
to override the normal operator precedence and association in certain
expressions, changing the meaning of those expressions.
Here is a simple tree symbol computation that ensures that the unparsed
form has the same meaning as the original tree.
It overrides the class symbol computation for IdemPtg that was
produced by the unparser generator:
SYMBOL Expression COMPUTE
  SYNT.IdemPtg=PTGParen(THIS.IdemOrigPtg);
END;
PTGParen is defined by the pattern:
Paren: "(" $ ")"
This specification puts parentheses around every expression, which
certainly preserves the meaning but may make the result hard to read.
A more readable representation could be created by parenthesizing
only those expressions containing operators:
RULE: Expression ::= Expression Operator Expression
COMPUTE
  Expression[1].IdemPtg=PTGParen(Expression[1].IdemOrigPtg);
END;
RULE: Expression ::= Operator Expression
COMPUTE
  Expression[1].IdemPtg=PTGParen(Expression[1].IdemOrigPtg);
END;
This illustrates the use of rule computations to override the generated
class symbol computation.
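The relationship between the two attributes can be sketched with a small Python model. It is an illustration under stated assumptions rather than generated code: trees are tuples of the form (operator, left, right) with strings as leaves, and the optional override argument plays the role of a user-supplied computation of IdemPtg in terms of IdemOrigPtg.

```python
def orig_ptg(node, override):
    """Default combine step applied to the children's IdemPtg values."""
    if isinstance(node, str):
        return node
    operator, left, right = node
    return f"{idem_ptg(left, override)} {operator} {idem_ptg(right, override)}"

def idem_ptg(node, override=None):
    """By default IdemPtg is IdemOrigPtg; an override may wrap that value."""
    original = orig_ptg(node, override)
    return override(node, original) if override else original

def paren_all(node, original):
    """Model of the symbol computation: parenthesize every expression."""
    return f"({original})"

def paren_operators(node, original):
    """Model of the rule computations: parenthesize operator nodes only."""
    return original if isinstance(node, str) else f"({original})"

tree = ("*", ("+", "a", "b"), "c")   # originally written as (a + b) * c
print(idem_ptg(tree))                   # a + b * c  (meaning changed)
print(idem_ptg(tree, paren_all))        # (((a) + (b)) * (c))
print(idem_ptg(tree, paren_operators))  # ((a + b) * c)
```

Because each override receives the original value, neither one has to repeat the default combine step, which is the convenience the second attribute provides.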
A comma-separated argument list can be produced by overriding the
computation of IdemOrigPtg (or IdemPtg; see Using an Unparser):
RULE: Arguments LISTOF Expression
COMPUTE
  Arguments.IdemOrigPtg=
    CONSTITUENTS Expression.IdemPtg SHIELD Expression
    WITH (PTGNode, PTGArgSep, IDENTICAL, PTGNull);
END;
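The WITH clause names a two-argument combining function, a one-argument function and a function for the empty case, so the computation behaves like a fold over the argument values. The Python sketch below models that reading. It assumes that PTGArgSep is defined by a pattern like the Idem_2ArgList pattern shown earlier, and that the braces there mark a part that is output only when both insertion points are non-empty; all function names are hypothetical.

```python
from functools import reduce

NULL = ""  # stands in for the empty PTG value

def arg_sep(left, right):
    """Model of the two-argument pattern: the comma appears only
    when neither side is empty."""
    if left == NULL:
        return right
    if right == NULL:
        return left
    return f"{left}, {right}"

def identical(value):
    """Model of the one-argument function: pass the value through."""
    return value

def unparse_arguments(argument_texts):
    """Fold the argument texts from left to right, starting from NULL."""
    return reduce(arg_sep, (identical(text) for text in argument_texts), NULL)

print(unparse_arguments(["x", "y", "z"]))  # x, y, z
```

An empty argument list yields the empty value and a single argument yields no comma, which is why the separator sits in the conditional part of the pattern.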