General Information
Tutorials
Reference Manuals
Libraries
Translation Tasks
Tools
Administration
|
Tree ParsingThe Tree To Be ParsedProblems amenable to solution by tree parsing involve hierarchical relationships among entities. Each entity is represented by a node in a tree, and the structure of the tree represents the hierarchical relationship among the entities represented by its nodes. The relationships are such that nodes corresponding to entities of a particular kind always have the same number of children. No constraint is placed on the kinds of children a particular kind of node can have; only the number of children is fixed. This tree parser accepts only trees in which each node has no more than two children.
An entity like an integer addition operator is completely characterized by
the kind of node representing it.
Integer constants, on the other hand, are not completely characterized by
the fact that they are represented by A user builds the tree describing the hierarchical relationships among the entities of interest by invoking specific constructor functions. The constructor used to build a particular node depends on the number of children and the number and type of attributes required by that node. This section begins by formalizing the structure of a tree to be parsed. It then characterizes the attributes, and finally explains the naming conventions for the constructors.
Tree StructureThe tree structure is defined in terms of a set of symbols that constitute a ranked alphabet: Each symbol has an associated arity that determines the number of children a node representing the symbol will have. Each node of the tree represents a symbol of the ranked alphabet, and the number of children of a node is the arity of the symbol it represents. Any such tree is legal; there is no constraint on the symbols represented by the children of a node, only on their number. The ranked alphabet is extracted from the specification supplied by the user (see The Tree Patterns). The translator verifies that the arity of each symbol is consistent over the specification. Each symbol of the ranked alphabet denotes a particular kind of entity. For example, here is a set of symbols forming a ranked alphabet that could be the basis of a tree describing simple arithmetic expressions:
IntegerVal FloatingVal IntegerVar FloatingVar Negative Plus Minus Star Slash
The symbols in the first row have arity 0, and are therefore represented by
leaves of the tree.
A tree describing the expression in the first line has one node, representing
the symbol A tree describing the expression in the last line has seven nodes. Four are leaves because the symbols they represent have arity 0; each of the remaining three has two children because the symbol it represents has arity 2. A tree is not acceptable to the tree parser described in this document if any node has more than two children. Thus no symbol of the ranked alphabet may have arity greater than 2. That is not a significant restriction, since any tree can be represented as a binary tree. Suppose that we want to use trees to describe the following C expressions:
`i>j ? i-j : j-i' `(i=1, j=3, k=5, l+3) + 7'
Although
Conditional( Greater(IntegerVar,IntegerVar), Alternatives(Minus(IntegerVar,IntegerVar),Minus(IntegerVar,IntegerVar))) The comma expression might have any number of components, but they can simply be accumulated from left to right:
Plus( Comma( Comma( Comma(Assign(IntegerVar,IntegerVal),Assign(IntegerVar,IntegerVal)), Assign(IntegerVar,IntegerVal)), Plus(IntegerVar,IntegerVal)), IntegerVal)
Decorating NodesIn addition to its arity, each symbol in the ranked alphabet may be associated with a fixed number of attributes. Each attribute has a specific type. The attributes decorate the nodes of the tree, but they do not contribute any structural information. In the examples of the previous section, the symbols of arity 0 did not provide all of the necessary information about the leaves. Each symbol of arity 0 specified what the leaf was, but not which value of that kind it represented. This is often the case with leaves, so a leaf usually has an associated attribute. Interior nodes, on the other hand, seldom need attributes. Each attribute must be given a value of the proper type when the node corresponding to the symbol is created. This value will not affect the tree parse in any way, but will be passed unchanged to the function implementing the action associated with the rule used in the derivation of the node. Thus attributes are a mechanism for passing information through the tree parse. Node Construction Functions
Each node of the tree to be parsed is constructed by invoking a function
whose name and parameters depend on the number of children and
attributes of the node.
The name always begins with the characters The set of constructors is determined from the specification supplied by the user (see The Tree Patterns). The translator verifies that each occurrence of a symbol is consistent with respect to the number of children and types of attributes. Consider the simple expression trees discussed above (see Tree Structure):
IntegerVal FloatingVal IntegerVar FloatingVar Negative Plus Minus Star Slash
Suppose that integer and floating-point values are represented by the integer
indexes of their denotations in the string table
(see Character String Storage of Library Reference),
and variables are represented by definition table keys
(see The Definition Table Module of Property Definition Language).
In that case each tree node representing either : TPNode TP_0_int (int symbol, int attr)
Return a symbol leaf decorated with attr,
of type : TPNode TP_0_DefTableKey (int symbol, DefTableKey attr)
Return a symbol leaf decorated with attr,
of type : TPNode TP_1 (int symbol, TPNode child) Return an undecorated symbol node with one child : TPNode TP_2 (int symbol, TPNode left, TPNode right) Return an undecorated symbol node with two children Here's how the tree describing the expression `-i+1' could be constructed:
TP_2( Plus, TP_1(Negative, TP_0_DefTableKey(IntegerVar, keyOfi)), TP_0_int(IntegerVal, indexOf1))
Here
All tree construction functions return values of type
TP_2_int_int(Symbol, child1, child2, attr1, attr2);
|