Eli   Documents

General Information

 o Eli: Translator Construction Made Easy
 o Global Index
 o Frequently Asked Questions
 o Typical Eli Usage Errors

Tutorials

 o Quick Reference Card
 o Guide For new Eli Users
 o Release Notes of Eli
 o Tutorial on Name Analysis
 o Tutorial on Scope Graphs
 o Tutorial on Type Analysis
 o Typical Eli Usage Errors

Reference Manuals

 o User Interface
 o Eli products and parameters
 o LIDO Reference Manual
 o Typical Eli Usage Errors

Libraries

 o Eli library routines
 o Specification Module Library

Translation Tasks

 o Lexical analysis specification
 o Syntactic Analysis Manual
 o Computation in Trees

Tools

 o LIGA Control Language
 o Debugging Information for LIDO
 o Graphical ORder TOol

 o FunnelWeb User's Manual

 o Pattern-based Text Generator
 o Property Definition Language
 o Operator Identification Language
 o Tree Grammar Specification Language
 o Command Line Processing
 o COLA Options Reference Manual

 o Generating Unparsing Code

 o Monitoring a Processor's Execution

Administration

 o System Administration Guide

Mail Home

Tree Parsing

Next Chapter Table of Contents


The Tree To Be Parsed

Problems amenable to solution by tree parsing involve hierarchical relationships among entities. Each entity is represented by a node in a tree, and the structure of the tree represents the hierarchical relationship among the entities represented by its nodes.

The relationships are such that nodes corresponding to entities of a particular kind always have the same number of children. No constraint is placed on the kinds of children a particular kind of node can have; only the number of children is fixed. This tree parser accepts only trees in which each node has no more than two children.

An entity like an integer addition operator is completely characterized by the kind of node representing it. Integer constants, on the other hand, are not completely characterized by the fact that they are represented by IntDenotation nodes. Each IntDenotation node must therefore carry the constant's value as an attribute. This tree parser allows an arbitrary number of attributes of arbitrary type to be attached to each node.

A user builds the tree describing the hierarchical relationships among the entities of interest by invoking specific constructor functions. The constructor used to build a particular node depends on the number of children and the number and type of attributes required by that node.

This section begins by formalizing the structure of a tree to be parsed. It then characterizes the attributes, and finally explains the naming conventions for the constructors.

Tree Structure

The tree structure is defined in terms of a set of symbols that constitute a ranked alphabet: Each symbol has an associated arity that determines the number of children a node representing the symbol will have. Each node of the tree represents a symbol of the ranked alphabet, and the number of children of a node is the arity of the symbol it represents. Any such tree is legal; there is no constraint on the symbols represented by the children of a node, only on their number.

The ranked alphabet is extracted from the specification supplied by the user (see The Tree Patterns). The translator verifies that the arity of each symbol is consistent over the specification.

Each symbol of the ranked alphabet denotes a particular kind of entity. For example, here is a set of symbols forming a ranked alphabet that could be the basis of a tree describing simple arithmetic expressions:

IntegerVal   FloatingVal   IntegerVar   FloatingVar
Negative
Plus         Minus         Star         Slash

The symbols in the first row have arity 0, and are therefore represented by leaves of the tree. Negative has arity 1, and the symbols in the third row all have arity 2. Each symbol has the obvious meaning when describing an expression:

`3.1415'
FloatingVal
`-3'
Negative(IntegerVal)
`k-3'
Minus(IntegerVar,IntegerVal)
`(a*7)/(j+2)'
Slash(Star(FloatingVar,IntegerVal),Plus(IntegerVar,IntegerVal))
The notation here the normal algebraic one: A term is either a symbol of arity 0, or it is a symbol of arity k followed by a parenthesized list of k terms. Each term corresponds to a node of the tree.

A tree describing the expression in the first line has one node, representing the symbol FloatingVal. Because FloatingVal has arity 0, that node has no children. (The value `3.1415' would appear as an attribute of the node, see Decorating Nodes.)

A tree describing the expression in the last line has seven nodes. Four are leaves because the symbols they represent have arity 0; each of the remaining three has two children because the symbol it represents has arity 2.

A tree is not acceptable to the tree parser described in this document if any node has more than two children. Thus no symbol of the ranked alphabet may have arity greater than 2. That is not a significant restriction, since any tree can be represented as a binary tree.

Suppose that we want to use trees to describe the following C expressions:

`i>j ? i-j : j-i'
`(i=1, j=3, k=5, l+3) + 7'

Although ?: is usually thought of as a ternary operator, its semantics provide a natural decomposition into a condition and two alternatives:

Conditional(
  Greater(IntegerVar,IntegerVar),
  Alternatives(Minus(IntegerVar,IntegerVar),Minus(IntegerVar,IntegerVar)))

The comma expression might have any number of components, but they can simply be accumulated from left to right:

Plus(
  Comma(
    Comma(
      Comma(Assign(IntegerVar,IntegerVal),Assign(IntegerVar,IntegerVal)),
      Assign(IntegerVar,IntegerVal)),
    Plus(IntegerVar,IntegerVal)),
  IntegerVal)

Decorating Nodes

In addition to its arity, each symbol in the ranked alphabet may be associated with a fixed number of attributes. Each attribute has a specific type. The attributes decorate the nodes of the tree, but they do not contribute any structural information.

In the examples of the previous section, the symbols of arity 0 did not provide all of the necessary information about the leaves. Each symbol of arity 0 specified what the leaf was, but not which value of that kind it represented. This is often the case with leaves, so a leaf usually has an associated attribute. Interior nodes, on the other hand, seldom need attributes.

Each attribute must be given a value of the proper type when the node corresponding to the symbol is created. This value will not affect the tree parse in any way, but will be passed unchanged to the function implementing the action associated with the rule used in the derivation of the node. Thus attributes are a mechanism for passing information through the tree parse.

Node Construction Functions

Each node of the tree to be parsed is constructed by invoking a function whose name and parameters depend on the number of children and attributes of the node. The name always begins with the characters TP_, followed by the digit representing the number of children. If there are attributes, the attribute types follow. Each attribute type is preceded by an underscore.

The set of constructors is determined from the specification supplied by the user (see The Tree Patterns). The translator verifies that each occurrence of a symbol is consistent with respect to the number of children and types of attributes.

Consider the simple expression trees discussed above (see Tree Structure):

IntegerVal   FloatingVal   IntegerVar   FloatingVar
Negative
Plus         Minus         Star         Slash

Suppose that integer and floating-point values are represented by the integer indexes of their denotations in the string table (see Character String Storage of Library Reference), and variables are represented by definition table keys (see The Definition Table Module of Property Definition Language). In that case each tree node representing either IntegerVal or FloatingVal would be decorated with an int-valued attribute; each tree node representing either IntegerVar or FloatingVar would be decorated with a DefTableKey-valued attribute. No other node would have attributes, and four tree construction functions would be created by the translator:

: TPNode TP_0_int (int symbol, int attr)

Return a symbol leaf decorated with attr, of type int

: TPNode TP_0_DefTableKey (int symbol, DefTableKey attr)

Return a symbol leaf decorated with attr, of type DefTableKey

: TPNode TP_1 (int symbol, TPNode child)

Return an undecorated symbol node with one child

: TPNode TP_2 (int symbol, TPNode left, TPNode right)

Return an undecorated symbol node with two children

Here's how the tree describing the expression `-i+1' could be constructed:

TP_2(
  Plus,
  TP_1(Negative, TP_0_DefTableKey(IntegerVar, keyOfi)),
  TP_0_int(IntegerVal, indexOf1))

Here keyOfi is a variable holding the definition table key associated with variable i and indexOf1 is a variable holding the string table index of the denotation for 1.

All tree construction functions return values of type TPNode. Attributes can be attached to nodes with children, although there are no such nodes in the example above. Here's the constructor invocation for a node with two children and two integer attributes:

TP_2_int_int(Symbol, child1, child2, attr1, attr2);


Next Chapter Table of Contents