next up previous
Next: Declarations Up: Structural Analysis Previous: Structural Analysis


The Abstract Syntax Tree

A compiler must carry out a number of computations over the abstract syntax tree in the process of analyzing and translating the source program. Simplifying these computations is the primary goal of AST design. While the AST certainly reflects the phrase structure of the source program, it may differ significantly in certain respects.

Attribute grammars describe both the structure of the AST and the computations carried out over it. Figure 1,

Figure 1: The top-level program structure
\begin{figure}\begin{flushleft}
\textbf{Structure.lido}\textnormal{[1]}:
\begin{...
...notesize This macro is attached to a product file.}
\end{flushleft}
\end{figure}
written in LIDO, is the attribute grammar fragment describing the overall structure of a Pascal program. Each rule describes a node of the AST, and corresponds to a class in an object-oriented implementation. The identifier following the keyword RULE names the class of the object that would represent such a node. These are the only classes that can be instantiated. (It is not necessary to name the rules in a LIDO specification, because Eli will generate unique names if none are given. Rules are named in Figure 1 to make it easier to discuss them.)

An identifier that precedes ::= or LISTOF in some rule is called a nonterminal; all other identifiers are terminals. Nonterminals following ::= represent subtrees, while terminals represent values that are not subtrees. (For example, identifier is a terminal representing an identifier appearing in the source program.)

Terminals can also be thought of as class names, but these classes are defined outside of the LIDO specification. They do not represent tree nodes, but rather values that are components of a rule's object. Objects of class pgpr therefore have no children, but each stores a representation of an identifier appearing in the source program.

A nonterminal (such as block in Figure 1) names the abstract class that characterizes the contexts in which the construct can appear. Each rule class (such as body in Figure 1) is a subclass of the abstract class named by the nonterminal preceding ::= or LISTOF.

Each rule containing ::= describes a node with a fixed number of children and/or component values. Nonterminals following the ::= specify children, in left-to-right order. Each child must be represented by an object belonging to a subclass of the abstract class named by the nonterminal.

Each rule containing LISTOF describes a node with an arbitrary number of children (including none at all). Nonterminals following LISTOF (separated by vertical bars) specify the possible children. There may be any number of children corresponding to each nonterminal.

Literals in Figure 1 do not represent information present in the abstract syntax tree; they are used solely to establish a correspondence between the abstract syntax tree nodes and the phrase structure of the input.



Subsections
next up previous
Next: Declarations Up: Structural Analysis Previous: Structural Analysis
2008-08-29