Eli   Documents

General Information

 o Eli: Translator Construction Made Easy
 o Global Index
 o Frequently Asked Questions
 o Typical Eli Usage Errors

Tutorials

 o Quick Reference Card
 o Guide For new Eli Users
 o Release Notes of Eli
 o Tutorial on Name Analysis
 o Tutorial on Scope Graphs
 o Tutorial on Type Analysis
 o Typical Eli Usage Errors

Reference Manuals

 o User Interface
 o Eli products and parameters
 o LIDO Reference Manual
 o Typical Eli Usage Errors

Libraries

 o Eli library routines
 o Specification Module Library

Translation Tasks

 o Lexical analysis specification
 o Syntactic Analysis Manual
 o Computation in Trees

Tools

 o LIGA Control Language
 o Debugging Information for LIDO
 o Graphical ORder TOol

 o FunnelWeb User's Manual

 o Pattern-based Text Generator
 o Property Definition Language
 o Operator Identification Language
 o Tree Grammar Specification Language
 o Command Line Processing
 o COLA Options Reference Manual

 o Generating Unparsing Code

 o Monitoring a Processor's Execution

Administration

 o System Administration Guide

Mail Home

Name Analysis Reference Manual

Previous Chapter Next Chapter Table of Contents


Tree Grammar Preconditions

The ScopeGraphs specification module provides computational roles to be inherited by nonterminal symbols of the abstract syntax tree. To make effective use of these roles, the abstract syntax tree must be designed with them in mind. Generally speaking, that means that the abstract syntax tree structure will differ somewhat from the structure of the parse tree. Eli offers tools that aid in mapping a parse tree to an appropriate abstract syntax tree (see The Relationship Between Phrases and Tree Nodes of Syntactic Analysis).

A developer can often use the parsing grammar provided by the language designer, and define simple mapping rules to obtain the desired abstract syntax tree (see Syntax development hints of Syntactic Analysis). Sometimes, however, a change in the parsing grammar is required. That can lead to difficulties, depending on the parsing algorithm (see How to Resolve Parsing Conflicts of Syntactic Analysis).

Representation of identifiers

An identifier is a basic symbol of the language being analyzed, and will correspond to a named terminal symbol of the concrete grammar describing the input text. (Typically the terminal symbol is named `Identifier' or `Ident'.) Named terminal symbols do not contribute to the trees specified by the tree grammar, but a value derived from the basic symbol may be used in computations associated with the rule of a production or with the symbol on the left-hand side of a production (see Productions of LIDO -- Reference Manual). A unique integer value is typically derived from each identifier by the token processor mkidn (see Available processors of Lexical Analysis).

Identifiers usually play different roles in different syntactic contexts. We recommend that these distinctions not be made in the concrete syntax. The concrete syntax should use the same named terminal (e.g. Ident) to represent identifiers in every context. Syntactic contexts should be distinguished in the abstract syntax by LIDO RULEs.

Consider the concrete productions

   ObjDecl:        TypeDenoter Ident.
   TypeDenoter:    Ident.
   Variable:       Ident.
We distinguish the different roles of identifiers by introducing new symbol names in the corresponding LIDO RULEs:

   RULE: ObjDecl     ::= TypeDenoter DefIdent END;
   RULE: TypeDenoter ::= TypeUseIdent END;
   RULE: Variable    ::= UseIdent END;
The new symbol names must, of course, be defined:

   RULE: DefIdent     ::= Ident END;
   RULE: UseIdent     ::= Ident END;
   RULE: TypeUseIdent ::= Ident END;

Representation of ranges

Recall that a range is a subtree of the abstract syntax tree that encompasses a scope. Most programming languages allow nested scopes, and allow the meaning of an identifier in an inner scope to differ from the meaning in an outer scope. In that case, we say that the identifier's binding from the outer scope is not visible in the inner scope. The developer's goal for the abstract syntax tree should be that each range encompasses a scope containing all occurrences of identifiers defined in that scope but not containing any defining occurrence whose binding is visible outside of that scope.

For example, consider the following grammar fragment:

ProcedureDeclaration:
  'procedure' ProcedureHeading ProcedureBody /
  Type 'procedure' ProcedureHeading ProcedureBody .
ProcedureHeading:
  ProcedureIdentifier FormalParameterPart ';'
    ValuePart SpecificationPart .
ProcedureIdentifier: Identifier .
ProcedureBody: Statement .

A ProcedureDeclaration is nested inside a block (not shown) and the semantics of the language make the ProcedureIdentifier visible in the range containing that block. FormalParameterPart, ValuePart, SpecificationPart, and Statement all belong to a single range whose bindings are visible within the ProcedureDeclaration. An abstract syntax tree corresponding to this grammar does not meet our goal: ProcedureDeclaration is the only region containing all identifier occurrences defined in the procedure, but it also contains the defining occurrence ProcedureIdentifier that is visible outside of the region.

The solution is to add a nonterminal ProcedureRange to the grammar:

ProcedureDeclaration:
  'procedure' ProcedureIdentifier ProcedureRange /
  Type 'procedure' ProcedureIdentifier ProcedureRange .
ProcedureIdentifier: Identifier .
ProcedureRange:
  ProcedureHeading ProcedureBody .
ProcedureHeading:
  FormalParameterPart ';' ValuePart SpecificationPart .
ProcedureBody: Statement .

Here the ProcedureRange subtree contains all occurrences of identifiers defined in the procedure, and does not contain the defining occurrence ProcedureIdentifier.

The ProcedureHeading can be considered to be the interface to the procedure, and the ProcedureBody to be the implementation of that interface. Suppose that the language allows the interface and its implementation to be widely separated, instead of packaged into a single declaration:

ProcedureInterface:
  'procedure' ProcedureIdentifier ProcedureHeading /
  Type 'procedure' ProcedureIdentifier ProcedureHeading .
ProcedureImplementation:
  'implementation' ProcedureIdentifier ProcedureBody .

The ProcedureInterface and ProcedureImplementation are both nested within the same block (not shown) and the semantics of the language make the ProcedureIdentifier visible in the range containing that block.

The ProcedureHeading and ProcedureBody are now completely disjoint subtrees; there is no ProcedureRange to contain them. Together, they contain all occurrences of identifiers defined in the procedure and therefore together play the same role as the ProcedureRange subtree in the previous example.

We have defined a scope as a contiguous sequence of program text, and said that the developer needs to map each scope to an abstract syntax subtree encompassing that scope (see Fundamentals of Name Analysis). In order to deal with separate interface and implementation, it is convenient to generalize these concepts by allowing a scope to be a set of disjoint contiguous sequences of program text, and requiring the developer to map these regions into an abstract syntax tree subforest that encompasses those sequences. The subforest then has the desired properties of a range: it contains all of the occurrences in the scope, and does not contain any defining occurrence whose binding is visible outside.


Previous Chapter Next Chapter Table of Contents