Eli   Documents

General Information

 o Eli: Translator Construction Made Easy
 o Global Index
 o Frequently Asked Questions
 o Typical Eli Usage Errors

Tutorials

 o Quick Reference Card
 o Guide For new Eli Users
 o Release Notes of Eli
 o Tutorial on Name Analysis
 o Tutorial on Scope Graphs
 o Tutorial on Type Analysis
 o Typical Eli Usage Errors

Reference Manuals

 o User Interface
 o Eli products and parameters
 o LIDO Reference Manual
 o Typical Eli Usage Errors

Libraries

 o Eli library routines
 o Specification Module Library

Translation Tasks

 o Lexical analysis specification
 o Syntactic Analysis Manual
 o Computation in Trees

Tools

 o LIGA Control Language
 o Debugging Information for LIDO
 o Graphical ORder TOol

 o FunnelWeb User's Manual

 o Pattern-based Text Generator
 o Property Definition Language
 o Operator Identification Language
 o Tree Grammar Specification Language
 o Command Line Processing
 o COLA Options Reference Manual

 o Generating Unparsing Code

 o Monitoring a Processor's Execution

Administration

 o System Administration Guide

Mail Home

Syntactic Analysis

Previous Chapter Next Chapter Table of Contents


The Relationship Between Phrases and Tree Nodes

RULE declarations in files of type `lido' describe the structure of the abstract syntax tree over which computations are performed. Eli will create a routine to construct an abstract syntax tree if any tree computations are specified (see Tree Structure of LIDO -- Computations in Trees). In order to do this, Eli must be able to deduce a unique correspondence between the concrete and abstract syntaxes, such that for each rule in the concrete syntax it is possible to uniquely determine what abstract syntax tree fragment to build. The tool within Eli that does this is called Maptool. In addition to generating a routine to construct the abstract syntax tree, Maptool will also deduce complete versions of the concrete and abstract syntaxes if only incomplete versions of each are provided by the user. This can only be done if the two syntaxes can together form a complete syntax.

The concrete syntax is provided by the user in files of type `con'. Since EBNF constructs are allowed in these files, they are first translated into their strict BNF equivalents before being processed by Maptool (see Using extended BNF to describe more complex rules). The abstract syntax is extracted from the RULE declarations made in files of type `lido' (see Rule Specifications of LIDO - Reference Manual).

The remainder of this section will discuss how Maptool deduces the correspondence between the two syntaxes, the use of files of type `map' to influence the mapping process, and some usage hints.

Syntax mapping process

Maptool begins by matching any LISTOF constructs that appear in the abstract syntax to any appropriate concrete rules. The next phase examines each concrete rule not matched in the previous phase and tries to find a matching abstract syntax rule. After all matching is complete, unmatched concrete rules are added to the abstract syntax and unmatched abstract rules are added to the concrete syntax. There are a few exceptions to this as are noted in the remainder of this section.

While the most obvious benefit to having Maptool deduce syntax fragments from one syntax and place them in the other is to reduce the amount of typing required, the more important advantage is the support it gives for incremental development. It allows the user to only specify those portions of the syntax with which they are concerned at the moment.

Chain rule definitions

Chain rules have different behavior than other rules during the matching process. Descriptions for three different kinds of chain rules are given here to assist in the explanations given in the remainder of this section:

Chain Rule
A normal chain rule is a rule in which there is exactly one symbol on the right hand side of the rule that is not equivalent to the left hand side. For example, `X ::= Y' where X is not equivalent to Y is a chain rule.

Trivial Chain Rule
A trivial chain rule is a chain rule in which the left hand side is equivalent to the right hand side. This typically happens when a symbolic equivalence class is defined that includes both the left hand side symbol and the right hand side symbol (see Specifying symbolic equivalence classes).

Literal Chain Rule
A literal chain rule is similar to a trivial chain rule, except that it also has literal symbols on its right hand side. A typical example of this is the rule `Expr ::= '(' Expr ')''.

Based on the above definition for normal chain rules, we define coercions between symbols. A symbol X can be coerced to a symbol Y if there is a chain rule with X on the right hand side and Y on the left hand side. Coercions are also transitive. If X is coercible to Y and Y is coercible to Z, then X is also coercible to Z. A symbol is also considered coercible to itself.

Matching the LISTOF construct

The LISTOF construct denotes zero or more occurrences of the elements that appear on its right hand side. It does not dictate the ordering of those right hand side symbols or any delimiters that may be used to separate them. The ordering and delimiters are determined by concrete rules. In simple terms, Maptool begins with the left hand side of the LISTOF and recursively matches rules until it finds the right hand side elements. The next paragraph gives a more precise description.

An abstract LISTOF construct is matched by starting with the symbol on the left hand side of the LISTOF. All concrete rules with equivalent left hand side symbols are added to the set of matched rules. For each rule added to the set, the right hand side symbols are examined. Of these symbols, literal symbols are ignored. If terminal symbols are encountered that aren't coercible to the symbols appearing on the right hand side of the LISTOF, an error is signaled, because the left hand side of the LISTOF may not derive symbols other than those that appear on the right hand side. For each nonterminal symbol that isn't coercible to one of the right hand side symbols, the concrete rules that have that symbol on their left hand side are added to the set. The process continues until no more rules can be added to the set.

The intermediate nonterminal symbols that are encountered as new concrete rules are added to the set may not appear on the right hand side of other concrete rules.

If Maptool doesn't find any concrete rules to match a LISTOF, it will generate a canonical left recursive representation. For the list:

RULE: Program LISTOF Declaration | Statement END;

Maptool would generate the following:

Program: LST_Program .
LST_Program: LST_Program Declaration .
LST_Program: LST_Program Statement .
LST_Program: .

This specifies zero or more occurrences of Declaration's and Statement's.

There is one other important thing to note about the LISTOF construct. Attribute computations associated with a LISTOF construct can just as easily be written as symbol computations on the symbols of the LISTOF. The advantage to using the LISTOF construct is that it becomes possible to generate an abstract syntax tree structure which allows for more efficient traversal. In order to construct this special tree structure, it is sometimes necessary to insert an additional chain rule into the concrete syntax at the root of the LISTOF.

This is the case when the rules matching the LISTOF have a recursive occurrence of the left hand side symbol. As an example, the LISTOF construct shown above might be written as follows in the concrete syntax:

Program: Program Declaration .
Program: Program Statement .
Program: .

As you can see, the root of the LISTOF, Program is used both on the left hand side and right hand side of rules that match the LISTOF construct, meaning that it is used recursively. If the LISTOF construct is provided in a `.lido' file, Maptool must introduce the chain rule `Program ::= LST_Program' and change other occurrences of Program to LST_Program in order to build the efficient tree structure.

Users should be aware that it is possible for the addition of this chain rule to cause LALR(1) conflicts for the parsability of the concrete syntax that do not appear in the absence of the LISTOF construct. In these cases, users must either rewrite the concrete syntax or avoid the use of the LISTOF construct to avoid the problem.

Matching remaining rules

After all LISTOF constructs have been matched, Maptool attempts to match the remaining concrete rules to rules given in the abstract syntax. A match is determined if the signature of the concrete rule is equivalent to the signature of an abstract rule or coercions (see Chain rule definitions) exist between any symbols which differ in the signatures. Remember that symbolic equivalence classes are applied to concrete rules before this matching takes place, so symbols in the signatures are considered equivalent if they belong to the same equivalence class.

For example, consider the following abstract rules:

RULE: Declaration ::= IdDef Type END;
RULE: IdDef ::= Identifier END;

The following concrete rule will match the first of the above abstract rules, because of the coercion defined between Identifier and IdDef:

Declaration: Identifier Type .

The reason for doing this is to distinguish semantically between occurrences of Identifier's in different contexts. In the above example, we have used IdDef to represent a definition of an Identifier. In another place in the grammar, we may want to refer to uses of identifiers instead and use the symbol IdUse. Note that use of chain rules in the manner just described makes it impossible to perform attribute computations during tree construction (see Constraints on grammar mapping).

It is possible for Maptool to detect multiple possible matching abstract rules for a single concrete rule. Maptool signals an error in this case that must be fixed by changing the grammar to disambiguate the contexts.

Complete generated concrete and abstract syntaxes

After rule matching is complete, unmatched concrete rules, except trivial chain rules and literal chain rules (see Chain rule definitions) are added to the abstract syntax. The reason for this is that trivial chain rules are meaningless in the abstract syntax and literal chain rules are only meaningful if they have attribute computations associated with them, in which case they would already have been specified as part of the abstract syntax.

Sometimes it is desirable to include literal chain rules in the abstract syntax even when the user has not explicitly included them there. A typical situation where this occurs is when generating output conforming to the concrete syntax using the Idem tool (see Textual unparser of Abstract Syntax Tree Unparsing). In this situation the output must contain all literals hence the literal chain rules must be in the abstract syntax so that Idem can generate output patterns for them. To preserve the literal chain rules in the abstract syntax use the MAPCHAINS keyword in a specification (see Preserving literal chain rules).

Unmatched abstract rules are included in the concrete syntax except in the following instances:

  • The rule is a chain rule whose left hand side is not a symbol in the concrete syntax. Adding the rule to the concrete syntax in this case would cause the concrete syntax to be disconnected.

  • The rule can only be part of a computed subtree (see Computed Subtrees of LIDO - Reference Manual). This is true if the rule is only reachable from the root symbol if symbols preceded by a $ are included.

Users can use the :consyntax product (see consyntax of Products and Parameters) to view the complete version of the concrete syntax.

The :abstree product (see abstree of Products and Parameters) is used to view the complete abstract tree grammar. The :absyntax product (see absyntax of Products and Parameters) by contrast only shows the abstract syntax rules which are not part of computed subtrees.

User mapping specifications

Files of type `map' can be provided by the user to influence the way in which certain rules are matched. The syntax of map files can be found with other grammar description towards the end of this document (see Grammars for the Specification Files).

There are currently three ways in which the mapping can be affected. The first are symbolic equivalence classes, which group together symbols that have the same semantic meaning. The second method is to map specific rules. Using this method, concrete rules can be rewritten and/or reordered to match a specific abstract rule. The third method controls the elimination of literal chain rules.

Specifying symbolic equivalence classes

Symbolic equivalence classes are used to group together symbols appearing in the concrete syntax because the semantics of the symbols are equivalent. As a result, a single symbol can be used to represent all of the members of the symbolic equivalence class in the abstract syntax. This representative symbol can either be one of the concrete symbols or a new symbol altogether. Symbolic equivalence classes are specified in files of type `map'. A series of symbolic equivalences must be preceded by the keyword MAPSYM. An equivalence class is then specified by giving the representative symbol (the symbol to appear in the abstract syntax), followed by ::= and the list of symbolically equivalent symbols from the concrete syntax terminated by a period. For example, the following specification says that a Primary, Factor, and Expr belong to the same equivalence class:

MAPSYM
Expr ::= Primary Factor .

Application of symbolic equivalence classes to rules in the concrete syntax is done before the matching process begins. Symbolic equivalence classes can only be created for symbols which are either all nonterminals or all terminals (see How to describe a context-free grammar). An error message will also be issued if a symbolic equivalence class specification includes abstract syntax symbols on the right hand side, since each abstract syntax symbol represents its own equivalence class.

For backward compatibility with previous releases of Eli, symbolic equivalence classes may also be specified in files of type `sym'.

Specifying rule mappings

Rule mapping allows users to rewrite a concrete rule for the purposes of matching it to a specific abstract rule. This is useful in cases where two syntactically different constructs are semantically equivalent. Consider the following expression language with bound identifiers:

Computation: LetExpr / WhereExpr .
LetExpr: 'let' Definitions 'in' Expr .
WhereExpr: Expr 'where' Definitions .

In this example, LetExpr and WhereExpr are semantically equivalent constructs, but the ordering of Definitions and Expr are reversed and they use different literal symbols. We'd like to only specify the semantic computations for the two constructs once. To do this, we can define a symbolic equivalence class for LetExpr and WhereExpr:

MAPSYM
BoundExpr ::= LetExpr WhereExpr .

The abstract rule that we can use to represent the two constructs is:

RULE: BoundExpr ::= Definitions Expr END;

Finally, we must use rule mapping specifications to rewrite the two concrete rules to match the abstract rule:

MAPRULE
LetExpr: 'let' Definitions 'in' Expr < $1 $2 > .
WhereExpr: Expr 'where' Definitions < $2 $1 > .

The keyword MAPRULE precedes a group of rule mapping specifications in the map file. The concrete rule to be rewritten is specified exactly as it appears in the concrete syntax, and is followed by a description of the right-hand side of the abstract rule enclosed in angle brackets. Any literal symbols in the abstract rule are represented in the description by themselves. Literal symbols appearing in the abstract rule need not have counterparts in the concrete rule, and literal symbols in the concrete rule need not appear in the abstract rule.

Nonliteral symbols (both nonterminals and nonliteral terminals) in the right-hand side of the abstract rule must have counterparts in the right-hand side of the concrete rule. Each nonliteral symbol in the right-hand side of the abstract rule is represented by a `$' followed by the index of its counterpart in the right-hand side of the concrete rule. There is no requirement that all nonliteral symbols from the concrete rule appear in the abstract rule, and no requirement on the order of appearance, but duplicates are not allowed.

Here is another example of a concrete rule:

Exp: Lambda Idn '.' Exp .
This rule has three nonliteral symbols: Lambda, Idn, and Exp. Lambda and Idn are terminals; Exp is a nonterminal.

Suppose that our abstract syntax has the following rules:

RULE: IdDef ::= Idn END;

RULE: Exp ::= IdDef 'is bound in' Exp END;
The first abstract rule tells us that an Idn may be interpreted as an IdDef in certain contexts (such as that represented by the second abstract rule). We can map the concrete rule given above into the second abstract rule, since the second abstract rule expresses the semantics of the concrete rule:

MAPRULE
Exp: Lambda Idn '.' Exp < $2 'is bound in' $3 > .
Here the concrete rule is given, and the right-hand side of the abstract rule is described in the angle brackets. The nonliteral terminal symbol Lambda is dropped (positional parameter $1 does not appear in the angle brackets), and the literal terminal symbol '.' is replaced by 'is bound in'.

An abstract syntax will sometimes have several rules with different names but identical signatures. For example, consider the case where dyadic expressions are represented by abstract rules that do not contain operators:

RULE Add: Expression ::= Expression Expression END;
RULE Mul: Expression ::= Expression Expression END;
...
In this case, the rule mapping must specify the abstract rule name explicitly in order to disambiguate the pattern match:

MAPRULE
Expression: '(' Expression '+' Expression ')' < $1 $2 >: Add .
Expression: '(' Expression '*' Expression ')' < $1 $2 >: Mul .
...
Rule names are optional, and may be omitted when the pattern match is unambiguous (as in the earlier examples).

When rule matching proceeds, the concrete rule is seen in its rewritten form. An abstract syntax rule must exist in a LIDO specification that corresponds to the rule mapping specification given. Note that the use of rule mapping makes it impossible to perform attribute computations during tree construction (see Constraints on grammar mapping).

Preserving literal chain rules

The mapping process normally does not include literal chain rules in the complete abstract syntax unless they appear in the user-supplied abstract syntax (see Complete generated concrete and abstract syntaxes). Sometimes it is desirable to preserve literal chain rules even if the user has not included them in the abstract syntax. To force literal chain rules to be included in the abstract syntax, use the MAPCHAINS keyword. The behavior is unchanged if all literal chain rules already appear in the abstract syntax.

Care should be taken when using MAPCHAINS in conjunction with attribution. A specification using this keyword may require more attribution than the same specification without it, because it may be necessary to transfer attribute values from the child to the parent or vice versa. The presence of symbol computations for the symbols occurring in the chain rules without the transfer computations just mentioned may result in incorrect attribution without warning.

Influences of BOTTOMUP specifications on mapping

The generation of the parsing grammar (the input to the parser) may be influenced by BOTTOMUP specifications (see Computations of LIDO - Reference Manual) specified in your attribute grammar. This is because the parsing grammar must ensure that the nodes of the abstract syntax tree are constructed in a particular order in the presence of BOTTOMUP constraints.

In order to deal with this, Maptool must sometimes inject generated chain rules into the parsing grammar to which tree building actions can be attached. These injected chain rules may cause the parsing grammar to exhibit LALR(1) conflicts. If so, an error will be reported to indicate that the BOTTOMUP constraints you have provided cause your grammar to not be parsable.

In trying to resolve such a conflict, it is useful to use the :pgram derivation (see pgram of Products and Parameters) to be able to view the parsing grammar that is submitted to the parser generator and contains the injected chain rules. It is also useful to use the :OrdInfo derivation to get more information about how BOTTOMUP constraints were introduced for specific rules. Approaches to resolving such a problem include eliminating unnecessary BOTTOMUP constraints from the attribute grammar or making changes to the concrete syntax that allow the chain rules to be injected without causing LALR(1) conflicts.

Syntax development hints

This section begins by describing typical patterns of syntax development. This is followed by two more specific examples of how to use the mapping techniques described in the previous sections.

Typical patterns of syntax development

When developing a translator for an existing language, the complete concrete syntax is typically already available. In these cases, it is advantageous to start with the complete concrete syntax and add symbolic equivalences and rule mapping specifications to suit the attribute computations as they are being developed.

On the other hand, when designing a new language, it is easier to start work by specifying attribute computations and adding concrete syntax rules as necessary to resolve issues of precedence, associativity, and other parsing ambiguities.

When errors relating to the syntax appear, it is strongly recommended that the first course of action be to look at the complete generated versions of the syntaxes by using the :consyntax, :absyntax, and :abstree products (see Specifications of Products and Parameters). Very often these problems are simply a result of not correctly anticipating the matching process.

Constraints on grammar mapping

The LIGA attribute grammar system allows users to specify that the first pass of computations are to be performed as the abstract syntax tree is being built. This is specified either by an option given in a LIGA control specification see Order Options of LIGA Control Language or by using an additional keyword in an attribute grammar computation see Computations of LIDO - Reference Manual.

Combining computations with tree construction, however, requires that the tree be constructed in strict left-to-right and bottom-to-top order. In the presence of more advanced grammar mappings, it is not possible to maintain this strict ordering. For this reason, Maptool generates the LIGA control directive:

ORDER: TREE COMPLETE ;

when it detects that one of these grammar mappings is required. The control directive indicates that the tree should be constructed completely before any computations take place.

The grammar mappings which cause Maptool to emit these directives are the use of chain rules in the abstract syntax that do not exist in the concrete syntax (see Matching remaining rules) and any use of rule mapping (see Specifying rule mappings). Aside from symbolic mappings (see Specifying symbolic equivalence classes) and the use of LISTOF constructs, the generated concrete and abstract syntaxes need to be identical in order to allow computations to take place during tree construction.

Abstracting information from literals

Literal terminals often distinguish phrases whose structures are identical except for the particular literal terminal. For example, in a normal arithmetic expression the phrase describing addition and the phrase describing subtraction are identical except for the literal + or -. Taking nonterminal equivalence classes into account, it may be that all phrases representing operations with two operands are identical except for the operator literal.

When phrases have identical structure except for one or more literals, the tree computations carried out at the nodes corresponding to those phrases are often identical except for some parameter that depends on the particular literal. It is then useful to abstract from the distinct literals, obtaining a single phrase with which to associate the computation and a set of phrases with which to associate the parameter evaluation. The key point here is that in many cases the computation will apply to a wide variety of translation problems, whereas the particular set of literals characterizes a single translation problem. By abstracting from the distinct literals, the computation can be reused.

To abstract from a specific literal, simply replace that literal with a nonterminal and add a production that derives the literal from that nonterminal. This added production represents the phrase with which the parameter evaluation would be associated. The computation for the phrase in which the literal was replaced by the nonterminal will now obtain the parameter value from the corresponding child, rather than evaluating it locally.

Mapping expressions for overload resolution

It is quite common for a single operator to have different meanings that depend on the types of its operands. For example, in Pascal the operator + might mean integer addition, real addition or set union. There are well-known techniques for deciding what is meant in a particular context, and these techniques depend only on the particular set of operators and operand types. The computations themselves are parameterized by this information (see Selecting an operator at an expression node of Type Analysis).

In order to reuse the tree computation to resolve overloading, abstract from the particular set of literals that represent the operators of the language. Then define equivalence classes in which every nonterminal representing an expression is replaced by Expr and every nonterminal representing an operator by Op. Finally, associate the appropriate computations with the following rules:

Expr: Expr Op Expr.
Expr: Op Expr.
Expr: Identifier.
Expr: Integer.
...

(Here ... indicates rules for other denotations, such as floating-point numbers, true, etc., defined in the language.)

As an example of the process, consider a language with integer and Boolean expressions in the style of Pascal.

Description of the input text, abstracting operators

The literals that represent operators in this language are +, -, *, /, div, mod, and, or and not. Define a new nonterminal for each precedence level of the dyadic operators, one for the unary arithmetic operators, and one for not:

Addop: '+' / '-' / 'or' .
Mulop: '*' / '/' / 'div' / 'mod' / 'and' .
Sign: '+' / '-' .
Notop: 'not' .

These productions abstract from the literals, and embody the information about the precedence and association (all operators are left-associative) needed to determine the phrase structure.

Using these new nonterminals, define the phrase structure of an expression:

SimpleExpression: Sign Sum / Sum .
Sum: Sum Addop Term / Term .
Term: Term Mulop Factor / Factor .
Factor: Notop Factor / Primary .
Primary: Integer / Id / 'true' / 'false' / '(' SimpleExpression ')' .

(Here Integer is a terminal representing arbitrary digit sequences and Id is a terminal representing arbitrary identifiers. These symbols will be recognized by the lexical analyzer.)

Description of the equivalence classes

All of the dyadic operators fall into the same equivalence class, which should be represented by the symbol Binop. Sign and Notop both belong to the Unop class, and SimpleExpression, Sum, Term, Factor, Primary are in the Expr class. Here is a type-`map' file defining these classes:

MAPSYM
Op ::= Addop Mulop Sign Notop .
Expr ::= SimpleExpression Sum Term Factor Primary .


Previous Chapter Next Chapter Table of Contents