General Information

	Eli: Translator Construction Made Easy
	Global Index
	Frequently Asked Questions
	Typical Eli Usage Errors

Tutorials

	Quick Reference Card
	Guide For new Eli Users
	Release Notes of Eli
	Tutorial on Name Analysis
	Tutorial on Scope Graphs
	Tutorial on Type Analysis
	Typical Eli Usage Errors

Reference Manuals

	User Interface
	Eli products and parameters
	LIDO Reference Manual
	Typical Eli Usage Errors

Libraries

	Eli library routines
	Specification Module Library

Translation Tasks

	Lexical analysis specification
	Syntactic Analysis Manual
	Computation in Trees

Tools

	LIGA Control Language
	Debugging Information for LIDO
	Graphical ORder TOol

FunnelWeb User's Manual

	Pattern-based Text Generator
	Property Definition Language
	Operator Identification Language
	Tree Grammar Specification Language
	Command Line Processing
	COLA Options Reference Manual

Generating Unparsing Code

Monitoring a Processor's Execution

Administration

System Administration Guide

Type Analysis

Expressions

An expression node represents a program construct that yields a value, and an expression tree is a subtree of the abstract syntax tree made up entirely of expression nodes. Type analysis within an expression tree is uniform; additional specifications are needed only at the roots and leaves. (Note that these need not be roots and leaves in the sense of the abstract syntax tree.) A designer often chooses to represent a programming language expression by more than one expression tree, in order to capture special relationships within that expression. For example, each argument of a function call might be a separate expression tree because more type conversions are allowed in that context than in the context of an operator.

The Expression module provides computational roles and rule computations to implement the type analysis of expression trees:

ExpressionSymbol: The computational role inherited by a grammar symbol that represents an expression node.
OperatorSymbol: The computational role inherited by a grammar symbol that represents an operator node.
OpndExprListRoot
BalanceListRoot: Computational roles inherited by grammar symbols that represent operand lists.
OpndExprListElem
BalanceListElem: Computational roles inherited by grammar symbols that represent operand list elements. Every OpndExprListElem must be a descendant of OpndExprListRoot in the tree; every BalanceListElem must be a descendant of BalanceListRoot.
PrimaryContext
TransferContext
BalanceContext
MonadicContext
DyadicContext
ListContext
ConversionContext
CastContext
RootContext: Rule computations implementing common expression contexts.
Indication
OperName: Rule computations for expression contexts where operations are performed, but which have no grammar symbol representing the possible operations.

The expression module is usually instantiated by:

$/Type/Expression.gnrc :inst

For a discussion of alternatives, see Selecting an operator at an expression node.

Type analysis of expression trees

The symbol on the left-hand side of a rule defining an expression node characterizes the expression's result. It inherits the ExpressionSymbol role. Two attributes of ExpressionSymbol describe its type:

Required: An inherited attribute whose DefTableKey value represents the type required by the surrounding context. (A value of NoKey indicates that no specific type is required.) Required may be set by a user computation at the root of an expression subtree; computations are supplied by the Expression module for all other expression nodes.
Type: An attribute whose DefTableKey value is set by computations supplied by the Expression module to represent the type of the result delivered by the expression subtree rooted in this node. (A value of NoKey indicates that the type delivered by the node is unknown.) Type may depend on Required as well as on the possible types of the node's children; it must never be set by user computation.

An expression node is type-correct if the type specified by its Type attribute is acceptable as the type specified by its Required attribute. Any type is acceptable as an undefined required type, and an undefined type is acceptable as any required type.

In order to support incremental development, ExpressionSymbol defines default computations setting the values of both Required and Type to NoKey; those computations are overridden by the rule computations described in this chapter. The default computations allow one to declare that a symbol inherits ExpressionSymbol without producing specification errors for every context containing that symbol. This advantage is offset by the fact that if one forgets to provide rule computations for some contexts, the generated compiler will silently ignore certain errors in the input program.

Rules defining expression nodes in an abstract syntax tree for a typical programming language describe constants, variables, and computations:

SYMBOL Expr INHERITS ExpressionSymbol   END;

RULE: Expr   ::= Number                 END;
RULE: Expr   ::= ExpIdUse               END;
RULE: Expr   ::= Expr Operator Expr     END;
RULE: Expr   ::= Expr '[' Subscript ']' END;

The first two rules describe leaves of expression subtrees. Any node described by the first rule is a leaf of the abstract syntax tree as well as a leaf of some expression subtree. Nodes described by the second rule are not leaves of the abstract syntax tree because each has an ExpIdUse node as a child.

A leaf of an expression subtree delivers a value whose type must be determined from the context of that leaf according to the definition of the language. For example, the Expr node in the first rule might deliver the language-defined integer type; in the second rule, the delivered type is the value of ExpIdUse.Type.

The type analyzer models most interior expression nodes by operators applied to operands:

An indication is derived from the context.
One operator is selected from the set associated with that indication.
The Type attribute of the node is set to the result type of the selected operator.
The Required attributes of one or more children are set to the operand types of the selected operator.

For example, in the following rule, the indication is provided by the Operator child:

RULE: Expr   ::= Expr Operator Expr     END;

Usually, a set of several operators (such as {iAddOp, fAddOp}) is associated with that indication. An operator is then selected from that set as discussed in the next section.

In the fourth rule, we might assume that each array type definition adds a dyadic access operator to an indication fixed by the rule (see Operator, function, and method definitions):

RULE: Expr   ::= Expr '[' Subscript ']' END;

The left operand of that operator is the array type, the right operand is the index type, and the result is the element type.

The operator/operand model provides support for expression node semantics that are ubiquitous in programming languages. Several other models, useful in special circumstances, are supported and will be introduced in later sections of this chapter. It is clear, however, that there will be situations in which the semantics of an expression context do not fit any of the supported models. Our advice is to consider such a context as a place where several disjoint expression subtrees meet: The expression symbol on the left-hand side of the rule defining the context is a leaf of an expression tree above the context, and each expression symbol on the right-hand side is the root of an expression tree below the context.

Selecting an operator at an expression node

If an indication is associated with a singleton operator set, that operator is selected regardless of operand or result types.

There are two standard algorithms for selecting an operator if the indication's set has more than one element. The simplest ignores the type required by the context. For each operator in the set, it checks whether each operand is acceptable as the type required by that operator. If more than one operator in the set satisfies this condition, the algorithm chooses the one requiring the fewest coercions. If no operator can be identified, then the unknown operator is chosen.

To select this algorithm, instantiate the Expression module without the +referto parameter:

$/Type/Expression.gnrc :inst

Ada is a language in which the selection of an operator depends on the type required by the context as well as the types delivered by the operands. In that case, a two-pass algorithm is required.

Starting with the leaves of an expression tree and working towards the root of that tree, the algorithm determines a possible type set for each expression node. Every type in the possible type set at a node is either a leaf type or is a type associated with an operator whose result is that type, and whose operands are elements of the possible type sets of the node's children. The algorithm associates a cost with each type in a possible type set.

OIL allows one to specify an arbitrary integer cost for each operator and coercion independently. If this specification is omitted, a cost of 1 is assigned to the operator or coercion. For the remainder of this section, we assume that all cost specifications have been omitted and therefore all costs are 1.

Consider a very simple expression node, the integer constant 3. One element of the possible type set for this node is the language-defined integer type, which has cost 0 because no operations are needed to create a value of that type. If a coercion has been defined with the language-defined integer type as its operand and the language-defined floating-point type as its result, then another element of the possible type set of this node is the language-defined floating-point type. It has cost 1, the total number of operators required to produce it. Similarly, if there is a coercion from floating-point to double, double will be an element of the possible type set of the node and it will have cost 2.

When the algorithm computes the possible type set of an interior expression node, it considers the operators in that node's indication set. For each operator, it checks whether the type required by that operator for a given argument is in the possible type set of the corresponding operand. Each operator meeting that condition is a possible operator at the node. The cost of using an operator is one more than the sum of the costs associated with its argument types in the possible type sets of its operands. Finally, the possible type set of the node is the set of all types T such that the result type of a possible operator is acceptable as T.

Often a particular element of the possible type set of an interior node can be obtained in more than one way. For example, consider a node representing the sum of two integers. Assuming that the integer type is acceptable as the floating-point type, the possible type set of each child contains both integer and floating-point types. Thus both integer addition and floating-point addition are possible selections at this node. There are then two ways to obtain a floating-point result:

Use an integer addition and convert the result to floating-point.
Convert each operand to floating-point and use a floating-point addition.

The cost of using an integer addition in this context is 1 because the cost of the integer elements of the possible type sets are both 0. Converting the result to floating-point costs one coercion, for a total cost of 2. The cost of using a floating-point addition, on the other hand, is 3 because the cost of the floating-point elements of the possible type sets are both 1.

The cost of obtaining a value of type T using a particular operator is the sum of the cost of using that operator and the number of coercion operators required to convert a value of the result type to a value of type T. When more than one possible operator selection leads to a value of a given type, the algorithm only retains the one with the lowest cost.

If the Required attribute of the expression tree root is the unknown type, then the algorithm chooses the lowest-cost element of the root's possible type set as the result type of the expression. Otherwise, the Required attribute of the root is taken as the result type regardless of whether it appears in the root's possible type set.

The second pass starts with the root of the tree and works towards the leaves. At each node, the value of the Required attribute specifies an element of the possible type set and hence an operator. Given an operator, the values of the node's Type attribute and the Required attributes of any operands are fixed.

If the possible type set of an expression node does not contain an element equal to the value of that node's Required attribute, then the unknown operator is selected; the node's Type attribute and the Required attributes of any operands are set to the unknown type.

To select the two-pass algorithm, instantiate the Expression module with the +referto=Result parameter:

$/Type/Expression.gnrc +referto=Result :inst

Expression contexts without operators

Let `e1' be a grammar symbol playing the ExpressionSymbol role, and `type' be an expression yielding the definition table key of a type. A primary context is one in which the parent `e1' delivers a value of a known type. PrimaryContext(`e1',`type') provides the rule computations to set `e1'.Type to the type `type'. (Recall that the value of the Type attribute of an expression node must never be set directly by a user computation.)

The constant and variable expressions in C are examples of primary contexts:

SYMBOL Expr INHERITS ExpressionSymbol END;

RULE: Expr ::= Number COMPUTE
  PrimaryContext(Expr,intType);
END;

SYMBOL ExpIdUse INHERITS TypedUseId END;

RULE: Expr ::= ExpIdUse COMPUTE
  PrimaryContext(Expr,ExpIdUse.Type);
END;

The type of an integer constant is the language-defined integer type (see Language-defined types), and the type of a variable is the type with which it was declared (see Accessing the type of an entity).

Let `e1' and `e2' be grammar symbols playing the ExpressionSymbol role. A transfer context is one in which the parent `e1' and one of its children `e2' are identical with respect to type. TransferContext(`e1',`e2') provides the rule computations to set `e1'.Type and `e2'.Required.

The comma expression in C is an example of a transfer context:

RULE: Expr ::= Expr ',' Expr COMPUTE
  TransferContext(Expr[1],Expr[3]);
END;

Notice that the left operand of the comma, Expr[2], is the root of an expression subtree distinct from the one containing the TransferContext. The value of this expression will be discarded, so its type is arbitrary. Thus there is no need to override the default computation Expr[2].Required=NoKey.

Let `e1', `e2', and `e3' be grammar symbols playing the ExpressionSymbol role. A balance context is one in which the parent `e1' must deliver either the result delivered by child `e2' or the result delivered by child `e3'. This means that values delivered by both children must be acceptable as a common type, and the parent must deliver a value of that common type. BalanceContext(`e1',`e2',`e3') provides the rule computations to set `e1'.Type, `e2'.Required, and `e3'.Required such that the following relations hold:

`e2'.Type acceptableAs `e1'.Type
`e3'.Type acceptableAs `e1'.Type
There is no type `t' other than `e1'.Type such that
- `e2'.Type acceptableAs `t'
- `e3'.Type acceptableAs `t'
- `t' acceptableAs `e1'.Type
`e2'.Required equals `e1'.Type
`e3'.Required equals `e1'.Type

The conditional expression of C is an example of a balance context:

RULE: Expr ::= Expr '?' Expr ':' Expr COMPUTE
  BalanceContext(Expr[1],Expr[3],Expr[4]);
  Expr[2].Required=scalarType;
END;

The condition, Expr[2], is the root of an expression subtree distinct from that containing the BalanceContext. The definition of C requires that Expr[2] return a value of scalar type, independent of the types of the other expression nodes. (Pointers and numbers are values of scalar type in C.) Thus the default computation Expr[2].Required=NoKey must be overridden in this context.

Some languages generalize the conditional expression to a case expression. For example, consider an ALGOL 68 computation of the days in a month:

begin int days, month, year;
days := case month in
  31,
  (year mod 4 and year mod 100 <> 0 or year mod 400 = 0 | 28 | 29),
  31,30,31,30,31,31,30,31,30,31 esac
end

The number of cases is not fixed, and the balancing process therefore involves an arbitrary list. This is the purpose of the BalanceListRoot and BalanceListElem roles. Both inherit from ExpressionSymbol, and neither requires computations beyond the ones used in any expression context. The balancing computation described above is carried out pairwise on the list elements:

SYMBOL CaseExps INHERITS BalanceListRoot END;
SYMBOL CaseExp  INHERITS BalanceListElem END;

RULE: Expr ::= 'case' Expr 'in' CaseExps 'esac' COMPUTE
  TransferContext(Expr[1],CaseExps);
END;

RULE: CaseExps LISTOF CaseExp END;

RULE: CaseExp ::= Expr COMPUTE
  TransferContext(CaseExp,Expr);
END;

Notice that these rule computations simply interface with the BalanceListRoot and BalanceListElem roles; all significant computations are done by module code generated from those roles.

Operators with explicit operands

Tree symbols in the abstract syntax that correspond to operator symbols in a source program usually inherit the OperatorSymbol role. Two attributes of OperatorSymbol describe the operator selection in the current context:

Indic: A synthesized attribute whose DefTableKey value represents the indication derived from the context. (A value of NoKey indicates that no such indication can be derived.) Indic must be set by a user computation.
Oper: An attribute whose DefTableKey value is set by Expression module computations to represent the operator selected from the set identified by the associated indication. (A value of NoKey indicates that no operator could be selected.) Oper may depend on Required as well as on the possible types of the node's children and the operator indication; it must never be set by user computation.

In order to support incremental development, OperatorSymbol defines a default computation setting the values of both Indic and Oper to NoKey. The default computation of Indic is overridden by a user computation, and that of Oper by the rule computations described in this section. The default computations allow one to declare that a symbol inherits OperatorSymbol without producing specification errors for every context containing that symbol. This advantage is offset by the fact that if one forgets to provide appropriate overriding computations, the generated compiler will silently ignore certain errors in the input program.

Let `e1', `e2', and `e3' all play the ExpressionSymbol role, and `rator' play the OperatorSymbol role. A monadic (dyadic) context is one in which the parent `e1' delivers the result of applying `rator' to the operand(s). The following provide rule computations to set `rator'.Oper, `e1'.Type, and `e2'.Required (plus `e3'.Required if present):

MonadicContext(`e1',`rator',`e2')
DyadicContext(`e1',`rator',`e2',`e3')

Contexts with arbitrary numbers of operands are discussed in the next section.

SYMBOL Operator INHERITS OperatorSymbol END;

RULE: Expr ::= Expr Operator Expr COMPUTE
  DyadicContext(Expr[1],Operator,Expr[2],Expr[3]);
END;

RULE: Operator ::= '+' COMPUTE
  Operator.Indic=PlusInd;
END;

The array access rule also fits the DyadicContext pattern, but has no symbol playing the OperatorSymbol role. In such cases, the `rator' argument is omitted and the indication supplied by an additional context-dependent rule computation.

Let `ind' be a definition table key representing an indication. Indication(`ind') provides the rule computations to set the node's indication to `ind'.

If the indication indexInd's operator set includes one access operator for every array type (see Operator, function, and method definitions), then the following computation implements the type relationship in an array access:

SYMBOL Subscript INHERITS ExpressionSymbol END;

RULE: Expr ::= Expr '[' Subscript ']' COMPUTE
  DyadicContext(Expr[1],,Expr[2],Subscript);
  Indication(indexInd);
END;

Note that Expr[2] can be any expression yielding an array value; it need not be a simple array name.

In some cases it is useful to know the name of the operator selected from the indication set. The OperatorSymbol.Oper attribute normally supplies this information, but when there is no symbol playing that role the value can be accessed via a context-dependent rule computation:

OperName: Yields the operator selected from the context's indication set. If no operator can be selected, the result is the unknown operator.

Operators with operand lists

Function calls and multidimensional array references are common examples of expression contexts whose operators have operand lists rather than explicit operands. One symbol on the right-hand side of the rule defining such a context characterizes the entire list of operands. It inherits the OpndExprListRoot role.

The symbol defining an operand in the list inherits the OpndExprListElem role. OpndExprListElem inherits the ExpressionSymbol role, and overrides ExpressionSymbol's default computation of the Required attribute in all upper contexts.

Let `e' be a grammar symbol playing the ExpressionSymbol role, `rator' be a grammar symbol playing the OperatorSymbol role, and `rands' be a grammar symbol playing the OpndExprListRoot role. A list context is one in which the parent `e' delivers the result of applying `rator' to the operand list `rands'. ListContext(`e',`rator',`rands') provides the rule computations to set `e'.Type, `rator'.Oper, and OpndExprListElem.Required for each OpndExprListElem descendant of `rands'.

If the language has multi-dimensional array references, they can be implemented using a strategy that differs from that of the previous section:

SYMBOL Subscripts INHERITS OpndExprListRoot END;
SYMBOL Subscript  INHERITS OpndExprListElem END;

RULE: Expr ::= Expr '[' Subscripts ']' COMPUTE
  ListContext(Expr[1],,Subscripts);
  Indication(GetAccessor(Expr[2].Type,NoKey));
END;

RULE: Subscripts LISTOF Subscript END;

RULE: Subscript ::= Expr COMPUTE
  TransferContext(Subscript,Expr);
END;

This computation assumes that the indication is the value of the Accessor property of the array type (see User-Defined Types).

Some languages have variadic operators -- operators taking a variable number of operands. The most common of these are max and min, which can take two or more numeric operands. All of the operands must ultimately be of the same type, so the situation is similar to that of a balanced context.

For type checking purposes, the variadic operator can be considered to have a single operand, whose type is determined by balancing the elements of the list (see Expression contexts without operators). Of course this form of operand list must be distinguished syntactically from a normal list operand:

SYMBOL VarRands INHERITS BalanceListRoot END;
SYMBOL VarRand  INHERITS BalanceListElem END;

RULE: Expr ::= VarOper '(' VarRands ')' COMPUTE
  MonadicContext(Expr,VarOper,VarRands)
END;

RULE: VarRands LISTOF VarRand END;

RULE: VarRand ::= Expr COMPUTE
  TransferContext(Actual,Expr);
END;

Type conversion

The acceptableAs relation models implicit type conversion in the context of operators applied to operands. In other contexts, additional type conversions may be possible. For example, both Java and C allow a floating-point value to be assigned to an integer variable. That conversion cannot be modeled by the acceptableAs relation (see Language-defined coercibility).

Additional type conversions such as those taking place on assignment can be modeled by specific conversion operators. An indication is associated with each context in which additional type conversions are possible, and the indication's set contains exactly the conversions allowable in that context.

Let `e1' and `e2' play the ExpressionSymbol role, and `rator' play the OperatorSymbol role. A conversion context is one in which the rules of the language allow the type conversions in `rator'.Indic's set to be applied to the value yielded by `e2' (in addition to any coercions) in order to obtain the type that must be yielded by `e1'. ConversionContext(`e1',`rator',`e2') provides rule computations to set `rator'.Oper, `e1'.Type, and `e2'.Required. If no additional conversion operator is required, or if none can be selected from `rator'.Indic's set, then `rator'.Oper is set to the unknown operator and both `e1'.Type and `e2'.Required are set to `e1'.Required.

In C, an actual argument to a function call may be implicitly converted to the type of the corresponding formal parameter prior to the function call. The same set of conversions can be used in assignment contexts, so assume that the indication is assignCvt:

SYMBOL Actual INHERITS OpndExprListElem END;

RULE: Actual ::= Expr COMPUTE
  ConversionContext(Actual,,Expr);
  Indication(assignCvt);
END;

Let `e1' and `e2' play the ExpressionSymbol role, `rator' play the OperatorSymbol role, and `type' yield a DefTableKey value representing a type. A cast context is a conversion context in which the desired type is inherent in the context itself, rather than being determined by `e1'. CastContext(`e1',`rator',`e2',`type') provides rule computations to set `rator'.Oper, `e1'.Type, and `e2'.Required. If no additional conversion operator is required, or if none can be selected from `rator'.Indic's set, then `rator'.Oper is set to the unknown operator and both `e1'.Type and `e2'.Required are set to `type'.

The C cast expression is an example of a cast context. Here we assume that castInd is an indication whose set consists of all of the possible C conversions:

RULE: Expr ::= '(' Type ')' Expr COMPUTE
  CastContext(Expr[1],,Expr[2],Type.Type);
  Indication(castInd);
END;

Let `e2' play the ExpressionSymbol role, `rator' play the OperatorSymbol role, and `type' yield a DefTableKey value representing a type. A root context is a conversion context in which the desired type is inherent in the context itself, which is not an expression context. RootContext(`type',`rator',`e2') provides rule computations to set `rator'.Oper and `e2'.Required. If no additional conversion operator is required, or if none can be selected from `rator'.Indic's set, then `rator'.Oper is set to the unknown operator and `e2'.Required is set to `type'.

The C return statement is an example of a root context. It is not itself an expression, but it has an expression operand. That operand must yield the return type of the function, which is inherent in the context of the return statement, and can be obtained from the Function node. Here we assume that assignInd is an indication whose set consists of all of the possible C assignment conversions:

RULE: Statement ::= 'return' Expr COMPUTE
  RootContext(INCLUDING (Function.ResultType),,Expr);
  Indication(assignInd);
END;