General Information

	Eli: Translator Construction Made Easy
	Global Index
	Frequently Asked Questions
	Typical Eli Usage Errors

Tutorials

	Quick Reference Card
	Guide For new Eli Users
	Release Notes of Eli
	Tutorial on Name Analysis
	Tutorial on Scope Graphs
	Tutorial on Type Analysis
	Typical Eli Usage Errors

Reference Manuals

	User Interface
	Eli products and parameters
	LIDO Reference Manual
	Typical Eli Usage Errors

Libraries

	Eli library routines
	Specification Module Library

Translation Tasks

	Lexical analysis specification
	Syntactic Analysis Manual
	Computation in Trees

Tools

	LIGA Control Language
	Debugging Information for LIDO
	Graphical ORder TOol

FunnelWeb User's Manual

	Pattern-based Text Generator
	Property Definition Language
	Operator Identification Language
	Tree Grammar Specification Language
	Command Line Processing
	COLA Options Reference Manual

Generating Unparsing Code

Monitoring a Processor's Execution

Administration

System Administration Guide

Type Analysis

Types, Operators, and Indications

A type characterizes a subset of values in the universe manipulated by the program, an operator defines an operation applied to operands of specified types to produce a result of a specific type, and an indication defines a set of operators. These three concepts form the basis for any type model. The designer must specify language-defined types, operators, and indications; it may also be possible for the user to provide additional specifications as part of a program (see User-Defined Types).

Although language-defined types may be specified individually, they are usually introduced by language-defined operator specifications. These, along with language-defined indication specifications, are described in a language called OIL (see OIL's Specification Language of Oil Reference Manual). OIL text is written in a file whose name has the form `name'.oil.

Language-defined types

Each type is represented by a unique definition table key whose IsType property has the value 1. Further properties of that key may be used to provide information about that particular type. NoKey represents an unknown or non-existent type.

Language-defined types like "integer" are represented by known keys (see Initializations of Definition Table). The known key name can be used directly in an attribute computation. For example, suppose that the designer chose intType as the name of the known key for the Java integer type. The following rule would then interpret the keyword int as denoting that type:

RULE: Type ::= 'int' COMPUTE
  Type.Type=intType;
END;

Language-defined types are sometimes denoted by pre-defined identifiers (as in Pascal) instead of keywords. This approach increases the complexity of the specification by introducing type identifiers (see Type identifiers). It also allows a user to re-define the names of language-defined types as names of variables or parameters, making programs very hard to understand. We recommend that designers use keywords to denote language-defined types.

Language-defined operators

An operator has a fixed signature, and is represented by a unique definition table key. Properties of that key may be used to provide information about that particular operator. NoKey represents an unknown or non-existent operator.

The OIL statement OPER `opr' `sig'; defines the key and signature of an operator:

`opr': The name of the known definition table key representing the operator. Multiple operator definitions with the same value of `opr' are not allowed.
`sig': The signature of the operator represented by `opr'. It consists of a parenthesized (possibly empty), comma-separated list of operand types followed by a colon and a return type. All of the types in the signature are automatically defined as known keys representing types; no further specification is required.

Only one occurrence of the keyword OPER is required for a sequence of contiguous operator definitions:

OPER
  iAddOp (intType,intType):intType;
  fAddOp (floatType,floatType):floatType;
  iGtrOp (intType,intType):boolType;

The known keys named intType, floatType, and boolType are defined by this specification, each with its IsType property set to 1. Known keys named iAddOp, fAddOp, and iGtrOp are also defined; no further specification of these names is necessary.

Often there are a number of language-defined operators sharing the same signature. A shorthand notation for describing such a situation allows the designer to provide a comma-separated list of operator names and write the shared signature only once:

OPER
  iAddOp,iSubOp,iMulOp,iDivOp (intType,intType):intType;
  fAddOp,fSubOp,fMulOp,fDivOp (floatType,floatType):floatType;

Language-defined indications

Each operator must belong to a set of operators associated with some indication, also represented by a unique definition table key. Properties of that key may be used to provide additional information about that indication. NoKey represents an unknown or non-existent indication.

The OIL statement INDICATION `ind': `list'; defines a subset of the operators associated with an indication:

`ind': The known definition table key representing the indication. Multiple indication definitions with the same value of `ind' are allowed. In that case, the operator set associated with the indication is the union of the sets specified by the individual definitions.
`list': A comma-separated list of operators in the indication's set.

Only one occurrence of the keyword INDICATION is required for a sequence of contiguous indication definitions:

INDICATION
  PlusInd: iAddOp, fAddOp;
  MinusInd: iSubOp, fSubOp;

Language-defined coercibility

Language properties like the "usual arithmetic conversions" of C and the "widening conversions" of Java allow the compiler to accept an operand of one type as though it were a value of another type. We use the relation acceptableAs on types to model these properties. acceptableAs is a partial order:

Reflexive: (`T' acceptableAs `T') for any type `T'
Transitive: (`T' acceptableAs `T1') and (`T1' acceptableAs `T2') for some types `T', `T1', and `T2' implies (`T' acceptableAs `T2')
Antisymmetric: (`T' acceptableAs `T1') and (`T1' acceptableAs `T') implies `T' is identical to `T1'

To see why these properties are important, consider the following expression in C or Java (s is of type short and f is of type float):

s + f

Both C and Java allow implicit conversion of short to int and int to float in the context of an arithmetic operand. Thus a designer would specify (short acceptableAs int) and (int acceptableAs float) for C or Java. Transitivity guarantees that (short acceptableAs float), and reflexivity guarantees that (float acceptableAs float), so the operator fAddOp can be selected from the set associated with the indication PlusInd of the last section.

Suppose that (float acceptableAs int). In that case, the meaning of the expression is ambiguous. There is no way to decide whether to select the operator iAddOp or the operator fAddOp from PlusInd's set. But because acceptableAs is antisymmetric, (float acceptableAs int) would imply that int and float were identical types. Thus the designer cannot specify (float acceptableAs int) for C or Java.

The acceptableAs relation is specified by defining coercion operators. The OIL statement COERCION `opr' `sig'; defines the key and signature of a coercion:

`opr': The name of the known definition table key representing the coercion operator. If `opr' is omitted, the OIL compiler will generate a unique name internally. Multiple coercion definitions with the same value of `opr' are not allowed. A coercion definition cannot have the same value of `opr' as an operator definition.
`sig': The signature of the coercion operator represented by `opr'. It consists of a parenthesized operand type followed by a colon and a return type. Both types in the signature are automatically defined as known keys representing types; no further specification is required.

Only one occurrence of the keyword COERCION is required for a sequence of contiguous coercion operator definitions:

COERCION
  sToi (shortType):intType;
       (intType):floatType;

This specification illustrates both named and anonymous coercions. Generally speaking, coercions need be named only if they are to be discussed in associated documentation or extracted to support further processing (such as the evaluation of constant expressions).

Reducing specification size

A full specification of language-defined operators often leads to a combinatorial explosion. In many applications the effects of this explosion on the written specification can be mitigated by avoiding unnecessary operator names. For example, the task of type analysis is to verify type correctness; the identity of the operator that models the type behavior at a specific node is normally irrelevant.

Language definitions avoid combinatorial explosions by giving names to sets of types and then defining properties of operations in terms of these sets rather than the individual elements. For example, the C definition describes operations on "arithmetic types" rather than describing those operations on integers and then again on floating-point values. OIL provides a notation for naming and manipulating sets of types that allows the designer to encode such language definitions directly.

The OIL statement SET `name' = `expr' ; defines a set of types:

`name'

An identifier naming a set. Multiple sets with the same name are not allowed.

`expr'

An expression defining the types that are members of the set. There are five possible expression formats:

[ `elements' ]: Each member of the comma-separated list `elements' is a known key representing a type. That type is an element of the value of this expression. There are no other elements.
`name': The previously-defined set `name' is the value of this expression.
`s1' + `s2': The value of this expression is the union of set `s1' and set `s2'.
`s1' * `s2': The value of this expression is the intersection of set `s1' and set `s2'.
`s1' - `s2': The value of this expression is the set of elements of `s1' that are not elements of `s2'.

Here are some definitions that mirror the C language standard:

SET Signed_IntegerType =
  [signed_charType, shortType, intType, longType];

SET Unsigned_IntegerType =
  [unsigned_charType, unsigned_shortType,
   unsigned_intType,  unsigned_longType];

SET FloatingType =
  [floatType, doubleType, long_doubleType];

SET IntegralType =
  [charType] + Signed_IntegerType + Unsigned_IntegerType;

SET ArithmeticType = IntegralType + FloatingType;

SET ScalarType = ArithmeticType + [VoidPointerType];

A specific context in a program will often require a value that can be of any type in a particular set. For example, the condition value in a C if statement or conditional expression can be of any scalar type. We model this situation by defining a "type" (scalarType, say) and making each scalar type acceptable as that type. The context can then require a value of scalarType, and any scalar type will be acceptable.

When a type set name is used in an OPER or COERCION signature, the result is a number of distinct operators. Each operator's signature is constructed by consistently substituting one element of the named type set for each instance of the type name. Thus every scalar type can be made acceptable as scalarType as follows:

COERCION (ScalarType):scalarType;

Similarly, signatures containing type set names can be used to reduce the number of specifications needed for operators. For example, consider the following specification:

OPER ArithOp (ArithmeticType, ArithmeticType): ArithmeticType;

It defines a set of 12 operators, each named by the known key ArithOp. Each operator has a distinct signature, one of which is (charType,charType):charType. That signature results from the consistent substitution of the charType element of ArithmeticType for the name of that set in the OPER statement's signature.

This set of 12 operators can be associated with an indication:

INDICATION ArithInd: ArithOp;

Because the same element of a type set is substituted for each instance of the name of that set in a signature, the only way to get all combinations of elements is to create another name for that set and use both names in the signature. For example, a value of any scalar type in C can be cast to any other scalar type:

SET CastResult = ScalarType;
OPER ScalarCast (ScalarType):CastResult;

One of the operators named by the known key ScalarCast has the signature (charType):floatType. That signature results from substituting the charType element of ScalarType for the name of that set and the floatType element of CastResult for the name of that set.