Eli   Documents

General Information

 o Eli: Translator Construction Made Easy
 o Global Index
 o Frequently Asked Questions
 o Typical Eli Usage Errors

Tutorials

 o Quick Reference Card
 o Guide For new Eli Users
 o Release Notes of Eli
 o Tutorial on Name Analysis
 o Tutorial on Scope Graphs
 o Tutorial on Type Analysis
 o Typical Eli Usage Errors

Reference Manuals

 o User Interface
 o Eli products and parameters
 o LIDO Reference Manual
 o Typical Eli Usage Errors

Libraries

 o Eli library routines
 o Specification Module Library

Translation Tasks

 o Lexical analysis specification
 o Syntactic Analysis Manual
 o Computation in Trees

Tools

 o LIGA Control Language
 o Debugging Information for LIDO
 o Graphical ORder TOol

 o FunnelWeb User's Manual

 o Pattern-based Text Generator
 o Property Definition Language
 o Operator Identification Language
 o Tree Grammar Specification Language
 o Command Line Processing
 o COLA Options Reference Manual

 o Generating Unparsing Code

 o Monitoring a Processor's Execution

Administration

 o System Administration Guide

Mail Home

Name Analysis Reference Manual

Next Chapter Table of Contents


Fundamentals of Name Analysis

Identifiers are basic symbols of a language. Each occurrence of an identifier in a program corresponds to a leaf of the abstract syntax tree of that program. A programmer uses a freely-chosen identifier to represent some entity in the program's universe. The task of name analysis is to discover the entity represented by each identifier occurrence. Four concepts relating identifiers to the entities they represent are necessary to describe this task:

  • A binding is a pair consisting of an identifier `i' and an entity `k'.

  • A scope is a contiguous sequence of program text associated with a set of bindings. We say that a binding in the set has the scope.

  • A defining occurrence is an occurrence of an identifier `i' that could legally be the only occurrence of that identifier in the program. A binding `(i,k)' is associated with each defining occurrence.

    • Language rules specify the scope of a defining occurrence, and thus a scope of the binding associated with that defining occurrence. Other language rules may specify further scopes of a binding that are not in the context of the defining occurrence.

  • An applied occurrence is an occurrence of an identifier `i' that is legal only if it identifies a binding `(i,k)' in some set associated with its context.

    • Language rules specify the set(s) in which an applied occurrence may identify bindings.

Defining occurrences, applied occurrences, and scopes are concepts related to the program text; how are these concepts represented in the tree? Each identifier occurrence is a single basic symbol in the text, and is represented by a leaf of the tree. A scope, on the other hand, extends over some region of the text. The developer needs to map each scope to an abstract syntax subtree encompassing that scope. We call such a subtree a range; several scopes may map into the same range.

Because a defining occurrence may be the only occurrence of that identifier, the entity and range of the binding must be determined from the syntactic context of the corresponding abstract syntax tree leaf. The way in which this is done is language dependent, but involves no name analysis. Bindings for defining occurrences can therefore be regarded as constants for the purposes of name analysis.

Name analysis is a computation carried out on the tree that is the internal representation of the program being analyzed; its goal is to determine the appropriate binding for each applied occurrence, based on the concepts and language rules discussed above. That computation is specified in terms of the abstract syntax, and takes the form of assignments of values to attributes of tree nodes (see Computations of LIDO - Reference Manual).

The search for a binding for an applied occurrence begins in the range containing that applied occurrence. If the identifier is not bound in that range, then most languages require the search to continue in related ranges. Ranges may be related syntactically, but many languages also allow the programmer to specify arbitrary relationships as part of their program. Such relationships cannot reasonably be described in terms of the abstract syntax tree. Thus the name analyzer builds a separate data structure, a directed graph called a scope graph, to encode the relationships among ranges.

The ScopeGraphs specification module provides a generic lookup algorithm to implement the search for a binding in a scope graph. A language definition may specify context conditions to determine whether or not a binding found by the generic algorithm is appropriate. If the binding is not appropriate, language rules may require either that the search be continued (possibly with some restriction) or abandoned. Such rules can be enforced by developer-coded functions that are invoked by the generic algorithm whenever a binding for the applied occurrence is found (see Implementing Language-Specific Behavior).

The task of type analysis is to establish the type associated with every program construct (parameters, variables, expressions, etc.) that has a type (see Top of Type Analysis Reference Manual). Name analysis of a program written in a simple language can often be completed before beginning the type analysis, but this is not possible in general. Rather than considering name and type analyzers as monolithic processes, we need to think of them as collections of computations to be applied in particular linguistic contexts. Each computation requires certain inputs and delivers certain outputs; it can be applied whenever the necessary inputs are available.

Scope graphs

Each node of a scope graph corresponds to a range, and each edge defines a relationship between ranges. We say that node @math{n} is the tail of edge @math{(n,n')} and node @math{n'} is its tip.

There are two kinds of edges, parent edges and path edges. A parent edge defines a textual containment relationship that is established by the syntactic structure of the program, while a path edge defines a relationship established by the semantics of some program construct.

We label node @math{n} with a partial function @math{Bind(n): (identifier \mapsto entity)} specifying the bindings associated with the range corresponding to node @math{n}, and we label each path edge of a scope graph with a positive integer further classifying that path edge. A node may have an arbitrary number of path edges incident upon it. A scope graph may be cyclic, provided each cycle contains at least one path edge.

Some languages allow a programmer to use the same identifier to represent entities of different kinds in the same region of the program. It may also be the case that the ranges associated with different kinds of identifiers are different. Because of these properties, the ScopeGraphs specification module requires the developer to use a distinct scope graph for each kind of identifier. When a particular kind of identifier has a unique set of ranges, as does the label identifier in Java, a unique instantiation of the ScopeGraphs specification module is required for that kind of identifier. Often, however, a number of different kinds of identifier share the same set of ranges. (For example, field identifiers and method identifiers in Java share the same set of ranges.) In this case, the scope graphs for the different kinds of identifier are isomorphic and a single instantiation of the ScopeGraphs specification module can be used for all (see Isomorphic Scope Graphs).

The generic lookup

The generic lookup algorithm implements the search for an applied occurrence's binding. An applied occurrence `i' can appear on its own as a simple name, or it can be a component of a qualified name like `q.i'. (Here `q' is the qualifier, and may itself be a qualified name. The leftmost applied occurrence in `q' is considered to be a simple name.) If a binding is being sought for a simple name, the search begins at the scope graph node for the smallest range containing that applied occurrence; otherwise the search begins at the scope graph node for the entity named by the qualifier.

Suppose that the search begins at node @math{n} of a scope graph. The generic algorithm first checks @math{Bind(n)}. If @math{Bind(n)} does not contain an appropriate binding, the algorithm explores all paths starting at node @math{n} and made up solely of edges labeled 1. At each node @math{n_i} on those paths, @math{Bind(n_i)} is checked before continuing.

If the applied occurrence is qualified, and if no binding is found after exploring all paths with label 1, the algorithm reports that there is no binding for the applied occurrence. This behavior implements the normal semantics of a qualified name, if the path edges labeled 1 represent an inheritance relation: the binding must be in the entity named by the qualifier, or it must be in some entity from which the qualifier inherits.

The binding for a simple name might be inherited, but it might also be imported into the range for the scope graph node @math{n} or be found in an outer range @math{n'}. A developer would use integers larger than 1 to label path edges that express semantic relations like import, or that tune the generic lookup. In general, if no binding is found for a simple name after traversing all paths starting at node @math{n} and made up solely of edges labeled @math{k}, the process is repeated with paths starting at node @math{n} and made up solely of edges labeled @math{k+1}. This search continues until either a binding is found or the value @math{k+1} is not an edge label. The developer must take that behavior into account when selecting edge labels to represent specific relationships.

If no binding is found starting at node @math{n} for a simple name, and if there is a parent edge @math{n \rightarrow n'}, then the algorithm re-starts at node @math{n'}. Note that the new search beginning at node @math{n'} is completely independent of the one that started at node @math{n}.

Whenever the generic lookup algorithm finds a node whose @math{Bind} function contains a binding for the applied occurrence, it invokes a specified function (see Implementing Language-Specific Behavior). The module contains default implementations of these functions, but the developer can easily override those defaults to provide language-specific behavior. (The default implementation simply accepts the binding found.)

If the largest path edge label, `MaxLabel', is greater than 1, then the developer must provide a file named `ScopeGraphs.h' as part of the specification. `ScopeGraphs.h' must contain the following declaration:

#define MaxKindsPathEdge `MaxLabel'

The syntactic context of a defining occurrence must determine the scope graph in which it is bound, but the syntactic context of an applied occurrence might not. When a unique scope graph cannot be determined for an applied occurrence from its syntactic context, the generic algorithm conducts a search in each of the possible graphs. If there is more than one result from these searches, a language-dependent disambiguation process must be used (see Implementing Language-Specific Behavior).

Interaction with type analysis

Consider the construct `v.f', where `v' is a variable of a record type and `f' is a field of that record. Name analysis can find the key bound to `v', which will represent a variable. However, the field name `f' is bound in the range associated with the type of that variable, not with the variable itself. In order to apply the generic lookup to `f' in `v.f', some computation of the type analyzer must establish the type associated with the variable `v'. Because the qualifier is a type, we call a construct like `v.f' a type-qualified name.

If an applied occurrence can be distinguished syntactically as a typed entity use, the type is guaranteed to be available (see Accessing the type of an entity of Type Analysis Reference Manual). This is not generally true for a type-qualified name, where the identifiers may denote either types (such as Java classes) or typed entities (such as Java fields). Here the value of a property of the key bound to the identifier must be accessed to obtain the type, and some dependence relation must guarantee that the property's value has been set. If Eli's Typing module is used, the appropriate relation can be established by:

CLASS SYMBOL TypedDefId INHERITS *SetTypeOfEntity END;
CLASS SYMBOL TypedUseId COMPUTE

  SYNT.TypeIsSet=INCLUDING OutSideInDeps.GotEntityTypes;
END;

Note that only one instantiation of the ScopeGraphs module can interact with Eli's Typing module.

In some languages, a type-qualified name may define the tip of a path edge. Examples of such situations are the Pascal with statement and the Java anonymous class. This situation is problematic because both name and type analysis computations are necessary to resolve the tip name, but the scope graph is not complete without the path edge.

A Pascal with statement or a Java anonymous class corresponds to a subgraph, @math{W}, of the scope graph @math{G}. The rules of the language are such that @math{G} will never contain any edge whose tip is in @math{W} and whose tail is in @math{G-W}. That means that name analysis in @math{G-W} can never depend on any information from @math{W}.

Suppose that we partition the analysis of a program containing such constructs, completing both name and type analysis in @math{G-W} before examining any applied occurrences in @math{W}. Since the type-qualified name defining the tip of the path edge in question lies in the text corresponding to @math{W}, but all of its component bindings are defined in @math{G-W}, all of the necessary information for creating the path edge will be available when analyzing @math{W}. Moreover, the missing path edge will be irrelevant for the analysis of @math{G-W}.

A different kind of interaction between name and type analysis occurs when the language permits a single function name to describe several distinct operations. In this case, a range may contain more than one defining occurrence for the function identifier, each of which denotes a different entity. We say that in such a situation the function identifier is overloaded.

Because @math{Bind(n)} is a function, there is a single binding for an overloaded function identifier in a given range. The key to which the function identifier is bound represents an indication that is associated with the set of operations described by the defining occurrences (see Selecting an operator at an expression node of Type Analysis Reference Manual). The operations in the set are distinguished by the types of operands they require, information that must be derived from type analysis computations.

If none of the operations whose defining occurrences lie in a particular range satisfies the requirements of the actual operands, then the language may require that the search continue in the normal way (see The generic lookup). In this case, name analysis computations advancing the lookup alternate with type analysis computations attempting to identify the indications found.


Next Chapter Table of Contents