Eli   Documents

General Information

 o Eli: Translator Construction Made Easy
 o Global Index
 o Frequently Asked Questions
 o Typical Eli Usage Errors

Tutorials

 o Quick Reference Card
 o Guide For new Eli Users
 o Release Notes of Eli
 o Tutorial on Name Analysis
 o Tutorial on Scope Graphs
 o Tutorial on Type Analysis
 o Typical Eli Usage Errors

Reference Manuals

 o User Interface
 o Eli products and parameters
 o LIDO Reference Manual
 o Typical Eli Usage Errors

Libraries

 o Eli library routines
 o Specification Module Library

Translation Tasks

 o Lexical analysis specification
 o Syntactic Analysis Manual
 o Computation in Trees

Tools

 o LIGA Control Language
 o Debugging Information for LIDO
 o Graphical ORder TOol

 o FunnelWeb User's Manual

 o Pattern-based Text Generator
 o Property Definition Language
 o Operator Identification Language
 o Tree Grammar Specification Language
 o Command Line Processing
 o COLA Options Reference Manual

 o Generating Unparsing Code

 o Monitoring a Processor's Execution

Administration

 o System Administration Guide

Mail Home

Tutorial for Name Analysis Using ScopeGraphs

Previous Chapter Next Chapter Table of Contents


Multiple Scope Graphs

The scope graph is the basic data structure for name analysis (see Fundamentals of Name Analysis of Name Analysis Reference Manual). Its structure embodies the relationships among ranges dictated by scope rules, and its content reflects the set of defining occurrences subject to those rules. There are two reasons that a processor specification may require more than one scope graph:

  • The language permits several defining occurrences of the same identifier to bind to different entities in a given range.

  • Different constructs in the language obey different scope rules.
In the first case the scope graphs are isomorphic, but the contents vary because they describe different defining occurrences; in the second the structure varies because the scope rules are different. This chapter extends NameLan in two ways, to illustrate the necessary computations.

Reusing identifiers in the same scope

It's hard to think up names! Suppose that, as language designers, we allow a programmer to use the same identifier to represent a package, a class, a variable, and a method in a single range. If we, as developers, can decide which of these is meant on the basis of context then we can use distinct scope graphs to obtain the proper defining occurrence for each applied occurrence. Here is an example:

mgcd.nl[106]==

import gcd.*;
class gcd { int z; }
int gcd;
int gcd(int x, int y) {
  while (x != y) {
    if (x > y) x = x - y;
    else y = y - x;
  }
  return x;
}
{ gcd c;
  gcd = gcd(c.z, w); }

package gcd;
int w;
This macro is attached to a non-product file.

The program in `mgcd.nl' contains four applied occurrences of gcd with different meanings. You would probably have little difficulty deciding on the basis of context that the first applied occurrence names the package, the second names the class, the third names the variable, and the fourth names the method.

An Eli-generated processor needs four scope graphs to support that intuition, each modeling the bindings for one kind of entity. Because the scope rules are the same for all of the identifiers naming those entities, the scope graphs will have the same structure but different contents; the graphs themselves are isomorphic (see Isomorphic Scope Graphs of Name Analysis Reference Manual).

A single instantiation of the ScopeGraphs module will support any number of isomorphic scope graphs, but the scope graph data structures that Eli builds when generating a processor must be capable of handling the maximum number of isomorphic graphs supported by any single instantiation. This number is conveyed by the pre-processor identifier MaxIsoGraphs (see Isomorphic Scope Graphs of Name Analysis Reference Manual). We need to override the default value of 1 by placing a directive in `ScopeGraphs.h' (see Import on demand).

ScopeGraphs.h content[107]==

#define MaxIsoGraphs  4
This macro is defined in definitions 72, 107, and 109.
This macro is invoked in definition 73.

The actual number of isomorphic scope graphs associated with a single instantiation of the module is specified by the NumberOfIsoGraphs attribute of the root symbol of the grammar (see Isomorphic Scope Graphs of Name Analysis Reference Manual).

Abstract syntax tree[108]==

SYMBOL Collection COMPUTE SYNT.NumberOfIsoGraphs=4; END;
This macro is defined in definitions 10, 18, 22, 38, 47, 52, 53,
   65, 69, 71, 88, 89, 92, 93, 95, 96, 101, 105, 108, 110, 111,
   112, 113, 122, 133, 134, 135, 136, 137, 144, and 153.
This macro is invoked in definition 11.

Each graph is indexed by a integer whose value is between 0 and 3. Values larger that 3 can be used to give information about the syntactic context, but indicate that the specific graph to be used either is unknown or must be computed. (We call these weak contexts to distinguish them from the strong contexts where the specific graph can be determined.)

It's a good idea to use a symbolic name instead of a numeric value when specifying a graph index. Appropriate names can be defined as a C enumeration in `ScopeGraphs.h':

ScopeGraphs.h content[109]==

enum {
  packageGraph, classGraph, variableGraph, methodGraph,
  packageOrClassContext, ambiguousContext
};
This macro is defined in definitions 72, 107, and 109.
This macro is invoked in definition 73.

The first four identifiers, whose representations end in Graph, index the scope graphs. The remaining identifiers, whose representations end in Context, model weak contexts.

Each defining or applied occurrence has a GraphIndex attribute. By default, the values of those attributes are set to 0. This default corresponds to the case of a single scope graph.

When the module instantiation supports several isomorphic scope graphs, the developer must override the default values of the GraphIndex attributes with the index of the appropriate scope graph. Defining occurrences are strong contexts that are straightforward to specify (note that a parameter is considered to be a variable):

Abstract syntax tree[110]==

ATTR GraphIndex: int;

SYMBOL PackageDefName COMPUTE
  INH.GraphIndex = packageGraph;
END;

SYMBOL ClassDefName COMPUTE
  INH.GraphIndex = classGraph;
END;

SYMBOL VarDefName COMPUTE
  INH.GraphIndex = variableGraph;
END;

SYMBOL ParamDefName COMPUTE
  INH.GraphIndex = variableGraph;
END;

SYMBOL MethodDefName COMPUTE
  INH.GraphIndex = methodGraph;
END;
This macro is defined in definitions 10, 18, 22, 38, 47, 52, 53,
   65, 69, 71, 88, 89, 92, 93, 95, 96, 101, 105, 108, 110, 111,
   112, 113, 122, 133, 134, 135, 136, 137, 144, and 153.
This macro is invoked in definition 11.

Applied occurrences are always components of the complete names that we have defined throughout this document (see Qualified names). As language designers, we must state NameLan rules for the context of each complete name, and then as developers derive LIDO computations to implement them. Here we write just the LIDO computation for each such context, however, leaving it to you to deduce an appropriate NameLan rule:

Abstract syntax tree[111]==

RULE: ClassName ::= Name COMPUTE
  Name.GraphIndex = classGraph;
END;

RULE: ExprName  ::= Name COMPUTE
  Name.GraphIndex = variableGraph;
END;

RULE: MethName  ::= Name COMPUTE
  Name.GraphIndex = methodGraph;
END;

RULE: WithName  ::= Name COMPUTE
  Name.GraphIndex = variableGraph;
END;

RULE: SuperClass ::= WLName COMPUTE
  WLName.GraphIndex = classGraph;
END;

RULE: ImportName ::= WLName COMPUTE
  WLName.GraphIndex = classGraph;
END;

RULE: PCName    ::= WLName COMPUTE
  WLName.GraphIndex = packageOrClassContext;
END;
This macro is defined in definitions 10, 18, 22, 38, 47, 52, 53,
   65, 69, 71, 88, 89, 92, 93, 95, 96, 101, 105, 108, 110, 111,
   112, 113, 122, 133, 134, 135, 136, 137, 144, and 153.
This macro is invoked in definition 11.

Note that all but one of these complete names is in a strong context. The only weak context is the PCName, which could be either a package name or a class name (see Import on demand).

Once we know the context of a complete name, we need to analyze the applied occurrences making up that name. The context for a complete name applies only to the rightmost applied occurrence of that complete name. If the complete name is a simple name, the computations are:

Abstract syntax tree[112]==

RULE: Name ::= SimpleName COMPUTE
  SimpleName.GraphIndex = Name.GraphIndex;
END;

RULE: WLName ::= SimpleWLName COMPUTE
  SimpleWLName.GraphIndex = WLName.GraphIndex;
END;
This macro is defined in definitions 10, 18, 22, 38, 47, 52, 53,
   65, 69, 71, 88, 89, 92, 93, 95, 96, 101, 105, 108, 110, 111,
   112, 113, 122, 133, 134, 135, 136, 137, 144, and 153.
This macro is invoked in definition 11.

The NameLan rules that we, as language designers, have formulated to define the meanings of other applied occurrences come in two parts: definition of the weak contexts of any qualifiers, and then resolution of those weak contexts by selecting possible bindings. The rules for qualifier contexts are:

  • A name to the left of the rightmost . in a qualified ClassName is a package-or-class name.

  • A name to the left of the rightmost . in a qualified package-or-class name is a package-or-class name.

  • A name to the left of the rightmost . in a qualified ExprName is an ambiguous name.

  • A name to the left of the rightmost . in a qualified MethName is an ambiguous name.

  • A name to the left of the rightmost . in a qualified ambiguous name is an ambiguous name.

Here is a LIDO implementation of these rules:

Abstract syntax tree[113]==

RULE: Name ::= Name '.' QualifiedId COMPUTE
  QualifiedId.GraphIndex = Name[1].GraphIndex;
  Name[2].GraphIndex =
    IF(EQ(Name[1].GraphIndex,classGraph),
      packageOrClassContext,
    IF(EQ(Name[1].GraphIndex,packageOrClassContext),
      packageOrClassContext,
    ambiguousContext));
END;

RULE: WLName ::= WLName '.' QualifiedWLId COMPUTE
  QualifiedWLId.GraphIndex = WLName[1].GraphIndex;
  WLName[2].GraphIndex = packageOrClassContext;
END;
This macro is defined in definitions 10, 18, 22, 38, 47, 52, 53,
   65, 69, 71, 88, 89, 92, 93, 95, 96, 101, 105, 108, 110, 111,
   112, 113, 122, 133, 134, 135, 136, 137, 144, and 153.
This macro is invoked in definition 11.

The GraphIndex attribute of the applied occurrence is passed to the generic lookup as the GraphIndex property of the applied occurrence's UseKey (see Applied Occurrences of Name Analysis Reference Manual). If the value of the GraphIndex property is less than the number of isomorphic scope graphs, N, the generic algorithm seeks a defining occurrence in the indexed graph (recall that indexes start at 0). Otherwise, it seeks a defining occurrence independently in each of the graphs, and presents the results in an array G of length N. If no defining occurrence was found in graph k, then G[k] contains the value NoKey.

If the value of the GraphIndex property was greater than or equal to N, the generic lookup invokes a C function named DisambiguateGraphs before returning (see Deciding among possible bindings of Name Analysis Reference Manual). DisambiguateGraphs is given G, N, and the UseKey as arguments, and must return an appropriate value as the final result of the generic lookup.

Eli's default version of DisambiguateGraphs checks whether exactly one of the searches yielded a result other than NoKey. If so, then it returns that result; otherwise it returns NoKey. This implements the rule that wherever the context would allow a simple name to have more than one meaning, the normal scope rules find only one of those meanings. File `mgcd.nl' satisfies that condition, but `ambig.nl' does not:

ambig.nl[114]==

import v.*;

class A {
  class C { int x; }
}

A v;
{ v.C x;
  v.C.x = 0; }

package v;
class C { int x; }
This macro is attached to a non-product file.

Consider the two uses of v at the beginning of the program block. It is reasonable to interpret the first as an applied occurrence of the package name, since using a variable in the type of another variable is counterintuitive. Interpretation of the second as an applied occurrence of the variable name is also reasonable, although perhaps not as obvious as the first. If NameLan is to allow programs like `ambig.nl', it must provide appropriate rules for the programmer. Those rules obviously can't be stated in terms of G[k], which is an implementation detail.

The solution is to explain the choice as a reclassification of the context, based on the results of the searches. This reclassification turns a weak context into a strong context, which requires the desired selection. Here are our rules for NameLan:

  • Package-or-class names are reclassified as follows:

    • If the package-or-class name is a simple name consisting of a single identifier:

      • If the identifier is in the scope of a class declaration then the package-or-class name is reclassified as a class name.

      • Otherwise the package-or-class name is reclassified as a package name.

    • If the package-or-class name is a qualified name, then it is reclassified as a class name.

  • Ambiguous names are reclassified as follows:

    • If the ambiguous name is a simple name consisting of a single identifier:

      • If the identifier is in the scope of a variable or parameter declaration then the ambiguous name is reclassified as a variable name.

      • Otherwise if the ambiguous name is in the scope of a class declaration then the ambiguous name is reclassified as a class name.

      • Otherwise the ambiguous name is reclassified as a package name.

    • If the ambiguous name is a qualified name, the qualifier is first reclassified. Then:

      • If the qualifier is reclassified as a package name or a class name then:

        • If the identifier is a variable name in the qualifier's range then the ambiguous name is reclassified as a variable name.

        • Otherwise if the identifier is a class name in the qualifier's range then the ambiguous name is reclassified as a class name.

        • Otherwise an error is reported.

      • If the qualifier is reclassified as a variable name of type `T' then:

        • If the identifier is a variable name in `T''s class then the ambiguous name is reclassified as a variable name.

        • Otherwise if the identifier is a class name in `T''s class then the ambiguous name is reclassified as a class name.

        • Otherwise an error is reported.

Note that these rules do not require any actual modification of any GraphIndex. All they do is to specify which of the elements of G DisambiguateGraphs should return. There is no requirement that the selected element of G differ from NoKey. If the variable GraphIndex contains the value of the GraphIndex property of the UseKey, here is a C computation that implements the NameLan rules:

Implementation of the NameLan reclassification[115]==

if (GraphIndex == packageOrClassContext) {
  if (G[classGraph] != NoKey) return G[classGraph];
  return G[packageGraph];
}

if (GraphIndex == ambiguousContext) {
  if (G[variableGraph] != NoKey) return G[variableGraph];
  if (G[classGraph] != NoKey) return G[classGraph];
  return G[packageGraph];
}
This macro is invoked in definition 116.

The simplest way to complete the implementation is to download Eli's version of DisambiguateGraphs.c and then replace the default computation:

-> $elipkg/Name/DisambiguateGraphs.c >

DisambiguateGraphs.c[116]==

#include "deftbl.h"
#include "pdl_gen.h"
#include "ScopeGraphs.h"

DefTableKey
DisambiguateGraphs (DefTableKey G[], int N, DefTableKey app)
/* On entry-
 *   G[] is an array with N elements giving the result for each search
 *   app is the UseKey of the applied occurrence sought
 * On exit-
 *   DisambiguateGraphs is the selected key
 ***/
{ int GraphIndex = GetGraphIndex(app, 0);

  Implementation of the NameLan reclassification[115]

  return NoKey;
}
This macro is attached to a product file.

GetGraphIndex is used to obtain the UseKey's GraphIndex property (see Behavior of the basic query operations of The Definition Table Module). The include file `pdl_gen.h' provides the interface for GetGraphIndex, and `ScopeGraphs.h' defines the symbolic constants used in the computation. The final return statement avoids a compiler warning.

`DisambiguateGraphs.c' must be included in the list of specifications:

Specification files[117]==

DisambiguateGraphs.c
This macro is defined in definitions 12, 16, 23, 27, 74, 78, 81,
   91, 117, 121, 139, 142, 146, and 152.
This macro is invoked in definition 13.

Exercises

These exercises are based on files defined in the Tutorial. To obtain copies of those files in your current directory, enter Eli and give the following command:

-> $elipkg/Name/LearnSG%Kinds > .

None of these files will have write permission in your current directory.

  1. Explain why all qualifier contexts of WLNames are packageOrClassContext.

  2. Explain why the default implementation of `DisambiguateGraphs.c' suffices for analyzing `mgcd.nl' (see Deciding among possible bindings of Name Analysis Reference Manual).

  3. Consider the two uses of v at the beginning of `ambig.nl''s program block.

    1. Give the contents of the array G for the first applied occurrence of v.

    2. Use the context reclassification rules of NameLan to explain how the context of the first use of v should be reclassified.

    3. Show that `DisambiguateGraphs' will return the correct graph index for this applied occurrence of v.

    4. Repeat the first three parts of this exercise for the second applied occurrence of v.

  4. Use `Bindings.specs' to generate a processor and apply it to `ambig.nl'. Did you get the results you expected?

Constructs obeying different scope rules

Consider the task of reading integer values from a file until the current number would cause the running sum to exceed the first value on the file. The program must print the number of values making up the final sum. The problem here is that there are two conditions under which to exit the loop: either the data runs out or the preset value is exceeded.

As language designers, we can extend NameLan to simplify the solution to this problem:

Phrase structure[118]==

Statement:      Loop / 'break' Ident ';'.
Loop:           Ident ':' 'while' '(' Expr ')' Statement.
This macro is defined in definitions 2, 25, 30, 32, 41, 50, 55,
   67, 76, 97, 118, and 124.
This macro is invoked in definition 3.

Here is a program that uses the extension:

maxsum.nl[119]==

import stdio.*;
{ int sum = 0, count = 0, below;
  below = getint();
  loop: while (feof(stdin) == 0) {
    int next = sum + getint();
    if (next > below) break loop;
    sum = next; count = count + 1;
  }
  putint(count);
}

package stdio;
class file { }
file stdin;
int feof(file f) { }
int getint() { }
void putint(int v) { }
This macro is attached to a non-product file.

As developers, we need new abstract syntax symbols to distinguish the two new contexts for an identifier (see Representation of identifiers of Name Analysis Reference Manual). The obvious choices are LabelDef and LabelUse:

Abstract syntax of identifiers[120]==

RULE: Loop      ::= LabelDef ':' 'while' '(' Expr ')' Statement END;
RULE: LabelDef  ::= Ident END;

RULE: Statement ::= 'break' LabelUse ';' END;
RULE: LabelUse  ::= Ident END;
This macro is defined in definitions 8, 9, 26, 31, 33, 42, 51, and 120.
This macro is invoked in definition 10.

The Loop construct is a labeled while statement. Execution of a break statement terminates the smallest enclosing Loop construct defining the given label. Our scope rule is:

  • The scope of a defining occurrence LabelDef is the smallest enclosing Loop.

The key point for name analysis of labels is that this scope rule requires a scope graph whose structure is completely different from that of the scope graphs we have been using. It therefore cannot be implemented by isomorphism, but requires an additional instantiation of the ScopeGraphs and SGProof modules (see Top of Name Analysis Reference Manual).

Specification files[121]==

$/Name/ScopeGraphs.gnrc +instance=LBL_                  :inst
$/Name/SGProof.gnrc     +instance=LBL_+referto=Ident    :inst
This macro is defined in definitions 12, 16, 23, 27, 74, 78, 81,
   91, 117, 121, 139, 142, 146, and 152.
This macro is invoked in definition 13.

The instance argument LBL_ is needed to distinguish this instantiation from the one used for analysis of other names (see Basic Scope Rules of Name Analysis according to scope rules). That instantiation argument must prefix each role name used for analysis of labels:

Abstract syntax tree[122]==

SYMBOL Loop     INHERITS LBL_RangeScope                 END;
SYMBOL LabelDef INHERITS LBL_IdDefScope                 END;
SYMBOL LabelUse INHERITS LBL_GCSimpleName, LBL_ChkIdUse END;
This macro is defined in definitions 10, 18, 22, 38, 47, 52, 53,
   65, 69, 71, 88, 89, 92, 93, 95, 96, 101, 105, 108, 110, 111,
   112, 113, 122, 133, 134, 135, 136, 137, 144, and 153.
This macro is invoked in definition 11.

Exercises

These exercises are based on files defined in the Tutorial. To obtain copies of those files in your current directory, enter Eli and give the following command:

-> $elipkg/Name/LearnSG%Labels > .

None of these files will have write permission in your current directory. You will need to add write permission in order to do the exercises.

  1. Use `Bindings.specs' to generate a processor named `lbl' and apply it to `maxsum.nl'.

    It may happen that Eli does not find an evaluation order for the computations of this specification. More than one instantiation plus type-qualified inheritance relations (e.g. the with statement) present too many possibilities. In that case, you need to eliminate some of those possibilities by adding dependence. The following delays the computations of the LBL_ instance until the computations of the original instance are done. That may help Eli to find an evaluation order:

    SYMBOL LabelDef COMPUTE
      INH.LBL_GraphIndex=0 <- INCLUDING OutSideInDeps.FPSolved;
    END;
    


Previous Chapter Next Chapter Table of Contents