General Information
Tutorials
Reference Manuals
Libraries
Translation Tasks
Tools
Administration
|
Tutorial for Name Analysis Using ScopeGraphsKernel LanguageThis chapter considers the name analysis problem for a very simple subset of a language called NameLan. We begin with specifications of the phrase structure and basic symbols. A processor built from these specifications will accept a text in NameLan and build a tree. These specifications can then be augmented by others, enhancing that processor to make it solve the subset's name analysis problem. The sample text that we will analyze is: machar.nl[1]== int radix; { float a = 1.0; while (((a + 1.0) - a) - 1.0 == 0.0) a = a + a; float b = 1.0; while ((a + b) - a == 0.0) b = b + b; radix = (a + b) - a; } This macro is attached to a non-product file. This program determines the radix of the floating point representation of the machine on which it runs, and stores that value in a global variable. NameLan does not require fully-parenthesized expressions, but parentheses are needed here to guarantee the order of evaluation within the expressions. (Optimizers do not generally violate the integrity of parenthesized expressions.)
Text structureText is considered to be made up of a sequence of comments and basic symbols. Comments are character sequences to be ignored, while basic symbols are character sequences that correspond to terminal symbols of the grammar describing the phrase structure of the text. Basic symbols can be grouped according to the information they carry:
Name analysis is concerned with establishing the entity named by each identifier in the text. Since basic symbols correspond to terminal symbols of the grammar describing a language's phrase structure, it is generally simplest to start by specifying that grammar (see Context-Free Grammars and Parsing of Syntactic Analysis). The terminal symbols are then precisely the symbols that do not occur on the left-hand side of any grammar rule. The concrete syntax of a language is a specification of the phrase structure of a program written in that language. Here is a concrete syntax for the NameLan subset of interest here: Phrase structure[2]== Program: Declaration* Block. Declaration: VarDecl. Block: '{' DeclStmt* '}'. DeclStmt: VarDecl / Statement. VarDecl: Type VarDefs ';'. VarDefs: VarDef // ',' . VarDef: Ident [ '=' Expr ]. Type: 'int' / 'float' . Statement: Name '=' Expr ';' / 'if' Expr Statement $'else' / 'if' Expr Statement 'else' Statement / 'while' '(' Expr ')' Statement / Block. Expr: AExpr [ Relop AExpr ] . AExpr: [Addop] Term / AExpr Addop Term . Term: Factor / Term Mulop Factor . Factor: Name / Integer / Real / '(' Expr ')' . Relop: '<' / '<=' / '==' / '!=' / '>=' / '>'. Addop: '+' / '-'. Mulop: '*' / '/'. Name: Ident. This macro is defined in definitions 2, 25, 30, 32, 41, 50, 55, 67, 76, 97, 118, and 124. This macro is invoked in definition 3.
(See The effect of a $-modification of Syntactic Analysis,
for an explanation of the NameLan.con[3]== Phrase structure[2] This macro is attached to a product file.
Notice that many of the terminal symbols of this grammar are given
literally (for example, NameLan.gla[4]== Ident: C_IDENTIFIER Integer: C_INTEGER Real: C_FLOAT C_COMMENT This macro is attached to a product file. This specification indicates that the formats of these NameLan terminal symbols and comments are identical to those of their counterparts in C (see Top of Lexical Analysis).
ExercisesThese exercises are based on files defined in the Tutorial. To obtain copies of those files in your current directory, enter Eli and give the following command:
-> $elipkg/Name/LearnSG%Kernel > . None of these files will have write permission in your current directory. You will need to add write permission in order to do the exercises.
Tree Structure`NameLan.con' is the concrete syntax of our NameLan subset. It describes all of the phrases that can be used to construct a program like `machar.nl'. The processor generated from `NameLan.specs' in the last section verifies that a text file represents a syntactically-correct program in the NameLan subset. An Eli-generated processor represents a program internally by a tree, and carries out tasks like name analysis by computing the values of attributes attached to the nodes of the tree (see Top of LIDO -- Computation in Trees). The abstract syntax of a language describes all of the possible trees that a generated processor could build to represent syntactically-correct programs in that language. Eli uses LIDO notation to define the abstract syntax of a language (see LIDO -- Reference Manual). Eli can deduce an abstract syntax from a concrete syntax, but the trees described by that abstract syntax might not be the best ones for subsequent analysis. The rules describing expressions and operators in `NameLan.con' provide an excellent example. They implement the operator precedence and association rules of NameLan, and guarantee that the phrase structure reflects the relationship between operators and operands (see Using structure to convey meaning of Syntactic Analysis). Those rules determine the structure of the tree to be built.
Consider the expression in the second
(a + b) - a == 0.0
According to the grammar, this is an
AExpr Relop AExprThe interpretation of this structure is that the equality operator ==
compares the subexpression (a + b) - a with the number 0.0 .
It reflects the fact that addition and subtraction operations
should be carried out before comparison operations, i.e. that they should
have higher precedence
(see Operator precedence of Syntactic Analysis).
`NameLan.con' lists groups of operators in order of increasing precedence.
This program text would therefore be represented internally by a tree with
three subtrees: one representing (a + b) - a , one representing
== , and one representing 0.0 .
The parentheses in the expression to the left of
AExpr Addop TermThus the structure is the same as that of the parenthesized form.
The
There is no need to retain the distinction between different levels of
expression (e.g. Mappings from concrete symbols to abstract symbols[6]== MAPSYM Expr ::= AExpr Term Factor. Oper ::= Relop Addop Mulop. This macro is defined in definitions 6. This macro is invoked in definition 7. NameLan.map[7]== Mappings from concrete symbols to abstract symbols[6] This macro is attached to a product file.
`NameLan.map' removes the distinction between symbols of the concrete
syntax.
We also need to be able to introduce distinctions where a single
symbol of the concrete syntax plays different roles in different contexts.
For example, identifiers are always represented by the concrete symbol
We recommend that, wherever possible, abstract syntax symbols needed to distinguish identifier contexts not be added to the concrete syntax (see Representation of identifiers of Name Analysis Reference Manual). The reason for this recommendation is that the parser often does not have sufficient contextual information to recognize the phrase, but that information is available when the tree is being built. The abstract syntax derived from `Text.specs' contains the following rule:
RULE rule_024: Name ::= Ident END;In order to distinguish the identifier context of a simple name, let's add the symbol SimpleName to the abstract syntax.
This can be done by writing two explicit LIDO rules:
Abstract syntax of identifiers[8]== RULE: Name ::= SimpleName END; RULE: SimpleName ::= Ident END; This macro is defined in definitions 8, 9, 26, 31, 33, 42, 51, and 120. This macro is invoked in definition 10.
When the parser reports that it has recognized the phrase correponding to
We use a similar strategy to distinguish the two contexts in which an identifier is declared: Abstract syntax of identifiers[9]== RULE: VarDef ::= VarDefName END; RULE: VarDef ::= VarDefName '=' Expr END; RULE: VarDefName ::= Ident END; This macro is defined in definitions 8, 9, 26, 31, 33, 42, 51, and 120. This macro is invoked in definition 10. As we extend NameLan to demonstrate more complex name analysis issues, we will need to distinguish other identifier contexts. Abstract syntax tree[10]== Abstract syntax of identifiers[8] This macro is defined in definitions 10, 18, 22, 38, 47, 52, 53, 65, 69, 71, 88, 89, 92, 93, 95, 96, 101, 105, 108, 110, 111, 112, 113, 122, 133, 134, 135, 136, 137, 144, and 153. This macro is invoked in definition 11. NameLan.lido[11]== Abstract syntax tree[10] This macro is attached to a product file. We can now specify a processor that will take account of these mappings when deducing the abstract syntax, and produce a simpler tree from `machar.nl': Specification files[12]== NameLan.con NameLan.gla NameLan.map NameLan.lido This macro is defined in definitions 12, 16, 23, 27, 74, 78, 81, 91, 117, 121, 139, 142, 146, and 152. This macro is invoked in definition 13. NameLan.specs[13]== Specification files[12] This macro is attached to a product file.
ExercisesThese exercises are based on files defined in the Tutorial. To obtain copies of those files in your current directory, enter Eli and give the following command:
-> $elipkg/Name/LearnSG%Tree > . None of these files will have write permission in your current directory.
Basic name analysis
When you read `machar.nl', your programming experience tells you that
that it is manipulating three variables.
Two of these variables are named In order to formalize this intuition, the designer of NameLan makes use of four concepts that relate identifiers to the entities they name (see Fundamentals of Name Analysis of Name Analysis Reference Manual).
As language designers, we have decided to specify the rules for the kernel language as follows:
A binding `( Consider the following trivial program:
float x = 0; {int x; x = 2;}The scope of the defining occurrence of x on line 1 is the
Program phrase, while that of the defining occurrence of x on
line 2 is the Block phrase making up line 2.
Suppose that the binding associated with the defining occurrence on line 1
is `(x,kf)' and the one associated with the defining occurrence on
line 2 is `(x,ki)'.
Which of these two bindings will the applied occurrence of x on line
2 identify?
Both of the scopes encompass the applied occurrence, but `(x,ki)' has
the smaller scope.
Therefore the applied occurrence will identify `(x,ki)'.
The term usually used for this effect is hiding.
We say that the binding in the Name analysis is a computation carried out on the tree that is the internal representation of the program being analyzed (see Fundamentals of Name Analysis of Name Analysis Reference Manual). That computation is specified in terms of the abstract syntax, and takes the form of assignments of values to attributes of tree nodes (see Computations of LIDO - Reference Manual). The goal of the computation is to determine the entities named by the program's identifiers, based on the concepts and language rules discussed above. Defining occurrences, applied occurrences, and scopes are concepts related to the program text; how are these concepts represented in the tree? Each identifier occurrence is a single basic symbol in the text, and is represented by a leaf of the tree. A scope, on the other hand, extends over some region of the text. The developer needs to map each such text region to an abstract syntax subtree encompassing that region. We call such a subtree a range; several scopes may map into the same range. A scope is represented by the root node of its range.
For example, consider the defining occurrence
Applying similar reasoning to the defining occurrence of Specific identifiers are character sequences in an input text, and are represented internally by integer values (see Character String Storage of The Eli Library). Identifiers with the same spelling are represented by the same integer (see Available Processors of Lexical Analysis).
Conventionally we use the attribute Attribute representing an identifier[14]== ATTR Sym: int; This macro is invoked in definition 18.
An entity is represented internally by a definition table key,
of type Attribute referencing an entity[15]== ATTR Key: DefTableKey; This macro is invoked in definition 18.
Name analysis determines a value of the
The value of the
The value of the Scope rules determining the relationships between identifiers and entities follow patterns that do not vary significantly among programming languages, and standard attribute computations have therefore been developed to implement name analysis. Eli has packaged those computations in a specification module (see Top of Introduction of specification modules). The developer can obtain them by instantiating that module: Specification files[16]== $/Name/ScopeGraphs.gnrc :inst This macro is defined in definitions 12, 16, 23, 27, 74, 78, 81, 91, 117, 121, 139, 142, 146, and 152. This macro is invoked in definition 13. An Eli specification module provides computational roles: constellations of attributes and operations that are relevant for particular contexts in the tree. For example, a tree node playing the role of a defining occurrence is given three attributes and a particular set of computations to be carried out at specific visits to that node (see Defining Occurrences of Name Analysis Reference Manual). Computational roles must be associated with appropriate tree nodes so that the computations they describe can be carried out as those nodes are visited during traversals of the tree. This is done by using LIDO inheritance (see Inheritance of Computations of LIDO - Reference Manual).
The
Our scope rule for the kernel language defines the scope of a
Tree nodes playing ScopeGraphs roles[17]== SYMBOL Block INHERITS RangeScope END; SYMBOL VarDefName INHERITS IdDefScope END; SYMBOL SimpleName INHERITS GCSimpleName END; This macro is invoked in definition 18.
The The computations associated with the three roles use an internal data structure called a scope graph (see Fundamentals of Name Analysis of Name Analysis Reference Manual). Each scope graph node specifies a set of bindings; scope graph edges describe search paths used to find a binding for a given identifier.
Any tree node that plays the A basic understanding of the way in which the computations attached to the abstract syntax tree by computational roles use the scope graph will help you to understand why we do things in a certain way, and how to extend name analysis to more complex situations. We will therefore briefly look at the name analysis of `machar.nl'.
A computation inherited by the root node of the tree from the
Computations inherited from the
The search for a binding for the applied occurrence of
The complete specification of a processor that does name analysis on the
kernel language requires the Abstract syntax tree[18]== Attribute representing an identifier[14] Attribute referencing an entity[15] Tree nodes playing ScopeGraphs roles[17] This macro is defined in definitions 10, 18, 22, 38, 47, 52, 53, 65, 69, 71, 88, 89, 92, 93, 95, 96, 101, 105, 108, 110, 111, 112, 113, 122, 133, 134, 135, 136, 137, 144, and 153. This macro is invoked in definition 11. A processor generated from this specification will build a tree and perform name analysis, but it will not produce any output. For the purpose of this tutorial, however, we want our processor to make the results of the name analysis visible. We want to be able to see the relationships between applied and defining occurrences so that we can check our understanding. Eli provides another module that we can instantiate for this purpose (see Name Analysis Testing of Name Analysis Reference Manual). Bindings.specs[19]== NameLan.specs $/Name/SGProof.gnrc +referto=Ident :inst This macro is attached to a product file.
ExercisesThese exercises are based on files defined in the Tutorial. To obtain copies of those files in your current directory, enter Eli and give the following command:
-> $elipkg/Name/LearnSG%Basic > . None of these files will have write permission in your current directory. You will need to add write permission in order to do the exercises.
Error reporting
If the first line were removed from `machar.nl',
the applied occurrence of
Recall that the search for the meaning of the applied occurrence of
A Report an undefined identifier[20]== SYMBOL SimpleName INHERITS ChkIdUse END; This macro is invoked in definition 22. Now suppose that the first line of `machar.nl' were:
int radix; float radix;Both defining occurrences of radix have the Program range.
The scope rules of the previous section yield an ambiguous result when
several defining occurrences of the same identifier have the same
range: the assignment in the last line of `machar.nl' could be
setting either the integer or the floating point variable.
The ScopeGraphs module creates a new definition table key to represent the entity named by the first defining occurrence the attribute computation encounters in a specific range. (That defining occurrence may not be textually first; attribute computations are not necessarily carried out in textual order.) Any subsequent defining occurrence of the same identifier in that range will be represented by the same key.
For NameLan, we should report multiple defining occurrences bound to the
same definition table key as an error.
The ScopeGraphs module does not provide an error-reporting role for this
condition because there are a number of situations in which it is legal.
However, a simple computation based on the Report a multiply-defined identifier[21]== SYMBOL MultDefChk INHERITS Unique COMPUTE IF(NOT(THIS.Unique), message( ERROR, CatStrInd("Identifier is multiply defined: ", THIS.Sym), 0, COORDREF)); END; SYMBOL VarDefName INHERITS MultDefChk END; This macro is invoked in definition 22.
(See Source Text Coordinates and Error Reporting of The Eli Library, for an explanation of the
Combining these computations into a single specification that we add to
those of the previous section, and instantiating the Abstract syntax tree[22]== Report an undefined identifier[20] Report a multiply-defined identifier[21] This macro is defined in definitions 10, 18, 22, 38, 47, 52, 53, 65, 69, 71, 88, 89, 92, 93, 95, 96, 101, 105, 108, 110, 111, 112, 113, 122, 133, 134, 135, 136, 137, 144, and 153. This macro is invoked in definition 11. Specification files[23]== $/Prop/Unique.gnrc :inst This macro is defined in definitions 12, 16, 23, 27, 74, 78, 81, 91, 117, 121, 139, 142, 146, and 152. This macro is invoked in definition 13.
ExercisesThese exercises are based on files defined in the Tutorial. To obtain copies of those files in your current directory, enter Eli and give the following command:
-> $elipkg/Name/LearnSG%Report > . None of these files will have write permission in your current directory. You will need to add write permission in order to do the exercises.
ProceduresSuppose that we want to be able to define and invoke procedures in our simple language. Here's an example of a program that defines a procedure to compute the greatest common divisor (gcd) of two integers. It also defines a global variable, and sets that variable to the gcd of two specific values: gcd.nl[24]== int gcd(int x, int y) { while (x != y) { if (x > y) x = x - y; else y = y - x; } return x; } int x; { x = gcd(9, 12); } This macro is attached to a non-product file. Our extension of the phrase structure reflects object-oriented terminology because we will eventually make NameLan an object-oriented language: Phrase structure[25]== Declaration: MethodDecl. MethodDecl: Type Ident MethodBody. MethodBody: '(' Parameters ')' '{' DeclStmt* '}'. Parameters: [ Parameter // ',' ]. Parameter: Type Ident. Type: 'void'. Statement: Name '(' Arguments ')' ';' / 'return' [ Expr ] ';'. Factor: Name '(' Arguments ')'. Arguments: [ Expr // ',' ]. This macro is defined in definitions 2, 25, 30, 32, 41, 50, 55, 67, 76, 97, 118, and 124. This macro is invoked in definition 3. We need abstract syntax symbols to distinguish the two new identifier contexts found in `Method.con' (see Representation of identifiers of Name Analysis Reference Manual). Abstract syntax of identifiers[26]== RULE: MethodDecl ::= Type MethodDefName MethodBody END; RULE: MethodDefName ::= Ident END; RULE: Parameter ::= Type ParamDefName END; RULE: ParamDefName ::= Ident END; This macro is defined in definitions 8, 9, 26, 31, 33, 42, 51, and 120. This macro is invoked in definition 10.
Our extension of NameLan introduces new defining occurrences, applied
occurrences, and ranges.
These changes require us to modify the scope rule for
This is the first section in which the exercises will ask you to create a specification file on your own. You will need to add the names of such specification files to those of the files supplied by the tutorial. We support that by adding the name of the file `Solutions.specs' to the tutorial's list of specification files: Specification files[27]== Solutions.specs This macro is defined in definitions 12, 16, 23, 27, 74, 78, 81, 91, 117, 121, 139, 142, 146, and 152. This macro is invoked in definition 13. `Solutions.specs' is not supplied by the tutorial, but by you. In the introdution to this manual, we suggested that you create a directory containing an empty file `Solutions.specs' before beginning the exercises. (If you decided to use a workbook, `Solutions.specs' should contain the name of the workbook file.) `Solutions.specs' is entirely under your control, as is your workbook if you use one. When you need to create a specification to be used in generating a processor, simply add its name to your `Solutions.specs' or define it in your workbook.
ExercisesThese exercises are based on files defined in the Tutorial. To obtain copies of those files in your current directory, enter Eli and give the following command:
-> $elipkg/Name/LearnSG%Procs > . None of these files will have write permission in your current directory. You will need to add write permission in order to do the exercises.
|