Eli   Documents Get Eli: Translator Construction Made Easy at SourceForge.net.
    Fast, secure and Free Open Source software downloads

General Information

 o Eli: Translator Construction Made Easy
 o Global Index
 o Frequently Asked Questions
 o Typical Eli Usage Errors

Tutorials

 o Quick Reference Card
 o Guide For new Eli Users
 o Release Notes of Eli
 o Tutorial on Name Analysis
 o Tutorial on Type Analysis
 o Typical Eli Usage Errors

Reference Manuals

 o User Interface
 o Eli products and parameters
 o LIDO Reference Manual
 o Typical Eli Usage Errors

Libraries

 o Eli library routines
 o Specification Module Library

Translation Tasks

 o Lexical analysis specification
 o Syntactic Analysis Manual
 o Computation in Trees

Tools

 o LIGA Control Language
 o Debugging Information for LIDO
 o Graphical ORder TOol

 o FunnelWeb User's Manual

 o Pattern-based Text Generator
 o Property Definition Language
 o Operator Identification Language
 o Tree Grammar Specification Language
 o Command Line Processing
 o COLA Options Reference Manual

 o Generating Unparsing Code

 o Monitoring a Processor's Execution

Administration

 o System Administration Guide

Mail Home

Guide for New Eli Users

Previous Chapter Next Chapter Table of Contents


Example of Eli Use

The example in this chapter illustrates how text processors are specified to Eli. Each section covers a major step in the development process, discussing the purpose of that step and then carrying it out for the example. A set of exercises is provided with each section. The purpose of these exercises is to familiarize you with the basic facilities that Eli provides for dealing with specifications, and how Eli is typically used.

All of the text used in the exercises can be obtained, and the exercises themselves can be carried out, using the facilities of Eli's system documentation browser described in the first set of exercises given below (see Exercises).

Statement of the problem to be solved

You need to classify a collection of several thousand words. There are no duplicate words in any class, but a given word may belong to more than one class. The classes have arbitrary names, and no two classes may have the same name.

A C program will manipulate the words and classes. Because of the application, classes will be relatively fluid. The customer expects that new words and classes will be added, and the classification of existing words changed, on the basis of experience and changing requirements. Nevertheless, the system design requires that the data be built into the C program at compile time.

The system designers have settled on an internal representation of the data involving an array for each class containing the words in that class as strings, an array containing the class names as strings, and an array containing the sizes of the classes as integers. The number of classes is also given. All of these data items are to be specified in a single header file. Here is an example of such a header file for a simple classification:

int number_of_sets = 3;

char *name_of_set[] = {
"colors",
"bugs",
"verbs"};

int size_of_set[] = {
3,
5,
4};

char *set_of_colors[] = {
"red",
"blue",
"green"};

char *set_of_bugs[] = {
"ant",
"spider",
"fly",
"moth",
"bee"};

char *set_of_verbs[] = {
"crawl",
"walk",
"run",
"fly"};

char **values_of_set[] = {
set_of_colors,
set_of_bugs,
set_of_verbs};

Although the meaning of the internal representation is straightforward, it is quite clear that making the necessary alterations will be a tedious and error-prone process. Any change requires compatible modifications of several arrays. For example, moving a word from one class to another means not only cutting and pasting the word itself, but also changing the sizes of both classes.

It would be simpler and safer to define the classification with a notation ideally suited to that task, and generate the header file from that definition. Here is an example of an obvious notation, defining the classification represented by the header file given above:

colors{red blue green}
bugs{ant spider fly moth bee}
verbs{crawl walk run fly}

The remainder of this chapter discusses the specification and generation of a program that translates class descriptions into header files. This program must accept a class description, verify that class names are unique and that there is only one occurrence of any given word in a class, and then write a header file defining the appropriate data structure. Its specification is broken into four parts, each stored in a separate file:

`sets.con'
A context-free grammar describing the structure of the input text.

`word.gla'
A specification of the character sequences that are acceptable words, and how those character sequences should be represented internally, plus a specification of the character sequences to be ignored.

`symbol.lido'
A specification of the context conditions on uniqueness of set names and elements within a single set.

`code.fw'
A specification of the form of the output text and how it is constructed.

File `sets.specs' lists the names of these four files and contains requests to instantiate three library modules that support them.

Exercises

Invoke Eli and type ?, followed by a carriage return. This request will begin a documentation browsing session. Use the browser's goto command to place yourself at node (novice)tskex (Browser commands are described in Some Advanced Info Commands of Info.)

If you are using a computer with multiple-window capability, your documentation browsing session is independent of your Eli session, so you can simultaneously browse the documentation and make Eli requests. Otherwise you must terminate the document browsing session in order to make an Eli request. In that case, you might want to make a note of your current node (given in the highlighted status line) before exiting. When you begin a new session, you can then use the g command to go directly to that node.

  1. Use the documentation browser's run command to obtain a copy of the complete specification. You will use this copy to do the exercises in the remainder of this chapter.

  2. Verify that you have obtained all of the specification files by making the following Unix request via Eli (see Running Unix commands from Eli of User Interface Reference Manual):

    -> !ls
    

  3. Examine the file that is not a part of the specification by requesting Eli to display it on screen (see Copying to Standard Output of User Interface Reference Manual):

    -> input>
    

Specifying the desired phrase structure

The first step in specifying a problem to Eli is to develop a context-free grammar that describes the phrase structure of the input text. This structure must reflect the desired semantics, and it must be possible for Eli to construct a parser from the grammar. Grammar development is a surprisingly difficult task. It is best to concentrate on the meaning of the tree structure as you are developing the grammar, and not try to economize by using the same symbols to describe constructs that look the same but have different meanings.

One possible description of the structure of the set definition text is (see How to describe a context-free grammar of Syntactic Analysis):

text: set_defs .
set_defs: set_def / set_defs set_def .
set_def: set_name '{' set_body '}' .
set_name: word .
set_body: elements .
elements: set_element / elements set_element .
set_element: word .

Here each set definition is described as a set_name followed by a bracketed set_body. The text will be made up of an arbitrary number of such definitions. A set_body, in turn, consists of an arbitrary number of elements. The set_name and each set_element is a word.

This structure represents the semantics of the input text: Set names and set elements have different meanings, even though they are both written as words. Set bodies are significant units, even though they have the same form as any subset of themselves. The following specification would not reflect the semantics, even though it is simpler and describes the same input language:

text: set_defs .
set_defs: set_def / set_defs set_def .
set_def: word '{' elements '}' .
elements: word / elements word .

Exercises

To get information about whether Eli can construct a parser from a grammar without actually trying to build the whole program, use the :parsable product (see parsable of Products and Parameters Reference Manual). In order to be parsable, the grammar must satisfy the LALR(1) condition. If the LALR(1) condition is satisfied, parsable will indicate that fact. Otherwise it will say that the grammar is not LALR(1) and provide a listing of conflicts (see How to Resolve Parsing Conflicts of Syntactic Analysis).

  1. Tell Eli to explain what it is doing, and request verification that Eli can generate a parser. Append > to the derivation to tell Eli to copy the results on the screen.

    -> LogLevel=4
    -> sets.specs :parsable>
    

    When the process is complete, repeat the last request. To save keystrokes, you can scroll through the history (see The History Mechanism of User Interface) using the up- and down-arrow keys.

    Explain the very different responses to the two requests for verification of parsability.

  2. Request an editing session on the file `sets.con' (see Editing with the Copy Command of Eli User Interface Reference Manual):

    -> sets.con<
    

    Delete text from the first rule of the grammar, leaving the colon and everything following it unchanged. Then request execution as before, scrolling through the history:

    -> sets.specs:parsable>
    

    Why was Eli's response so much shorter than before?

  3. Obtain help diagnosing the error you created by deleting text:

    -> sets.specs :parsable :help
    

    This request will start a documentation browsing session. Follow the menu to the display for the file in error, and use the edit command to gain access to that file. Correct the error by inserting text before the colon, exit the editor, and then quit the browsing session. Use the Eli history mechanism to repeat the last request:

    -> sets.specs :parsable :help
    

    Explain Eli's response to this request. Why was no documentation browsing session started? Why was the response so much shorter than the response to the original request for derivation and execution of the processor?

  4. Your request for help after fixing the error really didn't demonstrate that the grammar was parsable, because it didn't show you the result. Request the result:

    -> sets.specs :parsable>
    

    Explain Eli's response to this request. What steps were carried out to satisfy it? Why were these the only ones necessary?

  5. Delete the braces { } from the rule defining set_def in file `sets.con'. This change makes the entire input text nothing but a list of words, with no differentiation between set names and elements or between different sets. Eli will not be able to generate a parser from this grammar. Request verification of parsability to see the error report:

    -> sets.specs :parsable >
    

    A shift-reduce conflict is a situation in which the parser can't tell whether it should recognize a complete phrase or continue to add symbols to an incomplete phrase. In this example, the next word is either the name of a new set (and thus the current elements phrase is complete), or it is another set element (and thus belongs to the current elements phrase).

    Add a comma as a separator between set elements, but do not re-introduce the braces. Do you think that Eli will be able to generate a parser? Briefly explain, and then verify your answer. (For a more complete treatment of conflict resolution, see How to Resolve Parsing Conflicts of Syntactic Analysis.)

    Restore the original specification, or change `input' to conform to your new specification, to ensure correct behavior for later exercises.

Nonliteral character sequences and comments

The terminal symbols of the grammar are the literal braces and the nonliteral symbol word. Eli can easily deduce that the braces are significant characters, but we must provide a definition of the significant character sequences that could make up a word. We must also describe how to capture the significant information in a word for further processing.

Eli normally assumes that white space characters (space, tab and newline) are not significant. If we want to provide a facility for commenting a classification then we must additionally define the form of a comment and specify that character sequences having this form are also not significant.

Here is one possible description of the non-literal character sequences and comments:

word:   $[a-zA-Z]+      [mkidn]
        C_COMMENT

The first line defines a word to be any sequence of one or more letters (see Regular Expressions of Lexical Analysis). Whenever such a sequence is recognized in the input text, mkidn is invoked to capture the significant information represented by the sequence. This processor associates an integer with the recognized sequence, and arranges for that integer to become the value representing the character sequence. If two character sequences recognized as words are identical, mkidn will represent them with the same integer; distinct sequences are represented by different integers.

The second line of the specification does not begin with a symbol followed by :, which indicates that the character sequences it describes are not significant. It uses a canned description to describe character sequences taking the form of C comments (see Canned Symbol Descriptions of Lexical Analysis). Thus any character sequence taking the form of a C comment will be ignored in the input text read by the generated program.

Exercises

  1. Tell Eli to keep quiet about what it is doing, and then ask it to run the processor derived from your specification:

    -> LogLevel=2
    -> input +cmd=(sets.specs :exe) :stdout >
    

  2. The sample input file does not contain either comments or errors. Introduce an error by inserting a digit into one of the words of the example and repeat your request:

    -> input<
    -> input +cmd=(sets.specs :exe) :stdout >
    

    Briefly explain Eli's response to this request.

  3. Verify that the generated processor correctly handles C comments.

  4. Change the specification to accept only words that begin with upper-case letters. Generate a translator and verify its correctness.

  5. Change the specification to allow Ada comments instead of C comments. (Hint: see Canned Symbol Descriptions of Lexical Analysis.) Generate a translator and verify its correctness.

Managing source text definitions

The statement of the problem requires that the names of the classes be unique, and that there be only one occurrence of a given word in a given class. This sort of condition is very common in translation problems. It involves recognition of regions and specific entities within those regions. For example, a set_body is a region and a set_element is a specific entity within that region. The check to be made is that no set_element appears more than once in any set_body.

Regions are defined by the grammar. Entities may be defined both by the grammar and by the values representing the terminal symbols: the grammar selects a particular kind of phrase, while the instances of this phrase are differentiated by the values of their terminal symbols. Some computation must be carried out over the region to verify the condition. This computation is standard, involving only the concepts of region and entity, so a generic module can be used to carry it out.

In order to use a generic module we must instantiate that module and connect the concepts it provides to the specification of our problem. Instantiation is handled in the `sets.specs' file:

$/Name/AlgScope.gnrc :inst
$/Prop/Unique.gnrc :inst

The AlgScope module provides the concept of nested regions containing entities distinguished by integer values (see Algol-like Basic Scope Rules of Specification Module Library: Name Analysis). The Unique module provides the concept of an error for entities that appear more than once in a region (see Check for Unique Object Occurrences of Specification Module Library: Properties of Definitions).

An attribute grammar fragment is used to connect the concepts provided by these two modules to the specification of the phrase structure:

ATTR Sym: int;

SYMBOL Entity INHERITS IdDefScope, Unique COMPUTE
  SYNT.Sym=TERM;
  IF(NOT(THIS.Unique), 
     message(ERROR,"Multiply-defined word", 0, COORDREF));
END;

SYMBOL text INHERITS RootScope, RangeUnique END;
SYMBOL set_body INHERITS RangeScope END;
SYMBOL set_name INHERITS Entity END;
SYMBOL set_element INHERITS Entity END;

The symbol Entity, which does not occur in the context-free grammar, is used to represent the concept of a word that must be unique within some region: a set_name must be unique over the region lying outside of all sets, while a set_element must be unique over the set in which it appears. Entity allows us to gather all of the characteristics of that concept in one place -- a symbol attribution -- and then use inheritance to associate those characteristics with the symbols that embody the concept.

An Entity appears in the input text as a word, which is represented internally by the value representing the terminal symbol word. That value was established by mkidn when the word was recognized (see Nonliteral character sequences and comments).

The two symbols inheriting Entity, set_name and set_element, are defined by rules that have terminal symbols on their right-hand sides. TERM can be used in a symbol computation to represent the value of a terminal symbol appearing on the right-hand side of any rule defining the given symbol (see Terminal Access of LIDO Reference Manual). Thus SYNT.Sym=TERM sets the Sym attribute of the Entity to the value representing the terminal symbol defining that Entity. (SYNT means that the computation takes place in the lower context of the symbol, i.e. in the rules corresponding to the phrases set_name: word and set_element: word. See Types and Classes of Attributes of LIDO Reference Manual.)

The concept of a definition within a region is embodied in the symbol IdDefScope exported by the AlgScope module, while the concept that such a definition must be unique is embodied in the symbol Unique, exported by the Unique module. Thus Entity inherits from these two symbols.

If the Unique attribute of the Entity is false, then the message operation is invoked to output an error report. The report has severity ERROR, which indicates that a definite error has been detected and therefore no object code should be produced (see Source Text Coordinates and Error Reporting of Library Reference Manual). It is placed at the source text coordinates (line and column) represented by COORDREF, and consists of the string Multiply-defined word. COORDREF always refers to the location of the leftmost character of the phrase corresponding to the rule in which it appears. Since this computation is inherited by set_name and set_element, it appears in rules corresponding to the phrases set_name: word and set_element: word. Any error report will therefore refer to the leftmost character of the multiply-defined word.

The text is the outermost region (RootScope), within which the set names are defined. Each set body is an inner region (RangeScope), within which the set elements are defined. Therefore text inherits the RootScope computations and set_body inherits the RangeScope computations.

A set_name must be unique within the text and a set_element must be unique within its set_body, so both text inherits the RangeUnique computations.

Exercises

  1. Create a subdirectory `src' and then request source code for the set processor (see Running Unix commands from Eli of User Interface Reference Manual):

    -> !mkdir src
    -> sets.specs :source >src
    

    Does `src' really contain a version of the translator that is independent of Eli? How can you be certain?

  2. Try deriving source code without first creating a directory:

    -> sets.specs :source >foo
    

    What was the result? Can you explain what happened?

  3. Request an executable version of the translator:

    -> sets.specs :exe >sets.exe
    

    Is this executable independent of Eli? How can you be certain?

Creating structured output text

At least two specifications, and sometimes more, are needed to create structured output text. The form of the text must be described, along with the source of its components. Eli generates a set of procedures from the description of the form of the text, one for each kind of component. The information about the components is then provided by a computation that invokes these procedures with appropriate arguments.

We have already seen how Eli allows us to break the specification into files that encapsulate individual tasks. So far, each of the tasks has required only one kind of specification. Here, however, we have a single task that requires at least two specifications. There is strong coupling between these specifications, however, so that a change to one will often involve a change to the other. The solution is to combine such related specifications into a single file of type `fw'. A type-`fw' file describes a set of specification files that Eli will separate as necessary, but which the user can manipulate as a single entity.

The variable atoms of the generated C code are integers (like the number of elements in a set) and strings (like the elements of a set). LeafPtg (see PTG Output of Leaf Nodes of Specification Module Library: Generating Output) is a generic module that provides operations useful in writing such atoms, so it is instantiated in file `sets.specs':

$/Output/LeafPtg.gnrc :inst

Three related specifications are used to describe the creation of the C declarations. First the general form of the declarations is given in a special language called PTG, then there is an attribute grammar fragment that computes the individual components, and finally two C macros are needed to implement operations used in the computations. All three specifications are combined in a single type-`fw' file called `code.fw':

@O@<code.ptg@>@{
Table:
  "int number_of_sets = " $/*integer*/ ";\n\n"
  "char *name_of_set[] = {\n"
  $/*list of set names*/ "};\n\n"
  "int size_of_set[] = {\n"
  $/*list of set sizes*/ "};\n\n"
  $/*list of sets*/
  "char **values_of_set[] = {\n"
  $/*list of set representations*/ "};"

Set:
  "char *set_of_" $/*set name*/ "[] = {\n"
  $/*list of set elements*/ "};\n\n"

Seq:
  $ $

List:
  $ ",\n" $

Quoted:
  "\"" $ "\""

Name:
  "set_of_" $
@}

@O@<code.lido@>@{
ATTR Ptg: PTGNode;
SYMBOL Entity INHERITS IdPtg END;

SYMBOL text COMPUTE
  IF(NoErrors,
    PTGOut(
      PTGTable(
        PTGNumb(CONSTITUENTS set_name.Sym WITH (int, ADD, ARGTOONE, ZERO)),
        CONSTITUENTS set_name.Ptg WITH (PTGNode, PTGList, PTGQuoted, PTGNull),
        CONSTITUENTS set_body.Size WITH (PTGNode, PTGList, PTGNumb, PTGNull),
        CONSTITUENTS set_def.Ptg WITH (PTGNode, PTGSeq, IDENTICAL, PTGNull),
        CONSTITUENTS set_name.Ptg WITH (PTGNode, PTGList, PTGName, PTGNull))));
END;

ATTR Size: int;
SYMBOL set_body COMPUTE
  SYNT.Size=CONSTITUENTS set_element.Sym WITH (int, ADD, ARGTOONE, ZERO);
END;

SYMBOL set_def COMPUTE
  SYNT.Ptg=
    PTGSet(
      CONSTITUENT set_name.Ptg,
      CONSTITUENTS set_element.Ptg WITH (PTGNode, PTGList, PTGQuoted, PTGNull));
END;
@}

@O@<code.HEAD.phi@>@{
#include "err.h"
#define NoErrors (ErrorCount[ERROR]==0)
@}

The PTG specification, which is introduced by @O@<code.ptg@>@{ and terminated by @}, is simply a collection of parameterized templates for output. Each template is given a name, and consists of a sequence of items that will be output in the given order. Quoted C strings are output as they stand, and each $ stands for one parameter. A text fragment is constructed according to a particular template by invoking a function whose name is PTG followed by the template name. This function returns a value of type PTGNode, and must be provided with one argument of type PTGNode for each parameter.

To construct a text fragment according to the template named Quoted, for example, invoke PTGQuoted with one argument of type PTGNode. The result will be a value of type PTGNode that describes the text fragment ", followed by the text fragment described by the argument, followed by ".

The attribute grammar fragment, which is introduced by @O@<code.lido@>@{ and terminated by @}, invokes the PTG functions with appropriate arguments. PTGNumb and PTGName are defined by the LeafPtg module (see PTG Output of Leaf Nodes of Specification Module Library: Generating Output). They construct values of type PTGNode that describe the text fragment consisting of a single integer value or word respectively. These text fragments are then used in building larger text fragments, and so on.

The translated output should be produced only if no errors were detected by the translator. NoErrors is a user-defined macro that tests whether any reports of severity ERROR were issued. NoErrors must be defined as a C macro, in a file of type `.HEAD.phi'. This can be done by a segment of the FunnelWeb file introduced by @O@<code.HEAD.phi@>@{ and terminated by @}.

The error module maintains an array ErrorCount, indexed by severity, each of whose elements specifies the number of reports issued at the corresponding severity (see Source Text Coordinates and Error Reporting of Library Reference Manual). If the ERROR element of this array is 0, then no reports of severity ERROR have been issued.

In addition to building the C declarations, the attribute grammar fragment computes the total number of sets and the total number of elements in each set:

ATTR Size: int;
SYMBOL set_body COMPUTE
  SYNT.Size=CONSTITUENTS set_element.Sym WITH (int, ADD, ARGTOONE, ZERO);
END;

ADD, ARGTOONE and ZERO are built-in functions of Eli (see Predefined Entities of LIDO Reference Manual).

This computation visits nodes of the subtree rooted in the set_body node. If the node is a set_element node, function ARGTOONE is applied to the value of the Sym attribute to yield the integer value 1. If the node has no set_element descendants, then its descendants are not visited and the function ZERO is invoked with no arguments to yield the integer value 0. Otherwise the integers computed for the children of the node are combined pairwise in left-to-right order via the function ADD. See CONSTITUENT(S) of LIDO Reference Manual.

Exercises

  1. Request the code generated by the processor from the specification:

    -> sets.specs :gencode :viewlist
    

    Find the files generated from code.fw and verify the content of code.HEAD.phi. (Hint: Re-read the discussion of code.fw.)

  2. Which file contains the definition of the function PTGQuoted? (Hint: Use grep, or create a `tags' file and then give the command vi -t PTGQuoted.)

  3. Briefly explain the operation of PTGQuoted. How is the text fragment created by this function printed?

  4. Use lint to search for anomalies in the collection of C routines. Are any of those found significant? Explain briefly.

  5. Request an interactive debugging session:

    -> sets.specs +debug :dbx
    

    (If you prefer to use the GNU debugger gdb, simply replace dbx with gdb).

    Set breakpoints to stop the program in PTGQuoted, _PrPTGQuoted and PTGOut. Run the program with the name of file `input' as its command line argument. What is the order of the calls to these three routines? Explain briefly.

  6. Redefine the output text so that each array value is indented, and the closing brace is at the beginning of a line. For example, the array of set names should look like this:

    char *name_of_set[] = {
            "colors",
            "bugs",
            "verbs"
    };
    

    Generate a translator and verify its correctness.


Previous Chapter Next Chapter Table of Contents