Eli   Documents

General Information

 o Eli: Translator Construction Made Easy
 o Global Index
 o Frequently Asked Questions
 o Typical Eli Usage Errors

Tutorials

 o Quick Reference Card
 o Guide For new Eli Users
 o Release Notes of Eli
 o Tutorial on Name Analysis
 o Tutorial on Scope Graphs
 o Tutorial on Type Analysis
 o Typical Eli Usage Errors

Reference Manuals

 o User Interface
 o Eli products and parameters
 o LIDO Reference Manual
 o Typical Eli Usage Errors

Libraries

 o Eli library routines
 o Specification Module Library

Translation Tasks

 o Lexical analysis specification
 o Syntactic Analysis Manual
 o Computation in Trees

Tools

 o LIGA Control Language
 o Debugging Information for LIDO
 o Graphical ORder TOol

 o FunnelWeb User's Manual

 o Pattern-based Text Generator
 o Property Definition Language
 o Operator Identification Language
 o Tree Grammar Specification Language
 o Command Line Processing
 o COLA Options Reference Manual

 o Generating Unparsing Code

 o Monitoring a Processor's Execution

Administration

 o System Administration Guide

Mail Home

Guide for New Eli Users

Next Chapter Table of Contents


How Eli Creates a Text Processing Program

The program generated by Eli reads a file containing the text, examining it character-by-character. Character sequences are recognized as significant units or discarded, and relationships among the sequences are used to build a tree that reflects the structure of the text. Computations are then carried out over the tree, and the results of these computations determine the processor's output. Thus Eli assumes that the original text processing problem is decomposed into the problems of determining which character sequences are significant and which should be discarded, what tree structure should be imposed on significant sequences, what computations should be carried out over the resulting tree, and how the computed values should be encoded and output.

A user describes a particular text processing problem to Eli by specifying the characteristics of its subproblems. For example, the tree structure that should be imposed on significant character sequences is specified to Eli by providing a context-free grammar. Different subproblems are characterized in different ways, and they are specified by descriptions written in different languages. Those specifications are processed by a variety of tools, which verify their consistency and generate C-compatible C++ code to solve the specified subproblems.

Eli focuses the user's attention on specification by automating the process of tool invocation. It responds to a request for a piece of computed information, called a derived object, by invoking the minimum number of tools necessary to produce that information. Derived objects are automatically stored by Eli for later re-use, significantly improving the response time for subsequent requests.

This chapter provides an overview of the most important subproblems and how they can be described, summarizes the available descriptive mechanisms and explains the most common derived objects, and indicates how Eli is used to create and test programs.

How to Decompose a Text Processing Problem

There is considerable variability in the specific decompositions for particular text processing problems. For example, one subproblem of the compilation problem for many programming languages is overload resolution: how to decide that (say) a particular + operator means integer addition instead of real addition. Overload resolution would probably not be a subproblem of the problem of creating a PostScript version of a musical score from a description of the desired notes. This section briefly reviews five major subproblems common to a wide range of problems:

Syntactic analysis
Determining the structure of an input text

Lexical analysis
Recognizing character sequences

Attribution
Computing values over trees

Property storage
Maintaining information about entities

Text generation
Producing structured output

Syntactic analysis determines the phrase structure of the input text. The phrase structure is described to Eli by a context-free grammar. Here is a context-free grammar describing the structure of a geographical position (see Context-Free Grammars and Parsing of Syntactic Analysis):

Position: Latitude Longitude .
Latitude: NS Coordinate .
Longitude: EW Coordinate .
NS: 'N' / 'S' .
EW: 'E' / 'W' .
Coordinate: Integer Float .

This grammar has eight symbols (Position, Latitude, Longitude, NS, EW, Coordinate Integer and Float), four literals ('N', 'S', 'E' and 'W') and eight rules (x: y / z . is an abbreviation for the two rules x: y . and x: z .). Two of the symbols, Integer and Float, are not defined by the grammar. One of the symbols, Position, is not used in the definition of any other symbol.

Symbols that are not defined by the grammar are called terminal symbols; those defined by the grammar are called nonterminal symbols. The (unique) symbol not used in the definition of any other symbol is called the axiom of the grammar.

The entire input text is called a sentence, and corresponds to the axiom. Thus the input text N41 58.8 W087 54.3 is a sentence corresponding to Position. A phrase is a portion of the sentence corresponding to a nonterminal symbol of the grammar. For example, N41 58.8 is a phrase corresponding to Latitude, and 087 54.3 is a phrase corresponding to Coordinate. 54 and 58.8 W087 are not phrases because they do not correspond to any symbols. (54 is a part of the phrase 087 54.3 corresponding to Coordinate and 58.8 W087 is made up of part of the phrase corresponding to Latitude and part of the phrase corresponding to Longitude.)

Lexical analysis is the process that examines the source program text, retaining significant character sequences and discarding those that are not significant. A character sequence is significant if it corresponds to a literal in the grammar or to a terminal symbol. In the sentence N41 58.8 W087 54.3, the significant character sequences are N, 41, 58.8, W, 087 and 54.3. The spaces separating the numbers of a Coordinate and preceding the Longitude are not significant.

Eli obtains information about character sequences that correspond to literals directly from the grammar. The user must, however, provide descriptions of the character sequences corresponding to each terminal symbol. Those descriptions determine the form of the character sequences, but not their content. For example, the character sequences corresponding to the terminal symbol Integer might be described as sequences of one or more decimal digits, and those corresponding to the terminal symbol Float might be described as pairs of such sequences separated by a dot (see Regular Expressions of Lexical Analysis):

Integer:  $[0-9]+
Float:    $[0-9]+"."[0-9]+

Computations may be specified over a tree that reflects the phrase structure of a sentence. That tree has one node corresponding to each phrase, with the root corresponding to the sentence. For example, the tree that reflects the structure of the sentence N41 58.8 W087 54.3 has a node corresponding to the Latitude phrase N41 58.8. Children of a node correspond to the immediate component phrases of the phrase corresponding to that node. Thus the children of the the node corresponding to the Latitude phrase N41 58.8 would correspond to the phrases N and 41 58.8, because the immediate components of the Latitude phrase N41 58.8 are an NS phrase (N) and a Coordinate phrase (41 58.8).

One or more values, called attributes, may be associated with each tree node. Computations over the tree may involve only attribute access, C constants, and function application. Here is an example:

RULE DegMin: Coordinate ::= Integer Float
COMPUTE
  Coordinate.minutes=Minutes(Integer, Float);
END;

According to the rule DegMin, Coordinate has a minutes attribute attached to it. The minutes attribute of Coordinate is computed by applying the function Minutes to the values representing Integer and Float, but how are those values obtained? Clearly they must depend on the character sequences corresponding to these terminal symbols.

A single integer value can be used to represent any terminal symbol. That value is determined by a token processor whose name is attached to the definition of the character sequences corresponding to the symbol, enclosed in brackets (see Token Processors of Lexical Analysis):

Integer:  $[0-9]+           [mkstr]
Float:    $[0-9]+"."[0-9]+  [mkstr]

The processor mkstr is a library routine that stores the character sequence in an array and yields the sequence's index. Minutes can then use the index values to obtain the stored strings, convert them to numbers, and perform the necessary computation.

Often an input text describes some set of entities and the relationships among them. For example, a program in a conventional language may describe some set of constants, variables, types and procedures, and how these entities are related in the execution of an algorithm. Entities may be defined by one part of the input text and used by another. It is therefore necessary for computations over the tree representing the phrase structure of a sentence to be able to refer to entities and their properties at arbitrary points. Eli provides a definition table to meet this requirement (see Definition Table Design Criteria of Definition Table).

Each entity can be represented by a unique definition table key, which allows access to arbitrary information about that entity. The Eli user specifies the information that might be stored, and possibly a set of query and update operations for that information. (Eli provides a standard query operation and a standard update operation that suffice for most purposes.)

Library modules are available for associating unique definition table keys with identifiers according to the scope rules of the input text, and for maintaining various kinds of information about entities. Definition table keys themselves can be stored as attributes, compared for equality, passed to query and update routines, and accessed either directly or remotely. A distinguished value, NoKey, represents the absence of a definition table key.

Output may be derived from arbitrary information in the input text. The elements of the output may be arranged in arbitrary ways based on computations over the tree representing the phrase structure of a sentence. It is therefore useful to be able to build up a complex output data structure piecemeal, combining components according to information gleaned from computation.

The program text generation facility allows the user to specify templates that describe output text fragments (see Pattern Specifications of PTG: Pattern-Based Text Generator). "Holes" in these templates can be filled with text generated according to other templates. The result is a directed, acyclic graph in which each node represents a single template and the children of that node represent generated text fragments. Text at the leaves can be generated by arbitrary user-supplied routines. (A library module provides common leaf generation routines.)

Eli generates a set of functions, one per template, that are invoked during computations to build the directed, acyclic graph. These functions return values that can be stored as attributes, passed to text generation functions, and accessed either directly or remotely. A distinguished value, PTGNULL, represents the absence of a graph.

Printing functions are also provided by Eli to output the generated text on an arbitrary file (including the standard output unit). These functions accept a graph and perform a depth-first, left-to-right traversal. The text is output during the traversal, with text generated by a common subgraph being output once for each parent of that subgraph.

Descriptive Mechanisms Known to Eli

The Eli user describes the subproblems of a particular text processing problem. Eli derives code fragments from these descriptions, and combines them into a complete program. Each description is given in a notation that is ideally suited to the subproblem being described.

Subproblem descriptions are placed into files, each of which has a type. The type is indicated by the file name extension: `foo.c' is a type-`c' file. Eli recognizes type-`c' and type-`h' files as C program text and include files respectively. Here is a list of the other file types recognized by Eli:

`specs'
A collection of object names, one per line.

`con'
A description of the phrase structure of the input text. Each phrase may be associated with a computation to be carried out when that phrase is recognized. Eli generates a parser from these specifications. See Syntactic Analysis.

`gla'
A description of character sequences, whether they are meaningful or not, and what (if any) computation should be carried out when they are recognized in the input text. Eli generates a scanner from these specifications. See Lexical Analysis.

`lido'
A description of the structure of a tree and the computations to be carried out on that tree. Eli generates a tree-walking evaluator from these specifications. For a discussion on constructing computations in trees, see LIDO - Computation in Trees. For reference, see LIDO - Reference Manual.

`ctl'
Constraints on evaluator generation. Eli uses these specifications to modify its behavior when constructing the routine that carries out the computations. See LIGA Control Language Reference Manual.

`ptg'
A description of structured output text. Eli generates a set of output functions from these specifications. See Pattern-Based Text Generator.

`pdl'
A definition of entities and their associated properties. Eli generates a definition table module from these specifications. See PDL Reference Manual.

`oil'
A definition of possible tree node re-labeling. Eli generates a re-labeling module from these specifications. See OIL Reference Manual.

`clp'
A description of the meanings of command line arguments. Eli generates a module that accesses command line arguments from these specifications. See CLP Reference Manual.

`map'
A description of the relationship between the phrase structure of the input text and the structure of the tree over which computations are to be made. Eli uses this specification to determine the tree building actions that must be attached to rules of the parsing grammar. See Mapping of Syntactic Analysis.

`sym'
This is provided for backward compatibility with previous Eli releases for specifying symbolic equivalence classes. It is superseded by type-`map' files. See Specifying symbolic equivalence classes of Syntactic Analysis.

`delit'
Specifies literals appearing in a type-`con' file that are to be recognized by special routines. Each line of a type-`delit' file consists of a regular expression (see Regular Expressions of Lexical Analysis) optionally followed by an identifier. The regular expression defines the literal to be recognized specially. A #define directive making the identifier a synonym for that literal's syntax code is placed in the generated file `litcode.h'.

`str'
Specifies initial contents of the identifier table. Each line of a type-`str' file consists of two integers and a sequence of characters. The first integer is the syntax code to be returned by mkidn (see Unique Identifier Management of Library Reference Manual), and the second is the length of the character sequence. The integer representing the length is terminated by a single space. The character sequence begins immediately after this space, and consists of exactly the number of characters specified by the length.

`gnrc'
Defines a generic module. Generic modules can be instantiated to yield collections of specifications that solve specific problems.

`fw'
Combines a collection of strongly-coupled specifications with documentation describing their relationships. Eli splits these specifications according to their types and processes them individually. It can also create a printed document or on-line hypertext from a type-`fw' file.

`phi'
Material to be included in some part of the generated processor. Specification files of this type should have a name consisting of three parts: foo.bar.phi. All the files whose names end in `bar.phi' are concatenated together in arbitrary order to form a file named bar.h. An #include directive can then be used to incorporate `bar.h' into any generated file.

`.phi'-file-parts may also be generated by different Eli-Tools. They may only be used in files of type `.h' and `.c'. They are automatically protected against multiple inclusion.

`eta'
Material to be included in some part of the specifications. Specification files of this type should have a name consisting of three parts: foo.bar.eta. All the files whose names end in `bar.eta' are concatenated together in arbitrary order to form a file named bar.eta.h. An #include directive can then be used to incorporate `bar.eta.h' in any specification file.

`.eta'-file-parts can be used in any specification file with the exception of `.specs' and `.fw'-files. The generated include-files are not protected against multiple inclusion.

Any of these files can contain C-style comments and preprocessor directives such as #include, #define and #ifdef. The C preprocessor is applied to files of all types except type-`fw' before those files are examined. C-style comments and preprocessor directives appearing in type-`fw' files are passed unchanged to the files generated from the type-`fw' file.

Eli includes three pre-defined header files, which are usually generated from type-`phi' specifications, in specified places:

`HEAD.h'
Included at the beginning of the main program, the tree constructor, and the attribute evaluator. This header file is used primarily to define abstract data types used in tree computation.

`INIT.h'
Included at the beginning of the main program's executable code. `INIT.h' may contain declarations, but only if they appear at the beginning of compound statements that lie wholly within `INIT.h'. The content of `INIT.h' will be executed before any other code. Its primary purpose is to initialize abstract data types used in tree computation.

`FINL.h'
Included at the end of the main program's executable code. `FINL.h' may contain declarations, but only if they appear at the beginning of compound statements that lie wholly within `FINL.h'. The content of `FINL.h' will be executed after all other code. Its primary purpose is to finalize abstract data types used in tree computation.

Common Derived Objects

Eli recognizes three kinds of object: a file, a string and a list. Examples of files are a specification file such as those mentioned in the previous section, an executable binary file, or an output file from a test run. A flag to a command is an example of a string. Lists are ordered sequences of objects, such as the arguments to a command or the Makefile, C files, and header files that implement a text processor.

Source objects can be created or modified directly by the user. They can be regular files, directories, or symbolic links. Source objects cannot be automatically recreated by Eli; they are the basic building blocks from which Eli creates all other objects. Every source object is given a type by Eli based on its host filename, and this type determines what derived objects can be produced from the source object.

The file type of a source file is the longest suffix of the file name that matches one of the source type suffixes listed in the last section. If no suffix match is found, the file type is empty.

Derived objects are objects that can be produced from source objects and other derived objects through the invocation of one or more tools. Tools are invoked only as needed to create a specified derived object. Eli automatically caches derived objects for re-use in future derivations. Derived objects are created and modified only by Eli itself, not by users.

A derived object is named by an odin-expression (see Referring to Objects of User Interface). Lexically, an odin-expression is composed of a sequence of identifier and operator tokens, and is terminated by a newline character. An odin-expression can be continued on multiple lines by escaping each newline character with a backslash. This backslash (but not the newline) is deleted before the expression is parsed. Multiple odin-expressions can be specified on the same line by separating them with semicolon operators.

An identifier token is just a sequence of characters. The following characters must be escaped to be included in an identifier:

: + = ( ) / % ; ? $ < > ! # \ ' space tab newline

A single character can be escaped by preceding it with a backslash (e.g. lost\+found). A sequence of characters can be escaped by enclosing them in single quote marks (e.g. 'lost+found').

Unescaped white space characters (spaces, tabs, and newlines) are ignored during parsing except when they separate adjacent identifiers.

Here are a number of odin-expressions that name common objects derived from the same collection of specifications (all of the spaces are redundant). The identifier following a colon (:) is an object type (or product) that characterizes the properties of the derived object, while the identifier following a plus (+) is a parameter type that modifies those properties without changing the nature of the derived object. (For the characteristics of all of the products and parameters defined by Eli, see Top of Products and Parameters.)

sets.specs :exe
is the executable program generated by Eli from the specifications enumerated in sets.specs. It is a normal program for the machine on which it was generated, and is independent of Eli.

sets.specs :source
is a set of C files, a set of header files, and a Makefile. The result of running make with this information is the executable program generated by Eli from the specifications enumerated in sets.specs.

sets.specs :exe :help
is a browser session that helps to explain inconsistencies in the specifications enumerated in sets.specs. It provides cross-references to on-line documentation and allows you to invoke an editor on the proper files to make corrections.

. +cmd=(sets.specs :exe) (input) :run
is the result of running the program generated by Eli from the specifications enumerated in sets.specs as a command with the file `input' from the current directory as an argument. The name of the directory (in this case ., the name of the current directory) in which the program is to be executed precedes the parameter that defines the command to be executed.

sets.specs +monitor +arg=(input) :mon
is an interaction with the program generated by Eli from the specifications enumerated in sets.specs, as it processes the data file `input'. This interaction allows you to follow the execution at the level of your specifications, rather than at the level of the machine on which the program is running.

sets.specs +debug :dbx
is an interaction with the program generated by Eli from the specifications enumerated in sets.specs using the symbolic debugger of the machine on which the program is running. It is useful when some of your specifications are written directly in C. (Replace dbx with gdb to use the GNU symbolic debugger.)

sets.specs :gencode :viewlist
is an interactive shell executing in a directory containing all text files generated by Eli from the specifications enumerated in sets.specs. This facility is sometimes useful in diagnosing compiler errors due to type mismatches.

sets.specs :exe :err >
is the raw set of reports generated by inconsistencies in the specifications enumerated in sets.specs, written to the screen. (It would be sent to your editor if you replaced > with <.) This display is sometimes useful if the reports are garbled by the help derivation.

sets.specs :exe :warn >
is the raw set of reports generated by anomalies in the specifications enumerated in sets.specs, written to the screen. (It would be sent to your editor if you replaced > with <.) This display is sometimes useful if the reports are garbled by the help derivation.

How to Request Product Manufacture

Eli is invoked by giving the command eli. If you have never used Eli before, it will have to establish a cache (see Hints on Cache Management). This process is signaled by a long sequence of messages about installing packages, followed by a note that the packages have been compiled.

After printing an identifying banner, Eli writes the prompt -> and waits for input. The interactive session can be terminated by responding to the -> prompt with a ^D, and you can browse the documentation by responding with a question mark.

Entering a derived object name in response to the -> prompt constitutes a request to bring that derived object up to date with respect to all of the source objects on which it depends. Eli will carry out the minimum number of operations required to satisfy this request. When the next -> prompt appears, the given object will be up to date.

Bringing an object up to date does not yield a copy of that object. To obtain a copy, you must add an output request. The precise form and effect of an output request depends on whether the object being output is a file object or a list object. All source objects are file objects; to find out the kind of a derived object, consult Top of Products and Parameters.

Here are some examples of common output requests:

sets.specs :parsable >
requests that the derived file object sets.specs :parsable be written to the standard output (normally the screen). Garbage will result if the derived object is not a text file.

sets.specs :exe > trans
requests that the derived file object sets.specs :exe be written to file `trans'. The derived object must be a file object, but it need not be text. If the file `trans' does not exist, it will be created; if it does exist, it will be overwritten if its content differs from that of the derived object sets.specs :exe. If `trans' exists and is a directory, a file named sets.specs.exe will be written to that directory (see Extracting and Editing Objects of User Interface).

sets.specs :source > src
requests that the derived list object sets.specs :source be written to directory `src'. The directory `src' must exist. A file in `src' before the request will be overwritten only if it has the same name as one of the file objects in the list sets.specs :source, but different content. (Normally, `src' would be either an empty directory or one that contains an earlier version of sets.specs :source.)

sets.con <
requests that your current editor be invoked on the object sets.con

How to Invoke Unix Commands

While one is interacting with Eli, there are a number of situations in which one wishes to execute normal operating system commands. These commands can be executed with or without derived objects as arguments. We have already seen the most general form, a derived object that is an execution of an arbitrary command in an arbitrary directory (see Common Derived Objects). Although this facility is general enough to handle any command execution, it is cumbersome for simple commands.

The ! character introduces a host command line (see Unix Commands of User Interface). If the first non-white space character following the ! is not :, ; or = then the rest of the line is treated as a single, escaped sequence of characters. This avoids the confusion resulting from interactions between the escape conventions of host commands and odin-expressions. A leading :, ;, = or whitespace can be included in the escaped sequence by preceding it with \.

If the name of a file object precedes the ! character, that object is brought up to date and the name of a file containing it is appended to the host command line.

Here are examples of some typical command invocations using !:

!ls
lists the files in the current directory.

!mkdir src
makes a new subdirectory of the current directory.

(sets.specs :exe) ! size
provides information about the space used by the processor generated from sets.specs.

input +cmd=(sets.specs:exe) :stdout ! diff desired
compares the file `desired' with the result of applying the processor generated from sets.specs to the file `input'.


Next Chapter Table of Contents