Guide for New Eli Users
The example in this chapter illustrates how text processors are specified
to Eli.
Each section covers a major step in the development process,
discussing the purpose of that step and then carrying it out for the example.
A set of exercises is provided with each section.
The purpose of these exercises is to familiarize you with the basic
facilities that Eli provides for dealing with specifications, and how Eli
is typically used.
All of the text used in the exercises can be obtained, and the exercises
themselves can be carried out, using the facilities of Eli's
system documentation browser described in the first set of exercises given
below (see Exercises).
Suppose that you need to classify a collection of several thousand words.
There are no duplicate words in any class, but a given word may belong to
more than one class.
The classes have arbitrary names, and no two classes may have the same
name.
A C program will manipulate the words and classes.
Because of the application, classes will be relatively fluid.
The customer expects that new words and classes will be added,
and the classification of existing words changed,
on the basis of experience and changing requirements.
Nevertheless, the system design requires that the data be built into the C
program at compile time.
The system designers have settled on an internal representation of the data
involving an array for each class containing the words in that class as
strings, an array containing the class names as strings, and an array
containing the sizes of the classes as integers.
The number of classes is also given.
All of these data items are to be specified in a single header file.
Here is an example of such a header file for a simple classification:
int number_of_sets = 3;
char *name_of_set[] = {
  "colors",
  "bugs",
  "verbs"};
int size_of_set[] = {
  3,
  5,
  4};
char *set_of_colors[] = {
  "red",
  "blue",
  "green"};
char *set_of_bugs[] = {
  "ant",
  "spider",
  "fly",
  "moth",
  "bee"};
char *set_of_verbs[] = {
  "crawl",
  "walk",
  "run",
  "fly"};
char **values_of_set[] = {
  set_of_colors,
  set_of_bugs,
  set_of_verbs};
Although the meaning of the internal representation is straightforward, it
is quite clear that making the necessary alterations will be a tedious and
error-prone process.
Any change requires compatible modifications of several arrays.
For example, moving a word from one class to another means not only cutting
and pasting the word itself, but also changing the sizes of both classes.
It would be simpler and safer to define the classification with a notation
ideally suited to that task, and generate the header file from that
definition.
Here is an example of an obvious notation, defining the classification
represented by the header file given above:
colors{red blue green}
bugs{ant spider fly moth bee}
verbs{crawl walk run fly}
The remainder of this chapter discusses the specification and generation of
a program that translates class descriptions into header files.
This program must accept a class description, verify that class names are
unique and that there is only one occurrence of any given word in a class,
and then write a header file defining the appropriate data structure.
Its specification is broken into four parts, each stored in a separate
file:
- `sets.con': A context-free grammar describing the structure of the input text.
- `word.gla': A specification of the character sequences that are acceptable words, how those character sequences should be represented internally, and which character sequences are to be ignored.
- `symbol.lido': A specification of the context conditions: uniqueness of set names and of elements within a single set.
- `code.fw': A specification of the form of the output text and how it is constructed.
File `sets.specs' lists the names of these four files and contains
requests to instantiate three library modules that support them.
Invoke Eli and type ?, followed by a carriage return.
This request will begin a documentation browsing session.
Use the browser's goto command to place yourself at node
(novice)tskex
(Browser commands are described in
Some Advanced Info Commands of Info.)
If you are using a computer with multiple-window capability, your
documentation browsing session is independent of your Eli session,
so you can simultaneously browse the documentation and make Eli requests.
Otherwise you must terminate the document browsing session in order to make
an Eli request.
In that case, you might want to make a note of your current node (given in
the highlighted status line) before exiting.
When you begin a new session, you can then use the g command to go
directly to that node.
-
Use the documentation browser's
run command to obtain a copy of the
complete specification.
You will use this copy to do the exercises in the remainder of this
chapter.
-
Verify that you have obtained all of the specification files by making the
following Unix request via Eli
(see Running Unix commands from Eli of User Interface Reference Manual):
-> !ls
-
Examine the file that is not a part of the specification by requesting Eli
to display it on screen
(see Copying to Standard Output of User Interface Reference Manual):
-> input>
The first step in specifying a problem to Eli is to develop a
context-free grammar
that describes the phrase structure
of the input text.
This structure must reflect the desired semantics, and it must be possible
for Eli to construct a parser
from the grammar.
Grammar development
is a surprisingly difficult task.
It is best to concentrate on the meaning of the tree structure
as you are developing the grammar,
and not try to economize by using the same symbols
to describe constructs that look the same but have different meanings.
One possible description of the structure of the set definition text is
(see How to describe a context-free grammar of Syntactic Analysis):
text: set_defs .
set_defs: set_def / set_defs set_def .
set_def: set_name '{' set_body '}' .
set_name: word .
set_body: elements .
elements: set_element / elements set_element .
set_element: word .
Here each set definition is described as a set_name followed by a
bracketed set_body.
The text will be made up of an arbitrary number of such definitions.
A set_body, in turn, consists of an arbitrary number of elements.
Both the set_name and each set_element are written as a word.
This structure represents the semantics of the input text:
Set names and set elements have different meanings, even though they are
both written as words.
Set bodies are significant units, even though they have the same form as
any subset of themselves.
The following specification would not reflect the semantics, even
though it is simpler and describes the same input language:
text: set_defs .
set_defs: set_def / set_defs set_def .
set_def: word '{' elements '}' .
elements: word / elements word .
To get information about whether Eli can construct a parser from a grammar
without actually trying to build the whole program,
use the :parsable
product
(see parsable of Products and Parameters Reference Manual).
In order to be parsable, the grammar must satisfy the
LALR(1) condition.
If the LALR(1) condition is satisfied, parsable will indicate that
fact.
Otherwise it will say that the grammar is not LALR(1) and provide a listing
of conflicts
(see How to Resolve Parsing Conflicts of Syntactic Analysis).
-
Tell Eli to explain what it is doing, and
request verification that Eli can generate a parser.
Append > to the derivation to tell Eli to copy the results to the screen.
-> LogLevel=4
-> sets.specs :parsable>
When the process is complete, repeat the last request.
To save keystrokes, you can scroll through the history
(see The History Mechanism of User Interface)
using the up- and down-arrow keys.
Explain the very different responses to the two requests
for verification of parsability.
-
Request an editing session on the file `sets.con'
(see Editing with the Copy Command of Eli User Interface Reference Manual):
-> sets.con<
Delete text from the first rule of the grammar, leaving
the colon and everything following it unchanged.
Then request execution as before, scrolling through the history:
-> sets.specs:parsable>
Why was Eli's response so much shorter than before?
-
Obtain help diagnosing the error you created by deleting text:
-> sets.specs :parsable :help
This request will start a documentation browsing session.
Follow the menu to the display for the file in error, and use the edit
command to gain access to that file.
Correct the error by inserting text before the colon,
exit the editor, and then quit the browsing session.
Use the Eli history mechanism to repeat the last request:
-> sets.specs :parsable :help
Explain Eli's response to this request.
Why was no documentation browsing session started?
Why was the response so much shorter than the response to the original
request for derivation and execution of the processor?
-
Your request for help after fixing the error really didn't demonstrate that
the grammar was parsable, because it didn't show you the result.
Request the result:
-> sets.specs :parsable>
Explain Eli's response to this request.
What steps were carried out to satisfy it?
Why were these the only ones necessary?
-
Delete the braces { } from the rule defining
set_def in file
`sets.con'.
This change makes the entire input text nothing but a list of words,
with no differentiation between set names and elements or between different
sets.
Eli will not be able to generate a parser from this grammar.
Request verification of parsability to see the error report:
-> sets.specs :parsable >
A shift-reduce conflict
is a situation in which the parser can't tell whether it should recognize a
complete phrase or continue to add symbols to an incomplete phrase.
In this example, the next word is either the name of a new set
(and thus the current elements phrase is complete), or
it is another set element (and thus belongs to the current elements
phrase).
Add a comma as a separator between set elements, but do not re-introduce
the braces.
Do you think that Eli will be able to generate a parser?
Briefly explain, and then verify your answer.
(For a more complete treatment of conflict resolution, see
How to Resolve Parsing Conflicts of Syntactic Analysis.)
Restore the original specification, or change `input' to conform to
your new specification, to ensure correct behavior for later exercises.
The terminal symbols of the grammar are the literal braces and the
nonliteral symbol word.
Eli can easily deduce that the braces are significant characters, but we
must provide a definition of the significant character sequences
that could make up a word .
We must also describe how to capture the significant information in a word
for further processing.
Eli normally assumes that white space
characters (space, tab
and newline)
are not significant.
If we want to provide a facility for commenting a classification, then we
must additionally define the form of a comment
and specify that character sequences having this form are also not significant.
Here is one possible description of the non-literal character sequences
and comments:
word: $[a-zA-Z]+ [mkidn]
C_COMMENT
The first line defines a word to be any sequence of one or more
letters
(see Regular Expressions of Lexical Analysis).
Whenever such a sequence is recognized in the input text, mkidn is
invoked to capture the significant information represented by the sequence.
This processor associates an integer with the recognized sequence, and
arranges for that integer to become the value representing the character
sequence.
If two character sequences recognized as words are identical, mkidn
will represent them with the same integer; distinct sequences are represented
by different integers.
The second line of the specification does not begin with a symbol followed
by :, which indicates that the character sequences it describes are
not significant.
It uses a canned description
to describe character sequences taking the form of C comments
(see Canned Symbol Descriptions of Lexical Analysis).
Thus any character sequence taking the form of a C comment will be ignored
in the input text read by the generated program.
-
Tell Eli to keep quiet about what it is doing, and then ask it to run the
processor derived from your specification:
-> LogLevel=2
-> input +cmd=(sets.specs :exe) :stdout >
-
The sample input file does not contain either comments or errors.
Introduce an error by inserting a digit into one of the words of the
example and repeat your request:
-> input<
-> input +cmd=(sets.specs :exe) :stdout >
Briefly explain Eli's response to this request.
-
Verify that the generated processor correctly handles C comments.
-
Change the specification to accept only words that begin with upper-case
letters.
Generate a translator and verify its correctness.
-
Change the specification to allow Ada comments instead of C comments.
(Hint: see Canned Symbol Descriptions of Lexical Analysis.)
Generate a translator and verify its correctness.
The statement of the problem requires that the names of the classes be
unique, and that there be only one occurrence of a given word in a given
class.
This sort of condition is very common in translation problems.
It involves recognition of regions
and specific entities
within those regions.
For example, a set_body is a region and a set_element is a
specific entity within that region.
The check to be made is that no set_element appears more than once
in any set_body .
Regions are defined by the grammar.
Entities may be defined both by the grammar and by the values representing
the terminal symbols: the grammar selects a particular kind of phrase,
while the instances of this phrase are differentiated by the values of their
terminal symbols.
Some computation must be carried out over the region to verify the
condition.
This computation is standard, involving only the concepts of region and
entity, so a generic module
can be used to carry it out.
In order to use a generic module
we must instantiate that module and
connect the concepts it provides to the specification of our problem.
Instantiation
is handled in the `sets.specs' file:
$/Name/AlgScope.gnrc :inst
$/Prop/Unique.gnrc :inst
The AlgScope module
provides the concept of nested regions
containing entities distinguished by integer values
(see Algol-like Basic Scope Rules of Specification Module Library: Name Analysis).
The Unique module
provides the concept of an error for entities that
appear more than once in a region
(see Check for Unique Object Occurrences of Specification Module Library: Properties of Definitions).
An attribute grammar
fragment is used to connect the concepts provided by
these two modules to the specification of the phrase structure:
ATTR Sym: int;

SYMBOL Entity INHERITS IdDefScope, Unique COMPUTE
  SYNT.Sym=TERM;
  IF(NOT(THIS.Unique),
    message(ERROR,"Multiply-defined word", 0, COORDREF));
END;

SYMBOL text INHERITS RootScope, RangeUnique END;
SYMBOL set_body INHERITS RangeScope, RangeUnique END;
SYMBOL set_name INHERITS Entity END;
SYMBOL set_element INHERITS Entity END;
The symbol Entity ,
which does not occur in the context-free grammar, is used to represent the
concept of a word that must be unique within some region: a
set_name must be unique over the region lying outside of all sets,
while a set_element must be unique over the set in which it appears.
Entity allows us to gather all of the characteristics of that
concept in one place -- a symbol attribution
-- and then use inheritance
to associate those characteristics with the symbols that embody the concept.
An Entity appears in the input text as a word, which is represented
internally by the value representing the terminal symbol word .
That value was established by mkidn when the word was recognized
(see Nonliteral character sequences and comments).
The two symbols inheriting Entity , set_name and
set_element , are defined by rules that have terminal symbols on
their right-hand sides.
TERM can be used in a symbol computation to represent the value of
a terminal symbol appearing on the right-hand side of any rule defining the
given symbol
(see Terminal Access of LIDO Reference Manual).
Thus SYNT.Sym=TERM sets the Sym attribute of the
Entity to the value representing the terminal symbol defining that
Entity .
(SYNT means that the computation takes place in the
lower context
of the symbol, i.e. in the rules corresponding to the phrases
set_name: word and set_element: word .
See Types and Classes of Attributes of LIDO Reference Manual.)
The concept of a definition within a region is embodied in the symbol
IdDefScope
exported by the AlgScope module, while the concept that such a
definition must be unique is embodied in the symbol Unique ,
exported by the Unique module.
Thus Entity inherits from these two symbols.
If the Unique attribute of the Entity is false,
then the message
operation is invoked to output an error report.
The report has severity
ERROR ,
which indicates that a definite error has been detected and therefore
no object code should be produced
(see Source Text Coordinates and Error Reporting of Library Reference Manual).
It is placed at the source text coordinates (line and column) represented
by COORDREF , and consists of the string
Multiply-defined word .
COORDREF always refers to the location of the leftmost character of
the phrase corresponding to the rule in which it appears.
Since this computation is inherited by set_name and
set_element , it appears in rules corresponding to the phrases
set_name: word and set_element: word .
Any error report will therefore refer to the leftmost character of the
multiply-defined word.
The text is the outermost region (RootScope),
within which the set names are defined.
Each set body is an inner region (RangeScope),
within which the set elements are defined.
Therefore text inherits the RootScope computations and
set_body inherits the RangeScope computations.
A set_name must be unique within the text and a
set_element must be unique within its set_body,
so both text and set_body inherit the RangeUnique computations.
-
Create a subdirectory `src' and then request source code for the
set processor
(see Running Unix commands from Eli of User Interface Reference Manual):
-> !mkdir src
-> sets.specs :source >src
Does `src' really contain a version of the translator
that is independent of Eli?
How can you be certain?
-
Try deriving source code without first creating a directory:
-> sets.specs :source >foo
What was the result?
Can you explain what happened?
-
Request an executable version of the translator:
-> sets.specs :exe >sets.exe
Is this executable independent of Eli?
How can you be certain?
At least two specifications, and sometimes more, are needed to create
structured output text.
The form of the text must be described, along with the source of its
components.
Eli generates a set of procedures from the description of the form of the
text, one for each kind of component.
The information about the components is then provided by a computation that
invokes these procedures with appropriate arguments.
We have already seen how Eli allows us to break the specification into
files that encapsulate individual tasks.
So far, each of the tasks has required only one kind of specification.
Here, however, we have a single task that requires at least two
specifications.
There is strong coupling between these specifications, however, so that a
change to one will often involve a change to the other.
The solution is to combine such related specifications
into a single file of type `fw'.
A type-`fw' file describes a set of specification files that Eli will
separate as necessary, but which the user can manipulate as a single
entity.
The variable atoms of the generated C code are integers (like the number of
elements in a set) and strings (like the elements of a set).
LeafPtg
(see PTG Output of Leaf Nodes of Specification Module Library: Generating Output)
is a generic module
that provides operations useful in writing such atoms,
so it is instantiated in file `sets.specs':
$/Output/LeafPtg.gnrc :inst
Three related specifications are used to describe the creation of the C
declarations.
First the general form of the declarations is given in a special language
called PTG, then there is an attribute grammar
fragment that computes
the individual components, and finally two C macros
are needed to implement operations used in the computations.
All three specifications are combined in a single type-`fw' file
called `code.fw':
@O@<code.ptg@>@{
Table:
  "int number_of_sets = " $/*integer*/ ";\n\n"
  "char *name_of_set[] = {\n"
  $/*list of set names*/ "};\n\n"
  "int size_of_set[] = {\n"
  $/*list of set sizes*/ "};\n\n"
  $/*list of sets*/
  "char **values_of_set[] = {\n"
  $/*list of set representations*/ "};"

Set:
  "char *set_of_" $/*set name*/ "[] = {\n"
  $/*list of set elements*/ "};\n\n"

Seq:
  $ $

List:
  $ ",\n" $

Quoted:
  "\"" $ "\""

Name:
  "set_of_" $
@}
@O@<code.lido@>@{
ATTR Ptg: PTGNode;

SYMBOL Entity INHERITS IdPtg END;

SYMBOL text COMPUTE
  IF(NoErrors,
    PTGOut(
      PTGTable(
        PTGNumb(CONSTITUENTS set_name.Sym WITH (int, ADD, ARGTOONE, ZERO)),
        CONSTITUENTS set_name.Ptg WITH (PTGNode, PTGList, PTGQuoted, PTGNull),
        CONSTITUENTS set_body.Size WITH (PTGNode, PTGList, PTGNumb, PTGNull),
        CONSTITUENTS set_def.Ptg WITH (PTGNode, PTGSeq, IDENTICAL, PTGNull),
        CONSTITUENTS set_name.Ptg WITH (PTGNode, PTGList, PTGName, PTGNull))));
END;

ATTR Size: int;

SYMBOL set_body COMPUTE
  SYNT.Size=CONSTITUENTS set_element.Sym WITH (int, ADD, ARGTOONE, ZERO);
END;

SYMBOL set_def COMPUTE
  SYNT.Ptg=
    PTGSet(
      CONSTITUENT set_name.Ptg,
      CONSTITUENTS set_element.Ptg WITH (PTGNode, PTGList, PTGQuoted, PTGNull));
END;
@}
@O@<code.HEAD.phi@>@{
#include "err.h"
#define NoErrors (ErrorCount[ERROR]==0)
@}
The PTG specification,
which is introduced by @O@<code.ptg@>@{ and
terminated by @} , is simply a collection of parameterized
templates
for output.
Each template is given a name, and consists of a sequence of items that
will be output in the given order.
Quoted C strings are output as they stand, and each $ stands for one
parameter.
A text fragment is constructed according to a particular template by
invoking a function whose name is PTG followed by the template name.
This function returns a value of type PTGNode ,
and must be provided
with one argument of type PTGNode for each parameter.
To construct a text fragment according to the template named Quoted ,
for example, invoke PTGQuoted with one argument of type
PTGNode .
The result will be a value of type PTGNode that describes an opening
quotation mark, followed by the text fragment described by the argument,
followed by a closing quotation mark.
The attribute grammar fragment,
which is introduced by
@O@<code.lido@>@{ and terminated by @} ,
invokes the PTG functions with appropriate arguments.
PTGNumb
and PTGName
are defined by the LeafPtg
module
(see PTG Output of Leaf Nodes of Specification Module Library: Generating Output).
They construct values of type PTGNode that describe the text
fragment consisting of a single integer value or word respectively.
These text fragments are then used in building larger text fragments, and
so on.
The translated output should be produced only if no errors were detected
by the translator.
NoErrors is a user-defined macro that tests whether any reports of
severity ERROR
were issued.
NoErrors must be defined as a C macro, in a file of type
`.HEAD.phi'.
This can be done by a segment of the FunnelWeb file
introduced by @O@<code.HEAD.phi@>@{ and terminated by @} .
The error module maintains an array ErrorCount ,
indexed by severity, each of whose elements specifies the number of reports
issued at the corresponding severity
(see Source Text Coordinates and Error Reporting of Library Reference Manual).
If the ERROR element of this array is 0 , then no reports
of severity ERROR have been issued.
In addition to building the C declarations, the attribute grammar fragment
computes the total number of sets and the total number of elements in each
set:
ATTR Size: int;

SYMBOL set_body COMPUTE
  SYNT.Size=CONSTITUENTS set_element.Sym WITH (int, ADD, ARGTOONE, ZERO);
END;
ADD , ARGTOONE and ZERO are built-in functions of Eli
(see Predefined Entities of LIDO Reference Manual).
This computation visits nodes of the subtree rooted in the set_body
node.
If the node is a set_element node, function ARGTOONE
is applied to the value of the Sym attribute to yield the integer
value 1.
If the node has no set_element descendants, then its descendants are
not visited and the function ZERO is invoked with no arguments to
yield the integer value 0.
Otherwise the integers computed for the children of the node are combined
pairwise in left-to-right order via the function ADD .
See CONSTITUENT(S) of LIDO Reference Manual.
-
Request the code generated by the processor from the specification:
-> sets.specs :gencode :viewlist
Find the files generated from code.fw and verify the content of
code.HEAD.phi .
(Hint: Re-read the discussion of code.fw .)
-
Which file contains the definition of the function
PTGQuoted ?
(Hint: Use grep, or create a `tags' file and then give the command
vi -t PTGQuoted .)
-
Briefly explain the operation of
PTGQuoted .
How is the text fragment created by this function printed?
-
Use
lint
to search for anomalies in the collection of C routines.
Are any of those found significant?
Explain briefly.
-
Request an interactive debugging session:
-> sets.specs +debug :dbx
(If you prefer to use the GNU debugger gdb, simply replace dbx with gdb.)
Set breakpoints to stop the program in PTGQuoted , _PrPTGQuoted
and PTGOut .
Run the program with the name of file `input' as its command line
argument.
What is the order of the calls to these three routines?
Explain briefly.
-
Redefine the output text so that each array value is indented, and the
closing brace is at the beginning of a line.
For example, the array of set names should look like this:
char *name_of_set[] = {
  "colors",
  "bugs",
  "verbs"
};
Generate a translator and verify its correctness.