Lexical Analysis
Literal Symbols
If the generated processor includes a parser (see Top of Syntactic Analysis), then Eli will extract the descriptions of any literal terminal symbols from the context-free grammar defining that parser and add them to the specifications provided by type-`gla' files. For example, consider the following context-free grammar:
Program: Expression .
Expression: Evaluation / Binding .
Evaluation:
   Constant / BoundVariable /
   '(' Expression '+' Expression ')' /
   '(' Expression '*' Expression ')' .
Binding: 'let' BoundVariable '=' Evaluation 'in' Expression .
This grammar has nine terminal symbols.
Two (
Only the character sequences to be classified as
Constant: PASCAL_INTEGER
BoundVariable: PASCAL_IDENTIFIER
PASCAL_COMMENT
Overriding the Default Treatment of Literal Symbols
By default, a literal terminal symbol specified in a context-free grammar
supplied to Eli will be recognized as though it had been specified by the
appropriate regular expression.
Thus the literal symbols
Plus: $\+
Let: $let
(Here
In some situations it is useful to carry out more complex operations at the time the literal symbol is recognized. In this case, the user must do two things:
As a concrete example, suppose that
One approach to this problem would be to count the number of occurrences of
the literal symbol
In order to mark the literal symbol
$%% PercentPercent
Each line of a type-`delit' file consists of a regular expression and
an identifier, separated by white space.
The regular expression must describe a literal symbol appearing in a
context-free grammar supplied to Eli.
That literal symbol will not be incorporated automatically into the
generated lexical analyzer; it must be specified explicitly by the user.
The identifier will be given the appropriate value by an Eli-generated
In our example,
$%% (SkipOrNot) [CommentOrNot]
Initially, the separator will be classed as a comment because there is no
identifier preceding the regular expression.
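#include <fcntl.h>
#include "source.h"
#include "litcode.h"

/* Nonzero once the first separator has been recognized. */
static int Second = 0;

/* Auxiliary scanner: accept the first separator as matched; on the
   second, replace the input with an empty file so that the remainder
   of the text is skipped. */
char *
SkipOrNot(char *start, int length)
{ if (!Second) return start + length;
  (void)close(finlBuf());
  initBuf("/dev/null", open("/dev/null", O_RDONLY));
  return TEXTSTART;
}

/* Token processor: reclassify the first separator as the literal
   symbol; the second keeps its initial classification as a comment. */
void
CommentOrNot(char *start, int length, int *syncode, int *intrinsic)
{ if (!Second) { Second++; *syncode = PercentPercent; } }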
The remainder of the text is skipped by closing the current input file and
opening an empty file to read
(see Text Input of Library Reference Manual).
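The net effect of the two routines can be illustrated without Eli's source module. The following is a minimal sketch, not part of Eli: the type `SeparatorScan` and the function `scan_separators` are hypothetical names introduced for illustration. It treats the whole input as a single string and ignores every token other than the separator, so it only approximates what the generated lexical analyzer does.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical summary of how a text containing "%%" would be treated. */
typedef struct {
    const char *first_separator; /* first "%%" (the literal symbol), or NULL */
    size_t scanned_length;       /* characters preceding the second "%%";
                                    the whole text if there is no second one */
} SeparatorScan;

/* Locate the first separator, which would be delivered to the parser,
   and the second, from which point the remaining text would be skipped. */
SeparatorScan
scan_separators(const char *text)
{
    SeparatorScan result = { NULL, strlen(text) };
    const char *first = strstr(text, "%%");
    const char *second;

    if (first == NULL) return result;      /* no separator at all */
    result.first_separator = first;

    second = strstr(first + 2, "%%");      /* search beyond the first one */
    if (second != NULL) result.scanned_length = (size_t)(second - text);
    return result;
}
```

For the text `a %% b %% c`, the sketch reports the separator at offset 2 and a scanned length of 7: everything from the second separator onward is ignored.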
Since
File `fcntl.h' defines
Using Literal Symbols to Represent Other Things
In some cases the phrase structure of a language depends upon layout cues rather than visible character sequences. For example, indentation is used in Occam2 to indicate block structure:
If the first non-blank character of a line is indented further than the first non-blank character of the line preceding it, then the new line begins a new block.
If the first non-blank character of a line is not indented as far as the first non-blank character of the line preceding it, then the old line ends one or more blocks depending on the difference in indentation.
If the first non-blank characters of two successive lines are indented by the same amount, then the lines simply contain adjacent statements of the same block.
Layout cues can be represented by literal symbols in the context-free grammar that describes the phrase structure. The processing needed to recognize the layout cues can then be described in any convenient manner, and the sequence of white space characters implementing those cues can be classified as the appropriate literal symbol.
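The three indentation rules can be sketched as a comparison against a stack of open block indentations. This is an illustration only, not the token processor used by Eli: `LayoutKind`, `LayoutCue`, `classify_indent` and `MAX_DEPTH` are hypothetical names, and the sketch assumes that a line ending blocks must return to the indentation of some enclosing block.

```c
#define MAX_DEPTH 64   /* assumed limit on block nesting */

typedef enum { INITIATE, SEPARATE, TERMINATE, LAYOUT_ERROR } LayoutKind;

typedef struct {
    LayoutKind kind;
    int count;          /* number of blocks ended when kind == TERMINATE */
} LayoutCue;

/* Indentation of each open block; entry 0 is the outermost level. */
int indent_stack[MAX_DEPTH] = { 0 };
int depth = 0;          /* index of the innermost open block */

/* Classify a line by comparing its indentation with the preceding line's. */
LayoutCue
classify_indent(int width)
{
    LayoutCue cue = { SEPARATE, 0 };   /* same indentation: adjacent statement */

    if (width > indent_stack[depth]) {
        /* Indented further: the line begins a new block. */
        if (depth + 1 >= MAX_DEPTH) { cue.kind = LAYOUT_ERROR; return cue; }
        indent_stack[++depth] = width;
        cue.kind = INITIATE;
    } else if (width < indent_stack[depth]) {
        /* Indented less: end every block that is indented further. */
        cue.kind = TERMINATE;
        while (depth > 0 && width < indent_stack[depth]) { depth--; cue.count++; }
        /* The line must line up with an enclosing block. */
        if (width != indent_stack[depth]) cue.kind = LAYOUT_ERROR;
    }
    return cue;
}
```

Given successive lines indented by 0, 2, 2, 4 and 0 columns, the sketch yields a statement at the outermost level, a new block, an adjacent statement, another new block, and finally the end of two blocks.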
Suppose that the beginning of a block is represented in the Occam2 grammar
by the literal symbol
$\{ Initiate
$; Separate
$\} Terminate
Indentation can be specified as white space following a new line:
$\n[\t\040]* [OccamIndent]
The token processor