Lexical Analysis
Spaces, Tabs and Newlines
An Eli-generated processor examines its input text sequentially, recognizing character sequences in the order in which they appear. At each point it matches the longest possible sequence, classifies that sequence, and then begins anew with the next character. If the first character of a sequence is a space, tab or newline then the default behavior is to classify the sequence consisting of that character and all succeeding spaces, tabs and newlines as a comment. This behavior is consistent with the definitions of most programming languages, and is reasonable in a large fraction of text processing tasks.

Even though tabs and newlines are considered comments by default, some processing is needed to account for their effect on the source text position. Eli-generated processors define a two-dimensional coordinate system (line number and column index), which they use to link error reports to the source text (see Source Text Coordinates and Error Reporting of Library Reference Manual).

White space may be significant in two situations:
Appropriate white space may be specified as part of the description of a complete character sequence (provided that it is not at the beginning) without disrupting the default behavior. (Coordinate processing for tabs and newlines must be provided if they are allowed within the sequence.)

The default behavior is overridden, however, by any specification of white space on its own or at the beginning of another character sequence. Overriding is specific to the white space character used: a specification of new behavior for a space overrides the default behavior for a space, but not the default behavior for a tab or newline.

The following sections explain how coordinate processing is provided for newlines and tabs, and how to re-establish default behavior of white space on its own when white space can occur at the beginning of another character sequence.
Maintaining the Source Text Coordinates
The raw data for determining coordinates are two variables, LineNum and
StartLine, which are related by the following invariant:
    LineNum = cumulative index of the current line in the input text
    (Pointer to current character) - StartLine = index of the current character in the current line
This invariant must hold whenever the lexical analyzer begins to process a character sequence. It may be destroyed during the processing of that sequence, but must be re-established before processing of the next character sequence begins.
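The invariant can be read as a recipe for computing a coordinate on demand. The following C sketch is illustrative only: `Coord` and `coord_of` are hypothetical names, not part of Eli, and the sketch uses integer offsets into the text rather than pointers to keep the arithmetic simple.

```c
/* Hypothetical coordinate record: a line number and a column index. */
typedef struct {
    int  line;
    long col;
} Coord;

/* Derive the coordinate of the character at offset pos.
   line_num and start_line play the roles of LineNum and StartLine;
   by the invariant, the column is simply pos - start_line. */
static Coord coord_of(int line_num, long start_line, long pos)
{
    Coord c;
    c.line = line_num;
    c.col  = pos - start_line;
    return c;
}
```

Because the column is a difference, nothing has to be counted while ordinary characters are scanned; the variables need attention only at tabs and newlines.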
A tab character in the input text represents one or more spaces, depending
upon its position relative to the next tab stop, but it occupies only one
character position.
If the tab represents n spaces, n-1 must be subtracted from StartLine to
maintain the invariant.
Because the value of n depends upon the index of the current
character and the settings of the tab stops in the line, Eli provides an
operation, TABSIZE (made available by tabsize.h), for this purpose.
Suppose that p is the pointer to the current character.
The adjustment for a tab can then be written as:
    #include "tabsize.h"
    ...
    if ((*p++) == '\t') StartLine -= TABSIZE(p - StartLine);
    ...
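The arithmetic behind the tab adjustment can be sketched in self-contained C. This is not Eli's TABSIZE, whose definition is not shown here: `tab_spaces` and `advance` are hypothetical names, and the sketch assumes tab stops every 8 columns, columns numbered from 1, and integer offsets instead of pointers.

```c
#define TAB_STOP 8   /* assumption: a tab stop every 8 columns */

/* Number of spaces (n) that a tab in 1-based column col stands for. */
static long tab_spaces(long col)
{
    return TAB_STOP - ((col - 1) % TAB_STOP);
}

/* Advance over text[pos] and return the new offset.
   If the character is a tab standing for n spaces, start_line is
   lowered by n-1, so that (offset - start_line) again yields the
   column of the next character. */
static long advance(const char *text, long pos, long *start_line)
{
    long col = pos - *start_line;   /* column of the character at pos */

    if (text[pos] == '\t')
        *start_line -= tab_spaces(col) - 1;
    return pos + 1;
}
```

For the text "a", tab, "b" with `start_line` initially -1, the tab sits in column 2 and stands for 7 spaces, so `start_line` drops by 6 and "b" is reported in column 9.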
To change the way tab sizes are computed, obtain a copy of the library routine:
    -> $elipkg/gla/tabsize.c > MyTabSize.c
After modifying the routine appropriately, add the name MyTabSize.c to your specification.

The coordinate invariant is maintained automatically if no patterns matching tabs or newline characters are defined, and no auxiliary scanners that advance over tabs or newline characters are provided by the user. If such patterns or scanners are needed, then the user must define them in such a way that they maintain the coordinate invariant.
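How a user-supplied scanner might keep the invariant intact while advancing over newlines can be sketched as follows. This is not Eli library code: `line_num`, `start_line` and `skip_braced` are hypothetical stand-ins, the sketch works on integer offsets rather than pointers, it assumes columns are numbered from 1, and it ignores tabs.

```c
/* Hypothetical stand-ins for the two coordinate variables. */
static int  line_num   = 1;
static long start_line = -1;   /* offset 0 is column 1 of line 1 */

/* Scan from text[pos] up to and including the next '}' (or to the end
   of the text). At every newline the line count is incremented and
   start_line is reset so that the first character of the new line
   gets column 1. Returns the offset of the first character after the
   scanned sequence. */
static long skip_braced(const char *text, long pos)
{
    while (text[pos] != '\0') {
        char c = text[pos++];

        if (c == '\n') {
            line_num++;
            start_line = pos - 1;   /* offset of the newline itself */
        } else if (c == '}') {
            break;
        }
    }
    return pos;
}
```

The invariant is violated only momentarily inside the loop; by the time the function returns, the line number and column of the next character can again be computed from the two variables.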
Three auxiliary scanners
(
For an example of the use of code in an auxiliary scanner to maintain the
coordinate invariant, see the library version of
Restoring the Default Behavior for White Space
When a pattern beginning with a space, tab or newline character overrides
the default behavior for that character, the character will only be
accepted as part of an explicit pattern.
The default behavior can be restored by using one of the canned
descriptions, for example SPACES:
    Define:  $\040+define
             SPACES
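Here the pattern for Define begins with a space (\040), so it overrides the
default behavior for spaces; the SPACES description restores that behavior.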
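Note that this specification is ambiguous:
A sequence of spaces followed by define could be matched in part by SPACES
(the spaces alone) or as a whole by Define.
Because the longest possible sequence is always matched, the whole sequence
is classified as Define.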
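The effect of longest match on two overlapping patterns of this kind can be sketched in C. The hand-written matchers below are hypothetical and are not meant to mirror how an Eli-generated scanner works internally; they only illustrate the choice between a run of spaces and a run of spaces followed by the word define.

```c
#include <stddef.h>
#include <string.h>

/* Length of the match for one or more spaces, or 0 if none. */
static size_t match_spaces(const char *s)
{
    size_t n = 0;

    while (s[n] == ' ')
        n++;
    return n;
}

/* Length of the match for one or more spaces followed by "define",
   or 0 if the text does not start that way. */
static size_t match_define(const char *s)
{
    size_t n = match_spaces(s);

    if (n > 0 && strncmp(s + n, "define", 6) == 0)
        return n + 6;
    return 0;
}

/* Classify the start of s by taking the longer of the two matches. */
static const char *classify(const char *s)
{
    size_t d = match_define(s);
    size_t w = match_spaces(s);

    if (d == 0 && w == 0)
        return "none";
    return d > w ? "Define" : "SPACES";
}
```

Spaces followed by define are classified as Define because that match is longer; spaces followed by anything else fall back to SPACES.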
Making White Space Illegal
When white space is illegal at the beginning of a pattern, the default
treatment of white space must be overridden with an explicit comment
pattern.
Because the sequence is specified to be a comment, nothing will be returned
to the parser.
A token processor like lexerr can be attached to the specification to
report the white space as an error:
    SPACES [lexerr]
The canned descriptions