William.Waite@Colorado.edu
Fortran77[1]==
    1
This macro is invoked in definitions 3, 26, 28, 30, 33, 37, 38, 53, 54, 59, 61, 63, 73, 82, 86, 87, 88, 95, 96, 97, and 104.
The generated FORTRAN 90 compiler uses a command line option to decide whether to accept fixed- or variable-format source text.
Only the lexical analysis task (scanning and computation of intrinsic attributes) is covered in this document. Because of the ad-hoc nature of basic symbol definition in FORTRAN, a mixture of declarative and operational specifications is necessary. The declarative specifications are regular expressions, and are used to describe some of the character strings the FORTRAN scanner must recognize. Descriptions for other strings are extracted automatically by Eli from the context-free grammar defining FORTRAN's phrase structure. Those descriptions need not be repeated here.
This specification was developed while the author was a visiting researcher at the GMD in Berlin, and was originally published as Arbeitspapiere der GMD 816 in January of 1994.
Eli Library Modules Used[2]==
    The Error Reporting Module[4]
    The Source Text Input Module[5]
    The Memory Object Management Module[6]
    The Character String Storage Module[7]
    The Unique Identifier Module[14]
This macro is invoked in definition 104.
This section briefly describes the facilities of these modules that are used in the scanner. For a complete specification of each module, consult the Eli Library Reference Manual.
The Command Line Processing Module[3]==
    InputFile input "File to be processed";
    #if !Fortran77[1]
    FixedFormat "-f" boolean "Select fixed input format";
    #endif
This macro is invoked in definition 102.
This specification says that a positional parameter is to be recognized, and its string attached as a property of the known key InputFile. The value of that property will be taken as the processor's input file. If no parameter is specified, standard input will be used.
The Error Reporting Module[4]==
    #include "err.h"
    /* Exported entities used in the FORTRAN scanner:
     *   POSITION (type): Source text coordinates
     *   ERROR (constant): Severity indicating output cannot be run
     *   curpos (variable): Storage for token coordinates
     *   message (operation): Report an error
     ***/
This macro is invoked in definition 2.
When the scanner constructs a token, it places the coordinates of the first character of the string represented by that token into curpos. Errors detected by the scanner itself are reported via the message operation. Any of these errors is sufficient to prevent the compiler from producing executable code, and hence they are reported with severity ERROR.
The Source Text Input Module[5]==
    #include "source.h"
    /* Exported entities used in the FORTRAN scanner:
     *   TEXTSTART (variable): Pointer to a line of input text
     *   LineNum (variable): Index of the current text buffer line
     *   refillBuf (operation): Refill the text buffer from the source file
     ***/
This macro is invoked in definition 2.
TEXTSTART initially points to the first character of the first line of the source text, and LineNum initially has the value 1. It is the responsibility of the source module's client to maintain the values of TEXTSTART and LineNum so that they satisfy some appropriate condition. When all of the lines in the buffer have been examined, refillBuf can be invoked to obtain more text (if there is more in the file). On return from refillBuf, TEXTSTART contains a pointer to the new text or (if no more text is available) a pointer to a null string.
The Memory Object Management Module[6]==
    #include "obstack.h"
    /* Exported entities used in the FORTRAN scanner:
     *   ObstackP (type): Variable-size storage area
     *   obstack_init (operation): Initialize using defaults
     *   obstack_begin (operation): Initialize using specified values
     *   obstack_grow (operation): Add data to the current contiguous area
     *   obstack_1grow (operation): Add one character to the current string
     *   obstack_finish (operation): Complete the current growth
     *   obstack_free (operation): Cut back the storage in use
     ***/
This macro is invoked in definition 2.
All of the information added between one invocation of obstack_finish and the next is guaranteed to be stored contiguously, and the pointer returned by obstack_finish points to the beginning of that contiguous storage area.
Obstack storage is allocated in a stack-like fashion:
If obstack_free is applied to a pointer returned by obstack_finish,
all of the storage allocated after the previous obstack_finish is
freed.
Subsequent growth will re-use that storage.
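The stack-like behavior can be illustrated with a minimal sketch. It assumes the GNU C library's obstack.h, which provides operations with the same names; the Eli module's types (Obstack, ObstackP) may differ in detail. The helper store_string and the function obstack_reuse_demo are hypothetical names introduced only for this illustration.

```c
#include <obstack.h>
#include <stdlib.h>
#include <string.h>

/* GNU obstacks require the client to name the chunk allocator. */
#define obstack_chunk_alloc malloc
#define obstack_chunk_free free

/* Hypothetical helper: store a NUL-terminated copy of s as one
 * contiguous object and return a pointer to its first character. */
static char *store_string(struct obstack *ob, const char *s)
{
    for (; *s; s++) obstack_1grow(ob, *s);
    obstack_1grow(ob, '\0');
    return (char *)obstack_finish(ob);
}

/* Returns 1 if storage released by obstack_free is re-used by the
 * next object, and the earlier object is left untouched. */
int obstack_reuse_demo(void)
{
    struct obstack ob;
    char *first, *second, *third;
    int reused;

    obstack_init(&ob);
    first = store_string(&ob, "permanent");
    second = store_string(&ob, "temporary");
    obstack_free(&ob, second);           /* cut back to where second began */
    third = store_string(&ob, "reused"); /* grows into the freed storage */
    reused = (third == second) && strcmp(first, "permanent") == 0;
    obstack_free(&ob, NULL);             /* release the whole obstack */
    return reused;
}
```

In this sketch the string stored first survives the cut-back, while the third string begins at the address that the second one occupied.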
1.5 The Character String Storage Module
The character string storage module provides both temporary and permanent
storage for character strings.
The Character String Storage Module[7]==
    #include "csm.h"
    /* Exported entities used in the FORTRAN scanner:
     *   NoStr (constant): Non-existent string
     *   Csm_obstk (constant): Dynamic string storage facility
     *   CsmStrPtr (variable): Pointer to string in dynamic string storage
     *   stostr (operation): Make a stored string permanent
     ***/
This macro is invoked in definition 2.
NoStr represents a non-existent string (in contrast to the empty string "", which certainly exists). It is used as a value of a pointer to a string when there is no string to point to. Csm_obstk represents the module's dynamic string storage facility in all Obstack operations. This facility can be used to store arbitrary strings, temporarily or permanently. When a string is to be stored permanently, the result of the obstack_finish operation that defined it should be stored in CsmStrPtr and then stostr invoked on CsmStrPtr. Any string pointer, including CsmStrPtr, can be used to describe strings that are stored temporarily. Storage for temporary strings must be freed explicitly via the obstack_free operation.
The Generated Scanner Module[8]==
    #include "gla.h"
    /* Exported entities used in the FORTRAN scanner support routines
     *   NORETURN (constant): Classification of an ignored character sequence
     *   TokenStart (variable): Pointer to the current character sequence
     *   TokenEnd (variable): Pointer to the first unprocessed character
     *   ResetScan (variable): True if TokenEnd is invalid
     ***/
This macro is invoked in definition 104.
The specifications from which the scanner is generated can describe character sequences that are to be ignored by the remainder of the generated program. For example, comments are usually ignored when translating a programming language, yet they must be recognized by the scanner. All such sequences are given the classification code NORETURN by the scanner, and their presence is not reported to the routine that invoked the scanner. NORETURN is exported by the scanner module to make it available to user-defined modules that override some of the internal operations of the scanner, as described below.
The scanner module has no internal storage for text. Instead, TokenEnd is used to specify the text to be scanned. If it is valid when the scanner is invoked, TokenEnd points to a sequence of characters stored in contiguous memory locations, the last of which contains a zero byte. The value of ResetScan on invocation of the scanner determines whether TokenEnd is valid.
TokenStart is set by the scanner to point to the first character of the current character sequence.
Several of the internal operations of an Eli-generated scanner
can be replaced by user-defined versions.
These changes allow the user to specify certain aspects of the scanner's
behavior operationally, while specifying others declaratively.
The remainder of this section describes the operations that can be replaced,
and how they relate to the rest of the translation task.
2.1 Establish a Scan Pointer
The position of the scanner in the input text is defined by
TokenEnd, a character pointer exported by the scanner module.
TokenEnd points to the next character to be examined by the scanner.
When the scanner is invoked, it checks the content of its exported variable
ResetScan.
If ResetScan is nonzero, the scanner sets it to zero and uses the macro
SCANPTR to obtain a non-null value for TokenEnd:
Establish a Scan Pointer[9]==
    /* Establish a scan pointer
     *   If no further text is available then on exit-
     *     TokenEnd points to a null string
     *   Otherwise on exit-
     *     TokenEnd points to a string that is guaranteed to contain
     *       a newline character
     */
    #define SCANPTR
This macro is invoked in definition 57.
In an Eli-generated processor, the source text module is initialized before the scanner is invoked for the first time. Thus the initial invocation of SCANPTR can assume that the exit condition of the source buffer initialization operation holds, provided that the user has not supplied any additional operations to invalidate that condition.
ResetScan may be set nonzero by any user-supplied procedure, causing SCANPTR to be invoked at the beginning of the next invocation of the scanner.
SCANPTR normally sets TokenEnd to the value of TEXTSTART, a
string pointer exported by the source module.
ResetScan is not normally set by any component of the generated compiler,
so SCANPTR is only executed on the first scanner invocation.
2.2 Set the Coordinates of a Token
The scanner establishes coordinates for each token that it recognizes.
These coordinates are used primarily for associating reports with
appropriate positions in the source text.
SETCOORD is a macro that places the coordinates of the current text
position into the variable curpos, exported by the error reporting
module.
Set the Coordinates of a Token[10]==
    /* Set the coordinates of the current token
     *   On entry-
     *     p=index of the current position in the current source line
     *   On exit-
     *     curpos=coordinates of the current position
     */
    #define SETCOORD(p)
This macro is invoked in definition 21.
SETCOORD normally sets the line coordinate to the value of LineNum, an integer variable exported by the error reporting module, and sets the column coordinate to the value of its argument p.
Set the Extent of a Token[11]==
    /* Set the coordinates of the end of the current token
     *   On entry-
     *     p=index of the current position in the current source line
     *   On exit-
     *     endpos=coordinates of the current position
     */
    #define SETENDCOORD(p)
This macro is invoked in definition 21.
SETENDCOORD normally sets the line coordinate to the value of LineNum, an integer variable exported by the error reporting module, and sets the column coordinate to the value of its argument p.
Define an Auxiliary Scanner[12](¶1)==
    char *
    #ifdef PROTO_OK
    ¶1(char *start, int length)
    #else
    ¶1(start, length) char *start; int length;
    #endif
    /* Standard interface for an auxiliary scanner
     *   On entry-
     *     start points to the first character of the scanned string
     *     length=length of the scanned string
     *   On exit-
     *     The function returns a pointer to the first character
     *       beyond the scanned string
     ***/
This macro is invoked in definitions 55 and 83.
Normally, an auxiliary scanner changes the length of the character sequence matched by the pattern. This allows the user to specify operationally patterns that are very tedious to describe with regular expressions.
Eli nominates the auxiliary scanner auxNUL for the end-of-text pattern, which the user is not allowed to specify via a regular expression. On entry to auxNUL, start points to a zero byte and length=0. A default routine, which simply returns the value of start, will be used if this routine is left unspecified.
If auxNUL returns a pointer to a non-null string, the generated scanner
will immediately scan that string, considering it to be a continuation of
the string being scanned when the end-of-text pattern was recognized.
In this case the scanner will not return any indication that the
end-of-text pattern was recognized.
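As an illustration of the auxiliary scanner interface, the following minimal sketch extends a match to the end of the current line. The name SkipToNewline is hypothetical and is not one of the scanners defined in this specification; the prototype form of the interface is assumed.

```c
/* Hypothetical auxiliary scanner obeying the standard interface.
 * On entry-
 *   start points to the first character of the scanned string
 *   length=length of the scanned string
 * On exit-
 *   The function returns a pointer to the newline (or zero byte)
 *   that terminates the line containing the scanned string */
char *SkipToNewline(char *start, int length)
{
    char *p = start + length;

    /* Stop at the zero byte as well, so the scan never runs off the text. */
    while (*p != '\0' && *p != '\n') p++;
    return p;
}
```

A pattern matching only the character that introduces a comment could nominate a scanner like this one to absorb the rest of the line.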
2.5 Define a Token Processor
In the specification of a pattern defining a basic symbol, the user can
nominate a token processor.
It will be invoked after the scanner has recognized the pattern described
by the regular expression, and after any specified auxiliary scanner has
been invoked.
All token processors obey the same interface:
Define a Token Processor[13](¶1)==
    void
    #ifdef PROTO_OK
    ¶1(char *start, int length, int *klass, int *intrinsic)
    #else
    ¶1(start, length, klass, intrinsic)
    char *start; int length, *klass; int *intrinsic;
    #endif
    /* Standard interface for a processor
     *   On entry-
     *     start points to the first character of the scanned string
     *     length=length of the scanned string
     *     klass points to a location containing the initial classification
     *     intrinsic points to a location to receive the intrinsic attribute
     *   On exit-
     *     klass points to a location containing the final classification
     *     intrinsic points to a location containing the intrinsic attribute value
     *       (if relevant)
     ***/
This macro is invoked in definitions 60, 63, 65, 67, 70, 77, 81, 83, 85, 87, 91, 94, 97, and 100.
Normally, a token processor calculates a value for the intrinsic attribute from the characters of the scanned string. It may also change the classification of the scanned string and/or the value of TokenEnd.
Eli nominates the token processor EndOfText for the end-of-text pattern, which the user is not allowed to specify via a regular expression. On entry to EndOfText, start points to a zero byte, length=0, and klass points to a location containing the classification code for the end of the text. A default routine, which simply returns, will be used if this routine is left unspecified. EndOfText will not be invoked if auxNUL returns a pointer to a non-null string.
The token processor mkidn is used to guarantee that only one copy of a specific string appears in the character storage module. This token processor obeys the standard interface given above. It is part of the unique identifier management module, whose interface is the file idn.h.
The Unique Identifier Module[14]==
    #include "idn.h"
This macro is invoked in definition 2.
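The standard token processor interface can be illustrated by a minimal sketch that converts a scanned digit string to an integer intrinsic attribute. DigitsToInt is a hypothetical name used only here; it is not one of the processors defined in this specification, and the prototype form of the interface is assumed.

```c
/* Hypothetical token processor obeying the standard interface.
 * On entry-
 *   start points to the first character of the scanned string
 *   length=length of the scanned string
 *   klass points to a location containing the initial classification
 * On exit-
 *   klass is unchanged
 *   intrinsic points to the value of the leading decimal digits */
void DigitsToInt(char *start, int length, int *klass, int *intrinsic)
{
    int value = 0, i;

    (void)klass;  /* this sketch never reclassifies the token */
    for (i = 0; i < length; i++) {
        if (start[i] < '0' || start[i] > '9') break;
        value = value * 10 + (start[i] - '0');
    }
    *intrinsic = value;
}
```

Only the first length characters are examined, because the scanned string is not terminated by a zero byte; it is simply a prefix of the text that remains to be scanned.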
It might be that a particular sequence of characters could be interpreted as any of several different tokens. The scanner must make a choice among these possibilities, returning that choice to the parser, but it does not have information about whether this choice would be acceptable.
When the parser finds a token unacceptable, it invokes the routine Reparatur with a description of the unacceptable token. Reparatur may choose to alter the token and request that the parser decide whether the altered token is acceptable:
Deal With an Unacceptable Token[15]==
    int
    #ifdef PROTO_OK
    Reparatur(POSITION *coord, int *klass, int *intrinsic)
    #else
    Reparatur(coord, klass, intrinsic)
    POSITION *coord; int *klass, *intrinsic;
    #endif
    /* Repair a syntax error by changing the lookahead token
     *   On entry-
     *     coord points to the coordinates of the lookahead token
     *     klass points to the classification of the lookahead token
     *     intrinsic points to the intrinsic attribute of the lookahead token
     *   If the lookahead token has been changed then on exit-
     *     Reparatur=1
     *     coord, klass and intrinsic reflect the change
     *   Else on exit-
     *     Reparatur=0
     *     coord, klass and intrinsic are unchanged
     ***/
This macro is invoked in definition 50.
A default routine, which returns 0 without modifying the token, will be used if Reparatur is left unspecified.
This view is inappropriate for FORTRAN. The presence of continuation lines means that a single FORTRAN statement can be spread over an arbitrary number of lines, with any token broken between lines at arbitrary points. In FORTRAN 90, it is also possible to write a number of statements on the same line.
The statement is the natural unit of text for a FORTRAN scanner to store contiguously in memory: Basic symbols cannot span statement boundaries. Recognition of certain constructs involves extensive lookahead, but lookahead beyond the end of a statement is never required.
A classic structure clash like the one between the source module's lines and the scanner's statements is solved by using a buffer that contains integral numbers of both kinds of object. The buffer is filled by operations on the largest object that is a component of each of the clashing objects.
The FORTRAN structure clash is solved by filling a buffer with the statements from a sequence of lines. This buffer is terminated at the first point where the end of a statement coincides with the end of a line. Characters are the largest objects common to both lines and statements, and therefore the buffer is filled by operations on characters.
The statement buffer is implemented as an Obstack containing a single string. Source text operations define an abstract data type used to conceal the details of access to input lines. Information needed for precise error reporting is stored in a coordinate map:
Units of Text[16]==
    static Obstack Statement;
    static char *Stmt = NoStr;
    The Coordinate Map[17]
    Operations on Source Text Lines[23]
    Statement Buffer Construction[37]
This macro is invoked in definition 104.
The Coordinate Map[17]==
    typedef struct { int IndexInStmt, LineIndex, Offset; } MapElement;
    static MapElement *Map, Current;
    static Obstack MapData;
    Coordinate-setting routine[22]
This macro is invoked in definition 16.
The array has one element for each point in the statement where the coordinates of the character do not have the same line number as the coordinates of its predecessor within that statement. The coordinates of characters following that point, up to the next such point, can be determined from their distance from that point and the map element for that point.
There is also an element for the first character position of the statement buffer because it has no predecessor, and an element for the character position beyond the end of the statement buffer to provide an upper limit for the lookup operation.
IndexInStmt is the index of the element's character position in the
statement buffer.
LineIndex is the line number of the character at that position, and
(IndexInStmt - Offset) is its column number.
Map points to the completed array for the statement buffer,
while Current is used in constructing the map elements.
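The lookup that the map supports can be sketched as follows. This is a self-contained illustration, not the routine used by the scanner: LookupCoord and ExampleMap are hypothetical names, and the map values assume a fixed-format statement whose initial line is source line 10 and whose single continuation line contributes columns 7 through 72 of source line 11.

```c
typedef struct { int IndexInStmt, LineIndex, Offset; } MapElement;

/* Find the line and column of statement buffer index p.
 * The map must end with an element whose IndexInStmt lies beyond the
 * end of the statement buffer, so that the search always terminates. */
void LookupCoord(const MapElement *map, int p, int *line, int *col)
{
    while (p >= map[1].IndexInStmt) map++;
    *line = map->LineIndex;
    *col = p - map->Offset;   /* column = IndexInStmt - Offset */
}

/* Assumed example map (values are illustrative only). */
const MapElement ExampleMap[] = {
    {   1, 10,   0 },   /* initial line: index 1 is column 1 */
    {  73, 11,  66 },   /* continuation: index 73 is column 7 */
    { 141, 12, 140 }    /* position beyond the end of the buffer */
};
```

With this map, index 5 lies in line 10 at column 5, and index 80 lies in line 11 at column 14. Unlike this sketch, a routine that is called with non-decreasing indices can remember its position in the map between calls instead of searching from the start.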
3.1.1 Mark a line change
Map elements marking line changes are created as follows:
Mark a line change[18](¶1)==
    { Current.LineIndex = LineNum;
      Current.Offset = Current.IndexInStmt - ((¶1) - TEXTSTART + 1);
      obstack_grow(&MapData, &Current, sizeof(Current));
    }
This macro is invoked in definitions 38, 39, 42, and 44.
LineNum is a variable exported by the error reporting module. Its value is initially 1, and it is neither set nor examined by any operation of the error module or the source module.
TEXTSTART is a variable exported by the source module. Each source module operation that delivers text sets TEXTSTART to point to the start of the text it has delivered, but otherwise TEXTSTART is neither set nor examined by the source module.
The operations on source text lines guarantee that LineNum and TEXTSTART satisfy the following condition at appropriate points:
Invariant for the source text coordinate system[19]==
     *     LineNum=index of the current source line in the entire text
     *     TEXTSTART points to the first character of the current source line
This macro is invoked in definitions 29 and 34.
Current.Offset is maintained by the statement buffer construction operations. For fixed-format text, updates are fixed by the definition of FORTRAN. When constructing a statement buffer from variable-format text the following operation is used:
Add a character to the statement buffer[20](¶1)==
    obstack_1grow(&Statement, ¶1); Current.IndexInStmt++;
This macro is invoked in definitions 38, 40, and 44.
Set Token Coordinates[21]==
    Set the Coordinates of a Token[10] \
      { extern void TokenCoords ELI_ARG((int, POSITION *, int)); \
        TokenCoords(p, &curpos, 0); }
    Set the Extent of a Token[11] \
      { extern void TokenCoords ELI_ARG((int, POSITION *, int)); \
        TokenCoords(p, &curpos, 1); }
This macro is invoked in definition 103.
Coordinate-setting routine[22]==
    static MapElement *CoordBase;

    void
    #ifdef PROTO_OK
    TokenCoords(int p, POSITION *pos, int right)
    #else
    TokenCoords(p, pos, right) int p; POSITION *pos; int right;
    #endif
    { while (p >= CoordBase[1].IndexInStmt) CoordBase++;
      if (!right) {
        pos->line = CoordBase->LineIndex;
    #ifdef MONITOR
        pos->col = pos->cumcol = p - CoordBase->Offset;
    #else
        pos->col = p - CoordBase->Offset;
    #endif
      } else {
    #ifdef RIGHTCOORD
        pos->rline = CoordBase->LineIndex;
    #ifdef MONITOR
        pos->rcol = pos->rcumcol = p - CoordBase->Offset;
    #else
        pos->rcol = p - CoordBase->Offset;
    #endif
    #endif
      }
    }
This macro is invoked in definition 17.
The while loop ensures that the proper map element is being used. The change defined after the newline character marking the end of the statement guarantees that this loop will always terminate.
Operations on Source Text Lines[23]==
    Predicates classifying lines[24]
    Positioning operations[28]
    Operations that extract information from a line[33]
This macro is invoked in definition 16.
These operations can be thought of as defining an abstract data type that embodies the essential properties of FORTRAN source text: All access to the source text is handled by these operations, all of which behave properly when the source text is exhausted.
Predicates classifying lines[24]==
    Comment Lines[26]
    Continuation Lines[27]
This macro is invoked in definition 23.
Both of the line classification predicates have the same interface specification:
Line classification predicate:[25](¶1)==
    char *
    #ifdef PROTO_OK
    ¶1(char *p)
    #else
    ¶1(p) char *p;
    #endif
    /* Standard interface for a line classification predicate
     *   On entry-
     *     p points to the first character of a source text line
     *   If the scanning operation is satisfied then on exit-
     *     The function returns a pointer to the first unexamined character
     *   Otherwise on exit-
     *     The function returns a null pointer
     ***/
This macro is invoked in definitions 26 and 27.
The following code provides an operational description of these rules; it is satisfied if the line pointed to by p on entry is a comment line. If IsComment is satisfied, it returns a pointer to the first character of the next source text line:
Comment Lines[26]==
    Line classification predicate:[25](`IsComment')
    { register char *q;

      if (!*p) return NoStr;
      for (q = p; *q == ' ' || *q == '\t'; q++) ;
      switch (*q) {
      case '\n':
        LineNum++; return q+1;
      default:
        return NoStr;
    #if !Fortran77[1]
      case '!':
        if (FixedFormat && q == p+5) return NoStr;
        break;
    #endif
      case 'C': case 'c': case '*':
        if (!FixedFormat || q != p) return NoStr;
      }
      do { q++; } while (*q != '\n');
      LineNum++; return q+1;
    }
This macro is invoked in definition 24.
The last rule involves interaction among lines, and is therefore not within the competence of any line scanning operation. It is described as a part of the statement buffer construction process.
The following code provides an operational description of the first two rules; it is satisfied if the line pointed to by p on entry is a continuation line:
Continuation Lines[27]==
    Line classification predicate:[25](`IsContinue')
    { if (!*p) return NoStr;
      if (FixedFormat) {
        register int i;

        for (i = 0; i < 5; i++) if (p[i] != ' ') return NoStr;
        if (p[5] == ' ' || p[5] == '\t') return NoStr;
        if (p[5] == '0') { p[5] = ' '; return NoStr; }
        return p+6;
      } else {
        register char c;

        while ((c = *p++) == ' ' || c == '\t') ;
        return c == '&' ? p : NoStr;
      }
    }
This macro is invoked in definition 24.
Positioning operations[28]==
    Advance to the next non-comment line[30]
    Advance to the next initial line[31]
    #if !Fortran77[1]
    Advance to a new file if necessary[32]
    #endif
This macro is invoked in definition 23.
All of the positioning operations have the same interface specification:
Line positioning operation:[29](¶1)==
    void
    #ifdef PROTO_OK
    ¶1(char *p)
    #else
    ¶1(p) char *p;
    #endif
    /* Standard interface for a line positioning operation
     *   On entry-
     *     p points to the first character of a source text line
     *     LineNum is the index of the source line pointed to by p
     *   On exit-
    Invariant for the source text coordinate system[19]
     *     The line pointed to by TEXTSTART is of the desired class
     ***/
This macro is invoked in definitions 30, 31, and 32.
Advance to the next non-comment line[30]==
    Line positioning operation:[29](`NextNonComment')
    { char *next;

      for (;;) {
        if (!*p) {
          refillBuf(p); p = TEXTSTART;
    #if !Fortran77[1]
          if (!*p) p = ContinuationText();
    #endif
        }
        if (!(next = IsComment(p))) break;
        p = next;
      }
    #if Fortran77[1]
      TEXTSTART = p;
    #else
      NextIncludedLine(p);
    #endif
    }
This macro is invoked in definition 28.
If the source text buffer remains empty after an invocation of refillBuf, the end of the source file has been reached. In FORTRAN 90, the source file may be one named in an INCLUDE directive. ContinuationText will check for this situation and handle it properly. If the source text buffer remains empty after return from ContinuationText then there is no further input text of any kind.
Advance to the next initial line[31]==
    Line positioning operation:[29](`NextInitialLine')
    { for (;;) {
        char *next;

        NextNonComment(p);
        if (!(next = IsContinue(TEXTSTART))) return;
        { POSITION e;
          e.line = LineNum; e.col = next - TEXTSTART + 1;
          message(ERROR, "Continuation without initial line", 0, &e);
          while (*next++ != '\n') ;
          LineNum++; p = next;
        }
      }
    }
This macro is invoked in definition 28.
NextInitialLine returns a pointer to the next initial line, and reports an error at the first significant character position of any continuation line preceding that initial line.
Advance to a new file if necessary[32]==
    Line positioning operation:[29](`NextIncludedLine')
    { char c, *q;

      if (!(TEXTSTART = p)) return;
      StartLine = p - 1;
      while ((c = *p++) == ' ' || c == '\t') {
        if (c == '\t') StartLine -= TABSIZE(p - StartLine);
      }
      q = "include";
      while (*q) { if (F77Fold[c] != *q) return; c = *p++; q++; }
      while ((c = *p++) == ' ' || c == '\t') {
        if (c == '\t') StartLine -= TABSIZE(p - StartLine);
      }
      if (c != '\'' && c != '"' ) return;
      curpos.line = LineNum; curpos.col = p - StartLine;
      p = fstr(p - 1, 1); q = CsmStrPtr;
      while ((c = *p++) == ' ' || c == '\t') {
        if (c == '\t') StartLine -= TABSIZE(p - StartLine);
      }
      if (c == '!') while (c != '\n') c = *p++;
      LineNum++;
      if (c == '\n') {
        TEXTSTART = p;
        if (!ReadingFrom(FindFile(q)))
          message(ERROR, "Cannot open include file", 0, &curpos);
        p = TEXTSTART;
      } else {
        curpos.col = p - StartLine;
        message(ERROR, "Only a comment can follow INCLUDE", 0, &curpos);
        while (c != '\n') c = *p++;
      }
      obstack_free(Csm_obstk, q);
      NextInitialLine(p);
    }
This macro is invoked in definition 28.
ReadingFrom is an operation exported by the Eli Include module.
The constraints on the information in a line and the subsequent processing needs are different for the fixed and variable input formats, so separate extraction operations are required:
Operations that extract information from a line[33]==
    Extract fixed-format text[35]
    #if !Fortran77[1]
    Extract variable-format text[36]
    #endif
This macro is invoked in definition 23.
No variable-format extraction operation is required when a FORTRAN 77 scanner is being generated.
Both of the extraction operations have the same interface specification:
Line extraction operation:[34](¶1)==
    void
    #ifdef PROTO_OK
    ¶1(char *p)
    #else
    ¶1(p) char *p;
    #endif
    /* Standard interface for a line extraction operation
     *   On entry-
     *     The current line is not a comment line
     *     p points to the first character to be extracted
    Invariant for the source text coordinate system[19]
     *   On exit-
     *     The source text is positioned at the next non-comment line
    Invariant for the source text coordinate system[19]
     ***/
This macro is invoked in definitions 35 and 36.
Extract fixed-format text[35]==
    Line extraction operation:[34](`ExtractFixedLine')
    { register char c;
      char *Position0 = TEXTSTART - 1;

      while (p - Position0 <= 72) {
        /* Invariant: (p - Position0) indexes the first unfilled column
         *            p points to the first unprocessed character
         */
        if ((c = *p) == '\n') { obstack_1grow(&Statement, ' '); Position0--; }
        else {
          p++;
          if (c != '\t') obstack_1grow(&Statement, c);
          else {
            register int size = TABSIZE(p - Position0);
            do obstack_1grow(&Statement, ' ');
            while (size-- && (p - Position0--) <= 72);
          }
        }
      }
      while (*p++ != '\n') ;
      LineNum++;
      NextNonComment(p);
    }
This macro is invoked in definition 33.
Information is placed directly into the statement buffer by ExtractFixedLine, because the fixed format guarantees that all of the information actually belongs to the statement.
Extract variable-format text[36]==
    char *CurrentLine = NoStr;

    Line extraction operation:[34](`ExtractVariableLine')
    { register char c;
      char *Position0 = TEXTSTART - 1;

      if (CurrentLine) obstack_free(Csm_obstk, CurrentLine);
      if (*p) {
        while ((c = *p++) != '\n') {
          if (c != '\t') obstack_1grow(Csm_obstk, c);
          else {
            register int size = TABSIZE(p - Position0);
            Position0 -= size;
            do obstack_1grow(Csm_obstk, ' '); while (size--);
          }
        }
        obstack_1grow(Csm_obstk, '\n');
      }
      obstack_1grow(Csm_obstk, '\0');
      CurrentLine = obstack_finish(Csm_obstk);
      LineNum++;
      NextNonComment(p);
    }
This macro is invoked in definition 33.
In the variable format it is necessary to examine lines that are adjacent in the statement, but not necessarily adjacent in source text. The content of the second line is used to determine whether it is a continuation of the first and, if so, how the two should be joined. Thus the current line is stored temporarily in the character storage module's Obstack by ExtractVariableLine. This allows access to the current line via CurrentLine, and to the next non-comment line via TEXTSTART.
Statement Buffer Construction[37]==
    Load fixed-format text[39]
    #if !Fortran77[1]
    Load variable-format text[40]
    #endif
    Load the Statement Buffer[38]
This macro is invoked in definition 16.
After the statement buffer has been loaded, the string pointed to by Stmt is either null (indicating no further text is available) or it contains an integral number of FORTRAN statements. The last character of the string is a newline character. On each request to load the statement buffer, the previous contents (if any) of the statement buffer and associated coordinate map are discarded and the space reused:
Load the Statement Buffer[38]==
    void
    LoadStmtBuffer()
    { if (Stmt) {
        obstack_free(&Statement, (void *)Stmt);
        obstack_free(&MapData, (void *)Map);
      } else {
        obstack_init(&Statement);
        obstack_init(&MapData);
        NextInitialLine(TEXTSTART);
      }
      Current.IndexInStmt = 1;
      Mark a line change[18](`TEXTSTART')
      if (*TEXTSTART) {
    #if Fortran77[1]
        LoadFixedFormat();
    #else
        if (FixedFormat) LoadFixedFormat(); else LoadVariableFormat();
    #endif
        Add a character to the statement buffer[20](`'\n'')
      }
      Add a character to the statement buffer[20](`'\0'')
      Mark a line change[18](`TEXTSTART')
      Stmt = (char *)obstack_finish(&Statement);
      CoordBase = Map = (MapElement *)obstack_finish(&MapData);
    }
This macro is invoked in definition 37.
During the process of loading the statement buffer, TEXTSTART points to the first character of the first unused line in the source text.
Load fixed-format text[39]==
    void
    LoadFixedFormat()
    { ExtractFixedLine(TEXTSTART);
      Current.IndexInStmt += 72;
      for (;;) {
        char *next;

        if (!(next = IsContinue(TEXTSTART))) return;
        Mark a line change[18](`next')
        ExtractFixedLine(next);
        Current.IndexInStmt += 66;
      }
    }
This macro is invoked in definition 37.
Load variable-format text[40]==
    void
    LoadVariableFormat()
    { register char J, *p;
      int JSQ = 0,      /* 1 if within a string delimited by ' */
          JDQ = 0,      /* 1 if within a string delimited by " */
          JHOLL = 0,    /* 1 if within a Hollerith constant */
          HCount = -1;  /* Possible Hollerith count */

      ExtractVariableLine(TEXTSTART); p = CurrentLine;
      while ((J = *p++) != '\n') {
        Characters within a string[41]
        Characters not within any string[43]
    add:
        Add a character to the statement buffer[20](`J')
      }
      obstack_free(Csm_obstk, CurrentLine); CurrentLine = NoStr;
      NextInitialLine(TEXTSTART);
    }
This macro is invoked in definition 37.
JSQ, JDQ, JHOLL and HCount are state variables. Continuations are dealt with differently depending on the context (inside or outside of a string). LoadVariableFormat guarantees that upon exiting the while loop the last line processed was not continued (recall that the statement buffer is terminated upon reaching the end of a statement that is also the end of a line). Thus it is appropriate to use NextInitialLine to skip comments and verify that there are no spurious continuation lines.
The label add is the common target for adding a character to the
statement buffer regardless of its context.
Jumps to add are used in lieu of more structured control flow because
the decisions about context and when to actually add a character form a
tree.
3.3.2.1 Characters within a string
String context is indicated by a nonzero value of one of the three state
variables JSQ, JDQ or JHOLL.
Each of the three kinds of string requires a check for completion,
which depends upon the kind of string:
Characters within a string[41]==If an ampersand appears in a string context, then it indicates a continuation if and only if the remainder of the line is blank and the next non-comment line's first nonblank character is also an ampersand. If these conditions are not met, the ampersand is simply a character of the string:This macro is invoked in definition 40.if (JSQ && J == '\'') { JSQ = 0; goto add; } if (JDQ && J == '"') { JDQ = 0; goto add; } if (JSQ || JDQ || JHOLL) { if (J != '&') { if (JHOLL && --HCount == 0) JHOLL = 0; goto add; } Advance to the continuing character of a string[42] continue; }
Advance to the continuing character of a string[42]==This macro is invoked in definition 41.{ register char temp, *tempp = p; char *next = IsContinue(TEXTSTART); while ((temp = *tempp++) == ' ') ; if (temp != '\n' || !next) goto add; Mark a line change[18](`next') ExtractVariableLine(next); p = CurrentLine; }
Characters not within any string[43]==An exclamation point indicates a comment that terminates the line, and either of the string quotes indicates a shift to string context. A shift to string context due to the beginning of a Hollerith constant is harder to detect.This macro is invoked in definition 40.switch (J) { case '&': Advance to the continuing character of a non-string[44] continue; case '!': while (*p != '\n') p++; continue; case '\'': JSQ = 1; HCount = -1; break; case '"': JDQ = 1; HCount = -1; break; case '(': case ',': case '/': case '*': HCount = 0; break; case '0': case '1': case '2': case '3': case '4': case '5': case '6': case '7': case '8': case '9': if (HCount >= 0) HCount = HCount * 10 + J - '0'; break; case 'H': case 'h': if (HCount > 0) JHOLL = 1; break; case ' ': if (HCount == 0) break; default: HCount = -1; }
The state variable HCount is used to keep track of the possibility of a Hollerith constant, and its putative length: a negative HCount indicates that no Hollerith constant is possible, a zero value indicates that one is possible but no length has been specified yet, and a positive value indicates that one is possible and (if present) has the specified length. A Hollerith constant can only follow one of four characters (a left parenthesis, a comma, a slash, or an asterisk), and blanks are not allowed within the count portion of a Hollerith constant in the variable format.
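The three-valued role of HCount can be illustrated in isolation. The following is a minimal sketch, not part of the specification: FindHollerith is a hypothetical helper that scans a single line, ignores quoted strings and continuations, and reports where a Hollerith constant would begin.

```c
#include <stddef.h>

/* Hypothetical helper (not in the specification): return the index of the
 * 'H' or 'h' that opens a Hollerith constant in a variable-format line,
 * or -1 if none is found. Quoted strings are deliberately not handled. */
int FindHollerith(const char *line)
{
    int HCount = -1;   /* <0: not possible, 0: possible, >0: putative length */
    int i;

    for (i = 0; line[i] != '\0' && line[i] != '\n'; i++) {
        char J = line[i];

        if (J == '(' || J == ',' || J == '/' || J == '*') {
            HCount = 0;                      /* a count may start here */
        } else if (J >= '0' && J <= '9') {
            if (HCount >= 0) HCount = HCount * 10 + (J - '0');
        } else if (J == 'H' || J == 'h') {
            if (HCount > 0) return i;        /* count complete: Hollerith */
        } else if (J == ' ' && HCount == 0) {
            /* blanks may precede the count, but not interrupt it */
        } else {
            HCount = -1;                     /* any other character */
        }
    }
    return -1;
}
```

With this sketch, `DATA X/3HABC/` yields the index of the `H`, while `CALL F(1 2HAB)` yields -1 because the blank interrupts the count.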
Continuation of the line is indicated by an ampersand, which might be followed by a comment:
Advance to the continuing character of a non-string[44]==Outside of a string context, an ampersand must be followed by either the end of the line or a comment.This macro is invoked in definition 43.{ char *next = IsContinue(TEXTSTART); while ((J = *p++) == ' ') ; if (J != '\n') { if (J != '!') { POSITION e; e.line = Current.LineIndex; e.col = p - CurrentLine - Current.Offset; message(ERROR, "Only a comment can follow &", 0, &e); } while (*p++ != '\n') ; } if (!next) { Add a character to the statement buffer[20](`' '') if (HCount > 0) HCount = -1; next = TEXTSTART; } Mark a line change[18](`next') ExtractVariableLine(next); p = CurrentLine; }
If the continuation line begins with an ampersand, the continuation may
occur within any token.
Otherwise a token may not be partially on one line and partially on the
next.
This restriction is enforced by inserting a space, which in the variable
format terminates any non-string token.
3.4 Character Sequence Normalization
Upper- and lower-case letters are equivalent in FORTRAN except when they
appear in string data.
Also, blanks are insignificant outside of string data in FORTRAN 77 and the
fixed-format input text of FORTRAN 90.
Any attempt to fold characters and remove irrelevant blanks when assembling
a statement leads to rather complex decision processes and code that is
difficult to follow.
It is easier to deal with these problems after the character sequence for a
token has been recognized.
Several different kinds of transformations may be necessary to normalize character strings in different contexts, but all of these operations follow the same basic pattern:
Character Sequence Normalization[45](¶1)==The routine uses each character of the string pointed to by its first argument as an index into the table pointed to by the fourth argument. Depending on the content of the indexed element, a value may be added to the Obstack pointed to by the third argument. (This pattern is similar to that of the ``translate and test'' instructions found on machines beginning in the 1960's.)This macro is invoked in definitions 48 and 49.char * #ifdef PROTO_OK ¶1(char *start, int length, ObstackP obstk, char *table) #else ¶1(start, length, obstk, table) char *start; int length; ObstackP obstk; char *table; #endif /* Normalize a string to an obstack * On entry- * start points to a string to be normalized * length=length of the string to be normalized * obstk points to the area in which the normalized string will be stored * table points to the translation table * On exit- * ¶1 points to the normalized string ***/
Character translation code[46]==This macro is invoked in definition 104.FORTRAN Character Conversion Table[47] IMPLICIT Character Conversion Table[101] Normalization for Fixed-Format Input[48] Normalization for Variable-Format Input[49]
FORTRAN Character Conversion Table[47]==The zero entries in the table indicate special treatment. Although the null character indicates special treatment, that entry is irrelevant because the string to be normalized will never contain a null in this application. There is nothing in the interface to preclude appearance of null characters, and they might be present in other applications of character sequence translation. A table for such an application would have an appropriate entry.This macro is invoked in definition 46.static char F77Fold[] = { 0 , 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 , 9 , 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 0 , '!', '"', '#', '$', '%', '&', '\'', /* Skip spaces */ '(', ')', '*', '+', ',', '-', '.', '/', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', ':', ';', '<', '=', '>', '?', '@','a', 'b', 'c', 'd', 'e', 'f', 'g', /* Change upper to lower */ 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '[', '\\',']', '^', '_', '`', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '{', '|', '}', '~', 127 };
Normalization for Fixed-Format Input[48]==This macro is invoked in definition 46.Character Sequence Normalization[45](`NormalizeFixed') { register char temp; while (length-- > 0) { if (temp = table[*start++]) obstack_1grow(obstk, temp); } obstack_1grow(obstk, '\0'); return (char *)obstack_finish(obstk); }
Normalization for Variable-Format Input[49]==This macro is invoked in definition 46.Character Sequence Normalization[45](`NormalizeVariable') { register char temp; while (length-- > 0 && (temp = table[*start++])) obstack_1grow(obstk, temp); obstack_1grow(obstk, '\0'); return (char *)obstack_finish(obstk); }
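The difference between the two normalization routines can be seen in a small stand-alone sketch. It assumes a plain output buffer in place of an Obstack and builds its table at run time; FoldTable, InitFoldTable, FoldFixed and FoldVariable are hypothetical names introduced only for this illustration.

```c
#include <ctype.h>
#include <stddef.h>

static char FoldTable[256];   /* 0 means "special treatment" */

/* Build a table that folds upper case to lower case and marks the blank
 * (and the null character) as special. */
void InitFoldTable(void)
{
    int c;
    for (c = 0; c < 256; c++)
        FoldTable[c] = (char)((c < 128) ? tolower(c) : c);
    FoldTable[' '] = 0;
    FoldTable[0] = 0;
}

/* Fixed format: special characters are skipped, scanning continues. */
size_t FoldFixed(const char *start, size_t length, char *out)
{
    size_t n = 0;
    while (length-- > 0) {
        char t = FoldTable[(unsigned char)*start++];
        if (t) out[n++] = t;
    }
    out[n] = '\0';
    return n;
}

/* Variable format: the first special character ends the sequence. */
size_t FoldVariable(const char *start, size_t length, char *out)
{
    size_t n = 0;
    char t;
    while (length-- > 0 && (t = FoldTable[(unsigned char)*start++]) != 0)
        out[n++] = t;
    out[n] = '\0';
    return n;
}
```

Applied to the five characters `GO TO`, the fixed-format routine produces `goto` and the variable-format routine produces `go`. The sketch indexes the table through `unsigned char`, which is a choice of this illustration rather than something the specification states.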
Additional context must be used when the characters of the token do not
uniquely determine its classification.
If no more than one of the possible classifications is allowable at every
point in a parse, then the parser can provide sufficient context.
Some cases that the parser cannot resolve can be resolved on the basis of
whether the statement being processed is or is not an assignment.
In a few cases, more specialized processing is needed to classify the
token.
4.1 Parser Resolution of Token Classification
When the parsing context is sufficient to determine which of several
classifications is appropriate, the scanner simply chooses one possible
class for the token.
If tokens of that class are not allowed in the given context, the parser
will invoke the Reparatur routine.
This routine then chooses another possible class.
Effectively, the parser forces the scanner to step through the possible
classifications until one is found that works in the current context or all
have been exhausted.
Resolution of any classification problem is the task of the token processors. One token processor is nominated in the specification of the pattern defining the token. This token processor chooses one possible class, saves information defining a token of another class, and nominates a processor to be applied to that token. Reparatur uses the saved information to invoke that processor if the current token is unacceptable to the parser:
Parser Resolution of Token Classification[50]==If the current token has no alternative, Reparatur terminates immediately with an indication that the parser should report an error. Otherwise it notes that the alternative has been used, establishes the entry conditions for a token processor, and invokes the processor nominated to handle this alternative.This macro is invoked in definition 104.State describing the next possible classification[51] int CurrentClass = NORETURN; Deal With an Unacceptable Token[15] { if (CurrentClass != *klass) return 0; CurrentClass = NORETURN; *klass = NewClass; TokenEnd = NewEnd; Processor(TokenStart, TokenEnd - TokenStart, klass, intrinsic); return 1; }
Three values are required to characterize an alternative classification:
State describing the next possible classification[51]==When specifying an alternative classification, a token processor must note the presence of an alternative and establish these three values:This macro is invoked in definition 50.int NewClass; /* Code for the alternative classification */ char *NewEnd; /* Pointer to the first character beyond the token */ void (*Processor) ELI_ARG((char *, int, int *, int *));
Define the next possible interpretation[52](¶3)==Note that the token processor has already decided upon the classification it will return to the parser at the time it defines the alternative. This macro is invoked in definitions 64, 66, 70, 81, 85, 91, 94, 97, and 100./* klass points to the classification being returned by this processor */ { CurrentClass = *klass; NewClass = ¶1; NewEnd = ¶2; Processor = ¶3; }
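The interplay between a token processor and the repair routine can be reduced to a few lines. This is only a sketch under simplifying assumptions: the class codes, DemoClassify and DemoReparatur are hypothetical, and the saved end pointer and processor are omitted so that only the "note an alternative, then consume it once" protocol remains.

```c
/* Hypothetical class codes for the illustration. */
enum { DEMO_NONE = 0, DEMO_KEYWORD = 1, DEMO_IDENT = 2 };

static int PendingFor = DEMO_NONE;  /* class whose rejection enables the alternative */
static int Alternative = DEMO_NONE; /* class to try next */

/* A token processor: choose one class and record the other possibility. */
void DemoClassify(int *klass)
{
    *klass = DEMO_KEYWORD;
    PendingFor = *klass;
    Alternative = DEMO_IDENT;
}

/* The repair step: returns 1 if the token was re-classified, 0 if no
 * alternative exists and the parser should report an error. */
int DemoReparatur(int *klass)
{
    if (PendingFor != *klass) return 0;
    PendingFor = DEMO_NONE;         /* the alternative is used only once */
    *klass = Alternative;
    return 1;
}
```

A first call to DemoReparatur after DemoClassify switches the class from keyword to identifier; a second call returns 0 because the alternative has been exhausted.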
Recognizing an assignment statement is tedious, but relatively straightforward:
Assignment Statement Recognition[53]==JSQ and JDQ indicate whether the current position is within a string (in which case the actual character is ignored). JEQ is nonzero only after an equals sign not enclosed by parentheses has been seen. ISW is nonzero only if the current character is the one following a right parenthesis that is itself not enclosed in parentheses, and no unparenthesized equals sign appears to the left of the current character.This macro is invoked in definition 104.int #ifdef PROTO_OK IsAssignment(char *p) #else IsAssignment(p) char *p; #endif /* Check for an assignment statement * If the string pointed to by p is an assignment statement then on exit- * IsAssignment=1 * Otherwise on exit- * IsAssignment=0 ***/ { register char J; char JSQ = 0, JDQ = 0, ISW = 0, JEQ = 0; int Level = 0, JHOLL = 0; #if !Fortran77[1] char JCOLON = 0; #endif if (!*p) return 0; while ((J = *p++) != '\n') { if (J == ' ') continue; if (JSQ) { if (J == '\'') JSQ = 0; continue; } if (JDQ) { if (J == '"') JDQ = 0; continue; } Case analysis for assignment[54] if (ISW) { Remember this point for a possible assignment test[56](`p - 1') return JEQ; } } return JEQ; }
The basic idea is that an assignment statement is characterized by an equals sign that is not contained in parentheses. Of the non-assignment statements, only the DO statement has an equals sign that is not contained in parentheses. But the DO statement also has a comma that is not contained in parentheses, while the assignment does not, so the two are easily distinguished.
If ISW=1 after analyzing the current character,
that character determines whether the statement is or is not an assignment:
An equals sign means an assignment, and any other character means some
other kind of statement.
That distinction is expressed by JEQ.
Note, however, that only a part of the statement has been examined.
If the statement is a logical IF statement then the remainder constitutes a
statement in its own right and classification of tokens within that
statement will depend on whether it is an assignment.
Thus it is necessary to remember the position of the current character
(p - 1) as the point at which to begin a new assignment test
if one becomes necessary.
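The basic idea of the test (an unparenthesized equals sign and no unparenthesized comma) can be sketched on its own. LooksLikeAssignment is a hypothetical, deliberately reduced version: it ignores strings, Hollerith constants, the FORTRAN 90 double colon and semicolon, and the early exit after a parenthesized prefix that the logical IF requires.

```c
/* Hypothetical reduced form of the assignment test: returns 1 if the
 * statement has an equals sign outside parentheses and no comma outside
 * parentheses, 0 otherwise. */
int LooksLikeAssignment(const char *p)
{
    int level = 0;       /* current parenthesis nesting depth */
    int seenEquals = 0;  /* unparenthesized '=' seen */

    for (; *p != '\0' && *p != '\n'; p++) {
        switch (*p) {
        case '(': level++; break;
        case ')': if (level > 0) level--; break;
        case ',': if (level == 0) return 0; break;  /* e.g. a DO statement */
        case '=': if (level == 0) seenEquals = 1; break;
        default:  break;
        }
    }
    return seenEquals;
}
```

The classic pair `DO 10 I = 1, 5` and `DO10I = 15` shows why the comma matters: the first is rejected and the second is accepted as an assignment.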
4.2.1 Case analysis for assignment
Individual characters must be examined in order to maintain the state
variables and recognize situations that allow the procedure to terminate
before reaching the end of the statement:
Case analysis for assignment[54]==Hollerith constants cannot occur in assignments, so the recognition of a Hollerith constant results in immediate termination without the need to examine the constant itself. JHOLL controls the recognition of a Hollerith constant: It is set to 1 when a character that might precede a Hollerith constant is seen, and incremented for each digit occurring in a context where a Hollerith constant might be expected. JHOLL is set to 0 after seeing any character that could not precede a Hollerith constant or be part of its count.This macro is invoked in definition 53.#if !Fortran77[1] if (J != ':') JCOLON = 0; #endif switch (J) { case '\'': JSQ = 1; JHOLL = 0; break; case '"': JDQ = 1; JHOLL = 0; break; case '0': case '1': case '2': case '3': case '4': case '5': case '6': case '7': case '8': case '9': if (JHOLL) JHOLL++; break; case 'h': case 'H': if (JHOLL > 1) return 0; JHOLL = 0; break; #if !Fortran77[1] case ':': if (JCOLON) return 0; JCOLON = 1; JHOLL = 0; break; case ';': return JEQ; #endif case '/': case '*': if (Level == 0 && !JEQ) return 0; JHOLL = 1; break; case ',': if (Level == 0) return 0; JHOLL = 1; break; case '(': Level++; ISW = 0; JHOLL = 1; continue; case ')': JHOLL = 0; if (--Level) break; ISW = !JEQ; continue; case '=': if (Level != 0 && !JEQ) return 0; JEQ = 1; default: JHOLL = 0; }
FORTRAN 90's double colon cannot occur in assignments, so the recognition of a double colon results in immediate termination. JCOLON controls the recognition of a double colon: It is set to 1 when a colon is seen and set to 0 by any other character except a space (the case analysis code is not executed for spaces).
FORTRAN 90 also introduces semicolon as a statement terminator, so that
character ends the analysis.
4.2.2 Check the Remainder of a Logical IF
A FORTRAN logical IF statement consists of a parenthesized logical
expression followed by a statement.
When the assignment test is applied to a logical IF statement, it decides
that statement is not an assignment when it reaches the first non-blank
character after the parenthesized logical expression.
This character is the first character of the statement controlled by the
logical expression, and is the point at which the assignment test must be
re-applied.
In order to re-apply the assignment test, normal scanning must be interrupted when the scanner reaches the appropriate point in the input text. This can be guaranteed by using the null character as a ``breakpoint'': Save the first character of the statement controlled by the logical expression, replacing it in the input text with a null character. When the scanner recognizes the null character (which denotes ``end of text''), it invokes the auxiliary scanner auxNUL. The default auxNUL procedure simply returns the location of the null character. By supplying a different version of auxNUL, however, this behavior can be changed:
Check the Remainder of a Logical IF[55]==Note that this strategy could be extended to a variety of different kinds of breakpoint simply by making auxNUL do more complex testing to determine the kind of breakpoint it had reached.This macro is invoked in definition 104.Define an Auxiliary Scanner[12](`auxNUL') { if (NewScanMark) { *start = NewScanMark; NewScanMark = '\0'; Assignment = IsAssignment(start); } return start; }
Setting the breakpoint in this simple case is straightforward:
Remember this point for a possible assignment test[56](¶1)==This macro is invoked in definition 53.{ NewScanMark = *(¶1); *(¶1) = '\0'; }
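The breakpoint technique itself is independent of the scanner. The sketch below assumes a single saved character and uses hypothetical names (SavedChar, SetBreakpoint, ClearBreakpoint); it shows only the save, overwrite and restore steps, not the re-application of the assignment test.

```c
static char SavedChar = '\0';   /* nonzero while a breakpoint is set */

/* Replace the character at 'at' by a null so that scanning stops there. */
void SetBreakpoint(char *at)
{
    SavedChar = *at;
    *at = '\0';
}

/* Called when the scanner meets a null: restore the character if the null
 * was a breakpoint. Returns 1 for a breakpoint, 0 for a real end of text. */
int ClearBreakpoint(char *at)
{
    if (SavedChar == '\0') return 0;
    *at = SavedChar;
    SavedChar = '\0';
    return 1;
}
```

For the text `IF (X) Y = 1`, setting the breakpoint at `Y` makes the text appear to end after the parenthesized expression; clearing it restores `Y` so that scanning can resume there.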
Because the scanner does not operate directly on the text in the buffer provided by the source module, the standard initialization operation must be overridden:
Initialize the scanner[57]==NewStmtBuffer must invoke the statement buffer construction operation and establish the values of TokenEnd and StartLine for the scanner. Because classification of identifiers and keywords depends upon whether the scanner is processing an assignment, NewStmtBuffer must also use IsAssignment to classify the first statement:This macro is invoked in definition 103.Establish a Scan Pointer[9] \ { extern void NewStmtBuffer(); NewStmtBuffer(); }
Create a statement buffer and prepare to scan it[58]==The statement buffer may contain an arbitrary number of statements. Those statements will be separated by semicolons, and the last may or may not be terminated by a semicolon. Any sequence of semicolons has the same effect as a single semicolon, and any sequence of semicolons followed by a newline has the same effect as a newline alone. Either a semicolon or a newline constitutes an end-of-statement token. These rules are embodied in the following specification:This macro is invoked in definition 104.void NewStmtBuffer() { LoadStmtBuffer(); TokenEnd = Stmt; StartLine = Stmt - 1; Assignment = IsAssignment(TokenEnd); }
End-of-statement marker[59]==It consists of a terminal name (xEOS), a regular expression (introduced by $ and terminated by white space) that describes the allowable character sequences, and the name of a token processor (EndOfStmt) nominated to be invoked after the pattern is recognized.This macro is invoked in definition 105.#if Fortran77[1] xEOS: $\n [EndOfStmt] #else xEOS: $\n|;(\040*;)*(\040*\n)? [EndOfStmt] #endif
EndOfStmt must decide whether the sequence ended with a newline (in which case it must arrange for the statement buffer to be refilled) or not (in which case it must merely classify the next statement in the buffer):
End-of-statement token processor[60]==Recall that if ResetScan is 1 when the scanner is entered, the value of TokenEnd is invalid and must be re-established. That will cause the scanner initialization operation to be executed, invoking NewStmtBuffer as described above. This macro is invoked in definition 104.Define a Token Processor[13](`EndOfStmt') { if (start[length-1] == '\n') ResetScan = 1; else Assignment = IsAssignment(TokenEnd); }
Identifiers and Keywords[61]==Keywords also satisfy this definition, and keywords are not reserved. Thus any identifier or keyword will be classified as an xIdent by the scanner generated from this specification, and the distinction must be made by keycheck. Even when the symbol could be a keyword, that keyword might not be acceptable in the current context. If the parser rejects the keyword, the symbol must be re-classified as an identifier. Re-classification is also carried out by a token processor, and two such processors are needed for FORTRAN because there are different re-classification requirements for the two input formats. Finally, keyword recognition in the fixed format requires the ability to recognize some prefix of the character sequence scanned, because the fact that spaces are ignored means that the keyword can be run together with an identifier that follows it. Thus four distinct routines are needed to process identifiers and keywords:This macro is invoked in definition 105.#if Fortran77[1] xIdent: $[a-zA-Z](\040*[a-zA-Z0-9])* [keycheck] #else xIdent: $[a-zA-Z](\040*[a-zA-Z0-9_])* [keycheck] #endif
Token processors for identifiers and keywords[62]==This macro is invoked in definition 104.Keyword Recognition in Fixed-Format Text[68] Re-Classify a Fixed-Format Keyword as an Identifier[67] Re-Classify a Variable-Format Keyword as an Identifier[65] Distinguishing Identifiers From Keywords[63]
Distinguishing Identifiers From Keywords[63]==A FORTRAN 77 compiler will not contain a copy of the routine NormalizeVariable, and therefore the call must not be generated in that case. Similarly, there is no need to include the variable-format keyword test in a FORTRAN 77 compiler.This macro is invoked in definition 62.Define a Token Processor[13](`keycheck') { int k; #if Fortran77[1] CsmStrPtr = NormalizeFixed(start, length, Csm_obstk, F77Fold); #else if (FixedFormat) CsmStrPtr = NormalizeFixed(start, length, Csm_obstk, F77Fold); else { CsmStrPtr = NormalizeVariable(start, length, Csm_obstk, F77Fold); TokenEnd = start + strlen(CsmStrPtr); } #endif if (Assignment) { int dummy = xIdent; mkidn(CsmStrPtr, strlen(CsmStrPtr), &dummy, intrinsic); return; } #if !Fortran77[1] Variable-Format Keyword Test[64] #endif Fixed-Format Keyword Test[66]
In the variable-format case, the normalization process determines the
length of the sequence.
Thus TokenEnd can be set immediately after normalization.
The fixed-format case is more difficult, because spaces can occur within
the sequence and the end is not known until after the keyword test has been
completed.
6.2 Variable-Format Keyword Test
The keyword test is straightforward in the variable-format case.
Keywords are pre-loaded into the string table, and the normalized string is
known to contain exactly the characters of the identifier or keyword.
Therefore mkidn can be used to look the string up and return its
classification and intrinsic attribute value.
If the classification is not that of an identifier, the sequence is
reported as a keyword.
Of course this classification is incorrect if the keyword is not acceptable
in the current context, so the interpretation of the sequence as an
identifier must be noted.
Variable-Format Keyword Test[64]==If the parser rejects the keyword, the only action necessary is to re-classify the sequence as an identifier; the intrinsic attribute remains unchanged. But this re-classification has already been done by the time the token processor nobody is invoked, so that token processor does nothing.This macro is invoked in definition 63.if (!FixedFormat) { mkidn(CsmStrPtr, TokenEnd - start, klass, intrinsic); if (*klass != xIdent) Define the next possible interpretation[52](`xIdent', ` TokenEnd', ` nobody') return; }
Re-Classify a Variable-Format Keyword as an Identifier[65]==This macro is invoked in definition 62.Define a Token Processor[13](`nobody') { }
Fixed-Format Keyword Test[66]==Once the keyword has been identified, the normalized string can be discarded by invoking obstack_free. The keyword's classification is obtained from the table, and if the parser rejects that keyword then the sequence must be presented as an identifier.This macro is invoked in definition 63.if ((k = Keyword(CsmStrPtr)) >= 0) { int n; obstack_free(Csm_obstk, CsmStrPtr); *klass = KeyTable[k].keycode; Define the next possible interpretation[52](`xIdent', ` start + length', ` mkfidn') TokenEnd = start; for (n = 0; n < KeyTable[k].length; n++) while (*TokenEnd++ == ' ') ; } else { int dummy = xIdent; mkidn(CsmStrPtr, strlen(CsmStrPtr), &dummy, intrinsic); return; } }
Because the keyword may be any prefix of the character sequence, TokenEnd must be reset by advancing over the non-blank characters of the keyword.
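Advancing over the keyword can be sketched as a stand-alone function. SkipKeyword is a hypothetical name; it assumes that the text at start is already known to contain keylen non-blank keyword characters, possibly separated by blanks.

```c
/* Hypothetical helper: return a pointer just past the keylen-th non-blank
 * character of the text at start. Blanks inside the keyword are skipped. */
const char *SkipKeyword(const char *start, int keylen)
{
    const char *end = start;
    int n;

    for (n = 0; n < keylen; n++) {
        while (*end == ' ') end++;  /* skip blanks before the next character */
        end++;                      /* consume the keyword character itself */
    }
    return end;
}
```

For the fixed-format text `G O TO10` and the four-character keyword `goto`, the result points at the `1`, so the label becomes the next token.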
If the parser rejects the keyword then the scanner must recover the complete character sequence, normalize it, and use mkidn to obtain the corresponding intrinsic attribute value.
Re-Classify a Fixed-Format Keyword as an Identifier[67]==In this case the classification of the token is known to be ``identifier'', so the value set by mkidn must be ignored. If the particular character sequence has not been seen previously, however, its classification must be set. This behavior is obtained through the use of dummy, which is used to communicate the classification to mkidn and receive the updated classification from mkidn.This macro is invoked in definition 62.Define a Token Processor[13](`mkfidn') { int dummy = *klass; CsmStrPtr = NormalizeFixed(start, length, Csm_obstk, F77Fold); mkidn(CsmStrPtr, strlen(CsmStrPtr), &dummy, intrinsic); }
Keyword Recognition in Fixed-Format Text[68]==File keywds.h contains an initialized declaration of array KeyTable, the generated keyword table. It also declares MAXKWD, the index of the last table element. This macro is invoked in definition 62.typedef struct { /* Definition of a keyword */ char *keychars; /* Character form */ int keycode; /* Syntax code */ int length; /* Length of the keyword string */ } Keywd; #include "keywds.h" /**/ int #ifdef PROTO_OK Keyword(char *c) #else Keyword(c) char *c; #endif /* Find the keyword table entry for a keyword prefix * On entry- * c points to a normalized identifier string * If c has a keyword prefix then on exit- * Keyword=index of that keyword's entry in KeyTable * Otherwise on exit- * Keyword=-1 **/ { int i; for (i = MAXKWD; i >= 0; i--) { register char *p = c, *q = KeyTable[i].keychars; register int different = 0; while (*p && *q && !different) different = *p++ - *q++; if (!different) { if (!*q) return i; } else if (different > 0 && p == c+1) return -1; } return -1; }
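The prefix search can be illustrated with a small hand-written table. Everything here is an assumption made for the example: DemoTable, its four entries and its codes are invented, the table is taken to be sorted in ascending order so that a longer keyword follows any keyword that is its prefix, and the early-exit test of the real routine is left out.

```c
#include <string.h>

typedef struct {
    const char *keychars;   /* normalized keyword text */
    int keycode;            /* invented syntax code */
} DemoKeywd;

/* Invented table, sorted so that "doubleprecision" follows "do". */
static const DemoKeywd DemoTable[] = {
    { "do", 1 }, { "doubleprecision", 2 }, { "goto", 3 }, { "if", 4 }
};
#define DEMO_MAXKWD 3       /* index of the last table element */

/* Return the index of the table entry that is a prefix of c, or -1.
 * Searching from the end finds the longest keyword first. */
int DemoKeyword(const char *c)
{
    int i;
    for (i = DEMO_MAXKWD; i >= 0; i--) {
        size_t n = strlen(DemoTable[i].keychars);
        if (strncmp(c, DemoTable[i].keychars, n) == 0) return i;
    }
    return -1;
}
```

Searching backwards matters for inputs such as `doubleprecisionx`, which would otherwise be matched by the shorter keyword `do`.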
Keywords in I/O Statements[69]==Note here that no classification is specified for the sequence (the specification has no name). Such sequences are given the distinguished classification NORETURN by the scanner, and unless this classification is changed their presence will not be reported to the parser. The character sequences that are valid keywords are pre-loaded into the identifier table, with their classification, so they can be recognized by an invocation of mkidn:This macro is invoked in definition 105.$[a-zA-Z][a-zA-Z\040]*= [mkiokw]
Make a Keyword that is Terminated by =[70]==I/O statement keywords obviously cannot appear in assignment statements, and if mkidn does not change the NORETURN classification then the sequence is not a valid keyword. When the sequence is determined not to be a keyword, the = character is stripped off and keycheck invoked to complete the classification. Even when the sequence is a valid keyword, however, it may not be appearing in an appropriate context. Therefore the processor must prepare for the possibility that the parser will reject this classification.This macro is invoked in definition 104.Define a Token Processor[13](`mkiokw') { if (!Assignment) { CsmStrPtr = NormalizeFixed(start, length, Csm_obstk, F77Fold); mkidn(CsmStrPtr, strlen(CsmStrPtr), klass, intrinsic); if (*klass != NORETURN) { Define the next possible interpretation[52]( `xIdent', ` start + length - 1', ` mkfidn') return; } } TokenEnd = start + length - 1; *klass = xIdent; keycheck(start, length - 1, klass, intrinsic); }
Denotations[71]==Denotations are defined in this section to contain spaces, even though FORTRAN 90 variable-format input permits spaces only in strings. The reason is that the generated scanner must be able to handle either the fixed or the variable format.This macro is invoked in definition 105.Integer Denotations[73] Floating-Point Denotations[78] String Denotations[82] Hollerith Denotations[84] Operator Denotations[86]
Each denotation is represented internally by an intrinsic attribute, whose meaning depends on the particular denotation. A token processor is therefore nominated to compute the intrinsic attribute value for each denotation:
Token processors for denotations[72]==The nominated processor may also perform other duties, as noted in its description.This macro is invoked in definition 104.Make an Integer Value[77] Make a Floating Point Value[81] Make a String Value[83] Make a Hollerith Value[85] Make an Operator Denotation[87]
Integer Denotations[73]==Decimal digits are useful in describing other denotations. To reduce the amount of space occupied by these descriptions, it is useful to define Dig as a shorthand notation for any sequence of digits and spaces, the first of which is a digit:This macro is invoked in definition 71.xIcon: $Dig[74]\040*(Op[75]|Efmt[76])? [mkfint] #if !Fortran77[1] xBcon: $B('[01]+'|\"[01]+\") xOcon: $O('[0-7]+'|\"[0-7]+\") xZcon: $Z('[0-9a-fA-F]+'|\"[0-9a-fA-F]+\") #endif
Dig[74]==An integer followed by a dot or the letter e (or E) can be falsely recognized as a floating-point number. It is not possible to distinguish these cases syntactically, since floating-point denotations and integer denotations are often acceptable in the same context. Therefore the scanner must look far ahead, recognizing the construct beginning with either the dot or the e for what it is, in order to decide that the digit sequence is really an integer. One such construct is a dot-delimited operator, abbreviated by Op, and the other is an E-format descriptor, abbreviated by Efmt:This macro is invoked in definitions 73, 76, 78, 79, 80, 84, 88, 89, and 90.[0-9](\040*[0-9])*
Op[75]==This macro is invoked in definitions 73 and 86.\.\040*[a-zA-Z][a-zA-Z\040]*\.
Efmt[76]==An integer denotation is represented internally by an intrinsic attribute that gives the value of the integer:This macro is invoked in definition 73.(E|e)\040*Dig[74]\040*\.\040*Dig[74]
Make an Integer Value[77]==The token processor mkfint also sets TokenEnd to point to the first character that is neither a digit nor a space. This character would be the dot or letter e, and hence the first character of the following token.This macro is invoked in definition 72.Define a Token Processor[13](`mkfint') { *intrinsic = 0; while (length-- > 0) { register int v = *start - '0'; if (v >= 0 && v < 10) *intrinsic = *intrinsic * 10 + v; else if (*start != ' ') { TokenEnd = start; return; } start++; } #ifdef MONITOR while (TokenEnd[-1] == ' ') TokenEnd--; #endif }
If monitoring is enabled, TokenEnd is backed up over any sequence of
spaces following the integer, so that a lexical monitor won't try to lump
these characters with the integer.
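The evaluation of a digit sequence with embedded spaces can be sketched without the scanner's global state. DemoIntValue is a hypothetical function that returns the value and reports the end of the integer through an extra parameter instead of setting TokenEnd.

```c
/* Hypothetical helper: evaluate the digits in start[0..length-1], ignoring
 * spaces. *end is set to the first character that is neither a digit nor a
 * space, or to start + length if there is none. */
int DemoIntValue(const char *start, int length, const char **end)
{
    int value = 0;

    *end = start + length;
    while (length-- > 0) {
        if (*start >= '0' && *start <= '9') {
            value = value * 10 + (*start - '0');
        } else if (*start != ' ') {
            *end = start;   /* the dot or letter that follows the integer */
            break;
        }
        start++;
    }
    return value;
}
```

For the eight characters `1 2 .eq.` the value is 12 and the end pointer addresses the first dot, which begins the following operator token.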
7.2 Floating-Point Denotations
Floating-point denotations are described by sequences of digits in
conjunction with either a decimal point or an exponent:
Floating-Point Denotations[78]==Double-precision values are indicated by the letter d (or D) as the exponent marker.This macro is invoked in definition 71.xRcon: $Dig[74]Exp[79](`e|E')|Sig[80](Exp[79](`e|E'))? [mkfloat] xDcon: $Dig[74]Exp[79](`d|D')|Sig[80](Exp[79](`d|D'))? [mkfloat]
Exponents are described by the following shorthand:
Exp[79](¶1)==A significand is a sequence of digits containing a decimal point. There may be digits before and/or after the point:This macro is invoked in definition 78.(¶1)\040*(\+|\-)?\040*Dig[74]
Sig[80]==A floating-point denotation is represented internally by the index of a string in the character storage:This macro is invoked in definition 78.(Dig[74]\040*\.(\040*[0-9])*|\.\040*Dig[74])
Make a Floating Point Value[81]==The token processor mkfloat also establishes the initial digit string (if one exists) as a possible token if the floating-point denotation is unacceptable at this point in the parse.This macro is invoked in definition 72.Define a Token Processor[13](`mkfloat') { int dummy = xRcon; CsmStrPtr = NormalizeFixed(start, length, Csm_obstk, F77Fold); mkidn(CsmStrPtr, strlen(CsmStrPtr), &dummy, intrinsic); if (*start == '.') return; while (length-- > 0) { register int temp = *start; if (temp >= '0' && temp <= '9' || temp == ' ') start++; } Define the next possible interpretation[52](`xIcon', ` start', ` mkfint') }
String Denotations[82]==
    #if Fortran77[1]
    xScon: $' (fstr) [mkfstr]
    #else
    xScon: $['\"] (fstr) [mkfstr]
    #endif
This macro is invoked in definition 71.
The auxiliary scanner fstr extracts the body of the string from the statement, replacing each doubled internal quote by a single quote. It stores this body in the space provided by the character storage module. The token processor mkfstr then uses mkidn to obtain a unique index for the string:
Make a String Value[83]==
    Define an Auxiliary Scanner[12](`fstr')
    /* Additional postcondition-
     *   CsmStrPtr points to the transformed string
     ***/
    {
      register char temp, quote;

      quote = *start++;
      for (;;) {
        if ((temp = *start++) == '\n') {
          message(ERROR, "Closing quote missing", 0, &curpos);
          start--;
          break;
        }
        if (temp == quote) {
          if (*start != quote) break;
          start++;
        }
        obstack_1grow(Csm_obstk, temp);
      }
      obstack_1grow(Csm_obstk, '\0');
      CsmStrPtr = (char *)obstack_finish(Csm_obstk);
      return start;
    }

    Define a Token Processor[13](`mkfstr')
    {
      int dummy = xScon;

      mkidn(CsmStrPtr, strlen(CsmStrPtr), &dummy, intrinsic);
    }
This macro is invoked in definition 72.
It would have been possible to combine fstr and mkfstr into a single routine, but this was not done because fstr by itself is useful for extracting the file name of an INCLUDE directive.
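The quote handling performed by fstr can be tried outside the Eli framework. The sketch below is an illustration under stated assumptions: collapse_quotes is a hypothetical stand-in that writes into a caller-supplied buffer instead of the obstack, and it reports an unterminated string by stopping at the newline rather than by calling message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for fstr (not the specification's code).
 * src points at the opening quote. The string body is copied to out with
 * each doubled quote collapsed to one. The result points just past the
 * closing quote, or at the newline if the closing quote is missing. */
const char *collapse_quotes(const char *src, char *out, size_t outsize)
{
    assert(src != NULL && out != NULL && outsize > 0);
    char quote = *src++;
    size_t n = 0;
    for (;;) {
        char c = *src;
        if (c == '\n' || c == '\0') break;   /* unterminated string */
        src++;
        if (c == quote) {
            if (*src != quote) break;        /* closing quote */
            src++;                           /* doubled quote: keep one */
        }
        if (n + 1 < outsize) out[n++] = c;   /* silently truncate if full */
    }
    out[n] = '\0';
    return src;
}
```

Because the opening character is remembered in quote, the same loop serves both the apostrophe and the double quote permitted in FORTRAN 90.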
Hollerith Denotations[84]==
    xHcon: $Dig[74](H|h) [mkholl]
This macro is invoked in definition 71.
A Hollerith denotation is represented internally by the index of a string in the character storage. The token processor evaluates the length of the string that should follow the letter h and then collects it. It also establishes the sequence of digits preceding the letter h as an integer, in case the Hollerith denotation is syntactically unacceptable:
Make a Hollerith Value[85]==
    Define a Token Processor[13](`mkholl')
    {
      register char temp, *p = start;
      register int count = 0, digits = length - 1;
      int dummy = xScon;

      while (digits--) {
        register int v = *p++ - '0';
        if (v >= 0) count = count * 10 + v;
      }
      *intrinsic = count;

      p++;
      while (count > 0 && (temp = *p++) != '\n') {
        obstack_1grow(Csm_obstk, temp);
        count--;
      }
      obstack_1grow(Csm_obstk, '\0');
      CsmStrPtr = (char *)obstack_finish(Csm_obstk);

      if (count) { /* Return the integer preceding the H */
        obstack_free(Csm_obstk, CsmStrPtr);
        *klass = xIcon;
        TokenEnd = start + length - 1;
        return;
      }

      mkidn(CsmStrPtr, strlen(CsmStrPtr), &dummy, intrinsic);
      *klass = xScon;
      Define the next possible interpretation[52](
        `xIcon',
        ` start + length - 1',
        ` mkfint')
      TokenEnd = p;
    }
This macro is invoked in definition 72.
Collection of the string is terminated by either exhaustion of the count or arrival at the end of the available text (indicated by \n). Note that arrival at the end of the text is not an error, but simply indicates that the construct should not be regarded as a Hollerith denotation. Thus mkholl simply falls back and returns the integer count. The value of the intrinsic attribute has already been set (before using the count to extract characters), and the character string extracted is discarded.
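The count-then-collect logic can be sketched as a free-standing function. This is an illustration only: scan_hollerith and its parameters are hypothetical, a fixed buffer replaces the obstack, and a NULL result stands for the fallback in which only the integer count is returned.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch (not the specification's code).
 * text points at the digits of a candidate such as "5HHELLO\n";
 * ndigits is the number of characters preceding the letter H.
 * *count_out always receives the integer value of those digits.
 * Returns a pointer just past the collected characters, or NULL when the
 * text ends first and the token should be treated as an integer. */
const char *scan_hollerith(const char *text, size_t ndigits,
                           char *out, size_t outsize, int *count_out)
{
    assert(text != NULL && out != NULL && outsize > 0 && count_out != NULL);
    const char *p = text;
    int count = 0;
    for (size_t i = 0; i < ndigits; i++) {
        int v = *p++ - '0';
        if (v >= 0 && v <= 9) count = count * 10 + v;  /* spaces are skipped */
    }
    *count_out = count;

    p++;                                   /* step over the letter H */
    size_t n = 0;
    int remaining = count;
    while (remaining > 0 && *p != '\n' && *p != '\0') {
        if (n + 1 < outsize) out[n++] = *p;
        p++;
        remaining--;
    }
    out[n] = '\0';
    return remaining > 0 ? NULL : p;
}
```

With the input 3Hab followed by a newline, the count is 3 but only two characters are available, so the sketch returns NULL and the caller keeps the integer 3.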
Operator Denotations[86]==
    #if Fortran77[1]
    $Op[75] [mkfopr]
    #else
    xDop: $Op[75] [mkfopr]
    #endif
This macro is invoked in definition 71.
An operator denotation is represented internally by the index of its normalized string (including the bounding dots) in the character storage:
Make an Operator Denotation[87]==
    Define a Token Processor[13](`mkfopr')
    {
    #if Fortran77[1]
      CsmStrPtr = NormalizeFixed(start, length, Csm_obstk, F77Fold);
    #else
      if (FixedFormat)
        CsmStrPtr = NormalizeFixed(start, length, Csm_obstk, F77Fold);
      else {
        CsmStrPtr = NormalizeVariable(start, length, Csm_obstk, F77Fold);
        if (length != strlen(CsmStrPtr)) {
          message(ERROR,"Space within an operator",0,&curpos);
          obstack_free(Csm_obstk, CsmStrPtr);
          *intrinsic = 0;
          return;
        }
      }
    #endif
      mkidn(CsmStrPtr, strlen(CsmStrPtr), klass, intrinsic);
    }
This macro is invoked in definition 72.
In the FORTRAN 90 variable format, an operator denotation cannot contain spaces. Thus if the normalized version of the string is not the same length as the original then the original contained a space and an error must be reported. The normalized string is discarded in this case.
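The length comparison used to detect an embedded space can be demonstrated with a simplified normalizer. This is a sketch under assumptions: normalize_operator and operator_has_space are hypothetical names, and the normalizer here only drops spaces and folds letters to lower case, which is assumed to approximate what NormalizeVariable does with F77Fold.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical simplified normalizer: drops spaces, folds to lower case.
 * Returns the length of the normalized string written to out. */
size_t normalize_operator(const char *start, size_t length,
                          char *out, size_t outsize)
{
    assert(start != NULL && out != NULL && outsize > length);
    size_t n = 0;
    for (size_t i = 0; i < length; i++) {
        unsigned char c = (unsigned char)start[i];
        if (c == ' ') continue;            /* spaces vanish */
        out[n++] = (char)tolower(c);
    }
    out[n] = '\0';
    return n;
}

/* Nonzero if the operator text contained a space: normalization only ever
 * removes spaces, so any change in length reveals one. */
int operator_has_space(const char *start, size_t length)
{
    char buf[64];
    assert(length < sizeof buf);
    return normalize_operator(start, length, buf, sizeof buf) != length;
}
```

The check works only because case folding preserves length; a normalizer that removed other characters would need a different test.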
Format descriptors containing dots or beginning with sequences of digits need to be recognized specially:
Format Descriptors[88]==
    #if Fortran77[1]
    xFcon: $[IiFfDd]D.D[89]|[EeGg]D.D[89](Efw[90])? [mkidn]
    #else
    xFcon: $[IiBbOoZzFfDd]D.D[89]|(E[NnSs]?|e[NnSs]?|G|g)D.D[89](Efw[90])? [mkidn]
    #endif
    xPcon: $((\+|\-)\040*)?Dig[74](P|p) [mkfmti]
    xXcon: $Dig[74](X|x) [mkfmti]
This macro is invoked in definition 105.
Here D.D is shorthand for two digit sequences separated by a dot and Efw is shorthand for an exponent field width specification:
D.D[89]==
    \040*Dig[74]\.\040*Dig[74]
This macro is invoked in definition 88.
Efw[90]==
    \040*[Ee]\040*Dig[74]
This macro is invoked in definition 88.
An xFcon cannot be anything but a format descriptor, so mkidn is used to enter the character sequence into permanent character storage and set the intrinsic attribute to index that entry. Either of the other descriptors could be an integer followed by an identifier or keyword. In addition to entering the string into the character storage, we define xIcon as an alternate interpretation of the token should the parse fail given the initial one:
Make a Format Descriptor[91]==
    Define a Token Processor[13](`mkfmti')
    {
      register char c = *start;

      mkidn(start, length, klass, intrinsic);
      if (c != '+' && c != '-')
        Define the next possible interpretation[52](
          `xIcon',
          ` start + length - 1',
          ` mkfint')
    }
This macro is invoked in definition 104.
Concatenation Operator[92]==
    $\/\040*\/ [mkconc]
    $\/ [mkslsh]
This macro is invoked in definition 105.
Literals in the grammar do not have names, and therefore we have no names to use in the specifications that nominate token processors for those literals. One solution would be to replace the literals in the grammar by non-literal terminals (thus naming them), and to use the non-literal terminals in the specification above. That solution would reduce the documentation value of the grammar, however. A preferable solution is to supply an additional specification that simply associates names with the literals:
Literal recognized when dealing with the concatenation operator[93]==
    $\/ Slash
    $\/\/ Concat
This macro is invoked in definition 106.
These names can then be used in the normal way by the token processors:
Token processors for the concatenation operator[94]==
    Define a Token Processor[13](`mkslsh')
    { *klass = Slash; }

    Define a Token Processor[13](`mkconc')
    {
      *klass = Concat;
      Define the next possible interpretation[52](`Slash', ` start + 1', ` mkslsh')
    }
This macro is invoked in definition 104.
Array Constructor Brackets[95]==
    #if !Fortran77[1]
    $\(\040*\/ [mklabr]
    $\/\040*\) [mkrabr]
    $\) [mkrpar]
    #endif
    $\( [mklpar]
This macro is invoked in definition 105.
We need to associate names with the literals:
Literals recognized when dealing with array constructor brackets[96]==
    #if !Fortran77[1]
    $\(\/ LeftAcBracket
    $\/\) RightAcBracket
    $\) RightParen
    #endif
    $\( LeftParen
This macro is invoked in definition 106.
These names can then be used by the token processors:
Token processors for array constructor brackets[97]==
    Define a Token Processor[13](`mklpar')
    { *klass = LeftParen; }

    #if !Fortran77[1]
    Define a Token Processor[13](`mkrpar')
    { *klass = RightParen; }

    Define a Token Processor[13](`mklabr')
    {
      *klass = LeftAcBracket;
      Define the next possible interpretation[52](`LeftParen', ` start + 1', ` mklpar')
    }

    Define a Token Processor[13](`mkrabr')
    {
      *klass = RightAcBracket;
      Define the next possible interpretation[52](`Slash', ` start + 1', ` mkslsh')
    }
    #endif
This macro is invoked in definition 104.
Consider the FORTRAN 90 IMPLICIT statement IMPLICIT INTEGER (A-Z) (I-N). Here the character sequence (A-Z) is an expression defining the kind of integer values and the character sequence (I-N) is the letter range. The distinguishing property is that the letter range is followed by the end of the statement, whereas the expression is not. Clearly this distinction requires looking beyond a putative letter range to see whether the next character is a comma, semicolon or newline. Note that it is not possible to make the decision with any smaller context than the entire letter sequence plus the following character:
Letter Ranges for IMPLICIT Statements[98]==
    xImpl: $\(Range[99](,Range[99])*\)\040*(,|;|\n) [mkimpl]
This macro is invoked in definition 105.
Range[99]==
    \040*[a-zA-Z]\040*(-\040*[a-zA-Z]\040*)?
This macro is invoked in definition 98.
It is clear that most sequences classified by the generated scanner as xImpl will not, in fact, represent letter ranges. The token processor mkimpl must therefore make provision for reclassifying the first character of the sequence as a left parenthesis:
Letter Sequences in IMPLICIT Statements[100]==
    Define a Token Processor[13](`mkimpl')
    {
      int dummy;

      CsmStrPtr = NormalizeFixed(start, length - 1, Csm_obstk, FoldIntrinsic);
      TokenEnd = start + length - 1;
      dummy = xIdent;
      mkidn(CsmStrPtr, strlen(CsmStrPtr), &dummy, intrinsic);
      Define the next possible interpretation[52](`LeftParen', ` start + 1', ` mklpar')
    }
This macro is invoked in definition 104.
Note that TokenEnd is set to begin the next scan with the comma, semicolon or newline that followed the letter range and that this character is not part of the string being normalized.
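The lookahead required by the xImpl pattern can be made concrete with a hand-written recognizer. This is a sketch, not the generated scanner: match_letter_ranges and its helpers are hypothetical names, and the code simply mirrors the regular expression for a parenthesized list of ranges followed by a comma, semicolon or newline.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helpers mirroring the Range and xImpl patterns. */
static const char *skip_spaces(const char *p)
{
    while (*p == ' ') p++;
    return p;
}

/* Matches: spaces, letter, spaces, optionally '-' spaces letter spaces. */
static const char *match_range(const char *p)
{
    p = skip_spaces(p);
    if (!isalpha((unsigned char)*p)) return NULL;
    p = skip_spaces(p + 1);
    if (*p == '-') {
        p = skip_spaces(p + 1);
        if (!isalpha((unsigned char)*p)) return NULL;
        p = skip_spaces(p + 1);
    }
    return p;
}

/* Returns a pointer to the terminating ',', ';' or newline when p starts a
 * letter-range list, or NULL otherwise. The terminator itself is left for
 * the next scan, as mkimpl does by adjusting TokenEnd. */
const char *match_letter_ranges(const char *p)
{
    assert(p != NULL);
    if (*p != '(') return NULL;
    p = match_range(p + 1);
    while (p != NULL && *p == ',') p = match_range(p + 1);
    if (p == NULL || *p != ')') return NULL;
    p = skip_spaces(p + 1);
    return (*p == ',' || *p == ';' || *p == '\n') ? p : NULL;
}
```

Applied to the text (A-Z) (I-N) from the example statement, the recognizer rejects (A-Z) because a left parenthesis follows it, and accepts (I-N) because a newline follows.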
The intrinsic attribute established for the letter range should be the normalized sequence without the enclosing parentheses, and that is accomplished by using the following translation table:
IMPLICIT Character Conversion Table[101]==
    static char FoldIntrinsic[] = {
      0 , 1 , 2 , 3 , 4 , 5 , 6 , 7 ,
      8 , 9 , 10, 11, 12, 13, 14, 15,
      16, 17, 18, 19, 20, 21, 22, 23,
      24, 25, 26, 27, 28, 29, 30, 31,
      0 , '!', '"', '#', '$', '%', '&', '\'', /* Skip spaces */
      0 , 0 , '*', '+', ',', '-', '.', '/',
      '0', '1', '2', '3', '4', '5', '6', '7',
      '8', '9', ':', ';', '<', '=', '>', '?',
      '@','a', 'b', 'c', 'd', 'e', 'f', 'g', /* Change upper to lower */
      'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o',
      'p', 'q', 'r', 's', 't', 'u', 'v', 'w',
      'x', 'y', 'z', '[', '\\',']', '^', '_',
      '`', 'a', 'b', 'c', 'd', 'e', 'f', 'g',
      'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o',
      'p', 'q', 'r', 's', 't', 'u', 'v', 'w',
      'x', 'y', 'z', '{', '|', '}', '~', 127
    };
This macro is invoked in definition 46.
Here, in addition to the entry for the space character, the entries for the two parentheses are zero. Thus these characters will be skipped when normalizing the sequence. Only the letters, dashes and internal commas remain.
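The effect of such a table can be shown with a small table-driven normalizer. This is a sketch under assumptions: fold, init_fold and normalize are hypothetical names, the table is built at run time rather than written out, ASCII is assumed, and the loop is only presumed to resemble what NormalizeFixed does with FoldIntrinsic.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical translation table: a zero entry means "skip this character". */
static char fold[128];

/* Build a table equivalent in spirit to FoldIntrinsic (ASCII assumed). */
void init_fold(void)
{
    for (int c = 0; c < 128; c++) fold[c] = (char)c;
    for (int c = 'A'; c <= 'Z'; c++) fold[c] = (char)(c - 'A' + 'a');
    fold[' '] = 0;   /* skip spaces */
    fold['('] = 0;   /* skip the enclosing parentheses */
    fold[')'] = 0;
}

/* Copy start[0..length-1] to out through the table, dropping every
 * character whose entry is zero. Returns the normalized length. */
size_t normalize(const char *start, size_t length, char *out, size_t outsize)
{
    assert(start != NULL && out != NULL && outsize > length);
    size_t n = 0;
    for (size_t i = 0; i < length; i++) {
        unsigned char c = (unsigned char)start[i];
        if (c >= 128) continue;          /* outside the table: ignore */
        if (fold[c] != 0) out[n++] = fold[c];
    }
    out[n] = '\0';
    return n;
}
```

Normalizing the text (A-Z, I-N) through this table yields a-z,i-n, which matches the description of the intrinsic attribute: letters, dashes and internal commas only.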
scan.clp[102]==
    The Command Line Processing Module[3]
This macro is attached to a product file.
scanops.h[103]==
    #include "eliproto.h"

    Initialize the scanner[57]
    Set Token Coordinates[21]
This macro is attached to a product file.
scan.c[104]==
    static char RCSid[] = "$Id: Scan.fw,v 1.22 1998/07/07 20:40:43 waite Exp $";

    #include <string.h>
    #include "eliproto.h"

    #if Fortran77[1]
    #define FixedFormat 1
    #endif

    #if !Fortran77[1]
    #include "Include.h"
    #include "clp.h"
    #include "CmdLineIncl.h"
    #endif

    Eli Library Modules Used[2]
    The Generated Scanner Module[8]

    #include "termcode.h"
    #include "litcode.h"
    #include "tabsize.h"

    Character translation code[46]

    #if !Fortran77[1]
    extern void NextIncludedLine ELI_ARG((char *));
    extern char *fstr ELI_ARG((char *start, int length));
    #endif

    Units of Text[16]

    static int Assignment = 0;      /* Nonzero if the statement is an assignment */
    static char NewScanMark = '\0'; /* Trigger for restarting the scanner */

    Assignment Statement Recognition[53]
    Check the Remainder of a Logical IF[55]
    Parser Resolution of Token Classification[50]
    Create a statement buffer and prepare to scan it[58]
    Token processors for identifiers and keywords[62]
    Token processors for denotations[72]
    Make a Keyword that is Terminated by =[70]
    Make a Format Descriptor[91]
    Token processors for the concatenation operator[94]
    Token processors for array constructor brackets[97]
    Letter Sequences in IMPLICIT Statements[100]
    End-of-statement token processor[60]
This macro is attached to a product file.
scan.gla[105]==
    Identifiers and Keywords[61]
    Keywords in I/O Statements[69]
    Denotations[71]
    Format Descriptors[88]
    Concatenation Operator[92]
    Array Constructor Brackets[95]
    Letter Ranges for IMPLICIT Statements[98]
    End-of-statement marker[59]
This macro is attached to a product file.
scan.delit[106]==
    Literal recognized when dealing with the concatenation operator[93]
    Literals recognized when dealing with array constructor brackets[96]
This macro is attached to a product file.