General Information
Tutorials
Reference Manuals
Libraries
Translation Tasks
Tools
Administration
|
Syntactic AnalysisImproving Error Recovery in the Generated Parser
In some cases, the same pattern in the input text may represent
different tokens in the grammar. Knowing which token the pattern
represents may be based on other available information.
When the parser determines that it cannot accept the next
look-ahead token, the Boolean function
int Reparatur (POSITION *coord, int *syncode, int *intrinsic);
This allows the user to change the look-ahead token based on
other available information. If the function returns
By default, the Eli system provides file `dfltrepar.c'
containing a definition of the function In case of erroneous input the generated parser invokes its error recovery. The error recovery works completely automatically and usually behaves satisfactorily, in that it produces a tree that is close to the one that might be expected if there were no syntactic errors. This enables the compiler to go on and detect additional semantic errors. It is also possible to generate a program that will terminate after parsing if syntactic errors were detected. To generate a program with this property, simply add the following parameter to the request for derivation (see define of Products and Parameters Reference):
+define='STOPAFTERBADPARSE' There are a few possibilities to control the error recovery in order to improve its behavior. To understand the control facilities it is necessary to know how the error recovery works in principle. If an error in the input is detected two methods for error repair are used. The first method tries to "correct" the error by deleting, inserting, or replacing one input symbol. The repair is considered successful, if the next 4 parsing steps don't lead to another error. The use of this method is optional. If the first method is not used or if it failed the second method performs a complete correction without backtracking. It skips input symbols until a so-called restart point is reached. The restart point is a symbol where normal parsing can be resumed. Before normal parsing resumes error correction takes place. Input symbols are inserted in order to construct a syntactically correct input and the associated semantic actions are executed. The intention is to pass consistent information to the following compiler phases, which therefore do not have to bother with syntax errors. The second method for error recovery can be controlled by providing additional information. The intention is to decrease the probability of error avalanches caused by wrong error repair decisions. As a running example, we use an ALGOL-like language defined by the following grammar:
block : 'begin' declarations statements 'end' . declarations : declarations declaration ';' / . declaration : 'real' 'identifier' / / 'procedure' 'identifier' ';' statement . statements : statement / statements ';' statement . statement : 'identifier' / block . Three types of error recovery information can be specified by the user in files of type `.perr':
List SeparatorsThe error recovery has a major drawback when applied to errors in lists, defined, e.g., as
statements : statement / statements ';' statement . A missing delimiter ';' cannot be inserted in order to parse the rest of the list. This could lead to an infinite loop in the parser. Therefore errors like
begin identifier begin identifier ; ... cannot be repaired by inserting the semicolon ';' but by deleting the two symbols 'begin' and 'identifier'. The following specification in a `.perr' file defines the mentioned terminals as list separators.
$SEPA ';' . ',' . A list separator will always be inserted if a restart point can be found immediately behind it. In this case the rest of the list can be parsed without the danger of getting into an infinite loop.
Semantic BracketsProgramming languages have bracketed structures like 'begin' and 'end' which delimit not only the syntactic structure of "block" but also the scope of identifiers. Deleting or inserting such semantically significant parentheses is highly probably to cause avalanches of syntactic and semantic errors. Therefore, the error recovery should not change the structures of a program as far as it concerns scopes of identifiers or similar semantic concepts. Consider the following erroneous input:
begin procedure identifier ; begin real identifier ; identifier ; real identifier ; identifier ; Inserting the terminal 'end' before the second "real declaration" corrects the program syntactically but may lead to a semantic error in the last line, as the scope structure is changed. The specification
$BRACKET 'begin' . 'end' . in a file of type `.perr' declares the mentioned terminals to be delimiters of semantically significant regions (semantic delimiters). These terminals are not inserted unless the restart point is end of input or the restart point itself is specified as such a delimiter.
Unsafe Restart PointsUsually there are a few terminals not suited as restart points. The reason is that is programming languages terminals like 'identifier' or 'number' occur in many different syntactic positions. Consider the error
begin real identifier identifier ; real identifier ... There is no safe way to tell whether the second identifier belongs to a statement or to a declaration. If it is used as a restart point, the error is corrected to
begin real identifier ; identifier ; real identifier ... This corresponds to a transition from the declaration part into the statement part of the block, a frequent cause for error avalanches. In general, terminals like 'identifier' or 'number' are not feasible as restart points. The specification
$SKIP 'identifier' . 'integer_number' . 'real_number' . in a type `.perr' file defines the mentioned terminals as unsafe restart points. Unsafe restart points are skipped in case of an error in order to search for restart points more feasible. With the above specification the second identifier in the mentioned example will be skipped. Parsing resumes at the following semicolon without carrying out a transition to the statement part.
|