Eli   Documents Get Eli: Translator Construction Made Easy at SourceForge.net.
    Fast, secure and Free Open Source software downloads

General Information

 o Eli: Translator Construction Made Easy
 o Global Index
 o Frequently Asked Questions
 o Typical Eli Usage Errors

Tutorials

 o Quick Reference Card
 o Guide For new Eli Users
 o Release Notes of Eli
 o Tutorial on Name Analysis
 o Tutorial on Type Analysis
 o Typical Eli Usage Errors

Reference Manuals

 o User Interface
 o Eli products and parameters
 o LIDO Reference Manual
 o Typical Eli Usage Errors

Libraries

 o Eli library routines
 o Specification Module Library

Translation Tasks

 o Lexical analysis specification
 o Syntactic Analysis Manual
 o Computation in Trees

Tools

 o LIGA Control Language
 o Debugging Information for LIDO
 o Graphical ORder TOol

 o FunnelWeb User's Manual

 o Pattern-based Text Generator
 o Property Definition Language
 o Operator Identification Language
 o Tree Grammar Specification Language
 o Command Line Processing
 o COLA Options Reference Manual

 o Generating Unparsing Code

 o Monitoring a Processor's Execution

Administration

 o System Administration Guide

Mail Home

Syntactic Analysis

Previous Chapter Next Chapter Table of Contents


Improving Error Recovery in the Generated Parser

In some cases, the same pattern in the input text may represent different tokens in the grammar. Knowing which token the pattern represents may be based on other available information. When the parser determines that it cannot accept the next look-ahead token, the boolean function Reparatur is called:

int Reparatur (POSITION *coord, int *syncode, int *intrinsic);

This allows the user to change the look-ahead token based on other available information. If the function returns 0, then the token has not been altered and the generated parser continues with its normal error recovery. If the function returns 1, it is assumed that the passed in attributes of the token have been changed (in particular syncode), and the generated parser rechecks the look-ahead token to see if it can accept it.

By default, the Eli system provides file `dfltrepar.c' containing a definition of the function Reparatur that always returns 0. To override the default, the user must provide a new definition of the function Reparatur in some C file.

In case of erroneous input the generated parser invokes its error recovery. The error recovery works completely automatically and usually behaves satisfactorily, in that it produces a tree that is close to the one that might be expected if there were no syntactic errors. This enables the compiler to go on and detect additional semantic errors.

It is also possible to generate a program that will terminate after parsing if syntactic errors were detected. To generate a program with this property, simply add the following parameter to the request for derivation (see define of Products and Parameters Reference):

+define='STOPAFTERBADPARSE'

There are a few possibilities to control the error recovery in order to improve its behavior. To understand the control facilities it is necessary to know how the error recovery works in principle.

If an error in the input is detected two methods for error repair are used. The first method tries to "correct" the error by deleting, inserting, or replacing one input symbol. The repair is considered successful, if the next 4 parsing steps don't lead to another error. The use of this method is optional. If the first method is not used or if it failed the second method performs a complete correction without backtracking. It skips input symbols until a so-called restart point is reached. The restart point is a symbol where normal parsing can be resumed. Before normal parsing resumes error correction takes place. Input symbols are inserted in order to construct a syntactically correct input and the associated semantic actions are executed. The intention is to pass consistent information to the following compiler phases, which therefore do not have to bother with syntax errors.

The second method for error recovery can be controlled by providing additional information. The intention is to decrease the probability of error avalanches caused by wrong error repair decisions. As a running example, we use an ALGOL-like language defined by the following grammar:

block : 'begin' declarations statements 'end' .
declarations : declarations declaration ';' / .
declaration : 'real' 'identifier' /
                / 'procedure' 'identifier' ';' statement .
statements : statement / statements ';' statement .
statement : 'identifier' / block .

Three types of error recovery information can be specified by the user in files of type `.perr':

The error recovery has a major drawback when applied to errors in lists, defined, e.g., as

statements : statement / statements ';' statement .

A missing delimiter ';' cannot be inserted in order to parse the rest of the list. This could lead to an infinite loop in the parser. Therefore errors like

begin identifier begin identifier ; ...

cannot be repaired by inserting the semicolon ';' but by deleting the two symbols 'begin' and 'identifier'.

The following specification in a `.perr' file defines the mentioned terminals as list separators.

$SEPA ';' . ',' .

A list separator will always be inserted if a restart point can be found immediately behind it. In this case the rest of the list can be parsed without the danger of getting into an infinite loop.

Programming languages have bracketed structures like 'begin' and 'end' which delimit not only the syntactic structure of "block" but also the scope of identifiers. Deleting or inserting such semantically significant parentheses is highly probably to cause avalanches of syntactic and semantic errors. Therefore, the error recovery should not change the structures of a program as far as it concerns scopes of identifiers or similar semantic concepts.

Consider the following erroneous input:

begin
  procedure identifier ;
  begin
    real identifier ;
    identifier ;
    real identifier ;
    identifier ;

Inserting the terminal 'end' before the second "real declaration" corrects the program syntactically but may lead to a semantic error in the last line, as the scope structure is changed.

The specification

$BRACKET 'begin' . 'end' .

in a file of type `.perr' declares the mentioned terminals to be delimiters of semantically significant regions (semantic delimiters). These terminals are not inserted unless the restart point is end of input or the restart point itself is specified as such a delimiter.

Usually there are a few terminals not suited as restart points. The reason is that is programming languages terminals like 'identifier' or 'number' occur in many different syntactic positions. Consider the error

begin real identifier identifier ; real identifier ...

There is no safe way to tell whether the second identifier belongs to a statement or to a declaration. If it is used as a restart point, the error is corrected to

begin real identifier ; identifier ; real identifier ...

This corresponds to a transition from the declaration part into the statement part of the block, a frequent cause for error avalanches. In general, terminals like 'identifier' or 'number' are not feasible as restart points.

The specification

$SKIP 'identifier' . 'integer_number' . 'real_number' .

in a type `.perr' file defines the mentioned terminals as unsafe restart points. Unsafe restart points are skipped in case of an error in order to search for restart points more feasible.

With the above specification the second identifier in the mentioned example will be skipped. Parsing resumes at the following semicolon without carrying out a transition to the statement part.


Previous Chapter Next Chapter Table of Contents