Eli   Documents

General Information

 o Eli: Translator Construction Made Easy
 o Global Index
 o Frequently Asked Questions
 o Typical Eli Usage Errors

Tutorials

 o Quick Reference Card
 o Guide For new Eli Users
 o Release Notes of Eli
 o Tutorial on Name Analysis
 o Tutorial on Scope Graphs
 o Tutorial on Type Analysis
 o Typical Eli Usage Errors

Reference Manuals

 o User Interface
 o Eli products and parameters
 o LIDO Reference Manual
 o Typical Eli Usage Errors

Libraries

 o Eli library routines
 o Specification Module Library

Translation Tasks

 o Lexical analysis specification
 o Syntactic Analysis Manual
 o Computation in Trees

Tools

 o LIGA Control Language
 o Debugging Information for LIDO
 o Graphical ORder TOol

 o FunnelWeb User's Manual

 o Pattern-based Text Generator
 o Property Definition Language
 o Operator Identification Language
 o Tree Grammar Specification Language
 o Command Line Processing
 o COLA Options Reference Manual

 o Generating Unparsing Code

 o Monitoring a Processor's Execution

Administration

 o System Administration Guide

Mail Home

Syntactic Analysis

Previous Chapter Next Chapter Table of Contents


How to Resolve Parsing Conflicts

Eli attempts to construct a particular kind of parser from the context-free grammar specifying the desired phrase structure. If this attempt fails, Eli reports that failure by describing a set of conflicts. In order to understand what these conflicts mean, and to understand how they might be resolved, it is necessary to have a rudimentary idea of how the constructed parser determines the phrase structure of the input text.

A context-free grammar is said to be ambiguous if it permits more than one phrase structure to describe a single input text. Most conflicts are the result of such ambiguities, and there are three ways of resolving them:

  1. Change the grammar so that only one phrase structure is possible.

  2. Provide additional information that causes the parser to select one of the set of phrase structures.

  3. Change the form of the input text to avoid the ambiguity.

Note that all of these methods result in the parser recognizing a different language than the one described by the original grammar.

How the generated parser determines phrase structure

The generated parser is a finite-state machine with a stack of states. This machine examines the input text from left to right, one basic symbol at a time. The current state of the machine is the one at the top of the stack. It defines the set of productions the parser might be recognizing, and its progress in recognizing each. For example, consider the following trivial grammar:

Sentence: Expression.
Expression: Primary.
Expression: Expression '+' Primary.
Primary: Integer.
Primary: Id.

Initially, the parser might be recognizing the first production, but in order to do so it must recognize either the second or the third. In order to recognize the second production, it must recognize either the fourth or fifth. Finally, because we are considering the initial situation, no progress has been made in recognizing any of these productions. All of the information expressed by this paragraph is represented by the initial state, which is the only element of the stack.

On the basis of the state at the top of the stack, and the basic symbol being examined, the machine decides on one of two moves:

Shift
Accept the basic symbol as the corresponding terminal, push a new state onto the stack, and examine the next basic symbol.

Reduce
Note that a specific phrase has been recognized, remove a number of states equal to the number of symbols in the sequence of the corresponding production from the stack, push a new state onto the stack, and examine the current basic symbol again.

The parser halts after the reduce move noting that the production containing the axiom has been recognized.

If the first basic symbol of the text were an identifier, a parser for the sample grammar would make a shift move. The new state would be one in which the parser had completely recognized the fifth production. Regardless of the next basic symbol, the parser would then make a reduce move because the fifth production has been recognized. One state would be removed from the stack, and a new state pushed in which the the parser had completely recognized the second production. Again the parser would make a reduce move, removing one state from the stack and pushing a state in which the parser had either completely recognized the first production or recognized the first symbol of the third production.

The parser's next move is determined by the current input symbol. If the text is empty then the parser makes the reduce move noting that the first production has been recognized and halts. If the current symbol of the text is '+' then the parser makes a shift move.

A conflict occurs when the information available (the current state and the basic symbol being examined) does not allow the parser to make a unique decision. If either a shift or a reduce is possible, the conflict is a shift-reduce conflict; if more than one phrase could have been recognized, the conflict is a reduce-reduce conflict.

Example of a shift-reduce conflict

The classic example of a shift-reduce conflict is the so-called "dangling else problem":

Statement: 'if' Expression 'then' Statement.
Statement: 'if' Expression 'then' Statement 'else' Statement.

A parser built from a grammar containing these productions will have at least one state in which it could be recognizing either, and has just completed recognition of the Statement following then. Suppose that the current basic symbol is else; what move should the parser make next?

Clearly it could shift, accepting the else and going to a state in which it is recognizing the second production and has just completed recognition of the else. It could also reduce, however, recognizing an instance of the first production, popping four elements from the stack and returning to the current state. Thus there is a shift-reduce conflict.

The conflict here is due to an ambiguity in the grammar. Consider the following input text (E1 and E2 are arbitrary expressions, S1 and S2 are statements that do not contain if):

if E1 then if E2 then S1 else S2

There are two possible phrase structures for this text, depending on whether the else is assumed to belong with the first or second if:

if E1 then {if E2 then S1} else S2
if E1 then {if E2 then S1 else S2}

In each case the bracketed sub-sequence is a Statement according to one of the given rules, and the entire line is a Statement according to the other. Both are perfectly legal phrase structures according to the grammar.

Example of a shift-reduce conflict

The following description of integer denotations in various bases leads to a reduce-reduce conflict:

Denotation: Seq / Seq Base.
Seq: Digit / Seq Next.
Next: Digit / Hexit.
Digit: '0' / '1' / '2' / '3' / '4' / '5' / '6' / '7' / '8' / '9'.
Hexit: 'a' / 'b' / 'c' / 'd' / 'e' / 'f'.
Base: 'b' / 'o' / 'e' / 'x'.

When Base is omitted, the integer is assumed to be decimal if it contains no Hexit and hexadecimal otherwise. An explicit Base indicates the base of the digits to be 2, 8, 10 or 16 respectively.

One of the states of the parser constructed from this grammar indicates that either the Hexit b or the Base b has been recognized. If the input is not empty then the parser has recognized a Hexit, but either is possible if the input is empty. Thus the parser cannot determine the production by which to reduce, and the conflict arises.

This conflict indicates an ambiguity in the grammar, exemplified by the input text "1b". Two phrase structures are possible, one yielding the value "1 base 2" and the other yielding the value "1b base 16".

Conflict resolution by changing the grammar

An ambiguity can sometimes be resolved by changing the grammar. The altered grammar must define exactly the same set of input texts as the grammar that gave rise to the conflict, but it cannot describe more than one phrase structure for any particular text. That phrase structure must reflect the meaning of the text as defined by the language design.

Most languages solve the dangling else problem by associating an else with the closest if. Here is an unambiguous grammar describing that phrase structure:

Statement: matched / unmatched.
matched:
   'if' Expression 'then' matched 'else' matched /
   Others.
unmatched:
   'if' Expression 'then' matched 'else' unmatched /
   'if' Expression 'then' Statement.

(Others stands for all sequences by which Statement could be replaced that contain no if.)

If the identifiers Statement, matched and unmatched are placed in an equivalence class, then this grammar yields exactly the same phrase structure as the ambiguous grammar given in the previous section. It is therefore acceptable as far as the remainder of the translation problem is concerned.

Conflict resolution by ignoring possible structures

When Eli is constructing a parser from a grammar, it computes a set of symbols called the exact right context for each production in each state. The exact right context of a production in a state contains all of the symbols that could follow the phrase associated with that production in that state. It is possible for the parser to reduce by a production if the current state indicates that all of the symbols in the production's sequence have been accepted, and the next basic symbol of the input is a member of the exact right context of that production in that state.

By adding a modification to the description of a production in a type-`con' file, the user can specify that a particular symbol be deleted from one or more exact right contexts. The user is, in effect, telling Eli that these symbols cannot follow the phrase associated with that production in that state. In other words, the parser is to ignore phrase structures in which the specified symbol follows the phrase.

A modification is a sequence consisting of either a dollar ($) or at-rate-of (@) followed by a terminal. It can be placed anywhere within a production, and more than one modification can appear in a single production. If a modification is introduced but no conflict is resolved thereby, an error is reported.

The effect of a $-modification

Suppose that a modification $S is introduced into a production P. The effect of this modification is to delete the symbol S from the exact right context of production P. This kind of modification can be used to solve the dangling else problem:

Statement: 'if' Expression 'then' Statement $'else'.
Statement: 'if' Expression 'then' Statement 'else' Statement.

The modification introduced into the first production removes else from the exact right context of that production, and therefore makes a reduce move impossible for the parser when it is in the state indicating that it is recognizing one of these productions and has just recognized the first Statement. Since the reduce move is impossible, there is no shift-reduce conflict.

The effect of a @-modification

Suppose that a modification @S is introduced into a production P. The effect of this modification is to delete the symbol S from the exact right context of any production involved in a reduce-reduce conflict with production P. This kind of modification can be used to solve the integer denotation problem:

Denotation: Seq / Seq Base.
Seq: Digit / Seq Next.
Next: Digit / Hexit.
Digit: '0' / '1' / '2' / '3' / '4' / '5' / '6' / '7' / '8' / '9'.
Hexit: 'a' / 'b' / 'c' / 'd' / 'e' / 'f'.
Base: 'b' @EOF / 'o' / 'e' @EOF / 'x'.

The two modifications introduced into the productions remove EOF (the empty input text) from the exact right contexts of the Hexit productions that conflict with these two Base productions, and therefore make it impossible to reduce the Hexit productions when the parser is in the state indicating it has completed recognizing either a Hexit or Base and the input is empty. A b or e at the end of an input text will thus always be interpreted as a marker: "1b" means "1 base 2", not "1b base 16". ("1b base 16" would have to be written as "1bx".)


Previous Chapter Next Chapter Table of Contents