General Information
Tutorials
Reference Manuals
Libraries
Translation Tasks
Tools
Administration
|
Syntactic AnalysisHow to Resolve Parsing ConflictsEli attempts to construct a particular kind of parser from the context-free grammar specifying the desired phrase structure. If this attempt fails, Eli reports that failure by describing a set of conflicts. In order to understand what these conflicts mean, and to understand how they might be resolved, it is necessary to have a rudimentary idea of how the constructed parser determines the phrase structure of the input text. A context-free grammar is said to be ambiguous if it permits more than one phrase structure to describe a single input text. Most conflicts are the result of such ambiguities, and there are three ways of resolving them:
Note that all of these methods result in the parser recognizing a different language than the one described by the original grammar.
How the generated parser determines phrase structureThe generated parser is a finite-state machine with a stack of states. This machine examines the input text from left to right, one basic symbol at a time. The current state of the machine is the one at the top of the stack. It defines the set of productions the parser might be recognizing, and its progress in recognizing each. For example, consider the following trivial grammar:
Sentence: Expression. Expression: Primary. Expression: Expression '+' Primary. Primary: Integer. Primary: Id. Initially, the parser might be recognizing the first production, but in order to do so it must recognize either the second or the third. In order to recognize the second production, it must recognize either the fourth or fifth. Finally, because we are considering the initial situation, no progress has been made in recognizing any of these productions. All of the information expressed by this paragraph is represented by the initial state, which is the only element of the stack. On the basis of the state at the top of the stack, and the basic symbol being examined, the machine decides on one of two moves:
The parser halts after the reduce move noting that the production containing the axiom has been recognized. If the first basic symbol of the text were an identifier, a parser for the sample grammar would make a shift move. The new state would be one in which the parser had completely recognized the fifth production. Regardless of the next basic symbol, the parser would then make a reduce move because the fifth production has been recognized. One state would be removed from the stack, and a new state pushed in which the the parser had completely recognized the second production. Again the parser would make a reduce move, removing one state from the stack and pushing a state in which the parser had either completely recognized the first production or recognized the first symbol of the third production. The parser's next move is determined by the current input symbol. If the text is empty then the parser makes the reduce move noting that the first production has been recognized and halts. If the current symbol of the text is '+' then the parser makes a shift move. A conflict occurs when the information available (the current state and the basic symbol being examined) does not allow the parser to make a unique decision. If either a shift or a reduce is possible, the conflict is a shift-reduce conflict; if more than one phrase could have been recognized, the conflict is a reduce-reduce conflict.
Example of a shift-reduce conflictThe classic example of a shift-reduce conflict is the so-called "dangling else problem":
Statement: 'if' Expression 'then' Statement. Statement: 'if' Expression 'then' Statement 'else' Statement.
A parser built from a grammar containing these productions will have at
least one state in which it could be recognizing either, and has just
completed recognition of the
Clearly it could shift, accepting the
The conflict here is due to an ambiguity in the grammar.
Consider the following input text (E1 and E2 are arbitrary expressions, S1
and S2 are statements that do not contain
if E1 then if E2 then S1 else S2
There are two possible phrase structures for this text, depending on
whether the
if E1 then {if E2 then S1} else S2 if E1 then {if E2 then S1 else S2}
In each case the bracketed sub-sequence is a
Example of a shift-reduce conflictThe following description of integer denotations in various bases leads to a reduce-reduce conflict:
Denotation: Seq / Seq Base. Seq: Digit / Seq Next. Next: Digit / Hexit. Digit: '0' / '1' / '2' / '3' / '4' / '5' / '6' / '7' / '8' / '9'. Hexit: 'a' / 'b' / 'c' / 'd' / 'e' / 'f'. Base: 'b' / 'o' / 'e' / 'x'.
When
One of the states of the parser constructed from this grammar indicates
that either the This conflict indicates an ambiguity in the grammar, exemplified by the input text "1b". Two phrase structures are possible, one yielding the value "1 base 2" and the other yielding the value "1b base 16".
Conflict resolution by changing the grammarAn ambiguity can sometimes be resolved by changing the grammar. The altered grammar must define exactly the same set of input texts as the grammar that gave rise to the conflict, but it cannot describe more than one phrase structure for any particular text. That phrase structure must reflect the meaning of the text as defined by the language design.
Most languages solve the dangling else problem
by associating an
Statement: matched / unmatched. matched: 'if' Expression 'then' matched 'else' matched / Others. unmatched: 'if' Expression 'then' matched 'else' unmatched / 'if' Expression 'then' Statement.
(
If the identifiers
Conflict resolution by ignoring possible structuresWhen Eli is constructing a parser from a grammar, it computes a set of symbols called the exact right context for each production in each state. The exact right context of a production in a state contains all of the symbols that could follow the phrase associated with that production in that state. It is possible for the parser to reduce by a production if the current state indicates that all of the symbols in the production's sequence have been accepted, and the next basic symbol of the input is a member of the exact right context of that production in that state. By adding a modification to the description of a production in a type-`con' file, the user can specify that a particular symbol be deleted from one or more exact right contexts. The user is, in effect, telling Eli that these symbols cannot follow the phrase associated with that production in that state. In other words, the parser is to ignore phrase structures in which the specified symbol follows the phrase. A modification is a sequence consisting of either a dollar ($) or at-rate-of (@) followed by a terminal. It can be placed anywhere within a production, and more than one modification can appear in a single production. If a modification is introduced but no conflict is resolved thereby, an error is reported.
The effect of a $-modification
Suppose that a modification
Statement: 'if' Expression 'then' Statement $'else'. Statement: 'if' Expression 'then' Statement 'else' Statement.
The modification introduced into the first production removes
The effect of a @-modification
Suppose that a modification
Denotation: Seq / Seq Base. Seq: Digit / Seq Next. Next: Digit / Hexit. Digit: '0' / '1' / '2' / '3' / '4' / '5' / '6' / '7' / '8' / '9'. Hexit: 'a' / 'b' / 'c' / 'd' / 'e' / 'f'. Base: 'b' @EOF / 'o' / 'e' @EOF / 'x'.
The two modifications introduced into the productions remove
|