Eli   Documents

General Information

 o Eli: Translator Construction Made Easy
 o Global Index
 o Frequently Asked Questions
 o Typical Eli Usage Errors

Tutorials

 o Quick Reference Card
 o Guide For new Eli Users
 o Release Notes of Eli
 o Tutorial on Name Analysis
 o Tutorial on Scope Graphs
 o Tutorial on Type Analysis
 o Typical Eli Usage Errors

Reference Manuals

 o User Interface
 o Eli products and parameters
 o LIDO Reference Manual
 o Typical Eli Usage Errors

Libraries

 o Eli library routines
 o Specification Module Library

Translation Tasks

 o Lexical analysis specification
 o Syntactic Analysis Manual
 o Computation in Trees

Tools

 o LIGA Control Language
 o Debugging Information for LIDO
 o Graphical ORder TOol

 o FunnelWeb User's Manual

 o Pattern-based Text Generator
 o Property Definition Language
 o Operator Identification Language
 o Tree Grammar Specification Language
 o Command Line Processing
 o COLA Options Reference Manual

 o Generating Unparsing Code

 o Monitoring a Processor's Execution

Administration

 o System Administration Guide

Mail Home

Abstract Syntax Tree Unparsing

Previous Chapter Next Chapter Table of Contents


Available Kinds of Unparser

Eli is capable of generating specifications for the following kinds of unparsers:

Textual
Print a "source text" representation of the tree. This kind of unparser is most useful for processors that solve a "source-to-source" translation problem, such as pretty-printing or language extension.

Structural
Print a "structural" representation of the tree. This kind of unparser is most useful for debugging applications, and for processors that output textual representations of tree-structured data objects.

Textual unparser

A textual unparser creates source text which, when parsed, results in the tree that was unparsed. For example, the pretty-printer described above accepted a sentence in the expression language and built a tree. It then unparsed that tree to produce an equivalent sentence in the expression language that was formatted in a particular way. If the resulting sentence were parsed according to the rules of the expression language, the result would be the tree from which it had been created.

Consider the tree representing the following sentence in the expression language:

a((b+c)*d, e)

None of the terminal symbols ( ) + * is stored explicitly in the tree. Thus the textual unparser must reconstruct these terminal symbols from the LIDO rules defining the tree nodes.

The terminal symbol , doesn't appear in any of the LIDO rules, and therefore it cannot be automatically reconstructed by the textual unparser. Additional information must be provided by the user to insert it into the unparsed text. This is a common consequence of using LISTOF productions.

Our definition of the tree grammar for the expression language contains the following rule:

RULE Parens: Expression ::= '(' Expression ')' END;

The purpose of this rule is to support the unparser by retaining information about the presence of parentheses used to override the normal operator precedence. Such parentheses result in a Parens node in the tree, and the unparser can then use this LIDO rule to reconstruct the parentheses.

Parens is an example of a chain rule, in which the left-hand non-terminal symbol appears exactly once as the only non-terminal symbol on the right-hand side. Such chain rules are often eliminated from a tree grammar, because they have no significance to the computations it supports. If a textual unparser is to be generated, however, then either a chain rule must be in the tree grammar or there must be additional information that allows the unparser to reconstruct its terminal symbols.

One important aspect of the textual form of the program that is missing from the LIDO rules is how to separate the basic symbols. For example, consider a CallExp in the expression language:

a(b, c)

There is no information in the LIDO rules about whether a space should precede and/or follow a ( or ,. Spacing is important for making the text readable, however, and cannot simply be ignored.

Computations for plain productions

A generated textual unparser defines the following computation (two attributes are used to simplify overriding, see Changing IdemPtg computations):

ATTR IdemPtg, IdemOrigPtg: PTGNode;

CLASS SYMBOL IdemReproduce COMPUTE
  SYNT.IdemOrigPtg=
    RuleFct("PTGIdem_", RHS.IdemPtg, TermFct("PTGIdem_"));
  SYNT.IdemPtg=THIS.IdemOrigPtg;
END;

The class symbol IdemReproduce is inherited by every non-terminal symbol appearing on the left-hand side of a plain production. For example, it is inherited by Expression and Axiom in the textual unparser specification generated from the expression language definition.

This computation invokes a function specific to the LIDO rule and, if the rule contains any instances of non-literal terminal symbols, a function specific to each. For example, the effect of Expression inheriting IdemReproduce is to carry out computations at the StarExp and CallExp rules that are equivalent to the following rule computations:

RULE StarExp: Expression ::= Expression '*' Expression
COMPUTE
  Expression[1].IdemOrigPtg=
    PTGIdem_StarExp(Expression[2].IdemPtg,Expression[3].IdemPtg);
  Expression[1].IdemPtg=Expression[1].IdemOrigPtg;
END;

RULE CallExp: Expression ::= Identifier '(' Arguments ')'
COMPUTE
  Expression[1].IdemOrigPtg=
    PTGIdem_CallExp(Arguments.IdemPtg,PTGIdem_Identifier(Identifier));
  Expression[1].IdemPtg=Expression[1].IdemOrigPtg;
END;

(For details about RuleFct and TermFct, see Predefined Entities of LIDO - Reference Manual.)

Here are several PTG patterns appearing in a textual unparser generated from the expression language definition that illustrate how the PTG functions are specified:

Idem_StarExp: $1  "*" [Separator]  $2
Idem_IdnExp:  $1  [Separator]
Idem_CallExp: $2  [Separator] "(" [Separator]  $1  ")" [Separator]

Notice how these patterns reconstruct the terminal symbols *, (, and ).

The different orders of the indexed insertion points in the patterns Idem_StarExp and Idem_CallExp are due to the definition of the computation above. Idem_StarExp has two non-terminal children, which appear in order as the arguments of the generated rule function. Idem_CallExp, on the other hand, has a non-literal terminal as its first child and a non-terminal as its second. The non-literal terminal arguments of the generated rule function follow the non-terminal arguments.

Separator is the function that makes the decision about how to place layout characters (see Introduce Separators in PTG Output of Tasks related to generating output). A call to Separator is inserted into every pattern after each terminal symbol, both literal and non-literal. This allows a decision to be made about layout characters between each pair of terminal symbols.

The separator module provides the following output functions, which must be used instead of the corresponding PTG output functions (see Output Functions of PTG: Pattern-based Text Generator):

PTGNode Sep_Out(PTGNode root);
PTGNode Sep_OutFile(char *filename, PTGNode root);
PTGNode Sep_OutFPtr(FILE *fptr, PTGNode root);

The module library contains two modules that implement different strategies for selecting layout characters:

`Sp_Separator.fw'
A single space is used as a separator regardless of the context.

`C_Separator.fw'
Reasonable separator placement rules for C program text: a newline is added after any of ; { }, no separator is added after any of ( [ . ++ --, no separator is added before any of [ ] , . ; ++ --, and a single space added in all other cases.

If none of the available modules is satisfactory, then you must create your own. The simplest approach is to modify one from the library. Here is a sequence of Eli requests that will extract `C_Separator.fw' as file My_Separator.fw, make My_Separator.fw writable, and initiate an editor session on it:

-> $elipkg/Output/C_Separator.fw > My_Separator.fw
-> My_Separator.fw !chmod +w
-> My_Separator.fw <

In order to change the decision about what (if any) separator is to be inserted in a given context, you need to change the function called Sep_Print. Sep_Print (see Introduce Separators in PTG Output of Specification Module Library: Generating Output) has three arguments: a pointer to the file to which the separator is to be written, a pointer to the string that immediately precedes the separator, and a pointer to the string that immediately follows the separator.

Sep_Print must decide whether a separator is appropriate between the two strings given by its second and third arguments, and if so then what that separator should be. If a separator is required, Sep_Print must write that separator to the file. Sep_Print must not modify the strings passed to it.

Computations for LISTOF productions

A generated textual unparser defines the following computation for a LISTOF production named `r' with left-hand side `X' and elements `Y | Z' (two attributes are used to simplify overriding, see Changing IdemPtg computations):

ATTR IdemPtg, IdemOrigPtg: PTGNode;

CLASS SYMBOL IdemReproduce_X COMPUTE
  SYNT.IdemOrigPtg=
    PTG_r(
      CONSTITUENTS (Y.IdemPtg, Z.IdemPtg) SHIELD (Y, Z)
      WITH (PTGNode, PTGIdem_2r, PTGIdem_1r, PTGNull));
  SYNT.IdemPtg=THIS.IdemOrigPtg;
END;

The class symbol IdemReproduce_X is inherited by the non-terminal symbol X. For example, in the textual unparser specification generated from the expression language, there is such a rule with r being ArgList, X being Arguments, and Y being Expression. There is no Z in that case:

CLASS SYMBOL IdemReproduce_Arguments COMPUTE
  SYNT.IdemOrigPtg=
    PTG_ArgList(
      CONSTITUENTS (Expression.IdemPtg) SHIELD (Expression)
      WITH (PTGNode, PTGIdem_2ArgList, PTGIdem_1ArgList, PTGNull));
  SYNT.IdemPtg=THIS.IdemOrigPtg;
END;

Arguments inherits IdemReproduce_Arguments in the textual unparser specification generated from the expression language definition.

The computation for the class symbol invokes three functions specific to the LIDO rule. Here are the three PTG patterns specifying those functions for the ArgList rule:

Idem_ArgList:  $
Idem_2ArgList: $ $
Idem_1ArgList: $

PTG patterns for other LISTOF productions will differ from these only in the pattern names.

Structural unparser

A structural unparser creates a textual description of the tree in terms of rule names and non-literal terminal symbols. For example, the sentence `a(b,c)' in the expression language could be unparsed as the XML file:

<rule_000>
  <CallExp>
    a
    <ArgList>
      <IdnExp>b</IdnExp>
      <IdnExp>c</IdnExp>
    </ArgList>
  </CallExp>
</rule_000>

The entire sentence is output as a rule_000 because the LIDO rule defining Axiom was generated, and was given the name rule_000 by Eli. The single child of this node is a CallExp with two components, the non-literal terminal symbol `a' and an ArgList made up of two IdnExp nodes.

Appropriate layout, with meaningful line breaks and indentation, is important for a human trying to understand the output of a structural unparser. This formatting depends only on structure, however, not on the content of the output.

Structural unparser generators producing both simple descriptions of trees and descriptions in several standard languages are available. It is also possible for a user to create an unparser generator that describes the tree in a language of their own choosing.

Computations for plain productions

A generated structural unparser defines the following computation (two attributes are used to simplify overriding, see Changing IdemPtg computations):

ATTR IdemPtg, IdemOrigPtg: PTGNode;

CLASS SYMBOL IdemReproduce COMPUTE
  SYNT.IdemOrigPtg=
    RuleFct("PTGIdem_", RHS.IdemPtg, TermFct("PTGIdem_"));
  SYNT.IdemPtg = THIS.IdemOrigPtg;
END;

The class symbol IdemReproduce is inherited by every non-terminal symbol appearing on the left-hand side of a plain production. For example, it is inherited by Expression and Axiom in the structural unparser specification generated from the expression language definition.

This computation invokes a function specific to the LIDO rule and, if the rule contains any instances of non-literal terminal symbols, a function specific to each. For example, the effect of Expression inheriting IdemReproduce is to carry out computations at the StarExp and CallExp rules that are equivalent to the following rule computations:

RULE StarExp: Expression ::= Expression '*' Expression
COMPUTE
  Expression[1].IdemOrigPtg=
    PTGIdem_StarExp(Expression[2].IdemPtg,Expression[3].IdemPtg);
  Expression[1].IdemPtg=Expression[1].IdemOrigPtg;
END;

RULE CallExp: Expression ::= Identifier '(' Arguments ')'
COMPUTE
  Expression[1].IdemOrigPtg=
    PTGIdem_CallExp(Arguments.IdemPtg,PTGIdem_Identifier(Identifier));
  Expression[1].IdemPtg=Expression[1].IdemOrigPtg;
END;

(For details about RuleFct and TermFct, see Predefined Entities of LIDO - Reference Manual.)

Here are several PTG patterns from a structural unparser generated from the expression language definition that illustrate how those functions are specified:

Idem_StarExp:
  "<StarExp>" [BP_BeginBlockI]
    [BP_BreakLine] $1 [BP_BreakLine] $2 [BP_BreakLine]
  [BP_EndBlockI] "</StarExp>"
Idem_IdnExp:
  "<IdnExp>" [BP_BeginBlockI]
    [BP_BreakLine] $1 [BP_BreakLine]
  [BP_EndBlockI] "</IdnExp>"
Idem_CallExp:
  "<CallExp>" [BP_BeginBlockI]
    [BP_BreakLine] $2 [BP_BreakLine] $1 [BP_BreakLine]
  [BP_EndBlockI] "</CallExp>"

These patterns are the ones generated if the output is to be an XML file (see Languages describing tree structure).

The different orders of the indexed insertion points in the patterns Idem_StarExp and Idem_CallExp are due to the definition of the computation above. Idem_StarExp has two non-terminal children, which appear in order as the arguments of the generated rule function. Idem_CallExp, on the other hand, has a non-literal terminal as its first child and a non-terminal as its second. The non-literal terminal arguments of the generated rule function follow the non-terminal arguments.

Generated structural unparsers use the block print module (see Typesetting for Block Structured Output of Tasks related to generating output) to provide layout. The generated PTG patterns invoke functions of this module to mark potential line breaks and the boundaries of logical text blocks. The block print module provides the following output functions, which must be used instead of the corresponding PTG output functions (see Output Functions of PTG: Pattern-based Text Generator):

PTGNode BP_Out(PTGNode root);
PTGNode BP_OutFPtr(FILE *fptr, PTGNode root);
PTGNode BP_OutFile(char *filename, PTGNode root);

Note that the textual representation of the children of every node is considered to be a logical text block. A line break can occur before each child. The effect of this specification is to keep the textual representation of a node on a single line if that is possible. Otherwise, the sequence of children is written one per line, indented from the name of the block's rule.

Computations for LISTOF productions

A generated structural unparser defines the following computation for a LISTOF production named `r' with left-hand side `X' and elements `Y | Z' (two attributes are used to simplify overriding, see Changing IdemPtg computations):

ATTR IdemPtg, IdemOrigPtg: PTGNode;

CLASS SYMBOL IdemReproduce_X COMPUTE
  SYNT.IdemOrigPtg=
    PTG_r(
      CONSTITUENTS (Y.IdemPtg, Z.IdemPtg) SHIELD (Y, Z)
      WITH (PTGNode, PTGIdem_2r, PTGIdem_1r, PTGNull));
  SYNT.IdemPtg=THIS.IdemOrigPtg;
END;

The symbol IdemReproduce_X is inherited by the non-terminal symbol X. For example, in the structural unparser specification generated from the expression language, there is such a rule with r being ArgList, X being Arguments, and Y being Expression. There is no Z in that case:

CLASS SYMBOL IdemReproduce_Arguments COMPUTE
  SYNT.IdemOrigPtg=
    PTG_ArgList(
      CONSTITUENTS (Expression.IdemPtg) SHIELD (Expression)
      WITH (PTGNode, PTGIdem_2ArgList, PTGIdem_1ArgList, PTGNull));
  SYNT.IdemPtg=THIS.IdemOrigPtg;
END;

Arguments inherits IdemReproduce_Arguments in the structural unparser specification generated from the expression language definition.

The computation for the class symbol invokes three functions specific to the LIDO rule. Here are the three PTG patterns specifying those functions for the ArgList rule:

Idem_ArgList:
  "<ArgList>" [BP_BeginBlockI]
   [BP_BreakLine] $ [BP_BreakLine]
  [BP_EndBlockI] "</ArgList>"
Idem_2ArgList: $ { [BP_BreakLine] } $
Idem_1ArgList: $

PTG patterns for other LISTOF productions will differ from these only in the rule name.

Languages describing tree structure

By default, a structural unparser generator uses a generic functional representation to describe the tree. Here's the default representation of the sentence `a(b,c)' in the expression language:

rule_000(CallExp(a,IdnExp(b),IdnExp(c)))

(Recall that the entire sentence is output as a rule_000 because the LIDO rule defining Axiom was generated, and was given the name rule_000 by Eli.)

Four other standard representations are available:

XML
Generates an unparser that produces an XML representation of the tree, and a separate DTD file defining the possible structures.

CPP
Generates an unparser that produces C++ code to build the tree, and a separate module defining the set of C++ classes used.

Java
Generates an unparser that produces Java code to build the tree, and a separate package defining the set of Java classes used.

It is also possible to build structural unparser generators for other application languages by modifying existing generator specifications. All unparser generators have the same general organization: they analyze the tree grammar and produce class symbol computations and PTG patterns to output any tree defined by that grammar. Much of the analysis is common, with differences appearing only in the final output of the generated FunnelWeb file.

The unparser generator specifications available in the library are:

`$/Unparser/Analysis.fw'
Analysis of the input text that defines the tree grammar. Common attribute computations supporting a wide range of unparsers.

`$/Unparser/Idem.fw'
Attribute computations specific to textual unparsers.

`$/Unparser/Tree.fw'
Attribute computations specific to the generic functional representation.

`$/Unparser/Xml.fw'
Attribute computations specific to XML files and the associated DTD file.

`$/Unparser/Cpp.fw'
Attribute computations specific to C++ code and the associated module definition.

`$/Unparser/Java.fw'
Attribute computations specific to Java code and the associated package definition.

Suppose that you wanted to create an unparser generator that would produce Modula-3 code to build the tree, and a separate interface file defining the tree structure. Because Modula-3 is quite similar to Java in its structure, you might start by modifying `$/Unparser/Java.fw' from the library. Here is a sequence of Eli requests that will extract `Java.fw' as file Modula-3.fw, make Modula-3.fw writable, and initiate an editor session on it:

-> $elipkg/Unparser/Java.fw > Modula-3.fw
-> Modula-3.fw !chmod +w
-> Modula-3.fw <

After suitable modification, `Modula-3.fw' could be combined with the library specification `$/Unparser/Analysis.fw' to define the new unparser generator. Thus you might create a file called `M3.specs' with the following content:

Modula-3.fw
$/Unparser/Analysis.fw

The unparser generator could then be derived from `M3.specs' as usual (see exe -- Executable Version of the Processor of Products and Parameters Reference):

-> M3.specs :exe


Previous Chapter Next Chapter Table of Contents