Eli   Documents Get Eli: Translator Construction Made Easy at SourceForge.net.
    Fast, secure and Free Open Source software downloads

General Information

 o Eli: Translator Construction Made Easy
 o Global Index
 o Frequently Asked Questions
 o Typical Eli Usage Errors

Tutorials

 o Quick Reference Card
 o Guide For new Eli Users
 o Release Notes of Eli
 o Tutorial on Name Analysis
 o Tutorial on Type Analysis
 o Typical Eli Usage Errors

Reference Manuals

 o User Interface
 o Eli products and parameters
 o LIDO Reference Manual
 o Typical Eli Usage Errors

Libraries

 o Eli library routines
 o Specification Module Library

Translation Tasks

 o Lexical analysis specification
 o Syntactic Analysis Manual
 o Computation in Trees

Tools

 o LIGA Control Language
 o Debugging Information for LIDO
 o Graphical ORder TOol

 o FunnelWeb User's Manual

 o Pattern-based Text Generator
 o Property Definition Language
 o Operator Identification Language
 o Tree Grammar Specification Language
 o Command Line Processing
 o COLA Options Reference Manual

 o Generating Unparsing Code

 o Monitoring a Processor's Execution

Administration

 o System Administration Guide

Mail Home

Syntactic Analysis

Previous Chapter Next Chapter Table of Contents


Using Foreign parsers

When Eli is used to generate a parser, Maptool is able to relate the concrete syntax to the abstract syntax and create all of the code necessary to build a tree representing the input text. If a parser is generated by other tools, or written by hand, tree-building code must be created manually. In this section, we assume that a parser for the source language exists, and that Eli is being used to generate code from a LIDO specification of the abstract syntax and desired tree computations.

The interface specification of any parser designed to support tree computation defines a set of function invocations that will occur as parsing of the input text proceeds. If the parser has been generated, these function invocations are included in the grammar as semantic actions (see Carrying Out Actions During Parsing).

The code generated from a LIDO specification includes a set of tree construction functions, one for each rule context (see Tree Construction Functions of LIDO - Reference Manual). These functions must be invoked at appropriate times with appropriate arguments during the course of the parse. In order to use an existing parser, therefore, we must implement a module obeying the interface specification of that parser and correctly invoking the tree construction functions generated from the LIDO specification.

It would be possible to develop the module in isolation and then integrate it with the foreign parser, but a better approach is to use the foreign parser as part of the Eli specification of the complete program. Development can then proceed incrementally using Eli tools like execution monitoring to track down errors and verify correct behavior.

Building tree nodes

A typical parser interface specifies a data structure to define text fragments, in addition to the semantic actions:

typedef struct {  /* Basic symbol */
  int line;       /*   Source line containing the symbol */
  int col;        /*   Column containing the first character */
  int type;       /*   Classification code of the symbol */
  char *text;     /*   Symbol text */
} Token;
                  /* Symbol classification codes */
#define ETXT 1    /*   End of the shource file */
#define LPAR 2    /*   Left parenthesis */
#define RPAR 3    /*   Right parenthesis */
#define PLUS 4    /*   Plus */
#define STAR 5    /*   Asterisk */
#define INTG 6    /*   Integer */
...

Each tree construction function generated by Eli from a LIDO specification must be invoked with pointers to its children, and therefore the tree must be built bottom-up. The usual strategy is to store pointers to constructed nodes on a stack until their parent node is built. Eli provides a stack module for this purpose:

NODEPTR *_nst;         /* Stack array: _nst[...] are the elements */
int _nsp;              /* Stack index: _nst[_nsp] is the top element */
void _incrnodestack(); /* Push an empty element onto the stack */
Elements of the stack are _nst[_nsp], _nst[_nsp-1], etc. The statement _nsp-=k; pops k elements off of the stack, and the statement _incrnodestack(); pushes an empty element onto the stack. To make the stack visible, include the file `treestack.h'.

The behavior of the functions called by the parser is determined primarily by the needs of the abstract syntax. We'll consider two LIDO specifications, one for computing the value of an integer expression involving addition and multiplication and the other for carrying out overload resolution in more general expressions.

Tree designed for expression evaluation

Consider the following LIDO specification, which evaluates an integer expression involving addition and multiplication. It assumes each Integer terminal is represented by the value of the corresponding integer:

ATTR val: int;

RULE Top: Root ::= Expr COMPUTE
  printf("The value is %d\n", Expr.val);
END;

RULE Add: Expr ::= Expr '+' Expr COMPUTE
  Expr[1].val=ADD(Expr[2].val,Expr[3].val);
END;

RULE Mul: Expr ::= Expr '*' Expr COMPUTE
  Expr[1].val=MUL(Expr[2].val,Expr[3].val);
END;

RULE Num: Expr ::= Integer COMPUTE
  Expr.val=Integer;
END;
Eli generates four node construction functions from this specification:

NODEPTR MkTop(POSITION *_coord, NODEPTR _d1);
NODEPTR MkAdd(POSITION *_coord, NODEPTR _d1, NODEPTR _d2);
NODEPTR MkMul(POSITION *_coord, NODEPTR _d1, NODEPTR _d2);
NODEPTR MkNum(POSITION *_coord, int _TERM1);
To make the node construction functions visible, include the file `treecon.h'.

The _coord parameter will be discussed in detail in the next section; here we will always supply NoPosition as the value of this argument (see Source Text Coordinates and Error Reporting of The Eli Library).

Our module must call MkNum whenever the parser recognizes an integer, and we must provide the internal value of that integer as the second argument of that call. The result of the call must be pushed onto the top of the stack.

Suppose that by looking at the code of the parser, or the grammar from which the parser was generated, we determine that when the parser recognizes an integer in the input text it calls the function int_literal_constant with the Token describing that integer as its argument. We might then implement int_literal_constant as follows:

void int_literal_constant(Token *t)
{ _incrnodestack();
  _nst[_nsp]=MkNum(NoPosition,atoi(t->text));
}
Note that this code does not check for an error in the conversion of the string. That might or might not be reasonable, depending upon how careful the parser was in accepting a string as a representation of an integer value.

Further examination of the parser might show that it calls the function mult_operand with no arguments when it has recognized an expression involving two operands and an asterisk operator. In this case, the nodes for the two operand expressions are already on the stack. They must be removed and replaced by a Mul node:

void mult_operand(void)
{ _nst[_nsp-1]=MkMul(NoPosition,_nst[_nsp-1],_nst[_nsp]);
  _nsp--;
}
Implementation of the action when the parser recognizes an expression involving two operands and a plus operator is identical except that MkAdd is invoked instead of MkMul.

If the parser invokes level_0_expr with no arguments when it has completed recognition of the input text, the implementation of that function might be:

void level_0_expr(void)
{ _nst[_nsp]=MkTop(NoPosition,_nst[_nsp]);
}

Suppose that the parser invokes level_3_expr with no arguments when it has recognized an expression in parentheses. There is no corresponding rule in the abstract syntax, because parentheses serve only to override operator precedence and do not affect the computation. In that case, the routine does nothing:

void level_3_expr(void)
{ }

An expression language usually has precedence levels containing several operators. For example, dyadic + and - operators usually have the same precedence, as do dyadic * and /. A parser may invoke a single function when it recognizes any dyadic expression whose operator is at a specific precedence level. In that case, some indication of the operator must be passed to that function. For example, the parser might call mult_operand with a pointer to the operator token. The implementation of mult_operand must then use the token type to select the correct node construction function:

void mult_operand(Token *o)
{ if (o->type == STAR)
    _nst[_nsp-1]=MkMul(NoPosition,_nst[_nsp-1],_nst[_nsp]);
  else
    _nst[_nsp-1]=MkDiv(NoPosition,_nst[_nsp-1],_nst[_nsp]);
  _nsp--;
}
This assumes that the parser will not invoke mult_operand unless the operator is either * or /, and therefore no error checking is required. (If the number of operators at the precedence level were larger, then a switch statement might be preferable to the conditional.)

The code for mult_operand also assumes that the division is implemented by a LIDO rule named Div:

RULE Div: Expr ::= Expr '/' Expr COMPUTE
  Expr[1].val=DIV(Expr[2].val,Expr[3].val);
END;

Tree designed for overload resolution

Consider the following LIDO specification, which provides a structure to analyze the result type of an expression involving addition, subtraction, multiplication, and division of integers and floating point numbers (see Operator Overloading of Tutorial on Type Analysis). It assumes that each Integer and Float terminal is represented by the string definining the corresponding number:

RULE Top: Root ::= Expr END;

RULE Dya: Expr ::= Expr BinOp Expr END;

RULE Pls: BinOp ::= '+' END;
RULE Min: BinOp ::= '-' END;
RULE Str: BinOp ::= '*' END;
RULE Sls: BinOp ::= '/' END;

RULE Ntg: Expr ::= Integer END;
RULE Flt: Expr ::= Float   END;
Eli generates eight node construction functions from this specification. The first six are:

NODEPTR MkTop(POSITION *_coord, NODEPTR _d1);
NODEPTR MkDya(POSITION *_coord, NODEPTR _d1, NODEPTR _d2, NODEPTR _d3);
NODEPTR MkPls(POSITION *_coord);
NODEPTR MkMin(POSITION *_coord);
NODEPTR MkAst(POSITION *_coord);
NODEPTR MkSls(POSITION *_coord);
To make the node construction functions visible, include the file `treenode.h'.

In this example, we would like to represent the integer and floating-point constants in the tree by the strings that represent them in the source text. There are two possibilities:

  1. If the foreign parser stores permanent copies of token strings, then pointers to those strings can be stored in the tree nodes.

  2. If the foreign parser points to token strings in the input buffer, then our module must store them permanently for reference by the tree nodes.

In case 1, a specification must be added to the LIDO description of the tree:

TERM Integer, Float: CharPtr;
CharPtr is the LIDO name for the C type char *. The definition of CharPtr is made available by including file `strings.h'.

The TERM specification causes the two functions MxNtg and MkFlt to be defined as follows:

NODEPTR MkNtg(POSITION *_coord, CharPtr t);
NODEPTR MkFlt(POSITION *_coord, CharPtr t);

Suppose that, as discussed in the last subsection, the parser calls int_literal_constant when it recognizes an integer in the source text. That routine could be implemented as:

void int_literal_constant(Token *t)
{ _incrnodestack();
  _nst[_nsp]=MkNtg(NoPosition,t->text);
}

In case 2, we can make use of Eli's MakeName module (see Generating Optional Identifiers of Solutions of common problems). It provides a function to store a string uniquely and return an integer-valued hash table index to that unique representation:

int MakeName(char *c);
Because the default type of a LIDO terminal is int, we can omit the TERM specification and the two functions MxNtg and MkFlt will be defined as follows:

NODEPTR MkNtg(POSITION *_coord, int t);
NODEPTR MkFlt(POSITION *_coord, int t);
The implementation of int_literal_constant would be:

void int_literal_constant(Token *t)
{ _incrnodestack();
  _nst[_nsp]=MkNum(NoPosition,MakeName(t->text));
}

The MakeName module must be instantiated in order to gain access to the MakeName function. This is done by adding the following line to a `.specs' file:

$/Tech/MakeName.gnrc:inst
No +instance parameter should be supplied, because scanning and parsing are provided by the foreign code. Once the module has been instantiated, the definition of the MakeName function is made available to the tree construction module by including file `MakeName.h'.

Let's assume that the parser invokes a single function when it recognizes any dyadic expression whose operator is at a specific precedence level, passing the operator token to that function. For example, + and - might both be operators at precedence level 1:

void level_1_operator(Token *o)
{ NODEPTR op;
  if (o->type == PLUS) op=MkPls(NoPosition);
  else                 op=MkMin(NoPosition);
  _nst[_nsp-1]=MkDya(NoPosition,_nst[_nsp-1],op,_nst[_nsp]);
  _nsp--;
}
This assumes that the parser will not invoke level_1_operator unless the operator is either + or -, and therefore no error checking is required. (If the number of operators at the precedence level were larger, then a switch statement might be preferable to the conditional.)

If, on the other hand, the parser invokes a function add_operand with no arguments when it has recognized an expression involving two operands and an addition operator then add_operand can be implemented as:

void add_operand(void)
{ _nst[_nsp-1]=
    MkDya(NoPosition,_nst[_nsp-1],MkPls(NoPosition),_nst[_nsp]);
  _nsp--;
}
Note that the operator node implied by the add_operand call must be explicitly created in this case; it is only implicit in the parse.

Tree nodes for chain rules

Recall that a chain rule has the form `X ::= Y', where `X' differs from `Y' (see Chain rule definitions). Such a rule will always result in a tree node with a single child, and if the rule name is `Ch' then the constructor function will be:

NODEPTR MkCh(POSITION *_coord, NODEPTR _d1);

With the exception of the root node of the tree, it is never necessary to explicitly invoke the constructor of a chain rule node. This is actually a very important property of the tree construction module. For example, consider the following fragment of a LIDO specification:

RULE SimpleVar: Var ::= VrblIdUse                      END;
RULE SubscrVar: Var ::= Var '[' Exp ']'                END;
RULE VarExp:    Exp ::= Var                            END;
RULE ArrayExp:  Exp ::= TypeIdUse '[' Exp ']' 'of' Exp END;
RULE Typ: TypeIdUse ::= Symbol                         END;
RULE Var: VrblIdUse ::= Symbol                         END;
RULE Idn:    Symbol ::= Identifier                     END;
Identifier is a terminal symbol, represented by a unique permanent string (see Tree designed for overload resolution).

The problem here is that, given the input sequence `a[', a parser would have to look beyond the matching `]' in order to decide whether `a' was a VrblIdUse or a TypeIdUse. But because the rules Typ and Var are chain rules, their constructor functions don't need to be called. That means the parser can construct an Idn node for `a' and leave it on the stack. If that node is later used as the left child of a SubscrVar node, the tree construction module will insert the necessary Var and SimpleVar nodes. If, on the other hand, the Idn node is used as the left child of an ArrayExp node then the tree construction module will insert the necessary Typ node. There is no need for the parser to look ahead.

Adding coordinate information

LIDO computations may access the coordinates of the first character of the source text region represented by a node. Usually, these computations are used to attach error reports to appropriate text locations. Many of the modules that implement common computations use this facility for error reporting (for an example, see Verifying typed identifier usage of Type Analysis).

Execution monitoring is provided by Noosa, a separate process that can display the abstract syntax tree and graphically relate it to the source text (see Trees and Attribute Values of Execution Monitoring Reference). Noosa requires that both the source text coordinates of the first character of a tree context and those of the first character beyond that context be supplied to its construction function.

Specific source text coordinates are represented by a POSITION (see Source Text Coordinates and Error Reporting of The Eli Library). This data type and the operations upon it are made visible by including the file `err.h'. An appropriate POSITION value must be created from parser data and a pointer to that data passed to the tree construction function.

Supplying coordinates for computation

LIDO provides three names that can be used in computations to obtain source text coordinates of a tree context (see Predefined Entities of LIDO - Reference Manual):

LINE
the source line number of the tree context.
COL
the source column number of the tree context.
COORDREF
the address of the source coordinates of the tree context, to be used for example in calls of the message routine of the error module or in calls of tree construction functions.
If any of these three names appear in the LIDO computation, the source text coordinates of the first character of each tree context must be supplied to its node construction function. That information must be extracted from the parser.

In order to support the use of coordinates in computation, the tree construction function must have access to the location of the first character of its tree context. We have assumed that each token provided by the parser specifies the line and column of the first character of the corresponding input string (see Building tree nodes). This information can be used to build a POSITION value:

POSITION curpos;

void int_literal_constant(Token *t)
{ LineOf(curpos) = t->line; ColOf(curpos) = t->col;
  _incrnodestack();
  _nst[_nsp]=MkNum(&curpos,atoi(t->text));
}
Notice that the address of curpos, rather then curpos itself, is passed to the node construction function MkNum.

Unfortunately, this information isn't sufficient. We must not only pass the coordinates to MkNum, we must also save them on the stack in case this node is the left child of another node. At that point, the coordinates of the first character of this token would be the coordinates of the first character of the larger tree context.

The Eli stack module actually provides two parallel stacks, _nst for nodes and _pst for positions. Thus the complete code for integer literal constants would be:

void int_literal_constant(Token *t)
{ LineOf(curpos) = t->line; ColOf(curpos) = t->col;
  _incrnodestack();
  _pst[_nsp]=curpos;
  _nst[_nsp]=MkNum(&curpos,atoi(t->text));
}
The position value, not a pointer to that value, is saved on the stack. That frees curpos to be used in constructing other values.

When a node whose children are all on the stack is constructed, the coordinates are obtained from the leftmost child:

void mult_operand(void)
{ _nst[_nsp-1]=MkMul(&_pst[_nsp-1],_nst[_nsp-1],_nst[_nsp]);
  _nsp--;
}

Generally speaking, the stack location for the left operand becomes the stack location for the result. Because the coordinates of the result are the coordinates of the left operand, there is no need for an assignment to _pst.

Supplying coordinates for Noosa

Noosa requires the coordinates of the first character of a tree context and also the coordinates of the first character beyond the end of that context. The additional coordinates should be supplied, however, only if execution monitoring has actually been specified for the particular run. This is because the POSITION value will only have the necessary space if monitoring has been specified.

The simplest strategy is to define a routine to compute the appropriate POSITION value for a given token:

POSITION PositionOf(Token_t *token)
{ POSITION curpos;

  LineOf(curpos) = token->line; ColOf(curpos) = token->col;
#ifdef RIGHTCOORD
  RLineOf(curpos) = LineOf(curpos);
  RColOf(curpos) = ColOf(curpos) + strlen(token->text);
#ifdef MONITOR
  CumColOf(curpos) = ColOf(curpos); RCumColOf(curpos) = RColOf(curpos);
#endif
#endif
  return curpos;
}
RIGHTCOORD and MONITOR are defined by Eli for each C compilation if the user specifies the +monitor parameter to the derivation (see monitor of Products and Parameters Reference).

A node for an integer literal constant would then be built by:

void int_literal_constant(Token *t)
{ curpos = PositionOf(t);
  _incrnodestack();
  _pst[_nsp]=curpos;
  _nst[_nsp]=MkNum(&curpos,atoi(t->text));
}

The construction of a node whose children are on the stack becomes more complex, because the coordinates of the constructed node involve the coordinates of the first character of the leftmost child node and the coordinates of the first character beyond the end of rightmost child node. The tree stack module provides a function, SpanOf, to compute the correct coordintes:

POSITION SpanOf(POSITION left, POSITION right);

Using SpanOf, the mult_operand routine would be written as:

void mult_operand(void)
{ curpos=SpanOf(_pst[_nsp-1],_pst[_nsp]);
   _pst[_nsp-1]=curpos;
   _nst[_nsp-1]=MkMul(&curpos,_nst[_nsp-1],_nst[_nsp]);
  _nsp--;
}

Building LISTOF constructs

There are three tree construction functions associated with a LISTOF construct with the rule name `Ex' (see Tree Construction Functions of LIDO - Reference Manual):

NODEPTR MkEx(POSITION *_coord, NODEPTR _d1);
NODEPTR Mk0Ex(POSITION *_coord);
NODEPTR Mk2Ex(POSITION *_coord, NODEPTR _d1, NODEPTR _d2);
Arguments `_d1' and `_d2' may be:

  • the result of Mk0Ex, which represents an empty portion of the list (any call to Mk0Ex can be replaced by the constant NULLNODEPTR)

  • the result of Mk2Ex, which represents a portion (possibly empty) of the list

  • any node that can be made a list element subtree by implicit insertion of chain contexts, which represents a single element of the list
The node representing the complete `Ex' construct is the one resulting from a call of MkEx.

LISTOF constructs always involve either looping or recursion in a parser. For example, consider a language in which a block consists of an arbitrary non-empty sequence of declarations and statements. The LIDO specification for the abstract syntax might contain the rule:

RULE Blk: Block LISTOF Declaration | Statement END;

Suppose that the parser calls declaration_action after each Declaration has been recognized and statement_action after each Statement has been recognized. Moreover, it calls block_begin prior to beginning analysis of the list and block_end when the end of the block has been reached:

void block_begin(void)
{ _incrnodestack();
  _nst[_nsp]=Mk0Blk(&curpos);
}

void declaration_action(void)
{ curpos=SpanOf(_pst[_nsp-1],_pst[_nsp]);
  _pst[_nsp-1]=curpos;
  _nst[_nsp-1]=Mk2Blk(&curpos,_nst[_nsp-1],_nst[_nsp]);
  _nsp--;
}

void statement_action(void)
{ declaration_action(void) }

void block_end(void)
{ _nst[_nsp]=MkBlk(&_pst[_nsp],_nst[_nsp]); }

Running a foreign parser under Eli

There are two distinct possibilities for the implementation of a foreign parser:

  • The foreign parser exists as a collection of C/C++ source files and/or object files that can be linked with the tree construction and computation modules. (A scanner/parser created by LEX/YACC or FLEX/Bison would have this property.)

  • The foreign parser exists as an executable file that expects to load a shared library containing the tree construction and computation modules. (A scanner/parser created by a Java-based tool like ANTLR would have this property.)

The parser is a collection of routines

When the parser is a collection of routines, whether in source or object form, the files containing those routines can be listed in a `.specs' file (see Descriptive Mechanisms Known to Eli of Guide for New Eli Users). The name of that file then appears in the overall specification. For example, suppose that all of the components of the foreign parser are listed in file `parser.specs' and the tree computations are defined by the file `treecomp.lido'. Then the overall specification of the program might be `prog.specs' with the content:

parser.specs
treecomp.lido
(Normally the tree computation would involve a number of different specifications rather than a single `.lido' file, so a more realistic example would use `treecomp.specs' or `treecomp.fw' to specify it.)

Eli normally generates a parser from every specification. When a parser is supplied, this behavior must be suppressed by adding the parameter +parser=none to the derivation (see How to Request Product Manufacture of Guide for New Eli Users):

prog.specs +parser=none :exe

Eli also normally provides the following main program:

int main(int argc , char *argv[])
{
#ifdef MONITOR
  _dap_init (argv[0]);
  _dapto_enter ("driver");
#endif

  ParseCommandLine(argc, argv);

#include "INIT.h"

  TREEBUILD();

#ifdef STOPAFTERBADPARSE
  if (ErrorCount[ERROR] == 0)
#endif
  ATTREVAL();

#include "FINL.h"

#ifdef MONITOR
  _dapto_leave ("driver");
#endif
  return (ErrorCount[ERROR] > 0);
}
One possible strategy is to write a wrapper procedure named TREEBUILD that carries out all of the setup operations needed for the foreign parser and then invokes it. This can often be done by renaming a main program provided with the foreign parser and making a few changes to it.

If it is not feasible to modify the main program of the foreign parser, then production of Eli's main program must be suppressed by adding the parameter +nomain to the derivation:

prog.specs +nomain +parser=none :exe
In this case, however, the interface module must:

  1. include the initialization code file `INIT.h',

  2. invoke ATTREVAL after the tree has been built,

  3. and include the finalization code file `FINL.h'.

If the parser executes a function call `begin_parse();' before invoking any other functions of the interface, and a function call `end_parse();' when it has completed recognition of the input text, then the implementation of these two functions might be:

void begin_parse(void)
{
#ifdef MONITOR
  _dap_init ("");  /* Argument is generally the program name */
#endif

#include "INIT.h"
}

void end_parse(void)
{  _nst[_nsp]=MkRoot(&_pst[_nsp],_nst[_nsp]);
  ATTREVAL();

#include "FINL.h"
}
Replace "Root" by the name of the rule that creates the root node of the tree. If the root node is created by another function, omit the Mk-function call. ATTREVAL assumes that the root node is at the top of the stack; if this pre-condition is not satisfied then the computation will silently do nothing.

The parser is an executable file

When the parser is an executable file that expects to load a shared library, that library must be built from the specifications of the tree construction and computation (see so of Products and Parameters Reference). The library must not contain a parser or a main program:

treecomp.specs +nomain +parser=none :so
Here we assume that all of the components of the LIDO specification, tree construction interface, and supporting modules are listed in `treecomp.specs'.

The simplest approach to integrating the foreign parser with the shared library is to copy it to a file with the name that the foreign parser expects. For example, if the parser program expects to load a shared library named `ParserActions.so', then use the following derivation to make the library available under that name:

treecomp.specs +nomain +parser=none :so > libParserActions.so
(See your system documentation for the placement and naming of shared library files.)


Previous Chapter Next Chapter Table of Contents