General Information
Tutorials
Reference Manuals
Libraries
Translation Tasks
Tools
Administration
|
Syntactic AnalysisUsing Foreign parsersWhen Eli is used to generate a parser, Maptool is able to relate the concrete syntax to the abstract syntax and create all of the code necessary to build a tree representing the input text. If a parser is generated by other tools, or written by hand, tree-building code must be created manually. In this section, we assume that a parser for the source language exists, and that Eli is being used to generate code from a LIDO specification of the abstract syntax and desired tree computations. The interface specification of any parser designed to support tree computation defines a set of function invocations that will occur as parsing of the input text proceeds. If the parser has been generated, these function invocations are included in the grammar as semantic actions (see Carrying Out Actions During Parsing). The code generated from a LIDO specification includes a set of tree construction functions, one for each rule context (see Tree Construction Functions of LIDO - Reference Manual). These functions must be invoked at appropriate times with appropriate arguments during the course of the parse. In order to use an existing parser, therefore, we must implement a module obeying the interface specification of that parser and correctly invoking the tree construction functions generated from the LIDO specification. It would be possible to develop the module in isolation and then integrate it with the foreign parser, but a better approach is to use the foreign parser as part of the Eli specification of the complete program. Development can then proceed incrementally using Eli tools like execution monitoring to track down errors and verify correct behavior.
Building tree nodesA typical parser interface specifies a data structure to define text fragments, in addition to the semantic actions:
typedef struct { /* Basic symbol */ int line; /* Source line containing the symbol */ int col; /* Column containing the first character */ int type; /* Classification code of the symbol */ char *text; /* Symbol text */ } Token; /* Symbol classification codes */ #define ETXT 1 /* End of the source file */ #define LPAR 2 /* Left parenthesis */ #define RPAR 3 /* Right parenthesis */ #define PLUS 4 /* Plus */ #define STAR 5 /* Asterisk */ #define INTG 6 /* Integer */ ... Each tree construction function generated by Eli from a LIDO specification must be invoked with pointers to its children, and therefore the tree must be built bottom-up. The usual strategy is to store pointers to constructed nodes on a stack until their parent node is built. Eli provides a stack module for this purpose:
NODEPTR *_nst; /* Stack array: _nst[...] are the elements */ int _nsp; /* Stack index: _nst[_nsp] is the top element */ void _incrnodestack(); /* Push an empty element onto the stack */Elements of the stack are _nst[_nsp] , _nst[_nsp-1] , etc.
The statement _nsp-=k; pops k elements off of the stack,
and the statement _incrnodestack(); pushes an empty element onto the
stack.
To make the stack visible, include the file `treestack.h'.
The behavior of the functions called by the parser is determined primarily by the needs of the abstract syntax. We'll consider two LIDO specifications, one for computing the value of an integer expression involving addition and multiplication and the other for carrying out overload resolution in more general expressions.
Tree designed for expression evaluation
Consider the following LIDO specification, which evaluates an integer
expression involving addition and multiplication.
It assumes each
ATTR val: int; RULE Top: Root ::= Expr COMPUTE printf("The value is %d\n", Expr.val); END; RULE Add: Expr ::= Expr '+' Expr COMPUTE Expr[1].val=ADD(Expr[2].val,Expr[3].val); END; RULE Mul: Expr ::= Expr '*' Expr COMPUTE Expr[1].val=MUL(Expr[2].val,Expr[3].val); END; RULE Num: Expr ::= Integer COMPUTE Expr.val=Integer; END;Eli generates four node construction functions from this specification:
NODEPTR MkTop(POSITION *_coord, NODEPTR _d1); NODEPTR MkAdd(POSITION *_coord, NODEPTR _d1, NODEPTR _d2); NODEPTR MkMul(POSITION *_coord, NODEPTR _d1, NODEPTR _d2); NODEPTR MkNum(POSITION *_coord, int _TERM1);To make the node construction functions visible, include the file `treecon.h'.
The
Our module must call
Suppose that by looking at the code of the parser, or the grammar from
which the parser was generated, we determine that when the parser
recognizes an integer in the input text it calls the function
void int_literal_constant(Token *t) { _incrnodestack(); _nst[_nsp]=MkNum(NoPosition,atoi(t->text)); }Note that this code does not check for an error in the conversion of the string. That might or might not be reasonable, depending upon how careful the parser was in accepting a string as a representation of an integer value.
Further examination of the parser might show that it calls the function
void mult_operand(void) { _nst[_nsp-1]=MkMul(NoPosition,_nst[_nsp-1],_nst[_nsp]); _nsp--; }Implementation of the action when the parser recognizes an expression involving two operands and a plus operator is identical except that MkAdd is invoked instead of MkMul .
If the parser invokes
void level_0_expr(void) { _nst[_nsp]=MkTop(NoPosition,_nst[_nsp]); }
Suppose that the parser invokes
void level_3_expr(void) { }
An expression language usually has precedence levels containing several
operators.
For example, dyadic
void mult_operand(Token *o) { if (o->type == STAR) _nst[_nsp-1]=MkMul(NoPosition,_nst[_nsp-1],_nst[_nsp]); else _nst[_nsp-1]=MkDiv(NoPosition,_nst[_nsp-1],_nst[_nsp]); _nsp--; }This assumes that the parser will not invoke mult_operand unless the
operator is either * or / , and therefore no error checking is
required.
(If the number of operators at the precedence level were larger, then a
switch statement might be preferable to the conditional.)
The code for
RULE Div: Expr ::= Expr '/' Expr COMPUTE Expr[1].val=DIV(Expr[2].val,Expr[3].val); END;
Tree designed for overload resolution
Consider the following LIDO specification, which provides a structure to
analyze the result type of an expression involving addition, subtraction,
multiplication, and division of integers and floating point numbers
(see Operator Overloading of Tutorial on Type Analysis).
It assumes that each
RULE Top: Root ::= Expr END; RULE Dya: Expr ::= Expr BinOp Expr END; RULE Pls: BinOp ::= '+' END; RULE Min: BinOp ::= '-' END; RULE Str: BinOp ::= '*' END; RULE Sls: BinOp ::= '/' END; RULE Ntg: Expr ::= Integer END; RULE Flt: Expr ::= Float END;Eli generates eight node construction functions from this specification. The first six are:
NODEPTR MkTop(POSITION *_coord, NODEPTR _d1); NODEPTR MkDya(POSITION *_coord, NODEPTR _d1, NODEPTR _d2, NODEPTR _d3); NODEPTR MkPls(POSITION *_coord); NODEPTR MkMin(POSITION *_coord); NODEPTR MkAst(POSITION *_coord); NODEPTR MkSls(POSITION *_coord);To make the node construction functions visible, include the file `treenode.h'. In this example, we would like to represent the integer and floating-point constants in the tree by the strings that represent them in the source text. There are two possibilities:
In case 1, a specification must be added to the LIDO description of the tree:
TERM Integer, Float: CharPtr; CharPtr is the LIDO name for the C type char * .
The definition of CharPtr is made available by including file
`strings.h'.
The
NODEPTR MkNtg(POSITION *_coord, CharPtr t); NODEPTR MkFlt(POSITION *_coord, CharPtr t);
Suppose that, as discussed in the last subsection, the parser calls
void int_literal_constant(Token *t) { _incrnodestack(); _nst[_nsp]=MkNtg(NoPosition,t->text); }
In case 2, we can make use of Eli's
int MakeName(char *c);Because the default type of a LIDO terminal is int , we can omit the
TERM specification and the two functions MxNtg and
MkFlt will be defined as follows:
NODEPTR MkNtg(POSITION *_coord, int t); NODEPTR MkFlt(POSITION *_coord, int t);The implementation of int_literal_constant would be:
void int_literal_constant(Token *t) { _incrnodestack(); _nst[_nsp]=MkNum(NoPosition,MakeName(t->text)); }
The
$/Tech/MakeName.gnrc:instNo +instance parameter should be supplied, because scanning and
parsing are provided by the foreign code.
Once the module has been instantiated, the definition of the MakeName
function is made available to the tree construction module by including file
`MakeName.h'.
Let's assume that the parser invokes a single function when it recognizes
any dyadic expression whose operator is at a specific precedence level,
passing the operator token to that function.
For example,
void level_1_operator(Token *o) { NODEPTR op; if (o->type == PLUS) op=MkPls(NoPosition); else op=MkMin(NoPosition); _nst[_nsp-1]=MkDya(NoPosition,_nst[_nsp-1],op,_nst[_nsp]); _nsp--; }This assumes that the parser will not invoke level_1_operator unless
the operator is either + or - , and therefore no error checking
is required.
(If the number of operators at the precedence level were larger, then a
switch statement might be preferable to the conditional.)
If, on the other hand, the parser invokes a function
void add_operand(void) { _nst[_nsp-1]= MkDya(NoPosition,_nst[_nsp-1],MkPls(NoPosition),_nst[_nsp]); _nsp--; }Note that the operator node implied by the add_operand call must be
explicitly created in this case; it is only implicit in the parse.
Tree nodes for chain rulesRecall that a chain rule has the form `X ::= Y', where `X' differs from `Y' (see Chain rule definitions). Such a rule will always result in a tree node with a single child, and if the rule name is `Ch' then the constructor function will be:
NODEPTR MkCh(POSITION *_coord, NODEPTR _d1); With the exception of the root node of the tree, it is never necessary to explicitly invoke the constructor of a chain rule node. This is actually a very important property of the tree construction module. For example, consider the following fragment of a LIDO specification:
RULE SimpleVar: Var ::= VrblIdUse END; RULE SubscrVar: Var ::= Var '[' Exp ']' END; RULE VarExp: Exp ::= Var END; RULE ArrayExp: Exp ::= TypeIdUse '[' Exp ']' 'of' Exp END; RULE Typ: TypeIdUse ::= Symbol END; RULE Var: VrblIdUse ::= Symbol END; RULE Idn: Symbol ::= Identifier END; Identifier is a terminal symbol, represented by a unique permanent
string
(see Tree designed for overload resolution).
The problem here is that, given the input sequence `a[', a parser
would have to look beyond the matching `]' in order to decide whether
`a' was a
Adding coordinate informationLIDO computations may access the coordinates of the first character of the source text region represented by a node. Usually, these computations are used to attach error reports to appropriate text locations. Many of the modules that implement common computations use this facility for error reporting (for an example, see Verifying typed identifier usage of Type Analysis). Execution monitoring is provided by Noosa, a separate process that can display the abstract syntax tree and graphically relate it to the source text (see Trees and Attribute Values of Execution Monitoring Reference). Noosa requires that both the source text coordinates of the first character of a tree context and those of the first character beyond that context be supplied to its construction function.
Specific source text coordinates are represented by a
Supplying coordinates for computationLIDO provides three names that can be used in computations to obtain source text coordinates of a tree context (see Predefined Entities of LIDO - Reference Manual):
In order to support the use of coordinates in computation,
the tree construction function must have access to
the location of the first character of its tree context.
We have assumed that each token provided by the parser specifies the line
and column of the first character of the corresponding input string
(see Building tree nodes).
This information can be used to build a
POSITION curpos; void int_literal_constant(Token *t) { LineOf(curpos) = t->line; ColOf(curpos) = t->col; _incrnodestack(); _nst[_nsp]=MkNum(&curpos,atoi(t->text)); }Notice that the address of curpos , rather then curpos itself,
is passed to the node construction function MkNum .
Unfortunately, this information isn't sufficient.
We must not only pass the coordinates to
The Eli stack module actually provides two parallel stacks,
void int_literal_constant(Token *t) { LineOf(curpos) = t->line; ColOf(curpos) = t->col; _incrnodestack(); _pst[_nsp]=curpos; _nst[_nsp]=MkNum(&curpos,atoi(t->text)); }The position value, not a pointer to that value, is saved on the stack. That frees curpos to be used in constructing other values.
When a node whose children are all on the stack is constructed, the coordinates are obtained from the leftmost child:
void mult_operand(void) { _nst[_nsp-1]=MkMul(&_pst[_nsp-1],_nst[_nsp-1],_nst[_nsp]); _nsp--; }
Generally speaking, the stack location for the left operand
becomes the stack location for the result.
Because the coordinates of the result are the coordinates of the left
operand, there is no need for an assignment to
Supplying coordinates for Noosa
Noosa requires the coordinates of the first character of a tree context and
also the coordinates of the first character beyond the end of that context.
The additional coordinates should be supplied, however, only if
execution monitoring has actually been specified for the particular run.
This is because the
The simplest strategy is to define a routine to compute the appropriate
POSITION PositionOf(Token_t *token) { POSITION curpos; LineOf(curpos) = token->line; ColOf(curpos) = token->col; #ifdef RIGHTCOORD RLineOf(curpos) = LineOf(curpos); RColOf(curpos) = ColOf(curpos) + strlen(token->text); #ifdef MONITOR CumColOf(curpos) = ColOf(curpos); RCumColOf(curpos) = RColOf(curpos); #endif #endif return curpos; } RIGHTCOORD and MONITOR are defined by Eli for each C
compilation if the user specifies the +monitor parameter to the
derivation
(see monitor of Products and Parameters Reference).
A node for an integer literal constant would then be built by:
void int_literal_constant(Token *t) { curpos = PositionOf(t); _incrnodestack(); _pst[_nsp]=curpos; _nst[_nsp]=MkNum(&curpos,atoi(t->text)); }
The construction of a node whose children are on the stack becomes more
complex, because the coordinates of the constructed node involve the
coordinates of the first character of the leftmost child node and the
coordinates of the first character beyond the end of rightmost child node.
The tree stack module provides a function,
POSITION SpanOf(POSITION left, POSITION right);
Using
void mult_operand(void) { curpos=SpanOf(_pst[_nsp-1],_pst[_nsp]); _pst[_nsp-1]=curpos; _nst[_nsp-1]=MkMul(&curpos,_nst[_nsp-1],_nst[_nsp]); _nsp--; }
Building LISTOF constructsThere are three tree construction functions associated with a LISTOF construct with the rule name `Ex' (see Tree Construction Functions of LIDO - Reference Manual):
NODEPTR MkEx(POSITION *_coord, NODEPTR _d1); NODEPTR Mk0Ex(POSITION *_coord); NODEPTR Mk2Ex(POSITION *_coord, NODEPTR _d1, NODEPTR _d2);Arguments `_d1' and `_d2' may be:
MkEx .
LISTOF constructs always involve either looping or recursion in a parser. For example, consider a language in which a block consists of an arbitrary non-empty sequence of declarations and statements. The LIDO specification for the abstract syntax might contain the rule:
RULE Blk: Block LISTOF Declaration | Statement END;
Suppose that the parser calls
void block_begin(void) { _incrnodestack(); _nst[_nsp]=Mk0Blk(&curpos); } void declaration_action(void) { curpos=SpanOf(_pst[_nsp-1],_pst[_nsp]); _pst[_nsp-1]=curpos; _nst[_nsp-1]=Mk2Blk(&curpos,_nst[_nsp-1],_nst[_nsp]); _nsp--; } void statement_action(void) { declaration_action(void) } void block_end(void) { _nst[_nsp]=MkBlk(&_pst[_nsp],_nst[_nsp]); }
Running a foreign parser under EliThere are two distinct possibilities for the implementation of a foreign parser:
The parser is a collection of routinesWhen the parser is a collection of routines, whether in source or object form, the files containing those routines can be listed in a `.specs' file (see Descriptive Mechanisms Known to Eli of Guide for New Eli Users). The name of that file then appears in the overall specification. For example, suppose that all of the components of the foreign parser are listed in file `parser.specs' and the tree computations are defined by the file `treecomp.lido'. Then the overall specification of the program might be `prog.specs' with the content:
parser.specs treecomp.lido(Normally the tree computation would involve a number of different specifications rather than a single `.lido' file, so a more realistic example would use `treecomp.specs' or `treecomp.fw' to specify it.)
Eli normally generates a parser from every specification.
When a parser is supplied, this behavior must be suppressed by adding the
parameter
prog.specs +parser=none :exe Eli also normally provides the following main program:
int main(int argc , char *argv[]) { #ifdef MONITOR _dap_init (argv[0]); _dapto_enter ("driver"); #endif ParseCommandLine(argc, argv); #include "INIT.h" TREEBUILD(); #ifdef STOPAFTERBADPARSE if (ErrorCount[ERROR] == 0) #endif ATTREVAL(); #include "FINL.h" #ifdef MONITOR _dapto_leave ("driver"); #endif return (ErrorCount[ERROR] > 0); }One possible strategy is to write a wrapper procedure named TREEBUILD
that carries out all of the setup operations needed for the foreign parser
and then invokes it.
This can often be done by renaming a main program provided with the foreign
parser and making a few changes to it.
If it is not feasible to modify the main program of the foreign parser,
then production of Eli's main program must be suppressed by adding the
parameter
prog.specs +nomain +parser=none :exeIn this case, however, the interface module must:
If the parser executes a function call `begin_parse();' before invoking any other functions of the interface, and a function call `end_parse();' when it has completed recognition of the input text, then the implementation of these two functions might be:
void begin_parse(void) { #ifdef MONITOR _dap_init (""); /* Argument is generally the program name */ #endif #include "INIT.h" } void end_parse(void) { _nst[_nsp]=MkRoot(&_pst[_nsp],_nst[_nsp]); ATTREVAL(); #include "FINL.h" }Replace "Root" by the name of the rule that creates the root node of the tree. If the root node is created by another function, omit the Mk -function call.
ATTREVAL assumes that the root node is at the top of the stack;
if this precondition is not satisfied then the computation will silently
do nothing.
The parser is an executable fileWhen the parser is an executable file that expects to load a shared library, that library must be built from the specifications of the tree construction and computation (see so of Products and Parameters Reference). The library must not contain a parser or a main program:
treecomp.specs +nomain +parser=none :soHere we assume that all of the components of the LIDO specification, tree construction interface, and supporting modules are listed in `treecomp.specs'. The simplest approach to integrating the foreign parser with the shared library is to copy it to a file with the name that the foreign parser expects. For example, if the parser program expects to load a shared library named `ParserActions.so', then use the following derivation to make the library available under that name:
treecomp.specs +nomain +parser=none :so > libParserActions.so(See your system documentation for the placement and naming of shared library files.)
|