Structure of the Input File



Figure 3 gives the structure of the input file: an arbitrary number of set definitions, each consisting of a set name and a set body. The set body contains an arbitrary number of set elements.

**Figure 3:** Input File Structure

Figure 3 reflects the problem requirement that there be only one occurrence of a given word in a given class. This sort of requirement is very common in translation problems. It involves recognition of regions and specific entities within those regions. For example, a SetBody is a region and a SetElement is a specific entity within that region. The requirement is that no SetElement appear more than once in any SetBody. Clearly a SetBody node must appear in Figure 3 in order to allow us to describe this requirement.
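The region/entity requirement can be made concrete with a small sketch in Python. This is not part of the Eli specification; the function name `duplicate_elements` and the sample data are hypothetical and serve only to show that uniqueness is checked per SetBody, not across the whole file.

```python
from collections import Counter


def duplicate_elements(set_body):
    """Return the elements that occur more than once in one set body,
    in order of first appearance."""
    counts = Counter(set_body)
    repeated = []
    for word in set_body:
        if counts[word] > 1 and word not in repeated:
            repeated.append(word)
    return repeated


# Hypothetical input: two set definitions, the second with a repeated element.
classes = {
    "colors": ["red", "green", "blue"],
    "bugs": ["ant", "fly", "ant"],
}

# Each SetBody is its own region, so it is checked independently of the others.
violations = {name: duplicate_elements(body) for name, body in classes.items()}
```

Because each body is examined on its own, the same word may legitimately appear in two different set bodies; only a repeat inside one body is reported.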

The structure of the input file can be described by the following context-free grammar:

InputText.con[4]:

    text: defs .
    defs: SetDef / defs SetDef .
    SetDef: SetName '{' SetBody '}' .
    SetName: Word .
    SetBody: elements .
    elements: SetElement / elements SetElement .
    SetElement: Word .

This macro is attached to a product file.
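To see which inputs this grammar accepts, the following hand-written recognizer may help. It is only an illustrative sketch in Python: Eli generates its own parser from the grammar, and the names `tokenize` and `parse_text` are hypothetical. The left-recursive rules for `defs` and `elements` are rendered as loops, and, as the grammar is written, at least one SetDef and at least one SetElement per SetBody are required. The tokenizer here is deliberately naive and knows nothing about comments.

```python
import re

# Naive tokenizer (an assumption of this sketch): words, or any other
# single non-blank character. White space separates tokens.
TOKEN = re.compile(r"[a-zA-Z]+|\S")
WORD = re.compile(r"[a-zA-Z]+")


def tokenize(source):
    return TOKEN.findall(source)


def parse_text(source):
    """Parse `text` and return a list of (set_name, [set_elements]) pairs."""
    tokens = tokenize(source)
    pos = 0
    defs = []

    def expect_word():
        nonlocal pos
        if pos >= len(tokens) or not WORD.fullmatch(tokens[pos]):
            raise SyntaxError(f"expected Word at token {pos}")
        word = tokens[pos]
        pos += 1
        return word

    def expect(literal):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != literal:
            raise SyntaxError(f"expected {literal!r} at token {pos}")
        pos += 1

    while True:                        # defs: SetDef / defs SetDef .
        name = expect_word()           # SetName: Word .
        expect("{")
        body = [expect_word()]         # elements: at least one SetElement
        while pos < len(tokens) and tokens[pos] != "}":
            body.append(expect_word())
        expect("}")
        defs.append((name, body))
        if pos == len(tokens):
            return defs
```

For example, `parse_text("colors { red green } bugs { ant }")` yields one pair per set definition, while an empty body such as `colors { }` is rejected.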

The terminal symbols of the grammar are the literal braces and the nonliteral symbol Word. Eli can easily deduce that the braces are significant characters, but we must provide a definition of the significant character sequences that could make up a Word. We must also describe how to capture the significant information in a word for further processing.

Eli normally assumes that white space characters (space, tab and newline) are not significant. If we want to provide a facility for commenting a classification then we must additionally define the form of a comment and specify that character sequences having this form are not significant.

Here is one possible description of the nonliteral character sequences:

InputText.gla[5]:

    Word:   $[a-zA-Z]+      [mkidn]
            C_COMMENT

This macro is attached to a product file.

The first line defines a Word by the regular expression [a-zA-Z]+ (regular expressions are introduced by $ and terminated by white space) [Gra88]. Whenever such a sequence is recognized in the input text, mkidn is invoked to capture the significant information represented by the sequence. This processor (available in Eli's library) associates an integer with the recognized sequence, and arranges for that integer to become the value of the terminal symbol Word. If two character sequences recognized as words are identical, mkidn will associate the same integer with them; distinct sequences get different integers.
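The behavior described for mkidn, identical sequences mapping to the same integer and distinct sequences to different ones, is essentially string interning. The Python sketch below illustrates that idea only; it is not Eli's implementation, the class name `IdentifierTable` is hypothetical, and the choice of sequential integers starting at 0 is an assumption (the text does not say which integers mkidn assigns).

```python
class IdentifierTable:
    """Map each distinct character sequence to one integer code."""

    def __init__(self):
        self._codes = {}

    def intern(self, text):
        # Assumption of this sketch: codes are handed out sequentially.
        # A sequence seen before gets its existing code back.
        return self._codes.setdefault(text, len(self._codes))


table = IdentifierTable()
first_red = table.intern("red")
green = table.intern("green")
second_red = table.intern("red")
```

After this runs, `first_red` and `second_red` hold the same integer and `green` holds a different one, so later processing can compare words by comparing integers.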

The second line of the specification does not begin with a symbol followed by :, which indicates that the character sequences it describes are not significant. It uses a canned description from Eli's library to describe character sequences taking the form of C comments. Thus any character sequence taking the form of a C comment will be ignored in the input text read by the generated program.
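The effect of the two specification lines can be approximated by a small scanner written in Python. This is a sketch under stated assumptions, not the scanner Eli generates: the name `scan` is hypothetical, the comment pattern is a common approximation of non-nested C comments, and the braces are included only because the grammar uses them as literals.

```python
import re

SCANNER = re.compile(
    r"""
    (?P<skip>\s+|/\*.*?\*/)   # white space or a C comment: not significant
  | (?P<word>[a-zA-Z]+)       # Word
  | (?P<brace>[{}])           # literal braces from the grammar
    """,
    re.VERBOSE | re.DOTALL,
)


def scan(source):
    """Return (kind, text) pairs for the significant sequences in source."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = SCANNER.match(source, pos)
        if match is None:
            raise ValueError(
                f"unexpected character {source[pos]!r} at offset {pos}"
            )
        if match.lastgroup != "skip":   # drop white space and comments
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

Scanning `colors { red /* primary */ green }` with this sketch produces tokens for `colors`, `{`, `red`, `green` and `}` only; the comment and the white space leave no trace in the result.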


2007-05-18