Lexical Analysis

Canned Symbol Descriptions

For many applications, the exact structure of the symbols that must be recognized is not important or the problem description specifies that the symbols should be the same as the symbols used in some other situation (e.g. identifiers might be specified to use the same format as C identifiers). To cover this common situation, Eli provides a set of canned symbol descriptions.

To use a canned description, simply write the canned description's identifier in a specification instead of writing a regular expression. For example, the following type-`gla' file tells Eli that the input text will contain C-style identifiers and strings, Ada-style comments, and Pascal-style integers:

Identifier: C_IDENTIFIER
            ADA_COMMENT
String:     C_STRING_LIT
Integer:    PASCAL_INTEGER

Identifier, String and Integer would appear as non-literal terminal symbols in the context-free grammar defining the phrase structure of this input text (see How to describe a context-free grammar of Syntax Analysis).

The available canned descriptions are defined later in this section. All of these definitions include a regular expression, and some include auxiliary scanners and/or token processors. An auxiliary scanner or token processor specified by a canned description can be overridden by nominating a different one in the specification that names the canned description. For example, the canned description PASCAL_STRING includes the token processor mkstr (see Available scanners). This token processor stores multiple copies of the same string in the character storage module. The following specification overrides mkstr with mkidn, which stores only one copy of each distinct string:

Str: PASCAL_STRING [mkidn]

The auxiliary scanner auxPascalString, included in the canned description, is not overridden by this specification.
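
The difference between the two token processors can be pictured with a toy model. The following Python sketch is not Eli's character storage module. The names StringTable, store_every_occurrence and store_once are hypothetical and only mimic the behavior described above for mkstr and mkidn.

```python
class StringTable:
    """Toy model of a string store; not Eli's actual implementation."""

    def __init__(self):
        self.strings = []   # stored copies; a string's index is its position
        self._index = {}    # text -> index, used only by store_once

    def store_every_occurrence(self, text):
        # Behavior described for mkstr: every call stores a fresh copy.
        self.strings.append(text)
        return len(self.strings) - 1

    def store_once(self, text):
        # Behavior described for mkidn: one copy per distinct string.
        if text not in self._index:
            self.strings.append(text)
            self._index[text] = len(self.strings) - 1
        return self._index[text]


# Hypothetical usage: the same string is seen twice by each strategy.
copies = StringTable()
first = copies.store_every_occurrence("hello")
second = copies.store_every_occurrence("hello")

shared = StringTable()
only = shared.store_once("hello")
again = shared.store_once("hello")
```

In this model, first and second are different indices and copies.strings holds two entries, while only and again are the same index and shared.strings holds one entry.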

The remainder of this section characterizes the canned descriptions that are available in the Eli library, and also gives their definitions.

Available Descriptions

Each of the identifiers in the following list is the name of a canned description specifying the lexical structure of some component of an existing programming language. Here they are simply characterized by the role they play in that language. A complete definition of each, consisting of a regular expression, possibly an auxiliary scanner name, and possibly a token processor name, is given in the next section.

When building a new language, it is a good idea to use canned descriptions for lexical components: time is not wasted in deciding on their form, mistakes are not made in their implementation, and users are already familiar with them.

The list also provides canned descriptions for spaces, tabs and newlines. These white space characters are treated as comments by default. If, however, you define any pattern that will accept a white space character in its first position, this pattern overrides the default treatment and that white space character will be accepted only in contexts that are specified explicitly (see Spaces, Tabs and Newlines). For example, suppose that the following pattern were defined and that no other patterns contain spaces:

Separator:  $\040+#\040+

In that situation, a space will be accepted only if it is part of a Separator. To treat spaces that are not part of a Separator as comments, include the canned description SPACES as a comment specification:

Separator:  $\040+#\040+
            SPACES

Note that only a white space character that appears at the beginning of a pattern loses its default interpretation in this way. In this example, neither the tab nor the newline appeared at the beginning of a pattern and therefore tabs and newlines continue to be treated as comments.
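
The effect of adding SPACES can be tried out with a small model of a scanner. The Python sketch below is not how Eli generates scanners. It assumes a longest-match rule, uses Python regular expressions instead of Eli's notation, and introduces a hypothetical Word token and scan function purely for illustration. The comments argument stands for whatever patterns are currently skipped as comments.

```python
import re


def scan(text, patterns, comments):
    """Return (name, lexeme) pairs; text matched by a comment pattern is skipped.

    patterns: list of (name, regex) pairs for tokens.
    comments: list of regexes whose matches are discarded.
    """
    tokens, pos = [], 0
    candidates = patterns + [(None, c) for c in comments]
    while pos < len(text):
        best = None
        for name, regex in candidates:
            m = re.compile(regex).match(text, pos)
            # Keep the longest non-empty match (assumed longest-match rule).
            if m and m.end() > pos and (best is None or m.end() > best[1]):
                best = (name, m.end())
        if best is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        name, end = best
        if name is not None:
            tokens.append((name, text[pos:end]))
        pos = end
    return tokens


# Separator begins with a space, so the space is no longer a comment by default.
patterns = [("Separator", r" +# +"), ("Word", r"[a-z]+")]
without_spaces = [r"\t", r"[\r\n]"]        # tabs and newlines remain comments
with_spaces = without_spaces + [r" +"]     # SPACES added as a comment

separated = scan("a # b", patterns, without_spaces)
skipped = scan("a b", patterns, with_spaces)
try:
    scan("a b", patterns, without_spaces)
    lone_space_rejected = False
except ValueError:
    lone_space_rejected = True
```

In this model, a space that is not part of a Separator is rejected until the SPACES pattern is added to the comments, while tabs and newlines are skipped in both configurations.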

C_IDENTIFIER, C_INTEGER, C_INT_DENOTATION, C_FLOAT, C_STRING_LIT, C_CHAR_CONSTANT, C_COMMENT

Identifiers, integer constants (in two forms, C_INTEGER and C_INT_DENOTATION, described below), floating point constants, string literals, character literals, and comments from the C programming language, respectively.

C_INTEGER does not permit the L or U flags, but does correctly accept all other C integer denotations. By default, it uses c_mkint to convert the denotation to an internal int value. c_mkint obeys the C rules for determining the radix of the conversion.
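
The C radix rules mentioned here can be summarized in a few lines. The Python function below is an illustrative sketch, not the source of c_mkint. The name c_radix_value is hypothetical, and the function assumes that the denotation carries no L or U flags. Strings that are not valid C denotations are not handled specially; Python simply raises ValueError for them.

```python
def c_radix_value(denotation):
    """Convert a C integer denotation (no L/U flags) to an int."""
    if denotation[:2] in ("0x", "0X"):
        return int(denotation[2:], 16)    # 0x or 0X prefix: hexadecimal
    if len(denotation) > 1 and denotation[0] == "0":
        return int(denotation[1:], 8)     # leading 0: octal
    return int(denotation, 10)            # otherwise: decimal


# Hypothetical usage: the same digits mean different values in each radix.
decimal_value = c_radix_value("17")
octal_value = c_radix_value("017")
hex_value = c_radix_value("0x17")
```

The three calls at the end illustrate that the radix is chosen from the prefix alone, so the digits 17 yield three different values.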

C_INT_DENOTATION accepts all valid ANSI C integer denotations. By default, it uses mkstr to deliver a unique string table index for every occurrence of a denotation. This behavior is often overridden by adding [mkidn]:

Integer:  C_INT_DENOTATION [mkidn]

In this case, two identical denotations will have the same string table index.

C_IDENTIFIER_ISO

Character sequences obeying the definition of a C identifier, but accepting all ISO/IEC 8859-1 letters. Care must be taken in using this description because these identifiers are not acceptable to most C compilers. That means they cannot usually be used as (parts of) identifiers in generated code.

PASCAL_IDENTIFIER, PASCAL_INTEGER, PASCAL_REAL, PASCAL_STRING, PASCAL_COMMENT

Identifiers, integer constants, real constants, string literals, and comments from the Pascal programming language, respectively.

MODULA2_INTEGER, MODULA2_CHARINT, MODULA2_LITERALDQ, MODULA2_LITERALSQ, MODULA2_COMMENT

Integer constants, characters specified using character codes, string literals delimited by double and single quotes, and comments from the Modula-2 programming language, respectively.

MODULA3_COMMENT

Comments from the Modula-3 programming language.

ADA_IDENTIFIER, ADA_COMMENT

Identifiers and comments from the Ada programming language.

AWK_COMMENT

Comments from the AWK programming language.

SPACES

A sequence of one or more spaces.

TAB

A single horizontal tab.

NEW_LINE

A single newline.

Definitions of Canned Descriptions

Eli textually replaces a reference to a canned description with its definition. If a user nominates an auxiliary scanner and/or a token processor for a canned description, that overrides the corresponding nomination appearing in the definition of the canned description.

The following is a list, in roughly alphabetical order, of the canned descriptions available in the Eli library, with their definitions. Use this list as a formal definition, and as an example for constructing specifications. (C_FLOAT and PASCAL_REAL have definitions that are too long to fit on one line of this document. Each is, however, a single line in the specification file.)

ADA_COMMENT: $-- (auxEOL)
ADA_IDENTIFIER: $[a-zA-Z](_?[a-zA-Z0-9])* [mkidn]
AWK_COMMENT: $# (auxEOL)
C_COMMENT: $"/*" (auxCComment)
C_CHAR_CONSTANT: $' (auxCChar) [c_mkchar]
C_FLOAT: $((([0-9]+\.[0-9]*|\.[0-9]+)((e|E)(\+|-)?[0-9]+)?)|([0-9]+(e|E)(\+|-)?[0-9]+))[fFlL]? [mkstr]
C_IDENTIFIER: $[a-zA-Z_][a-zA-Z_0-9]* [mkidn]
C_INTEGER: $([0-9]+|0[xX][0-9a-fA-F]*) [c_mkint]
C_INT_DENOTATION: $([1-9][0-9]*|0[0-7]*|0[xX][0-9a-fA-F]+)([uU][lL]?|[lL][uU]?)? [mkstr]
C_STRING_LIT: $\" (auxCString) [mkstr]
MODULA_INTEGER: $[0-9][0-9A-Fa-f]*[BCH]? [modula_mkint]
MODULA2_COMMENT, MODULA3_COMMENT: $\(\* (auxM3Comment)
MODULA2_CHARINT: $[0-9][0-9A-Fa-f]*C [modula_mkint]
MODULA2_INTEGER: $[0-9][0-9A-Fa-f]*[BH]? [modula_mkint]
MODULA2_LITERALDQ: $\" (auxM2String) [mkstr]
MODULA2_LITERALSQ: $\' (auxM2String) [mkstr]
PASCAL_COMMENT: $"{"|"(*" (auxPascalComment)
PASCAL_IDENTIFIER: $[a-zA-Z][a-zA-Z0-9]* [mkidn]
PASCAL_INTEGER: $[0-9]+ [mkint]
PASCAL_REAL: $(([0-9]+\.[0-9]+)((e|E)(\+|-)?[0-9]+)?)|([0-9]+(e|E)(\+|-)?[0-9]+) [mkstr]
PASCAL_STRING: $' (auxPascalString) [mkstr]
SPACES: $\040+
TAB: $\t (auxTab)
NEW_LINE: $[\r\n] (auxNewLine)
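
A few of these definitions can be exercised outside Eli to see which strings they accept. The Python sketch below copies three regular expressions from the list, dropping the leading $ and the scanner and processor names. It assumes that these particular expressions mean the same thing in Python's re module as in Eli's notation, which need not hold for every definition (the quoted forms in C_COMMENT, for example, have a different meaning in Python). The helper name accepts is hypothetical.

```python
import re

# Regular expressions copied from the definitions above, without the leading $.
ADA_IDENTIFIER = r"[a-zA-Z](_?[a-zA-Z0-9])*"
C_INT_DENOTATION = r"([1-9][0-9]*|0[0-7]*|0[xX][0-9a-fA-F]+)([uU][lL]?|[lL][uU]?)?"
PASCAL_REAL = r"(([0-9]+\.[0-9]+)((e|E)(\+|-)?[0-9]+)?)|([0-9]+(e|E)(\+|-)?[0-9]+)"


def accepts(regex, text):
    """Return True if the whole of text matches regex."""
    return re.fullmatch(regex, text) is not None


# Print which sample strings each expression accepts.
samples = [
    (ADA_IDENTIFIER, ["a_b", "a__b", "a_"]),
    (C_INT_DENOTATION, ["42UL", "0x1fL", "089", "0x"]),
    (PASCAL_REAL, ["1.5", "1e10", "1.", ".5"]),
]
for regex, texts in samples:
    for text in texts:
        print(text, accepts(regex, text))
```

Checks of this kind are only a reading aid. The auxiliary scanners and token processors named in the definitions are not modeled here.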