Lexical Analysis

The purpose of the lexical analyzer is to partition the input text, delivering a sequence of comments and basic symbols. Comments are character sequences to be ignored, while basic symbols are character sequences that correspond to terminal symbols of the grammar defining the phrase structure of the input (see Context-Free Grammars and Parsing of Syntactic Analysis).

A user must define the forms of comments and the forms of all basic symbols corresponding to non-literal terminal symbols of the grammar. Eli can deduce the form of a literal terminal symbol from the grammar specification.

The definition consists of one or more type-`gla' files. Each line of a type-`gla' file describes a set of character sequences. If a line begins with an identifier followed by a colon (:), then all of the character sequences described by the line are instances of the non-literal terminal symbol named by that identifier; otherwise they are comments.
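The labelling rule above can be illustrated with a small Python sketch. It is not part of Eli. The function name `classify` and the exact shape assumed for an identifier (a letter followed by letters, digits or underscores) are assumptions made only for this illustration.

```python
import re

# Assumption: an identifier is a letter followed by letters, digits or
# underscores. The real rule for type-`gla' files may differ.
LABEL = re.compile(r"\s*([A-Za-z][A-Za-z0-9_]*)\s*:")


def classify(line):
    """Return the terminal symbol name that labels the line, or None.

    None means the line has no leading "identifier:" and therefore
    describes a comment.
    """
    match = LABEL.match(line)
    return match.group(1) if match else None
```

A line such as `Identifier:  C_IDENTIFIER` would be classified as describing the terminal symbol `Identifier`, while a line with no leading label would be classified as describing a comment.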

Here is an example of a type-`gla' file:

HexInteger:  $0[Xx][0-9A-Fa-f]+
             $!  (auxEOL)
Identifier:  C_IDENTIFIER

The first line of this specification uses a regular expression to define a hexadecimal integer as a zero, followed by the letter X (either upper or lower case) and one or more hexadecimal digits represented in the usual way. In the second line, one form of comment is defined by a regular expression and the name of a C routine. The C routine will be invoked when the regular expression has been matched. This approach allows the user to define character sequences operationally when a declarative definition is tedious or does not support appropriate error reporting.
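As a rough illustration of the two definitions just described, the following Python sketch mimics their effect. It is not code generated by Eli. The names `skip_to_eol` and `scan_comment` are hypothetical, and the sketch assumes that the routine attached to the comment consumes the remainder of the current line, which the text above does not state.

```python
import re

# The HexInteger pattern from the example, written as a Python regex:
# a zero, an X in either case, then one or more hexadecimal digits.
HEX_INTEGER = re.compile(r"0[Xx][0-9A-Fa-f]+")


def skip_to_eol(text, pos):
    """Hypothetical stand-in for the C routine: skip the rest of the line.

    Returns the index just past the next newline, or len(text) if the
    input ends before a newline is found.
    """
    end = text.find("\n", pos)
    return len(text) if end == -1 else end + 1


def scan_comment(text, pos):
    """Match '!' declaratively, then finish the comment operationally.

    Returns the index just past the comment, or None if no comment
    starts at pos.
    """
    if text.startswith("!", pos):
        return skip_to_eol(text, pos + 1)
    return None
```

The split mirrors the one in the specification: the regular expression recognizes only the start of the comment, and the routine decides where the comment ends.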

Since certain lexical structures are common to many languages, Eli provides a library of definitions that can be invoked simply by giving their names. C_IDENTIFIER, in the third line, is such an invocation. The effect of the third line is to define the form of the basic symbol Identifier as that of an identifier in C: a letter or underscore followed by some sequence of letters, digits and underscores.
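The form of a C identifier described above can be written as an ordinary regular expression. The Python sketch below shows only that form. It is not the library definition itself, and the helper name `is_c_identifier` is made up for this illustration.

```python
import re

# A letter or underscore, followed by any (possibly empty) sequence of
# letters, digits and underscores.
C_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_c_identifier(text):
    """Return True if the whole string has the form of a C identifier."""
    return C_IDENTIFIER.fullmatch(text) is not None
```

Under this pattern `_tmp1` and `x` are identifiers, while `1abc` is not, because it starts with a digit.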

Chapter 1 defines the usage, form and content of specifications provided by the user as type-`gla' files. Those specifications may refer to canned descriptions, which are defined in Chapter 2. Chapter 3 presents the default processing of spaces, tabs and newlines and explains how to define other strategies. The treatment and meaning of literal terminal symbols are discussed in Chapter 4, and Chapter 5 explains how a generated lexical analyzer can be made insensitive to the case of letters. Complex lexical analysis problems may require modification of the behavior of the generated module; Chapter 6 discusses the possibilities.