The phrase structure of a language defines the form of the input representation. Annex B of ANSI/ISO 9899-1990 summarizes the phrase structure of C, and this chapter captures most of that information (pre-processor definitions are omitted).
A scanner and parser that verify the lexical and syntactic correctness of pre-processed C text can be generated from this chapter. The chapter can also serve as one component of a larger specification from which a C compiler or special-purpose analyzer is generated, or as the basis for a specification of an extension to C.
Section 1.1 specifies the way in which input characters are grouped into tokens and comments. This task is more difficult than usual in C because certain basic symbols with the lexical structure of an identifier (those introduced by typedef declarations) need to be classified as type names. That classification requires feedback from an analysis of the syntactic structure; the details of the process are given in Section 1.3.
Section 1.2 specifies the way in which the syntactic structure is derived from the sequence of tokens.
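The classification of identifiers as type names mentioned above can be illustrated with a minimal sketch. It is not the mechanism specified in Section 1.3; it only shows the general idea of feedback from the parser to the scanner. All names here (token_class, note_typedef_name, classify_identifier, MAX_TYPEDEF_NAMES) are hypothetical, and the sketch assumes a single flat scope, which is a simplification: in real C an inner declaration can hide a typedef name.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical token classes; the names are illustrative only. */
enum token_class { TOK_IDENTIFIER, TOK_TYPEDEF_NAME };

#define MAX_TYPEDEF_NAMES 64

/* Names the parser has seen declared by typedef.
 * Assumption: one flat scope, no hiding by inner declarations. */
static const char *typedef_names[MAX_TYPEDEF_NAMES];
static int typedef_count = 0;

/* Feedback path: the parser calls this after recognizing a
 * typedef declaration that introduces `name`. The string must
 * outlive the table, since only the pointer is stored. */
static void note_typedef_name(const char *name)
{
    assert(typedef_count < MAX_TYPEDEF_NAMES);
    typedef_names[typedef_count++] = name;
}

/* The scanner calls this for every basic symbol that has the
 * lexical structure of an identifier. */
static enum token_class classify_identifier(const char *spelling)
{
    int i;

    for (i = 0; i < typedef_count; i++) {
        if (strcmp(typedef_names[i], spelling) == 0)
            return TOK_TYPEDEF_NAME;
    }
    return TOK_IDENTIFIER;
}
```

The need for this becomes visible in a fragment such as `T * x;`. If T has been declared by typedef, the fragment declares x as a pointer to T; otherwise it is an expression that multiplies T by x. The scanner cannot tell the two apart from the characters alone, so it must consult information that only the syntactic analysis can supply.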