The Compiler Generator Coco/R
User Manual
Hanspeter Mössenböck
Johannes Kepler University Linz
Institute of System Software
Coco/R¹ is a compiler generator, which takes an attributed grammar of a source language
and generates a scanner and a parser for this language. The scanner works as a
deterministic finite automaton. The parser uses recursive descent. LL(1) conflicts can be
resolved by a multi-symbol lookahead or by semantic checks. Thus the class of accepted
grammars is LL(k) for an arbitrary k.
There are versions of Coco/R for C#, Java, C++, Delphi, Modula-2, Oberon and other
languages. This manual describes the versions for C#, Java and C++ from the University
of Linz.
Download from: http://ssw.jku.at/Coco/
Compiler Generator Coco/R,
Copyright © 1990, 2010 Hanspeter Mössenböck, University of Linz
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public
License as published by the Free Software Foundation; either version 2, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the
implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to the
Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
As an exception, it is allowed to write an extension of Coco/R that is used as a plugin in non-free software.
If not otherwise stated, any source code generated by Coco/R (other than Coco/R itself) does not fall under the
GNU General Public License.
¹ Coco/R stands for compiler compiler generating recursive descent parsers.
Contents
1. Overview
  1.1 Sample Production
  1.2 Sample Parsing Method
  1.3 Summary of Features
2. Input Language
  2.1 Vocabulary
  2.2 Overall Structure
  2.3 Scanner Specification
    2.3.1 Character sets
    2.3.2 Tokens
    2.3.3 Pragmas
    2.3.4 Comments
    2.3.5 White space
    2.3.6 Case sensitivity
  2.4 Parser Specification
    2.4.1 Productions
    2.4.2 Semantic Actions
    2.4.3 Attributes
    2.4.4 The Symbol ANY
    2.4.5 LL(1) Conflicts
    2.4.6 LL(1) Conflict Resolvers
    2.4.7 Syntax Error Handling
    2.4.8 Frame Files
3. User Guide
  3.1 Installation
  3.2 Options
  3.3 Invocation
  3.4 Interfaces of the Generated Classes
    3.4.1 Scanner
    3.4.2 Token
    3.4.3 Buffer
    3.4.4 Parser
    3.4.5 Errors
  3.5 Main Class of the Compiler
  3.6 Grammar Tests
4. A Sample Compiler
5. Applications of Coco/R
6. Acknowledgements
A. Syntax of Cocol/R
B. Sources of the Sample Compiler
  B.1 Taste.ATG
  B.2 SymTab.cs (symbol table)
  B.3 CodeGen.cs (code generator)
  B.4 Taste.cs (main program)
1. Overview
Coco/R is a compiler generator, which takes an attributed grammar of a source
language and generates a scanner and a recursive descent parser for this language. The
user has to supply a main class that calls the parser as well as semantic classes (e.g. a
symbol table handler or a code generator) that are used by semantic actions in the
parser. This is shown in Figure 1.
[Figure: compiler description -> Coco/R -> Scanner, Parser; supplied by the user: Main, semantic classes]
Figure 1 Input and output of Coco/R
1.1 Sample Production
To give you an idea of what attributed grammars look like in Coco/R, let us
look at a sample production for variable declarations in a Pascal-like language:
VarDeclaration<ref int adr>     (. string name; TypeDesc type; .)
= Ident<out name>               (. Obj x = symTab.Enter(name);
                                   int n = 1; .)
  { ',' Ident<out name>         (. Obj y = symTab.Enter(name);
                                   x.next = y; x = y;
                                   n++; .)
  }
  ':' Type<out type>            (. adr += n * type.size;
                                   for (int a = adr; x != null; x = x.next) {
                                     a -= type.size;
                                     x.adr = a;
                                   } .)
  ';' .
The core of this specification is the EBNF production
VarDeclaration = Ident {',' Ident} ':' Type ';'.
It is augmented with attributes and semantic actions. The attributes (e.g. <out name>)
specify the parameters of the symbols. There are input attributes (e.g. <x, y>) and
output attributes (e.g. <out z> or <ref z>). A semantic action is a piece of code that
is written in the target language of Coco/R (e.g. in C#, Java or C++) and is executed
by the generated parser at its position in the production.
1.2 Sample Parsing Method
Every production is translated into a parsing method. The method for VarDeclaration,
for example, looks like this in C# (code parts originating from attributes or semantic
actions are shown in gray):
void VarDeclaration(ref int adr) {
    string name; TypeDesc type;
    Ident(out name);
    Obj x = symTab.Enter(name);
    int n = 1;
    while (la.kind == comma) {
        Get();
        Ident(out name);
        Obj y = symTab.Enter(name);
        x.next = y; x = y;
        n++;
    }
    Expect(colon);
    Type(out type);
    adr += n * type.size;
    for (int a = adr; x != null; x = x.next) {
        a -= type.size;
        x.adr = a;
    }
    Expect(semicolon);
}
Coco/R also generates a scanner that reads the input stream and returns a stream of
tokens to the parser.
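The translation scheme can also be tried out without Coco/R. The following Java class is a minimal hand-written sketch, not code generated by Coco/R: it recognizes the EBNF core VarDeclaration = Ident {',' Ident} ':' Type ';' with one method per nonterminal. All names (VarDeclSketch, Kind, Token, get, expect) are hypothetical, the scanner is a toy that expects blank-separated tokens, and Type is simplified to a single identifier. The output attribute of Ident is modelled as a return value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical miniature of a recursive descent parser; not Coco/R output.
class VarDeclSketch {
    enum Kind { IDENT, COMMA, COLON, SEMICOLON, EOF }

    static final class Token {
        final Kind kind;
        final String val;
        Token(Kind kind, String val) { this.kind = kind; this.val = val; }
    }

    private final List<Token> tokens = new ArrayList<>();
    private int pos = 0;
    private Token la;                             // lookahead token
    int errors = 0;                               // number of syntax errors found
    final List<String> names = new ArrayList<>(); // declared variable names
    String typeName;                              // name of the declared type

    // Toy scanner (assumption): tokens are separated by blanks in the input.
    VarDeclSketch(String input) {
        for (String s : input.trim().split("\\s+")) {
            if (s.isEmpty()) continue;
            switch (s) {
                case ",": tokens.add(new Token(Kind.COMMA, s)); break;
                case ":": tokens.add(new Token(Kind.COLON, s)); break;
                case ";": tokens.add(new Token(Kind.SEMICOLON, s)); break;
                default:  tokens.add(new Token(Kind.IDENT, s));
            }
        }
        tokens.add(new Token(Kind.EOF, ""));
        la = tokens.get(0);
    }

    // Consumes the lookahead token and reads the next one.
    private Token get() {
        Token t = la;
        if (pos < tokens.size() - 1) pos++;
        la = tokens.get(pos);
        return t;
    }

    // Consumes a token of the expected kind or counts a syntax error.
    private String expect(Kind kind) {
        if (la.kind == kind) return get().val;
        errors++;
        return "";
    }

    // Ident: the output attribute is modelled as the return value.
    private String ident() { return expect(Kind.IDENT); }

    // VarDeclaration = Ident {',' Ident} ':' Type ';'.
    void varDeclaration() {
        names.add(ident());
        while (la.kind == Kind.COMMA) {           // { ',' Ident }
            get();
            names.add(ident());
        }
        expect(Kind.COLON);
        typeName = ident();                       // Type simplified to an identifier
        expect(Kind.SEMICOLON);
    }
}
```

For the input "a , b , c : int ;" the method varDeclaration() collects the names a, b and c and the type name int without reporting an error. The while loop mirrors the EBNF iteration {',' Ident}, just as in the C# method above.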
1.3 Summary of Features
Scanner
- The scanner is specified by a list of token declarations. Literals (e.g. "if" or
  "while") do not have to be declared as tokens but can be used directly in the
  productions of the grammar.
- The scanner is implemented as a deterministic finite automaton (DFA). Therefore
  the terminal symbols (or tokens) have to be described by a regular EBNF grammar.
- Comments may be nested. One can specify multiple kinds of comments for a
  language.
- The scanner supports Unicode characters encoded in UTF-8.
- The scanner can be made case-sensitive or case-insensitive.
- The scanner can recognize tokens depending on their context in the input stream.
- The scanner can read from any input stream (not just from a file). However, all
  input must come from a single stream (no includes).
- The scanner can handle so-called pragmas, which are tokens that are not part of the
  syntax but can occur anywhere in the input stream (e.g. compiler directives or
  end-of-line characters).
- The user can suppress the generation of a scanner and can provide a hand-written
  scanner instead.
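To illustrate what it means for a scanner to work as a deterministic finite automaton, here is a small hand-written Java sketch. It is not the scanner that Coco/R generates. It assumes just two token classes, ident = letter {letter | digit} and number = digit {digit}, with ASCII letters and digits; the class name DfaScannerSketch and the state names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical DFA-based scanner for identifiers and numbers; not Coco/R output.
class DfaScannerSketch {
    private static final int START = 0, IN_IDENT = 1, IN_NUMBER = 2, NONE = -1;

    private static boolean isLetter(char ch) {
        return (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z');
    }

    private static boolean isDigit(char ch) {
        return ch >= '0' && ch <= '9';
    }

    // Transition function of the automaton; NONE means "no transition".
    private static int next(int state, char ch) {
        switch (state) {
            case START:     return isLetter(ch) ? IN_IDENT : isDigit(ch) ? IN_NUMBER : NONE;
            case IN_IDENT:  return (isLetter(ch) || isDigit(ch)) ? IN_IDENT : NONE;
            case IN_NUMBER: return isDigit(ch) ? IN_NUMBER : NONE;
            default:        return NONE;
        }
    }

    // Splits the input into tokens, always taking the longest possible match.
    static List<String> scan(String input) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            if (Character.isWhitespace(input.charAt(i))) { i++; continue; }
            int state = START;
            int begin = i;
            while (i < input.length()) {
                int n = next(state, input.charAt(i));
                if (n == NONE) break;             // automaton is stuck: token ends here
                state = n;
                i++;
            }
            if (state == START) {                 // no token starts with this character
                out.add("error:" + input.charAt(i));
                i++;
            } else {
                String kind = (state == IN_IDENT) ? "ident:" : "number:";
                out.add(kind + input.substring(begin, i));
            }
        }
        return out;
    }
}
```

Calling DfaScannerSketch.scan("x1 42 y") yields the list [ident:x1, number:42, ident:y]. Because every state has at most one successor per input character, the scanner never has to backtrack within a token.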
Parser
- The parser is specified by a set of EBNF productions with attributes and semantic
  actions. The productions allow for alternatives, repetition and optional parts.
- Coco/R translates the productions into an efficient recursive descent parser. The
  parser is reentrant, so multiple instances of it can be active at the same time.
- Nonterminal symbols can have any number of input and output attributes (the Java
  version allows just one output attribute, which may, however, be an object of a
  suitable composite class). Terminal symbols do not have explicit attributes, but the
  tokens returned by the scanner contain information that can be viewed as attributes.
- All attributes are evaluated during parsing (i.e. the grammar is processed as an
  L-attributed grammar).
- Semantic actions can be placed anywhere in the grammar (not just at the end of
  productions). They may contain arbitrary statements or declarations written in the
  language of the generated parser (e.g. C#, Java or C++).
- The special symbol ANY can be used to denote a set of complementary tokens.
- In principle, the grammar must be LL(1). However, Coco/R can also handle
  non-LL(1) grammars by using so-called resolvers that make a parsing decision based
  on a multi-symbol lookahead or on semantic information.
- Every production can have its own local variables. In addition to these, one can
  declare global variables or methods, which are translated into fields and methods of
  the parser. Semantic actions can also access other objects or methods from
  user-written classes or from library classes.
- Coco/R checks the grammar for completeness, consistency and non-redundancy. It
  also reports LL(1) conflicts.
- The error messages printed by the generated parser can be configured to conform to
  a user-specific format.
- The generated parser and scanner can be specified to belong to a certain namespace
  (or package).
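The idea of resolving an LL(1) conflict by a multi-symbol lookahead can be sketched independently of Coco/R. The Java class below is a hypothetical, hand-written illustration and does not use the resolver syntax or any API of Coco/R. It assumes a toy grammar Statement = ident '=' ident ';' | ident '(' ')' ';' in which both alternatives start with ident, so one lookahead token is not enough to choose between them. The method isAssignment() plays the role of a resolver by inspecting the second token.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of a two-token lookahead decision; not Coco/R output.
class ResolverSketch {
    private final List<String> tokens;
    private int pos = 0;

    ResolverSketch(String... tokens) { this.tokens = Arrays.asList(tokens); }

    // Returns the k-th token after the current position without consuming it.
    private String peek(int k) {
        return pos + k < tokens.size() ? tokens.get(pos + k) : "<eof>";
    }

    private static boolean isIdent(String t) {
        return !t.isEmpty() && Character.isLetter(t.charAt(0));
    }

    // Resolver: looks two tokens ahead to tell an assignment from a call.
    private boolean isAssignment() {
        return isIdent(peek(0)) && peek(1).equals("=");
    }

    private boolean expectIdent() {
        if (isIdent(peek(0))) { pos++; return true; }
        return false;
    }

    private boolean expect(String t) {
        if (peek(0).equals(t)) { pos++; return true; }
        return false;
    }

    // Statement = ident '=' ident ';' | ident '(' ')' ';'.
    String statement() {
        String kind;
        boolean ok;
        if (isAssignment()) {                     // conflict resolved by lookahead
            kind = "assignment";
            ok = expectIdent() && expect("=") && expectIdent() && expect(";");
        } else {
            kind = "call";
            ok = expectIdent() && expect("(") && expect(")") && expect(";");
        }
        return ok ? kind : "error";
    }
}
```

For example, new ResolverSketch("x", "=", "y", ";").statement() returns "assignment", whereas new ResolverSketch("f", "(", ")", ";").statement() returns "call".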
2. Input Language
This section specifies the compiler description language Cocol/R that is used as the
input language for Coco/R. A compiler description consists of a set of grammar rules
that describe the lexical and syntactical structure of a language as well as its
translation to a target language.
2.1 Vocabulary
The basic elements of Cocol/R are identifiers, numbers, strings and character
constants, which are defined as follows:
ident = letter {letter | digit}.
number = digit {digit}.
string = '"' {anyButQuote} '"'.
char = '\'' anyButApostrophe '\''.
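As a rough illustration, the four definitions can be approximated by regular expressions. The following Java sketch rests on several assumptions that are not part of the definitions above: letter is taken to be an ASCII letter, digit is '0'..'9', and escape sequences are ignored. The class name VocabularySketch and the method classify are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical approximation of the Cocol/R vocabulary by regular expressions.
class VocabularySketch {
    // ident = letter {letter | digit}.   (assumption: ASCII letters only)
    private static final Pattern IDENT = Pattern.compile("[A-Za-z][A-Za-z0-9]*");
    // number = digit {digit}.
    private static final Pattern NUMBER = Pattern.compile("[0-9]+");
    // string = '"' {anyButQuote} '"'.   (must stay on one line; escapes ignored)
    private static final Pattern STRING = Pattern.compile("\"[^\"\\r\\n]*\"");
    // char = '\'' anyButApostrophe '\''.   (escapes ignored)
    private static final Pattern CHAR = Pattern.compile("'[^']'");

    // Returns the kind of the lexeme or "unknown" if no definition matches.
    static String classify(String lexeme) {
        if (IDENT.matcher(lexeme).matches())  return "ident";
        if (NUMBER.matcher(lexeme).matches()) return "number";
        if (STRING.matcher(lexeme).matches()) return "string";
        if (CHAR.matcher(lexeme).matches())   return "char";
        return "unknown";
    }
}
```

With these patterns, classify("Ident1") returns "ident" and classify("1abc") returns "unknown", because an identifier must start with a letter.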
Upper case letters are distinct from lower case letters. Strings must not extend across
multiple lines. Both strings and character constants may contain the following escape
sequences: