C Traps and Pitfalls*
Andrew Koenig
AT&T Bell Laboratories
Murray Hill, New Jersey 07974
ABSTRACT
The C language is like a carving knife: simple, sharp, and extremely useful in
skilled hands. Like any sharp tool, C can injure people who don’t know how to handle it.
This paper shows some of the ways C can injure the unwary, and how to avoid injury.
0. Introduction
The C language and its typical implementations are designed to be used easily by experts. The language
is terse and expressive. There are few restrictions to keep the user from blundering. A user who has
blundered is often rewarded by an effect that is not obviously related to the cause.
In this paper, we will look at some of these unexpected rewards. Because they are unexpected, it
may well be impossible to classify them completely. Nevertheless, we have made a rough effort to do so
by looking at what has to happen in order to run a C program. We assume the reader has at least a passing
acquaintance with the C language.
Section 1 looks at problems that occur while the program is being broken into tokens. Section 2 follows
the program as the compiler groups its tokens into declarations, expressions, and statements. Section
3 recognizes that a C program is often made out of several parts that are compiled separately and bound
together. Section 4 deals with misconceptions of meaning: things that happen while the program is actually
running. Section 5 examines the relationship between our programs and the library routines they use. In
Section 6 we note that the program we write is not really the program we run; the preprocessor has gotten at
it first. Finally, Section 7 discusses portability problems: reasons a program might run on one implementation
and not another.
1. Lexical Pitfalls
The first part of a compiler is usually called a lexical analyzer. It looks at the sequence of characters
that make up the program and breaks them into tokens. A token is a sequence of one or more characters
that have a (relatively) uniform meaning in the language being compiled. In C, for instance, the
token -> has a meaning that is quite distinct from that of either of the characters that make it up, and that is
independent of the context in which the -> appears.
For another example, consider the statement:
if (x > big) big = x;
Each non-blank character in this statement is a separate token, except for the keyword if and the two
instances of the identifier big.
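The job of a lexical analyzer can be sketched as a toy lexer. This is a minimal sketch under stated assumptions, not how any real C compiler is written: the functions lex_token and count_tokens are hypothetical, and the lexer recognizes only identifiers and keywords, integer constants, single-character punctuators, and a small assumed subset of C's two-character operators such as ->. It ignores comments, string and character literals, and three-character operators.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Two-character operators this toy lexer knows: an assumed, incomplete subset of C's. */
static const char *const two_char_ops[] = {
    "->", "++", "--", "==", "!=", "<=", ">=", "&&", "||"
};

/*
 * Hypothetical helper: skip white space, then scan one token starting at p.
 * Copies the token text (truncated if it does not fit) into buf and returns
 * a pointer just past the token, or NULL when the input is exhausted.
 * Not handled: comments, string/char literals, three-character operators.
 */
const char *lex_token(const char *p, char *buf, size_t bufsize)
{
    size_t n, copy, i;

    assert(p != NULL && buf != NULL && bufsize > 0);
    while (isspace((unsigned char)*p))
        p++;
    if (*p == '\0')
        return NULL;

    if (isalpha((unsigned char)*p) || *p == '_') {
        /* Identifier or keyword: letter or underscore, then letters, digits, underscores. */
        n = 1;
        while (isalnum((unsigned char)p[n]) || p[n] == '_')
            n++;
    } else if (isdigit((unsigned char)*p)) {
        /* Integer constant (crude: also absorbs letters, so suffixes like 10UL stay together). */
        n = 1;
        while (isalnum((unsigned char)p[n]))
            n++;
    } else {
        /* Operator or punctuator: take a known two-character operator such as "->" if one matches. */
        n = 1;
        for (i = 0; i < sizeof two_char_ops / sizeof two_char_ops[0]; i++) {
            if (strncmp(p, two_char_ops[i], 2) == 0) {
                n = 2;
                break;
            }
        }
    }

    copy = n < bufsize ? n : bufsize - 1;
    memcpy(buf, p, copy);
    buf[copy] = '\0';
    return p + n;
}

/* Hypothetical helper: count the tokens in src by calling lex_token until input runs out. */
int count_tokens(const char *src)
{
    char buf[64];
    int count = 0;

    while ((src = lex_token(src, buf, sizeof buf)) != NULL)
        count++;
    return count;
}
```

With this sketch, count_tokens("if (x > big) big = x;") returns 10 (if, (, x, >, big, ), big, =, x, ;), consistent with the description of that statement above. Likewise "p->x" yields three tokens, with -> as one of them, while "p - > x" yields four, because the separated - and > are no longer a single token.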
In fact, C programs are broken into tokens twice. First the preprocessor reads the program. It must
tokenize the program so that it can find the identifiers, some of which may represent macros. It must then
replace each macro invocation by the result of evaluating that macro. Finally, the result of the macro
replacement is reassembled into a character stream which is given to the compiler proper. The compiler
then breaks the stream into tokens a second time.
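Because the preprocessor works on tokens rather than on raw characters, a macro name is replaced only where it appears as a complete identifier token. The fragment below is a small illustration; the macro LIMIT and the variable names around it are hypothetical. The letters LIMIT also occur inside the identifier LIMIT_MAX and inside a string literal, but neither occurrence is the identifier token LIMIT, so neither is replaced.

```c
#include <stdio.h>

/* Hypothetical macro, for illustration only. */
#define LIMIT 100

int limit_value = LIMIT;            /* identifier token LIMIT: replaced, so the compiler sees 100 */
int LIMIT_MAX = 5;                  /* a different identifier token: not replaced */
const char limit_text[] = "LIMIT";  /* one string-literal token: its contents are left alone */

/* With the definitions above, this prints "100 5 LIMIT". */
void show_expansion(void)
{
    printf("%d %d %s\n", limit_value, LIMIT_MAX, limit_text);
}
```

After preprocessing, the compiler proper sees int limit_value = 100; while the other two declarations reach it unchanged.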
__________________
* This paper, greatly expanded, is the basis for the book C Traps and Pitfalls (Addison-Wesley, 1989, ISBN
0-201-17928-8); interested readers may wish to refer there as well.