Theory and Practice
Data
Flow
Analysis
© 2009 by Taylor & Francis Group, LLC
CRC Press is an imprint of the
Taylor & Francis Group, an informa business
Boca Raton London New York
Uday P. Khedker
Amitabha Sanyal
Bageshri Karkare
Theory and Practice
Data
Flow
Analysis
© 2009 by Taylor & Francis Group, LLC
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2009 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business
No claim to original U.S. Government works
Printed in the United States of America on acid-free paper
10 9 8 7 6 5 4 3 2 1
International Standard Book Number-13: 978-0-8493-2880-0 (Hardcover)
This book contains information obtained from authentic and highly regarded sources. Reasonable
efforts have been made to publish reliable data and information, but the author and publisher can-
not assume responsibility for the validity of all materials or the consequences of their use. The
authors and publishers have attempted to trace the copyright holders of all material reproduced
in this publication and apologize to copyright holders if permission to publish in this form has not
been obtained. If any copyright material has not been acknowledged please write and let us know so
we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying, microfilming, and recording, or in any information
storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copy-
right.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222
Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that pro-
vides licenses and registration for a variety of users. For organizations that have been granted a
photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and
are used only for identification and explanation without intent to infringe.
Library of Congress Cataloging-in-Publication Data
Khedker, Uday.
Data flow analysis : theory and practice / Uday Khedker, Amitabha Sanyal,
Bageshri Karkare.
p. cm.
Includes bibliographical references and index.
ISBN 978-0-8493-2880-0 (hardcover : alk. paper)
1. Compilers (Computer programs) 2. Data flow computing. 3. Software
engineering. 4. Computer software--Verification. I. Sanyal, Amitabha. II. Karkare,
Bageshri. III. Title.
QA76.76.C65K54 2009
004’.35--dc22 2009002056
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com
© 2009 by Taylor & Francis Group, LLC
Preface
Data flow analysis is a classical static analysis technique that has been used to dis-
cover useful properties of programs being analyzed. It has found many useful ap-
plications ranging from compiler optimizations to software engineering to software
verification. Modern compilers use this technique to produce code that maximize
performance. In software engineering, it is used to re-engineer or reverse engineer
programs. Finally, data flow analysis based techniques are used in software verifica-
tion to prove the soundness of programs with respect to properties of interest.
This book provides a detailed treatment of data flow analysis. Although we explain
it in the context of compiler optimizations, the concepts are general enough to be
used for other applications. This is possible because we use a general model of data
flow equations to represent the specification of data flow analysis. These data flow
equations are defined in terms of constant and dependent
Gen
and
Kill
components.
For classical bit vector frameworks, the constant
Gen
and
Kill
suffice; dependent
parts are required for frameworks like constant propagation, points-to analysis etc.
Such a modeling explicates the inter-dependence of data flow values and leads to
an orthogonal generality that models flow functions in terms of a rather small set
of constituent functions called entity functions. On the one hand, modeling flow
functions in terms of entity functions allows us to define information flow paths that
explain empirical observations for a large class of data flow frameworks and facilitate
tight complexity bounds on solution procedures for data flow equations. On the
other hand, this modeling also allows reasoning about the feasibility of constructing
summary flow functions.
The book is organized in three parts: The first part deals with the specification of
data flow frameworks and the solution process at the intraprocedural level. This part
presents the lattice theoretic modeling of data flow frameworks apart from the gen-
eralizations of constant and dependent parts in flow functions and entity functions
as constituents of flow functions. It shows how these generalizations lead to tight
complexity bounds. This part also presents a large number of data flow frameworks.
The diversity of these analyses is an evidence of the wide applicability of the gener-
alizations presented. The final chapter of the first part presents SSA representation of
programs. This is interesting because it builds an additional layer of abstraction over
the control flow graph representation of programs and directly relates the definition
points and the use points of data. This increases the efficiency with which a class of
optimizations can be performed.
The second part of the book presents interprocedural data flow analysis. As a
matter of choice, we avoid methods that are specific to a particular application or
v
© 2009 by Taylor & Francis Group, LLC