Data Mining
Practical Machine Learning Tools and Techniques
The Morgan Kaufmann Series in Data Management Systems
Series Editor: Jim Gray, Microsoft Research
Data Mining: Practical Machine Learning
Tools and Techniques, Second Edition
Ian H. Witten and Eibe Frank
Fuzzy Modeling and Genetic Algorithms for
Data Mining and Exploration
Earl Cox
Data Modeling Essentials, Third Edition
Graeme C. Simsion and Graham C. Witt
Location-Based Services
Jochen Schiller and Agnès Voisard
Database Modeling with Microsoft® Visio for
Enterprise Architects
Terry Halpin, Ken Evans, Patrick Hallock,
and Bill Maclean
Designing Data-Intensive Web Applications
Stefano Ceri, Piero Fraternali, Aldo Bongio,
Marco Brambilla, Sara Comai, and
Maristella Matera
Mining the Web: Discovering Knowledge
from Hypertext Data
Soumen Chakrabarti
Advanced SQL: 1999—Understanding
Object-Relational and Other Advanced
Features
Jim Melton
Database Tuning: Principles, Experiments,
and Troubleshooting Techniques
Dennis Shasha and Philippe Bonnet
SQL: 1999—Understanding Relational
Language Components
Jim Melton and Alan R. Simon
Information Visualization in Data Mining
and Knowledge Discovery
Edited by Usama Fayyad, Georges G.
Grinstein, and Andreas Wierse
Transactional Information Systems: Theory,
Algorithms, and the Practice of Concurrency
Control and Recovery
Gerhard Weikum and Gottfried Vossen
Spatial Databases: With Application to GIS
Philippe Rigaux, Michel Scholl, and Agnès
Voisard
Information Modeling and Relational
Databases: From Conceptual Analysis to
Logical Design
Terry Halpin
Component Database Systems
Edited by Klaus R. Dittrich and Andreas
Geppert
Managing Reference Data in Enterprise
Databases: Binding Corporate Data to the
Wider World
Malcolm Chisholm
Data Mining: Concepts and Techniques
Jiawei Han and Micheline Kamber
Understanding SQL and Java Together: A
Guide to SQLJ, JDBC, and Related
Technologies
Jim Melton and Andrew Eisenberg
Database: Principles, Programming, and
Performance, Second Edition
Patrick O’Neil and Elizabeth O’Neil
The Object Data Standard: ODMG 3.0
Edited by R. G. G. Cattell, Douglas K.
Barry, Mark Berler, Jeff Eastman, David
Jordan, Craig Russell, Olaf Schadow,
Torsten Stanienda, and Fernando Velez
Data on the Web: From Relations to
Semistructured Data and XML
Serge Abiteboul, Peter Buneman, and Dan
Suciu
Data Mining: Practical Machine Learning
Tools and Techniques with Java
Implementations
Ian H. Witten and Eibe Frank
Joe Celko’s SQL for Smarties: Advanced SQL
Programming, Second Edition
Joe Celko
Joe Celko’s Data and Databases: Concepts in
Practice
Joe Celko
Developing Time-Oriented Database
Applications in SQL
Richard T. Snodgrass
Web Farming for the Data Warehouse
Richard D. Hackathorn
Database Modeling & Design, Third Edition
Toby J. Teorey
Management of Heterogeneous and
Autonomous Database Systems
Edited by Ahmed Elmagarmid, Marek
Rusinkiewicz, and Amit Sheth
Object-Relational DBMSs: Tracking the Next
Great Wave, Second Edition
Michael Stonebraker and Paul Brown, with
Dorothy Moore
A Complete Guide to DB2 Universal
Database
Don Chamberlin
Universal Database Management: A Guide
to Object/Relational Technology
Cynthia Maro Saracco
Readings in Database Systems, Third Edition
Edited by Michael Stonebraker and Joseph
M. Hellerstein
Understanding SQL’s Stored Procedures: A
Complete Guide to SQL/PSM
Jim Melton
Principles of Multimedia Database Systems
V. S. Subrahmanian
Principles of Database Query Processing for
Advanced Applications
Clement T. Yu and Weiyi Meng
Advanced Database Systems
Carlo Zaniolo, Stefano Ceri, Christos
Faloutsos, Richard T. Snodgrass, V. S.
Subrahmanian, and Roberto Zicari
Principles of Transaction Processing for the
Systems Professional
Philip A. Bernstein and Eric Newcomer
Using the New DB2: IBM’s Object-Relational
Database System
Don Chamberlin
Distributed Algorithms
Nancy A. Lynch
Active Database Systems: Triggers and Rules
For Advanced Database Processing
Edited by Jennifer Widom and Stefano Ceri
Migrating Legacy Systems: Gateways,
Interfaces & the Incremental Approach
Michael L. Brodie and Michael Stonebraker
Atomic Transactions
Nancy Lynch, Michael Merritt, William
Weihl, and Alan Fekete
Query Processing For Advanced Database
Systems
Edited by Johann Christoph Freytag, David
Maier, and Gottfried Vossen
Transaction Processing: Concepts and
Techniques
Jim Gray and Andreas Reuter
Building an Object-Oriented Database
System: The Story of O2
Edited by François Bancilhon, Claude
Delobel, and Paris Kanellakis
Database Transaction Models For Advanced
Applications
Edited by Ahmed K. Elmagarmid
A Guide to Developing Client/Server SQL
Applications
Setrag Khoshafian, Arvola Chan, Anna
Wong, and Harry K. T. Wong
The Benchmark Handbook For Database
and Transaction Processing Systems, Second
Edition
Edited by Jim Gray
Camelot and Avalon: A Distributed
Transaction Facility
Edited by Jeffrey L. Eppinger, Lily B.
Mummert, and Alfred Z. Spector
Readings in Object-Oriented Database
Systems
Edited by Stanley B. Zdonik and David
Maier
Data Mining
Practical Machine Learning Tools and Techniques,
Second Edition
Ian H. Witten
Department of Computer Science
University of Waikato
Eibe Frank
Department of Computer Science
University of Waikato
AMSTERDAM • BOSTON • HEIDELBERG • LONDON
NEW YORK • OXFORD • PARIS • SAN DIEGO
SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO
MORGAN KAUFMANN PUBLISHERS IS AN IMPRINT OF ELSEVIER
Publisher: Diane Cerra
Publishing Services Manager: Simon Crump
Project Manager: Brandy Lilly
Editorial Assistant: Asma Stephan
Cover Design: Yvo Riezebos Design
Cover Image: Getty Images
Composition: SNP Best-set Typesetter Ltd., Hong Kong
Technical Illustration: Dartmouth Publishing, Inc.
Copyeditor: Graphic World Inc.
Proofreader: Graphic World Inc.
Indexer: Graphic World Inc.
Interior printer: The Maple-Vail Book Manufacturing Group
Cover printer: Phoenix Color Corp
Morgan Kaufmann Publishers is an imprint of Elsevier.
500 Sansome Street, Suite 400, San Francisco, CA 94111
This book is printed on acid-free paper.
© 2005 by Elsevier Inc. All rights reserved.
Designations used by companies to distinguish their products are often claimed as trademarks
or registered trademarks. In all instances in which Morgan Kaufmann Publishers is aware of a
claim, the product names appear in initial capital or all capital letters. Readers, however, should
contact the appropriate companies for more complete information regarding trademarks and
registration.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in
any form or by any means—electronic, mechanical, photocopying, scanning, or otherwise—
without prior written permission of the publisher.
Permissions may be sought directly from Elsevier’s Science & Technology Rights Department in
Oxford, UK: phone: (+44) 1865 843830, fax: (+44) 1865 853333, e-mail:
permissions@elsevier.com.uk. You may also complete your request on-line via the Elsevier
homepage (http://elsevier.com) by selecting “Customer Support” and then “Obtaining
Permissions.”
Library of Congress Cataloging-in-Publication Data
Witten, I. H. (Ian H.)
Data mining : practical machine learning tools and techniques / Ian H. Witten, Eibe
Frank. – 2nd ed.
p. cm. – (Morgan Kaufmann series in data management systems)
Includes bibliographical references and index.
ISBN: 0-12-088407-0
1. Data mining. I. Frank, Eibe. II. Title. III. Series.
QA76.9.D343W58 2005
006.3–dc22 2005043385
For information on all Morgan Kaufmann publications,
visit our Web site at www.mkp.com or www.books.elsevier.com
Printed in the United States of America
05 06 07 08 09   5 4 3 2 1
Working together to grow
libraries in developing countries
www.elsevier.com | www.bookaid.org | www.sabre.org