Machine Learning for Hackers
by Drew Conway and John Myles White
Copyright © 2012 Drew Conway and John Myles White. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions
are also available for most titles (http://my.safaribooksonline.com). For more information, contact our
corporate/institutional sales department: (800) 998-9938 or corporate@oreilly.com.
Editor: Julie Steele
Production Editor: Melanie Yarbrough
Copyeditor: Genevieve d’Entremont
Proofreader: Teresa Horton
Indexer: Angela Howard
Cover Designer: Karen Montgomery
Interior Designer: David Futato
Illustrator: Robert Romano
February 2012: First Edition.
Revision History for the First Edition:
2012-02-06 First release
See http://oreilly.com/catalog/errata.csp?isbn=9781449303716 for release details.
Nutshell Handbook, the Nutshell Handbook logo, and the O’Reilly logo are registered trademarks of
O’Reilly Media, Inc. Machine Learning for Hackers, the cover image of a griffon vulture, and related
trade dress are trademarks of O’Reilly Media, Inc.
Many of the designations used by manufacturers and sellers to distinguish their products are claimed as
trademarks. Where those designations appear in this book, and O’Reilly Media, Inc., was aware of a
trademark claim, the designations have been printed in caps or initial caps.
While every precaution has been taken in the preparation of this book, the publisher and authors assume
no responsibility for errors or omissions, or for damages resulting from the use of the information
contained herein.
ISBN: 978-1-449-30371-6
[LSI]
1328629742
Table of Contents
Preface vii
1. Using R 1
R for Machine Learning 2
Downloading and Installing R 5
IDEs and Text Editors 8
Loading and Installing R Packages 9
R Basics for Machine Learning 12
Further Reading on R 27
2. Data Exploration 29
Exploration versus Confirmation 29
What Is Data? 30
Inferring the Types of Columns in Your Data 34
Inferring Meaning 36
Numeric Summaries 37
Means, Medians, and Modes 37
Quantiles 40
Standard Deviations and Variances 41
Exploratory Data Visualization 44
Visualizing the Relationships Between Columns 61
3. Classification: Spam Filtering 73
This or That: Binary Classification 73
Moving Gently into Conditional Probability 77
Writing Our First Bayesian Spam Classifier 78
Defining the Classifier and Testing It with Hard Ham 85
Testing the Classifier Against All Email Types 88
Improving the Results 90