Andreas C. Müller & Sarah Guido
Introduction to Machine Learning with Python
A Guide for Data Scientists
Beijing, Boston, Farnham, Sebastopol, Tokyo
978-1-449-36941-5
Introduction to Machine Learning with Python
by Andreas C. Müller and Sarah Guido
Copyright © 2017 Sarah Guido, Andreas Müller. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are
also available for most titles (http://safaribooksonline.com). For more information, contact our corporate/
institutional sales department: 800-998-9938 or corporate@oreilly.com.
Editor: Dawn Schanafelt
Production Editor: Kristen Brown
Copyeditor: Rachel Head
Proofreader: Jasmine Kwityn
Indexer: Judy McConville
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Rebecca Demarest
October 2016: First Edition
Revision History for the First Edition
2016-09-22: First Release
See http://oreilly.com/catalog/errata.csp?isbn=9781449369415 for release details.
The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Introduction to Machine Learning with
Python, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.
While the publisher and the authors have used good faith efforts to ensure that the information and
instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility
for errors or omissions, including without limitation responsibility for damages resulting from the use of
or reliance on this work. Use of the information and instructions contained in this work is at your own
risk. If any code samples or other technology this work contains or describes is subject to open source
licenses or the intellectual property rights of others, it is your responsibility to ensure that your use
thereof complies with such licenses and/or rights.
Table of Contents
Preface
1. Introduction
    Why Machine Learning?
        Problems Machine Learning Can Solve
        Knowing Your Task and Knowing Your Data
    Why Python?
    scikit-learn
        Installing scikit-learn
    Essential Libraries and Tools
        Jupyter Notebook
        NumPy
        SciPy
        matplotlib
        pandas
        mglearn
    Python 2 Versus Python 3
    Versions Used in this Book
    A First Application: Classifying Iris Species
        Meet the Data
        Measuring Success: Training and Testing Data
        First Things First: Look at Your Data
        Building Your First Model: k-Nearest Neighbors
        Making Predictions
        Evaluating the Model
    Summary and Outlook