Python Machine Learning

Second Edition

Machine Learning and Deep Learning with Python, scikit-learn and TensorFlow

Sebastian Raschka

Vahid Mirjalili

BIRMINGHAM - MUMBAI

Contents

Preface xi

Chapter 1: Giving Computers the Ability to Learn from Data 1

Building intelligent machines to transform data into knowledge 2

The three different types of machine learning 2

Making predictions about the future with supervised learning 3

Classification for predicting class labels 3

Regression for predicting continuous outcomes 5

Solving interactive problems with reinforcement learning 6

Discovering hidden structures with unsupervised learning 7

Finding subgroups with clustering 7

Dimensionality reduction for data compression 8

Introduction to the basic terminology and notations 8

A roadmap for building machine learning systems 11

Preprocessing – getting data into shape 12

Training and selecting a predictive model 12

Evaluating models and predicting unseen data instances 13

Using Python for machine learning 13

Installing Python and packages from the Python Package Index 14

Using the Anaconda Python distribution and package manager 14

Packages for scientific computing, data science, and machine learning 15

Summary 15

Chapter 2: Training Simple Machine Learning Algorithms for Classification 17

Artificial neurons – a brief glimpse into the early history of machine learning 18

The formal definition of an artificial neuron 19

The perceptron learning rule 21

Implementing a perceptron learning algorithm in Python 24

An object-oriented perceptron API 24

Training a perceptron model on the Iris dataset 28

Adaptive linear neurons and the convergence of learning 34

Minimizing cost functions with gradient descent 35

Implementing Adaline in Python 38

Improving gradient descent through feature scaling 42

Large-scale machine learning and stochastic gradient descent 44

Summary 50

Chapter 3: A Tour of Machine Learning Classifiers Using scikit-learn 51

Choosing a classification algorithm 52

First steps with scikit-learn – training a perceptron 52

Modeling class probabilities via logistic regression 59

Logistic regression intuition and conditional probabilities 59

Learning the weights of the logistic cost function 63

Converting an Adaline implementation into an algorithm for logistic regression 66

Training a logistic regression model with scikit-learn 71

Tackling overfitting via regularization 73

Maximum margin classification with support vector machines 76

Maximum margin intuition 77

Dealing with a nonlinearly separable case using slack variables 79

Alternative implementations in scikit-learn 81

Solving nonlinear problems using a kernel SVM 82

Kernel methods for linearly inseparable data 82

Using the kernel trick to find separating hyperplanes in high-dimensional space 84

Decision tree learning 88

Maximizing information gain – getting the most bang for your buck 90

Building a decision tree 95

Combining multiple decision trees via random forests 98

K-nearest neighbors – a lazy learning algorithm 101

Summary 105

Chapter 4: Building Good Training Sets – Data Preprocessing 107

Dealing with missing data 107

Identifying missing values in tabular data 108

Eliminating samples or features with missing values 109

Imputing missing values 110

Understanding the scikit-learn estimator API 111

Handling categorical data 112

Nominal and ordinal features 113

Creating an example dataset 113

Mapping ordinal features 113

Encoding class labels 114

Performing one-hot encoding on nominal features 116

Partitioning a dataset into separate training and test sets 118

Bringing features onto the same scale 120

Selecting meaningful features 123

L1 and L2 regularization as penalties against model complexity 124

A geometric interpretation of L2 regularization 124

Sparse solutions with L1 regularization 126

Sequential feature selection algorithms 130

Assessing feature importance with random forests 136

Summary 139

Chapter 5: Compressing Data via Dimensionality Reduction 141

Unsupervised dimensionality reduction via principal component analysis 142

The main steps behind principal component analysis 142

Extracting the principal components step by step 144

Total and explained variance 147

Feature transformation 148

Principal component analysis in scikit-learn 151

Supervised data compression via linear discriminant analysis 155

Principal component analysis versus linear discriminant analysis 155

The inner workings of linear discriminant analysis 156

Computing the scatter matrices 157

Selecting linear discriminants for the new feature subspace 160

Projecting samples onto the new feature space 162

LDA via scikit-learn 163

Using kernel principal component analysis for nonlinear mappings 165

Kernel functions and the kernel trick 166

Implementing a kernel principal component analysis in Python 172

Example 1 – separating half-moon shapes 173

Example 2 – separating concentric circles 176

Projecting new data points 179

Kernel principal component analysis in scikit-learn 183

Summary 184