Python Machine Learning

Second Edition

Machine Learning and Deep Learning with Python, scikit-learn and TensorFlow

Sebastian Raschka

Vahid Mirjalili

BIRMINGHAM - MUMBAI

Contents

Preface xi

Chapter 1: Giving Computers the Ability to Learn from Data 1

Building intelligent machines to transform data into knowledge 2

The three different types of machine learning 2

Making predictions about the future with supervised learning 3

Classification for predicting class labels 3

Regression for predicting continuous outcomes 5

Solving interactive problems with reinforcement learning 6

Discovering hidden structures with unsupervised learning 7

Finding subgroups with clustering 7

Dimensionality reduction for data compression 8

Introduction to the basic terminology and notations 8

A roadmap for building machine learning systems 11

Preprocessing – getting data into shape 12

Training and selecting a predictive model 12

Evaluating models and predicting unseen data instances 13

Using Python for machine learning 13

Installing Python and packages from the Python Package Index 14

Using the Anaconda Python distribution and package manager 14

Packages for scientific computing, data science, and machine learning 15

Summary 15

Chapter 2: Training Simple Machine Learning Algorithms for Classification 17

Artificial neurons – a brief glimpse into the early history of machine learning 18

The formal definition of an artificial neuron 19

The perceptron learning rule 21

Implementing a perceptron learning algorithm in Python 24

An object-oriented perceptron API 24

Training a perceptron model on the Iris dataset 28

Adaptive linear neurons and the convergence of learning 34

Minimizing cost functions with gradient descent 35

Implementing Adaline in Python 38

Improving gradient descent through feature scaling 42

Large-scale machine learning and stochastic gradient descent 44

Summary 50

Chapter 3: A Tour of Machine Learning Classifiers Using scikit-learn 51

Choosing a classification algorithm 52

First steps with scikit-learn – training a perceptron 52

Modeling class probabilities via logistic regression 59

Logistic regression intuition and conditional probabilities 59

Learning the weights of the logistic cost function 63

Converting an Adaline implementation into an algorithm for logistic regression 66

Training a logistic regression model with scikit-learn 71

Tackling overfitting via regularization 73

Maximum margin classification with support vector machines 76

Maximum margin intuition 77

Dealing with a nonlinearly separable case using slack variables 79

Alternative implementations in scikit-learn 81

Solving nonlinear problems using a kernel SVM 82

Kernel methods for linearly inseparable data 82

Using the kernel trick to find separating hyperplanes in high-dimensional space 84

Decision tree learning 88

Maximizing information gain – getting the most bang for your buck 90

Building a decision tree 95

Combining multiple decision trees via random forests 98

K-nearest neighbors – a lazy learning algorithm 101

Summary 105

Chapter 4: Building Good Training Sets – Data Preprocessing 107

Dealing with missing data 107

Identifying missing values in tabular data 108

Eliminating samples or features with missing values 109

Imputing missing values 110

Understanding the scikit-learn estimator API 111

Handling categorical data 112

Nominal and ordinal features 113

Creating an example dataset 113

Mapping ordinal features 113

Encoding class labels 114

Performing one-hot encoding on nominal features 116

Partitioning a dataset into separate training and test sets 118

Bringing features onto the same scale 120

Selecting meaningful features 123

L1 and L2 regularization as penalties against model complexity 124

A geometric interpretation of L2 regularization 124

Sparse solutions with L1 regularization 126

Sequential feature selection algorithms 130

Assessing feature importance with random forests 136

Summary 139

Chapter 5: Compressing Data via Dimensionality Reduction 141

Unsupervised dimensionality reduction via principal component analysis 142

The main steps behind principal component analysis 142

Extracting the principal components step by step 144

Total and explained variance 147

Feature transformation 148

Principal component analysis in scikit-learn 151

Supervised data compression via linear discriminant analysis 155

Principal component analysis versus linear discriminant analysis 155

The inner workings of linear discriminant analysis 156

Computing the scatter matrices 157

Selecting linear discriminants for the new feature subspace 160

Projecting samples onto the new feature space 162

LDA via scikit-learn 163

Using kernel principal component analysis for nonlinear mappings 165

Kernel functions and the kernel trick 166

Implementing a kernel principal component analysis in Python 172

Example 1 – separating half-moon shapes 173

Example 2 – separating concentric circles 176

Projecting new data points 179

Kernel principal component analysis in scikit-learn 183

Summary 184