Machine Learning Essentials: Practical Guide in R Book preview

Required points/C-coins: 50  2018-03-19 20:28:19  323KB  PDF

Discovering knowledge from big multivariate data, recorded every day, requires specialized machine learning techniques. This book presents an easy-to-use practical guide in R to compute the most popular machine learning methods for exploring data sets, as well as for building predictive models.
Copyright ©2017 by Alboukadel Kassambara. All rights reserved.

Published by STHDA (http://www.sthda.com), Alboukadel Kassambara.
Contact: Alboukadel Kassambara <alboukadelkassambara@gmail.com>

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, without the prior written permission of the Publisher. Requests to the Publisher for permission should be addressed to STHDA (http://www.sthda.com).

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials.

Neither the publisher nor the authors, contributors, or editors assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

For general information contact Alboukadel Kassambara <alboukadelkassambara@gmail.com>.

Contents

0.1 What you will learn
0.2 Key features of this book
0.3 Book website
About the author

I Basics

1 Introduction to R
  1.1 Install R and RStudio
  1.2 Install and load required R packages
  1.3 Data format
  1.4 Import your data in R
  1.5 Demo data sets
  1.6 Data manipulation
  1.7 Data visualization
  1.8 Close your R/RStudio session

II Regression Analysis

2 Introduction
  2.1 Examples of data set

3 Linear Regression
  3.1 Introduction
  3.2 Formula
  3.3 Loading required R packages
  3.4 Preparing the data
  3.5 Computing linear regression
  3.6 Interpretation
  3.7 Making predictions
  3.8 Discussion

4 Interaction Effects in Multiple Regression
  4.1 Introduction
  4.2 Equation
  4.3 Loading required R packages
  4.4 Preparing the data
  4.5 Computation
  4.6 Interpretation
  4.7 Comparing the additive and the interaction models
  4.8 Discussion

5 Regression with Categorical Variables
  5.1 Introduction
  5.2 Loading required R packages
  5.3 Example of data set
  5.4 Categorical variables with two levels
  5.5 Categorical variables with more than two levels
  5.6 Discussion

6 Nonlinear Regression
  6.1 Introduction
  6.2 Loading required R packages
  6.3 Preparing the data
  6.4 Linear regression
  6.5 Polynomial regression
  6.6 Log transformation
  6.7 Spline
  6.8 Generalized additive models
  6.9 Comparing the models
  6.10 Discussion

III Regression Diagnostics

7 Introduction

8 Regression Assumptions and Diagnostics
  8.1 Introduction
  8.2 Loading required R packages
  8.3 Example of data
  8.4 Building a regression model
  8.5 Fitted values and residuals
  8.6 Regression assumptions
  8.7 Regression diagnostics
  8.8 Linearity of the data
  8.9 Homogeneity of variance
  8.10 Normality of residuals
  8.11 Outliers and high leverage points
  8.12 Influential values
  8.13 Discussion

9 Multicollinearity
  9.1 Introduction
  9.2 Loading required R packages
  9.3 Preparing the data
  9.4 Building a regression model
  9.5 Detecting multicollinearity
  9.6 Dealing with multicollinearity
  9.7 Discussion

10 Confounding Variables

IV Regression Model Validation

11 Introduction

12 Regression Model Accuracy Metrics
  12.1 Introduction
  12.2 Model performance metrics
  12.3 Loading required R packages
  12.4 Example of data
  12.5 Building regression models
  12.6 Assessing model quality
  12.7 Comparing regression models performance
  12.8 Discussion

13 Cross-validation
  13.1 Introduction
  13.2 Loading required R packages
  13.3 Example of data
  13.4 Model performance metrics
  13.5 Cross-validation methods
  13.6 Discussion

14 Bootstrap Resampling
  14.1 Introduction
  14.2 Loading required R packages
  14.3 Example of data
  14.4 Bootstrap procedure
  14.5 Evaluating a predictive model performance
  14.6 Quantifying an estimator uncertainty and confidence intervals
  14.7 Discussion

V Model Selection

15 Introduction

16 Best Subsets Regression
  16.1 Introduction
  16.2 Loading required R packages
  16.3 Example of data
  16.4 Computing best subsets regression
  16.5 Choosing the optimal model
  16.6 Discussion

17 Stepwise Regression
  17.1 Introduction
  17.2 Loading required R packages
  17.3 Computing stepwise regression
  17.4 Discussion

18 Penalized Regression: Ridge, Lasso and Elastic Net
  18.1 Introduction
  18.2 Shrinkage methods
  18.3 Loading required R packages
  18.4 Preparing the data
  18.5 Computing penalized linear regression
  18.6 Discussion

19 Principal Component and Partial Least Squares Regression
  19.1 Introduction
  19.2 Principal component regression
  19.3 Partial least squares regression
  19.4 Loading required R packages
  19.5 Preparing the data
  19.6 Computation
  19.7 Discussion

VI Classification

20 Introduction
  20.1 Examples of data set

21 Logistic Regression
  21.1 Introduction
  21.2 Logistic function
  21.3 Loading required R packages
  21.4 Preparing the data
  21.5 Computing logistic regression
  21.6 Interpretation
  21.7 Making predictions
  21.8 Assessing model accuracy
  21.9 Discussion

22 Stepwise Logistic Regression
  22.1 Loading required R packages
  22.2 Preparing the data
  22.3 Computing stepwise logistic regression
  22.4 Discussion

23 Penalized Logistic Regression
  23.1 Introduction
  23.2 Loading required R packages
  23.3 Preparing the data
  23.4 Computing penalized logistic regression
  23.5 Discussion

24 Logistic Regression Assumptions and Diagnostics
  24.1 Introduction
  24.2 Logistic regression assumptions
  24.3 Loading required R packages
  24.4 Building a logistic regression model
  24.5 Logistic regression diagnostics
  24.6 Discussion

25 Multinomial Logistic Regression
  25.1 Introduction
  25.2 Loading required R packages
  25.3 Preparing the data
  25.4 Computing multinomial logistic regression
  25.5 Discussion

26 Discriminant Analysis
  26.1 Introduction
  26.2 Loading required R packages
  26.3 Preparing the data
  26.4 Linear discriminant analysis - LDA
  26.5 Quadratic discriminant analysis - QDA
  26.6 Mixture discriminant analysis - MDA
  26.7 Flexible discriminant analysis - FDA
  26.8 Regularized discriminant analysis
  26.9 Discussion

27 Naive Bayes Classifier
  27.1 Introduction
  27.2 Loading required R packages
  27.3 Preparing the data
  27.4 Computing Naive Bayes
  27.5 Using caret R package
  27.6 Discussion

28 Support Vector Machine
  28.1 Introduction
  28.2 Loading required R packages
  28.3 Example of data set
  28.4 SVM linear classifier
  28.5 SVM classifier using non-linear kernel
  28.6 Discussion

29 Classification Model Evaluation
  29.1 Introduction
  29.2 Loading required R packages
  29.3 Building a classification model
  29.4 Overall classification accuracy
  29.5 Confusion matrix
  29.6 Precision, Recall and Specificity
  29.7 ROC curve
  29.8 Multiclass settings
  29.9 Discussion

VII Statistical Machine Learning

30 Introduction

31 KNN - k-Nearest Neighbors
  31.1 Introduction
  31.2 KNN algorithm
  31.3 Loading required R packages
  31.4 Classification
  31.5 KNN for regression
  31.6 Discussion

32 Decision Tree Models
  32.1 Introduction
  32.2 Loading required R packages
  32.3 Decision tree algorithm
  32.4 Classification trees
  32.5 Regression trees
  32.6 Conditional inference tree
  32.7 Discussion

33 Bagging and Random Forest
  33.1 Introduction
  33.2 Loading required R packages
  33.3 Classification
  33.4 Regression
  33.5 Hyperparameters
  33.6 Discussion

34 Boosting
  34.1 Loading required R packages
  34.2 Classification
  34.3 Regression
  34.4 Discussion

VIII Unsupervised Learning

35 Unsupervised Learning
  35.1 Introduction
  35.2 Principal component methods
  35.3 Loading required R packages
  35.4 Clustering
  35.5 Discussion

Preface

0.1 What you will learn

Large amounts of data are recorded every day in different fields, including marketing, bio-medical and security. To discover knowledge from these data, you need machine learning techniques, which are classified into two categories:

1. Unsupervised machine learning methods

These include mainly clustering and principal component analysis methods.
The goal of clustering is to identify patterns or groups of similar objects within a data set of interest. Principal component methods consist of summarizing and visualizing the most important information contained in a multivariate data set.

These methods are "unsupervised" because we are not guided by a priori ideas of which variables or samples belong in which clusters or groups. The machine algorithm "learns" how to cluster or summarize the data.

2. Supervised machine learning methods

Supervised learning consists of building mathematical models for predicting the outcome of future observations. Predictive models can be classified into two main groups:

- Regression analysis for predicting a continuous variable. For example, you might want to predict life expectancy based on socio-economic indicators.
- Classification for predicting the class (or group) of individuals. For example, you might want to predict the probability of being diabetes-positive based on the glucose concentration in the plasma of patients.

These methods are supervised because we build the model based on known outcome values. That is, the machine learns from known observation outcomes in order to predict the outcome of future cases.

In this book, we present a practical guide to machine learning methods for exploring data sets as well as for building predictive models.

You'll learn the basic ideas of each method and reproducible R code for easily computing a large number of machine learning techniques.

0.2 Key features of this book

Our goal was to write a practical guide to machine learning for everyone.

The main parts of the book include
