5 Data Preparation for Gradient Boosting 23
5.1 Label Encode String Class Values . . . . . . . . . . . . . . . . . . . . . . . . . . 23
5.2 One Hot Encode Categorical Data . . . . . . . . . . . . . . . . . . . . . . . . . . 25
5.3 Support for Missing Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
5.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
6 How to Evaluate XGBoost Models 33
6.1 Evaluate Models With Train and Test Sets . . . . . . . . . . . . . . . . . . . . . 33
6.2 Evaluate Models With k-Fold Cross Validation . . . . . . . . . . . . . . . . . . . 34
6.3 What Techniques to Use When . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
6.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
7 Visualize Individual Trees Within A Model 37
7.1 Plot a Single XGBoost Decision Tree . . . . . . . . . . . . . . . . . . . . . . . . 37
7.2 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
III XGBoost Advanced 40
8 Save and Load Trained XGBoost Models 41
8.1 Serialize Models with Pickle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
8.2 Serialize Models with Joblib . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
8.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
9 Feature Importance With XGBoost and Feature Selection 45
9.1 Feature Importance in Gradient Boosting . . . . . . . . . . . . . . . . . . . . . . 45
9.2 Manually Plot Feature Importance . . . . . . . . . . . . . . . . . . . . . . . . . 46
9.3 Using the Built-in XGBoost Feature Importance Plot . . . . . . . . . . . . . . 47
9.4 Feature Selection with XGBoost Feature Importance Scores . . . . . . . . . . . 49
9.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
10 Monitor Training Performance and Early Stopping 51
10.1 Early Stopping to Avoid Overfitting . . . . . . . . . . . . . . . . . . . . . . . . . 51
10.2 Monitoring Training Performance With XGBoost . . . . . . . . . . . . . . . . . 51
10.3 Evaluate XGBoost Models With Learning Curves . . . . . . . . . . . . . . . . . 53
10.4 Early Stopping With XGBoost . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
10.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
11 Tune Multithreading Support for XGBoost 59
11.1 Problem Description: Otto Dataset . . . . . . . . . . . . . . . . . . . . . . . . . 59
11.2 Impact of the Number of Threads . . . . . . . . . . . . . . . . . . . . . . . . . . 60
11.3 Parallelism When Cross Validating XGBoost Models . . . . . . . . . . . . . . . 63
11.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
12 Train XGBoost Models in the Cloud with Amazon Web Services 66
12.1 Tutorial Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
12.2 Setup Your AWS Account (if needed) . . . . . . . . . . . . . . . . . . . . . . . . 67
12.3 Launch Your Server Instance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67