# Detecting Credit Card Fraud using XGBoost and Bayesian Hyper-Parameter Optimization
## Summary
A detailed description of the project can be found [here](https://github.com/MiladShahidi/xgboost-fraud-detection/blob/master/XGBoost_Fraud_Detection.ipynb).
In this project I use the **Extreme Gradient Boosting (XGBoost)** algorithm to detect fraudulent credit card transactions in a real-world (anonymized) dataset of European credit card transactions. I evaluate the model on a held-out test set and compare it to a few other popular classification algorithms, namely **Logistic Regression, Random Forests and the Extra Trees Classifier** (Geurts, Ernst, and Wehenkel 2006), and **show that a well-tuned XGBoost classifier outperforms all of them**.
The main challenge in fraud detection is the **extreme class imbalance** in the data, which makes it difficult for many classification algorithms to separate the two classes effectively. **Only 0.172% of transactions are labeled as fraudulent** in this dataset. I address the class imbalance by reweighting the data before training XGBoost (and by SMOTE oversampling in the case of Logistic Regression).
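To make the reweighting idea concrete, here is a minimal sketch in plain Python. It assumes binary labels with 1 for fraud and 0 for legitimate, and it uses synthetic labels (172 frauds in 100,000 transactions) chosen only to match the 0.172% rate quoted above. The helper names `positive_class_weight` and `sample_weights` are hypothetical and are not taken from the project code. The resulting ratio is the kind of value commonly passed to XGBoost's `scale_pos_weight` parameter; the project's exact weighting scheme may differ.

```python
from collections import Counter


def positive_class_weight(labels):
    """Return n_negative / n_positive for binary labels (1 = fraud).

    This ratio is a common choice for XGBoost's `scale_pos_weight`.
    """
    counts = Counter(labels)
    if counts[1] == 0:
        raise ValueError("labels contain no positive (fraud) examples")
    return counts[0] / counts[1]


def sample_weights(labels):
    """Per-row weights: frauds get the class ratio, legitimate rows get 1.0."""
    ratio = positive_class_weight(labels)
    return [ratio if y == 1 else 1.0 for y in labels]


# Synthetic labels (an assumption for illustration): 0.172% fraud rate.
labels = [1] * 172 + [0] * 99_828

ratio = positive_class_weight(labels)
weights = sample_weights(labels)

# After reweighting, both classes carry the same total weight.
fraud_total = sum(w for w, y in zip(weights, labels) if y == 1)
legit_total = sum(w for w, y in zip(weights, labels) if y == 0)

print(round(ratio, 1))  # 580.4
```

With weights like these, each fraudulent transaction counts roughly 580 times as much as a legitimate one in the training loss, so the two classes contribute equally overall.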
Hyper-parameter tuning can considerably improve the performance of learning algorithms. XGBoost has many hyper-parameters, which make it powerful and flexible but also very difficult to tune because the parameter space is high-dimensional. Instead of the more traditional tuning methods (i.e. grid search and random search), which search the parameter space without using the results of earlier evaluations, I use **Bayesian hyper-parameter optimization** (implemented in the hyperopt package), which has been shown to be more efficient than grid and random search (Bergstra, Yamins, and Cox 2013).
The full Python code can be found [here](https://github.com/MiladShahidi/xgboost-fraud-detection/blob/master/XGBoost_Fraud_Detection.py).
Keywords: **XGBoost, Imbalanced/Cost-sensitive learning, Bayesian hyper-parameter tuning**