TreeInfluence: Influence Estimation for Gradient-Boosted Decision Trees
---
[![PyPi version](https://img.shields.io/pypi/v/tree_influence)](https://pypi.org/project/tree_influence/)
[![Python version](https://img.shields.io/badge/python-3.9%20%7C%203.10-blue)](https://pypi.org/project/tree_influence/)
[![Github License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://github.com/jjbrophy47/tree_influence/blob/master/LICENSE)
[![Build](https://github.com/jjbrophy47/tree_influence/actions/workflows/wheels.yml/badge.svg?branch=v0.0.3)](https://github.com/jjbrophy47/tree_influence/actions/workflows/wheels.yml)
**tree-influence** is a Python library that implements influence estimation for gradient-boosted decision trees (GBDTs), adapting popular techniques such as TracIn and Influence Functions to GBDTs. This library is compatible with all major GBDT frameworks, including LightGBM, XGBoost, CatBoost, and scikit-learn.
<p align="center">
<img align="center" src="images/illustration.png" alt="illustration">
</p>
Installation
---
```shell
pip install tree-influence
```
Usage
---
A simple example using *BoostIn* to identify the training instances that are most influential for a given test instance:
```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from lightgbm import LGBMClassifier
from tree_influence.explainers import BoostIn
# load iris data
data = load_iris()
X, y = data['data'], data['target']
# use two classes, then split into train and test
idxs = np.where(y != 2)[0]
X, y = X[idxs], y[idxs]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=1)
# train GBDT model
model = LGBMClassifier().fit(X_train, y_train)
# fit influence estimator
explainer = BoostIn().fit(model, X_train, y_train)
# estimate training influences on each test instance
influence = explainer.get_local_influence(X_test, y_test) # shape=(no. train, no. test)
# extract influence values for the first test instance
values = influence[:, 0] # shape=(no. train,)
# sort training examples from:
# - most positively influential (decreases loss of the test instance the most), to
# - most negatively influential (increases loss of the test instance the most)
training_idxs = np.argsort(values)[::-1]
```
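Once an influence matrix of shape `(no. train, no. test)` is available, a common follow-up is to pull out the few most helpful and most harmful training instances for one test instance. The sketch below is a hedged illustration: `top_k_influential` is a hypothetical helper (not part of the library), and the small hand-written array stands in for the output of `explainer.get_local_influence(X_test, y_test)`.

```python
import numpy as np


def top_k_influential(influence, test_idx, k=5):
    """Hypothetical helper: split one test instance's influence values
    into the k most positive and k most negative training indices."""
    values = influence[:, test_idx]        # shape=(no. train,)
    order = np.argsort(values)[::-1]       # descending by influence
    helpful = order[:k]                    # largest positive influence first
    harmful = order[::-1][:k]              # most negative influence first
    return helpful, harmful


# stand-in for the matrix returned by get_local_influence (assumption):
# 4 training instances (rows) x 2 test instances (columns)
influence = np.array([
    [0.5, -0.1],
    [-0.3, 0.2],
    [0.0, 0.4],
    [0.9, -0.7],
])

helpful, harmful = top_k_influential(influence, test_idx=0, k=2)
print("most helpful training indices:", helpful)
print("most harmful training indices:", harmful)
```

Following the sign convention in the comments above, "helpful" instances are those whose influence decreases the loss of the test instance, and "harmful" instances are those that increase it.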
Supported Estimators
---
**tree-influence** supports the following influence-estimation techniques in GBDTs:
| Method | Description |
| -------| ----------- |
| BoostIn | Traces the influence of a training instance throughout the training process (adaptation of TracIn). |
| TREX | Trains a surrogate kernel model that approximates the original model and decomposes any prediction into a weighted sum of the training examples (adaptation of representer-point methods). |
| LeafInfluence | Estimates the impact of a training example on the *final* GBDT model (adaptation of influence functions). |
| TreeSim | Computes influence via similarity in tree-kernel space. |
| LOO | Leave-one-out retraining: measures the influence of a training instance by removing it and retraining the model without it. |
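
To make the leave-one-out idea in the table concrete, here is a minimal, dependency-free sketch. It is not the library's implementation: a mean predictor stands in for a GBDT so that "retraining" is trivial, and the names `fit_mean`, `squared_error`, and `loo_influence` are hypothetical. The sign convention (positive influence means the instance decreases the test loss) is assumed from the usage example.

```python
def fit_mean(y):
    """Toy 'model': always predicts the mean of the training targets."""
    return sum(y) / len(y)


def squared_error(pred, target):
    return (pred - target) ** 2


def loo_influence(y_train, y_test):
    """Influence of each training target on the loss of one test target:
    (test loss after removing instance i and refitting) - (test loss of full model)."""
    full_loss = squared_error(fit_mean(y_train), y_test)
    influences = []
    for i in range(len(y_train)):
        reduced = y_train[:i] + y_train[i + 1:]          # drop instance i
        loss_without = squared_error(fit_mean(reduced), y_test)
        influences.append(loss_without - full_loss)      # > 0: instance i was helping
    return influences


y_train = [1.0, 2.0, 3.0, 10.0]   # the last value is an outlier
y_test = 2.0

influences = loo_influence(y_train, y_test)
for i, value in enumerate(influences):
    print(f"train[{i}] = {y_train[i]:>4}: influence = {value:+.3f}")
```

In this toy setting the outlier `10.0` receives the most negative influence, because removing it moves the prediction closer to the test target. With a real GBDT each removal requires a full retraining run, which is why leave-one-out is expensive on large training sets.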
License
---
[Apache License 2.0](https://github.com/jjbrophy47/tree_influence/blob/master/LICENSE).
Reference
---
Brophy, Hammoudeh, and Lowd. [Adapting and Evaluating Influence-Estimation Methods for Gradient-Boosted Decision Trees](http://jmlr.org/papers/v24/22-0449.html). *Journal of Machine Learning Research* (JMLR), 2023.
```bibtex
@article{brophy2023treeinfluence,
author = {Jonathan Brophy and Zayd Hammoudeh and Daniel Lowd},
title = {Adapting and Evaluating Influence-Estimation Methods for Gradient-Boosted Decision Trees},
journal = {Journal of Machine Learning Research},
year = {2023},
volume = {24},
number = {154},
pages = {1--48},
url = {http://jmlr.org/papers/v24/22-0449.html},
}
```