sklearn-gbmi: scikit-learn gradient-boosting-model interactions
===============================================================
This package provides a Python module for computing Friedman and Popescu's *H* statistics, in order to look for
interactions among variables in scikit-learn gradient-boosting models
(http://scikit-learn.org/stable/modules/ensemble.html#gradient-tree-boosting).
See Jerome H. Friedman and Bogdan E. Popescu, 2008, "Predictive learning via rule ensembles", *Ann. Appl. Stat.*
**2**:916-954, http://projecteuclid.org/download/pdfview_1/euclid.aoas/1223908046, section 8.1.
Installation
------------
::

    pip install sklearn-gbmi

On some systems, if you wish to use this package with Python 3, then you must install with `pip3` rather than `pip`.
In case of difficulties with installing or using this package, consult "Advanced installation" below.
Usage
-----
Given a scikit-learn gradient-boosting model `gbm` that has been fitted to a NumPy array or pandas data frame
`array_or_frame` and a list of indices of columns of the array or columns of the data frame `indices_or_columns`, the
*H* statistic of the variables represented by the elements of `array_or_frame` and specified by `indices_or_columns` can
be computed via ::

    from sklearn_gbmi import *

    h(gbm, array_or_frame, indices_or_columns)

Alternatively, the two-variable *H* statistic of each pair of variables represented by the elements of `array_or_frame`
and specified by `indices_or_columns` can be computed via ::

    from sklearn_gbmi import *

    h_all_pairs(gbm, array_or_frame, indices_or_columns)

(Compared to iteratively calling `h`, calling `h_all_pairs` avoids redundant computations.)
`indices_or_columns` is optional, with default value `'all'`. If it is `'all'`, then all columns of `array_or_frame` are
used.
`NaN` is returned if a computation is spoiled by weak main effects and rounding errors.
*H* varies from 0 to 1. The larger *H*, the stronger the evidence for an interaction among the variables.
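
To make the statistic concrete, here is a minimal brute-force sketch of the two-variable *H* statistic as defined in section 8.1 of Friedman and Popescu: the square root of the sum of squared differences between the centered joint partial dependence and the two centered one-variable partial dependences, divided by the sum of squares of the centered joint partial dependence. This is not this package's implementation, which works on fitted gradient-boosting models and is partly written in Cython. The names `partial_dependence` and `h_two_variables` are hypothetical, and `predict` is assumed to be any function that maps a 2-D NumPy array to a vector of predictions.

```python
import numpy as np


def partial_dependence(predict, X, cols, values):
    """Centered brute-force partial dependence of `predict` on columns `cols`.

    For each row of `values`, the columns `cols` of a copy of X are set to
    that row, and the predictions are averaged over all rows of X.
    (Hypothetical helper, for illustration only.)
    """
    out = np.empty(len(values))
    for i, v in enumerate(values):
        X_mod = X.copy()
        X_mod[:, cols] = v  # broadcast the fixed values across all rows
        out[i] = predict(X_mod).mean()
    return out - out.mean()  # center to mean zero


def h_two_variables(predict, X, j, k):
    """Two-variable H statistic for columns j and k, evaluated at the rows of X."""
    f_jk = partial_dependence(predict, X, [j, k], X[:, [j, k]])
    f_j = partial_dependence(predict, X, [j], X[:, [j]])
    f_k = partial_dependence(predict, X, [k], X[:, [k]])
    denominator = np.sum(f_jk ** 2)
    if denominator == 0.0:
        return float("nan")  # no joint effect to compare against
    return float(np.sqrt(np.sum((f_jk - f_j - f_k) ** 2) / denominator))


# Synthetic data with centered columns (an assumption made for the demo).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X -= X.mean(axis=0)


def additive(A):
    return A[:, 0] + 2.0 * A[:, 1]  # no interaction between columns 0 and 1


def product(A):
    return A[:, 0] * A[:, 1]  # pure interaction between columns 0 and 1


print(h_two_variables(additive, X, 0, 1))  # close to 0
print(h_two_variables(product, X, 0, 1))   # close to 1
```

The brute-force approach costs one full prediction pass per data point and per partial dependence function, so it is only practical for small samples.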
Example
-------
See the Jupyter notebook example.ipynb (https://github.com/ralphhaygood/sklearn-gbmi/blob/master/example.ipynb) for a
complete example of how to use this package.
Notes
-----
1. Per Friedman and Popescu, only variables with strong main effects should be examined for interactions. Strengths of
main effects are available as `gbm.feature_importances_` once `gbm` has been fitted.
2. Per Friedman and Popescu, collinearity among variables can lead to interactions in `gbm` that are not present in the
target function. To forestall such spurious interactions, check for strong correlations among variables before fitting
`gbm`.
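
The correlation check suggested in note 2 might be sketched as follows, using only NumPy. The function name `strongly_correlated_pairs` and the threshold of 0.9 are assumptions chosen for illustration; what counts as a "strong" correlation depends on the data.

```python
import numpy as np


def strongly_correlated_pairs(X, threshold=0.9):
    """Return (j, k, r) for column pairs whose absolute Pearson correlation
    is at least `threshold`. (Hypothetical helper; the threshold is arbitrary.)
    """
    corr = np.corrcoef(X, rowvar=False)  # columns are variables
    n_cols = corr.shape[0]
    return [
        (j, k, float(corr[j, k]))
        for j in range(n_cols)
        for k in range(j + 1, n_cols)
        if abs(corr[j, k]) >= threshold
    ]


# Synthetic data in which column 2 is almost a multiple of column 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
X[:, 2] = 2.0 * X[:, 0] + 0.01 * rng.normal(size=500)

for j, k, r in strongly_correlated_pairs(X):
    print(f"columns {j} and {k}: r = {r:.3f}")
```

For a pandas data frame, `DataFrame.corr()` gives the same correlation matrix with column labels attached.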
Advanced installation
---------------------
Installing this package requires NumPy, so if installation fails with a complaint that NumPy is missing, add it to the
install command::

    pip install numpy sklearn-gbmi

For performance, this package is partly implemented using Cython (C extensions for Python). It includes a C file that
was generated by Cython, which is compiled for your system when you install the package. Normally, this C file is fine,
but occasionally, it may not compile, or the result may not run. In the first case, installing the package fails, while
in the second case, using the package fails, typically with a cryptic error message; for example::

    ValueError: sklearn.tree._criterion.Criterion size changed, may indicate binary incompatibility.

In such a case, you may still be able to install and use the package by regenerating the C file, as follows.
First, if this package is installed (i.e., installation succeeds, but usage fails), uninstall it::

    pip uninstall sklearn-gbmi

Then, install Cython::

    pip install cython

Next, set the environment variable `USE_CYTHONIZE` to 1. For bash and similar shells::

    export USE_CYTHONIZE=1

For csh and similar shells::

    setenv USE_CYTHONIZE 1

Finally, reinstall this package::

    pip install sklearn-gbmi --no-cache-dir

The C file should be regenerated and compiled for your system, hopefully making this package usable on your system.