# `bayesian_bootstrap` ![test badge](https://travis-ci.org/lmc2179/bayesian_bootstrap.svg?branch=master) [![PyPI version](https://badge.fury.io/py/bayesian_bootstrap.svg)](https://badge.fury.io/py/bayesian_bootstrap)
`bayesian_bootstrap` is a package for Bayesian bootstrapping in Python. For an overview of the Bayesian bootstrap, I highly recommend reading [Rasmus Bååth's writeup](http://www.sumsar.net/blog/2015/04/the-non-parametric-bootstrap-as-a-bayesian-model/). This Python package is similar to his [R package](http://www.sumsar.net/blog/2016/02/bayesboot-an-r-package/).
This README contains some examples, below. For the documentation of the package's API, see the [docs](http://htmlpreview.github.io/?https://github.com/lmc2179/bayesian_bootstrap/blob/master/docs/bootstrap_documentation.html).
This package is on PyPI; you can install it with `pip install bayesian_bootstrap`.
# Overview of the `bootstrap` module
The main module in the `bayesian_bootstrap` package is the `bootstrap` module. The `bootstrap` module contains tools
for doing approximate Bayesian inference using the Bayesian bootstrap introduced in [Rubin's _The Bayesian Bootstrap_](https://projecteuclid.org/euclid.aos/1176345338).
It contains the following:
* The `mean` and `var` functions, which simulate the posterior distributions of the mean and variance
* The `bayesian_bootstrap` function, which simulates the posterior distribution of an arbitrary statistic
* The `BayesianBootstrapBagging` class, a wrapper allowing users to generate ensembles of regressors/classifiers
using Bayesian bootstrap resampling. A base estimator with a scikit-learn-like interface needs to be provided. See also
the `bayesian_bootstrap_regression` function.
* The `central_credible_interval` and `highest_density_interval` functions, which compute credible intervals from
posterior samples.
For more information about the function signatures above, see the examples below or the docstrings of each function/class.
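To make the two interval types in the list above concrete, here is a minimal standard-library sketch of how each can be computed from a list of posterior samples. The helper names `central_interval` and `narrowest_interval` are hypothetical and written for illustration only; they are not the package's `central_credible_interval` and `highest_density_interval`, whose exact algorithms may differ.

```python
import math


def central_interval(samples, alpha=0.05):
    """Interval that cuts off roughly alpha/2 of the samples in each tail."""
    ordered = sorted(samples)
    last = len(ordered) - 1
    lower = ordered[math.floor((alpha / 2) * last)]
    upper = ordered[math.ceil((1 - alpha / 2) * last)]
    return lower, upper


def narrowest_interval(samples, alpha=0.05):
    """Shortest interval holding at least a (1 - alpha) share of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    window = max(1, math.ceil((1 - alpha) * n))
    # Slide a window with a fixed sample count over the sorted draws
    # and keep the position where it is narrowest.
    start = min(range(n - window + 1),
                key=lambda i: ordered[i + window - 1] - ordered[i])
    return ordered[start], ordered[start + window - 1]


skewed = [0, 10, 11, 12, 100, 200]
print(central_interval(skewed, alpha=0.5))    # (10, 100)
print(narrowest_interval(skewed, alpha=0.5))  # (10, 12)
```

On skewed samples the two intervals can differ a lot: the central interval is anchored to the tails, while the narrowest interval moves to wherever the samples are densest.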
One thing that's worth making clear is the interpretation of the parameters of the `bayesian_bootstrap` function, the `BayesianBootstrapBagging` class, and the `bayesian_bootstrap_regression` function, all of which resample within each bootstrap replication:
* The number of replications is the number of times the statistic of interest will be replicated. If we think about the classical bootstrap, this is the number of times your dataset is resampled. If we think about it from a Bayesian point of view, this is the number of draws from the posterior distribution.
* The resample size is the size of the dataset used to calculate the statistic of interest in each replication. More is better - you'll probably want this to be at least as large as your original dataset.
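The roles of the two parameters above can be sketched in plain Python. This is an illustration under stated assumptions, not the package's implementation: the function names `dirichlet_weights` and `bayesian_bootstrap_sketch` are hypothetical, and the sketch assumes each replication draws flat Dirichlet weights over the observations and then resamples the data with those weights as selection probabilities.

```python
import random
import statistics


def dirichlet_weights(n, rng):
    """Draw one weight vector from a flat Dirichlet(1, ..., 1) distribution."""
    gammas = [rng.gammavariate(1.0, 1.0) for _ in range(n)]
    total = sum(gammas)
    return [g / total for g in gammas]


def bayesian_bootstrap_sketch(data, statistic, n_replications, resample_size, seed=0):
    """Return n_replications simulated posterior draws of statistic(data)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_replications):
        # One replication = one set of weights = one posterior draw.
        weights = dirichlet_weights(len(data), rng)
        # The resample size controls how closely the resample follows the weights.
        resample = rng.choices(data, weights=weights, k=resample_size)
        draws.append(statistic(resample))
    return draws


data = [2.1, 9.4, 0.7, 5.3]
posterior_draws = bayesian_bootstrap_sketch(
    data, statistics.mean, n_replications=1000, resample_size=200
)
print(len(posterior_draws))  # one draw per replication
```

A small resample size adds extra sampling noise on top of the posterior variation, which is one way to read the advice that more is better.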
# Example: Estimating the mean
Let's say that we observe some data points, and we wish to simulate the posterior distribution of their mean.
The following code draws four data points from an exponential distribution:
```
import numpy as np
X = np.random.exponential(7, 4)
```
Now, we are going to simulate draws from the posterior of the mean. `bayesian_bootstrap` includes a `mean` function in
the `bootstrap` module that will do this for you.
The code below performs the simulation and calculates the 95% highest density interval using 10,000 bootstrap replications. It also uses the wonderful
`seaborn` library to visualize the histogram with a kernel density estimate.
Included for reference in the image is the same dataset used in a classical bootstrap, to illustrate the comparative
smoothness of the Bayesian version.
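```
import matplotlib.pyplot as plt
import seaborn as sns
from bayesian_bootstrap.bootstrap import mean, highest_density_interval

# 10,000 simulated draws from the posterior of the mean of X
posterior_samples = mean(X, 10000)
l, r = highest_density_interval(posterior_samples)
plt.title('Bayesian Bootstrap of mean')
# histplot with kde=True replaces the deprecated sns.distplot
sns.histplot(posterior_samples, kde=True, stat='density', label='Bayesian Bootstrap Samples')
plt.plot([l, r], [0, 0], linewidth=5.0, marker='o', label='95% HDI')
plt.legend()
```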
The above code uses the `mean` function to simulate the posterior distribution of the mean. However, it is a special
(if very common) case, along with `var`. All other statistics should use the `bayesian_bootstrap` function. The
following code demonstrates doing this for the posterior of the mean:
```
import numpy as np
from bayesian_bootstrap.bootstrap import bayesian_bootstrap
# 10,000 replications, each computed on a resample of size 100
posterior_samples = bayesian_bootstrap(X, np.mean, 10000, 100)
```
![Posterior](bayesian_bootstrap/demos/readme_exponential.png)
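One way to see why the mean can be treated as a special case: a weighted mean can be computed directly from the Dirichlet weights, with no resampling step. The sketch below illustrates that idea with the standard library only. The function name `posterior_mean_draws` is hypothetical, and the code is an assumption about the general technique rather than a copy of the package's `mean` function.

```python
import random


def posterior_mean_draws(data, n_replications, seed=0):
    """Simulate posterior draws of the mean as Dirichlet-weighted averages."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_replications):
        # Normalised Gamma(1, 1) variates are a flat Dirichlet draw.
        gammas = [rng.gammavariate(1.0, 1.0) for _ in data]
        total = sum(gammas)
        # Weighted mean of the observations; no resample size is involved.
        draws.append(sum(g * x for g, x in zip(gammas, data)) / total)
    return draws


data = [2.1, 9.4, 0.7, 5.3]
draws = posterior_mean_draws(data, n_replications=500)
print(min(draws), max(draws))  # every draw lies between min(data) and max(data)
```

Because each draw is a convex combination of the observations, the simulated posterior never places mass outside the range of the observed data.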
# Example: Regression modelling
<!--
Problem setup
Sample data points
Show scatterplot + code
Show posterior samples for slope
Show show scatterplot with prediction bands
-->
Let's take another example - fitting a linear regression model. The following code samples a few points in the plane.
The mean is y = x, and normally distributed noise is added.
```
import numpy as np
X = np.random.normal(0, 1, 5).reshape(-1, 1)  # five inputs as a column vector
y = X.ravel() + np.random.normal(0, 1, 5)  # y = x plus standard normal noise
```
We build models via bootstrap resampling, creating an ensemble of models via bootstrap aggregating. A
`BayesianBootstrapBagging` wrapper class is available in the library, which is a Bayesian analogue to scikit-learn's
`BaggingRegressor` and `BaggingClassifier` classes.
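```
from sklearn.linear_model import LinearRegression
from bayesian_bootstrap.bootstrap import BayesianBootstrapBagging
m = BayesianBootstrapBagging(LinearRegression(), 10000, 1000)  # 10,000 replications, resample size 1,000
m.fit(X, y)
```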
Once we've got our ensemble trained, we can make interval predictions for new inputs by calculating their HDIs under the
ensemble:
```
import matplotlib.pyplot as plt

# Evenly spaced inputs covering the observed range of X
X_plot = np.linspace(X.min(), X.max())
y_predicted = m.predict(X_plot.reshape(-1, 1))
y_predicted_interval = m.predict_highest_density_interval(X_plot.reshape(-1, 1), 0.05)
plt.scatter(X.ravel(), y)
plt.plot(X_plot, y_predicted, label='Mean')
plt.plot(X_plot, y_predicted_interval[:,0], label='95% HDI Lower bound')
plt.plot(X_plot, y_predicted_interval[:,1], label='95% HDI Upper bound')
plt.legend()
plt.savefig('readme_regression.png', bbox_inches='tight')
```
![Posterior](bayesian_bootstrap/demos/readme_regression.png)
Users interested in accessing the base models can do so via the `base_models_` attribute of the object.
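As a rough illustration of what a posterior over a fitted parameter looks like, the sketch below simulates draws of the slope of a simple linear regression by fitting a weighted least-squares line under flat Dirichlet weights. It uses only the standard library and does not call the package; the function names `weighted_slope` and `slope_posterior_draws` are hypothetical, and the weight-based fit is an assumption chosen for brevity rather than a description of how `BayesianBootstrapBagging` works internally.

```python
import random


def weighted_slope(xs, ys, weights):
    """Slope of the weighted least-squares line (weights must sum to 1)."""
    x_bar = sum(w * x for w, x in zip(weights, xs))
    y_bar = sum(w * y for w, y in zip(weights, ys))
    cov = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(weights, xs, ys))
    var = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, xs))
    return cov / var


def slope_posterior_draws(xs, ys, n_replications, seed=0):
    """Simulate posterior draws of the slope, one per set of Dirichlet weights."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_replications):
        gammas = [rng.gammavariate(1.0, 1.0) for _ in xs]
        total = sum(gammas)
        weights = [g / total for g in gammas]
        draws.append(weighted_slope(xs, ys, weights))
    return draws


xs = [-1.2, -0.3, 0.1, 0.8, 1.5]
ys = [-0.9, -0.6, 0.4, 0.5, 1.9]  # roughly y = x with noise
slopes = slope_posterior_draws(xs, ys, n_replications=1000)
print(sum(slopes) / len(slopes))  # average of the simulated slope draws
```

With the package itself, and assuming the entries of `base_models_` are fitted scikit-learn `LinearRegression` objects, the analogous quantity would come from reading each model's `coef_` attribute.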
# Contributions
Interested in contributing? We'd love to have your help! Please keep the following in mind:
* Bug fixes are welcome! Make sure you reference the issue number that is being resolved, and that all test cases in `tests` pass on both Python 2.7 and 3.4/3.5.
* New features are welcome as well! Any new features should include docstrings and unit tests in the `tests` directory.
* If you want to contribute a case study or other documentation, feel free to write up a GitHub-flavored Markdown document or IPython notebook and put it in the `examples` folder before issuing a pull request.
Credit for past contributions:
* [roya0045](https://github.com/roya0045) implemented the original version of the low-memory optimizations.
* [JulianWgs](https://github.com/JulianWgs) implemented the Bayesian machine learning model using weight distributions instead of resampling.
# Further reading
* [_The Bayesian Bootstrap_, Rubin, 1981](https://projecteuclid.org/euclid.aos/1176345338)
* [Rasmus Bååth's original writeup on the Bayesian Bootstrap](http://www.sumsar.net/blog/2015/04/the-non-parametric-bootstrap-as-a-bayesian-model/)