Master status: [![Master Build Status - Mac/Linux](https://travis-ci.com/EpistasisLab/tpot.svg?branch=master)](https://travis-ci.com/EpistasisLab/tpot)
[![Master Build Status - Windows](https://ci.appveyor.com/api/projects/status/b7bmpwpkjhifrm7v/branch/master?svg=true)](https://ci.appveyor.com/project/weixuanfu/tpot?branch=master)
[![Master Coverage Status](https://coveralls.io/repos/github/EpistasisLab/tpot/badge.svg?branch=master)](https://coveralls.io/github/EpistasisLab/tpot?branch=master)
Development status: [![Development Build Status - Mac/Linux](https://travis-ci.com/EpistasisLab/tpot.svg?branch=development)](https://travis-ci.com/EpistasisLab/tpot/branches)
[![Development Build Status - Windows](https://ci.appveyor.com/api/projects/status/b7bmpwpkjhifrm7v/branch/development?svg=true)](https://ci.appveyor.com/project/weixuanfu/tpot?branch=development)
[![Development Coverage Status](https://coveralls.io/repos/github/EpistasisLab/tpot/badge.svg?branch=development)](https://coveralls.io/github/EpistasisLab/tpot?branch=development)
Package information: [![Python 3.7](https://img.shields.io/badge/python-3.7-blue.svg)](https://www.python.org/downloads/release/python-370/)
[![License: LGPL v3](https://img.shields.io/badge/license-LGPL%20v3-blue.svg)](http://www.gnu.org/licenses/lgpl-3.0)
[![PyPI version](https://badge.fury.io/py/TPOT.svg)](https://badge.fury.io/py/TPOT)
<p align="center">
<img src="https://raw.githubusercontent.com/EpistasisLab/tpot/master/images/tpot-logo.jpg" width=300 />
</p>
**TPOT** stands for **T**ree-based **P**ipeline **O**ptimization **T**ool. Consider TPOT your **Data Science Assistant**. TPOT is a Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
![TPOT Demo](https://github.com/EpistasisLab/tpot/blob/master/images/tpot-demo.gif "TPOT Demo")
TPOT will automate the most tedious part of machine learning by intelligently exploring thousands of possible pipelines to find the best one for your data.
![An example Machine Learning pipeline](https://github.com/EpistasisLab/tpot/blob/master/images/tpot-ml-pipeline.png "An example Machine Learning pipeline")
<p align="center"><strong>An example Machine Learning pipeline</strong></p>
Once TPOT is finished searching (or you get tired of waiting), it provides you with the Python code for the best pipeline it found so you can tinker with the pipeline from there.
![An example TPOT pipeline](https://github.com/EpistasisLab/tpot/blob/master/images/tpot-pipeline-example.png "An example TPOT pipeline")
TPOT is built on top of scikit-learn, so all of the code it generates should look familiar... if you're familiar with scikit-learn, anyway.
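For readers less familiar with scikit-learn, here is a minimal hand-written sketch of the kind of pipeline object TPOT searches over. It is not TPOT output: the choice of `StandardScaler` followed by `LogisticRegression` is an assumption made purely for illustration.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load the digits data and hold out 25% of it for testing.
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=42)

# A hand-picked two-step pipeline (illustrative assumption, not TPOT output):
# scale the features, then fit a linear classifier.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipeline.fit(X_train, y_train)

accuracy = pipeline.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```

TPOT automates the choice of steps and hyperparameters that were fixed by hand here.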
**TPOT is still under active development** and we encourage you to check back on this repository regularly for updates.
For further information about TPOT, please see the [project documentation](http://epistasislab.github.io/tpot/).
## License
Please see the [repository license](https://github.com/EpistasisLab/tpot/blob/master/LICENSE) for the licensing and usage information for TPOT.
Generally, we have licensed TPOT to make it as widely usable as possible.
## Installation
We maintain the [TPOT installation instructions](http://epistasislab.github.io/tpot/installing/) in the documentation. TPOT requires a working installation of Python.
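As a quick sanity check before running the examples in this README, the sketch below reports which of the packages they import cannot be found. It only tests importability, not versions, and the helper name `missing_packages` is hypothetical; the installation instructions remain the reference for actual requirements.

```python
import importlib.util


def missing_packages(names):
    """Return the names that cannot be found as importable top-level modules."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names used by the examples in this README.
required = ["tpot", "sklearn", "numpy", "pandas"]
missing = missing_packages(required)

if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All example dependencies are importable.")
```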
## Usage
TPOT can be used [on the command line](http://epistasislab.github.io/tpot/using/#tpot-on-the-command-line) or [with Python code](http://epistasislab.github.io/tpot/using/#tpot-with-code).
Click on the corresponding links to find more information on TPOT usage in the documentation.
## Examples
### Classification
Below is a minimal working example with the optical recognition of handwritten digits dataset.
```python
from tpot import TPOTClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

# Load the digits dataset and hold out 25% of it for testing
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target,
                                                    train_size=0.75, test_size=0.25, random_state=42)

# Search for a pipeline over 5 generations with a population of 50 pipelines
tpot = TPOTClassifier(generations=5, population_size=50, verbosity=2, random_state=42)
tpot.fit(X_train, y_train)

# Score the best pipeline on the held-out data, then export its Python code
print(tpot.score(X_test, y_test))
tpot.export('tpot_digits_pipeline.py')
```
Running this code should discover a pipeline that achieves about 98% testing accuracy, and the corresponding Python code should be exported to the `tpot_digits_pipeline.py` file and look similar to the following:
```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline, make_union
from sklearn.preprocessing import PolynomialFeatures
from tpot.builtins import StackingEstimator
from tpot.export_utils import set_param_recursive
# NOTE: Make sure that the outcome column is labeled 'target' in the data file
tpot_data = pd.read_csv('PATH/TO/DATA/FILE', sep='COLUMN_SEPARATOR', dtype=np.float64)
features = tpot_data.drop('target', axis=1)
training_features, testing_features, training_target, testing_target = \
            train_test_split(features, tpot_data['target'], random_state=42)

# Average CV score on the training set was: 0.9799428471757372
exported_pipeline = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False, interaction_only=False),
    StackingEstimator(estimator=LogisticRegression(C=0.1, dual=False, penalty="l1")),
    RandomForestClassifier(bootstrap=True, criterion="entropy", max_features=0.35000000000000003, min_samples_leaf=20, min_samples_split=19, n_estimators=100)
)
# Fix random state for all the steps in exported pipeline
set_param_recursive(exported_pipeline.steps, 'random_state', 42)
exported_pipeline.fit(training_features, training_target)
results = exported_pipeline.predict(testing_features)
```
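The exported script expects a delimited text file whose outcome column is labeled `target`. As a minimal sketch of preparing such a file, the code below writes the digits dataset to CSV and reads it back the way the exported script does. The file name `digits.csv` and the `pixel_<i>` column names are hypothetical; only the `target` column name comes from the exported code.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_digits

digits = load_digits()

# Hypothetical feature column names; the exported script only requires 'target'.
columns = [f"pixel_{i}" for i in range(digits.data.shape[1])]
frame = pd.DataFrame(digits.data, columns=columns)
frame["target"] = digits.target

# Hypothetical file name; substitute it for 'PATH/TO/DATA/FILE' with sep=','.
frame.to_csv("digits.csv", index=False)

# Read the file back the same way the exported script does.
tpot_data = pd.read_csv("digits.csv", sep=",", dtype=np.float64)
features = tpot_data.drop("target", axis=1)
print(features.shape, tpot_data["target"].nunique())
```

With the file in place, replacing `'PATH/TO/DATA/FILE'` and `'COLUMN_SEPARATOR'` in the exported script with `'digits.csv'` and `','` should let it run on the same data.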
### Regression
Similarly, TPOT can optimize pipelines for regression problems. Below is a minimal working example with the practice Boston housing prices data set.
```python
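from tpot import TPOTRegressor
# NOTE: load_boston was removed in scikit-learn 1.2, so this example needs an older scikit-learn
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split

# Load the Boston housing dataset and hold out 25% of it for testing
housing = load_boston()
X_train, X_test, y_train, y_test = train_test_split(housing.data, housing.target,
                                                    train_size=0.75, test_size=0.25, random_state=42)

# Search for a pipeline over 5 generations with a population of 50 pipelines
tpot = TPOTRegressor(generations=5, population_size=50, verbosity=2, random_state=42)
tpot.fit(X_train, y_train)

# Score the best pipeline on the held-out data, then export its Python code
print(tpot.score(X_test, y_test))
tpot.export('tpot_boston_pipeline.py')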
```
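Running this code should produce a pipeline that achieves a mean squared error (MSE) of about 12.77 on the test set. TPOT's default regression scoring is negated MSE, so the printed score is negative. The Python code exported to `tpot_boston_pipeline.py` should look similar to the following: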
```python
import numpy as np
import pandas as pd
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from tpot.export_utils import set_param_recursive
# NOTE: Make sure that the outcome column is labeled 'target' in the data file
tpot_data = pd.read_csv('PATH/TO/DATA/FILE', sep='COLUMN_SEPARATOR', dtype=np.float64)
features = tpot_data.drop('target', axis=1)
training_features, testing_features, training_target, testing_target = \
            train_test_split(features, tpot_data['target'], random_state=42)

# Average CV score on the training set was: -10.812040755234403
exported_pipeline = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False, interaction_only=False),
    ExtraTreesRegressor(bootstrap=False, max_features=0.5, min_samples_leaf=2, min_samples_split=3, n_estimators=100)
)
# Fix random state for all the steps in exported pipeline
set_param_recursive(exported_pipeline.steps, 'random_state', 42)
exported_pipeline.fit(training_features, training_target)
results = exported_pipeline.predict(testing_features)
```