tpot-master-source resource - CSDN Library

153 files in total

py: 51

md: 13

html: 12

Uploaded 2022-07-20 15:10:13, 7.31MB ZIP

tpot-master-source (153 files)

test_config.py.bad 1KB

.coveragerc 54B

theme.css 114KB

theme_extra.css 3KB

Data_FinalProject.csv 4.66MB

MAGIC Gamma Telescope Data.csv 1.41MB

titanic_train.csv 60KB

titanic_test.csv 28KB

tests.csv 21KB

submission.csv 3KB

subset_test.csv 126B

lato-italic.eot 262KB

lato-bolditalic.eot 260KB

lato-bold.eot 250KB

lato-regular.eot 248KB

fontawesome-webfont.eot 162KB

roboto-slab-v7-bold.eot 78KB

roboto-slab-v7-regular.eot 76KB

roboto-slab.eot 76KB

tpot-demo.gif 280KB

.gitignore 991B

sitemap.xml.gz 269B

index.html 51KB

index.html 49KB

index.html 28KB

index.html 19KB

index.html 14KB

index.html 12KB

index.html 9KB

index.html 8KB

index.html 7KB

search.html 6KB

404.html 5KB

favicon.ico 1KB

MANIFEST.in 116B

Titanic_Kaggle.ipynb 38KB

MAGIC Gamma Telescope.ipynb 31KB

Portuguese Bank Marketing Strategy.ipynb 29KB

IRIS.ipynb 12KB

Higgs_Boson.ipynb 10KB

Digits.ipynb 6KB

cuML_Regression_Example.ipynb 5KB

cuML_Classification_Example.ipynb 5KB

tpot-logo.jpg 177KB

lunr.js 97KB

jquery-2.1.1.min.js 82KB

modernizr-2.8.3.min.js 11KB

theme.js 4KB

worker.js 4KB

main.js 3KB

search_index.json 193KB

LICENSE 7KB

api.md 44KB

using.md 37KB

releases.md 16KB

README.md 12KB

examples.md 11KB

contributing.md 6KB

installing.md 4KB

citing.md 3KB

related.md 2KB

index.md 1KB

ISSUE_TEMPLATE.md 858B

PULL_REQUEST_TEMPLATE.md 639B

support.md 435B

tpot-ml-pipeline.png 218KB

tpot-pipeline-example.png 199KB

tpot_tests.py 90KB

base.py 84KB

export_tests.py 30KB

gp_deap.py 20KB

driver.py 19KB

one_hot_encoder.py 18KB

export_utils.py 15KB

driver_tests.py 13KB

one_hot_encoder_tests.py 12KB

operator_utils.py 11KB

nn.py 10KB

classifier_nn.py 7KB

classifier.py 7KB

regressor.py 6KB

feature_set_selector.py 6KB

feature_transformers.py 6KB

feature_set_selector_tests.py 5KB

stacking_estimator_tests.py 5KB

stats_test.py 5KB

decorators.py 4KB

regressor_cuml.py 4KB

classifier_cuml.py 4KB

classifier_sparse.py 4KB

regressor_light.py 4KB

tpot.py 3KB

stacking_estimator.py 3KB

classifier_light.py 3KB

nn_tests.py 3KB

test_log_file.py 3KB

regressor_sparse.py 3KB

metrics.py 3KB

feature_transformers_tests.py 2KB

Master status: [![Master Build Status - Mac/Linux](https://travis-ci.com/EpistasisLab/tpot.svg?branch=master)](https://travis-ci.com/EpistasisLab/tpot) [![Master Build Status - Windows](https://ci.appveyor.com/api/projects/status/b7bmpwpkjhifrm7v/branch/master?svg=true)](https://ci.appveyor.com/project/weixuanfu/tpot?branch=master) [![Master Coverage Status](https://coveralls.io/repos/github/EpistasisLab/tpot/badge.svg?branch=master)](https://coveralls.io/github/EpistasisLab/tpot?branch=master)

Development status: [![Development Build Status - Mac/Linux](https://travis-ci.com/EpistasisLab/tpot.svg?branch=development)](https://travis-ci.com/EpistasisLab/tpot/branches) [![Development Build Status - Windows](https://ci.appveyor.com/api/projects/status/b7bmpwpkjhifrm7v/branch/development?svg=true)](https://ci.appveyor.com/project/weixuanfu/tpot?branch=development) [![Development Coverage Status](https://coveralls.io/repos/github/EpistasisLab/tpot/badge.svg?branch=development)](https://coveralls.io/github/EpistasisLab/tpot?branch=development)

Package information: [![Python 3.7](https://img.shields.io/badge/python-3.7-blue.svg)](https://www.python.org/downloads/release/python-370/) [![License: LGPL v3](https://img.shields.io/badge/license-LGPL%20v3-blue.svg)](http://www.gnu.org/licenses/lgpl-3.0) [![PyPI version](https://badge.fury.io/py/TPOT.svg)](https://badge.fury.io/py/TPOT)

<img src="https://raw.githubusercontent.com/EpistasisLab/tpot/master/images/tpot-logo.jpg" width=300 />

**TPOT** stands for **T**ree-based **P**ipeline **O**ptimization **T**ool. Consider TPOT your **Data Science Assistant**. TPOT is a Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.

![TPOT Demo](https://github.com/EpistasisLab/tpot/blob/master/images/tpot-demo.gif "TPOT Demo")

TPOT will automate the most tedious part of machine learning by intelligently exploring thousands of possible pipelines to find the best one for your data.

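
To make "exploring possible pipelines" concrete, here is a minimal hand-written sketch of the kind of comparison TPOT automates: two candidate scikit-learn pipelines are scored by cross-validation and the better one is kept. This is not TPOT's search procedure. The candidate names (`logreg`, `forest`), the two pipelines themselves, and the choice of 5-fold cross-validation are assumptions made purely for illustration.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)

# Two hand-picked candidate pipelines (illustrative choices, not TPOT output).
candidates = {
    "logreg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "forest": make_pipeline(RandomForestClassifier(n_estimators=100, random_state=42)),
}

# Score each candidate by its mean 5-fold cross-validated accuracy.
scores = {
    name: cross_val_score(pipeline, X, y, cv=5).mean()
    for name, pipeline in candidates.items()
}

# Keep whichever candidate scored best.
best_name = max(scores, key=scores.get)
print(f"best candidate: {best_name} (mean CV accuracy {scores[best_name]:.3f})")
```

Where this sketch compares two fixed candidates, TPOT generates and evolves a whole population of pipelines with genetic programming, so the set of candidates does not have to be written by hand.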

![An example Machine Learning pipeline](https://github.com/EpistasisLab/tpot/blob/master/images/tpot-ml-pipeline.png "An example Machine Learning pipeline")

An example Machine Learning pipeline

Once TPOT is finished searching (or you get tired of waiting), it provides you with the Python code for the best pipeline it found so you can tinker with the pipeline from there.

![An example TPOT pipeline](https://github.com/EpistasisLab/tpot/blob/master/images/tpot-pipeline-example.png "An example TPOT pipeline")

TPOT is built on top of scikit-learn, so all of the code it generates should look familiar... if you're familiar with scikit-learn, anyway.

**TPOT is still under active development** and we encourage you to check back on this repository regularly for updates.

For further information about TPOT, please see the [project documentation](http://epistasislab.github.io/tpot/).

## License

Please see the [repository license](https://github.com/EpistasisLab/tpot/blob/master/LICENSE) for the licensing and usage information for TPOT.

Generally, we have licensed TPOT to make it as widely usable as possible.

## Installation

We maintain the [TPOT installation instructions](http://epistasislab.github.io/tpot/installing/) in the documentation. TPOT requires a working installation of Python.

## Usage

TPOT can be used [on the command line](http://epistasislab.github.io/tpot/using/#tpot-on-the-command-line) or [with Python code](http://epistasislab.github.io/tpot/using/#tpot-with-code).

Click on the corresponding links to find more information on TPOT usage in the documentation.

## Examples

### Classification

Below is a minimal working example with the optical recognition of handwritten digits dataset.

```python
from tpot import TPOTClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

# Hold out 25% of the digits data for testing
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target,
                                                    train_size=0.75, test_size=0.25, random_state=42)

# Evolve a population of 50 pipelines for 5 generations
tpot = TPOTClassifier(generations=5, population_size=50, verbosity=2, random_state=42)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))

# Write the best pipeline found to a standalone Python script
tpot.export('tpot_digits_pipeline.py')
```

Running this code should discover a pipeline that achieves about 98% testing accuracy, and the corresponding Python code should be exported to the `tpot_digits_pipeline.py` file and look similar to the following:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline, make_union
from sklearn.preprocessing import PolynomialFeatures
from tpot.builtins import StackingEstimator
from tpot.export_utils import set_param_recursive

# NOTE: Make sure that the outcome column is labeled 'target' in the data file
tpot_data = pd.read_csv('PATH/TO/DATA/FILE', sep='COLUMN_SEPARATOR', dtype=np.float64)
features = tpot_data.drop('target', axis=1)
training_features, testing_features, training_target, testing_target = \
            train_test_split(features, tpot_data['target'], random_state=42)

# Average CV score on the training set was: 0.9799428471757372
exported_pipeline = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False, interaction_only=False),
    StackingEstimator(estimator=LogisticRegression(C=0.1, dual=False, penalty="l1")),
    RandomForestClassifier(bootstrap=True, criterion="entropy", max_features=0.35000000000000003, min_samples_leaf=20, min_samples_split=19, n_estimators=100)
)
# Fix random state for all the steps in exported pipeline
set_param_recursive(exported_pipeline.steps, 'random_state', 42)

exported_pipeline.fit(training_features, training_target)
results = exported_pipeline.predict(testing_features)
```

### Regression

Similarly, TPOT can optimize pipelines for regression problems. Below is a minimal working example with the practice Boston housing prices data set.

```python
from tpot import TPOTRegressor
# Note: load_boston was removed in scikit-learn 1.2, so this example
# needs an older scikit-learn release to run as written.
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split

# Hold out 25% of the housing data for testing
housing = load_boston()
X_train, X_test, y_train, y_test = train_test_split(housing.data, housing.target,
                                                    train_size=0.75, test_size=0.25, random_state=42)

# Evolve a population of 50 pipelines for 5 generations
tpot = TPOTRegressor(generations=5, population_size=50, verbosity=2, random_state=42)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))

# Write the best pipeline found to a standalone Python script
tpot.export('tpot_boston_pipeline.py')
```

This should result in a pipeline that achieves about 12.77 mean squared error (MSE), and the Python code in `tpot_boston_pipeline.py` should look similar to:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from tpot.export_utils import set_param_recursive

# NOTE: Make sure that the outcome column is labeled 'target' in the data file
tpot_data = pd.read_csv('PATH/TO/DATA/FILE', sep='COLUMN_SEPARATOR', dtype=np.float64)
features = tpot_data.drop('target', axis=1)
training_features, testing_features, training_target, testing_target = \
            train_test_split(features, tpot_data['target'], random_state=42)

# Average CV score on the training set was: -10.812040755234403
exported_pipeline = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False, interaction_only=False),
    ExtraTreesRegressor(bootstrap=False, max_features=0.5, min_samples_leaf=2, min_samples_split=3, n_estimators=100)
)
# Fix random state for all the steps in exported pipeline
set_param_recursive(exported_pipeline.steps, 'random_state', 42)

exported_pipeline.fit(training_features, training_target)
results = exported_pipeli