[![PyPI Shield](https://img.shields.io/pypi/v/ballet.svg)](https://pypi.org/project/ballet)
[![Run Tests](https://github.com/HDI-Project/ballet/workflows/Run%20Tests/badge.svg)](https://github.com/HDI-Project/ballet/actions?query=workflow%3A%22Run+Tests%22)
[![codecov Shield](https://codecov.io/gh/HDI-Project/ballet/branch/master/graph/badge.svg)](https://codecov.io/gh/HDI-Project/ballet)
# ballet
A **light**weight framework for collaborative, open-source data science
projects through **feat**ure engineering.
- Free software: MIT license
- Documentation: https://hdi-project.github.io/ballet
- Homepage: https://github.com/HDI-Project/ballet
## Overview
Do you develop machine learning models? Do you work by yourself or on a team?
Do you share notebooks or commit code to a shared repository? In contrast to
successful, massively collaborative, open-source projects like the Linux kernel,
the Rails framework, Firefox, GNU, or TensorFlow, most data science projects are
developed by just a handful of people. But imagine if the open-source community
could apply its ingenuity and determination to collaboratively developing data
science projects that predict the incidence of disease in a population, whether
vulnerable children will be evicted from their homes, or whether learners will
drop out of online courses.
Our vision is to make collaborative data science possible by making it more
like open-source software development. Our approach is based on decomposing the
data science process into modular patches (standalone units of contribution that
can be intelligently combined), representing objects like "feature",
"labeling function", or "prediction task definition". Collaborators work in
parallel to write patches and submit them to a repo. Our software framework
provides the underlying functionality to merge high-quality contributions,
collect modules from the file system, and compose the accepted contributions
into a single product. It also provides a familiar notebook-based development
experience that is friendly to data scientists and other contributors with
little open-source experience. We don't require any computing infrastructure
beyond what is commonly used in open-source software development.
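To make "collect modules from the file system" concrete, here is a minimal sketch in Python. It assumes a hypothetical layout in which each contribution is a standalone `.py` file somewhere under a contrib directory and exposes a module-level variable named `feature`. The function name `collect_contributions` and that naming convention are illustrative only; Ballet's real collector may use a different layout and different discovery rules.

```python
import importlib.util
import tempfile
from pathlib import Path


def collect_contributions(contrib_dir, attr="feature"):
    """Import every module under contrib_dir and gather the object named `attr`.

    Hypothetical convention: each contributed patch is a standalone .py file
    that defines a module-level variable (by default called `feature`).
    """
    collected = []
    # Sort for a deterministic order, so the composed product is reproducible.
    for path in sorted(Path(contrib_dir).rglob("*.py")):
        if path.name.startswith("_"):
            continue  # skip __init__.py and private helper modules
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # execute the contributor's module
        if hasattr(module, attr):
            collected.append(getattr(module, attr))
    return collected


# Usage: two contributors, each with one patch in their own subdirectory.
with tempfile.TemporaryDirectory() as d:
    Path(d, "user_alice").mkdir()
    Path(d, "user_alice", "feature_age.py").write_text("feature = 'age_feature'\n")
    Path(d, "user_bob").mkdir()
    Path(d, "user_bob", "feature_income.py").write_text("feature = 'income_feature'\n")
    print(collect_contributions(d))  # ['age_feature', 'income_feature']
```

Because each patch lives in its own file, contributors can work in parallel without editing shared code, and the framework only needs a discovery rule to assemble them.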
Currently, Ballet focuses on supporting the collaborative development of
*feature engineering pipelines*, an important part of many data science
projects. Individual features are represented as separate Python modules, each
declaring the subset of a dataframe that it operates on and a
scikit-learn-style learned transformer that extracts feature values from the
raw data. Ballet collects individual features and composes them into a
feature engineering pipeline. At any point, a project built on Ballet can be
installed for end-to-end feature engineering on new data instances for the
same problem. How do we ensure the feature engineering pipeline is always
useful? Ballet thoroughly validates proposed features for correctness and
machine learning performance, using an extensive test suite and a novel
streaming feature definition selection algorithm. Accepted features can be
merged into projects automatically by the Ballet GitHub app.
<img src="./docs/_static/feature_lifecycle.png" alt="Ballet Feature Lifecycle" width="400" />
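The sketch below illustrates the idea of a feature as a declared input plus a learned, scikit-learn-style transformer (an object with `fit` and `transform`), composed into a pipeline that is fit once and then applied to new data. It is a hypothetical, dependency-free stand-in: a plain `dict` of lists plays the role of the dataframe, and `Feature`, `FeaturePipeline`, `MeanImputer`, `Ratio`, and `check_feature` are illustrative names, not Ballet's API. `check_feature` runs only a few basic correctness checks; Ballet's actual test suite is more extensive, and its streaming feature definition selection algorithm is not modeled here.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Sequence

Table = Dict[str, list]  # hypothetical stand-in for a dataframe: column -> values


class MeanImputer:
    """Toy learned transformer: fit() learns the mean of the observed values."""

    def fit(self, values, y=None):
        observed = [v for v in values if v is not None]
        self.mean_ = sum(observed) / len(observed)
        return self

    def transform(self, values):
        return [self.mean_ if v is None else v for v in values]


class Ratio:
    """Stateless transformer over two input columns (numerator, denominator)."""

    def fit(self, columns, y=None):
        return self

    def transform(self, columns):
        num, den = columns
        return [n / d for n, d in zip(num, den)]


@dataclass
class Feature:
    """Hypothetical feature definition, not Ballet's own class."""
    input: Any        # a column name, or a list of column names
    transformer: Any  # object with fit(X, y) and transform(X)
    name: str = ""


def select(table: Table, input: Any):
    """Return the subset of the table that a feature declares it operates on."""
    if isinstance(input, str):
        return table[input]
    return [table[col] for col in input]


class FeaturePipeline:
    """Compose independent features: fit each one, then transform new data."""

    def __init__(self, features: Sequence[Feature]):
        self.features = list(features)

    def fit(self, table: Table, y=None):
        for f in self.features:
            f.transformer.fit(select(table, f.input), y)
        return self

    def transform(self, table: Table) -> Dict[str, list]:
        return {f.name: f.transformer.transform(select(table, f.input))
                for f in self.features}


def check_feature(feature: Feature, table: Table) -> List[str]:
    """Return a list of problems; an empty list means these basic checks passed."""
    n_rows = len(next(iter(table.values())))
    try:
        X = select(table, feature.input)
        values = feature.transformer.fit(X).transform(X)
    except Exception as exc:
        return [f"fit/transform raised {exc!r}"]
    problems = []
    if len(values) != n_rows:
        problems.append(f"expected {n_rows} values, got {len(values)}")
    if any(v is None for v in values):
        problems.append("output contains missing values")
    return problems


# Usage: fit on development data, then apply to new instances of the same problem.
train = {"lot_area": [8000, None, 10000], "rooms": [4, 5, 8], "baths": [2, 1, 2]}
pipeline = FeaturePipeline([
    Feature("lot_area", MeanImputer(), "lot_area_imputed"),
    Feature(["rooms", "baths"], Ratio(), "rooms_per_bath"),
]).fit(train)
new = {"lot_area": [None, 7000], "rooms": [6, 3], "baths": [3, 1]}
print(pipeline.transform(new))
# {'lot_area_imputed': [9000.0, 7000], 'rooms_per_bath': [2.0, 3.0]}
```

Because each transformer learns its parameters (here, the column mean) during `fit`, the composed pipeline can be reused on new data without refitting, which is what makes an installed project usable for end-to-end feature engineering.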
## Next steps
*Are you a data owner or project maintainer who wants to organize a
collaboration?*
👉 Check out the [Ballet Maintainer Guide](https://hdi-project.github.io/ballet/maintainer_guide.html)
*Are you a data scientist or enthusiast who wants to join a collaboration?*
👉 Check out the [Ballet Contributor Guide](https://hdi-project.github.io/ballet/contributor_guide.html)
*Want to learn how Ballet enables Better Feature Engineering™️?*
👉 Check out the [Feature Engineering Guide](https://hdi-project.github.io/ballet/feature_engineering_guide.html)
*Want to see a demo collaboration in progress and maybe even participate yourself?*
👉 Check out the [ballet-predict-house-prices](https://github.com/HDI-Project/ballet-predict-house-prices) project