# automl_research
Code repository for AutoML research supporting the `Foreshadow` project.
---
## Feature Type Inference (Intent Resolution)
When analyzing the raw feature columns of a data set in `Foreshadow`, the type (intent) of each feature column has to be known a priori so that the appropriate feature transformation can be selected downstream.
The goal of this research project is to build an intent resolver that separates numerical from categorical raw feature columns. More classes may be added in the future.
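A column's storage type alone does not settle its intent. The sketch below is a hypothetical baseline, not how `IntentResolver` works: the function `naive_intent` and its `max_categories` cut-off are illustrative assumptions, the data values are made up, and unlike the resolver it never outputs "Neither".

```python
import pandas as pd


def naive_intent(col: pd.Series, max_categories: int = 5) -> str:
    """Hypothetical dtype-plus-cardinality baseline (NOT IntentResolver's method).

    A column is "Categorical" if most of its values fail to parse as numbers,
    or if it parses as numbers but has at most `max_categories` distinct values;
    otherwise it is "Numerical".
    """
    parsed = pd.to_numeric(col, errors="coerce")
    if parsed.isna().mean() > 0.5:  # mostly non-numeric text
        return "Categorical"
    if parsed.nunique(dropna=True) <= max_categories:  # only a few distinct codes
        return "Categorical"
    return "Numerical"


# Toy data with made-up values: "rating" is stored as int64 but behaves like a category.
df = pd.DataFrame({
    "price": [12.5, 3.2, 8.8, 15.0, 7.1, 9.9, 4.4, 11.3, 6.6, 2.2],
    "rating": [1, 2, 1, 3, 2, 1, 3, 2, 1, 3],
    "city": ["A", "B", "A", "C", "B", "A", "C", "B", "A", "C"],
})
print(df.dtypes.to_dict())
intents = {name: naive_intent(df[name]) for name in df.columns}
print(intents)  # {'price': 'Numerical', 'rating': 'Categorical', 'city': 'Categorical'}
```

Judged by dtype alone, `rating` would be treated as numerical, yet its few repeated values suggest a categorical intent. A hand-picked cut-off like `max_categories` is brittle, which is the kind of rule a dedicated intent resolver is meant to improve on.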
### Installation
This library was developed on Python 3.6.8 and uses the same package dependencies as `Foreshadow` as of Oct. 17, 2019.
To install additional package dependencies for research-based functionalities, run the following:
```
pip install -r research_requirements.txt
```
### Usage
The functionality of this library is exposed through the `IntentResolver` class API, as shown below. For each raw feature column, the class predicts "Numerical", "Categorical" or "Neither". Predictions whose confidence falls below the `threshold` parameter of the `.predict` method (default = 0.7) are set to "Neither".
```
import pandas as pd
from lib import IntentResolver

# Load the raw data set
raw = pd.read_csv('path_to_dataset.csv', encoding='latin', low_memory=False)

# Initialise the resolver on the raw data set
resolver = IntentResolver(raw)

# Predict intent
# Outputs a pd.Series of predicted intents
intents = resolver.predict()

# OR: Predict intent with confidences at a lower threshold (i.e. less rigorous prediction)
# Outputs a pd.DataFrame of predicted intents and confidences
intents_with_conf = resolver.predict(threshold=0.6, return_conf=True)
```
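The thresholding rule described above can be illustrated with plain pandas. This is a minimal sketch, not the library's code: it assumes a per-class confidence table already exists (`proba`, a hypothetical name, e.g. the output of some classifier's `predict_proba`), the function `apply_threshold` is hypothetical, the confidences are made up, and the `intent`/`confidence` column names are assumptions rather than what `IntentResolver.predict(return_conf=True)` actually returns.

```python
import pandas as pd


def apply_threshold(proba: pd.DataFrame, threshold: float = 0.7, return_conf: bool = False):
    """Hypothetical sketch of the "below threshold -> Neither" rule.

    `proba` holds one row per raw feature column and one column per intent
    class, with the model's confidence for each class.
    """
    conf = proba.max(axis=1)        # highest confidence per feature column
    intent = proba.idxmax(axis=1)   # class that achieved that confidence
    # Keep the predicted class only if its confidence reaches the threshold.
    intent = intent.where(conf >= threshold, "Neither")
    if return_conf:
        return pd.DataFrame({"intent": intent, "confidence": conf})
    return intent


# Made-up confidences for three feature columns.
proba = pd.DataFrame(
    {"Numerical": [0.95, 0.35, 0.65], "Categorical": [0.05, 0.65, 0.35]},
    index=["price", "city", "rating"],
)
strict = apply_threshold(proba)                                    # default threshold 0.7
relaxed = apply_threshold(proba, threshold=0.6, return_conf=True)  # less rigorous
print(strict.tolist())             # ['Numerical', 'Neither', 'Neither']
print(relaxed["intent"].tolist())  # ['Numerical', 'Categorical', 'Numerical']
```

Lowering the threshold trades rigour for coverage: at 0.7 two of the three columns fall back to "Neither", while at 0.6 every column receives a concrete intent.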
### Data Sources
- [Original Meta Data Set (OMDS)](https://github.com/pvn25/ML-Data-Prep-Zoo/tree/master/ML%20Schema%20Inference/Data)
- [360 Raw Data Sets (RDSs)](https://drive.google.com/file/d/1HGmDRBSZg-Olym2envycHPkb3uwVWHJX/view) (Sourced from the [GitHub README.md](https://github.com/pvn25/ML-Data-Prep-Zoo/tree/master/ML%20Schema%20Inference))
### References
1. V. Shah, P. Kumar, K. Yang, and A. Kumar, “Towards semi-automatic ML feature type inference.”
2. N. Hynes, D. Sculley, and M. Terry, “The data linter: Lightweight, automated sanity checking for ML data sets,” in NIPS MLSys Workshop, 2017.