# MatchBox
Industrial recommender systems generally have two main stages: matching and ranking. In the first stage, candidate item matching (also known as candidate retrieval) aims for efficient, high-recall retrieval from a large item corpus. MatchBox is an open-source library for candidate item matching, designed for configurability, tunability, and reproducibility.
## Model Zoo
| Publication | Model | Paper | Benchmark |
|:-----------:|:--------------:|:----------------------------------------------------------------- |:-------------:|
| UAI'09 | [MF-BPR](./model_zoo/MF) | [BPR: Bayesian Personalized Ranking from Implicit Feedback](https://arxiv.org/ftp/arxiv/papers/1205/1205.2618.pdf) | [:arrow_upper_right:](./model_zoo/MF/config) |
| RecSys'16 | [YouTubeNet](./model_zoo/YouTubeNet) | [Deep Neural Networks for YouTube Recommendations](https://dl.acm.org/doi/10.1145/2959100.2959190) | [:arrow_upper_right:](./model_zoo/YouTubeNet/config) |
| CIKM'21 | [MF-CCL](./model_zoo/MF) / [SimpleX](./model_zoo/SimpleX) | [SimpleX: A Simple and Strong Baseline for Collaborative Filtering](https://arxiv.org/abs/2109.12613) | [:arrow_upper_right:](./model_zoo/SimpleX/config) |
## Dependency
We recommend the following environment, which is the only setup in which we have tested MatchBox:
+ CUDA 10.0
+ Python 3.6
+ PyTorch 1.0
+ PyYAML
+ pandas
+ scikit-learn
+ numpy
+ h5py
+ tqdm
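As a quick sanity check (assuming the packages above are already installed), you can verify the environment with a short snippet like the one below; the printed versions should roughly match the list above.
```python
# Quick sanity check of the environment described above.
import sys

import h5py
import numpy
import pandas
import sklearn
import torch
import tqdm
import yaml  # provided by PyYAML

print("Python :", sys.version.split()[0])
print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```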
## Get Started
The code workflow is structured as follows:
```python
# Set the dataset config and model config
feature_cols = [{...}] # define feature columns
label_col = {...} # define label column
params = {...} # set data params and model params
# Set the feature encoding specs
feature_encoder = FeatureEncoder(feature_cols, label_col, ...) # define the feature encoder
datasets.build_dataset(feature_encoder, ...) # fit feature_encoder and build dataset
# Load data generators
train_gen, valid_gen, test_gen = datasets.h5_generator(feature_encoder, ...)
# Define a model
model = SimpleX(...)
# Train the model
model.fit(train_gen, valid_gen, ...)
# Evaluation
model.evaluate(test_gen)
```
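For concreteness, the placeholders above could be filled in along the following lines. This is only a minimal sketch for an implicit-feedback dataset with user_id and item_id columns; the field names, dictionary keys, file paths, and parameter values are illustrative assumptions rather than MatchBox's exact schema.
```python
# Illustrative sketch only: hypothetical feature/label specs and params
# for an implicit-feedback matching dataset. Keys and values here are
# assumptions, not MatchBox's definitive configuration schema.
feature_cols = [
    {"name": "user_id", "active": True, "dtype": "str", "type": "categorical"},
    {"name": "item_id", "active": True, "dtype": "str", "type": "categorical"},
]
label_col = {"name": "label", "dtype": "float"}  # 1 for an observed user-item interaction

params = {
    "train_data": "./data/yelp18_m1/train.csv",  # hypothetical file paths
    "valid_data": "./data/yelp18_m1/valid.csv",
    "test_data": "./data/yelp18_m1/test.csv",
    "embedding_dim": 64,       # illustrative hyper-parameters
    "learning_rate": 1e-3,
    "batch_size": 512,
    "epochs": 100,
}
```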
### Run the code
To reproduce the experiment results, run the benchmarking scripts with the corresponding configs as follows.
+ --config: The directory where the dataset config and model config are located.
+ --expid: The experiment id defined in a model config file, which selects a group of hyper-parameters (see the illustrative sketch at the end of this section).
+ --gpu: The GPU index used for the experiment; set -1 to run on CPU.
```bash
# Prepare the yelp18_m1 dataset
cd data/Yelp18/yelp18_m1
python matchbox_convert_data.py

# Go back to the repository root, then run the SimpleX benchmarks
cd ../../../model_zoo/SimpleX
python run_expid.py --config ./config/SimpleX_yelp18_m1 --expid SimpleX_yelp18_m1 --gpu 0
...
python run_expid.py --config ./config/SimpleX_amazonbooks_m1 --expid SimpleX_amazonbooks_m1 --gpu 0
python run_expid.py --config ./config/SimpleX_gowalla_m1 --expid SimpleX_gowalla_m1 --gpu 0
```
The running logs are also available in each config directory.
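For reference, --expid points to an entry in the model config that groups a set of hyper-parameters under one experiment id. The sketch below illustrates this idea as a Python dict dumped to YAML; the key names and values are assumptions for illustration and may not match the exact schema used in the config directories.
```python
import yaml  # PyYAML, listed in the dependencies

# Hypothetical model config entry keyed by an experiment id. The key
# names are illustrative assumptions, not MatchBox's exact schema.
model_config = {
    "SimpleX_yelp18_m1": {
        "model": "SimpleX",
        "dataset_id": "yelp18_m1",
        "embedding_dim": 64,
        "learning_rate": 1e-3,
        "batch_size": 512,
    }
}

print(yaml.safe_dump(model_config, sort_keys=False))
```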