2021 WeChat Big Data Challenge semi-final Rank 23 source code + study notes.zip (2021 WeChat Big Data Challenge resources, CSDN Library)

23 files in total, including 11 .py files, 6 .sh files, and 2 .txt files.


Uploaded 2024-01-14 18:05:26 · 39 KB ZIP


2021 WeChat Big Data Challenge semi-final Rank 23 source code + study notes.zip (23 files)

code_20105/
    src/
        evaluation.py (7 KB)
        train/
            run_submit.py (4 KB)
            .ipynb_checkpoints/
                run_submit-checkpoint.py (4 KB)
        inference1.py (5 KB)
        inference.py (4 KB)
        prepare/
            get_features.py (5 KB)
            .ipynb_checkpoints/
                get_features-checkpoint.py (5 KB)
        model/
            mmoe.py (20 KB)
            __pycache__/
                mmoe.cpython-36.pyc (11 KB)
            .ipynb_checkpoints/
                mmoe-checkpoint.py (20 KB)
        __pycache__/
            evaluation.cpython-36.pyc (5 KB)
        .ipynb_checkpoints/
            inference1-checkpoint.py (5 KB)
            inference-checkpoint.py (4 KB)
    inference.sh (885 B)
    init.sh (31 B)
    requirements.txt (138 B)
    train.sh (1 KB)
    .ipynb_checkpoints/
        requirements-checkpoint.txt (138 B)
        train-checkpoint.sh (1 KB)
        inference-checkpoint.sh (885 B)
        init-checkpoint.sh (31 B)
        README-checkpoint.md (4 KB)
    README.md (3 KB)

# WX challenge

## **1. Environment dependencies**

- Python 3.6.5
- numba 0.53.1
- numpy 1.18.5
- pandas 1.0.5
- scikit-learn 0.23.1
- tensorflow-gpu 1.13.1
- tqdm 4.46.1
- scipy 1.5.0
- deepctr 0.8.6
- gensim 3.8

## **2. Directory structure**

```
./
├── README.md
├── requirements.txt, python package requirements
├── init.sh, script for installing package requirements
├── train.sh, script for preparing train/inference data and training models, including pretrained models
├── inference.sh, main function for inference on test dataset
├── src
│   ├── prepare, codes for preparing train/inference dataset
│   │   ├── get_features.py
│   ├── model, codes for model architecture
│   │   ├── mmoe.py
│   ├── train, codes for training
│   │   ├── run_submit.py
│   ├── evaluation.py, main function for evaluation
│   ├── inference.py
│   ├── inference1.py
├── data
│   ├── wedata
│   │   ├── wechat_algo_data1, dataset of the competition
│   │   ├── wechat_algo_data2, dataset of the competition
│   ├── submission, prediction result after running inference.sh
│   ├── model, model files
│   ├── feature, feature files
```

## **3. How to run**

- Enter the project directory: `cd /home/tione/notebook/wbdc2021-semi`
- Install the environment: run `sh init.sh` inside the conda_tensorflow_py3 virtual environment
- Prepare the data and train the models: `sh train.sh`
- Run prediction and generate the result file: `sh inference.sh /home/tione/notebook/wbdc2021-semi/data/wedata/wechat_algo_data2/test_b.csv`

## **4. Model and features**

- Model: [MMOE](https://dl.acm.org/doi/pdf/10.1145/3219819.3220007)
- Parameters:
    - batch_size: 4092
    - emded_dim: 512
    - num_epochs: 5
    - learning_rate: 0.01
- Features:
    - ID features such as userid, feedid, authorid, bgm_singer_id, and bgm_song_id
    - keyword and tag features
    - video category and author category
    - userid sequence embeddings
    - feed clusters, author clusters, and user clusters

## **5. Performance**

- Hardware: 2 × P40 GPUs, 48 GB GPU memory, 14 CPU cores, 112 GB RAM
- Prediction time
    - Total prediction time: 1791 s
    - Average prediction time per 2,000 samples for a single target action: 120.344 ms

## **6. Code notes**

The model prediction code is located at:

| Path | Lines | Content |
| :--- | :--- | :--- |
| src/inference.py | 82 - 96 | `pred_ans = train_model.predict(test_model_input, batch_size=batch_size * 100)` |
| src/inference1.py | 93 - 108 | `pred_ans = train_model.predict(test_model_input, batch_size=batch_size * 100)` |

## **7. References**

* Ma J, Zhao Z, Yi X, et al. Modeling task relationships in multi-task learning with multi-gate mixture-of-experts[C]//Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2018: 1930-1939.
* Weichen Shen. (2017). DeepCTR: Easy-to-use, Modular and Extendible package of deep-learning based CTR models. https://github.com/shenweichen/deepctr.
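
For readers unfamiliar with the MMoE model named in section 4, the idea from Ma et al. can be sketched in a few lines of NumPy: all tasks share a pool of experts, and each task has its own softmax gate that mixes the expert outputs before a task-specific tower. This is only an illustrative sketch, not the code in `src/model/mmoe.py`. The function name `mmoe_forward`, the single-layer experts and towers, the omitted biases, and all sizes below are assumptions made for the example.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def mmoe_forward(x, expert_w, gate_w, tower_w):
    """Hypothetical single-layer MMoE forward pass (biases omitted).

    x:        (batch, d_in)               shared input features
    expert_w: (n_experts, d_in, d_expert) one weight matrix per expert
    gate_w:   (n_tasks, d_in, n_experts)  one gate per task
    tower_w:  (n_tasks, d_expert)         one linear tower per task
    """
    # Every expert sees the same input: ReLU(x @ W_e).
    experts = np.maximum(np.einsum("bi,eih->beh", x, expert_w), 0.0)
    # Each task computes its own softmax weights over the experts.
    gates = softmax(np.einsum("bi,tie->bte", x, gate_w), axis=-1)
    # Task-specific mixture of expert outputs: (batch, n_tasks, d_expert).
    mixed = np.einsum("bte,beh->bth", gates, experts)
    # One logit per task, squashed to a probability.
    logits = np.einsum("bth,th->bt", mixed, tower_w)
    return sigmoid(logits), gates


# Toy sizes chosen only for illustration.
rng = np.random.default_rng(0)
batch, d_in, n_experts, d_expert, n_tasks = 4, 8, 3, 5, 2
x = rng.normal(size=(batch, d_in))
expert_w = rng.normal(scale=0.1, size=(n_experts, d_in, d_expert))
gate_w = rng.normal(scale=0.1, size=(n_tasks, d_in, n_experts))
tower_w = rng.normal(scale=0.1, size=(n_tasks, d_expert))

probs, gates = mmoe_forward(x, expert_w, gate_w, tower_w)
print(probs.shape, gates.shape)
```

Each row of `probs` holds one predicted probability per task (here, per target action), and each task's gate weights sum to 1 across the experts. A real model would stack deeper experts and towers and learn the weights by gradient descent.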
