# Contents
- [Contents](#contents)
- [DeepFM Description](#deepfm-description)
- [Model Architecture](#model-architecture)
- [Dataset](#dataset)
- [Environment Requirements](#environment-requirements)
- [Quick Start](#quick-start)
- [Script Description](#script-description)
- [Script and Sample Code](#script-and-sample-code)
- [Script Parameters](#script-parameters)
- [Training Process](#training-process)
- [Training](#training)
- [Distributed Training](#distributed-training)
- [Evaluation Process](#evaluation-process)
- [Evaluation](#evaluation)
- [Inference Process](#inference-process)
- [Export MindIR](#export-mindir)
- [Infer on Ascend310](#infer-on-ascend310)
- [result](#result)
- [Model Description](#model-description)
- [Performance](#performance)
- [Training Performance](#training-performance)
- [Inference Performance](#inference-performance)
- [Description of Random Situation](#description-of-random-situation)
- [ModelZoo Homepage](#modelzoo-homepage)
# [DeepFM Description](#contents)
Learning sophisticated feature interactions behind user behaviors is critical for maximizing CTR in recommender systems. Despite great progress, existing methods tend to have a strong bias towards low- or high-order interactions, or require expert feature engineering. The DeepFM paper shows that it is possible to derive an end-to-end learning model that emphasizes both low- and high-order feature interactions. The proposed model, DeepFM, combines the power of factorization machines for recommendation and deep learning for feature learning in a new neural network architecture.
[Paper](https://arxiv.org/abs/1703.04247): Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li, Xiuqiang He. DeepFM: A Factorization-Machine based Neural Network for CTR Prediction
# [Model Architecture](#contents)
DeepFM consists of two components. The FM component is a factorization machine, which learns pairwise feature interactions for recommendation. The deep component is a feed-forward neural network, which learns high-order feature interactions.
The FM and deep components share the same raw input feature vector, which enables DeepFM to learn low- and high-order feature interactions simultaneously from the raw input features.
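The two components can be sketched in a few lines of NumPy. This is an illustrative forward pass for a single sample, not the repo's implementation (which lives in `src/deepfm.py`); the second-order FM term uses the standard "square of sum minus sum of squares" identity.

```python
import numpy as np

def deepfm_forward(x_idx, x_val, emb, w_linear, b, dense_ws):
    """Illustrative DeepFM forward pass for one sample.

    x_idx:    (F,) feature ids, one per field
    x_val:    (F,) feature values (1.0 for one-hot categorical fields)
    emb:      (V, K) embedding table shared by the FM and deep parts
    w_linear: (V,) first-order weights
    b:        scalar bias
    dense_ws: list of weight matrices for the deep MLP
    """
    # First-order (linear) term
    first = np.dot(w_linear[x_idx], x_val)

    # Second-order FM term: 0.5 * [(sum v)^2 - sum(v^2)]
    v = emb[x_idx] * x_val[:, None]            # (F, K) scaled embeddings
    square_of_sum = np.sum(v, axis=0) ** 2
    sum_of_square = np.sum(v ** 2, axis=0)
    second = 0.5 * np.sum(square_of_sum - sum_of_square)

    # Deep part: the same embeddings, flattened, through a ReLU MLP
    h = v.reshape(-1)
    for w in dense_ws:
        h = np.maximum(h @ w, 0.0)
    deep = np.sum(h)

    # Sigmoid over the sum of all terms gives the predicted CTR
    logit = b + first + second + deep
    return 1.0 / (1.0 + np.exp(-logit))
```

Because both terms read from the same embedding table `emb`, training the deep part also refines the representations used by the FM part, which is the key design point of the architecture.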
# [Dataset](#contents)
- [1] A dataset used in Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li, Xiuqiang He. DeepFM: A Factorization-Machine based Neural Network for CTR Prediction[J]. 2017.
# [Environment Requirements](#contents)
- Hardware (Ascend/GPU/CPU)
- Prepare hardware environment with Ascend, GPU, or CPU processor.
- Framework
- [MindSpore](https://www.mindspore.cn/install/en)
- For more information, please check the resources below:
- [MindSpore Tutorials](https://www.mindspore.cn/tutorials/en/master/index.html)
- [MindSpore Python API](https://www.mindspore.cn/docs/api/en/master/index.html)
# [Quick Start](#contents)
After installing MindSpore via the official website, you can start training and evaluation as follows:
- preprocess dataset
```bash
# download the dataset
# Please refer to [1] to obtain the download link
mkdir -p data/origin_data && cd data/origin_data
wget DATA_LINK
tar -zxvf dac.tar.gz
cd ../..
# preprocess the dataset (run from the repository root)
python src/preprocess_data.py --data_path=./data/ --dense_dim=13 --slot_dim=26 --threshold=100 --train_line_count=45840617 --skip_id_convert=0
```
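Roughly, the preprocessing script builds a vocabulary for the categorical slots and collapses values that appear fewer than `--threshold` times into a shared "rare" id. A minimal sketch of that idea (the function names here are illustrative, not the actual API of `src/preprocess_data.py`):

```python
from collections import Counter

def build_vocab(column_values, threshold=100):
    """Map each categorical value seen at least `threshold` times to its
    own id; everything rarer (or unseen) collapses into the shared id 0."""
    counts = Counter(column_values)
    frequent = (v for v, c in counts.items() if c >= threshold)
    return {v: i for i, v in enumerate(frequent, start=1)}

def encode(value, vocab):
    # id 0 is reserved for rare/unknown values
    return vocab.get(value, 0)
```

Thresholding keeps the embedding table small and prevents the model from overfitting to categorical values it has seen only a handful of times.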
- running on Ascend
```shell
# run training example
python train.py \
--dataset_path='dataset/train' \
--ckpt_path='./checkpoint' \
--eval_file_name='auc.log' \
--loss_file_name='loss.log' \
--device_target=Ascend \
--do_eval=True > ms_log/output.log 2>&1 &
# run distributed training example
bash scripts/run_distribute_train.sh 8 /dataset_path /rank_table_8p.json
# run evaluation example
python eval.py \
--dataset_path='dataset/test' \
--checkpoint_path='./checkpoint/deepfm.ckpt' \
--device_target=Ascend > ms_log/eval_output.log 2>&1 &
# OR
bash scripts/run_eval.sh 0 Ascend /dataset_path /checkpoint_path/deepfm.ckpt
```
For distributed training, a hccl configuration file with JSON format needs to be created in advance.
Please follow the instructions in the link below:
[hccl tools](https://gitee.com/mindspore/models/tree/r1.5/utils/hccl_tools).
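For reference, a rank table for two devices on one server typically looks like the sketch below. The IP addresses are placeholders; generate the real file with the hccl tools script rather than writing it by hand, since the exact fields depend on your environment.

```json
{
  "version": "1.0",
  "server_count": "1",
  "server_list": [
    {
      "server_id": "10.0.0.1",
      "device": [
        {"device_id": "0", "device_ip": "192.1.27.6", "rank_id": "0"},
        {"device_id": "1", "device_ip": "192.2.27.6", "rank_id": "1"}
      ],
      "host_nic_ip": "reserve"
    }
  ],
  "status": "completed"
}
```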
- running on GPU
```shell
# run training example
python train.py \
--dataset_path='dataset/train' \
--ckpt_path='./checkpoint' \
--eval_file_name='auc.log' \
--loss_file_name='loss.log' \
--device_target=GPU \
--do_eval=True > ms_log/output.log 2>&1 &
# run distributed training example
bash scripts/run_distribute_train_gpu.sh 8 /dataset_path
# run evaluation example
python eval.py \
--dataset_path='dataset/test' \
--checkpoint_path='./checkpoint/deepfm.ckpt' \
--device_target=GPU > ms_log/eval_output.log 2>&1 &
# OR
bash scripts/run_eval.sh 0 GPU /dataset_path /checkpoint_path/deepfm.ckpt
```
- running on CPU
```shell
# run training example
python train.py \
--dataset_path='dataset/train' \
--ckpt_path='./checkpoint' \
--eval_file_name='auc.log' \
--loss_file_name='loss.log' \
--device_target=CPU \
--do_eval=True > ms_log/output.log 2>&1 &
# run evaluation example
python eval.py \
--dataset_path='dataset/test' \
--checkpoint_path='./checkpoint/deepfm.ckpt' \
--device_target=CPU > ms_log/eval_output.log 2>&1 &
```
- Running on [ModelArts](https://support.huaweicloud.com/modelarts/)
```bash
# Train 8p with Ascend
# (1) Perform a or b.
# a. Set "enable_modelarts=True" on default_config.yaml file.
# Set "distribute=True" on default_config.yaml file.
# Set "dataset_path='/cache/data'" on default_config.yaml file.
# Set "train_epochs: 5" on default_config.yaml file.
# (optional)Set "checkpoint_url='s3://dir_to_your_pretrained/'" on default_config.yaml file.
# Set other parameters on default_config.yaml file you need.
# b. Add "enable_modelarts=True" on the website UI interface.
# Add "distribute=True" on the website UI interface.
# Add "dataset_path=/cache/data" on the website UI interface.
# Add "train_epochs: 5" on the website UI interface.
# (optional)Add "checkpoint_url='s3://dir_to_your_pretrained/'" on the website UI interface.
# Add other parameters on the website UI interface.
# (2) Prepare model code
# (3) Upload or copy your pretrained model to S3 bucket if you want to finetune.
# (4) Perform a or b. (suggested option a)
# a. First, zip MindRecord dataset to one zip file.
#          Second, upload your zip dataset to S3 bucket. (You could also upload the original MindRecord dataset, but uploading many small files can be very slow.)
# b. Upload the original dataset to S3 bucket.
#        (Dataset conversion occurs during the training process and costs a lot of time; it happens every time you train.)
# (5) Set the code directory to "/path/deepfm" on the website UI interface.
# (6) Set the startup file to "train.py" on the website UI interface.
# (7) Set the "Dataset path" and "Output file path" and "Job log path" to your path on the website UI interface.
# (8) Create your job.
#
# Train 1p with Ascend
# (1) Perform a or b.
# a. Set "enable_modelarts=True" on default_config.yaml file.
# Set "dataset_path='/cache/data'" on default_config.yaml file.
# Set "train_epochs: 5" on default_config.yaml file.
# (optional)Set "checkpoint_url='s3://dir_to_your_pretrained/'" on default_config.yaml file.
# Set other parameters on default_config.yaml file you need.
# b. Add "enable_modelarts=True" on the website UI interface.
#          Add "dataset_path='/cache/data'" on the website UI interface.
#          Add "train_epochs: 5" on the website UI interface.
#          (optional)Add "checkpoint_url='s3://dir_to_your_pretrained/'" on the website UI interface.
#          Add other parameters on the website UI interface.
# (2) Prepare model code
# (3) Upload or copy your pretrained model to S3 bucket if you want to finetune.
# (4) Upload the zipped MindRecord dataset (or the original dataset) to S3 bucket, as in the 8p case.
# (5) Set the code directory to "/path/deepfm" on the website UI interface.
# (6) Set the startup file to "train.py" on the website UI interface.
# (7) Set the "Dataset path" and "Output file path" and "Job log path" to your path on the website UI interface.
# (8) Create your job.
```