Graduation project & coursework: a deep-learning CTR prediction model for ad recommendation (.zip resource, CSDN Library)

173 files in total

py: 95

json: 22

index: 11

Tags: deep learning, python

Rated 5 stars (above 95% of resources) · 170 views · uploaded 2024-01-16 17:04:46 · 544 KB ZIP

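Deep-Learning-Based CTR Prediction Model for Ad Recommendation

In an era of information overload, delivering ads precisely to the users most likely to respond is a central challenge for advertisers and internet platforms. Many researchers and engineers therefore build click-through rate (CTR) prediction models with deep learning to make ad delivery more effective. This overview covers the key concepts in the area.

I. Deep learning basics

Deep learning is a branch of machine learning that builds multi-layer nonlinear models loosely inspired by the neural networks of the brain. The main model families are deep neural networks (DNN), convolutional neural networks (CNN), and recurrent neural networks (RNN). In ad recommendation, DNNs paired with embedding layers are well suited to high-dimensional sparse features such as a user's browsing history and search keywords.

II. CTR prediction

CTR prediction estimates the probability that a user clicks a given ad after seeing it. Accurate estimates are critical to ad revenue. Traditional methods such as logistic regression are simple but struggle to capture complex relationships between features, whereas deep models improve accuracy by learning feature representations and interactions automatically.

III. Model architectures

1. Wide & Deep: combines a linear model (the wide part) with a deep neural network (the deep part), balancing memorization and generalization.
2. Factorization Machines (FM): capture second-order feature interactions through factorized latent vectors. FM is a shallow model rather than a deep one, but it often serves as a lightweight component of deep CTR models.
3. Deep Crossing: feeds embedded features through stacked residual layers to learn feature interactions implicitly, which suits high-dimensional sparse data.

IV. Feature engineering

Feature engineering is a key step in improving model performance. In ad recommendation, features fall into three groups: user features (age, gender, historical behavior), ad features (ad category, display time), and context features (device type, geographic location). How features are encoded, combined, and represented (for example one-hot encoding versus embeddings) directly affects model quality.

V. Model training

Common optimizers for fitting the model parameters include stochastic gradient descent (SGD), Adam, and Adagrad. Because clicks are rare and ad data tends to follow a long-tailed distribution, the loss is usually log loss (binary cross-entropy), optionally weighted to offset the imbalance between positive and negative samples.

VI. Evaluation metrics

CTR models are mainly evaluated with AUC (area under the ROC curve), LogLoss (logarithmic loss), and HitRate. AUC measures how well the model ranks clicked impressions above non-clicked ones. LogLoss measures how closely the predicted probabilities match the observed labels. HitRate measures how often the items ranked at the top are ones users actually clicked.

VII. Online serving and real-time updates

In production the model is deployed as an online service that handles requests in real time. It also has to be refreshed regularly to track changes in user behavior. Common strategies are sliding updates over a time window and cumulative updates triggered by sample volume.

In this project you will develop models in Python, possibly with deep learning frameworks such as TensorFlow or PyTorch, and possibly with C++ for system optimization. It offers hands-on practice in applying deep learning to ad recommendation.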
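
The AUC and LogLoss metrics described above can be sketched in a few lines of plain Python. This is a minimal illustration, not code from the archive: the function names `auc` and `log_loss` and the sample labels and probabilities are made up for the example, and the pairwise AUC loop is quadratic, so it is only practical for small samples.

```python
import math


def log_loss(labels, probs, eps=1e-15):
    """Mean binary cross-entropy between 0/1 labels and predicted click probabilities."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip so log(0) never occurs
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


def auc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical evaluation set: 1 = clicked, 0 = not clicked.
labels = [1, 0, 0, 1, 0]
probs = [0.9, 0.2, 0.6, 0.7, 0.1]
print("AUC:", auc(labels, probs))
print("LogLoss:", log_loss(labels, probs))
```

AUC depends only on the ordering of the scores, while LogLoss also penalizes probabilities that are far from the labels, which is why the two are usually reported together.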

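The second-order interaction term of a Factorization Machine, mentioned in the model architecture list above, can be sketched as follows. This is a minimal sketch under stated assumptions: the function names, weights, and latent vectors are invented for illustration and do not come from the archive. It computes the pairwise term twice, once with the direct double loop and once with the standard linear-time reformulation, so the two can be compared.

```python
def fm_pairwise_naive(x, v):
    """Sum over i < j of <v[i], v[j]> * x[i] * x[j], computed pair by pair."""
    total = 0.0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(v[i], v[j]))
            total += dot * x[i] * x[j]
    return total


def fm_pairwise_fast(x, v):
    """Same quantity via 0.5 * sum_f [(sum_i v[i][f] x[i])^2 - sum_i (v[i][f] x[i])^2]."""
    total = 0.0
    for f in range(len(v[0])):
        terms = [v[i][f] * x[i] for i in range(len(x))]
        s = sum(terms)
        total += s * s - sum(t * t for t in terms)
    return 0.5 * total


def fm_predict(x, w0, w, v):
    """FM score: global bias + linear term + second-order interaction term."""
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    return linear + fm_pairwise_fast(x, v)


# Hypothetical one-hot style input with 4 features and 2 latent factors.
x = [1.0, 0.0, 1.0, 1.0]
w0 = 0.1
w = [0.2, -0.3, 0.05, 0.4]
v = [[0.1, 0.2], [0.3, -0.1], [-0.2, 0.4], [0.5, 0.1]]

print("pairwise (naive):", fm_pairwise_naive(x, v))
print("pairwise (fast): ", fm_pairwise_fast(x, v))
print("FM score:        ", fm_predict(x, w0, w, v))
```

The score here is a raw value; for CTR prediction it would typically be passed through a sigmoid to obtain a click probability.
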
Archive contents: 毕设&课程作业_基于深度学习的广告推荐CTR预估模型.zip (173 files)

census_test.csv 3KB

model.ckpt.data-00000-of-00001 96KB

model.ckpt.data-00000-of-00001 36KB

model.ckpt.data-00000-of-00001 33KB

model.ckpt.data-00000-of-00001 32KB

model.ckpt.data-00000-of-00001 76B

model.ckpt.data-00000-of-00001 4B

.DS_Store 10KB

.DS_Store 6KB

expected_graph 27KB

expected_graph 22KB

expected_graph 20KB

expected_graph 16KB

expected_graph 14KB

expected_graph 6KB

expected_graph 5KB

expected_graph 893B

model.ckpt.index 824B

model.ckpt.index 680B

model.ckpt.index 641B

model.ckpt.index 628B

model.ckpt.index 521B

model.ckpt.index 477B

model.ckpt.index 275B

model.ckpt.index 254B

model.ckpt.index 136B

results.json 148B

results.json 147B

results.json 144B

results.json 143B

results.json 142B

results.json 140B

results.json 127B

results.json 66B

tf_version.json 51B

tf_version.json 48B

results.json 20B

README.md 6KB

README.md 3KB

guidelines.md 2KB

din_feature_column.py 23KB

logger.py 16KB

logger_test.py 14KB

reference_data.py 13KB

transformer.py 10KB

movielens.py 9KB

distribution_utils.py 9KB

metric_hook_test.py 9KB

train.py 8KB

utils.py 8KB

train.py 8KB

train.py 7KB

_performance.py 7KB

file_io.py 7KB

train.py 7KB

census_dataset.py 7KB

xdeepfm.py 7KB

din.py 6KB

_base.py 6KB

fibinet.py 6KB

afm.py 6KB

input_fn.py 6KB

file_io_test.py 6KB

mlperf_helper.py 6KB

movielens_dataset.py 6KB

input_fn.py 6KB

# Predicting Income with the Census Income Dataset

## Overview

The [Census Income Data Set](https://archive.ics.uci.edu/ml/datasets/Census+Income) contains over 48,000 samples with attributes including age, occupation, education, and income (a binary label, either `>50K` or `<=50K`). The dataset is split into roughly 32,000 training and 16,000 testing samples.

Here, we use the [wide and deep model](https://research.googleblog.com/2016/06/wide-deep-learning-better-together-with.html) to predict the income labels. The **wide model** is able to memorize interactions with data with a large number of features but not able to generalize these learned interactions on new data. The **deep model** generalizes well but is unable to learn exceptions within the data. The **wide and deep model** combines the two models and is able to generalize while learning exceptions.

For the purposes of this example code, the Census Income Data Set was chosen to allow the model to train in a reasonable amount of time. You'll notice that the deep model performs almost as well as the wide and deep model on this dataset. The wide and deep model truly shines on larger data sets with high-cardinality features, where each feature has millions/billions of unique possible values (which is the specialty of the wide model).

Finally, a key point. As a modeler and developer, think about how this dataset is used and the potential benefits and harm a model's predictions can cause. A model like this could reinforce societal biases and disparities. Is a feature relevant to the problem you want to solve, or will it introduce bias? For more information, read about [ML fairness](https://developers.google.com/machine-learning/fairness-overview/).

---

The code sample in this directory uses the high level `tf.estimator.Estimator` API. This API is great for fast iteration and quickly adapting models to your own datasets without major code overhauls.
It allows you to move from single-worker training to distributed training, and it makes it easy to export model binaries for prediction.

The input function for the `Estimator` uses `tf.contrib.data.TextLineDataset`, which creates a `Dataset` object. The `Dataset` API makes it easy to apply transformations (map, batch, shuffle, etc.) to the data. [Read more here](https://www.tensorflow.org/guide/datasets).

The `Estimator` and `Dataset` APIs are both highly encouraged for fast development and efficient training.

## Running the code

First make sure you've [added the models folder to your Python path](/official/#running-the-models); otherwise you may encounter an error like `ImportError: No module named official.wide_deep`.

### Setup

The [Census Income Data Set](https://archive.ics.uci.edu/ml/datasets/Census+Income) that this sample uses for training is hosted by the [UC Irvine Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/). We have provided a script that downloads and cleans the necessary files.

```
python census_dataset.py
```

This will download the files to `/tmp/census_data`. To change the directory, set the `--data_dir` flag.

### Training

You can run the code locally as follows:

```
python census_main.py
```

The model is saved to `/tmp/census_model` by default, which can be changed using the `--model_dir` flag.

To run the *wide* or *deep*-only models, set the `--model_type` flag to `wide` or `deep`. Other flags are configurable as well; see `census_main.py` for details.

The final accuracy should be over 83% with any of the three model types.

You can also experiment with the `--inter` and `--intra` flags to explore inter/intra op parallelism for potentially better performance, as follows:

```
python census_main.py --inter=<int> --intra=<int>
```

Please note that the optional inter/intra op settings above do not affect model accuracy. These are TensorFlow framework configurations that only affect execution time.
For more details regarding the above inter/intra flags, please refer to [Optimizing_for_CPU](https://www.tensorflow.org/performance/performance_guide#optimizing_for_cpu) or [TensorFlow config.proto source code](https://github.com/tensorflow/tensorflow/blob/26b4dfa65d360f2793ad75083c797d57f8661b93/tensorflow/core/protobuf/config.proto#L165).

### TensorBoard

Run TensorBoard to inspect the details about the graph and training progression.

```
tensorboard --logdir=/tmp/census_model
```

## Inference with SavedModel

You can export the model into TensorFlow [SavedModel](https://www.tensorflow.org/guide/saved_model) format by using the argument `--export_dir`:

```
python census_main.py --export_dir /tmp/wide_deep_saved_model
```

After the model finishes training, use [`saved_model_cli`](https://www.tensorflow.org/guide/saved_model#cli_to_inspect_and_execute_savedmodel) to inspect and execute the SavedModel.

Try the following commands to inspect the SavedModel:

**Replace `${TIMESTAMP}` with the folder produced (e.g. 1524249124)**

```
# List possible tag_sets. Only one metagraph is saved, so there will be one option.
saved_model_cli show --dir /tmp/wide_deep_saved_model/${TIMESTAMP}/

# Show SignatureDefs for tag_set=serve. SignatureDefs define the outputs to show.
saved_model_cli show --dir /tmp/wide_deep_saved_model/${TIMESTAMP}/ \
    --tag_set serve --all
```

### Inference

Let's use the model to predict the income group of two examples:

```
saved_model_cli run --dir /tmp/wide_deep_saved_model/${TIMESTAMP}/ \
--tag_set serve --signature_def="predict" \
--input_examples='examples=[{"age":[46.], "education_num":[10.], "capital_gain":[7688.], "capital_loss":[0.], "hours_per_week":[38.]}, {"age":[24.], "education_num":[13.], "capital_gain":[0.], "capital_loss":[0.], "hours_per_week":[50.]}]'
```

This will print out the predicted classes and class probabilities. Class 0 is the `<=50K` group and 1 is the `>50K` group.
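
As a rough illustration of where the predicted classes and class probabilities come from, the sketch below adds a wide (linear) logit to a deep (one hidden layer) logit and passes the sum through a sigmoid, which is the usual wide and deep formulation for a binary label. It is not the code in `census_main.py`: the function names, feature names, weights, and layer sizes are all hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def wide_logit(active_features, weights, bias):
    """Linear part: add the weight of every sparse or crossed feature that is present."""
    return bias + sum(weights.get(name, 0.0) for name in active_features)


def deep_logit(dense_input, hidden_w, hidden_b, out_w, out_b):
    """Deep part: one ReLU hidden layer followed by a linear output unit."""
    hidden = [
        max(0.0, sum(w * x for w, x in zip(row, dense_input)) + b)
        for row, b in zip(hidden_w, hidden_b)
    ]
    return out_b + sum(w * h for w, h in zip(out_w, hidden))


def predict(active_features, dense_input, params):
    """Sum both logits, then convert to [P(class 0), P(class 1)] and a class id."""
    logit = wide_logit(active_features, params["wide_w"], params["wide_b"]) + deep_logit(
        dense_input, params["hidden_w"], params["hidden_b"], params["out_w"], params["out_b"]
    )
    p1 = sigmoid(logit)
    return {"class_id": 1 if p1 > 0.5 else 0, "probabilities": [1.0 - p1, p1]}


# Hypothetical parameters; a trained model would learn these values.
PARAMS = {
    "wide_w": {"education=Bachelors": 0.8, "education=Bachelors_x_occupation=Exec": 0.5},
    "wide_b": -1.0,
    "hidden_w": [[1.0, -1.0], [0.5, 0.5]],
    "hidden_b": [0.0, 0.0],
    "out_w": [1.0, 1.0],
    "out_b": 0.0,
}

result = predict(
    ["education=Bachelors", "education=Bachelors_x_occupation=Exec"],
    [0.46, 0.38],  # e.g. scaled age and hours_per_week (scaling is an assumption)
    PARAMS,
)
print(result)
```

In this sketch, class 1 corresponds to the `>50K` group, matching the class numbering described above.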

## Additional Links

If you are interested in distributed training, take a look at [Distributed TensorFlow](https://www.tensorflow.org/deploy/distributed).

You can also [run this model on Cloud ML Engine](https://cloud.google.com/ml-engine/docs/getting-started-training-prediction), which provides [hyperparameter tuning](https://cloud.google.com/ml-engine/docs/getting-started-training-prediction#hyperparameter_tuning) to maximize your model's results and enables [deploying your model for prediction](https://cloud.google.com/ml-engine/docs/getting-started-training-prediction#deploy_a_model_to_support_prediction).
