Top-K Off-Policy Correction for a REINFORCE Recommender System
Minmin Chen∗, Alex Beutel∗, Paul Covington∗, Sagar Jain, Francois Belletti, Ed H. Chi
Google, Inc., Mountain View, CA
minminc,alexbeutel,pcovington,sagarj,belletti,edchi@google.com
∗Authors contributed equally.
ABSTRACT
Industrial recommender systems deal with extremely large action spaces – many millions of items to recommend. Moreover, they need to serve billions of users, who are unique at any point in time, making for a complex user state space. Luckily, huge quantities of logged implicit feedback (e.g., user clicks, dwell time) are available for learning. Learning from this logged feedback is, however, subject to biases caused by only observing feedback on recommendations selected by previous versions of the recommender. In this work, we present a general recipe for addressing such biases in a production top-K recommender system at YouTube, built with a policy-gradient-based algorithm, i.e., REINFORCE [48]. The contributions of the paper are: (1) scaling REINFORCE to a production recommender system with an action space on the order of millions; (2) applying off-policy correction to address data biases in learning from logged feedback collected from multiple behavior policies; (3) proposing a novel top-K off-policy correction to account for our policy recommending multiple items at a time; (4) showcasing the value of exploration. We demonstrate the efficacy of our approaches through a series of simulations and multiple live experiments on YouTube.
ACM Reference Format:
Minmin Chen, Alex Beutel, Paul Covington, Sagar Jain, Francois Belletti, Ed H. Chi. 2019. Top-K Off-Policy Correction for a REINFORCE Recommender System. In The Twelfth ACM International Conference on Web Search and Data Mining (WSDM '19), February 11–15, 2019, Melbourne, VIC, Australia. ACM, New York, NY, USA, 9 pages. https://doi.org/10.1145/3289600.3290999
1 INTRODUCTION
Recommender systems are relied on, throughout industry, to help
users sort through huge corpuses of content and discover the small
fraction of content they would be interested in. This problem is
challenging because of the huge number of items that could be rec-
ommended. Furthermore, surfacing the right item to the right user
at the right time requires the recommender system to constantly
adapt to users’ shifting interest (state) based on their historical
interaction with the system [6]. Unfortunately, we observe relatively little data for such a large state and action space, with most users only having been exposed to a small fraction of items and providing explicit feedback to an even smaller fraction. That is, recommender systems receive extremely sparse data for training in general; e.g., the Netflix Prize dataset was only 0.1% dense [5]. As a result, a good amount of research in recommender systems explores different mechanisms for treating this extreme sparsity. Learning from implicit user feedback, such as clicks and dwell time, as well as filling in unobserved interactions, has been an important step in improving recommenders [19], but the problem remains an open one.
In a mostly separate line of research, reinforcement learning (RL) has recently achieved impressive advances in games [38, 46] as well as robotics [22, 25]. RL in general focuses on building agents that take actions in an environment so as to maximize some notion of long-term reward. Here we explore framing recommendation as building RL agents to maximize each user's long-term satisfaction with the system. This offers us new perspectives on recommendation problems as well as opportunities to build on top of recent RL advances. However, there are significant challenges in putting this perspective into practice.
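As background, the standard on-policy REINFORCE setup [48] that the off-policy and top-K corrections in this paper build on maximizes the expected cumulative reward of a parameterized policy \(\pi_\theta\), with the gradient estimated from sampled trajectories; the textbook formulation (not the exact notation introduced later in the paper) is
\[
\max_{\theta} \; J(\theta) = \mathbb{E}_{\tau \sim \pi_{\theta}}\Big[ \sum_{t} r(s_t, a_t) \Big],
\qquad
\nabla_{\theta} J(\theta) = \mathbb{E}_{\tau \sim \pi_{\theta}}\Big[ \sum_{t} R_t \, \nabla_{\theta} \log \pi_{\theta}(a_t \mid s_t) \Big],
\]
where \(\tau = (s_0, a_0, s_1, a_1, \ldots)\) is a trajectory generated by \(\pi_\theta\) and \(R_t\) is the cumulative (possibly discounted) reward from step \(t\) onward. In the recommendation setting, \(s_t\) is the user state, \(a_t\) the recommended item, and \(r\) an immediate measure of user satisfaction such as a click or dwell time.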
As introduced above, recommender systems deal with large state and action spaces, and this is particularly exacerbated in industrial settings. The set of items available to recommend is non-stationary, and new items are brought into the system constantly, resulting in an ever-growing action space with new items having even sparser feedback. Further, user preferences over these items are shifting all the time, resulting in continuously evolving user states. Being able to reason through this large number of actions in such a complex environment poses unique challenges in applying existing RL algorithms. Here we share our experience adapting the REINFORCE algorithm [48] to a neural candidate generator (a top-K recommender system) with extremely large action and state spaces.
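To make this concrete, below is a minimal, self-contained sketch (in plain NumPy) of a REINFORCE-style update for a softmax policy over an item catalogue. All names here (item_embeddings, user_state, reinforce_step) are illustrative placeholders and the catalogue is tiny; a production candidate generator such as the one described in this paper would instead use a learned neural state representation and an efficient sampled softmax over millions of items.

import numpy as np

rng = np.random.default_rng(0)

num_items, state_dim = 1000, 32          # toy sizes; production catalogues are in the millions
item_embeddings = rng.normal(scale=0.1, size=(num_items, state_dim))
user_state = rng.normal(size=state_dim)  # stand-in for a learned user-state representation

def softmax_policy(state):
    """Probability of recommending each item given the user state."""
    logits = item_embeddings @ state
    logits -= logits.max()               # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

def reinforce_step(state, action, reward, lr=0.01):
    """One REINFORCE update: reward-weighted gradient of log pi(action | state)."""
    global item_embeddings
    probs = softmax_policy(state)
    # d log pi(a|s) / d item_embeddings = (one_hot(a) - probs) outer state
    grad = -np.outer(probs, state)
    grad[action] += state
    item_embeddings = item_embeddings + lr * reward * grad

# One logged interaction: sample a recommendation, observe a reward, update the policy.
probs = softmax_policy(user_state)
action = rng.choice(num_items, p=probs)
reward = 1.0                             # e.g., the user clicked or had long dwell time
reinforce_step(user_state, action, reward)

Note that this toy update assumes the logged action was sampled from the current policy; correcting for actions that were selected by other behavior policies is exactly the off-policy problem discussed next.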
In addition to the massive action and state spaces, RL for recommendation is distinct in its limited availability of data. Classic RL applications have overcome data inefficiencies by collecting large quantities of training data with self-play and simulation [38]. In contrast, the complex dynamics of the recommender system have made simulation for generating realistic recommendation data non-viable. As a result, we cannot easily probe for reward in previously unexplored areas of the state and action space, since observing reward requires giving a real recommendation to a real user. Instead, the model relies mostly on data made available from the previous recommendation models (policies), most of which we cannot control or can no longer control. To most effectively utilize logged feedback from other policies, we take an off-policy learning