
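I. Recommender systems and collaborative filtering. The goal of a recommender system is to automatically suggest to each user the items they are likely to be interested in. Traditional collaborative filtering predicts users' interests by mining their rating histories. Because these ratings are multi-valued scores, the task can be classed as a "multi-class" recommendation problem. Many machine learning algorithms have been designed to predict user interest from such multi-valued data, and matrix factorization based algorithms have been especially successful.

II. One-class collaborative filtering. Previous work on the one-class collaborative filtering (OCCF) problem includes pointwise, pairwise, and content-based methods. These methods share roughly the same basic assumption: all missing values are treated as negative. That is unreasonable, because missing values are in fact a mixture of positive and negative examples. A user may have given no positive feedback on an item simply because they are unaware of it, even though they would actually like it. In addition, content-based methods such as collaborative topic regression (CTR) usually require textual content for the items, which is not always available.

III. The latent Dirichlet allocation model. The paper proposes applying the latent Dirichlet allocation (LDA) model to the OCCF problem. The model treats missing values as unknown and models only the observed data, and it does not need textual content for the items. Items are treated as words, users as documents, and the user-item feedback matrix as the corpus. Experimental results show that the proposed model outperforms previous methods on various ranking-oriented evaluation metrics.

IV. Applying LDA. LDA is a topic model for discovering topics in large text collections. It is a probabilistic model that represents each document in a collection as a probability distribution over topics, and each topic as a probability distribution over words. Applied to recommendation, users and items are mapped to documents and words respectively, and user interests and item characteristics are linked through topics.

V. Handling missing values. In the OCCF problem, feedback is missing for many user-item pairs, and those missing values actually mix positive and negative preferences. In the proposed method, missing values are treated as unknown and take no direct part in training, which avoids the unreasonable step of reading them as negative feedback.

VI. The need for textual content. Some content-based recommendation methods require textual content for the items themselves. In practice, not every item has enough textual content for an algorithm to use. The proposed LDA-based method does not use item text, so this limitation does not apply.

VII. Evaluation. Experimental results show that the proposed LDA-based OCCF method outperforms previous methods on various ranking-oriented evaluation metrics, which suggests that it ranks the items a user is actually interested in more accurately. In short, the method uses LDA to handle missing feedback without requiring item text, avoids some shortcomings of earlier approaches, and shows potential for practical use.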


Exploit Latent Dirichlet Allocation for One-Class Collaborative Filtering

Haijun Zhang, Zhoujun Li, Yan Chen, Xiaoming Zhang, Senzhang Wang

State Key Laboratory of Software Development Environment, Beihang University, Beijing 100191, China

haijun_cumtb@126.com, lizj@buaa.edu.cn, yanchensmile@gmail.com, yolixs@buaa.edu.cn, szwang@cse.buaa.edu.cn

ABSTRACT

Previous work on the one-class collaborative filtering (OCCF) problem includes pointwise methods, pairwise methods, and content-based methods. The fundamental assumption behind these approaches is roughly the same: they regard all missing values as negative. However, this is unreasonable, since the missing values are actually a mixture of negative and positive examples. A user may give no positive feedback on an item only because she/he is unaware of it, while in fact she/he is fond of it. Furthermore, content-based methods, e.g., collaborative topic regression (CTR), usually require textual content information of items, which is not available in some cases. In this paper, we exploit the latent Dirichlet allocation (LDA) model for the OCCF problem. It treats missing values as unknown and models only the observed data, and it does not need content information of items. In our model, items are regarded as words, users as documents, and the user-item feedback matrix as the corpus. Experimental results show that our proposed method outperforms previous methods on various ranking-oriented evaluation metrics.
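The mapping described above (users as documents, items as words, the feedback matrix as the corpus) can be illustrated with a minimal Python sketch. The function name `build_corpus` and the toy feedback log are hypothetical and are not taken from the paper; the sketch only shows how observed positive pairs could be arranged as a corpus while missing pairs are left out rather than written as negatives.

```python
from collections import defaultdict

def build_corpus(feedback):
    """Turn (user, item) positive-feedback pairs into an LDA-style corpus.

    Each user becomes one "document"; each item the user interacted with
    becomes one "word" token in that document. Missing pairs are simply
    absent: they are never written out as negatives.
    """
    vocab = {}                  # item -> integer word id
    docs = defaultdict(list)    # user -> list of word ids
    for user, item in feedback:
        word_id = vocab.setdefault(item, len(vocab))
        docs[user].append(word_id)
    return dict(docs), vocab

# Hypothetical toy feedback log (not data from the paper).
feedback = [("u1", "i1"), ("u1", "i2"), ("u2", "i2"), ("u2", "i3"), ("u3", "i1")]
docs, vocab = build_corpus(feedback)
```

Here `docs` holds one list of word ids per user and `vocab` maps each item to its id, which is the input shape most LDA implementations expect.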

Categories and Subject Descriptors

H.3.3 [Information Search and Retrieval]: Information filtering - Recommendation System.

General Terms

Algorithms.

Keywords

One-class Collaborative Filtering; Latent Dirichlet Allocation; Topic Model

1. INTRODUCTION

The goal of a recommendation system is to automatically suggest items to each user that she/he may find appealing. Traditional collaborative filtering approaches predict users’ interests by mining user rating history data. These rating data are multi-valued scores, so the task can be categorized as a “multi-class” recommendation problem [1]. Many machine learning based algorithms have been designed to predict users’ interests from such multi-valued data, among which matrix factorization based algorithms have achieved great success [2].

However, in many applications, the collected user behavior data are in “one-class” form rather than multi-class form, e.g., “like” on Facebook, “bought” on Amazon, “collect” on Taobao, and “follow” on Sina Weibo. Such data are usually called implicit [3] or one-class [1, 4, 5] feedback. The one-class collaborative filtering (OCCF) problem differs from the multi-valued rating prediction problem: the data contain only positive feedback rather than both positive and negative feedback, and the goal is item ranking instead of rating prediction. Traditional machine learning based algorithms cannot be used directly to tackle OCCF because of the imbalanced data [6].

The main difficulty in tackling the OCCF problem is overfitting, because only positive feedback is observed. To avoid overfitting, previous methods, including pointwise methods [4], pairwise methods [1, 3], and content-based methods [7, 8], all assume that missing data are negative. However, this introduces a new overfitting problem caused by too many negative examples.
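To make the imbalance concrete, the following Python sketch counts the labels that result when every unobserved user-item pair is imputed as negative. The sizes (100 users, 1000 items, 5 positive feedbacks per user) and the helper `imputed_label_counts` are assumptions chosen for illustration, not figures from the paper.

```python
def imputed_label_counts(num_users, num_items, positives):
    """Count labels when every unobserved (user, item) pair is treated as negative."""
    observed = set(positives)
    total = num_users * num_items
    return len(observed), total - len(observed)

# Hypothetical toy sizes: 100 users, 1000 items, 5 distinct positives per user.
positives = [(u, (u * 7 + j) % 1000) for u in range(100) for j in range(5)]
n_pos, n_neg = imputed_label_counts(100, 1000, positives)
ratio = n_neg / n_pos  # imputed negatives per observed positive
```

In this toy setting there are 500 observed positives against 99,500 imputed negatives, so a model fitted to all entries sees 199 negatives for every positive.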

A good learning method fits the observed data well while also avoiding overfitting. In this paper, similar to [7, 8], we introduce a topic model, i.e., latent Dirichlet allocation (LDA) [9, 10], to deal with this problem. Our model differs from the methods proposed in [7, 8] in two respects: (1) we model only the observed data, whereas the latter assume missing data are negative and strive to fit all the data; (2) our model does not need content information of items. Compared with the pointwise [4] and pairwise methods [1, 3], the parameters learned by our model are probability distributions, which are inherently constrained: each probability is non-negative and the probabilities sum to 1. These constraints give our model a strong ability to avoid overfitting. Experimental results show that our proposed method outperforms the previous methods on various ranking-oriented evaluation metrics.
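As a rough illustration of learning such probability distributions from observed feedback only, here is a minimal Python sketch of LDA fitted with collapsed Gibbs sampling, followed by ranking unseen items by p(item | user). This excerpt does not state which inference algorithm or scoring rule the paper uses, so the sampler, the score, the hyperparameters `alpha` and `beta`, the function names, and the toy data are all assumptions.

```python
import random

def fit_lda_gibbs(docs, num_items, num_topics, alpha=0.1, beta=0.01,
                  iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA on user "documents" of item ids.

    Returns (theta, phi): theta[u][k] = p(topic k | user u) and
    phi[k][i] = p(item i | topic k). Only observed feedback is modeled.
    """
    rng = random.Random(seed)
    n_uk = [[0] * num_topics for _ in docs]              # user-topic counts
    n_ki = [[0] * num_items for _ in range(num_topics)]  # topic-item counts
    n_k = [0] * num_topics                               # tokens per topic
    z = []                                               # topic of each token
    for u, doc in enumerate(docs):
        z.append([])
        for i in doc:
            k = rng.randrange(num_topics)                # random initial topic
            z[u].append(k)
            n_uk[u][k] += 1
            n_ki[k][i] += 1
            n_k[k] += 1
    for _ in range(iters):
        for u, doc in enumerate(docs):
            for t, i in enumerate(doc):
                k = z[u][t]                              # remove current assignment
                n_uk[u][k] -= 1
                n_ki[k][i] -= 1
                n_k[k] -= 1
                weights = [(n_uk[u][j] + alpha) * (n_ki[j][i] + beta)
                           / (n_k[j] + num_items * beta)
                           for j in range(num_topics)]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[u][t] = k                              # record resampled topic
                n_uk[u][k] += 1
                n_ki[k][i] += 1
                n_k[k] += 1
    theta = [[(n_uk[u][k] + alpha) / (len(doc) + num_topics * alpha)
              for k in range(num_topics)] for u, doc in enumerate(docs)]
    phi = [[(n_ki[k][i] + beta) / (n_k[k] + num_items * beta)
            for i in range(num_items)] for k in range(num_topics)]
    return theta, phi

def item_scores(theta_u, phi):
    """p(item | user) = sum over topics of p(topic | user) * p(item | topic)."""
    return [sum(t * phi_k[i] for t, phi_k in zip(theta_u, phi))
            for i in range(len(phi[0]))]

def recommend(theta_u, phi, seen, top_n):
    """Rank items the user has not interacted with by descending score."""
    scores = item_scores(theta_u, phi)
    unseen = [i for i in range(len(scores)) if i not in seen]
    return sorted(unseen, key=lambda i: -scores[i])[:top_n]

# Hypothetical toy corpus: 4 users, 8 items, two loose groups of taste.
docs = [[0, 1, 2], [0, 1, 3], [4, 5, 6], [4, 5, 7]]
theta, phi = fit_lda_gibbs(docs, num_items=8, num_topics=2)
top = recommend(theta[0], phi, seen=set(docs[0]), top_n=3)
```

Every row of `theta` and `phi` sums to 1 by construction, and so do the item scores for each user, which reflects the constraint described in the paragraph above. Unobserved pairs never enter the count tables, so they are treated as unknown rather than negative.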

The rest of the paper is organized as follows. In the next section, we review previous work related to the OCCF problem. In Section 3, we propose our approach to the OCCF problem. In Section 4, we empirically compare our method with state-of-the-art methods on three real-world data sets. Finally, we conclude the paper and outline future work.

2. RELATED WORKS

In this section, we review a few state-of-the-art approaches proposed for OCCF problems. There are mainly three types of approaches: (1) pointwise methods, (2) pairwise methods, and (3) content-based methods.

2.1 Pointwise Methods

Pointwise methods take implicit feedback as absolute preference scores. For example, an observed user-item pair is interpreted as positive feedback and is assigned a high absolute rating

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org.

CIKM '14, November 3-7, 2014, Shanghai, China.

http://dx.doi.org/10.1145/2661829.2661992

