Singular Value Decomposition Based Recommendation Using Imputed Data


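In today's information age, recommender systems are an important data mining application and play a key role in e-commerce, movie recommendation, music recommendation and many other domains. Their core goal is to recommend goods or services that users are likely to be interested in, based on users' historical behavior, preferences and social network information. Singular value decomposition (SVD) is one of the most successful approaches in recommender systems, but its recommendation quality suffers when the data are sparse. This paper studies a new SVD-based recommendation method that uses imputed data, ISVD (Imputation-based Singular Value Decomposition), which aims to solve the data sparsity problem of traditional SVD methods.

First, it helps to recall what SVD is and how it is applied in recommender systems. SVD is a mathematical tool that factors a matrix into the product of three matrices, which reduces complexity and is particularly useful for high-dimensional data. In recommender systems, SVD can be used to analyze the user-item interaction matrix and uncover latent associations between users and items, that is, users' latent preferences and the latent features that relate items to one another. However, when the interaction data for users or items are insufficient, i.e., when the data are sparse, the performance of the recommender system drops sharply.

ISVD proposes a novel recommendation strategy based on imputed data. It first introduces a neighbor selection algorithm based on a similarity measure, which sets two thresholds to select effective neighbors for each user or item. Imputed data are then generated from these neighbors' ratings. Finally, the imputed data are incorporated into the SVD framework. By using imputed training data in SVD, ISVD can learn the prediction model accurately.

For the experiments, the authors used four real datasets: MovieLens 100k, MovieLens 1M, Netflix and Filmtrust. The results show that ISVD outperforms other SVD-based and imputation-based methods. In terms of RMSE (root mean square error) and MAE (mean absolute error), ISVD improves on the other methods by more than 10%.

Data sparsity is a common problem in recommender systems: the user-item interaction matrix contains many missing values. Sparsity lowers recommendation quality because it limits the system's ability to discover latent relationships between users and items. ISVD alleviates this problem by introducing imputed data. For developers of recommender systems, ISVD offers a new perspective: combining imputation of missing data with traditional recommendation techniques. The combination gives the system richer and more accurate information about user preferences and therefore leads to higher-quality recommendations.

In summary, by incorporating imputed data into the SVD framework, ISVD improves the performance of recommender systems, especially when data sparsity is severe. This offers a new way to handle the data sparsity problem that is common in recommender systems and is relevant to both research and practice in the field. Future work may further improve the neighbor selection algorithm, explore more sophisticated imputation models, and apply ISVD to other types of recommender systems, such as content-based or hybrid systems.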
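For reference, the two error metrics named above can be computed as in the following minimal sketch. The function names `rmse` and `mae` and the sample values are illustrative assumptions, not taken from the paper.

```python
import math


def rmse(predicted, actual):
    """Root mean square error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def mae(predicted, actual):
    """Mean absolute error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical predictions and held-out ratings, for illustration only.
predicted = [3.5, 4.0, 2.0, 5.0]
actual = [4.0, 4.0, 1.0, 4.5]
print(rmse(predicted, actual), mae(predicted, actual))
```

Lower values are better for both metrics. RMSE penalizes large individual errors more heavily than MAE because the errors are squared before averaging.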


Knowledge-Based Systems 163 (2019) 485–494

Contents lists available at ScienceDirect

Knowledge-Based Systems

journal homepage: www.elsevier.com/locate/knosys

Singular value decomposition based recommendation using imputed data

Xiaofeng Yuan a,b, Lixin Han ∗, Subin Qian a,b, Guoxia Xu, Hong Yan

College of Computer and Information, Hohai University, Nanjing 210024, China

School of Information Engineering, Yancheng Teachers University, Yancheng 224002, China

Department of Electronic Engineering, City University of Hong Kong, Kowloon, Hong Kong, China

Highlights

• We propose a novel method (ISVD) to incorporate imputed data into the SVD framework. ISVD also proposes a novel algorithm to choose effective neighbors of users or items for generating imputed data.

• ISVD is useful to all SVD-based recommendation methods.

• We conduct several experiments on four real datasets: MovieLens 100k, MovieLens 1M, Netflix and Filmtrust. Experiment results show that ISVD outperforms the state-of-the-art CFs and the RMSEs/MAEs of ISVD are better than those from other imputation-based and SVD-based methods by more than 10%.

Article info

Article history:

Received 20 March 2018

Received in revised form 27 August 2018

Accepted 8 September 2018

Available online 12 September 2018

Keywords:

Imputation-based recommendation

SVD-based recommendation

Data sparsity

Abstract

Among widely used recommendation methods, singular value decomposition (SVD) based approaches are the most successful ones. Although SVD-based methods are effective, they suffer from the problem of data sparsity, which can lead to poor recommendation quality. This paper proposes a novel imputation-based recommendation method, called the imputation-based SVD (ISVD), to solve the problem of data sparsity in SVD-based methods. Firstly, we propose a neighbor selection algorithm based on a similarity measure for users and items. In this algorithm, we set two thresholds to select effective neighbors for each user and item. Secondly, we generate the imputed data according to the neighbors' ratings. Finally, we incorporate these imputed data into the SVD framework. By using imputed training data in SVD, our method can learn the prediction model accurately. We have tested our method on the MovieLens 100k, MovieLens 1M, Netflix and Filmtrust datasets. Experimental results show that our method outperforms the state-of-the-art ones. This study not only offers new insights into generating imputed data but also provides a guide to the alleviation of data sparsity in SVD-based methods.
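The second and third steps of the abstract (generate imputed ratings from neighbors, then add them to the SVD training data) can be sketched as below. This excerpt does not give the paper's imputation formula, so the similarity-weighted average used here is an assumption, and the names `impute_ratings` and `build_training_set`, the neighbor lists and the toy ratings are all hypothetical.

```python
def impute_ratings(ratings, neighbors):
    """Estimate missing ratings from the ratings of each user's neighbors.

    ratings:   {user: {item: rating}} holding observed ratings only.
    neighbors: {user: [(neighbor, similarity), ...]}, assumed precomputed.
    Returns {(user, item): imputed_rating} for the entries that can be filled.
    """
    all_items = sorted({item for items in ratings.values() for item in items})
    imputed = {}
    for user, observed in ratings.items():
        for item in all_items:
            if item in observed:
                continue  # never overwrite an observed rating
            weighted_sum = 0.0
            weight_total = 0.0
            for neighbor, sim in neighbors.get(user, []):
                neighbor_rating = ratings.get(neighbor, {}).get(item)
                if neighbor_rating is not None:
                    weighted_sum += sim * neighbor_rating
                    weight_total += abs(sim)
            if weight_total > 0:
                # Assumed rule: similarity-weighted average of neighbor ratings.
                imputed[(user, item)] = weighted_sum / weight_total
    return imputed


def build_training_set(ratings, imputed):
    """Merge observed and imputed ratings into (user, item, rating, is_imputed) rows."""
    rows = [(u, i, r, False) for u, items in ratings.items() for i, r in items.items()]
    rows += [(u, i, r, True) for (u, i), r in imputed.items()]
    return rows


# Toy data, for illustration only.
ratings = {
    "u1": {"i1": 5.0, "i2": 3.0},
    "u2": {"i1": 4.0, "i2": 3.0, "i3": 2.0},
    "u3": {"i1": 1.0, "i3": 5.0},
}
neighbors = {"u1": [("u2", 0.9)], "u3": [("u2", 0.5)]}

imputed = impute_ratings(ratings, neighbors)
training = build_training_set(ratings, imputed)
print(len(training), sorted(imputed))
```

Keeping an `is_imputed` flag on each row is a design choice of this sketch: it would let a downstream trainer weight imputed ratings differently from observed ones, although the excerpt does not say whether ISVD does so.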

1. Introduction

Recommender Systems (RS) have been studied over the past two decades as a way to overcome information overload. To meet users' ever-increasing demand for information, recommendation techniques have been studied widely. Owing to their commercial value, the relevant research results have been applied in the information technology industry. Although many recommendation methods have been proposed, their recommendation quality has not yet met users' demands. Improving recommendation quality is the main goal of research on recommender systems.

∗ Corresponding author.

E-mail address: lixinhan2002@aliyun.com (L. Han).

Existing recommendation methods can be categorized into Collaborative Filtering (CF) [1], Content-based filtering [2], Hybrid methods [3] and others. CF can be further divided into two groups, Neighborhood-based and Model-based [4]. All these methods have their own drawbacks. For example, CF algorithms suffer from cold start and sparsity problems [5]. Content-based methods need additional features of items, which are not always available [6]. Hybrid methods still face the problem of data sparsity. To alleviate data sparsity, several other methods have been proposed in the literature in recent years, such as imputation-based [7,8] and social network-based [9,10] ones. Among existing methods, SVD-based methods have been used widely.

https://doi.org/10.1016/j.knosys.2018.09.011

Although the above-mentioned methods alleviate data sparsity and achieve good performance, they have not solved the problem, and their recommendation quality is not high enough. Similarly, existing SVD-based methods still face the problem of data sparsity. How to alleviate data sparsity for SVD-based methods remains a challenge.

In this paper, we focus on the following problems:

- How to alleviate the data sparsity for SVD-based methods by incorporating imputed data?

- How to generate effective imputed data?

- Can an SVD model that incorporates imputed data improve the recommendation quality?

To answer the above-mentioned research questions, we propose a method called the Imputation-based SVD (ISVD), which uses imputed ratings to alleviate data sparsity and improve recommendation quality. Specifically, ISVD computes imputed values for missing ratings and adds them to the training data of SVD, thus largely alleviating the sparsity of the training data. In this procedure, generating the imputed data is an important step, since the recommendation results depend on the effectiveness of the imputed data. We therefore propose an effective method of generating imputed data. In summary, our contributions include:

- A novel algorithm is proposed to generate effective neighbors for each user and item. In this algorithm, we solve the problems of similarity overestimation and shortage of similar users, which are normally present in the Top-N methods, by using the number of common ratings and a fixed threshold respectively.

- We propose a novel framework to incorporate imputed data into the SVD model which can alleviate the data sparsity for all SVD-based methods.

- We conduct several experiments on four real datasets: MovieLens 100k, MovieLens 1M, Netflix and Filmtrust. Experiment results show that ISVD outperforms the state-of-the-art CF methods and the RMSEs/MAEs of ISVD are better than those from other imputation-based and SVD-based methods by more than 10%.
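The first contribution, selecting neighbors with a minimum number of common ratings and a fixed similarity threshold, can be illustrated with the sketch below. The paper's exact algorithm is not shown in this excerpt, so this is only one plausible reading. The parameter names `min_common` and `min_similarity`, the cosine similarity over co-rated items and the toy ratings are assumptions.

```python
import math


def cosine_on_common(a, b):
    """Cosine similarity computed over the items both users have rated."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_neighbors(ratings, similarity, min_common, min_similarity):
    """Keep a candidate only if it passes both thresholds.

    min_common guards against similarity overestimation from tiny overlaps;
    min_similarity replaces a fixed Top-N cutoff, so a user with few truly
    similar peers is not padded with dissimilar ones.
    """
    neighbors = {}
    for user, user_ratings in ratings.items():
        selected = []
        for other, other_ratings in ratings.items():
            if other == user:
                continue
            if len(user_ratings.keys() & other_ratings.keys()) < min_common:
                continue
            sim = similarity(user_ratings, other_ratings)
            if sim >= min_similarity:
                selected.append((other, sim))
        neighbors[user] = sorted(selected, key=lambda pair: pair[1], reverse=True)
    return neighbors


# Toy data: u3 shares a single item with u1, which yields a cosine of 1.0.
ratings = {
    "u1": {"i1": 5.0, "i2": 3.0, "i3": 4.0},
    "u2": {"i1": 5.0, "i2": 3.0, "i3": 4.0, "i4": 1.0},
    "u3": {"i1": 5.0},
    "u4": {"i1": 1.0, "i2": 5.0, "i3": 1.0},
}
strict = select_neighbors(ratings, cosine_on_common, min_common=2, min_similarity=0.9)
loose = select_neighbors(ratings, cosine_on_common, min_common=1, min_similarity=0.9)
print([name for name, _ in strict["u1"]])
print(sorted(name for name, _ in loose["u1"]))
```

In the toy data, `u3` looks perfectly similar to `u1` on the basis of one shared rating. Requiring at least two common ratings removes it, while `u4` is rejected by the similarity threshold instead.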

The rest of this paper is organized as follows. In Section 2, we review past work in the area. In Section 3, we describe the problems we study in this paper. Section 4 presents the details of ISVD after reviewing the RSVD. Section 5 describes the experimental process and results. Finally, we discuss and conclude our paper in Section 6.

2. Related work

In this section, we review recommendation methods related to our work, including Collaborative Filtering, Hybrid methods, Social network-based methods and Imputation-based methods.

2.1. Collaborative filtering

Collaborative filtering can generally be divided into Neighborhood-based CF and Model-based CF. Neighborhood-based CF methods generate the recommendation list by using pre-defined similarity calculation methods to identify similar users or items. In these methods, the similarity is usually calculated according to the Pearson Correlation Coefficient (PCC) or Vector Space Similarity (VSS) [11,12]. Although Neighborhood-based CF methods are the most commonly used, they suffer from data sparsity and poor scalability.
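As an illustration of the PCC mentioned above, the sketch below computes the Pearson correlation between two users over their co-rated items. Taking the means over the co-rated items only is one common convention and is an assumption here, as are the function name `pearson` and the toy ratings.

```python
import math


def pearson(a, b):
    """Pearson correlation between two users' ratings on co-rated items.

    a, b: {item: rating}. Returns 0.0 when fewer than two items are shared
    or when either user's ratings on the shared items have zero variance.
    """
    common = a.keys() & b.keys()
    n = len(common)
    if n < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / n
    mean_b = sum(b[i] for i in common) / n
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    var_a = sum((a[i] - mean_a) ** 2 for i in common)
    var_b = sum((b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


# Toy ratings, for illustration only.
alice = {"i1": 1.0, "i2": 2.0, "i3": 3.0}
bob = {"i1": 2.0, "i2": 4.0, "i3": 6.0, "i4": 1.0}   # same trend as alice
carol = {"i1": 3.0, "i2": 2.0, "i3": 1.0}            # opposite trend
print(pearson(alice, bob), pearson(alice, carol))
```

Because PCC subtracts each user's mean, it treats a user who rates everything one point higher as equally similar, which raw cosine similarity does not.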

Patra et al. [13] proposed a method of computing similarity for Neighborhood-based CF. This method, unlike existing ones, uses all ratings made by a pair of users. It finds the importance of each pair of rated items by exploiting the Bhattacharyya similarity. Alqadah et al. [14] proposed a novel collaborative filtering method for top-n recommendation tasks using a bi-clustering neighborhood approach. This method uses the local bi-clustering structure for a more precise and localized collaborative filtering. It builds user-specific bi-clusters and creates an innovative rank scoring of candidate items that combines the local similarity of bi-clusters with the global similarity.

In contrast to Neighborhood-based CF, Model-based CF produces the predicted ratings with a model trained from the user-item matrix. Model-based methods include the Latent Factor Model (LFM) [11,12], pLSA, LDA, Latent Class Model, Latent Topic Model and Matrix Factorization. These methods are essentially equivalent, and among them Matrix Factorization is the most common in recommender systems. The major problems of Model-based methods are poor interpretability and data sparsity [15].
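To make the Matrix Factorization idea concrete, the sketch below trains a small regularized factor model with stochastic gradient descent on observed ratings only. It is a generic textbook formulation, and this excerpt does not confirm that it matches the RSVD reviewed in Section 4. The hyperparameters (`n_factors`, `lr`, `reg`, `epochs`), the initialization range and the toy ratings are assumptions.

```python
import random


def predict(P, Q, u, i):
    """Predicted rating: dot product of user factors P[u] and item factors Q[i]."""
    return sum(p * q for p, q in zip(P[u], Q[i]))


def mse(P, Q, ratings):
    """Mean squared error of the model on a list of (user, item, rating) triples."""
    return sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings)


def train_mf(ratings, n_users, n_items, n_factors=2, lr=0.01, reg=0.02,
             epochs=300, seed=0):
    """Fit user and item factors by SGD on the observed ratings only."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.1, 1.0) for _ in range(n_factors)] for _ in range(n_users)]
    Q = [[rng.uniform(0.1, 1.0) for _ in range(n_factors)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(n_factors):
                p_uf, q_if = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * q_if - reg * p_uf)
                Q[i][f] += lr * (err * p_uf - reg * q_if)
    return P, Q


# Toy (user_index, item_index, rating) triples; most of the 4x4 matrix is missing.
ratings = [
    (0, 0, 5.0), (0, 1, 3.0), (0, 3, 1.0),
    (1, 0, 4.0), (1, 3, 1.0),
    (2, 0, 1.0), (2, 1, 1.0), (2, 3, 5.0),
    (3, 1, 1.0), (3, 2, 5.0), (3, 3, 4.0),
]
P0, Q0 = train_mf(ratings, n_users=4, n_items=4, epochs=0)   # untrained factors
P, Q = train_mf(ratings, n_users=4, n_items=4)
print(mse(P0, Q0, ratings), mse(P, Q, ratings))
```

The model only sees observed entries, so the fewer ratings a user or item has, the less its factors are constrained. That is the sparsity problem the surrounding text describes.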

Ji et al. [16] proposed a method for alleviating the cold start problem by incorporating content-based information. It first uses a Neighborhood approach to build a tag-keyword relation matrix according to rating data. Then, with the relation matrix, it builds a 3-factor matrix factorization model on the rating matrix to learn an interest vector of every user for selected tags and a correlation vector of every item for extracted keywords. Finally, it incorporates the relation matrix into the two kinds of vectors to make recommendations. Luo et al. [15] proposed an SVD-based model via second-order optimization and achieved higher accuracy. They proposed a Hessian-free optimization-based latent factor model, which can extract latent factors from the given incomplete matrices via a second-order optimization process.

2.2. Hybrid methods

Hybrid recommender systems combine Collaborative Filtering and Content-based techniques to solve the problem of cold-start and to alleviate data sparsity.

The LA-LDA method [17] alleviates data sparsity by using the spatial information of users and items and achieves good recommendation results. This method produces recommendations by incorporating and quantifying the influence of local public preferences and by capturing patterns of item co-occurrence and item location co-occurrence. The ST-LDA method [18] addresses the problem of users' interests drifting across geographical regions by learning region-dependent personal interests. In addition, ST-LDA alleviates data sparsity by incorporating the crowd's preferences and by building a social-spatial collective inference framework. This method uses the content of POIs and social information to alleviate data sparsity and achieves high recommendation quality.

2.3. Social network-based methods

Recently, many researchers have sought to alleviate data sparsity and improve recommendation quality by incorporating additional information, such as social relationships between users or items. Social relationship information includes explicit information, implicit information and community information [19]. Several such methods have been proposed in the literature.

The social regularization (RS) method [9,20] alleviates data sparsity by adding a social regularization term, which can be used to incorporate the social information, into the Matrix Factorization framework. SH-CDL method [21] combines deep representation learning for Point-of-interest recommendations and hierarchically
