486 X. Yuan et al. / Knowledge-Based Systems 163 (2019) 485–494
social network-based [9,10] ones. Among existing methods, SVD-
based methods have been used widely.
Although the above-mentioned methods alleviate data sparsity
and achieve good performance, they do not fully solve the
problem, and their recommendation quality is still not high
enough. Existing SVD-based methods likewise still suffer from
data sparsity. How to alleviate data sparsity for SVD-based
methods therefore remains a challenge.
In this paper, we focus on the following problems:
- How to alleviate the data sparsity for SVD-based methods by
incorporating imputed data?
- How to generate effective imputed data?
- Can an SVD model that incorporates imputed data improve
the recommendation quality?
To answer these research questions, we propose a method called
Imputation-based SVD (ISVD), which uses imputed ratings to
alleviate data sparsity and improve recommendation quality.
Specifically, ISVD computes imputed values for the missing ratings
and adds them to the training data of SVD, thus largely alleviating
the sparsity of that training data. In this procedure, generating
the imputed data is an important step, since the recommendation
results depend on the effectiveness of the imputed data. We
therefore propose an effective method of generating imputed data.
In summary, our contributions include:
- We propose a novel algorithm to generate effective neighbors
for each user and item. In this algorithm, we solve the
problems of similarity overestimation and shortage of similar
users, which are normally present in Top-N methods, by
using the number of common ratings and a fixed threshold,
respectively.
- We propose a novel framework to incorporate imputed data
into the SVD model which can alleviate the data sparsity for
all SVD-based methods.
- We conduct several experiments on four real datasets:
MovieLens 100k, MovieLens 1M, Netflix and Filmtrust.
Experimental results show that ISVD outperforms state-of-the-art
CF methods, and that the RMSEs/MAEs of ISVD are better than
those of other imputation-based and SVD-based methods
by more than 10%.
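The neighbor-selection idea in the first contribution, followed by the imputation step described earlier, can be illustrated with a minimal sketch. The exact formulas of ISVD are not given in this section, so everything below is an assumption: the shrinkage factor min(n_common, gamma) / gamma is the classical significance-weighting heuristic (swapped in here as a stand-in for "using the number of common ratings"), and the names select_neighbors, impute_rating, gamma and threshold are hypothetical.

```python
def select_neighbors(candidates, gamma=5, threshold=0.5):
    """Keep every candidate whose shrunk similarity reaches a fixed threshold.

    candidates: dict neighbor_id -> (raw_similarity, n_common_ratings).
    gamma and threshold are hypothetical parameters, not values from the paper.
    """
    neighbors = {}
    for other, (raw_sim, n_common) in candidates.items():
        # Assumption: damp similarities that rest on few common ratings,
        # which counters similarity overestimation.
        weight = min(n_common, gamma) / gamma
        sim = raw_sim * weight
        # A fixed threshold (instead of a fixed Top-N cut) lets the number
        # of neighbors vary with how many similar users actually exist.
        if sim >= threshold:
            neighbors[other] = sim
    return neighbors


def impute_rating(neighbors, ratings_by_user, item):
    """Similarity-weighted average of the neighbors' ratings for `item`.

    Returns None when no selected neighbor has rated the item.
    """
    num = den = 0.0
    for other, sim in neighbors.items():
        rating = ratings_by_user.get(other, {}).get(item)
        if rating is not None:
            num += sim * rating
            den += sim
    return num / den if den else None
```

An imputed value returned by impute_rating would then be appended to the training set as an ordinary (user, item, rating) triple before the SVD model is trained.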
The rest of this paper is organized as follows. In Section 2, we
review the past work in the area. In Section 3, we describe the
problems we study in this paper. Section 4 presents the details of
ISVD after reviewing RSVD. Section 5 describes the experimental
process and results. Finally, we discuss our findings and conclude
the paper in Section 6.
2. Related work
In this section, we review recommendation methods related to
our work, including collaborative filtering, hybrid methods,
social network-based methods and imputation-based methods.
2.1. Collaborative filtering
Collaborative filtering can be generally divided into
Neighborhood-based CF and Model-based CF. Neighborhood-based
CF methods generate the recommendation list by using predefined
similarity measures to identify similar users or items. In these
methods, the similarity is usually calculated according to the
Pearson Correlation Coefficient (PCC) or Vector Space Similarity
(VSS) [11,12]. Although Neighborhood-based CF methods are the
most commonly used, they suffer from data sparsity and poor
scalability.
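To make the two similarity measures named above concrete, here is a minimal sketch for a pair of users. It assumes each user's ratings are stored as a dict from item to rating and that both measures are computed over co-rated items only, with PCC means also taken over the co-rated items; other variants (for example, means over all of a user's ratings) exist. The function names pcc and vss are illustrative.

```python
from math import sqrt


def common_items(ratings_u, ratings_v):
    """Items rated by both users (each argument is a dict: item -> rating)."""
    return sorted(set(ratings_u) & set(ratings_v))


def pcc(ratings_u, ratings_v):
    """Pearson Correlation Coefficient over the co-rated items."""
    items = common_items(ratings_u, ratings_v)
    if len(items) < 2:
        return 0.0  # not enough overlap to measure correlation
    mean_u = sum(ratings_u[i] for i in items) / len(items)
    mean_v = sum(ratings_v[i] for i in items) / len(items)
    num = sum((ratings_u[i] - mean_u) * (ratings_v[i] - mean_v) for i in items)
    den = sqrt(sum((ratings_u[i] - mean_u) ** 2 for i in items)) * \
        sqrt(sum((ratings_v[i] - mean_v) ** 2 for i in items))
    return num / den if den else 0.0


def vss(ratings_u, ratings_v):
    """Vector Space (cosine) Similarity over the co-rated items."""
    items = common_items(ratings_u, ratings_v)
    if not items:
        return 0.0
    num = sum(ratings_u[i] * ratings_v[i] for i in items)
    den = sqrt(sum(ratings_u[i] ** 2 for i in items)) * \
        sqrt(sum(ratings_v[i] ** 2 for i in items))
    return num / den if den else 0.0
```

Because both measures see only the co-rated items, two users who share very few ratings can obtain a misleadingly high similarity, which is one way data sparsity degrades neighborhood-based methods.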
Patra et al. [13] proposed a method of computing similarity
for Neighborhood-based CF. This method, unlike existing ones,
uses all ratings made by a pair of users. It determines the
importance of each pair of rated items by exploiting the
Bhattacharyya similarity. Alqadah et al. [14] proposed a novel
collaborative filtering method for top-n recommendation tasks
using a bi-clustering neighborhood approach. This method uses the
local bi-clustering structure for a more precise and localized
collaborative filtering. It builds user-specific bi-clusters and
creates an innovative rank scoring of candidate items that combines
the local similarity of bi-clusters with the global similarity.
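As an illustration of the Bhattacharyya similarity mentioned above in connection with [13], the sketch below computes the Bhattacharyya coefficient between the rating distributions of two items. It covers only this coefficient, not the full similarity measure of [13]; the function name and the choice of representing each item by a plain list of its received ratings are assumptions.

```python
from collections import Counter
from math import sqrt


def bhattacharyya_coefficient(ratings_i, ratings_j):
    """Bhattacharyya coefficient between two items' rating distributions.

    ratings_i, ratings_j: lists of the rating values each item received.
    The users who rated the two items do not need to overlap.
    """
    if not ratings_i or not ratings_j:
        return 0.0
    counts_i, counts_j = Counter(ratings_i), Counter(ratings_j)
    n_i, n_j = len(ratings_i), len(ratings_j)
    # Sum over rating values h of sqrt(p_i(h) * p_j(h)), where p_i(h) is the
    # fraction of item i's ratings that are equal to h.
    return sum(
        sqrt((counts_i[h] / n_i) * (counts_j[h] / n_j))
        for h in counts_i.keys() & counts_j.keys()
    )
```

The coefficient is 1 when the two items have identical rating distributions and 0 when they share no rating value, and it can be computed even when no user has rated both items.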
In contrast to Neighborhood-based CF, Model-based CF
produces the predicted ratings with a model trained on the
user-item matrix. Model-based methods include the Latent Factor
Model (LFM) [11,12], pLSA, LDA, Latent Class Model, Latent Topic
Model and Matrix Factorization. These methods are essentially
equivalent, among which Matrix Factorization is the most common
in recommender systems. The major problems of Model-based
methods are poor interpretability and data sparsity [15].
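The following is a minimal sketch of Matrix Factorization for rating prediction, trained by stochastic gradient descent with L2 regularization. It is a generic textbook formulation rather than the specific model of any cited work; the hyperparameter values and the function names are illustrative assumptions.

```python
import random
from math import sqrt


def predict(P, Q, u, i):
    """Predicted rating: dot product of user and item latent vectors."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))


def rmse(samples, P, Q):
    """Root mean squared error over (user, item, rating) triples."""
    return sqrt(sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in samples)
                / len(samples))


def train_mf(samples, n_users, n_items, k=2, epochs=200,
             lr=0.01, reg=0.02, seed=0):
    """Fit latent factors P (users) and Q (items) on the observed ratings only."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    order = list(samples)
    for _ in range(epochs):
        rng.shuffle(order)
        for u, i, r in order:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on the squared error plus L2 penalty.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q
```

Only observed ratings contribute to the updates, so a user or item with very few ratings receives few gradient steps; this is where data sparsity affects such models.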
Ji et al. [16] proposed a method for alleviating the problem of
cold start by incorporating content-based information. It first
uses a Neighborhood approach to build a tag-keyword relation
matrix according to rating data. Then, with the relation matrix, it
builds a 3-factor matrix factorization model on the rating matrix
to learn each user's interest vector for selected tags and
each item's correlation vector for extracted keywords.
Finally, it incorporates the relation matrix into the two kinds of
vectors to make recommendations. Luo et al. [15] proposed an
SVD-based model via second-order optimization and achieved higher
accuracy. They proposed a Hessian-free optimization-based latent
factor model, which can extract latent factors from the given
incomplete matrices via a second-order optimization process.
2.2. Hybrid methods
Hybrid recommender systems combine Collaborative Filtering
and Content-based techniques to solve the problem of cold-start
and to alleviate data sparsity.
The LA-LDA method [17] alleviates data sparsity by using the
spatial information of users and items and achieves good
recommendation results. This method produces recommendations by
incorporating and quantifying the influence of local public
preferences and by capturing patterns of item co-occurrence
and item location co-occurrence. The ST-LDA method [18] addresses
the problem of users' interests drifting across geographical regions
by learning region-dependent personal interests. In addition, ST-LDA
alleviates data sparsity by incorporating the crowd's preferences
and by building a social-spatial collective inference framework.
This method uses the content of POIs and social information to
alleviate data sparsity and achieves high recommendation
quality.
2.3. Social network-based methods
Recently, many researchers have sought to alleviate data
sparsity and improve recommendation quality by incorporating
additional information, such as social relationships
between users or items. The social relationship information
includes explicit information, implicit information and community
information [19]. Several such methods have been proposed in the
literature.
The social regularization (RS) method [9,20] alleviates data
sparsity by adding a social regularization term, which can be used
to incorporate the social information, into the Matrix Factorization
framework. The SH-CDL method [21] combines deep representation
learning for Point-of-interest recommendations and hierarchically