Exploit Latent Dirichlet Allocation for One-Class
Collaborative Filtering
Haijun Zhang, Zhoujun Li, Yan Chen, Xiaoming Zhang, Senzhang Wang
State Key Laboratory of Software Development Environment, Beihang University, Beijing 100191, China
haijun_cumtb@126.com, lizj@buaa.edu.cn, yanchensmile@gmail.com, yolixs@buaa.edu.cn,
szwang@cse.buaa.edu.cn.
ABSTRACT
Previous work on one-class collaborative filtering (OCCF) includes
pointwise methods, pairwise methods, and content-based methods.
These approaches rest on roughly the same fundamental assumption:
they regard all missing values as negative. However, this is
unreasonable, since the missing values are actually a mixture of
negative and positive examples. A user may give no positive
feedback on an item simply because she/he is unaware of it, even
though she/he would in fact like it. Furthermore, content-based
methods, e.g., collaborative topic regression (CTR), usually
require textual content information about items, which is not
available in some cases. In this paper, we apply the latent
Dirichlet allocation (LDA) model to the OCCF problem. Our approach
treats missing values as unknown and models only the observed
data, and it does not need content information about items. In
our model, items are regarded as words, users are regarded as
documents, and the user-item feedback matrix serves as the corpus.
Experimental results show that our proposed method outperforms the
previous methods on various ranking-oriented evaluation metrics.
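To make the mapping concrete, the following minimal Python sketch (an illustration, not the authors' code) turns one-class feedback into an LDA-style corpus: each user becomes a document, and each item the user gave positive feedback on becomes a word token in that document. The function name `feedback_to_corpus` and the toy user and item identifiers are hypothetical. Unobserved user-item pairs never enter the corpus, which mirrors the assumption that missing values are unknown rather than negative.

```python
from collections import defaultdict


def feedback_to_corpus(observed_pairs):
    """Hypothetical helper: turn one-class (user, item) feedback into a corpus.

    Each user becomes a document and each item the user interacted with
    becomes a word token in that document. Missing (user, item) pairs are
    simply absent: they are treated as unknown, not as negative.
    """
    corpus = defaultdict(list)
    for user, item in observed_pairs:
        corpus[user].append(item)
    return dict(corpus)


# Toy one-class feedback: only positive (observed) pairs are recorded.
feedback = [("u1", "i1"), ("u1", "i3"), ("u2", "i2"), ("u2", "i3"), ("u3", "i1")]
corpus = feedback_to_corpus(feedback)
print(corpus)  # {'u1': ['i1', 'i3'], 'u2': ['i2', 'i3'], 'u3': ['i1']}
```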
Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: Information
filtering - Recommendation System.
General Terms
Algorithms.
Keywords
One-class Collaborative Filtering; Latent Dirichlet Allocation;
Topic Model
1. INTRODUCTION
The goal of a recommendation system is to automatically suggest
to each user items that she/he may find appealing. Traditional
collaborative filtering approaches predict users’ interests by
mining their rating history, and these ratings are multi-valued
scores, so the task can be categorized as a “multi-class”
recommendation problem [1]. Many machine learning based
algorithms are designed to predict users’ interests from such
multi-valued data, among which matrix factorization based
algorithms have achieved great success [2].
However, in many applications, the collected user behavior data
are in “one-class” form rather than multi-class form, e.g., “like”
in Facebook, “bought” in Amazon, “collect” in Taobao and “follow”
in Sina Weibo. Such data are usually called implicit [3] or
one-class [1, 4, 5] feedback. The one-class collaborative filtering
(OCCF) problem differs from the multi-valued rating prediction
problem: the former contains only positive feedback rather than
both positive and negative feedback, and its goal is item ranking
instead of rating prediction. Traditional machine learning based
algorithms cannot be directly applied to OCCF because the data are
imbalanced [6].
A key difficulty in tackling the OCCF problem is overfitting,
because only positive feedback is observed. To avoid overfitting,
previous methods, including pointwise methods [4], pairwise methods
[1, 3] and content-based methods [7, 8], all assume that missing
data are negative. However, they thereby introduce a new
overfitting problem caused by the large number of negative
examples.
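For contrast, the sketch below builds the dense 0/1 target that results from assuming every missing value is negative, and counts how many training labels are positive versus assumed negative. It is a minimal illustration with hypothetical names (`missing_as_negative`) and toy data; it is not the exact formulation of [4] or of any other cited method, which may, for example, weight the assumed negatives.

```python
def missing_as_negative(observed_pairs, users, items):
    """Hypothetical helper: dense 0/1 labels when every missing pair is assumed negative.

    Observed (user, item) pairs get label 1; every unobserved pair gets label 0.
    Returns the label dict plus the counts of positives and assumed negatives.
    """
    observed = set(observed_pairs)
    labels = {(u, i): 1 if (u, i) in observed else 0 for u in users for i in items}
    n_pos = sum(labels.values())
    n_neg = len(labels) - n_pos
    return labels, n_pos, n_neg


users = ["u1", "u2", "u3"]
items = ["i1", "i2", "i3", "i4"]
feedback = [("u1", "i1"), ("u1", "i3"), ("u2", "i2"), ("u2", "i3"), ("u3", "i1")]
labels, n_pos, n_neg = missing_as_negative(feedback, users, items)
print(n_pos, n_neg)  # 5 7
```

Even in this toy example the assumed negatives outnumber the positives. At a hypothetical scale of 1,000 users, 10,000 items and 20 positive items per user, the same construction yields 20,000 positives against 9,980,000 assumed negatives, a ratio of 499 to 1, which illustrates the imbalance described above.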
A good learning method should fit the observed data well while
avoiding overfitting. In this paper, similar to [7, 8], we
introduce a topic model, namely latent Dirichlet allocation (LDA)
[9, 10], to deal with this problem. Our model differs from the
methods proposed in [7, 8] in two aspects: (1) we model only the
observed data, whereas the latter assume missing data are negative
and strive to fit all the data; (2) our model does not need content
information about items. Compared with the pointwise [4] and
pairwise methods [1, 3], the parameters learned by our model are
probability distributions, which inherently satisfy two
constraints: every probability is positive, and the probabilities
in each distribution sum to 1. These constraints give our model an
excellent ability to avoid overfitting. Experimental results show
that our proposed method outperforms the previous methods on
various ranking-oriented evaluation metrics.
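The two constraints can be illustrated with a minimal sketch, assuming LDA is fitted by collapsed Gibbs sampling (a standard inference method for LDA; the inference procedure and ranking rule used in this paper may differ). After sampling, the user-topic and topic-item estimates are θ_{u,k} = (n_{u,k} + α) / (n_u + Kα) and φ_{k,i} = (n_{k,i} + β) / (n_k + Nβ), where n_{u,k} counts user u's items assigned to topic k, n_u is the number of items of user u, n_{k,i} counts assignments of item i to topic k, n_k is the total count for topic k, K is the number of topics and N the number of items. Each entry is positive and each row sums to 1 by construction. Items are then ranked for user u by the hypothetical score Σ_k θ_{u,k} φ_{k,i}, i.e., p(i | u). The function names, hyperparameter values (α = 0.1, β = 0.01, 200 iterations) and the toy corpus are illustrative assumptions.

```python
import random


def lda_gibbs(corpus, n_items, n_topics=2, alpha=0.1, beta=0.01, n_iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA on a user-item corpus (illustrative sketch).

    corpus: list of documents; document u (a user) is a list of item ids in [0, n_items).
    Returns point estimates theta (user x topic) and phi (topic x item).
    """
    rng = random.Random(seed)
    n_uk = [[0] * n_topics for _ in corpus]          # topic counts per user
    n_ki = [[0] * n_items for _ in range(n_topics)]  # item counts per topic
    n_k = [0] * n_topics                             # token count per topic
    z = []                                           # topic of every token

    # Random initial topic for every observed (user, item) token.
    for u, doc in enumerate(corpus):
        z_u = []
        for i in doc:
            k = rng.randrange(n_topics)
            z_u.append(k)
            n_uk[u][k] += 1
            n_ki[k][i] += 1
            n_k[k] += 1
        z.append(z_u)

    for _ in range(n_iters):
        for u, doc in enumerate(corpus):
            for pos, i in enumerate(doc):
                k = z[u][pos]
                # Remove the current token from the counts.
                n_uk[u][k] -= 1
                n_ki[k][i] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalized.
                weights = [
                    (n_uk[u][t] + alpha) * (n_ki[t][i] + beta) / (n_k[t] + n_items * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[u][pos] = k
                n_uk[u][k] += 1
                n_ki[k][i] += 1
                n_k[k] += 1

    # Smoothed estimates: every entry is positive and every row sums to 1.
    theta = [[(n_uk[u][k] + alpha) / (len(doc) + n_topics * alpha) for k in range(n_topics)]
             for u, doc in enumerate(corpus)]
    phi = [[(n_ki[k][i] + beta) / (n_k[k] + n_items * beta) for i in range(n_items)]
           for k in range(n_topics)]
    return theta, phi


def score(theta, phi, u, i):
    """Hypothetical ranking score p(i | u) = sum_k theta[u][k] * phi[k][i]."""
    return sum(theta[u][k] * phi[k][i] for k in range(len(phi)))


# Toy corpus: 4 users (documents), 4 items (words); only observed feedback appears.
corpus = [[0, 1], [0, 1, 2], [2, 3], [3, 2]]
theta, phi = lda_gibbs(corpus, n_items=4)
ranking_for_user_0 = sorted(range(4), key=lambda i: score(theta, phi, 0, i), reverse=True)
print(ranking_for_user_0)
```

Only observed tokens enter the sampler, so no missing pair is ever treated as a negative example; items a user has not interacted with still receive a positive score through the topic-item distributions, which is what allows them to be ranked.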
The rest of the paper is organized as follows. In the next section,
we review previous work related to the OCCF problem. In Section 3,
we propose our approach for the OCCF problem. In Section 4, we
empirically compare our method with state-of-the-art methods on
three real-world data sets. Finally, we conclude the paper and
discuss future work.
2. RELATED WORKS
In this section, we review several state-of-the-art approaches
proposed for the OCCF problem. These approaches fall mainly into
three types: (1) pointwise methods, (2) pairwise methods, and (3)
content-based methods.
2.1 Pointwise Methods
Pointwise methods treat implicit feedback as absolute preference
scores. For example, an observed user-item pair is interpreted as
positive feedback and is assigned a high absolute rating
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are not
made or distributed for profit or commercial advantage and that copies bear
this notice and the full citation on the first page. Copyrights for components of
this work owned by others than ACM must be honored. Abstracting with
credit is permitted. To copy otherwise, or republish, to post on servers or to
redistribute to lists, requires prior specific permission and/or a fee. Request
permissions from Permissions@acm.org.
CIKM '14, November 3-7, 2014, Shanghai, China.
Copyright 2014 ACM 978-1-4503-2598-1/14/11…$15.00.
http://dx.doi.org/10.1145/2661829.2661992