Using Co-clustering for Predicting Movie Rating in Netflix
Tuyen Huynh and Duy Vu
1. INTRODUCTION
Prediction of missing values has recently found many prac-
tical applications in sales forecasting and recommendation
systems. It is a main task in information filtering where
the goal is to identify the relevance of an item such as a
movie or a book to a given user based on the user profile
and/or known preferences of the other users. We can also
view these input data as a matrix where each row represents
for the ratings of one user and one column corresponds to
a movie. The task now is to impute missing entries of the
matrix.
The recent study on Bregman co-clustering [1][3] has pro-
posed a method for simultaneously partitioning rows and
columns of such a data matrix and then using clustering
results to predict those missing entries. By incorporating
both row and column clustering information, a kind of sta-
tistical regularization technique, co-clustering can yield bet-
ter quality clusters even only single-sided clustering results
are needed. In addition, co-clustering is more scalable than
traditional single-sided clustering [1].
In this project, we explore the combination of co-clustering
andothermethodsformissingvaluepredictiononapartic-
ular subset of the original Netflix dataset. The original is a
huge dataset which contains over 100 million of ratings from
480 thousands Netflix users over 17 thousand movies. Each
rating is on a scale from 1 to 5. Since this dataset is too
large, a subset of it is sampled for the pilot study. In this
project based on the subset dataset we would like to initially
answer two following questions. The first one is how well the
Bregman co-clustering performs on the Netflix dataset? The
second one is whether we can use the co-clustering as an in-
termediate step for partitioning the data into blocks, then
applying some expensive but more accurate methods such
as using SVD on co-clusters of data to achieve higher perfor-
mance? The intuition of this idea is that the co-clustering
method can capture the favorite relations between groups of
users and groups of movies. Therefore, this approach does
not only make the dataset scalable to SVD, but it may also
help to improve the predicting performance.
2. METHODS FOR PREDICTING MOVIE
RATING
2.1 Missing Value Prediction Using Co-clustering
In [2],[4], it has been shown that the missing value pre-
diction problem can be formulated as a weighted matrix
approximation problem where the weights are 1’s for known
values and 0’s for unknown ones, then we can use co-clustering
for finding the best matrix approximation based on some cri-
teria. The assumption is that the original matrix has a low
parameter structure involving properties of row and column
clusters. By minimizing a desired loss function, co-clustering
can find that low parameter structure and this structure is
used for reconstructing the approximate matrix. Let Z be
the random variable that takes values in the matrix, U and
V be discrete random variables whose values are the row
and column indices respectively, and
ˆ
U and
ˆ
V be discrete
random variables which takes values on the row cluster and
column cluster indices. Then [2] shows that, for a given
co-clustering, there are only six distinct sets of summary
statistics or co-clustering bases which can be used for ap-
proximating the matrix
ˆ
Z:
C
1
= {{
ˆ
U }, {
ˆ
V }} C
2
= {{
ˆ
U,
ˆ
V }}
C
3
= {{
ˆ
U,
ˆ
V }, {
ˆ
U }} C
4
= {{
ˆ
U,
ˆ
V }, {
ˆ
V }}
C
5
= {{
ˆ
U,
ˆ
V }, {
ˆ
U }, {
ˆ
V }} C
6
= {{
ˆ
U,V }, {U,
ˆ
V }}
We also call these co-clustering bases as schemes. More-
over, using the Minimum Bregman Information (MBI), we
can find the best approximation matrix
ˆ
Z for a given co-
clustering, a given co-clustering basis, and a given Bregman
divergence. Table 1 and 2 present the best approximation
solutions of each basis for squared Euclidean distance and
I-divergence, where the expectation in those formulae are
interpreted as follows:
• E[Z]: the average value of the entire matrix
• E[Z|U ]andE[Z|V ]: row averages and column aver-
ages respectively
• E[Z|
ˆ
U ]andE[Z|
ˆ
V ]: row cluster averages and column
cluster averages respectively
• E[Z|U,
ˆ
V ]andE[Z|
ˆ
U,V ]: row column cluster averages
and column row cluster averages respectively. In other
words, they are the average of each row in a column
cluster and the average of each column in a row cluster.
• E[Z|
ˆ
U,
ˆ
V ]: co-cluster averages
2.2 Missing value prediction based on Singu-
lar Value Decomposition (SVD)
SVD is a popular matrix factorization technique which has
been used a lot in data mining, especially in dimensionality
reduction. Given a m × n matrix R, SVD decomposes that
matrix into:
R = U · S · V