Deep Learning based Recommender System: A Survey and New Perspectives • 1:5
incepted in Computer Vision and Natural Language Processing domains. However, it has also been an
emerging trend in deep recommender system research.
• Deep Reinforcement Learning (DRL) [106]. Reinforcement learning operates on a trial-and-error paradigm.
The whole framework mainly consists of the following components: agents, environments, states, actions
and rewards. The combination of deep neural networks and reinforcement learning forms DRL, which
has achieved human-level performance across multiple domains such as games and self-driving cars. Deep
neural networks enable the agent to acquire knowledge from raw data and derive efficient representations
without handcrafted features and domain heuristics.
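The agent–environment loop described above can be sketched in a few lines. The two-armed bandit environment and the tabular, epsilon-greedy update rule below are illustrative assumptions only; a deep RL agent would replace the value table with a neural network mapping raw states to action values.

```python
import random

# Minimal trial-and-error loop: a 2-armed bandit environment and an
# epsilon-greedy agent with a tabular value estimate (illustrative
# sketch, not a system from the survey).
random.seed(0)

reward_prob = {0: 0.2, 1: 0.8}     # hidden reward probability per action
q = {0: 0.0, 1: 0.0}               # agent's value estimates
counts = {0: 0, 1: 0}
epsilon = 0.1                      # exploration rate

for step in range(2000):
    # action selection: explore with probability epsilon, else exploit
    if random.random() < epsilon:
        action = random.choice([0, 1])
    else:
        action = max(q, key=q.get)
    # the environment returns a reward signal
    reward = 1.0 if random.random() < reward_prob[action] else 0.0
    # incremental update of the value estimate (sample average)
    counts[action] += 1
    q[action] += (reward - q[action]) / counts[action]

print(q[1] > q[0])  # the agent should learn that arm 1 pays more often
```

In DRL the same loop structure is kept, but the value estimate `q` becomes a learned function of a high-dimensional state.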
Note that numerous advanced models emerge each year; here we only briefly list some important
ones. Readers who are interested in the details or in more advanced models are referred to [45].
2.3 Why Deep Neural Networks for Recommendation?
Before diving into the details of recent advances, it is beneficial to understand the reasons for applying deep
learning techniques to recommender systems. It is evident that numerous deep recommender systems have
been proposed in a short span of several years. The field is indeed bustling with innovation. At this point, it
would be easy to question the need for so many different architectures and possibly even the utility of neural
networks for the problem domain. Along the same tangent, it would be apt to provide a clear rationale for
each proposed architecture and the scenarios in which it would be most beneficial. All in all, this question is highly
relevant to the issue of tasks, domains and recommender scenarios. One of the most attractive properties of neural
architectures is that they are (1) end-to-end differentiable and (2) provide suitable inductive biases catered to the
input data type. As such, if there is an inherent structure that the model can exploit, then deep neural networks
ought to be useful. For instance, CNNs and RNNs have long exploited the intrinsic structure in vision and
human language. Similarly, the sequential structure of sessions or click-logs is highly suitable for the inductive
biases provided by recurrent/convolutional models [56, 143, 175].
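As an illustration of this framing, a session click-log maps naturally onto next-item prediction, which is the supervision signal recurrent session-based recommenders are trained on. The windowing scheme and item IDs below are a generic sketch, not the exact preprocessing of any cited system.

```python
# Turn an ordered session (click-log) into (prefix, next-item) training
# pairs -- the targets a recurrent model such as a GRU would learn to
# predict. Item IDs here are hypothetical.
def session_to_examples(session):
    """Yield (prefix, target) pairs for next-item prediction."""
    return [(session[:i], session[i]) for i in range(1, len(session))]

clicks = [42, 7, 7, 13, 99]  # one user session, in click order
examples = session_to_examples(clicks)
for prefix, target in examples:
    print(prefix, "->", target)
# e.g. [42] -> 7, then [42, 7] -> 7, and so on
```

The temporal ordering is exactly the structure a recurrent or causal-convolutional model's inductive bias exploits.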
Moreover, deep neural networks are also composite in the sense that multiple neural building blocks can be
composed into a single (gigantic) differentiable function and trained end-to-end. This is a key advantage when
dealing with content-based recommendation, which is inevitable when modeling users/items on the web, where
multi-modal data is commonplace. For instance, when dealing with textual data (reviews [202], tweets [44],
etc.) or image data (social posts, product images), CNNs/RNNs become indispensable neural building blocks. Here,
the traditional alternative (designing modality-specific features, etc.) becomes significantly less attractive and,
consequently, the recommender system cannot take advantage of joint (end-to-end) representation learning. In
some sense, developments in the field of recommender systems are also tightly coupled with research advances in
related modalities (such as the vision or language communities). For example, to process reviews, one would have to
perform costly preprocessing (e.g., keyphrase extraction, topic modeling, etc.), whilst newer deep learning-based
approaches are able to ingest all textual information end-to-end [202]. All in all, the capabilities of deep learning
in this aspect can be regarded as paradigm-shifting, and the ability to represent images, text and interactions in a
unified joint framework [197] is not possible without these recent advances.
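The joint-framework idea can be sketched as follows: modality-specific encoders produce vectors that are concatenated and projected into one shared, end-to-end trainable space. The encoders here are stand-ins (random features and a random projection); in practice they would be a CNN over the image and an RNN/CNN over the text, with all weights learned jointly.

```python
import numpy as np

# Sketch of a joint multi-modal item representation (assumed dimensions).
rng = np.random.default_rng(0)

text_features = rng.normal(size=300)    # stand-in for a text encoder output
image_features = rng.normal(size=512)   # stand-in for an image CNN output

# shared projection into one 128-d space; learned end-to-end in practice
W_joint = rng.normal(size=(128, 300 + 512)) * 0.01
joint = np.tanh(W_joint @ np.concatenate([text_features, image_features]))

print(joint.shape)  # one unified 128-d item representation
```

Because every component is differentiable, gradients from the recommendation loss flow back through `W_joint` into both encoders, which is what "joint (end-to-end) representation learning" refers to.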
Pertaining to the interaction-only setting (i.e., the matrix completion or collaborative ranking problem), the key
idea is that deep neural networks are justified when there is a large amount of complexity or a large number
of training instances. In [53], the authors used an MLP to approximate the interaction function
and showed reasonable performance gains over traditional methods such as MF. While these neural models
perform better, we also note that standard machine learning models such as BPR, MF and CML are known to
perform reasonably well when trained with momentum-based gradient descent on interaction-only data [145].
However, these models can themselves be considered neural architectures, since they take advantage of
recent deep learning advances such as Adam, Dropout or Batch Normalization [53, 195]. It is also easy to see that
traditional recommender algorithms (matrix factorization, factorization machines, etc.) can also be expressed
ACM Computing Surveys, Vol. 1, No. 1, Article 1. Publication date: July 2018.