1. They use a stochastic gradient (SG) method to solve the
optimization problem. To avoid over-fitting, they train
with only one epoch.
2. FFM performs best among the six models they tried.
In this paper, we aim to concretely establish FFM as an
effective approach for CTR prediction. Our major results
are as follows.
• Though FFM is shown to be effective in [8], that work
may be the only published study of applying FFMs to
CTR prediction problems. To further demonstrate the
effectiveness of FFMs on CTR prediction, we present the
use of FFM as our major model to win two worldwide
CTR competitions hosted by Criteo and Avazu.
• We compare FFMs with two related models, Poly2 and
FMs. We first discuss conceptually why FFMs might be
better than them, and then conduct experiments to compare
their accuracy and training time.
• We present techniques for training FFMs. They include
an effective parallel optimization algorithm for FFMs and
the use of early-stopping to avoid over-fitting.
• To make FFMs available for public use, we release an
open-source software package.
This paper is organized as follows. Before we present
FFMs and their implementation in Section 3, we discuss the
two existing models Poly2 and FMs in Section 2. Experiments
comparing FFMs with other models are given in Section 4.
Finally, conclusions and future directions are in Section 5.
Code used for experiments in this paper and the package
LIBFFM are respectively available at:
http://www.csie.ntu.edu.tw/~cjlin/ffm/exps
http://www.csie.ntu.edu.tw/~cjlin/libffm
2. POLY2 AND FM
Chang et al. [4] have shown that a degree-2 polynomial
mapping can often effectively capture the information of
feature conjunctions. Further, they show that by applying a
linear model on the explicit form of degree-2 mappings,
training and testing can be much faster than using kernel
methods. This approach, referred to as Poly2, learns a
weight for each feature pair:
$$\phi_{\text{Poly2}}(\boldsymbol{w}, \boldsymbol{x}) = \sum_{j_1=1}^{n} \sum_{j_2=j_1+1}^{n} w_{h(j_1, j_2)}\, x_{j_1} x_{j_2}, \qquad (2)$$
where $h(j_1, j_2)$ is a function encoding $j_1$ and $j_2$ into a natural
number. The complexity of computing (2) is $O(\bar{n}^2)$, where
$\bar{n}$ is the average number of non-zero elements per instance.
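To make the cost of (2) concrete, the following is a minimal Python sketch of evaluating the Poly2 mapping for one sparse instance. The names (weights, feats, h) and the list-of-pairs representation are illustrative only, not taken from LIBFFM or any other package.

```python
# Minimal sketch (illustrative, not from LIBFFM or VW): evaluate phi_Poly2
# in (2) for one sparse instance given as a list of (index j, value x_j).

def phi_poly2(weights, feats, h, B):
    """weights: length-B array of pair weights; h: a function mapping a
    feature pair (j1, j2) into {0, ..., B-1}.  Cost is O(n_bar^2) per
    instance, where n_bar is the number of non-zeros in `feats`."""
    total = 0.0
    for a in range(len(feats)):
        j1, x1 = feats[a]
        for b in range(a + 1, len(feats)):
            j2, x2 = feats[b]
            total += weights[h(j1, j2, B)] * x1 * x2
    return total
```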
FMs, proposed in [6], implicitly learn a latent vector for
each feature. Each latent vector contains k latent factors,
where k is a user-specified parameter. Then, the effect of
feature conjunction is modelled by the inner product of two
latent vectors:
$$\phi_{\text{FM}}(\boldsymbol{w}, \boldsymbol{x}) = \sum_{j_1=1}^{n} \sum_{j_2=j_1+1}^{n} (\boldsymbol{w}_{j_1} \cdot \boldsymbol{w}_{j_2})\, x_{j_1} x_{j_2}. \qquad (3)$$
The number of variables is $n \times k$, so directly computing (3)
costs $O(\bar{n}^2 k)$ time. Following [6], by re-writing (3) to
$$\phi_{\text{FM}}(\boldsymbol{w}, \boldsymbol{x}) = \frac{1}{2} \sum_{j=1}^{n} (\boldsymbol{s} - \boldsymbol{w}_{j} x_{j}) \cdot \boldsymbol{w}_{j} x_{j},$$
where
$$\boldsymbol{s} = \sum_{j'=1}^{n} \boldsymbol{w}_{j'} x_{j'},$$
the complexity is reduced to $O(\bar{n}k)$.
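This re-writing is what makes FMs practical on sparse data. As a rough illustration, a Python sketch of the $O(\bar{n}k)$ evaluation might look as follows; the names (W, feats, phi_fm) are illustrative and not the notation of [6] or of any released implementation.

```python
import numpy as np

# Sketch of evaluating phi_FM through the O(n_bar * k) re-writing above.
# W is an (n, k) array whose row j is the latent vector w_j; feats lists
# the (j, x_j) pairs of the non-zero features of one instance.

def phi_fm(W, feats):
    s = np.zeros(W.shape[1])
    for j, x in feats:                 # first pass: s = sum_j w_j * x_j
        s += W[j] * x
    total = 0.0
    for j, x in feats:                 # second pass: per-feature terms
        wx = W[j] * x
        total += np.dot(s - wx, wx)
    return 0.5 * total                 # equals the pairwise sum in (3)
```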
Rendle [6] explains why FMs can be better than Poly2
when the data set is sparse. Here we give a similar illustration
using the data set in Table 1. For example, there
is only one negative training instance for the pair (ESPN, Adidas).
For Poly2, a very negative weight $w_{\text{ESPN,Adidas}}$ might
be learned for this pair. For FMs, because the prediction of
(ESPN, Adidas) is determined by $\boldsymbol{w}_{\text{ESPN}} \cdot \boldsymbol{w}_{\text{Adidas}}$, and
because $\boldsymbol{w}_{\text{ESPN}}$ and $\boldsymbol{w}_{\text{Adidas}}$ are also learned from other pairs
(e.g., (ESPN, Nike) and (NBC, Adidas)), the prediction may be
more accurate. Another example is that there is no training
data for the pair (NBC, Gucci). For Poly2, the prediction on
this pair is trivial, but for FMs, because $\boldsymbol{w}_{\text{NBC}}$ and $\boldsymbol{w}_{\text{Gucci}}$
can be learned from other pairs, it is still possible to make a
meaningful prediction.
Note that in Poly2, a naive way to implement $h(j_1, j_2)$
is to consider every pair of features as a new feature [4].¹
This approach requires a model as large as $O(n^2)$, which is
usually impractical for CTR prediction because of the very large
$n$. Vowpal Wabbit (VW) [9], a widely used machine learning
package, solves this problem by hashing $j_1$ and $j_2$.² Our
implementation is similar to VW's approach. Specifically,
$$h(j_1, j_2) = \left( \tfrac{1}{2}(j_1 + j_2)(j_1 + j_2 + 1) + j_2 \right) \bmod B,$$
where the model size $B$ is a user-specified parameter.
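For reference, the inner expression $\tfrac{1}{2}(j_1 + j_2)(j_1 + j_2 + 1) + j_2$ is the Cantor pairing function, so distinct pairs map to distinct natural numbers and collide only through the final mod $B$. A minimal Python sketch of this hashing function:

```python
def h(j1, j2, B):
    # Cantor pairing of (j1, j2) followed by "mod B"; B is the user-specified
    # model size. (j1 + j2)(j1 + j2 + 1) is always even, so integer division
    # by 2 is exact.
    return ((j1 + j2) * (j1 + j2 + 1) // 2 + j2) % B
```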
In this paper, for simplicity of formulation, we do not
include linear or bias terms. However, in Section 4,
we include them for some experiments.
3. FFM
The idea of FFM originates from PITF [7], proposed for
recommender systems with personalized tags. PITF
assumes three available fields, User, Item, and Tag,
and factorizes (User, Item), (User, Tag), and (Item, Tag) in
separate latent spaces. In [8], PITF is generalized to more
fields (e.g., AdID, AdvertiserID, UserID, QueryID) and
effectively applied to CTR prediction. Because [7] aims at
recommender systems and is limited to three specific fields
(User, Item, and Tag), and [8] lacks detailed discussion on
FFM, in this section we provide a more comprehensive study
of FFMs on CTR prediction. For most CTR data sets like
that in Table 1, “features” can be grouped into “fields.” In
our example, the three features ESPN, Vogue, and NBC belong
to the field Publisher, and the other three features Nike,
Gucci, and Adidas belong to the field Advertiser. FFM is
a variant of FM that utilizes this information. To explain
how FFM works, we consider the following new example:
Clicked | Publisher (P) | Advertiser (A) | Gender (G)
Yes     | ESPN          | Nike           | Male
Recall that for FMs, $\phi_{\text{FM}}(\boldsymbol{w}, \boldsymbol{x})$ is
$$\boldsymbol{w}_{\text{ESPN}} \cdot \boldsymbol{w}_{\text{Nike}} + \boldsymbol{w}_{\text{ESPN}} \cdot \boldsymbol{w}_{\text{Male}} + \boldsymbol{w}_{\text{Nike}} \cdot \boldsymbol{w}_{\text{Male}}.$$
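For concreteness, a tiny self-contained Python sketch of these three inner-product terms for the example row; the one-hot indices, the choice of $k$, and the random latent vectors are illustrative only.

```python
import numpy as np

# Illustrative one-hot indices for the example row's features.
ESPN, Nike, Male = 0, 1, 2
k = 4                                   # number of latent factors (arbitrary)
W = np.random.rand(3, k)                # one latent vector per feature

phi = (W[ESPN].dot(W[Nike])             # Publisher-Advertiser interaction
       + W[ESPN].dot(W[Male])           # Publisher-Gender interaction
       + W[Nike].dot(W[Male]))          # Advertiser-Gender interaction
```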
¹ More precisely, [4] includes the original features as well,
though we do not consider such a setting until the experiments.
² See http://github.com/JohnLangford/vowpal_wabbit/
wiki/Feature-interactions for details.