The Learning Behind Gmail Priority Inbox
Douglas Aberdeen Ondrej Pacovsky Andrew Slater
Google Inc.
Zurich, Switzerland
{daa,ondrej,aws}@google.com
Abstract
The Priority Inbox feature of Gmail ranks mail by the probability that the user
will perform an action on that mail. Because “importance” is highly personal,
we try to predict it by learning a per-user statistical model, updated as frequently
as possible. This research note describes the challenges of online learning over
millions of models, and the solutions adopted.
1 The Gmail Priority Inbox
Many Gmail users receive tens or hundreds of mails per day. The Priority Inbox attempts to alleviate
such information overload by learning a per-user statistical model of importance, and ranking mail
by how likely the user is to act on that mail. This is not a new problem [3, 4]; however, doing this
at scale, with real-time ranking and near-online updating of millions of models per day,
significantly complicates it. The challenges include inferring the importance of mail without
explicit user labelling; finding learning methods that deal with non-stationary and noisy training
data; constructing models that reduce training data requirements; storing and processing terabytes
of per-user feature data; and finally, predicting in a distributed and fault-tolerant way.
While ideas were borrowed from the application of ML in Gmail spam detection [6], importance
ranking is harder as users disagree on what is important, requiring a high degree of personalization.
The result is one of the largest and most user-facing applications of ML at Google.
2 The Learning Problem
2.1 Features
There are many hundreds of features falling into a few categories. Social features are based on the degree
of interaction between sender and recipient, e.g. the percentage of a sender’s mail that is read by the
recipient. Content features attempt to identify headers and recent terms that are highly correlated
with the recipient acting (or not) on the mail, e.g. the presence of a recent term in the subject. Recent
user terms are discovered as a pre-processing step prior to learning. Thread features note the user’s
interaction with the thread so far, e.g. if a user began a thread. Label features examine the labels that
the user applies to mail using filters. We calculate feature values during ranking and we temporarily
store those values for later learning. Continuous features are automatically partitioned into binary
features using a simple ID3-style algorithm on the histogram of the feature values.
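
The partitioning of continuous features can be illustrated with a minimal sketch. The text says only that a simple ID3-style algorithm runs on the histogram of feature values, so everything below is an assumption made for illustration: the histogram is taken to hold (acted, not acted) counts per bin, cuts are chosen by information gain, and the names entropy, best_split, split_points, binarize and the min_gain threshold are hypothetical, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

# Each histogram bin holds (positive, negative) label counts, e.g. the number
# of mails the user acted on vs. ignored for feature values in that bin.
Bin = Tuple[int, int]


def entropy(pos: int, neg: int) -> float:
    """Binary entropy in bits of a set with the given label counts."""
    total = pos + neg
    if pos == 0 or neg == 0:
        return 0.0
    p = pos / total
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def best_split(hist: Sequence[Bin]) -> Tuple[int, float]:
    """Return (cut, gain); bins [0, cut) go left. cut == 0 means no useful cut."""
    total_pos = sum(p for p, _ in hist)
    total_neg = sum(n for _, n in hist)
    total = total_pos + total_neg
    base = entropy(total_pos, total_neg)
    best_cut, best_gain = 0, 0.0
    left_pos = left_neg = 0
    for cut in range(1, len(hist)):
        left_pos += hist[cut - 1][0]
        left_neg += hist[cut - 1][1]
        left = left_pos + left_neg
        right = total - left
        if left == 0 or right == 0:
            continue
        # Weighted entropy of the two sides after cutting here.
        cond = (left / total) * entropy(left_pos, left_neg) + (
            right / total
        ) * entropy(total_pos - left_pos, total_neg - left_neg)
        gain = base - cond
        if gain > best_gain:
            best_cut, best_gain = cut, gain
    return best_cut, best_gain


def split_points(hist: Sequence[Bin], min_gain: float = 0.01,
                 offset: int = 0) -> List[int]:
    """Recursively collect cut indices (in original bin numbering), ID3 style."""
    cut, gain = best_split(hist)
    if cut == 0 or gain < min_gain:  # min_gain is an assumed stopping rule
        return []
    return (
        split_points(hist[:cut], min_gain, offset)
        + [offset + cut]
        + split_points(hist[cut:], min_gain, offset + cut)
    )


def binarize(bin_index: int, cuts: Sequence[int]) -> List[int]:
    """One-hot vector over the intervals defined by the sorted cuts."""
    interval = sum(1 for c in cuts if bin_index >= c)
    return [1 if k == interval else 0 for k in range(len(cuts) + 1)]
```

For example, with a four-bin histogram in which the user never acts on mail in the two lowest bins and always acts in the two highest, split_points finds a single cut between bins 1 and 2, and binarize maps a value in bin 3 to the second of two binary features.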
2.2 Importance Metric
A goal of Priority Inbox is to rank without explicit labelling from the user, allowing the system
to work “out-of-the-box”. Importance ground truth is based on how the user interacts with a mail
after delivery. Our goal is to predict the probability that the user will interact with the mail within