The Learning Behind Gmail Priority Inbox

Douglas Aberdeen Ondrej Pacovsky Andrew Slater

Google Inc.

Zurich, Switzerland

{daa,ondrej,aws}@google.com

Abstract

The Priority Inbox feature of Gmail ranks mail by the probability that the user will perform an action on that mail. Because “importance” is highly personal, we try to predict it by learning a per-user statistical model, updated as frequently as possible. This research note describes the challenges of online learning over millions of models, and the solutions adopted.

1 The Gmail Priority Inbox

Many Gmail users receive tens or hundreds of mails per day. The Priority Inbox attempts to alleviate such information overload by learning a per-user statistical model of importance, and ranking mail by how likely the user is to act on that mail. This is not a new problem [3, 4]; however, doing this at scale, with real-time ranking and near-online updating of millions of models per day, significantly complicates it. The challenges include inferring the importance of mail without explicit user labelling; finding learning methods that deal with non-stationary and noisy training data; constructing models that reduce training data requirements; storing and processing terabytes of per-user feature data; and finally, predicting in a distributed and fault-tolerant way.

While ideas were borrowed from the application of ML in Gmail spam detection [6], importance ranking is harder because users disagree on what is important, which requires a high degree of personalization. The result is one of the largest and most user-facing applications of ML at Google.

2 The Learning Problem

2.1 Features

There are many hundreds of features, falling into a few categories. Social features are based on the degree of interaction between sender and recipient, e.g. the percentage of a sender’s mail that is read by the recipient. Content features attempt to identify headers and recent terms that are highly correlated with the recipient acting (or not) on the mail, e.g. the presence of a recent term in the subject. Recent user terms are discovered as a pre-processing step prior to learning. Thread features note the user’s interaction with the thread so far, e.g. whether the user began the thread. Label features examine the labels that the user applies to mail using filters. We calculate feature values during ranking and temporarily store those values for later learning. Continuous features are automatically partitioned into binary features using a simple ID3-style algorithm on the histogram of the feature values.
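The partitioning of a continuous feature into binary features can be illustrated with a small sketch. The text does not give the details of the algorithm, so everything below is an assumption made for illustration: the function names (`entropy`, `best_split`, `find_thresholds`, `binarize`), the use of information gain on binary "acted / did not act" labels, the recursion depth limit, and the one-hot bucket encoding are all hypothetical choices, not the authors' implementation. The continuous feature could be, for example, the fraction of a sender's mail that the recipient has read.

```python
import math
from bisect import bisect_right


def entropy(pos, neg):
    """Binary entropy in bits of a bucket holding pos positive and neg negative examples."""
    total = pos + neg
    if pos == 0 or neg == 0:
        return 0.0  # a pure (or empty) bucket carries no uncertainty
    p = pos / total
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def best_split(points):
    """Return (gain, threshold) for the single cut with the highest information gain.

    points: list of (value, label) pairs sorted by value, with label in {0, 1}.
    Returns None when no cut yields a positive gain.
    """
    n = len(points)
    total_pos = sum(label for _, label in points)
    base = entropy(total_pos, n - total_pos)
    best = None
    left_pos = 0
    for i in range(1, n):
        left_pos += points[i - 1][1]
        if points[i - 1][0] == points[i][0]:
            continue  # cannot place a cut between two equal values
        left_neg = i - left_pos
        right_pos = total_pos - left_pos
        right_neg = (n - i) - right_pos
        remainder = (i / n) * entropy(left_pos, left_neg) \
            + ((n - i) / n) * entropy(right_pos, right_neg)
        gain = base - remainder
        if gain > 1e-12 and (best is None or gain > best[0]):
            best = (gain, (points[i - 1][0] + points[i][0]) / 2)
    return best


def find_thresholds(points, max_depth=2):
    """Recursively split the value range, ID3-style, and return sorted cut points."""
    points = sorted(points)
    if max_depth == 0 or len(points) < 2:
        return []
    split = best_split(points)
    if split is None:
        return []
    threshold = split[1]
    left = [p for p in points if p[0] <= threshold]
    right = [p for p in points if p[0] > threshold]
    return (find_thresholds(left, max_depth - 1)
            + [threshold]
            + find_thresholds(right, max_depth - 1))


def binarize(value, thresholds):
    """Encode a continuous value as one binary indicator per bucket."""
    bucket = bisect_right(thresholds, value)
    return [1 if i == bucket else 0 for i in range(len(thresholds) + 1)]


# Hypothetical training data: (fraction of the sender's mail read, user acted?)
samples = [(0.0, 0), (0.125, 0), (0.25, 0), (0.75, 1), (0.875, 1), (1.0, 1)]
cuts = find_thresholds(samples)
print(cuts, binarize(0.9, cuts))
```

In this sketch each bucket becomes its own binary feature, so a downstream linear model can assign a separate weight to "rarely read sender" and "frequently read sender" rather than a single weight on the raw fraction.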

2.2 Importance Metric

A goal of Priority Inbox is to rank without explicit labelling from the user, allowing the system to work “out-of-the-box”. Importance ground truth is based on how the user interacts with a mail after delivery. Our goal is to predict the probability that the user will interact with the mail within
