Active Learning Literature Survey
Burr Settles
Computer Sciences Technical Report 1648
University of Wisconsin–Madison
Updated on: January 26, 2010
Abstract
The key idea behind active learning is that a machine learning algorithm can
achieve greater accuracy with fewer training labels if it is allowed to choose the
data from which it learns. An active learner may pose queries, usually in the form
of unlabeled data instances to be labeled by an oracle (e.g., a human annotator).
Active learning is well-motivated in many modern machine learning problems,
where unlabeled data may be abundant or easily obtained, but labels are difficult,
time-consuming, or expensive to obtain.
This report provides a general introduction to active learning and a survey of
the literature. This includes a discussion of the scenarios in which queries can
be formulated, and an overview of the query strategy frameworks proposed in
the literature to date. An analysis of the empirical and theoretical evidence for
successful active learning, a summary of problem setting variants and practical
issues, and a discussion of related topics in machine learning research are also
presented.
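As a minimal sketch of the query loop described above, the following toy example (not taken from this report) lets a learner choose which pool instances an oracle labels. It assumes a one-dimensional pool and a simple threshold classifier. The names `oracle` and `active_learn_threshold`, the threshold value 0.37, and the evenly spaced pool are all hypothetical choices made for illustration.

```python
def oracle(x, threshold=0.37):
    """Hypothetical annotator: returns the true label of instance x."""
    return x >= threshold


def active_learn_threshold(pool, ask, budget):
    """Toy pool-based active learner for a 1-D threshold classifier.

    Tracks the interval (lo, hi] known to contain the decision boundary
    and always queries the unlabeled instance closest to its midpoint,
    i.e. the instance the current model is least certain about.
    Assumes the true boundary lies in (0, 1].
    """
    lo, hi = 0.0, 1.0
    unlabeled = sorted(pool)
    queries = 0
    while queries < budget:
        # Only instances strictly inside (lo, hi) can still be informative.
        candidates = [x for x in unlabeled if lo < x < hi]
        if not candidates:
            break
        mid = (lo + hi) / 2
        x = min(candidates, key=lambda c: abs(c - mid))
        unlabeled.remove(x)
        queries += 1
        if ask(x):
            hi = x  # x is positive, so the boundary is at or below x
        else:
            lo = x  # x is negative, so the boundary is above x
    return (lo + hi) / 2, queries


# Evenly spaced pool of 999 unlabeled instances in (0, 1).
pool = [i / 1000 for i in range(1, 1000)]
estimate, queries = active_learn_threshold(pool, oracle, budget=len(pool))
print(f"estimated threshold {estimate:.4f} after {queries} label queries")
```

Under these assumptions the learner separates the whole pool correctly after requesting on the order of log2(n) labels, whereas labeling the pool exhaustively would take n. This is only meant to illustrate why choosing the queries can reduce the number of labels needed; realistic settings involve noisy oracles and richer model classes.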
Contents
1 Introduction
1.1 What is Active Learning?
1.2 Active Learning Examples
1.3 Further Reading
2 Scenarios
2.1 Membership Query Synthesis
2.2 Stream-Based Selective Sampling
2.3 Pool-Based Sampling
3 Query Strategy Frameworks
3.1 Uncertainty Sampling
3.2 Query-By-Committee
3.3 Expected Model Change
3.4 Expected Error Reduction
3.5 Variance Reduction
3.6 Density-Weighted Methods
4 Analysis of Active Learning
4.1 Empirical Analysis
4.2 Theoretical Analysis
5 Problem Setting Variants
5.1 Active Learning for Structured Outputs
5.2 Active Feature Acquisition and Classification
5.3 Active Class Selection
5.4 Active Clustering
6 Practical Considerations
6.1 Batch-Mode Active Learning
6.2 Noisy Oracles
6.3 Variable Labeling Costs
6.4 Alternative Query Types
6.5 Multi-Task Active Learning
6.6 Changing (or Unknown) Model Classes
6.7 Stopping Criteria
7 Related Research Areas
7.1 Semi-Supervised Learning
7.2 Reinforcement Learning
7.3 Submodular Optimization
7.4 Equivalence Query Learning
7.5 Model Parroting and Compression
8 Conclusion and Final Thoughts
Bibliography
1 Introduction
This report provides a general review of the literature on active learning. There
have been a host of algorithms and applications for learning with queries over
the years, and this document is an attempt to distill the core ideas, methods, and
applications that have been considered by the machine learning community. To
make this survey more useful in the long term, an online version will be updated
and maintained indefinitely at:
http://active-learning.net/
When referring to this document, I recommend using the following citation:
Burr Settles. Active Learning Literature Survey. Computer Sciences
Technical Report 1648, University of Wisconsin–Madison. 2009.
An appropriate BibTeX entry is:
@techreport{settles.tr09,
Author = {Burr Settles},
Institution = {University of Wisconsin--Madison},
Number = {1648},
Title = {Active Learning Literature Survey},
Type = {Computer Sciences Technical Report},
Year = {2009},
}
This document is written for a machine learning audience, and assumes the reader
has a working knowledge of supervised learning algorithms (particularly
statistical methods). For a good introduction to general machine learning, I recommend
Mitchell (1997) or Duda et al. (2001). I have strived to make this review as
comprehensive as possible, but it is by no means complete. My own research deals
primarily with applications in natural language processing and bioinformatics, thus
much of the empirical active learning work I am familiar with is in these areas.
Active learning (like so many subfields in computer science) is rapidly growing
and evolving in a myriad of directions, so it is difficult for one person to provide
an exhaustive summary. I apologize for any oversights or inaccuracies, and
encourage interested readers to submit additions, comments, and corrections to me
at: bsettles@cs.cmu.edu.