Large Graph Construction for Scalable Semi-Supervised Learning

Wei Liu wliu@ee.columbia.edu

Junfeng He jh2700@columbia.edu

Shih-Fu Chang sfchang@ee.columbia.edu

Department of Electrical Engineering, Columbia University, New York, NY 10027, USA

Abstract

In this paper, we address the scalability issue plaguing graph-based semi-supervised learning via a small number of anchor points which adequately cover the entire point cloud. Critically, these anchor points enable nonparametric regression that predicts the label for each data point as a locally weighted average of the labels on anchor points. Because conventional graph construction is inefficient in large scale, we propose to construct a tractable large graph by coupling anchor-based label prediction and adjacency matrix design. Contrary to the Nyström approximation of adjacency matrices which results in indefinite graph Laplacians and in turn leads to potential non-convex optimization over graphs, the proposed graph construction approach based on a unique idea called AnchorGraph provides nonnegative adjacency matrices to guarantee positive semidefinite graph Laplacians. Our approach scales linearly with the data size and in practice usually produces a large sparse graph. Experiments on large datasets demonstrate the significant accuracy improvement and scalability of the proposed approach.
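The two ideas in the abstract, anchor-based label prediction and a nonnegative adjacency matrix built from the same weights, can be illustrated with a small sketch. This excerpt does not give the formulas, so everything here is an assumption made for illustration: the Gaussian kernel over the `s` nearest anchors, the `bandwidth` parameter, the normalized low-rank form W = Z diag(Zᵀ1)⁻¹ Zᵀ, the toy data, and all function names. It is not presented as the authors' implementation.

```python
import math
import random


def anchor_weights(x, anchors, s=2, bandwidth=1.0):
    """One row of Z: nonnegative weights on the s nearest anchors, summing to 1.

    The Gaussian kernel and the choice of s are illustrative assumptions.
    """
    sq_dists = [sum((xi - ui) ** 2 for xi, ui in zip(x, u)) for u in anchors]
    nearest = sorted(range(len(anchors)), key=lambda k: sq_dists[k])[:s]
    d_min = sq_dists[nearest[0]]  # shift for numerical stability; cancels on normalization
    row = [0.0] * len(anchors)
    for k in nearest:
        row[k] = math.exp(-(sq_dists[k] - d_min) / (2.0 * bandwidth ** 2))
    total = sum(row)
    return [w / total for w in row]


def predict_labels(Z, anchor_labels):
    """f = Z a: each point's label is a locally weighted average of anchor labels."""
    return [sum(z * a for z, a in zip(row, anchor_labels)) for row in Z]


def anchor_adjacency(Z):
    """Assumed form W = Z diag(Z^T 1)^{-1} Z^T, built densely only for illustration."""
    n, m = len(Z), len(Z[0])
    col_sums = [sum(Z[i][k] for i in range(n)) for k in range(m)]
    return [
        [
            sum(Z[i][k] * Z[j][k] / col_sums[k] for k in range(m) if col_sums[k] > 0)
            for j in range(n)
        ]
        for i in range(n)
    ]


def laplacian_quadratic_form(W, f):
    """f^T (D - W) f, where D is the diagonal matrix of row sums of W."""
    n = len(W)
    degrees = [sum(row) for row in W]
    return sum(degrees[i] * f[i] ** 2 for i in range(n)) - sum(
        W[i][j] * f[i] * f[j] for i in range(n) for j in range(n)
    )


# Toy data: two well-separated 2-D clusters of 20 points each (hypothetical).
random.seed(0)
points = [
    (random.gauss(cx, 0.3), random.gauss(cy, 0.3))
    for cx, cy in [(0.0, 0.0), (3.0, 3.0)]
    for _ in range(20)
]
anchors = [(0.0, 0.0), (0.5, 0.5), (2.5, 2.5), (3.0, 3.0)]
anchor_labels = [0.0, 0.0, 1.0, 1.0]

Z = [anchor_weights(p, anchors) for p in points]
f = predict_labels(Z, anchor_labels)
W = anchor_adjacency(Z)

print("first and last predicted labels:", f[0], f[-1])
print("quadratic form of predictions:", laplacian_quadratic_form(W, f))
```

Because every entry of W is a sum of products of nonnegative weights, the Laplacian D − W satisfies fᵀ(D − W)f = ½ Σᵢⱼ Wᵢⱼ (fᵢ − fⱼ)² ≥ 0, which is the positive semidefiniteness the abstract refers to. The dense n × n matrix is built here only to make that check easy; a scalable version would keep Z sparse and never form W explicitly.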

1. Introduction

In pervasive applications of machine learning, one frequently encounters situations where only a few labeled data are available and large amounts of data remain unlabeled. The labeled data often suffer from difficult and expensive acquisition whereas the unlabeled data can be cheaply and automatically gathered. Semi-supervised learning (SSL) (Chapelle et al., 2006)(Zhu, 2008) has been recommended to cope with the very situations of limited labeled data and abundant unlabeled data.

Appearing in Proceedings of the 27th International Conference on Machine Learning, Haifa, Israel, 2010. Copyright 2010 by the author(s)/owner(s).

With the rapid development of the Internet, we can now collect massive amounts (up to hundreds of millions) of unlabeled data such as images and videos, and hence the need for large-scale SSL arises. Unfortunately, most SSL methods scale badly with the data size n. For instance, the classical TSVM (Joachims, 1999) is computationally challenging, scaling exponentially with n. Among the various versions of TSVM, CCCP-TSVM (Collobert et al., 2006) has the lowest complexity, but it scales as at least $O(n^2)$, so it is still difficult to scale up.

Graph-based SSL (Zhu et al., 2003)(Zhou et al., 2004)(Belkin et al., 2006) has recently become appealing because it is easy to implement and gives rise to closed-form solutions. However, graph-based SSL usually has a cubic time complexity $O(n^3)$ since the inverse of the $n \times n$ graph Laplacian is needed¹, thus blocking widespread applicability to real-life problems that encounter growing amounts of unlabeled data. To temper the cubic time complexity, recent studies seek to reduce the intensive computation involved in manipulating the graph Laplacian. (Delalleu et al., 2005) proposed a nonparametric inductive function which makes label predictions based on a subset of samples and then truncates the graph Laplacian to the selected subset and its connections to the remaining samples. Clearly, such a truncation ignores the topological structure within the majority of the input data and thereby loses considerable data information. (Zhu & Lafferty, 2005) fitted a generative mixture model to the raw data and proposed harmonic mixtures to span the label prediction function, but did not explain how to construct a large sparse graph such that the proposed harmonic mixtures method can be scalable. (Tsang & Kwok, 2007) scaled up the manifold regularization technique first proposed in (Belkin et al., 2006) through solving

¹ It is not easy yet to exactly solve the equivalent large-scale linear systems.
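To see where the cubic cost discussed above comes from, it may help to write out two of the closed-form solutions from the cited graph-based SSL works. The notation is an assumption introduced for illustration and does not appear in this excerpt: $W$ is the $n \times n$ adjacency matrix, $D$ its diagonal degree matrix, $Y$ the initial label matrix, $\alpha \in (0, 1)$ a trade-off parameter, and the subscripts $l$ and $u$ index labeled and unlabeled points.

```latex
% Local and global consistency (Zhou et al., 2004), up to a constant factor:
F^{*} = (I - \alpha S)^{-1} Y, \qquad S = D^{-1/2} W D^{-1/2}

% Harmonic function solution (Zhu et al., 2003):
f_u = (D_{uu} - W_{uu})^{-1} W_{ul} f_l
```

Both expressions amount to solving a linear system whose size grows with $n$. With a dense matrix and a direct solver this takes $O(n^3)$ time, which is the bottleneck the paragraph above describes.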
