
Control and Decision, Vol. 29, No. 6, Jun. 2014

Labeling error detection based on the sparse manifold clustering and embedding model and $L_1$-norm regularization

Article ID: 1001-0920(2014)06-1103-06; DOI: 10.13195/j.kzyjc.2013.0318

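XIA Jian-ming (夏建明), YANG Jun-an (杨俊安)

(a. Department of Communication Countermeasure, b. Anhui Provincial Key Laboratory of Electronic Restriction, Electronic Engineering Institute, Hefei 230037)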

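Abstract: This paper proposes a method for detecting and correcting labeling errors based on the sparse manifold clustering and embedding (SMCE) model and $L_1$-norm regularization. The method jointly exploits the useful information in the erroneous labels and the discriminative information embedded in the data structure. First, the SMCE model projects the data into a space where the classes are easier to separate. New labels are then obtained from a very small number of correctly labeled samples with a nearest-neighbor classifier. Next, a labeling-error detection model is constructed to obtain a detection vector containing only 0 and 1 entries, where correct and incorrect labels correspond to entries of 1 and 0, respectively. Finally, the corresponding optimization algorithm and its convergence proof are given, and experiments verify the effectiveness of the algorithm.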
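The relabeling and detection steps summarized above can be illustrated with a short Python sketch. It assumes the data have already been embedded (the SMCE projection itself is not reproduced) and uses Euclidean 1-nearest-neighbor classification against a few seed samples that are assumed to be correctly labeled. It also marks a label as correct (1) when it agrees with its new label and as erroneous (0) otherwise. This direct comparison is a simplified stand-in for the paper's detection model, which is obtained by solving an optimization problem. The function names are hypothetical.

```python
import math


def nn_relabel(points, seed_idx, labels):
    """Give every point the label of its nearest seed sample (1-NN).

    points   -- embedded feature vectors (tuples of floats)
    seed_idx -- indices of the few samples assumed to be labeled correctly
    labels   -- original, possibly erroneous, labels (one per point)
    """
    new_labels = []
    for p in points:
        # Nearest seed by Euclidean distance in the embedded space.
        best = min(seed_idx, key=lambda j: math.dist(p, points[j]))
        new_labels.append(labels[best])
    return new_labels


def detection_vector(labels, new_labels):
    """0/1 vector: 1 where the original label agrees with the new label."""
    return [1 if a == b else 0 for a, b in zip(labels, new_labels)]


def correct_labels(labels, new_labels, s):
    """Keep labels flagged 1; replace labels flagged 0 by the new label."""
    return [a if keep else b for a, b, keep in zip(labels, new_labels, s)]


# Toy data: two well-separated groups; sample 2 carries a wrong label.
pts = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
orig = ["a", "a", "b", "b", "b", "b"]
seeds = [0, 3]  # assumed to be correctly labeled
new = nn_relabel(pts, seeds, orig)
s = detection_vector(orig, new)
print(s)                             # [1, 1, 0, 1, 1, 1]
print(correct_labels(orig, new, s))  # ['a', 'a', 'a', 'b', 'b', 'b']
```

The seed labels are taken from the original label list because those samples are assumed to be correct. This sketch only shows the 0/1 convention of the detection vector, not how the paper computes it.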

Keywords: labeling errors; sparse manifold clustering and embedding; $L_1$-norm regularization; convex relaxation

CLC number: TP181  Document code: A

Labeling errors detecting and correcting algorithm based on sparse manifold clustering and embedding and $L_1$ norm regularization

XIA Jian-ming, YANG Jun-an

(a. Department of Communication Countermeasure, b. Key Laboratory of Electronic Restriction, Electronic Engineering Institute, Hefei 230037, China. Correspondent: XIA Jian-ming, E-mail: jianmingeei@163.com)

Abstract: To detect and correct labeling errors, a labeling-error detecting and correcting algorithm based on sparse manifold clustering and embedding and $L_1$ norm regularization is proposed. The algorithm exploits both the useful information in the original labels and the discriminative information naturally contained in the data structure. First, the original data are projected into a new space by the sparse manifold clustering and embedding model. Then, a nearest neighbor classifier and a very small number of correctly labeled samples are used to obtain new labels for the original data. Next, a labeling-error detection model is built, and a sparse label detection vector consisting of 0s and 1s is obtained to correct the detected errors. Inaccurate and accurate labels correspond to 0 and 1 in the label detection vector, respectively. Finally, a convex optimization scheme is introduced to solve the optimization problem, and convergence proofs are given. Experimental results on artificial data with complex manifold structure and on typical low-dimensional and high-dimensional data show the effectiveness of the proposed algorithm.
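The abstract pairs $L_1$-norm regularization with convex relaxation of a 0/1 detection vector. The following is a toy illustration of that general idea only. The score vector $r$, the weight $\lambda$ and the objective are assumptions, not the paper's model. Let $e = \mathbf{1} - s$ mark suspected errors, and let $r_i \ge 0$ be a hypothetical score of how strongly sample $i$'s original label conflicts with its new label. The combinatorial problem $\min_{e \in \{0,1\}^n} \sum_i r_i (1 - e_i) + \lambda \|e\|_0$ becomes convex after two changes: relax $e$ to $[0,1]^n$ and replace $\|e\|_0$ by $\|e\|_1$. The sketch solves the relaxed problem and checks it against brute force.

```python
import math
from itertools import product


def relaxed_objective(e, r, lam):
    """sum_i r_i*(1 - e_i) + lam*||e||_1 for e in the box [0, 1]^n."""
    return sum(ri * (1 - ei) for ri, ei in zip(r, e)) + lam * sum(abs(ei) for ei in e)


def solve_relaxed(r, lam):
    """Minimize the relaxed objective coordinate by coordinate.

    Each term r_i + (lam - r_i)*e_i is linear in e_i on [0, 1], so its
    minimum sits at a vertex: e_i = 1 if r_i > lam, otherwise e_i = 0.
    """
    e = [1 if ri > lam else 0 for ri in r]
    s = [1 - ei for ei in e]  # detection vector: 1 = correct, 0 = error
    return e, s


def brute_force(r, lam):
    """Exact minimum over binary e (exponential cost; small n only)."""
    return min(relaxed_objective(e, r, lam) for e in product((0, 1), repeat=len(r)))


r = [0.1, 0.9, 0.2, 1.5]  # hypothetical conflict scores
lam = 0.5
e, s = solve_relaxed(r, lam)
print(s)  # [1, 0, 1, 0]
print(math.isclose(relaxed_objective(e, r, lam), brute_force(r, lam)))  # True
```

On binary vectors $\|e\|_1 = \|e\|_0$, so the brute-force search returns the combinatorial optimum. Because this relaxed problem is a linear program over a box, its optimum is attained at a 0/1 point and the relaxation loses nothing here. Models with coupled terms may instead need an iterative solver and a rounding step.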

Key words: labeling errors; sparse manifold clustering and embedding; $L_1$ norm regularization; convex relaxation

0 Introduction

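In the information society, the explosive growth of data in fields such as biology, the military and economics poses great challenges to machine learning algorithms. Supervised learning algorithms derive a classification rule from labeled samples. Setting aside the influence of the learning strategy, the quality of that rule depends heavily on the quality of the samples, so obtaining high-quality training data has become an important factor in how well machine learning performs. Traditional algorithms often assume that sample labels are correct. In practice, however, labels are frequently wrong because of data-entry mistakes, a lack of effective information and other causes. Labeling errors affect the classification rule even more than noise in the attributes does, and they significantly degrade learning performance [1-3].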

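Traditional supervised learning algorithms either simply ignore labeling errors or assume that the algorithm is robust to them to some degree [4]. When labels contain errors, there are several classes of methods for obtaining a classification rule. 1) Data preprocessing is the most direct and simplest approach: confidence values are assigned and the data are filtered before they enter the classifier, so that mislabeled samples are removed or relabeled [5]. However, this approach may discard useful information, especially when the training set is small. 2) Variable precision rough set methods strengthen the algorithm's robustness to labeling errors by introducing additional parameters [6-7]. 3) The multi-instance learning framework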

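Received: 2013-03-24; Revised: 2013-12-04.

Foundation items: National Natural Science Foundation of China (61272333); Natural Science Foundation of Anhui Province (1208085MF94, 1308085QF99).

About the authors: XIA Jian-ming (b. 1982), male, Ph.D., research interests: data mining and machine learning. YANG Jun-an (b. 1965), male, professor, doctoral supervisor, research interests: signal processing and intelligent computing.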
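The data-preprocessing approach 1) in the introduction filters mislabeled samples before training. The sketch below implements one classic filter of that kind, Wilson's edited nearest neighbor rule. The rule drops every sample whose label disagrees with the majority label of its k nearest neighbors. It is a representative example chosen for illustration and not necessarily the method of [5]. The function name, the Euclidean metric and the default k = 3 are assumptions.

```python
import math
from collections import Counter


def enn_filter(points, labels, k=3):
    """Wilson's edited nearest neighbor rule.

    Returns the indices of samples whose label matches the majority label
    of their k nearest neighbors (the sample itself is excluded).
    """
    keep = []
    for i, p in enumerate(points):
        # All other samples, sorted by Euclidean distance to sample i.
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        votes = Counter(labels[j] for j in others[:k])
        majority, _ = votes.most_common(1)[0]
        if labels[i] == majority:
            keep.append(i)
    return keep


pts = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.1, 0.0),
       (5.0, 5.0), (5.1, 4.9), (4.9, 5.2), (5.0, 5.1)]
lab = ["a", "a", "b", "a", "b", "b", "b", "b"]  # sample 2 is mislabeled
print(enn_filter(pts, lab))  # [0, 1, 3, 4, 5, 6, 7]
```

Removing sample 2 outright illustrates the drawback noted above: the filter discards the sample instead of correcting its label, which costs information when the training set is small.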
