[Flowchart: Training images → Feature extraction → Non-negative sparse coding, multi-scale max pooling → Sparse and low-rank matrix decomposition → Locality-constrained linear coding (LLC) → Linear SVM classifier]
Figure 1. The flowchart of the proposed method.
structed by the training images of the same class. However, images are often contaminated with noise, and a single image may contain multiple objects with different poses and occlusions, so using the training images directly as the bases is sometimes not discriminative enough to boost the final classification performance. Moreover, images of the same class often share many similarities and correlate with each other, and hence exhibit a degenerate structure [6]. This semantic information can help make correct classifications if computed properly.
In this paper, we propose a new image classification framework that leverages non-negative sparse coding together with low-rank and sparse matrix decomposition (LR-Sc+SPM). Figure 1 shows the flowchart of the proposed method. Our framework makes two contributions. First, we extend the recent work on image classification [4] and propose to use non-negative sparse coding along with max pooling to reduce the information loss during the encoding process for image representation.
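The non-negative sparse coding and max pooling steps can be sketched as follows. This is a minimal illustration, not the authors' implementation: the solver (projected ISTA), the regularization weight `lam`, and the iteration count are all assumptions chosen for clarity.

```python
import numpy as np

def nonneg_sparse_code(X, B, lam=0.1, n_iter=200):
    """Encode descriptors X (d x n) over codebook B (d x k) with
    non-negative sparse coding, i.e. approximately solve
        min_U 0.5*||X - B U||_F^2 + lam*sum(U)   s.t. U >= 0
    via projected ISTA (an illustrative choice of solver)."""
    # step size from the Lipschitz constant of the gradient
    step = 1.0 / (np.linalg.norm(B, 2) ** 2)
    U = np.zeros((B.shape[1], X.shape[1]))
    for _ in range(n_iter):
        grad = B.T @ (B @ U - X)
        # gradient step, then non-negative soft thresholding
        U = np.maximum(U - step * (grad + lam), 0.0)
    return U

def max_pool(U):
    """Max pooling: one k-dim vector per region, taking the max of
    each code dimension over all descriptors in the region."""
    return U.max(axis=1)
```

Because the codes are constrained to be non-negative, max pooling selects the strongest response of each codeword rather than mixing responses of opposite sign.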
The second is our main contribution. We propose a new image classification method based on the low-rank and sparse matrix decomposition technique. Our work is motivated by two observations: (i) images of the same class often correlate with each other, so if we stack the BoW representations of images within the same class into a matrix, this matrix will ideally be low-rank; (ii) an image contains only a limited number of objects and a limited number of noise types, so the noise in the stacked BoW matrix is sparse. This low-rank and sparsity structure can be exploited to obtain a better image representation than directly using the BoW representations of the training images. Specifically, to obtain more discriminative sparse coding bases from the BoW representations of images, we leverage the low-rank and sparse matrix decomposition technique to decompose the BoW representations of images within the same class into a low-rank matrix and a sparse error matrix. We then use these bases to encode the BoW representation of each image with sparsity and locality constraints. The resulting coding parameters are used to represent the images, and a linear SVM classifier is then utilized to predict their category labels. Experimental results on four public datasets demonstrate the effectiveness of the proposed method.
The rest of the paper is organized as follows. Section 2 introduces related work. Section 3 presents the proposed non-negative sparse coding spatial pyramid matching method (Sc+SPM). Section 4 presents the proposed image classification method based on low-rank and sparse matrix decomposition. Experimental results are given in Section 5. Finally, we conclude in Section 6.
2. Related Work
The use of the bag-of-visual words (BoW) model [1] has
been proven very useful for image classification. Over the
past few years, much work has been done to improve the performance of the BoW model. Some tried to learn discriminative visual codebooks for image classification [12,
13]. Co-occurrence information of visual words was also
modeled in a generative framework [14, 15]. Others tried
to learn discriminative classifiers by considering the spatial
information and correlations among visual words [2-4, 7,
10-11]. To overcome the loss of spatial information in the
BoW model, motivated by Grauman and Darrell’s [3] pyra-
mid matching in feature space, Lazebnik et al. [2] proposed
the spatial pyramid matching (SPM). Since its introduction,
SPM has been widely used and proven very effective.
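The SPM idea can be sketched in a few lines: pool the codes of local descriptors within each cell of increasingly fine grids and concatenate the results. The sketch below is a generic illustration assuming max pooling over a 1x1, 2x2 and 4x4 pyramid; the grid levels and pooling operator are assumptions, not a transcription of [2].

```python
import numpy as np

def spm_pool(codes, xy, levels=(1, 2, 4)):
    """Spatial pyramid max pooling.
    codes : (k, n) codes for n local descriptors
    xy    : (n, 2) descriptor locations, normalized to [0, 1]
    Returns the concatenation of per-cell pooled vectors over
    1x1, 2x2 and 4x4 grids -> k * (1 + 4 + 16) dimensions."""
    k, n = codes.shape
    feats = []
    for g in levels:
        # assign each descriptor to one cell of the g x g grid
        ix = np.minimum((xy[:, 0] * g).astype(int), g - 1)
        iy = np.minimum((xy[:, 1] * g).astype(int), g - 1)
        cell = ix * g + iy
        for c in range(g * g):
            mask = cell == c
            feats.append(codes[:, mask].max(axis=1) if mask.any()
                         else np.zeros(k))  # empty cell -> zeros
    return np.concatenate(feats)
```

The concatenated vector preserves coarse spatial layout that a single global histogram discards, which is what gives SPM its advantage over the plain BoW model.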
Recently, Yang et al. [4] proposed an extension of the
SPM approach by leveraging sparse coding and achieved
the state-of-the-art performance for image classification
when only one type of local feature (SIFT) is used. This
method can automatically learn the optimal codebook and
search for the optimal coding weights for each local feature.
After this, max pooling along with SPM is used to get the
feature representation of images. Inspired by this, Wang et
al. [16] proposed to use locality to constrain the sparse cod-
ing process which can be computed faster and yields better
performance. [11, 17] also tried to jointly learn the optimal
codebooks and classifiers. However, sparse coding [18] places no constraint on the sign of the coding parameters, so negative parameters are sometimes produced in order to satisfy the sparse coding constraints. For some applications [19], non-negative sparse coding [20] is required.
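The locality-constrained coding of Wang et al. [16] mentioned above admits a closed-form per-descriptor solution in its approximate k-NN form, which is why it is faster than iterative sparse coding. A minimal sketch under that reading; the neighborhood size `knn` and regularizer `beta` are illustrative values, not the authors' settings:

```python
import numpy as np

def llc_encode(x, B, knn=5, beta=1e-4):
    """Approximate LLC: reconstruct descriptor x (d,) from its knn
    nearest codewords in codebook B (d x k) by a sum-to-one
    constrained least squares. Returns a (k,) code with at most
    knn nonzero entries."""
    d, k = B.shape
    # select the knn codewords closest to x (locality constraint)
    dist = np.linalg.norm(B - x[:, None], axis=0)
    idx = np.argsort(dist)[:knn]
    # shifted local bases and regularized covariance
    Z = B[:, idx] - x[:, None]                      # d x knn
    C = Z.T @ Z
    C += beta * np.trace(C) * np.eye(knn)
    # closed-form solution, then normalize to sum to one
    w = np.linalg.solve(C, np.ones(knn))
    w /= w.sum()
    code = np.zeros(k)
    code[idx] = w
    return code
```

Each descriptor is thus expressed only over nearby codewords, which yields sparse codes without solving an l1-regularized problem per descriptor.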
Sparse coding has been used not only for local features but also holistically on the entire image. Wright et al. [6] cast face recognition as finding a sparse representation of the test image over the training set as bases, achieving impressive results. Bradley and Bagnell [9] trained a compact codebook using sparse coding. Yuan and Yan [7] performed visual classification with multi-task joint sparse representation by fusing different types of features. Liu et al. [19] tried to learn sparse and non-negative representations of im-