of 3D object recognition. To handle these issues, some researchers attempt to extract middle or even high level features by resorting to machine learning methods. For instance, to extract middle level features, sparse coding [36,35] has been introduced to build part-based features for 3D objects. Suppose the 3D object to be processed is a human body; sparse coding can decompose the object into parts such as the head, legs, and feet via constrained matrix factorization. In addition, to extract higher level semantic information, deep learning methods [26,21,23] have recently been exploited to extract features for 3D objects.
Compared with low level feature extraction approaches, sparse coding and deep learning methods can automatically extract conceptual and semantic level features, respectively. However, they have their own shortcomings. For example, sparse coding is hard to scale to large datasets. As for deep learning methods, training their models is very time-consuming. Apart from this, they must be trained by skillful researchers; that is, the performance of the resulting deep learning model depends heavily on the skill of the researcher. It follows that neither sparse coding nor deep learning can be widely applied to massive data. In other words, handling massive data is a challenging task for these two feature learning methods.
To avoid the limitations of these two feature learning methods, we propose a fast and scalable feature learning method based on a set of one-vs.-all logistic regression classifiers with an $\ell_2$-norm regularizer, also called $\ell_2$-norm logistic regression (abbreviated as $\ell_2$-LR). The motivation is as follows. According to the literature [29], the response values of a classifier on input features can be used to measure the similarity among those features. Therefore, the response values of two similar 3D objects are very close, and vice versa. However, a single classifier may perform well when measuring the similarity of similar 3D objects, but fail to distinguish dissimilar ones falling into different categories. To comprehensively measure the similarity among various kinds of 3D objects, we learn one classifier per class of training data and combine their response values into the feature of a 3D object. In addition, to make the feature learning method scalable to massive data, we exploit stochastic gradient ascent, a representative scalable machine learning technique, to update the model parameters.
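To make the idea concrete, the following is a minimal sketch of training one-vs.-all $\ell_2$-LR classifiers with stochastic gradient ascent and stacking their responses into a feature. This is an illustration under stated assumptions, not the authors' released code; the learning rate `eta`, the regularization weight `lam`, and the function names are our own choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_ova_l2lr(X, y, n_classes, eta=0.01, lam=1e-4, epochs=5, seed=0):
    """Train one l2-regularized logistic regression classifier per class
    by stochastic gradient ascent on the penalized log-likelihood."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = np.zeros((n_classes, d))          # one weight vector per class
    for _ in range(epochs):
        for i in rng.permutation(n):      # one sample at a time (stochastic)
            x = X[i]
            for c in range(n_classes):
                t = 1.0 if y[i] == c else 0.0      # one-vs.-all target
                g = (t - sigmoid(W[c] @ x)) * x    # log-likelihood gradient
                W[c] += eta * (g - lam * W[c])     # ascent step with l2 penalty
    return W

def response_feature(W, x):
    """Stack the classifiers' response values into the learned feature."""
    return sigmoid(W @ x)
```

Because each update touches only one sample, the memory footprint is independent of the dataset size, which is what makes the scheme attractive for massive training data.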
It is worth highlighting the characteristics of this paper as follows:

• The label information is taken into account when training the feature learning model. This effectively avoids the semantic gap issue often suffered by handcrafted feature extraction methods.

• Stochastic gradient ascent updating is utilized, which is helpful for training over massive training data. This implies that we can improve the generalization ability by increasing the volume of training data.

• The proposed feature learning method allows us to extract features for test objects in an online way (see the sketch after this list). In contrast, neither sparse coding nor deep learning achieves this goal, since the sparse code of a test object is obtained by repeated iterations and the deep learning model is too computationally expensive.
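To illustrate the online extraction property, once the classifiers are trained, featurizing a new object reduces to a single matrix-vector product, with no test-time iteration. The snippet below continues the hypothetical sketch above with synthetic data.

```python
# Continuing the sketch above: after training, online feature extraction
# for a test object is one pass over the learned classifiers.
rng = np.random.default_rng(1)
X_train = rng.normal(size=(200, 64))                # 200 objects, 64-dim inputs
y_train = rng.integers(0, 10, size=200)             # 10 object classes
W = train_ova_l2lr(X_train, y_train, n_classes=10)
feature = response_feature(W, rng.normal(size=64))  # 10-dim learned feature
```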
The remainder of this paper is organized as follows. In Section 2, we review the state-of-the-art feature extraction methods for 3D objects. We present the feature learning algorithm via a set of logistic regression classifiers in Section 3. Then, in Section 4, extensive experiments are conducted to demonstrate the effectiveness and efficiency of the proposed algorithm. Finally, we draw a conclusion in Section 5.
2. Related works
Considering that the typical feature extraction methods for 3D objects have already been discussed in Section 1, for brevity we review only the state-of-the-art method from each class, namely SIFT and its variants, sparse coding, and deep learning. The detailed survey is as follows.
2.1. SIFT and its variants
Similar to 2D object retrieval and recognition applications, the scale-invariant feature transform (SIFT) descriptor [19] is also popular and dominant in 3D object applications [30,18,3,1], since the SIFT feature is superior to other features in handling intensity, rotation, scale, and affine variations. However, matching 3D objects is very tedious if the correspondence is computed at the level of individual SIFT features. To ease matching, the bag-of-features (BOF) idea has therefore been applied to aggregate SIFT features in 3D object retrieval and recognition tasks [8,31,3,5].
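As a minimal sketch of the BOF aggregation step (an illustration, not the code of the cited systems): local SIFT descriptors are quantized against a learned codebook and pooled into a normalized histogram. The random stand-ins for the descriptors and the k-means codebook are assumptions.

```python
import numpy as np

def bof_histogram(descriptors, codebook):
    """Aggregate local descriptors (m x d) into a bag-of-features
    histogram over a codebook of visual words (k x d)."""
    # Assign each descriptor to its nearest visual word (Euclidean distance).
    dists = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    words = dists.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / max(hist.sum(), 1.0)    # l1-normalize

# Example with random stand-ins for SIFT descriptors and a learned codebook.
rng = np.random.default_rng(0)
descriptors = rng.normal(size=(300, 128))   # 300 SIFT descriptors of one view
codebook = rng.normal(size=(64, 128))       # 64 visual words
h = bof_histogram(descriptors, codebook)    # 64-dim global feature
```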
Although the BOF method successfully aggregates local features, it decreases the discriminative ability to some extent because features are aggregated in a disordered way. To remedy this limitation, a multi-resolution version of BOF has been proposed by combining a series of BOFs at different resolutions, also called the pyramid kernel [9]. To further improve the discriminative ability, the pyramid kernel has been extended to the spatial pyramid kernel [14] by taking spatial information into account. With these advantages, the pyramid and spatial pyramid kernels have been applied to 3D object recognition [17,13].
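A minimal sketch of the spatial pyramid idea (illustrative, not the implementation of [14], and with the usual per-level weighting omitted for brevity): BOF histograms are computed over grid cells at several resolutions and concatenated, so coarse spatial order survives the pooling. It reuses the hypothetical `bof_histogram` above.

```python
def spatial_pyramid(descriptors, positions, codebook, levels=2):
    """Concatenate per-cell BOF histograms at resolutions 1x1, 2x2, 4x4, ...
    positions holds each descriptor's (x, y) location normalized to [0, 1)."""
    feats = []
    for l in range(levels + 1):
        cells = 2 ** l                                   # grid size at level l
        bins = (positions * cells).astype(int).clip(0, cells - 1)
        for cx in range(cells):
            for cy in range(cells):
                mask = (bins[:, 0] == cx) & (bins[:, 1] == cy)
                cell = descriptors[mask]
                if len(cell):
                    feats.append(bof_histogram(cell, codebook))
                else:
                    feats.append(np.zeros(len(codebook)))  # empty cell
    return np.concatenate(feats)
```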
As summarized in Section 1, low level features such as SIFT and its variants readily lead to the semantic gap issue, because they are constructed in a handcrafted way and label information is not considered.
2.2. Sparse coding
Sparse coding [15] refers to a class of algorithms for automatically finding succinct representations of data as a (often linear) combination of a few typical atoms (usually called a dictionary or codebook) learned from the data. Given only unlabeled input data, it learns basis functions that capture middle level features (i.e., concept level features) in the data. For example, the very large set of English sentences can be encoded by a small number of symbols (i.e., letters, numbers, punctuation, and spaces) combined in a particular order for a particular sentence, so a sparse coding of English would be those symbols.
More specifically, given k-dimensional real-valued input vectors $\vec{x} \in \mathbb{R}^k$, the goal of sparse coding is to find n k-dimensional basis vectors $\vec{b}_1, \ldots, \vec{b}_n \in \mathbb{R}^k$ along with a sparse n-dimensional vector of weights or coefficients $\vec{y} \in \mathbb{R}^n$ for each input vector, such that a linear combination of the basis vectors with proportions given by the coefficients closely approximates the input vector: $\vec{x} \approx \sum_{j=1}^{n} y_j \vec{b}_j$ [15]. Presently, sparse coding has been leveraged to extract conceptual features for 2D images of 3D objects [36,35]. Theoretically, sparse coding based 3D object recognition and retrieval should outperform approaches built on top of low level features (i.e., SIFT and its variants). In spite of this, extrapolating sparse codes to unseen test objects is quite time-consuming.
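For concreteness, here is a minimal sketch of the encoding step under the objective above: a generic iterative soft-thresholding (ISTA) solver for the $\ell_1$-penalized least-squares formulation, not the specific method of [36,35]. The penalty `lam` and iteration count are illustrative assumptions.

```python
import numpy as np

def sparse_encode(x, B, lam=0.1, n_iter=100):
    """Find sparse coefficients y minimizing 0.5*||x - B @ y||^2 + lam*||y||_1
    for a fixed dictionary B (k x n) via iterative soft-thresholding (ISTA)."""
    y = np.zeros(B.shape[1])
    L = np.linalg.norm(B, 2) ** 2          # Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = B.T @ (B @ y - x)           # gradient of the quadratic term
        z = y - grad / L                   # gradient descent step
        y = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-threshold
    return y

# Example: encode a vector against a random dictionary. Note the test-time
# iterations, which are why extrapolation to new objects is slow.
rng = np.random.default_rng(0)
B = rng.normal(size=(64, 256))             # k = 64 dimensions, n = 256 atoms
x = rng.normal(size=64)
y = sparse_encode(x, B)                    # sparse n-dimensional code
```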
2.3. Deep learning
Deep learning is a set of algorithms in machine learning that attempt to learn layered models of inputs, commonly neural networks.