1350 IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 54, NO. 3, MARCH 2016
computationally efficient classification techniques. The
current state-of-the-art support vector machine (SVM)
[16], [17] is, however, unable to cope with more than
a few thousand labeled data points.
A very convenient way to alleviate the aforementioned prob-
lems is to extract relevant, potentially useful, nonredundant,
and nonlinear features from images in order to facilitate the
subsequent classification step. The extracted features could be
fed into a simple cost-effective (ideally linear) classifier. The
bottleneck would then be the feature learning step. Learning
expressive spatial–spectral features from HS images in an effi-
cient way is thus of paramount relevance. In addition, and very
importantly, learning such features in an unsupervised fashion
has also become extremely relevant given the few labeled pixels
typically available.
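The pipeline sketched above, i.e., unsupervised features followed by a simple linear classifier, can be illustrated as follows. This is a minimal sketch with hypothetical dimensions (the feature matrix stands in for any learned representation); the linear classifier is ridge regression on one-hot targets, one of the cheapest choices, not the specific classifier used later in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: n labeled pixels, d-dimensional unsupervised features.
n, d, n_classes = 500, 64, 6
features = rng.standard_normal((n, d))      # stand-in for learned features
labels = rng.integers(0, n_classes, size=n)

# Simple cost-effective linear classifier: ridge regression on one-hot targets.
Y = np.eye(n_classes)[labels]               # one-hot encoding, shape (n, n_classes)
X = np.hstack([features, np.ones((n, 1))])  # append a bias column
lam = 1e-2                                  # ridge regularizer
W = np.linalg.solve(X.T @ X + lam * np.eye(d + 1), X.T @ Y)

pred = np.argmax(X @ W, axis=1)             # predicted class per pixel
train_acc = float(np.mean(pred == labels))
```

Once the features are fixed, training reduces to one linear solve, which is why the feature learning step, rather than classification, becomes the computational bottleneck.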
A. Background
Given the typically high dimensionality of remote sensing
data, feature extraction techniques have been widely used in the
literature to reduce the data dimensionality. While the classical
principal component analysis (PCA) [18] is still one of the
most popular choices, a plethora of nonlinear dimensionality
reduction methods, manifold learning, and dictionary learning
algorithms have been introduced in the last decade.
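As a baseline for the methods discussed next, linear dimensionality reduction with PCA on a hyperspectral cube can be sketched in a few lines. The cube dimensions below are illustrative, and the decomposition is computed via an SVD of the centered pixel-by-band matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical hyperspectral cube: 50x50 pixels, 100 spectral bands.
h, w, bands = 50, 50, 100
cube = rng.standard_normal((h, w, bands))

X = cube.reshape(-1, bands)   # one row per pixel
X = X - X.mean(axis=0)        # center each band

# PCA via SVD of the centered data matrix: columns of Vt.T are the components.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
k = 10                        # number of retained components
scores = X @ Vt[:k].T         # (pixels, k) reduced features
reduced = scores.reshape(h, w, k)
```

The singular values `S` are sorted in decreasing order, so the first `k` components capture the largest share of the spectral variance; nonlinear methods replace this fixed linear projection with a learned mapping.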
State-of-the-art manifold learning methods [19] include the
following: local approaches for the description of remote
sensing image manifolds [20]; kernel-based and spectral de-
compositions that learn mappings optimizing for maximum
variance, correlation, entropy, or minimum noise fraction [21];
neural networks that generalize PCA to encode nonlinear data
structures via autoassociative/autoencoding networks [22]; and
projection pursuit approaches leading to convenient Gaussian
domains [23]. In remote sensing, autoencoders have been
widely used [24]–[27]. However, several (critical) free
parameters must be tuned; regularization is an important issue,
mainly addressed by heuristically limiting the network’s
structure; and only shallow structures are considered, largely
due to limitations in computational resources and in the
efficiency of the training algorithms. On top of this, autoencoders
very often employ only the spectral information and, in the best
of cases, include spatial information naively by stacking
handcrafted spatial features.
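A minimal example of the spectral-only autoencoders criticized above is sketched here: a single hidden layer with tied weights, trained by gradient descent on the reconstruction error of per-pixel spectra. All sizes and the learning rate are illustrative assumptions, and the data are synthetic stand-ins for real spectra.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-pixel spectra: 1000 pixels, 100 spectral bands.
X = rng.standard_normal((1000, 100))
n_hidden = 32

# Tied-weight autoencoder: encode H = tanh(X W + b), decode X_hat = H W^T + c.
W = 0.01 * rng.standard_normal((100, n_hidden))
b = np.zeros(n_hidden)
c = np.zeros(100)
lr = 1e-3

for _ in range(200):
    H = np.tanh(X @ W + b)
    X_hat = H @ W.T + c
    err = X_hat - X                    # reconstruction error
    dH = (err @ W) * (1 - H ** 2)      # backprop through tanh
    gW = X.T @ dH + err.T @ H          # tied-weight gradient (encode + decode)
    W -= lr * gW / len(X)
    b -= lr * dH.mean(axis=0)
    c -= lr * err.mean(axis=0)

loss = float(np.mean(err ** 2))
```

Note that the pixel spectra enter independently: no spatial neighborhood is used anywhere, which is exactly the limitation the text points out.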
To the authors’ knowledge, there is little evidence of the good
performance of deep architectures in remote sensing image
classification: [28] introduces a deep learning algorithm for
classification of (low-dimensional) VHR images; [29] explores
the robustness of deep networks to noisy class labels for aerial
image classification, and [30] introduces hybrid deep neural
networks to enable the extraction of variable-scale features
to detect vehicles in satellite images; [31] proposes a hybrid
framework based on stacked autoencoders for classification of
HS data. Although deep learning methods can cope with the
difficulties of nonlinear spatial–spectral image analysis,
sparsity of the feature representation and efficiency of the
training algorithms remain open issues in state-of-the-art
frameworks.
In recent years, dictionary learning has emerged as an
efficient way to learn sparse image features in unsupervised
settings, which are eventually used for image classification
and object recognition: discriminative dictionaries have been
proposed for spatial–spectral sparse representation and image
classification [32]; sparse kernel networks have recently been
introduced for classification [33], sparse representations over
learned dictionaries for image pansharpening [34], saliency-
based codes for segmentation [35], [36], sparse bag-of-words
codes for automatic target detection [37], and unsupervised
learning of sparse features for aerial image classification [38].
These methods describe the input images in sparse represen-
tation spaces but do not take advantage of the high nonlinear
nature of deep architectures.
Therefore, in the context of remote sensing, unsupervised
learning of features in a deep convolutional neural network
(CNN) architecture seeking sparse representations has not been
approached so far.
B. Contributions
In this paper, we aim to address two main challenges in
remote sensing data analysis. To this end, we introduce the
use of deep convolutional networks for remote sensing data
analysis [39] trained by means of an unsupervised learning
method seeking sparse feature representations. On one hand,
the following are observed: 1) deep architectures have a highly
nonlinear nature that is well suited to cope with the difficulties
of nonlinear spatial–spectral image analysis; 2) convolutional
architectures capture only local interactions, making them well
suited when the input shares similar statistics at all locations,
i.e., when there is high redundancy; and 3) sparse features are
well suited to describe remote sensing images [4], [6]–[8].
On the other hand, we want to train deep convolutional
architectures efficiently to alleviate the high computational cost
involved in remote sensing. Given the typically few labeled
data, applying unsupervised learning algorithms to train deep
architectures is a paramount aspect of remote sensing.
We propose the combination of greedy layerwise unsuper-
vised pretraining [40]–[43] coupled with the highly efficient
enforcing population and lifetime sparsity (EPLS) algorithm
[44] for unsupervised learning of sparse features and show the
applicability and potential of the method to extract hierarchical
(i.e., deep) sparse feature representations of remote sensing im-
ages. The EPLS seeks a sparse representation of the input data
(remote sensing images) and allows training systems with large
numbers of input channels efficiently (and numerous filters/
parameters), without requiring any metaparameter tuning.
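The core idea of enforcing population and lifetime sparsity can be sketched as follows. This is a simplified illustration of the principle, not the exact EPLS algorithm of [44]: each sample in a batch is assigned a one-hot target (population sparsity) while each filter is allowed roughly the same number of assignments over the batch (lifetime sparsity), and the filters are then regressed toward these sparse targets. All dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical batch of vectorized patches: n samples, d inputs, k filters.
n, d, k = 256, 75, 16               # e.g. 5x5x3 patches, 16 filters
patches = rng.standard_normal((n, d))
Wf = rng.standard_normal((d, k))
Wf /= np.linalg.norm(Wf, axis=0)    # unit-norm filters

acts = patches @ Wf                 # linear activations, shape (n, k)

# Build sparse one-hot targets: each sample activates exactly one filter
# (population sparsity), and each filter is used at most ceil(n/k) times
# over the batch (lifetime sparsity).
targets = np.zeros((n, k))
usage = np.zeros(k, dtype=int)
cap = int(np.ceil(n / k))
for i in np.argsort(-np.abs(acts).max(axis=1)):   # most confident samples first
    order = np.argsort(-np.abs(acts[i]))          # preferred filters for sample i
    j = next(f for f in order if usage[f] < cap)  # best filter still available
    targets[i, j] = 1.0
    usage[j] += 1

# Regress the filters toward the sparse targets, then renormalize.
Wf = np.linalg.lstsq(patches, targets, rcond=None)[0]
Wf /= np.linalg.norm(Wf, axis=0) + 1e-12
```

Because the targets are built greedily from the activations themselves, no sparsity weight or penalty coefficient has to be tuned, which is the metaparameter-free property the text highlights.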
Thus, deep convolutional networks are trained efficiently in
an unsupervised greedy layerwise fashion [40]–[43] using the
EPLS algorithm [44] to learn the network filters. The learned
hierarchical representations of the input remote sensing images
are used for image/pixel classification, where lower layers ex-
tract low-level features, and higher layers exhibit more abstract
and complex representations.
To our knowledge, this is the first work dealing with sparse
unsupervised deep convolutional networks in remote sensing
data analysis in a systematic way. We want to emphasize the
fact that the methodology presented here is fully unsuper-
vised, which is a different (and more challenging) setting to