AutoAugment: Learning Augmentation Strategies from Data

Ekin D. Cubuk*, Barret Zoph*, Dandelion Mané, Vijay Vasudevan, Quoc V. Le
Google Brain
*Equal contribution.

arXiv:1805.09501v3 [cs.CV] 11 Apr 2019
Abstract
Data augmentation is an effective technique for improving the accuracy of modern image classifiers. However, current data augmentation implementations are manually designed. In this paper, we describe a simple procedure called AutoAugment to automatically search for improved data augmentation policies. In our implementation, we have designed a search space where a policy consists of many sub-policies, one of which is randomly chosen for each image in each mini-batch. A sub-policy consists of two operations, each operation being an image processing function such as translation, rotation, or shearing, and the probabilities and magnitudes with which the functions are applied. We use a search algorithm to find the best policy such that the neural network yields the highest validation accuracy on a target dataset. Our method achieves state-of-the-art accuracy on CIFAR-10, CIFAR-100, SVHN, and ImageNet (without additional data). On ImageNet, we attain a Top-1 accuracy of 83.5%, which is 0.4% better than the previous record of 83.1%. On CIFAR-10, we achieve an error rate of 1.5%, which is 0.6% better than the previous state-of-the-art. Augmentation policies we find are transferable between datasets. The policy learned on ImageNet transfers well to achieve significant improvements on other datasets, such as Oxford Flowers, Caltech-101, Oxford-IIIT Pets, FGVC Aircraft, and Stanford Cars.
1. Introduction
Deep neural nets are powerful machine learning systems that tend to work well when trained on massive amounts of data. Data augmentation is an effective technique to increase both the amount and diversity of data by randomly "augmenting" it [3, 54, 29]; in the image domain, common augmentations include translating the image by a few pixels, or flipping the image horizontally. Intuitively, data augmentation is used to teach a model about invariances in the data domain: classifying an object is often insensitive to horizontal flips or translation. Network architectures can also be used to hardcode invariances: convolutional networks bake in translation invariance [16, 32, 25, 29]. However, using data augmentation to incorporate potential invariances can be easier than hardcoding invariances into the model architecture directly.
| Dataset | GPU hours | Best published results | Our results |
|---|---|---|---|
| CIFAR-10 | 5000 | 2.1 | 1.5 |
| CIFAR-100 | 0 | 12.2 | 10.7 |
| SVHN | 1000 | 1.3 | 1.0 |
| Stanford Cars | 0 | 5.9 | 5.2 |
| ImageNet | 15000 | 3.9 | 3.5 |

Table 1. Error rates (%) from this paper compared to the best results so far on five datasets (Top-5 for ImageNet, Top-1 for the others). The previous best result on Stanford Cars fine-tuned weights originally trained on a larger dataset [66], whereas we use a randomly initialized network. Previous best results on the other datasets only include models that were not trained on additional data, for a single evaluation (without ensembling). See Tables 2, 3, and 4 for a more detailed comparison. GPU hours are estimated for an NVIDIA Tesla P100.
Yet a large focus of the machine learning and computer vision community has been to engineer better network architectures (e.g., [55, 59, 20, 58, 64, 19, 72, 23, 48]). Less attention has been paid to finding better data augmentation methods that incorporate more invariances. For instance, on ImageNet, the data augmentation approach by [29], introduced in 2012, remains the standard with small changes. Even when augmentation improvements have been found for a particular dataset, they often do not transfer to other datasets as effectively. For example, horizontal flipping of images during training is an effective data augmentation method on CIFAR-10, but not on MNIST, due to the different symmetries present in these datasets. The need for automatically learned data augmentation has been raised recently as an important unsolved problem [57].

In this paper, we aim to automate the process of finding an effective data augmentation policy for a target dataset. In our implementation (Section 3), each policy expresses several choices and orders of possible augmentation operations, where each operation is an image processing function (e.g., translation, rotation, or color normalization), the probabilities of applying the function, and the magnitudes with which they are applied. We use a search algorithm to find the best choices and orders of these operations such that training a neural network yields the best validation accuracy. In our experiments, we use Reinforcement Learning [71] as the search algorithm, but we believe the results can be further improved if better algorithms are used [48, 39].
Our extensive experiments show that AutoAugment achieves excellent improvements in two use cases: 1) AutoAugment can be applied directly on the dataset of interest to find the best augmentation policy (AutoAugment-direct), and 2) learned policies can be transferred to new datasets (AutoAugment-transfer). First, for direct application, our method achieves state-of-the-art accuracy on datasets such as CIFAR-10, reduced CIFAR-10, CIFAR-100, SVHN, reduced SVHN, and ImageNet (without additional data). On CIFAR-10, we achieve an error rate of 1.5%, which is 0.6% better than the previous state-of-the-art [48]. On SVHN, we improve the state-of-the-art error rate from 1.3% [12] to 1.0%. On reduced datasets, our method achieves performance comparable to semi-supervised methods without using any unlabeled data. On ImageNet, we achieve a Top-1 accuracy of 83.5%, which is 0.4% better than the previous record of 83.1%. Second, if direct application is too expensive, transferring an augmentation policy can be a good alternative: we show that policies found on one task can generalize well across different models and datasets. For example, the policy found on ImageNet leads to significant improvements on a variety of FGVC datasets. Even on datasets for which fine-tuning weights pre-trained on ImageNet does not help significantly [26], e.g., Stanford Cars [27] and FGVC Aircraft [38], training with the ImageNet policy reduces test set error by 1.2% and 1.8%, respectively. This result suggests that transferring data augmentation policies offers an alternative to standard weight transfer learning. A summary of our results is shown in Table 1.
2. Related Work
Common data augmentation methods for image recognition have been designed manually, and the best augmentation strategies are dataset-specific. For example, on MNIST, most top-ranked models use elastic distortions, scale, translation, and rotation [54, 8, 62, 52]. On natural image datasets, such as CIFAR-10 and ImageNet, random cropping, image mirroring, and color shifting / whitening are more common [29]. As these methods are designed manually, they require expert knowledge and time. Our approach of learning data augmentation policies from data can, in principle, be used for any dataset, not just one.
This paper introduces an automated approach to find data augmentation policies from data. Our approach is inspired by recent advances in architecture search, where reinforcement learning and evolution have been used to discover model architectures from data [71, 4, 72, 7, 35, 13, 34, 46, 49, 63, 48, 9]. Although these methods have improved upon human-designed architectures, it has not been possible to beat the 2% error-rate barrier on CIFAR-10 using architecture search alone.
Previous attempts at learned data augmentation include Smart Augmentation, which proposed a network that automatically generates augmented data by merging two or more samples from the same class [33]. Tran et al. used a Bayesian approach to generate data based on the distribution learned from the training set [61]. DeVries and Taylor used simple transformations in the learned feature space to augment data [11].

Generative adversarial networks have also been used for the purpose of generating additional data (e.g., [45, 41, 70, 2, 56]). The key difference between our method and generative models is that our method generates symbolic transformation operations, whereas generative models, such as GANs, generate the augmented data directly. An exception is work by Ratner et al., who used GANs to generate sequences that describe data augmentation strategies [47].
3. AutoAugment: Searching for the Best Augmentation Policies Directly on the Dataset of Interest
We formulate the problem of finding the best augmentation policy as a discrete search problem (see Figure 1). Our method consists of two components: a search algorithm and a search space. At a high level, the search algorithm (implemented as a controller RNN) samples a data augmentation policy S, which has information about what image processing operation to use, the probability of using the operation in each batch, and the magnitude of the operation. Key to our method is the fact that the policy S will be used to train a neural network with a fixed architecture, whose validation accuracy R will be sent back to update the controller. Since R is not differentiable, the controller will be updated by policy gradient methods. In the following section, we describe the two components in detail.

Figure 1. Overview of our framework of using a search method (e.g., Reinforcement Learning) to search for better data augmentation policies. A controller RNN predicts an augmentation policy from the search space. A child network with a fixed architecture is trained to convergence, achieving accuracy R. The reward R will be used with the policy gradient method to update the controller so that it can generate better policies over time.
Search space details: In our search space, a policy consists of 5 sub-policies, with each sub-policy consisting of two image operations to be applied in sequence. Additionally, each operation is also associated with two hyperparameters: 1) the probability of applying the operation, and 2) the magnitude of the operation.

Figure 2 shows an example of a policy with 5 sub-policies in our search space. The first sub-policy specifies a sequential application of ShearX followed by Invert. The probability of applying ShearX is 0.9, and, when applied, it has a magnitude of 7 out of 10. We then apply Invert with probability 0.8. The Invert operation does not use the magnitude information. We emphasize that these operations are applied in the specified order.
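For concreteness, here is a minimal Python sketch of how a sub-policy like the one above could be applied with PIL. The magnitude-to-shear-factor mapping and the dispatch on operation names are illustrative assumptions, not the paper's exact implementation (the real system uses the operation-specific magnitude ranges of Section 4):

```python
import random
from PIL import Image, ImageOps

# One sub-policy: two (operation, probability, magnitude) triples,
# mirroring the ShearX -> Invert example above.
SUB_POLICY = [("ShearX", 0.9, 7), ("Invert", 0.8, 0)]

def shear_x(img, magnitude):
    # Assumed mapping: magnitude 0-10 -> shear factor in [0, 0.3].
    # Image.AFFINE follows the Pillow 5.x API the paper references;
    # newer Pillow spells it Image.Transform.AFFINE.
    factor = 0.3 * magnitude / 10
    return img.transform(img.size, Image.AFFINE, (1, factor, 0, 0, 1, 0))

def apply_sub_policy(img, sub_policy):
    # Operations run in the specified order; each fires stochastically
    # with its own probability, and when it fires it uses its fixed
    # magnitude (Invert ignores the magnitude).
    for name, prob, magnitude in sub_policy:
        if random.random() < prob:
            img = shear_x(img, magnitude) if name == "ShearX" else ImageOps.invert(img)
    return img

def augment(img, policy):
    # Per image: choose one of the policy's sub-policies uniformly at random.
    return apply_sub_policy(img, random.choice(policy))
```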
Figure 2. One of the policies found on SVHN, and how it can be used to generate augmented data given an original image used to train a neural network. The policy has 5 sub-policies. For every image in a mini-batch, we choose a sub-policy uniformly at random to generate a transformed image to train the neural network. Each sub-policy consists of 2 operations, and each operation is associated with two numerical values: the probability of calling the operation, and the magnitude of the operation. Because an operation is called only with some probability, it may not be applied in that mini-batch. However, if applied, it is applied with the fixed magnitude. We highlight the stochasticity in applying the sub-policies by showing how one image can be transformed differently in different mini-batches, even with the same sub-policy. As explained in the text, on SVHN, geometric transformations are picked more often by AutoAugment. It can be seen why Invert is a commonly selected operation on SVHN, since the numbers in the image are invariant to that transformation.
The operations we used in our experiments are from PIL, a popular Python image library (https://pillow.readthedocs.io/en/5.1.x/). For generality, we considered all functions in PIL that accept an image as input and output an image. We additionally used two other promising augmentation techniques: Cutout [12] and SamplePairing [24]. The operations we searched over are ShearX/Y, TranslateX/Y, Rotate, AutoContrast, Invert, Equalize, Solarize, Posterize, Contrast, Color, Brightness, Sharpness, Cutout [12], and SamplePairing [24] (details about these operations are listed in Table 1 in the Appendix).
In total, we have 16 operations in our search space. Each operation also comes with a default range of magnitudes, which will be described in more detail in Section 4. We discretize the range of magnitudes into 10 values (uniform spacing) so that we can use a discrete search algorithm to find them. Similarly, we also discretize the probability of applying each operation into 11 values (uniform spacing). Finding each sub-policy thus becomes a search problem in a space of (16 × 10 × 11)^2 possibilities. Our goal, however, is to find 5 such sub-policies concurrently in order to increase diversity. The search space with 5 sub-policies then has roughly (16 × 10 × 11)^10 ≈ 2.9 × 10^32 possibilities.
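The size of the search space follows directly from this discretization; a quick check in Python:

```python
# 16 operation types x 10 magnitudes x 11 probabilities per operation;
# two operations per sub-policy, five sub-policies per policy.
per_sub_policy = (16 * 10 * 11) ** 2    # 3,097,600
full_policy = (16 * 10 * 11) ** 10      # 5 sub-policies = 10 operations
print(f"{full_policy:.1e}")             # -> 2.9e+32
```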
The 16 operations we used and their default ranges of values are shown in Table 1 in the Appendix. Notice that there is no explicit "Identity" operation in our search space; this operation is implicit, and can be achieved by calling an operation with its probability set to 0.
Search algorithm details: The search algorithm that we used in our experiments uses Reinforcement Learning, inspired by [71, 4, 72, 5]. The search algorithm has two components: a controller, which is a recurrent neural network, and the training algorithm, which is the Proximal Policy Optimization algorithm [53]. At each step, the controller predicts a decision produced by a softmax; the prediction is then fed into the next step as an embedding. In total, the controller has 30 softmax predictions in order to predict 5 sub-policies, each with 2 operations, and each operation requiring an operation type, magnitude, and probability.
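As a rough illustration of this decision structure (not the controller itself), the 30 choices can be enumerated as follows, with a uniform random sampler standing in for the RNN's softmaxes:

```python
import random

OPS, PROBS, MAGS = 16, 11, 10  # operation types, probability bins, magnitude bins

def sample_policy():
    # 5 sub-policies x 2 operations x 3 decisions = 30 decisions total;
    # the real controller produces each decision via a softmax and feeds
    # the choice back in as an embedding for the next step.
    policy = []
    for _ in range(5):
        sub_policy = []
        for _ in range(2):
            op = random.randrange(OPS)                    # operation type
            prob = random.randrange(PROBS) / (PROBS - 1)  # 0.0, 0.1, ..., 1.0
            mag = random.randrange(MAGS)                  # magnitude bin 0-9
            sub_policy.append((op, prob, mag))
        policy.append(sub_policy)
    return policy
```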
The training of controller RNN: The controller is trained with a reward signal, which is how good the policy is at improving the generalization of a "child model" (a neural network trained as part of the search process). In our experiments, we set aside a validation set to measure the generalization of a child model. A child model is trained with augmented data generated by applying the 5 sub-policies on the training set (which does not contain the validation set). For each example in the mini-batch, one of the 5 sub-policies is chosen randomly to augment the image. The child model is then evaluated on the validation set to measure the accuracy, which is used as the reward signal to train the recurrent network controller. On each dataset, the controller samples about 15,000 policies.
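The reward plumbing can be pictured with a toy loop. Note the paper trains the controller with PPO; the sketch below substitutes a plain REINFORCE step with a moving-average baseline, a three-way "policy" choice, and a faked child-model accuracy, purely to show how the validation reward drives the update:

```python
import math
import random

logits = [0.0, 0.0, 0.0]          # toy controller: one categorical choice
TRUE_ACC = [0.70, 0.85, 0.75]     # hypothetical child validation accuracies

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

baseline, lr = 0.0, 0.5
for step in range(2000):
    probs = softmax(logits)
    a = random.choices(range(3), weights=probs)[0]  # sample a "policy"
    reward = TRUE_ACC[a] + random.gauss(0, 0.02)    # fake child accuracy R
    baseline = 0.95 * baseline + 0.05 * reward      # moving-average baseline
    adv = reward - baseline
    for i in range(3):
        # d log pi(a) / d logit_i = 1[i == a] - probs[i]
        grad = (1.0 if i == a else 0.0) - probs[i]
        logits[i] += lr * adv * grad                # ascend expected reward

print(softmax(logits))  # mass concentrates on the best "policy" (index 1)
```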
Architecture of controller RNN and training hyperparameters: We follow the training procedure and hyperparameters from [72] for training the controller. More con-

[The remaining 13 pages of the document are not included in this preview.]