Scale-aware Automatic Augmentation for Object Detection
Yukang Chen^1*†, Yanwei Li^1†, Tao Kong^2, Lu Qi^1, Ruihang Chu^1*, Lei Li^2, Jiaya Jia^1,3
^1 The Chinese University of Hong Kong    ^2 ByteDance AI Lab    ^3 SmartMore
Abstract
We propose Scale-aware AutoAug to learn data augmentation policies for object detection. We define a new scale-aware search space, where both image- and box-level augmentations are designed for maintaining scale invariance. Upon this search space, we propose a new search metric, termed Pareto Scale Balance, to facilitate search with high efficiency. In experiments, Scale-aware AutoAug yields significant and consistent improvement on various object detectors (e.g., RetinaNet, Faster R-CNN, Mask R-CNN, and FCOS), even compared with strong multi-scale training baselines. Our searched augmentation policies are transferable to other datasets and box-level tasks beyond object detection (e.g., instance segmentation and keypoint estimation) to improve performance. The search cost is much less than previous automated augmentation approaches for object detection. It is notable that our searched policies have meaningful patterns, which intuitively provide valuable insight for human data augmentation design. Code and models will be available at https://github.com/Jia-Research-Lab/SA-AutoAug.
1. Introduction

Object detection, which aims to locate as well as classify various objects, is one of the core tasks in computer vision. Due to the large scale variance of objects in real-world scenarios, a key concern is how to bring scale adaptation to the network efficiently. Previous work handles this challenge mainly from two aspects, namely network architecture and data augmentation. To make the network scale invariant, in-network feature pyramids [28, 47, 23] and adaptive receptive fields [25] are usually employed. Another crucial technique to enable scale invariance is data augmentation, which is independent of specific architectures and can be generalized among multiple tasks.

This paper focuses on data augmentation for object detection. Current data augmentation strategies can be
* This work was done during an internship at ByteDance AI Lab. Tao Kong is responsible for correspondence.
† Equal contribution.
[Figure 1 plot: AP (%) vs. inference time (ms/image) for RetinaNet, FCOS, Faster R-CNN, and Mask R-CNN with ResNet-50/101 backbones, comparing Baseline, Mixup, Dropblock, PSIS, Stitcher, GridMask, InstaBoost, RandAug, AutoAug-det, and Scale-aware AutoAug.]

Figure 1: Comparison with object detection augmentation strategies on MS COCO dataset. Methods in the same vertical line are based upon the same detector. Scale-aware AutoAug outperforms both hand-crafted and learned strategies on various detectors.
grouped into color operations (e.g., brightness, contrast, and whitening) and geometric operations (e.g., re-scaling, flipping). Among them, geometric operations, such as multi-scale training, improve scale robustness [39, 19]. Several hand-crafted data augmentation strategies were developed to improve the performance and robustness of detectors [41, 42]. Previous work [17, 15] also improves box-level augmentation by enriching foreground data. Though inspiring performance gains have been achieved, these data augmentation strategies usually rely on heavy expert experience.
Automatic data augmentation policies have been widely explored in image classification [44, 50, 37, 35, 9]. Their potential for object detection, however, has not been thoroughly released. One attempt to automatically learn data augmentation policies for object detectors is AutoAug-det [51]^1, which performs color or geometric augmentation upon the context of boxes. It does not fully consider the scale issue at the image and box levels, which is found, however, essential in object detector design [41, 42, 17]. Moreover, the heavy computational search cost (i.e., 400 TPUs for 2 days) impedes it from being vastly practical. Thus, the scale-aware property and the efficiency issue are essential to address for searching augmentations in box-level tasks.
In this paper, we propose a new way to automatically
^1 We refer to it as AutoAug-det [51] to distinguish it from AutoAugment [9].
arXiv:2103.17220v1 [cs.CV] 31 Mar 2021
learn scale-aware data augmentation strategies for object detection and relevant box-level tasks. We first introduce scale-awareness to the search space at both the image and box levels. For image-level augmentations, zoom-in and zoom-out operations are included, with their probabilities and zooming ratios open for search. For box-level augmentations, the augmenting areas are generalized with a new searchable parameter, i.e., the area ratio. This makes box-level augmentations adaptive to object scales.
Based on our scale-aware search space, we further propose a new estimation metric to facilitate the search process with better efficiency. Previously, each candidate policy was estimated by the validation accuracy on a proxy task [9, 27], which lacks efficiency and accuracy to an extent. Our metric takes advantage of more specific statistics, that is, validation accuracy and accumulated loss over different scales, to measure scale balance. We empirically show that it yields a clearly higher correlation coefficient with the actual accuracy than the previous proxy accuracy metric.
The proposed approach is distinguished from previous work in two aspects. First, different from hand-crafted policies, the proposed method utilizes automatic algorithms to search among a large variety of augmentation candidates, which could hardly be fully explored by human effort. Moreover, compared with previous learning-based methods, our approach fully explores the important scale issue at both the image level and the box level. With the proposed search space and evaluation metric, our method attains decent performance with much less (i.e., 40×) search cost.
The overall approach, called Scale-aware AutoAug, can be easily instantiated for box-level tasks, which will be elaborated on in Sec. 3. To validate its effectiveness, we conduct extensive experiments on the MS COCO and Pascal VOC datasets [30, 16] with several anchor-based and anchor-free object detectors, which are reported in Sec. 4.2.
In particular, with a ResNet-50 backbone, the searched augmentation policies contribute non-trivial gains over the strong multi-scale baselines of RetinaNet [29], Faster R-CNN [39], and FCOS [43], achieving 41.3% AP, 41.8% AP, and 42.6% AP, respectively. We further experiment with more box-level tasks, like instance segmentation and keypoint detection. Without bells and whistles, our improved FCOS model attains 51.4% AP with the searched augmentation policies. Besides, our searched policies present meaningful patterns, which provide intuitive insight for human knowledge.
2. Related Work

Data augmentation has been widely utilized for network optimization and proven to be beneficial in vision tasks [11, 40, 39, 32, 33, 36]. Traditional approaches can be roughly divided into color operations (e.g., brightness, contrast, and whitening) and geometric operations (e.g., scaling, flipping, translation, and shearing), which require hyper-parameter tuning and are usually task-specific [31]. Some commonly used strategies in image classification include random cropping, image mirroring, color shifting/whitening [24], Cutout [12], and Mixup [49].
Scale-wise augmentations also play a vital role in the optimization of object detectors [46, 5]. For example, SNIPER [42] generates image crops around ground-truth instances with multi-scale training. YOLO-v4 [2] and Stitcher [8] introduce mosaic inputs that contain re-scaled sub-images. For box-level augmentation, Dwibedi et al. [15] improve detection performance with the cut-and-paste strategy, and the visual context surrounding objects is modeled in [14]. Furthermore, InstaBoost [17] augments training images using annotated instance masks with a location probability map. However, these hand-crafted designs still highly rely on expert efforts.
Inspired by recent advancements in neural architecture search (NAS) [52, 53, 38, 7], researchers have tried to learn augmentation policies from data automatically. An example is AutoAugment [9], which searches data augmentations for image classification and achieves promising results. PBA [22] uses a population-based search method for better efficiency. Fast AutoAugment [27] applies Bayesian optimization to learn data augmentation policies. RandAug [10] removes the search process at the price of manually tailoring the search space to a very limited volume. AutoAug-det [51] extends AutoAugment [9] to object detection by taking box-level augmentations into consideration.
3. Scale-aware AutoAug

In this section, we first briefly review the auto augmentation pipeline. Then, the scale-aware search space and estimation metric are elaborated in Sec. 3.2 and Sec. 3.3, respectively. We finally present the search framework in Sec. 3.4.
3.1. Review of AutoAug

Auto augmentation methods [9, 51, 22, 27, 26] commonly formulate the process of finding the best augmentation policy as a search problem. To this end, three main components are needed, namely the search space, the search algorithm, and the estimation metric. The search space may vary according to the task. For example, the search spaces in [9, 22, 27] are developed for image classification and are not specified for box-level tasks. As for search algorithms, reinforcement learning [52] and evolutionary algorithms [38] are usually utilized to explore the search space in iterations. During this procedure, each child model, which is optimized with the searched policy p, is evaluated on a designed metric to estimate its effectiveness. This metric serves as feedback for the search algorithm.
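As a concrete, if toy, illustration of this loop, the sketch below mutates one searchable field per iteration and uses the estimation metric's score as feedback. `search_policies`, `candidates`, and `estimate` are illustrative stand-ins of my own naming, not the paper's actual components:

```python
import random

def search_policies(candidates, estimate, iterations=100, rng=random):
    """Toy evolutionary search loop mirroring the pipeline in Sec. 3.1.

    `candidates` maps each searchable field to its discrete choices;
    `estimate` plays the role of the estimation metric (training a
    child model is folded into this callable). Each iteration mutates
    the best-so-far policy in one field and keeps the policy with the
    better feedback score.
    """
    best = {k: rng.choice(v) for k, v in candidates.items()}
    best_score = estimate(best)
    for _ in range(iterations):
        child = dict(best)
        field = rng.choice(list(candidates))       # mutate one field
        child[field] = rng.choice(candidates[field])
        score = estimate(child)                    # metric = search feedback
        if score > best_score:                     # greedy survivor selection
            best, best_score = child, score
    return best, best_score
```

With enough iterations, this greedy loop converges to the highest-scoring combination of choices, which is the role the estimation metric plays for the real search algorithm.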
Figure 2: Scale-aware search space. It contains image-level and box-level augmentation. Image-level augmentation includes zoom-in
and zoom-out functions with probabilities and magnitudes for search. In box-level, we introduce scale-aware area ratios, which make
operations adaptive to objects in different scales. Augmented images are further generalized with the Gaussian map.
3.2. Scale-aware Search Space
The designed scale-aware search space contains both
image-level and box-level augmentations. The image-level
augmentations include zoom-in and zoom-out functions on
the whole image. As for box-level augmentations, color and
geometric operations are searched for objects in images.
Image-level augmentations. To handle scale variations, object detectors are commonly trained with image pyramids. However, these scale settings highly rely on hand-crafted selection. In our search space, we alleviate this burden with searchable zoom-in and zoom-out functions. As illustrated in the left part of Fig. 2, the zoom-in and zoom-out functions are specified by probabilities P and magnitudes M. Specifically, the probabilities P_in and P_out are searched in the range from 0 to 0.5. With this range, the existence of the original scale is guaranteed with probability

    P_ori = 1 − P_out − P_in.    (1)
The magnitude M represents the zooming ratio of each function. For the zoom-in function, we search a zooming ratio from 1.0 to 1.5; for the zoom-out function, from 0.5 to 1.0. For example, if a zoom-in ratio of 1.5 is selected, the input images may be enlarged by 1.5×. In traditional multi-scale training, such large-scale images would introduce an additional computational burden. To avoid this issue, we preserve the original shape in the zoom-in function with random cropping.
After the search procedure, in each training iteration, input images are randomly sampled from zoom-in, zoom-out, and original-scale images with the searched P and M. In other words, we sample from three resolutions, a larger one, a smaller one, and the original, with the searched probabilities {P_in, P_out, P_ori}. To the best of our knowledge, no previous work considers automatic scale-aware transformation search for object detection. Experiments validate the superiority over traditional multi-scale training in Tab. 2.
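The per-iteration sampling described above can be sketched in a few lines. This is a minimal illustration under the stated ranges; the function and argument names are my own, not from the paper:

```python
import random

def sample_resize_ratio(p_in, p_out, m_in, m_out, rng=random):
    """Sample one zoom ratio per training iteration.

    p_in and p_out are the searched probabilities, each in [0, 0.5],
    so the original scale keeps p_ori = 1 - p_in - p_out (Eq. 1).
    m_in (> 1.0, zoom-in) and m_out (< 1.0, zoom-out) are the
    searched magnitudes.
    """
    assert 0.0 <= p_in <= 0.5 and 0.0 <= p_out <= 0.5
    r = rng.random()
    if r < p_in:
        return m_in   # zoom-in; a random crop restores the original shape
    if r < p_in + p_out:
        return m_out  # zoom-out
    return 1.0        # original scale, with probability p_ori
```

For instance, with p_in = p_out = 0.5 the original scale is never sampled, while with p_in = p_out = 0 the input always keeps its original resolution.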
Box-level augmentations. The box-level augmentations are designed to conduct a transformation for each object box. Different from [51], the proposed approach further smooths the augmentations and relaxes them to contain a learnable factor, i.e., the area ratio. In particular, previous box-level augmentation [51] operates exactly on the whole bounding-box annotation without attenuation, which generates an obvious boundary gap between the augmented and original regions. The sudden appearance change could reduce the difficulty for networks to locate the augmented objects, which brings a gap between training and inference. To solve this issue, we extend the original rectangle augmentation to a