Scale-aware Automatic Augmentation for Object Detection
Yukang Chen^1*†, Yanwei Li^1†, Tao Kong^2, Lu Qi^1, Ruihang Chu^1*, Lei Li^2, Jiaya Jia^1,3
^1 The Chinese University of Hong Kong    ^2 ByteDance AI Lab    ^3 SmartMore
Abstract
We propose Scale-aware AutoAug to learn data augmentation policies for object detection. We define a new scale-aware search space, where both image- and box-level augmentations are designed for maintaining scale invariance. Upon this search space, we propose a new search metric, termed Pareto Scale Balance, to facilitate search with high efficiency. In experiments, Scale-aware AutoAug yields significant and consistent improvement on various object detectors (e.g., RetinaNet, Faster R-CNN, Mask R-CNN, and FCOS), even compared with strong multi-scale training baselines. Our searched augmentation policies are transferable to other datasets and box-level tasks beyond object detection (e.g., instance segmentation and keypoint estimation) to improve performance. The search cost is much less than previous automated augmentation approaches for object detection. It is notable that our searched policies have meaningful patterns, which intuitively provide valuable insight for human data augmentation design. Code and models will be available at https://github.com/Jia-Research-Lab/SA-AutoAug.
1. Introduction

Object detection, which aims to locate as well as classify various objects, is one of the core tasks in computer vision. Due to the large scale variance of objects in real-world scenarios, a key concern is how to bring scale adaptation to the network efficiently. Previous work handles this challenge mainly from two aspects, namely network architecture and data augmentation. To make the network scale invariant, in-network feature pyramids [28, 47, 23] and adaptive receptive fields [25] are usually employed. Another crucial technique to enable scale invariance is data augmentation, which is independent of specific architectures and can be generalized among multiple tasks.

This paper focuses on data augmentation for object detection. Current data augmentation strategies can be
* This work was done during an internship at ByteDance AI Lab. Tao Kong is responsible for correspondence.
† Equal contribution.
[Figure 1 plot: AP (%) vs. inference time (ms/image) for RetinaNet, FCOS, Faster R-CNN, and Mask R-CNN with ResNet-50/101 backbones, comparing Baseline, Mixup, Dropblock, PSIS, Stitcher, GridMask, InstaBoost, RandAug, AutoAug-det, and Scale-aware AutoAug.]

Figure 1: Comparison with object detection augmentation strategies on MS COCO dataset. Methods in the same vertical line are based upon the same detector. Scale-aware AutoAug outperforms both hand-crafted and learned strategies on various detectors.
grouped into color operations (e.g., brightness, contrast, and whitening) and geometric operations (e.g., re-scaling, flipping). Among them, geometric operations, such as multi-scale training, improve scale robustness [39, 19]. Several hand-crafted data augmentation strategies were developed to improve the performance and robustness of detectors [41, 42]. Previous work [17, 15] also improves box-level augmentation by enriching foreground data. Though inspiring performance gains have been achieved, these data augmentation strategies usually rely on heavy expert experience.
Automatic data augmentation policies have been widely explored in image classification [44, 50, 37, 35, 9]. Their potential for object detection, however, has not been thoroughly released. One attempt to automatically learn data augmentation policies for object detectors is AutoAug-det [51]^1, which performs color or geometric augmentation upon the context of boxes. It does not fully consider the scale issue at the image and box levels, which is found, however, essential in object detector design [41, 42, 17]. Moreover, the heavy computational search cost (i.e., 400 TPUs for 2 days) impedes it from being vastly practical. Thus, the scale-aware property and the efficiency issue are essential to address for searching augmentations in box-level tasks.
In this paper, we propose a new way to automatically
^1 We refer to it as AutoAug-det [51] to distinguish it from AutoAugment [9].
arXiv:2103.17220v1 [cs.CV] 31 Mar 2021
learn scale-aware data augmentation strategies for object detection and relevant box-level tasks. We first introduce scale-awareness to the search space at both the image and box levels. For image-level augmentations, zoom-in and zoom-out operations are included, with their probabilities and zooming ratios open for search. For box-level augmentations, the augmenting areas are generalized with a new searchable parameter, i.e., the area ratio. This makes box-level augmentations adaptive to object scales.
Based on our scale-aware search space, we further propose a new estimation metric to facilitate the search process with better efficiency. Previously, each candidate policy was estimated by the validation accuracy on a proxy task [9, 27], which lacks efficiency and accuracy to an extent. Our metric takes advantage of more specific statistics, that is, validation accuracy and accumulated loss over different scales, to measure scale balance. We empirically show that it yields a clearly higher correlation coefficient with the actual accuracy than the previous proxy accuracy metric.
The proposed approach is distinguished from previous work in two aspects. First, different from hand-crafted policies, the proposed method utilizes automatic algorithms to search among a large variety of augmentation candidates, which could hardly be fully explored by human effort. Moreover, compared with previous learning-based methods, our approach fully explores the important scale issue at both the image level and the box level. With the proposed search space and evaluation metric, our method attains decent performance with much less (i.e., 40×) search cost.
The overall approach, called Scale-aware AutoAug, can be easily instantiated for box-level tasks, which will be elaborated on in Sec. 3. To validate its effectiveness, we conduct extensive experiments on the MS COCO and Pascal VOC datasets [30, 16] with several anchor-based and anchor-free object detectors, which are reported in Sec. 4.2.
In particular, with a ResNet-50 backbone, the searched augmentation policies contribute non-trivial gains over the strong multi-scale baselines of RetinaNet [29], Faster R-CNN [39], and FCOS [43], achieving 41.3% AP, 41.8% AP, and 42.6% AP, respectively. We further experiment with more box-level tasks, like instance segmentation and keypoint detection. Without bells and whistles, our improved FCOS model attains 51.4% AP with the searched augmentation policies. Besides, our searched policies present meaningful patterns, which provide intuitive insight for human knowledge.
2. Related Work

Data augmentation has been widely utilized for network optimization and proven to be beneficial in vision tasks [11, 40, 39, 32, 33, 36]. Traditional approaches can be roughly divided into color operations (e.g., brightness, contrast, and whitening) and geometric operations (e.g., scaling, flipping, translation, and shearing), which require hyper-parameter tuning and are usually task-specific [31]. Some commonly used strategies in image classification include random cropping, image mirroring, color shifting/whitening [24], Cutout [12], and Mixup [49].
Scale-wise augmentations also play a vital role in the optimization of object detectors [46, 5]. For example, SNIPER [42] generates image crops around ground-truth instances with multi-scale training. YOLO-v4 [2] and Stitcher [8] introduce mosaic inputs that contain re-scaled sub-images. For box-level augmentation, Dwibedi et al. [15] improve detection performance with the cut-and-paste strategy, and the visual context surrounding objects is modeled in [14]. Furthermore, InstaBoost [17] augments training images using annotated instance masks with a location probability map. However, these hand-crafted designs still highly rely on expert efforts.
Inspired by recent advancements in neural architecture search (NAS) [52, 53, 38, 7], researchers have tried to learn augmentation policies from data automatically. An example is AutoAugment [9], which searches data augmentations for image classification and achieves promising results. PBA [22] uses a population-based search method for better efficiency. Fast AutoAugment [27] applies Bayesian optimization to learn data augmentation policies. RandAug [10] removes the search process at the price of manually tailoring the search space to a very limited volume. AutoAug-det [51] extends AutoAugment [9] to object detection by taking box-level augmentations into consideration.
3. Scale-aware AutoAug

In this section, we first briefly review the auto augmentation pipeline. Then, the scale-aware search space and estimation metric are elaborated in Sec. 3.2 and Sec. 3.3, respectively. We finally present the search framework in Sec. 3.4.
3.1. Review of AutoAug

Auto augmentation methods [9, 51, 22, 27, 26] commonly formulate the process of finding the best augmentation policy as a search problem. To this end, three main components are needed, namely the search space, the search algorithm, and the estimation metric. The search space may vary according to the task. For example, the search spaces in [9, 22, 27] are developed for image classification and are not specified for box-level tasks. As for search algorithms, reinforcement learning [52] and evolutionary algorithms [38] are usually utilized to explore the search space in iterations. During this procedure, each child model, which is optimized with the searched policy p, is evaluated on a designed metric to estimate its effectiveness. This metric serves as feedback for the search algorithm.
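As a concrete, if toy, illustration of this loop, the sketch below mutates one searchable field per iteration and uses the estimation metric's score as feedback. `search_policies`, `candidates`, and `estimate` are illustrative stand-ins of my own naming, not the paper's actual components:

```python
import random

def search_policies(candidates, estimate, iterations=100, rng=random):
    """Toy evolutionary search loop mirroring the pipeline in Sec. 3.1.

    `candidates` maps each searchable field to its discrete choices;
    `estimate` plays the role of the estimation metric (training a
    child model is folded into this callable). Each iteration mutates
    the best-so-far policy in one field and keeps the policy with the
    better feedback score.
    """
    best = {k: rng.choice(v) for k, v in candidates.items()}
    best_score = estimate(best)
    for _ in range(iterations):
        child = dict(best)
        field = rng.choice(list(candidates))       # mutate one field
        child[field] = rng.choice(candidates[field])
        score = estimate(child)                    # metric = search feedback
        if score > best_score:                     # greedy survivor selection
            best, best_score = child, score
    return best, best_score
```

With enough iterations, this greedy loop converges to the highest-scoring combination of choices, which is the role the estimation metric plays for the real search algorithm.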
Figure 2: Scale-aware search space. It contains image-level and box-level augmentation. Image-level augmentation includes zoom-in
and zoom-out functions with probabilities and magnitudes for search. In box-level, we introduce scale-aware area ratios, which make
operations adaptive to objects in different scales. Augmented images are further generalized with the Gaussian map.
3.2. Scale-aware Search Space
The designed scale-aware search space contains both
image-level and box-level augmentations. The image-level
augmentations include zoom-in and zoom-out functions on
the whole image. As for box-level augmentations, color and
geometric operations are searched for objects in images.
Image-level augmentations. To handle scale variations, object detectors are commonly trained with image pyramids. However, these scale settings highly rely on hand-crafted selection. In our search space, we alleviate this burden with searchable zoom-in and zoom-out functions. As illustrated in the left part of Fig. 2, the zoom-in and zoom-out functions are specified by probabilities P and magnitudes M. Specifically, the probabilities P_in and P_out are searched in the range from 0 to 0.5. With this range, the existence of the original scale is guaranteed with probability

    P_ori = 1 − P_out − P_in.    (1)
The magnitude M represents the zooming ratio of each function. For the zoom-in function, we search a zooming ratio from 1.0 to 1.5; for the zoom-out function, from 0.5 to 1.0. For example, if a zoom-in ratio of 1.5 is selected, the input images may be enlarged by 1.5×. In traditional multi-scale training, such large-scale images would introduce an additional computational burden. To avoid this issue, we preserve the original shape in the zoom-in function with random cropping.
After the search procedure, in each training iteration, input images are randomly sampled from zoom-in, zoom-out, and original-scale images with the searched P and M. In other words, we sample from three resolutions, a larger one, a smaller one, and the original, with the searched probabilities {P_in, P_out, P_ori}. To the best of our knowledge, no previous work considers automatic scale-aware transformation search for object detection. Experiments validate the superiority over traditional multi-scale training in Tab. 2.
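The per-iteration sampling described above can be sketched in a few lines. This is a minimal illustration under the stated ranges; the function and argument names are my own, not from the paper:

```python
import random

def sample_resize_ratio(p_in, p_out, m_in, m_out, rng=random):
    """Sample one zoom ratio per training iteration.

    p_in and p_out are the searched probabilities, each in [0, 0.5],
    so the original scale keeps p_ori = 1 - p_in - p_out (Eq. 1).
    m_in (> 1.0, zoom-in) and m_out (< 1.0, zoom-out) are the
    searched magnitudes.
    """
    assert 0.0 <= p_in <= 0.5 and 0.0 <= p_out <= 0.5
    r = rng.random()
    if r < p_in:
        return m_in   # zoom-in; a random crop restores the original shape
    if r < p_in + p_out:
        return m_out  # zoom-out
    return 1.0        # original scale, with probability p_ori
```

For instance, with p_in = p_out = 0.5 the original scale is never sampled, while with p_in = p_out = 0 the input always keeps its original resolution.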
Box-level augmentations. The box-level augmentations are designed to conduct a transformation for each object box. Different from [51], the proposed approach further smooths the augmentations and relaxes them to contain a learnable factor, i.e., the area ratio. In particular, previous box-level augmentation [51] operates exactly on the whole bounding-box annotation without attenuation, which generates an obvious boundary gap between the augmented and original regions. The sudden appearance change could reduce the difficulty for networks to locate the augmented objects, which brings a gap between training and inference. To solve this issue, we extend the original rectangle augmentation to a