EfficientDet: Scalable and Efficient Object Detection

Mingxing Tan Ruoming Pang Quoc V. Le

Google Research, Brain Team

{tanmingxing, rpang, qvl}@google.com

Abstract

Model efficiency has become increasingly important in computer vision. In this paper, we systematically study neural network architecture design choices for object detection and propose several key optimizations to improve efficiency. First, we propose a weighted bi-directional feature pyramid network (BiFPN), which allows easy and fast multi-scale feature fusion; second, we propose a compound scaling method that uniformly scales the resolution, depth, and width for all backbone, feature network, and box/class prediction networks at the same time. Based on these optimizations and better backbones, we have developed a new family of object detectors, called EfficientDet, which consistently achieve much better efficiency than prior art across a wide spectrum of resource constraints. In particular, with single-model and single-scale, our EfficientDet-D7 achieves state-of-the-art 55.1 AP on COCO test-dev with 77M parameters and 410B FLOPs, being 4x to 9x smaller and using 13x to 42x fewer FLOPs than previous detectors. Code is available at https://github.com/google/automl/tree/master/efficientdet.

1. Introduction

Tremendous progress has been made in recent years towards more accurate object detection; meanwhile, state-of-the-art object detectors have also become increasingly expensive. For example, the latest AmoebaNet-based NAS-FPN detector [45] requires 167M parameters and 3045B FLOPs (30x more than RetinaNet [24]) to achieve state-of-the-art accuracy. The large model sizes and expensive computation costs deter their deployment in many real-world applications such as robotics and self-driving cars, where model size and latency are highly constrained. Given these real-world resource constraints, model efficiency becomes increasingly important for object detection.

There have been many previous works aiming to develop more efficient detector architectures, such as one-stage [27, 33, 34, 24] and anchor-free detectors [21, 44, 40], or to compress existing models [28, 29]. Although these methods tend to achieve better efficiency, they usually sacrifice accuracy. Moreover, most previous works only focus on a specific or a small range of resource requirements, but the variety of real-world applications, from mobile devices to datacenters, often demands different resource constraints.

Footnote: Similar to [14, 39], FLOPs denotes the number of multiply-adds.

Figure 1: Model FLOPs vs. COCO accuracy. All numbers are for single-model single-scale. Our EfficientDet achieves new state-of-the-art 55.1% COCO AP with far fewer parameters and FLOPs than previous detectors. More studies on different backbones and FPN/NAS-FPN/BiFPN are in Tables 4 and 5. Complete results are in Table 2.

[Plot: COCO AP (vertical axis) against FLOPs in billions (horizontal axis, 0 to 1200). Labeled series: EfficientDet-D7, YOLOv3, Mask R-CNN, RetinaNet, ResNet + NAS-FPN, AmoebaNet + NAS-FPN + AA.]

Model | AP | FLOPs (ratio)
EfficientDet-D0 | 33.8 | 2.5B
YOLOv3 [34] | 33.0 | 71B (28x)
EfficientDet-D1 | 39.6 | 6.1B
RetinaNet [24] | 39.2 | 97B (16x)
EfficientDet-D7x† | 55.1 | 410B
AmoebaNet + NAS-FPN + AA [45]† | 50.7 | 3045B (13x)
† Not plotted.

A natural question is: Is it possible to build a scalable detection architecture with both higher accuracy and better efficiency across a wide spectrum of resource constraints (e.g., from 3B to 300B FLOPs)? This paper aims to tackle this problem by systematically studying various design choices of detector architectures. Based on the one-stage detector paradigm, we examine the design choices for backbone, feature fusion, and class/box network, and identify two main challenges:

Challenge 1: efficient multi-scale feature fusion. Since its introduction in [23], FPN has been widely used for multi-scale feature fusion. Recently, PANet [26], NAS-FPN [10], and other studies [20, 18, 42] have developed more network structures for cross-scale feature fusion. While fusing different input features, most previous works simply sum them up without distinction; however, since these different input features are at different resolutions, we observe they usually contribute to the fused output feature unequally. To address this issue, we propose a simple yet highly effective weighted bi-directional feature pyramid network (BiFPN), which introduces learnable weights to learn the importance of different input features, while repeatedly applying top-down and bottom-up multi-scale feature fusion.

arXiv:1911.09070v7 [cs.CV] 27 Jul 2020

Challenge 2: model scaling. While previous works mainly rely on bigger backbone networks [24, 35, 34, 10] or larger input image sizes [13, 45] for higher accuracy, we observe that scaling up the feature network and the box/class prediction network is also critical when taking into account both accuracy and efficiency. Inspired by recent works [39], we propose a compound scaling method for object detectors, which jointly scales up the resolution/depth/width for all backbone, feature network, and box/class prediction network.

Finally, we also observe that the recently introduced EfficientNets [39] achieve better efficiency than previous commonly used backbones. Combining EfficientNet backbones with our proposed BiFPN and compound scaling, we have developed a new family of object detectors, named EfficientDet, which consistently achieve better accuracy with far fewer parameters and FLOPs than previous object detectors. Figure 1 and Figure 4 show the performance comparison on the COCO dataset [25]. Under similar accuracy constraints, our EfficientDet uses 28x fewer FLOPs than YOLOv3 [34], 30x fewer FLOPs than RetinaNet [24], and 19x fewer FLOPs than the recent ResNet-based NAS-FPN [10]. In particular, with single-model and single test-time scale, our EfficientDet-D7 achieves state-of-the-art 55.1 AP with 77M parameters and 410B FLOPs, outperforming the previous best detector [45] by 4 AP while being 2.7x smaller and using 7.4x fewer FLOPs. Our EfficientDet is also 4x to 11x faster on GPU/CPU than previous detectors.
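As a quick arithmetic check, two of the quoted FLOPs ratios can be reproduced from numbers that appear in this paper: 2.5B (EfficientDet-D0) and 71B (YOLOv3) from Figure 1, and 410B versus 3045B for the comparison with [45]. The helper name `flops_ratio` is hypothetical and only used for illustration.

```python
def flops_ratio(baseline_flops_b, efficientdet_flops_b):
    """Return how many times more FLOPs (in billions) the baseline uses."""
    return baseline_flops_b / efficientdet_flops_b


# Values copied from Figure 1 and the surrounding text (billions of FLOPs).
yolo_vs_d0 = flops_ratio(71.0, 2.5)          # YOLOv3 vs. EfficientDet-D0
amoeba_vs_410b = flops_ratio(3045.0, 410.0)  # detector [45] vs. the 410B model

print(f"YOLOv3 / EfficientDet-D0: {yolo_vs_d0:.1f}x")   # 28.4x
print(f"[45] / 410B EfficientDet: {amoeba_vs_410b:.1f}x")  # 7.4x
```

Rounded, these give the 28x and 7.4x figures stated in the text.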

With simple modifications, we also demonstrate that our single-model single-scale EfficientDet achieves 81.74% mIOU accuracy with 18B FLOPs on Pascal VOC 2012 semantic segmentation, outperforming DeepLabV3+ [6] by 1.7% in accuracy with 9.8x fewer FLOPs.

2. Related Work

One-Stage Detectors: Existing object detectors are mostly categorized by whether they have a region-of-interest proposal step (two-stage [11, 35, 5, 13]) or not (one-stage [36, 27, 33, 24]). While two-stage detectors tend to be more flexible and more accurate, one-stage detectors are often considered to be simpler and more efficient by leveraging predefined anchors [17]. Recently, one-stage detectors have attracted substantial attention due to their efficiency and simplicity [21, 42, 44]. In this paper, we mainly follow the one-stage detector design, and we show it is possible to achieve both better efficiency and higher accuracy with optimized network architectures.

Multi-Scale Feature Representations: One of the main difficulties in object detection is to effectively represent and process multi-scale features. Earlier detectors often directly perform predictions based on the pyramidal feature hierarchy extracted from backbone networks [4, 27, 36]. As one of the pioneering works, feature pyramid network (FPN) [23] proposes a top-down pathway to combine multi-scale features. Following this idea, PANet [26] adds an extra bottom-up path aggregation network on top of FPN; STDL [43] proposes a scale-transfer module to exploit cross-scale features; M2det [42] proposes a U-shape module to fuse multi-scale features; and G-FRNet [2] introduces gate units for controlling information flow across features. More recently, NAS-FPN [10] leverages neural architecture search to automatically design feature network topology. Although it achieves better performance, NAS-FPN requires thousands of GPU hours during search, and the resulting feature network is irregular and thus difficult to interpret. In this paper, we aim to optimize multi-scale feature fusion in a more intuitive and principled way.

Model Scaling: In order to obtain better accuracy, it is common to scale up a baseline detector by employing bigger backbone networks (e.g., from mobile-size models [38, 16] and ResNet [14], to ResNeXt [41] and AmoebaNet [32]), or increasing input image size (e.g., from 512x512 [24] to 1536x1536 [45]). Some recent works [10, 45] show that increasing the channel size and repeating feature networks can also lead to higher accuracy. These scaling methods mostly focus on single or limited scaling dimensions. Recently, [39] demonstrates remarkable model efficiency for image classification by jointly scaling up network width, depth, and resolution. Our proposed compound scaling method for object detection is mostly inspired by [39].

3. BiFPN

In this section, we first formulate the multi-scale feature fusion problem, and then introduce the main ideas for our proposed BiFPN: efficient bidirectional cross-scale connections and weighted feature fusion.

3.1. Problem Formulation

Figure 2: Feature network design. (a) FPN [23] introduces a top-down pathway to fuse multi-scale features from level 3 to 7 ($P_3$ - $P_7$); (b) PANet [26] adds an additional bottom-up pathway on top of FPN; (c) NAS-FPN [10] uses neural architecture search to find an irregular feature network topology and then repeatedly applies the same block; (d) is our BiFPN with better accuracy and efficiency trade-offs.

Multi-scale feature fusion aims to aggregate features at different resolutions. Formally, given a list of multi-scale features $\vec{P}^{in} = (P^{in}_{l_1}, P^{in}_{l_2}, \ldots)$, where $P^{in}_{l_i}$ represents the feature at level $l_i$, our goal is to find a transformation $f$ that can effectively aggregate different features and output a list of new features: $\vec{P}^{out} = f(\vec{P}^{in})$. As a concrete example, Figure 2(a) shows the conventional top-down FPN [23]. It takes level 3-7 input features $\vec{P}^{in} = (P^{in}_3, \ldots, P^{in}_7)$, where $P^{in}_i$ represents a feature level with resolution of $1/2^i$ of the input images. For instance, if input resolution is 640x640, then $P^{in}_3$ represents feature level 3 ($640/2^3 = 80$) with resolution 80x80, while $P^{in}_7$ represents feature level 7 with resolution 5x5. The conventional FPN aggregates multi-scale features in a top-down manner:

$$
\begin{aligned}
P^{out}_7 &= \text{Conv}(P^{in}_7) \\
P^{out}_6 &= \text{Conv}(P^{in}_6 + \text{Resize}(P^{out}_7)) \\
&\;\;\vdots \\
P^{out}_3 &= \text{Conv}(P^{in}_3 + \text{Resize}(P^{out}_4))
\end{aligned}
$$

where Resize is usually an upsampling or downsampling op for resolution matching, and Conv is usually a convolutional op for feature processing.
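The top-down aggregation described in this section can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: feature maps are single-channel lists of lists, Resize is nearest-neighbor 2x upsampling, and Conv is replaced by an identity placeholder so that the data flow is easy to follow. All function names are hypothetical.

```python
def feature_resolution(input_size, level):
    """Side length of the feature map at a level: input_size / 2**level."""
    return input_size // (2 ** level)


def upsample2x(fmap):
    """Resize stand-in (assumption): nearest-neighbor 2x upsampling."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def conv(fmap):
    """Conv stand-in (assumption): identity, so only the fusion is visible."""
    return [list(row) for row in fmap]


def add(a, b):
    """Element-wise sum of two equally sized feature maps."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def fpn_top_down(p_in):
    """p_in maps consecutive levels to feature maps; returns fused outputs."""
    levels = sorted(p_in, reverse=True)
    top = levels[0]
    p_out = {top: conv(p_in[top])}          # top level: Conv of its input only
    for lvl in levels[1:]:                  # lower levels: input + resized output above
        p_out[lvl] = conv(add(p_in[lvl], upsample2x(p_out[lvl + 1])))
    return p_out


# Levels 3-7 of a 640x640 input, each filled with ones.
sizes = {l: feature_resolution(640, l) for l in range(3, 8)}
p_in = {l: [[1.0] * n for _ in range(n)] for l, n in sizes.items()}
p_out = fpn_top_down(p_in)

print(sizes)                              # {3: 80, 4: 40, 5: 20, 6: 10, 7: 5}
print(p_out[7][0][0], p_out[3][0][0])     # 1.0 5.0
```

With all-ones inputs and the identity placeholder, each lower level accumulates one more contribution from above, which makes the one-way (top-down) information flow visible: level 7 sees only itself, while level 3 sees all five levels.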

3.2. Cross-Scale Connections

Conventional top-down FPN is inherently limited by the one-way information flow. To address this issue, PANet [26] adds an extra bottom-up path aggregation network, as shown in Figure 2(b). Cross-scale connections are further studied in [20, 18, 42]. Recently, NAS-FPN [10] employs neural architecture search to search for better cross-scale feature network topology, but it requires thousands of GPU hours during search and the found network is irregular and difficult to interpret or modify, as shown in Figure 2(c).

By studying the performance and efficiency of these three networks (Table 5), we observe that PANet achieves better accuracy than FPN and NAS-FPN, but at the cost of more parameters and computations. To improve model efficiency, this paper proposes several optimizations for cross-scale connections. First, we remove those nodes that only have one input edge. Our intuition is simple: if a node has only one input edge with no feature fusion, then it will contribute less to a feature network that aims at fusing different features. This leads to a simplified bi-directional network. Second, we add an extra edge from the original input to the output node if they are at the same level, in order to fuse more features without adding much cost. Third, unlike PANet [26], which only has one top-down and one bottom-up path, we treat each bidirectional (top-down & bottom-up) path as one feature network layer, and repeat the same layer multiple times to enable more high-level feature fusion. Section 4.2 will discuss how to determine the number of layers for different resource constraints using a compound scaling method. With these optimizations, we name the new feature network the bidirectional feature pyramid network (BiFPN), as shown in Figures 2 and 3.
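To make the three optimizations concrete, the following sketch builds the input-edge list of one bidirectional layer for levels 3 to 7. It is one reading of the description above (a PANet-style top-down plus bottom-up network with single-input nodes removed and a same-level skip edge added), not code from the paper; the node names such as `P6_td` and the function `bifpn_layer_edges` are hypothetical.

```python
def bifpn_layer_edges(levels=(3, 4, 5, 6, 7)):
    """Return {node: [input nodes]} for one bidirectional layer."""
    lo, hi = levels[0], levels[-1]
    edges = {}

    # Top-down pass: intermediate nodes exist only at interior levels.
    # The top and bottom levels would have a single input here, so those
    # nodes are dropped (first optimization).
    for l in range(hi - 1, lo, -1):
        above = f"P{l + 1}_in" if l + 1 == hi else f"P{l + 1}_td"
        edges[f"P{l}_td"] = [f"P{l}_in", above]

    # Bottom-up pass: output nodes at every level.
    edges[f"P{lo}_out"] = [f"P{lo}_in", f"P{lo + 1}_td"]
    for l in range(lo + 1, hi):
        # Extra edge from the original input at the same level
        # (second optimization), plus the intermediate node and the
        # output from the level below.
        edges[f"P{l}_out"] = [f"P{l}_in", f"P{l}_td", f"P{l - 1}_out"]
    edges[f"P{hi}_out"] = [f"P{hi}_in", f"P{hi - 1}_out"]
    return edges


edges = bifpn_layer_edges()
for node in sorted(edges):
    print(node, "<-", edges[node])
```

Under this reading every remaining node fuses at least two inputs, and the third optimization corresponds to feeding the `_out` features of one layer in as the `_in` features of the next, repeated as many times as the scaling method prescribes.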

3.3. Weighted Feature Fusion

When fusing features with different resolutions, a common way is to first resize them to the same resolution and then sum them up. Pyramid attention network [22] introduces global self-attention upsampling to recover pixel localization, which is further studied in [10]. All previous methods treat all input features equally without distinction. However, we observe that since different input features are at different resolutions, they usually contribute to the output feature unequally. To address this issue, we propose to add an additional weight for each input, and let the network learn the importance of each input feature. Based on this idea, we consider three weighted fusion approaches:

Unbounded fusion: $O = \sum_i w_i \cdot I_i$, where $w_i$ is a
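The weighted sum in the unbounded fusion formula can be illustrated with a small sketch. It assumes one scalar weight per input feature map and inputs already resized to the same resolution; both are assumptions made here for illustration, and the function name `unbounded_fusion` is hypothetical. In a real network the weights would be learned rather than fixed.

```python
def unbounded_fusion(inputs, weights):
    """Compute O = sum_i w_i * I_i for equally sized 2D feature maps."""
    if len(inputs) != len(weights):
        raise ValueError("need exactly one weight per input feature map")
    rows, cols = len(inputs[0]), len(inputs[0][0])
    out = [[0.0] * cols for _ in range(rows)]
    for w, fmap in zip(weights, inputs):
        for r in range(rows):
            for c in range(cols):
                out[r][c] += w * fmap[r][c]
    return out


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[10.0, 20.0], [30.0, 40.0]]

fused = unbounded_fusion([a, b], [0.5, 2.0])
plain_sum = unbounded_fusion([a, b], [1.0, 1.0])

print(fused)      # [[20.5, 41.0], [61.5, 82.0]]
print(plain_sum)  # [[11.0, 22.0], [33.0, 44.0]]
```

Setting every weight to 1 recovers the plain sum that treats all inputs equally, which is the baseline behavior this section contrasts against.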
