Activation Function Paper: Funnel Activation for Visual Recognition
In visual recognition tasks, activation functions play a crucial role: they introduce the non-linearity that lets neural networks learn complex patterns. The "Funnel Activation" (FReLU) proposed in this paper is a new 2D activation function that extends the capability of ReLU and PReLU while keeping the computational overhead low.

ReLU (Rectified Linear Unit) was the earliest widely adopted activation function; its form is y = max(x, 0), returning 0 when x is negative and x itself otherwise. PReLU (Parametric ReLU) further introduces a learnable parameter p, giving y = max(x, px), so the activation can adapt to different input distributions. However, these traditional activation functions ignore spatial conditions and cannot capture the complex layouts in images.

FReLU (Funnel Activation) instead introduces a 2D spatial condition T(·), turning the activation into y = max(x, T(x)). Here T(·) is implemented with a simple convolution, enabling pixel-level modeling that effectively captures complex visual structures in images. This design lets FReLU adapt to and handle challenging, complex images using regular convolution layers, without resorting to more complex, less efficient convolution structures.
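To make the formulation concrete, here is a minimal PyTorch sketch of an FReLU layer as described above: the funnel condition T(x) is a per-channel (depth-wise) convolution over a local window. The 3×3 window size and the batch normalization applied to T(x) are illustrative assumptions, not details confirmed by this excerpt.

    import torch
    import torch.nn as nn

    class FReLU(nn.Module):
        """y = max(x, T(x)), with T(x) a depth-wise convolution (a sketch)."""
        def __init__(self, channels: int, kernel_size: int = 3):
            super().__init__()
            # One k x k spatial filter per channel, so the overhead is only
            # k * k * channels parameters (kernel_size=3 is an assumed default).
            self.spatial = nn.Conv2d(channels, channels, kernel_size,
                                     padding=kernel_size // 2,
                                     groups=channels, bias=False)
            self.bn = nn.BatchNorm2d(channels)  # assumed normalization of T(x)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return torch.max(x, self.bn(self.spatial(x)))

Because the condition is computed per pixel from its local neighborhood, the same max(·) non-linearity becomes spatially aware at negligible extra cost.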
In the experiments, FReLU shows significant performance gains and robustness on ImageNet image classification, COCO object detection, and semantic segmentation. Compared with traditional ReLU and PReLU, FReLU better preserves and processes useful information and improves recognition accuracy, especially on images with rich detail and complex structure.

Another advantage of FReLU is its simple design. Despite the added pixel-level modeling capacity, its computational cost remains comparatively low, which makes FReLU attractive in practice. Comparisons with existing models show that, even with regular convolutions, FReLU reaches accuracy comparable to complex convolution structures, a clear advantage for resource-constrained environments or applications that demand efficient computation.

Funnel Activation thus offers an effective and practical alternative activation function for visual recognition tasks: by introducing a 2D spatial condition, it strengthens a CNN's ability to handle complex information in images while preserving computational efficiency. This work not only advances activation function design but also offers new ideas for optimizing deep learning models. Interested readers can learn more and try the method via the link provided.
Funnel Activation for Visual Recognition
Ningning Ma¹, Xiangyu Zhang²⋆, and Jian Sun²
¹ Hong Kong University of Science and Technology
² MEGVII Technology
nmaac@cse.ust.hk, {zhangxiangyu,sunjian}@megvii.com
arXiv:2007.11824v2 [cs.CV] 24 Jul 2020
Abstract. We present a conceptually simple but effective funnel activation for image recognition tasks, called Funnel activation (FReLU), that extends ReLU and PReLU to a 2D activation by adding a negligible overhead of spatial condition. The forms of ReLU and PReLU are y = max(x, 0) and y = max(x, px), respectively, while FReLU is in the form of y = max(x, T(x)), where T(·) is the 2D spatial condition. Moreover, the spatial condition achieves a pixel-wise modeling capacity in a simple way, capturing complicated visual layouts with regular convolutions. We conduct experiments on ImageNet, COCO detection, and semantic segmentation tasks, showing great improvements and robustness of FReLU in the visual recognition tasks. Code is available at https://github.com/megvii-model/FunnelAct.
Keywords: funnel activation, visual recognition, CNN
1 Introduction
Convolutional neural networks (CNNs) have achieved state-of-the-art performance in many visual recognition tasks, such as image classification, object detection, and semantic segmentation. In the CNN framework, one major kind of layer is the convolution layer; another is the non-linear activation layer.
First, in the convolution layers, capturing spatial dependency adaptively is challenging; many advances in more complex and effective convolutions have been proposed to grasp the local context adaptively in images [7,18]. These advances achieve great success, especially on dense prediction tasks (e.g., semantic segmentation, object detection). Driven by the advances in more complex convolutions, and by their less efficient implementations, a question arises: could regular convolutions achieve similar accuracy and grasp challenging complex images?
Second, right after a convolution layer captures spatial dependency linearly, an activation layer applies a scalar non-linear transformation. Many insightful activations have been proposed [31,14,5,25], but improving performance on visual tasks remains challenging, so the most widely used activation is still the Rectified Linear Unit (ReLU) [32].
⋆ Corresponding author
[Fig. 1 plot: bars showing accuracy improvement (%) over the ReLU baseline for FReLU, Swish, and PReLU on classification, detection, and segmentation.]
Fig. 1. Effectiveness and generalization performance. We set the ReLU network as the baseline, and show the relative improvement of accuracy on the three basic tasks in computer vision: image classification (Top-1 accuracy), object detection (mAP), and semantic segmentation (mean IU). We use ResNet-50 [15] as the backbone, pre-trained on the ImageNet dataset, to evaluate the generalization performance on the COCO and CityScape datasets. FReLU is more effective, and transfers better on all of the three tasks.
Driven by the distinct roles of the convolution layers and the activation layers, another question arises: could we design an activation specifically for visual tasks?
To answer both questions, we show that a simple but effective visual activation, together with regular convolutions, can achieve significant improvements on both dense and sparse predictions (e.g., image classification, see Fig. 1). To achieve these results, we identify spatial insensitivity in activations as the main obstacle impeding visual tasks from achieving significant improvements, and we propose a new visual activation that eliminates this barrier. In this work, we present a simple but effective visual activation that extends ReLU and PReLU to a 2D visual activation.
Spatial insensitivity is rarely addressed in modern activations for visual tasks. As popularized by the ReLU activation, non-linearity is performed using a max(·) function whose condition is a hand-designed zero, giving the scalar form y = max(x, 0). The ReLU activation consistently achieves top accuracy on many challenging tasks. Through a sequence of advances [31,14,5,25], many variants of ReLU modify the condition in various ways and improve the accuracy somewhat. However, further improvement remains challenging for visual tasks.
Our method, called Funnel activation (FReLU), extends the spirit of ReLU/PReLU by adding a spatial condition (see Fig. 2) that is simple to implement and adds only a negligible computational overhead. Formally, our proposed method takes the form y = max(x, T(x)), where T(x) is a simple and efficient spatial contextual feature extractor. By using a spatial condition in the activation, it extends ReLU and PReLU to a visual parametric ReLU with pixel-wise modeling capacity.
Our proposed visual activation acts as an efficient and much more effective alternative to previous activation approaches. To demonstrate its effectiveness, we replace the normal ReLU in classification networks, and we use the pre-trained backbone to show its generality on the other two basic vision tasks: object detection and semantic segmentation. The results show that FReLU not only improves performance on a single task but also transfers well to other visual tasks.
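As a usage sketch of this drop-in replacement, the block below swaps nn.ReLU for the FReLU layer sketched earlier; the conv-BN-activation block itself is an illustrative assumption, not the paper's architecture. Note that, unlike ReLU, FReLU must be told the channel count.

    import torch.nn as nn

    # Assumes the FReLU class from the earlier sketch is in scope.
    def conv_block(in_ch: int, out_ch: int, use_frelu: bool = True) -> nn.Sequential:
        # FReLU replaces ReLU one-to-one, but needs the channel count,
        # since its spatial condition is a per-channel convolution.
        act = FReLU(out_ch) if use_frelu else nn.ReLU(inplace=True)
        return nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            act,
        )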
2 Related Work
Scalar activations Scalar activations are activations with a single input and a single output, in the form y = f(x). The Rectified Linear Unit (ReLU) [13,23,32] is the most widely used scalar activation on various tasks [26,38], in the form y = max(x, 0). It is simple and effective for various tasks and datasets. To modify the negative part, many variants have been proposed, such as Leaky ReLU [31], PReLU [14], and ELU [5]. They keep the positive part as identity and make the negative part depend adaptively on the sample.
Other scalar methods include the sigmoid non-linearity, σ(x) = 1/(1 + e^(−x)), and the Tanh non-linearity, tanh(x) = 2σ(2x) − 1. These activations are not widely used in deep CNNs, mainly because they saturate and kill gradients, and they involve expensive operations (exponentials, etc.).
Many advances followed [25,39,1,16,35,10,46], and a recent search technique contributed a new searched scalar activation called Swish [36], found by combining a comprehensive set of unary and binary functions. Its form is y = x ∗ Sigmoid(x); it outperforms other scalar activations on some structures and datasets, and many searched results show great potential.
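A quick numeric sketch of the scalar forms listed above; the PReLU slope of 0.25 is an arbitrary value for illustration (in PReLU it is learned).

    import torch

    x = torch.linspace(-3.0, 3.0, 7)

    relu  = torch.clamp(x, min=0.0)      # y = max(x, 0)
    p     = 0.25                         # learnable in PReLU; fixed here
    prelu = torch.max(x, p * x)          # y = max(x, px)
    swish = x * torch.sigmoid(x)         # y = x * Sigmoid(x)

    # tanh(x) = 2 * sigmoid(2x) - 1, as stated above
    assert torch.allclose(torch.tanh(x), 2 * torch.sigmoid(2 * x) - 1)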
Contextual conditional activations Besides scalar activations, which depend only on the neuron itself, conditional activations are many-to-one functions that activate the neurons conditioned on contextual information. A representative method is Maxout [12], which extends the layer to multiple branches and selects the maximum. Most activations apply a non-linearity to the linear dot product between the weights and the data, i.e., f(wᵀx + b). Maxout instead computes max(w₁ᵀx + b₁, w₂ᵀx + b₂), generalizing ReLU and Leaky ReLU into the same framework. With dropout [17], the Maxout network shows improvement. However, it increases the complexity too much: the number of parameters and multiply-adds doubles and redoubles.
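A minimal sketch of a Maxout unit over k linear branches, to make that cost concrete: parameters and multiply-adds grow k-fold compared with a single linear layer followed by ReLU.

    import torch
    import torch.nn as nn

    class Maxout(nn.Module):
        """max(w1^T x + b1, ..., wk^T x + bk) over k linear branches."""
        def __init__(self, in_features: int, out_features: int, k: int = 2):
            super().__init__()
            self.k = k
            # All k branches packed into one linear layer: k times the
            # parameters and multiply-adds of a single linear + ReLU.
            self.linear = nn.Linear(in_features, out_features * k)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            z = self.linear(x)                      # (..., out_features * k)
            z = z.view(*x.shape[:-1], -1, self.k)   # (..., out_features, k)
            return z.max(dim=-1).values

With k = 2 and one branch pinned to zero, the unit reduces to ReLU, which is the sense in which Maxout generalizes it.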
Contextual gating methods [8,44] use contextual information to enhance efficacy, especially in RNN-based methods, where the feature dimension is relatively small. There are also CNN-based methods [34]; since 2D feature maps have a large dimension, the gating is applied after a feature reduction.

Contextually conditioned activations are usually channel-wise methods. In this paper, however, we find that spatial dependency is also important in the non-linear activation functions. We use the light-weight depth-wise separable convolution to keep the additional complexity small.
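As a back-of-the-envelope comparison of that overhead (illustrative numbers, assuming a 3×3 window and 256 channels):

    # Parameter cost of a depth-wise spatial condition vs. a regular conv.
    C, k = 256, 3                  # channels and window size (assumed)
    depthwise = k * k * C          # one k x k filter per channel -> 2,304
    regular   = k * k * C * C      # full convolution for comparison -> 589,824
    print(depthwise, regular)      # the depth-wise form is C times cheaper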
Spatial dependency modeling Learning better spatial dependency is challenging. Some approaches use different shapes of convolution kernels [41,42,40] to aggregate different ranges of spatial dependency, but this requires multiple branches, which decreases efficiency. Advances in convolution kernels such as atrous convolution [18] and dilated convolution [47] also lead to better performance by increasing the receptive field. Another type of method learns the spatial dependency adaptively, such as STN [22], active convolution [24], and deformable convolution [7]. These methods adaptively use spatial transformations to refine short-range dependencies, especially for dense vision tasks (e.g., object detection, semantic segmentation). Our simple FReLU outperforms them even without complex convolutions.
Moreover, non-local networks provide methods to capture long-range dependencies, and GCNet [3] provides a spatial attention mechanism to better use the spatial global context. Long-range modeling methods achieve better performance but still require additional blocks in the original network structure, which decreases efficiency. Our method addresses this issue inside the non-linear activations, solving it better and more efficiently.
Receptive field The region and size of the receptive field are essential in visual recognition tasks [50,33]. Work on the effective receptive field [29,11] finds that different pixels contribute unequally and that center pixels have a larger impact. Accordingly, many methods implement an adaptive receptive field [7,51,49] and improve performance by adding extra branches to the architecture, such as more complex convolutions or attention mechanisms. Our method achieves the same goal, but in a simpler and more efficient manner, by introducing the receptive field into the non-linear activations. With a more adaptive receptive field we can approximate the layouts of common complex shapes, thus achieving even better results than complex convolutions while using efficient regular convolutions.
3 Funnel Activation
FReLU is designed specifically for visual tasks and is conceptually simple: the condition is a hand-designed zero for ReLU and a parametric px for PReLU; we modify it into a 2D funnel-like condition that depends on the spatial context. The visual condition helps extract the fine spatial layout of an object. Next, we introduce the key elements of FReLU, the funnel condition and the pixel-wise modeling capacity, which are the main missing parts in ReLU and its variants.
ReLU We begin by briefly reviewing the ReLU activation. ReLU, in the form max(x, 0), uses max(·) as the non-linearity and a hand-designed zero as the condition. The non-linear transformation acts as a supplement to linear transformations such as convolution and fully-connected layers.