GAUSSIAN ERROR LINEAR UNITS (GELUS)
Dan Hendrycks∗
University of California, Berkeley
hendrycks@berkeley.edu
Kevin Gimpel
Toyota Technological Institute at Chicago
kgimpel@ttic.edu
ABSTRACT
We propose the Gaussian Error Linear Unit (GELU), a high-performing neural
network activation function. The GELU activation function is xΦ(x), where Φ(x) is
the standard Gaussian cumulative distribution function. The GELU nonlinearity
weights inputs by their value, rather than gating inputs by their sign as in ReLUs
(x·1_{x>0}). We perform an empirical evaluation of the GELU nonlinearity against
the ReLU and ELU activations and find performance improvements across all
considered computer vision, natural language processing, and speech tasks.
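As a concrete reference, here is a minimal sketch of the exact GELU (our own illustration, not code from the paper; the NumPy/SciPy usage and the function name gelu are our choices). It uses the identity Φ(x) = ½(1 + erf(x/√2)) to compute xΦ(x):

```python
import numpy as np
from scipy.special import erf  # Gaussian error function

def gelu(x):
    """Exact GELU: x * Phi(x), with Phi the standard Gaussian CDF."""
    # Phi(x) = 0.5 * (1 + erf(x / sqrt(2))) for a standard normal distribution
    return 0.5 * x * (1.0 + erf(x / np.sqrt(2.0)))

x = np.linspace(-3.0, 3.0, 7)
print(gelu(x))  # negative inputs are smoothly gated toward 0; large positive inputs pass through
```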
1 INTRODUCTION
Early artificial neurons utilized binary threshold units (Hopfield, 1982; McCulloch & Pitts, 1943).
These hard binary decisions were smoothed with sigmoid activations, enabling a neuron to have a “firing
rate” interpretation and to train with backpropagation. But as networks became deeper, training
with sigmoid activations proved less effective than the non-smooth, less probabilistic ReLU (Nair &
Hinton, 2010), which makes hard gating decisions based upon the input’s sign. Despite having less of
a statistical motivation, the ReLU remains a competitive engineering solution which often enables
faster and better convergence than sigmoids. Building on the successes of ReLUs, a recent modifi-
cation called ELUs (Clevert et al., 2016) allows a ReLU-like nonlinearity to output negative values
which sometimes increases training speed. In all, the activation choice has remained a necessary
architecture decision for neural networks lest the network be a deep linear classifier.
Deep nonlinear classifiers can fit their data so well that network designers are often faced with the
choice of including a stochastic regularizer, such as adding noise to hidden layers or applying dropout (Srivastava et al., 2014), and this choice remains separate from the activation function. Some stochastic
regularizers can make the network behave like an ensemble of networks, a pseudoensemble (Bach-
man et al., 2014), and can lead to marked accuracy increases. For example, the stochastic regular-
izer dropout creates a pseudoensemble by randomly altering some activation decisions through zero
multiplication. Nonlinearities and dropout thus determine a neuron’s output together, yet the two
innovations have remained distinct. Moreover, neither subsumes the other, since popular stochastic
regularizers act irrespective of the input, while nonlinearities are aided by such regularizers.
In this work, we introduce a new nonlinearity, the Gaussian Error Linear Unit (GELU). It relates
to stochastic regularizers in that it is the expectation of a modification to Adaptive Dropout (Ba &
Frey, 2013). This suggests a more probabilistic view of a neuron’s output. We find that this novel
nonlinearity matches or exceeds models with ReLUs or ELUs across tasks from computer vision,
natural language processing, and automatic speech recognition.
2 GELU FORMULATION
We motivate our activation function by combining properties from dropout, zoneout, and ReLUs.
First, note that a ReLU and dropout both yield a neuron’s output, with the ReLU deterministi-
cally multiplying the input by zero or one and dropout stochastically multiplying by zero. Also,
a new RNN regularizer called zoneout stochastically multiplies inputs by one (Krueger et al.,
2016). We merge this functionality by multiplying the input by zero or one, but the values of
this zero-one mask are stochastically determined while also dependent upon the input. Specif-
ically, we can multiply the neuron input x by m ∼ Bernoulli(Φ(x)), where Φ(x) = P(X ≤ x) is the standard Gaussian cumulative distribution function.
∗Work done while the author was at TTIC. Code available at github.com/hendrycks/GELUs
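To illustrate this construction, the following is a small simulation sketch (our own illustration, not code from the released repository; it assumes NumPy and SciPy are available). Averaging the stochastically masked outputs recovers xΦ(x), the expectation that defines the GELU:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

x = 1.5                                    # example neuron pre-activation
p = norm.cdf(x)                            # Phi(x): probability the mask keeps the input
m = rng.binomial(n=1, p=p, size=100_000)   # m ~ Bernoulli(Phi(x))
stochastic_outputs = m * x                 # zero-one gating that depends on the input itself

print(stochastic_outputs.mean())           # Monte Carlo estimate of E[m * x]
print(x * p)                               # exact expectation x * Phi(x), i.e. GELU(x)
```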