infinite-dimensional nonlinear operators. Operator networks are based on two different neural networks, a branch net and a trunk net, which are trained concurrently to learn from data. More recently, the authors of [21] have proposed using deep, instead of shallow, neural networks in both the trunk and branch net and have christened the resulting architecture as a DeepOnet. In a recent article [15], the universal approximation property of DeepOnets was extended, making it completely analogous to universal approximation results for finite-dimensional functions by neural networks. The authors of [15] were also able to show that DeepOnets can break the curse of dimensionality for a large variety of PDE learning tasks. Hence, in spite of the underlying infinite-dimensional setting, DeepOnets are capable of approximating a large variety of nonlinear operators efficiently. This is further validated by the success of DeepOnets in many interesting examples in scientific computing; see [26, 6, 20] and references therein.
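Schematically, and in notation that is ours rather than that of [21], a DeepOnet approximates an underlying operator $\mathcal{G}$ through an expansion of the form
\[
\mathcal{G}(u)(y) \;\approx\; \sum_{k=1}^{p} \beta_k\big(u(x_1), \dots, u(x_m)\big)\, \tau_k(y),
\]
where the branch net $\beta = (\beta_1, \dots, \beta_p)$ acts on point evaluations of the input function $u$ at fixed sensor points $x_1, \dots, x_m$, and the trunk net $\tau = (\tau_1, \dots, \tau_p)$ is evaluated at a query point $y$ in the domain of the output function.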
An alternative operator learning framework is provided by the concept of neural operators, first proposed in [18]. Just as canonical artificial neural networks are a composition of multiple hidden layers, with each hidden layer composing an affine function with a scalar nonlinear activation function, neural operators also compose multiple hidden layers, with each hidden layer composing an affine operator with a local, scalar nonlinear activation operator. The infinite-dimensional setup is reflected in the fact that the affine operator can be significantly more general than in the finite-dimensional case, where it is represented by a weight matrix and bias vector. In particular, neural operators can even use non-local linear operators, such as those defined in terms of an integral kernel. The evaluation of such integral kernels can be performed either with graph kernel networks [18] or with multipole expansions [17].
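As an illustration (in notation that may differ slightly from that of [18]), a typical hidden layer of a neural operator with an integral kernel can be written as
\[
v_{l+1}(x) \;=\; \sigma\Big( W_l\, v_l(x) \;+\; \int_D \kappa_l(x,y)\, v_l(y)\,\mathrm{d}y \;+\; b_l(x) \Big), \qquad x \in D,
\]
where $v_l : D \to \mathbb{R}^{d_l}$ denotes the function-valued input to the $l$-th hidden layer, $W_l$ is a weight matrix acting pointwise, $b_l$ a bias function, $\kappa_l$ a learnable integral kernel, and the activation function $\sigma$ is applied componentwise and pointwise. The integral term constitutes the non-local part of the affine operator.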
More recently, the authors of [19] have proposed using convolution-based integral kernels within neural operators. Such kernels can be efficiently evaluated in Fourier space, leading the resulting neural operators to be termed Fourier Neural Operators (FNOs). In [19], the authors discuss the advantages, in terms of computational efficiency, of FNOs over the other neural operators mentioned above. Moreover, they present several convincing numerical experiments to demonstrate that FNOs can very efficiently approximate a variety of operators that arise in simulating PDEs.
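Schematically, and again in notation that may differ from [19], a Fourier layer of an FNO replaces the general integral kernel by a convolution, which is evaluated in Fourier space:
\[
v_{l+1}(x) \;=\; \sigma\Big( W_l\, v_l(x) \;+\; \mathcal{F}^{-1}\big( P_l(k) \cdot \mathcal{F} v_l(k) \big)(x) \;+\; b_l(x) \Big),
\]
where $\mathcal{F}$ denotes the Fourier transform, $P_l(k)$ is a learnable matrix-valued multiplier that is typically restricted to finitely many Fourier modes $|k| \le k_{\max}$, and $W_l$, $b_l$, $\sigma$ are as before. By the convolution theorem, the Fourier multiplication corresponds to a convolution in physical space, and in practice $\mathcal{F}$ and $\mathcal{F}^{-1}$ are implemented with the FFT.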
However, the theoretical basis for neural operators has not yet been properly investigated. In particular, it is unclear if neural operators such as FNOs are universal, i.e., if they can approximate a large class of nonlinear infinite-dimensional operators. Moreover, in this infinite-dimensional setting, universality does not suffice to indicate computational viability or efficiency, as the size of the underlying neural networks might grow exponentially with respect to increasing accuracy; see the discussion of this issue in [15]. Hence, in addition to universality, it is natural to ask if neural operators can efficiently approximate a large class of operators, such as those arising in the simulation of parametric PDEs.
The investigation of these questions is the main rationale for the current paper. We focus our attention here on FNOs, as they appear to be the most promising of the neural operator based operator learning frameworks. Our main result in this paper shows that FNOs are universal, in the sense that they can approximate a very large class of continuous nonlinear operators. This result highlights the potential of FNOs for operator learning.
As argued before, a universality result is only a first step and, by itself, does not constitute evidence for efficient approximation by FNOs. In fact, we show that in the worst case, the network size might grow exponentially with respect to the desired accuracy when approximating general operators. Hence, there is a need to derive