infinite-dimensional nonlinear operators. Operator networks are based on two different neural networks, a branch net and a trunk net, which are trained concurrently to learn from data. More recently, the authors of [21] have proposed using deep, instead of shallow, neural networks in both the trunk and branch net and have christened the resulting architecture as a DeepOnet. In a recent article [15], the universal approximation property of DeepOnets was extended, making it completely analogous to universal approximation results for finite-dimensional functions by neural networks. The authors of [15] were also able to show that DeepOnets can break the curse of dimensionality for a large variety of PDE learning tasks. Hence, in spite of the underlying infinite-dimensional setting, DeepOnets are capable of approximating a large variety of nonlinear operators efficiently. This is further validated by the success of DeepOnets in many interesting examples in scientific computing; see [26, 6, 20] and references therein.
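Schematically, and in notation that is ours rather than that of [21], a DeepOnet approximates an underlying operator $\mathcal{G}$ through an expansion of the form
\[
\mathcal{G}(u)(y) \;\approx\; \sum_{k=1}^{p} \beta_k\big(u(x_1), \dots, u(x_m)\big)\, \tau_k(y),
\]
where the branch net $\beta = (\beta_1, \dots, \beta_p)$ acts on point evaluations of the input function $u$ at fixed sensor points $x_1, \dots, x_m$, and the trunk net $\tau = (\tau_1, \dots, \tau_p)$ is evaluated at a query point $y$ in the domain of the output function.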
An alternative operator learning framework is provided by the concept of neural operators, first proposed in [18]. Just as canonical artificial neural networks are a composition of multiple hidden layers, with each hidden layer composing an affine function with a scalar nonlinear activation function, neural operators also compose multiple hidden layers, with each hidden layer composing an affine operator with a local, scalar nonlinear activation operator. The infinite-dimensional setup is reflected in the fact that the affine operator can be significantly more general than in the finite-dimensional case, where it is represented by a weight matrix and bias vector. In particular, neural operators can even use non-local linear operators, such as those defined in terms of an integral kernel. The evaluation of such integral kernels can be performed either with graph kernel networks [18] or with multipole expansions [17].
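As an illustration (in notation that may differ slightly from that of [18]), a typical hidden layer of a neural operator with an integral kernel can be written as
\[
v_{l+1}(x) \;=\; \sigma\Big( W_l\, v_l(x) \;+\; \int_D \kappa_l(x,y)\, v_l(y)\,\mathrm{d}y \;+\; b_l(x) \Big), \qquad x \in D,
\]
where $v_l : D \to \mathbb{R}^{d_l}$ denotes the function-valued input to the $l$-th hidden layer, $W_l$ is a weight matrix acting pointwise, $b_l$ a bias function, $\kappa_l$ a learnable integral kernel, and the activation function $\sigma$ is applied componentwise and pointwise. The integral term constitutes the non-local part of the affine operator.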
More recently, the authors of [19] have proposed using convolution-based integral kernels within neural operators. Such kernels can be efficiently evaluated in Fourier space, leading the resulting neural operators to be termed Fourier Neural Operators (FNOs). In [19], the authors discuss the advantages, in terms of computational efficiency, of FNOs over the other neural operators mentioned above. Moreover, they present several convincing numerical experiments to demonstrate that FNOs can very efficiently approximate a variety of operators that arise in simulating PDEs.
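Schematically, and again in notation that may differ from [19], a Fourier layer of an FNO replaces the general integral kernel by a convolution, which is evaluated in Fourier space:
\[
v_{l+1}(x) \;=\; \sigma\Big( W_l\, v_l(x) \;+\; \mathcal{F}^{-1}\big( P_l(k) \cdot \mathcal{F} v_l(k) \big)(x) \;+\; b_l(x) \Big),
\]
where $\mathcal{F}$ denotes the Fourier transform, $P_l(k)$ is a learnable matrix-valued multiplier that is typically restricted to finitely many Fourier modes $|k| \le k_{\max}$, and $W_l$, $b_l$, $\sigma$ are as before. By the convolution theorem, the Fourier multiplication corresponds to a convolution in physical space, and in practice $\mathcal{F}$ and $\mathcal{F}^{-1}$ are implemented with the FFT.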
However, the theoretical basis for neural operators has not yet been properly investigated. In particular, it is unclear if neural operators such as FNOs are universal, i.e., if they can approximate a large class of nonlinear infinite-dimensional operators. Moreover, in this infinite-dimensional setting, universality does not suffice to indicate computational viability or efficiency, as the size of the underlying neural networks might grow exponentially with respect to increasing accuracy; see the discussion of this issue in [15]. Hence, in addition to universality, it is natural to ask if neural operators can efficiently approximate a large class of operators, such as those arising in the simulation of parametric PDEs.
The investigation of these questions is the main rationale for the current paper. We focus our attention here on FNOs, as they appear to be the most promising of the neural operator based operator learning frameworks. Our main result in this paper shows that FNOs are universal, in the sense that they can approximate a very large class of continuous nonlinear operators. This result highlights the potential of FNOs for operator learning.
As argued before, a universality result is only a first step and, by itself, does not constitute evidence for efficient approximation by FNOs. In fact, we show that in the worst case, the network size might grow exponentially with respect to the desired accuracy when approximating general operators. Hence, there is a need to derive