[论文翻译]参数偏微分方程的傅里叶神经算子


原文地址:https://arxiv.org/pdf/2010.08895v3


FOURIER NEURAL OPERATOR FOR PARAMETRIC PARTIAL DIFFERENTIAL EQUATIONS

参数偏微分方程的傅里叶神经算子

Zongyi Li zongyili@caltech.edu

Nikola Kovachki nkovachki@caltech.edu

Kamyar Azizzadenesheli kamyar@purdue.edu

Burigede Liu bgl@caltech.edu

Kaushik Bhattacharya bhatta@caltech.edu

Andrew Stuart astuart@caltech.edu

Anima Anandkumar anima@caltech.edu

ABSTRACT

摘要

The classical development of neural networks has primarily focused on learning mappings between finite-dimensional Euclidean spaces. Recently, this has been generalized to neural operators that learn mappings between function spaces. For partial differential equations (PDEs), neural operators directly learn the mapping from any functional parametric dependence to the solution. Thus, they learn an entire family of PDEs, in contrast to classical methods which solve one instance of the equation. In this work, we formulate a new neural operator by parameterizing the integral kernel directly in Fourier space, allowing for an expressive and efficient architecture. We perform experiments on Burgers' equation, Darcy flow, and the Navier-Stokes equation. The Fourier neural operator is the first ML-based method to successfully model turbulent flows with zero-shot super-resolution. It is up to three orders of magnitude faster than traditional PDE solvers. Additionally, it achieves superior accuracy compared to previous learning-based solvers under fixed resolution.

神经网络的经典发展主要集中于学习有限维欧几里得空间之间的映射关系。近年来,这一范畴被推广至能学习函数空间之间映射的神经算子 (neural operator) 。对于偏微分方程 (PDEs) ,神经算子直接学习从任意函数参数依赖关系到解的映射,从而能学习整个PDE族,这与仅求解方程单个实例的经典方法形成鲜明对比。本研究通过直接在傅里叶空间参数化积分核,构建了一种新型神经算子,其架构兼具高表达力与高效性。我们在Burgers方程、达西流 (Darcy flow) 和Navier-Stokes方程上进行了实验验证。傅里叶神经算子 (Fourier neural operator) 是首个基于机器学习、能以零样本超分辨率成功模拟湍流的方法,其速度比传统PDE求解器快达三个数量级。在固定分辨率下,其精度也优于以往基于学习的求解器。

1 INTRODUCTION

1 引言

Many problems in science and engineering involve solving complex partial differential equation (PDE) systems repeatedly for different values of some parameters. Examples arise in molecular dynamics, micro-mechanics, and turbulent flows. Often such systems require fine discretization in order to capture the phenomenon being modeled. As a consequence, traditional numerical solvers are slow and sometimes inefficient. For example, when designing materials such as airfoils, one needs to solve the associated inverse problem where thousands of evaluations of the forward model are needed. A fast method can make such problems feasible.

科学与工程中的许多问题涉及针对不同参数值反复求解复杂的偏微分方程(PDE)系统。这类问题常见于分子动力学、微观力学和湍流等领域。为准确捕捉模拟现象,这类系统通常需要精细离散化处理,导致传统数值求解器速度缓慢且效率低下。例如在设计翼型等材料时,需要求解涉及数千次正向模型评估的逆问题,此时快速求解方法能显著提升可行性。

Conventional solvers vs. Data-driven methods. Traditional solvers such as finite element methods (FEM) and finite difference methods (FDM) solve the equation by discretizing the space. Therefore, they impose a trade-off on the resolution: coarse grids are fast but less accurate; fine grids are accurate but slow. Complex PDE systems, as described above, usually require a very fine discretization, which makes them very challenging and time-consuming for traditional solvers. On the other hand, data-driven methods can directly learn the trajectory of the family of equations from the data. As a result, the learning-based method can be orders of magnitude faster than the conventional solvers.

传统求解器与数据驱动方法的对比。传统求解器如有限元方法(FEM)和有限差分方法(FDM)通过空间离散化来求解方程。因此它们在分辨率上存在权衡:粗网格计算快但精度低;细网格精度高但速度慢。如前所述,复杂偏微分方程系统通常需要非常精细的离散化,这对传统求解器来说极具挑战性且耗时。相比之下,数据驱动方法可以直接从数据中学习方程族的轨迹。因此,基于学习的方法可以比传统求解器快几个数量级。

Machine learning methods may hold the key to revolutionizing scientific disciplines by providing fast solvers that approximate or enhance traditional ones (Raissi et al., 2019; Jiang et al., 2020; Greenfeld et al., 2019; Kochkov et al., 2021). However, classical neural networks map between finite-dimensional spaces and can therefore only learn solutions tied to a specific discretization. This is often a limitation for practical applications and therefore the development of mesh-invariant neural networks is required. We first outline two mainstream neural network-based approaches for PDEs – the finite-dimensional operators and Neural-FEM.

机器学习方法可能通过提供近似或增强传统求解器的快速求解器,成为革新科学领域的关键 (Raissi et al., 2019; Jiang et al., 2020; Greenfeld et al., 2019; Kochkov et al., 2021)。然而,经典神经网络在有限维空间之间进行映射,因此只能学习与特定离散化绑定的解。这在实际应用中往往构成限制,因此需要开发网格无关的神经网络。我们首先概述两种基于神经网络的偏微分方程主流方法——有限维算子与神经有限元法(Neural-FEM)。

Finite-dimensional operators. These approaches parameterize the solution operator as a deep convolutional neural network between finite-dimensional Euclidean spaces Guo et al. (2016); Zhu

有限维算子。这类方法将求解算子参数化为有限维欧几里得空间之间的深度卷积神经网络 (Guo et al., 2016; Zhu)

Zero-shot super-resolution: Navier-Stokes Equation with viscosity $\nu=1\mathrm{e}{-4}$ ; Ground truth on top and prediction on bottom; trained on $64\times64\times20$ dataset; evaluated on $256\times256\times80$ (see Section 5.4).

零样本超分辨率:粘度为$\nu=1\mathrm{e}{-4}$的Navier-Stokes方程;顶部为真实值,底部为预测值;在$64\times64\times20$数据集上训练;在$256\times256\times80$上评估(参见第5.4节)。

Figure 1: top: The architecture of the Fourier layer; bottom: Example flow from Navier-Stokes.

图 1: 上: 傅里叶层架构; 下: 纳维-斯托克斯方程示例流场

& Zabaras (2018); Adler & Oktem (2017); Bhatnagar et al. (2019); Khoo et al. (2017). Such approaches are, by definition, mesh-dependent and will need modifications and tuning for different resolutions and discretizations in order to achieve consistent error (if at all possible). Furthermore, these approaches are limited to the discretization size and geometry of the training data and hence, it is not possible to query solutions at new points in the domain. In contrast, we show, for our method, both invariance of the error to grid resolution, and the ability to transfer the solution between meshes.

Zabaras (2018); Adler & Oktem (2017); Bhatnagar等人 (2019); Khoo等人 (2017)。这类方法本质上依赖于网格,需要针对不同分辨率和离散化方案进行调整和优化才能实现误差一致性(如果可行的话)。此外,这些方法受限于训练数据的离散化规模和几何结构,因此无法在域内新位置查询解。相比之下,我们的方法既实现了误差对网格分辨率的不变性,又具备在不同网格间传递解的能力。

Neural-FEM. The second approach directly parameterizes the solution function as a neural network (E & Yu, 2018; Raissi et al., 2019; Bar & Sochen, 2019; Smith et al., 2020; Pan & Duraisamy, 2020). This approach is designed to model one specific instance of the PDE, not the solution operator. It is mesh-independent and accurate, but for any given new instance of the functional parameter/coefficient, it requires training a new neural network. The approach closely resembles classical methods such as finite elements, replacing the linear span of a finite set of local basis functions with the space of neural networks. The Neural-FEM approach suffers from the same computational issue as classical methods: the optimization problem needs to be solved for every new instance. Furthermore, the approach is limited to a setting in which the underlying PDE is known.

神经有限元方法。第二种方法直接将解函数参数化为神经网络 (E & Yu, 2018; Raissi et al., 2019; Bar & Sochen, 2019; Smith et al., 2020; Pan & Duraisamy, 2020)。该方法旨在建模偏微分方程(PDE)的特定实例,而非解算子。它具有网格无关性和高精度,但对于任何新的函数参数/系数实例,都需要重新训练神经网络。该方法与经典方法(如有限元法)非常相似,用神经网络空间取代了有限局部基函数的线性张成空间。神经有限元方法与经典方法存在相同的计算问题:每个新实例都需要重新求解优化问题。此外,该方法仅限于已知底层偏微分方程的场景。

Neural Operators. Recently, a new line of work proposed learning mesh-free, infinite-dimensional operators with neural networks (Lu et al., 2019; Bhattacharya et al., 2020; Nelsen & Stuart, 2020; Li et al., 2020b;a; Patel et al., 2021). The neural operator remedies the mesh-dependent nature of the finite-dimensional operator methods discussed above by producing a single set of network parameters that may be used with different discretizations. It has the ability to transfer solutions between meshes. Furthermore, the neural operator needs to be trained only once. Obtaining a solution for a new instance of the parameter requires only a forward pass of the network, alleviating the major computational issues incurred in Neural-FEM methods. Lastly, the neural operator requires no knowledge of the underlying PDE, only data. Thus far, neural operators have not yielded efficient numerical algorithms that can parallel the success of convolutional or recurrent neural networks in the finite-dimensional setting due to the cost of evaluating integral operators. Through the fast Fourier transform, our work alleviates this issue.

神经算子 (Neural Operators)。近期,一系列研究提出用神经网络学习无网格、无限维算子 (Lu et al., 2019; Bhattacharya et al., 2020; Nelsen & Stuart, 2020; Li et al., 2020b;a; Patel et al., 2021)。神经算子通过生成适用于不同离散化场景的单一网络参数集,解决了前述有限维算子方法对网格的依赖性,具备跨网格传递解的能力。此外,神经算子只需训练一次,对新参数实例求解仅需网络前向传播,显著缓解了Neural-FEM方法的计算瓶颈。最后,神经算子无需偏微分方程(PDE)的先验知识,仅依赖数据即可工作。然而,由于积分算子计算成本高昂,现有神经算子尚未发展出能与有限维场景中卷积神经网络/循环神经网络相媲美的高效数值算法。我们的工作通过快速傅里叶变换解决了这一难题。

Fourier Transform. The Fourier transform is frequently used in spectral methods for solving differential equations, since differentiation is equivalent to multiplication in the Fourier domain. Fourier transforms have also played an important role in the development of deep learning. In theory, they appear in the proof of the universal approximation theorem (Hornik et al., 1989) and, empirically, they have been used to speed up convolutional neural networks (Mathieu et al., 2013). Neural network architectures involving the Fourier transform or the use of sinusoidal activation functions have also been proposed and studied (Bengio et al., 2007; Mingo et al., 2004; Sitzmann et al., 2020). Recently, some spectral methods for PDEs have been extended to neural networks (Fan et al., 2019a;b; Kashinath et al., 2020). We build on these works by proposing a neural operator architecture defined directly in Fourier space with quasi-linear time complexity and state-of-the-art approximation capabilities.

傅里叶变换 (Fourier Transform)。傅里叶变换常被用于谱方法求解微分方程,因为在傅里叶域中微分等价于乘法运算。傅里叶变换在深度学习发展历程中也扮演着重要角色:理论上,它出现在通用逼近定理的证明中 [Hornik et al., 1989];实践中,它被用于加速卷积神经网络 [Mathieu et al., 2013]。涉及傅里叶变换或采用正弦激活函数的神经网络架构也已被提出并深入研究 [Bengio et al., 2007; Mingo et al., 2004; Sitzmann et al., 2020]。近期,部分偏微分方程的谱方法被扩展应用于神经网络 [Fan et al., 2019a;b; Kashinath et al., 2020]。我们在这些研究基础上提出了一种直接在傅里叶空间定义的神经算子架构,该架构具有拟线性时间复杂度与最先进的逼近能力。

Our Contributions. We introduce the Fourier neural operator, a novel deep learning architecture able to learn mappings between infinite-dimensional spaces of functions; the integral operator is restricted to a convolution, and instantiated through a linear transformation in the Fourier domain.

我们的贡献。我们提出了一种新型深度学习架构——傅里叶神经算子(Fourier neural operator),能够学习无限维函数空间之间的映射;该积分算子被限制为卷积运算,并通过傅里叶域的线性变换实现。

We observed that the proposed framework can approximate complex operators arising in PDEs that are highly non-linear, with high frequency modes and slow energy decay. The power of neural operators comes from combining linear, global integral operators (via the Fourier transform) and non-linear, local activation functions. Similar to the way standard neural networks approximate highly non-linear functions by combining linear multiplications with non-linear activations, the proposed neural operators can approximate highly non-linear operators.

我们观察到,所提出的框架能够逼近偏微分方程(PDE)中出现的具有高度非线性、高频模式和缓慢能量衰减特性的复杂算子。神经算子的强大之处在于结合了线性全局积分算子(通过傅里叶变换实现)和非线性局部激活函数。类似于标准神经网络通过线性乘法与非线性激活函数的组合来逼近高度非线性函数的方式,所提出的神经算子能够逼近高度非线性算子。

2 LEARNING OPERATORS

2 学习算子

Our methodology learns a mapping between two infinite dimensional spaces from a finite collection of observed input-output pairs. Let $D\subset\mathbb{R}^{d}$ be a bounded, open set and $\mathcal{A}=\mathcal{A}(D;\mathbb{R}^{d_{a}})$ and $\mathcal{U}=\mathcal{U}(D;\mathbb{R}^{d_{u}})$ be separable Banach spaces of functions taking values in $\mathbb{R}^{d_{a}}$ and $\mathbb{R}^{d_{u}}$ respectively. Furthermore let $G^{\dagger}:{\mathcal{A}}\rightarrow{\mathcal{U}}$ be a (typically) non-linear map. We study maps $G^{\dagger}$ which arise as the solution operators of parametric PDEs – see Section 5 for examples. Suppose we have observations $\{a_{j},u_{j}\}_ {j=1}^{N}$ where $a_{j}\sim\mu$ is an i.i.d. sequence from the probability measure $\mu$ supported on $\mathcal{A}$ and $u_{j}=G^{\dagger}(a_{j})$ is possibly corrupted with noise. We aim to build an approximation of $G^{\dagger}$ by constructing a parametric map

我们的方法通过有限组观测到的输入-输出对,学习两个无限维空间之间的映射关系。设 $D\subset\mathbb{R}^{d}$ 为有界开集,$\mathcal{A}=\mathcal{A}(D;\mathbb{R}^{d_{a}})$ 和 $\mathcal{U}=\mathcal{U}(D;\mathbb{R}^{d_{u}})$ 分别是取值于 $\mathbb{R}^{d_{a}}$ 和 $\mathbb{R}^{d_{u}}$ 的可分巴拿赫函数空间。进一步设 $G^{\dagger}:{\mathcal{A}}\rightarrow{\mathcal{U}}$ 为(通常)非线性映射。我们研究的 $G^{\dagger}$ 映射源自参数化偏微分方程的解算子(具体示例见第5节)。假设观测数据集 $\{a_{j},u_{j}\}_ {j=1}^{N}$ 中,$a_{j}\sim\mu$ 是支撑在 $\mathcal{A}$ 上的概率测度 $\mu$ 的独立同分布序列,且 $u_{j}=G^{\dagger}(a_{j})$ 可能包含噪声。我们的目标是通过构建参数化映射来逼近 $G^{\dagger}$。

$$
G:{\mathcal{A}}\times\Theta\to{\mathcal{U}}\qquad{\mathrm{or~equivalently}}, G_{\theta}:{\mathcal{A}}\to{\mathcal{U}},\quad\theta\in\Theta
$$

$$
G:{\mathcal{A}}\times\Theta\to{\mathcal{U}}\qquad{\mathrm{或等价地}}, G_{\theta}:{\mathcal{A}}\to{\mathcal{U}},\quad\theta\in\Theta
$$

for some finite-dimensional parameter space $\Theta$ by choosing $\theta^{\dagger}\in\Theta$ so that $G(\cdot,\theta^{\dagger})=G_{\theta^{\dagger}}\approx G^{\dagger}$ . This is a natural framework for learning in infinite-dimensions as one could define a cost functional $C:\mathcal{U}\times\mathcal{U}\to\mathbb{R}$ and seek a minimizer of the problem

对于某个有限维参数空间$\Theta$,通过选择$\theta^{\dagger}\in\Theta$使得$G(\cdot,\theta^{\dagger})=G_{\theta^{\dagger}}\approx G^{\dagger}$。这是无限维度学习的自然框架,因为可以定义成本函数$C:\mathcal{U}\times\mathcal{U}\to\mathbb{R}$并寻求该问题的极小化解。

$$
\operatorname*{min}_ {\theta\in\Theta}\mathbb{E}_{a\sim\mu}[C(G(a,\theta),G^{\dagger}(a))]
$$

$$
\operatorname*{min}_ {\theta\in\Theta}\mathbb{E}_{a\sim\mu}[C(G(a,\theta),G^{\dagger}(a))]
$$

which directly parallels the classical finite-dimensional setting (Vapnik, 1998). Showing the existence of minimizers, in the infinite-dimensional setting, remains a challenging open problem. We will approach this problem in the test-train setting by using a data-driven empirical approximation to the cost used to determine $\theta$ and to test the accuracy of the approximation. Because we conceptualize our methodology in the infinite-dimensional setting, all finite-dimensional approximations share a common set of parameters which are consistent in infinite dimensions. A table of notation is shown in Appendix 3.

这与经典的有限维设定直接对应 (Vapnik, 1998)。在无限维设定中证明极小值的存在性仍是一个具有挑战性的开放问题。我们将通过使用数据驱动的经验近似方法,在测试-训练设定中处理该问题,该近似用于确定参数$\theta$并检验近似的准确性。由于我们在无限维设定中构建方法论,所有有限维近似都共享一组在无限维中保持一致的通用参数。符号表见附录3。
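To make the setup concrete, here is a minimal numpy sketch (our own illustrative example, not from the paper): $G^{\dagger}$ is taken to be the antiderivative operator $u(x)=\int_0^x a(s)\,\mathrm{d}s$ on $D=[0,1]$, observations are pointwise grid values, and the empirical risk is the Monte-Carlo average of a relative $L^2$ cost $C$ over the samples.

```python
import numpy as np

# Toy end-to-end sketch of the operator-learning setup (illustrative only).
rng = np.random.default_rng(0)
n, N = 128, 16                      # grid size, number of samples
x = np.linspace(0.0, 1.0, n)
dx = x[1] - x[0]

def G_dagger(a):
    """'True' solution operator: cumulative trapezoidal integral of a."""
    return np.concatenate([[0.0], np.cumsum(0.5 * (a[1:] + a[:-1]) * dx)])

# Observations {(a_j, u_j)}: a_j ~ mu (random low-frequency sines), u_j = G†(a_j).
A = np.stack([
    sum(c * np.sin((k + 1) * np.pi * x) for k, c in enumerate(rng.standard_normal(4)))
    for _ in range(N)
])
U = np.stack([G_dagger(a) for a in A])

def C(u_pred, u_true):
    """Relative L2 cost functional on grid values."""
    return np.linalg.norm(u_pred - u_true) / np.linalg.norm(u_true)

def empirical_risk(G_theta):
    """Empirical (Monte-Carlo) approximation of E_{a~mu}[ C(G_theta(a), G†(a)) ]."""
    return np.mean([C(G_theta(A[j]), U[j]) for j in range(N)])

print(empirical_risk(lambda a: np.zeros_like(a)))   # 1.0 for the zero map
print(empirical_risk(G_dagger))                     # 0.0 for the true operator
```

Minimizing `empirical_risk` over a parametric family $G_\theta$ is the data-driven stand-in for the expectation above.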

Learning the Operator. Approximating the operator $G^{\dagger}$ is a different and typically much more challenging task than finding the solution $u\in\mathcal{U}$ of a PDE for a single instance of the parameter $a\in{\mathcal{A}}$ . Most existing methods, ranging from classical finite elements, finite differences, and finite volumes to modern machine learning approaches such as physics-informed neural networks

学习算子。近似算子 $G^{\dagger}$ 是一个与求解参数 $a\in{\mathcal{A}}$ 单实例下偏微分方程解 $u\in\mathcal{U}$ 截然不同且通常更具挑战性的任务。现有方法涵盖从经典有限元法、有限差分法、有限体积法,到物理信息神经网络等现代机器学习方法。



Figure 2: top: The architecture of the neural operators; bottom: Fourier layer.

图 2: 上: 神经算子架构;下: 傅里叶层。

(a) The full architecture of the neural operator: start from input $a$. 1. Lift to a higher dimension channel space by a neural network $P$. 2. Apply four layers of integral operators and activation functions. 3. Project back to the target dimension by a neural network $Q$. Output $u$. (b) Fourier layers: start from input $v$. On top: apply the Fourier transform $\mathcal{F}$; then a linear transform $R$ on the lower Fourier modes, which filters out the higher modes; then apply the inverse Fourier transform ${\mathcal{F}}^{-1}$. On the bottom: apply a local linear transform $W$.

(a) 神经算子完整架构:从输入 $a$ 开始。1. 通过神经网络 $P$ 提升至高维通道空间。2. 应用四层积分算子和激活函数。3. 通过神经网络 $Q$ 投影回目标维度。输出 $u$。(b) 傅里叶层:从输入 $v$ 开始。上层:应用傅里叶变换 $\mathcal{F}$;对低频傅里叶模态进行线性变换 $R$ 并滤除高频模态;然后应用逆傅里叶变换 ${\mathcal{F}}^{-1}$。下层:应用局部线性变换 $W$。

(PINNs) (Raissi et al., 2019) aim at the latter and can therefore be computationally expensive. This makes them impractical for applications where a solution to the PDE is required for many different instances of the parameter. On the other hand, our approach directly approximates the operator and is therefore much cheaper and faster, offering tremendous computational savings when compared to traditional solvers. For an example application to Bayesian inverse problems, see Section 5.5.

物理信息神经网络 (PINNs) (Raissi et al., 2019) 针对后者设计,因此计算成本较高。这使得它们在需要为参数多个不同实例求解偏微分方程的应用中不切实际。相比之下,我们的方法直接逼近算子,因此更经济高效,与传统求解器相比可大幅节省计算量。关于贝叶斯反问题的应用示例,请参阅第5.5节。

Discretization. Since our data $a_{j}$ and $u_{j}$ are, in general, functions, to work with them numerically, we assume access only to point-wise evaluations. Let $D_{j}=\{x_{1},\ldots,x_{n}\}\subset D$ be an $n$-point discretization of the domain $D$ and assume we have observations $a_{j}|_ {D_{j}}\in\mathbb{R}^{n\times d_{a}}$, $u_{j}|_ {D_{j}}\in\mathbb{R}^{n\times d_{v}}$, for a finite collection of input-output pairs indexed by $j$. To be discretization-invariant, the neural operator can produce an answer $u(x)$ for any $x\in D$, potentially $x\notin D_{j}$. Such a property is highly desirable as it allows a transfer of solutions between different grid geometries and discretizations.

离散化处理。由于我们的数据 $a_{j}$ 和 $u_{j}$ 通常是函数形式,为进行数值计算,我们假设仅能获取其点值评估。设 $D_{j}=\{x_{1},\ldots,x_{n}\}\subset D$ 为定义域 $D$ 的 $n$ 点离散化集合,并假设我们观测到有限组输入-输出对(索引为 $j$)的 $a_{j}|_ {D_{j}}\in\mathbb{R}^{n\times d_{a}}$ 和 $u_{j}|_ {D_{j}}\in\mathbb{R}^{n\times d_{v}}$。为实现离散化不变性,该神经算子需能为任意 $x\in D$(包括 $x\notin D_{j}$ 的情况)生成解 $u(x)$。此特性极具价值,因其支持不同网格几何与离散化方案间的解迁移。

3 NEURAL OPERATOR

3 神经算子

The neural operator, proposed in (Li et al., 2020b), is formulated as an iterative architecture $v_{0}\mapsto$ $v_{1}\mapsto\ldots\mapsto v_{T}$ where $v_{j}$ for $j=0,1,\ldots,T-1$ is a sequence of functions each taking values in $\mathbb{R}^{d_{v}}$ . As shown in Figure 2 (a), the input $a\in{\mathcal{A}}$ is first lifted to a higher dimensional representation $v_{0}(x)=P(a(x))$ by the local transformation $P$ which is usually parameterized by a shallow fully-connected neural network. Then we apply several iterations of updates $v_{t}\mapsto v_{t+1}$ (defined below). The output $u(x)=Q(v_{T}(x))$ is the projection of $v_{T}$ by the local transformation $Q:\mathbb{R}^{d_{v}}\rightarrow\mathbb{R}^{d_{u}}$ . In each iteration, the update $v_{t}\mapsto v_{t+1}$ is defined as the composition of a non-local integral operator $\kappa$ and a local, nonlinear activation function $\sigma$ .

神经算子 (Li et al., 2020b) 被表述为一个迭代架构 $v_{0}\mapsto$ $v_{1}\mapsto\ldots\mapsto v_{T}$ ,其中 $v_{j}$ ( $j=0,1,\ldots,T-1$ ) 是取值于 $\mathbb{R}^{d_{v}}$ 的函数序列。如图 2 (a) 所示,输入 $a\in{\mathcal{A}}$ 首先通过局部变换 $P$ (通常由浅层全连接神经网络参数化) 被提升到更高维表示 $v_{0}(x)=P(a(x))$ 。随后我们应用多次迭代更新 $v_{t}\mapsto v_{t+1}$ (定义如下)。输出 $u(x)=Q(v_{T}(x))$ 是通过局部变换 $Q:\mathbb{R}^{d_{v}}\rightarrow\mathbb{R}^{d_{u}}$ 对 $v_{T}$ 的投影。每次迭代中,更新 $v_{t}\mapsto v_{t+1}$ 被定义为非局部积分算子 $\kappa$ 与局部非线性激活函数 $\sigma$ 的组合。
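A shape-level numpy sketch of this iterative architecture, with random (untrained) weights and a simple global-averaging stand-in for the integral operator (the actual learned kernel is defined in the definitions that follow):

```python
import numpy as np

# Structural sketch of the neural operator of Figure 2(a): lift with P,
# apply T = 4 layers of v -> sigma(W v + K v), project with Q.
# Sizes and weights here are our own illustrative choices, not the paper's.
rng = np.random.default_rng(2)
n, d_a, d_v, d_u, T = 64, 1, 8, 1, 4

P = rng.standard_normal((d_v, d_a)) * 0.1       # pointwise lifting map
Q = rng.standard_normal((d_u, d_v)) * 0.1       # pointwise projection map
Ws = [rng.standard_normal((d_v, d_v)) * 0.1 for _ in range(T)]

def K(v):
    """Stand-in non-local operator: a global mean over the grid (the paper
    uses a learned kernel integral / Fourier layer here)."""
    return np.tile(v.mean(axis=0, keepdims=True), (v.shape[0], 1))

def neural_operator(a):
    v = a @ P.T                                  # v0(x) = P(a(x)), shape (n, d_v)
    for W in Ws:
        v = np.maximum(0.0, v @ W.T + K(v))      # v_{t+1} = ReLU(W v_t + (K v_t))
    return v @ Q.T                               # u(x) = Q(v_T(x))

a = rng.standard_normal((n, d_a))
u = neural_operator(a)
print(u.shape)                                   # (64, 1)
```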

Definition 1 (Iterative updates) Define the update to the representation $v_{t}\mapsto v_{t+1}$ by

定义1 (迭代更新) 将表示更新定义为 $v_{t}\mapsto v_{t+1}$

$$
v_{t+1}(x):=\sigma\Big(W v_{t}(x)+\big(K(a;\phi)v_{t}\big)(x)\Big),\qquad\forall x\in D
$$

$$
v_{t+1}(x):=\sigma\Big(W v_{t}(x)+\big(K(a;\phi)v_{t}\big)(x)\Big),\qquad\forall x\in D
$$

where $\mathcal{K}:\mathcal{A}\times\Theta_{\mathcal{K}}\to\mathcal{L}(\mathcal{U}(D;\mathbb{R}^{d_{v}}),\mathcal{U}(D;\mathbb{R}^{d_{v}}))$ maps to bounded linear operators on $\mathcal{U}(D;\mathbb{R}^{d_{v}})$ and is parameterized by $\phi\in\Theta_{K}$ , $W:\mathbb{R}^{d_{v}}\rightarrow\mathbb{R}^{d_{v}}$ is a linear transformation, and $\sigma:\mathbb{R}\rightarrow\mathbb{R}$ is a non-linear activation function whose action is defined component-wise.

其中 $\mathcal{K}:\mathcal{A}\times\Theta_{\mathcal{K}}\to\mathcal{L}(\mathcal{U}(D;\mathbb{R}^{d_{v}}),\mathcal{U}(D;\mathbb{R}^{d_{v}}))$ 映射到 $\mathcal{U}(D;\mathbb{R}^{d_{v}})$ 上的有界线性算子,并由 $\phi\in\Theta_{K}$ 参数化,$W:\mathbb{R}^{d_{v}}\rightarrow\mathbb{R}^{d_{v}}$ 是线性变换,$\sigma:\mathbb{R}\rightarrow\mathbb{R}$ 是一个逐分量定义的非线性激活函数。

We choose $\textstyle\mathcal{K}(a;\phi)$ to be a kernel integral transformation parameterized by a neural network.

我们选择$\textstyle\mathcal{K}(a;\phi)$作为由神经网络参数化的核积分变换。

Definition 2 (Kernel integral operator $\kappa$ ) Define the kernel integral operator mapping in (2) by

定义 2 (核积分算子 $\kappa$) 将式 (2) 中的核积分算子映射定义为

$$
\big(K(a;\phi)v_{t}\big)(x):=\int_{D}\kappa\big(x,y,a(x),a(y);\phi\big)v_{t}(y)\mathrm{d}y,\qquad\forall x\in D
$$

$$
\big(K(a;\phi)v_{t}\big)(x):=\int_{D}\kappa\big(x,y,a(x),a(y);\phi\big)v_{t}(y)\mathrm{d}y,\qquad\forall x\in D
$$

where $\kappa_{\phi}:\mathbb{R}^{2(d+d_{a})}\rightarrow\mathbb{R}^{d_{v}\times d_{v}}$ is a neural network parameterized by $\phi\in\Theta_{\kappa}$

其中 $\kappa_{\phi}:\mathbb{R}^{2(d+d_{a})}\rightarrow\mathbb{R}^{d_{v}\times d_{v}}$ 是由参数 $\phi\in\Theta_{\kappa}$ 定义的神经网络

Here $\kappa_{\phi}$ plays the role of a kernel function which we learn from data. Together, definitions 1 and 2 constitute a generalization of neural networks to infinite-dimensional spaces, as first proposed in Li et al. (2020b). Notice that even though the integral operator is linear, the neural operator can learn highly non-linear operators by composing linear integral operators with non-linear activation functions, analogous to standard neural networks.

这里 $\kappa_{\phi}$ 作为我们从数据中学习的核函数发挥作用。定义1和定义2共同构成了将神经网络推广到无限维空间的泛化框架,最初由Li等人(2020b)提出。值得注意的是,虽然积分算子是线性的,但神经算子可以通过将线性积分算子与非线性激活函数组合来学习高度非线性的算子,这与标准神经网络类似。
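Under a simple quadrature rule on a uniform grid, Definition 2 becomes a weighted sum over the grid points. The sketch below uses a hand-written Gaussian-type scalar kernel as a stand-in for the neural network $\kappa_{\phi}$ (toy sizes, our own choices):

```python
import numpy as np

# Definition 2 discretized by quadrature: on an n-point uniform grid,
# (K(a) v)(x_i) ≈ (1/n) * sum_j kappa(x_i, x_j, a(x_i), a(x_j)) v(x_j).
rng = np.random.default_rng(3)
n, d_v = 64, 4
x = np.linspace(0.0, 1.0, n)
a = np.sin(2 * np.pi * x)                        # a fixed input function a(x)
v = rng.standard_normal((n, d_v))                # current representation v_t

def kappa(xi, xj, ai, aj):
    """Toy scalar kernel; a real kappa_phi is a neural network whose output
    is a d_v x d_v matrix per point pair."""
    return np.exp(-((xi - xj) ** 2) / 0.02) * (1.0 + ai * aj)

# Assemble the n x n kernel matrix and apply it as a quadrature sum.
Kmat = kappa(x[:, None], x[None, :], a[:, None], a[None, :])
Kv = (Kmat @ v) / n                              # shape (n, d_v)
print(Kv.shape)
```

Assembling and applying `Kmat` costs $O(n^2)$, which is exactly the expense the Fourier parameterization of the next section removes.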

If we remove the dependence on the function $a$ and impose $\kappa_{\phi}(x,y)=\kappa_{\phi}(x-y)$ , we obtain that (3) is a convolution operator, which is a natural choice from the perspective of fundamental solutions. We exploit this fact in the following section by parameterizing $\kappa_{\phi}$ directly in Fourier space and using the Fast Fourier Transform (FFT) to efficiently compute (3). This leads to a fast architecture that obtains state-of-the-art results for PDE problems.

如果我们去除对函数 $a$ 的依赖,并设定 $\kappa_{\phi}(x,y)=\kappa_{\phi}(x-y)$ ,则可得出 (3) 式是一个卷积算子。从基本解的角度来看,这是自然的选择。在下一节中,我们将利用这一事实,直接在傅里叶空间中对 $\kappa_{\phi}$ 进行参数化,并使用快速傅里叶变换 (FFT) 高效计算 (3) 式。这种快速架构在处理偏微分方程 (PDE) 问题时能取得最先进的结果。

4 FOURIER NEURAL OPERATOR

4 傅里叶神经算子 (Fourier Neural Operator)

We propose replacing the kernel integral operator in (3) by a convolution operator defined in Fourier space. Let $\mathcal{F}$ denote the Fourier transform of a function $f:D\rightarrow\mathbb{R}^{d_{v}}$ and ${\mathcal{F}}^{-1}$ its inverse; then

我们提出用傅里叶空间定义的卷积算子替代(3)中的核积分算子。设$\mathcal{F}$表示函数$f:D\rightarrow\mathbb{R}^{d_{v}}$的傅里叶变换,${\mathcal{F}}^{-1}$为其逆变换,则

$$
(\mathcal{F}f)_ {j}(k)=\int_{D}f_{j}(x)e^{-2i\pi\langle x,k\rangle}\mathrm{d}x,\qquad(\mathcal{F}^{-1}f)_ {j}(x)=\int_{D}f_{j}(k)e^{2i\pi\langle x,k\rangle}\mathrm{d}k
$$

$$
(\mathcal{F}f)_ {j}(k)=\int_{D}f_{j}(x)e^{-2i\pi\langle x,k\rangle}\mathrm{d}x,\qquad(\mathcal{F}^{-1}f)_ {j}(x)=\int_{D}f_{j}(k)e^{2i\pi\langle x,k\rangle}\mathrm{d}k
$$

for $j=1,\ldots,d_{v}$ where $i=\sqrt{-1}$ is the imaginary unit. By letting $\kappa_{\phi}(x,y,a(x),a(y))=\kappa_{\phi}(x-y)$ in (3) and applying the convolution theorem, we find that

对于 $j=1,\ldots,d_{v}$,其中 $i=\sqrt{-1}$ 为虚数单位。令 (3) 式中的 $\kappa_{\phi}(x,y,a(x),a(y))=\kappa_{\phi}(x-y)$ 并应用卷积定理,可得

$$
\bigl(K(a;\phi)v_{t}\bigr)(x)=\mathcal{F}^{-1}\bigl(\mathcal{F}(\kappa_{\phi})\cdot\mathcal{F}(v_{t})\bigr)(x),\qquad\forall x\in D.
$$

$$
\bigl(K(a;\phi)v_{t}\bigr)(x)=\mathcal{F}^{-1}\bigl(\mathcal{F}(\kappa_{\phi})\cdot\mathcal{F}(v_{t})\bigr)(x),\qquad\forall x\in D.
$$

We, therefore, propose to directly parameterize $\kappa_{\phi}$ in Fourier space.

因此,我们建议直接在傅里叶空间中参数化 $\kappa_{\phi}$。
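The convolution theorem used above can be checked numerically on a periodic grid: the direct circular-convolution quadrature sum and the FFT route agree to machine precision (a self-contained numpy check with our own toy sizes):

```python
import numpy as np

# Numerical check of the convolution theorem on a periodic 1-d grid:
# F^{-1}( F(kappa) * F(v) ) reproduces the direct circular convolution.
rng = np.random.default_rng(4)
n = 128
kappa = rng.standard_normal(n)
v = rng.standard_normal(n)

# Direct quadrature: (kappa * v)(x_i) = (1/n) sum_j kappa(x_i - x_j) v(x_j).
direct = np.array(
    [np.sum(kappa[(i - np.arange(n)) % n] * v) for i in range(n)]
) / n

# Same operation in Fourier space: one multiplication per mode.
spectral = np.fft.ifft(np.fft.fft(kappa) * np.fft.fft(v)).real / n

print(np.allclose(direct, spectral))             # True
```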

Definition 3 (Fourier integral operator $\kappa$ ) Define the Fourier integral operator

定义 3 (傅里叶积分算子 $\kappa$) 定义傅里叶积分算子

$$
{\big(}K(\phi)v_{t}{\big)}(x)={\mathcal{F}}^{-1}{\Big(}R_{\phi}\cdot\left({\mathcal{F}}v_{t}\right){\Big)}(x)\qquad\forall x\in D
$$

$$
{\big(}K(\phi)v_{t}{\big)}(x)={\mathcal{F}}^{-1}{\Big(}R_{\phi}\cdot\left({\mathcal{F}}v_{t}\right){\Big)}(x)\qquad\forall x\in D
$$

where $R_{\phi}$ is the Fourier transform of a periodic function $\kappa:\bar{D}\rightarrow\mathbb{R}^{d_{v}\times d_{v}}$ parameterized by $\phi\in\Theta_{K}$ . An illustration is given in Figure 2 (b).

其中 $R_{\phi}$ 是周期函数 $\kappa:\bar{D}\rightarrow\mathbb{R}^{d_{v}\times d_{v}}$ 的傅里叶变换,由 $\phi\in\Theta_{K}$ 参数化。图 2 (b) 给出了图示。

For frequency mode $k\in D$ , we have $(\mathcal{F}v_{t})(k)\in\mathbb{C}^{d_{v}}$ and $R_{\phi}(k)\in\mathbb{C}^{d_{v}\times d_{v}}$ . Notice that since we assume $\kappa$ is periodic, it admits a Fourier series expansion, so we may work with the discrete modes $k\in{\mathbb{Z}^{d}}$ . We pick a finite-dimensional parameterization by truncating the Fourier series at a maximal number of modes $k_{\operatorname*{max}}=|Z_{k_{\operatorname*{max}}}|=|\{k\in\mathbb{Z}^{d}:|k_{j}|\leq k_{\operatorname*{max},j},\ \text{for}\ j=1,\ldots,d\}|$ . We thus parameterize $R_{\phi}$ directly as a complex-valued $(k_{\operatorname*{max}}\times d_{v}\times d_{v})$ -tensor comprising a collection of truncated Fourier modes and therefore drop $\phi$ from our notation. Since $\kappa$ is real-valued, we impose conjugate symmetry. We note that the set $Z_{\boldsymbol{k}_ {\mathrm{max}}}$ is not the canonical choice for the low frequency modes of $v_{t}$ . Indeed, the low frequency modes are usually defined by placing an upper-bound on the $\ell_{1}$ -norm of $k\in{\mathbb{Z}^{d}}$ . We choose $Z_{\boldsymbol{k}_{\mathrm{max}}}$ as above since it allows for an efficient implementation.

对于频率模式 $k\in D$,我们有 $(\mathcal{F}v_{t})(k)\in\mathbb{C}^{d_{v}}$ 和 $R_{\phi}(k)\in\mathbb{C}^{d_{v}\times d_{v}}$。注意到由于假设 $\kappa$ 是周期性的,它允许傅里叶级数展开,因此我们可以处理离散模式 $k\in{\mathbb{Z}^{d}}$。我们通过在最大模式数 $k_{\operatorname*{max}}=|Z_{k_{\operatorname*{max}}}|=|\{k\in\mathbb{Z}^{d}:|k_{j}|\leq k_{\operatorname*{max},j},\ j=1,\ldots,d\}|$ 处截断傅里叶级数来选择有限维参数化。因此,我们将 $R_{\phi}$ 直接参数化为复值 $(k_{\operatorname*{max}}\times d_{v}\times d_{v})$ 张量,包含一组截断的傅里叶模式,并在符号中去掉 $\phi$。由于 $\kappa$ 是实值的,我们施加共轭对称性。我们注意到集合 $Z_{\boldsymbol{k}_ {\mathrm{max}}}$ 并不是 $v_{t}$ 低频模式的标准选择。实际上,低频模式通常通过对 $k\in{\mathbb{Z}^{d}}$ 的 $\ell_{1}$ 范数设置上限来定义。我们选择上述 $Z_{\boldsymbol{k}_{\mathrm{max}}}$ 是因为它允许高效实现。

The discrete case and the FFT. Assuming the domain $D$ is discretized with $n\in\mathbb{N}$ points, we have that $v_{t}\in\mathbb{R}^{n\times d_{v}}$ and $\mathcal{F}(v_{t})\in\mathbb{C}^{n\times d_{v}}$ . Since we convolve $v_{t}$ with a function which only has $k_{\mathrm{max}}$ Fourier modes, we may simply truncate the higher modes to obtain $\mathcal{F}(v_{t})\in\mathbb{C}^{k_{\operatorname*{max}}\times d_{v}}$ . Multiplication by the weight tensor $R\in\mathbb{C}^{k_{\operatorname*{max}}\times d_{v}\times d_{v}}$ is then

离散情形与FFT。假设域$D$被离散化为$n\in\mathbb{N}$个点,则有$v_{t}\in\mathbb{R}^{n\times d_{v}}$且$\mathcal{F}(v_{t})\in\mathbb{C}^{n\times d_{v}}$。由于我们仅用含$k_{\mathrm{max}}$个傅里叶模式的函数对$v_{t}$进行卷积,可直接截断高频模式得到$\mathcal{F}(v_{t})\in\mathbb{C}^{k_{\operatorname*{max}}\times d_{v}}$。随后通过权重张量$R\in\mathbb{C}^{k_{\operatorname*{max}}\times d_{v}\times d_{v}}$进行乘法运算。

$$
\big(R\cdot\big(\mathcal{F}v_{t}\big)\big)_ {k,l}=\sum_{j=1}^{d_{v}}R_{k,l,j}\big(\mathcal{F}v_{t}\big)_ {k,j},\qquad k=1,\ldots,k_{\operatorname*{max}},\quad l=1,\ldots,d_{v}.
$$

$$
\big(R\cdot\big(\mathcal{F}v_{t}\big)\big)_ {k,l}=\sum_{j=1}^{d_{v}}R_{k,l,j}\big(\mathcal{F}v_{t}\big)_ {k,j},\qquad k=1,\ldots,k_{\operatorname*{max}},\quad l=1,\ldots,d_{v}.
$$
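In code, this per-mode multiplication is a batched matrix-vector product. A numpy sketch of the core of a 1-d Fourier layer (toy sizes and random weights are our own choices; the paper's implementation uses a deep learning framework, but the operations are the same):

```python
import numpy as np

# Discrete 1-d Fourier layer: rfft, keep the lowest k_max modes, multiply
# each kept mode k by a complex matrix R[k], zero the rest, irfft back.
rng = np.random.default_rng(5)
n, d_v, k_max = 64, 8, 12
v = rng.standard_normal((n, d_v))
# Complex weight tensor R of shape (k_max, d_v, d_v); rfft's real-input
# symmetry takes care of the conjugate-symmetry constraint on kappa.
R = rng.standard_normal((k_max, d_v, d_v)) + 1j * rng.standard_normal((k_max, d_v, d_v))

v_hat = np.fft.rfft(v, axis=0)                    # shape (n//2 + 1, d_v)
out_hat = np.zeros_like(v_hat)
# (R · F v)_{k,l} = sum_j R_{k,l,j} (F v)_{k,j} on the kept modes only.
out_hat[:k_max] = np.einsum("klj,kj->kl", R, v_hat[:k_max])
out = np.fft.irfft(out_hat, n=n, axis=0)          # back to physical space
print(out.shape)                                  # (64, 8)
```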

When the discretization is uniform with resolution $s_{1}\times\cdots\times s_{d}=n$ , $\mathcal{F}$ can be replaced by the Fast Fourier Transform. For $f\in\mathbb{R}^{n\times d_{v}}$ , $k=(k_{1},\dots,k_{d})\in\mathbb{Z}_ {s_{1}}\times\dots\times\mathbb{Z}_ {s_{d}}$ , and $x=(x_{1},\dots,x_{d})\in D$ , the FFT $\hat{\mathcal F}$ and its inverse ${\hat{\mathcal{F}}}^{-1}$ are defined as

当离散化均匀且分辨率为 $s_{1}\times\cdots\times s_{d}=n$ 时,$\mathcal{F}$ 可被快速傅里叶变换 (Fast Fourier Transform) 替代。对于 $f\in\mathbb{R}^{n\times d_{v}}$、$k=(k_{1},\dots,k_{d})\in\mathbb{Z}_{s_{1}}\times\dots\times\mathbb{Z}_ {s_{d}}$ 以及 $x=(x_{1},\dots,x_{d})\in D$,FFT $\hat{\mathcal F}$ 及其逆变换 ${\hat{\mathcal{F}}}^{-1}$ 定义为

$$
\begin{aligned}
(\hat{\mathcal{F}}f)_ {l}(k)&=\sum_{x_{1}=0}^{s_{1}-1}\cdots\sum_{x_{d}=0}^{s_{d}-1}f_{l}(x_{1},\ldots,x_{d})e^{-2i\pi\sum_{j=1}^{d}\frac{x_{j}k_{j}}{s_{j}}},\\
(\hat{\mathcal{F}}^{-1}f)_ {l}(x)&=\sum_{k_{1}=0}^{s_{1}-1}\cdots\sum_{k_{d}=0}^{s_{d}-1}f_{l}(k_{1},\ldots,k_{d})e^{2i\pi\sum_{j=1}^{d}\frac{x_{j}k_{j}}{s_{j}}}
\end{aligned}
$$

$$
\begin{aligned}
(\hat{\mathcal{F}}f)_ {l}(k)&=\sum_{x_{1}=0}^{s_{1}-1}\cdots\sum_{x_{d}=0}^{s_{d}-1}f_{l}(x_{1},\ldots,x_{d})e^{-2i\pi\sum_{j=1}^{d}\frac{x_{j}k_{j}}{s_{j}}},\\
(\hat{\mathcal{F}}^{-1}f)_ {l}(x)&=\sum_{k_{1}=0}^{s_{1}-1}\cdots\sum_{k_{d}=0}^{s_{d}-1}f_{l}(k_{1},\ldots,k_{d})e^{2i\pi\sum_{j=1}^{d}\frac{x_{j}k_{j}}{s_{j}}}
\end{aligned}
$$

for $l=1,\ldots,d_{v}$ . In this case, the set of truncated modes becomes

对于 $l=1,\ldots,d_{v}$。此时,截断模态集变为
$$
Z_{k_{\operatorname*{max}}}=\big\{(k_{1},\ldots,k_{d})\in\mathbb{Z}_ {s_{1}}\times\cdots\times\mathbb{Z}_ {s_{d}}\;\big|\;k_{j}\leq k_{\operatorname*{max},j}\ \text{or}\ s_{j}-k_{j}\leq k_{\operatorname*{max},j},\ \text{for}\ j=1,\ldots,d\big\}.
$$

When implemented, $R$ is treated as a $\left(s_{1}\times\dots\times s_{d}\times d_{v}\times d_{v}\right)$ -tensor and the above definition of $Z_{\boldsymbol{k}_ {\mathrm{max}}}$ corresponds to the “corners” of $R$ , which allows for a straightforward parallel implementation of (5) via matrix-vector multiplication. In practice, we have found that choosing $k_{\operatorname*{max},j}=12$ , which yields $k_{\operatorname*{max}}=12^{d}$ parameters per channel, is sufficient for all the tasks that we consider.

实现时,$R$被视为一个$\left(s_{1}\times\dots\times s_{d}\times d_{v}\times d_{v}\right)$维张量,上述$Z_{\boldsymbol{k}_ {\mathrm{max}}}$的定义对应$R$的"角点",这使得(5)式可通过矩阵-向量乘法直接并行实现。实践中我们发现,选择$k_{\operatorname*{max},j}=12$(每通道产生$k_{\operatorname*{max}}=12^{d}$个参数)足以应对所有考虑的任务。
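The "corners" picture can be made concrete in 2-d: in the FFT output layout, mode $k$ and mode $-k$ sit at opposite ends of each axis, so retaining $|k_{j}|\leq k_{\operatorname*{max},j}$ selects the corner blocks of the spectrum (toy numpy example with our own sizes):

```python
import numpy as np

# Low-pass masking in the FFT layout: the retained modes occupy the four
# corner blocks of the 2-d spectrum array.
rng = np.random.default_rng(6)
s1, s2, kmax = 32, 32, 4
v_hat = np.fft.fft2(rng.standard_normal((s1, s2)))

# Integer mode numbers along each axis in FFT order: 0, 1, ..., -2, -1.
k1 = np.fft.fftfreq(s1, d=1.0 / s1)
k2 = np.fft.fftfreq(s2, d=1.0 / s2)
mask = (np.abs(k1)[:, None] <= kmax) & (np.abs(k2)[None, :] <= kmax)

# Zeroing the unmasked modes keeps conjugate symmetry, so the result is real.
v_low = np.fft.ifft2(np.where(mask, v_hat, 0.0)).real
print(mask.sum())   # (2*kmax + 1)**2 = 81 modes, split across the four corners
```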

Parameter iz at ions of $R$ . In general, $R$ can be defined to depend on $(\mathcal{F}a)$ to parallel (3). Indeed, we can define $R_{\phi}:\mathbb{Z}^{d}\times\mathbb{R}^{d_{v}}\rightarrow\mathbb{R}^{d_{v}\times d_{v}}$ as a parametric function that maps $(k,(\mathcal{F}a)(k))$ to the values of the appropriate Fourier modes. We have experimented with linear as well as neural network parameter iz at ions of $R_{\phi}$ . We find that the linear parameter iz ation has a similar performance to the previously described direct parameter iz ation, while neural networks have worse performance. This is likely due to the discrete structure of the space $\mathbb{Z}^{d}$ . Our experiments in this work focus on the direct parameter iz ation presented above.

参数化 $R$。通常,可以定义 $R$ 依赖于 $(\mathcal{F}a)$ 以与式 (3) 对应。具体而言,我们可以将 $R_{\phi}:\mathbb{Z}^{d}\times\mathbb{R}^{d_{v}}\rightarrow\mathbb{R}^{d_{v}\times d_{v}}$ 定义为参数化函数,将 $(k,(\mathcal{F}a)(k))$ 映射到相应傅里叶模式的值。我们尝试了 $R_{\phi}$ 的线性参数化和神经网络参数化,发现线性参数化与前述直接参数化性能相当,而神经网络性能较差,这可能是由于 $\mathbb{Z}^{d}$ 空间的离散结构所致。本文实验主要采用上述直接参数化方法。

Invariance to discretization. The Fourier layers are discretization-invariant because they can learn from and evaluate functions which are discretized in an arbitrary way. Since parameters are learned directly in Fourier space, resolving the functions in physical space simply amounts to projecting on the basis functions $e^{2\pi i\langle x,k\rangle}$ , which are well-defined everywhere on $\mathbb{R}^{d}$ . This allows us to achieve zero-shot super-resolution as shown in Section 5.4. Furthermore, our architecture has a consistent error at any resolution of the inputs and outputs. On the other hand, notice that, in Figure 3, the standard CNN methods we compare against have an error that grows with the resolution.

离散化不变性。Fourier层具有离散化不变性,因为它们能够从任意方式离散化的函数中学习并进行评估。由于参数直接在傅里叶空间学习,在物理空间解析函数仅需投影到定义于$\mathbb{R}^{d}$全域的基函数$e^{2\pi i\langle x,k\rangle}$上。如第5.4节所示,这使我们能够实现零样本超分辨率。此外,我们的架构在输入和输出的任何分辨率下都保持一致的误差。值得注意的是,在图3中,作为对比的标准CNN方法其误差会随分辨率增加而增大。

Quasi-linear complexity. The weight tensor $R$ contains $k_{\operatorname*{max}}<n$ modes, so the inner multiplication has complexity $O(k_{\operatorname*{max}})$ . Therefore, the majority of the computational cost lies in computing the Fourier transform $\mathcal{F}(v_{t})$ and its inverse. General Fourier transforms have complexity $O(n^{2})$ ; however, since we truncate the series, the complexity is in fact $O(n k_{\operatorname*{max}})$ , while the FFT has complexity $O(n\log n)$ . Generally, we have found using FFTs to be very efficient. However, a uniform discretization is required.

拟线性复杂度。权重张量 $R$ 包含 $k_{\operatorname*{max}}<n$ 个模态,因此内积运算复杂度为 $O(k_{\operatorname*{max}})$ 。主要计算开销在于计算傅里叶变换 $\mathcal{F}(v_{t})$ 及其逆变换。常规傅里叶变换复杂度为 $O(n^{2})$ ,但由于我们截断了级数,实际复杂度为 $O(n k_{\operatorname*{max}})$ ,而快速傅里叶变换(FFT)的复杂度为 $O(n\log n)$ 。实践发现FFT效率极高,但要求采用均匀离散化处理。
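为直观说明截断后 $O(nk_{\mathrm{max}})$ 的代价,下面用直接求和只计算最低 $k_{\mathrm{max}}$ 个傅里叶系数(numpy 示意,仅作演示),其结果应与完整 FFT 的前 $k_{\mathrm{max}}$ 个系数一致:

```python
import numpy as np

def truncated_dft(v, k_max):
    """仅用直接求和计算最低的 k_max 个傅里叶系数。
    构造 (k_max, n) 的基函数矩阵并作矩阵-向量乘法,代价为 O(n * k_max),
    而完整 FFT 为 O(n log n)。"""
    n = len(v)
    x = np.arange(n) / n                       # 单位区间上的均匀网格
    k = np.arange(k_max)                       # 保留的波数
    basis = np.exp(-2j * np.pi * np.outer(k, x))
    return basis @ v
```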

5 NUMERICAL EXPERIMENTS

5 数值实验

In this section, we compare the proposed Fourier neural operator with multiple finite-dimensional architectures as well as operator-based approximation methods on the 1-d Burgers’ equation, the 2-d Darcy Flow problem, and 2-d Navier-Stokes equation. The data generation processes are discussed in Appendices A.3.1, A.3.2, and A.3.3 respectively. We do not compare against traditional solvers (FEM/FDM) or neural-FEM type methods since our goal is to produce an efficient operator approximation that can be used for downstream applications. We demonstrate one such application to the Bayesian inverse problem in Section 5.5.

在本节中,我们将提出的傅里叶神经算子 (Fourier neural operator) 与多种有限维架构以及基于算子的逼近方法进行比较,测试案例包括一维Burgers方程、二维Darcy流问题和二维Navier-Stokes方程。数据生成过程分别在附录A.3.1、A.3.2和A.3.3中讨论。由于我们的目标是构建可用于下游应用的高效算子逼近方法,因此未与传统求解器 (FEM/FDM) 或神经-FEM类方法进行比较。我们在5.5节展示了该方法在贝叶斯逆问题中的典型应用。

We construct our Fourier neural operator by stacking four Fourier integral operator layers as specified in (2) and (4) with the ReLU activation as well as batch normalization. Unless otherwise specified, we use $N=1000$ training instances and 200 testing instances. We use the Adam optimizer to train for 500 epochs with an initial learning rate of 0.001 that is halved every 100 epochs. We set $k_{\mathrm{max},j}=16,d_{v}=64$ for the 1-d problem and $k_{\mathrm{max},j}=12,d_{v}=32$ for the 2-d problems. Lower-resolution data are downsampled from higher resolution. All the computation is carried out on a single Nvidia V100 GPU with 16GB memory.

我们通过堆叠四个由(2)和(4)式定义的傅里叶积分算子层来构建傅里叶神经算子,其中包含ReLU激活函数和批量归一化。除非另有说明,我们使用1000个训练实例和200个测试实例。采用Adam优化器进行500轮训练,初始学习率为0.001,每100轮减半。对于一维问题设置$k_{\mathrm{max},j}=16,d_{v}=64$,二维问题设置$k_{\mathrm{max},j}=12,d_{v}=32$。低分辨率数据由高分辨率降采样得到。所有计算均在16GB显存的Nvidia V100 GPU上完成。
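上述学习率调度(初始 0.001,每 100 轮减半)可写成如下简单函数(纯 Python 示意):

```python
def lr_at_epoch(epoch, base_lr=1e-3, step=100, factor=0.5):
    """返回第 epoch 轮使用的学习率: 初始 base_lr, 每 step 轮乘以 factor。"""
    return base_lr * factor ** (epoch // step)
```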

Remark on Resolution. Traditional PDE solvers such as FEM and FDM approximate a single function and therefore their error to the continuum decreases as the resolution is increased. On the other hand, operator approximation is independent of the ways its data is discretized as long as all relevant information is resolved. Resolution-invariant operators have consistent error rates among different resolutions as shown in Figure 3. Further, resolution-invariant operators can do zero-shot super-resolution, as shown in Section 5.4.

关于分辨率的说明。传统偏微分方程(PDE)求解器(如有限元法(FEM)和有限差分法(FDM))通过逼近单个函数来实现求解,因此随着分辨率提高,其与连续解的误差会减小。而算子逼近的精度与数据离散方式无关,只要所有相关信息被解析即可。如图3所示,分辨率不变的算子在不同分辨率下具有一致的误差率。此外,如第5.4节所示,分辨率不变的算子还能实现零样本超分辨率。

Benchmarks for time-independent problems (Burgers and Darcy): NN: a simple point-wise feedforward neural network. RBM: the classical Reduced Basis Method (using a POD basis) (De

时间无关问题基准测试(Burgers和Darcy方程):
NN: 简单的逐点前馈神经网络
RBM: 经典降基方法(采用POD基) (De


Figure 3: Benchmarks on Burgers' equation, Darcy Flow, and Navier-Stokes

图 3: Burgers方程、Darcy流和Navier-Stokes的基准测试

Left: benchmarks on Burgers equation; Mid: benchmarks on Darcy Flow for different resolutions; Right: the learning curves on Navier-Stokes $\nu=1\mathrm{{e}-3}$ with different benchmarks. Train and test on the same resolution. For acronyms, see Section 5; details in Tables 1, 3, 4.

左:Burgers方程的基准测试;中:不同分辨率的Darcy Flow基准测试;右:Navier-Stokes $\nu=1\mathrm{{e}-3}$ 在不同基准下的学习曲线。训练与测试采用相同分辨率。缩略语参见第5节;细节见表1、表3、表4。

Vore, 2014). FCN: a state-of-the-art neural network architecture based on Fully Convolutional Networks (Zhu & Zabaras, 2018). PCANN: an operator method using PCA as an autoencoder on both the input and output data and interpolating the latent spaces with a neural network (Bhattacharya et al., 2020). GNO: the original graph neural operator (Li et al., 2020b). MGNO: the multipole graph neural operator (Li et al., 2020a). LNO: a neural operator method based on the low-rank decomposition of the kernel $\begin{array}{r}{\kappa(x,y):=\sum_{j=1}^{r}\phi_{j}(x)\psi_{j}(y)}\end{array}$ , similar to the unstacked DeepONet proposed in (Lu et al., 2019). FNO: the newly proposed Fourier neural operator.

Vore, 2014)。FCN:一种基于全卷积网络 (Fully Convolution Networks) 的先进神经网络架构 (Zhu & Zabaras, 2018)。PCANN:一种算子方法,在输入和输出数据上使用PCA作为自动编码器,并通过神经网络对潜在空间进行插值 (Bhattacharya et al., 2020)。GNO:原始图神经算子 (Li et al., 2020b)。MGNO:多极图神经算子 (Li et al., 2020a)。LNO:一种基于核低秩分解的神经算子方法,其核函数表示为 $\begin{array}{r}{\kappa(x,y):=\sum_{j=1}^{r}\phi_{j}(x)\psi_{j}(y)}\end{array}$,类似于 (Lu et al., 2019) 提出的非堆叠式DeepONet。FNO:新提出的傅里叶神经算子。

Benchmarks for time-dependent problems (Navier-Stokes): ResNet: 18 layers of 2-d convolution with residual connections (He et al., 2016). U-Net: A popular choice for image-to-image regression tasks, consisting of four blocks with 2-d convolutions and deconvolutions (Ronneberger et al., 2015). TF-Net: A network designed for learning turbulent flows based on a combination of spatial and temporal convolutions (Wang et al., 2020). FNO-2d: 2-d Fourier neural operator with an RNN structure in time. FNO-3d: 3-d Fourier neural operator that directly convolves in space-time.

时间相关问题基准测试(Navier-Stokes):
ResNet: 18层二维卷积网络,带残差连接(He et al., 2016)。
U-Net: 图像到图像回归任务的常用架构,包含四个二维卷积和反卷积模块(Ronneberger et al., 2015)。
TF-Net: 专为学习湍流设计的网络,结合了空间和时间卷积(Wang et al., 2020)。
FNO-2d: 二维傅里叶神经算子,时间维度采用RNN结构。
FNO-3d: 三维傅里叶神经算子,直接在时空域进行卷积。

5.1 BURGERS’ EQUATION

5.1 伯格斯方程 (Burgers' Equation)

The 1-d Burgers' equation is a non-linear PDE with various applications, including modeling the one-dimensional flow of a viscous fluid. It takes the form

一维 Burgers 方程是一种非线性偏微分方程,其应用包括模拟粘性流体的一维流动。其形式为

$$
\begin{array}{r l}
\partial_{t}u(x,t)+\partial_{x}(u^{2}(x,t)/2)=\nu\partial_{xx}u(x,t), & x\in(0,1),\,t\in(0,1] \\
u(x,0)=u_{0}(x), & x\in(0,1)
\end{array}
$$

$$
\begin{array}{r l}
\partial_{t}u(x,t)+\partial_{x}(u^{2}(x,t)/2)=\nu\partial_{xx}u(x,t), & x\in(0,1),\,t\in(0,1] \\
u(x,0)=u_{0}(x), & x\in(0,1)
\end{array}
$$

with periodic boundary conditions where $u_{0}\in L_{\mathrm{per}}^{2}((0,1);\mathbb{R})$ is the initial condition and $\nu\in\mathbb{R}_ {+}$ is the viscosity coefficient. We aim to learn the operator mapping the initial condition to the solution at time one, $G^{\dagger}:L_{\mathrm{per}}^{2}((0,1);\mathbb{R})\to H_{\mathrm{per}}^{r}((0,1);\mathbb{R})$ defined by $u_{0}\mapsto u(\cdot,1)$ for any $r>0$ .

在周期性边界条件下,其中 $u_{0}\in L_{\mathrm{per}}^{2}((0,1);\mathbb{R})$ 是初始条件,$\nu\in\mathbb{R}_ {+}$ 是粘性系数。我们的目标是学习将初始条件映射到时间为一时的解的算子 $G^{\dagger}:L_{\mathrm{per}}^{2}((0,1);\mathbb{R})\to H_{\mathrm{per}}^{r}((0,1);\mathbb{R})$,定义为 $u_{0}\mapsto u(\cdot,1)$,其中 $r>0$。

The results of our experiments are shown in Figure 3 (a) and Table 3 (Appendix A.3.1). Our proposed method obtains the lowest relative error compared to any of the benchmarks. Further, the error is invariant with respect to the resolution, while the error of convolutional neural network based methods (FCN) grows with the resolution. Compared to other neural operator methods such as GNO and MGNO that use Nyström sampling in physical space, the Fourier neural operator is both more accurate and more computationally efficient.

我们的实验结果如图 3 (a) 和表 3 (附录 A.3.1) 所示。与所有基准方法相比,我们提出的方法获得了最低的相对误差。此外,该误差不随分辨率变化,而基于卷积神经网络的方法 (FCN) 的误差会随分辨率增加。与在物理空间使用 Nystrom 采样的其他神经算子方法 (如 GNO 和 MGNO) 相比,傅里叶神经算子 (Fourier neural operator) 在精度和计算效率上都更优。

5.2 DARCY FLOW

5.2 达西流 (Darcy Flow)

We consider the steady state of the 2-d Darcy Flow equation on the unit box, which is the second-order, linear, elliptic PDE

我们考虑单位盒上二维达西流(Darcy Flow)方程的稳态情况,这是一个二阶线性椭圆型偏微分方程(PDE)

$$
\begin{array}{r l}
-\nabla\cdot(a(x)\nabla u(x))=f(x), & x\in(0,1)^{2} \\
u(x)=0, & x\in\partial(0,1)^{2}
\end{array}
$$

$$
\begin{array}{r l}
-\nabla\cdot(a(x)\nabla u(x))=f(x), & x\in(0,1)^{2} \\
u(x)=0, & x\in\partial(0,1)^{2}
\end{array}
$$

with a Dirichlet boundary where $a\in L^{\infty}((0,1)^{2};\mathbb{R}_ {+})$ is the diffusion coefficient and $f\in L^{2}((0,1)^{2};\mathbb{R})$ is the forcing function. This PDE has numerous applications including modeling the pressure of subsurface flow, the deformation of linearly elastic materials, and the electric potential in conductive materials. We are interested in learning the operator mapping the diffusion coefficient to the solution, $G^{\dagger}:L^{\infty}((0,1)^{2};\mathbb{R}_ {+})\to H_{0}^{1}((0,1)^{2};\mathbb{R}_{+})$ defined by $a\mapsto u$ . Note that although the PDE is linear, the operator $G^{\dagger}$ is not.

具有Dirichlet边界条件,其中$a\in L^{\infty}((0,1)^{2};\mathbb{R}_ {+})$为扩散系数,$f\in L^{2}((0,1)^{2};\mathbb{R})$为外力函数。该偏微分方程在模拟地下流动压力、线性弹性材料变形以及导电材料电势等领域有广泛应用。我们关注于学习从扩散系数映射到解的算子$G^{\dagger}:L^{\infty}((0,1)^{2};\mathbb{R}_ {+})\to H_{0}^{1}((0,1)^{2};\mathbb{R}_{+})$,其定义为$a\mapsto u$。需注意尽管该偏微分方程是线性的,但算子$G^{\dagger}$并非线性。

The results of our experiments are shown in Figure 3 (b) and Table 4 (Appendix A.3.2). The proposed Fourier neural operator obtains nearly one order of magnitude lower relative error compared to any benchmarks. We again observe the invariance of the error with respect to the resolution.

我们的实验结果如图 3 (b) 和表 4 (附录 A.3.2) 所示。相比所有基准方法,提出的傅里叶神经算子 (Fourier neural operator) 获得了近一个数量级更低的相对误差。我们再次观察到误差对分辨率的不变性。

5.3 NAVIER-STOKES EQUATION

5.3 纳维-斯托克斯方程 (Navier-Stokes Equation)

We consider the 2-d Navier-Stokes equation for a viscous, incompressible fluid in vorticity form on the unit torus:

我们考虑单位环面上黏性不可压缩流体的二维纳维-斯托克斯方程涡量形式:

$$
\begin{array}{r l}
\partial_{t}w(x,t)+u(x,t)\cdot\nabla w(x,t)=\nu\Delta w(x,t)+f(x), & x\in(0,1)^{2},\,t\in(0,T] \\
\nabla\cdot u(x,t)=0, & x\in(0,1)^{2},\,t\in[0,T] \\
w(x,0)=w_{0}(x), & x\in(0,1)^{2}
\end{array}
$$

$$
\begin{array}{r l}
\partial_{t}w(x,t)+u(x,t)\cdot\nabla w(x,t)=\nu\Delta w(x,t)+f(x), & x\in(0,1)^{2},\,t\in(0,T] \\
\nabla\cdot u(x,t)=0, & x\in(0,1)^{2},\,t\in[0,T] \\
w(x,0)=w_{0}(x), & x\in(0,1)^{2}
\end{array}
$$

where $u\in C([0,T];H_{\mathrm{per}}^{r}((0,1)^{2};\mathbb{R}^{2}))$ for any $r>0$ is the velocity field, $w=\nabla\times u$ is the vorticity, $w_{0}\in L_{\mathrm{per}}^{2}((0,1)^{2};\mathbb{R})$ is the initial vorticity, $\nu\in\mathbb{R}_ {+}$ is the viscosity coefficient, and $f\in L_{\mathrm{per}}^{2}((0,1)^{2};\mathbb{R})$ is the forcing function. We are interested in learning the operator mapping the vorticity up to time 10 to the vorticity up to some later time $T>10$ , $G^{\dagger}:C([0,10];H_{\mathrm{per}}^{r}((0,1)^{2};\mathbb{R}))\to$ $C((10,T];H_{\mathrm{per}}^{r}((0,1)^{2};\mathbb{R}))$ defined by $w|_ {(0,1)^{2}\times[0,10]}\mapsto w|_{(0,1)^{2}\times(10,T]}$ . Given the vorticity, it is easy to derive the velocity. While vorticity is harder to model compared to velocity, it provides more information. By formulating the problem on vorticity, the neural network models mimic the pseudospectral method. We experiment with the viscosities $\nu=1\mathrm{e}{-}3,1\mathrm{e}{-}4,1\mathrm{e}{-}5$ , decreasing the final time $T$ as the dynamics become chaotic. Since the baseline methods are not resolution-invariant, we fix the resolution to $64\times64$ for both training and testing.

其中 $u\in C([0,T];H_{\mathrm{per}}^{r}((0,1)^{2};\mathbb{R}^{2}))$ (对于任意 $r>0$)表示速度场,$w=\nabla\times u$ 表示涡量,$w_{0}\in L_{\mathrm{per}}^{2}((0,1)^{2};\mathbb{R})$ 表示初始涡量,$\nu\in\mathbb{R}_ {+}$ 表示粘性系数,$f\in L_{\mathrm{per}}^{2}((0,1)^{2};\mathbb{R})$ 表示外力函数。我们关注学习将时间10以内的涡量映射到某个更晚时间 $T>10$ 的涡量的算子 $G^{\dagger}:C([0,10];H_{\mathrm{per}}^{r}((0,1)^{2};\mathbb{R}))\to C((10,T];H_{\mathrm{per}}^{r}((0,1)^{2};\mathbb{R}))$,其定义为 $w|_ {(0,1)^{2}\times[0,10]}\mapsto w|_{(0,1)^{2}\times(10,T]}$。给定涡量后很容易推导出速度。虽然涡量比速度更难建模,但它提供了更多信息。通过在涡量上建立问题,神经网络模型模仿了伪谱方法。我们实验了粘性系数 $\nu=1{\mathrm{e}}{-}3,1{\mathrm{e}}{-}4,1{\mathrm{e}}{-}5$,并随着动态变得混沌而减小终止时间 $T$。由于基线方法不具备分辨率不变性,我们将训练和测试的分辨率固定为 $64\times64$。

Table 1: Benchmarks on Navier Stokes (fixing resolution $64\times64$ for both training and testing)

Config Parameters Time per epoch ν=1e-3, T=50, N=1000 ν=1e-4, T=30, N=1000 ν=1e-4, T=30, N=10000 ν=1e-5, T=20, N=1000
FNO-3D 6,558,537 38.99s 0.0086 0.1918 0.0820 0.1893
FNO-2D 414,517 127.80s 0.0128 0.1559 0.0834 0.1556
U-Net 24,950,491 48.67s 0.0245 0.2051 0.1190 0.1982
TF-Net 7,451,724 47.21s 0.0225 0.2253 0.1168 0.2268
ResNet 266,641 78.47s 0.0701 0.2871 0.2311 0.2753

表 1: Navier Stokes基准测试 (训练和测试均固定分辨率 $64\times64$ )

Config Parameters Time per epoch ν=1e-3, T=50, N=1000 ν=1e-4, T=30, N=1000 ν=1e-4, T=30, N=10000 ν=1e-5, T=20, N=1000
FNO-3D 6,558,537 38.99s 0.0086 0.1918 0.0820 0.1893
FNO-2D 414,517 127.80s 0.0128 0.1559 0.0834 0.1556
U-Net 24,950,491 48.67s 0.0245 0.2051 0.1190 0.1982
TF-Net 7,451,724 47.21s 0.0225 0.2253 0.1168 0.2268
ResNet 266,641 78.47s 0.0701 0.2871 0.2311 0.2753

As shown in Table 1, FNO-3D has the best performance when there is sufficient data ($\nu=1\mathrm{e}{-3}$ , $N=1000$ and $\nu=1\mathrm{e}{-4}$ , $N=10000$ ). For the configurations where the amount of data is insufficient ($\nu=1\mathrm{e}{-4}$ , $N=1000$ and $\nu=1\mathrm{e}{-5}$ , $N=1000$ ), all methods have $>15\%$ error, with FNO-2D achieving the lowest. Note that we only present results for spatial resolution $64\times64$ since all benchmarks we compare against are designed for this resolution. Increasing it degrades their performance while FNO achieves the same errors.

如表 1 所示,在数据充足的情况下($\nu=1\mathrm{e}{-3}$,$N=1000$ 以及 $\nu=1\mathrm{e}{-4}$,$N=10000$),FNO-3D 表现最佳。而在数据量不足的配置下($\nu=1\mathrm{e}{-4}$,$N=1000$ 以及 $\nu=1\mathrm{e}{-5}$,$N=1000$),所有方法的误差均 $>15\%$,其中 FNO-2D 误差最低。需要注意的是,我们仅展示了空间分辨率为 $64\times64$ 的结果,因为我们对比的所有基准测试均针对此分辨率设计。提高分辨率会降低它们的性能,而 FNO 则能保持相同的误差水平。

2D and 3D Convolutions. FNO-2D, U-Net, TF-Net, and ResNet all do 2D convolution in the spatial domain and recurrently propagate in the time domain (2D+RNN). The operator maps the solution at the previous 10 time steps to the next time step (2D functions to 2D functions). On the other hand, FNO-3D performs convolution in space-time. It maps the initial time steps directly to the full trajectory (3D functions to 3D functions). The 2D+RNN structure can propagate the solution to any arbitrary time $T$ in increments of a fixed interval length $\Delta t$ , while the Conv3D structure is fixed to the interval $[0,T]$ but can transfer the solution to an arbitrary time-discretization. We find the 3-d method to be more expressive and easier to train compared to its RNN-structured counterpart.

2D与3D卷积。FNO-2D、U-Net、TF-Net和ResNet均在空间域执行2D卷积,并通过$\scriptstyle(2\mathrm{D}+\mathrm{RNN})$结构在时间域递推。这些算子将前10个时间步的解映射至下一时间步(2D函数到2D函数)。而FNO-3D则在时空域执行卷积运算,直接将初始时间步映射至完整轨迹(3D函数到3D函数)。$2\mathrm{D}{+}\mathrm{RNN}$结构能以固定间隔$\Delta t$为单位将解递推至任意时间$T$,而Conv3D结构虽固定于区间$[0,T]$,但可将解转换至任意时间离散化网格。实验表明,相较于RNN结构,3D方法具有更强的表达能力且更易训练。
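上述 2D+RNN 的自回归递推可用如下示意代码说明,其中 model 的接口(输入最近 10 个时间步的 (10, s, s) 数组、输出下一步的 (s, s) 数组)是为演示而假设的:

```python
import numpy as np

def rollout_2d_rnn(model, history, n_steps):
    """以滑动窗口方式递推: 每次用最近 10 个时间步预测下一步。
    history: 10 个已知初始时间步组成的序列; 返回预测出的轨迹。"""
    frames = list(history)
    for _ in range(n_steps):
        next_frame = model(np.stack(frames[-10:]))   # (10, s, s) -> (s, s)
        frames.append(next_frame)
    return np.stack(frames[len(history):])
```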

5.4 ZERO-SHOT SUPER-RESOLUTION

5.4 零样本超分辨率 (Zero-shot Super-Resolution)

The neural operator is mesh-invariant, so it can be trained on a lower resolution and evaluated at a higher resolution without seeing any higher-resolution data (zero-shot super-resolution). Figure 1 shows an example where we train the FNO-3D model on $64\times64\times20$ resolution data in the setting above with $(\nu=1\mathrm{e}{-4},N=10000)$ and transfer to $256\times256\times80$ resolution, demonstrating super-resolution in space-time. The Fourier neural operator is the only model among the benchmarks (FNO-2D, U-Net, TF-Net, and ResNet) that can do zero-shot super-resolution. Surprisingly, it can do super-resolution not only in the spatial domain but also in the temporal domain.

神经算子具有网格不变性,因此可以在低分辨率下训练并在高分辨率下评估,而无需接触任何高分辨率数据(零样本超分辨率)。图1展示了一个示例:我们在上述设置中使用 $64\times64\times20$ 分辨率数据训练 FNO-3D 模型 $(\nu=1e{-4},N=10000)$ 并迁移到 $256\times256\times80$ 分辨率,实现了时空超分辨率。Fourier 神经算子是基准模型(FNO-2D、U-Net、TF-Net 和 ResNet)中唯一能够实现零样本超分辨率的模型。令人惊讶的是,它不仅在空间域中实现超分辨率,还能在时间域中实现超分辨率。
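零样本超分辨率背后的机制可以用傅里叶零填充演示:保留低频系数,在更细网格上补零求逆变换。下面是一维的 numpy 示意(非论文官方实现):

```python
import numpy as np

def fourier_upsample(v, n_new):
    """通过傅里叶零填充在更细的均匀网格上求值带限函数:
    低频系数原样保留, 新增的高频位置补零, 逆变换后按 FFT 归一化缩放。"""
    n = len(v)
    v_hat = np.fft.fft(v)
    pad = np.zeros(n_new, dtype=complex)
    half = n // 2
    pad[:half] = v_hat[:half]        # 非负低频
    pad[-half:] = v_hat[-half:]      # 负频率
    return np.fft.ifft(pad).real * (n_new / n)
```

概念上,这与在更高分辨率网格上对学到的傅里叶基函数重新求值是一致的。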

5.5 BAYESIAN INVERSE PROBLEM

5.5 贝叶斯逆问题

In this experiment, we use a function space Markov chain Monte Carlo (MCMC) method (Cotter et al., 2013) to draw samples from the posterior distribution of the initial vorticity in Navier-Stokes given sparse, noisy observations at time $T=50$ . We compare the Fourier neural operator acting as a surrogate model with the traditional solvers used to generate our train-test data (both run on GPU). We generate 25,000 samples from the posterior (with a 5,000 sample burn-in period), requiring 30,000 evaluations of the forward operator.

在本实验中,我们采用函数空间马尔可夫链蒙特卡洛 (MCMC) 方法 (Cotter et al., 2013) ,从给定 $T=50$ 时刻稀疏噪声观测的 Navier-Stokes 初始涡量后验分布中抽取样本。我们将充当代理模型的傅里叶神经算子 (Fourier neural operator) 与用于生成训练-测试数据的传统求解器 (均在 GPU 上运行) 进行对比。我们从后验分布中生成 25,000 个样本 (包含 5,000 个样本的预热期) ,这需要对前向算子进行 30,000 次评估。

As shown in Figure 6 (Appendix A.5), FNO and the traditional solver recover almost the same posterior mean which, when pushed forward, recovers well the late-time dynamics of Navier-Stokes. In sharp contrast, FNO takes $0.005s$ to evaluate a single instance while the traditional solver, after being optimized to use the largest possible internal time-step which does not lead to blow-up, takes $2.2s$ . This amounts to 2.5 minutes for the MCMC using FNO and over 18 hours for the traditional solver. Even if we account for data generation and training time (offline steps) which take 12 hours, using FNO is still faster! Once trained, FNO can be used to quickly perform multiple MCMC runs for different initial conditions and observations, while the traditional solver will take 18 hours for every instance. Furthermore, since FNO is differentiable, it can easily be applied to PDE-constrained optimization problems without the need for the adjoint method.

如图 6 (附录 A.5) 所示,FNO 与传统求解器恢复的后验均值几乎相同,经过前推后能很好地还原 Navier Stokes 方程的后期动力学。但两者效率形成鲜明对比:FNO 评估单个实例仅需 $0.005s$,而经过优化采用最大允许内部时间步长的传统求解器需耗时 $2.2s$。这使得基于 FNO 的 MCMC 仅需 2.5 分钟,传统求解器却需超过 18 小时。即使计入耗时 12 小时的数据生成和训练(离线阶段),FNO 方案仍更具时效优势。训练完成后,FNO 可快速针对不同初始条件和观测数据进行多次 MCMC 运算,而传统求解器每个实例都需 18 小时。此外,得益于可微特性,FNO 无需伴随方法即可直接应用于 PDE 约束优化问题。

Spectral analysis. Due to the way we parameterize $R_{\phi}$ , the function output by (4) has at most $k_{\operatorname*{max},j}$ Fourier modes per channel. This, however, does not mean that the Fourier neural operator can only approximate functions up to $k_{\operatorname*{max},j}$ modes. Indeed, the activation functions which occur between integral operators and the final decoder network $Q$ recover the high-frequency modes. As an example, consider a solution to the Navier-Stokes equation with viscosity $\nu=1\mathrm{e}{-3}$ . Truncating this function at 20 Fourier modes yields an error around $2\%$ , while our Fourier neural operator learns the parametric dependence and produces approximations to an error of $\leq1\%$ with only $k_{\operatorname*{max},j}=12$ parameterized modes.

谱分析。由于我们参数化 $R_{\phi}$ 的方式,(4) 式输出的函数每个通道最多包含 $k_{\operatorname*{max},j}$ 个傅里叶模态。但这并不意味着傅里叶神经算子只能逼近最多 $k_{\operatorname*{max},j}$ 个模态的函数。实际上,积分算子之间的激活函数和最终解码网络 $Q$ 能够恢复高频模态。例如,考虑黏度 $\nu=1\mathrm{e-3}$ 的 Navier-Stokes 方程解,将该函数截断至 20 个傅里叶模态会产生约 $2\%$ 的误差,而我们的傅里叶神经算子仅用 $k_{\operatorname*{max},j}=12$ 个参数化模态就能学习参数依赖关系并生成误差 $\leq1\%$ 的近似解。
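文中所述的截断误差可以按如下方式直接度量:将二维场在傅里叶空间按 $k_{\mathrm{max}}$ 截断后计算相对 L2 误差(numpy 示意):

```python
import numpy as np

def truncation_error(w, k_max):
    """保留 |k_x|, |k_y| <= k_max 的模态, 其余置零, 返回相对 L2 误差。"""
    nx, ny = w.shape
    kx = np.fft.fftfreq(nx) * nx
    ky = np.fft.fftfreq(ny) * ny
    mask = (np.abs(kx)[:, None] <= k_max) & (np.abs(ky)[None, :] <= k_max)
    w_trunc = np.fft.ifft2(np.where(mask, np.fft.fft2(w), 0)).real
    return np.linalg.norm(w - w_trunc) / np.linalg.norm(w)
```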

Non-periodic boundary conditions. Traditional Fourier methods work only with periodic boundary conditions. However, the Fourier neural operator does not have this limitation. This is due to the linear transform $W$ (the bias term), which keeps track of the non-periodic boundary. As an example, the Darcy Flow and the time domain of Navier-Stokes have non-periodic boundary conditions, and the Fourier neural operator still learns the solution operator with excellent accuracy.

非周期性边界条件。传统傅里叶方法仅适用于周期性边界条件,而傅里叶神经算子通过线性变换$W$(偏置项)突破这一限制,能够有效追踪非周期性边界。例如在达西流(Darcy Flow)和纳维-斯托克斯方程时域问题中,虽然存在非周期性边界条件,傅里叶神经算子仍能以极高精度学习解算子。

6 DISCUSSION AND CONCLUSION

6 讨论与结论

Requirements on Data. Data-driven methods rely on the quality and quantity of data. To learn the Navier-Stokes equation with viscosity $\nu=1\mathrm{e}{-4}$ , we need to generate $N=10000$ training pairs $\{a_{j},u_{j}\}$ with the numerical solver. However, for more challenging PDEs, generating even a few training samples can already be very expensive. A future direction is to combine neural operators with numerical solvers to alleviate the requirements on data.

Recurrent structure. The neural operator has an iterative structure that can naturally be formulated as a recurrent network where all layers share the same parameters, without sacrificing performance. (We did not impose this restriction in the experiments.)

Computer vision. Operator learning is not restricted to PDEs. Images can naturally be viewed as real-valued functions on 2-d domains, and videos simply add a temporal structure. Our approach is therefore a natural choice for problems in computer vision where invariance to discretization is crucial (Chi et al., 2020).

数据要求。数据驱动方法依赖于数据的质量和数量。为了学习粘度为$\nu=1\mathrm{e}{-4}$的Navier-Stokes方程,我们需要用数值求解器生成$N=10000$个训练对${a_{j},u_{j}}$。然而,对于更具挑战性的偏微分方程(PDE),生成少量训练样本可能已经非常昂贵。未来的一个方向是将神经算子与数值求解器结合,以减轻对数据的要求。

循环结构。神经算子具有迭代结构,可以自然地表述为循环网络,其中所有层共享相同参数而不牺牲性能(我们在实验中并未强制这一限制)。

计算机视觉。算子学习不仅限于偏微分方程。图像可以自然地视为二维域上的实值函数,而视频只是增加了时间结构。因此,我们的方法对于计算机视觉中离散化不变性至关重要的问题是一个自然的选择(Chi et al., 2020)。

ACKNOWLEDGEMENTS

致谢

The authors want to thank Ray Wang and Rose Yu for meaningful discussions. Z. Li gratefully acknowledges the financial support from the Kortschak Scholars Program. A. Anandkumar is supported in part by Bren endowed chair, LwLL grants, Beyond Limits, Raytheon, Microsoft, Google, Adobe faculty fellowships, and DE Logi grant. K. Bhattacharya, N. B. Kovachki, B. Liu, and A. M. Stuart gratefully acknowledge the financial support of the Army Research Laboratory through the Cooperative Agreement Number W911NF-12-0022. Research was sponsored by the Army Research Laboratory and was accomplished under Cooperative Agreement Number W911NF-12-2-0022. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein.

作者感谢Ray Wang和Rose Yu富有建设性的讨论。Z. Li衷心感谢Kortschak Scholars项目提供的资金支持。A. Anandkumar的部分研究工作得到了Bren讲席教授席位、LwLL资助、Beyond Limits、Raytheon、Microsoft、Google、Adobe教师奖学金以及DE Logi资助的支持。K. Bhattacharya、N. B. Kovachki、B. Liu和A. M. Stuart诚挚感谢陆军研究实验室通过合作协议编号W911NF-12-0022提供的资金支持。本研究由陆军研究实验室发起,并在合作协议编号W911NF-12-2-0022下完成。本文档中的观点和结论仅代表作者个人,不应被解释为陆军研究实验室或美国政府的官方政策或立场,无论明示或暗示。美国政府有权为政府目的复制和分发本文件,无论其版权标记如何。

REFERENCES

参考文献

Jonathan D Smith, Kamyar Azizzadenesheli, and Zachary E Ross. Eikonet: Solving the eikonal equation with deep neural networks. arXiv preprint arXiv:2004.00361, 2020.
Vladimir N. Vapnik. Statistical Learning Theory. Wiley-Interscience, 1998.
Rui Wang, Karthik Kashinath, Mustafa Mustafa, Adrian Albert, and Rose Yu. Towards physics-informed deep learning for turbulent flow prediction. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1457–1466, 2020.
Yinhao Zhu and Nicholas Zabaras. Bayesian deep convolutional encoder–decoder networks for surrogate modeling and uncertainty quantification. Journal of Computational Physics, 2018. ISSN 0021-9991. doi: https://doi.org/10.1016/j.jcp.2018.04.018. URL http://www.sciencedirect.com/science/article/pii/S0021999118302341.

Jonathan D Smith、Kamyar Aziz za de ne she li 和 Zachary E Ross。Eikonet:用深度神经网络求解程函方程。arXiv预印本 arXiv:2004.00361,2020。
Vladimir N. Vapnik。统计学习理论。Wiley-Interscience,1998。
Rui Wang、Karthik Kashinath、Mustafa Mustafa、Adrian Albert 和 Rose Yu。面向湍流预测的物理信息深度学习。第26届ACM SIGKDD知识发现与数据挖掘国际会议论文集,第1457–1466页,2020。
Yinhao Zhu 和 Nicholas Zabaras。贝叶斯深度卷积编码器-解码器网络:替代建模与不确定性量化。计算物理学杂志,2018。ISSN 0021-9991。doi: https://doi.org/10.1016/j.jcp.2018.04.018。URL http://www.sciencedirect.com/science/article/pii/S0021999118302341

A APPENDIX

A 附录

A.1 TABLE OF NOTATIONS

A.1 符号表

A table of notations is given in Table 2.

表2中给出了符号说明表。

Table 2: table of notations

Notation   Meaning

Operator learning
$D\subset\mathbb{R}^{d}$   The spatial domain for the PDE
$x\in D$   Points in the spatial domain
$a\in\mathcal{A}=(D;\mathbb{R}^{d_{a}})$   The input coefficient functions
$u\in\mathcal{U}=(D;\mathbb{R}^{d_{u}})$   The target solution functions
$D_{j}$   The discretization of $(a_{j},u_{j})$
$G^{\dagger}:\mathcal{A}\to\mathcal{U}$   The operator mapping the coefficients to the solutions
$\mu$   The probability measure from which $a_{j}$ is sampled

Neural operator
$v(x)\in\mathbb{R}^{d_{v}}$   The neural network representation of $u(x)$
$d_{a}$   Dimension of the input $a(x)$
$d_{u}$   Dimension of the output $u(x)$
$d_{v}$   Dimension of the representation $v(x)$
$\kappa:\mathbb{R}^{2(d+1)}\to\mathbb{R}^{d_{v}\times d_{v}}$   The kernel mapping $(x,y,a(x),a(y))$ to a $d_{v}\times d_{v}$ matrix
$\phi$   The parameters of the kernel network $\kappa$
$t=0,\dots,T$   The time steps (layers)
$\sigma$   The activation function

Fourier operator
$\mathcal{F},\mathcal{F}^{-1}$   Fourier transform and its inverse
$R$   The linear transformation applied on the lower Fourier modes
$W$   The linear transformation (bias term) applied on the spatial domain
$k$   Fourier modes / wave numbers
$k_{\mathrm{max}}$   The maximal number of Fourier modes used in the Fourier layer

Hyperparameters
$N$   The number of training pairs
$n$   The size of the discretization
$s$   The resolution of the discretization ($s^{d}=n$)
$\nu$   The viscosity
$T$   The time interval $[0,T]$ for time-dependent equations

表 2: 符号说明

Notation   Meaning

算子学习
$D\subset\mathbb{R}^{d}$   PDE的空间域
$x\in D$   空间域中的点
$a\in\mathcal{A}=(D;\mathbb{R}^{d_{a}})$   输入系数函数
$u\in\mathcal{U}=(D;\mathbb{R}^{d_{u}})$   目标解函数
$D_{j}$   $(a_{j},u_{j})$的离散化
$G^{\dagger}:\mathcal{A}\to\mathcal{U}$   将系数映射到解的算子
$\mu$   $a_{j}$的采样概率测度

神经算子
$v(x)\in\mathbb{R}^{d_{v}}$   $u(x)$的神经网络表示
$d_{a}$   输入$a(x)$的维度
$d_{u}$   输出$u(x)$的维度
$d_{v}$   表示$v(x)$的维度
$\kappa:\mathbb{R}^{2(d+1)}\to\mathbb{R}^{d_{v}\times d_{v}}$   将$(x,y,a(x),a(y))$映射到$d_{v}\times d_{v}$矩阵的核函数
$\phi$   核网络$\kappa$的参数
$t=0,\dots,T$   时间步(层)
$\sigma$   激活函数

傅里叶算子
$\mathcal{F},\mathcal{F}^{-1}$   傅里叶变换及其逆变换
$R$   应用于低阶傅里叶模态的线性变换
$W$   应用于空间域的线性变换(偏置项)
$k$   傅里叶模态/波数
$k_{\mathrm{max}}$   傅里叶层使用的最大模态数

超参数
$N$   训练对数量
$n$   离散化规模
$s$   离散化分辨率($s^{d}=n$)
$\nu$   粘性系数
$T$   时间相关方程的时间区间$[0,T]$

A.2 SPECTRAL ANALYSIS

A.2 频谱分析

The spectral decay of the Navier-Stokes equation data is shown in Figure 4. The spectrum decays with a slope of $k^{-5/3}$ , matching the energy spectrum in the turbulence regime. We also notice that the energy spectrum does not decay with time.

Navier Stokes方程数据的频谱衰减如图4所示。频谱衰减斜率为$k^{-5/3}$,与湍流区域的能谱相匹配。我们注意到能谱并未随时间衰减。

We also present the spectral decay in terms of the truncation $k_{\mathrm{max}}$ defined in (4), as shown in Figure 5. We note that all equations (Burgers, Darcy, and Navier-Stokes with $\nu\leq1\mathrm{e}{-4}$ ) exhibit high-frequency modes. Even if we truncate at $k_{\mathrm{max}}=12$ in the Fourier layer, the Fourier neural operator can recover the high-frequency modes.

我们还展示了截断值 $k_{max}$ 对应的频谱衰减情况(定义见第4节),如图5所示。值得注意的是,所有方程(Burgers、Darcy以及 $\nu\leq1\mathrm{e}{-4}$ 的Navier-Stokes方程)都表现出高频模态特征。即便在Fourier层中采用 $k_{max}=12$ 进行截断,Fourier神经算子仍能有效恢复这些高频模态。

A.3 DATA GENERATION

A.3 数据生成

In this section, we provide the details of data generator for the three equation we used in Section 5.

在本节中,我们将详细介绍第5节所用三个方程的数据生成器实现细节。


Figure 4: Spectral Decay of Navier-Stokes equations

图 4: 纳维-斯托克斯方程的频谱衰减

The spectral decay of the Navier-Stokes equation data used in Section 5.3. The y-axis is the spectrum; the x-axis is the wavenumber $|k|=k_{1}+k_{2}$ .

我们在5.3节中使用的Navier-Stokes方程数据的频谱衰减。y轴表示频谱;X轴表示波数|k|=k1+k2。


Figure 5: Spectral Decay in terms of $k_{\mathrm{max}}$

图 5: 基于 $k_{\mathrm{max}}$ 的频谱衰减

The truncation error of a single Fourier layer without applying the linear transform $R$ . The y-axis is the normalized truncation error; the x-axis is the truncation mode $k_{\mathrm{max}}$ .

未应用线性变换 $R$ 的单一傅里叶层截断误差。y轴为归一化截断误差;x轴为截断模式 $k_{\mathrm{max}}$。

A.3.1 BURGERS EQUATION

A.3.1 伯格斯方程

Recall the 1-d Burgers' equation on the unit torus:

回忆单位环面上的1维伯格斯方程:

$$
\begin{array}{r l}
\partial_{t}u(x,t)+\partial_{x}(u^{2}(x,t)/2)=\nu\partial_{xx}u(x,t), & x\in(0,1),\,t\in(0,1] \\
u(x,0)=u_{0}(x), & x\in(0,1).
\end{array}
$$

$$
\begin{array}{r l}
\partial_{t}u(x,t)+\partial_{x}(u^{2}(x,t)/2)=\nu\partial_{xx}u(x,t), & x\in(0,1),\,t\in(0,1] \\
u(x,0)=u_{0}(x), & x\in(0,1).
\end{array}
$$

The initial condition $u_{0}(x)$ is generated according to $u_{0}\sim\mu$ where $\mu=\mathcal{N}(0,625(-\Delta+25I)^{-2})$ with periodic boundary conditions. We set the viscosity to $\nu=0.1$ and solve the equation using a split step method where the heat equation part is solved exactly in Fourier space then the non-linear part is advanced, again in Fourier space, using a very fine forward Euler method. We solve on a spatial mesh with resolution $2^{13}=8192$ and use this dataset to subsample other resolutions.

初始条件 $u_{0}(x)$ 根据 $u_{0}\sim\mu$ 生成,其中 $\mu=\mathcal{N}(0,625(-\Delta+25I)^{-2})$ 且具有周期性边界条件。我们将粘度设为 $\nu=0.1$,并采用分步法求解方程:先在傅里叶空间精确求解热方程部分,再在傅里叶空间使用极精细的前向欧拉法推进非线性部分。求解时采用空间网格分辨率为 $2^{13}=8192$,并基于该数据集进行其他分辨率的二次采样。
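该分步法的结构可以用一个粗粒度的 numpy 示意说明:扩散部分在傅里叶空间精确求解,非线性通量项用前向欧拉推进(论文数据生成使用的时间步远为精细,此处仅作演示):

```python
import numpy as np

def burgers_split_step(u, dt, nu):
    """周期边界 Burgers 方程的一个分步: 先在傅里叶空间精确求解热方程,
    再对通量项 -∂x(u²/2) 做一次前向欧拉。域为单位区间。"""
    n = len(u)
    k = 2 * np.pi * np.fft.fftfreq(n, d=1.0 / n)        # 波数
    u_hat = np.fft.fft(u) * np.exp(-nu * k**2 * dt)     # 扩散: 精确衰减
    u = np.fft.ifft(u_hat).real
    flux_hat = np.fft.fft(0.5 * u**2)
    return u - dt * np.fft.ifft(1j * k * flux_hat).real  # 非线性: 欧拉步
```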

A.3.2 DARCY FLOW

A.3.2 达西流 (Darcy Flow)

The 2-d Darcy Flow is a second-order linear elliptic equation of the form

二维达西流 (Darcy Flow) 是一个二阶线性椭圆方程,其形式为

$$
\begin{array}{r l}
-\nabla\cdot(a(x)\nabla u(x))=f(x), & x\in(0,1)^{2} \\
u(x)=0, & x\in\partial(0,1)^{2}.
\end{array}
$$

$$
\begin{array}{r l}
-\nabla\cdot(a(x)\nabla u(x))=f(x), & x\in(0,1)^{2} \\
u(x)=0, & x\in\partial(0,1)^{2}.
\end{array}
$$

The coefficients $a(x)$ are generated according to $a\sim\mu$ where $\mu(A)=\mathcal{N}\left(0,(-\Delta+9I)^{-2}\right)\left(\psi^{-1}(A)\right)$ for all $A\in\mathcal{B}$ , with zero Neumann boundary conditions on the Laplacian. The mapping $\psi:\mathbb{R}\to\mathbb{R}$ takes the value 12 on the positive part of the real line and 3 on the negative, and the push-forward is defined pointwise. The forcing is kept fixed, $f(x)=1$ . Such constructions are prototypical models for many physical systems such as permeability in subsurface flows and material microstructures in elasticity. Solutions $u$ are obtained by using a second-order finite difference scheme on a $421\times421$ grid. Different resolutions are downsampled from this dataset.

系数 $a(x)$ 的生成遵循 $a\sim\mu$ 分布,其中 $\mu(A) = \mathcal{N}\left(0, (-\Delta + 9I)^{-2}\right)\left(\psi^{-1}(A)\right), \quad \forall A \in \mathcal{B}.$ 并在拉普拉斯算子(Laplacian)上采用零诺伊曼边界条件。映射 $\psi:\mathbb{R}\to\mathbb{R}$ 在实轴正半轴取值为12,负半轴取值为3,且前推映射(push-forward)为逐点定义。外力项固定为 $f(x)=1$。此类构造是许多物理系统的典型模型,例如地下流动中的渗透率问题以及弹性力学中的材料微观结构。解 $u$ 通过在 $421\times421$ 网格上采用二阶有限差分格式获得,不同分辨率的数据集均由此降采样而来。
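系数场的采样流程(先采样高斯随机场,再经 $\psi$ 逐点前推为取值 {12, 3} 的二值场)可以用谱方法演示。为简洁起见,下面的示意采用周期边界而非零 Neumann 边界,属于演示假设:

```python
import numpy as np

def sample_darcy_coefficient(s, seed=0):
    """示意性地采样 a ~ ψ_# N(0, (-Δ + 9I)^{-2}):
    谱方法生成高斯随机场, 再逐点阈值化为 12 / 3 的二值场。"""
    rng = np.random.default_rng(seed)
    kx = np.fft.fftfreq(s) * s
    ky = np.fft.fftfreq(s) * s
    # 傅里叶本征函数下 (-Δ + 9I)^{-2} 的特征值
    lam = (4 * np.pi**2 * (kx[:, None]**2 + ky[None, :]**2) + 9.0) ** -2
    noise = rng.standard_normal((s, s))
    field = np.fft.ifft2(np.sqrt(lam) * np.fft.fft2(noise)).real
    return np.where(field >= 0, 12.0, 3.0)   # ψ 的逐点前推
```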

A.3.3 NAVIER-STOKES EQUATION

A.3.3 纳维-斯托克斯方程

Recall the 2-d Navier-Stokes equation for a viscous, incompressible fluid in vorticity form on the unit torus:

回忆单位环面上黏性不可压缩流体的二维纳维-斯托克斯方程(以涡量形式表示):

$$
\begin{array}{r l}
\partial_{t}w(x,t)+u(x,t)\cdot\nabla w(x,t)=\nu\Delta w(x,t)+f(x), & x\in(0,1)^{2},\,t\in(0,T] \\
\nabla\cdot u(x,t)=0, & x\in(0,1)^{2},\,t\in[0,T] \\
w(x,0)=w_{0}(x), & x\in(0,1)^{2}.
\end{array}
$$

$$
\begin{array}{r l}
\partial_{t}w(x,t)+u(x,t)\cdot\nabla w(x,t)=\nu\Delta w(x,t)+f(x), & x\in(0,1)^{2},\,t\in(0,T] \\
\nabla\cdot u(x,t)=0, & x\in(0,1)^{2},\,t\in[0,T] \\
w(x,0)=w_{0}(x), & x\in(0,1)^{2}.
\end{array}
$$

The initial condition $w_{0}(x)$ is generated according to $w_{0}\sim\mu$ where $\mu=\mathcal{N}(0,7^{3/2}(-\Delta+49I)^{-2.5})$ with periodic boundary conditions. The forcing is kept fixed, $f(x)=0.1(\sin(2\pi(x_{1}+x_{2}))+\cos(2\pi(x_{1}+x_{2})))$ . The equation is solved using the stream-function formulation with a pseudospectral method. First a Poisson equation is solved in Fourier space to find the velocity field. Then the vorticity is differentiated, and the non-linear term is computed in physical space, after which it is dealiased. Time is advanced with a Crank–Nicolson update where the non-linear term does not enter the implicit part. All data are generated on a $256\times256$ grid and are downsampled to $64\times64$ . We use a time-step of $\mathrm{1e{-}4}$ for the Crank–Nicolson scheme in the data-generation process, where we record the solution every $t=1$ time units. The step is increased to $\mathrm{2e{-}2}$ when used in MCMC for the Bayesian inverse problem.

初始条件 $w_{0}(x)$ 根据 $w_{0}\sim\mu$ 生成,其中 $\mu=\mathcal{N}(0,7^{3/2}(-\Delta+49I)^{-2.5})$,采用周期性边界条件。外力项固定为 $f(x)=0.1(\sin(2\pi(x_{1}+x_{2}))+\cos(2\pi(x_{1}+x_{2})))$。该方程采用流函数形式的伪谱法求解:先在傅里叶空间求解泊松方程得到速度场,然后对涡量进行谱微分,在物理空间计算非线性项并进行去混叠处理。时间推进采用Crank–Nicolson格式,其中非线性项不参与隐式部分。所有数据在 $256\times256$ 网格上生成并降采样至 $64\times64$。数据生成过程中Crank–Nicolson格式采用 $\mathrm{1e-4}$ 时间步长,每 $t=1$ 时间单位记录一次解;当用于贝叶斯反问题的MCMC时,步长增大至 $\mathrm{2e-2}$。
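A single time step of the pseudo-spectral scheme described above can be sketched as follows. This is an illustrative reconstruction, not the authors' solver: the function name, the sign conventions for the streamfunction, and the 2/3-rule dealiasing mask are our assumptions.

```python
import numpy as np

def ns_vorticity_step(w, dt=1e-4, nu=1e-3):
    """One semi-implicit Crank-Nicolson step of the 2-d vorticity equation
    on the periodic unit torus: Poisson solve for the velocity, explicit
    dealiased nonlinear term, implicit viscous term (the nonlinear term
    stays out of the implicit part, as in the text above)."""
    s = w.shape[0]
    k = 2.0 * np.pi * np.fft.fftfreq(s, d=1.0 / s)  # angular wavenumbers
    kx, ky = np.meshgrid(k, k, indexing="ij")
    lap = kx ** 2 + ky ** 2
    lap_safe = lap.copy()
    lap_safe[0, 0] = 1.0  # avoid division by zero on the mean mode

    w_h = np.fft.fft2(w)
    # Poisson equation in Fourier space: -lap(psi) = -w, u = (psi_y, -psi_x)
    psi_h = w_h / lap_safe
    u = np.real(np.fft.ifft2(1j * ky * psi_h))
    v = np.real(np.fft.ifft2(-1j * kx * psi_h))
    # Differentiate the vorticity spectrally
    wx = np.real(np.fft.ifft2(1j * kx * w_h))
    wy = np.real(np.fft.ifft2(1j * ky * w_h))
    # Nonlinear term u . grad(w) in physical space, then dealias (2/3 rule)
    N_h = np.fft.fft2(u * wx + v * wy)
    cutoff = (2.0 / 3.0) * np.pi * s
    N_h = N_h * ((np.abs(kx) < cutoff) & (np.abs(ky) < cutoff))
    # Fixed forcing from the text: 0.1 (sin(2 pi (x1+x2)) + cos(2 pi (x1+x2)))
    x = np.linspace(0.0, 1.0, s, endpoint=False)
    X, Y = np.meshgrid(x, x, indexing="ij")
    f_h = np.fft.fft2(0.1 * (np.sin(2 * np.pi * (X + Y)) + np.cos(2 * np.pi * (X + Y))))
    # Crank-Nicolson on the viscous term; nonlinear term treated explicitly
    w_h_new = ((1.0 - 0.5 * dt * nu * lap) * w_h + dt * (f_h - N_h)) \
              / (1.0 + 0.5 * dt * nu * lap)
    return np.real(np.fft.ifft2(w_h_new))
```

Because the stiff viscous term is handled implicitly while the advection term is explicit, the scheme tolerates the small $\mathrm{1e-4}$ step used for data generation and the coarser $\mathrm{2e-2}$ step used inside MCMC.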

A.4 RESULTS OF BURGERS’ EQUATION AND DARCY FLOW

A.4 伯格斯方程(Burgers' Equation)与达西流(Darcy Flow)的结果

The detailed error rates on Burgers’ equation and Darcy flow are listed in Table 3 and Table 4.

Burgers方程和Darcy Flow的详细误差率列于表3和表4。

Table 3: Benchmarks on 1-d Burgers’ equation

Networks s=256 s=512 s=1024 s=2048 s=4096 s=8192
NN 0.4714 0.4561 0.4803 0.4645 0.4779 0.4452
GCN 0.3999 0.4138 0.4176 0.4157 0.4191 0.4198
FCN 0.0958 0.1407 0.1877 0.2313 0.2855 0.3238
PCANN 0.0398 0.0395 0.0391 0.0383 0.0392 0.0393
GNO 0.0555 0.0594 0.0651 0.0663 0.0666 0.0699
LNO 0.0212 0.0221 0.0217 0.0219 0.0200 0.0189
MGNO 0.0243 0.0355 0.0374 0.0360 0.0364 0.0364
FNO 0.0149 0.0158 0.0160 0.0146 0.0142 0.0139

表 3: 一维Burgers方程基准测试

Networks s=256 s=512 s=1024 s=2048 s=4096 s=8192
NN 0.4714 0.4561 0.4803 0.4645 0.4779 0.4452
GCN 0.3999 0.4138 0.4176 0.4157 0.4191 0.4198
FCN 0.0958 0.1407 0.1877 0.2313 0.2855 0.3238
PCANN 0.0398 0.0395 0.0391 0.0383 0.0392 0.0393
GNO 0.0555 0.0594 0.0651 0.0663 0.0666 0.0699
LNO 0.0212 0.0221 0.0217 0.0219 0.0200 0.0189
MGNO 0.0243 0.0355 0.0374 0.0360 0.0364 0.0364
FNO 0.0149 0.0158 0.0160 0.0146 0.0142 0.0139

Table 4: Benchmarks on 2-d Darcy Flow

Networks s=85 s=141 s=211 s=421
NN 0.1716 0.1716 0.1716 0.1716
FCN 0.0253 0.0493 0.0727 0.1097
PCANN 0.0299 0.0298 0.0298 0.0299
RBM 0.0244 0.0251 0.0255 0.0259
GNO 0.0346 0.0332 0.0342 0.0369
LNO 0.0520 0.0461 0.0445 —
MGNO 0.0416 0.0428 0.0428 0.0420
FNO 0.0108 0.0109 0.0109 0.0098

表 4: 二维达西流基准测试

Networks s=85 s=141 s=211 s=421
NN 0.1716 0.1716 0.1716 0.1716
FCN 0.0253 0.0493 0.0727 0.1097
PCANN 0.0299 0.0298 0.0298 0.0299
RBM 0.0244 0.0251 0.0255 0.0259
GNO 0.0346 0.0332 0.0342 0.0369
LNO 0.0520 0.0461 0.0445 —
MGNO 0.0416 0.0428 0.0428 0.0420
FNO 0.0108 0.0109 0.0109 0.0098

A.5 BAYESIAN INVERSE PROBLEM

A.5 贝叶斯逆问题

Results of the Bayesian inverse problem for the Navier-Stokes equation are shown in Figure 6. The result obtained using the Fourier neural operator as a surrogate is as good as that of the traditional solver.

Navier-Stokes方程的贝叶斯反问题求解结果如图6所示。可以看出,使用Fourier神经算子作为替代模型的效果与传统求解器相当。

The top left panel shows the true initial vorticity while bottom left panel shows the true observed vorticity at $T=50$ with black dots indicating the locations of the observation points placed on a $7\times7$ grid. The top middle panel shows the posterior mean of the initial vorticity given the noisy observations estimated with MCMC using the traditional solver, while the top right panel shows the same thing but using FNO as a surrogate model. The bottom middle and right panels show the vorticity at $T=50$ when the respective approximate posterior means are used as initial conditions.

左上角面板显示了真实的初始涡度,而左下角面板显示了在$T=50$时刻的真实观测涡度,其中黑点表示放置在$7\times7$网格上的观测点位置。中间上方面板展示了使用传统求解器通过MCMC估计的含噪声观测数据得到的初始涡度后验均值,而右上角面板则展示了使用FNO作为替代模型的相同结果。中间下方和右侧面板分别展示了将各自近似后验均值作为初始条件时$T=50$时刻的涡度情况。

Figure 6: Results of the Bayesian inverse problem for the Navier-Stokes equation.

图 6: 纳维-斯托克斯方程的贝叶斯反问题结果。
