FOURIER NEURAL OPERATOR FOR PARAMETRIC PARTIAL DIFFERENTIAL EQUATIONS
Zongyi Li zongyili@caltech.edu
Nikola Kovachki nkovachki@caltech.edu
Kamyar Azizzadenesheli kamyar@purdue.edu
Burigede Liu bgl@caltech.edu
Kaushik Bhattacharya bhatta@caltech.edu
Andrew Stuart astuart@caltech.edu
Anima Anandkumar anima@caltech.edu
ABSTRACT
The classical development of neural networks has primarily focused on learning mappings between finite-dimensional Euclidean spaces. Recently, this has been generalized to neural operators that learn mappings between function spaces. For partial differential equations (PDEs), neural operators directly learn the mapping from any functional parametric dependence to the solution. Thus, they learn an entire family of PDEs, in contrast to classical methods which solve one instance of the equation. In this work, we formulate a new neural operator by parameterizing the integral kernel directly in Fourier space, allowing for an expressive and efficient architecture. We perform experiments on Burgers’ equation, Darcy flow, and the Navier-Stokes equation. The Fourier neural operator is the first ML-based method to successfully model turbulent flows with zero-shot super-resolution. It is up to three orders of magnitude faster compared to traditional PDE solvers. Additionally, it achieves superior accuracy compared to previous learning-based solvers under fixed resolution.
1 INTRODUCTION
Many problems in science and engineering involve solving complex partial differential equation (PDE) systems repeatedly for different values of some parameters. Examples arise in molecular dynamics, micro-mechanics, and turbulent flows. Often such systems require fine discretization in order to capture the phenomenon being modeled. As a consequence, traditional numerical solvers are slow and sometimes inefficient. For example, when designing materials such as airfoils, one needs to solve the associated inverse problem where thousands of evaluations of the forward model are needed. A fast method can make such problems feasible.
Conventional solvers vs. Data-driven methods. Traditional solvers such as finite element methods (FEM) and finite difference methods (FDM) solve the equation by discretizing the space. Therefore, they impose a trade-off on the resolution: coarse grids are fast but less accurate; fine grids are accurate but slow. Complex PDE systems, as described above, usually require a very fine discretization, and are therefore very challenging and time-consuming for traditional solvers. On the other hand, data-driven methods can directly learn the trajectory of the family of equations from the data. As a result, the learning-based method can be orders of magnitude faster than the conventional solvers.
Machine learning methods may hold the key to revolutionizing scientific disciplines by providing fast solvers that approximate or enhance traditional ones (Raissi et al., 2019; Jiang et al., 2020; Greenfeld et al., 2019; Kochkov et al., 2021). However, classical neural networks map between finite-dimensional spaces and can therefore only learn solutions tied to a specific discretization. This is often a limitation for practical applications and therefore the development of mesh-invariant neural networks is required. We first outline two mainstream neural network-based approaches for PDEs – the finite-dimensional operators and Neural-FEM.
Figure 1: top: The architecture of the Fourier layer; bottom: Example flow from Navier-Stokes. Zero-shot super-resolution: Navier-Stokes equation with viscosity $\nu=1\mathrm{e}{-4}$; ground truth on top and prediction on bottom; trained on a $64\times64\times20$ dataset; evaluated on $256\times256\times80$ (see Section 5.4).

Finite-dimensional operators. These approaches parameterize the solution operator as a deep convolutional neural network between finite-dimensional Euclidean spaces (Guo et al., 2016; Zhu & Zabaras, 2018; Adler & Oktem, 2017; Bhatnagar et al., 2019; Khoo et al., 2017). Such approaches are, by definition, mesh-dependent and will need modifications and tuning for different resolutions and discretizations in order to achieve consistent error (if at all possible). Furthermore, these approaches are limited to the discretization size and geometry of the training data and hence it is not possible to query solutions at new points in the domain. In contrast, we show, for our method, both invariance of the error to grid resolution and the ability to transfer the solution between meshes.
Neural-FEM. The second approach directly parameterizes the solution function as a neural network (E & Yu, 2018; Raissi et al., 2019; Bar & Sochen, 2019; Smith et al., 2020; Pan & Duraisamy, 2020). This approach is designed to model one specific instance of the PDE, not the solution operator. It is mesh-independent and accurate, but for any given new instance of the functional parameter/coefficient, it requires training a new neural network. The approach closely resembles classical methods such as finite elements, replacing the linear span of a finite set of local basis functions with the space of neural networks. The Neural-FEM approach suffers from the same computational issue as classical methods: the optimization problem needs to be solved for every new instance. Furthermore, the approach is limited to a setting in which the underlying PDE is known.
Neural Operators. Recently, a new line of work proposed learning mesh-free, infinite-dimensional operators with neural networks (Lu et al., 2019; Bhattacharya et al., 2020; Nelsen & Stuart, 2020; Li et al., 2020b;a; Patel et al., 2021). The neural operator remedies the mesh-dependent nature of the finite-dimensional operator methods discussed above by producing a single set of network parameters that may be used with different discretizations. It has the ability to transfer solutions between meshes. Furthermore, the neural operator needs to be trained only once. Obtaining a solution for a new instance of the parameter requires only a forward pass of the network, alleviating the major computational issues incurred in Neural-FEM methods. Lastly, the neural operator requires no knowledge of the underlying PDE, only data. Thus far, neural operators have not yielded efficient numerical algorithms that can parallel the success of convolutional or recurrent neural networks in the finite-dimensional setting due to the cost of evaluating integral operators. Through the fast Fourier transform, our work alleviates this issue.
Fourier Transform. The Fourier transform is frequently used in spectral methods for solving differential equations, since differentiation is equivalent to multiplication in the Fourier domain. Fourier transforms have also played an important role in the development of deep learning. In theory, they appear in the proof of the universal approximation theorem (Hornik et al., 1989) and, empirically, they have been used to speed up convolutional neural networks (Mathieu et al., 2013). Neural network architectures involving the Fourier transform or the use of sinusoidal activation functions have also been proposed and studied (Bengio et al., 2007; Mingo et al., 2004; Sitzmann et al., 2020). Recently, some spectral methods for PDEs have been extended to neural networks (Fan et al., 2019a;b; Kashinath et al., 2020). We build on these works by proposing a neural operator architecture defined directly in Fourier space with quasi-linear time complexity and state-of-the-art approximation capabilities.
Our Contributions. We introduce the Fourier neural operator, a novel deep learning architecture able to learn mappings between infinite-dimensional spaces of functions; the integral operator is restricted to a convolution, and instantiated through a linear transformation in the Fourier domain.
We observed that the proposed framework can approximate complex operators arising in PDEs that are highly non-linear, with high frequency modes and slow energy decay. The power of neural operators comes from combining linear, global integral operators (via the Fourier transform) and non-linear, local activation functions. Similar to the way standard neural networks approximate highly non-linear functions by combining linear multiplications with non-linear activations, the proposed neural operators can approximate highly non-linear operators.
2 LEARNING OPERATORS
Our methodology learns a mapping between two infinite-dimensional spaces from a finite collection of observed input-output pairs. Let $D\subset\mathbb{R}^{d}$ be a bounded, open set and $\mathcal{A}=\mathcal{A}(D;\mathbb{R}^{d_{a}})$ and $\mathcal{U}=\mathcal{U}(D;\mathbb{R}^{d_{u}})$ be separable Banach spaces of functions taking values in $\mathbb{R}^{d_{a}}$ and $\mathbb{R}^{d_{u}}$ respectively. Furthermore let $G^{\dagger}:{\mathcal{A}}\rightarrow{\mathcal{U}}$ be a (typically) non-linear map. We study maps $G^{\dagger}$ which arise as the solution operators of parametric PDEs – see Section 5 for examples. Suppose we have observations $\{a_{j},u_{j}\}_{j=1}^{N}$ where $a_{j}\sim\mu$ is an i.i.d. sequence from the probability measure $\mu$ supported on $\mathcal{A}$ and $u_{j}=G^{\dagger}(a_{j})$ is possibly corrupted with noise. We aim to build an approximation of $G^{\dagger}$ by constructing a parametric map
$$
G:{\mathcal{A}}\times\Theta\to{\mathcal{U}}\qquad{\mathrm{or~equivalently}}, G_{\theta}:{\mathcal{A}}\to{\mathcal{U}},\quad\theta\in\Theta
$$
for some finite-dimensional parameter space $\Theta$ by choosing $\theta^{\dagger}\in\Theta$ so that $G(\cdot,\theta^{\dagger})=G_{\theta^{\dagger}}\approx G^{\dagger}$ . This is a natural framework for learning in infinite-dimensions as one could define a cost functional $C:\mathcal{U}\times\mathcal{U}\to\mathbb{R}$ and seek a minimizer of the problem
$$
\operatorname*{min}_ {\theta\in\Theta}\mathbb{E}_{a\sim\mu}[C(G(a,\theta),G^{\dagger}(a))]
$$
which directly parallels the classical finite-dimensional setting (Vapnik, 1998). Showing the existence of minimizers, in the infinite-dimensional setting, remains a challenging open problem. We will approach this problem in the test-train setting by using a data-driven empirical approximation to the cost used to determine $\theta$ and to test the accuracy of the approximation. Because we conceptualize our methodology in the infinite-dimensional setting, all finite-dimensional approximations share a common set of parameters which are consistent in infinite dimensions. A table of notation is shown in Appendix 3.
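To make the discretized training problem concrete, the following is a minimal sketch in PyTorch (our own illustration; the function and variable names are ours, not the paper's released code), minimizing an empirical approximation of the cost above with a relative $L^2$ error playing the role of $C$ and the optimizer settings reported in Section 5.

```python
# Illustrative sketch: empirical-risk training of a parametric operator G_theta
# on discretized input-output pairs (a_j, u_j).
import torch

def relative_l2(pred, true):
    # Relative L2 error per sample, averaged over the batch; one possible choice of cost C.
    diff = torch.norm(pred.flatten(1) - true.flatten(1), dim=1)
    return (diff / torch.norm(true.flatten(1), dim=1)).mean()

def fit(model, loader, epochs=500, lr=1e-3):
    # model: any map from discretized a (batch, n, d_a) to discretized u (batch, n, d_u).
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    sched = torch.optim.lr_scheduler.StepLR(opt, step_size=100, gamma=0.5)  # halve every 100 epochs
    for _ in range(epochs):
        for a, u in loader:
            opt.zero_grad()
            loss = relative_l2(model(a), u)  # empirical estimate of E_{a~mu}[C(G(a, theta), G^dagger(a))]
            loss.backward()
            opt.step()
        sched.step()
    return model
```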
Learning the Operator. Approximating the operator $G^{\dagger}$ is a different and typically much more challenging task than finding the solution $u\in\mathcal{U}$ of a PDE for a single instance of the parameter $a\in{\mathcal{A}}$. Most existing methods, ranging from classical finite elements, finite differences, and finite volumes to modern machine learning approaches such as physics-informed neural networks (PINNs) (Raissi et al., 2019), aim at the latter and can therefore be computationally expensive. This makes them impractical for applications where a solution to the PDE is required for many different instances of the parameter. On the other hand, our approach directly approximates the operator and is therefore much cheaper and faster, offering tremendous computational savings when compared to traditional solvers. For an example application to Bayesian inverse problems, see Section 5.5.

Figure 2: top: The architecture of the neural operators; bottom: Fourier layer.
(a) The full architecture of the neural operator: start from input $a$. 1. Lift to a higher dimensional channel space by a neural network $P$. 2. Apply four layers of integral operators and activation functions. 3. Project back to the target dimension by a neural network $Q$. Output $u$. (b) Fourier layers: start from input $v$. On top: apply the Fourier transform $\mathcal{F}$; a linear transform $R$ on the lower Fourier modes, which filters out the higher modes; then apply the inverse Fourier transform ${\mathcal{F}}^{-1}$. On the bottom: apply a local linear transform $W$.
Discretization. Since our data $a_{j}$ and $u_{j}$ are, in general, functions, to work with them numerically we assume access only to point-wise evaluations. Let $D_{j}=\{x_{1},\ldots,x_{n}\}\subset D$ be an $n$-point discretization of the domain $D$ and assume we have observations $a_{j}|_{D_{j}}\in\mathbb{R}^{n\times d_{a}}$, $u_{j}|_{D_{j}}\in\mathbb{R}^{n\times d_{u}}$, for a finite collection of input-output pairs indexed by $j$. To be discretization-invariant, the neural operator can produce an answer $u(x)$ for any $x\in D$, potentially $x\notin D_{j}$. Such a property is highly desirable as it allows a transfer of solutions between different grid geometries and discretizations.
3 NEURAL OPERATOR
The neural operator, proposed in (Li et al., 2020b), is formulated as an iterative architecture $v_{0}\mapsto v_{1}\mapsto\ldots\mapsto v_{T}$ where $v_{j}$ for $j=0,1,\ldots,T-1$ is a sequence of functions each taking values in $\mathbb{R}^{d_{v}}$. As shown in Figure 2 (a), the input $a\in{\mathcal{A}}$ is first lifted to a higher dimensional representation $v_{0}(x)=P(a(x))$ by the local transformation $P$, which is usually parameterized by a shallow fully-connected neural network. Then we apply several iterations of updates $v_{t}\mapsto v_{t+1}$ (defined below). The output $u(x)=Q(v_{T}(x))$ is the projection of $v_{T}$ by the local transformation $Q:\mathbb{R}^{d_{v}}\rightarrow\mathbb{R}^{d_{u}}$. In each iteration, the update $v_{t}\mapsto v_{t+1}$ is defined as the composition of a non-local integral operator $\mathcal{K}$ and a local, nonlinear activation function $\sigma$.
Definition 1 (Iterative updates) Define the update to the representation $v_{t}\mapsto v_{t+1}$ by
$$
v_{t+1}(x):=\sigma\Big(W v_{t}(x)+\big(\mathcal{K}(a;\phi)v_{t}\big)(x)\Big),\qquad\forall x\in D \tag{2}
$$
where $\mathcal{K}:\mathcal{A}\times\Theta_{\mathcal{K}}\to\mathcal{L}(\mathcal{U}(D;\mathbb{R}^{d_{v}}),\mathcal{U}(D;\mathbb{R}^{d_{v}}))$ maps to bounded linear operators on $\mathcal{U}(D;\mathbb{R}^{d_{v}})$ and is parameterized by $\phi\in\Theta_{\mathcal{K}}$, $W:\mathbb{R}^{d_{v}}\rightarrow\mathbb{R}^{d_{v}}$ is a linear transformation, and $\sigma:\mathbb{R}\rightarrow\mathbb{R}$ is a non-linear activation function whose action is defined component-wise.
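As a concrete reading of Definition 1, a single update layer can be sketched as below (an illustrative PyTorch rendering with names of our choosing; `kernel_op` stands for any realization of $\mathcal{K}(a;\phi)$, such as Definition 2 below or the Fourier version of Section 4).

```python
# Sketch of one update v_{t+1}(x) = sigma( W v_t(x) + (K(a; phi) v_t)(x) ).
# v_t is stored on an n-point discretization as a tensor of shape (batch, n, d_v).
import torch.nn as nn

class IterativeUpdate(nn.Module):
    def __init__(self, d_v, kernel_op):
        super().__init__()
        self.W = nn.Linear(d_v, d_v, bias=False)  # local linear transformation W
        self.kernel_op = kernel_op                # non-local operator v_t -> K(a; phi) v_t
        self.sigma = nn.ReLU()                    # component-wise non-linear activation

    def forward(self, v, a=None):
        return self.sigma(self.W(v) + self.kernel_op(v, a))
```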
We choose $\textstyle\mathcal{K}(a;\phi)$ to be a kernel integral transformation parameterized by a neural network.
Definition 2 (Kernel integral operator $\mathcal{K}$ ) Define the kernel integral operator mapping in (2) by
$$
\big(\mathcal{K}(a;\phi)v_{t}\big)(x):=\int_{D}\kappa\big(x,y,a(x),a(y);\phi\big)v_{t}(y)\,\mathrm{d}y,\qquad\forall x\in D \tag{3}
$$
where $\kappa_{\phi}:\mathbb{R}^{2(d+d_{a})}\rightarrow\mathbb{R}^{d_{v}\times d_{v}}$ is a neural network parameterized by $\phi\in\Theta_{\mathcal{K}}$.
Here $\kappa_{\phi}$ plays the role of a kernel function which we learn from data. Together, definitions 1 and 2 constitute a generalization of neural networks to infinite-dimensional spaces, as first proposed in Li et al. (2020b). Notice that even though the integral operator is linear, the neural operator can learn highly non-linear operators by composing linear integral operators with non-linear activation functions, analogous to standard neural networks.
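On an $n$-point grid, the integral in Definition 2 can be approximated by a quadrature sum, which makes the quadratic cost of a generic kernel explicit. The following is a sketch under our own naming; the small MLP for $\kappa_{\phi}$ is purely illustrative.

```python
# Sketch of Definition 2 on a grid:
# (K(a; phi) v)(x_i) ~ sum_j kappa_phi(x_i, x_j, a(x_i), a(x_j)) v(x_j) mu_j,
# with quadrature weights mu_j (e.g. |D| / n for a uniform grid). Cost is O(n^2) in grid points.
import torch
import torch.nn as nn

class KernelIntegralOperator(nn.Module):
    def __init__(self, d, d_a, d_v, width=64):
        super().__init__()
        self.d_v = d_v
        # kappa_phi : R^{2(d + d_a)} -> R^{d_v x d_v}, here a small MLP.
        self.kappa = nn.Sequential(
            nn.Linear(2 * (d + d_a), width), nn.ReLU(), nn.Linear(width, d_v * d_v))

    def forward(self, v, x, a, mu):
        # v: (n, d_v), x: (n, d) grid points, a: (n, d_a), mu: (n,) quadrature weights.
        n = x.shape[0]
        xi, xj = x.unsqueeze(1).expand(n, n, -1), x.unsqueeze(0).expand(n, n, -1)
        ai, aj = a.unsqueeze(1).expand(n, n, -1), a.unsqueeze(0).expand(n, n, -1)
        K = self.kappa(torch.cat([xi, xj, ai, aj], dim=-1)).view(n, n, self.d_v, self.d_v)
        return torch.einsum("ijkl,jl,j->ik", K, v, mu)  # sum over grid points j and channels l
```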
If we remove the dependence on the function $a$ and impose $\kappa_{\phi}(x,y)=\kappa_{\phi}(x-y)$, we obtain that (3) is a convolution operator, which is a natural choice from the perspective of fundamental solutions. We exploit this fact in the following section by parameterizing $\kappa_{\phi}$ directly in Fourier space and using the Fast Fourier Transform (FFT) to efficiently compute (3). This leads to a fast architecture that obtains state-of-the-art results for PDE problems.
4 FOURIER NEURAL OPERATOR
We propose replacing the kernel integral operator in (3) by a convolution operator defined in Fourier space. Let $\mathcal{F}$ denote the Fourier transform of a function $f:D\rightarrow\mathbb{R}^{d_{v}}$ and ${\mathcal{F}}^{-1}$ its inverse; then
$$
(\mathcal{F}f)_ {j}(k)=\int_{D}f_{j}(x)e^{-2i\pi\langle x,k\rangle}\mathrm{d}x,\qquad(\mathcal{F}^{-1}f)_ {j}(x)=\int_{D}f_{j}(k)e^{2i\pi\langle x,k\rangle}\mathrm{d}k
$$
for $j=1,\ldots,d_{v}$ where $i=\sqrt{-1}$ is the imaginary unit. By letting $\kappa_{\phi}(x,y,a(x),a(y))=\kappa_{\phi}(x-y)$ in (3) and applying the convolution theorem, we find that
$$
\bigl(\mathcal{K}(a;\phi)v_{t}\bigr)(x)=\mathcal{F}^{-1}\bigl(\mathcal{F}(\kappa_{\phi})\cdot\mathcal{F}(v_{t})\bigr)(x),\qquad\forall x\in D.
$$
We, therefore, propose to directly parameterize $\kappa_{\phi}$ in Fourier space.
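A quick numerical sanity check of the convolution theorem used above, on a periodic 1-d grid (purely illustrative; double precision is used so the FFT result matches the direct sum to tight tolerance):

```python
# Circular convolution computed directly equals F^{-1}( F(kappa) * F(v) ) on a periodic grid.
import torch

n = 64
kappa = torch.randn(n, dtype=torch.float64)
v = torch.randn(n, dtype=torch.float64)

direct = torch.zeros(n, dtype=torch.float64)
for i in range(n):
    for j in range(n):
        direct[i] += kappa[(i - j) % n] * v[j]   # (kappa * v)(x_i) = sum_j kappa(x_i - x_j) v(x_j)

spectral = torch.fft.ifft(torch.fft.fft(kappa) * torch.fft.fft(v)).real
print(torch.allclose(direct, spectral))           # True
```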
Definition 3 (Fourier integral operator $\mathcal{K}$ ) Define the Fourier integral operator
$$
\big(\mathcal{K}(\phi)v_{t}\big)(x)={\mathcal{F}}^{-1}\Big(R_{\phi}\cdot\big({\mathcal{F}}v_{t}\big)\Big)(x)\qquad\forall x\in D \tag{4}
$$
where $R_{\phi}$ is the Fourier transform of a periodic function $\kappa:\bar{D}\rightarrow\mathbb{R}^{d_{v}\times d_{v}}$ parameterized by $\phi\in\Theta_{\mathcal{K}}$. An illustration is given in Figure 2 (b).
For frequency mode $k\in D$, we have $(\mathcal{F}v_{t})(k)\in\mathbb{C}^{d_{v}}$ and $R_{\phi}(k)\in\mathbb{C}^{d_{v}\times d_{v}}$. Notice that since we assume $\kappa$ is periodic, it admits a Fourier series expansion, so we may work with the discrete modes $k\in\mathbb{Z}^{d}$. We pick a finite-dimensional parameterization by truncating the Fourier series at a maximal number of modes $k_{\operatorname*{max}}=|Z_{k_{\operatorname*{max}}}|=|\{k\in\mathbb{Z}^{d}:|k_{j}|\leq k_{\operatorname*{max},j},\ \text{for}\ j=1,\ldots,d\}|$. We thus parameterize $R_{\phi}$ directly as a complex-valued $(k_{\operatorname*{max}}\times d_{v}\times d_{v})$-tensor comprising a collection of truncated Fourier modes and therefore drop $\phi$ from our notation. Since $\kappa$ is real-valued, we impose conjugate symmetry. We note that the set $Z_{k_{\mathrm{max}}}$ is not the canonical choice for the low frequency modes of $v_{t}$. Indeed, the low frequency modes are usually defined by placing an upper-bound on the $\ell_{1}$-norm of $k\in\mathbb{Z}^{d}$. We choose $Z_{k_{\mathrm{max}}}$ as above since it allows for an efficient implementation.
The discrete case and the FFT. Assuming the domain $D$ is discretized with $n\in\mathbb{N}$ points, we have that $v_{t}\in\mathbb{R}^{n\times d_{v}}$ and $\mathcal{F}(v_{t})\in\mathbb{C}^{n\times d_{v}}$. Since we convolve $v_{t}$ with a function which only has $k_{\operatorname*{max}}$ Fourier modes, we may simply truncate the higher modes to obtain $\mathcal{F}(v_{t})\in\mathbb{C}^{k_{\operatorname*{max}}\times d_{v}}$. Multiplication by the weight tensor $R\in\mathbb{C}^{k_{\operatorname*{max}}\times d_{v}\times d_{v}}$ is then
$$
\big(R\cdot\big(\mathcal{F}v_{t}\big)\big)_{k,l}=\sum_{j=1}^{d_{v}}R_{k,l,j}\big(\mathcal{F}v_{t}\big)_{k,j},\qquad k=1,\ldots,k_{\operatorname*{max}},\quad l=1,\ldots,d_{v}. \tag{5}
$$
When the discretization is uniform with resolution $s_{1}\times\cdots\times s_{d}=n$, $\mathcal{F}$ can be replaced by the Fast Fourier Transform. For $f\in\mathbb{R}^{n\times d_{v}}$, $k=(k_{1},\dots,k_{d})\in\mathbb{Z}_{s_{1}}\times\dots\times\mathbb{Z}_{s_{d}}$, and $x=(x_{1},\dots,x_{d})\in D$, the FFT $\hat{\mathcal{F}}$ and its inverse ${\hat{\mathcal{F}}}^{-1}$ are defined as
$$
(\hat{\mathcal{F}}f)_{l}(k)=\sum_{x_{1}=0}^{s_{1}-1}\cdots\sum_{x_{d}=0}^{s_{d}-1}f_{l}(x_{1},\ldots,x_{d})e^{-2i\pi\sum_{j=1}^{d}\frac{x_{j}k_{j}}{s_{j}}},\qquad
(\hat{\mathcal{F}}^{-1}f)_{l}(x)=\sum_{k_{1}=0}^{s_{1}-1}\cdots\sum_{k_{d}=0}^{s_{d}-1}f_{l}(k_{1},\ldots,k_{d})e^{2i\pi\sum_{j=1}^{d}\frac{x_{j}k_{j}}{s_{j}}}
$$
for $l=1,\ldots,d_{v}$. In this case, the set of truncated modes becomes
$$
Z_{k_{\operatorname*{max}}}=\Big\{(k_{1},\ldots,k_{d})\in\mathbb{Z}_{s_{1}}\times\cdots\times\mathbb{Z}_{s_{d}}\ \Big|\ k_{j}\leq k_{\operatorname*{max},j}\ \text{or}\ s_{j}-k_{j}\leq k_{\operatorname*{max},j},\ \text{for}\ j=1,\ldots,d\Big\}.
$$
When implemented, $R$ is treated as a $\left(s_{1}\times\dots\times s_{d}\times d_{v}\times d_{v}\right)$-tensor and the above definition of $Z_{k_{\operatorname*{max}}}$ corresponds to the “corners” of $R$, which allows for a straightforward parallel implementation of (5) via matrix-vector multiplication. In practice, we have found that choosing $k_{\operatorname*{max},j}=12$, which yields $k_{\operatorname*{max}}=12^{d}$ parameters per channel, is sufficient for all the tasks that we consider.
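The discussion above translates into a compact implementation. The following is a sketch of a 2-d Fourier layer in PyTorch (our own rendering, not the authors' released code): the real FFT keeps roughly half the spectrum thanks to conjugate symmetry, and the weight tensor $R$ acts only on the two retained low-frequency "corners".

```python
# Sketch of the 2-d spectral convolution (4)-(5): FFT, truncate to k_max modes,
# multiply mode-wise by a learned complex tensor R, inverse FFT.
import torch
import torch.nn as nn

class SpectralConv2d(nn.Module):
    def __init__(self, d_v, modes1=12, modes2=12):
        super().__init__()
        self.modes1, self.modes2 = modes1, modes2   # k_max,j per spatial dimension
        scale = 1.0 / (d_v * d_v)
        # Complex weights on the retained low-frequency corners of the spectrum.
        self.w1 = nn.Parameter(scale * torch.randn(d_v, d_v, modes1, modes2, dtype=torch.cfloat))
        self.w2 = nn.Parameter(scale * torch.randn(d_v, d_v, modes1, modes2, dtype=torch.cfloat))

    def forward(self, v):
        # v: (batch, d_v, s1, s2), values of v_t on a uniform grid.
        batch, d_v, s1, s2 = v.shape
        v_ft = torch.fft.rfft2(v)                   # (batch, d_v, s1, s2 // 2 + 1), complex
        out_ft = torch.zeros_like(v_ft)
        # Mode-wise matrix multiplication (5) on the two corners that survive truncation;
        # conjugate symmetry of the real FFT accounts for the remaining modes.
        out_ft[:, :, :self.modes1, :self.modes2] = torch.einsum(
            "bixy,ioxy->boxy", v_ft[:, :, :self.modes1, :self.modes2], self.w1)
        out_ft[:, :, -self.modes1:, :self.modes2] = torch.einsum(
            "bixy,ioxy->boxy", v_ft[:, :, -self.modes1:, :self.modes2], self.w2)
        return torch.fft.irfft2(out_ft, s=(s1, s2)) # back to the physical grid
```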
Parameterizations of $R$. In general, $R$ can be defined to depend on $(\mathcal{F}a)$ to parallel (3). Indeed, we can define $R_{\phi}:\mathbb{Z}^{d}\times\mathbb{R}^{d_{v}}\rightarrow\mathbb{R}^{d_{v}\times d_{v}}$ as a parametric function that maps $(k,(\mathcal{F}a)(k))$ to the values of the appropriate Fourier modes. We have experimented with linear as well as neural network parameterizations of $R_{\phi}$. We find that the linear parameterization has a similar performance to the previously described direct parameterization, while neural networks have worse performance. This is likely due to the discrete structure of the space $\mathbb{Z}^{d}$. Our experiments in this work focus on the direct parameterization presented above.
Invariance to discretization. The Fourier layers are discretization-invariant because they can learn from and evaluate functions which are discretized in an arbitrary way. Since parameters are learned directly in Fourier space, resolving the functions in physical space simply amounts to projecting onto the basis functions $e^{2\pi i\langle x,k\rangle}$, which are well-defined everywhere on $\mathbb{R}^{d}$. This allows us to achieve zero-shot super-resolution as shown in Section 5.4. Furthermore, our architecture has a consistent error at any resolution of the inputs and outputs. On the other hand, notice that, in Figure 3, the standard CNN methods we compare against have an error that grows with the resolution.
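Continuing the `SpectralConv2d` sketch above, the same learned weights can be applied at a different resolution without any change, which is what enables the zero-shot super-resolution experiments (toy shapes only; the numbers are illustrative):

```python
# Same parameters, different discretizations: the Fourier modes are resolution-independent.
import torch

layer = SpectralConv2d(d_v=32, modes1=12, modes2=12)
v_coarse = torch.randn(1, 32, 64, 64)        # resolution used during training
v_fine = torch.randn(1, 32, 256, 256)        # finer grid, same function space
print(layer(v_coarse).shape)                 # torch.Size([1, 32, 64, 64])
print(layer(v_fine).shape)                   # torch.Size([1, 32, 256, 256])
```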
Quasi-linear complexity. The weight tensor $R$ contains $k_{\operatorname*{max}}<n$ modes, so the inner multiplication has complexity $O(k_{\operatorname*{max}})$. Therefore, the majority of the computational cost lies in computing the Fourier transform $\mathcal{F}(v_{t})$ and its inverse. General Fourier transforms have complexity $O(n^{2})$; however, since we truncate the series, the complexity is in fact $O(n k_{\operatorname*{max}})$, while the FFT has complexity $O(n\log n)$. Generally, we have found using FFTs to be very efficient. However, a uniform discretization is required.
5 NUMERICAL EXPERIMENTS
In this section, we compare the proposed Fourier neural operator with multiple finite-dimensional architectures as well as operator-based approximation methods on the 1-d Burgers’ equation, the 2-d Darcy Flow problem, and the 2-d Navier-Stokes equation. The data generation processes are discussed in Appendices A.3.1, A.3.2, and A.3.3 respectively. We do not compare against traditional solvers (FEM/FDM) or neural-FEM type methods since our goal is to produce an efficient operator approximation that can be used for downstream applications. We demonstrate one such application to the Bayesian inverse problem in Section 5.5.
We construct our Fourier neural operator by stacking four Fourier integral operator layers as specified in (2) and (4) with the ReLU activation as well as batch normalization. Unless otherwise specified, we use $N=1000$ training instances and 200 testing instances. We use the Adam optimizer to train for 500 epochs with an initial learning rate of 0.001 that is halved every 100 epochs. We set $k_{\mathrm{max},j}=16,d_{v}=64$ for the 1-d problem and $k_{\mathrm{max},j}=12,d_{v}=32$ for the 2-d problems. Lower resolution data are downsampled from higher resolution data. All the computation is carried out on a single Nvidia V100 GPU with 16GB memory.
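A sketch of how the pieces above assemble into the 2-d model used in the experiments (again our own illustrative code: the lifting $P$, projection $Q$, and pointwise $W$ follow Figure 2, the widths follow this section, and batch normalization is omitted for brevity):

```python
# Four Fourier layers between a lifting network P and a projection network Q,
# each layer combined with a local linear transform W and a ReLU activation.
import torch
import torch.nn as nn

class FNO2d(nn.Module):
    def __init__(self, d_a, d_u, d_v=32, modes=12, n_layers=4):
        super().__init__()
        self.P = nn.Linear(d_a, d_v)                     # lift a(x) to the channel space R^{d_v}
        self.K = nn.ModuleList([SpectralConv2d(d_v, modes, modes) for _ in range(n_layers)])
        self.W = nn.ModuleList([nn.Conv2d(d_v, d_v, kernel_size=1) for _ in range(n_layers)])
        self.Q = nn.Linear(d_v, d_u)                     # project back to R^{d_u}

    def forward(self, a):
        # a: (batch, s1, s2, d_a), point-wise values of the input function on a uniform grid.
        v = self.P(a).permute(0, 3, 1, 2)                # (batch, d_v, s1, s2)
        for K, W in zip(self.K, self.W):
            v = torch.relu(K(v) + W(v))                  # the update of Definition 1 with the Fourier kernel
        return self.Q(v.permute(0, 2, 3, 1))             # (batch, s1, s2, d_u)

# Training would use the empirical-risk sketch of Section 2 with the Adam settings listed above.
```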
Remark on Resolution. Traditional PDE solvers such as FEM and FDM approximate a single function and therefore their error to the continuum decreases as the resolution is increased. On the other hand, operator approximation is independent of the way its data is discretized as long as all relevant information is resolved. Resolution-invariant operators have consistent error rates among different resolutions as shown in Figure 3. Further, resolution-invariant operators can do zero-shot super-resolution, as shown in Section 5.4.
Benchmarks for time-independent problems (Burgers and Darcy): NN: a simple point-wise feedforward neural network. RBM: the classical Reduced Basis Method (using a POD basis) (DeVore, 2014). FCN: a state-of-the-art neural network architecture based on Fully Convolutional Networks (Zhu & Zabaras, 2018). PCANN: an operator method using PCA as an autoencoder on both the input and output data and interpolating the latent spaces with a neural network (Bhattacharya et al., 2020). GNO: the original graph neural operator (Li et al., 2020b). MGNO: the multipole graph neural operator (Li et al., 2020a). LNO: a neural operator method based on the low-rank decomposition of the kernel $\kappa(x,y):=\sum_{j=1}^{r}\phi_{j}(x)\psi_{j}(y)$, similar to the unstacked DeepONet proposed in (Lu et al., 2019). FNO: the newly proposed Fourier neural operator.

Figure 3: Benchmark on Burgers’ equation, Darcy Flow, and Navier-Stokes. Left: benchmarks on Burgers’ equation; Mid: benchmarks on Darcy Flow for different resolutions; Right: the learning curves on Navier-Stokes $\nu=1\mathrm{{e}-3}$ with different benchmarks. Train and test on the same resolution. For acronyms, see Section 5; details in Tables 1, 3, 4.
Benchmarks for time-dependent problems (Navier-Stokes): ResNet: 18 layers of 2-d convolution with residual connections (He et al., 2016). U-Net: a popular choice for image-to-image regression tasks consisting of four blocks with 2-d convolutions and deconvolutions (Ronneberger et al., 2015). TF-Net: a network designed for learning turbulent flows based on a combination of spatial and temporal convolutions (Wang et al., 2020). FNO-2d: 2-d Fourier neural operator with an RNN structure in time. FNO-3d: 3-d Fourier neural operator that directly convolves in space-time.
5.1 BURGERS’ EQUATION
The 1-d Burgers’ equation is a non-linear PDE with various applications, including modeling the one-dimensional flow of a viscous fluid. It takes the form
$$
\begin{array}{r l}
\partial_{t}u(x,t)+\partial_{x}\big(u^{2}(x,t)/2\big)=\nu\,\partial_{xx}u(x,t), & x\in(0,1),\ t\in(0,1] \\
u(x,0)=u_{0}(x), & x\in(0,1)
\end{array}
$$
with periodic boundary conditions, where $u_{0}\in L_{\mathrm{per}}^{2}((0,1);\mathbb{R})$ is the initial condition and $\nu\in\mathbb{R}_{+}$ is the viscosity coefficient. We aim to learn the operator mapping the initial condition to the solution at time one, $G^{\dagger}:L_{\mathrm{per}}^{2}((0,1);\mathbb{R})\to H_{\mathrm{per}}^{r}((0,1);\mathbb{R})$, defined by $u_{0}\mapsto u(\cdot,1)$ for any $r>0$.
The results of our experiments are shown in Figure 3 (a) and Table 3 (Appendix A.3.1). Our proposed method obtains the lowest relative error compared to any of the benchmarks. Further, the error is invariant with the resolution, while the error of convolution neural network based methods (FCN) grows with the resolution. Compared to other neural operator methods such as GNO and MGNO that use Nystrom sampling in physical space, the Fourier neural operator is both more accurate and more computationally efficient.
5.2 DARCY FLOW
We consider the steady state of the 2-d Darcy Flow equation on the unit box, which is the second-order, linear, elliptic PDE
$$
\begin{array}{r l}
-\nabla\cdot(a(x)\nabla u(x))=f(x), & x\in(0,1)^{2} \\
u(x)=0, & x\in\partial(0,1)^{2}
\end{array}
$$
with a Dirichlet boundary where $a\in L^{\infty}((0,1)^{2};\mathbb{R}_{+})$ is the diffusion coefficient and $f\in L^{2}((0,1)^{2};\mathbb{R})$ is the forcing function. This PDE has numerous applications including modeling the pressure of subsurface flow, the deformation of linearly elastic materials, and the electric potential in conductive materials. We are interested in learning the operator mapping the diffusion coefficient to the solution, $G^{\dagger}:L^{\infty}((0,1)^{2};\mathbb{R}_{+})\to H_{0}^{1}((0,1)^{2};\mathbb{R}_{+})$, defined by $a\mapsto u$. Note that although the PDE is linear, the operator $G^{\dagger}$ is not.
The results of our experiments are shown in Figure 3 (b) and Table 4 (Appendix A.3.2). The proposed Fourier neural operator obtains nearly one order of magnitude lower relative error compared to any of the benchmarks. We again observe the invariance of the error with respect to the resolution.