[论文翻译]PANDA: 超越重连的扩展宽度感知消息传递


原文地址:https://arxiv.org/pdf/2406.03671v2


PANDA: Expanded Width-Aware Message Passing Beyond Rewiring

PANDA: 超越重连的扩展宽度感知消息传递

Jeongwhan Choi 1 Sumin Park 2 Hyowon Wi 1 Sung-Bae Cho 1 Noseong Park 3

Abstract

摘要

Recent research in the field of graph neural network (GNN) has identified a critical issue known as “over-squashing,” resulting from the bottleneck phenomenon in graph structures, which impedes the propagation of long-range information. Prior works have proposed a variety of graph rewiring concepts that aim at optimizing the spatial or spectral properties of graphs to promote the signal propagation. However, such approaches inevitably deteriorate the original graph topology, which may lead to a distortion of information flow. To address this, we introduce an expanded width-aware (PANDA) message passing, a new message passing paradigm where nodes with high centrality, a potential source of over-squashing, are selectively expanded in width to encapsulate the growing influx of signals from distant nodes. Experimental results show that our method outperforms existing rewiring methods, suggesting that selectively expanding the hidden state of nodes can be a compelling alternative to graph rewiring for addressing the over-squashing.

图神经网络 (GNN) 领域的最新研究发现了一个关键问题,即由于图结构中的瓶颈现象导致的"过度挤压" (over-squashing),这会阻碍远程信息的传播。先前的研究提出了各种图重连 (graph rewiring) 概念,旨在优化图的空间或频谱特性以促进信号传播。然而,这些方法不可避免地会破坏原始图拓扑结构,可能导致信息流失真。为了解决这个问题,我们引入了一种扩展宽度感知 (PANDA) 消息传递机制,这是一种新的消息传递范式,其中具有高中心性 (可能成为过度挤压源) 的节点会被选择性地扩展宽度,以封装来自远处节点不断增长的信号流。实验结果表明,我们的方法优于现有的重连方法,这表明选择性地扩展节点的隐藏状态可以成为解决过度挤压问题的图重连方法的有效替代方案。

1. Introduction

1. 引言

Graph Neural Networks (GNNs) have emerged as a powerful tool for graph data processing and are being studied extensively in various domains, as evidenced by significant research contributions (Defferrard et al., 2016; Velickovic et al., 2018; Chen et al., 2020; Chien et al., 2021; Choi et al., 2021; Chamberlain et al., 2021; Hwang et al., 2021; Choi & Park, 2023; Choi et al., 2023; Kang et al., 2023; Gruber et al., 2023). A key subclass within GNNs is Message Passing Neural Networks (MPNNs), which excel in propagating information through neighbouring nodes. As MPNNs propagate only for a 1-hop distance within a single layer, long-range interactions can only be captured when the depth of the network is comparable to the distance. However, increasing the number of layers for capturing long-range dependencies results in exponential growth of the receptive field and excessive aggregation of repeated messages that are subsequently packed into a feature vector of fixed length, which causes the problem called “over-squashing” (Alon & Yahav, 2021; Shi et al., 2023b). This phenomenon severely decreases the expressivity of the networks and thus impairs the model performance. Previous studies have attempted to identify topological bottlenecks in a graph and directly modify graph connectivity with various optimization targets, including curvature, effective resistance, and spectral gap (Alon & Yahav, 2021; Topping et al., 2022; Banerjee et al., 2022; Deac et al., 2022; Arnaiz-Rodriguez et al., 2022; Karhadkar et al., 2023; Black et al., 2023; Fesser & Weber, 2023; Sonthalia et al., 2023; Giraldo et al., 2023; Shi et al., 2023a; Barbero et al., 2023).

图神经网络 (Graph Neural Networks, GNNs) 已成为图数据处理的强大工具,并在多个领域得到广泛研究 (Defferrard et al., 2016; Velickovic et al., 2018; Chen et al., 2020; Chien et al., 2021; Choi et al., 2021; Chamberlain et al., 2021; Hwang et al., 2021; Choi & Park, 2023; Choi et al., 2023; Kang et al., 2023; Gruber et al., 2023)。GNNs 中的一个重要子类是消息传递神经网络 (Message Passing Neural Networks, MPNNs),它擅长通过相邻节点传播信息。由于 MPNNs 在单层中仅传播 1 跳距离,因此只有当网络深度与距离相当时才能捕获长程交互。然而,增加层数以捕获长程依赖会导致感受野呈指数级增长,并过度聚合重复消息,这些消息随后被压缩到固定长度的特征向量中,从而引发称为“过度挤压 (over-squashing)”的问题 (Alon & Yahav, 2021; Shi et al., 2023b)。这一现象严重降低了网络的表达能力,从而损害模型性能。先前的研究试图识别图中的拓扑瓶颈,并通过各种优化目标(包括曲率、有效电阻和谱间隙)直接修改图连接性 (Alon & Yahav, 2021; Topping et al., 2022; Banerjee et al., 2022; Deac et al., 2022; Arnaiz-Rodriguez et al., 2022; Karhadkar et al., 2023; Black et al., 2023; Fesser & Weber, 2023; Sonthalia et al., 2023; Giraldo et al., 2023; Shi et al., 2023a; Barbero et al., 2023)。


Figure 1. Potential pitfalls of rewiring in domain-specific graphs: (a) In a molecular graph, rewiring the edge in red to a benzene ring violates the domain knowledge. (b) In a social graph, connecting a user to his/her enemy may lead to totally different meaning.

图 1: 领域特定图中重连的潜在陷阱: (a) 在分子图中, 将红色边重连至苯环会违反领域知识。 (b) 在社交图中, 将用户与其敌人建立连接可能导致完全不同的含义。


Figure 2. Comparison of resistance correlation and mean accuracy across different methods for GCN. A large negative correlation reflects that a higher total effective resistance is associated with reduced signal propagation (see Section 4).

图 2: GCN不同方法下电阻相关性与平均准确率的对比。较大的负相关性表明较高的总有效电阻会导致信号传播减弱 (详见第4节)。

Motivation. Existing rewiring methods that alter the graph topology to resolve over-squashing, while potentially beneficial, can inadvertently introduce inaccuracies within domain-specific contexts. Fig. 1(a) demonstrates how modifying the molecular graph of a benzene ring could contradict chemical principles, while Fig. 1(b) shows that rewiring in social networks could result in the distortion of underlying community structures. These examples highlight the necessity to preserve the original graph structure to maintain the validity of domain-specific semantic information.

动机。现有的重连方法通过改变图拓扑来解决过度挤压问题,虽然可能有益,但在特定领域情境中可能无意间引入不准确性。图 1(a) 展示了修改苯环分子图可能违背化学原理的情况,而图 1(b) 则表明社交网络中的重连可能导致底层社区结构扭曲。这些例子凸显了保留原始图结构以维持特定领域语义信息有效性的必要性。

Prior works have tried to identify various factors associated with over-squashing through the lens of the sensitivity analysis of the Jacobian of node features (Topping et al., 2022; Di Giovanni et al., 2023). Among these works, Di Giovanni et al. (2023), an insightful analytical paper, provides a theoretical justification that increasing the width of the model (i.e., its hidden dimension) can also contribute to improving the sensitivity of the model. However, it points out a major concern regarding expanding the width of the model to address over-squashing: increasing the hidden dimension globally affects the whole network at the expense of the generalization capacity of the model.

先前的研究试图通过节点特征雅可比矩阵 (Jacobian) 的敏感性分析 (Topping et al., 2022; Di Giovanni et al., 2023) 来识别与过度挤压 (over-squashing) 相关的各种因素。作为其中一项探索,Di Giovanni et al. (2023) 这篇富有洞察力的分析论文从理论上证明,增加模型宽度 (即隐藏维度) 也有助于提升模型敏感性。但该研究同时指出,通过扩展模型宽度来解决过度挤压问题存在一个主要缺陷:全局增加隐藏维度会影响整个网络,并以牺牲模型泛化能力为代价。

Inspired by the limitation of existing rewiring methods and the promising role of width expansion for addressing the over-squashing, we aim to design a novel message passing paradigm that mitigates the over-squashing by targeting the nodes that are mainly involved in creating bottlenecks in the signal flow and selectively expanding their width without the risk to change the underlying graph topology.

受现有重连方法局限性的启发,以及宽度扩展在解决过度挤压问题中的潜在作用,我们旨在设计一种新颖的消息传递范式。该范式通过针对信号流中主要形成瓶颈的节点,选择性地扩展其宽度而不改变底层图拓扑结构,从而缓解过度挤压问题。

Main idea. It is possible to define graph bottlenecks as nodes with high centrality, such as betweenness centrality (Yu et al., 2007; Topping et al., 2022). Nodes with high centrality can show hub-like behavior in that they connect to a relatively large set of nodes lying in different parts of the graph, and thus receive excessive information from them. Increasing the width of the feature vector of these nodes provides more space for them to process the information flowing from other nodes. Then, we can take advantage of their enhanced capacities to alleviate over-squashing. Doing so, however, requires exchanging messages between nodes with different hidden dimensions. This raises a natural question:

主要思想。可以将图瓶颈定义为具有高中心性的节点,例如介数中心性 (Yu et al., 2007; Topping et al., 2022)。高中心性节点会表现出枢纽般的行为,即它们连接到位于图中不同部分的相对较多的节点,从而从这些节点接收过多的信息。增加这些节点的特征向量宽度,可以为它们处理来自其他节点的信息提供更多空间。然后,我们可以利用它们增强的能力来缓解过度挤压问题。为了实现这一点,需要在具有不同隐藏维度的节点之间交换信息。这就引出了一个自然的问题:

“Is it possible to design a message passing that enables information exchange between nodes with different width only in a way that mitigates over-squashing without rewiring?”

能否设计一种消息传递机制,使得仅在不同宽度的节点之间以缓解过压缩(over-squashing)的方式交换信息,而无需重新连接?

We propose a new message passing framework that addresses the above question. First, the entire node set is divided into two subsets, expanded and non-expanded nodes, which are colored in red and blue, respectively, in Fig. 3. Then, we classify four different edge types depending on the expansion state of the nodes at the source (edge tail) and target (edge head) positions: low-to-low, high-to-high, low-to-high, and high-to-low. Each edge type requires a distinct message passing scheme.

我们提出了一种新的消息传递框架来解决上述问题。首先,将整个节点集划分为两个子集:扩展节点和非扩展节点,在图3中分别用红色和蓝色表示。然后,根据源位置(边尾)和目标位置(边头)节点的扩展状态,将边分为四种不同类型:低到低、高到高、低到高以及高到低。每种边类型都需要采用不同的消息传递方案。

The standard message passing between nodes with equal widths corresponds to the conventional message passing (see Figs. 3(b) and 3(c)). For high-to-low message passing (see Fig. 3(d)), a low-dimensional node selects and receives a subset of the information sent by a high-dimensional node. The low-to-high message passing involves augmenting the hidden vectors of low-dimensional nodes to match the higher dimension size for propagation (see Fig. 3(e)).

节点间等宽度的标准消息传递对应传统消息传递(见图3(b)和图3(c))。对于高维到低维的消息传递(见图3(d)),低维节点会筛选并接收高维节点发送的信息子集。而低维到高维的消息传递则需要将低维节点的隐藏向量扩展至高维尺寸进行传播(见图3(e))。

Our framework can potentially enhance the capacity of nodes in the perspective of signal propagation. As shown in Fig. 2, our framework maintains constant signal propagation w.r.t. effective resistance and simultaneously improves the accuracy compared to the existing rewiring methods. We use the correlation coefficient to quantify how signal propagation decreases as effective resistance increases.

我们的框架有望从信号传播的角度提升节点能力。如图 2 所示,相较于现有的重连方法,我们的框架在保持有效电阻 (effective resistance) 恒定信号传播的同时提高了准确性。我们使用相关系数来量化信号传播如何随着有效电阻的增加而减弱。

Contributions and outline. We introduce an expanded width-aware message passing (PANDA)1, a novel paradigm for message passing with expanded widths for nodes that are potentially bottlenecks. Our contributions can be summarized as follows:

贡献与概要。我们提出了一种扩展宽度感知的消息传递方法 (PANDA),这是一种针对潜在瓶颈节点采用扩展宽度进行消息传递的新范式。我们的贡献可概括如下:

• In Section 3, we propose PANDA, which enables signal propagation across nodes of different widths.
• In Section 4, we discuss how our PANDA can alleviate over-squashing from various perspectives. Verifying a higher feature sensitivity compared to other models, we show that PANDA can overcome topological bottlenecks, maintaining consistent signal propagation even under large effective resistance. Finally, we show that our method resolves a limitation of existing rewiring methods, e.g., over-smoothing.

  • 在第3节中,我们提出了PANDA,实现了不同宽度节点间的信号传播。
  • 在第4节中,我们从多个角度讨论了PANDA如何缓解过度挤压问题。通过验证其相比其他模型具有更高的特征敏感性,我们证明PANDA能够克服拓扑瓶颈,即使在高有效电阻下也能保持稳定的信号传播。最后,我们展示了该方法解决了现有重布线方法(如过度平滑)的局限性。

• In Section 5, we empirically demonstrate that our PANDA outperforms existing rewiring methods.

• 在第5节中,我们通过实验证明PANDA优于现有的重连方法。

2. Preliminaries

2. 预备知识

We first examine the notation used in this paper. Given a graph $\mathcal{G}=(\mathcal{V},\mathcal{E})$, we use $\mathcal{V}$ and $\mathcal{E}$ to denote its nodes and edges, respectively. The nodes are indexed by $v$ and $u$ such that $v,u\in\mathcal{V}$, and an edge connecting nodes $v$ and $u$ is denoted by $(v,u)\in\mathcal{E}$. The connectivity is encoded in the adjacency matrix $\mathbf{A}\in\mathbb{R}^{n\times n}$, where $n$ is the number of nodes. $p$ denotes the width (hidden dimension size), while $\ell$ is the number of layers. The feature of node $v$ at layer $\ell$ is written as $\mathbf{h}_{v}^{(\ell)}$.

我们首先介绍本文使用的符号表示。给定图 $\mathcal{G}=(\mathcal{V},\mathcal{E})$ ,用 $\mathcal{V}$ 和 $\mathcal{E}$ 分别表示其节点和边。节点通过 $v$ 和 $u$ 进行索引,满足 $v,u\in\mathcal{V}$ ,连接节点 $v$ 和 $u$ 的边记为 $(v,u)\in\mathcal{E}$ 。连通性通过邻接矩阵 $\mathbf{A}\in\mathbb{R}^{n\times n}$ 编码,其中 $n$ 为节点数量。 $p$ 表示宽度(隐藏层维度大小), $\ell$ 为层数。节点 $v$ 在第 $\ell$ 层的特征记为 $\mathbf{h}_{v}^{(\ell)}$ 。


Figure 3. Examples of PANDA’s message passing mechanism. The size of the node indicates the size of hidden dimension.


图 3: PANDA消息传递机制示例。节点尺寸代表隐藏维度大小。

2.1. Message Passing Neural Networks

2.1. 消息传递神经网络 (Message Passing Neural Networks)

We consider the case where each node $v$ has a feature vector $\mathbf{h}_{v}^{(0)}\in\mathbb{R}^{p}$. MPNNs iteratively update node representations using the following equations:

我们考虑每个节点 $v$ 具有特征向量 $\mathbf{h}_{v}^{(0)}\in\mathbb{R}^{p}$ 的情况。MPNN通过以下公式迭代更新节点表示:

$$
\begin{array}{r}{\mathbf{m}_{v}^{(\ell)}=\psi^{(\ell)}(\{\mathbf{h}_{u}^{(\ell)}:u\in\mathcal{N}(v)\}),}\\ {\mathbf{h}_{v}^{(\ell+1)}=\phi^{(\ell)}(\mathbf{h}_{v}^{(\ell)},\mathbf{m}_{v}^{(\ell)}).}\end{array}
$$

$$
\begin{array}{r}{\mathbf{m}_{v}^{(\ell)}=\psi^{(\ell)}(\{\mathbf{h}_{u}^{(\ell)}:u\in\mathcal{N}(v)\}),}\\ {\mathbf{h}_{v}^{(\ell+1)}=\phi^{(\ell)}(\mathbf{h}_{v}^{(\ell)},\mathbf{m}_{v}^{(\ell)}).}\end{array}
$$

where $\mathbf{m}_{v}^{(\ell)}$ is the aggregated node feature of $v$'s neighbourhood. The aggregation function $\psi^{(\ell)}$ and the update function $\phi^{(\ell)}$ are learnable, and their different definitions result in different architectures (Kipf & Welling, 2017; Xu et al., 2019; Velickovic et al., 2018). By default, all nodes have the same hidden dimension size. We will introduce a method that enables message passing among nodes with different hidden dimensions in Section 3.

其中 $\mathbf{m}_{v}^{(\ell)}$ 表示节点 v 邻域的聚合特征。聚合函数 $\psi^{(\ell)}$ 和更新函数 $\phi^{(\ell)}$ 是可学习的,其不同定义会形成不同架构 (Kipf & Welling, 2017; Xu et al., 2019; Velickovic et al., 2018)。默认情况下所有节点具有相同的隐藏维度大小,我们将在第3节介绍实现不同隐藏维度节点间消息传递的方法。
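For readers who want a concrete reference point for Eqs. (1)-(2), the sketch below implements one message-passing step in plain PyTorch. The mean aggregation for $\psi^{(\ell)}$ and the single linear-plus-ReLU update for $\phi^{(\ell)}$ are illustrative choices, not the specific architectures studied in the paper.

```python
import torch
import torch.nn as nn

class SimpleMPNNLayer(nn.Module):
    """One message-passing step: aggregate neighbour features (psi), then update (phi)."""
    def __init__(self, p: int):
        super().__init__()
        self.update = nn.Linear(2 * p, p)   # phi: combines h_v with the aggregated message m_v

    def forward(self, h: torch.Tensor, edge_index: torch.Tensor) -> torch.Tensor:
        # h: [n, p] node features; edge_index: [2, m] with row 0 = source u, row 1 = target v.
        src, dst = edge_index
        msg = torch.zeros_like(h)
        msg.index_add_(0, dst, h[src])                          # sum of neighbour features per target
        deg = torch.zeros(h.size(0), device=h.device, dtype=h.dtype)
        deg.index_add_(0, dst, torch.ones_like(dst, dtype=h.dtype))
        m = msg / deg.clamp(min=1).unsqueeze(-1)                # psi: mean aggregation over N(v)
        return torch.relu(self.update(torch.cat([h, m], dim=-1)))  # phi: update of h_v
```

The `edge_index` layout follows the common PyTorch Geometric convention of a `[2, m]` integer tensor; any equivalent adjacency representation would work the same way.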

2.2. Over-squashing Problem

2.2. 过度挤压问题

Initially identified by Alon & Yahav (2021), over-squashing has become a significant challenge in GNNs when dealing with long-range dependencies. It mainly occurs when the information aggregated from too many neighbours is squashed into a fixed-sized node feature vector, resulting in a substantial loss of information (Shi et al., 2023b). We will review the literature, including rewiring methods, to address this problem, in Section 6. We now describe the Jacobian sensitivity and effective resistance widely used to estimate the over-squashing. In Section 4, we justify our method in terms of the two metrics.

最初由Alon和Yahav (2021) 提出的过度挤压 (over-squashing) 问题,已成为图神经网络 (GNN) 处理长程依赖关系时的重大挑战。该问题主要发生在将过多邻居聚合的信息压缩至固定尺寸的节点特征向量时,导致信息严重丢失 (Shi et al., 2023b)。我们将在第6节回顾包括重连方法在内的相关文献以解决此问题。下面将介绍广泛用于评估过度挤压的雅可比敏感度 (Jacobian sensitivity) 和有效电阻 (effective resistance) 指标,并在第4节基于这两个指标论证本方法的合理性。

Sensitivity bound. Recent theoretical insights on GNNs have provided a formal understanding of over-squashing via the lens of sensitivity analysis (Topping et al., 2022). The sensitivity bound theorem, as proposed by Di Giovanni et al. (2023, Theorem 3.2), posits that the impact of over-squashing can be quantified by examining the following three factors: i) the Lipschitz continuity of the non-linearity in the model, ii) the width of the model, and iii) the topology of the graph. The sensitivity of $\mathbf{h}_{v}^{(\ell)}$ to $\mathbf{h}_{u}^{(0)}$, assuming that there exists a path of length $\ell$ between nodes $v$ and $u$, is bounded by

灵敏度界限。近期关于图神经网络 (GNN) 的理论研究通过灵敏度分析视角 (Topping et al., 2022) 对过度挤压现象提供了形式化理解。(Di Giovanni et al., 2023, Theorem 3.2) 提出的灵敏度界限定理指出,过度挤压的影响可通过以下三个因素量化:i) 模型中非线性函数的Lipschitz连续性,ii) 模型宽度,iii) 图拓扑结构。假设节点 $v$ 和 $u$ 之间存在长度为 $\ell$ 的路径,则 $\mathbf{h}_{v}^{(\ell)}$ 对 $\mathbf{h}_{u}^{(0)}$ 的灵敏度受限于

$$
\left\|\frac{\partial\mathbf{h}_{v}^{(\ell)}}{\partial\mathbf{h}_{u}^{(0)}}\right\|_{1}\le\underbrace{(zwp)^{\ell}}_{\mathrm{model}}\,\underbrace{(\mathbf{S}^{\ell})_{vu}}_{\mathrm{topology}}.
$$

$$
\left\|\frac{\partial\mathbf{h}_{v}^{(\ell)}}{\partial\mathbf{h}_{u}^{(0)}}\right\|_{1}\le\underbrace{(zwp)^{\ell}}_{\mathrm{model}}\,\underbrace{(\mathbf{S}^{\ell})_{vu}}_{\mathrm{topology}}.
$$

The impact of the model on over-squashing is bounded by the Lipschitz constant $z$, the width $p$, and the maximum weight $w$, while the topology of the input graph affects the sensitivity through the $\ell$-th power of the graph shift matrix $\mathbf{S}$. Hereinafter, $\mathbf{S}$ can be any (normalized) adjacency matrix (with self-loops).

模型对过度挤压的影响受限于Lipschitz常数$z$、宽度$p$和最大权重$w$,而输入图的拓扑结构通过图移位矩阵 $\mathbf{S}$ 的 $\ell$ 次方影响敏感性。此后, $\mathbf{S}$ 可以是任何(归一化)邻接矩阵(含自环)。
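As a sanity check on how the factors in Eq. (3) interact, the toy computation below evaluates the bound $(zwp)^{\ell}(\mathbf{S}^{\ell})_{vu}$ on a 4-node path graph. The choice $\mathbf{S}=\mathbf{D}^{-1}(\mathbf{A}+\mathbf{I})$ and the values of $z$, $w$, and $\ell$ are purely illustrative; the point is only that, with the topology fixed, a larger width $p$ yields a larger (less restrictive) upper bound.

```python
import numpy as np

# Toy 4-node path graph 0-1-2-3; S = D^{-1}(A + I) is one common (normalized) graph shift matrix.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
A_hat = A + np.eye(4)
S = A_hat / A_hat.sum(axis=1, keepdims=True)

z, w, ell = 1.0, 1.0, 3          # Lipschitz constant, max weight, number of layers (illustrative values)
for p in (64, 256):              # a standard width vs. an expanded width p_high
    bound = (z * w * p) ** ell * np.linalg.matrix_power(S, ell)[3, 0]
    print(f"p={p}: upper bound on the sensitivity of node 3 to node 0 is {bound:.3f}")
```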

A small Jacobian norm indicates that the final representations of a node learned by a model are not much affected by variations in its neighbouring inputs, implying that the network suffers from over-squashing. This condition typically arises when information has not been sufficiently propagated over the graph due to excessive aggregation of information, i.e., compression into a small hidden vector, through several layers of message passing.

雅可比矩阵范数较小表明模型学习到的节点最终表征受其相邻输入变化的影响不大,这意味着网络存在过度挤压 (over-squashing) 问题。这种情况通常是由于信息在多层消息传递过程中被过度聚合 (即压缩为小隐藏向量) ,导致未能在图上充分传播所致。

Effective resistance and signal propagation. Derived from the field of electrical engineering, the effective resistance between two nodes $u$ and $v$ in an electrical network is defined as the potential difference induced between them when a unit current is injected at one node and extracted at the other (Ghosh et al., 2008). An algebraic expression for the effective resistance is given in Appendix A. Rayleigh's monotonicity principle, which says that adding paths or shortening existing paths can only decrease the effective resistance between two nodes (Thomassen, 1990), leads to the following interpretation: more and shorter disjoint paths connecting the nodes $u$ and $v$ lead to a lower resistance between them (Black et al., 2023; Devriendt & Lambiotte, 2022). Therefore, edges with high effective resistance struggle to propagate information, creating bottlenecks. The total effective resistance $R_{tot}$, the sum of the effective resistance over all pairs of nodes (see Eq. (20)), is a key measure of the overall degree of over-squashing in a graph. It is known that the signal propagation of GNNs is inversely proportional to $R_{tot}$ (Di Giovanni et al., 2023), which will be covered in detail in Section 4 with a comparison to our method.

有效电阻与信号传播。有效电阻的概念源自电气工程领域:在电气网络中,节点$u$和$v$之间的有效电阻定义为当单位电流从一个节点注入并从另一个节点流出时,两节点间产生的电势差 (Ghosh et al., 2008)。附录A给出了有效电阻的代数表达式。根据Rayleigh单调性原理 (Thomassen, 1990),即增加路径或缩短现有路径只会降低两节点间的有效电阻,可以得出以下推论:连接节点$u$和$v$的不相交路径越多且越短,它们之间的电阻越低 (Black et al., 2023; Devriendt & Lambiotte, 2022)。因此,具有高有效电阻的边难以传播信息,形成瓶颈。总有效电阻$R_{tot}$(所有节点对间有效电阻之和,见式(20))是衡量图中过度挤压程度的关键指标。已知图神经网络(GNN)的信号传播与$R_{tot}$成反比 (Di Giovanni et al., 2023),第4节将详细讨论这一点并与我们的方法进行比较。
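The following sketch computes pairwise effective resistances from the Moore-Penrose pseudoinverse of the graph Laplacian, using the standard identity $R_{uv}=\mathbf{L}^{+}_{uu}+\mathbf{L}^{+}_{vv}-2\mathbf{L}^{+}_{uv}$, and sums them into $R_{tot}$. This is a generic textbook formulation shown for illustration (the paper's exact expression is in its Appendix A), and the toy two-triangle graph is an assumption chosen to expose a bridge-edge bottleneck.

```python
import numpy as np

def effective_resistance_matrix(A: np.ndarray) -> np.ndarray:
    """R[u, v] = L+[u,u] + L+[v,v] - 2 L+[u,v], with L+ the Laplacian pseudoinverse."""
    L = np.diag(A.sum(axis=1)) - A
    L_pinv = np.linalg.pinv(L)
    d = np.diag(L_pinv)
    return d[:, None] + d[None, :] - 2 * L_pinv

# Toy graph: two triangles joined by a single bridge edge (2, 3), i.e., a topological bottleneck.
A = np.zeros((6, 6))
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0

R = effective_resistance_matrix(A)
R_tot = R[np.triu_indices(6, k=1)].sum()   # total effective resistance over all node pairs
print("resistance across the bridge edge (2, 3):", round(R[2, 3], 3))
print("R_tot:", round(R_tot, 3))
```

The bridge edge is a cut edge, so its effective resistance is maximal among adjacent pairs, matching the intuition that such edges struggle to propagate information.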

Over-smoothing problem. Over-smoothing is another well-known problem that reduces the expressiveness of GNNs in deeper layers, resulting in a loss of discriminative power in the representation of node features (Li et al., 2018; Nt & Maehara, 2019; Oono & Suzuki, 2020). Rewiring methods to address the over-squashing problem focus on improving graph connectivity to ease the signal flow. However, one limitation of the rewiring method is that adding too many edges potentially leads to the over-smoothing issue (Karhadkar et al., 2023). To explore this trade-off between over-squashing and over-smoothing, FoSR (Karhadkar et al., 2023) analyzes the change in Dirichlet energy upon adding edges. Their results demonstrate that rewiring the graph to enhance information propagation leads to the over-smoothing problem.

过度平滑问题。过度平滑是另一个众所周知的问题,它会降低深层图神经网络 (GNN) 的表达能力,导致节点特征表示中的判别力下降 (Li et al., 2018; Nt & Maehara, 2019; Oono & Suzuki, 2020)。针对过度挤压问题的重布线方法主要关注改善图连接性以促进信号流动。然而,重布线方法的一个局限是添加过多边可能导致过度平滑问题 (Karhadkar et al., 2023)。为了探究过度挤压与过度平滑之间的权衡关系,FoSR (Karhadkar et al., 2023) 分析了添加边时狄利克雷能量的变化。他们的结果表明,通过重布线增强信息传播会导致过度平滑问题。

3. Proposed Method

3. 提出的方法

We first introduce our motivation and design rationale. We then propose our PANDA message-passing framework.

我们首先介绍研究动机和设计原理,随后提出PANDA消息传递框架。

3.1. Motivation and Design Rationale

3.1. 动机与设计原理

Sensitivity bound. Our design rationale is anchored in the insight derived from the sensitivity bound theorem (Di Giovanni et al., 2023, Theorem 3.2). Given the sensitivity bound in Eq. (3), we consider two scenarios: one with a standard width $p$ and the other with an increased width $p_{\mathrm{high}}$, where $p_{\mathrm{high}}>p$. For nodes with the increased width, the sensitivity bound is improved, indicating a potentially higher sensitivity to input features. Regarding this, we provide Proposition B.1 in Appendix B and show empirical results well aligned with our argument in Section 4. We do not increase the width of all nodes, but do so only selectively.

灵敏度边界。我们的设计理念基于灵敏度边界定理 (Di Giovanni et al., 2023, Theorem 3.2) 的洞见。给定方程(3)中的灵敏度边界,我们考虑两种场景:一种是标准宽度 $p$,另一种是增加后的宽度 $p_{\mathrm{high}}$,其中 $p_{\mathrm{high}}>p$。对于宽度增加的节点,其灵敏度边界得到提升,表明对输入特征可能具有更高的灵敏度。关于这一点,我们在附录B中给出了命题B.1,并在第4节展示了与我们的论点高度吻合的实验结果。我们并非增加所有节点的宽度,而是进行选择性调整。

Design rationale for selecting node centrality. We consider node centrality metrics to identify a node as a potential source of the bottleneck in a graph. Betweenness centrality has been considered a significant indicator of the bottleneck, which measures the number of shortest paths going through a node (Freeman, 1977). The more frequently a node appears on the shortest paths between pairs of different nodes, the more likely it is to cause the bottleneck (Topping et al., 2022, Definition 9.). However, a hub-like node with inter-community edges also has eccentric properties different from those of a high betweenness centrality node, such as a high degree. We apply various centrality metrics in our framework to understand how each is related to the over-squashing problem. Appendix C describes the centralities we consider.

选择节点中心性的设计原理。我们考虑使用节点中心性指标来识别图中潜在的瓶颈源。介数中心性 (betweenness centrality) 被视为瓶颈的重要指标,它衡量通过某节点的最短路径数量 (Freeman, 1977)。一个节点在不同节点对的最短路径上出现的频率越高,就越可能引发瓶颈 (Topping et al., 2022, Definition 9.)。然而,具有社群间边的枢纽型节点也表现出与高介数中心性节点不同的特性,例如高度数。我们在框架中应用多种中心性指标,以理解每种指标与过度挤压问题的关联。附录C描述了所考虑的各类中心性指标。

3.2. Expanded Width-Aware Message Passing

3.2. 扩展宽度感知消息传递

Node expansion criteria. In our proposed method, we stratify nodes into two categories: low-dimensional nodes and high-dimensional nodes (see Fig. 4). This stratification is based on a specific centrality that quantifies the importance of each node. The centrality, denoted as $C(\mathcal G)$, can be one of the following: degree (Borgatti & Halgin, 2011), betweenness (Freeman, 1977), closeness (Freeman, 1978), PageRank (Page et al., 1999), or load centrality (Goh et al., 2001). We represent the centrality values for all nodes as a vector $\mathbf{c}\in\mathbb{R}^{n}$ and then sort the elements of $\mathbf{c}$ in descending order. Based on the sorted $\mathbf{c}$, we select the top $k$ nodes, resulting in a binary mask vector $\mathbf{b}\in\{0,1\}^{n}$, whose entry $\mathbf{b}_{v}$ is 1 if node $v$ is among the selected top-$k$ nodes and 0 otherwise.

节点扩展标准。在我们提出的方法中,我们将节点分为两类:低维节点和高维节点(见图4)。这种分层基于一种特定的中心性度量,用于量化每个节点的重要性。该中心性记为 $C(\mathcal G)$,可以是以下之一:度中心性 (degree) (Borgatti & Halgin, 2011)、介数中心性 (betweenness) (Freeman, 1977)、接近中心性 (closeness) (Freeman, 1978)、PageRank (Page et al., 1999) 或负载中心性 (load centrality) (Goh et al., 2001)。我们将所有节点的中心性值表示为向量 $\mathbf{c}\in\mathbb{R}^{n}$,然后按降序对c的元素进行排序。基于排序后的c,我们选择前$k$个节点,生成一个二元掩码向量 $\mathbf{b}\in\{0,1\}^{n}$:若节点$v$属于所选的前$k$个节点,则 $\mathbf{b}_{v}=1$,否则为0。
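A minimal sketch of how the mask $\mathbf{b}$ could be computed is given below, assuming NetworkX for the centrality computation and nodes indexed $0,\dots,n-1$; the centrality options shown and the helper name `expansion_mask` are illustrative rather than the paper's released implementation.

```python
import networkx as nx
import torch

def expansion_mask(G: nx.Graph, k: int, centrality: str = "betweenness") -> torch.Tensor:
    """Return b in {0,1}^n with b[v] = 1 for the top-k nodes under the chosen centrality."""
    if centrality == "betweenness":
        c = nx.betweenness_centrality(G)
    elif centrality == "degree":
        c = dict(G.degree())
    elif centrality == "pagerank":
        c = nx.pagerank(G)
    else:
        raise ValueError(f"unsupported centrality: {centrality}")
    nodes = sorted(G.nodes())                       # assumes nodes are labeled 0..n-1
    scores = torch.tensor([float(c[v]) for v in nodes])
    b = torch.zeros(len(nodes), dtype=torch.long)
    b[scores.topk(k).indices] = 1                   # expand the k most central nodes
    return b
```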


Figure 4. Our proposed PANDA message passing framework. First, we selectively expand widths (i.e., hidden dimension sizes) according to a centrality in $\mathcal{G}$, and our PANDA message passing enables signal propagation among nodes with different widths ($\bigcirc$ low-dimensional nodes and $\bigcirc$ high-dimensional nodes).

图 4: 我们提出的PANDA消息传递框架。首先,我们根据 $\mathcal{G}$ 中的中心性选择性扩展宽度(即隐藏维度大小),PANDA消息传递实现了不同宽度节点间的信号传播( $\bigcirc$ 低维节点和 $\bigcirc$ 高维节点)。

Define subsets of neighbourhoods. Based on a centrality measure, we define four subsets of neighbouring nodes. For a node $v$ classified as low-dimensional, denoted by $\mathbf{b}_{v}=0$ :

定义邻域子集。基于中心性度量,我们定义了四个相邻节点的子集。对于分类为低维的节点 $v$ ,记为 $\mathbf{b}_{v}=0$ :

$$
\begin{array}{r}{\mathcal{N}_{\mathrm{low}\leftrightarrow\mathrm{low}}(v)=\{u\in\mathcal{N}(v)\mid\mathbf{b}_{v}=0\land\mathbf{b}_{u}=0\},}\\ {\mathcal{N}_{\mathrm{high}\rightarrow\mathrm{low}}(v)=\{u\in\mathcal{N}(v)\mid\mathbf{b}_{v}=0\land\mathbf{b}_{u}=1\}.}\end{array}
$$

$$
\begin{array}{r}{\mathcal{N}_{\mathrm{low}\leftrightarrow\mathrm{low}}(v)=\{u\in\mathcal{N}(v)\mid\mathbf{b}_{v}=0\land\mathbf{b}_{u}=0\},}\\ {\mathcal{N}_{\mathrm{high}\rightarrow\mathrm{low}}(v)=\{u\in\mathcal{N}(v)\mid\mathbf{b}_{v}=0\land\mathbf{b}_{u}=1\}.}\end{array}
$$

In addition, for a node $v$ classified as high-dimensional:

此外,对于被分类为高维的节点 $v$:

$$
\begin{array}{r}{\mathcal{N}_{\mathrm{high}\leftrightarrow\mathrm{high}}(v)=\{u\in\mathcal{N}(v)\mid\mathbf{b}_{v}=1\land\mathbf{b}_{u}=1\},}\\ {\mathcal{N}_{\mathrm{low}\rightarrow\mathrm{high}}(v)=\{u\in\mathcal{N}(v)\mid\mathbf{b}_{v}=1\land\mathbf{b}_{u}=0\}.}\end{array}
$$

$$
\begin{array}{r}{\mathcal{N}_{\mathrm{high}\leftrightarrow\mathrm{high}}(v)=\{u\in\mathcal{N}(v)\mid\mathbf{b}_{v}=1\land\mathbf{b}_{u}=1\},}\\ {\mathcal{N}_{\mathrm{low}\rightarrow\mathrm{high}}(v)=\{u\in\mathcal{N}(v)\mid\mathbf{b}_{v}=1\land\mathbf{b}_{u}=0\}.}\end{array}
$$

The union of these subsets across all nodes in $\mathcal{V}$ covers the entire edge set $\mathcal{E}$, thereby preserving the original graph structure.

$\mathcal{V}$ 中所有节点的这些子集的并集涵盖了整个边集 $\mathcal{E}$,从而保留了原始图结构。
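Assuming an `edge_index` tensor in the usual PyTorch Geometric layout (row 0 holds source nodes $u$, row 1 holds target nodes $v$) and the mask $\mathbf{b}$ from above, the sketch below materializes the four neighbourhood types of Eqs. (5)-(8) as disjoint edge subsets; the helper name `partition_edges` is hypothetical. The final assertion makes the point of this paragraph explicit: the four subsets together cover every original edge, so nothing is added or removed.

```python
import torch

def partition_edges(edge_index: torch.Tensor, b: torch.Tensor) -> dict:
    """Split edges into the four PANDA types based on the expansion mask b (0 = low, 1 = high)."""
    src, dst = edge_index                      # a message flows from src (u) to dst (v)
    b_u, b_v = b[src].bool(), b[dst].bool()
    masks = {
        "low_to_low":   ~b_u & ~b_v,
        "high_to_high":  b_u &  b_v,
        "low_to_high":  ~b_u &  b_v,           # low-dim u sends to high-dim v (lifted with f)
        "high_to_low":   b_u & ~b_v,           # high-dim u sends to low-dim v (selected with g)
    }
    # The four masks are disjoint and cover all edges: the graph topology is unchanged.
    assert sum(m.sum() for m in masks.values()) == edge_index.size(1)
    return {name: edge_index[:, m] for name, m in masks.items()}
```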

A new framework: PANDA message passing. The main contribution of our framework is that, given any MPNN, we can make it work with different widths for nodes. To this end, we replace the standard neighbour aggregator $\psi^{(\ell)}$ with four different aggregators, each applied to the corresponding type of neighbours: $\psi_{\leftrightarrow}^{(\ell)}$, $\psi_{\Leftrightarrow}^{(\ell)}$, $\psi_{\rightarrow}^{(\ell)}$, and $\psi_{\mapsto}^{(\ell)}$. Then, we use a common update function for all types. The final message passed from layer $\ell$ to layer $\ell+1$ is defined as follows:

新框架:PANDA消息传递。我们框架的主要贡献在于,给定任何MPNN,我们都能使其适应不同宽度的节点。为此,我们用4种不同的聚合器替代标准邻居聚合器$\psi^{(\ell)}$,分别应用于对应类型的邻居:$\psi_{\leftrightarrow}^{(\ell)}$、$\psi_{\Leftrightarrow}^{(\ell)}$、$\psi_{\rightarrow}^{(\ell)}$ 和 $\psi_{\mapsto}^{(\ell)}$。然后,我们对所有类型使用通用的更新函数。从层$\ell$传递到层$\ell+1$的最终消息定义如下:

$$
\begin{array}{rl}&{\mathbf{m}_{v,\leftrightarrow}^{(\ell)}=\psi_{\leftrightarrow}^{(\ell)}(\{\mathbf{h}_{u}^{(\ell)}:u\in\mathcal{N}_{\mathrm{low}\leftrightarrow\mathrm{low}}(v)\}),}\\ &{\mathbf{m}_{v,\Leftrightarrow}^{(\ell)}=\psi_{\Leftrightarrow}^{(\ell)}(\{\mathbf{h}_{u}^{(\ell)}:u\in\mathcal{N}_{\mathrm{high}\leftrightarrow\mathrm{high}}(v)\}),}\\ &{\mathbf{m}_{v,\rightarrow}^{(\ell)}=\psi_{\rightarrow}^{(\ell)}(\{f(\mathbf{h}_{u}^{(\ell)}):u\in\mathcal{N}_{\mathrm{low}\rightarrow\mathrm{high}}(v)\}),}\\ &{\mathbf{m}_{v,\mapsto}^{(\ell)}=\psi_{\mapsto}^{(\ell)}(\{g(\mathbf{h}_{v}^{(\ell)},\mathbf{h}_{u}^{(\ell)}):u\in\mathcal{N}_{\mathrm{high}\rightarrow\mathrm{low}}(v)\}),}\\ &{\mathbf{h}_{v}^{(\ell+1)}=\phi(\mathbf{h}_{v}^{(\ell)},\mathbf{m}_{v,\leftrightarrow}^{(\ell)},\mathbf{m}_{v,\Leftrightarrow}^{(\ell)},\mathbf{m}_{v,\rightarrow}^{(\ell)},\mathbf{m}_{v,\mapsto}^{(\ell)}).}\end{array}
$$

$$
\begin{array}{rl}&{\mathbf{m}_{v,\leftrightarrow}^{(\ell)}=\psi_{\leftrightarrow}^{(\ell)}(\{\mathbf{h}_{u}^{(\ell)}:u\in\mathcal{N}_{\mathrm{low}\leftrightarrow\mathrm{low}}(v)\}),}\\ &{\mathbf{m}_{v,\Leftrightarrow}^{(\ell)}=\psi_{\Leftrightarrow}^{(\ell)}(\{\mathbf{h}_{u}^{(\ell)}:u\in\mathcal{N}_{\mathrm{high}\leftrightarrow\mathrm{high}}(v)\}),}\\ &{\mathbf{m}_{v,\rightarrow}^{(\ell)}=\psi_{\rightarrow}^{(\ell)}(\{f(\mathbf{h}_{u}^{(\ell)}):u\in\mathcal{N}_{\mathrm{low}\rightarrow\mathrm{high}}(v)\}),}\\ &{\mathbf{m}_{v,\mapsto}^{(\ell)}=\psi_{\mapsto}^{(\ell)}(\{g(\mathbf{h}_{v}^{(\ell)},\mathbf{h}_{u}^{(\ell)}):u\in\mathcal{N}_{\mathrm{high}\rightarrow\mathrm{low}}(v)\}),}\\ &{\mathbf{h}_{v}^{(\ell+1)}=\phi(\mathbf{h}_{v}^{(\ell)},\mathbf{m}_{v,\leftrightarrow}^{(\ell)},\mathbf{m}_{v,\Leftrightarrow}^{(\ell)},\mathbf{m}_{v,\rightarrow}^{(\ell)},\mathbf{m}_{v,\mapsto}^{(\ell)}).}\end{array}
$$

Each aggregator $\psi_{*}^{(\ell)}$ is tailored to handle a specific type of interaction among nodes. In Eq. (13), the hidden vector of node $v$ is updated from the four messages. For instance, $\psi_{\rightarrow}^{(\ell)}(\cdot)$ and $\psi_{\mapsto}^{(\ell)}(\cdot)$ are designed for interactions across different widths by using the functions $f(\cdot)$ and $g(\cdot)$.

每个聚合器 $\psi_{*}^{(\ell)}$ 都专门用于处理节点间特定类型的交互。在公式(13)中,节点 $v$ 的隐藏向量通过四条消息进行更新。例如,$\psi_{\rightarrow}^{(\ell)}(\cdot)$ 和 $\psi_{\mapsto}^{(\ell)}(\cdot)$ 通过函数 $f(\cdot)$ 和 $g(\cdot)$ 被设计用于处理不同宽度的交互。

$f(\cdot)$ is a linear transformation that projects node features to the expanded dimensional space and is defined as:

$f(\cdot)$ 是一个将节点特征投影到扩展维度空间的线性变换,其定义为:

$$
f(\mathbf{h}_{u}^{(\ell)}):=\mathbf{W}_{f}^{(\ell)}\mathbf{h}_{u}^{(\ell)},
$$

$$
f(\mathbf{h}_{u}^{(\ell)}):=\mathbf{W}_{f}^{(\ell)}\mathbf{h}_{u}^{(\ell)},
$$

where $\mathbf{W}_{f}^{(\ell)}\in\mathbb{R}^{p_{\mathrm{high}}\times p}$. For $\psi_{\mapsto}^{(\ell)}(\cdot)$, we define $g(\cdot)$ as a dimension selector as follows:

其中 $\mathbf{W}_{f}^{(\ell)}\in\mathbb{R}^{p_{\mathrm{high}}\times p}$。对于 $\psi_{\mapsto}^{(\ell)}(\cdot)$,我们将 $g(\cdot)$ 定义为维度选择器,如下所示:

$$
\begin{array}{rl}&{g(\mathbf{h}_{v}^{(\ell)},\mathbf{h}_{u}^{(\ell)}):=\mathrm{gather}\big(\mathbf{h}_{u}^{(\ell)},\mathrm{topk}(\mathbf{s},p)\big),}\\ &{\qquad\mathbf{s}=\mathrm{softmax}\left(\sigma\big(\eta(\mathbf{h}_{u}^{(\ell)}\oplus\mathbf{h}_{v}^{(\ell)})\big)\right),}\end{array}
$$

$$
\begin{array}{rl}&{g(\mathbf{h}_{v}^{(\ell)},\mathbf{h}_{u}^{(\ell)}):=\mathrm{gather}\big(\mathbf{h}_{u}^{(\ell)},\mathrm{topk}(\mathbf{s},p)\big),}\\ &{\qquad\mathbf{s}=\mathrm{softmax}\left(\sigma\big(\eta(\mathbf{h}_{u}^{(\ell)}\oplus\mathbf{h}_{v}^{(\ell)})\big)\right),}\end{array}
$$

where $\eta$ is a non-linear neural network that computes a score vector $\mathbf{s}\in\mathbb{R}^{p_{\mathrm{high}}}$ that gives the relative significance of each dimension for each message. This score vector is then used to select the top $p$ dimensions for the message aggregation.

其中 $\eta$ 是一个非线性神经网络,用于计算得分向量 $\mathbf{s}\in\mathbb{R}^{p_{\mathrm{high}}}$ ,该向量表示每条消息各维度的重要性权重。随后利用该得分向量筛选出前 $p$ 个维度进行消息聚合。
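One possible realization of Eqs. (15)-(16) is sketched below: $\eta$ is taken to be a single linear layer, $\sigma$ a ReLU, and $\oplus$ a concatenation, with one row per high-to-low edge. These concrete choices are assumptions for illustration rather than the paper's exact parameterization.

```python
import torch
import torch.nn as nn

class DimensionSelector(nn.Module):
    """g(h_v, h_u): select the p most relevant of the p_high dimensions of a high-dim neighbour."""
    def __init__(self, p: int, p_high: int):
        super().__init__()
        self.p = p
        self.eta = nn.Linear(p_high + p, p_high)     # eta: scores every high dimension

    def forward(self, h_v: torch.Tensor, h_u: torch.Tensor) -> torch.Tensor:
        # h_v: [E, p] low-dim receivers, h_u: [E, p_high] high-dim senders (one row per edge).
        s = torch.softmax(torch.relu(self.eta(torch.cat([h_u, h_v], dim=-1))), dim=-1)
        idx = s.topk(self.p, dim=-1).indices         # indices of the top-p dimensions per edge
        return torch.gather(h_u, dim=-1, index=idx)  # [E, p] selected entries of the high-dim feature
```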

In our PANDA, it is important to emphasize that nodes with low- and high-dimensional features are processed in their respective dimensional spaces throughout the network layers. This means that, until the final layer, these nodes maintain their distinct dimensionalities.

在我们的PANDA中,需要重点强调的是,具有低维和高维特征的节点在整个网络层中都在各自的维度空间中进行处理。这意味着在到达最终层之前,这些节点始终保持其独特的维度特性。

Instance of our framework. To better understand our framework, we show how to integrate PANDA into two classical MPNNs: GCN (Kipf & Welling, 2017) and GIN (Xu et al., 2019). We will use these variations for our experiments. For the PANDA-GCN variant, the layer-update is expressed as follows for low and high-dimensional nodes. First, if the node $v$ is low-dimensional:

我们框架的实例。为了更好地理解我们的框架,我们展示了如何将PANDA集成到两种经典MPNN中:GCN (Kipf & Welling, 2017) 和 GIN (Xu et al., 2019)。我们将在实验中使用这些变体。对于PANDA-GCN变体,低维和高维节点的层更新表示如下。首先,如果节点$v$是低维的:

$$
\begin{array}{r}{\displaystyle\mathbf{h}_{v}^{(\ell+1)}=\mathbf{h}_{v}^{(\ell)}+\sigma\Big(\sum_{u\in\mathcal{N}_{\mathrm{low}\leftrightarrow\mathrm{low}}(v)}\frac{1}{\sqrt{d_{u}d_{v}}}\mathbf{W}_{\mathrm{low}}^{(\ell)}\mathbf{h}_{u}^{(\ell)}}\\ {\displaystyle+\sum_{u\in\mathcal{N}_{\mathrm{high}\rightarrow\mathrm{low}}(v)}\frac{1}{\sqrt{d_{u}d_{v}}}\mathbf{W}_{\mathrm{low}}^{(\ell)}g(\mathbf{h}_{v}^{(\ell)},\mathbf{h}_{u}^{(\ell)})\Big),}\end{array}
$$

$$
\begin{array}{r}{\displaystyle\mathbf{h}_{v}^{(\ell+1)}=\mathbf{h}_{v}^{(\ell)}+\sigma\Big(\sum_{u\in\mathcal{N}_{\mathrm{low}\leftrightarrow\mathrm{low}}(v)}\frac{1}{\sqrt{d_{u}d_{v}}}\mathbf{W}_{\mathrm{low}}^{(\ell)}\mathbf{h}_{u}^{(\ell)}}\\ {\displaystyle+\sum_{u\in\mathcal{N}_{\mathrm{high}\rightarrow\mathrm{low}}(v)}\frac{1}{\sqrt{d_{u}d_{v}}}\mathbf{W}_{\mathrm{low}}^{(\ell)}g(\mathbf{h}_{v}^{(\ell)},\mathbf{h}_{u}^{(\ell)})\Big),}\end{array}
$$

then if node $v$ is high-dimensional:

若节点 $v$ 为高维:

$$
\begin{array}{r}{\displaystyle\mathbf{h}_{v}^{(\ell+1)}=\mathbf{h}_{v}^{(\ell)}+\sigma\Big(\sum_{u\in\mathcal{N}_{\mathrm{high}\leftrightarrow\mathrm{high}}(v)}\frac{1}{\sqrt{d_{u}d_{v}}}\mathbf{W}_{\mathrm{high}}^{(\ell)}\mathbf{h}_{u}^{(\ell)}}\\ {\displaystyle+\sum_{u\in\mathcal{N}_{\mathrm{low}\rightarrow\mathrm{high}}(v)}\frac{1}{\sqrt{d_{u}d_{v}}}\mathbf{W}_{\mathrm{high}}^{(\ell)}f(\mathbf{h}_{u}^{(\ell)})\Big),}\end{array}
$$

$$
\begin{array}{r}{\displaystyle\mathbf{h}_{v}^{(\ell+1)}=\mathbf{h}_{v}^{(\ell)}+\sigma\Big(\sum_{u\in\mathcal{N}_{\mathrm{high}\leftrightarrow\mathrm{high}}(v)}\frac{1}{\sqrt{d_{u}d_{v}}}\mathbf{W}_{\mathrm{high}}^{(\ell)}\mathbf{h}_{u}^{(\ell)}}\\ {\displaystyle+\sum_{u\in\mathcal{N}_{\mathrm{low}\rightarrow\mathrm{high}}(v)}\frac{1}{\sqrt{d_{u}d_{v}}}\mathbf{W}_{\mathrm{high}}^{(\ell)}f(\mathbf{h}_{u}^{(\ell)})\Big),}\end{array}
$$

where $\sigma$ is a ReLU activation function, and $\mathbf{W}_{\mathrm{low}}^{(\ell)}\in\mathbb{R}^{p\times p}$ and $\mathbf{W}_{\mathrm{high}}^{(\ell)}\in\mathbb{R}^{p_{\mathrm{high}}\times p_{\mathrm{high}}}$ are the weight matrices. The messages are normalized by the node degrees $d_{v}$ and $d_{u}$. For the PANDA-GIN variant, we describe the layer update in Appendix D.
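To tie the pieces together, the sketch below assembles a simplified PANDA-GCN layer in the spirit of Eqs. (17)-(18), reusing the hypothetical `partition_edges` and `DimensionSelector` helpers sketched earlier in this section. Storing the low- and high-dimensional states as two padded $[n,\cdot]$ tensors and the degree handling are simplifying assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Reuses partition_edges(...) and DimensionSelector from the earlier sketches in this section.

class PandaGCNLayer(nn.Module):
    """Simplified PANDA-GCN layer: low/high nodes keep their separate widths p and p_high."""
    def __init__(self, p: int, p_high: int):
        super().__init__()
        self.w_low = nn.Linear(p, p, bias=False)              # W_low
        self.w_high = nn.Linear(p_high, p_high, bias=False)   # W_high
        self.f = nn.Linear(p, p_high, bias=False)             # f: lift low-dim messages (Eq. 14)
        self.g = DimensionSelector(p, p_high)                 # g: dimension selector (Eqs. 15-16)

    def forward(self, h_low, h_high, edge_index, b, deg):
        # h_low: [n, p], h_high: [n, p_high] (rows of the "other" node type are simply unused),
        # b: expansion mask in {0,1}^n, deg: [n] float node degrees for the GCN normalization.
        parts = partition_edges(edge_index, b)
        norm = lambda u, v: (1.0 / torch.sqrt((deg[u] * deg[v]).clamp(min=1.0))).unsqueeze(-1)

        agg_low = torch.zeros_like(h_low)
        agg_high = torch.zeros_like(h_high)

        u, v = parts["low_to_low"]
        agg_low.index_add_(0, v, norm(u, v) * self.w_low(h_low[u]))
        u, v = parts["high_to_low"]      # low-dim receiver selects p dimensions of each high-dim sender
        agg_low.index_add_(0, v, norm(u, v) * self.w_low(self.g(h_low[v], h_high[u])))

        u, v = parts["high_to_high"]
        agg_high.index_add_(0, v, norm(u, v) * self.w_high(h_high[u]))
        u, v = parts["low_to_high"]      # high-dim receiver lifts each low-dim sender with f
        agg_high.index_add_(0, v, norm(u, v) * self.w_high(self.f(h_low[u])))

        # Residual update with a ReLU non-linearity, mirroring the structure of Eqs. (17)-(18).
        return h_low + torch.relu(agg_low), h_high + torch.relu(agg_high)
```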