PANDA: Expanded Width-Aware Message Passing Beyond Rewiring
Jeongwhan Choi 1 Sumin Park 2 Hyowon Wi 1 Sung-Bae Cho 1 Noseong Park 3
Abstract
Recent research in the field of graph neural networks (GNNs) has identified a critical issue known as “over-squashing,” resulting from the bottleneck phenomenon in graph structures, which impedes the propagation of long-range information. Prior works have proposed a variety of graph rewiring concepts that aim at optimizing the spatial or spectral properties of graphs to promote signal propagation. However, such approaches inevitably deteriorate the original graph topology, which may lead to a distortion of information flow. To address this, we introduce expanded width-aware (PANDA) message passing, a new message passing paradigm where nodes with high centrality, a potential source of over-squashing, are selectively expanded in width to encapsulate the growing influx of signals from distant nodes. Experimental results show that our method outperforms existing rewiring methods, suggesting that selectively expanding the hidden state of nodes can be a compelling alternative to graph rewiring for addressing over-squashing.
1. Introduction
Graph Neural Networks (GNNs) have emerged as a powerful tool for graph data processing and are being studied extensively in various domains, as evidenced by significant research contributions (Defferrard et al., 2016; Velickovic et al., 2018; Chen et al., 2020; Chien et al., 2021; Choi et al., 2021; Chamberlain et al., 2021; Hwang et al., 2021; Choi & Park, 2023; Choi et al., 2023; Kang et al., 2023; Gruber et al., 2023). A key subclass within GNNs is Message Passing Neural Networks (MPNNs), which excel in propagating information through neighbouring nodes. As MPNNs propagate only over a 1-hop distance within a single layer, long-range interactions can only be captured when the depth of the network is comparable to the distance. However, increasing the number of layers to capture long-range dependencies results in exponential growth of the receptive field and excessive aggregation of repeated messages that are subsequently packed into a feature vector of fixed length, which causes the problem called “over-squashing” (Alon & Yahav, 2021; Shi et al., 2023b). This phenomenon severely decreases the expressivity of the networks and thus impairs model performance. Previous studies have attempted to identify topological bottlenecks in a graph and directly modify graph connectivity with various optimization targets, including curvature, effective resistance, and spectral gap (Alon & Yahav, 2021; Topping et al., 2022; Banerjee et al., 2022; Deac et al., 2022; Arnaiz-Rodriguez et al., 2022; Karhadkar et al., 2023; Black et al., 2023; Fesser & Weber, 2023; Sonthalia et al., 2023; Giraldo et al., 2023; Shi et al., 2023a; Barbero et al., 2023).

Figure 1. Potential pitfalls of rewiring in domain-specific graphs: (a) In a molecular graph, rewiring that adds the red edge to a benzene ring violates domain knowledge. (b) In a social graph, connecting a user to his/her enemy may lead to a totally different meaning.

Figure 2. Comparison of resistance correlation and mean accuracy across different methods for GCN. A large negative correlation reflects that a higher total effective resistance is associated with reduced signal propagation (see Section 4).
Motivation. Existing rewiring methods that alter the graph topology to resolve over-squashing, while potentially beneficial, can inadvertently introduce inaccuracies within domain-specific contexts. Fig. 1(a) demonstrates how modifying the molecular graph of a benzene ring could contradict chemical principles, while Fig. 1(b) shows that rewiring in social networks could result in the distortion of underlying community structures. These examples highlight the necessity to preserve the original graph structure to maintain the validity of domain-specific semantic information.
Prior works have tried to identify various factors associated with over-squashing through the lens of sensitivity analysis of the Jacobian of node features (Topping et al., 2022; Di Giovanni et al., 2023). As one of those trials, Di Giovanni et al. (2023), an insightful analytical paper, provides a theoretical justification that increasing the width of the model (i.e., hidden dimension) can also contribute to improving the sensitivity of the model. However, it points out a major concern regarding expanding the width of the model to address over-squashing: increasing the hidden dimension globally affects the whole network at the expense of the generalization capacity of the model.
Inspired by the limitations of existing rewiring methods and the promising role of width expansion in addressing over-squashing, we aim to design a novel message passing paradigm that mitigates over-squashing by targeting the nodes mainly involved in creating bottlenecks in the signal flow and selectively expanding their width, without the risk of changing the underlying graph topology.
Main idea. It is possible to define graph bottlenecks as nodes with high centrality, such as betweenness centrality (Yu et al., 2007; Topping et al., 2022). Nodes with high centrality can show hub-like behavior in that they connect to a relatively larger set of nodes lying on different parts of the graph, and thus receive excessive information from them. Increasing the width of the feature vector of these nodes provides more space for them to process information flowing from other nodes. Then, we can take advantage of their enhanced capacities to alleviate over-squashing. Doing so, however, requires exchanging messages between nodes with different hidden dimensions. This raises a natural question:
“Is it possible to design a message passing scheme that enables information exchange between nodes with different widths, in a way that mitigates over-squashing without rewiring?”
We propose a new message passing framework that addresses the above question. First, the entire node set is divided into two subsets, expanded and non-expanded nodes, which are colored in red and blue, respectively, in Fig. 3. Then, we classify four different edge types depending on the expansion state of the nodes at the source (edge tail) and target (edge head) positions: low-to-low, high-to-high, low-to-high, and high-to-low. Each edge type requires a distinct message passing scheme.
The standard message passing between nodes with equal widths corresponds to the conventional message passing (see Figs. 3(b) and 3(c)). For high-to-low message passing (see Fig. 3(d)), a low-dimensional node selects and receives a subset of the information sent by a high-dimensional node. The low-to-high message passing involves augmenting the hidden vectors of low-dimensional nodes to match the higher dimension size for propagation (see Fig. 3(e)).
Our framework can potentially enhance the capacity of nodes in the perspective of signal propagation. As shown in Fig. 2, our framework maintains constant signal propagation w.r.t. effective resistance and simultaneously improves the accuracy compared to the existing rewiring methods. We use the correlation coefficient to quantify how signal propagation decreases as effective resistance increases.
Contributions and outline. We introduce expanded width-aware message passing (PANDA), a novel paradigm for message passing with expanded widths for nodes that are potential bottlenecks. Our contributions can be summarized as follows:
• In Section 3, we propose PANDA, which enables signal propagation across nodes of different widths.
• In Section 4, we discuss how PANDA can alleviate over-squashing from various perspectives. Verifying a higher feature sensitivity compared to other models, we show that PANDA can overcome topological bottlenecks, maintaining consistent signal propagation even under large effective resistance. Finally, we show that our method resolves a limitation of existing rewiring methods, e.g., over-smoothing.
• In Section 5, we empirically demonstrate that our PANDA outperforms existing rewiring methods.
2. Preliminaries
We first introduce the notation used in this paper. Given a graph $\mathcal{G}=(\mathcal{V},\mathcal{E})$, we use $\mathcal{V}$ and $\mathcal{E}$ to denote its nodes and edges, respectively. The nodes are indexed by $v$ and $u$ such that $v,u\in\mathcal{V}$, and an edge connecting nodes $v$ and $u$ is denoted by $(v,u)\in\mathcal{E}$. The connectivity is encoded in the adjacency matrix $\mathbf{A}\in\mathbb{R}^{n\times n}$, where $n$ is the number of nodes. $p$ denotes the width (hidden dimension size), while $\ell$ indexes the layers. The feature of node $v$ at layer $\ell$ is written as $\mathbf{h}_{v}^{(\ell)}$.

Figure 3. Examples of PANDA’s message passing mechanism. The size of the node indicates the size of hidden dimension.

2.1. Message Passing Neural Networks
We consider the case where each node $v$ has a feature vector $\mathbf{h}_{v}^{(0)}$. MPNNs iteratively update node representations using the following equations:
$$
\begin{array}{r}
\mathbf{m}_{v}^{(\ell)}=\psi^{(\ell)}\big(\{\mathbf{h}_{u}^{(\ell)}:u\in\mathcal{N}(v)\}\big),\\
\mathbf{h}_{v}^{(\ell+1)}=\phi^{(\ell)}\big(\mathbf{h}_{v}^{(\ell)},\mathbf{m}_{v}^{(\ell)}\big).
\end{array}
$$
where $\mathbf{m}_{v}^{(\ell)}$ is the aggregated node feature of $v$'s neighbourhood. The aggregation function $\psi^{(\ell)}$ and the update function $\phi^{(\ell)}$ are learnable, and their different definitions result in different architectures (Kipf & Welling, 2017; Xu et al., 2019; Velickovic et al., 2018). By default, all nodes have the same hidden dimension size. We will introduce a method that enables message passing among nodes with different hidden dimensions in Section 3.
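As a concrete sketch of the two equations above, the following NumPy snippet instantiates $\psi^{(\ell)}$ as degree-normalized mean aggregation and $\phi^{(\ell)}$ as a ReLU of a linear combination; these are illustrative choices, not a specific published architecture.

```python
import numpy as np

def message_passing_layer(A, H, W_self, W_neigh):
    """One generic MPNN layer: psi = degree-normalized mean of neighbour
    features, phi = ReLU of a linear combination of self and message.

    A: (n, n) adjacency matrix, H: (n, p) node features."""
    deg = A.sum(axis=1, keepdims=True)                  # node degrees
    M = (A @ H) / np.maximum(deg, 1)                    # m_v: mean over N(v)
    return np.maximum(H @ W_self + M @ W_neigh, 0.0)    # h_v^{(l+1)}

# Tiny path graph 0-1-2 with 2-dimensional features.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
H = np.eye(3, 2)
H1 = message_passing_layer(A, H, np.eye(2), np.eye(2))
```

Stacking $\ell$ such layers lets information travel $\ell$ hops, which is exactly why long-range interactions require deep MPNNs.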
2.2. Over-squashing Problem
Initially identified by Alon & Yahav (2021), over-squashing has become a significant challenge in GNNs when dealing with long-range dependencies. It mainly occurs when the information aggregated from too many neighbours is squashed into a fixed-sized node feature vector, resulting in a substantial loss of information (Shi et al., 2023b). We review the literature addressing this problem, including rewiring methods, in Section 6. We now describe the Jacobian sensitivity and effective resistance, which are widely used to estimate over-squashing. In Section 4, we justify our method in terms of these two metrics.
Sensitivity bound. Recent theoretical insights on GNNs have provided a formal understanding of over-squashing through the lens of sensitivity analysis (Topping et al., 2022). The sensitivity bound theorem (Di Giovanni et al., 2023, Theorem 3.2) posits that the impact of over-squashing can be quantified by examining the following three factors: i) the Lipschitz continuity of the non-linearity in the model, ii) the width of the model, and iii) the topology of the graph. The sensitivity of $\mathbf{h}_{v}^{(\ell)}$ to $\mathbf{h}_{u}^{(0)}$, assuming that there exists an $\ell$-path between nodes $v$ and $u$, is bounded by
$$
\left\|\frac{\partial\mathbf{h}_{v}^{(\ell)}}{\partial\mathbf{h}_{u}^{(0)}}\right\|_{1}\le\underbrace{(zwp)^{\ell}}_{\mathrm{model}}\,\underbrace{(\mathbf{S}^{\ell})_{vu}}_{\mathrm{topology}}.
$$
The impact of the model on over-squashing is bounded by the Lipschitz constant $z$, the width $p$, and the maximum weight $w$, while the topology of the input graph affects the sensitivity through the $\ell$-th power of the graph shift matrix $\mathbf{S}$. Hereinafter, $\mathbf{S}$ can be any (normalized) adjacency matrix (with self-loops).
A small Jacobian norm indicates that the final representation of a node learned by a model is not much affected by variations in its neighbouring inputs, implying that the network suffers from over-squashing. This condition typically arises when information has not been sufficiently propagated over the graph due to excessive aggregation of information, i.e., compression into a small hidden vector, through several layers of message passing.
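To see how the three factors interact, the bound can be evaluated numerically. The sketch below assumes $\mathbf{S}=\mathbf{D}^{-1/2}(\mathbf{A}+\mathbf{I})\mathbf{D}^{-1/2}$ (one of the admissible normalized adjacencies with self-loops) and illustrative values for $z$, $w$, and $p$:

```python
import numpy as np

def sensitivity_upper_bound(A, z, w, p, ell, v, u):
    """Evaluate (z*w*p)^ell * (S^ell)_{vu} from the sensitivity bound,
    with S the symmetrically normalized adjacency with self-loops."""
    n = A.shape[0]
    A_hat = A + np.eye(n)                       # add self-loops
    d = A_hat.sum(axis=1)
    S = A_hat / np.sqrt(np.outer(d, d))         # D^{-1/2} (A+I) D^{-1/2}
    return (z * w * p) ** ell * np.linalg.matrix_power(S, ell)[v, u]

# Path graph 0-1-2: bound for the distant pair (0, 2) over 2 layers.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
b_small = sensitivity_upper_bound(A, z=1.0, w=1.0, p=64, ell=2, v=0, u=2)
b_large = sensitivity_upper_bound(A, z=1.0, w=1.0, p=128, ell=2, v=0, u=2)
```

With $\ell=2$, doubling the width $p$ multiplies the bound by 4, consistent with the width entering the model term as $(zwp)^{\ell}$.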
Effective resistance and signal propagation. Derived from the field of electrical engineering, the effective resistance between two nodes $u$ and $v$ in an electrical network is defined as the potential difference induced across the two nodes when a unit current is injected at one and extracted at the other (Ghosh et al., 2008). An algebraic expression for the effective resistance is given in Appendix A. Rayleigh's monotonicity principle, which says that adding paths or shortening existing paths can only decrease the effective resistance between two nodes (Thomassen, 1990), leads to the following interpretation: more and shorter disjoint paths connecting the nodes $u$ and $v$ lead to a lower resistance between them (Black et al., 2023; Devriendt & Lambiotte, 2022). Therefore, edges with high effective resistance struggle to propagate information, creating bottlenecks. The total effective resistance $R_{tot}$, the sum of the effective resistances over all pairs of nodes (see Eq. (20)), is a key measure of the overall degree of over-squashing in a graph. It is known that the signal propagation of GNNs is inversely proportional to $R_{tot}$ (Di Giovanni et al., 2023), which will be covered in detail in Section 4 with a comparison to our method.
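A minimal sketch of these quantities via the Laplacian pseudoinverse, using the standard identity $R_{uv}=\mathbf{L}^{+}_{uu}+\mathbf{L}^{+}_{vv}-2\mathbf{L}^{+}_{uv}$ (the paper's own algebraic form is in Appendix A), illustrating Rayleigh's monotonicity on a toy graph:

```python
import numpy as np

def effective_resistance(A):
    """All pairwise effective resistances from the Laplacian pseudoinverse:
    R_uv = L+_uu + L+_vv - 2 L+_uv."""
    L = np.diag(A.sum(axis=1)) - A          # graph Laplacian
    Lp = np.linalg.pinv(L)                  # Moore-Penrose pseudoinverse
    d = np.diag(Lp)
    return d[:, None] + d[None, :] - 2 * Lp

# Rayleigh monotonicity: adding an edge can only decrease resistances.
path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
tri = path.copy(); tri[0, 2] = tri[2, 0] = 1.0   # close the triangle
R_path, R_tri = effective_resistance(path), effective_resistance(tri)
R_tot_path = R_path.sum() / 2                    # total effective resistance
R_tot_tri = R_tri.sum() / 2
```

On the path, nodes 0 and 2 are connected in series ($R_{02}=2$); closing the triangle adds a second disjoint path and drops the resistance to $2/3$, lowering $R_{tot}$ as well.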
Over-smoothing problem. Over-smoothing is another well-known problem that reduces the expressiveness of GNNs in deeper layers, resulting in a loss of discriminative power in the representation of node features (Li et al., 2018; Nt & Maehara, 2019; Oono & Suzuki, 2020). Rewiring methods that address the over-squashing problem focus on improving graph connectivity to ease the signal flow. However, one limitation of rewiring methods is that adding too many edges potentially leads to the over-smoothing issue (Karhadkar et al., 2023). To explore this trade-off between over-squashing and over-smoothing, FoSR (Karhadkar et al., 2023) analyzes the change in Dirichlet energy upon adding edges. Their results demonstrate that rewiring the graph to enhance information propagation leads to the over-smoothing problem.
3. Proposed Method
We first introduce our motivation and design rationale. We then propose our PANDA message-passing framework.
3.1. Motivation and Design Rationale
Sensitivity bound. Our design rationale is anchored in the insight derived from the sensitivity bound theorem (Di Giovanni et al., 2023, Theorem 3.2). Given the sensitivity bound in Eq. (3), we consider two scenarios: one with a standard width $p$ and the other with an increased width $p_{\mathrm{high}}$, where $p_{\mathrm{high}}>p$. For nodes with the increased width, the sensitivity bound is improved, indicating a potentially higher sensitivity to input features. Regarding this, we provide Proposition B.1 in Appendix B and show empirical results well aligned with our argument in Section 4. Rather than increasing the width of all nodes, we do so selectively.
Design rationale for selecting node centrality. We consider node centrality metrics to identify a node as a potential source of a bottleneck in a graph. Betweenness centrality, which measures the number of shortest paths passing through a node, has been considered a significant indicator of bottlenecks (Freeman, 1977). The more frequently a node appears on the shortest paths between pairs of other nodes, the more likely it is to cause a bottleneck (Topping et al., 2022, Definition 9). However, a hub-like node with inter-community edges also has eccentric properties different from those of a high-betweenness node, such as a high degree. We apply various centrality metrics in our framework to understand how each is related to the over-squashing problem. Appendix C describes the centralities we consider.
3.2. Expanded Width-Aware Message Passing
Node expansion criteria. In our proposed method, we stratify nodes into two categories: low-dimensional nodes and high-dimensional nodes (see Fig. 4). This stratification is based on a specific centrality that quantifies the importance of each node. The centrality, denoted as $C(\mathcal{G})$, can be one of the following: degree (Borgatti & Halgin, 2011), betweenness (Freeman, 1977), closeness (Freeman, 1978), PageRank (Page et al., 1999), or load centrality (Goh et al., 2001). We represent the centrality values for all nodes as a vector $\mathbf{c}\in\mathbb{R}^{n}$ and sort the elements of $\mathbf{c}$ in descending order. Based on the sorted $\mathbf{c}$, we select the top $k$ nodes, resulting in a binary mask vector $\mathbf{b}\in\{0,1\}^{n}$, where $\mathbf{b}_{v}=1$ if node $v$ is among the top $k$ and $\mathbf{b}_{v}=0$ otherwise.
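The mask construction can be sketched as follows, using degree centrality for illustration; any of the centralities listed above could be substituted for the scoring step:

```python
import numpy as np

def expansion_mask(A, k):
    """Binary mask b in {0,1}^n: 1 for the top-k nodes by centrality.
    Degree centrality is used here for illustration only."""
    c = A.sum(axis=1)                            # centrality values c
    top_k = np.argsort(-c, kind="stable")[:k]    # indices of top-k nodes
    b = np.zeros(A.shape[0], dtype=int)
    b[top_k] = 1
    return b

# Star graph: the hub (node 0) is the natural candidate for width expansion.
A = np.zeros((5, 5)); A[0, 1:] = A[1:, 0] = 1.0
b = expansion_mask(A, k=1)
```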

Figure 4. Our proposed PANDA message passing framework. First, we selectively expand widths (i.e., hidden dimension sizes) according to a centrality in $\mathcal{G}$, and our PANDA message passing enables signal propagation among nodes with different widths (low-dimensional nodes and high-dimensional nodes).
Define subsets of neighbourhoods. Based on a centrality measure, we define four subsets of neighbouring nodes. For a node $v$ classified as low-dimensional, denoted by $\mathbf{b}_{v}=0$ :
$$
\begin{array}{r}
\mathcal{N}_{\mathrm{low}\leftrightarrow\mathrm{low}}(v)=\{u\in\mathcal{N}(v)\mid\mathbf{b}_{v}=0\land\mathbf{b}_{u}=0\},\\
\mathcal{N}_{\mathrm{high}\rightarrow\mathrm{low}}(v)=\{u\in\mathcal{N}(v)\mid\mathbf{b}_{v}=0\land\mathbf{b}_{u}=1\}.
\end{array}
$$
In addition, for a node $v$ classified as high-dimensional:
$$
\begin{array}{r}
\mathcal{N}_{\mathrm{high}\leftrightarrow\mathrm{high}}(v)=\{u\in\mathcal{N}(v)\mid\mathbf{b}_{v}=1\land\mathbf{b}_{u}=1\},\\
\mathcal{N}_{\mathrm{low}\rightarrow\mathrm{high}}(v)=\{u\in\mathcal{N}(v)\mid\mathbf{b}_{v}=1\land\mathbf{b}_{u}=0\}.
\end{array}
$$
The union of these subsets across all nodes in $\mathcal{V}$ covers the entire edge set $\mathcal{E}$, thereby preserving the original graph structure.
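Given the mask $\mathbf{b}$, splitting messages into the four types amounts to a simple lookup on both endpoints of each edge; a sketch:

```python
def partition_edges(edges, b):
    """Split an undirected edge list into the four directed message types
    based on the expansion mask b (1 = high-dimensional, 0 = low)."""
    types = {"low->low": [], "high->high": [], "low->high": [], "high->low": []}
    for u, v in edges:
        for s, t in ((u, v), (v, u)):        # both directions of each edge
            key = f"{'high' if b[s] else 'low'}->{'high' if b[t] else 'low'}"
            types[key].append((s, t))
    return types

# Star graph with an expanded hub: every message crosses the width boundary.
edges = [(0, 1), (0, 2), (0, 3)]
b = [1, 0, 0, 0]
parts = partition_edges(edges, b)
```

Since every undirected edge contributes both of its directions, the four lists together cover all messages of the original graph, mirroring the union property above.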
A new framework: PANDA message passing. The main contribution of our framework is that, given any MPNN, we can make it work with different widths across nodes. To this end, we replace the standard neighbour aggregator $\psi^{(\ell)}$ with four different aggregators, each applied to the corresponding type of neighbours: $\psi_{\leftrightarrow}^{(\ell)}$ (low-to-low), $\psi_{\Leftrightarrow}^{(\ell)}$ (high-to-high), $\psi_{\rightarrowtail}^{(\ell)}$ (low-to-high), and $\psi_{\twoheadrightarrow}^{(\ell)}$ (high-to-low). Then, we use a common update function for all types. The final message passed from layer $\ell$ to layer $\ell+1$ is defined as follows:
$$
\begin{array}{rl}
&\mathbf{m}_{v,\leftrightarrow}^{(\ell)}=\psi_{\leftrightarrow}^{(\ell)}\big(\{\mathbf{h}_{u}^{(\ell)}:u\in\mathcal{N}_{\mathrm{low}\leftrightarrow\mathrm{low}}(v)\}\big),\\
&\mathbf{m}_{v,\Leftrightarrow}^{(\ell)}=\psi_{\Leftrightarrow}^{(\ell)}\big(\{\mathbf{h}_{u}^{(\ell)}:u\in\mathcal{N}_{\mathrm{high}\leftrightarrow\mathrm{high}}(v)\}\big),\\
&\mathbf{m}_{v,\rightarrowtail}^{(\ell)}=\psi_{\rightarrowtail}^{(\ell)}\big(\{f(\mathbf{h}_{u}^{(\ell)}):u\in\mathcal{N}_{\mathrm{low}\rightarrow\mathrm{high}}(v)\}\big),\\
&\mathbf{m}_{v,\twoheadrightarrow}^{(\ell)}=\psi_{\twoheadrightarrow}^{(\ell)}\big(\{g(\mathbf{h}_{v}^{(\ell)},\mathbf{h}_{u}^{(\ell)}):u\in\mathcal{N}_{\mathrm{high}\rightarrow\mathrm{low}}(v)\}\big),\\
&\mathbf{h}_{v}^{(\ell+1)}=\phi\big(\mathbf{h}_{v}^{(\ell)},\mathbf{m}_{v,\leftrightarrow}^{(\ell)},\mathbf{m}_{v,\Leftrightarrow}^{(\ell)},\mathbf{m}_{v,\rightarrowtail}^{(\ell)},\mathbf{m}_{v,\twoheadrightarrow}^{(\ell)}\big).
\end{array}
$$
Each aggregator $\psi_{*}^{(\ell)}$ is tailored to handle a specific type of interaction among nodes. In Eq. (13), the hidden vector of a node $v$ is updated from the four messages. For instance, $\psi_{\rightarrowtail}^{(\ell)}(\cdot)$ and $\psi_{\twoheadrightarrow}^{(\ell)}(\cdot)$ are designed for interactions between different widths, using the functions $f(\cdot)$ and $g(\cdot)$, respectively.
$f(\cdot)$ is a linear transformation that projects node features to the expanded dimensional space and is defined as:
$$
f(\mathbf{h}_{u}^{(\ell)}):=\mathbf{W}_{f}^{(\ell)}\mathbf{h}_{u}^{(\ell)},
$$
where $\mathbf{W}_{f}^{(\ell)}\in\mathbb{R}^{p_{\mathrm{high}}\times p}$. For $\psi_{\twoheadrightarrow}^{(\ell)}(\cdot)$, we define $g(\cdot)$ as a dimension selector as follows:
$$
\begin{array}{rl}
&g(\mathbf{h}_{v}^{(\ell)},\mathbf{h}_{u}^{(\ell)}):=\mathrm{gather}\big(\mathbf{h}_{u}^{(\ell)},\mathrm{topk}(\mathbf{s},p)\big),\\
&\qquad\mathbf{s}=\mathrm{softmax}\left(\sigma\big(\eta(\mathbf{h}_{u}^{(\ell)}\oplus\mathbf{h}_{v}^{(\ell)})\big)\right),
\end{array}
$$
where $\eta$ is a non-linear neural network that computes a score vector $\mathbf{s}\in\mathbb{R}^{p_{\mathrm{high}}}$ that gives the relative significance of each dimension for each message. This score vector is then used to select the top $p$ dimensions for the message aggregation.
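A NumPy sketch of the selector $g$ in this form, where a single linear map `W_eta` stands in for the non-linear network $\eta$ and ReLU plays the role of $\sigma$ (both are illustrative simplifications):

```python
import numpy as np

def dimension_selector(h_v, h_u, W_eta, p):
    """Score the p_high dimensions of a high-dimensional neighbour h_u
    against the low-dimensional target h_v, then gather the top-p entries."""
    z = np.concatenate([h_u, h_v])                 # h_u (+) h_v
    logits = np.maximum(W_eta @ z, 0.0)            # sigma(eta(...)), ReLU here
    s = np.exp(logits) / np.exp(logits).sum()      # softmax score per dimension
    idx = np.argsort(-s, kind="stable")[:p]        # topk(s, p)
    return h_u[idx]                                # gather selected dimensions

p, p_high = 2, 4
rng = np.random.default_rng(0)
h_v, h_u = rng.normal(size=p), rng.normal(size=p_high)
W_eta = rng.normal(size=(p_high, p_high + p))      # stand-in for eta
msg = dimension_selector(h_v, h_u, W_eta, p)
```

The returned message always has width $p$, which is what allows a low-dimensional node to aggregate it alongside same-width neighbour messages.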
In our PANDA, it is important to emphasize that nodes with low- and high-dimensional features are processed in their respective dimensional spaces throughout the network layers. This means that until the final layer, these nodes maintain their distinct dimensionalities.
Instance of our framework. To better understand our framework, we show how to integrate PANDA into two classical MPNNs: GCN (Kipf & Welling, 2017) and GIN (Xu et al., 2019). We will use these variations for our experiments. For the PANDA-GCN variant, the layer-update is expressed as follows for low and high-dimensional nodes. First, if the node $v$ is low-dimensional:
$$
\begin{array}{r}
\mathbf{h}_{v}^{(\ell+1)}=\mathbf{h}_{v}^{(\ell)}+\sigma\Big(\displaystyle\sum_{u\in\mathcal{N}_{\mathrm{low}\leftrightarrow\mathrm{low}}(v)}\frac{1}{\sqrt{d_{u}d_{v}}}\mathbf{W}_{\mathrm{low}}^{(\ell)}\mathbf{h}_{u}^{(\ell)}\\
+\displaystyle\sum_{u\in\mathcal{N}_{\mathrm{high}\rightarrow\mathrm{low}}(v)}\frac{1}{\sqrt{d_{u}d_{v}}}\mathbf{W}_{\mathrm{low}}^{(\ell)}g(\mathbf{h}_{v}^{(\ell)},\mathbf{h}_{u}^{(\ell)})\Big),
\end{array}
$$
then if node $v$ is high-dimensional:
$$
\begin{array}{r}
\mathbf{h}_{v}^{(\ell+1)}=\mathbf{h}_{v}^{(\ell)}+\sigma\Big(\displaystyle\sum_{u\in\mathcal{N}_{\mathrm{high}\leftrightarrow\mathrm{high}}(v)}\frac{1}{\sqrt{d_{u}d_{v}}}\mathbf{W}_{\mathrm{high}}^{(\ell)}\mathbf{h}_{u}^{(\ell)}\\
+\displaystyle\sum_{u\in\mathcal{N}_{\mathrm{low}\rightarrow\mathrm{high}}(v)}\frac{1}{\sqrt{d_{u}d_{v}}}\mathbf{W}_{\mathrm{high}}^{(\ell)}f(\mathbf{h}_{u}^{(\ell)})\Big),
\end{array}
$$
$$
\begin{array}{r l}&{\displaystyle\mathbf{h}{v}^{(\ell+1)}=\mathbf{h}{v}^{(\ell)}+\sigma\Big(\sum_{u\in\mathcal{N}{\mathrm{high}\leftrightarrow\mathrm{high}}(v)}\frac{1}{\sqrt{d_{u}d_{v}}}\mathbf{W}{\mathrm{high}}^{(\ell)}\mathbf{h}{u}^{(\ell)}}\ &{\quad\quad\quad\quad\quad\quad\quad+\sum_{u\in\mathcal{N}{\mathrm{high}\leftrightarrow\mathrm{low}}(v)}\frac{1}{\sqrt{d_{u}d_{v}}}\mathbf{W}{\mathrm{high}}^{(\ell)}f(\mathbf{h}_{u}^{(\ell)})\Big),}\end{array}
$$
where $\sigma$ is the ReLU activation function, and $\mathbf{W}_{\mathrm{low}}^{(\ell)}\in\mathbb{R}^{p\times p}$ and $\mathbf{W}_{\mathrm{high}}^{(\ell)}\in\mathbb{R}^{p_{\mathrm{high}}\times p_{\mathrm{high}}}$ are the weight matrices. The messages are normalized by the node degrees $d_{v}$ and $d_{u}$. For the PANDA-GIN variant, we describe the layer update in Appendix D.
Implementation. In our implementation, the hidden vectors of the two widths are separated and fed into each layer. In the last layer, the low- and high-dimensional nodes are output with the same dimension and merged back together.
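A minimal sketch of this split-and-merge bookkeeping, assuming a hypothetical linear output projection `W_out_high` that maps high-dimensional nodes back to the common width in the last layer (the node indices and shapes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, p_high = 6, 4, 8
high_idx = np.array([1, 4])        # top-k central nodes (hypothetical)
low_idx = np.array([0, 2, 3, 5])   # the remaining nodes

# Per-group hidden states after the last message-passing layer.
h_low = rng.normal(size=(len(low_idx), p))
h_high = rng.normal(size=(len(high_idx), p_high))

# Hypothetical output projection: high-dim nodes are mapped to the
# common width p and written back in original node order.
W_out_high = rng.normal(size=(p, p_high)) * 0.1

H = np.empty((n, p))
H[low_idx] = h_low
H[high_idx] = h_high @ W_out_high.T
assert H.shape == (n, p)
```

Keeping the two groups in separate tensors avoids padding every node to $p_{\mathrm{high}}$, so the extra memory cost is limited to the $k$ expanded nodes.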
3.3. Properties of PANDA
Directivity. Nodes with equal width remain undirected, while edges between nodes with different widths can be directed. $\psi_{\twoheadrightarrow}^{(\ell)}$ and $\psi_{\longmapsto}^{(\ell)}$ have independent sets of parameters; Eqs. (17) and (18) use different weight parameters for low-dimensional and high-dimensional nodes. By ensuring directivity only for edges between nodes of different widths, our method can be more expressive than its undirected counterpart (Rossi et al., 2023, Theorems 4.1 and 4.2).
Relational graph. Our PANDA can be considered a relational graph convolutional network (R-GCN) (Schlichtkrull et al., 2018) applied to an augmented relational graph that incorporates two types of relations. Not only is message propagation performed according to the relationship of the neighbour set, but different weight matrices are also used for low-dimensional and high-dimensional nodes, as mentioned earlier. In this sense, our PANDA is similar to R-GCN; however, R-GCN is not designed to alleviate over-squashing.
Computational complexity. Compared to existing rewiring methods, PANDA replaces graph rewiring with centrality calculation and four different message passings. The added complexity depends on which algorithm is used to calculate the centrality: for example, the time complexity of degree centrality is $\mathcal{O}(|\mathcal{E}|)$, while that of betweenness centrality is $\mathcal{O}(|\mathcal{V}|^{3})$. We discuss the empirical runtime of PANDA in Appendix H.
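For instance, degree centrality, the cheapest option, reduces to row sums of the adjacency matrix, so selecting the top-$k$ nodes to expand is essentially free; a small sketch on a toy graph of our own making:

```python
import numpy as np

# Toy undirected graph: node 2 and node 3 bridge two triangles,
# so they should rank as the most central.
A = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
])

# Degree centrality in O(|E|): row sums, normalized by n - 1.
deg_centrality = A.sum(axis=1) / (A.shape[0] - 1)

k = 2
top_k = np.argsort(-deg_centrality)[:k]  # nodes to widen to p_high
assert set(top_k.tolist()) == {2, 3}
```

Swapping in betweenness or PageRank centrality only changes the scoring step; the top-$k$ selection and the rest of the pipeline stay identical.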
4. Discussions
This section discusses why our PANDA alleviates over-squashing from empirical perspectives, i.e., feature sensitivity, signal propagation, and Dirichlet energy.
Empirical sensitivity analysis. To verify that our method actually improves the feature sensitivity among nodes, we analyze the sensitivity given by Eq. (3) on benchmark datasets. As shown in Fig. 5, the sensitivity improves as the width $p$ increases. Our method benefits from selectively having 64- and 128-dimensional nodes, yielding sensitivity that lies between these two dimensions (16.7% of nodes have 128 dimensions in Fig. 5). Compared to other rewiring techniques, PANDA shows higher sensitivity that is maintained even in deeper layers, while the others exhibit decreasing trends.
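The sensitivity in question measures how strongly a node's final representation reacts to a distant node's input features, i.e., a norm of the Jacobian $\partial\mathbf{h}_v^{(L)}/\partial\mathbf{x}_u$. A numerical sketch of that quantity (a finite-difference estimate on a hypothetical two-layer linear GCN, not the paper's exact estimator):

```python
import numpy as np

def sensitivity(forward, x, v, u, eps=1e-5):
    """Finite-difference estimate of ||d h_v / d x_u||_1: perturb each
    input feature of node u and observe node v's output."""
    out_dim = forward(x)[v].size
    J = np.zeros((out_dim, x[u].size))
    for j in range(x[u].size):
        xp = x.copy(); xp[u, j] += eps
        xm = x.copy(); xm[u, j] -= eps
        J[:, j] = (forward(xp)[v] - forward(xm)[v]) / (2 * eps)
    return np.abs(J).sum()

# Two-layer linear GCN on a 3-node path: h = Â (Â X W1) W2,
# with Â the row-normalized adjacency including self-loops.
A = np.array([[1., 1., 0.], [1., 1., 1.], [0., 1., 1.]])  # A + I
A_hat = A / A.sum(axis=1, keepdims=True)
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(2, 2)), rng.normal(size=(2, 2))
forward = lambda X: A_hat @ (A_hat @ X @ W1) @ W2

X = rng.normal(size=(3, 2))
# With 2 layers, node 0 can feel node 2 (distance 2), but the
# sensitivity to its direct neighbour (node 1) is larger.
s_far = sensitivity(forward, X, v=0, u=2)
s_near = sensitivity(forward, X, v=0, u=1)
assert s_near > s_far > 0.0
```

Over-squashing corresponds to `s_far` decaying rapidly with distance; the analysis above tracks how widening bottleneck nodes slows that decay.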

Figure 5. Empirical sensitivity across layers for GCN on MUTAG. More results on other datasets are in Appendix G.

Figure 6. The amount of signal propagated across the graph w.r.t. the normalized total effective resistance $(R_{tot})$. More results on other datasets are in Appendix G.
Signal propagation w.r.t. effective resistance. Di Giovanni et al. (2023) provide empirical evidence that the information propagation of general GNNs is inversely proportional to the total effective resistance $R_{tot}$, which motivates us to check whether our PANDA maintains the magnitude of signal flow in graphs with high $R_{tot}$. We further analyze whether the signal flow improves after applying various rewiring methods. The detailed setting of the analysis is in Appendix A.2.
In Fig. 6, all methods except PANDA present decaying signal propagation trends as the resistance of graphs increases. In contrast, PANDA shows consistent signal propagation even for graphs with higher effective resistance. To verify the linear relationship between total effective resistance and signal propagation, we also quantify the correlation coefficient between them in Fig. 2. This result demonstrates the powerful effect of the width expansion in mitigating the over-squashing problem. Our PANDA maintains continuous information flows even under high bottleneck conditions, which cannot be overcome even by the direct modification of graph topology through various rewiring strategies.
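The total effective resistance used on the x-axis of Fig. 6 can be computed from the Moore-Penrose pseudo-inverse of the graph Laplacian via the identity $R_{tot} = n \cdot \mathrm{tr}(L^{+})$; a small sketch on toy graphs (not the benchmark setting):

```python
import numpy as np

def total_effective_resistance(A):
    """R_tot = sum over node pairs of R_uv, where
    R_uv = L+_uu + L+_vv - 2 L+_uv; equivalently n * trace(L+)."""
    n = A.shape[0]
    L = np.diag(A.sum(axis=1)) - A        # graph Laplacian
    L_pinv = np.linalg.pinv(L)
    return n * np.trace(L_pinv)

# A path graph has a much higher R_tot than a complete graph on the
# same nodes, matching the intuition that bottlenecks impede signal flow.
path = np.diag(np.ones(3), 1); path = path + path.T   # P4: 0-1-2-3
complete = np.ones((4, 4)) - np.eye(4)                # K4
assert total_effective_resistance(path) > total_effective_resistance(complete)
```

For trees such as the path, each pairwise resistance equals the path length, so $R_{tot}(P_4) = 10$, while $R_{tot}(K_4) = 3$.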
Trade-off between rewiring and over-smoothing. As mentioned in Section 2.2, graph rewiring can cause over-smoothing. Since PANDA does not modify the original graph connectivity, we analyze whether our method avoids the over-smoothing induced by graph rewiring, using the Dirichlet energy as a measure of over-smoothing. The theoretical relation between Dirichlet energy and over-smoothing is given in Appendix E. Fig. 7 compares the Dirichlet energies of the final hidden representations learned by GCN and GIN equipped with either a rewiring technique or PANDA. The result shows that PANDA yields a higher Dirichlet energy for both GCN and GIN than the others, highlighting the strength of our model: mitigating over-squashing while preserving the original graph connectivity.
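The Dirichlet energy used here is simply the summed squared feature differences across edges: low energy means neighbouring features have collapsed onto each other, i.e., over-smoothing. A toy sketch (the example graph and features are ours):

```python
import numpy as np

def dirichlet_energy(H, A):
    """E(H) = sum over undirected edges (u, v) of ||h_u - h_v||^2.
    Low energy indicates over-smoothed (near-identical) features."""
    energy = 0.0
    rows, cols = np.nonzero(np.triu(A))  # each undirected edge once
    for u, v in zip(rows, cols):
        energy += np.sum((H[u] - H[v]) ** 2)
    return energy

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])   # 3-node path
H_diverse = np.array([[1., 0.], [0., 1.], [1., 1.]])
H_smooth = np.array([[.5, .5], [.5, .5], [.5, .5]])  # fully smoothed
assert dirichlet_energy(H_diverse, A) > dirichlet_energy(H_smooth, A)
```

A rewiring method that densifies the graph pulls representations together and drives this quantity toward zero, which is exactly the trade-off Fig. 7 probes.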

Figure 7. Dirichlet energy of baselines and PANDA. More results in other datasets are in Appendix G.
5. Experiments
In this section, we empirically verify that PANDA significantly improves GNN performance compared with other rewiring methods. We experiment with graph classification and node classification, as well as tasks on the Long-Range Graph Benchmark (LRGB) (Dwivedi et al., 2022). We cover the experiments on TUDataset (Morris et al., 2020) in the main content, and the node classification task and the LRGB dataset in Appendix K and Appendix L, respectively.
Datasets. For graph classification, we evaluate our method on the following benchmarks from the TUDataset (Morris et al., 2020): REDDIT-BINARY, IMDB-BINARY, MUTAG, ENZYMES, PROTEINS, and COLLAB. Details of the datasets are available in Appendix F.1.
Experimental setting. For graph classification, we compare PANDA to no graph rewiring and 7 other state-of-the-art rewiring methods: DIGL (Gasteiger et al., 2019), fully adjacent layer (FA) (Alon & Yahav, 2021), SDRF (Topping et al., 2022), FoSR (Karhadkar et al., 2023), BORF (Nguyen et al., 2023), GTR (Black et al., 2023), and CT-Layer (Arnaiz-Rodriguez et al., 2022). In our experiments, we prioritize fairness and comprehensiveness rather than aiming to obtain the best possible performance on each dataset. For backbone GNNs, we use GCN, GIN (Xu et al., 2019), R-GCN (Schlichtkrull et al., 2018), and R-GIN (Brockschmidt, 2020). We accumulate the results over 100 random trials and report the mean test accuracy along with the 95% confidence interval. Further experimental details are in Appendix F.
Table 1. Results of PANDA and baselines for GCN and GIN. We show the best three in red (first), blue (second), and purple (third).
| Method | REDDIT-BINARY | IMDB-BINARY | MUTAG | ENZYMES | PROTEINS | COLLAB |
|---|---|---|---|---|---|---|
| GCN (None) | 68.255 ± 1.098 | 49.770 ± 0.817 | 72.150 ± 2.442 | 27.667 ± 1.164 | 70.982 ± 0.737 | 33.784 ± 0.488 |
| + Last Layer FA | 68.485 ± 0.945 | 48.980 ± 0.945 | 70.050 ± 2.027 | 26.467 ± 1.204 | 71.018 ± 0.963 | 33.320 ± 0.435 |
| + Every Layer FA | 48.490 ± 1.044 | 48.170 ± 0.801 | 70.450 ± 1.960 | 18.333 ± 1.038 | 60.036 ± 0.925 | 51.798 ± 0.419 |
| + DIGL | 49.980 ± 0.680 | 49.910 ± 0.841 | 71.350 ± 2.391 | 27.517 ± 1.053 | 70.607 ± 0.731 | 15.530 ± 0.294 |
| + SDRF | 68.620 ± 0.851 | 49.400 ± 0.904 | 71.050 ± 1.872 | 28.367 ± 1.174 | 70.920 ± 0.792 | 33.448 ± 0.472 |
| + FoSR | 70.330 ± 0.727 | 49.660 ± 0.864 | 80.000 ± 1.574 | 25.067 ± 0.994 | 73.420 ± 0.811 | 33.836 ± 0.584 |
| + BORF | Time-out | 50.100 ± 0.900 | 75.800 ± 1.900 | 24.700 ± 1.000 | 71.000 ± 0.800 | Time-out |
| + GTR | 68.990 ± 0.610 | 49.920 ± 0.990 | 79.100 ± 1.860 | 27.520 ± 0.990 | 72.590 ± 2.480 | 33.050 ± 0.400 |
| + CT-Layer | 51.580 ± 1.019 | 50.320 ± 0.944 | 75.899 ± 3.024 | 17.383 ± 1.030 | 60.357 ± 1.060 | 52.146 ± 0.415 |
| + PANDA | 80.690 ± 0.721 | 63.760 ± 1.012 | 85.750 ± 1.396 | 31.550 ± 1.230 | 76.000 ± 0.774 | 68.400 ± 0.452 |
| GIN (None) | 86.785 ± 1.056 | 70.180 ± 0.992 | 77.700 ± 0.360 | 33.800 ± 0.115 | 70.804 ± 0.827 | 72.992 ± 0.384 |
| + Last Layer FA | 90.220 ± 0.475 | 70.910 ± 0.788 | 83.450 ± 1.742 | 47.400 ± 1.387 | 72.304 ± 0.666 | 75.056 ± 0.406 |
| + Every Layer FA | 50.360 ± 0.684 | 49.160 ± 0.870 | 72.550 ± 3.016 | 28.383 ± 1.052 | 70.375 ± 0.910 | 32.984 ± 0.390 |
| + DIGL | 76.035 ± 0.774 | 64.390 ± 0.907 | 79.700 ± 2.150 | 35.717 ± 1.198 | 70.759 ± 0.774 | 54.504 ± 0.410 |
| + SDRF | 86.440 ± 0.590 | 69.720 ± 1.152 | 78.400 ± 2.803 | 35.817 ± 1.094 | 69.813 ± 0.792 | 72.958 ± 0.419 |
| + FoSR | 87.350 ± 0.598 | 71.210 ± 0.919 | 78.400 ± 2.803 | 29.200 ± 1.367 | 75.107 ± 0.817 | 73.278 ± 0.416 |
| + BORF | Time-out | 71.300 ± 1.500 | 80.800 ± 2.500 | 35.500 ± 1.200 | 74.200 ± 0.800 | Time-out |
| + GTR | 86.980 ± 0.660 | 71.280 ± 0.860 | 77.600 ± 2.840 | 30.570 ± 1.420 | 73.130 ± 0.690 | 72.930 ± 0.420 |
| + CT-Layer | 54.589 ± 1.757 | 50.000 ± 0.974 | 56.850 ± 4.253 | 16.583 ± 0.907 | 61.107 ± 1.184 | 52.304 ± 0.605 |
| + PANDA | 91.055 ± 0.402 | 72.560 ± 0.917 | 88.750 ± 1.570 | 46.200 ± 1.410 | 75.759 ± 0.856 | 75.110 ± 0.210 |
Table 2. PANDA-GCN vs. R-GCN.
| Method | REDDIT-BINARY | IMDB-BINARY | MUTAG | ENZYMES | PROTEINS | COLLAB |
|---|---|---|---|---|---|---|
| R-GCN | 49.850 ± 0.653 | 50.012 ± 0.917 | 69.250 ± 2.085 | 28.600 ± 1.186 | 69.518 ± 0.725 | 33.602 ± 1.047 |
| PANDA-GCN | 80.690 ± 0.721 | 63.760 ± 1.012 | 85.750 ± 1.396 | 31.550 ± 1.230 | 76.000 ± 0.774 | 68.400 ± 0.452 |
Results. Table 1 shows the results of the different methods applied to the GCN and GIN models across benchmark datasets. Our PANDA method achieves the highest accuracy on most datasets for both models, significantly outperforming the baseline ("None") and the other rewiring methods. Surprisingly, on REDDIT-BINARY, PANDA shows a substantial 22.30% improvement over the baseline (None). For BORF, evaluation on REDDIT-BINARY and COLLAB is impracticable due to the rewiring time. With GIN, our PANDA also attains the highest accuracies, with 91.055 on REDDIT-BINARY and 88.750 on MUTAG. These results underscore the effectiveness of PANDA in enhancing GNN performance. In Table 9 of Appendix I.1, we also show the experimental results for R-GCN and R-GIN.
Performance comparison of PANDA-GCN and R-GCN. Given the similarities between PANDA-GCN and R-GCN discussed in the previous section, we empirically compare their performance. As shown in Table 2, PANDA-GCN outperforms R-GCN on all datasets. Since R-GCN is not designed to mitigate over-squashing, these results demonstrate the efficacy of PANDA.
Effect of centrality metrics. It is natural to ask how the performance of PANDA varies depending on the centrality metric. Table 3 compares the results obtained with different centrality metrics. On IMDB-BINARY, PageRank centrality outperforms the other metrics, indicating that larger node widths are effective for receiving messages from influential nodes. Closeness centrality shows the best performance on PROTEINS and COLLAB, indicating that it is effective to expand the width of nodes that can reach many distant nodes through short paths.
Sensitivity to top-$k$ nodes. Fig. 8 shows a sensitivity study w.r.t. the top-$k$ nodes. On PROTEINS, both PANDA-GCN and PANDA-GIN show an increase in mean accuracy as $k$ increases from 3 to 7; however, the accuracy does not increase beyond $k=7$. On MUTAG, PANDA-GCN and PANDA-GIN attain their highest accuracy at $k=10$ and $k=7$, respectively.
Sensitivity to $p_{\mathrm{high}}$. Fig. 9 shows a sensitivity study of PANDA w.r.t. $p_{\mathrm{high}}$. On PROTEINS, larger $p_{\mathrm{high}}$ improves the performance of both PANDA-GCN and PANDA-GIN. On MUTAG, by contrast, both models achieve their best performance at $p_{\mathrm{high}}=112$.
Table 3. Performance comparison by various centrality measures for PANDA-GCN. The results of PANDA-GIN are in Appendix I.2.
| Centrality | REDDIT-BINARY | IMDB-BINARY | MUTAG | ENZYMES | PROTEINS | COLLAB |
|---|---|---|---|---|---|---|
| Degree | 80.690 ± 0.721 | 62.100 ± 1.043 | 85.200 ± 1.568 | 31.117 ± 1.258 | 75.375 ± 0.800 | 68.162 ± 0.471 |
| Betweenness | 80.000 ± 0.659 | 59.630 ± 1.152 | 85.750 ± 1.396 | 29.600 ± 1.208 | 74.589 ± 0.791 | 67.844 ± 0.547 |
| Closeness | 79.700 ± 0.664 | 61.160 ± 0.992 | 84.700 ± 1.554 | 29.967 ± 1.231 | 76.000 ± 0.774 | 68.400 ± 0.452 |
| PageRank | 80.340 ± 0.826 | 63.760 ± 1.012 | 85.450 ± 1.569 | 31.550 ± 1.230 | 74.098 ± 0.851 | 67.540 ± 0.500 |
| Load | 79.500 ± 0.732 | 59.840 ± 1.153 | 85.700 ± 1.549 | 28.167 ± 1.090 | 74.188 ± 0.814 | 67.802 ± 0.506 |

Figure 8. Sensitivity on top-$k$. More results are in Appendix J.

Figure 9. Sensitivity on $p_{\mathrm{high}}$ . More results are in Appendix J.
6. Related Work
This section introduces related work on alleviating the over-squashing problem. Before the link between over-squashing and graph curvature was established by Topping et al. (2022), DIGL was developed to sparsify the adjacency matrix using the graph heat kernel and personalized PageRank (Gasteiger et al., 2019). After Alon & Yahav (2021) empirically observed the over-squashing problem, research has been active in finding indicators of over-squashing and proposing rewiring methods. Topping et al. (2022) first connect over-squashing with graph Ricci curvature, showing that edges with highly negative Ricci curvature contribute to over-squashing, and propose SDRF. Since then, rewiring methods with curvature-based indicators have been studied (Nguyen et al., 2023; Shi et al., 2023a). Beyond curvature, Black et al. (2023) measure over-squashing through the concept of effective resistance. Banerjee et al. (2022) propose rewiring methods that handle the original topology of the graph to prevent it from being disconnected.
To prevent over-smoothing, Karhadkar et al. (2023) propose a rewiring method that optimizes the spectral gap. Deac et al. (2022) propose interleaving message propagation on the original graph with message passing on an expander graph to alleviate over-squashing. Gutteridge et al. (2023) propose a layer-dependent rewiring method that provides a variety of rewired graphs for feature propagation. Barbero et al. (2023) propose LASER, which rewires the graph to better preserve locality. Finkelshtein et al. (2023) propose CO-GNN, a flexible and dynamic message-passing scheme that can perform effective graph rewiring at each layer of a GNN. Recently, Di Giovanni et al. (2023) analyze the factors that contribute to over-squashing, and Southern et al. (2024) analyze the role of virtual nodes in over-squashing. In addition, rewiring methods, as well as more advanced and flexible message-passing paradigms, continue to be studied (Errica et al., 2023; Barbero et al., 2023; Park et al., 2023; Qian et al., 2024; Behrouz & Hashemi, 2024; Qiu et al., 2024).
Most studies mitigating over-squashing focus on rewiring methods that change the graph topology. PANDA differs in that it addresses the over-squashing problem without rewiring the graph at all.
7. Concluding Remarks
We presented PANDA, a novel message-passing framework that effectively addresses the over-squashing problem in GNNs without rewiring edges. By selectively expanding the width of high-centrality nodes, PANDA promotes signal propagation and alleviates over-squashing. Our empirical evaluations show that PANDA outperforms existing graph rewiring methods on graph classification. This research contributes a pioneering approach of selectively expanding node widths in message passing and lays the groundwork for future exploration of width-aware strategies in GNNs.
Limitations and future work. We focus only on redesigning the GNN based on the sensitivity (Eq. (3)) with respect to the width of the nodes that contribute to over-squashing. Despite our empirical evidence for mitigating over-squashing, more research is needed to formally establish the effect of larger widths on over-squashing. For future work, we aim to reduce the complexity of our method and explore a much wider range of tasks to study the pros and cons of larger widths.
