Scalable Cosmic AI Inference using Cloud Serverless Computing with FMI
Abstract
Large-scale astronomical image data processing and prediction is essential for astronomers, providing crucial insights into celestial objects, the universe’s history, and its evolution. While modern deep learning models offer high predictive accuracy, they often demand substantial computational resources, making them resource-intensive and limiting accessibility. We introduce the Cloud-based Astronomy Inference (CAI) framework to address these challenges. This scalable solution integrates pre-trained foundation models with serverless cloud infrastructure through a Function-as-a-Service (FaaS) Message Interface (FMI). CAI enables efficient and scalable inference on astronomical images without extensive hardware. Using a foundation model for redshift prediction as a case study, our extensive experiments cover user devices, HPC (High-Performance Computing) servers, and the cloud. CAI’s significant scalability improvement on large data sizes provides an accessible and effective tool for the astronomy community. The code is accessible at https://github.com/UVA-MLSys/AI-for-Astronomy.
Keywords
Cloud Computing, Astronomy, Foundation Models, Scaling, Serverless
Introduction
Astronomical images are vital to modern astrophysics, offering key insights into celestial objects, such as their shape Willett et al. (2013), distance Hubble (1929), and other fundamental characteristics that define our understanding of the universe. Large surveys like the Dark Energy Spectroscopic Instrument (DESI) Fan et al. (2019) and the Sloan Digital Sky Survey (SDSS) Gunn et al. (1998) provide extensive image datasets with unique attributes. For example, SDSS images consist of five spectral bands—u, g, r, i, and z—each focused on specific wavelengths: Ultraviolet (u) at $3543~\mathrm{\AA}$, Green (g) at $4770~\mathrm{\AA}$, Red (r) at $6231~\mathrm{\AA}$, Near Infrared (i) at $7625~\mathrm{\AA}$, and Infrared (z) at $9134~\mathrm{\AA}$ Murrugarra-Llerena and Hirata (2017).
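For quick reference, the band-to-wavelength mapping above can be captured in a small lookup table (a convenience structure of ours, not an SDSS API):

```python
# Effective wavelengths (in Angstroms) of the five SDSS bands listed above.
SDSS_BANDS_ANGSTROM = {"u": 3543, "g": 4770, "r": 6231, "i": 7625, "z": 9134}
```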
Deep learning foundation models trained on large-scale astronomical image datasets have proven to be powerful tools for improving the accuracy and efficiency of tasks such as redshift prediction, morphology classification, and similarity search Hayat et al. (2021); Lanusse et al. (2023); Fathkouhi and Fox (2024). Figure 5 highlights several notable deep-learning architectures that leverage astronomical images for prediction. For example, AstroCLIP Lanusse et al. (2023) uses an image transformer with a 307M-parameter encoder, pretrained on approximately 197K images. AstroMAE Fathkouhi and Fox (2024) combines frozen pretrained weights with fine-tunable parameters, totaling 10.4M learnable parameters. Similarly, Henghes et al. (2022) trained their model on 1 million SDSS images using 7.8M parameters. In comparison, AstroMAE, trained on about 650K images, showed significant performance improvements over the Henghes et al. (2022) model in redshift prediction.
While these models yield impressive results, their large parameter counts require substantial computational resources, making both training and inference resource-intensive. The memory and computing resources of a standalone device become a limitation for users running large datasets and foundation models. Advanced infrastructure is needed to overcome this limit on high-performance inference accessibility for many users Neely (2021). Serverless computing is a recently popular Function-as-a-Service (FaaS) paradigm Li et al. (2022) that lets developers write cloud functions in high-level languages (e.g. Python) while the platform handles the complex infrastructure management itself.
This study proposes a highly scalable serverless computing framework on AWS AWS (2024) to enhance the accessibility of a pretrained foundation model for astronomical images. This effectively reduces the computational demands on individual users. In summary, we offer the following contributions:
• A novel cloud-based astronomy framework (named “Cloud-based Astronomy Inference” (CAI)) to significantly enhance the scalability of foundation model inference on large astronomical images.
• Detailed experiments on the redshift prediction task using real-world galaxy images from the SDSS survey, comparing CAI’s performance with other computing devices (e.g. personal and HPC).
• A comprehensive performance analysis with inference time and throughput, demonstrating that CAI effectively improves the scalability of foundation model inference in astronomy.
Related Work
The immense size of astronomy datasets has made cloud services essential for efficiently storing and processing this data. Faaique (2024) highlighted the significant challenges such massive volumes pose, emphasizing cloud computing as a key solution to managing these issues. Kim and Hahm (2011) investigated requirements for cloud computing used by scientists dealing with the increasing size of big science datasets. Gill and Buyya (2020) acknowledged the invaluable nature of cloud computing for large astronomy datasets and used an astronomy case study to test their proposed cloud computing model that improves failure management. Khlamov et al. (2022) argued that cloud services are nearly indispensable for managing astronomy datasets, especially those dense in metadata and images. In their study on variable star photometry, they implemented a Software-as-a-Service (SaaS) model to effectively address the substantial storage demands.
De Prado et al. (2014) leveraged cloud computing for the Mosaic tool, enabling the rapid and efficient creation of sky mosaics from images captured across different sky regions. Zhang et al. (2020) implemented the distributed astronomy image processing toolkit called Kira by running Apache Spark on an Amazon EC2 cloud. Their speed-up results and throughput indicate that this parallelized computing method is compatible with astronomy applications. Sen et al. (2022) underscored that cloud services raise minimal scalability concerns, noting the effectiveness of cloud-based distributed frameworks for tasks like redshift prediction. Furthermore, Parra-Royón et al. (2024) leveraged Function-as-a-Service (FaaS) models and decision-making systems to manage the computationally intensive processing of radio astronomy data.
These studies highlight the essential role of cloud services in overcoming the computational and storage challenges inherent in modern astronomical research. To our knowledge, no prior work has leveraged serverless computing AWS (2024) to enhance the scalability of foundation models for astronomical images.
Problem Statement
In astronomical imaging, much of the research has focused on enhancing deep learning model performance, with comparatively less attention to scaling inference capabilities. Although recent foundation models, trained on extensive astronomical image datasets, demonstrate versatility across various tasks, their high parameter count limits usability and scalability due to infrastructure constraints. To address this, a scalable framework is essential for efficient inference on large image volumes without added financial burdens. We introduce Cloud-based Astronomy Inference (CAI), which employs the Function-as-a-Service (FaaS) Message Interface (FMI) Böhringer (2022) to enhance the scalability of foundation models trained on astronomical images. Based on our review, CAI is the first framework specifically designed to address the inference scalability of foundation models in astronomical imaging.
Although dark energy and dark matter together constitute approximately 95% of the universe’s energy content, our understanding of these mysterious components remains profoundly limited. Investigating and exploring their nature requires a large-scale collection of galaxy images, supported by advanced cosmological methods and theories Jones et al. (2024). A cornerstone of these methods is the precise determination of a critical parameter: redshift. Redshift measures how much the light from a celestial object has been stretched, providing crucial insights into the distances of these objects and the expansion of the universe Hubble (1929).
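For concreteness, redshift $z$ quantifies this stretching as the fractional change between the observed and emitted wavelengths (the standard definition, not specific to this work):

$$
z=\frac{\lambda_{\mathrm{obs}}-\lambda_{\mathrm{emit}}}{\lambda_{\mathrm{emit}}}
$$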
In this study, we assess CAI’s scalability by applying it to redshift prediction. For this purpose, we selected AstroMAE due to its superior performance and because it is pretrained on a larger dataset, which may enhance generalization compared to AstroCLIP. Future work will explore the scalability of additional models discussed in this context.
Methodology
We first obtain the AstroMAE model pretrained on a large astronomy dataset, then deploy it to our proposed cloud architecture for an inference scalability benchmark.
AstroMAE
AstroMAE Fathkouhi and Fox (2024) is a recent foundation model that captures general patterns in galaxy images for redshift prediction. It has two major phases:
Pretraining: Figure 1 illustrates the pretraining process of AstroMAE’s masked autoencoder Devlin (2018). The masked autoencoder aims to reconstruct masked patches using unmasked ones. We mask 75% of the embedded patches; the remaining 25% are fed into the encoder. Initially, images are segmented into uniform patches of size $5\times8\times8$ and embedded into 192-dimensional vectors with positional embedding.
The reconstructed masked patches are compared to their original patches, enabling the model to learn meaningful representations. To promote learning of data patterns instead of memorizing patch positions, the embeddings are randomly shuffled before being input to the encoder. Compared to other pretraining methods like contrastive learning Chen et al. (2020), the masked autoencoder does not rely on specific augmentations, which can potentially increase the dataset size and computational demands. In contrast, AstroMAE’s masked autoencoder processes only 25% of the patches, making it significantly more efficient, a crucial property for working with large astronomy data. AstroMAE also uses a modified ViT Wang et al. (2022) that contains a parallel convolutional module and performs even better.
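A minimal PyTorch sketch of this patchify-shuffle-mask step follows, assuming $32\times32\times5$ inputs and $8\times8$ spatial patches; all tensor names and sizes here are illustrative, not AstroMAE's actual implementation:

```python
import torch

B, C, H, W, P, D = 4, 5, 32, 32, 8, 192   # batch, bands, height, width, patch, embed dim
imgs = torch.randn(B, C, H, W)

# Patchify: (B, C, H, W) -> (B, N, C*P*P) with N = (H/P) * (W/P) = 16 patches
patches = imgs.unfold(2, P, P).unfold(3, P, P)             # (B, C, 4, 4, P, P)
patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(B, 16, C * P * P)

embed = torch.nn.Linear(C * P * P, D)
pos = torch.zeros(1, 16, D)                                # positional embedding
tokens = embed(patches) + pos                              # (B, 16, 192)

# Shuffle the patch order randomly and keep only 25% as visible tokens.
keep = 16 // 4
idx = torch.rand(B, 16).argsort(dim=1)[:, :keep]           # random permutation per image
visible = torch.gather(tokens, 1, idx.unsqueeze(-1).expand(-1, -1, D))
# `visible` (B, 4, 192) is what the encoder sees; the decoder must
# reconstruct the remaining 75% masked patches.
```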
Fine-tuning: During fine-tuning, the decoder is removed, leaving a frozen encoder that works with two additional models: a parallel Inception model and a magnitude block. The outputs from the frozen encoder, Inception model, and magnitude block are first processed through several layers individually, then concatenated and passed through additional layers for final processing.
Figure 1. The architecture of the masked autoencoder of AstroMAE Fathkouhi and Fox (2024).
Let $V\in R^{I}$ represent the image data and $O\in R^{M}$ represent the magnitude data, where $I$ and $M$ denote the dimensions of the image and magnitude data spaces, respectively.
A frozen pretrained encoder $E:R^{I}\to R^{C}$ generates a latent space representation $L_{E}$ from the image $V$ :
$$
L_{E}=E(V).
$$
Similarly, an Inception model $W:R^{I}\rightarrow R^{Q}$ extracts features $L_{W}$ from the same image, as shown in Figure 2a:
$$
L_{W}=W(V).
$$
The magnitude data $O$ from Figure 2b is processed through a magnitude block $S:R^{M}\to R^{T}$, resulting in magnitude features $L_{T}$:
$$
L_{T}=S(O).
$$
To further process these image features, AstroMAE applies two fully connected layers with a ReLU activation in between to both $L_{E}$ and $L_{W}$. The resulting features are denoted as $L_{EC}$ and $L_{WC}$, representing the processed outputs of the frozen encoder and the Inception model in Figure 2c, respectively.
$$
\begin{aligned}
L_{EC}&=FC(\mathrm{ReLU}(FC(L_{E}))),\\
L_{WC}&=FC(\mathrm{ReLU}(FC(L_{W}))).
\end{aligned}
$$
Finally, the redshift prediction $P^{RS}$ is obtained as shown in Figure 2d by concatenating $L_{T}$, $L_{EC}$, and $L_{WC}$, and then passing them through two fully connected layers with a ReLU activation function in between:
$$
P^{RS}=FC(\mathrm{ReLU}(FC(\mathrm{Concat}(L_{T},L_{EC},L_{WC}))))
$$
The predicted redshift $P^{RS}$ is compared with the actual redshift $R^{RS}$, and the cost is calculated using the Mean Squared Error (MSE) over $N$ samples:
$$
\mathrm{MSE}=\frac{1}{N}\sum_{i=1}^{N}\left(P_{i}^{RS}-R_{i}^{RS}\right)^{2}
$$
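A compact PyTorch sketch of this fine-tuning head follows; the structure mirrors the equations above, but the layer widths (`c_dim`, `q_dim`, `t_dim`, `hidden`) are our illustrative assumptions, not the paper's exact values:

```python
import torch
import torch.nn as nn

class RedshiftHead(nn.Module):
    """Combines frozen-encoder, Inception, and magnitude features (Figure 2)."""
    def __init__(self, c_dim=192, q_dim=256, t_dim=64, hidden=512):
        super().__init__()
        # L_EC = FC(ReLU(FC(L_E))) and L_WC = FC(ReLU(FC(L_W)))
        self.enc_proj = nn.Sequential(nn.Linear(c_dim, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
        self.inc_proj = nn.Sequential(nn.Linear(q_dim, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
        # P_RS = FC(ReLU(FC(Concat(L_T, L_EC, L_WC))))
        self.head = nn.Sequential(nn.Linear(t_dim + 2 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, L_E, L_W, L_T):
        L_EC, L_WC = self.enc_proj(L_E), self.inc_proj(L_W)
        return self.head(torch.cat([L_T, L_EC, L_WC], dim=-1))

criterion = nn.MSELoss()   # the MSE objective defined above
```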
Figure 2. AstroMAE fine-tuning architecture.
Table 1. Redshift Prediction Using Various Architectures with AstroMAE Fathkouhi and Fox (2024).
| Training Type | Architecture | MSE | MAE |
|---|---|---|---|
| Supervised (from scratch) | plain-ViT-magnitude | 0.00077 | 0.01871 |
| | pcm-ViT-magnitude | 0.00057 | 0.01604 |
| | Henghes et al. (2022) | 0.00058 | 0.01568 |
| | plain-ViT | 0.00097 | 0.02123 |
| | pcm-ViT | 0.00063 | 0.01686 |
| | Inception-only | 0.00064 | 0.01705 |
| Fine-tuning | plain-ViT-magnitude | 0.00068 | 0.01740 |
| | pcm-ViT-magnitude | 0.00060 | 0.01655 |
| | plain-AstroMAE | 0.00056 | 0.01558 |
| | pcm-AstroMAE | 0.00053 | 0.01520 |
| | plain-ViT | 0.00086 | 0.01970 |
| | pcm-ViT | 0.00084 | 0.01945 |
| | plain-ViT-inception | 0.00059 | 0.01622 |
| | pcm-ViT-inception | 0.00059 | 0.01601 |
Table 1 shows the performance of various architectures combining a pretrained encoder, magnitude block, and the Inception model. Rows labeled “from-scratch” denote models where both plain-ViT Dosovitskiy (2020) and pcm-ViT Wang et al. (2022) are initialized and trained entirely from scratch. In contrast, the fine-tuning section includes models where pcm-ViT and plain-ViT are pretrained and frozen during the fine-tuning. All modules in Table 1 adhere to the architecture shown in Figure 2.
The AstroMAE encoder is pretrained on 80% of the data, with 10% used for validation and the remaining 10% reserved for fine-tuning. For fine-tuning, this 10% is further split into 70% for training, 10% for validation, and 20% for testing. The results in Table 1 are based on inference over the 20% testing samples from the fine-tuned model. The results underscore the rationale behind AstroMAE’s architectural choices.
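Working through this split arithmetic for the 659,857-image dataset (a back-of-the-envelope sketch; exact counts depend on the authors' split files):

```python
total = 659_857                      # galaxy images in the full dataset
pretrain      = int(total * 0.80)    # encoder pretraining
pretrain_val  = int(total * 0.10)    # pretraining validation
finetune_pool = total - pretrain - pretrain_val   # remaining ~10% for fine-tuning

ft_train = int(finetune_pool * 0.70)
ft_val   = int(finetune_pool * 0.10)
ft_test  = finetune_pool - ft_train - ft_val      # ~20%; used for Table 1 inference
print(finetune_pool, ft_test)        # roughly 66K fine-tuning images, ~13K test images
```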
The fine-tuned plain-ViT and pcm-ViT models exhibit better results than their from-scratch counterparts due to pretraining. Pcm-ViT outperforms plain-ViT in both cases, suggesting that self-attention alone lacks high-frequency information, which convolutional layers intuitively capture. The Inception-only model does not outperform pcm-ViT trained from scratch. This highlights that while high-frequency information captured by the Inception module is beneficial, broader, low-frequency patterns are also essential for accurate redshift prediction. Similarly, pcm-ViT and plain-ViT achieve better results when combined with the Inception model. AstroMAE therefore incorporates the Inception model, enhancing its ability to capture detailed features.
Moreover, the image data alone may be insufficient for optimal redshift predictions. A comparison of the pcm-ViT and plain-ViT Inception models against the proposed AstroMAE, which incorporates an additional magnitude block as shown in Figure 2, supports this insight. Magnitude values for each image band are therefore integrated during fine-tuning. Given its superior performance, pcm-AstroMAE has been selected for this study.
Proposed Framework: CAI
Analyzing large astronomical data requires advanced distributed systems to handle concurrent jobs at scale. We propose a novel cloud architecture called Cloud-based Astronomy Inference (CAI), using AWS serverless AWS (2024) techniques to solve this challenge with significant speed-up. Figure 3 shows an overview of the proposed framework, which has the following steps:
- Initialization: Defines the parameters and configurations of each child workflow based on the input payload, then returns the parameter array with the state output. The concurrent job array size is the image data size divided by the smallest partition size (see the sketch after this list). For example, if the data size is 1GB and the partition size is 25MB, the number of partitions is $\lceil 1024 \div 25 \rceil = 41$.
- Data Partitioning: The total data is split into smaller pieces so that each Lambda function can work with one small partition at a time during distributed processing. We use a maximum of 25MB of data per partition (chosen arbitrarily; results for different partition sizes are reported in a later section). This keeps the data loading time low and allows batch data processing. This is important because partitions that are too large can run out of memory (AWS Lambda has a maximum of 10GB).
- Distributed Processing: This step performs the concurrent job processing based on each input item from the initialization, using AWS Lambda functions and distributed maps (Figure 3).
Figure 3. CAI framework overview using AWS serverless computing. It uses the AWS S3 bucket for data, code, and result storage. The state machine defines the workflow steps using AWS lambda functions and distributed maps. Parallel execution is achieved through data partitions for almost linear high-performance inference scaling.
- Summarize Results: A final Lambda function summarizes the results from each job for evaluation.
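The partition arithmetic from the Initialization step can be summarized in a few lines (a sketch with our own helper name):

```python
import math

def num_partitions(data_size_mb: float, partition_size_mb: float = 25.0) -> int:
    """Concurrent job array size: total data size over the smallest partition size."""
    return math.ceil(data_size_mb / partition_size_mb)

assert num_partitions(1024) == 41    # 1GB of images at 25MB per partition
```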
As part of this work, we introduce a novel integration with FMI, the FaaS Message Interface Böhringer (2022), for communication and collective operations over Amazon AWS Lambda, as illustrated in Figure 4. This is not supported by default in AWS but offers performance benefits for message passing between the Lambda functions. FMI supports the following collective operations: 1) send/receive, 2) broadcast, 3) gather, 4) scatter, 5) reduce, 6) allreduce, and 7) scan. Additional operations and languages can be easily added thanks to the library’s abstract design. FMI uses a rendezvous server during the initialization phase to exchange IP addresses and establish point-to-point connections between pairs of AWS Lambda functions. Once direct communication is established, functions can pass useful messages or perform collective operations during the later stages (e.g., model inference).
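The sketch below shows how a Lambda handler in CAI might join an FMI communicator and reduce per-partition metrics; it is illustrative only: the constructor and collective signatures approximate FMI's published examples and may differ by version, and `run_partition_inference` is a hypothetical stand-in for our inference step.

```python
import fmi  # FaaS Message Interface Python bindings

def run_partition_inference(partition):
    # Hypothetical stand-in: load the partition, run the model, return its MSE.
    ...

def handler(event, context):
    rank, world = event["rank"], event["num_workers"]
    # Join the communicator; peers discover each other via the rendezvous server.
    comm = fmi.Communicator(rank, world, "fmi_config.json", "cai-inference", 512)

    local_mse = run_partition_inference(event["partition"])

    # Collective allreduce: sum the per-partition errors across all workers.
    total = comm.allreduce(local_mse, fmi.types(fmi.datatypes.double),
                           fmi.op(fmi.ops.sum))
    return {"rank": rank, "global_mse": total / world}
```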
Figure 4. CAI: The Architecture of Serverless Cloud with FMI for Cosmic AI Inference.
Experiments
We intend to showcase CAI as a novel approach for redshift prediction compared to other computing devices. This section outlines the dataset used in our experiment, the metrics we use to assess each device, the experimental setup, and the analysis of our results.
Dataset
The dataset used in this study is prepared following Fathkouhi and Fox (2024) and originates from Pasquet et al. (2019). It has 659,857 galaxy images derived from SDSS DR8 Aihara et al. (2011). Each image has dimensions of $64\times64\times5$ and is annotated with 64 physical properties, including class, metallicity, and age, along with unique IDs for cross-referencing with other SDSS physical property tables. These images were captured using a 2.5-meter dedicated telescope at Apache Point Observatory in New Mexico and were already background-subtracted and photometrically calibrated. Pasquet et al. (2019) further processed the images by resampling them onto a common grid of overlapping frames and applying the Lánczos-3 resampling kernel Duchon (1979) to enhance image quality and reduce artifacts. The images are center-cropped to a final size of $32\times32\times5$, and the magnitude values for each band, along with the redshift, are obtained using the Astroquery library Ginsburg et al. (2019). The resulting dataset, consisting of images and magnitudes as inputs and redshift as the target, is then prepared for model training.
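A minimal sketch of the center-cropping step described above (our helper; the full preprocessing pipeline follows Pasquet et al. (2019)):

```python
import numpy as np

def center_crop(img: np.ndarray, size: int = 32) -> np.ndarray:
    """Crop a (H, W, bands) cutout to its central (size, size, bands) region."""
    h, w = img.shape[0], img.shape[1]
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size, :]

cutout = np.zeros((64, 64, 5), dtype=np.float32)   # a dummy 64x64x5 SDSS image
assert center_crop(cutout).shape == (32, 32, 5)
```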
Figure 5. The parameter counts for recent deep learning-based methods developed for astronomy images, capable of inference across diverse computing environments—including a personal laptop, HPC CPUs, HPC GPUs, and our proposed cloud-based framework, CAI. A pre-trained AstroMAE model is used for the inference scaling experiments.
Implementation
We used Python 3.11 with PyTorch 2 as the core framework, along with NumPy 1.2 and Pandas 2.2 for data analytics. Additionally, Timm 0.4.12 was independently installed to provide access to