[论文翻译]PYTEXT:从 NLP 研究到生产的无缝路径


原文地址:https://arxiv.org/pdf/1812.08729v1


PYTEXT: A SEAMLESS PATH FROM NLP RESEARCH TO PRODUCTION

PYTEXT:从 NLP 研究到生产的无缝路径

Ahmed Aly1 Kushal Lakhotia1 Shicong Zhao1 Mrinal Mohit Barlas Oguz2 Abhinav Arora1 Sonal Gupta1 Christopher Dewan2 Stef Nelson-Lindall2 Rushin Shah1

1Facebook Conversational AI 2Facebook AI

1 Facebook 对话式 AI
2 Facebook AI

ABSTRACT

摘要

We introduce PyText1 – a deep learning based NLP modeling framework built on PyTorch. PyText addresses the often-conflicting requirements of enabling rapid experimentation and of serving models at scale. It achieves this by providing simple and extensible interfaces for model components, and by using PyTorch’s capabilities of exporting models for inference via the optimized Caffe2 execution engine. We report our own experience of migrating experimentation and production workflows to PyText, which enabled us to iterate faster on novel modeling ideas and then seamlessly ship them at industrial scale.

我们介绍 PyText1——一个基于 PyTorch 的深度学习 NLP 建模框架。PyText 解决了快速实验和大规模模型服务的常见冲突需求。它通过为模型组件提供简单且可扩展的接口,并利用 PyTorch 的能力通过优化的 Caffe2 执行引擎导出模型进行推理,实现了这一目标。我们报告了将实验和生产工作流迁移到 PyText 的经验,这使得我们能够更快地迭代新的建模想法,并在工业规模上无缝部署它们。

1 INTRODUCTION

1 引言

When building a machine learning system, especially one based on neural networks, there is usually a trade-off between ease of experimentation and deployment readiness, often with conflicting requirements. For instance, to rapidly try out flexible and non-conventional modeling ideas, researchers tend to use modern imperative deep-learning frameworks like PyTorch2 or TensorFlow Eager3. These frameworks provide an easy, eager-execution interface that facilitates writing advanced and dynamic models quickly, but also suffer from overhead in latency at inference and impose deployment challenges. In contrast, production-oriented systems are typically written in declarative frameworks that express the model as a static graph, such as Caffe24 and TensorFlow5. While being highly optimized for production scenarios, they are often harder to use, and make the experimentation life-cycle much longer. This conflict is even more prevalent in natural language processing (NLP) systems, since most NLP models are inherently very dynamic, and not easily expressible in a static graph. This adds to the challenge of serving these models at an industrial scale.

在构建机器学习系统时,尤其是在基于神经网络的系统中,通常需要在实验便捷性和部署准备度之间进行权衡,这两者往往存在冲突的需求。例如,为了快速尝试灵活且非传统的建模思路,研究人员倾向于使用现代的命令式深度学习框架,如 PyTorch2 或 TensorFlow Eager3。这些框架提供了易于使用的即时执行接口,便于快速编写高级和动态模型,但在推理时存在延迟开销,并且给部署带来了挑战。相比之下,面向生产的系统通常使用声明式框架编写,这些框架将模型表示为静态图,例如 Caffe24 和 TensorFlow5。尽管这些框架在生产场景中高度优化,但它们通常更难使用,并且会大大延长实验周期。这种冲突在自然语言处理 (NLP) 系统中更为普遍,因为大多数 NLP 模型本质上是高度动态的,难以用静态图表达。这进一步增加了在工业规模上部署这些模型的挑战。

PyText, built on PyTorch $1.0^{6}$, is designed to achieve the following:

PyText 基于 PyTorch $1.0^{6}$ 构建,旨在实现以下目标:

Table 1. Comparison of NLP Modeling Frameworks

表 1. NLP 建模框架对比

NLP 框架      深度学习支持   易于原型设计   工业性能
CoreNLP          ✗             ✓            ✓
AllenNLP         ✓             ✓            ✗
FLAIR            ✓             ✓            ✗
Spacy 2.0        ✓             ✗            ✓
PyText           ✓             ✓            ✓

Existing popular frameworks for building state-of-the-art NLP models include Stanford CoreNLP (Manning et al., 2014), AllenNLP (Gardner et al., 2017), FLAIR (Akbik et al., 2018) and Spacy $2.0^{7}$. CoreNLP has been a popular library for both research and production, but does not support neural network models very well. AllenNLP and FLAIR are easy to use for prototypes, but it is hard to productionize their models since they are in Python, which does not handle large-scale real-time requests well due to the lack of good multi-threading support. Spacy 2.0 has some state-of-the-art NLP models built for production use-cases but is not easily extensible for quick prototyping and building new models.

现有的用于构建最先进 NLP 模型的流行框架包括 Stanford CoreNLP (Manning et al., 2014)、AllenNLP (Gardner et al., 2017)、FLAIR (Akbik et al., 2018) 和 Spacy $2.0^{7}$。CoreNLP 一直是研究和生产中的流行库,但对神经网络模型的支持不够好。AllenNLP 和 FLAIR 易于用于原型开发,但由于其基于 Python,难以将模型投入生产,因为 Python 缺乏良好的多线程支持,无法应对大规模实时请求。Spacy 2.0 内置了一些最先进的 NLP 模型,适用于生产环境,但在快速原型设计和构建新模型方面不易扩展。

2 FRAMEWORK DESIGN

2 框架设计

PyText is a modeling framework that helps researchers and engineers build end-to-end pipelines for training or inference. Apart from workflows for experimentation with model architectures, it provides ways to customize handling of raw data, reporting of metrics, training methodology and exporting of trained models. PyText users are free to implement one or more of these components and can expect the entire pipeline to work out of the box. A number of default pipelines are implemented for popular tasks which can be used as-is. We now dive deeper into building blocks of the framework and its design.

PyText 是一个建模框架,帮助研究人员和工程师构建用于训练或推理的端到端管道。除了用于实验模型架构的工作流外,它还提供了自定义原始数据处理、指标报告、训练方法以及训练模型导出的方式。PyText 用户可以自由实现一个或多个这些组件,并期望整个管道能够开箱即用。许多默认管道已为流行任务实现,可以直接使用。现在我们将深入探讨该框架的构建模块及其设计。

2.1 Component

2.1 组件

Everything in PyText is a component. A component is clearly defined by the parameters required to configure it. All components are maintained in a global registry which makes PyText aware of them. They currently include –

PyText 中的所有内容都是一个组件。组件通过配置所需的参数明确定义。所有组件都维护在一个全局注册表中,这使得 PyText 能够识别它们。它们目前包括——

Task: combines various components required for a training or inference task into a pipeline. Figure 1 shows a sample config for a document classification task. It can be configured as a JSON file that defines the parameters of all the child components.

任务:将训练或推理任务所需的各种组件组合成一个流水线。图 1 展示了一个文档分类任务的示例配置。它可以配置为一个 JSON 文件,定义所有子组件的参数。

Data Handler: processes raw input data and prepares batches of tensors to feed to the model.

数据处理器:处理原始输入数据并准备批量张量以输入模型。

Model: defines the neural network architecture.

模型:定义神经网络架构。

Optimizer: encapsulates model parameter optimization using the loss from the forward pass of the model.

优化器 (Optimizer):封装了使用模型前向传播的损失进行模型参数优化的过程。

Metric Reporter: implements the relevant metric computation and reporting for the models.

指标报告器:实现模型的相关指标计算和报告。

Trainer: uses the data handler, model, loss and optimizer to train a model and perform model selection by validating against a holdout set.

训练器:使用数据处理器、模型、损失函数和优化器来训练模型,并通过在保留集上进行验证来执行模型选择。

Predictor: uses the data handler and model for inference given a test dataset.

预测器:使用数据处理器和模型对给定的测试数据集进行推理。

Exporter: exports a trained PyTorch model to a Caffe2 graph using $\mathrm{ONNX^{8}}$ .

Exporter: 使用 $\mathrm{ONNX^{8}}$ 将训练好的 PyTorch 模型导出为 Caffe2 图。
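As a concrete illustration of the Task config mentioned above (Figure 1), the sketch below parses a hypothetical document classification config. The component names ("DocClassificationTask", "BiLSTMSelfAttention", "MLPDecoder", and the parameter keys) are illustrative stand-ins, not PyText's exact config schema.

```python
# Hypothetical sketch of a PyText-style JSON task config; component and
# parameter names are assumptions for illustration, not the real schema.
import json

CONFIG_JSON = """
{
  "task": {
    "DocClassificationTask": {
      "data_handler": {"train_path": "train.tsv", "eval_path": "eval.tsv", "test_path": "test.tsv"},
      "model": {
        "representation": {"BiLSTMSelfAttention": {"lstm_dim": 200, "num_layers": 2}},
        "decoder": {"MLPDecoder": {"hidden_dims": [128]}}
      },
      "optimizer": {"Adam": {"lr": 0.001}},
      "trainer": {"epochs": 10}
    }
  }
}
"""

def load_task_config(raw: str) -> dict:
    """Parse a JSON task config and return the single task's name and parameters."""
    config = json.loads(raw)
    (task_name, task_params), = config["task"].items()  # exactly one task per config
    return {"name": task_name, "params": task_params}

task = load_task_config(CONFIG_JSON)
```

Because every component is defined by the parameters required to configure it, a config file like this is enough for the registry to instantiate the whole pipeline.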


Figure 1. Document Classification Task Config

图 1: 文档分类任务配置

2.2 Design Overview

2.2 设计概述

The task bootstraps a PyText job and creates all the required components. There are two modes in which a job can be run:

任务 (Task) 引导一个 PyText 作业并创建所有必需的组件。作业可以在两种模式下运行:

• Train: Trains a model either from scratch or from a saved check-point. Task uses the Data Handler to create batch iterators over training, evaluation and test datasets and passes these iterators along with model, optimizer and metrics reporter to the trainer. Subsequently, the trained model is serialized in PyTorch format as well as converted to a static Caffe2 graph.

• 训练:从头开始或从保存的检查点训练模型。任务使用数据处理器(Data Handler)创建训练、评估和测试数据集的批量迭代器,并将这些迭代器与模型、优化器和指标报告器一起传递给训练器。随后,训练好的模型以 PyTorch 格式序列化,并转换为静态的 Caffe2 图。


Figure 2. PyText Framework Design

图 2: PyText 框架设计

• Predict: Loads a pre-trained model and computes its predictions for a given test set. The Task, again, uses the Data Handler to create a batch iterator over the test dataset and passes it with the model to the predictor for inference.

• 预测:加载预训练模型并计算其对给定测试集的预测结果。任务再次使用数据处理器在测试数据集上创建批量迭代器,并将其与模型一起传递给预测器进行推理。

Figure 2 illustrates the overall design of the framework.

图 2 展示了框架的整体设计。
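A minimal structural sketch of the two modes, using plain Python stand-ins for the components. The class and method names here are illustrative assumptions, not PyText's actual API.

```python
# Structural sketch of the Train/Predict modes; all names are illustrative.

class DataHandler:
    def __init__(self, datasets):
        self.datasets = datasets  # split name -> list of examples

    def batches(self, split, batch_size=2):
        data = self.datasets[split]
        return [data[i:i + batch_size] for i in range(0, len(data), batch_size)]


class Task:
    def __init__(self, data_handler, model, optimizer, trainer, predictor):
        self.data_handler, self.model = data_handler, model
        self.optimizer, self.trainer, self.predictor = optimizer, trainer, predictor

    def train(self):
        # Train mode: build batch iterators over all splits and hand them,
        # together with model and optimizer, to the trainer.
        iterators = {s: self.data_handler.batches(s) for s in ("train", "eval", "test")}
        return self.trainer(iterators, self.model, self.optimizer)

    def predict(self):
        # Predict mode: iterate only over the test set with a trained model.
        return self.predictor(self.data_handler.batches("test"), self.model)


# Toy stand-ins to exercise the pipeline.
steps = []
trainer = lambda its, model, opt: steps.extend(its["train"]) or model
predictor = lambda batches, model: [model(b) for b in batches]
model = lambda batch: ["label"] * len(batch)

task = Task(DataHandler({"train": list(range(4)), "eval": [0], "test": [1, 2, 3]}),
            model, optimizer=None, trainer=trainer, predictor=predictor)
task.train()
predictions = task.predict()
```

In the real framework the trained model would additionally be serialized in PyTorch format and exported to a static Caffe2 graph, as described above.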

3 MODELING SUPPORT

3 建模支持

We now discuss the native support for building and extending models in PyText.

我们现在讨论 PyText 中构建和扩展模型的原生支持。

3.1 Terminology

3.1 术语

Module: is a reusable component that is implemented without any knowledge of which model it will be used in. It defines a clear input and output interface such that it can be plugged into another module or model.

模块:是一个可重用的组件,它在实现时无需了解将用于哪个模型。它定义了清晰的输入和输出接口,以便可以插入到另一个模块或模型中。

Model: has a one-to-one mapping with a task. Each model can be made up of a combination of modules for running a training or prediction job.

模型:与任务具有一一对应的映射关系。每个模型可以由多个模块组合而成,用于运行训练或预测任务。

3.2 Model Abstraction

3.2 模型抽象

PyText provides a simple, easily extensible model abstraction. We break up a single-task model into Token Embedding, Representation, Decoder and Output layers, each of which is configurable. Further, each module can be saved and loaded individually to be reused in other models.

PyText 提供了一个简单且易于扩展的模型抽象。我们将单任务模型分解为 Token Embedding(Token 嵌入)、Representation(表示)、Decoder(解码器)和 Output(输出)层,每一层都是可配置的。此外,每个模块可以单独保存和加载,以便在其他模型中重复使用。

Token Embedding: converts a batch of numericalized tokens into a batch of vector embeddings for each token. It can be configured to use embeddings of a number of styles: pretrained word-based, trainable word-based, character-based with CNN and highway networks (Kim et al., 2016), pretrained deep contextual character-based (e.g., ELMo (Peters et al., 2018)), token-level gazetteer features or morphology-based (e.g. capitalization).

Token Embedding: 将一批数值化的 Token 转换为一组向量嵌入 (vector embeddings)。它可以配置为使用多种风格的嵌入:预训练的词嵌入 (pretrained word-based)、可训练的词嵌入 (trainable word-based)、基于字符的嵌入 (character-based) 并带有 CNN 和高速公路网络 (highway networks) (Kim et al., 2016)、预训练的深度上下文字符嵌入 (pretrained deep contextual character-based)(例如 ELMo (Peters et al., 2018))、Token 级别的地名录特征 (token-level gazetteer features) 或基于形态的特征 (morphology-based)(例如大写)。

Representation: processes a batch of embedded tokens into a representation of the input. What it emits as output depends on the task; e.g., the representation of the document for a text classification task will differ from that for a word tagging task. Logically, this part of the model should implement the sub-network such that its output can be interpreted as features over the input. Examples of the different representations present in PyText are the Bidirectional LSTM and CNN representations.

表示:将一批嵌入的 Token 处理为输入的表示。其输出的具体实现取决于任务,例如,文本分类任务的文档表示将与词性标注任务的表示不同。从逻辑上讲,模型的这一部分应实现子网络,以便其输出可以解释为输入的特征。PyText 中存在的不同表示示例包括:双向 LSTM 和 CNN 表示。

Decoder: is responsible for generating logits from the input representation. Logically this part of the model should implement the sub-network that generates model output over the features learned by the representation.

解码器 (Decoder):负责从输入表示生成 logits。从逻辑上讲,模型的这一部分应实现从表示学习到的特征生成模型输出的子网络。

Output Layer: concerns itself with generating prediction and the loss (when label or ground truth is provided).

输出层:负责生成预测和计算损失(当提供标签或真实值时)。

These modules compose the base model implementation; they can be easily extended for more complicated architectures.

这些模块构成了基础模型的实现,它们可以轻松扩展以支持更复杂的架构。
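The four-layer composition can be sketched with plain callables standing in for real PyTorch modules. The toy layer implementations below are illustrative assumptions; only the Embedding → Representation → Decoder → Output structure mirrors the abstraction described above.

```python
# Sketch of the single-task model abstraction; toy layers for a 2-class
# document classifier, not real PyText modules.

class Model:
    """Token Embedding -> Representation -> Decoder -> Output Layer."""
    def __init__(self, embedding, representation, decoder, output_layer):
        self.embedding = embedding
        self.representation = representation
        self.decoder = decoder
        self.output_layer = output_layer

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)      # ids -> vectors per token
        rep = self.representation(embedded)       # token vectors -> doc features
        logits = self.decoder(rep)                # features -> logits
        return self.output_layer(logits)          # logits -> prediction

# Toy, swappable implementations of each configurable layer.
embedding = lambda ids: [[float(i), float(i) * 0.5] for i in ids]  # id -> 2-d vector
representation = lambda vecs: [sum(col) for col in zip(*vecs)]     # sum-pooled doc rep
decoder = lambda rep: [rep[0] - rep[1], rep[1] - rep[0]]           # two class logits
output_layer = lambda logits: max(range(len(logits)), key=logits.__getitem__)

model = Model(embedding, representation, decoder, output_layer)
pred = model.forward([1, 2, 3])
```

Because each layer is a separate object, any one of them (e.g. the representation) can be saved and plugged into a different model, which is what the multi-task support below builds on.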

3.3 Multi-task Model Training

3.3 多任务模型训练

PyText supports multi-task training (Collobert & Weston, 2008) to optimize multiple tasks jointly as a first-class citizen. We build a multi-task model by allowing parameter sharing between modules of multiple single-task models. We use the model abstraction for a single task discussed in Section 3.2 to define the tasks and let the user declare which modules of those single tasks should be shared. This enables training a model with one or more input representations jointly against multiple tasks.

PyText 支持多任务训练 (Collobert & Weston, 2008) ,将多任务联合优化作为一等公民。我们通过允许多个单任务模型的模块之间共享参数来使用多任务模型。我们使用第 3.2 节中讨论的单任务模型抽象来定义任务,并让用户声明这些单任务的哪些模块应该共享。这使得能够针对多个任务联合训练一个或多个输入表示的模型。

Multi-task models make the following assumptions:

多任务模型做出以下假设:

• If there are n tasks in the multi-task model setup then there must be n data sources containing data for one task each.

• 如果在多任务模型设置中有 n 个任务,那么必须有 n 个数据源,每个数据源包含一个任务的数据。

• The single task scenario must be implemented for it to be reused for the multi-task setup.

• 单任务场景必须实现,以便在多重任务设置中重复使用。
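Under these assumptions, parameter sharing amounts to two single-task models holding a reference to the same module object. The sketch below is illustrative (the class names are assumptions, not PyText's API); it mirrors the joint document classification and word tagging setup of Figure 3.

```python
# Sketch of multi-task parameter sharing: two single-task models declare
# that they share one representation module. Names are illustrative.

class SharedRepresentation:
    """Stands in for, e.g., a shared BiLSTM document representation."""
    def __call__(self, x):
        return x  # identity placeholder for the real sub-network


shared_rep = SharedRepresentation()


class SingleTaskModel:
    def __init__(self, representation, decoder):
        self.representation = representation
        self.decoder = decoder

    def forward(self, x):
        return self.decoder(self.representation(x))


# Each task keeps its own decoder/output, but both point at the SAME
# representation object, so updates to it benefit both tasks.
doc_classifier = SingleTaskModel(shared_rep, decoder=lambda r: ("doc_label", r))
word_tagger = SingleTaskModel(shared_rep, decoder=lambda r: ("word_tags", r))
```

With real PyTorch modules, sharing the object means sharing its parameters, so gradients from both tasks' losses flow into the same weights.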


Figure 3. Joint document classification and word tagging model

图 3: 联合文档分类和词标注模型

3.3.1 Multi-task Model Examples

3.3.1 多任务模型示例

PyText provides the flexibility of building any multi-task model architecture with the appropriate model configuration, if the two assumptions listed above are satisfied. The examples below give a flavor of two sample model architectures built with PyText for joint learning against more than one task.

PyText 提供了构建任何多任务模型架构的灵活性,只要满足上述两个假设,就可以通过适当的模型配置实现。以下示例展示了使用 PyText 构建的两个样本模型架构,用于针对多个任务进行联合学习。

Figure 3 illustrates a model that learns a shared document representation for document classification and word tagging tasks. This model is useful for natural language understanding where given a sentence, we want to predict the intent behind it and tag the slots in the sentence. Jointly optimizing for two tasks helps the model learn a robust sentence representation for the two tasks. Further, we can use this pre-trained sentence representation for other tasks where training data is scarce.

图 3 展示了一个模型,该模型学习了一个共享的文档表示,用于文档分类和词性标注任务。该模型在自然语言理解中非常有用,当给定一个句子时,我们希望预测其背后的意图并标注句子中的槽位。联合优化这两个任务有助于模型为这两个任务学习一个鲁棒的句子表示。此外,我们可以将这个预训练的句子表示用于其他训练数据稀缺的任务。

Figure 4 illustrates a model that learns document and query representations using query-document relevance and individual query and document classification tasks. This is often used in information retrieval where, given a query and a document, we want to predict their relevance; but we also add query and document classification tasks to increase robustness of learned representations.

图 4 展示了一个模型,该模型通过查询-文档相关性以及单独的查询和文档分类任务来学习文档和查询的表示。这种方法通常用于信息检索中,在给定查询和文档的情况下,我们希望预测它们的相关性;但我们也添加了查询和文档分类任务,以增强学习表示的鲁棒性。

3.4 Model Zoo

3.4 模型库

PyText models are focused on NLP tasks and can be configured with a variety of modules. We enumerate here the classes of models that are currently supported.

PyText 模型专注于可以配置多种模块的自然语言处理 (NLP) 任务。我们在此列举当前支持的模型类别。


Figure 4. Joint query-document relevance and document classification model

图 4: 联合查询-文档相关性和文档分类模型

• Text Classification: classifies a sentence or a document into an appropriate category. PyText includes reference implementations of Bidirectional LSTM (Schuster & Paliwal, 1997) with Self-Attention (Lin et al., 2017) and Convolutional Neural Network (Kim, 2014) models for text classification.

• 文本分类:将句子或文档分类到适当的类别中。PyText 包含了双向 LSTM (Schuster & Paliwal, 1997) 与自注意力机制 (Lin et al., 2017) 以及卷积神经网络 (Kim, 2014) 模型的参考实现,用于文本分类。

• Word Tagging: labels word sequences, i.e. classifies each word in a sequence into an appropriate category. Common examples of such tasks include Part-of-Speech (POS) tagging, Named Entity Recognition (NER) and Slot Filling in spoken language understanding. PyText contains reference implementations of Bidirectional LSTM with Slot-Attention and Bidirectional Sequential Convolutional Neural Network (Vu, 2016) for word tagging.

• 词标注 (Word Tagging):为词序列打标签,即将序列中的每个词分类到适当的类别。此类任务的常见示例包括词性标注 (POS)、命名实体识别 (NER) 和口语理解中的槽位填充。PyText 包含了带有槽位注意力的双向 LSTM 和双向序列卷积神经网络 (Vu, 2016) 的参考实现,用于词标注。

• Semantic Parsing: maps a natural language sentence into a formal representation of its meaning. PyText provides a reference implementation for Recurrent Neural Network Grammars (Dyer et al., 2016) (Gupta et al., 2018) for semantic parsing.

• 语义解析:将自然语言句子映射为其意义的正式表示。PyText 提供了用于语义解析的循环神经网络语法 (Dyer et al., 2016) (Gupta et al., 2018) 的参考实现。

• Language Modeling: assigns a probability to a sequence of words (sentence) in a language. It also assigns a probability for the likelihood of a given word to follow a sequence of words. PyText provides a reference implementation for a stacked LSTM Language Model (Mikolov et al., 2010).

• 语言建模:为语言中的单词序列(句子)分配概率。它还为给定单词跟随单词序列的可能性分配概率。PyText 提供了一个堆叠 LSTM 语言模型(Mikolov 等,2010)的参考实现。

• Joint Models: We utilize the multi-task training support illustrated earlier to fuse and train models for two or more of the tasks mentioned here and optimize their parameters jointly.

• 联合模型:我们利用前面展示的多任务训练支持,将此处提到的两个或多个任务的模型融合训练,并联合优化它们的参数。
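The language-modeling objective described above can be written as the standard chain-rule factorization, where the stacked LSTM approximates each conditional by summarizing the prefix in its hidden state:

$$P(w_1, \dots, w_n) = \prod_{i=1}^{n} P(w_i \mid w_1, \dots, w_{i-1})$$

The second capability mentioned above, the likelihood of a given word following a sequence, is exactly one factor $P(w_i \mid w_1, \dots, w_{i-1})$ of this product.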

4 PRODUCTION WORKFLOW

4 生产工作流

4.1 From Idea to Production

4.1 从想法到生产

Researchers and engineers can follow these steps to validate their ideas and quickly ship them to production –

研究人员和工程师可以按照以下步骤验证他们的想法,并快速将其投入生产——


Figure 5. From Idea to Production flowchart

图 5: 从创意到生产的流程图

  1. Implement the model in PyText, and make sure offline metrics on the test set look good.
     在 PyText 中实现模型,并确保测试集上的离线指标表现良好。
  2. Publish the model to the bundled PyTorch-based inference service, and do a real-time small-scale evaluation on a live traffic sample.
     将模型发布到捆绑的基于 PyTorch 的推理服务中,并对实时流量样本进行小规模评估。
  3. Export it automatically to a Caffe2 net. In some cases, e.g. when using complex control flow logic and custom data structures, this might not yet be supported via PyTorch 1.0.
     自动导出为 Caffe2 网络。在某些情况下,例如使用复杂的控制流逻辑和自定义数据结构时,PyTorch 1.0 可能还不支持此功能。
  4. If the procedure in 3 isn't supported, use the PyTorch C++ API9 to rewrite the model (only the torch.nn.Module10 subclass) and wrap it in a Caffe2 operator.
     如果第 3 步中的过程不受支持,则使用 PyTorch 的 C++ API9 重写模型(仅限 torch.nn.Module10 子类),并将其封装在 Caffe2 操作符中。
  5. Publish the model to the production-grade Caffe2 prediction service and start serving live traffic.
     将模型发布到生产级 Caffe2 预测服务并开始处理实时流量。

4.2 Benchmarks

4.2 基准测试

Table 2. Latency Comparison (in milliseconds, smaller is better) of Python and C++ implementations of PyText models

表 2. Python 和 C++ 实现的 PyText 模型的延迟比较(单位:毫秒,越小越好)

模型          实现方式          P50      P90      P99
JointBLSTM    PyTorch          34.08    47.23    64.94
JointBLSTM    导出到 Caffe2     19.65    24.69    30.21
RNNG          PyTorch          19.74    28.53    36.37
RNNG          PyTorch C++      18.73    25.47    32.63

We compared the performance of Python and C++ models (either directly exported to Caffe2 or rewritten with the PyTorch C++ API11) on an intent-slot detection task. We note that porting to C++ gave significant latency boosts (Table 2) for the JointBLSTM model and a slight boost for the RNNG model. The latter is still valuable though, since the highly performant production serving infrastructure in many companies doesn't support Python code.

我们在意图-槽位检测任务上比较了 Python 和 C++ 模型(直接导出到 Caffe2 或使用 PyTorch 的 C++ API11 重写)的性能。我们注意到,将模型移植到 C++ 后,JointBLSTM 模型的延迟显著降低(表 2),而 RNNG 模型的延迟略有改善。尽管如此,后者的改进仍然很有价值,因为许多公司的高性能生产服务基础设施不支持 Python 代码。

The experiments were performed on a CPU-only machine with 48 Intel Xeon E5-2680 processors clocked at $2.5\mathrm{GHz}$ , with 251 GB RAM and CentOS 7.5. The $\mathrm{C}{+}{+}$ code was compiled with gcc -O3.

实验在一台仅配备 CPU 的机器上进行,该机器拥有 48 个 Intel Xeon E5-2680 处理器,主频为 $2.5\mathrm{GHz}$,内存为 251 GB,运行 CentOS 7.5 系统。$\mathrm{C}{+}{+}$ 代码使用 gcc -O3 进行编译。

4.3 Production Challenges

4.3 生产挑战

4.3.1 Data pre-processing

4.3.1 数据预处理

One limitation of PyTorch is that it doesn't support string tensors, which means that any kind of string manipulation and indexing needs to happen outside the model. This is easy during training, but makes productionization of the model tricky. We addressed this by writing a featurization library in $\mathrm{C{+}{+}}^{11}$. This is accessible during training via Pybind12 and at inference as part of the runtime services suite shown in Figure 6. This library preprocesses the raw input by performing tasks like –

PyTorch 的一个限制是不支持字符串张量 (string tensors);这意味着任何类型的字符串操作和索引都需要在模型外部进行。这在训练期间很容易实现,但会使模型的产品化变得棘手。我们通过用 $\mathrm{C{+}{+}}^{11}$ 编写一个特征化库 (featurization library) 来解决这个问题。该库在训练期间通过 Pybind12 访问,在推理时作为运行时服务套件的一部分,如图 6 所示。该库通过执行以下任务来预处理原始输入:

• Text tokenization and normalization
• Mapping characters to IDs for character-based models
• Performing token alignments for gazetteer features

• 文本 Token 化与归一化
• 为基于字符的模型将字符映射到 ID
• 为地名录特征执行 Token 对齐

By sharing the featurization code across training and inference we ensure data consistency across the different stages of the model.

通过在训练和推理中共享特征化代码,我们确保了模型不同阶段的数据一致性。
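The first two featurization tasks above can be sketched as follows. This is a simplified Python illustration of the kind of preprocessing the shared C++ library performs; the normalization rule, vocabulary layout and function names are assumptions for the example.

```python
# Sketch of a featurizer shared between training and inference; the
# tokenization and character-vocab details are illustrative, not the
# behavior of the real C++ library.

def tokenize(text):
    """Lowercase and split on whitespace: a stand-in for real tokenization."""
    return text.lower().split()

def chars_to_ids(token, char_vocab):
    """Map each character to an ID, with 0 for out-of-vocabulary characters."""
    return [char_vocab.get(c, 0) for c in token]

def featurize(text, char_vocab):
    """Produce the model-ready features for one raw input string."""
    tokens = tokenize(text)
    return {
        "tokens": tokens,
        "char_ids": [chars_to_ids(t, char_vocab) for t in tokens],
    }

# Toy character vocabulary: 'a' -> 1, 'b' -> 2, ...; 0 is reserved for OOV.
char_vocab = {c: i for i, c in enumerate("abcdefghijklmnopqrstuvwxyz", start=1)}
features = featurize("Set an Alarm", char_vocab)
```

Calling the same `featurize` function from the training pipeline and from the inference service is what guarantees the train/test consistency noted above.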


Figure 6. Training and Inference Workflow Architecture

图 6: 训练和推理工作流架构

4.3.2 Vocabulary management

4.3.2 词汇管理

Another consequence of string tensors not being supported yet is that we can’t maintain vocabularies inside the model. We explored two solutions to this –

字符串张量尚未支持的另一个后果是我们无法在模型内部维护词汇表。我们探索了两种解决方案——

• Maintain the vocabularies in the remote featurization service.
• After exporting the model, post-process the resultant Caffe2 graph and prepend the vocabularies to the net.

• 在远程特征化服务中维护词汇表。
• 导出模型后,对生成的 Caffe2 图进行后处理,并将词汇表添加到网络前。

We ultimately opted for the second option, since it's non-trivial to maintain synchronization and versioning between training-time and test-time vocabularies across different use cases and languages.

我们最终选择了第二种方案,因为在不同的使用场景和语言之间,保持训练时和测试时词汇表的同步和版本控制并非易事。

5 FUTURE WORK

5 未来工作

Upcoming enhancements to PyText span multiple domains:

PyText 即将推出的增强功能涵盖多个领域:

• Modeling Capabilities: Adding support for advanced NLP models for more use cases, e.g.

• 建模能力:增加对高级 NLP(自然语言处理)模型的支持,以应对更多用例,例如

– Question answering, reading comprehension and summarization tasks
– Multilingual and language-agnostic tasks

– 问答、阅读理解和摘要任务
– 多语言和语言无关任务

• Performance Benchmarks and Improvements: A core goal of PyText is to enable building highly scalable models which can run with low latency and high throughput. We plan to invest in –

• 性能基准测试与改进:PyText 的核心目标是支持构建高度可扩展的模型,这些模型能够以低延迟和高吞吐量运行。我们计划投入资源——

– Training speed – by augmenting the current distributed-training support with lower-precision computation support like $\mathrm{fp16}^{13}$
– Inference speed – by benchmarking performance and tuning the model deployment for expected load patterns.

– 训练速度 – 通过增加对低精度计算(如 $\mathrm{fp16}^{13}$)的支持来增强当前的分布式训练支持
– 推理速度 – 通过基准测试性能并根据预期的负载模式调整模型部署

• Model Interpretability: We plan to add more tooling support for monitoring metrics and debugging model internals –

• 模型可解释性:我们计划增加更多工具支持,用于监控指标和调试模型内部结构。

– TensorBoard14 and Visdom15 integration for visualizing the different layers of the models and tracking evaluation metrics during training
– Exploring and implementing different model explanation approaches, e.g. LIME16 and SHAP (Lundberg & Lee, 2017)

– 集成 TensorBoard14 和 Visdom15,用于可视化模型的不同层并在训练期间跟踪评估指标
– 探索并实现不同的模型解释方法,例如 LIME16 和 SHAP (Lundberg & Lee, 2017)

• Model Robustness: Adversarial input, noise, and differences in grammar and syntax can often hurt model accuracy. To analyze and improve robustness against these perturbations, we plan to invest in adversarial training and data augmentation techniques.

• 模型鲁棒性:对抗性输入、噪声以及语法和句法的差异通常会损害模型的准确性。为了分析和提高对这些扰动的鲁棒性,我们计划投资于对抗训练和数据增强技术。

• Mobile Deployment Support: We utilize the optimized Caffe2 runtime engine to serve our models, and plan to leverage its optimizations for mobile devices17, as well as support training light-weight models.

• 移动端部署支持:我们利用优化的 Caffe2 运行时引擎来服务我们的模型,并计划利用其对移动设备的优化 [17],同时支持训练轻量级模型。

6 CONCLUSION

6 结论

In this paper we presented PyText – a new NLP modeling platform built on PyTorch. It blurs the boundaries between experiments and large-scale deployment and makes it easy for both researchers and engineers to rapidly try out new modeling ideas and then productionize them. It does so by providing an extensible framework for adding new models and by defining a clear production workflow for rigorously evaluating and serving them. Using this framework and the processes defined here, we significantly reduced the time required for us to take models from research ideas to industrial-scale production.

在本文中,我们介绍了 PyText——一个基于 PyTorch 构建的新 NLP 建模平台。它模糊了实验与大规模部署之间的界限,使研究人员和工程师都能轻松快速地尝试新的建模思路,并将其投入生产。它通过提供一个可扩展的框架来添加新模型,并通过定义清晰的生产工作流程来严格评估和服务这些模型。利用这个框架和这里定义的过程,我们显著减少了将模型从研究想法转化为工业规模生产所需的时间。

REFERENCES

参考文献


Lin, Z., Feng, M., dos Santos, C. N., Yu, M., Xiang, B., Zhou, B., and Bengio, Y. A structured self-attentive sentence embedding. 2017. URL https://openreview.net/forum?id=BJC_jUqxe.

Lin, Z., Feng, M., dos Santos, C. N., Yu, M., Xiang, B., Zhou, B., and Bengio, Y. 一种结构化的自注意力句子嵌入。2017. URL https://openreview.net/forum?id=BJC_jUqxe.

Lundberg, S. and Lee, S. A unified approach to interpreting model predictions. CoRR, abs/1705.07874, 2017. URL http://arxiv.org/abs/1705.07874.

Lundberg, S. 和 Lee, S. 解释模型预测的统一方法。CoRR, abs/1705.07874, 2017. URL http://arxiv.org/abs/1705.07874.

Manning, C. D., Surdeanu, M., Bauer, J., Finkel, J., Bethard, S. J., and McClosky, D. The Stanford CoreNLP natural language processing toolkit. In Association for Computational Linguistics (ACL) System Demonstrations, pp. 55–60, 2014. URL http://www.aclweb.org/anthology/P/P14/P14-5010.

Manning, C. D., Surdeanu, M., Bauer, J., Finkel, J., Bethard, S. J., 和 McClosky, D. The Stanford CoreNLP natural language processing toolkit. 在计算语言学协会 (ACL) 系统演示中, 第 55–60 页, 2014. URL http://www.aclweb.org/anthology/P/P14/P14-5010.

Mikolov, T., Karafiát, M., Burget, L., Cernocký, J., and Khudanpur, S. Recurrent neural network based language model. In INTERSPEECH 2010, 11th Annual Conference of the International Speech Communication Association, Makuhari, Chiba, Japan, September 26-30, 2010, pp. 1045–1048, 2010. URL http://www.isca-speech.org/archive/interspeech_2010/i10_1045.html.

Mikolov, T., Karafiát, M., Burget, L., Cernocký, J., 和 Khudanpur, S. 基于循环神经网络的语言模型。在 INTERSPEECH 2010,第11届国际语音通信协会年会,日本千叶县幕张,2010年9月26-30日,第1045–1048页,2010年。URL http://www.isca-speech.org/archive/interspeech_2010/i10_1045.html。

Peters, M., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., and Zettlemoyer, L. Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pp. 2227–2237. Association for Computational Linguistics, 2018. URL http://aclweb.org/anthology/N18-1202.

Peters, M., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., and Zettlemoyer, L. 深度上下文词表示。在《2018年北美计算语言学协会会议:人类语言技术》第一卷(长篇论文)中,第2227-2237页。计算语言学协会,2018年。URL http://aclweb.org/anthology/N18-1202.

Schuster, M. and Paliwal, K. Bidirectional recurrent neural networks. Trans. Sig. Proc., 45(11):2673–2681, November 1997. ISSN 1053-587X. doi: 10.1109/78.650093. URL http://dx.doi.org/10.1109/78.650093.

Schuster, M. 和 Paliwal, K. 双向循环神经网络。Trans. Sig. Proc., 45(11):2673–2681, 1997年11月。ISSN 1053-587X。doi: 10.1109/78.650093。URL http://dx.doi.org/10.1109/78.650093。

Vu, N. T. Sequential convolutional neural networks for slot filling in spoken language understanding. In Interspeech 2016, pp. 3250–3254, 2016. doi: 10.21437/Interspeech.2016-395. URL http://dx.doi.org/10.21437/Interspeech.2016-395.

Vu, N. T. 用于口语理解中槽填充的顺序卷积神经网络。在 Interspeech 2016 中,第 3250–3254 页,2016 年。doi: 10.21437/Interspeech.2016-395。URL http://dx.doi.org/10.21437/Interspeech.2016-395。
