[Paper Translation] LLM-Augmented Retrieval: Enhancing Retrieval Models Through Language Models and Doc-Level Embedding


Original paper: https://arxiv.org/pdf/2404.05825


LLM-Augmented Retrieval: Enhancing Retrieval Models Through Language Models and Doc-Level Embedding


Mingrui Wu, Meta, mingruiwu@meta.com


Abstract


Recently, embedding-based retrieval (dense retrieval) has shown state-of-the-art results compared with traditional sparse or bag-of-words approaches. This paper introduces a model-agnostic doc-level embedding framework built on large language model (LLM) augmentation. It also improves several important components of the retrieval model training process, such as negative sampling and the loss function. By implementing this LLM-augmented retrieval framework, we significantly improve the effectiveness of widely used retriever models, including bi-encoders (Contriever, DRAGON) and late-interaction models (ColBERTv2), achieving state-of-the-art results on the LoTTE and BEIR datasets.
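The abstract contrasts two retriever families and mentions training components (negative sampling, loss function). As a rough illustration only, not the paper's code, the sketch below shows how a bi-encoder scores one pooled query vector against one document vector with a dot product, how a ColBERT-style late-interaction model sums, over query tokens, each token's maximum similarity to any document token (MaxSim), and a generic InfoNCE-style contrastive loss over one positive and a few sampled negatives. The function names, toy vectors, and the plain InfoNCE form are assumptions for illustration; the paper's doc-level embedding and its improved negative sampling and loss are not reproduced here.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def bi_encoder_score(query_vec: Vector, doc_vec: Vector) -> float:
    """Bi-encoder scoring: one vector per side, compared by dot product."""
    return dot(query_vec, doc_vec)


def late_interaction_score(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """ColBERT-style MaxSim: each query token keeps its best-matching
    document token score, and the per-token maxima are summed."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


def info_nce_loss(pos_score: float, neg_scores: List[float], temperature: float = 1.0) -> float:
    """Generic contrastive loss (an assumption, not the paper's loss):
    negative log-softmax of the positive score among positive + negatives."""
    logits = [pos_score / temperature] + [s / temperature for s in neg_scores]
    m = max(logits)  # subtract the max before exp for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[0]


if __name__ == "__main__":
    # Toy 2-dimensional embeddings, purely hypothetical.
    print(bi_encoder_score([1.0, 0.0], [0.9, 0.1]))  # 0.9
    print(late_interaction_score([[1.0, 0.0], [0.0, 1.0]],
                                 [[1.0, 0.0], [0.5, 0.5]]))  # 1.5
    print(info_nce_loss(2.0, [0.0, 0.0]))
```

The design difference is visible in the signatures: a bi-encoder compresses each side into a single vector, so documents can be indexed as one embedding each, while late interaction keeps one vector per token and pays for finer-grained matching with more storage and compute. In the loss, harder negatives (negative scores closer to the positive) raise the loss, which is why the choice of negative sampling strategy matters during training.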
