[Paper Translation] PromptKD: Unsupervised Prompt Distillation for Vision-Language Models


Original paper: https://arxiv.org/pdf/2403.02781v5


PromptKD: Unsupervised Prompt Distillation for Vision-Language Models


Abstract


Prompt learning has emerged as a valuable technique in enhancing vision-language models (VLMs) such as CLIP for downstream tasks in specific domains. Existing work mainly focuses on designing various learning forms of prompts, neglecting the potential of prompts as effective distillers for learning from larger teacher models. In this paper, we introduce an unsupervised domain prompt distillation framework, which aims to transfer the knowledge of a larger teacher model to a lightweight target model through prompt-driven imitation using unlabeled domain images. Specifically, our framework consists of two distinct stages. In the initial stage, we pre-train a large CLIP teacher model using domain (few-shot) labels. After pretraining, we leverage the unique decoupled-modality characteristics of CLIP by pre-computing and storing the text features as class vectors only once
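The "decoupled-modality" property mentioned above means CLIP's text features depend only on the class names, not on any particular image, so they can be computed once and cached as fixed class vectors. A minimal sketch of that caching idea is below; `encode_text` is a hypothetical stand-in for CLIP's frozen text encoder, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_text(class_name: str, dim: int = 8) -> np.ndarray:
    """Toy deterministic text embedding, a placeholder for CLIP's text encoder."""
    seed = sum(ord(ch) for ch in class_name)
    v = np.random.default_rng(seed).normal(size=dim)
    return v / np.linalg.norm(v)

class_names = ["cat", "dog", "car"]

# Text features depend only on the class names, so they are computed
# once and stored as fixed class vectors (shape: num_classes x dim).
class_vectors = np.stack([encode_text(c) for c in class_names])

def classify(image_feature: np.ndarray) -> str:
    """Cosine-similarity classification against the cached class vectors."""
    f = image_feature / np.linalg.norm(image_feature)
    logits = class_vectors @ f
    return class_names[int(np.argmax(logits))]

# Every image reuses the cached class vectors; the text encoder never runs again.
img_feature = encode_text("cat") + 0.05 * rng.normal(size=8)
print(classify(img_feature))
```

In the actual framework, this caching means the student never needs the teacher's heavy text branch at inference time, which is what makes the distilled target model lightweight.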
