PromptKD: Unsupervised Prompt Distillation for Vision-Language Models
PromptKD: 视觉-语言模型的无监督提示蒸馏
Abstract
摘要
Prompt learning has emerged as a valuable technique in enhancing vision-language models (VLMs) such as CLIP for downstream tasks in specific domains. Existing work mainly focuses on designing various learning forms of prompts, neglecting the potential of prompts as effective distillers for learning from larger teacher models. In this paper, we introduce an unsupervised domain prompt distillation framework, which aims to transfer the knowledge of a larger teacher model to a lightweight target model through prompt-driven imitation using unlabeled domain images. Specifically, our framework consists of two distinct stages. In the initial stage, we pre-train a large CLIP teacher model using domain (few-shot) labels. After pretraining, we leverage the unique decoupled-modality charact eris tics of CLIP by pre-computing and storing the text features as class vectors only once
