0

arXiv:2405.18376v3 Announce Type: replace
Abstract: Existing SFDA methods struggle to fully use pre-trained knowledge and often rely on a single model's predictions or handcrafted prompts, limiting robustness under domain shift. Multimodal Large Language Models (MLLMs) offer a promising alternative: they encode rich visual-semantic knowledge and generalize well without task-specific tuning. However, their use in SFDA is hindered by instruction-following failures, inconsistent outputs, and high inference costs. We propose Reliability-based Curriculum Learning (RCL), a novel framework that distills robust supervision from multiple frozen MLLMs into a compact target model. RCL organizes adaptation as a three-stage curriculum that progressively incorporates pseudo-labels based on inter-model agreement and model confidence, enabling stable and noise-aware training. Our approach achieves state-of-the-art performance on standard SFDA datasets, Office-Home, DomainNet-126, and VisDA-C, outperforming zero-shot MLLMs, their ensembles, all without accessing source data or tuning foundation models. Our code is available at: https://github.com/Dong-Jie-Chen/RCL.