Abstract: Text-to-video retrieval is an essential task in multimedia information retrieval, enabling users to retrieve videos based on natural language descriptions. In this paper, we ...
Abstract: Domain generalization (DG) aims to train a model on source domains so that it generalizes well to unseen domains. Recent advances in Vision-Language Models (VLMs), such as CLIP, exhibit ...