This paper aims to address universal segmentation for image and video perception with the strong reasoning ability empowered by Visual Large Language Models (VLLMs). Despite significant progress in ...
Abstract: An ideal artificial intelligence (AI) system should have the capability to continually learn like humans. However, when learning new knowledge, AI systems often suffer from catastrophic ...
Abstract: Visual grounding focuses on localizing objects referred to by natural language queries. Existing fully and weakly supervised methods rely on a large number of language queries for training. However, ...