Abstract: When we look around and perform complex tasks, how we see and selectively process what we see is crucial. However, the lack of this visual search mechanism in current multimodal LLMs (MLLMs ...
Abstract: Visual behavior depends on both bottom-up mechanisms, where gaze is driven by the visual conspicuity of the stimuli, and top-down mechanisms, guiding attention towards relevant areas based ...