This paper has been accepted by IEEE Transactions on Image Processing (TIP).
📖 Overview
We propose SC-CLIP, a training-free method that enhances CLIP's dense feature representations, effectively addressing the uniform attention activations and feature homogenization caused by anomaly tokens.
We mitigate the negative effects of anomaly tokens from two perspectives. First, we explicitly identify and correct anomaly tokens based on their local context. Second, we reduce their impact on normal tokens by enhancing feature discriminability and attention correlation, leveraging the spatial consistency inherent in CLIP's mid-level features.
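The snippet below is a conceptual sketch of these two steps in PyTorch, not the code released in this repository: the anomaly criterion (an outlier test on token norms), the 3×3 neighborhood repair, the choice of intermediate layer, and the temperature `tau` are all illustrative assumptions.

```python
# Conceptual sketch of (1) repairing anomaly tokens from local context and
# (2) calibrating attention with spatially consistent mid-level features.
import torch
import torch.nn.functional as F


def fix_anomaly_tokens(tokens, h, w, z_thresh=3.0):
    """Flag tokens whose norm is a strong outlier and replace them with the
    average of their 3x3 spatial neighborhood (illustrative heuristic).

    tokens: (N, C) patch tokens, N = h * w.
    """
    norms = tokens.norm(dim=-1)                           # (N,)
    z = (norms - norms.mean()) / (norms.std() + 1e-6)     # per-token z-score
    anomalous = z > z_thresh                              # boolean mask (N,)

    # Average-pool over the spatial grid to get a local estimate per token.
    grid = tokens.t().reshape(1, -1, h, w)                # (1, C, h, w)
    local_avg = F.avg_pool2d(grid, kernel_size=3, stride=1, padding=1,
                             count_include_pad=False)
    local_avg = local_avg.reshape(-1, h * w).t()          # (N, C)

    repaired = tokens.clone()
    repaired[anomalous] = local_avg[anomalous]
    return repaired


def calibrate_attention(mid_feats, tau=0.07):
    """Build a token-to-token affinity from mid-level features and use its
    softmax as a calibrated attention map.

    mid_feats: (N, C) patch tokens taken from an intermediate layer.
    """
    f = F.normalize(mid_feats, dim=-1)
    affinity = f @ f.t() / tau                            # (N, N) similarity
    return affinity.softmax(dim=-1)


# Toy usage with random tokens standing in for CLIP ViT patch features.
h = w = 14
tokens = torch.randn(h * w, 768)
tokens[5] *= 50                        # inject an artificial "anomaly" token
tokens = fix_anomaly_tokens(tokens, h, w)
attn = calibrate_attention(tokens)
dense_feats = attn @ tokens            # re-aggregated dense features
print(dense_feats.shape)               # torch.Size([196, 768])
```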
Extensive experiments validate the effectiveness of our method, and our approach sets new state-of-the-art results across popular open-vocabulary segmentation benchmarks.
We provide the dataset configurations in this repository, following SCLIP.
Please follow the MMSeg data preparation document to download and pre-process the datasets. The COCO-Object dataset can be converted from COCO-Stuff164k with the conversion script provided alongside SCLIP; a sketch of the invocation is shown below.
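The exact script name, path, and flags below are assumptions based on the SCLIP-style convention this repository follows; check the dataset tools shipped with the code before running it.

```bash
# Hypothetical invocation; script path and arguments may differ in this repository.
python datasets/cvt_coco_object.py /path/to/coco_stuff164k -o /path/to/coco_object
```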
If this work is helpful for your research, please consider citing the following BibTeX entry.
@article{bai2025self,
title={Self-Calibrated {CLIP} for Training-Free Open-Vocabulary Segmentation},
author={Bai, Sule and Liu, Yong and Han, Yifei and Zhang, Haoji and Tang, Yansong and Zhou, Jie and Lu, Jiwen},
journal={IEEE Transactions on Image Processing},
year={2025},
publisher={IEEE}
}