MASA
Matching Anything By Segmenting Anything
CVPR 2024 Highlight
Siyuan Li¹, Lei Ke¹, Martin Danelljan¹, Luigi Piccinelli¹, Mattia Segu¹, Luc Van Gool¹,², Fisher Yu¹
¹Computer Vision Lab, ETH Zurich; ²INSAIT
MASA provides a universal instance appearance model for matching any object in any domain.
Abstract
The robust association of the same objects across video frames in complex scenes is crucial for many applications, especially Multiple Object Tracking (MOT). Current methods predominantly rely on labeled domain-specific video datasets, which limits the cross-domain generalization of learned similarity embeddings.
We propose MASA, a novel method for robust instance association learning, capable of matching any objects within videos across diverse domains without tracking labels. Leveraging the rich object segmentation from the Segment Anything Model (SAM), MASA learns instance-level correspondence through exhaustive data transformations. We treat the SAM outputs as dense object region proposals and learn to match those regions from a vast image collection.
We further design a universal MASA adapter that works in tandem with foundation segmentation or detection models, enabling them to track any detected object. These combinations exhibit strong zero-shot tracking ability in complex domains.
Extensive tests on multiple challenging MOT and MOTS benchmarks show that in zero-shot association the proposed method, using only unlabeled static images, outperforms state-of-the-art methods trained on fully annotated in-domain video sequences.
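The training recipe sketched in the abstract — treat SAM outputs as dense region proposals, apply two augmentations to the same static image, and learn embeddings under which corresponding regions match — boils down to a contrastive objective over region pairs. Below is a minimal numpy sketch of such a per-region InfoNCE loss. This is our own simplification for illustration only; the function name, temperature, and formulation are assumptions, not code from the MASA paper or repository.

```python
import numpy as np

def region_infonce_loss(emb_a, emb_b, temperature=0.07):
    """Contrastive loss over region embeddings from two augmented views.

    emb_a, emb_b: (N, D) arrays; row i in both views comes from the SAME
    underlying region proposal, so the diagonal pairs are the positives.
    """
    # L2-normalize so the dot product is cosine similarity.
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature  # (N, N) pairwise similarities
    # Log-softmax over each row; positives sit on the diagonal.
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))
```

Correctly matched views should yield a lower loss than mismatched ones, which is the signal that drives the embedding toward instance-level correspondence without any tracking labels.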
Video
Open-vocabulary Tracking
Track and Segment Any object
Using MASA with SAM, you can track and segment any object.
Fast Track Anything
MASA works with any detector. Here are fast open-vocabulary tracking examples with YOLO-World-X.
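"Works with any detector" means the detector only has to supply boxes; appearance embeddings from the adapter then link detections across frames. As a rough illustration of that decoupling, here is a minimal greedy frame-to-frame matcher. All names, the threshold, and the greedy strategy are our own assumptions — the MASA codebase implements its own association logic.

```python
import numpy as np

def associate(prev_embs, prev_ids, cur_embs, next_id, sim_thresh=0.5):
    """Link current detections to previous tracks by embedding similarity.

    prev_embs: (P, D) embeddings of live tracks; prev_ids: their track IDs.
    cur_embs:  (C, D) embeddings of this frame's detections.
    Returns the IDs assigned to current detections and the updated ID counter.
    """
    a = prev_embs / np.linalg.norm(prev_embs, axis=1, keepdims=True)
    b = cur_embs / np.linalg.norm(cur_embs, axis=1, keepdims=True)
    sim = b @ a.T  # (C, P) cosine similarities
    cur_ids = [-1] * len(cur_embs)
    taken = set()
    # Greedy pass: the most confident detections claim their best track first.
    for i in np.argsort(-sim.max(axis=1)):
        j = int(np.argmax(sim[i]))
        if sim[i, j] >= sim_thresh and j not in taken:
            taken.add(j)
            cur_ids[i] = prev_ids[j]
        else:  # no confident, unclaimed match: start a new track
            cur_ids[i] = next_id
            next_id += 1
    return cur_ids, next_id
```

Swapping the detector (e.g. YOLO-World for open-vocabulary boxes, or SAM for masks) leaves this association step untouched, which is the point of a universal appearance model.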
Driving Scenes and Ego-centric Tracking
BibTeX
@inproceedings{masa,
  author    = {Li, Siyuan and Ke, Lei and Danelljan, Martin and Piccinelli, Luigi and Segu, Mattia and Van Gool, Luc and Yu, Fisher},
  title     = {Matching Anything By Segmenting Anything},
  booktitle = {CVPR},
  year      = {2024},
}