CoTracker3: Simpler and Better Point Tracking by Pseudo-Labelling Real Videos

Nikita Karaev¹,², Iurii Makarov¹, Jianyuan Wang¹,², Natalia Neverova¹, Andrea Vedaldi¹, Christian Rupprecht²

¹Meta AI ²Visual Geometry Group, University of Oxford

Overview

Most state-of-the-art point trackers are trained on synthetic data because real videos are difficult to annotate for this task. However, this can result in suboptimal performance due to the statistical gap between synthetic and real videos. To understand these issues better, we introduce CoTracker3, comprising a new tracking model and a new semi-supervised training recipe.

This allows real videos without annotations to be used during training by generating pseudo-labels using off-the-shelf teachers. The new model eliminates or simplifies components from previous trackers, resulting in a simpler and often smaller architecture. This training scheme is much simpler than prior work and achieves better results using 1,000 times less data.
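The pseudo-labelling idea can be illustrated with a minimal sketch. Everything here is a toy assumption rather than the authors' implementation: the two "teachers" are stand-in functions, picking one teacher at random per video is an illustrative choice, and `mean_l1` is a hypothetical loss used only to show how a student's prediction would be scored against teacher output.

```python
import random

# A track is a list of (x, y) positions, one per frame.
# A tracker maps (num_frames, queries) to one track per query point.


def constant_teacher(num_frames, queries):
    """Toy teacher (assumption): every query point stays where it started."""
    return [[q for _ in range(num_frames)] for q in queries]


def drifting_teacher(num_frames, queries):
    """Toy teacher (assumption): points drift one pixel right per frame."""
    return [[(x + t, y) for t in range(num_frames)] for (x, y) in queries]


def pseudo_label(num_frames, queries, teachers, rng):
    """Pick one teacher at random and use its tracks as training targets."""
    teacher = rng.choice(teachers)
    return teacher(num_frames, queries)


def mean_l1(pred, target):
    """Mean absolute coordinate error between two sets of tracks."""
    total, count = 0.0, 0
    for p_track, t_track in zip(pred, target):
        for (px, py), (tx, ty) in zip(p_track, t_track):
            total += abs(px - tx) + abs(py - ty)
            count += 2
    return total / count


rng = random.Random(0)
queries = [(10.0, 20.0), (30.0, 40.0)]
teachers = [constant_teacher, drifting_teacher]

# No human annotation is involved: the targets come from a teacher.
targets = pseudo_label(4, queries, teachers, rng)

# Stand-in for the student's prediction on the same unlabelled video.
student_pred = constant_teacher(4, queries)
loss = mean_l1(student_pred, targets)
```

In a real training loop the student would be a learned model and the loss would drive a gradient update; the sketch only shows where the targets come from.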

We further study the scaling behaviour to understand the impact of using more real unsupervised data in point tracking. The model is available in online and offline variants and reliably tracks visible and occluded points. We demonstrate qualitatively impressive tracking results, where points can be tracked for a long time even when they are occluded or leave the field of view. Quantitatively, CoTracker3 outperforms all recent trackers on standard benchmarks, often by a substantial margin.

Tracking through occlusions

We track points sampled on the first frame. Only CoTracker and CoTracker3 can track through occlusions. However, CoTracker loses tracked points at the end while CoTracker3 is still tracking them.

Videos compared: BootsTAPIR, LocoTrack, CoTracker, Ours offline.

Object-centric tracking on a regular grid

We track 10k points sampled on a regular grid starting from the initial video frame. Since the points are grid-sampled, tracks without significant transformations should maintain grid patterns in future frames. LocoTrack and CoTracker3 tracks are better aligned than BootsTAPIR tracks. Neither LocoTrack nor BootsTAPIR can track through occlusions. They also lose more background and object points than CoTracker3.
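As a rough illustration of what sampling 10k points on a regular grid can look like, the sketch below places a 100 × 100 grid of query points on the first frame. The function name `regular_grid`, the 480 × 640 frame size, and the choice to place points at cell centres are all assumptions made for this example; the page does not specify them.

```python
def regular_grid(height, width, grid_size):
    """Return grid_size * grid_size (x, y) points evenly spaced in the frame."""
    # Points sit at cell centres, so none lies exactly on the image border.
    step_x = width / grid_size
    step_y = height / grid_size
    return [
        ((col + 0.5) * step_x, (row + 0.5) * step_y)
        for row in range(grid_size)
        for col in range(grid_size)
    ]


# 100 x 100 = 10,000 query points on a hypothetical 480 x 640 first frame.
points = regular_grid(height=480, width=640, grid_size=100)
```

Because neighbouring points start at a constant spacing, any distortion of the grid in later frames that is not explained by real motion is easy to spot visually.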

Videos compared: BootsTAPIR, LocoTrack, Ours offline.

The effect of scaling

Scaling improves both the online and offline models; in these examples, the online model benefits more from scaling than the offline one.

Videos compared: Ours online base, Ours online scaled, Ours offline base, Ours offline scaled.

Failure cases

Featureless surfaces are a common failure mode: the model cannot track points sampled in the sky or on the surface of water.

BibTeX

@InProceedings{karaev2024cotracker3,
    author    = {Nikita Karaev and Iurii Makarov and Jianyuan Wang and Natalia Neverova and Andrea Vedaldi and Christian Rupprecht},
    title     = {{CoTracker3}: Simpler and Better Point Tracking by Pseudo-Labelling Real Videos},
    journal   = {arxiv},
    year      = {2024}
}
