Yuqun Wu
·
Chih-hao Lin
·
Henry Che
·
Aditi Tiwari
·
Chuhang Zou
·
Shenlong Wang
·
Derek Hoiem
This repository contains the code for the paper SceneDiff: A Benchmark and Method for Multiview Object Change Detection. We investigate the problem of identifying objects that have changed between two captures of the same scene taken at different times, introducing the first object-level multiview change detection benchmark and a new training-free method.
Download the SceneDiff benchmark dataset from 🤗 Hugging Face.
mkdir data && cd data
wget https://huggingface.co/datasets/yuqun/SceneDiff/resolve/main/scenediff_benchmark.zip
unzip scenediff_benchmark.zip

scenediff_benchmark/
├── data/ # 350 sequence pairs
│ ├── sequence_pair_1/
│ │ ├── original_video1.mp4 # Raw video before change
│ │ ├── original_video2.mp4 # Raw video after change
│ │ ├── video1.mp4 # Video with annotation mask (before)
│ │ ├── video2.mp4 # Video with annotation mask (after)
│ │ ├── segments.pkl # Dense segmentation masks for evaluation
│ │ └── metadata.json # Sequence metadata
│ ├── sequence_pair_2/
│ │ └── ...
│ └── ...
├── splits/ # Val/Test splits
│ ├── val_split.json
│ └── test_split.json
└── vis/ # Visualization tools
├── visualizer.py # Flask-based web viewer
├── requirements.txt
└── templates/
About segments.pkl: See the detailed description here.
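For a quick look at one sequence pair without the web viewer, the files can be loaded directly. The snippet below is only a minimal sketch: it assumes segments.pkl is a standard Python pickle and metadata.json is plain JSON; see the linked description for the exact mask schema.

```python
# Minimal sketch: inspect one sequence pair of the benchmark.
# Assumes segments.pkl is a standard pickle and metadata.json is plain JSON.
import json
import pickle
from pathlib import Path

pair_dir = Path("data/scenediff_benchmark/data/sequence_pair_1")

with open(pair_dir / "metadata.json") as f:
    metadata = json.load(f)
with open(pair_dir / "segments.pkl", "rb") as f:
    segments = pickle.load(f)

print("metadata keys:", list(metadata))
print("segments object type:", type(segments))
```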
Visualization: To browse the sequences with the Flask-based web viewer, run:
cd data/scenediff_benchmark/vis && pip install -r requirements.txt
python visualizer.py

We expect method predictions to have the following structure:
output_dir/
├── sequence_pair_1/
│ └── object_masks.pkl # Dense segmentations of changed objects (for evaluation)
├── sequence_pair_2/
└── ...
with object_masks.pkl following this structure:
object_masks = {
'H': int, # Image height
'W': int, # Image width
'video_1': { # Objects existing in video_1
'object_id_1': { # Integer ID for each detected object
'frame_id_1': { # Integer frame number
'mask': RLE_Mask, # Run-length encoded mask
'cost': float # Confidence score of the prediction
},
...
},
...
},
'video_2': { # Objects existing in video_2
'object_id_1': { # Integer ID for each detected object
'frame_id_1': { # Integer frame number
'mask': RLE_Mask, # Run-length encoded mask
'cost': float # Confidence score of the prediction
},
...
},
...
}
}
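As an illustration of the layout above, the sketch below writes an object_masks.pkl for a single sequence pair. It assumes masks are run-length encoded in COCO style via pycocotools; check the repository's utilities for the exact encoding expected by the evaluation script, and note that the object IDs, frame IDs, and cost value here are placeholders.

```python
# Sketch: write predictions in the expected object_masks.pkl layout.
# Assumption (not confirmed by the repo): masks are COCO-style RLE via pycocotools.
import pickle
import numpy as np
from pycocotools import mask as mask_utils

def encode_mask(binary_mask: np.ndarray) -> dict:
    """Run-length encode an (H, W) boolean mask into a COCO-style RLE dict."""
    rle = mask_utils.encode(np.asfortranarray(binary_mask.astype(np.uint8)))
    rle["counts"] = rle["counts"].decode("ascii")  # keep the pickle text-friendly
    return rle

H, W = 480, 640
dummy_mask = np.zeros((H, W), dtype=bool)
dummy_mask[100:200, 150:300] = True  # hypothetical changed-object region

object_masks = {
    "H": H,
    "W": W,
    "video_1": {
        1: {                      # object_id (int)
            0: {                  # frame_id (int)
                "mask": encode_mask(dummy_mask),
                "cost": 0.87,     # placeholder confidence score
            },
        },
    },
    "video_2": {},                # no changed objects detected in video_2
}

with open("output_dir/sequence_pair_1/object_masks.pkl", "wb") as f:
    pickle.dump(object_masks, f)
```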
Then the evaluation script can be run with:
python scripts/evaluate_multiview.py \
--pred_dir ${OUTPUT_DIR} \
--duplicate_match_threshold 2 \
--per_frame_duplicate_match_threshold 2 \
--splits val \
--sets varied \
--output_path ${OUTPUT_FILE_PATH} \
--visualize False

Arguments:
- --duplicate_match_threshold: Tolerance for duplicate objects across frames (default: 2)
- --per_frame_duplicate_match_threshold: Tolerance for duplicate regions per frame (default: 2)
- --splits: Choose from val, test, or all
- --sets: Choose from varied, kitchen, or All
- --visualize: Set to True to save visualization outputs
Output: The evaluation results will be saved to ${OUTPUT_FILE_PATH}
- Clone this repository with submodules:

  git clone --recursive https://github.com/yuqunw/scene_diff.git
  cd scene_diff
- Create conda environment and install dependencies:

  conda create -n scene_diff python=3.10 -y
  conda activate scene_diff
  # Install the PyTorch build matching your nvcc/CUDA version
  pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121
  pip install -r requirements.txt
  # Install torch-scatter
  pip install torch-scatter -f https://data.pyg.org/whl/torch-2.5.1+cu121.html
- Install submodules:

  # Install the segment-anything submodule
  cd submodules/segment-anything-langsplat-modified
  pip install -e .
  cd ../..
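After installation, a quick import check can confirm the environment is usable. This is only a sketch; the module names are assumed to match the packages installed above (in particular, the modified submodule is assumed to still install as segment_anything).

```python
# Sanity check for the environment (module names assumed from the installs above).
import torch
import torch_scatter
import segment_anything

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
```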
1. Download the Segment-Anything checkpoint:
bash checkpoints/download_sam_checkpoint.sh

2. Configure DINOv3 checkpoint:
The DINOv3 checkpoint is downloaded automatically on first use once the checkpoint URL has been filled in. To set it up:
- Visit the DINOv3 downloads page to apply for checkpoint access
- Right-click on dinov3_vith16plus_pretrain_lvd1689m-7c1da9a5.pth and copy the download link
- Update the URL in configs/scenediff_config.yml:

  models:
    dinov3:
      weight_url: "<paste_your_copied_url_here>"
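Before the first run, a short check can confirm the URL has actually been filled in. This is a sketch assuming PyYAML and the key layout shown above.

```python
# Sketch: verify models.dinov3.weight_url is set in the config (assumes PyYAML).
import yaml

with open("configs/scenediff_config.yml") as f:
    cfg = yaml.safe_load(f)

url = cfg["models"]["dinov3"]["weight_url"]
assert url and "<paste" not in url, "Fill in models.dinov3.weight_url first"
print("DINOv3 weights will be downloaded from:", url)
```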
Run change detection on any two videos:
python scripts/demo.py \
--config configs/scenediff_config.yml \
--video1 path/to/video1.mp4 \
--video2 path/to/video2.mp4 \
--output output/demo

Output: The script generates point cloud visualizations, including score maps and object segmentations for both videos, in the specified output directory.
Parameters: You can modify parameters in configs/scenediff_config.yml. If the automatic threshold for change detection doesn't work well (the score maps look correct but there are too many or too few detections), you can manually set detection.object_threshold in the config file.
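If you prefer to adjust the threshold programmatically (for example when sweeping values), the config can be patched with a short script. This is only a sketch assuming PyYAML and a top-level detection section; the value 0.5 is a placeholder, not a recommended setting.

```python
# Sketch: set detection.object_threshold manually (assumes PyYAML; 0.5 is a placeholder).
import yaml

config_path = "configs/scenediff_config.yml"
with open(config_path) as f:
    cfg = yaml.safe_load(f)

cfg.setdefault("detection", {})["object_threshold"] = 0.5
with open(config_path, "w") as f:
    yaml.safe_dump(cfg, f, sort_keys=False)
```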
Run inference on all sequences in the benchmark:
python scripts/predict_multiview.py \
--config configs/scenediff_config.yml \
--splits val \
--sets varied \
--output_dir output/scenediff_benchmark

Arguments:
- --splits: Choose from val, test, or all
- --sets: Choose from varied, kitchen, or All
- --output_dir: Directory to save predictions
- Modify more arguments in the config file
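Before running the evaluation script, it can help to confirm that every predicted sequence pair directory contains an object_masks.pkl in the layout described above. A minimal sketch:

```python
# Sketch: check that each sequence pair directory in the output has object_masks.pkl.
from pathlib import Path

output_dir = Path("output/scenediff_benchmark")
missing = [d.name for d in sorted(output_dir.iterdir())
           if d.is_dir() and not (d / "object_masks.pkl").exists()]
print(f"{len(missing)} sequence pairs missing object_masks.pkl:", missing[:5])
```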
We thank the authors of these repositories for their great work:
- Segment-Anything and LangSplat for region segmentation
- Pi3 for geometry estimation
- DINOv3 for appearance feature extraction
This project is released under the MIT License. See LICENSE for details.
