Yuqun Wu
·
Chih-hao Lin
·
Henry Che
·
Aditi Tiwari
·
Chuhang Zou
·
Shenlong Wang
·
Derek Hoiem
This repository contains the code for the paper SceneDiff: A Benchmark and Method for Multiview Object Change Detection. We investigate the problem of identifying objects that have changed between two captures of the same scene taken at different times, introducing the first object-level multiview change detection benchmark and a new training-free method.
Download the SceneDiff benchmark dataset from 🤗 Hugging Face.
mkdir data && cd data
wget https://huggingface.co/datasets/yuqun/SceneDiff/resolve/main/scenediff_benchmark.zip
unzip scenediff_benchmark.zip

scenediff_benchmark/
├── data/ # 350 sequence pairs
│ ├── sequence_pair_1/
│ │ ├── original_video1.mp4 # Raw video before change
│ │ ├── original_video2.mp4 # Raw video after change
│ │ ├── video1.mp4 # Video with annotation mask (before)
│ │ ├── video2.mp4 # Video with annotation mask (after)
│ │ ├── segments.pkl # Dense segmentation masks for evaluation
│ │ └── metadata.json # Sequence metadata
│ ├── sequence_pair_2/
│ │ └── ...
│ └── ...
├── splits/ # Val/Test splits
│ ├── val_split.json
│ └── test_split.json
└── vis/ # Visualization tools
├── visualizer.py # Flask-based web viewer
├── requirements.txt
└── templates/
About segments.pkl: See the detailed description here.
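For a quick look at one sequence pair without the web viewer, the files can be loaded directly. The snippet below is only a minimal sketch: it assumes segments.pkl is a standard Python pickle and metadata.json is plain JSON; see the linked description for the exact mask schema.

```python
# Minimal sketch: inspect one sequence pair of the benchmark.
# Assumes segments.pkl is a standard pickle and metadata.json is plain JSON.
import json
import pickle
from pathlib import Path

pair_dir = Path("data/scenediff_benchmark/data/sequence_pair_1")

with open(pair_dir / "metadata.json") as f:
    metadata = json.load(f)
with open(pair_dir / "segments.pkl", "rb") as f:
    segments = pickle.load(f)

print("metadata keys:", list(metadata))
print("segments object type:", type(segments))
```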
Visualization: To browse the sequences with the Flask-based web viewer, run:
cd data/scenediff_benchmark/vis && pip install -r requirements.txt
python visualizer.py

We expect method predictions to have the following structure:
output_dir/
├── sequence_pair_1/
│ └── object_masks.pkl # Dense segmentations of changed objects (for evaluation)
├── sequence_pair_2/
└── ...
with object_masks.pkl following this structure:
object_masks = {
'H': int, # Image height
'W': int, # Image width
'video_1': { # Objects existing in video_1
'object_id_1': { # Integer ID for each detected object
'frame_id_1': { # Integer frame number
'mask': RLE_Mask, # Run-length encoded mask
'cost': float # Confidence score of the prediction
},
...
},
...
},
'video_2': { # Objects existing in video_2
'object_id_1': { # Integer ID for each detected object
'frame_id_1': { # Integer frame number
'mask': RLE_Mask, # Run-length encoded mask
'cost': float # Confidence score of the prediction
},
...
},
...
}
}
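As an illustration of the layout above, the sketch below writes an object_masks.pkl for a single sequence pair. It assumes masks are run-length encoded in COCO style via pycocotools; check the repository's utilities for the exact encoding expected by the evaluation script, and note that the object IDs, frame IDs, and cost value here are placeholders.

```python
# Sketch: write predictions in the expected object_masks.pkl layout.
# Assumption (not confirmed by the repo): masks are COCO-style RLE via pycocotools.
import pickle
import numpy as np
from pycocotools import mask as mask_utils

def encode_mask(binary_mask: np.ndarray) -> dict:
    """Run-length encode an (H, W) boolean mask into a COCO-style RLE dict."""
    rle = mask_utils.encode(np.asfortranarray(binary_mask.astype(np.uint8)))
    rle["counts"] = rle["counts"].decode("ascii")  # keep the pickle text-friendly
    return rle

H, W = 480, 640
dummy_mask = np.zeros((H, W), dtype=bool)
dummy_mask[100:200, 150:300] = True  # hypothetical changed-object region

object_masks = {
    "H": H,
    "W": W,
    "video_1": {
        1: {                      # object_id (int)
            0: {                  # frame_id (int)
                "mask": encode_mask(dummy_mask),
                "cost": 0.87,     # placeholder confidence score
            },
        },
    },
    "video_2": {},                # no changed objects detected in video_2
}

with open("output_dir/sequence_pair_1/object_masks.pkl", "wb") as f:
    pickle.dump(object_masks, f)
```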
Then the evaluation script can be run with:
python scripts/evaluate_multiview.py \
--pred_dir ${OUTPUT_DIR} \
--duplicate_match_threshold 2 \
--per_frame_duplicate_match_threshold 2 \
--splits val \
--sets varied \
--output_path ${OUTPUT_FILE_PATH} \
--visualize False

Arguments:
- --duplicate_match_threshold: Tolerance for duplicate objects across frames (default: 2)
- --per_frame_duplicate_match_threshold: Tolerance for duplicate regions per frame (default: 2)
- --splits: Choose from val, test, or all
- --sets: Choose from varied, kitchen, or All
- --visualize: Set to True to save visualization outputs
Output: The evaluation results will be saved to ${OUTPUT_FILE_PATH}
- Clone this repository with submodules:

  git clone --recursive https://github.com/yuqunw/scene_diff.git
  cd scene_diff
- Create conda environment and install dependencies:

  conda create -n scene_diff python=3.10 -y
  conda activate scene_diff
  # Install the PyTorch build matching your nvcc/CUDA version
  pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121
  pip install -r requirements.txt
  # Install torch-scatter
  pip install torch-scatter -f https://data.pyg.org/whl/torch-2.5.1+cu121.html
- Install submodules:

  # Install the segment-anything submodule
  cd submodules/segment-anything-langsplat-modified
  pip install -e .
  cd ../..
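After installation, a quick import check can confirm the environment is usable. This is only a sketch; the module names are assumed to match the packages installed above (in particular, the modified submodule is assumed to still install as segment_anything).

```python
# Sanity check for the environment (module names assumed from the installs above).
import torch
import torch_scatter
import segment_anything

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
```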
1. Download the Segment-Anything checkpoint:
bash checkpoints/download_sam_checkpoint.sh

2. Configure DINOv3 checkpoint:
The DINOv3 checkpoint is downloaded automatically on first use once the checkpoint URL has been filled in. To set it up:
- Visit the DINOv3 downloads page to apply for checkpoint access
- Right-click on dinov3_vith16plus_pretrain_lvd1689m-7c1da9a5.pth and copy the download link
- Update the URL in configs/scenediff_config.yml:

  models:
    dinov3:
      weight_url: "<paste_your_copied_url_here>"
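Before the first run, a short check can confirm the URL has actually been filled in. This is a sketch assuming PyYAML and the key layout shown above.

```python
# Sketch: verify models.dinov3.weight_url is set in the config (assumes PyYAML).
import yaml

with open("configs/scenediff_config.yml") as f:
    cfg = yaml.safe_load(f)

url = cfg["models"]["dinov3"]["weight_url"]
assert url and "<paste" not in url, "Fill in models.dinov3.weight_url first"
print("DINOv3 weights will be downloaded from:", url)
```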
Run change detection on any two videos:
python scripts/demo.py \
--config configs/scenediff_config.yml \
--video1 path/to/video1.mp4 \
--video2 path/to/video2.mp4 \
--output output/demo

Output: The script generates point cloud visualizations, including score maps and object segmentations for both videos, in the specified output directory.
Parameters: You can modify parameters in configs/scenediff_config.yml. If the automatic threshold for change detection doesn't work well (the score maps look correct but there are too many or too few detections), you can manually set detection.object_threshold in the config file.
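If you prefer to adjust the threshold programmatically (for example when sweeping values), the config can be patched with a short script. This is only a sketch assuming PyYAML and a top-level detection section; the value 0.5 is a placeholder, not a recommended setting.

```python
# Sketch: set detection.object_threshold manually (assumes PyYAML; 0.5 is a placeholder).
import yaml

config_path = "configs/scenediff_config.yml"
with open(config_path) as f:
    cfg = yaml.safe_load(f)

cfg.setdefault("detection", {})["object_threshold"] = 0.5
with open(config_path, "w") as f:
    yaml.safe_dump(cfg, f, sort_keys=False)
```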
Run inference on all sequences in the benchmark:
python scripts/predict_multiview.py \
--config configs/scenediff_config.yml \
--splits val \
--sets varied \
--output_dir output/scenediff_benchmark

Arguments:
- --splits: Choose from val, test, or all
- --sets: Choose from varied, kitchen, or All
- --output_dir: Directory to save predictions
- Modify more arguments in the config file
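Before running the evaluation script, it can help to confirm that every predicted sequence pair directory contains an object_masks.pkl in the layout described above. A minimal sketch:

```python
# Sketch: check that each sequence pair directory in the output has object_masks.pkl.
from pathlib import Path

output_dir = Path("output/scenediff_benchmark")
missing = [d.name for d in sorted(output_dir.iterdir())
           if d.is_dir() and not (d / "object_masks.pkl").exists()]
print(f"{len(missing)} sequence pairs missing object_masks.pkl:", missing[:5])
```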
We thank the authors of these repositories for their great work:
- Segment-Anything and LangSplat for region segmentation
- Pi3 for geometry estimation
- DINOv3 for appearance feature extraction
This project is released under the MIT License. See LICENSE for details.
