Promising or Elusive? Unsupervised Object Segmentation
from Real-world Single Images
NeurIPS 2022
Abstract
In this paper, we study the problem of unsupervised object segmentation from single images. We do not introduce a new algorithm, but systematically investigate the effectiveness of existing unsupervised models on challenging real-world images. We first introduce four complexity factors to quantitatively measure the distributions of object- and scene-level biases in appearance and geometry for datasets with human annotations. With the aid of these factors, we empirically find that, not surprisingly, existing unsupervised models catastrophically fail to segment generic objects in real-world images, although they can easily achieve excellent performance on numerous simple synthetic datasets, due to the vast gap in objectness biases between synthetic and real images. By conducting extensive experiments on multiple groups of ablated real-world datasets, we ultimately find that the key factors underlying the colossal failure of existing unsupervised models on real-world images are the challenging distributions of object- and scene-level biases in appearance and geometry. Because of this, the inductive biases introduced in existing unsupervised models can hardly capture the diverse object distributions. Our research results suggest that future work should exploit more explicit objectness biases in the network design.
Unsupervised Segmentation Performance
[Figure: unsupervised segmentation results on synthetic datasets (training / testing) versus real-world datasets (training / testing)]
Complexity Factors
Object Color Gradient
Given an RGB image, we first convert it to grayscale, then compute its horizontal and vertical gradients. To avoid the influence of the background, we discard gradients on the object boundary. The final score is the average of the remaining inner gradients.
Object Shape Concavity
Given the binary mask of an object, we first find the smallest convex polygon (convex hull) that encloses the object. The factor value is computed as 1 - (object area / convex hull area).
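For concreteness, the two object-level factors could be computed roughly as follows with NumPy and OpenCV, assuming a per-object binary mask is available; the function names and the boundary-erosion kernel are illustrative choices rather than the authors' reference code.

import cv2
import numpy as np

def object_color_gradient(image_rgb, mask):
    # Average grayscale gradient magnitude inside the object, excluding the boundary.
    gray = cv2.cvtColor(image_rgb, cv2.COLOR_RGB2GRAY).astype(np.float32) / 255.0
    gy, gx = np.gradient(gray)                      # vertical / horizontal gradients
    grad = np.sqrt(gx ** 2 + gy ** 2)
    # Erode the mask so gradients on the object boundary are ignored.
    inner = cv2.erode(mask.astype(np.uint8), np.ones((3, 3), np.uint8)).astype(bool)
    return float(grad[inner].mean()) if inner.any() else 0.0

def object_shape_concavity(mask):
    # 1 - (object area / convex hull area) for a binary object mask.
    mask_u8 = mask.astype(np.uint8)
    contours, _ = cv2.findContours(mask_u8, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    hull = cv2.convexHull(np.concatenate(contours, axis=0))
    hull_mask = np.zeros_like(mask_u8)
    cv2.fillPoly(hull_mask, [hull], 1)
    return 1.0 - mask_u8.sum() / max(hull_mask.sum(), 1)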
Inter-object Color Similarity
Given an image consisting of multiple objects, we first calculate the average RGB color of each object, then average the Euclidean distance in RGB space between every pair of objects. The factor value is computed as 1 - normalized average distance.
Inter-object Shape Variation
We calculate the diagonal length of the bounding box of each object. The averaged variation of these diagonal lengths is normalized to give the final factor value.
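Similarly, the two scene-level factors could be sketched as below, given the image and a list of per-object masks; the exact normalization constants are assumptions, since several choices are possible.

import itertools
import numpy as np

def inter_object_color_similarity(image_rgb, masks):
    # 1 - normalized average pairwise distance between mean object colors in RGB space.
    mean_colors = [image_rgb[m].mean(axis=0) for m in masks]
    dists = [np.linalg.norm(c1 - c2) for c1, c2 in itertools.combinations(mean_colors, 2)]
    max_dist = np.sqrt(3) * 255.0                   # largest possible distance in RGB space
    return 1.0 - float(np.mean(dists)) / max_dist

def inter_object_shape_variation(masks):
    # Normalized variation of bounding-box diagonal lengths across objects.
    diagonals = []
    for m in masks:
        ys, xs = np.nonzero(m)
        h, w = ys.max() - ys.min() + 1, xs.max() - xs.min() + 1
        diagonals.append(np.sqrt(h ** 2 + w ** 2))
    diagonals = np.asarray(diagonals, dtype=np.float32)
    # Spread relative to the mean diagonal (one of several plausible normalizations).
    return float(diagonals.std() / (diagonals.mean() + 1e-8))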
Ablations
C: Single Color Ablation
Remove the color gradient inside each object such that: Object Color Gradient is effectively reduced; Inter-object Color Similarity remains similar.
S: Convex Shape Ablation
Make the shape of each object convex such that: Object Shape Concavity is effectively reduced; Inter-object Shape Variation remains similar.
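The two object-level ablations amount to simple per-object edits. A minimal sketch, assuming ground-truth masks: fill each object with its mean color (C), and replace each mask with its convex hull (S); how the newly covered pixels are colored for S is left unspecified here.

import cv2
import numpy as np

def single_color_ablation(image_rgb, masks):
    # C: paint every object with its mean color, removing color gradients inside objects.
    out = image_rgb.copy()
    for m in masks:
        out[m] = image_rgb[m].mean(axis=0).astype(np.uint8)
    return out

def convex_shape_ablation(masks):
    # S: replace each object mask with its convex hull, removing shape concavity.
    new_masks = []
    for m in masks:
        m_u8 = m.astype(np.uint8)
        contours, _ = cv2.findContours(m_u8, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        hull = cv2.convexHull(np.concatenate(contours, axis=0))
        hull_mask = np.zeros_like(m_u8)
        cv2.fillPoly(hull_mask, [hull], 1)
        new_masks.append(hull_mask.astype(bool))
    return new_masks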
T: Texture Replaced Ablation
Replace the texture of every object with a distinctive texture such that: Object Color Gradient remains similar; Inter-object Color Similarity is effectively reduced.
U: Uniform Scale Ablation
Rescale all objects to a uniform size such that: Object Shape Concavity remains similar; Inter-object Shape Variation is effectively reduced.
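Of the two scene-level ablations, the uniform-scale one is the easier to illustrate. Below is a rough sketch that rescales every object to a common bounding-box diagonal and pastes it back at its original center; the target size, background handling, and compositing order are all assumptions, and the texture-replaced ablation is omitted since it needs an external set of distinctive textures.

import cv2
import numpy as np

def uniform_scale_ablation(image_rgb, masks, target_diag=100.0):
    # U: re-render every object at a common bounding-box diagonal on a blank canvas.
    canvas = np.zeros_like(image_rgb)
    for m in masks:
        ys, xs = np.nonzero(m)
        y0, y1, x0, x1 = ys.min(), ys.max() + 1, xs.min(), xs.max() + 1
        crop_img = image_rgb[y0:y1, x0:x1]
        crop_msk = m[y0:y1, x0:x1].astype(np.uint8)
        scale = target_diag / np.sqrt((y1 - y0) ** 2 + (x1 - x0) ** 2)
        new_w = max(int((x1 - x0) * scale), 1)
        new_h = max(int((y1 - y0) * scale), 1)
        crop_img = cv2.resize(crop_img, (new_w, new_h), interpolation=cv2.INTER_LINEAR)
        crop_msk = cv2.resize(crop_msk, (new_w, new_h), interpolation=cv2.INTER_NEAREST)
        # Re-center the rescaled object at its original bounding-box center.
        cy, cx = (y0 + y1) // 2, (x0 + x1) // 2
        ty, tx = max(cy - new_h // 2, 0), max(cx - new_w // 2, 0)
        ty2, tx2 = min(ty + new_h, canvas.shape[0]), min(tx + new_w, canvas.shape[1])
        region = crop_msk[: ty2 - ty, : tx2 - tx].astype(bool)
        canvas[ty:ty2, tx:tx2][region] = crop_img[: ty2 - ty, : tx2 - tx][region]
    return canvas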
Qualitative Results from Ablation
Full Ablation
[Figure: qualitative segmentation results under the full ablation on YCB, ScanNet, and COCO]
Object-level Ablation
[Figure: qualitative results under the object-level ablations (C: Single Color, S: Convex Shape)]
Scene-level Ablation
[Figure: qualitative results under the scene-level ablations (T: Texture Replaced, U: Uniform Scale)]
Quantitative Results from Ablation
Complexity Factor Distributions
Quantitative Segmentation Performance
Video
Short Demo (40s)
Long Presentation (11 min)
BibTeX
If you find this work useful for your research, please cite:
@inproceedings{yang2022,
  title={{Promising or Elusive? Unsupervised Object Segmentation from Real-world Single Images}},
  author={Yang, Yafei and Yang, Bo},
  booktitle={NeurIPS},
  year={2022},
}
© This page takes inspiration from https://imagine.enpc.fr/~monniert/DTIClustering/.