
Demo


Dust the Blinds:

Clean the Table:

Semantic segmentation

Training Pipeline

Encoding the scene photometry and geometry

VL-Fields jointly encodes the geometry and appearance of a scene, along with visual-language features. This allows us to rely solely on the neural field for re-rendering the input video, without needing a stored point cloud (as in CLIP-Fields).
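The joint encoding described above can be sketched as a single field that maps a 3D point to density, colour, and a vision-language feature from one shared trunk. This is only an illustrative sketch with hypothetical layer sizes and a random, untrained MLP; the actual VL-Fields architecture may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a 64-unit trunk and a 512-d CLIP-style feature.
HIDDEN, FEAT_DIM = 64, 512

W1 = rng.normal(0, 0.1, (3, HIDDEN))            # shared trunk weights
W_sigma = rng.normal(0, 0.1, (HIDDEN, 1))       # density head
W_rgb = rng.normal(0, 0.1, (HIDDEN, 3))         # colour head
W_feat = rng.normal(0, 0.1, (HIDDEN, FEAT_DIM)) # language-feature head

def query_field(xyz):
    """Map 3D points to (density, RGB, vision-language feature)."""
    h = np.tanh(xyz @ W1)                  # shared trunk activation
    sigma = np.log1p(np.exp(h @ W_sigma))  # softplus -> non-negative density
    rgb = 1 / (1 + np.exp(-(h @ W_rgb)))   # sigmoid -> colour in [0, 1]
    feat = h @ W_feat                      # unnormalised VL feature
    return sigma, rgb, feat

pts = rng.uniform(-1, 1, (4, 3))           # batch of 4 sample points
sigma, rgb, feat = query_field(pts)
print(sigma.shape, rgb.shape, feat.shape)  # (4, 1) (4, 3) (4, 512)
```

Because all three outputs come from the same trunk, one query suffices for rendering and for language grounding, which is what removes the need for a separate stored point cloud.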


Related Work

There is a wealth of excellent work on grounding language in neural implicit representations.

DFF introduced the idea of distilling knowledge from large vision-language models in order to ground language in neural fields.
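The core of this distillation idea is a per-pixel loss that pulls features rendered from the field toward "teacher" features produced by a frozen vision-language model. A minimal sketch, assuming an L2 loss on normalised features (the concrete loss used by DFF or VL-Fields may differ):

```python
import numpy as np

def distillation_loss(rendered_feat, teacher_feat):
    """Mean L2 distance between per-pixel features rendered from the
    field and teacher features from a frozen vision-language model."""
    # Normalise both sides so the loss compares feature directions,
    # since CLIP-style embeddings are matched by cosine similarity.
    r = rendered_feat / np.linalg.norm(rendered_feat, axis=-1, keepdims=True)
    t = teacher_feat / np.linalg.norm(teacher_feat, axis=-1, keepdims=True)
    return float(np.mean(np.sum((r - t) ** 2, axis=-1)))

rng = np.random.default_rng(1)
f = rng.normal(size=(8, 512))                  # 8 pixels, 512-d features
loss_same = distillation_loss(f, f)            # identical features -> 0.0
loss_diff = distillation_loss(f, rng.normal(size=(8, 512)))
print(loss_same, loss_diff > 0)
```

Minimising this loss over many views teaches the field to emit features that agree with the teacher, so text queries embedded by the same model can then be matched against the field directly.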

CLIP-Fields demonstrated how such models can be used in mobile robotics, enabling robots to be commanded with natural-language queries.

More recently, LERF addressed the limitations of relying on fine-tuned VL models (e.g., LSeg) by extracting vision-language features directly from CLIP.

BibTeX

@article{tsagkas2023vlfields,
  title   =  {VL-Fields: Towards Language-Grounded Neural Implicit Spatial Representations},
  author  =  {Tsagkas, Nikolaos and Mac Aodha, Oisin and Lu, Chris Xiaoxuan},
  journal =  {arXiv preprint arXiv:2305.12427},
  year    =  {2023}
}

This website is a modified version of nerfies.