PLA: Language-Driven Open-Vocabulary 3D Scene Understanding
CVPR 2023
* Indicates Equal Contribution, † Indicates Corresponding Author
[Teaser figure: PLA recognizes novel concepts beyond the annotated label space, e.g., "vending machine", "piano", "stove", "working office", and "library"; hierarchical novel concepts such as "bathroom" and "kitchen"; and fine-grained novel concepts such as "monitor" and "blackboard".]
Abstract
Open-vocabulary scene understanding aims to localize and recognize unseen categories beyond the annotated label space. The recent breakthrough of 2D open-vocabulary perception is largely driven by Internet-scale paired image-text data with rich vocabulary concepts. However, this success cannot be directly transferred to 3D scenarios due to the inaccessibility of large-scale 3D-text pairs. To this end, we propose to distill knowledge encoded in pre-trained vision-language (VL) foundation models through captioning multi-view images from 3D, which allows explicitly associating 3D and semantic-rich captions. Further, to facilitate coarse-to-fine visual-semantic representation learning from captions, we design hierarchical 3D-caption pairs, leveraging geometric constraints between 3D scenes and multi-view images. Finally, by employing contrastive learning, the model learns language-aware embeddings that connect 3D and text for open-vocabulary tasks. Our method not only remarkably outperforms baseline methods by 25.8% ~ 44.7% hIoU and 14.5% ~ 50.4% hAP50 on open-vocabulary semantic and instance segmentation, but also shows robust transferability on challenging zero-shot domain transfer tasks.
Approach
Image-bridged Point-language Association
We utilize multi-view images of a 3D scene as a bridge to access the knowledge encoded in vision-language models. Text descriptions are first generated by a powerful image-captioning model and are then associated with sets of points in the 3D scene via geometric constraints between the images and the 3D point cloud. We present hierarchical scene-level, view-level and entity-level point-language association schemes to elicit coarse-to-fine, semantically rich language supervision, as sketched below.
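To make the geometric association concrete, here is a minimal sketch (not the released code) of view-level association: a point is paired with an image's caption if it projects inside that image's frustum. The function and argument names (view_level_association, world2cam, etc.) are illustrative assumptions, and occlusion handling via depth maps is omitted for brevity.

import torch

def view_level_association(points, K, world2cam, img_h, img_w):
    """Return a boolean mask of the scene points visible in one camera view.

    points:    (N, 3) scene points in world coordinates
    K:         (3, 3) camera intrinsic matrix
    world2cam: (4, 4) world-to-camera extrinsic matrix
    """
    ones = torch.ones(points.shape[0], 1, device=points.device, dtype=points.dtype)
    cam = (world2cam @ torch.cat([points, ones], dim=1).T).T[:, :3]  # camera coords
    in_front = cam[:, 2] > 0                         # discard points behind the camera
    uv = (K @ cam.T).T
    uv = uv[:, :2] / uv[:, 2:3].clamp(min=1e-6)      # perspective divide -> pixel coords
    in_img = (uv[:, 0] >= 0) & (uv[:, 0] < img_w) & (uv[:, 1] >= 0) & (uv[:, 1] < img_h)
    return in_front & in_img                         # these points get the view's caption

Roughly speaking, scene-level pairs then take the union of such masks over all views of a scene, while entity-level pairs use intersections and differences of the masks of overlapping views to isolate finer point sets.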
Language-driven 3D Scene Understanding Framework
Different from a closed-set network, the learnable semantic head is replaced by category embeddings encoded from category names by a text encoder. A binary head rectifies the semantic scores, conditioning on the predicted base/novel probability, and an instance head is tailored to instance segmentation. Most importantly, to endow the model with a rich semantic space and improve its open-vocabulary capability, we supervise point embeddings with caption embeddings based on the point-language association, as sketched below.
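As an illustration of the language-driven heads, here is a minimal sketch assuming per-point features from a 3D backbone and a frozen text encoder such as CLIP's; the names, pooling choice, and temperature value are assumptions for illustration, not the paper's exact formulation.

import torch
import torch.nn.functional as F

def semantic_logits(point_feats, category_embeds, tau=0.07):
    """Score points against text embeddings of category names.

    point_feats:     (N, D) per-point embeddings from the 3D backbone
    category_embeds: (C, D) text-encoder embeddings of the category names
    """
    p = F.normalize(point_feats, dim=-1)
    t = F.normalize(category_embeds, dim=-1)
    return p @ t.T / tau    # (N, C) cosine-similarity logits over any label space

def caption_contrastive_loss(point_feats, caption_embeds, assoc_masks, tau=0.07):
    """Pull pooled point features toward their associated caption embeddings.

    caption_embeds: (M, D) text-encoder embeddings of the M generated captions
    assoc_masks:    list of M boolean (N,) masks from point-language association;
                    each mask is assumed to select at least one point
    """
    pooled = torch.stack([point_feats[m].mean(dim=0) for m in assoc_masks])  # (M, D)
    logits = F.normalize(pooled, dim=-1) @ F.normalize(caption_embeds, dim=-1).T / tau
    target = torch.arange(len(assoc_masks), device=point_feats.device)
    return F.cross_entropy(logits, target)  # match each point set to its own caption

Because the classifier is just a dot product with text embeddings, the label space at inference can differ from training: embedding new category names immediately yields scores for unseen classes.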
Visualizations
Hierarchical Point-language Association
BibTeX
@inproceedings{ding2022language,
  title={PLA: Language-Driven Open-Vocabulary 3D Scene Understanding},
  author={Ding, Runyu and Yang, Jihan and Xue, Chuhui and Zhang, Wenqing and Bai, Song and Qi, Xiaojuan},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2023}
}