


Extracting Object Attributes with Prefixes

Training FlexCap on a large dataset leads to an emergent capability: the model can extract desired information for a specific image region using input prefixes. We present below some examples of attributes that FlexCap can generate.



Human Action

The prefix for extracting actions: The person is _____

Object Use

The prefix for extracting the function of an object: This is used for _____

Text

The prefix for performing OCR: The sign says _____

Book Title

The prefix for extracting book titles from cover pages: This book is called _____

Author

The prefix for extracting authors from cover pages: Written by _____

Photo Location

The prefix for extracting the location of a photo: The photo was taken _____

Noteworthy

The prefix for extracting noteworthy aspects of an image: Notice _____

Object Material

The prefix for extracting material: It is made of _____

Object Color

The prefix for extracting color: The color is _____
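The prefixes listed above can be collected into a lookup table and paired with a bounding box to form a query. The following is a minimal sketch, not FlexCap's actual interface: `ATTRIBUTE_PREFIXES`, `RegionQuery`, `build_region_query`, and the `(x_min, y_min, x_max, y_max)` box convention are hypothetical names and assumptions chosen for illustration. Only the prefix strings themselves come from this page.

```python
from dataclasses import dataclass

# Prefixes copied from the list above, with the trailing blank removed.
# The dictionary keys are hypothetical labels chosen for this sketch.
ATTRIBUTE_PREFIXES = {
    "human_action": "The person is",
    "object_use": "This is used for",
    "text": "The sign says",
    "book_title": "This book is called",
    "author": "Written by",
    "photo_location": "The photo was taken",
    "noteworthy": "Notice",
    "object_material": "It is made of",
    "object_color": "The color is",
}


@dataclass(frozen=True)
class RegionQuery:
    """Hypothetical container: a box plus the text the caption should start with."""
    box: tuple  # (x_min, y_min, x_max, y_max); the coordinate convention is assumed
    prefix: str


def build_region_query(box, attribute):
    """Pair a bounding box with the prefix for the requested attribute."""
    if attribute not in ATTRIBUTE_PREFIXES:
        raise KeyError(f"unknown attribute: {attribute!r}")
    x_min, y_min, x_max, y_max = box
    if x_min >= x_max or y_min >= y_max:
        raise ValueError("box must have positive width and height")
    return RegionQuery(box=tuple(box), prefix=ATTRIBUTE_PREFIXES[attribute])


# Example: ask what the object in a given region is made of.
query = build_region_query((40, 60, 200, 320), "object_material")
print(query.prefix)
```

A captioning model conditioned on `query.box` would then be asked to continue `query.prefix`; how the model consumes the box and prefix is outside the scope of this sketch.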




FlexCapLLM

Rich localized captions generated by FlexCap can be easily passed onto Large Language Models (LLMs) to enable zero-shot visual question answering.
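One way to picture this hand-off is to serialize each box and its caption into a plain-text prompt for the LLM. The sketch below is an illustration under stated assumptions: the function name `format_vqa_prompt`, the prompt template, and the example captions are all hypothetical and are not taken from the FlexCapLLM system.

```python
def format_vqa_prompt(region_captions, question):
    """Turn (box, caption) pairs and a question into a text prompt for an LLM.

    The template is an assumption made for this sketch, not the one used
    by FlexCapLLM.
    """
    lines = ["Objects in the image (box as x_min, y_min, x_max, y_max):"]
    for box, caption in region_captions:
        coords = ", ".join(str(int(v)) for v in box)
        lines.append(f"- [{coords}] {caption}")
    lines.append(f"Question: {question}")
    lines.append("Answer:")
    return "\n".join(lines)


# Made-up captions standing in for localized captioner output.
regions = [
    ((10, 20, 110, 220), "a red ceramic mug on a wooden table"),
    ((150, 30, 300, 200), "an open laptop showing a text editor"),
]
print(format_vqa_prompt(regions, "What is next to the laptop?"))
```

The resulting string can be sent to any text-only LLM, which is what makes the approach zero-shot: the language model never sees the pixels, only the localized descriptions.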

Here we present some of the results of FlexCapLLM. Note: in the images below, "FlexCap" refers to the system "FlexCapLLM".




BibTeX
@inproceedings{dwibedi2024flexcap,
  title={FlexCap: Describe Anything in Images in Controllable Detail},
  author={Debidatta Dwibedi and Vidhi Jain and Jonathan Tompson and Andrew Zisserman and Yusuf Aytar},
  booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
  year={2024},
  url={https://openreview.net/forum?id=P5dEZeECGu}
}