📑 Research Projects


  • DeeptraceReward
    Learning Human-Perceived Fakeness in AI-Generated Videos via Multimodal LLMs

    Xingyu Fu, Siyi Liu, Yinuo Xu, Pan Lu, Guangqiuse Hu, Tianbo Yang, Taran Anantasagar, Christopher Shen, Yikai Mao, Yuanzhe Liu, Keyush Shah, Chung Un Lee, Yejin Choi, James Zou, Dan Roth*, Chris Callison-Burch*

    arXiv, Sep 2025

    [paper] [website] [code] [dataset] [twitter]

  • refocus
    ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding

    Xingyu Fu, Minqian Liu, Zhengyuan Yang, John Corring, Yijuan Lu, Jianwei Yang, Dan Roth, Dinei Florencio, Cha Zhang

    ICML 2025

    [paper] [website] [code] [dataset] [twitter]


  • scienceT2I
    Science-T2I: Addressing Scientific Illusions in Image Synthesis

    Jialuo Li, Wenhao Chai, Xingyu Fu, Haiyang Xu, Saining Xie

    CVPR 2025

    [paper] [website] [code] [dataset]


  • muirbench
    MUIRBENCH: A Comprehensive Benchmark for Robust Multi-image Understanding

    Fei Wang*, Xingyu Fu*, James Y. Huang, Zekun Li, Qin Liu, Xiaogeng Liu, Mingyu Derek Ma, Nan Xu, Wenxuan Zhou, Kai Zhang, Tianyi Lorena Yan, Wenjie Jacky Mo, Hsiang-Hui Liu, Pan Lu, Chunyuan Li, Chaowei Xiao, Kai-Wei Chang, Dan Roth, Sheng Zhang, Hoifung Poon, Muhao Chen

    ICLR 2025

    [paper] [website] [code] [dataset] [twitter]


  • sketchpad
    Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models

    Yushi Hu*, Weijia Shi*, Xingyu Fu, Dan Roth, Mari Ostendorf, Luke Zettlemoyer, Noah A Smith, Ranjay Krishna

    NeurIPS 2024

    [paper] [website] [code] [twitter]


  • commonsenseT2I
    Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense?

    Xingyu Fu, Muyu He, Yujie Lu, William Yang Wang, Dan Roth

    COLM 2024

    [paper] [website] [code] [dataset] [twitter]


  • blink
    BLINK: Multimodal Large Language Models Can See but Not Perceive

    Xingyu Fu*, Yushi Hu*, Bangzheng Li, Yu Feng, Haoyu Wang, Xudong Lin, Dan Roth, Noah A. Smith, Wei-Chiu Ma†, Ranjay Krishna†

    ECCV 2024, Spotlight of cVinW@CVPR 2024, 36K total downloads.

    [paper] [website] [code] [dataset] [eval] [twitter] [Paper of the day]


  • deceptive
    Deceptive Semantic Shortcuts on Reasoning Chains: How Far Can Models Go without Hallucination?

    Bangzheng Li, Ben Zhou, Fei Wang, Xingyu Fu, Dan Roth, Muhao Chen

    NAACL. 2024.

    [paper] [website] [code] [dataset]


  • imagenhub
    ImagenHub: Standardizing the evaluation of conditional image generation models

    Max Ku, Tianle Li, Kai Zhang, Yujie Lu, Xingyu Fu, Wenwen Zhuang, Wenhu Chen

    ICLR. 2024.

    [paper] [website] [code] [dataset] [visualization]



  • generate-then-select
    Generate then Select: Open-ended Visual Question Answering Guided by World Knowledge

    Xingyu Fu, Sheng Zhang, Gukyeong Kwon, Pramuditha Perera, Henghui Zhu, Yuhao Zhang, Alexander Hanbo Li, William Yang Wang, Zhiguo Wang, Vittorio Castelli, Patrick Ng, Dan Roth, Bing Xiang

    ACL Findings. 2023.

    [paper] [website] [code]


  • dclub
    Dynamic Clue Bottlenecks: Towards Interpretable-by-Design Visual Question Answering

    Xingyu Fu, Ben Zhou, Sihao Chen, Mark Yatskar, Dan Roth

    arXiv. 2023.

    [paper]


  • there's
    There's a Time and Place for Reasoning Beyond the Image

    Xingyu Fu, Ben Zhou, Ishaan Chandratreya, Carl Vondrick, Dan Roth

    ACL (Oral). 2022.

    [paper] [code]


  • there's
    Design Challenges in Low-resource Cross-lingual Entity Linking

    Xingyu Fu*, Weijia Shi*, Xiaodong Yu, Zian Zhao, Dan Roth

    EMNLP. 2020.

    [paper] [code]


  • arabic
    Constrained sequence-to-sequence Semitic root extraction for enriching word embeddings

    Ahmed El-Kishky*, Xingyu Fu*, Aseel Addawood, Nahil Sobh, Clare Voss, Jiawei Han

    WANLP @ ACL. 2019.

    [paper]


🎤 Invited Talks

  • [2024/09] : UPenn Clunch. Title: Better Evaluations for Generative Multimodal Models.
  • [2024/06] : Microsoft Azure AI, AI reading group. Title: BLINK: Multimodal Large Language Models Can See but Not Perceive.
  • [2023/07] : Amazon AWS Responsible AI Group, AI reading group. Title: Generate then Select: Open-ended Visual Question Answering Guided by World Knowledge.

💼 Work Experience

  • [Summer 2024], Research Intern @ Microsoft, Seattle WA
  • [Summer 2022], Research Intern @ AWS AI Labs, New York City NY
  • [Summer 2019], Research Intern @ Cogcomp from UPenn, Philadelphia PA
  • [Summer 2018], Research Assistant @ DMG from UIUC, Champaign IL

 