Shengqiong Wu
About Me
I am currently a fourth-year Ph.D. student at the NExT++ Research Center, advised by Prof. Tat-Seng Chua in the School of Computing at the National University of Singapore. Prior to this, I received both my M.S. and B.S. degrees from Wuhan University.
My research focuses on Large Vision-Language Foundation Models, with particular interest in advancing their capability, controllability, evaluability, and robust, reliable reasoning. I am also broadly interested in Natural Language Processing.
I welcome collaboration opportunities. Feel free to reach out if you are interested in my research.
Some of my representative work:
NExT-GPT:
The first unified any-to-any multimodal LLM, capable of understanding and generating across any modality or combination of modalities (e.g., text, image, video, audio). [PDF] [Github] [Huggingface] [Video] (ICML'24 Oral, selected as a Most Influential Paper by Paper Digest, WAIC Youth Outstanding Paper Award)
Any2Caption:
A state-of-the-art framework for controllable video generation from any condition, and the first to leverage MLLMs to interpret diverse inputs into dense, structured captions. [PDF] [Github] [Huggingface] [Video] (Preprint, 2025)
Setok:
The first to propose a general dynamic semantic-equivalent vision tokenizer, fundamentally alleviating the performance bottlenecks of existing MLLMs. [PDF] [Github] (ICLR'25)
USG:
The first to propose a Universal Scene Graph representation framework that unifies structured semantic scene graphs across modalities including images, text, videos, and 3D. [PDF] [Github] (CVPR'25 Highlight)
🔥 News 🔥
Publications
2025
2024
2023
2022
2021
Academic Services
Conference Reviewer: NeurIPS-23/24/25, ICLR-24/25, ICML-24/25, CVPR-24/25/26, ACM MM-23/24/25, IJCAI-23/24, AAAI-24/25/26, ACL-23/24, WSDM-23
Journal Reviewer: IJCV, TOMM, IEEE/ACM TALLIP, IPM, KBS, Neurocomputing
Invited Talks
Internships
Dec. 2024 - Now | Kuaishou, Remote | VGI Research Intern | Advisor: Weicai Ye, Xintao Wang
Nov. 2023 - Jun. 2024 | Kunlun Skywork AI, Singapore | 2050 Research Intern | Advisor: Shuicheng Yan, Director
Jul. 2019 - Aug. 2019 | China Merchants Bank, Wuhan, China | Information Technology Department Intern | Advisor: Hua Pan, General Manager
Jul. 2018 - Nov. 2018 | YITU, Shanghai, China | Solution Engineer Intern | Advisor: Ze Deng
Honors & Awards
Education
Skills and MISC