Shengqiong Wu
About Me
I am currently a fourth-year Ph.D. student at the NExT++ Research Center, advised by Prof. Tat-Seng Chua in the School of Computing at the National University of Singapore. Prior to this, I received both my M.S. and B.S. degrees from Wuhan University.
My research focuses on Large Vision-Language Foundation Models, with particular interest in advancing their capability, controllability, evaluability, and robust, reliable reasoning. I am also broadly interested in Natural Language Processing.
I welcome collaboration opportunities. Feel free to reach out if you are interested in my research.
Some of my representative work:
NExT-GPT:
The first unified any-to-any multimodal LLM, capable of understanding and generating across any modality or combination of modalities (e.g., text, image, video, audio). [PDF] [Github] [Huggingface] [Video] (ICML'24 Oral; selected as a Most Influential Paper by Paper Digest; WAIC Youth Outstanding Paper Award)
Any2Caption:
A SoTA framework for controllable video generation from arbitrary conditions, and the first to leverage MLLMs to interpret diverse inputs into dense, structured captions. [PDF] [Github] [Huggingface] [Video] (Preprint, 2025)
Setok:
The first general dynamic semantic-equivalent vision tokenizer, addressing a key performance bottleneck of existing MLLMs. [PDF] [Github] (ICLR'25)
USG:
The first Universal Scene Graph representation framework, unifying structured semantic scene graphs across modalities including images, text, videos, and 3D. [PDF] [Github] (CVPR'25 Highlight)
🔥News🔥
Publications
2025
2024
2023
2022
2021
Academic Services
Conference Reviewer: NeurIPS-23/24/25, ICLR-24/25, ICML-24/25, CVPR-24/25/26, ACM MM-23/24/25, IJCAI-23/24, AAAI-24/25/26, ACL-23/24, WSDM-23
Journal Reviewer: IJCV, TOMM, IEEE/ACM TALLIP, IPM, KBS, Neurocomputing
Invited Talks
Internships
Dec. 2024 - Now | Kuaishou, Remote | VGI Research Intern | Advisors: Weicai Ye, Xintao Wang
Nov. 2023 - Jun. 2024 | Kunlun Skywork AI, Singapore | 2050 Research Intern | Advisor: Shuicheng Yan, Director
Jul. 2019 - Aug. 2019 | China Merchants Bank, Wuhan, China | Information Technology Department Intern | Advisor: Hua Pan, General Manager
Jul. 2018 - Nov. 2018 | YITU, Shanghai, China | Solution Engineer Intern | Advisor: Ze Deng
Honors & Awards
Education
Skills and MISC