Shufan Li
Computer Science PhD Student at UCLA
I'm Shufan Li (jacklishufan@berkeley.edu), a second-year CS PhD student at UCLA advised by Prof. Aditya Grover. Previously, I worked as an undergraduate researcher at the Berkeley Artificial Intelligence Research Lab, advised by Prof. Trevor Darrell.
Original Works
Publications
- LaViDa: A Large Diffusion Model for Vision-Language Understanding
  Shufan Li, Konstantinos Kallidromitis, Hritik Bansal, Akash Gokul, Yusuke Kato, Kazuki Kozuka, Jason Kuen, Zhe Lin, Kai-Wei Chang, Aditya Grover
  Universal multimodal diffusion language model for vision-language understanding.
- Reflect-DiT: Inference-Time Scaling for Text-to-Image Diffusion Transformers via In-Context Reflection
  Shufan Li*, Konstantinos Kallidromitis*, Akash Gokul*, Arsh Koneru, Yusuke Kato, Kazuki Kozuka, Aditya Grover
  [Arxiv]. ICCV 2025.
  In-context reflection improves text-to-image generation.
- OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows
  Shufan Li*, Konstantinos Kallidromitis*, Akash Gokul*, Zichun Liao, Yusuke Kato, Kazuki Kozuka, Aditya Grover
  [Arxiv]. CVPR 2025.
  Universal multimodal diffusion generative model for image, audio, and text.
- InstructAny2Pix: Flexible Visual Editing via Multimodal Instruction Following
  Shufan Li*, Harkanwar Singh, Aditya Grover
  [Arxiv]. [Project Page]. NAACL Findings 2025.
  Image editing following multi-modal instructions.
- SegLLM: Multi-round Reasoning Segmentation
  XuDong Wang*, Shaolun Zhang*, Shufan Li*, Konstantinos Kallidromitis, Kehan Li, Yusuke Kato, Kazuki Kozuka, Trevor Darrell
  [Arxiv]. ICLR 2025.
  Multi-round interactive segmentation using large language models.
- Aligning Diffusion Models by Optimizing Human Utility
  Shufan Li*, Konstantinos Kallidromitis, Akash Gokul, Yusuke Kato, Kazuki Kozuka
  [Arxiv]. NeurIPS 2024.
  Aligning text-to-image models with human feedback.
- Mamba-ND: Selective State Space Modeling for Multi-Dimensional Data
  Shufan Li*, Harkanwar Singh, Aditya Grover
  [Arxiv]. ECCV 2024 (Oral).
  Modeling multi-dimensional data with linear complexity.
- xT: Nested Tokenization for Larger Context in Large Images
  Ritwik Gupta*, Shufan Li*, Tyler Zhu*, Jitendra Malik, Trevor Darrell, Karttikeya Mangalam
  [Arxiv]. ICML 2024.
  Long-context visual perception on large images.
- Hierarchical Open-vocabulary Universal Image Segmentation
  Xudong Wang*, Shufan Li*, Konstantinos Kallidromitis*, Yusuke Kato, Kazuki Kozuka, Trevor Darrell
  [Arxiv]. NeurIPS 2023.
  Segmenting arbitrary objects and object parts using text prompts.
- Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning
  Colorado Reed*, Shufan Li*, Ritwik Gupta*, Sarah Brockman, Christopher Funk, Brian Clipp, Kurt Keutzer, Salvatore Candido, Matt Uyttendaele, Trevor Darrell
  [Arxiv]. ICCV 2023 (Oral).
  Scale-aware representation learning with prior information of image scale.
Preprints
- PopAlign: Population-Level Alignment for Fair Text-to-Image Generation
  Shufan Li*, Harkanwar Singh, Aditya Grover
  [preprint]. 2024.
  Aligning diffusion models for fairness.
- Refine and Represent: Region-to-Object Representation Learning
  Akash Gokul*, Konstantinos Kallidromitis*, Shufan Li*, Yusuke Kato, Kazuki Kozuka, Trevor Darrell, Colorado J Reed
  [preprint]. 2022.
  Representation learning benefits from first learning from image regions and then from actual objects.
Technical Reports
- Mercury: Ultra-Fast Language Models Based on Diffusion
  Samar Khanna*, Siddhant Kharbanda*, Shufan Li*, Harshit Varma*, Eric Wang*, Sawyer Birnbaum, Ziyang Luo, Yanis Miraoui, Akash Palrecha, Stefano Ermon, Aditya Grover, Volodymyr Kuleshov
  [Arxiv]. 2025.
  Frontier diffusion language models (with Inception Labs).
- Interpreting Audiograms with Multi-stage Neural Networks
  Shufan Li*, Congxi Lu, Linkai Li, Jirong Duan, Xinping Fu, Haoshuai Zhou
  [Arxiv]. 2021.
  Accelerating hearing aid fitting using computer vision (with Orka Labs).
- Chart-RCNN: Efficient Line Chart Data Extraction from Camera Images
  Shufan Li*, Congxi Lu, Linkai Li, Haoshuai Zhou
  [preprint]. 2022.
  Line chart data extraction in the wild.