Long (Tony) Lian
I am an EECS PhD student at UC Berkeley and BAIR, advised by Prof. Adam Yala and Prof. Trevor Darrell. My research primarily focuses on developing LLMs/VLMs with reasoning capabilities through RL. I am also a research scientist intern at Meta GenAI with Victoria Lin and Yuandong Tian, working on reasoning LLMs. I interned with the Deep Imagination Research team at NVIDIA Research in 2024. I hold a B.A. in Computer Science from UC Berkeley, where I conducted research under the supervision of Prof. Stella Yu during my undergraduate studies. I also interned with Baidu’s Distributed Deep Learning team.
I am looking for full-time research scientist and member of technical staff positions in industry. Feel free to reach out to me via email!
Publications (*: equal contribution)
ThreadWeaver: Adaptive Threading for Efficient Parallel Reasoning in Language Models
Long Lian, Sida Wang, Felix Juefei-Xu, Tsu-Jui Fu, Xiuyu Li, Adam Yala, Trevor Darrell, Alane Suhr, Yuandong Tian, Xi Victoria Lin
ThreadWeaver is a framework for adaptive parallel reasoning that achieves accuracy on par with cutting-edge sequential reasoning models while significantly reducing inference latency. ThreadWeaver utilizes a two-stage parallel trajectory generator, trie-based training-inference co-design, and parallelization-aware reinforcement learning (P-GRPO).
Describe Anything: Detailed Localized Image and Video Captioning
Long Lian, Yifan Ding, Yunhao Ge, Sifei Liu, Hanzi Mao, Boyi Li, Marco Pavone, Ming-Yu Liu, Trevor Darrell, Adam Yala, Yin Cui
Describe Anything Model (DAM) generates detailed descriptions for user-specified regions in images and videos, marked by points, boxes, scribbles, or masks. We introduce DLC-Bench to evaluate such region-based descriptions.
Learning Adaptive Parallel Reasoning with Language Models
Jiayi Pan*, Xiuyu Li*, Long Lian*, Charlie Victor Snell, Yifei Zhou, Adam Yala, Trevor Darrell, Kurt Keutzer, Alane Suhr
We demonstrate a new dimension of scaling—parallel reasoning—by giving LLMs spawn() and join() functions to control when to reason sequentially or in parallel, enabling lower latency and improved scalability in complex reasoning tasks.
Atlas: Multi-Scale Attention Improves Long Context Image Modeling
Kumar Krishna Agrawal*, Long Lian*, Longchao Liu, Natalia Harguindeguy, Boyi Li, Alexander Bick, Maggie Chung, Trevor Darrell, Adam Yala
Atlas is a new neural network using Multi-Scale Attention for efficient cross-scale image modeling. Atlas achieves state-of-the-art accuracy with significantly better speed and compute efficiency on high-resolution image tasks.
CrossMAE: Rethinking Patch Dependence for Masked Autoencoders
Letian Fu*, Long Lian*, Renhao Wang, Baifeng Shi, Xudong Wang, Adam Yala†, Trevor Darrell†, Alexei A. Efros†, Ken Goldberg†
Transactions on Machine Learning Research (TMLR), 2025
Independent partial patch reconstruction facilitates efficient representation learning.
Unsupervised Universal Image Segmentation
Dantong Niu*, Xudong Wang*, Xinyang Han*, Long Lian, Roei Herzig, Trevor Darrell
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
A novel unified framework that performs instance, semantic, and panoptic segmentation without any supervision.
Self-correcting LLM-controlled Diffusion Models
Tsung-Han Wu*, Long Lian*, Joseph E Gonzalez, Boyi Li, Trevor Darrell
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
A text-to-image generation framework that iteratively corrects inaccuracies by integrating a self-correcting LLM with diffusion models, significantly improving image alignment with complex prompts and enabling image editing tasks.
LLM-grounded Video Diffusion Models
Long Lian*, Baifeng Shi*, Adam Yala, Trevor Darrell, Boyi Li
International Conference on Learning Representations (ICLR), 2024
Improving text-to-video generation by using a large language model to make plans before the actual video generation, achieving realistic videos that align with complex input prompts.
LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models
Long Lian, Boyi Li, Adam Yala, Trevor Darrell
Transactions on Machine Learning Research (TMLR), 2024 (Featured Certification)
Enhancing the prompt understanding capabilities of text-to-image diffusion models by using large language models for grounding.
Q-Diffusion: Quantizing Diffusion Models
Xiuyu Li, Yijiang Liu, Long Lian, Huanrui Yang, Zhen Dong, Daniel Kang, Shanghang Zhang, Kurt Keutzer
International Conference on Computer Vision (ICCV), 2023
Running Stable Diffusion with 4-bit weights at high generation quality for the first time.
Bootstrapping Objectness from Videos by Relaxed Common Fate and Visual Grouping
Long Lian, Zhirong Wu, Stella X. Yu
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023
Segmenting objects in videos without any human annotation at any training stage.
Unsupervised Selective Labeling for More Effective Semi-Supervised Learning
Xudong Wang*, Long Lian*, Stella X. Yu
European Conference on Computer Vision (ECCV), 2022
We focus on selecting the right data to label for semi-supervised learning, without using any label or task information. Our method demonstrates that a small amount of compute spent on careful labeled-data selection brings large gains in annotation efficiency.
Debiased Learning from Naturally Imbalanced Pseudo-Labels
Xudong Wang, Zhirong Wu, Long Lian, Stella X. Yu
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022
We propose a novel and effective debiased learning method based on counterfactual reasoning and adaptive margins to counteract the undesired effects of naturally imbalanced pseudo-labels.
Unsupervised Visual Attention and Invariance for Reinforcement Learning
Xudong Wang*, Long Lian*, Stella X. Yu
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021
Rather than training a universal RL policy invariant to train-test distribution shift, we propose an unsupervised visual attention and invariance method (VAI) that removes interfering, task-irrelevant factors, yielding an RL policy robust to distractions.
Long-tailed Recognition by Routing Diverse Distribution-aware Experts
Xudong Wang, Long Lian, Zhongqi Miao, Ziwei Liu, Stella X. Yu
International Conference on Learning Representations (ICLR), 2021 (Spotlight)
Leveraging the bias-variance tradeoff, we propose RoutIng Diverse Experts (RIDE), a universal classification framework for long-tailed data that improves both accuracy and inference speed.
Academic Services
Reviewer for CVPR/ECCV/ICCV/ICLR/ICML/NeurIPS/AAAI