VLIC: Vision-Language Models As Perceptual Judges
for Human-Aligned Image Compression
We observe that VLIC produces high-quality reconstructions, particularly for human-relevant details such as text and faces. We evaluate it with large-scale user studies and quantitative evaluations; please consult the paper for details.
Kyle Sargent1, Ruiqi Gao3, Philipp Henzler2, Charles Herrmann3, Aleksander Holynski3, Li Fei-Fei1, Jiajun Wu1, Jason Zhang2
1Stanford University 2Google Research 3Google DeepMind
Overview
Can we use vision-language models (VLMs) as judges to improve human-aligned image compression? Yes! In VLIC (Vision Language Models for Image Compression), we present a diffusion-based image compression system designed to be post-trained with binary VLM judgments. Rather than distilling the VLM judgments into a separate perceptual loss network, VLIC leverages existing techniques for post-training diffusion models with preferences. Please consult our paper for more details, and check out the visualizations on this page!
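To make the idea of post-training on binary judgments concrete, here is a minimal sketch of a pairwise preference loss in the style of DPO objectives for diffusion models. This page does not state which objective VLIC uses, so the loss form, the function names (`order_pair`, `preference_loss`), and the `beta` parameter are illustrative assumptions rather than the paper's method. The scalar inputs stand in for per-sample denoising errors under the model being trained and under a frozen reference model.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def order_pair(recon_a, recon_b, judge_prefers_a: bool):
    """Turn one binary judgment into a (preferred, rejected) pair.

    `judge_prefers_a` is a hypothetical stand-in for the VLM's answer to
    "which reconstruction is closer to the original image?".
    """
    return (recon_a, recon_b) if judge_prefers_a else (recon_b, recon_a)


def preference_loss(err_win_model: float, err_win_ref: float,
                    err_lose_model: float, err_lose_ref: float,
                    beta: float = 1.0) -> float:
    """DPO-style loss on one preference pair (assumed form, not from the page).

    Each argument is a denoising error. The loss falls when the trained model
    lowers its error on the preferred sample, relative to the reference
    model, by more than it does on the rejected sample.
    """
    win_gain = err_win_model - err_win_ref
    lose_gain = err_lose_model - err_lose_ref
    return -log_sigmoid(-beta * (win_gain - lose_gain))
```

When the trained model matches the reference on both samples, the loss equals log 2; it drops below that value as the model fits the preferred reconstruction better than the reference does.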
Gallery
[Interactive gallery: per-scene comparisons of VLIC against other methods.]
Acknowledgments
We thank Ben Poole, David Minnen, and Dina Bashkirova for helpful discussions.
BibTeX
@article{sargent2025vlic,
title = {VLIC: Vision-Language Models As Perceptual Judges for Human-Aligned Image Compression},
author = {Sargent, Kyle and Gao, Ruiqi and Henzler, Philipp and Herrmann, Charles and Holynski, Aleksander and Li, Fei-Fei and Wu, Jiajun and Zhang, Jason},
journal = {arXiv preprint arXiv:XXXX.XXXXX},
year = {2025}
}