GAS: Generative Avatar Synthesis from a Single Image
ICCV 2025
¹Carnegie Mellon University ²Shanghai AI Laboratory ³Stanford University
Abstract
We present a unified and generalizable framework for synthesizing view-consistent and temporally coherent avatars from a single image, addressing the challenging task of single-image avatar generation. Existing diffusion-based methods often condition on sparse human templates (e.g., depth or normal maps), which leads to multi-view and temporal inconsistencies due to the mismatch between these signals and the true appearance of the subject. Our approach bridges this gap by combining the reconstruction power of regression-based 3D human reconstruction with the generative capabilities of a diffusion model. First, an initial 3D human reconstructed by a generalizable NeRF provides dense conditioning that keeps the synthesis faithful to the reference appearance and structure. The geometry and appearance derived from this NeRF then serve as input to a video-based diffusion model. This integration is pivotal for enforcing both multi-view and temporal consistency throughout avatar generation. Empirical results underscore the strong generalization of our method, demonstrating its effectiveness on both in-domain and out-of-domain in-the-wild datasets.
Method
Starting from a single input image, GAS uses a generalizable human NeRF to map the subject into a canonical space, then reposes and renders the 3D NeRF model to extract detailed appearance cues (i.e., NeRF renderings). These are paired with geometry cues (i.e., SMPL normal maps) and fed into a video diffusion model. A switcher module disentangles the tasks, enabling the model to generate either multi-view consistent novel views or temporally coherent pose animations.
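To make the data flow concrete, here is a minimal Python sketch of the pipeline. It is an illustration under stated assumptions, not the released implementation: reconstruct_canonical_nerf, repose_and_render, video_diffusion, and the Task flag are hypothetical placeholders standing in for the generalizable human NeRF, the reposing/rendering stage, and the switcher-conditioned video diffusion model.

from dataclasses import dataclass
from enum import Enum

import numpy as np


class Task(Enum):
    VIEW = 0  # switcher branch for multi-view synthesis
    POSE = 1  # switcher branch for pose animation


@dataclass
class ConditioningCues:
    appearance: np.ndarray  # reposed NeRF renderings, shape (T, H, W, 3)
    geometry: np.ndarray    # SMPL normal maps, shape (T, H, W, 3)


def reconstruct_canonical_nerf(image: np.ndarray) -> dict:
    """Placeholder for the generalizable human NeRF that maps the
    subject in `image` into a canonical 3D representation."""
    return {"canonical": image}  # stand-in for an actual radiance field


def repose_and_render(nerf: dict, smpl_params: list,
                      H: int = 256, W: int = 256) -> ConditioningCues:
    """Placeholder: repose the canonical NeRF with SMPL parameters and
    render appearance cues plus SMPL normal maps per target frame."""
    T = len(smpl_params)
    appearance = np.zeros((T, H, W, 3), dtype=np.float32)
    geometry = np.zeros((T, H, W, 3), dtype=np.float32)
    return ConditioningCues(appearance, geometry)


def video_diffusion(cues: ConditioningCues, task: Task) -> np.ndarray:
    """Placeholder for the conditioned video diffusion model; `task`
    plays the role of the switcher that disentangles the two modes."""
    return cues.appearance  # stand-in for the denoised output frames


# End-to-end flow for one target pose sequence of 8 frames.
image = np.zeros((256, 256, 3), dtype=np.float32)   # the single input image
nerf = reconstruct_canonical_nerf(image)
cues = repose_and_render(nerf, smpl_params=[None] * 8)
frames = video_diffusion(cues, Task.POSE)           # (8, 256, 256, 3)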
Applications
Interactive view and pose synthesis
Leveraging the unified framework, we enable interactive synthesis of human avatars: users can synthesize novel views on the fly while animating the subject through novel poses.
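A minimal sketch of what such an interactive session could look like, assuming a hypothetical generate_frame call whose Mode argument stands in for the switcher; none of these names come from the released code.

from enum import Enum

import numpy as np


class Mode(Enum):
    VIEW = "view"  # switcher set to view synthesis: orbit the camera
    POSE = "pose"  # switcher set to pose animation: advance the motion


def generate_frame(prev: np.ndarray, mode: Mode, control) -> np.ndarray:
    """Placeholder for one sampling call of the diffusion model; `mode`
    sets the switcher, `control` carries either a camera azimuth in
    degrees (VIEW) or an index into a pose sequence (POSE)."""
    return prev  # stand-in for the generated frame


frame = np.zeros((256, 256, 3), dtype=np.float32)  # begins as the input image
# A user session interleaving both tasks: orbit, animate two steps, orbit back.
for mode, control in [(Mode.VIEW, 30.0), (Mode.POSE, 1),
                      (Mode.POSE, 2), (Mode.VIEW, -15.0)]:
    frame = generate_frame(frame, mode, control)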
Synchronized Multi-view Video Generation
By alternating sampling between the view and pose synthesis modes, we can generate synchronized multi-view videos of human performers from only a single image.
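The alternation can be sketched as follows. This is an assumed reading of the sampling schedule, with pose_sampling and view_sampling as hypothetical stand-ins for the diffusion model's two switcher modes: pose mode first animates the reference view through time, then view mode expands each timestep into the remaining views.

import numpy as np

V, T, H, W = 4, 8, 256, 256  # views x timesteps (illustrative sizes)


def pose_sampling(anchor: np.ndarray, n_frames: int) -> np.ndarray:
    """Placeholder for the diffusion model in pose mode: animates one
    view (H, W, 3) into a temporally coherent clip (n_frames, H, W, 3)."""
    return np.repeat(anchor[None], n_frames, axis=0)


def view_sampling(anchor: np.ndarray, n_views: int) -> np.ndarray:
    """Placeholder for the diffusion model in view mode: expands one
    frame (H, W, 3) into consistent renderings (n_views, H, W, 3)."""
    return np.repeat(anchor[None], n_views, axis=0)


reference = np.zeros((H, W, 3), dtype=np.float32)  # the single input image

# Step 1, pose mode: animate the reference view through time.
clip = pose_sampling(reference, T)                       # (T, H, W, 3)

# Step 2, view mode: for each timestep, synthesize the remaining views,
# anchored on the frame just produced, giving a synchronized grid.
grid = np.stack([view_sampling(clip[t], V) for t in range(T)])
print(grid.shape)  # (T, V, H, W, 3): synchronized multi-view video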
Results
Novel view synthesis
We demonstrate the capability of our method to synthesize view-consistent avatars from a single image.
Novel pose animation
We demonstrate the capability of our method to synthesize temporally coherent avatars with realistic deformations from a single image.
Comparison with baselines
We compare our method with baselines on the tasks of novel view synthesis and novel pose animation.
BibTeX
@article{lu2025gas,
  title={GAS: Generative Avatar Synthesis from a Single Image},
  author={Lu, Yixing and Dong, Junting and Kwon, Youngjoong and Zhao, Qin and Dai, Bo and De la Torre, Fernando},
  journal={arXiv preprint arXiv:2502.06957},
  year={2025}
}