3DPE
Real-time 3D-aware Portrait Editing from a Single Image
ECCV 2024
Qingyan Bai1,2, Zifan Shi1, Yinghao Xu3, Hao Ouyang1,2, Qiuyu Wang2, Ceyuan Yang4, Xuan Wang2, Gordon Wetzstein3, Yujun Shen2, Qifeng Chen1
1 HKUST  2 Ant Group  3 Stanford University  4 Shanghai AI Laboratory
Overview
This work presents 3DPE, a practical method that can efficiently edit a face image in a 3D-aware manner, following given prompts such as reference images or text descriptions. To this end, a lightweight module is distilled from a 3D portrait generator and a text-to-image model, which provide prior knowledge of face geometry and superior editing capability, respectively. This design brings two compelling advantages over existing approaches. First, our method achieves real-time editing with a feedforward network (∼0.04s per image), over 100× faster than the second-fastest competitor. Second, thanks to the powerful priors, our module can focus on learning editing-related variations, so that it handles various types of editing simultaneously during training and further supports fast adaptation to user-specified, customized types of editing at inference time.
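As a rough sketch of the distillation idea above (all names here are hypothetical illustrations, not the released training code), the lightweight module can be trained so that a single feedforward pass reproduces edits whose targets come from the slow 2D editor:

import torch
import torch.nn.functional as F

def distillation_step(student, teacher_edit, render, image, prompt, optimizer):
    # Teacher: a slow diffusion-based 2D editor supplies the editing target.
    with torch.no_grad():
        target = teacher_edit(image, prompt)
    # Student: the lightweight module predicts the edit in one forward pass.
    edited = render(student(image, prompt))   # render the predicted 3D portrait
    loss = F.l1_loss(edited, target)          # match the teacher's edit
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()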
Method
Motivation.
Live 3D Portrait (Live3D) proposes a real-time 3D inversion method based on a two-branch structure. The figure below demonstrates the disentanglement in Live3D features: we separately disable the features from the two branches, Ehigh(·) and Elow(·), and infer the reconstructed image. Without Ehigh(·), the output retains the coarse structure but loses its appearance. Conversely, when Elow(·) is deactivated, the reconstructed portraits preserve the texture (such as the blue and purple reflection on the glasses) but fail to capture the geometry.
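This probe amounts to zeroing one branch's features before decoding. A minimal sketch, assuming callables e_high, e_low, and decoder stand in for Live3D's two encoder branches and its triplane decoder (hypothetical names):

import torch

@torch.no_grad()
def reconstruct(image, e_high, e_low, decoder, disable=None):
    f_high = e_high(image)                 # appearance / texture features
    f_low = e_low(image)                   # coarse geometry features
    if disable == "high":
        f_high = torch.zeros_like(f_high)  # structure kept, appearance lost
    elif disable == "low":
        f_low = torch.zeros_like(f_low)    # texture kept, geometry lost
    return decoder(f_high, f_low)          # triplane used to render the output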
Framework.
Inspired by the aforementioned feature disentanglement, we propose to distill the priors of the 2D diffusion generative model and the 3D GAN for real-time 3D-aware editing. The proposed model is fine-tuned from Live3D, where the prompt features are fused with those from Ehigh(·) through cross-attention in order to predict the triplane representation.
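A minimal PyTorch sketch of such a cross-attention fusion (dimensions and names are illustrative assumptions, not the paper's exact architecture): the Ehigh(·) tokens act as queries, the prompt embeddings as keys and values, and the fused features feed the triplane head.

import torch
import torch.nn as nn

class PromptFusion(nn.Module):
    def __init__(self, feat_dim=512, prompt_dim=768, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim=feat_dim, num_heads=heads,
                                          kdim=prompt_dim, vdim=prompt_dim,
                                          batch_first=True)
        self.norm = nn.LayerNorm(feat_dim)

    def forward(self, f_high, f_prompt):
        # f_high:   (B, N, feat_dim)  tokens from Ehigh(.)
        # f_prompt: (B, M, prompt_dim) image/text prompt embeddings
        fused, _ = self.attn(query=f_high, key=f_prompt, value=f_prompt)
        return self.norm(f_high + fused)  # residual fusion before the triplane head

f_high = torch.randn(1, 4096, 512)   # e.g. 64x64 spatial feature tokens
f_prompt = torch.randn(1, 77, 768)   # e.g. CLIP-style prompt tokens
print(PromptFusion()(f_high, f_prompt).shape)  # torch.Size([1, 4096, 512])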
Results
Shown below are input images and the corresponding stylized renderings. For qualitative comparisons, we compare our results with several baselines under both image prompts and text prompts; in each case, we include the edited portraits as well as their novel-view renderings. The figure below shows (a) testing results of customized prompt adaptation and (b) its learning process, with intermediate testing results at 10s, 1min, 2min, and 5min during adaptation for the style "golden statue".
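The fast adaptation above can be read as a brief fine-tune of the pretrained multi-prompt model on one user-specified style. A hedged sketch, assuming paired examples for the new style are available (names hypothetical):

import time
import torch
import torch.nn.functional as F

def adapt_to_style(model, pairs, style_prompt, budget_s=300):
    # Fine-tune for a fixed wall-clock budget (e.g. 5 minutes, as in the figure).
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    start = time.time()
    while time.time() - start < budget_s:
        for image, target in pairs:            # paired edits for the new style
            loss = F.l1_loss(model(image, style_prompt), target)
            opt.zero_grad()
            loss.backward()
            opt.step()
            if time.time() - start >= budget_s:
                break
    return model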
BibTeX
@inproceedings{bai20243dpe,
title = {Real-time 3D-aware Portrait Editing from a Single Image},
author = {Bai, Qingyan and Shi, Zifan and Xu, Yinghao and Ouyang, Hao and Wang, Qiuyu and Yang, Ceyuan and Wang, Xuan and Wetzstein, Gordon and Shen, Yujun and Chen, Qifeng},
booktitle = {European Conference on Computer Vision},
year = {2024}
}
Related Work


Live 3D Portrait: Real-Time Radiance Fields for Single-Image Portrait View Synthesis.
Alex Trevithick, Matthew Chan, Michael Stengel, Eric R. Chan, Chao Liu, Zhiding Yu, Sameh Khamis, Manmohan Chandraker, Ravi Ramamoorthi, Koki Nagano.
TOG 2023.
Comment: Proposes a one-shot method to infer and render a 3D representation from a single unposed image in real time.

InstructPix2Pix: Learning to Follow Image Editing Instructions.
Tim Brooks, Aleksander Holynski, Alexei A. Efros.
CVPR 2023.
Comment: Proposes an image editing method following human textual instructions.




Comment: Proposes a hybrid explicit-implicit network that synthesizes high-resolution multi-view-consistent images in real time and also produces high-quality 3D geometry.