HGG: Learning Efficient and Generalizable Human Representation with Human Gaussian Model
*Indicates Equal Contribution
Abstract
Modeling animatable human avatars from videos is a long-standing and challenging problem. While conventional methods require per-instance optimization, recent feed-forward methods generate 3D Gaussians with a learnable network. However, these methods predict Gaussians for each frame independently and do not fully capture the relations between Gaussians from different timestamps. To address this, we propose the Human Gaussian Graph (HGG), which models the connection between the predicted Gaussians and the human SMPL mesh so that information from all frames can be leveraged to recover an animatable human representation. Specifically, the Human Gaussian Graph has two layers: Gaussians are the first-layer nodes and mesh vertices are the second-layer nodes. Based on this structure, we further propose the intra-node operation, which aggregates the Gaussians connected to one mesh vertex, and the inter-node operation, which supports message passing among neighboring mesh nodes. Experimental results on novel view synthesis and novel pose animation demonstrate the efficiency and generalization of our method.
SOTA Rendering Quality with Remarkably Fast Run-time Performance
(a) Qualitative results: HGG delivers high-fidelity results for both novel view synthesis and novel pose animation. (b) Performance comparison: HGG achieves the highest PSNR in both the single-view (yellow) and multi-view (blue) settings with superior computational efficiency.
Experimental Results
Qualitative comparison of our method against GART, ExAvatar, LGM, and GPS-Gaussian on the MVHumanNet dataset.
More qualitative results on novel pose animation.
More qualitative results on novel view synthesis.
Method Overview
Given an input human video, our goal is to build a high-fidelity animatable Gaussian representation at inference time, without per-instance optimization. We first establish frame-wise Gaussian representations through a feed-forward 3DGS network. Then, we construct a Human Gaussian Graph (HGG) to model the relations between the Gaussians predicted from multiple frames and the SMPL mesh. We introduce two complementary operations on the HGG: the intra-node operation, which extracts temporal features across multiple timesteps, and the inter-node operation, which facilitates robust local message passing between topologically adjacent nodes. Finally, the Gaussians are updated into SMPL-aligned Gaussians through the HGG framework, enabling novel pose animation.
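To make the two graph operations concrete, the following is a minimal NumPy sketch of the data flow only, not the actual HGG implementation. Everything specific in it is an assumption made for illustration: the nearest-vertex rule for attaching Gaussians to mesh vertices, mean pooling as the intra-node aggregation, a single round of neighbor averaging as the inter-node message passing, and all function names. The real operations are presumably learned network layers.

```python
import numpy as np

def attach_gaussians_to_vertices(gaussian_xyz, vertex_xyz):
    """Assumed rule: link each Gaussian to its nearest mesh vertex."""
    # Pairwise distances, shape (num_gaussians, num_vertices).
    dist = np.linalg.norm(gaussian_xyz[:, None, :] - vertex_xyz[None, :, :], axis=-1)
    return dist.argmin(axis=1)

def intra_node_aggregate(gaussian_feat, owner, num_vertices):
    """Mean-pool the features of all Gaussians (from any frame) on each vertex."""
    summed = np.zeros((num_vertices, gaussian_feat.shape[1]))
    np.add.at(summed, owner, gaussian_feat)  # unbuffered scatter-add
    counts = np.bincount(owner, minlength=num_vertices)
    # Vertices with no attached Gaussian keep a zero feature.
    return summed / np.maximum(counts, 1)[:, None]

def inter_node_message_pass(vertex_feat, mesh_edges):
    """One round of averaging over each vertex and its mesh neighbors."""
    n = vertex_feat.shape[0]
    adj = np.eye(n)  # self-loops keep each vertex's own feature
    adj[mesh_edges[:, 0], mesh_edges[:, 1]] = 1.0
    adj[mesh_edges[:, 1], mesh_edges[:, 0]] = 1.0
    return adj @ vertex_feat / adj.sum(axis=1, keepdims=True)

# Toy example: three vertices on a line, three Gaussians pooled from all frames.
vertex_xyz = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
mesh_edges = np.array([[0, 1], [1, 2]])
gaussian_xyz = np.array([[0.1, 0.0, 0.0], [-0.1, 0.0, 0.0], [1.1, 0.0, 0.0]])
gaussian_feat = np.array([[1.0], [3.0], [4.0]])

owner = attach_gaussians_to_vertices(gaussian_xyz, vertex_xyz)
vertex_feat = intra_node_aggregate(gaussian_feat, owner, len(vertex_xyz))
smoothed = inter_node_message_pass(vertex_feat, mesh_edges)
```

In this toy setup the third vertex has no attached Gaussians, so it has a zero feature after the intra-node step and only receives information through the inter-node step. That is one reason local message passing can be useful when some body regions are poorly observed.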