Method
The proposed method generates high-quality, view-consistent Physically Based Rendering (PBR) maps from image prompts using a Multiview Diffusion Model. At its core is a Dual-Channel Material Generation framework, which extends the traditional diffusion model with an additional channel so that albedo and metallic-roughness (MR) maps are produced simultaneously. To keep the generated textures detailed and faithful to the input reference image, Reference Attention extracts fine-grained information through a dedicated reference branch. Consistency-Regularized Training improves robustness by training the model on pairs of reference images that differ slightly in camera pose and lighting, which enforces the generation of lighting-invariant PBR maps. To address misalignment between the albedo and MR channels, a Multi-Channel Aligned Attention module synchronizes information across channels, preventing artifacts and unexpected shadows. Finally, Learnable Material Embeddings are incorporated for each channel, providing contextual guidance for coherent, artifact-free texture generation.
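To make the interplay between the per-channel embeddings and the cross-channel alignment more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes one plausible way to synchronize the channels: attention weights over the reference tokens are computed once from the albedo channel and reused for the MR channel, so both channels read the same reference locations. The function names (`attention_weights`, `aligned_reference_attention`), the additive form of the material embeddings, and the weight-sharing scheme are all hypothetical choices made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def attention_weights(queries, keys):
    """Scaled dot-product attention weights, shape (n_queries, n_keys)."""
    dim = queries.shape[-1]
    return softmax(queries @ keys.T / np.sqrt(dim), axis=-1)


def aligned_reference_attention(albedo_tokens, mr_tokens, ref_tokens, material_emb):
    """Hypothetical sketch of channel-aligned reference attention.

    albedo_tokens, mr_tokens: (n_tokens, dim) features of the two channels,
        assumed to be spatially aligned (token i covers the same location).
    ref_tokens: (n_ref, dim) features from the reference branch.
    material_emb: dict with one (dim,) embedding per channel; here they are
        plain arrays, whereas in training they would be learned parameters.
    """
    # Assumption: each channel is tagged by adding its material embedding.
    albedo_q = albedo_tokens + material_emb["albedo"]
    mr_q = mr_tokens + material_emb["mr"]

    # Assumption: weights come from the albedo channel only and are shared,
    # so the MR channel attends to exactly the same reference tokens.
    weights = attention_weights(albedo_q, ref_tokens)
    ref_update = weights @ ref_tokens

    # Residual update applied identically to both channels.
    return albedo_q + ref_update, mr_q + ref_update, weights
```

With two channels of 4 tokens each and 6 reference tokens of dimension 8, the function returns two `(4, 8)` outputs and a `(4, 6)` weight matrix whose rows sum to one. Sharing the weights is only one possible design; it trades some per-channel flexibility for a guarantee that both maps draw on the same reference regions.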
Comparison with text-conditioned methods
Comparison with image-conditioned methods
Realistic material creation with a variety of different prompts
Texture generation under distinct environmental illuminations
Ablation results