GaussianBlender: Instant Stylization of 3D Gaussians with Disentangled Latent Spaces
Abstract
GaussianBlender, a feed-forward framework using latent diffusion models, enables instant, high-fidelity, and multi-view consistent 3D stylization through text-driven edits on disentangled latent spaces.
3D stylization is central to game development, virtual reality, and digital arts, where the demand for diverse assets calls for scalable methods that support fast, high-fidelity manipulation. Existing text-to-3D stylization methods typically distill from 2D image editors, requiring time-intensive per-asset optimization and exhibiting multi-view inconsistency due to the limitations of current text-to-image models, which makes them impractical for large-scale production. In this paper, we introduce GaussianBlender, a pioneering feed-forward framework for text-driven 3D stylization that performs edits instantly at inference. Our method learns structured, disentangled latent spaces with controlled information sharing for geometry and appearance from spatially-grouped 3D Gaussians. A latent diffusion model then applies text-conditioned edits on these learned representations. Comprehensive evaluations show that GaussianBlender not only delivers instant, high-fidelity, geometry-preserving, multi-view consistent stylization, but also surpasses methods that require per-instance test-time optimization, unlocking practical, democratized 3D stylization at scale.
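To make the described pipeline concrete, below is a minimal PyTorch sketch of the three-stage idea the abstract outlines: encode spatially-grouped Gaussians into separate geometry and appearance latents with a small shared pathway, then run a text-conditioned denoiser over the appearance latent only. Everything here is an illustrative assumption, not the paper's implementation: the `GroupEncoder`, `LatentEditor`, and `edit_latents` names, the 59-dimensional Gaussian parameterization, and the toy Euler denoising loop are hypothetical stand-ins.

```python
# Hypothetical sketch of a disentangled-latent editing pipeline (not the
# paper's code). Per Gaussian: 3 (xyz) + 3 (scale) + 4 (rotation quat)
# + 1 (opacity) + 48 (degree-3 SH color coeffs) = 59 dims.
import torch
import torch.nn as nn

GAUSS_DIM, GROUP = 59, 64  # assumed group size of 64 Gaussians

class GroupEncoder(nn.Module):
    """Encodes a group of Gaussians into separate geometry/appearance latents."""
    def __init__(self, d_latent=128):
        super().__init__()
        flat = GAUSS_DIM * GROUP
        self.geo = nn.Sequential(nn.Linear(flat, 512), nn.SiLU(), nn.Linear(512, d_latent))
        self.app = nn.Sequential(nn.Linear(flat, 512), nn.SiLU(), nn.Linear(512, d_latent))
        # "Controlled information sharing" (assumption): a small, detached
        # cross-term lets appearance see geometry without collapsing the spaces.
        self.share = nn.Linear(d_latent, d_latent)

    def forward(self, gaussians):               # (B, GROUP, GAUSS_DIM)
        x = gaussians.flatten(1)                # (B, GROUP * GAUSS_DIM)
        z_geo = self.geo(x)
        z_app = self.app(x) + 0.1 * self.share(z_geo.detach())
        return z_geo, z_app

class LatentEditor(nn.Module):
    """Toy text-conditioned denoiser acting on the appearance latent only."""
    def __init__(self, d_latent=128, d_text=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_latent + d_text + 1, 512), nn.SiLU(), nn.Linear(512, d_latent))

    def forward(self, z_app, t, text_emb):
        h = torch.cat([z_app, text_emb, t[:, None]], dim=-1)
        return self.net(h)                      # predicted noise on z_app

@torch.no_grad()
def edit_latents(editor, z_app, text_emb, steps=10):
    """Crude Euler denoising loop over the appearance latent; geometry is
    never touched, which is how geometry-preserving edits would fall out."""
    z = z_app + torch.randn_like(z_app)         # noise the appearance latent
    for i in reversed(range(steps)):
        t = torch.full((z.shape[0],), i / steps)
        z = z - editor(z, t, text_emb) / steps
    return z

# Smoke test with random tensors standing in for real inputs.
enc, editor = GroupEncoder(), LatentEditor()
gaussians = torch.randn(2, GROUP, GAUSS_DIM)    # two spatial groups
z_geo, z_app = enc(gaussians)
z_app_edit = edit_latents(editor, z_app, text_emb=torch.randn(2, 256))
print(z_geo.shape, z_app_edit.shape)            # torch.Size([2, 128]) twice
```

The design point the sketch isolates: because the text-conditioned diffusion step operates on `z_app` while `z_geo` is left intact, geometry preservation is structural rather than something the editor must learn, and stylization reduces to a single feed-forward pass at inference.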
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Dynamic-eDiTor: Training-Free Text-Driven 4D Scene Editing with Multimodal Diffusion Transformer (2025)
- DiffStyle360: Diffusion-Based 360° Head Stylization via Style Fusion Attention (2025)
- Native 3D Editing with Full Attention (2025)
- InstructMix2Mix: Consistent Sparse-View Editing Through Multi-View Model Personalization (2025)
- Training-Free Multi-View Extension of IC-Light for Textual Position-Aware Scene Relighting (2025)
- Fast Multi-view Consistent 3D Editing with Video Priors (2025)
- GuideFlow3D: Optimization-Guided Rectified Flow For Appearance Transfer (2025)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any paper on Hugging Face, check out this Space
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment:
@librarian-bot recommend