429
One Layer Is Enough: Adapting Pretrained Visual Encoders for Image Generation
arXiv:2512.07829v2 Announce Type: replace
Abstract: Visual generative models (e.g., diffusion models) typically operate in compressed latent spaces to balance training efficiency and sample quality. In parallel, there has been growing interest in leveraging high-quality pre-trained visual represent…
Abstract: Visual generative models (e.g., diffusion models) typically operate in compressed latent spaces to balance training efficiency and sample quality. In parallel, there has been growing interest in leveraging high-quality pre-trained visual represent…