SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis

By Dustin Podell et al.
Published on July 4, 2023

Table of Contents

1 Introduction
2 Improving Stable Diffusion
2.1 Architecture & Scale
2.2 Micro-Conditioning
2.3 Multi-Aspect Training
2.4 Improved Autoencoder
2.5 Putting Everything Together

Summary

SDXL is a latent diffusion model for text-to-image synthesis built on a larger UNet backbone and several novel conditioning schemes. It outperforms previous versions of Stable Diffusion and achieves results competitive with state-of-the-art image generators. The document walks through the main design choices: the scaled-up architecture, micro-conditioning on image size and crop coordinates, multi-aspect training, and an improved autoencoder, all combined in a multi-stage training pipeline.
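The micro-conditioning idea can be sketched as follows: each conditioning scalar (original image height/width, crop top-left coordinates, target size) is mapped through a sinusoidal Fourier embedding, and the embeddings are concatenated into a single vector that the model adds to its timestep embedding. The snippet below is a minimal, self-contained sketch of that encoding step, not the authors' implementation; the function names and the embedding dimension are illustrative assumptions.

```python
import math

def fourier_embed(value, dim=256, max_period=10000.0):
    """Sinusoidal (Fourier) embedding of one scalar, in the style of
    diffusion timestep embeddings. Returns a list of `dim` floats."""
    half = dim // 2
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    return [math.sin(value * f) for f in freqs] + \
           [math.cos(value * f) for f in freqs]

def micro_conditioning(orig_size, crop_topleft, target_size, dim=256):
    """Embed the six micro-conditioning scalars (original H/W, crop
    top/left, target H/W) and concatenate them into one vector.
    In SDXL this vector is projected and added to the timestep
    embedding; the projection is omitted here for brevity."""
    scalars = [*orig_size, *crop_topleft, *target_size]
    emb = []
    for s in scalars:
        emb.extend(fourier_embed(float(s), dim))
    return emb

# Example: a 512x512 source image, uncropped, generated at 1024x1024.
cond = micro_conditioning((512, 512), (0, 0), (1024, 1024))
print(len(cond))  # 6 scalars x 256 dims = 1536
```

At inference time these scalars become user-facing controls: asking for a large "original size" steers the model toward the look of high-resolution training images, and setting the crop coordinates to (0, 0) biases it toward well-centered, uncropped compositions.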