Abstract

The primary axes of interest in image-generating diffusion models are image quality, the amount of variation in the results, and how well the results align with a given condition, e.g., a class label or a text prompt. The popular classifier-free guidance approach uses an unconditional model to guide a conditional model, leading to simultaneously better prompt alignment and higher-quality images at the cost of reduced variation. These effects seem inherently entangled, and thus hard to control. We make the surprising observation that it is possible to obtain disentangled control over image quality without compromising the amount of variation by guiding generation using a smaller, less-trained version of the model itself rather than an unconditional model. This leads to significant improvements in ImageNet generation, setting record FIDs of 1.01 for 64x64 and 1.25 for 512x512, using publicly available networks. Furthermore, the method is also applicable to unconditional diffusion models, drastically improving their quality.
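The guidance scheme described in the abstract amounts to extrapolating from a weaker model's output toward the main model's output. A minimal sketch of that extrapolation is below; the function name, toy values, and guidance weight are illustrative assumptions, not the authors' implementation.

```python
def guided_denoise(d_main: float, d_guide: float, w: float) -> float:
    """Extrapolate from the guiding model's denoised estimate toward the
    main model's estimate. w = 1 recovers the main model (no guidance);
    w > 1 amplifies the difference between the two models.
    In classifier-free guidance the guide is an unconditional model; in
    the paper's approach it is a smaller, less-trained copy of the model.
    """
    return d_guide + w * (d_main - d_guide)

# Toy scalar example (real models return image-shaped tensors):
d_main, d_guide = 0.8, 0.5
print(guided_denoise(d_main, d_guide, 1.0))  # no guidance: 0.8
print(guided_denoise(d_main, d_guide, 2.0))  # pushed past the main model: 1.1
```

In practice this update is applied at every sampling step to the two models' denoised (or score) estimates; the scalar version here only illustrates the arithmetic.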
Original language: English
Title of host publication: Advances in Neural Information Processing Systems 37 (NeurIPS 2024)
Editors: A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, C. Zhang
Publisher: Curran Associates Inc.
ISBN (Print): 9798331314385
Publication status: Published - 2025
MoE publication type: A4 Conference publication
Event: Conference on Neural Information Processing Systems - Vancouver, Canada
Duration: 10 Dec 2024 - 15 Dec 2024
Conference number: 38
https://neurips.cc/Conferences/2024

Conference

Conference: Conference on Neural Information Processing Systems
Abbreviated title: NeurIPS
Country/Territory: Canada
City: Vancouver
Period: 10/12/2024 - 15/12/2024
Internet address: https://neurips.cc/Conferences/2024

  • Project: ERC PIPE/Lehtinen, 01/05/2020 - 31/08/2025 (EU_H2ERC)
