TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models

CVPR 2024 Highlight
Beihang University1, SenseTime Research2, Monash University3, UT Austin4
*Equal Contribution, 📧Corresponding Author

In TFMQ-DM, our contributions are as follows:

1) We discover that existing quantization methods suffer from temporal feature disturbance, which disrupts the denoising trajectory of diffusion models and significantly degrades the quality of generated images.
2) We reveal that the disturbance comes from two causes: an inappropriate reconstruction target and unawareness of the finite set of activations. Both arise from ignoring the special characteristics of the time-information-related modules.
3) We propose an advanced framework (TFMQ-DM) consisting of temporal information aware reconstruction (TIAR) for weight quantization and finite set calibration (FSC) for activation quantization, both built on a Temporal Information Block specially devised for diffusion models (a minimal sketch follows this list).
4) Extensive experiments on various datasets show that our novel framework achieves new state-of-the-art results in PTQ of diffusion models, especially under 4-bit weight quantization, and significantly accelerates quantization. For some hard tasks, e.g., CelebA-HQ 256 × 256, our method reduces the FID score by 6.71.
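To make the Temporal Information Block concrete, below is a minimal PyTorch sketch of the time-step path found in typical DDPM-style UNets: a sinusoidal encoding of t followed by a small MLP. Because this path never sees image data and t only takes values in the finite set {1, . . . , T}, FSC-style calibration can enumerate every possible input exactly. The class name `TimeEmbedding` and the min-max calibration loop are illustrative assumptions, not the paper's implementation.

```python
import math
import torch
import torch.nn as nn

class TimeEmbedding(nn.Module):
    """Time-step path of a DDPM-style UNet: depends only on t, never on the image x."""
    def __init__(self, dim: int = 128):
        super().__init__()
        self.dim = dim
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * 4),
            nn.SiLU(),
            nn.Linear(dim * 4, dim * 4),
        )

    def forward(self, t: torch.Tensor) -> torch.Tensor:
        half = self.dim // 2
        freqs = torch.exp(-math.log(10000) * torch.arange(half, dtype=torch.float32) / half)
        args = t[:, None].float() * freqs[None]
        emb = torch.cat([args.sin(), args.cos()], dim=-1)  # sinusoidal encoding of t
        return self.mlp(emb)  # temporal feature fed to every UNet block

# Because t only takes values in {1, ..., T}, the activation ranges of this
# path can be calibrated exactly by enumerating all T inputs (the idea behind
# FSC), with no image data required.
T = 1000
embed = TimeEmbedding()
with torch.no_grad():
    feats = embed(torch.arange(1, T + 1))        # every temporal feature the model can see
    act_min, act_max = feats.min(), feats.max()  # per-tensor range for a min-max quantizer
print(act_min.item(), act_max.item())
```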

Abstract

Diffusion models, a prevalent framework for image generation, encounter significant challenges to broad applicability due to their extended inference times and substantial memory requirements. Efficient Post-training Quantization (PTQ) is pivotal for addressing these issues in traditional models. Unlike traditional models, diffusion models heavily depend on the time-step t to achieve satisfactory multi-round denoising. Usually, t from the finite set {1, . . . , T} is encoded into a temporal feature by a few modules that are entirely independent of the sampling data. However, existing PTQ methods do not optimize these modules separately. They adopt inappropriate reconstruction targets and complex calibration methods, resulting in severe disturbance of the temporal feature and denoising trajectory, as well as low compression efficiency. To address these issues, we propose a Temporal Feature Maintenance Quantization (TFMQ) framework built upon a Temporal Information Block that depends only on the time-step t and is unrelated to the sampling data. Powered by this pioneering block design, we devise temporal information aware reconstruction (TIAR) and finite set calibration (FSC) to align the full-precision temporal features within a limited time. Equipped with the framework, we can preserve most of the temporal information and ensure end-to-end generation quality. Extensive experiments on various datasets and diffusion models demonstrate our state-of-the-art results. Remarkably, our quantization approach, for the first time, achieves model performance nearly on par with the full-precision model under 4-bit weight quantization. Additionally, our method incurs almost no extra computational cost and accelerates quantization by 2.0× on LSUN-Bedrooms 256 × 256 compared to previous works.
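As a rough illustration of how TIAR differs from generic layer-wise PTQ reconstruction, the sketch below computes a reconstruction loss over the output of the whole time-embedding path for all T time-steps at once. The function name `tiar_loss` and the plain MSE objective are assumptions for illustration; the actual framework optimizes weight-rounding variables with a block-level objective of this flavor rather than per-layer targets.

```python
import torch
import torch.nn as nn

def tiar_loss(fp_block: nn.Module, q_block: nn.Module, T: int = 1000) -> torch.Tensor:
    """Reconstruct the whole Temporal Information Block against full precision.

    fp_block / q_block: full-precision and quantized time-embedding paths
    (e.g., the TimeEmbedding sketch above). The target is the block's final
    output for every time-step, not an intermediate per-layer activation.
    """
    t = torch.arange(1, T + 1)
    with torch.no_grad():
        target = fp_block(t)  # full-precision temporal features, shape (T, d)
    return (q_block(t) - target).pow(2).mean()
```

Calibrating over all T time-steps is cheap because the block's input space is exactly that finite set, which is also why quantization time drops compared with image-driven calibration.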

Visualization Results

(The full-precision images are shown in the top row, and the images generated by quantized models are shown in the rows below.)

Acceleration Results

We deploy the quantized Stable Diffusion model on an Intel® Xeon® Gold 6248R Processor.

Poster

BibTeX

@InProceedings{Huang_2024_CVPR,
  author    = {Huang, Yushi and Gong, Ruihao and Liu, Jing and Chen, Tianlong and Liu, Xianglong},
  title     = {TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2024},
  pages     = {7362-7371}
}