Diffusion models, a prevalent framework for image generation, face significant obstacles to broad deployment due to their long inference times and substantial memory requirements. Efficient Post-training Quantization (PTQ) is pivotal for addressing these issues in traditional models. Unlike traditional models, however, diffusion models rely heavily on the time-step t to achieve satisfactory multi-round denoising. Typically, t, drawn from the finite set {1, . . . , T}, is encoded into a temporal feature by a few modules that are entirely independent of the sampling data. Yet existing PTQ methods do not optimize these modules separately; they adopt inappropriate reconstruction targets and complex calibration methods, leading to severe disturbance of the temporal feature and the denoising trajectory, as well as low compression efficiency. To address these issues, we propose a Temporal Feature Maintenance Quantization (TFMQ) framework built upon a Temporal Information Block that depends only on the time-step t and not on the sampling data. Powered by this pioneering block design, we devise temporal information aware reconstruction (TIAR) and finite set calibration (FSC) to align with the full-precision temporal features in limited time. Equipped with this framework, we preserve most of the temporal information and ensure end-to-end generation quality. Extensive experiments on various datasets and diffusion models demonstrate our state-of-the-art results. Remarkably, our quantization approach, for the first time, achieves performance nearly on par with the full-precision model under 4-bit weight quantization. Additionally, our method incurs almost no extra computational cost and accelerates quantization by 2.0× on LSUN-Bedrooms 256 × 256 compared with previous works.
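The abstract does not detail the encoding modules themselves; as background, the standard time-step encoding in diffusion models is a sinusoidal embedding of t followed by a small MLP. The sketch below (function name and embedding width are illustrative, not from this work) shows only the sinusoidal part, whose output depends solely on t and the chosen width, which is why such a temporal pathway can in principle be calibrated over the finite set {1, . . . , T} without any sampling data:

```python
import math

def timestep_embedding(t: int, dim: int) -> list[float]:
    """Sinusoidal encoding of the scalar time-step t.

    Depends only on t and the embedding width `dim`, never on the
    image being denoised; geometrically spaced frequencies give each
    time-step a distinct, smoothly varying feature vector.
    """
    half = dim // 2
    # Frequencies span 1 down to ~1/10000, as in the Transformer/DDPM scheme.
    freqs = [math.exp(-math.log(10000.0) * i / half) for i in range(half)]
    args = [t * f for f in freqs]
    return [math.sin(a) for a in args] + [math.cos(a) for a in args]
```

In full models this vector is then passed through per-block projection layers to produce the temporal feature that modulates the denoising network.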