Masked Diffusion Models are Fast and Privacy-aware Learners

Jiachen Lei
Peng Cheng
Zhongjie Ba*
Kui Ren

ICSR

Paper
GitHub

Figure: The pre-training paradigm proposed in MaskDM. The noisy input is masked and then fed into the model, and only the visible areas are denoised. After pre-training, the model is fine-tuned via conventional denoising training.


Abstract

Diffusion models have emerged as the de facto technique for image generation, yet they entail significant computational overhead, hindering their broader adoption in the research community. We propose a prior-based denoising training framework, the first to incorporate the pre-train-and-fine-tune paradigm into diffusion model training, which substantially improves training efficiency and shows potential in facilitating various downstream tasks. Our approach centers on masking a high proportion (e.g., up to 90%) of the input image and employing masked denoising score matching to denoise the visible areas, thereby guiding the diffusion model to learn more salient features from the training data as prior knowledge. By applying masked learning in a pre-training stage, we efficiently train a ViT-based diffusion model on CelebA-HQ 256×256 in pixel space, achieving a 4x training acceleration and improving the quality of generated images relative to the denoising diffusion probabilistic model (DDPM). Moreover, our masked pre-training technique can be applied universally to diffusion models that generate images directly in pixel space, yielding pre-trained models with superior generalizability: for instance, a diffusion model pre-trained on VGGFace2 attains a 46% quality improvement when fine-tuned on merely 10% of data from a different distribution. Finally, our method shows potential as a training paradigm for enhancing the privacy-protection capabilities of diffusion models.
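To make the pre-training objective concrete, below is a minimal sketch of one masked denoising score matching step, written under stated assumptions: a standard linear DDPM noise schedule, MAE-style random patch masking at a 90% ratio, and a hypothetical ViT denoiser called as model(visible_patches, patch_indices, t). The function names and signature are illustrative, not the authors' actual API.

```python
import torch
import torch.nn.functional as F

def masked_dsm_loss(model, x0, mask_ratio=0.9, patch_size=16, T=1000):
    """One masked denoising score matching step (illustrative sketch).

    `model` is assumed to be a ViT-based denoiser that takes a subset of
    noisy-image patches, their indices, and timesteps, and predicts the
    noise for those patches. This signature is a hypothetical stand-in.
    """
    B, C, H, W = x0.shape

    # Standard DDPM forward process: x_t = sqrt(a_bar)*x_0 + sqrt(1-a_bar)*eps
    betas = torch.linspace(1e-4, 0.02, T, device=x0.device)
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)
    t = torch.randint(0, T, (B,), device=x0.device)
    eps = torch.randn_like(x0)
    ab = alpha_bar[t].view(B, 1, 1, 1)
    x_t = ab.sqrt() * x0 + (1.0 - ab).sqrt() * eps

    # Patchify the noisy image into (B, N, C*patch_size*patch_size)
    def patchify(img):
        p = img.unfold(2, patch_size, patch_size).unfold(3, patch_size, patch_size)
        return p.permute(0, 2, 3, 1, 4, 5).reshape(B, -1, C * patch_size * patch_size)

    patches = patchify(x_t)
    N = patches.shape[1]

    # MAE-style random masking: keep only a small fraction of patches
    n_keep = max(1, int(N * (1.0 - mask_ratio)))
    idx = torch.rand(B, N, device=x0.device).argsort(dim=1)[:, :n_keep]
    gather_idx = idx.unsqueeze(-1).expand(-1, -1, patches.shape[-1])
    visible = torch.gather(patches, 1, gather_idx)

    # The regression target is the noise restricted to the same visible patches
    target = torch.gather(patchify(eps), 1, gather_idx)

    # Denoise only the visible areas; masked patches never enter the model
    pred = model(visible, idx, t)  # hypothetical denoiser call
    return F.mse_loss(pred, target)
```

After pre-training with this objective, fine-tuning would revert to the conventional denoising loss over full, unmasked images, matching the two-stage paradigm described above.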

