SVDQuant, claiming a 3x speedup with Flux over NF4
A new quantization scheme for diffusion models by MIT HAN Lab that compresses not only the weights but also the activations to 4-bit (INT4) to reduce memory requirements.
This speeds up generation considerably compared to NF4, which keeps activations in 16-bit.
They also claim image quality superior to NF4's.
Weights: https://huggingface.co/mit-han-lab/svdquant-models
Code: https://github.com/mit-han-lab/nunchaku
Blog with more details: https://hanlab.mit.edu/blog/svdquant
https://i.redd.it/b58flyr8nqzd1.gif
I am not the author of this work, just sharing the info.