SVDQuant, claiming a 3x speedup on Flux over NF4

A new quantization scheme for diffusion models from MIT HAN Lab that not only compresses weights to 4 bits to reduce memory requirements but also quantizes activations to 4-bit (INT4).

This speeds up generation considerably compared to NF4, which still uses 16-bit activations.

They also claim better image quality than NF4.
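For intuition, the core trick in SVDQuant is to peel off a small 16-bit low-rank branch via SVD so that the remaining weight residual (and the activations) become easy to quantize to INT4. The sketch below is my own minimal NumPy illustration of that idea, not the authors' code; the rank, the per-row symmetric quantizer, and the function names are all my assumptions.

```python
import numpy as np

def quantize_int4(x, axis=-1):
    # Symmetric per-row quantization to the INT4 range [-8, 7] (an assumed
    # quantizer; the real kernel-level scheme is more involved).
    scale = np.max(np.abs(x), axis=axis, keepdims=True) / 7.0
    scale = np.where(scale == 0, 1.0, scale)  # avoid divide-by-zero
    q = np.clip(np.round(x / scale), -8, 7)
    return q, scale

def svdquant_sketch(W, rank=16):
    # Split W into a 16-bit low-rank branch L1 @ L2 (absorbing outliers)
    # plus an INT4-quantized residual, as SVDQuant does conceptually.
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    L1 = U[:, :rank] * S[:rank]   # kept in 16-bit
    L2 = Vt[:rank]                # kept in 16-bit
    R = W - L1 @ L2               # residual with outliers removed
    q, scale = quantize_int4(R)
    return L1, L2, q, scale

# Toy weight with a strong rank-1 outlier direction, which would blow up
# the scales of a direct INT4 quantizer.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
W += 50.0 * np.outer(rng.standard_normal(64), rng.standard_normal(64))

L1, L2, q, scale = svdquant_sketch(W, rank=4)
W_hat = L1 @ L2 + q * scale                 # reconstructed weight
q_d, s_d = quantize_int4(W)                 # direct INT4, no branch
err_branch = np.linalg.norm(W - W_hat)
err_direct = np.linalg.norm(W - q_d * s_d)
print(err_branch < err_direct)              # branch absorbs the outliers
```

On this toy matrix the low-rank branch soaks up the outlier energy, so the INT4 residual quantizes with far less error than quantizing W directly, which is the effect the paper exploits to make W4A4 viable.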

Weights: https://huggingface.co/mit-han-lab/svdquant-models

Code: https://github.com/mit-han-lab/nunchaku

Blog with more details: https://hanlab.mit.edu/blog/svdquant

https://i.redd.it/b58flyr8nqzd1.gif

I am not the author of this work, just sharing the info.