SVDQuant, claiming a 3x speedup with Flux over NF4
A new quantization scheme for diffusion models by MIT HAN Lab that compresses not only the weights but also the activations to 4-bit (INT4) to reduce memory requirements.
This speeds up generation considerably compared to NF4, which keeps activations in 16-bit.
They also claim image quality superior to NF4's.
Weights: https://huggingface.co/mit-han-lab/svdquant-models
Code: https://github.com/mit-han-lab/nunchaku
Blog with more details: https://hanlab.mit.edu/blog/svdquant
https://i.redd.it/b58flyr8nqzd1.gif
I am not the author of this work, just sharing the info.