My setup for running Flux Dev and Schnell on a 3090 using quants, with a Gradio front end.

#23 by NuclearGeek

I quantize both models on startup, so startup takes a few minutes, but after that I get image generations in under 2 minutes for Dev and just a few seconds for Schnell.
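For context on what the quantization step buys you (the actual repo uses a quantization library; the function names below are illustrative, not from the linked code): a minimal sketch of symmetric 8-bit weight quantization, which shows why there is a one-time startup cost (a pass over every weight) in exchange for storing each value in 8 bits instead of 16 or 32.

```python
# Toy illustration of symmetric int8 weight quantization.
# Real pipelines use a library (e.g. optimum-quanto or bitsandbytes);
# this only shows the core idea: scale floats into [-127, 127] once at
# startup, then cheaply dequantize at inference time.

def quantize_int8(weights):
    """Map float weights to int8 values plus a per-tensor scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights at inference time."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.003, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

# Every restored value is within one quantization step of the original,
# which is why quality holds up well despite the smaller weights.
assert all(abs(a - b) <= scale for a, b in zip(weights, restored))
```

The accuracy loss is bounded by the scale (one quantization step), which is why the generated images stay close to full-precision quality.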

Here are the GitHub repo and a video explaining it:

https://github.com/NuclearGeekETH/NuclearGeek-Flux-Capacitor

https://www.youtube.com/watch?v=EZVjuFZ0otQ

This makes one hell of a difference in inference speed. I tested with only 28 steps. I quantized text_encoder_2 and not text_encoder.
Ran in 35 seconds flat. Nice!! I'm using an RTX 4090, so no CPU offloading is needed. The result was very good.
Note: if you are using the GPU directly (without enable_model_cpu_offload()), you should quantize BEFORE sending the pipeline to device="cuda".

Running transformer freeze DEV
Running text_encoder freeze DEV
seed = 17894334164879757554
100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 28/28 [00:35<00:00, 1.28s/it]
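The ordering note above looks roughly like this in practice. This is a minimal sketch, assuming optimum-quanto (whose quantize/freeze calls match the "freeze" log lines above); the exact code in the linked repo may differ, and running it requires the gated FLUX.1-dev weights plus a CUDA GPU.

```python
# Sketch: quantize the heavy components BEFORE moving the pipeline to the
# GPU, per the note above. Assumes optimum-quanto; illustrative only.
import torch
from diffusers import FluxPipeline
from optimum.quanto import quantize, freeze, qfloat8

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)

# Quantize + freeze while the weights are still on the CPU.
quantize(pipe.transformer, weights=qfloat8)
freeze(pipe.transformer)
quantize(pipe.text_encoder_2, weights=qfloat8)  # the T5 encoder; leave text_encoder (CLIP) alone
freeze(pipe.text_encoder_2)

# Only now move to the GPU (skip this line if using enable_model_cpu_offload()).
pipe.to("cuda")

image = pipe(
    "a flux capacitor on a workbench",
    num_inference_steps=28,
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]
image.save("out.png")
```

Quantizing first means the smaller int8/fp8 tensors are what get copied to VRAM, instead of the full-precision weights.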
