Kolors Quant

Run the Kolors text-to-image model in 11 GB of VRAM using an fp8-quantized UNet and an 8-bit ChatGLM3 text encoder.

Download

Download chatglm3-8bit.safetensors from Kijai on Hugging Face.

You should have:

kolors-fp8/ (the fp8-quantized UNet folder)
chatglm3-8bit.safetensors (the 8-bit ChatGLM3 text encoder)
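
If you prefer to script the download, huggingface_hub can fetch the file. Note the repo_id below is an assumption for illustration; check Kijai's Hugging Face profile for the actual repository hosting chatglm3-8bit.safetensors.

from huggingface_hub import hf_hub_download

# NOTE: repo_id is assumed, not confirmed by this card; verify it on Kijai's profile.
path = hf_hub_download(
    repo_id='Kijai/ChatGLM3-safetensors',
    filename='chatglm3-8bit.safetensors',
    local_dir='.',
)
print(path)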

Setup

pip install accelerate diffusers transformers optimum-quanto sentencepiece
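
As a quick sanity check that everything installed, you can print the package versions (a minimal sketch; exact versions will vary per environment):

from importlib.metadata import version

for pkg in ('accelerate', 'diffusers', 'transformers', 'optimum-quanto', 'sentencepiece'):
    print(pkg, version(pkg))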

Inference

import torch
from diffusers import KolorsPipeline, UNet2DConditionModel
from optimum.quanto.models import QuantizedDiffusersModel

# Wrapper class so optimum-quanto knows which diffusers model to deserialize.
class KolorsUNet2DConditionModel(QuantizedDiffusersModel):
    base_class = UNet2DConditionModel

# Load the fp8-quantized UNet.
wrapped_unet = KolorsUNet2DConditionModel.from_pretrained('./kolors-fp8')

# Optional: load the 8-bit ChatGLM3 text encoder to save more VRAM.
# Make a local copy of the Kolors-diffusers/text_encoder folder first, then:
# import json
# from diffusers.pipelines.kolors.text_encoder import ChatGLMConfig, ChatGLMModel
# from safetensors.torch import load_model
# from text_encoder.quantization import quantize
# with open('./text_encoder/config.json') as encoder_f:
#     encoder_config = ChatGLMConfig.from_dict(json.load(encoder_f))
# text_encoder = ChatGLMModel(encoder_config)
# quantize(text_encoder.encoder, 8)
# load_model(text_encoder, './chatglm3-8bit.safetensors')

pipe = KolorsPipeline.from_pretrained(
    'Kwai-Kolors/Kolors-diffusers',
    unet=wrapped_unet._wrapped.to(dtype=torch.float16),
    # text_encoder=text_encoder,  # uncomment when using the 8-bit encoder
    torch_dtype=torch.float16,
).to('cuda')

image = pipe('cat playing piano', num_inference_steps=20).images[0]
image.save('cat.png')
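
This card does not show how the kolors-fp8 folder was produced. Below is a sketch of how you could create an equivalent checkpoint yourself with optimum-quanto, assuming fp8 weight-only quantization of the base UNet (the wrapper class is the same one defined above):

import torch
from diffusers import UNet2DConditionModel
from optimum.quanto import qfloat8
from optimum.quanto.models import QuantizedDiffusersModel

class KolorsUNet2DConditionModel(QuantizedDiffusersModel):
    base_class = UNet2DConditionModel

# Quantize the full-precision UNet's weights to fp8 and save it in the
# layout that KolorsUNet2DConditionModel.from_pretrained() expects.
unet = UNet2DConditionModel.from_pretrained(
    'Kwai-Kolors/Kolors-diffusers', subfolder='unet', torch_dtype=torch.float16)
qunet = KolorsUNet2DConditionModel.quantize(unet, weights=qfloat8)
qunet.save_pretrained('./kolors-fp8')

If VRAM is still tight at inference time, replacing the .to('cuda') call with pipe.enable_model_cpu_offload() trades speed for memory.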

Disclaimer

Use of this code or its documentation requires citation and attribution to the author, via a link to their Hugging Face profile, in all resulting work.
