Fine-tune the model?

#46
by NXBY - opened

Hi everyone, is it possible to fine-tune the BLOOM model? If so, how can I do that? Thanks!

Hey all,
I'm also interested in understanding how I can fine-tune this model to do a specific generation task after giving it many prompts.

BigScience Workshop org
edited Jul 18, 2022

Hi everyone,
If you have enough compute you could fine-tune BLOOM on any downstream task, but you would need enough GPU RAM to store the model plus the gradients and optimizer state, which is quite costly. A common task people fine-tune autoregressive models on is question answering. If you are interested in doing that, you could first try it on one of the small BLOOM checkpoints (ideally 1b3, since it is a small, fully trained model).
Another option is "prompt-tuning": https://youtu.be/8HwHGGb1zpQ?t=2455 It could be interesting to apply this method to BLOOM, since it does not require storing the optimizer state of the whole model.
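For anyone looking for a concrete starting point, below is a minimal causal-LM fine-tuning sketch with the Hugging Face Trainer on bloom-560m. The dataset, hyperparameters, and output path are illustrative placeholders, not a tested recipe.

```python
# Minimal causal-LM fine-tuning sketch for a small BLOOM checkpoint.
# Assumes a single GPU with enough memory for bloom-560m; the dataset
# and hyperparameters below are illustrative placeholders.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "bigscience/bloom-560m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Any text dataset works; a 1% slice of wikitext-2 keeps the example small.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:1%]")
dataset = dataset.filter(lambda x: len(x["text"].strip()) > 0)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

# mlm=False gives standard next-token (causal) language-modeling labels.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="bloom-560m-finetuned",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
    learning_rate=2e-5,
    fp16=True,  # drop this on CPU-only machines
    logging_steps=50,
)

Trainer(model=model, args=args, train_dataset=tokenized,
        data_collator=collator).train()
```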

Thanks for your reply.
I'm not able to find any code showing how it should be done.
Could you point me to code that does the fine-tuning?

Hi ybelkada, thank you for your reply. I just watched the video you shared with us, thanks for that!

I have the same question as AnaRhisT: is there any code we can use to do model tuning?
And is there any code to do prompt tuning as mentioned in the video?

Thank you very much!
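For the prompt-tuning part, the PEFT library implements the soft-prompt idea mentioned in the video; here is a rough sketch (the checkpoint, init text, and number of virtual tokens are just illustrative assumptions):

```python
# Rough prompt-tuning sketch with the PEFT library: only the soft-prompt
# embeddings are trained, so the full optimizer state of BLOOM is avoided.
from peft import PromptTuningConfig, PromptTuningInit, TaskType, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-560m"  # any of the smaller BLOOM checkpoints
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

peft_config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    prompt_tuning_init=PromptTuningInit.TEXT,
    prompt_tuning_init_text="Answer the question:",  # illustrative init text
    num_virtual_tokens=16,
    tokenizer_name_or_path=model_name,
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()  # only the virtual-token embeddings train

# ...then train with the same Trainer / data-collator setup as a normal
# causal-LM fine-tune; the frozen base model only needs forward passes.
```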

BigScience Workshop org

It is highly recommended that you use the same codebase that was originally used to train the model rather than the Hugging Face port. You can find that codebase here.

Hi stellaathena, thank you for your message!
Indeed, I should try the codebase that was used originally!
I'm quite new to this field; in the link you shared I only see code to pretrain BERT, GPT and T5, but not BLOOM.
Should I reuse the GPT code? (I saw that they are similar.) And may I download the BLOOM model from Hugging Face, or would it be better to download it from somewhere else?
Do you have any code that uses the original codebase to fine-tune BLOOM?

Thank you!

BigScience Workshop org

@NXBY if you are that new to this field, fine-tuning this model is almost certainly not what you want to be doing. This model is expensive to fine-tune and to do inference with, and requires highly specialized hardware. You should start off with a model like GPT-2 or GPT-Neo, which are several orders of magnitude smaller and substantially cheaper to use.

When I tried chat.petals.ml, everything I tried produced a Python error, pretty much boiling down to this final line:

  File "/home/borzunov/.local/lib/python3.10/site-packages/hivemind/p2p/p2p_daemon_bindings/utils.py", line 72, in raise_if_failed
    raise ControlFailure(f"Connect failed. msg={response.error.msg}")
hivemind.p2p.p2p_daemon_bindings.utils.ControlFailure: Connect failed. msg=routing: not found

Thoughts...?

Hi, I'm trying to implement extractive QA with the bloom-560m model. I need the training script and the implementation steps for BLOOM extractive QA. Please help!

Here's the work I did for my academic project. That was a while ago now, so I don't know if the code is still functional, and I won't be able to provide support or advice for anything that isn't working today. However, to the degree it's helpful:

https://github.com/jasondeden/capstone

Good luck!

Hey. I have been exploring BLOOM and its API closely, and I have learned how the parameters affect the response. After I get the responses, I usually post-process them and remove the garbage tokens the model has produced.
Is there any way that BLOOM can stop producing more tokens once the sentence's context is complete?
GPT-3 basically stops itself when the sentence is complete or the prompt is answered, but BLOOM produces extra tokens just to fill the max_token length provided.
Is there any way BLOOM can stop producing tokens once the context is complete, just like GPT-3?

For example: What is Machine Learning?
BLOOM will answer it fine in the first 2 or 3 sentences, but then it produces random text (garbage, which may be code or regular-expression formulas).

You may want to try BLOOMZ, which stops by itself when it deems the question to be answered.
Questions like What is Machine Learning? should work quite well.

Thanks @Muennighoff
I can't really find the Inference API for BLOOMZ. Can you provide a link to the page where I can check out the API? Other than that, I do have an idea of how to make it work using transformers.

Hey @RafayPunch, you mentioned that you know how to tackle this problem using transformers. I've been stuck on the same issue and haven't found a solution yet. If you have, could you please explain how you did it?
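For what it's worth, one way to approach this with plain transformers is to let generation stop at the end-of-sequence token (which BLOOMZ emits when it considers the answer finished) and optionally cut at a custom stop string via a StoppingCriteria. A sketch, assuming bigscience/bloomz-560m; the stop string and lengths are arbitrary choices:

```python
# Sketch: stop generation at the end-of-sequence token (BLOOMZ emits it when
# it considers the answer finished) and optionally at a custom stop string.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          StoppingCriteria, StoppingCriteriaList)

model_name = "bigscience/bloomz-560m"  # larger BLOOMZ sizes work the same way
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

class StopOnString(StoppingCriteria):
    """Stop as soon as the decoded continuation contains a given string."""
    def __init__(self, stop_string, prompt_len):
        self.stop_string = stop_string
        self.prompt_len = prompt_len

    def __call__(self, input_ids, scores, **kwargs):
        continuation = tokenizer.decode(input_ids[0, self.prompt_len:])
        return self.stop_string in continuation

prompt = "What is Machine Learning?"
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    eos_token_id=tokenizer.eos_token_id,  # stop at </s> instead of padding out
    stopping_criteria=StoppingCriteriaList(
        [StopOnString("\n\n", inputs.input_ids.shape[1])]  # optional extra stop
    ),
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```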

I want to use the bloomz-7b1-mt version of the model and make it more ChatGPT-like for my language, Punjabi. Is there a way I can shed the tokenizer entries and embeddings for languages other than the one I want? This can be done for mT5, where it reduced the model size by more than half. Also, are there any smaller versions of this model coming soon, since I don't have access to a cluster of GPUs?
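The mT5-style trick (keeping only the vocabulary your language actually uses and slicing the tied embedding matrix accordingly) can in principle be applied to BLOOM too, since the 250k-token multilingual vocabulary is a large share of the smaller checkpoints' parameters. Below is a very rough sketch of the embedding-slicing half, using bloomz-560m for illustration; rebuilding a consistent tokenizer (vocab and merges) for the reduced vocabulary is the genuinely hard part and is not shown.

```python
# Very rough sketch: keep only the embedding rows for tokens that actually
# occur in a target-language corpus. Note the tokenizer still maps text to
# the ORIGINAL ids, so you also need an old-id -> new-id remapping (or a
# rebuilt tokenizer), which is the hard part and is not handled here.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloomz-560m"  # smaller sibling used for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

punjabi_corpus = ["ਮਸ਼ੀਨ ਲਰਨਿੰਗ ਕੀ ਹੈ?"]  # placeholder; use a real corpus here

keep_ids = set(tokenizer.all_special_ids)
for text in punjabi_corpus:
    keep_ids.update(tokenizer(text).input_ids)
keep_ids = sorted(keep_ids)

old_embeddings = model.get_input_embeddings().weight.data
new_embeddings = old_embeddings[keep_ids].clone()

# BLOOM ties the input embeddings and lm_head, so resizing and copying the
# kept rows updates both.
model.resize_token_embeddings(len(keep_ids))
model.get_input_embeddings().weight.data.copy_(new_embeddings)

old_to_new = {old: new for new, old in enumerate(keep_ids)}
print(f"kept {len(keep_ids)} of {old_embeddings.shape[0]} embedding rows")
```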

I have seen multiple tutorials on using the QLoRA and PEFT techniques to fine-tune many 7B-parameter models, but they don't seem to work for this one. I want to fine-tune it using the free version of Colab, and I don't want it to take much space. Can anyone please help?
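For reference, the standard QLoRA/PEFT recipe does generally apply to BLOOM-family models; the main model-specific detail is that the fused attention projection module is named query_key_value. A hedged sketch follows (checkpoint, LoRA rank, and other hyperparameters are assumptions, and a 7B model in 4-bit is still tight on a free Colab GPU):

```python
# QLoRA-style sketch for a BLOOM/BLOOMZ checkpoint: load the base model in
# 4-bit with bitsandbytes, then train small LoRA adapters on the attention
# projections instead of the full weights.
import torch
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "bigscience/bloomz-7b1-mt"  # or a smaller BLOOMZ checkpoint

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,  # float16 for older Colab GPUs
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["query_key_value"],  # BLOOM's fused attention projection
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# ...then train with the usual Trainer loop and save only the small adapter:
# model.save_pretrained("bloomz-lora-adapter")
```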
