Question
#2 by mrfakename - opened
Hi
Thanks for releasing Granite, can’t wait to try it out. If it’s based on the Llama arch, why does it need Transformers 4.41?
Thanks!
(PS: thanks for using the Llama arch instead of a custom one - makes it so much easier to tune :))
Hi @mrfakename, our variant of the Llama arch needed a new parameter, 'mlp_bias', which first had to be added to Transformers.
PR: https://github.com/huggingface/transformers/pull/30031
The rest is similar to Llama.
You can find this param in our config as well: https://huggingface.co/ibm-granite/granite-3b-code-base/blob/c2475bd7587e4e08fafb0e22223f9af7081c5c00/config.json#L14
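For anyone who wants to check this before loading the model, here is a rough sketch of the idea, not an official recipe. It reads a config fragment, looks for 'mlp_bias', and compares a version string against the 4.41 minimum mentioned above. The helper names (`parse_version`, `supports_mlp_bias`, `uses_mlp_bias`) are made up for illustration, and the JSON fragment is an assumed example, not a copy of the linked config.json.

```python
import json

# Minimum Transformers version named in this thread.
MIN_VERSION = "4.41"


def parse_version(version: str) -> tuple:
    """Turn '4.41.0' into (4, 41, 0); stop at the first non-numeric piece."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def supports_mlp_bias(installed: str, minimum: str = MIN_VERSION) -> bool:
    """Hypothetical helper: is the installed version at least the minimum?"""
    return parse_version(installed) >= parse_version(minimum)


def uses_mlp_bias(config: dict) -> bool:
    """Hypothetical helper: does the config switch on 'mlp_bias'?"""
    return config.get("mlp_bias", False) is True


# Assumed, illustrative fragment; the real config.json has many more keys.
fragment = json.loads('{"model_type": "llama", "mlp_bias": true}')

if uses_mlp_bias(fragment) and not supports_mlp_bias("4.40.2"):
    print("This config sets mlp_bias; upgrade Transformers to 4.41 or later.")
```

With Transformers installed, `transformers.__version__` could be passed as the `installed` argument instead of the hard-coded string.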
thx for the explanation! makes sense
mrfakename changed discussion status to closed