Releasing base model and combined SFT dataset

#13
by SS12444 - opened

Great work. Are there plans to release the base model and the expanded training dataset, as was done for idefics2? A base model is useful for experimentation. Thanks!

Thanks! Unfortunately not. Since we included large synthetic instruction datasets directly in the pre-training, we didn't really have a base model anymore, so we only release the final instruct version.

I might be missing something in the paper, but is there a model like the base model of idefics2, taken after the pre-training stage but before the SFT stage (Table 3 of the Idefics3 paper)? I understand that pre-training can include wider forms of data, but the resulting model will not necessarily be instruction-tuned.

I was wondering the same thing. Since the paper describes pre-training followed by fine-tuning, there should be a base model between the two stages. Let me know if I am missing something here. By the way, great work.