finetune bug

#1
by Raku-Yihan - opened

Thank you for your great work.
I ran into an issue; could you please tell me how to fix it?

06/06/2024 13:43:00 - INFO - llamafactory.model.utils.checkpointing - Gradient checkpointing enabled.
06/06/2024 13:43:00 - INFO - llamafactory.model.utils.attention - Using torch SDPA for faster training and inference.
06/06/2024 13:43:00 - INFO - llamafactory.model.adapter - ZeRO3/FSDP/PureBF16/BAdam detected, remaining trainable params as their original precision.
06/06/2024 13:43:00 - INFO - llamafactory.model.adapter - Fine-tuning method: Full
06/06/2024 13:43:00 - INFO - llamafactory.model.loader - trainable params: 2511024128 || all params: 2924351216 || trainable%: 85.8660
[INFO|trainer.py:641] 2024-06-06 13:43:00,209 >> Using auto half precision backend
[INFO|trainer.py:2078] 2024-06-06 13:43:00,583 >> ***** Running training *****
[INFO|trainer.py:2079] 2024-06-06 13:43:00,583 >> Num examples = 2,091
[INFO|trainer.py:2080] 2024-06-06 13:43:00,583 >> Num Epochs = 3
[INFO|trainer.py:2081] 2024-06-06 13:43:00,583 >> Instantaneous batch size per device = 1
[INFO|trainer.py:2084] 2024-06-06 13:43:00,583 >> Total train batch size (w. parallel, distributed & accumulation) = 8
[INFO|trainer.py:2085] 2024-06-06 13:43:00,583 >> Gradient Accumulation steps = 8
[INFO|trainer.py:2086] 2024-06-06 13:43:00,583 >> Total optimization steps = 783
[INFO|trainer.py:2087] 2024-06-06 13:43:00,585 >> Number of trainable parameters = 2,511,024,128
0%| | 0/783 [00:00<?, ?it/s]Traceback (most recent call last):
File "/home/ubuntu/miniconda3/envs/llamafac/bin/llamafactory-cli", line 8, in
sys.exit(main())
File "/home/ubuntu/haha/LLaMA-Factory/src/llamafactory/cli.py", line 95, in main
run_exp()
File "/home/ubuntu/haha/LLaMA-Factory/src/llamafactory/train/tuner.py", line 33, in run_exp
run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
File "/home/ubuntu/haha/LLaMA-Factory/src/llamafactory/train/sft/workflow.py", line 73, in run_sft
train_result = trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
File "/home/ubuntu/miniconda3/envs/llamafac/lib/python3.10/site-packages/transformers/trainer.py", line 1885, in train
return inner_training_loop(
File "/home/ubuntu/miniconda3/envs/llamafac/lib/python3.10/site-packages/transformers/trainer.py", line 2216, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/home/ubuntu/miniconda3/envs/llamafac/lib/python3.10/site-packages/transformers/trainer.py", line 3238, in training_step
loss = self.compute_loss(model, inputs)
File "/home/ubuntu/miniconda3/envs/llamafac/lib/python3.10/site-packages/transformers/trainer.py", line 3264, in compute_loss
outputs = model(**inputs)
File "/home/ubuntu/miniconda3/envs/llamafac/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/ubuntu/miniconda3/envs/llamafac/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ubuntu/miniconda3/envs/llamafac/lib/python3.10/site-packages/accelerate/utils/operations.py", line 822, in forward
return model_forward(*args, **kwargs)
File "/home/ubuntu/miniconda3/envs/llamafac/lib/python3.10/site-packages/accelerate/utils/operations.py", line 810, in call
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/home/ubuntu/miniconda3/envs/llamafac/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
return func(*args, **kwargs)
File "/home/ubuntu/miniconda3/envs/llamafac/lib/python3.10/site-packages/transformers/models/paligemma/modeling_paligemma.py", line 419, in forward
inputs_embeds = self.get_input_embeddings()(input_ids)
File "/home/ubuntu/miniconda3/envs/llamafac/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/ubuntu/miniconda3/envs/llamafac/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ubuntu/miniconda3/envs/llamafac/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 163, in forward
return F.embedding(
File "/home/ubuntu/miniconda3/envs/llamafac/lib/python3.10/site-packages/torch/nn/functional.py", line 2237, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: 'weight' must be 2-D
0%|
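
For context: torch.nn.functional.embedding raises "'weight' must be 2-D" when the embedding table it receives is not a 2-D matrix (vocab_size x hidden_size). Under DeepSpeed ZeRO-3 this usually means the embedding parameter was still a sharded/flattened placeholder when the forward pass ran, which can happen with mismatched library versions. A minimal sketch that reproduces just the error in plain PyTorch (no DeepSpeed; the shapes are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

input_ids = torch.tensor([[1, 2, 3]])

# A proper 2-D embedding table (num_embeddings x embedding_dim) works.
weight_2d = torch.randn(10, 4)
print(F.embedding(input_ids, weight_2d).shape)  # torch.Size([1, 3, 4])

# Under ZeRO-3 an ungathered parameter is kept as a flattened/empty
# placeholder; passing such a non-2-D tensor reproduces the failure.
weight_1d = torch.randn(40)  # illustrative 1-D stand-in for a sharded weight
F.embedding(input_ids, weight_1d)  # RuntimeError: 'weight' must be 2-D
```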

Owner

Which version of LLaMA-Factory did you use, and what were your training arguments?

Thank you for your reply. I have solved the issue; it was caused by my environment's package versions.
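
For anyone hitting the same error, it may help to first record the installed versions of the relevant packages (the exact versions that resolved this are not stated in the thread, so the package list below is only an assumption):

```python
# Print the versions of the packages most likely involved in this traceback.
from importlib.metadata import version, PackageNotFoundError

for pkg in ("llamafactory", "transformers", "accelerate", "deepspeed", "torch"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not installed")
```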

Raku-Yihan changed discussion status to closed
