Built with Axolotl

medusa-microllama_305M_stage1_v2

This model is a fine-tuned version of keeeeenw/MicroLlama; the training dataset is not specified in this card. It achieves the following results on the evaluation set:

  • Loss: 2.5107
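
For context: if this value is a mean token-level cross-entropy in nats (an assumption; Medusa stage-1 losses are computed over the extra decoding heads), it corresponds to a perplexity of exp(2.5107) ≈ 12.3.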

Model description

More information needed

Intended uses & limitations

More information needed
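
As a hedged sketch while usage guidance is missing: Medusa stage-1 training typically adds extra decoding heads for speculative decoding on top of a frozen base model, so this repository may not load as a standalone causal LM with AutoModel; attaching the heads would normally go through the Medusa tooling. The snippet below therefore only loads the base model named in this card with standard transformers APIs; everything beyond the base model id is an assumption.

```python
# Minimal sketch: loads only the base model named in this card.
# Assumption: the Medusa head weights in this repository are meant
# to be attached via the Medusa tooling, not loaded with AutoModel.
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "keeeeenw/MicroLlama"  # base model from this card

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```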

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0005
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 40
  • num_epochs: 2
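
For reference, the list above maps directly onto transformers' TrainingArguments. The sketch below illustrates that mapping only; the run itself was driven by Axolotl, so this is an assumed equivalent rather than the original configuration, and output_dir is a placeholder.

```python
# Illustrative only: an assumed TrainingArguments equivalent of the
# hyperparameters listed above, not the actual Axolotl config.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="medusa-microllama_305M_stage1_v2",  # placeholder
    learning_rate=5e-4,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=42,
    gradient_accumulation_steps=4,  # 2 per device x 4 steps = 8 effective
    lr_scheduler_type="cosine",
    warmup_steps=40,
    num_train_epochs=2,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```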

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|---------------|--------|------|-----------------|
| 3.0312        | 0.0244 | 40   | 3.0649          |
| 3.026         | 0.0489 | 80   | 2.9528          |
| 2.8781        | 0.0733 | 120  | 2.9163          |
| 2.8075        | 0.0978 | 160  | 2.9268          |
| 2.9164        | 0.1222 | 200  | 2.9027          |
| 2.7724        | 0.1467 | 240  | 2.8815          |
| 2.8856        | 0.1711 | 280  | 2.8871          |
| 2.718         | 0.1955 | 320  | 2.8749          |
| 2.6479        | 0.2200 | 360  | 2.8815          |
| 2.6194        | 0.2444 | 400  | 2.8872          |
| 2.7954        | 0.2689 | 440  | 2.8773          |
| 2.7008        | 0.2933 | 480  | 2.8572          |
| 2.6876        | 0.3178 | 520  | 2.8560          |
| 2.879         | 0.3422 | 560  | 2.8665          |
| 2.7377        | 0.3666 | 600  | 2.8482          |
| 2.7459        | 0.3911 | 640  | 2.8512          |
| 2.8036        | 0.4155 | 680  | 2.8712          |
| 2.89          | 0.4400 | 720  | 2.8614          |
| 2.7898        | 0.4644 | 760  | 2.8570          |
| 2.891         | 0.4888 | 800  | 2.8384          |
| 2.717         | 0.5133 | 840  | 2.8344          |
| 2.8589        | 0.5377 | 880  | 2.8342          |
| 2.8944        | 0.5622 | 920  | 2.8040          |
| 2.85          | 0.5866 | 960  | 2.8012          |
| 2.8057        | 0.6111 | 1000 | 2.8063          |
| 2.6772        | 0.6355 | 1040 | 2.7957          |
| 2.7905        | 0.6599 | 1080 | 2.7822          |
| 2.7579        | 0.6844 | 1120 | 2.7922          |
| 2.7625        | 0.7088 | 1160 | 2.7763          |
| 2.85          | 0.7333 | 1200 | 2.7607          |
| 2.8447        | 0.7577 | 1240 | 2.7611          |
| 2.8027        | 0.7822 | 1280 | 2.7501          |
| 2.461         | 0.8066 | 1320 | 2.7201          |
| 2.6232        | 0.8310 | 1360 | 2.6906          |
| 2.6998        | 0.8555 | 1400 | 2.6763          |
| 2.7609        | 0.8799 | 1440 | 2.6603          |
| 2.6003        | 0.9044 | 1480 | 2.6549          |
| 2.2626        | 0.9288 | 1520 | 2.6484          |
| 2.5896        | 0.9533 | 1560 | 2.6389          |
| 2.5704        | 0.9777 | 1600 | 2.6245          |
| 2.1629        | 1.0021 | 1640 | 2.6164          |
| 2.1719        | 1.0266 | 1680 | 2.6152          |
| 2.2115        | 1.0510 | 1720 | 2.6134          |
| 2.359         | 1.0755 | 1760 | 2.6127          |
| 2.3486        | 1.0999 | 1800 | 2.6066          |
| 2.1864        | 1.1244 | 1840 | 2.6041          |
| 2.1692        | 1.1488 | 1880 | 2.6023          |
| 2.1455        | 1.1732 | 1920 | 2.5998          |
| 2.195         | 1.1977 | 1960 | 2.5914          |
| 2.3458        | 1.2221 | 2000 | 2.5883          |
| 2.1419        | 1.2466 | 2040 | 2.5827          |
| 2.1329        | 1.2710 | 2080 | 2.5743          |
| 2.2733        | 1.2954 | 2120 | 2.5686          |
| 2.2662        | 1.3199 | 2160 | 2.5654          |
| 2.399         | 1.3443 | 2200 | 2.5637          |
| 2.1518        | 1.3688 | 2240 | 2.5563          |
| 2.1115        | 1.3932 | 2280 | 2.5483          |
| 2.2048        | 1.4177 | 2320 | 2.5434          |
| 2.2658        | 1.4421 | 2360 | 2.5390          |
| 2.2186        | 1.4665 | 2400 | 2.5366          |
| 2.1467        | 1.4910 | 2440 | 2.5321          |
| 2.2352        | 1.5154 | 2480 | 2.5281          |
| 2.2507        | 1.5399 | 2520 | 2.5250          |
| 2.1987        | 1.5643 | 2560 | 2.5221          |
| 2.2234        | 1.5888 | 2600 | 2.5205          |
| 2.0497        | 1.6132 | 2640 | 2.5181          |
| 2.1133        | 1.6376 | 2680 | 2.5166          |
| 2.1047        | 1.6621 | 2720 | 2.5153          |
| 2.1578        | 1.6865 | 2760 | 2.5148          |
| 2.1869        | 1.7110 | 2800 | 2.5135          |
| 2.0953        | 1.7354 | 2840 | 2.5126          |
| 2.1413        | 1.7599 | 2880 | 2.5119          |
| 2.1333        | 1.7843 | 2920 | 2.5115          |
| 2.2001        | 1.8087 | 2960 | 2.5114          |
| 2.1889        | 1.8332 | 3000 | 2.5111          |
| 2.2247        | 1.8576 | 3040 | 2.5110          |
| 2.2258        | 1.8821 | 3080 | 2.5108          |
| 2.157         | 1.9065 | 3120 | 2.5107          |
| 2.181         | 1.9310 | 3160 | 2.5107          |
| 2.1441        | 1.9554 | 3200 | 2.5107          |
| 2.4097        | 1.9798 | 3240 | 2.5107          |

Framework versions

  • Transformers 4.43.0
  • Pytorch 2.3.1
  • Datasets 2.15.0
  • Tokenizers 0.19.1