
smol_llama: 220M GQA

A small 220M-parameter (total) decoder-only model. This is the first version of the model.

  • 1024 hidden size, 10 layers
  • GQA (32 attention heads, 8 key-value heads), 2048-token context length (see the loading sketch below)
  • trained from scratch on a single GPU :)
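
With 8 key-value heads shared across 32 query heads, the KV cache is roughly 4x smaller than a full multi-head-attention layout at the same hidden size. Below is a minimal sketch for checking these settings with Hugging Face Transformers; it assumes the checkpoint exposes the standard Llama-style config fields (the attribute names are that assumption, and the repo id is the one this card describes):

```python
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "BEE-spoke-data/smol_llama-220M-GQA"  # repo id from this card

# Inspect the architecture without downloading the weights.
cfg = AutoConfig.from_pretrained(model_id)
print(cfg.hidden_size)              # 1024
print(cfg.num_hidden_layers)        # 10
print(cfg.num_attention_heads)      # 32 query heads
print(cfg.num_key_value_heads)      # 8 key-value heads (GQA)
print(cfg.max_position_embeddings)  # 2048

# Load the weights when you actually want to run the model.
model = AutoModelForCausalLM.from_pretrained(model_id)
```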

Links

Here are some fine-tunes we did, but there are many more possibilities out there!

  • instruct
    • openhermes - link
    • open-instruct - link
  • code
    • python (pypi) - link
  • zephyr DPO tune
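
The base checkpoint can also be tried directly for plain text generation before reaching for one of the fine-tunes. A minimal sketch using the Transformers pipeline API (the prompt and sampling settings below are illustrative only, not recommendations):

```python
from transformers import pipeline

# Simple text-generation pipeline on the base checkpoint.
generator = pipeline(
    "text-generation",
    model="BEE-spoke-data/smol_llama-220M-GQA",
)

out = generator(
    "Once upon a time, in a small village,",  # illustrative prompt
    max_new_tokens=64,   # illustrative length
    do_sample=True,
    temperature=0.8,     # illustrative sampling settings
)
print(out[0]["generated_text"])
```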

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric                            | Value |
|-----------------------------------|------:|
| Avg.                              | 29.44 |
| AI2 Reasoning Challenge (25-Shot) | 24.83 |
| HellaSwag (10-Shot)               | 29.76 |
| MMLU (5-Shot)                     | 25.85 |
| TruthfulQA (0-shot)               | 44.55 |
| Winogrande (5-shot)               | 50.99 |
| GSM8k (5-shot)                    |  0.68 |

Open LLM Leaderboard v2 Evaluation Results

Detailed results can be found here

| Metric              | Value |
|---------------------|------:|
| Avg.                |  6.62 |
| IFEval (0-Shot)     | 23.86 |
| BBH (3-Shot)        |  3.04 |
| MATH Lvl 5 (4-Shot) |  0.00 |
| GPQA (0-shot)       |  0.78 |
| MuSR (0-shot)       |  9.07 |
| MMLU-PRO (5-shot)   |  1.66 |
Weights are provided as BF16 safetensors (218M parameters).
