---
license: apache-2.0
datasets:
- roneneldan/TinyStories
language:
- en
---

Models are in `models/`; the names follow the pattern `{model_dimension}-{n_layers}` (768-8 is not fully trained, but its loss is pretty flat).

Inside `models/old/` there are models that were trained on the non-cleaned dataset (with a tokenizer trained on that dataset). I think all of them are fully trained, but some are missing from my wandb.

`tok4096.model` is the tokenizer for the cleaned dataset; `tok4096_old.model` is for the non-cleaned one.

`train_snakes.py` is the training script (you need to change `outdir`, `d_model`, and `n_layer`). It initializes the Mamba model using the `MambaLMHeadModel` class, which is defined in `model.py`.

Context length is 256.
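
Loading a checkpoint and sampling from it might look roughly like the sketch below. This is only a guess at the interface: the checkpoint file extension, the config keys, and the `generate()` signature are all assumptions, since the real `MambaLMHeadModel` lives in `model.py` and the checkpoint format is set by `train_snakes.py` — check those files before relying on this.

```python
# Hypothetical inference sketch. The tokenizer path matches the file in this
# repo, but the checkpoint filename/format, the MambaLMHeadModel constructor,
# and the generate() API are assumptions -- see model.py and train_snakes.py
# for the real interface.

def generate_sample(checkpoint_path="models/768-8.pt",   # hypothetical extension
                    tokenizer_path="tok4096.model",
                    prompt="Once upon a time",
                    max_new_tokens=100):
    """Load a trained checkpoint and sample a continuation (sketch, untested)."""
    import torch
    import sentencepiece as spm
    from model import MambaLMHeadModel  # defined in this repo's model.py

    sp = spm.SentencePieceProcessor(model_file=tokenizer_path)

    # Assumed: the checkpoint stores a state_dict plus the model config.
    ckpt = torch.load(checkpoint_path, map_location="cpu")
    model = MambaLMHeadModel(**ckpt["model_args"])  # hypothetical config keys
    model.load_state_dict(ckpt["model"])
    model.eval()

    ids = torch.tensor([sp.encode(prompt)])
    with torch.no_grad():
        # Assumed generate() signature; total length capped at the 256-token context.
        out = model.generate(ids, max_length=min(256, ids.shape[1] + max_new_tokens))
    return sp.decode(out[0].tolist())
```

Nothing is loaded at import time, so the function can be dropped into a notebook and called once the checkpoint and tokenizer paths point at real files.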