---
license: apache-2.0
datasets:
- roneneldan/TinyStories
language:
- en
---

Models are in `models/`; the names follow the pattern `{model_dimension}-{n_layers}` (768-8 is not fully trained, but its loss is pretty flat).

Inside `models/old/` there are models that were trained on the non-cleaned dataset (with a tokenizer trained on that dataset). I think all of them are fully trained, but some are missing from my wandb.

`tok4096.model` is the tokenizer for the cleaned dataset; `tok4096_old.model` is for the non-cleaned one.

`train_snakes.py` is the training script (you need to change `outdir`, `d_model`, and `n_layer`). It initializes the Mamba model using the `MambaLMHeadModel` class, which is defined in `model.py`.

Context length is 256.
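
Loading a checkpoint and sampling from it might look roughly like the sketch below. This is only a guess at the interface: the checkpoint file extension, the config keys, and the `generate()` signature are all assumptions, since the real `MambaLMHeadModel` lives in `model.py` and the checkpoint format is set by `train_snakes.py` — check those files before relying on this.

```python
# Hypothetical inference sketch. The tokenizer path matches the file in this
# repo, but the checkpoint filename/format, the MambaLMHeadModel constructor,
# and the generate() API are assumptions -- see model.py and train_snakes.py
# for the real interface.

def generate_sample(checkpoint_path="models/768-8.pt",   # hypothetical extension
                    tokenizer_path="tok4096.model",
                    prompt="Once upon a time",
                    max_new_tokens=100):
    """Load a trained checkpoint and sample a continuation (sketch, untested)."""
    import torch
    import sentencepiece as spm
    from model import MambaLMHeadModel  # defined in this repo's model.py

    sp = spm.SentencePieceProcessor(model_file=tokenizer_path)

    # Assumed: the checkpoint stores a state_dict plus the model config.
    ckpt = torch.load(checkpoint_path, map_location="cpu")
    model = MambaLMHeadModel(**ckpt["model_args"])  # hypothetical config keys
    model.load_state_dict(ckpt["model"])
    model.eval()

    ids = torch.tensor([sp.encode(prompt)])
    with torch.no_grad():
        # Assumed generate() signature; total length capped at the 256-token context.
        out = model.generate(ids, max_length=min(256, ids.shape[1] + max_new_tokens))
    return sp.decode(out[0].tolist())
```

Nothing is loaded at import time, so the function can be dropped into a notebook and called once the checkpoint and tokenizer paths point at real files.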