Vaibhav Srivastav
reach-vb
322 followers · 36 following
AI & ML interests
TTS + LM performance prediction
What an eventful day in Open Source LLMs today:

Mistral released Codestral Mamba 🐍
> Beats DeepSeek QwenCode, best model < 10B, competitive with Codestral 22B
> Mamba 2 architecture - supports up to 256K context
> Apache 2.0 licensed, perfect for local code assistant
> Transformers & llama.cpp integration upcoming!

Model checkpoint: https://huggingface.co/mistralai/mamba-codestral-7B-v0.1

Hugging Face dropped SmolLM 🤏
> Beats MobileLLM, Qwen 0.5B, Phi 1.5B and more!
> 135M, 360M, and 1.7B param model checkpoints
> Trained on 600B high-quality synthetic + FineWeb Edu tokens
> Architecture: Llama + GQA + 2048 ctx length
> Ripe for fine-tuning and on-device deployments.
> Works out of the box with Transformers!

Model checkpoints: HuggingFaceTB/smollm-6695016cad7167254ce15966

Mistral released Mathstral 7B ∑
> 56.6% on MATH and 63.47% on MMLU
> Same architecture as Mistral 7B
> Works out of the box with Transformers & llama.cpp
> Released under Apache 2.0 license

Model checkpoint: https://huggingface.co/mistralai/mathstral-7B-v0.1

Pretty dope day for open source ML. Can't wait to see what the community builds with it and to support them further! 🤗

What's your favourite from the release today?
Yet another rewarding week in Open Source AI:

1. Google dropped Gemma 2 27B & 9B - The best open (commercially permissive) LLM out there, according to LMSYS.
   google/gemma-2-release-667d6600fd5220e7b967f315
2. Mars5 TTS - Text to Speech with insane prosody control & voice cloning.
   CAMB-AI/MARS5-TTS
3. Meta shipped LLM Compiler - beats GPT 4 on code optimisation and compiler reasoning.
   facebook/llm-compiler-667c5b05557fe99a9edd25cb
4. Arcee-Spark - Qwen2 7B (w/ merging) fine-tuned further to beat GPT 3.5 on MT Bench.
   arcee-ai/Arcee-Spark
5. Gemini Nano out in the wild in Chrome - On-device LLM with just 2 lines of code (fully offline)
6. Fal released a fully Open Source GAN-based Super-Resolution model (with a second version already cooking)
   fal/AuraSR
7. NYU released Cambrian 1 - Vision Multimodal LLM that beats pretty much all other closed source competition, 8-34B model size
   https://huggingface.co/nyu-visionx

And much more: the Open LLM Leaderboard got a major update, LMSYS released Chat Vision Arena, and OpenAI released a paper on CriticGPT!

What a lovely week, can't wait for the next to see what the community is up to! Put it down in the comments if I missed something 🔥