Indic Datasets
List of text and voice datasets to train and finetune Indic LLMs.

ai4bharat/sangraha • Viewer • Updated Jul 25 • 177M • 1.21k • 27
uonlp/CulturaX • Viewer • Updated Jul 23 • 7.18B • 11.1k • 459
pary/hind_encorp • Updated Jan 18 • 16 • 1
PleIAs/YouTube-Commons • Updated Jun 26 • 21 • 301

Alignment Dataset
English and other model alignment datasets.

H-D-T/Buzz-8b-Large-v0.5 • Text Generation • Updated May 14 • 12 • 29
allenai/WildChat-1M • Viewer • Updated 14 days ago • 838k • 916 • 267
nvidia/ChatQA-Training-Data • Viewer • Updated Jun 4 • 442k • 1.9k • 152
nvidia/ChatRAG-Bench • Viewer • Updated May 24 • 34.6k • 1.78k • 94