Andrea Soria

asoria

AI & ML interests

Maintainer of 🤗Datasets: Data processing

asoria's activity

upvoted an article 3 days ago

Fine-Tuning Gemma Models in Hugging Face

21
upvoted an article 22 days ago

The 5 Most Under-Rated Tools on Hugging Face

74
upvoted an article about 1 month ago

SmolLM - blazingly fast and remarkably powerful

242
upvoted 3 articles about 2 months ago

Docmatix - a huge dataset for Document Visual Question Answering

63

Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models

58

Ethics and Society Newsletter #6: Building Better AI: The Importance of Data Quality

30
upvoted 2 articles 2 months ago

Experimenting with Automatic PII Detection on the Hub using Presidio

23

Announcing New Dataset Search Features

22
upvoted an article 3 months ago

How to directly access 150k+ Hugging Face Datasets with DuckDB and query using GPT-4o

By chilijung
10
upvoted 2 articles 4 months ago

Synthetic dataset generation techniques: generating custom sentence similarity data

14

Synthetic data: save money, time and carbon with open source

45
upvoted 2 articles 5 months ago

🦙⚗️ Using Llama3 and distilabel to build fine-tuning datasets

67

Text2SQL using Hugging Face Dataset Viewer API and Motherduck DuckDB-NSQL-7B

23
upvoted 2 articles 5 months ago

It's raining diffusion personalization techniques☔️🎭🖼️

By linoyts
18

DuckDB: run SQL queries on 50,000+ datasets on the Hugging Face Hub

4