Learnable Multipliers: Freeing the Scale of Language Model Matrix Layers Paper • 2601.04890 • Published 13 days ago • 40
TokSuite: Measuring the Impact of Tokenizer Choice on Language Model Behavior Paper • 2512.20757 • Published 29 days ago • 16
Grounding Computer Use Agents on Human Demonstrations Paper • 2511.07332 • Published Nov 10, 2025 • 105
What are currently the best multilingual models with at most 72B parameters? Are Llama 3.3 70B and Qwen 2.5 72B still king?
I fine-tuned a smol VLM to generate specialized art history metadata! https://huggingface.co/davanstrien/iconclass-vlm: Qwen2.5-VL-3B trained using SFT to generate ICONCLASS codes (think Dewey Decimal for art!). Trained with TRL + HF Jobs in a single UV script, no GPU needed! Space to explore predictions on a test set: davanstrien/iconclass-predictions. Blog soon!
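For readers curious what SFT data for a task like this looks like: a minimal sketch of turning an image/label pair into the chat-style "messages" structure commonly used for VLM fine-tuning with TRL. The field names (`image`, `iconclass_code`), the prompt wording, and the example code are illustrative assumptions, not the actual dataset schema or training script.

```python
# Sketch only: build one multimodal chat example for SFT.
# Keys "image" and "iconclass_code" are hypothetical field names.

def to_chat_example(sample: dict) -> dict:
    """Turn one raw sample into a chat-formatted training example."""
    return {
        "messages": [
            {
                "role": "user",
                "content": [
                    # Image entry plus a short instruction as the user turn.
                    {"type": "image", "image": sample["image"]},
                    {"type": "text", "text": "Assign an ICONCLASS code to this artwork."},
                ],
            },
            {
                # The target ICONCLASS code is the assistant turn.
                "role": "assistant",
                "content": [{"type": "text", "text": sample["iconclass_code"]}],
            },
        ]
    }

example = to_chat_example(
    {"image": "path/to/scan.jpg", "iconclass_code": "71H7131"}  # placeholder values
)
```

A dataset mapped through a function like this can then be handed to an SFT trainer that understands the messages format.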
Thanks to popular request, I've just added two subsets to the CommonCrawl Creative Commons Corpus (C5; BramVanroy/CommonCrawl-CreativeCommons) so that you do not have to do the filtering manually:
- C5f (BramVanroy/CommonCrawl-CreativeCommons-fine): only retains high-quality samples that are also present in FineWeb or FineWeb-2;
- C5r (https://huggingface.co/datasets/BramVanroy/CommonCrawl-CreativeCommons-recommended): applies additional strict filtering that removes samples with license disagreement, non-commercial licenses, and Wikipedia samples. The latter because you should probably get Wikipedia from a more reliable source that provides better-parsed content.
It goes without saying that these filters lead to a massive reduction in quantity. Document and token counts are given on the dataset pages.
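The C5r filtering criteria above can be sketched as a simple predicate. The field names (`license_disagreement`, `license_abbr`, `source`) and the sample values are illustrative assumptions, not the dataset's actual column names:

```python
# Sketch of the C5r-style filter: drop license disagreement,
# non-commercial licenses, and Wikipedia-sourced samples.
# All field names are hypothetical.

def keep_for_recommended(sample: dict) -> bool:
    if sample.get("license_disagreement"):
        return False  # annotators/heuristics disagreed on the license
    if "nc" in sample.get("license_abbr", "").lower().split("-"):
        return False  # non-commercial licenses, e.g. "cc-by-nc-4.0"
    if sample.get("source") == "wikipedia":
        return False  # better obtained from a cleaner Wikipedia dump
    return True

samples = [
    {"license_abbr": "cc-by-4.0", "license_disagreement": False, "source": "web"},
    {"license_abbr": "cc-by-nc-4.0", "license_disagreement": False, "source": "web"},
    {"license_abbr": "cc-by-4.0", "license_disagreement": False, "source": "wikipedia"},
]
kept = [s for s in samples if keep_for_recommended(s)]
# only the first sample survives the filter
```

With the `datasets` library, the same predicate could be applied via `Dataset.filter`, which is roughly what the published subsets save you from doing yourself.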
NeurIPS 2025 E2LM Competition : Early Training Evaluation of Language Models Paper • 2506.07731 • Published Jun 9, 2025 • 2
Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance Paper • 2507.22448 • Published Jul 30, 2025 • 69
Multi-Stage Verification-Centric Framework for Mitigating Hallucination in Multi-Modal RAG Paper • 2507.20136 • Published Jul 27, 2025
FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language Paper • 2506.20920 • Published Jun 26, 2025 • 77
PL-Guard: Benchmarking Language Model Safety for Polish Paper • 2506.16322 • Published Jun 19, 2025 • 1
ConECT Dataset: Overcoming Data Scarcity in Context-Aware E-Commerce MT Paper • 2506.04929 • Published Jun 5, 2025 • 2