LLM Models
gradientai/Llama-3-8B-Instruct-Gradient-1048k
Text Generation
• Updated
• 9.22k
• 680
Are Your LLMs Capable of Stable Reasoning?
Paper
• 2412.13147
• Published
• 93
RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within Generation
Paper
• 2412.11919
• Published
• 36
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
Paper
• 2412.18925
• Published
• 107
Virgo: A Preliminary Exploration on Reproducing o1-like MLLM
Paper
• 2501.01904
• Published
• 33
VideoRAG: Retrieval-Augmented Generation over Video Corpus
Paper
• 2501.05874
• Published
• 75
Baichuan-Omni-1.5 Technical Report
Paper
• 2501.15368
• Published
• 60
InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU
Paper
• 2502.08910
• Published
• 148
SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference
Paper
• 2502.18137
• Published
• 60
Slamming: Training a Speech Language Model on One GPU in a Day
Paper
• 2502.15814
• Published
• 69
Transformers without Normalization
Paper
• 2503.10622
• Published
• 170
Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMs
Paper
• 2504.00072
• Published
• 6
ReZero: Enhancing LLM search ability by trying one-more-time
Paper
• 2504.11001
• Published
• 16
Paper
• 2506.03569
• Published
• 80
Sentinel: SOTA model to protect against prompt injections
Paper
• 2506.05446
• Published
• 23
qualifire/prompt-injection-sentinel
Text Classification
• 0.4B • Updated
• 393
• 15
MiniCPM4: Ultra-Efficient LLMs on End Devices
Paper
• 2506.07900
• Published
• 95
Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation
Paper
• 2506.09991
• Published
• 55
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention
Paper
• 2506.13585
• Published
• 273
Demystifying the Visual Quality Paradox in Multimodal Large Language Models
Paper
• 2506.15645
• Published
• 4
Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights
Paper
• 2506.16406
• Published
• 130
GPTailor: Large Language Model Pruning Through Layer Cutting and Stitching
Paper
• 2506.20480
• Published
• 7
Why Do Open-Source LLMs Struggle with Data Analysis? A Systematic Empirical Study
Paper
• 2506.19794
• Published
• 8
Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test
Paper
• 2506.21551
• Published
• 28
MMSearch-R1: Incentivizing LMMs to Search
Paper
• 2506.20670
• Published
• 64
Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs
Paper
• 2507.07990
• Published
• 46
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
Paper
• 2508.06471
• Published
• 206
Qwen3Guard Technical Report
Paper
• 2510.14276
• Published
• 15
Towards Mixed-Modal Retrieval for Universal Retrieval-Augmented Generation
Paper
• 2510.17354
• Published
• 35
RL makes MLLMs see better than SFT
Paper
• 2510.16333
• Published
• 49
Image-Text-to-Text
• 4B • Updated
• 1.08k
• 3
LongCat-Flash-Thinking-2601 Technical Report
Paper
• 2601.16725
• Published
• 175
Kimi K2.5: Visual Agentic Intelligence
Paper
• 2602.02276
• Published
• 238