LLM Models - a Stalin16 Collection

updated 20 days ago

gradientai/Llama-3-8B-Instruct-Gradient-1048k

Text Generation • Updated Oct 29, 2024 • 9.22k • 680
Are Your LLMs Capable of Stable Reasoning?

Paper • 2412.13147 • Published Dec 17, 2024 • 93
RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within Generation

Paper • 2412.11919 • Published Dec 16, 2024 • 36
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs

Paper • 2412.18925 • Published Dec 25, 2024 • 107
Virgo: A Preliminary Exploration on Reproducing o1-like MLLM

Paper • 2501.01904 • Published Jan 3, 2025 • 33
VideoRAG: Retrieval-Augmented Generation over Video Corpus

Paper • 2501.05874 • Published Jan 10, 2025 • 75
Baichuan-Omni-1.5 Technical Report

Paper • 2501.15368 • Published Jan 26, 2025 • 60
InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU

Paper • 2502.08910 • Published Feb 13, 2025 • 148
SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference

Paper • 2502.18137 • Published Feb 25, 2025 • 60
Slamming: Training a Speech Language Model on One GPU in a Day

Paper • 2502.15814 • Published Feb 19, 2025 • 69
Transformers without Normalization

Paper • 2503.10622 • Published Mar 13, 2025 • 170
Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMs

Paper • 2504.00072 • Published Mar 31, 2025 • 6
ReZero: Enhancing LLM search ability by trying one-more-time

Paper • 2504.11001 • Published Apr 15, 2025 • 16
MiMo-VL Technical Report

Paper • 2506.03569 • Published Jun 4, 2025 • 80
Sentinel: SOTA model to protect against prompt injections

Paper • 2506.05446 • Published Jun 5, 2025 • 23
qualifire/prompt-injection-sentinel

Text Classification • 0.4B • Updated Sep 22, 2025 • 393 • 15
MiniCPM4: Ultra-Efficient LLMs on End Devices

Paper • 2506.07900 • Published Jun 9, 2025 • 95
Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation

Paper • 2506.09991 • Published Jun 11, 2025 • 55
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention

Paper • 2506.13585 • Published Jun 16, 2025 • 273
Demystifying the Visual Quality Paradox in Multimodal Large Language Models

Paper • 2506.15645 • Published Jun 18, 2025 • 4
Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights

Paper • 2506.16406 • Published Jun 19, 2025 • 130
GPTailor: Large Language Model Pruning Through Layer Cutting and Stitching

Paper • 2506.20480 • Published Jun 25, 2025 • 7
Why Do Open-Source LLMs Struggle with Data Analysis? A Systematic Empirical Study

Paper • 2506.19794 • Published Jun 24, 2025 • 8
Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test

Paper • 2506.21551 • Published Jun 26, 2025 • 28
MMSearch-R1: Incentivizing LMMs to Search

Paper • 2506.20670 • Published Jun 25, 2025 • 64
Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs

Paper • 2507.07990 • Published Jul 10, 2025 • 46
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models

Paper • 2508.06471 • Published Aug 8, 2025 • 206
Qwen3Guard Technical Report

Paper • 2510.14276 • Published Oct 16, 2025 • 15
Towards Mixed-Modal Retrieval for Universal Retrieval-Augmented Generation

Paper • 2510.17354 • Published Oct 20, 2025 • 35
RL makes MLLMs see better than SFT

Paper • 2510.16333 • Published Oct 18, 2025 • 49
hustvl/InfiniteVL

Image-Text-to-Text • 4B • Updated 20 days ago • 1.08k • 3
LongCat-Flash-Thinking-2601 Technical Report

Paper • 2601.16725 • Published about 1 month ago • 175
Kimi K2.5: Visual Agentic Intelligence

Paper • 2602.02276 • Published 21 days ago • 238