- LLM Pruning and Distillation in Practice: The Minitron Approach
  Paper • 2408.11796 • Published • 58
- TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
  Paper • 2408.09174 • Published • 52
- To Code, or Not To Code? Exploring Impact of Code in Pre-training
  Paper • 2408.10914 • Published • 45
- Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications
  Paper • 2408.11878 • Published • 63
Collections
Discover the best community collections!
Collections including the paper arxiv:2602.00919
- InterPrior: Scaling Generative Control for Physics-Based Human-Object Interactions
  Paper • 2602.06035 • Published • 23
- PaperBanana: Automating Academic Illustration for AI Scientists
  Paper • 2601.23265 • Published • 192
- Green-VLA: Staged Vision-Language-Action Model for Generalist Robots
  Paper • 2602.00919 • Published • 284
- Learning Humanoid End-Effector Control for Open-Vocabulary Visual Loco-Manipulation
  Paper • 2602.16705 • Published • 26
- THINKSAFE: Self-Generated Safety Alignment for Reasoning Models
  Paper • 2601.23143 • Published • 38
- PaperBanana: Automating Academic Illustration for AI Scientists
  Paper • 2601.23265 • Published • 192
- Agentic Reasoning for Large Language Models
  Paper • 2601.12538 • Published • 195
- BabyVision: Visual Reasoning Beyond Language
  Paper • 2601.06521 • Published • 196
- World-in-World: World Models in a Closed-Loop World
  Paper • 2510.18135 • Published • 77
- GigaBrain-0: A World Model-Powered Vision-Language-Action Model
  Paper • 2510.19430 • Published • 52
- World Simulation with Video Foundation Models for Physical AI
  Paper • 2511.00062 • Published • 44
- TwinBrainVLA: Unleashing the Potential of Generalist VLMs for Embodied Tasks via Asymmetric Mixture-of-Transformers
  Paper • 2601.14133 • Published • 61
- Gemini Robotics: Bringing AI into the Physical World
  Paper • 2503.20020 • Published • 30
- Magma: A Foundation Model for Multimodal AI Agents
  Paper • 2502.13130 • Published • 58
- LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
  Paper • 2311.05437 • Published • 51
- OS-ATLAS: A Foundation Action Model for Generalist GUI Agents
  Paper • 2410.23218 • Published • 49
- FireGNN: Neuro-Symbolic Graph Neural Networks with Trainable Fuzzy Rules for Interpretable Medical Image Classification
  Paper • 2509.10510 • Published
- From Pixels to Words -- Towards Native Vision-Language Primitives at Scale
  Paper • 2510.14979 • Published • 67
- Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer
  Paper • 2511.22699 • Published • 238
- Self-Supervised Prompt Optimization
  Paper • 2502.06855 • Published • 16
- A Survey on Vision-Language-Action Models: An Action Tokenization Perspective
  Paper • 2507.01925 • Published • 39
- DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge
  Paper • 2507.04447 • Published • 45
- A Survey on Vision-Language-Action Models for Autonomous Driving
  Paper • 2506.24044 • Published • 14
- EmbRACE-3K: Embodied Reasoning and Action in Complex Environments
  Paper • 2507.10548 • Published • 37