Article We’re open-sourcing our text-to-image model and the process behind it 25 days ago • 73
Talking to DINO: Bridging Self-Supervised Vision Backbones with Language for Open-Vocabulary Segmentation Paper • 2411.19331 • Published Nov 28, 2024 • 5
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published Feb 20 • 155
Article Reachy Mini - The Open-Source Robot for Today's and Tomorrow's AI Builders Jul 9 • 722
Article Introducing EuroBERT: A High-Performance Multilingual Encoder Model Mar 10 • 146
distil-large-v3.5 Collection This collection contains the model repositories for distil-large-v3.5, which provides support for the most popular Whisper libraries. • 5 items • Updated Mar 25 • 9
MoshiVis v0.1 Collection MoshiVis is a Vision Speech Model built as a perceptually-augmented version of Moshi v0.1 for conversing about image inputs • 8 items • Updated Mar 21 • 22
💫StarVector Models Collection StarVector is a multimodal LLM for Scalable Vector Graphics (SVG) generation, producing structured SVG code directly from images and text. • 2 items • Updated Mar 20 • 96
Sa2VA Model Zoo Collection Hugging Face Model Zoo for Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos, by ByteDance Seed CV Research • 12 items • Updated 10 days ago • 44