OpenGVLab

community

https://github.com/opengvlab

AI & ML interests

Computer Vision

Recent Activity

Eurayka authored a paper about 18 hours ago

LaViT: Aligning Latent Visual Thoughts for Multi-modal Reasoning

ownerEli authored a paper 5 days ago

STEP3-VL-10B Technical Report

Eurayka submitted a paper 5 days ago

LaViT: Aligning Latent Visual Thoughts for Multi-modal Reasoning

Papers

InternVideo-Next: Towards General Video Foundation Models without Video-Text Supervision

VKnowU: Evaluating Visual Knowledge Understanding in Multimodal LLMs

Eurayka

authored a paper about 18 hours ago

LaViT: Aligning Latent Visual Thoughts for Multi-modal Reasoning

Paper • 2601.10129 • Published 6 days ago • 10

ownerEli

authored a paper 5 days ago

STEP3-VL-10B Technical Report

Paper • 2601.09668 • Published 7 days ago • 178

Eurayka

submitted a paper to Daily Papers 5 days ago

LaViT: Aligning Latent Visual Thoughts for Multi-modal Reasoning

Paper • 2601.10129 • Published 6 days ago • 10

vansin

submitted a paper to Daily Papers 8 days ago

End-to-End Video Character Replacement without Structural Guidance

Paper • 2601.08587 • Published 8 days ago • 7

heroding77

authored a paper 8 days ago

OS-Symphony: A Holistic Framework for Robust and Generalist Computer-Using Agent

Paper • 2601.07779 • Published 9 days ago • 25

heroding77

submitted a paper to Daily Papers 8 days ago

OS-Symphony: A Holistic Framework for Robust and Generalist Computer-Using Agent

Paper • 2601.07779 • Published 9 days ago • 25

Nymbo

posted an update 14 days ago

Genuine recommendation: you should really use this AutoHotkey macro. Save the file as macros.ahk and run it. Before sending a prompt to your coding agent, open any regular chatbot, press Ctrl + Alt + 1, and paste your prompt after the inserted text. Then send the chatbot's output to the agent. This is the actual, boring, real way to "10x your prompting". Use the other number keys to avoid repeating yourself over and over again. I use this macro probably 100-200 times per day. AutoHotkey isn't as new or hyped as a lot of other workflows, but there's a reason it's still widely used after 17 years. Don't overcomplicate it.

; Requires AutoHotkey v1.1+

; All macros are `Ctrl + Alt + <number>`

^!1::
    Send, Please help me more clearly articulate what I mean with this message (write the message in a code block):
return

^!2::
    Send, Please make the following changes:
return

^!3::
    Send, It seems you got cut off by the maximum response limit. Please continue by picking up where you left off.
return

In my experience over the past few months, Ctrl + Alt + 1 works best with Instruct (non-thinking) models. Reasoning causes some models to ramble and miss the point. I've just been using GPT-5.x for this.

kpzhang996

submitted a paper to Daily Papers 22 days ago

Yume-1.5: A Text-Controlled Interactive World Generation Model

Paper • 2512.22096 • Published 26 days ago • 59

lll2343

updated 3 models 25 days ago

revliter

updated a collection 28 days ago

InternVideo-Next

Collection

InternVideo-Next: Towards General Video Foundation Models without Video-Text Supervision • 5 items • Updated 28 days ago • 4

vansin

posted an update about 1 month ago

QwenLong-L1.5: Post-Training Recipe for Long-Context Reasoning and Memory Management

Eurayka

authored a paper about 1 month ago

VKnowU: Evaluating Visual Knowledge Understanding in Multimodal LLMs

Paper • 2511.20272 • Published Nov 25, 2025 • 1

Nymbo

posted an update about 1 month ago

🚨 New tool for the Nymbo/Tools MCP server: The new Agent_Skills tool provides full support for Agent Skills (Claude Skills but open-source).

How it works: the tool exposes the standard discover/info/resources/validate actions. Skills live in /Skills under the same File_System root, and any bundled scripts run through Shell_Command; no new infrastructure is required.

Agent_Skills(action="discover")  # List all available skills
Agent_Skills(action="info", skill_name="music-downloader")  # Full SKILL.md
Agent_Skills(action="resources", skill_name="music-downloader")  # Scripts, refs, assets
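The discover and resources actions above map naturally onto a folder scan. Below is a minimal Python sketch of what such a scan could look like, assuming each skill is a subfolder of /Skills containing a SKILL.md whose frontmatter holds single-line `name:` and `description:` fields. The functions `parse_frontmatter`, `discover`, and `resources` are hypothetical stand-ins, not the actual Nymbo/Tools implementation, and the parser handles only simple `key: value` lines rather than full YAML.

```python
import tempfile
from pathlib import Path


def parse_frontmatter(text: str) -> dict[str, str]:
    """Parse simple `key: value` lines from a leading --- ... --- block (not full YAML)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":  # closing fence of the frontmatter
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def discover(skills_root: Path) -> list[dict[str, str]]:
    """Hypothetical 'discover': one entry per subfolder that contains a SKILL.md."""
    found = []
    for skill_md in sorted(skills_root.glob("*/SKILL.md")):
        meta = parse_frontmatter(skill_md.read_text(encoding="utf-8"))
        found.append({
            "name": meta.get("name", skill_md.parent.name),  # fall back to folder name
            "description": meta.get("description", ""),
        })
    return found


def resources(skills_root: Path, skill_name: str) -> list[str]:
    """Hypothetical 'resources': every bundled file in the skill except SKILL.md itself."""
    skill_dir = skills_root / skill_name
    return sorted(
        p.relative_to(skill_dir).as_posix()
        for p in skill_dir.rglob("*")
        if p.is_file() and p.name != "SKILL.md"
    )


if __name__ == "__main__":
    # Build a throwaway /Skills tree with one demo skill and list it.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "Skills"
        demo = root / "music-downloader"
        (demo / "scripts").mkdir(parents=True)
        (demo / "SKILL.md").write_text(
            "---\nname: music-downloader\ndescription: Download audio with yt-dlp\n---\n# Usage\n",
            encoding="utf-8",
        )
        (demo / "scripts" / "download.py").write_text("print('hi')\n", encoding="utf-8")
        print(discover(root))
        print(resources(root, "music-downloader"))
```

Keeping discovery to a plain directory glob means adding a skill is just dropping a folder in place, which matches the post's point that no new infrastructure is needed.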

I've included a music-downloader skill as a working demo; it wraps yt-dlp for YouTube/SoundCloud audio extraction.

Caveat: On HF Spaces, Shell_Command works for most tasks, but some operations (like YouTube downloads) are restricted due to the container environment. For full functionality, run the server locally on your machine.

Try it out ~ https://www.nymbo.net/nymbot

KingNish

posted an update about 1 month ago

Muon vs MuonClip vs Muon+AdamW

Muon has gone from an experiment to a mainstream optimizer, but does it hold up for fine‑tuning? We ran head‑to‑head tests on Qwen3‑4B (10k+ high‑quality instruction rows) to find out.

Short story: pure Muon converged fastest at the start, but its gradient‑norm spikes made training unstable. MuonClip (Kimi K2's clipping) stabilizes long pretraining runs, yet in our small‑scale fine‑tune it underperformed, with lower token accuracy and slower convergence. The winner was the hybrid: Muon for 2D weight matrices + AdamW for 1D parameters. It delivered the best balance of stability and final performance and even beat vanilla AdamW.
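To make the "Muon for 2D, AdamW for 1D" split concrete, here is a minimal NumPy sketch; it is not the blog's training code. It assumes the quintic Newton-Schulz coefficients (3.4445, -4.7750, 2.0315) from the public Muon reference implementation, uses plain momentum rather than the Nesterov-style variant, and omits the shape-dependent update scaling that real implementations add. `hybrid_step`, its routing rule, and all learning rates are illustrative assumptions.

```python
import numpy as np

# Assumption: quintic Newton-Schulz coefficients as used in the public Muon reference code.
NS_COEFFS = (3.4445, -4.7750, 2.0315)


def newton_schulz(g: np.ndarray, steps: int = 5, eps: float = 1e-7) -> np.ndarray:
    """Approximately orthogonalize a 2D matrix: its singular values are pushed toward ~1."""
    a, b, c = NS_COEFFS
    x = g / (np.linalg.norm(g) + eps)  # Frobenius-normalize so every singular value is <= 1
    transposed = x.shape[0] > x.shape[1]
    if transposed:
        x = x.T  # iterate on the wide orientation so the Gram matrix is the smaller one
    for _ in range(steps):
        gram = x @ x.T
        x = a * x + (b * gram + c * gram @ gram) @ x  # applies a*s + b*s^3 + c*s^5 to each singular value
    return x.T if transposed else x


def muon_update(p, g, buf, lr=0.02, beta=0.95):
    """Muon step for a 2D weight: accumulate momentum, then step along its orthogonalized form."""
    buf *= beta
    buf += g
    p -= lr * newton_schulz(buf)


def adamw_update(p, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8, wd=0.01):
    """AdamW step (decoupled weight decay) for 1D parameters such as biases and norm gains."""
    m *= b1
    m += (1 - b1) * g
    v *= b2
    v += (1 - b2) * g * g
    m_hat = m / (1 - b1**t)  # bias correction; t starts at 1
    v_hat = v / (1 - b2**t)
    p -= lr * (m_hat / (np.sqrt(v_hat) + eps) + wd * p)


def hybrid_step(params, grads, state, t):
    """Route each parameter by shape: 2D -> Muon, everything else -> AdamW. Updates in place."""
    for name, p in params.items():
        g = grads[name]
        if p.ndim == 2:
            buf = state.setdefault(name, np.zeros_like(p))
            muon_update(p, g, buf)
        else:
            m, v = state.setdefault(name, (np.zeros_like(p), np.zeros_like(p)))
            adamw_update(p, g, m, v, t)
```

Call `hybrid_step` once per batch with `t` counting up from 1. Routing purely by dimensionality is the simple rule described above; Muon guides commonly also send embedding and output-head matrices to AdamW even though they are 2D.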

Takeaway: for small-scale fine-tuning, hybrid = practical and reliable.

Next step: scale to larger models/datasets to see if Muon's spikes become catastrophic or if clipping wins out.
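The clipping in question is QK-Clip, the part of MuonClip that caps attention logits. Here is a simplified single-head NumPy sketch, assuming the form described in the Kimi K2 report: when a head's largest pre-softmax logit S_max exceeds a threshold tau, W_q is scaled by gamma**alpha and W_k by gamma**(1 - alpha), with gamma = tau / S_max. The threshold value, alpha = 0.5, and the function names are assumptions; the real method applies this per head after each optimizer step and handles MLA-style attention differently.

```python
import numpy as np


def max_logit(w_q: np.ndarray, w_k: np.ndarray, x: np.ndarray) -> float:
    """Largest scaled attention score q.k / sqrt(d_head) for one head over token batch x."""
    d_head = w_q.shape[1]
    scores = (x @ w_q) @ (x @ w_k).T / np.sqrt(d_head)
    return float(scores.max())


def qk_clip(w_q, w_k, x, tau=100.0, alpha=0.5):
    """Return (w_q, w_k), rescaled if needed so this head's max logit is at most tau."""
    s_max = max_logit(w_q, w_k, x)
    if s_max <= tau:
        return w_q, w_k  # within budget: leave the weights untouched
    gamma = tau / s_max  # 0 < gamma < 1
    # Logits are bilinear in (w_q, w_k), so scaling them by gamma**alpha and
    # gamma**(1 - alpha) multiplies every logit by exactly gamma (new max == tau).
    return w_q * gamma**alpha, w_k * gamma ** (1 - alpha)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((16, 32))        # 16 tokens, model dim 32
    w_q = rng.standard_normal((32, 8)) * 3.0  # deliberately large projections
    w_k = rng.standard_normal((32, 8)) * 3.0
    new_q, new_k = qk_clip(w_q, w_k, x, tau=10.0)
    print(max_logit(w_q, w_k, x), max_logit(new_q, new_k, x))
```

Because the fix rescales weights rather than clipping gradients, it targets exploding attention logits directly, which is why it matters more for long pretraining runs than for a short fine-tune.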

Full Blog Link: https://huggingface.co/blog/KingNish/optimizer-part1

KingNish

posted an update about 1 month ago

I tested Muon vs MuonClip vs Muon+AdamW for fine-tuning LLMs.
Just published a blog post on it. Read here 👉 https://huggingface.co/blog/KingNish/optimizer-part1

revliter

updated a collection about 2 months ago

InternVideo-Next

Collection

InternVideo-Next: Towards General Video Foundation Models without Video-Text Supervision • 5 items • Updated 28 days ago • 4

Team members 117