How different are its hardware requirements from those of the Qwen2-VL-2B?

#4
Opened by likewendy

I suspect it uses a larger visual tower. Does the Qwen2-VL-2B require higher hardware specifications to run than the Qwen2.5-VL-2B?

If anyone has tried it, could you leave a comment here to let more people know?

Based on my experience, it works fine even on a CPU. Qwen/Qwen3-VL-2B-Instruct is likely a merge of a ~300M-parameter vision encoder with their previous Qwen/Qwen3-1.7B base model.
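As a rough check on the "works fine even on a CPU" point, here is a small sketch that turns the parameter counts mentioned above into the memory needed just to hold the weights. It assumes the approximate ~300M + ~1.7B split from this comment (not an official figure). It also ignores activations, the KV cache, and framework overhead, so real usage will be higher.

```python
# Rough weight-memory estimate for a small vision-language model.
# The parameter counts below are the approximate figures from this thread
# (~300M vision encoder + ~1.7B language model), not official numbers.

BYTES_PER_PARAM = {
    "fp32": 4.0,
    "bf16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Memory needed only to store the weights, in GiB.

    Excludes activations, KV cache and runtime overhead.
    """
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


vision_params = 0.3e9  # assumed vision encoder size (from the comment above)
llm_params = 1.7e9     # assumed language model size (Qwen3-1.7B)
total = vision_params + llm_params

for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gib(total, dtype):.2f} GiB")
```

At bf16 this comes to roughly 3.7 GiB of weights, which is consistent with the model running on a CPU or a small GPU.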

For normal chat it will be very quick. Image inputs might take a while, but even a minimal GPU is enough to accelerate such a small model. Earlier I tried the 4B variant of this VL model, and even that was fast for chat on CPU.
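To see why image inputs take longer than plain chat, it helps to count how many tokens an image turns into. This sketch assumes a Qwen2-VL-style layout: 14-pixel patches, with each 2×2 group of patches merged into one token, so one token per 28×28 pixels. Qwen3-VL's actual patch and merge sizes may differ, so check the model's preprocessor config. The sketch also rounds each side up and skips the resizing and min/max-pixel limits that the real processor applies.

```python
import math


def visual_tokens(height: int, width: int, patch_size: int = 14, merge_size: int = 2) -> int:
    """Approximate number of visual tokens produced for one image.

    Assumes a Qwen2-VL-style encoder: the image is split into
    patch_size x patch_size patches, and each merge_size x merge_size
    block of patches becomes a single token. Each side is rounded up;
    the real preprocessor resizes and clamps pixel counts instead.
    """
    pixels_per_token = patch_size * merge_size  # 28 pixels per token side by default
    rows = math.ceil(height / pixels_per_token)
    cols = math.ceil(width / pixels_per_token)
    return rows * cols


for side in (224, 448, 896):
    print(f"{side}x{side} image -> {visual_tokens(side, side)} visual tokens")
```

Under these assumptions, one mid-sized image adds hundreds to a thousand or more tokens that must be encoded and processed before the model starts answering. A short text prompt is far fewer, which is why plain chat feels much faster on CPU.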

By the way, there's not much size difference between Qwen2-VL-2B and Qwen3-VL-2B.
For clarification, there is NO Qwen2.5-VL-2B; the smallest model in the 2.5-VL series is 3B.

Thank you!