How different are its hardware requirements from those of the Qwen2-VL-2B?
I suspect it uses a larger visual tower. Does the Qwen2-VL-2B require higher hardware specifications to run than the Qwen2.5-VL-2B?
If anyone has tried it, could you leave a comment here to let more people know?
Based on my experience, it works fine even on a CPU. Qwen/Qwen3-VL-2B-Instruct is likely their previous Qwen/Qwen3-1.7B base model merged with a roughly 300M-parameter vision encoder.
Normal chat is very quick; image inputs might take a while, but even a minimal GPU is enough to accelerate a model this small. I had earlier tried the 4B variant of this VL model, and even that was fast for chat on CPU.
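In case it helps, here's a minimal sketch of how you might load it on CPU with Hugging Face transformers. The model ID comes from this thread; the Auto classes, the placeholder image URL, and the way the chat template handles image entries are assumptions and may vary with your transformers version (older setups preprocess images with qwen_vl_utils instead):

```python
# Minimal sketch: running Qwen/Qwen3-VL-2B-Instruct on CPU with transformers.
# Assumes a recent transformers release that includes Qwen3-VL support.
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "Qwen/Qwen3-VL-2B-Instruct"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=torch.float32,  # plain fp32 on CPU; use "auto" + device_map on a GPU
)

# Text-only chat is quick even on CPU; the image entry is what slows things down.
messages = [
    {
        "role": "user",
        "content": [
            # Placeholder URL; recent processors can fetch the image themselves.
            {"type": "image", "url": "https://example.com/cat.jpg"},
            {"type": "text", "text": "Describe this image in one sentence."},
        ],
    }
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
)
output_ids = model.generate(**inputs, max_new_tokens=128)

# Strip the prompt tokens before decoding so only the reply is printed.
reply = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(reply)
```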
By the way, there's not much size difference between Qwen2-VL-2B and Qwen3-VL-2B.
For clarification, there is NO Qwen2.5-VL-2B; the smallest model in the 2.5-VL series is 3B.
thank you!