Model Card for Vokturz/Loyca-Qwen3-VL-2B-Instruct-OCR

Model Details

Model Description

Loyca-Qwen3-VL-2B-Instruct-OCR is a lightweight LoRA adapter built on top of Qwen/Qwen3-VL-2B-Instruct, fine-tuned for visual text recognition (OCR) and screen content understanding.
It enhances the base model’s ability to read and interpret text embedded in images — particularly screenshots and user interfaces — and respond with structured, instruction-following outputs.

Model Sources

Repository: https://huggingface.co/Vokturz/Loyca-Qwen3-VL-2B-Instruct-OCR
Base model: https://huggingface.co/Qwen/Qwen3-VL-2B-Instruct
Fine-tuning run: W&B Experiment

Uses

This model can be used directly for Optical Character Recognition (OCR) on screenshots, UI layouts, or application previews.

The model is not designed for:

Handwritten OCR
Scene text in natural environments (e.g., street signs)
Legal or financial document processing without human review

Training Details

Training Data

The model was trained on Vokturz/sourceforge-app-screenshots-ocr (~1100 records), a custom dataset of annotated application screenshots containing readable text and UI elements.

The dataset focuses on clean UI text extraction rather than general image captioning.

Training Hyperparameters

Parameter	Value
Epochs	8
Batch size	8
Learning rate	3e-4
LoRA rank	64
LoRA alpha	64
Precision	bfloat16 (mixed)
Optimizer	AdamW
Scheduler	Cosine decay
Gradient accumulation	2
Weight decay	0.01

Downloads last month: -; Downloads are not tracked for this model. How to track

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for Vokturz/Loyca-Qwen3-VL-2B-Instruct-OCR

Base model

Qwen/Qwen3-VL-2B-Instruct

Quantized

unsloth/Qwen3-VL-2B-Instruct-unsloth-bnb-4bit

Adapter

(2)

this model

Dataset used to train Vokturz/Loyca-Qwen3-VL-2B-Instruct-OCR

Collection including Vokturz/Loyca-Qwen3-VL-2B-Instruct-OCR

🐦 Loyca

Collection

4 items • Updated 28 days ago • 1