Model Card for Vokturz/Loyca-Qwen3-VL-2B-Instruct-OCR
Model Details
Model Description
Loyca-Qwen3-VL-2B-Instruct-OCR is a lightweight LoRA adapter built on top of Qwen/Qwen3-VL-2B-Instruct, fine-tuned for visual text recognition (OCR) and screen content understanding.
It enhances the base model’s ability to read and interpret text embedded in images — particularly screenshots and user interfaces — and respond with structured, instruction-following outputs.
Model Sources
- Repository: https://huggingface.co/Vokturz/Loyca-Qwen3-VL-2B-Instruct-OCR
- Base model: https://huggingface.co/Qwen/Qwen3-VL-2B-Instruct
- Fine-tuning run: W&B Experiment
Uses
This model can be used directly for Optical Character Recognition (OCR) on screenshots, UI layouts, or application previews.
The model is not designed for:
- Handwritten OCR
- Scene text in natural environments (e.g., street signs)
- Legal or financial document processing without human review
Training Details
Training Data
The model was trained on Vokturz/sourceforge-app-screenshots-ocr (~1100 records), a custom dataset of annotated application screenshots containing readable text and UI elements.
The dataset focuses on clean UI text extraction rather than general image captioning.
Training Hyperparameters
| Parameter | Value |
|---|---|
| Epochs | 8 |
| Batch size | 8 |
| Learning rate | 3e-4 |
| LoRA rank | 64 |
| LoRA alpha | 64 |
| Precision | bfloat16 (mixed) |
| Optimizer | AdamW |
| Scheduler | Cosine decay |
| Gradient accumulation | 2 |
| Weight decay | 0.01 |
Model tree for Vokturz/Loyca-Qwen3-VL-2B-Instruct-OCR
Base model
Qwen/Qwen3-VL-2B-Instruct