Model Card for Vokturz/Loyca-Qwen3-VL-2B-Instruct-OCR

Model Details

Model Description

Loyca-Qwen3-VL-2B-Instruct-OCR is a lightweight LoRA adapter built on top of Qwen/Qwen3-VL-2B-Instruct, fine-tuned for visual text recognition (OCR) and screen content understanding.
It enhances the base model’s ability to read and interpret text embedded in images — particularly screenshots and user interfaces — and respond with structured, instruction-following outputs.

Model Sources


Uses

This model can be used directly for Optical Character Recognition (OCR) on screenshots, UI layouts, or application previews.

The model is not designed for:

  • Handwritten OCR
  • Scene text in natural environments (e.g., street signs)
  • Legal or financial document processing without human review

Training Details

Training Data

The model was trained on Vokturz/sourceforge-app-screenshots-ocr (~1100 records), a custom dataset of annotated application screenshots containing readable text and UI elements.

The dataset focuses on clean UI text extraction rather than general image captioning.

Training Hyperparameters

Parameter Value
Epochs 8
Batch size 8
Learning rate 3e-4
LoRA rank 64
LoRA alpha 64
Precision bfloat16 (mixed)
Optimizer AdamW
Scheduler Cosine decay
Gradient accumulation 2
Weight decay 0.01
Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for Vokturz/Loyca-Qwen3-VL-2B-Instruct-OCR

Dataset used to train Vokturz/Loyca-Qwen3-VL-2B-Instruct-OCR

Collection including Vokturz/Loyca-Qwen3-VL-2B-Instruct-OCR