Qwen3-VL-8B WebSight Fine-tuned

This model is a fine-tuned version of Qwen/Qwen3-VL-8B-Instruct on the WebSight dataset for GUI automation tasks.

Model Description

  • Base Model: Qwen/Qwen3-VL-8B-Instruct
  • Fine-tuning Method: LoRA (merged)
  • Dataset: wave-ui/websight-v2
  • Task: Image-to-click location prediction
  • Output Format: pyautogui.click(x, y) commands

Usage

from transformers import AutoModelForVision2Seq, AutoProcessor
from PIL import Image
import torch

# Load model and processor
model = AutoModelForVision2Seq.from_pretrained(
    "Asanshay/qwen3-vl-8b-websight-merged",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)
processor = AutoProcessor.from_pretrained(
    "Asanshay/qwen3-vl-8b-websight-merged",
    trust_remote_code=True
)

# Prepare input
image = Image.open("screenshot.png")
prompt = "click the login button"

inputs = processor(
    text=f"<image>\n{prompt}",
    images=image,
    return_tensors="pt"
).to(model.device)

# Generate
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=50)

result = processor.decode(outputs[0], skip_special_tokens=True)
print(result)  # Output: pyautogui.click(x, y)

Training Details

  • Training Framework: LLaMA-Factory
  • Hardware: 8x H100 GPUs
  • LoRA Config:
    • Rank: 64
    • Alpha: 128
    • Dropout: 0.05
    • Target modules: all linear layers

Output Format

The model outputs click coordinates normalized to 1400x800 resolution:

  • Format: pyautogui.click(x, y)
  • Example: pyautogui.click(565, 486)

Scale to your screen resolution:

x_actual = int(x_norm * (screen_width / 1400))
y_actual = int(y_norm * (screen_height / 800))

Citation

@misc{qwen3-vl-websight,
  title={Qwen3-VL Fine-tuned for GUI Automation},
  author={Your Name},
  year={2025},
  publisher={HuggingFace},
  howpublished={\url{https://huggingface.co/Asanshay/qwen3-vl-8b-websight-merged}}
}

License

Apache 2.0 (inherited from base model)

Downloads last month
23
Safetensors
Model size
9B params
Tensor type
BF16
ยท
Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for Asanshay/websight-v2-grounded

Finetuned
(81)
this model