Qwen3-VL-4B-FLARE25
Model Description
This is a fully finetuned version of Qwen/Qwen3-VL-4B-Instruct on the FLARE 2025 medical imaging dataset. The model has been trained to perform various medical vision-language tasks across 8 imaging modalities and 19 different datasets.
Base Model: Qwen3-VL-4B-Instruct Training Dataset: FLARE25 (Medical Imaging Foundation Models: A Multi-task Learning Framework) Training Samples: 12232 samples across 26 datasets Supported Tasks:
- Classification
- Multi-label Classification
- Detection
- Instance Detection
- Regression
- Counting
- Report Generation
Supported Medical Imaging Modalities
- Ultrasound - Breast ultrasound, intrauterine growth charts
- X-ray - Dental, chest, periapical radiographs
- Retinography - Fundus imaging, diabetic retinopathy
- Microscopy - Chromosome analysis, bone marrow, cell counting
- Clinical Photography - Neonatal jaundice assessment
- Dermatology - Skin lesion classification
- Endoscopy - Gastrointestinal imaging
- Mammography - Breast cancer screening
Performance Summary
| Task | Primary Metric | Baseline | Finetuned | Improvement |
|---|---|---|---|---|
| Classification | Balanced Accuracy | 2.2% | 53.5% | +2,309% |
| Detection | F1@0.5 | 0.0% | 80.3% | ∞ (new capability) |
| Instance Detection | F1@0.5 | 0.01% | 1.0% | +9,900% |
| Multi-label Classification | F1 Macro | 28.3% | 50.3% | +77.7% |
| Regression | MAE | 35.8 | 22.4 | +37.3% |
| Counting | MAE | 417.7 | 244.4 | +41.5% |
| Report Generation | GREEN Score | 67.7% | 80.8% | +19.4% |
Usage
Please check our GitHub for details
Training Details
Training Hyperparameters
- Base Model: Qwen3-VL-4B-Instruct
- Training Framework: DeepSpeed ZeRO-3
- Learning Rate: 1e-5
- Batch Size: 4 per device
- Gradient Accumulation Steps: 4
- Training Epochs: 2
- Max Sequence Length: 8192
- Image Resolution: Dynamic (max_pixels: 50176, min_pixels: 784)
- Optimizer: AdamW
- Mixed Precision: BF16
- Gradient Checkpointing: Enabled
Training Data Distribution
The model was trained on 19 medical imaging datasets across 8 modalities:
Ultrasound:
- BUSI, BUS-UCLM (Classification)
- BUSI-det, BUS-UCLM-det (Detection)
- IUGC (Classification + Detection)
X-ray:
- Dental, Periapical, Bone Resorption, ChestDR, IU-XRay
Retinography:
- Retino, Fundus
Microscopy:
- Chromosome, Bone Marrow, NeurIPS22-Cell
Clinical/Dermatology/Endoscopy/Mammography:
- Neojaundice, BCN20000, Endo, CMMD
Citation
If you use this model, please cite:
@misc{qwen3vl-flare25,
author = {Shuolin Yin},
title = {Qwen3-VL-4B Finetuned on FLARE25 Medical Imaging Dataset},
year = {2025},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/leoyinn/qwen3vl-flare25}}
}
@misc{qwen3technicalreport,
title={Qwen3 Technical Report},
author={Qwen Team},
year={2025},
eprint={2505.09388},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2505.09388},
}
License
This model is released under the Apache 2.0 License. The base model license from Qwen also applies.
Acknowledgments
- Base Model: Qwen3-VL-4B-Instruct by Alibaba Cloud
- Dataset: FLARE 2025 Medical Imaging Challenge
- Training Infrastructure: Built on the official Qwen3-VL finetuning framework
Repository
Full training code, evaluation scripts, and results: GitHub - FLARE25-QWen3VL-4B
- Downloads last month
- 32
Model tree for leoyinn/qwen3vl-flare25
Base model
Qwen/Qwen3-VL-4B-Instruct