Qwen3-VL-4B-FLARE25

Model Description

This is a fully finetuned version of Qwen/Qwen3-VL-4B-Instruct on the FLARE 2025 medical imaging dataset. The model has been trained to perform various medical vision-language tasks across 8 imaging modalities and 19 different datasets.

Base Model: Qwen3-VL-4B-Instruct Training Dataset: FLARE25 (Medical Imaging Foundation Models: A Multi-task Learning Framework) Training Samples: 12232 samples across 26 datasets Supported Tasks:

  • Classification
  • Multi-label Classification
  • Detection
  • Instance Detection
  • Regression
  • Counting
  • Report Generation

Supported Medical Imaging Modalities

  1. Ultrasound - Breast ultrasound, intrauterine growth charts
  2. X-ray - Dental, chest, periapical radiographs
  3. Retinography - Fundus imaging, diabetic retinopathy
  4. Microscopy - Chromosome analysis, bone marrow, cell counting
  5. Clinical Photography - Neonatal jaundice assessment
  6. Dermatology - Skin lesion classification
  7. Endoscopy - Gastrointestinal imaging
  8. Mammography - Breast cancer screening

Performance Summary

Task Primary Metric Baseline Finetuned Improvement
Classification Balanced Accuracy 2.2% 53.5% +2,309%
Detection F1@0.5 0.0% 80.3% ∞ (new capability)
Instance Detection F1@0.5 0.01% 1.0% +9,900%
Multi-label Classification F1 Macro 28.3% 50.3% +77.7%
Regression MAE 35.8 22.4 +37.3%
Counting MAE 417.7 244.4 +41.5%
Report Generation GREEN Score 67.7% 80.8% +19.4%

Usage

Please check our GitHub for details

Training Details

Training Hyperparameters

  • Base Model: Qwen3-VL-4B-Instruct
  • Training Framework: DeepSpeed ZeRO-3
  • Learning Rate: 1e-5
  • Batch Size: 4 per device
  • Gradient Accumulation Steps: 4
  • Training Epochs: 2
  • Max Sequence Length: 8192
  • Image Resolution: Dynamic (max_pixels: 50176, min_pixels: 784)
  • Optimizer: AdamW
  • Mixed Precision: BF16
  • Gradient Checkpointing: Enabled

Training Data Distribution

The model was trained on 19 medical imaging datasets across 8 modalities:

Ultrasound:

  • BUSI, BUS-UCLM (Classification)
  • BUSI-det, BUS-UCLM-det (Detection)
  • IUGC (Classification + Detection)

X-ray:

  • Dental, Periapical, Bone Resorption, ChestDR, IU-XRay

Retinography:

  • Retino, Fundus

Microscopy:

  • Chromosome, Bone Marrow, NeurIPS22-Cell

Clinical/Dermatology/Endoscopy/Mammography:

  • Neojaundice, BCN20000, Endo, CMMD

Citation

If you use this model, please cite:

@misc{qwen3vl-flare25,
  author = {Shuolin Yin},
  title = {Qwen3-VL-4B Finetuned on FLARE25 Medical Imaging Dataset},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/leoyinn/qwen3vl-flare25}}
}

@misc{qwen3technicalreport,
      title={Qwen3 Technical Report}, 
      author={Qwen Team},
      year={2025},
      eprint={2505.09388},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.09388}, 
}

License

This model is released under the Apache 2.0 License. The base model license from Qwen also applies.

Acknowledgments

  • Base Model: Qwen3-VL-4B-Instruct by Alibaba Cloud
  • Dataset: FLARE 2025 Medical Imaging Challenge
  • Training Infrastructure: Built on the official Qwen3-VL finetuning framework

Repository

Full training code, evaluation scripts, and results: GitHub - FLARE25-QWen3VL-4B

Downloads last month
32
Safetensors
Model size
570k params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for leoyinn/qwen3vl-flare25

Finetuned
(117)
this model

Dataset used to train leoyinn/qwen3vl-flare25