---
base_model:
- llava-hf/llama3-llava-next-8b-hf
- openbmb/MiniCPM-V-2_6
- microsoft/Phi-3-vision-128k-instruct
- Qwen/Qwen2.5-VL-7B-Instruct
license: mit
metrics:
- accuracy
pipeline_tag: image-text-to-text
library_name: transformers
---
**The following models are obtained via supervised fine-tuning (SFT) using the ECD-10k-Images dataset ([URL](https://huggingface.co/datasets/ChartFoundation/ECD-10k-Images)) proposed in our ICCV 2025 paper, "[Effective Training Data Synthesis for Improving MLLM Chart Understanding](https://huggingface.co/papers/2508.06492)" ([Code](https://github.com/yuweiyang-anu/ECD)).**

**ECD Dataset Overview**:

![image/png](.png)
**Comparison of the four MLLMs on six test sets (CharXiv, ChartQA, ReachQA, ChartBench, ChartX, ECDBench)**:

![image/png](.png)
**Citation**:

If this work is helpful to your research, please cite our paper as follows:
```bibtex
@inproceedings{yang2025effective,
  title={Effective Training Data Synthesis for Improving MLLM Chart Understanding},
  author={Yang, Yuwei and Zhang, Zeyu and Hou, Yunzhong and Li, Zhuowan and Liu, Gaowen and Payani, Ali and Ting, Yuan-Sen and Zheng, Liang},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2025}
}
```