AdaptLLM committed
Commit c905096 · verified
1 Parent(s): da73565

Update README.md

Files changed (1):
  1. README.md +10 -5
README.md CHANGED
@@ -8,10 +8,10 @@ This project adapts general Multimodal Large Language Models (MLLMs) to specific

### 1. Data Synthesis
- We create a **generate-then-filter pipeline** using open-source models to make diverse visual tasks from domain-specific image-caption pairs.
- - This data works better than data made by hand or closed-source models (e.g., GPT-4V).
+ - This data works better than data made by hand or closed-source models (e.g., GPT-4V/o).

### 2. Training Pipeline
- - Instead of the usual two-step training (image-caption pairs first, then visual tasks), we use a **single-step training** to handle more tasks for specific domains.
+ - Instead of the usual two-step training (image-caption pairs first, then visual tasks), we use a **single-stage training** to handle more tasks for specific domains.

### 3. Task Evaluation
- We test our method in important fields like **biomedicine, food, and remote sensing**.
@@ -24,15 +24,20 @@ This project adapts general Multimodal Large Language Models (MLLMs) to specific
| Model | Repo ID in HF 🤗 | Domain | Base Model | Training Data | Evaluation Benchmark |
|:----------------------------------------------------------------------------|:--------------------------------------------|:--------------|:-------------------------|:------------------------------------------------------------------------------------------------|-----------------------|
| [Visual Instruction Synthesizer](https://huggingface.co/AdaptLLM/visual-instruction-synthesizer) | AdaptLLM/visual-instruction-synthesizer | - | open-llava-next-llama3-8b | VisionFLAN and ALLaVA | - |
+ | [AdaMLLM-med-1B](https://huggingface.co/AdaptLLM/biomed-InternVL3-1B) | AdaptLLM/biomed-InternVL3-1B | Biomedicine | InternVL3-1B | [biomed-visual-instructions](https://huggingface.co/datasets/AdaptLLM/biomed-visual-instructions) | [biomed-VQA-benchmark](https://huggingface.co/datasets/AdaptLLM/biomed-VQA-benchmark) |
+ | [AdaMLLM-med-4B](https://huggingface.co/AdaptLLM/biomed-gemma-3-4b-it) | AdaptLLM/biomed-gemma-3-4b-it | Biomedicine | gemma-3-4b-it | [biomed-visual-instructions](https://huggingface.co/datasets/AdaptLLM/biomed-visual-instructions) | [biomed-VQA-benchmark](https://huggingface.co/datasets/AdaptLLM/biomed-VQA-benchmark) |
+ | [AdaMLLM-med-3B](https://huggingface.co/AdaptLLM/biomed-Qwen2.5-VL-3B-Instruct) | AdaptLLM/biomed-Qwen2.5-VL-3B-Instruct | Biomedicine | Qwen2.5-VL-3B-Instruct | [biomed-visual-instructions](https://huggingface.co/datasets/AdaptLLM/biomed-visual-instructions) | [biomed-VQA-benchmark](https://huggingface.co/datasets/AdaptLLM/biomed-VQA-benchmark) |
+ | [AdaMLLM-food-3B](https://huggingface.co/AdaptLLM/food-Qwen2.5-VL-3B-Instruct) | AdaptLLM/food-Qwen2.5-VL-3B-Instruct | Food | Qwen2.5-VL-3B-Instruct | [food-visual-instructions](https://huggingface.co/datasets/AdaptLLM/food-visual-instructions) | [food-VQA-benchmark](https://huggingface.co/datasets/AdaptLLM/food-VQA-benchmark) |
+ | [AdaMLLM-remote-sensing-3B](https://huggingface.co/AdaptLLM/remote-sensing-Qwen2.5-VL-3B-Instruct) | AdaptLLM/remote-sensing-Qwen2.5-VL-3B-Instruct | Remote Sensing | Qwen2.5-VL-3B-Instruct | [remote-sensing-visual-instructions](https://huggingface.co/datasets/AdaptLLM/remote-sensing-visual-instructions) | [remote-sensing-VQA-benchmark](https://huggingface.co/datasets/AdaptLLM/remote-sensing-VQA-benchmark) |
| [AdaMLLM-med-2B](https://huggingface.co/AdaptLLM/biomed-Qwen2-VL-2B-Instruct) | AdaptLLM/biomed-Qwen2-VL-2B-Instruct | Biomedicine | Qwen2-VL-2B-Instruct | [biomed-visual-instructions](https://huggingface.co/datasets/AdaptLLM/biomed-visual-instructions) | [biomed-VQA-benchmark](https://huggingface.co/datasets/AdaptLLM/biomed-VQA-benchmark) |
| [AdaMLLM-food-2B](https://huggingface.co/AdaptLLM/food-Qwen2-VL-2B-Instruct) | AdaptLLM/food-Qwen2-VL-2B-Instruct | Food | Qwen2-VL-2B-Instruct | [food-visual-instructions](https://huggingface.co/datasets/AdaptLLM/food-visual-instructions) | [food-VQA-benchmark](https://huggingface.co/datasets/AdaptLLM/food-VQA-benchmark) |
- | [AdaMLLM-remote-sensing-2B](https://huggingface.co/AdaptLLM/food-Qwen2-VL-2B-Instruct) | AdaptLLM/remote-sensing-Qwen2-VL-2B-Instruct | Remote Sensing | Qwen2-VL-2B-Instruct | [remote-sensing-visual-instructions](https://huggingface.co/datasets/AdaptLLM/food-visual-instructions) | [remote-sensing-VQA-benchmark](https://huggingface.co/datasets/AdaptLLM/food-VQA-benchmark) |
+ | [AdaMLLM-remote-sensing-2B](https://huggingface.co/AdaptLLM/remote-sensing-Qwen2-VL-2B-Instruct) | AdaptLLM/remote-sensing-Qwen2-VL-2B-Instruct | Remote Sensing | Qwen2-VL-2B-Instruct | [remote-sensing-visual-instructions](https://huggingface.co/datasets/AdaptLLM/remote-sensing-visual-instructions) | [remote-sensing-VQA-benchmark](https://huggingface.co/datasets/AdaptLLM/remote-sensing-VQA-benchmark) |
| [AdaMLLM-med-8B](https://huggingface.co/AdaptLLM/biomed-LLaVA-NeXT-Llama3-8B) | AdaptLLM/biomed-LLaVA-NeXT-Llama3-8B | Biomedicine | open-llava-next-llama3-8b | [biomed-visual-instructions](https://huggingface.co/datasets/AdaptLLM/biomed-visual-instructions) | [biomed-VQA-benchmark](https://huggingface.co/datasets/AdaptLLM/biomed-VQA-benchmark) |
| [AdaMLLM-food-8B](https://huggingface.co/AdaptLLM/food-LLaVA-NeXT-Llama3-8B) |AdaptLLM/food-LLaVA-NeXT-Llama3-8B | Food | open-llava-next-llama3-8b | [food-visual-instructions](https://huggingface.co/datasets/AdaptLLM/food-visual-instructions) | [food-VQA-benchmark](https://huggingface.co/datasets/AdaptLLM/food-VQA-benchmark) |
- | [AdaMLLM-remote-sensing-8B](https://huggingface.co/AdaptLLM/food-LLaVA-NeXT-Llama3-8B) |AdaptLLM/remote-sensing-LLaVA-NeXT-Llama3-8B | Remote Sensing | open-llava-next-llama3-8b | [remote-sensing-visual-instructions](https://huggingface.co/datasets/AdaptLLM/food-visual-instructions) | [remote-sensing-VQA-benchmark](https://huggingface.co/datasets/AdaptLLM/food-VQA-benchmark) |
+ | [AdaMLLM-remote-sensing-8B](https://huggingface.co/AdaptLLM/remote-sensing-LLaVA-NeXT-Llama3-8B) |AdaptLLM/remote-sensing-LLaVA-NeXT-Llama3-8B | Remote Sensing | open-llava-next-llama3-8b | [remote-sensing-visual-instructions](https://huggingface.co/datasets/AdaptLLM/remote-sensing-visual-instructions) | [remote-sensing-VQA-benchmark](https://huggingface.co/datasets/AdaptLLM/remote-sensing-VQA-benchmark) |
| [AdaMLLM-med-11B](https://huggingface.co/AdaptLLM/biomed-Llama-3.2-11B-Vision-Instruct) | AdaptLLM/biomed-Llama-3.2-11B-Vision-Instruct | Biomedicine | Llama-3.2-11B-Vision-Instruct | [biomed-visual-instructions](https://huggingface.co/datasets/AdaptLLM/biomed-visual-instructions) | [biomed-VQA-benchmark](https://huggingface.co/datasets/AdaptLLM/biomed-VQA-benchmark) |
| [AdaMLLM-food-11B](https://huggingface.co/AdaptLLM/food-Llama-3.2-11B-Vision-Instruct) | AdaptLLM/food-Llama-3.2-11B-Vision-Instruct | Food | Llama-3.2-11B-Vision-Instruct | [food-visual-instructions](https://huggingface.co/datasets/AdaptLLM/food-visual-instructions) | [food-VQA-benchmark](https://huggingface.co/datasets/AdaptLLM/food-VQA-benchmark) |
- | [AdaMLLM-remote-sensing-11B](https://huggingface.co/AdaptLLM/food-Llama-3.2-11B-Vision-Instruct) | AdaptLLM/remote-sensing-Llama-3.2-11B-Vision-Instruct | Remote Sensing | Llama-3.2-11B-Vision-Instruct | [remote-sensing-visual-instructions](https://huggingface.co/datasets/AdaptLLM/food-visual-instructions) | [remote-sensing-VQA-benchmark](https://huggingface.co/datasets/AdaptLLM/food-VQA-benchmark) |
+ | [AdaMLLM-remote-sensing-11B](https://huggingface.co/AdaptLLM/remote-sensing-Llama-3.2-11B-Vision-Instruct) | AdaptLLM/remote-sensing-Llama-3.2-11B-Vision-Instruct | Remote Sensing | Llama-3.2-11B-Vision-Instruct | [remote-sensing-visual-instructions](https://huggingface.co/datasets/AdaptLLM/remote-sensing-visual-instructions) | [remote-sensing-VQA-benchmark](https://huggingface.co/datasets/AdaptLLM/remote-sensing-VQA-benchmark) |

**Code**: [https://github.com/bigai-ai/QA-Synthesizer](https://github.com/bigai-ai/QA-Synthesizer)
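
For context beyond the diff: the adapted checkpoints listed in the table are ordinary Hugging Face model repos, so they can be queried with the standard Transformers vision-language APIs. The snippet below is an illustrative sketch, not part of this commit; it assumes a recent `transformers` release with Qwen2-VL support, picks one Qwen2-VL-based repo ID from the table, and uses a placeholder image path and question.

```python
# Minimal sketch (not from this commit): querying one of the adapted models above.
# Assumptions: transformers >= 4.45 (Qwen2-VL support), torch, and Pillow installed.
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

repo_id = "AdaptLLM/biomed-Qwen2-VL-2B-Instruct"  # any Qwen2-VL-based row in the table

model = Qwen2VLForConditionalGeneration.from_pretrained(
    repo_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(repo_id)

# One image plus one question, formatted with the model's chat template.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe the key finding in this image."},
        ],
    }
]
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image = Image.open("example.jpg")  # placeholder image path

inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)
generated = model.generate(**inputs, max_new_tokens=128)
answer = processor.batch_decode(
    generated[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(answer)
```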