EarthMind-R1

EarthMind-R1 is a vision-language model fine-tuned using GRPO (Group Relative Policy Optimization) for geospatial and remote sensing image understanding tasks.

Model Description

Base Model: EarthMind-4B
Training Method: GRPO (Group Relative Policy Optimization)
Training Data: Geospatial instruction dataset
Fine-tuning: LoRA adapters merged into base weights
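Merging LoRA adapters folds each low-rank update into the weight matrix it adapts, so inference needs no separate adapter. Assuming the standard LoRA formulation (the card does not spell out the merge), and using the r = 32 and alpha = 64 values listed under Training Details, each merged weight is:

```latex
W_{\text{merged}} = W_0 + \frac{\alpha}{r} B A, \qquad \frac{\alpha}{r} = \frac{64}{32} = 2
```

Here W_0 is the frozen base weight (d × k), B is d × r, and A is r × k.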

Usage

Quick Start

import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model_id = "aadex/Earthmind-R1"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Load an image
image = Image.open("your_image.jpg").convert("RGB")

# Ask a question
question = "Describe what you see in this satellite image."

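# Use the model's chat interface. It is provided by the remote code loaded via
# trust_remote_code=True, so check that code if this call signature fails.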
response = model.chat(
    tokenizer=tokenizer,
    question=question,
    images=[image],
    generation_config={
        "max_new_tokens": 512,
        "temperature": 0.7,
        "do_sample": True,
    },
)

print(response)

Expected Output Format

The model is trained to produce structured responses that separate its reasoning from its final answer:

<think>
[Reasoning about the image content]
</think>
<answer>
[Final answer to the question]
</answer>
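To use only the final answer downstream, the tagged sections can be pulled out of the response string. The helper below is a minimal sketch: `parse_response` is a hypothetical name, and it assumes each tag appears at most once.

```python
import re

# Hypothetical helper: extract the <think> and <answer> sections of a response.
# Assumes each tag appears at most once; a missing section is returned as None.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
_ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def parse_response(text: str) -> dict:
    """Split a structured response into its reasoning and final answer."""
    think = _THINK_RE.search(text)
    answer = _ANSWER_RE.search(text)
    return {
        "think": think.group(1).strip() if think else None,
        "answer": answer.group(1).strip() if answer else None,
    }


if __name__ == "__main__":
    sample = "<think>\nThe image shows a river.\n</think>\n<answer>\nA river\n</answer>"
    print(parse_response(sample))
```

If the model omits the tags (for example, when generation is cut off by `max_new_tokens`), you may want to fall back to the raw response text.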

Requirements

torch>=2.0
transformers>=4.40
accelerate
pillow

Hardware Requirements

Minimum: 16GB VRAM (with bfloat16)
Recommended: 24GB VRAM for comfortable inference
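The 16 GB minimum follows from a back-of-the-envelope estimate of the weight memory. The sketch below assumes the ~4B parameter count reported for this checkpoint and counts only the weights; activations and the KV cache add to this figure.

```python
# Rough lower bound on memory for the weights alone, assuming ~4e9 parameters.
# bfloat16 uses 2 bytes per parameter; float32 uses 4.
def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Return the memory needed to hold the weights, in GiB."""
    return num_params * bytes_per_param / 1024**3


if __name__ == "__main__":
    bf16 = weight_memory_gib(4e9, 2)
    fp32 = weight_memory_gib(4e9, 4)
    print(f"bf16: {bf16:.1f} GiB, fp32: {fp32:.1f} GiB")  # bf16: 7.5 GiB, fp32: 14.9 GiB
```

Under these assumptions, bfloat16 weights leave roughly half of a 16 GB card free for inference, whereas float32 weights alone (about 14.9 GiB, which is 16 GB) would leave essentially no headroom.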

Training Details

Framework: VLM-R1 + TRL
Optimizer: AdamW
Learning Rate: 1e-6
LoRA Configuration:
- r: 32
- alpha: 64
- dropout: 0.05
GRPO Settings:
- num_generations: 4
- num_iterations: 2
- beta: 0.01
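GRPO replaces a learned value function with a per-prompt baseline. For each prompt, `num_generations` completions are sampled and scored, and each completion's advantage is its reward standardized within that group. The sketch below illustrates only that step. It is not the VLM-R1 or TRL code: the function name and the reward values are hypothetical, and it uses the population standard deviation, a detail that varies between implementations.

```python
import statistics

NUM_GENERATIONS = 4  # group size, matching num_generations above


def group_advantages(rewards: list, eps: float = 1e-4) -> list:
    """Standardize rewards within one group of completions for the same prompt.

    Hypothetical helper: advantage_i = (r_i - mean(r)) / (std(r) + eps).
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical rewards for NUM_GENERATIONS completions (1.0 = correct answer).
    rewards = [1.0, 0.0, 1.0, 0.0]
    print([round(a, 3) for a in group_advantages(rewards)])  # [1.0, -1.0, 1.0, -1.0]
```

Completions that score above the group mean are reinforced, and those below it are penalized. In the full objective, `beta` (0.01 here) weights a KL penalty toward the reference model, and `num_iterations` sets how many optimization passes are made over each generated batch. Neither is shown above.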

Limitations

Optimized for geospatial/remote sensing imagery
May not perform as well on general domain images
Response quality depends on image resolution and clarity

Citation

If you use this model, please cite:

@misc{earthmind-r1,
  title={EarthMind-R1: GRPO Fine-tuned Vision-Language Model for Geospatial Understanding},
  author={Your Name},
  year={2024},
  publisher={HuggingFace}
}

License

Apache 2.0


Model Weights

Format: Safetensors
Model size: 4B params
Tensor type: BF16