Committed by ZhenShiL and nielsr (HF Staff) · commit e887b57 (verified) · parent: 8b91c55

Improve model card: Add metadata, links, checkpoints, datasets, and usage example for FarSLIP (#1)



Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>

Files changed (1): README.md (+92, −3)
README.md CHANGED (@@ -1,3 +1,92 @@)

The previous version contained only the front matter `license: mit`; the full updated model card follows:
---
license: mit
pipeline_tag: zero-shot-image-classification
library_name: open_clip
datasets:
- ZhenShiL/MGRS-200k
- omlab/RS5M
tags:
- remote-sensing
---

<h1 align="center">FarSLIP: Discovering Effective CLIP Adaptation for Fine-Grained Remote Sensing Understanding</h1>

<p align="center">
  <a href="https://huggingface.co/datasets/ZhenShiL/MGRS-200k">
    <img alt="Hugging Face Dataset" src="https://img.shields.io/badge/🤗%20Hugging%20Face-Dataset-blue">
  </a>
  <a href="https://huggingface.co/ZhenShiL/FarSLIP">
    <img alt="Hugging Face Model" src="https://img.shields.io/badge/🤗%20Hugging%20Face-Model-yellow">
  </a>
  <a href="https://huggingface.co/papers/2511.14901">
    <img alt="Hugging Face Paper" src="https://img.shields.io/badge/%F0%9F%97%92%20Paper-2511.14901-b31b1b">
  </a>
</p>

**Paper**: [FarSLIP: Discovering Effective CLIP Adaptation for Fine-Grained Remote Sensing Understanding](https://huggingface.co/papers/2511.14901)
**Code**: [https://github.com/NJU-LHRS/FarSLIP](https://github.com/NJU-LHRS/FarSLIP)

## Introduction
We introduce FarSLIP, a vision-language foundation model for remote sensing (RS) that achieves fine-grained vision-language alignment. FarSLIP demonstrates state-of-the-art performance on both fine-grained and image-level tasks, including open-vocabulary semantic segmentation, zero-shot classification, and image-text retrieval.
We also construct MGRS-200k, the first multi-granularity image-text dataset for RS. Each image is annotated with both short and long global-level captions, along with multiple object-category pairs.

<figure>
  <div align="center">
    <img src="https://github.com/NJU-LHRS/FarSLIP/raw/main/assets/model.png" width="60%">
  </div>
</figure>

## Checkpoints
You can download all checkpoints from the [Hugging Face repository](https://huggingface.co/ZhenShiL/FarSLIP), or download individual checkpoints via the links below.

| Model name | Architecture | OVSS mIoU (%) | ZSC top-1 accuracy (%) | Download |
|-------------|--------------|---------------|-------------------------|----------------|
| FarSLIP-s1 | ViT-B-32 | 29.87 | 58.64 | [FarSLIP1_ViT-B-32](https://huggingface.co/ZhenShiL/FarSLIP/resolve/main/FarSLIP1_ViT-B-32.pt?download=true) |
| FarSLIP-s2 | ViT-B-32 | 30.49 | 60.12 | [FarSLIP2_ViT-B-32](https://huggingface.co/ZhenShiL/FarSLIP/resolve/main/FarSLIP2_ViT-B-32.pt?download=true) |
| FarSLIP-s1 | ViT-B-16 | 35.44 | 61.89 | [FarSLIP1_ViT-B-16](https://huggingface.co/ZhenShiL/FarSLIP/resolve/main/FarSLIP1_ViT-B-16.pt?download=true) |
| FarSLIP-s2 | ViT-B-16 | 35.41 | 62.24 | [FarSLIP2_ViT-B-16](https://huggingface.co/ZhenShiL/FarSLIP/resolve/main/FarSLIP2_ViT-B-16.pt?download=true) |

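For programmatic access, the sketch below fetches a checkpoint with `huggingface_hub` and loads it through `open_clip`. This is a hedged sketch, not the official loading path: it assumes the weights are compatible with open_clip's plain ViT-B-32 architecture with QuickGELU (mirroring the `--force-quick-gelu` flag in the test command further down), whereas the official repository also registers FarSLIP-specific model names such as `FarSLIP1`.

```python
# Hedged sketch: download a FarSLIP checkpoint and load it with open_clip.
# Assumption: the weights load into open_clip's ViT-B-32 (QuickGELU) model;
# the official repo registers custom model names (e.g. "FarSLIP1") that may
# be required for the full FarSLIP-specific behavior.
import open_clip
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="ZhenShiL/FarSLIP",
    filename="FarSLIP1_ViT-B-32.pt",  # any checkpoint from the table above
)
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32",
    pretrained=ckpt_path,
    force_quick_gelu=True,
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model.eval()
```
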
## Dataset
FarSLIP is trained in two stages; a loading sketch for MGRS-200k follows the example figure below.
+ In the first stage, we use the [RS5M](https://github.com/om-ai-lab/RS5M) dataset, which is also available on [Hugging Face](https://huggingface.co/datasets/omlab/RS5M).
+ In the second stage, we use the proposed MGRS-200k dataset, which is available on [Hugging Face](https://huggingface.co/datasets/ZhenShiL/MGRS-200k).

<p align="center">
  <img src="https://github.com/NJU-LHRS/FarSLIP/raw/main/assets/dataset.png" width="100%">
  <br>
  <em>Examples from MGRS-200k</em>
</p>

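To browse MGRS-200k programmatically, a minimal sketch using the 🤗 `datasets` library is shown below. It assumes the dataset repository is in a format that `load_dataset` can resolve automatically; if the repo ships raw archives instead, download the files from the dataset page. The printed column names, not any guessed here, are authoritative.

```python
# Hedged sketch: inspect MGRS-200k with the Hugging Face datasets library.
# Assumption: the repo is in a load_dataset-compatible format (e.g. parquet
# or imagefolder); otherwise, download the files from the dataset page.
from datasets import load_dataset

ds = load_dataset("ZhenShiL/MGRS-200k")
print(ds)                           # actual splits and column names
first_split = next(iter(ds.values()))
print(first_split[0])               # one example: image + caption/object annotations
```
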
## Usage / Testing

Below is a usage example for zero-shot scene classification, taken directly from the [official GitHub repository](https://github.com/NJU-LHRS/FarSLIP#zero-shot-scene-classification).

### Zero-shot scene classification
+ Please refer to [SkyScript](https://github.com/wangzhecheng/SkyScript?tab=readme-ov-file#download-benchmark-datasets) for scene classification dataset preparation, including 'SkyScript_cls', 'aid', 'eurosat', 'fmow', 'millionaid', 'patternnet', 'rsicb', and 'nwpu'.
+ Set `BENCHMARK_DATASET_ROOT_DIR` in `tests/test_scene_classification.py` to your own dataset path.

+ Run the test (e.g. FarSLIP-s1 with ViT-B-32); a standalone inference sketch follows the command:
```bash
python -m tests.test_scene_classification --model-arch ViT-B-32 --model-name FarSLIP1 --force-quick-gelu --pretrained checkpoints/FarSLIP1_ViT-B-32.pt
```

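For a quick single-image check outside the test script, the hedged sketch below runs zero-shot classification with `open_clip`. As in the Checkpoints section, it assumes the checkpoint loads into the plain ViT-B-32 (QuickGELU) architecture; the class names, prompt template, and image path are illustrative, and the official test script remains the reference implementation.

```python
# Hedged sketch: single-image zero-shot classification with open_clip.
# Assumptions: the checkpoint is compatible with ViT-B-32 + QuickGELU; the
# class names, prompt template, and image path below are illustrative only.
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32",
    pretrained="checkpoints/FarSLIP1_ViT-B-32.pt",
    force_quick_gelu=True,
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model.eval()

classes = ["airport", "forest", "harbor", "residential area"]  # example labels
text = tokenizer([f"a satellite image of a {c}" for c in classes])
image = preprocess(Image.open("example_scene.jpg")).unsqueeze(0)  # any RS image

with torch.no_grad():
    img_feat = model.encode_image(image)
    txt_feat = model.encode_text(text)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    probs = (100.0 * img_feat @ txt_feat.T).softmax(dim=-1)

print({c: round(p, 4) for c, p in zip(classes, probs[0].tolist())})
```
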
<figure>
  <div align="center">
    <img src="https://github.com/NJU-LHRS/FarSLIP/raw/main/assets/classification.png" width="100%">
  </div>
  <figcaption align="center">
    <em>Comparison of zero-shot classification accuracies (Top-1 acc., %) of different RS-specific CLIP variants across multiple benchmarks.</em>
  </figcaption>
</figure>

## Citation
If you find our work useful, please give us a ⭐ on GitHub and consider citing our paper:

```bibtex
@article{li2025farslip,
  title={FarSLIP: Discovering Effective CLIP Adaptation for Fine-Grained Remote Sensing Understanding},
  author={Zhenshi Li and Weikang Yu and Dilxat Muhtar and Xueliang Zhang and Pengfeng Xiao and Pedram Ghamisi and Xiao Xiang Zhu},
  journal={arXiv preprint arXiv:2511.14901},
  year={2025}
}
```