---
license: apple-amlr
license_name: apple-amlr
license_link: https://github.com/apple/ml-mobileclip/blob/main/LICENSE_weights_data
library_name: mobileclip
---
# 📸 MobileCLIP-B Zero-Shot Image Classifier
### Hugging Face Inference Endpoint
> **Production-ready wrapper** around Apple's MobileCLIP-B checkpoint.
> Handles image → text similarity in a single fast call.
---
## 📑 Contents
- [Features](#-features)
- [Repository layout](#-repository-layout)
- [Quick start (local smoke-test)](#-quick-start-local-smoke-test)
- [Calling the deployed endpoint](#-calling-the-deployed-endpoint)
- [How it works](#-how-it-works)
- [Updating the label set](#-updating-the-label-set)
- [License](#-license)
---
## ✨ Features
| Feature                       | This repo |
|-------------------------------|-----------|
| **Model** | MobileCLIP-B (`datacompdr` checkpoint) |
| **Branch fusion** | `reparameterize_model` baked in |
| **Mixed-precision** | FP16 on GPU, FP32 on CPU |
| **Pre-computed text feats** | One-time encoding of prompts in `items.json` |
| **Per-request work**          | _Only_ image decoding → encode_image → softmax |
| **Latency (A10G)** | < 30 ms once the image arrives |
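
The precision row above boils down to a device check at start-up; a minimal sketch of that choice (the actual selection lives in `handler.py`):

```python
import torch

# FP16 on GPU, FP32 on CPU, matching the feature table above.
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32
```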
---
## 📁 Repository layout
| Path | Purpose |
|--------------------|------------------------------------------------------------------|
| `handler.py` | HF entry-point (loads model + text cache, serves requests) |
| `reparam.py`       | 60-line stand-alone copy of Apple's `reparameterize_model`        |
| `requirements.txt` | Minimal dep set (`torch`, `torchvision`, `open-clip-torch`) |
| `items.json`       | Your label set (one `id`, `name`, `prompt` per entry)             |
| `README.md` | This document |
---
## 🚀 Quick start (local smoke-test)
```bash
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
python - <<'PY'
import base64, pathlib, handler
app = handler.EndpointHandler()
img_b64 = base64.b64encode(pathlib.Path("tests/cat.jpg").read_bytes()).decode()
print(app({"inputs": {"image": img_b64}})[:5]) # top-5 classes
PY
```
---
## 🌐 Calling the deployed endpoint
```bash
export ENDPOINT="https://<your-endpoint>.aws.endpoints.huggingface.cloud"
export TOKEN="hf_xxxxxxxxxxxxxxxxx"
IMG="cat.jpg"
python - "$IMG" <<'PY'
import base64, json, os, sys

import requests

url = os.environ["ENDPOINT"]   # exported above so the heredoc can read it
token = os.environ["TOKEN"]
img = sys.argv[1]              # image path passed after `python -`

payload = {
    "inputs": {
        "image": base64.b64encode(open(img, "rb").read()).decode()
    }
}
resp = requests.post(
    url,
    headers={
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "Accept": "application/json",
    },
    json=payload,
    timeout=60,
)
resp.raise_for_status()
print(json.dumps(resp.json()[:5], indent=2))  # top-5 classes
PY
```
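An equivalent call with `curl` (same payload and headers; note that `base64 -w0` is the GNU coreutils flag, while macOS uses `base64 -i` instead):

```bash
curl -s -X POST "$ENDPOINT" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d "{\"inputs\": {\"image\": \"$(base64 -w0 "$IMG")\"}}"
```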
*Response example*
```json
[
{ "id": 23, "label": "cat", "score": 0.92 },
{ "id": 11, "label": "tiger cat", "score": 0.05 },
{ "id": 48, "label": "siamese cat", "score": 0.02 }
]
```
---
## ⚙️ How it works
1. **Startup (runs once per replica)**
* Downloads / loads MobileCLIP-B (`datacompdr`).
* Fuses MobileOne branches via `reparam.py`.
   * Reads `items.json` and encodes every prompt → `[N,512]` tensor.
2. **Per request**
* Decodes base-64 JPEG/PNG.
   * Applies OpenCLIP preprocessing (224 × 224 center-crop + normalise).
* Encodes the image, normalises, computes cosine similarity vs. cached text matrix.
   * Returns sorted `[{id, label, score}, …]` (sketched below).
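
A minimal sketch of that per-request path (names such as `classify`, `preprocess`, and the cached `text_features` are illustrative assumptions, not the exact identifiers in `handler.py`):

```python
import base64, io

import torch
from PIL import Image

@torch.no_grad()
def classify(image_b64, model, preprocess, text_features, items, device="cpu"):
    # 1. Decode the base-64 JPEG/PNG into an RGB image.
    img = Image.open(io.BytesIO(base64.b64decode(image_b64))).convert("RGB")
    # 2. OpenCLIP preprocessing: resize, 224 x 224 center-crop, normalise.
    pixels = preprocess(img).unsqueeze(0).to(device)
    # 3. Encode, L2-normalise, cosine similarity vs. the cached [N,512] text matrix.
    feats = model.encode_image(pixels)
    feats = feats / feats.norm(dim=-1, keepdim=True)
    probs = (100.0 * feats @ text_features.T).softmax(dim=-1)[0]
    # 4. Sorted [{id, label, score}, ...]: the endpoint's response shape.
    order = probs.argsort(descending=True).tolist()
    return [
        {"id": items[i]["id"], "label": items[i]["name"], "score": float(probs[i])}
        for i in order
    ]
```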
---
## 🔄 Updating the label set
Simply edit `items.json`, push, and redeploy.
```json
[
{ "id": 0, "name": "cat", "prompt": "a photo of a cat" },
{ "id": 1, "name": "dog", "prompt": "a photo of a dog" }
]
```
No code changes are required; the handler re-encodes prompts at start-up.
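
For orientation, a sketch of that start-up re-encoding, assuming the MobileCLIP-B entry in the OpenCLIP model registry (`handler.py` is the authoritative version):

```python
import json

import open_clip
import torch

# Model + tokenizer names assume the OpenCLIP registry entry for MobileCLIP-B.
model, _, preprocess = open_clip.create_model_and_transforms(
    "MobileCLIP-B", pretrained="datacompdr"
)
tokenizer = open_clip.get_tokenizer("MobileCLIP-B")

# One-time encoding of every prompt in items.json into the cached [N,512] matrix.
items = json.load(open("items.json"))
with torch.no_grad():
    text_features = model.encode_text(tokenizer([it["prompt"] for it in items]))
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
```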
---
## ⚖️ License
* **Weights / data** – Apple AMLR (see [`LICENSE_weights_data`](./LICENSE_weights_data))
* **This wrapper code** – MIT
---
<div align="center"><sub>Maintained with ❤️ by Your-Team · Aug 2025</sub></div>