---
license: apple-amlr
license_name: apple-ascl
license_link: https://github.com/apple/ml-mobileclip/blob/main/LICENSE_weights_data
library_name: mobileclip
---

# 📸 MobileCLIP-B Zero-Shot Image Classifier

### Hugging Face Inference Endpoint

> **Production-ready wrapper** around Apple’s MobileCLIP-B checkpoint.
> Handles image → text similarity in a single fast call.

---

## 📑 Sidebar

- [Features](#-features)
- [Repository layout](#-repository-layout)
- [Quick start (local smoke-test)](#-quick-start-local-smoke-test)
- [Calling the deployed endpoint](#-calling-the-deployed-endpoint)
- [How it works](#-how-it-works)
- [Updating the label set](#-updating-the-label-set)
- [License](#-license)

---

## ✨ Features

| | This repo |
|-----------------------------|-----------|
| **Model** | MobileCLIP-B (`datacompdr` checkpoint) |
| **Branch fusion** | `reparameterize_model` baked in |
| **Mixed precision** | FP16 on GPU, FP32 on CPU |
| **Pre-computed text feats** | One-time encoding of prompts in `items.json` |
| **Per-request work** | _Only_ image decoding → `encode_image` → softmax |
| **Latency (A10G)** | < 30 ms once the image arrives |
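
The mixed-precision row boils down to a single device check at start-up. A minimal sketch (`pick_runtime` is a hypothetical helper, not a function in `handler.py`):

```python
import torch

def pick_runtime():
    # FP16 on CUDA GPUs, FP32 on CPU, as listed in the table above
    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32
    return device, dtype
```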

---

## 📁 Repository layout

| Path | Purpose |
|--------------------|------------------------------------------------------------------|
| `handler.py` | HF entry-point (loads model + text cache, serves requests) |
| `reparam.py` | 60-line stand-alone copy of Apple’s `reparameterize_model` |
| `requirements.txt` | Minimal dependency set (`torch`, `torchvision`, `open-clip-torch`) |
| `items.json` | Your label set (`id`, `name`, `prompt` per entry) |
| `README.md` | This document |

---

## 🚀 Quick start (local smoke-test)

```bash
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt

python - <<'PY'
import base64, pathlib

import handler

app = handler.EndpointHandler()

img_b64 = base64.b64encode(pathlib.Path("tests/cat.jpg").read_bytes()).decode()
print(app({"inputs": {"image": img_b64}})[:5])  # top-5 classes
PY
```

---

## 📡 Calling the deployed endpoint

```bash
export ENDPOINT="https://<your-endpoint>.aws.endpoints.huggingface.cloud"
export TOKEN="hf_xxxxxxxxxxxxxxxxx"
IMG="cat.jpg"

python - "$IMG" <<'PY'
import base64, json, os, sys

import requests

url = os.environ["ENDPOINT"]
token = os.environ["TOKEN"]
img = sys.argv[1]

payload = {
    "inputs": {
        "image": base64.b64encode(open(img, "rb").read()).decode()
    }
}
resp = requests.post(
    url,
    headers={
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "Accept": "application/json",
    },
    json=payload,
    timeout=60,
)
print(json.dumps(resp.json()[:5], indent=2))
PY
```

*Response example*

```json
[
  { "id": 23, "label": "cat", "score": 0.92 },
  { "id": 11, "label": "tiger cat", "score": 0.05 },
  { "id": 48, "label": "siamese cat", "score": 0.02 }
]
```

---

## ⚙️ How it works

1. **Startup (runs once per replica)**

   * Downloads / loads MobileCLIP-B (`datacompdr`).
   * Fuses MobileOne branches via `reparam.py`.
   * Reads `items.json` and encodes every prompt → `[N, 512]` tensor.

2. **Per request**

   * Decodes the base-64 JPEG/PNG.
   * Applies OpenCLIP preprocessing (224 × 224 center-crop + normalisation).
   * Encodes the image, normalises it, and computes cosine similarity against the cached text matrix.
   * Returns sorted `[{id, label, score}, …]`.
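
In isolation, the per-request scoring step is a normalised dot product followed by a softmax. A minimal sketch with dummy tensors (`rank_labels` is a hypothetical helper; the real handler feeds it the output of `encode_image` and the cached prompt matrix):

```python
import torch

@torch.no_grad()
def rank_labels(image_feat, text_feats, labels):
    # image_feat: [D], text_feats: [N, D]; both assumed L2-normalised already
    probs = (100.0 * image_feat @ text_feats.T).softmax(dim=-1)  # CLIP-style temperature
    order = probs.argsort(descending=True)
    return [{"id": int(i), "label": labels[int(i)], "score": float(probs[i])}
            for i in order]
```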

---

## 🏷 Updating the label set

Edit `items.json`, push, and redeploy.

```json
[
  { "id": 0, "name": "cat", "prompt": "a photo of a cat" },
  { "id": 1, "name": "dog", "prompt": "a photo of a dog" }
]
```

No code changes are required; the handler re-encodes the prompts at start-up.
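
The start-up re-encoding is roughly the following (a sketch; `encode_prompts` is a hypothetical helper, and the model/tokenizer objects are assumed to come from `open_clip`):

```python
import json

import torch

def encode_prompts(model, tokenizer, items_path="items.json", device="cpu"):
    with open(items_path) as f:
        items = json.load(f)
    tokens = tokenizer([it["prompt"] for it in items]).to(device)
    with torch.no_grad():
        feats = model.encode_text(tokens)
    feats = feats / feats.norm(dim=-1, keepdim=True)  # L2-normalise once, reuse per request
    return feats, [it["name"] for it in items]
```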

---

## ⚖️ License

* **Weights / data** – Apple AMLR (see [`LICENSE_weights_data`](./LICENSE_weights_data))
* **This wrapper code** – MIT

---

<div align="center"><sub>Maintained with ❤️ by Your-Team · Aug 2025</sub></div>