---
license: apple-amlr
license_name: apple-amlr
license_link: https://github.com/apple/ml-mobileclip/blob/main/LICENSE_weights_data
library_name: mobileclip
---
# 📸 MobileCLIP-B Zero-Shot Image Classifier
### Hugging Face Inference Endpoint
> **Production-ready wrapper** around Apple's MobileCLIP-B checkpoint.
> Handles image → text similarity in a single fast call.
---
## 📚 Sidebar
- [Features](#-features)
- [Repository layout](#-repository-layout)
- [Quick start (local smoke-test)](#-quick-start-local-smoke-test)
- [Calling the deployed endpoint](#-calling-the-deployed-endpoint)
- [How it works](#-how-it-works)
- [Updating the label set](#-updating-the-label-set)
- [License](#-license)
---
## ✨ Features
| | This repo |
|------------------------------|-----------|
| **Model** | MobileCLIP-B (`datacompdr` checkpoint) |
| **Branch fusion** | `reparameterize_model` baked in |
| **Mixed-precision** | FP16 on GPU, FP32 on CPU |
| **Pre-computed text feats** | One-time encoding of prompts in `items.json` |
| **Per-request work** | _Only_ image decoding → encode_image → softmax |
| **Latency (A10G)** | < 30 ms once the image arrives |
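The "mixed-precision" row boils down to a device-dependent dtype choice at startup; a minimal sketch of that selection logic (variable names illustrative, not taken from `handler.py`):

```python
import torch

# Pick device and precision as described above: FP16 only when a GPU is present,
# FP32 otherwise (half precision on CPU is slow and often unsupported).
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32
```

The model and the cached text-feature matrix would both be moved to this `device`/`dtype` once, so each request pays no conversion cost.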
---
## 📁 Repository layout
| Path | Purpose |
|--------------------|------------------------------------------------------------------|
| `handler.py` | HF entry-point (loads model + text cache, serves requests) |
| `reparam.py`        | 60-line stand-alone copy of Apple's `reparameterize_model`       |
| `requirements.txt` | Minimal dep set (`torch`, `torchvision`, `open-clip-torch`) |
| `items.json` | Your label set (`id`, `name`, `prompt` per line) |
| `README.md` | This document |
---
## 🚀 Quick start (local smoke-test)
```bash
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
python - <<'PY'
import base64, handler, pathlib
app = handler.EndpointHandler()
img_b64 = base64.b64encode(pathlib.Path("tests/cat.jpg").read_bytes()).decode()
print(app({"inputs": {"image": img_b64}})[:5]) # top-5 classes
PY
```
---
## 📡 Calling the deployed endpoint
```bash
export ENDPOINT="https://<your-endpoint>.aws.endpoints.huggingface.cloud"
export TOKEN="hf_xxxxxxxxxxxxxxxxx"
export IMG="cat.jpg"

python - <<'PY'
import base64, json, os, requests

url = os.environ["ENDPOINT"]
token = os.environ["TOKEN"]
with open(os.environ["IMG"], "rb") as f:
    payload = {"inputs": {"image": base64.b64encode(f.read()).decode()}}

resp = requests.post(
    url,
    headers={
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "Accept": "application/json",
    },
    json=payload,
    timeout=60,
)
resp.raise_for_status()
print(json.dumps(resp.json()[:5], indent=2))
PY
```
*Response example*
```json
[
{ "id": 23, "label": "cat", "score": 0.92 },
{ "id": 11, "label": "tiger cat", "score": 0.05 },
{ "id": 48, "label": "siamese cat", "score": 0.02 }
]
```
---
## ⚙️ How it works
1. **Startup (runs once per replica)**
* Downloads / loads MobileCLIP-B (`datacompdr`).
* Fuses MobileOne branches via `reparam.py`.
   * Reads `items.json` and encodes every prompt → `[N,512]` tensor.
2. **Per request**
* Decodes base-64 JPEG/PNG.
   * Applies OpenCLIP preprocessing (224 × 224 center-crop + normalise).
* Encodes the image, normalises, computes cosine similarity vs. cached text matrix.
   * Returns sorted `[{id, label, score}, …]`.
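The per-request scoring in the last two steps is just cosine similarity against the cached text matrix followed by a softmax; a self-contained sketch with toy embeddings (function name, dimensions, and the `100.0` logit scale are illustrative, not taken from `handler.py`):

```python
import torch
import torch.nn.functional as F

def rank_labels(image_feat, text_feats, labels, ids):
    """Score one image embedding against pre-computed, L2-normalised text features.

    image_feat: [D] raw image embedding; text_feats: [N, D] unit-norm rows.
    Returns entries sorted by score, mirroring the endpoint's response shape.
    """
    img = F.normalize(image_feat, dim=-1)       # unit-length image vector
    logits = 100.0 * img @ text_feats.T         # [N] scaled cosine similarities
    scores = logits.softmax(dim=-1)             # probabilities over the label set
    order = scores.argsort(descending=True)
    return [{"id": ids[i], "label": labels[i], "score": scores[i].item()}
            for i in order]

# Toy example: 4-dim embeddings, two labels.
text = F.normalize(torch.tensor([[1.0, 0.0, 0.0, 0.0],
                                 [0.0, 1.0, 0.0, 0.0]]), dim=-1)
out = rank_labels(torch.tensor([0.9, 0.1, 0.0, 0.0]), text, ["cat", "dog"], [0, 1])
```

Because the `[N,512]` text matrix is computed once at startup, this matrix-vector product is the only model-adjacent math done per request besides `encode_image` itself.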
---
## 🔄 Updating the label set
Simply edit `items.json`, push, and redeploy.
```json
[
{ "id": 0, "name": "cat", "prompt": "a photo of a cat" },
{ "id": 1, "name": "dog", "prompt": "a photo of a dog" }
]
```
No code changes are required; the handler re-encodes prompts at start-up.
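Since a malformed `items.json` only surfaces at replica startup, it can be worth sanity-checking the file before pushing; a minimal stdlib-only sketch (the validation rules below are assumptions drawn from the schema shown above, not part of the handler):

```python
import json

def validate_items(items):
    """Check the items.json schema: every entry has id/name/prompt,
    ids are unique, and name/prompt are non-empty. Raises ValueError."""
    seen = set()
    for entry in items:
        if not {"id", "name", "prompt"} <= entry.keys():
            raise ValueError(f"missing keys in {entry!r}")
        if entry["id"] in seen:
            raise ValueError(f"duplicate id {entry['id']}")
        seen.add(entry["id"])
        if not entry["name"] or not entry["prompt"]:
            raise ValueError(f"empty name/prompt for id {entry['id']}")
    return len(items)

# Example run against the two-entry label set shown above.
n = validate_items(json.loads(
    '[{"id": 0, "name": "cat", "prompt": "a photo of a cat"},'
    ' {"id": 1, "name": "dog", "prompt": "a photo of a dog"}]'
))
```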
---
## ⚖️ License
* **Weights / data** – Apple AMLR (see [`LICENSE_weights_data`](./LICENSE_weights_data))
* **This wrapper code** – MIT
---
<div align="center"><sub>Maintained with ❤️ by Your-Team · Aug 2025</sub></div>