---
license: apple-amlr
license_name: apple-amlr
license_link: https://github.com/apple/ml-mobileclip/blob/main/LICENSE_weights_data
library_name: mobileclip
---
# 📸 MobileCLIP-B Zero-Shot Image Classifier
### Hugging Face Inference Endpoint
> **Production-ready wrapper** around Apple's MobileCLIP-B checkpoint.
> Handles image → text similarity in a single fast call.
---
## 📚 Sidebar
- [Features](#-features)
- [Repository layout](#-repository-layout)
- [Quick start (local smoke-test)](#-quick-start-local-smoke-test)
- [Calling the deployed endpoint](#-calling-the-deployed-endpoint)
- [How it works](#-how-it-works)
- [Updating the label set](#-updating-the-label-set)
- [License](#-license)
---
## ✨ Features
| | This repo |
|------------------------------|-----------|
| **Model** | MobileCLIP-B (`datacompdr` checkpoint) |
| **Branch fusion** | `reparameterize_model` baked in |
| **Mixed-precision** | FP16 on GPU, FP32 on CPU |
| **Pre-computed text feats** | One-time encoding of prompts in `items.json` |
| **Per-request work** | _Only_ image decoding → encode_image → softmax |
| **Latency (A10G)** | < 30 ms once the image arrives |
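The "mixed-precision" row boils down to a device-dependent dtype choice at startup; a minimal sketch of that selection logic (variable names illustrative, not taken from `handler.py`):

```python
import torch

# Pick device and precision as described above: FP16 only when a GPU is present,
# FP32 otherwise (half precision on CPU is slow and often unsupported).
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32
```

The model and the cached text-feature matrix would both be moved to this `device`/`dtype` once, so each request pays no conversion cost.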
---
## 📁 Repository layout
| Path | Purpose |
|--------------------|------------------------------------------------------------------|
| `handler.py` | HF entry-point (loads model + text cache, serves requests) |
| `reparam.py`        | 60-line stand-alone copy of Apple's `reparameterize_model`       |
| `requirements.txt` | Minimal dep set (`torch`, `torchvision`, `open-clip-torch`) |
| `items.json` | Your label set (`id`, `name`, `prompt` per line) |
| `README.md` | This document |
---
## 🚀 Quick start (local smoke-test)
```bash
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
python - <<'PY'
import base64, handler, pathlib
app = handler.EndpointHandler()
img_b64 = base64.b64encode(pathlib.Path("tests/cat.jpg").read_bytes()).decode()
print(app({"inputs": {"image": img_b64}})[:5]) # top-5 classes
PY
```
---
## 📡 Calling the deployed endpoint
```bash
export ENDPOINT="https://<your-endpoint>.aws.endpoints.huggingface.cloud"
export TOKEN="hf_xxxxxxxxxxxxxxxxx"
export IMG="cat.jpg"

python - <<'PY'
import base64, json, os, requests

url = os.environ["ENDPOINT"]
token = os.environ["TOKEN"]
with open(os.environ["IMG"], "rb") as f:
    payload = {"inputs": {"image": base64.b64encode(f.read()).decode()}}

resp = requests.post(
    url,
    headers={
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "Accept": "application/json",
    },
    json=payload,
    timeout=60,
)
resp.raise_for_status()
print(json.dumps(resp.json()[:5], indent=2))
PY
```
*Response example*
```json
[
{ "id": 23, "label": "cat", "score": 0.92 },
{ "id": 11, "label": "tiger cat", "score": 0.05 },
{ "id": 48, "label": "siamese cat", "score": 0.02 }
]
```
---
## ⚙️ How it works
1. **Startup (runs once per replica)**
* Downloads / loads MobileCLIP-B (`datacompdr`).
* Fuses MobileOne branches via `reparam.py`.
   * Reads `items.json` and encodes every prompt → `[N,512]` tensor.
2. **Per request**
* Decodes base-64 JPEG/PNG.
   * Applies OpenCLIP preprocessing (224 × 224 center-crop + normalise).
* Encodes the image, normalises, computes cosine similarity vs. cached text matrix.
   * Returns sorted `[{id, label, score}, …]`.
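The per-request scoring in the last two steps is just cosine similarity against the cached text matrix followed by a softmax; a self-contained sketch with toy embeddings (function name, dimensions, and the `100.0` logit scale are illustrative, not taken from `handler.py`):

```python
import torch
import torch.nn.functional as F

def rank_labels(image_feat, text_feats, labels, ids):
    """Score one image embedding against pre-computed, L2-normalised text features.

    image_feat: [D] raw image embedding; text_feats: [N, D] unit-norm rows.
    Returns entries sorted by score, mirroring the endpoint's response shape.
    """
    img = F.normalize(image_feat, dim=-1)       # unit-length image vector
    logits = 100.0 * img @ text_feats.T         # [N] scaled cosine similarities
    scores = logits.softmax(dim=-1)             # probabilities over the label set
    order = scores.argsort(descending=True)
    return [{"id": ids[i], "label": labels[i], "score": scores[i].item()}
            for i in order]

# Toy example: 4-dim embeddings, two labels.
text = F.normalize(torch.tensor([[1.0, 0.0, 0.0, 0.0],
                                 [0.0, 1.0, 0.0, 0.0]]), dim=-1)
out = rank_labels(torch.tensor([0.9, 0.1, 0.0, 0.0]), text, ["cat", "dog"], [0, 1])
```

Because the `[N,512]` text matrix is computed once at startup, this matrix-vector product is the only model-adjacent math done per request besides `encode_image` itself.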
---
## 🔄 Updating the label set
Simply edit `items.json`, push, and redeploy.
```json
[
{ "id": 0, "name": "cat", "prompt": "a photo of a cat" },
{ "id": 1, "name": "dog", "prompt": "a photo of a dog" }
]
```
No code changes are required; the handler re-encodes prompts at start-up.
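Since a malformed `items.json` only surfaces at replica startup, it can be worth sanity-checking the file before pushing; a minimal stdlib-only sketch (the validation rules below are assumptions drawn from the schema shown above, not part of the handler):

```python
import json

def validate_items(items):
    """Check the items.json schema: every entry has id/name/prompt,
    ids are unique, and name/prompt are non-empty. Raises ValueError."""
    seen = set()
    for entry in items:
        if not {"id", "name", "prompt"} <= entry.keys():
            raise ValueError(f"missing keys in {entry!r}")
        if entry["id"] in seen:
            raise ValueError(f"duplicate id {entry['id']}")
        seen.add(entry["id"])
        if not entry["name"] or not entry["prompt"]:
            raise ValueError(f"empty name/prompt for id {entry['id']}")
    return len(items)

# Example run against the two-entry label set shown above.
n = validate_items(json.loads(
    '[{"id": 0, "name": "cat", "prompt": "a photo of a cat"},'
    ' {"id": 1, "name": "dog", "prompt": "a photo of a dog"}]'
))
```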
---
## ⚖️ License
* **Weights / data** – Apple AMLR (see [`LICENSE_weights_data`](./LICENSE_weights_data))
* **This wrapper code** – MIT
---
<div align="center"><sub>Maintained with ❤️ by Your-Team · Aug 2025</sub></div>