---
license: apple-amlr
license_name: apple-ascl
license_link: https://github.com/apple/ml-mobileclip/blob/main/LICENSE_weights_data
library_name: mobileclip
---

# 📸 MobileCLIP-B Zero-Shot Image Classifier
### Hugging Face Inference Endpoint

> **Production-ready wrapper** around Apple's MobileCLIP-B checkpoint.
> Handles image → text similarity in a single fast call.

---

## 📑 Contents

- [Features](#-features)
- [Repository layout](#-repository-layout)
- [Quick start (local smoke-test)](#-quick-start-local-smoke-test)
- [Calling the deployed endpoint](#-calling-the-deployed-endpoint)
- [How it works](#-how-it-works)
- [Updating the label set](#-updating-the-label-set)
- [License](#-license)

---

## ✨ Features
|                              | This repo |
|------------------------------|-----------|
| **Model**                    | MobileCLIP-B (`datacompdr` checkpoint) |
| **Branch fusion**            | `reparameterize_model` baked in |
| **Mixed-precision**          | FP16 on GPU, FP32 on CPU |
| **Pre-computed text feats**  | One-time encoding of prompts in `items.json` |
| **Per-request work**         | _Only_ image decoding → encode_image → softmax |
| **Latency (A10G)**           | < 30 ms per image, excluding upload time |
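The mixed-precision row in the table amounts to picking the compute dtype from the available device once at startup. A minimal sketch of that choice (illustrative, not the actual `handler.py` code):

```python
import torch

# Pick device and dtype once at startup: FP16 on GPU, FP32 on CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

# The handler would then move the fused model once:
#   model.to(device=device, dtype=dtype)
# A dummy input batch in the chosen precision:
x = torch.zeros(1, 3, 224, 224, device=device, dtype=dtype)
print(device, x.dtype)
```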

---

## 📁 Repository layout

| Path               | Purpose                                                          |
|--------------------|------------------------------------------------------------------|
| `handler.py`       | HF entry-point (loads model + text cache, serves requests)       |
| `reparam.py`       | 60-line stand-alone copy of Apple's `reparameterize_model`       |
| `requirements.txt` | Minimal dep set (`torch`, `torchvision`, `open-clip-torch`)      |
| `items.json`       | Your label set (`id`, `name`, `prompt` per line)                 |
| `README.md`        | This document                                                    |

---

## 🚀 Quick start (local smoke-test)

```bash
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt

python - <<'PY'
import base64, json, handler, pathlib
app = handler.EndpointHandler()

img_b64 = base64.b64encode(pathlib.Path("tests/cat.jpg").read_bytes()).decode()
print(app({"inputs": {"image": img_b64}})[:5])   # top-5 classes
PY
```

---

## 🌐 Calling the deployed endpoint

```bash
export ENDPOINT="https://<your-endpoint>.aws.endpoints.huggingface.cloud"
export TOKEN="hf_xxxxxxxxxxxxxxxxx"
IMG="cat.jpg"

python - "$IMG" <<'PY'
import base64, json, os, requests, sys

url   = os.environ["ENDPOINT"]
token = os.environ["TOKEN"]
img   = sys.argv[1]          # image path passed in from the shell

payload = {
    "inputs": {
        "image": base64.b64encode(open(img, "rb").read()).decode()
    }
}
resp = requests.post(
    url,
    headers={
        "Authorization": f"Bearer {token}",
        "Content-Type":  "application/json",
        "Accept":        "application/json",
    },
    json=payload,
    timeout=60,
)
resp.raise_for_status()
print(json.dumps(resp.json()[:5], indent=2))
PY
```

*Response example*

```json
[
  { "id": 23, "label": "cat",         "score": 0.92 },
  { "id": 11, "label": "tiger cat",   "score": 0.05 },
  { "id": 48, "label": "siamese cat", "score": 0.02 }
]
```

---

## ⚙️ How it works

1. **Startup (runs once per replica)**

   * Downloads / loads MobileCLIP-B (`datacompdr`).
   * Fuses MobileOne branches via `reparam.py`.
   * Reads `items.json` and encodes every prompt into an `[N,512]` tensor.

2. **Per request**

   * Decodes base-64 JPEG/PNG.
   * Applies OpenCLIP preprocessing (224 × 224 center-crop + normalise).
   * Encodes the image, normalises, computes cosine similarity vs. cached text matrix.
   * Returns sorted `[{id, label, score}, …]`.
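The per-request scoring step above, cosine similarity of one image embedding against the cached `[N,512]` text matrix followed by a softmax, can be sketched with dummy tensors. The `100.0` logit scale and the label names here are illustrative assumptions, not values taken from the checkpoint:

```python
import torch
import torch.nn.functional as F

# Dummy stand-ins for the real embeddings: one image vector vs. N cached prompts.
N, D = 4, 512                                         # N labels, 512-dim embeddings
text_feats = F.normalize(torch.randn(N, D), dim=-1)   # pre-computed at startup
img_feat   = F.normalize(torch.randn(1, D), dim=-1)   # encoded per request

# Cosine similarity (rows are unit-norm), scaled and softmaxed.
logits = 100.0 * img_feat @ text_feats.T              # shape [1, N]
probs  = logits.softmax(dim=-1).squeeze(0)            # shape [N]

# Sorted [{id, label, score}, ...] in the shape the handler returns.
labels = ["cat", "dog", "car", "tree"]                # hypothetical label set
results = sorted(
    ({"id": i, "label": labels[i], "score": probs[i].item()} for i in range(N)),
    key=lambda r: r["score"],
    reverse=True,
)
print(results[0])
```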

---

## 🔄 Updating the label set

Simply edit `items.json`, push, and redeploy.

```json
[
  { "id": 0, "name": "cat", "prompt": "a photo of a cat" },
  { "id": 1, "name": "dog", "prompt": "a photo of a dog" }
]
```

No code changes are required; the handler re-encodes prompts at start-up.
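For reference, loading that file and splitting it into parallel lists is all the startup step needs before encoding. A minimal sketch (the function name is illustrative, not the actual `handler.py` API):

```python
import json
import pathlib

def load_items(path="items.json"):
    """Read the label set and return parallel lists of ids, names, prompts."""
    items = json.loads(pathlib.Path(path).read_text())
    ids     = [it["id"] for it in items]
    names   = [it["name"] for it in items]
    prompts = [it["prompt"] for it in items]
    return ids, names, prompts

# At startup the handler would tokenize and encode `prompts` once
# (e.g. text_feats = model.encode_text(tokenizer(prompts))) and cache
# the result, so per-request work never touches the text encoder.
```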

---

## ⚖️ License

* **Weights / data**: Apple AMLR (see [`LICENSE_weights_data`](./LICENSE_weights_data))
* **This wrapper code**: MIT

---

<div align="center"><sub>Maintained with ❤️ by Your-Team, Aug 2025</sub></div>