finhdev committed
Commit 6de994e · verified · 1 Parent(s): ce813cc

Update README.md

Files changed (1)
  1. README.md +78 -75
README.md CHANGED
@@ -1,68 +1,83 @@
- ````markdown
- # 📸 MobileCLIP-B Zero-Shot Image Classifier — HF Inference Endpoint
-
- This repository packages Apple’s **MobileCLIP-B** model as a production-ready
- Hugging Face Inference Endpoint.
-
- * **One-shot image → class probabilities**
-   < 30 ms on an A10G / T4 once the image arrives.
- * **Branch-fused / FP16** MobileCLIP for fast GPU inference.
- * **Pre-computed text embeddings** for your custom label set
-   (`items.json`) every request encodes **only** the image.
- * Built with vanilla **`open-clip-torch`** (no forks) and a
-   60-line local helper (`reparam.py`) to fuse MobileOne blocks.
-
  ---
-
- ## What’s inside
-
- | File | Purpose |
- |------|---------|
- | `handler.py` | Hugging Face entry-point loads weights, caches text features, serves requests |
- | `reparam.py` | Stand-alone copy of `reparameterize_model` from Apple’s repo (removes heavy upstream dependency) |
- | `requirements.txt` | Minimal, conflict-free dependency set (`torch`, `torchvision`, `open-clip-torch`) |
- | `items.json` | Your label spec — each element must have `id`, `name`, and `prompt` fields |
- | `README.md` | You are here |
-
  ---
-
- ## 🔧 Quick start (local smoke-test)
-
  ```bash
  python -m venv venv && source venv/bin/activate
  pip install -r requirements.txt
- python - <<'PY'
- from pathlib import Path, PurePosixPath
- import base64, json, requests
-
- # Load a demo image and encode it
- img_path = Path("tests/cat.jpg")
- payload = {
-     "image": base64.b64encode(img_path.read_bytes()).decode()
- }

- # Local simulation — spin up uvicorn the same way the HF container does
- import handler, uvicorn
  app = handler.EndpointHandler()

- print(app({"inputs": payload})[:5])  # top-5 classes
  PY
- ````

  ---

- ## 🚀 Calling the deployed endpoint

  ```bash
- ENDPOINT_URL="https://<your-endpoint>.aws.endpoints.huggingface.cloud"
- HF_TOKEN="hf_xxxxxxxxxxxxxxxxx"
  IMG="cat.jpg"

  python - <<'PY'
- import base64, json, requests, sys, os
- url = os.environ["ENDPOINT_URL"]
- token = os.environ["HF_TOKEN"]
- img = sys.argv[1]

  payload = {
      "inputs": {
@@ -79,73 +94,61 @@ resp = requests.post(
      json=payload,
      timeout=60,
  )
- print(json.dumps(resp.json()[:5], indent=2))  # top-5
  PY
  $IMG
  ```

- Sample response:

  ```json
  [
-   { "id": 23, "label": "cat", "score": 0.92 },
-   { "id": 11, "label": "tiger cat", "score": 0.05 },
-   { "id": 48, "label": "siamese cat", "score": 0.02 },
-
  ]
  ```

  ---

- ## 🏗️ How the handler works (high-level)
-
- 1. **Startup**
-
-    * Downloads / loads the `datacompdr` MobileCLIP-B checkpoint.
-    * Runs `reparameterize_model` to fuse MobileOne branches.
-    * Reads `items.json`, tokenises all prompts, and caches the resulting
-      text embeddings (`[n_classes, 512]`).
-
  2. **Per request**
-
-    * Decodes the incoming base-64 JPEG/PNG.
-    * Applies the exact OpenCLIP preprocessing (224 × 224 center-crop,
-      mean/std normalisation).
-    * Encodes the image, L2-normalises, and performs one `softmax(cosine)`
-      against the cached text matrix.
-    * Returns a sorted JSON list `[{"id", "label", "score"}, …]`.
-
- This design keeps bandwidth low (compressed image over the wire) and
- latency low (no per-request text encoding).

  ---

- ## 📝 Updating the label set
-
- Edit `items.json`, **rebuild the endpoint**, done.

  ```json
  [
-   { "id": 0, "name": "cat", "prompt": "a photo of a cat" },
-   { "id": 1, "name": "dog", "prompt": "a photo of a dog" },
-
  ]
  ```

- * `id` is your internal numeric key (stays stable).
- * `name` is the human-readable label returned to clients.
- * `prompt` is what the model actually “sees” — tweak wording to improve accuracy.

  ---

- ## ⚖️ Licence
-
- * **Weights**: Apple AMLR (see [`LICENSE_weights_data`](./LICENSE_weights_data)).
- * **Code in this repo**: MIT.

  ---

- <div align="center"><sub>Maintained with ❤️ by Your Team — August 2025</sub></div>
- ```
- ::contentReference[oaicite:0]{index=0}
 
 
+ ---
+ license: apple-amlr
+ license_name: apple-ascl
+ license_link: https://github.com/apple/ml-mobileclip/blob/main/LICENSE_weights_data
+ library_name: mobileclip
+ ---
+
+ # 📸 MobileCLIP-B Zero-Shot Image Classifier
+ ### Hugging Face Inference Endpoint

+ > **Production-ready wrapper** around Apple’s MobileCLIP-B checkpoint.
+ > Handles image → text similarity in a single fast call.
+
+ ---
+
+ ## 📑 Sidebar
+
+ - [Features](#features)
+ - [Repository layout](#-repository-layout)
+ - [Quick start (local smoke-test)](#-quick-start-local-smoke-test)
+ - [Calling the deployed endpoint](#-calling-the-deployed-endpoint)
+ - [How it works](#-how-it-works)
+ - [Updating the label set](#-updating-the-label-set)
+ - [License](#-license)
+
+ ---

+ ## Features
+ |                               | This repo |
+ |-------------------------------|-----------|
+ | **Model**                     | MobileCLIP-B (`datacompdr` checkpoint) |
+ | **Branch fusion**             | `reparameterize_model` baked in |
+ | **Mixed-precision**           | FP16 on GPU, FP32 on CPU |
+ | **Pre-computed text feats**   | One-time encoding of prompts in `items.json` |
+ | **Per-request work**          | _Only_ image decoding → encode_image → softmax |
+ | **Latency (A10G)**            | < 30 ms once the image arrives |
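
For orientation, here is a minimal sketch of how this combination could be assembled with vanilla `open-clip-torch` plus the local `reparam.py` helper. It is an illustration, not a copy of `handler.py`, and the `open_clip` model / pretrained tag strings are assumptions based on the names used in this README.

```python
import torch
import open_clip
from reparam import reparameterize_model  # the repo's 60-line local helper

device = "cuda" if torch.cuda.is_available() else "cpu"

# Assumed open_clip identifiers for the MobileCLIP-B / datacompdr weights.
model, _, preprocess = open_clip.create_model_and_transforms(
    "MobileCLIP-B", pretrained="datacompdr"
)
tokenizer = open_clip.get_tokenizer("MobileCLIP-B")

model = reparameterize_model(model.eval())  # fuse MobileOne branches
if device == "cuda":
    model = model.half()                    # FP16 on GPU, FP32 on CPU
model = model.to(device)
```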

  ---

+ ## 📁 Repository layout

+ | Path               | Purpose                                                     |
+ |--------------------|-------------------------------------------------------------|
+ | `handler.py`       | HF entry-point (loads model + text cache, serves requests)  |
+ | `reparam.py`       | 60-line stand-alone copy of Apple’s `reparameterize_model`  |
+ | `requirements.txt` | Minimal dep set (`torch`, `torchvision`, `open-clip-torch`) |
+ | `items.json`       | Your label set (`id`, `name`, `prompt` per entry)           |
+ | `README.md`        | This document                                               |

  ---

+ ## 🚀 Quick start (local smoke-test)

  ```bash
  python -m venv venv && source venv/bin/activate
  pip install -r requirements.txt

+ python - <<'PY'
+ import base64, json, handler, pathlib
  app = handler.EndpointHandler()

+ img_b64 = base64.b64encode(pathlib.Path("tests/cat.jpg").read_bytes()).decode()
+ print(app({"inputs": {"image": img_b64}})[:5])  # top-5 classes
  PY
+ ```

  ---

+ ## 🌐 Calling the deployed endpoint

  ```bash
+ export ENDPOINT="https://<your-endpoint>.aws.endpoints.huggingface.cloud"
+ export TOKEN="hf_xxxxxxxxxxxxxxxxx"
  IMG="cat.jpg"

  python - "$IMG" <<'PY'
+ import base64, json, os, requests, sys
+ url = os.environ["ENDPOINT"]
+ token = os.environ["TOKEN"]
+ img = sys.argv[1]

  payload = {
      "inputs": {

      json=payload,
      timeout=60,
  )
+ print(json.dumps(resp.json()[:5], indent=2))
  PY
  ```

+ *Response example*

  ```json
  [
+   { "id": 23, "label": "cat", "score": 0.92 },
+   { "id": 11, "label": "tiger cat", "score": 0.05 },
+   { "id": 48, "label": "siamese cat", "score": 0.02 }
  ]
  ```

  ---

+ ## ⚙️ How it works

+ 1. **Startup (runs once per replica)**
+
+    * Downloads / loads MobileCLIP-B (`datacompdr`).
+    * Fuses MobileOne branches via `reparam.py`.
+    * Reads `items.json` and encodes every prompt into a cached `[N, 512]` text-embedding tensor.

  2. **Per request**

+    * Decodes the base-64 JPEG/PNG.
+    * Applies OpenCLIP preprocessing (224 × 224 center-crop + normalisation).
+    * Encodes the image, L2-normalises, and computes cosine similarity against the cached text matrix.
+    * Returns a sorted `[{id, label, score}, …]` list (both phases are sketched below).
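
Continuing the sketch from the Features section, the two phases above could look roughly like this. `model`, `tokenizer`, `preprocess`, and `device` come from the earlier snippet, the `id` / `name` / `prompt` fields follow `items.json` as described in this README, and the 100× logit scale is the usual CLIP convention rather than something confirmed by `handler.py`.

```python
import base64, io, json
import torch
from PIL import Image

# Startup: tokenise every prompt from items.json once and cache the
# L2-normalised [N, 512] text matrix.
with open("items.json") as f:
    items = json.load(f)
tokens = tokenizer([it["prompt"] for it in items]).to(device)
with torch.no_grad():
    text_feats = model.encode_text(tokens)
    text_feats = text_feats / text_feats.norm(dim=-1, keepdim=True)

# Per request: decode the base-64 image, encode it, score against the cache.
def classify(image_b64: str, top_k: int = 5):
    img = Image.open(io.BytesIO(base64.b64decode(image_b64))).convert("RGB")
    pixels = preprocess(img).unsqueeze(0).to(device=device, dtype=text_feats.dtype)
    with torch.no_grad():
        img_feat = model.encode_image(pixels)
        img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
        probs = (100.0 * img_feat @ text_feats.T).softmax(dim=-1).squeeze(0)
    top = probs.argsort(descending=True)[:top_k]
    return [
        {"id": items[i]["id"], "label": items[i]["name"], "score": float(probs[i])}
        for i in top.tolist()
    ]
```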
  ---

+ ## 🔄 Updating the label set

+ Simply edit `items.json`, push, and redeploy.

  ```json
  [
+   { "id": 0, "name": "cat", "prompt": "a photo of a cat" },
+   { "id": 1, "name": "dog", "prompt": "a photo of a dog" }
  ]
  ```

+ No code changes are required; the handler re-encodes prompts at start-up.
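
As an optional guard before pushing, a small check like the one below (not part of the repo, just a suggestion) confirms every entry carries the three fields the handler expects:

```python
import json

with open("items.json") as f:
    items = json.load(f)

for i, item in enumerate(items):
    # Every entry needs the documented id / name / prompt fields.
    missing = {"id", "name", "prompt"} - item.keys()
    assert not missing, f"items.json entry {i} is missing {missing}"

print(f"{len(items)} labels look well-formed")
```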
 
 
  ---

+ ## ⚖️ License

+ * **Weights / data** — Apple AMLR (see [`LICENSE_weights_data`](./LICENSE_weights_data))
+ * **This wrapper code** — MIT

  ---

+ <div align="center"><sub>Maintained with ❤️ by Your-Team — Aug 2025</sub></div>