---
license: apple-amlr
license_name: apple-ascl
license_link: https://github.com/apple/ml-mobileclip/blob/main/LICENSE_weights_data
library_name: mobileclip
---

# 📸 MobileCLIP-B Zero-Shot Image Classifier

### Hugging Face Inference Endpoint

> **Production-ready wrapper** around Apple’s MobileCLIP-B checkpoint.
> Handles image → text similarity in a single fast call.

---

## 📑 Sidebar

- [Features](#-features)
- [Repository layout](#-repository-layout)
- [Quick start (local smoke-test)](#-quick-start-local-smoke-test)
- [Calling the deployed endpoint](#-calling-the-deployed-endpoint)
- [How it works](#-how-it-works)
- [Updating the label set](#-updating-the-label-set)
- [License](#-license)

---

## ✨ Features

| | This repo |
|-----------------------------|-----------|
| **Model** | MobileCLIP-B (`datacompdr` checkpoint) |
| **Branch fusion** | `reparameterize_model` baked in |
| **Mixed precision** | FP16 on GPU, FP32 on CPU |
| **Pre-computed text feats** | One-time encoding of prompts in `items.json` |
| **Per-request work** | _Only_ image decoding → `encode_image` → softmax |
| **Latency (A10G)** | < 30 ms once the image arrives |
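
The mixed-precision row boils down to a single device check at start-up. A minimal sketch (`pick_runtime` is a hypothetical helper, not a function in `handler.py`):

```python
import torch

def pick_runtime():
    # FP16 on CUDA GPUs, FP32 on CPU, as listed in the table above
    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32
    return device, dtype
```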

---

## 📁 Repository layout

| Path | Purpose |
|--------------------|------------------------------------------------------------------|
| `handler.py` | HF entry-point (loads model + text cache, serves requests) |
| `reparam.py` | 60-line stand-alone copy of Apple’s `reparameterize_model` |
| `requirements.txt` | Minimal dependency set (`torch`, `torchvision`, `open-clip-torch`) |
| `items.json` | Your label set (`id`, `name`, `prompt` per entry) |
| `README.md` | This document |

---

## 🚀 Quick start (local smoke-test)

```bash
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt

python - <<'PY'
import base64, pathlib

import handler

app = handler.EndpointHandler()

img_b64 = base64.b64encode(pathlib.Path("tests/cat.jpg").read_bytes()).decode()
print(app({"inputs": {"image": img_b64}})[:5])  # top-5 classes
PY
```

---

## 📡 Calling the deployed endpoint

```bash
export ENDPOINT="https://<your-endpoint>.aws.endpoints.huggingface.cloud"
export TOKEN="hf_xxxxxxxxxxxxxxxxx"
IMG="cat.jpg"

python - "$IMG" <<'PY'
import base64, json, os, sys

import requests

url = os.environ["ENDPOINT"]
token = os.environ["TOKEN"]
img = sys.argv[1]

payload = {
    "inputs": {
        "image": base64.b64encode(open(img, "rb").read()).decode()
    }
}
resp = requests.post(
    url,
    headers={
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "Accept": "application/json",
    },
    json=payload,
    timeout=60,
)
print(json.dumps(resp.json()[:5], indent=2))
PY
```

*Response example*

```json
[
  { "id": 23, "label": "cat", "score": 0.92 },
  { "id": 11, "label": "tiger cat", "score": 0.05 },
  { "id": 48, "label": "siamese cat", "score": 0.02 }
]
```

---

## ⚙️ How it works

1. **Startup (runs once per replica)**

   * Downloads / loads MobileCLIP-B (`datacompdr`).
   * Fuses MobileOne branches via `reparam.py`.
   * Reads `items.json` and encodes every prompt → `[N, 512]` tensor.

2. **Per request**

   * Decodes the base-64 JPEG/PNG.
   * Applies OpenCLIP preprocessing (224 × 224 center-crop + normalisation).
   * Encodes the image, normalises it, and computes cosine similarity against the cached text matrix.
   * Returns sorted `[{id, label, score}, …]`.
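
In isolation, the per-request scoring step is a normalised dot product followed by a softmax. A minimal sketch with dummy tensors (`rank_labels` is a hypothetical helper; the real handler feeds it the output of `encode_image` and the cached prompt matrix):

```python
import torch

@torch.no_grad()
def rank_labels(image_feat, text_feats, labels):
    # image_feat: [D], text_feats: [N, D]; both assumed L2-normalised already
    probs = (100.0 * image_feat @ text_feats.T).softmax(dim=-1)  # CLIP-style temperature
    order = probs.argsort(descending=True)
    return [{"id": int(i), "label": labels[int(i)], "score": float(probs[i])}
            for i in order]
```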

---

## 🏷 Updating the label set

Edit `items.json`, push, and redeploy.

```json
[
  { "id": 0, "name": "cat", "prompt": "a photo of a cat" },
  { "id": 1, "name": "dog", "prompt": "a photo of a dog" }
]
```

No code changes are required; the handler re-encodes the prompts at start-up.
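
The start-up re-encoding is roughly the following (a sketch; `encode_prompts` is a hypothetical helper, and the model/tokenizer objects are assumed to come from `open_clip`):

```python
import json

import torch

def encode_prompts(model, tokenizer, items_path="items.json", device="cpu"):
    with open(items_path) as f:
        items = json.load(f)
    tokens = tokenizer([it["prompt"] for it in items]).to(device)
    with torch.no_grad():
        feats = model.encode_text(tokens)
    feats = feats / feats.norm(dim=-1, keepdim=True)  # L2-normalise once, reuse per request
    return feats, [it["name"] for it in items]
```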

---

## ⚖️ License

* **Weights / data** – Apple AMLR (see [`LICENSE_weights_data`](./LICENSE_weights_data))
* **This wrapper code** – MIT

---

<div align="center"><sub>Maintained with ❤️ by Your-Team · Aug 2025</sub></div>