finhdev committed
Commit 6de994e · verified · 1 Parent(s): ce813cc

Update README.md

Files changed (1)
  1. README.md +78 -75
README.md CHANGED
@@ -1,68 +1,83 @@
- ````markdown
- # 📸 MobileCLIP-B Zero-Shot Image Classifier — HF Inference Endpoint
-
- This repository packages Apple’s **MobileCLIP-B** model as a production-ready
- Hugging Face Inference Endpoint.
-
- * **One-shot image → class probabilities**
-   < 30 ms on an A10G / T4 once the image arrives.
- * **Branch-fused / FP16** MobileCLIP for fast GPU inference.
- * **Pre-computed text embeddings** for your custom label set
-   (`items.json`) every request encodes **only** the image.
- * Built with vanilla **`open-clip-torch`** (no forks) and a
-   60-line local helper (`reparam.py`) to fuse MobileOne blocks.
-
  ---
-
- ## What’s inside
-
- | File | Purpose |
- |------|---------|
- | `handler.py` | Hugging Face entry-point loads weights, caches text features, serves requests |
- | `reparam.py` | Stand-alone copy of `reparameterize_model` from Apple’s repo (removes heavy upstream dependency) |
- | `requirements.txt` | Minimal, conflict-free dependency set (`torch`, `torchvision`, `open-clip-torch`) |
- | `items.json` | Your label spec — each element must have `id`, `name`, and `prompt` fields |
- | `README.md` | You are here |
-
  ---
-
- ## 🔧 Quick start (local smoke-test)
-
  ```bash
  python -m venv venv && source venv/bin/activate
  pip install -r requirements.txt
- python - <<'PY'
- from pathlib import Path, PurePosixPath
- import base64, json, requests
-
- # Load a demo image and encode it
- img_path = Path("tests/cat.jpg")
- payload = {
-     "image": base64.b64encode(img_path.read_bytes()).decode()
- }

- # Local simulation — spin up uvicorn the same way the HF container does
- import handler, uvicorn
  app = handler.EndpointHandler()

- print(app({"inputs": payload})[:5])  # top-5 classes
  PY
- ````

  ---

- ## 🚀 Calling the deployed endpoint

  ```bash
- ENDPOINT_URL="https://<your-endpoint>.aws.endpoints.huggingface.cloud"
- HF_TOKEN="hf_xxxxxxxxxxxxxxxxx"
  IMG="cat.jpg"

  python - <<'PY'
- import base64, json, requests, sys, os
- url = os.environ["ENDPOINT_URL"]
- token = os.environ["HF_TOKEN"]
- img = sys.argv[1]

  payload = {
      "inputs": {
@@ -79,73 +94,61 @@ resp = requests.post(
      json=payload,
      timeout=60,
  )
- print(json.dumps(resp.json()[:5], indent=2))  # top-5
  PY
  $IMG
  ```

- Sample response:

  ```json
  [
-   { "id": 23, "label": "cat", "score": 0.92 },
-   { "id": 11, "label": "tiger cat", "score": 0.05 },
-   { "id": 48, "label": "siamese cat", "score": 0.02 },
-
  ]
  ```

  ---

- ## 🏗️ How the handler works (high-level)
-
- 1. **Startup**
-
-    * Downloads / loads the `datacompdr` MobileCLIP-B checkpoint.
-    * Runs `reparameterize_model` to fuse MobileOne branches.
-    * Reads `items.json`, tokenises all prompts, and caches the resulting
-      text embeddings (`[n_classes, 512]`).
-
  2. **Per request**
-
-    * Decodes the incoming base-64 JPEG/PNG.
-    * Applies the exact OpenCLIP preprocessing (224 × 224 center-crop,
-      mean/std normalisation).
-    * Encodes the image, L2-normalises, and performs one `softmax(cosine)`
-      against the cached text matrix.
-    * Returns a sorted JSON list `[{"id", "label", "score"}, …]`.
-
- This design keeps bandwidth low (compressed image over the wire) and
- latency low (no per-request text encoding).

  ---

- ## 📝 Updating the label set
-
- Edit `items.json`, **rebuild the endpoint**, done.

  ```json
  [
-   { "id": 0, "name": "cat", "prompt": "a photo of a cat" },
-   { "id": 1, "name": "dog", "prompt": "a photo of a dog" },
-
  ]
  ```

- * `id` is your internal numeric key (stays stable).
- * `name` is the human-readable label returned to clients.
- * `prompt` is what the model actually “sees” — tweak wording to improve accuracy.

  ---

- ## ⚖️ Licence
-
- * **Weights**: Apple AMLR (see [`LICENSE_weights_data`](./LICENSE_weights_data)).
- * **Code in this repo**: MIT.

  ---

- <div align="center"><sub>Maintained with ❤️ by Your Team — August 2025</sub></div>
- ```
- ::contentReference[oaicite:0]{index=0}
 
 
+ ---
+ license: apple-amlr
+ license_name: apple-ascl
+ license_link: https://github.com/apple/ml-mobileclip/blob/main/LICENSE_weights_data
+ library_name: mobileclip
+ ---
+
+ # 📸 MobileCLIP-B Zero-Shot Image Classifier
+ ### Hugging Face Inference Endpoint

+ > **Production-ready wrapper** around Apple’s MobileCLIP-B checkpoint.
+ > Handles image → text similarity in a single fast call.
+
+ ---
+
+ ## 📑 Sidebar
+
+ - [Features](#features)
+ - [Repository layout](#-repository-layout)
+ - [Quick start (local smoke-test)](#-quick-start-local-smoke-test)
+ - [Calling the deployed endpoint](#-calling-the-deployed-endpoint)
+ - [How it works](#-how-it-works)
+ - [Updating the label set](#-updating-the-label-set)
+ - [License](#-license)
+
+ ---

+ ## Features
+ |                               | This repo |
+ |-------------------------------|-----------|
+ | **Model**                     | MobileCLIP-B (`datacompdr` checkpoint) |
+ | **Branch fusion**             | `reparameterize_model` baked in |
+ | **Mixed-precision**           | FP16 on GPU, FP32 on CPU |
+ | **Pre-computed text feats**   | One-time encoding of prompts in `items.json` |
+ | **Per-request work**          | _Only_ image decoding → encode_image → softmax |
+ | **Latency (A10G)**            | < 30 ms once the image arrives |
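
For orientation, here is a minimal sketch of how this combination could be assembled with vanilla `open-clip-torch` plus the local `reparam.py` helper. It is an illustration, not a copy of `handler.py`, and the `open_clip` model / pretrained tag strings are assumptions based on the names used in this README.

```python
import torch
import open_clip
from reparam import reparameterize_model  # the repo's 60-line local helper

device = "cuda" if torch.cuda.is_available() else "cpu"

# Assumed open_clip identifiers for the MobileCLIP-B / datacompdr weights.
model, _, preprocess = open_clip.create_model_and_transforms(
    "MobileCLIP-B", pretrained="datacompdr"
)
tokenizer = open_clip.get_tokenizer("MobileCLIP-B")

model = reparameterize_model(model.eval())  # fuse MobileOne branches
if device == "cuda":
    model = model.half()                    # FP16 on GPU, FP32 on CPU
model = model.to(device)
```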

  ---

+ ## 📁 Repository layout

+ | Path               | Purpose                                                     |
+ |--------------------|-------------------------------------------------------------|
+ | `handler.py`       | HF entry-point (loads model + text cache, serves requests)  |
+ | `reparam.py`       | 60-line stand-alone copy of Apple’s `reparameterize_model`  |
+ | `requirements.txt` | Minimal dep set (`torch`, `torchvision`, `open-clip-torch`) |
+ | `items.json`       | Your label set (`id`, `name`, `prompt` per entry)           |
+ | `README.md`        | This document                                               |

  ---

+ ## 🚀 Quick start (local smoke-test)

  ```bash
  python -m venv venv && source venv/bin/activate
  pip install -r requirements.txt

+ python - <<'PY'
+ import base64, json, handler, pathlib
  app = handler.EndpointHandler()

+ img_b64 = base64.b64encode(pathlib.Path("tests/cat.jpg").read_bytes()).decode()
+ print(app({"inputs": {"image": img_b64}})[:5])  # top-5 classes
  PY
+ ```

  ---

+ ## 🌐 Calling the deployed endpoint

  ```bash
+ export ENDPOINT="https://<your-endpoint>.aws.endpoints.huggingface.cloud"
+ export TOKEN="hf_xxxxxxxxxxxxxxxxx"
  IMG="cat.jpg"

  python - "$IMG" <<'PY'
+ import base64, json, os, requests, sys
+ url = os.environ["ENDPOINT"]
+ token = os.environ["TOKEN"]
+ img = sys.argv[1]

  payload = {
      "inputs": {

      json=payload,
      timeout=60,
  )
+ print(json.dumps(resp.json()[:5], indent=2))
  PY
  ```

+ *Response example*

  ```json
  [
+   { "id": 23, "label": "cat", "score": 0.92 },
+   { "id": 11, "label": "tiger cat", "score": 0.05 },
+   { "id": 48, "label": "siamese cat", "score": 0.02 }
  ]
  ```

  ---

+ ## ⚙️ How it works

+ 1. **Startup (runs once per replica)**
+
+    * Downloads / loads MobileCLIP-B (`datacompdr`).
+    * Fuses MobileOne branches via `reparam.py`.
+    * Reads `items.json` and encodes every prompt into a cached `[N, 512]` text-embedding tensor.

  2. **Per request**

+    * Decodes the base-64 JPEG/PNG.
+    * Applies OpenCLIP preprocessing (224 × 224 center-crop + normalisation).
+    * Encodes the image, L2-normalises, and computes cosine similarity against the cached text matrix.
+    * Returns a sorted `[{id, label, score}, …]` list (both phases are sketched below).
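
Continuing the sketch from the Features section, the two phases above could look roughly like this. `model`, `tokenizer`, `preprocess`, and `device` come from the earlier snippet, the `id` / `name` / `prompt` fields follow `items.json` as described in this README, and the 100× logit scale is the usual CLIP convention rather than something confirmed by `handler.py`.

```python
import base64, io, json
import torch
from PIL import Image

# Startup: tokenise every prompt from items.json once and cache the
# L2-normalised [N, 512] text matrix.
with open("items.json") as f:
    items = json.load(f)
tokens = tokenizer([it["prompt"] for it in items]).to(device)
with torch.no_grad():
    text_feats = model.encode_text(tokens)
    text_feats = text_feats / text_feats.norm(dim=-1, keepdim=True)

# Per request: decode the base-64 image, encode it, score against the cache.
def classify(image_b64: str, top_k: int = 5):
    img = Image.open(io.BytesIO(base64.b64decode(image_b64))).convert("RGB")
    pixels = preprocess(img).unsqueeze(0).to(device=device, dtype=text_feats.dtype)
    with torch.no_grad():
        img_feat = model.encode_image(pixels)
        img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
        probs = (100.0 * img_feat @ text_feats.T).softmax(dim=-1).squeeze(0)
    top = probs.argsort(descending=True)[:top_k]
    return [
        {"id": items[i]["id"], "label": items[i]["name"], "score": float(probs[i])}
        for i in top.tolist()
    ]
```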
  ---

+ ## 🔄 Updating the label set

+ Simply edit `items.json`, push, and redeploy.

  ```json
  [
+   { "id": 0, "name": "cat", "prompt": "a photo of a cat" },
+   { "id": 1, "name": "dog", "prompt": "a photo of a dog" }
  ]
  ```

+ No code changes are required; the handler re-encodes prompts at start-up.
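
As an optional guard before pushing, a small check like the one below (not part of the repo, just a suggestion) confirms every entry carries the three fields the handler expects:

```python
import json

with open("items.json") as f:
    items = json.load(f)

for i, item in enumerate(items):
    # Every entry needs the documented id / name / prompt fields.
    missing = {"id", "name", "prompt"} - item.keys()
    assert not missing, f"items.json entry {i} is missing {missing}"

print(f"{len(items)} labels look well-formed")
```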
 
 
  ---

+ ## ⚖️ License

+ * **Weights / data** — Apple AMLR (see [`LICENSE_weights_data`](./LICENSE_weights_data))
+ * **This wrapper code** — MIT

  ---

+ <div align="center"><sub>Maintained with ❤️ by Your-Team — Aug 2025</sub></div>