---
title: Embedding Inference API
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
---

# Embedding Inference API

A FastAPI-based inference service for generating embeddings using JobBERT v2/v3, Jina AI, and Voyage AI.

## Features

- **Multiple Models**: JobBERT v2/v3 (job-specific), Jina AI v3 (general-purpose), Voyage AI (state-of-the-art)
- **RESTful API**: Easy-to-use HTTP endpoints
- **Batch Processing**: Process multiple texts in a single request
- **Task-Specific Embeddings**: Support for different embedding tasks (retrieval, classification, etc.)
- **Docker Ready**: Easy deployment to Hugging Face Spaces or any Docker environment

## Supported Models

| Model | Dimension | Max Tokens | Best For |
|-------|-----------|------------|----------|
| JobBERT v2 | 768 | 512 | Job titles and descriptions |
| JobBERT v3 | 768 | 512 | Job titles (improved performance) |
| Jina AI v3 | 1024 | 8,192 | General text, long documents |
| Voyage AI | 1024 | 32,000 | High-quality embeddings (requires API key) |

## Quick Start

### Local Development

1. **Install dependencies:**

   ```bash
   cd embedding
   pip install -r requirements.txt
   ```

2. **Run the API:**

   ```bash
   python api.py
   ```

3. **Access the API:**
   - API: http://localhost:7860
   - Docs: http://localhost:7860/docs

### Docker Deployment

1. **Build the image:**

   ```bash
   docker build -t embedding-api .
   ```

2. **Run the container:**

   ```bash
   docker run -p 7860:7860 embedding-api
   ```

3. **With Voyage AI (optional):**

   ```bash
   docker run -p 7860:7860 -e VOYAGE_API_KEY=your_key_here embedding-api
   ```

## Hugging Face Spaces Deployment

### Option 1: Using the Hugging Face CLI

1. **Install the Hugging Face CLI:**

   ```bash
   pip install huggingface_hub
   huggingface-cli login
   ```

2. **Create a new Space:**
   - Go to https://huggingface.co/spaces
   - Click "Create new Space"
   - Choose "Docker" as the Space SDK
   - Name your space (e.g., `your-username/embedding-api`)

3. 
**Clone and push:**

   ```bash
   git clone https://huggingface.co/spaces/your-username/embedding-api
   cd embedding-api

   # Copy files from the embedding folder
   cp /path/to/embedding/Dockerfile .
   cp /path/to/embedding/api.py .
   cp /path/to/embedding/requirements.txt .
   cp /path/to/embedding/README.md .

   git add .
   git commit -m "Initial commit"
   git push
   ```

4. **Configure environment (optional):**
   - Go to your Space settings
   - Add a `VOYAGE_API_KEY` secret if using Voyage AI

### Option 2: Manual Upload

1. Create a new Docker Space on Hugging Face
2. Upload these files:
   - `Dockerfile`
   - `api.py`
   - `requirements.txt`
   - `README.md`
3. Add environment variables in Settings if needed

## API Usage

### Health Check

```bash
curl http://localhost:7860/health
```

Response:

```json
{
  "status": "healthy",
  "models_loaded": ["jobbertv2", "jobbertv3", "jina"],
  "voyage_available": false,
  "api_key_required": false
}
```

### Generate Embeddings (Elasticsearch Compatible)

The main `/embed` endpoint uses the Elasticsearch inference API format, with model selection via a query parameter.

#### Single Text (JobBERT v3 - default)

Without API key:

```bash
curl -X POST "http://localhost:7860/embed" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Software Engineer"
  }'
```

With API key:

```bash
curl -X POST "http://localhost:7860/embed" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "input": "Software Engineer"
  }'
```

Response:

```json
{
  "embedding": [0.123, -0.456, 0.789, ...]
}
```

#### Single Text with Model Selection

```bash
# JobBERT v2
curl -X POST "http://localhost:7860/embed?model=jobbertv2" \
  -H "Content-Type: application/json" \
  -d '{"input": "Data Scientist"}'

# JobBERT v3 (recommended)
curl -X POST "http://localhost:7860/embed?model=jobbertv3" \
  -H "Content-Type: application/json" \
  -d '{"input": "Product Manager"}'

# Jina AI
curl -X POST "http://localhost:7860/embed?model=jina" \
  -H "Content-Type: application/json" \
  -d '{"input": "Machine Learning Engineer"}'
```

#### Multiple Texts (Batch)

```bash
curl -X POST "http://localhost:7860/embed?model=jobbertv3" \
  -H "Content-Type: application/json" \
  -d '{
    "input": ["Software Engineer", "Data Scientist", "Product Manager"]
  }'
```

Response:

```json
{
  "embeddings": [
    [0.123, -0.456, ...],
    [0.234, -0.567, ...],
    [0.345, -0.678, ...]
  ]
}
```

#### Jina AI with Task Type

```bash
curl -X POST "http://localhost:7860/embed?model=jina&task=retrieval.query" \
  -H "Content-Type: application/json" \
  -d '{"input": "What is machine learning?"}'
```

**Jina AI Tasks (query parameter):**

- `retrieval.query`: For search queries
- `retrieval.passage`: For documents
- `text-matching`: For similarity (default)

#### Voyage AI (requires API key)

```bash
curl -X POST "http://localhost:7860/embed?model=voyage&input_type=document" \
  -H "Content-Type: application/json" \
  -d '{"input": "This is a document to embed"}'
```

**Voyage AI Input Types (query parameter):**

- `document`: For documents/passages
- `query`: For search queries

### Batch Endpoint (Original Format)

For compatibility, the original batch endpoint is still available at `/embed/batch`:

```bash
curl -X POST http://localhost:7860/embed/batch \
  -H "Content-Type: application/json" \
  -d '{
    "texts": ["Software Engineer", "Data Scientist"],
    "model": "jobbertv3"
  }'
```

The response includes metadata:

```json
{
  "embeddings": [[0.123, ...], [0.234, ...]],
  "model": "jobbertv3",
  "dimension": 768,
  "num_texts": 2
}
```

### List Available Models

```bash
curl \
  http://localhost:7860/models
```

## Python Client Examples

### Elasticsearch-Compatible Format (Recommended)

```python
import requests

BASE_URL = "http://localhost:7860"
API_KEY = "your-api-key-here"  # Optional, only if API key is required

# Headers (include API key if required)
headers = {}
if API_KEY:
    headers["Authorization"] = f"Bearer {API_KEY}"

# Single embedding (JobBERT v3 - default)
response = requests.post(
    f"{BASE_URL}/embed",
    headers=headers,
    json={"input": "Software Engineer"}
)
result = response.json()
embedding = result["embedding"]  # Single vector
print(f"Embedding dimension: {len(embedding)}")

# Single embedding with model selection
response = requests.post(
    f"{BASE_URL}/embed?model=jina",
    headers=headers,
    json={"input": "Data Scientist"}
)
embedding = response.json()["embedding"]

# Batch embeddings
response = requests.post(
    f"{BASE_URL}/embed?model=jobbertv3",
    headers=headers,
    json={"input": ["Software Engineer", "Data Scientist", "Product Manager"]}
)
result = response.json()
embeddings = result["embeddings"]  # List of vectors
print(f"Generated {len(embeddings)} embeddings")

# Jina AI with task
response = requests.post(
    f"{BASE_URL}/embed?model=jina&task=retrieval.query",
    headers=headers,
    json={"input": "What is Python?"}
)

# Voyage AI with input type
response = requests.post(
    f"{BASE_URL}/embed?model=voyage&input_type=document",
    headers=headers,
    json={"input": "Document text here"}
)
```

### Python Client Class with API Key Support

```python
import requests
from typing import List, Union, Optional


class EmbeddingClient:
    def __init__(self, base_url: str, api_key: Optional[str] = None, model: str = "jobbertv3"):
        self.base_url = base_url
        self.api_key = api_key
        self.model = model
        self.headers = {}
        if api_key:
            self.headers["Authorization"] = f"Bearer {api_key}"

    def embed(self, text: Union[str, List[str]]) -> Union[List[float], List[List[float]]]:
        """Get embeddings for a single text or a batch."""
        response = requests.post(
            f"{self.base_url}/embed?model={self.model}",
            headers=self.headers,
            json={"input": text}
        )
        response.raise_for_status()
        result = response.json()
        if isinstance(text, str):
            return result["embedding"]
        else:
            return result["embeddings"]


# Usage
client = EmbeddingClient(
    base_url="https://YOUR-SPACE.hf.space",
    api_key="your-api-key-here",  # Optional
    model="jobbertv3"
)

# Single embedding
embedding = client.embed("Software Engineer")
print(f"Dimension: {len(embedding)}")

# Batch embeddings
embeddings = client.embed(["Software Engineer", "Data Scientist"])
print(f"Generated {len(embeddings)} embeddings")
```

### Batch Format (Original)

```python
import requests

url = "http://localhost:7860/embed/batch"
response = requests.post(url, json={
    "texts": ["Software Engineer", "Data Scientist"],
    "model": "jobbertv3"
})
result = response.json()
embeddings = result["embeddings"]
print(f"Model: {result['model']}, Dimension: {result['dimension']}")
```

## Environment Variables

- `PORT`: Server port (default: 7860)
- `API_KEY`: Your API key for authentication (optional, but recommended for production)
- `REQUIRE_API_KEY`: Set to `true` to enable API key authentication (default: `false`)
- `VOYAGE_API_KEY`: Voyage AI API key (required only for Voyage embeddings)

### Setting Up API Key Authentication

#### Local Development

```bash
# Set environment variables
export API_KEY="your-secret-key-here"
export REQUIRE_API_KEY="true"

# Run the API
python api.py
```

#### Hugging Face Spaces

1. Go to your Space settings
2. Click on "Variables and secrets"
3. Add secrets:
   - Name: `API_KEY`, Value: `your-secret-key-here`
   - Name: `REQUIRE_API_KEY`, Value: `true`
4. 
Restart your Space

#### Docker

```bash
docker run -p 7860:7860 \
  -e API_KEY="your-secret-key-here" \
  -e REQUIRE_API_KEY="true" \
  embedding-api
```

## Interactive Documentation

Once the API is running, visit:

- **Swagger UI**: http://localhost:7860/docs
- **ReDoc**: http://localhost:7860/redoc

## Notes

- Models are downloaded automatically on first startup (~2-3 GB total)
- Voyage AI requires an API key from https://www.voyageai.com/
- The first request to each model may be slower while the model loads
- Use batch processing for better performance (send multiple texts at once)

## Troubleshooting

### Models not loading

- Check available disk space (~3 GB needed)
- Ensure an internet connection is available for the model download
- Check the logs for specific error messages

### Voyage AI not working

- Verify that `VOYAGE_API_KEY` is set correctly
- Check that the API key has sufficient credits
- Ensure the `voyageai` package is installed

### Out of memory

- Reduce the batch size (process fewer texts per request)
- Use a smaller model (JobBERT v2 instead of Jina)
- Increase container memory limits

## License

This API uses models with different licenses:

- JobBERT v2/v3: Apache 2.0
- Jina AI: Apache 2.0
- Voyage AI: Subject to the Voyage AI terms of service
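## Appendix: Working with the Returned Embeddings

The vectors returned by `/embed` are typically compared with cosine similarity (e.g., to rank job titles by relatedness). A minimal, dependency-free sketch; the sample vectors below are illustrative stand-ins, not real model output:

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# Illustrative vectors standing in for real /embed output
v_engineer = [0.12, -0.45, 0.78]
v_developer = [0.10, -0.40, 0.80]
v_chef = [-0.70, 0.20, 0.05]

print(cosine_similarity(v_engineer, v_developer))  # close to 1.0
print(cosine_similarity(v_engineer, v_chef))       # much lower
```

In practice, feed the function the `embedding`/`embeddings` values from the API responses above; similar job titles should score noticeably higher than unrelated ones.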
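If an out-of-memory error occurs, the batch-size advice in Troubleshooting can be applied client-side by splitting a large list of texts into smaller `/embed` requests. A minimal sketch; the `chunked` helper and the batch size of 32 are illustrative choices, not part of the API:

```python
def chunked(texts, batch_size=32):
    """Split a list of texts into consecutive batches of at most batch_size items."""
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]


# Illustrative: 5 texts in batches of 2 -> batch sizes 2, 2, 1
batches = list(chunked(["a", "b", "c", "d", "e"], batch_size=2))
print([len(b) for b in batches])  # [2, 2, 1]
```

Each batch can then be posted to `/embed` in turn and the resulting `embeddings` lists concatenated, trading a few extra requests for a smaller peak memory footprint on the server.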