---
title: Embedding Inference API
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
---
# Embedding Inference API

A FastAPI-based inference service for generating embeddings using JobBERT v2/v3, Jina AI, and Voyage AI.

## Features

- **Multiple Models**: JobBERT v2/v3 (job-specific), Jina AI v3 (general-purpose), Voyage AI (state-of-the-art)
- **RESTful API**: Easy-to-use HTTP endpoints
- **Batch Processing**: Process multiple texts in a single request
- **Task-Specific Embeddings**: Support for different embedding tasks (retrieval, classification, etc.)
- **Docker Ready**: Easy deployment to Hugging Face Spaces or any Docker environment
## Supported Models

| Model | Dimension | Max Tokens | Best For |
|-------|-----------|------------|----------|
| JobBERT v2 | 768 | 512 | Job titles and descriptions |
| JobBERT v3 | 768 | 512 | Job titles (improved performance) |
| Jina AI v3 | 1024 | 8,192 | General text, long documents |
| Voyage AI | 1024 | 32,000 | High-quality embeddings (requires API key) |
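If you want to confirm the dimension a given model actually returns, here is a minimal sketch against the `/embed` endpoint documented below (it assumes the API is running locally on the default port, as in Quick Start):

```python
import requests

BASE_URL = "http://localhost:7860"  # assumes a local instance, as in Quick Start below

# Query each locally loaded model once and report the embedding dimension it returns.
for model in ["jobbertv2", "jobbertv3", "jina"]:
    resp = requests.post(f"{BASE_URL}/embed", params={"model": model},
                         json={"input": "Software Engineer"})
    resp.raise_for_status()
    print(model, "->", len(resp.json()["embedding"]), "dimensions")
```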
## Quick Start

### Local Development

1. **Install dependencies:**

```bash
cd embedding
pip install -r requirements.txt
```

2. **Run the API:**

```bash
python api.py
```

3. **Access the API:**
   - API: http://localhost:7860
   - Docs: http://localhost:7860/docs
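As a quick sanity check that the service is up, a minimal request sketch (assumes the default port above):

```python
import requests

# Hit the health endpoint of a locally running instance.
resp = requests.get("http://localhost:7860/health", timeout=10)
resp.raise_for_status()
print(resp.json())
```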
### Docker Deployment

1. **Build the image:**

```bash
docker build -t embedding-api .
```

2. **Run the container:**

```bash
docker run -p 7860:7860 embedding-api
```

3. **With Voyage AI (optional):**

```bash
docker run -p 7860:7860 -e VOYAGE_API_KEY=your_key_here embedding-api
```
## Hugging Face Spaces Deployment

### Option 1: Using Hugging Face CLI

1. **Install Hugging Face CLI:**

```bash
pip install huggingface_hub
huggingface-cli login
```

2. **Create a new Space:**
   - Go to https://huggingface.co/spaces
   - Click "Create new Space"
   - Choose "Docker" as the Space SDK
   - Name your space (e.g., `your-username/embedding-api`)

3. **Clone and push:**

```bash
git clone https://huggingface.co/spaces/your-username/embedding-api
cd embedding-api
# Copy files from embedding folder
cp /path/to/embedding/Dockerfile .
cp /path/to/embedding/api.py .
cp /path/to/embedding/requirements.txt .
cp /path/to/embedding/README.md .
git add .
git commit -m "Initial commit"
git push
```

4. **Configure environment (optional):**
   - Go to your Space settings
   - Add `VOYAGE_API_KEY` secret if using Voyage AI
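Once the Space has finished building, a minimal check against the deployed endpoint (the hostname is a placeholder; substitute your own Space URL):

```python
import requests

# Placeholder hostname: replace with your Space's URL.
SPACE_URL = "https://YOUR-SPACE.hf.space"

resp = requests.get(f"{SPACE_URL}/health", timeout=30)
resp.raise_for_status()
print(resp.json())  # should report which models were loaded
```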
### Option 2: Manual Upload

1. Create a new Docker Space on Hugging Face
2. Upload these files:
   - `Dockerfile`
   - `api.py`
   - `requirements.txt`
   - `README.md`
3. Add environment variables in Settings if needed
## API Usage

### Health Check

```bash
curl http://localhost:7860/health
```

Response:

```json
{
  "status": "healthy",
  "models_loaded": ["jobbertv2", "jobbertv3", "jina"],
  "voyage_available": false,
  "api_key_required": false
}
```
### Generate Embeddings (Elasticsearch Compatible)

The main `/embed` endpoint uses the Elasticsearch inference API format, with model selection via a query parameter.

#### Single Text (JobBERT v3 - default)

Without API key:

```bash
curl -X POST "http://localhost:7860/embed" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Software Engineer"
  }'
```

With API key:

```bash
curl -X POST "http://localhost:7860/embed" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "input": "Software Engineer"
  }'
```

Response:

```json
{
  "embedding": [0.123, -0.456, 0.789, ...]
}
```
#### Single Text with Model Selection

```bash
# JobBERT v2
curl -X POST "http://localhost:7860/embed?model=jobbertv2" \
  -H "Content-Type: application/json" \
  -d '{"input": "Data Scientist"}'

# JobBERT v3 (recommended)
curl -X POST "http://localhost:7860/embed?model=jobbertv3" \
  -H "Content-Type: application/json" \
  -d '{"input": "Product Manager"}'

# Jina AI
curl -X POST "http://localhost:7860/embed?model=jina" \
  -H "Content-Type: application/json" \
  -d '{"input": "Machine Learning Engineer"}'
```
#### Multiple Texts (Batch)

```bash
curl -X POST "http://localhost:7860/embed?model=jobbertv3" \
  -H "Content-Type: application/json" \
  -d '{
    "input": ["Software Engineer", "Data Scientist", "Product Manager"]
  }'
```

Response:

```json
{
  "embeddings": [
    [0.123, -0.456, ...],
    [0.234, -0.567, ...],
    [0.345, -0.678, ...]
  ]
}
```
#### Jina AI with Task Type

```bash
curl -X POST "http://localhost:7860/embed?model=jina&task=retrieval.query" \
  -H "Content-Type: application/json" \
  -d '{"input": "What is machine learning?"}'
```

**Jina AI Tasks (query parameter):**

- `retrieval.query`: For search queries
- `retrieval.passage`: For documents
- `text-matching`: For similarity (default)
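To illustrate how the two retrieval tasks pair up, here is a minimal sketch that embeds a query with `retrieval.query` and a few passages with `retrieval.passage`, then ranks the passages by cosine similarity (the ranking logic is illustrative, not part of the API):

```python
import requests
import numpy as np

BASE_URL = "http://localhost:7860"

def jina_embed(texts, task):
    # `texts` may be a single string or a list; request/response shapes follow this README.
    resp = requests.post(f"{BASE_URL}/embed", params={"model": "jina", "task": task},
                         json={"input": texts})
    resp.raise_for_status()
    body = resp.json()
    return body["embedding"] if isinstance(texts, str) else body["embeddings"]

query = np.array(jina_embed("What is machine learning?", task="retrieval.query"))
passages = ["Machine learning builds models from data.", "Paris is the capital of France."]
vectors = np.array(jina_embed(passages, task="retrieval.passage"))

# Rank passages by cosine similarity to the query.
scores = vectors @ query / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(query))
for passage, score in sorted(zip(passages, scores), key=lambda pair: -pair[1]):
    print(f"{score:.3f}  {passage}")
```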
#### Voyage AI (requires API key)

```bash
curl -X POST "http://localhost:7860/embed?model=voyage&input_type=document" \
  -H "Content-Type: application/json" \
  -d '{"input": "This is a document to embed"}'
```

**Voyage AI Input Types (query parameter):**

- `document`: For documents/passages
- `query`: For search queries
### Batch Endpoint (Original Format)

For compatibility, the original batch endpoint is still available at `/embed/batch`:

```bash
curl -X POST http://localhost:7860/embed/batch \
  -H "Content-Type: application/json" \
  -d '{
    "texts": ["Software Engineer", "Data Scientist"],
    "model": "jobbertv3"
  }'
```

The response includes metadata:

```json
{
  "embeddings": [[0.123, ...], [0.234, ...]],
  "model": "jobbertv3",
  "dimension": 768,
  "num_texts": 2
}
```
### List Available Models

```bash
curl http://localhost:7860/models
```
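The `/models` response schema is not shown above. If you only need to decide which model to request, the `/health` fields documented earlier can drive a simple fallback; the preference order below is just an example:

```python
import requests

BASE_URL = "http://localhost:7860"

# Pick a model based on what the running server reports as available.
health = requests.get(f"{BASE_URL}/health", timeout=10).json()
if health.get("voyage_available"):
    model = "voyage"
elif "jobbertv3" in health.get("models_loaded", []):
    model = "jobbertv3"
else:
    model = health["models_loaded"][0]
print(f"Using model: {model}")
```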
## Python Client Examples

### Elasticsearch-Compatible Format (Recommended)

```python
import requests

BASE_URL = "http://localhost:7860"
API_KEY = "your-api-key-here"  # Optional, only if API key is required

# Headers (include API key if required)
headers = {}
if API_KEY:
    headers["Authorization"] = f"Bearer {API_KEY}"

# Single embedding (JobBERT v3 - default)
response = requests.post(
    f"{BASE_URL}/embed",
    headers=headers,
    json={"input": "Software Engineer"}
)
result = response.json()
embedding = result["embedding"]  # Single vector
print(f"Embedding dimension: {len(embedding)}")

# Single embedding with model selection
response = requests.post(
    f"{BASE_URL}/embed?model=jina",
    headers=headers,
    json={"input": "Data Scientist"}
)
embedding = response.json()["embedding"]

# Batch embeddings
response = requests.post(
    f"{BASE_URL}/embed?model=jobbertv3",
    headers=headers,
    json={"input": ["Software Engineer", "Data Scientist", "Product Manager"]}
)
result = response.json()
embeddings = result["embeddings"]  # List of vectors
print(f"Generated {len(embeddings)} embeddings")

# Jina AI with task
response = requests.post(
    f"{BASE_URL}/embed?model=jina&task=retrieval.query",
    headers=headers,
    json={"input": "What is Python?"}
)

# Voyage AI with input type
response = requests.post(
    f"{BASE_URL}/embed?model=voyage&input_type=document",
    headers=headers,
    json={"input": "Document text here"}
)
```
### Python Client Class with API Key Support

```python
import requests
from typing import List, Union, Optional


class EmbeddingClient:
    def __init__(self, base_url: str, api_key: Optional[str] = None, model: str = "jobbertv3"):
        self.base_url = base_url
        self.api_key = api_key
        self.model = model
        self.headers = {}
        if api_key:
            self.headers["Authorization"] = f"Bearer {api_key}"

    def embed(self, text: Union[str, List[str]]) -> Union[List[float], List[List[float]]]:
        """Get embeddings for single text or batch"""
        response = requests.post(
            f"{self.base_url}/embed?model={self.model}",
            headers=self.headers,
            json={"input": text}
        )
        response.raise_for_status()
        result = response.json()
        if isinstance(text, str):
            return result["embedding"]
        else:
            return result["embeddings"]


# Usage
client = EmbeddingClient(
    base_url="https://YOUR-SPACE.hf.space",
    api_key="your-api-key-here",  # Optional
    model="jobbertv3"
)

# Single embedding
embedding = client.embed("Software Engineer")
print(f"Dimension: {len(embedding)}")

# Batch embeddings
embeddings = client.embed(["Software Engineer", "Data Scientist"])
print(f"Generated {len(embeddings)} embeddings")
```
### Batch Format (Original)

```python
import requests

url = "http://localhost:7860/embed/batch"

response = requests.post(url, json={
    "texts": ["Software Engineer", "Data Scientist"],
    "model": "jobbertv3"
})
result = response.json()
embeddings = result["embeddings"]
print(f"Model: {result['model']}, Dimension: {result['dimension']}")
```
## Environment Variables

- `PORT`: Server port (default: 7860)
- `API_KEY`: Your API key for authentication (optional, but recommended for production)
- `REQUIRE_API_KEY`: Set to `true` to enable API key authentication (default: `false`)
- `VOYAGE_API_KEY`: Voyage AI API key (optional; required only for Voyage embeddings)
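For reference, these flags are conventionally read with `os.environ`; the snippet below is a sketch of that pattern under stated assumptions, not a description of how `api.py` itself is implemented:

```python
import os

# Assumed pattern only: how a FastAPI service commonly reads these variables.
PORT = int(os.environ.get("PORT", "7860"))
API_KEY = os.environ.get("API_KEY")
REQUIRE_API_KEY = os.environ.get("REQUIRE_API_KEY", "false").lower() == "true"
VOYAGE_API_KEY = os.environ.get("VOYAGE_API_KEY")  # only needed for Voyage embeddings
```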
### Setting Up API Key Authentication

#### Local Development

```bash
# Set environment variables
export API_KEY="your-secret-key-here"
export REQUIRE_API_KEY="true"

# Run the API
python api.py
```
#### Hugging Face Spaces

1. Go to your Space settings
2. Click on "Variables and secrets"
3. Add secrets:
   - Name: `API_KEY`, Value: `your-secret-key-here`
   - Name: `REQUIRE_API_KEY`, Value: `true`
4. Restart your Space
#### Docker

```bash
docker run -p 7860:7860 \
  -e API_KEY="your-secret-key-here" \
  -e REQUIRE_API_KEY="true" \
  embedding-api
```
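To confirm that authentication is actually enforced, a quick check that an unauthenticated request is rejected while an authenticated one succeeds (the exact rejection status code is an assumption; it may be 401 or 403 depending on the implementation):

```python
import requests

BASE_URL = "http://localhost:7860"
API_KEY = "your-secret-key-here"

# Without a key: expect a 4xx rejection once REQUIRE_API_KEY is enabled.
anon = requests.post(f"{BASE_URL}/embed", json={"input": "Software Engineer"})
print("no key:", anon.status_code)

# With the key: expect a 200 and an embedding in the response.
auth = requests.post(
    f"{BASE_URL}/embed",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"input": "Software Engineer"},
)
print("with key:", auth.status_code, len(auth.json()["embedding"]), "dimensions")
```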
## Interactive Documentation

Once the API is running, visit:

- **Swagger UI**: http://localhost:7860/docs
- **ReDoc**: http://localhost:7860/redoc
## Notes

- Models are downloaded automatically on first startup (~2-3 GB total)
- Voyage AI requires an API key from https://www.voyageai.com/
- The first request to each model may be slower due to model loading
- Use batch processing for better performance (send multiple texts at once), as sketched below
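For larger workloads, a minimal batching helper that chunks texts before sending them to `/embed` (the chunk size is an arbitrary example; tune it against the memory notes in Troubleshooting below):

```python
import requests
from typing import List

BASE_URL = "http://localhost:7860"

def embed_in_batches(texts: List[str], model: str = "jobbertv3", batch_size: int = 32) -> List[List[float]]:
    """Send texts to /embed in chunks to keep individual requests small."""
    vectors: List[List[float]] = []
    for start in range(0, len(texts), batch_size):
        chunk = texts[start:start + batch_size]
        resp = requests.post(f"{BASE_URL}/embed", params={"model": model}, json={"input": chunk})
        resp.raise_for_status()
        vectors.extend(resp.json()["embeddings"])
    return vectors

# Example: embed 100 job titles in batches of 32.
titles = [f"Engineer Level {i}" for i in range(100)]
print(len(embed_in_batches(titles)))
```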
## Troubleshooting

### Models not loading

- Check available disk space (need ~3 GB)
- Ensure an internet connection is available for model download
- Check logs for specific error messages

### Voyage AI not working

- Verify `VOYAGE_API_KEY` is set correctly
- Check that the API key has sufficient credits
- Ensure the `voyageai` package is installed

### Out of memory

- Reduce batch size (process fewer texts per request)
- Use smaller models (JobBERT v2 instead of Jina)
- Increase container memory limits
## License

This API uses models with different licenses:

- JobBERT v2/v3: Apache 2.0
- Jina AI: Apache 2.0
- Voyage AI: Subject to Voyage AI terms of service