---
title: Embedding Inference API
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
---
# Embedding Inference API
A FastAPI-based inference service for generating embeddings using JobBERT v2/v3, Jina AI, and Voyage AI.
## Features
- Multiple Models: JobBERT v2/v3 (job-specific), Jina AI v3 (general-purpose), Voyage AI (state-of-the-art)
- RESTful API: Easy-to-use HTTP endpoints
- Batch Processing: Process multiple texts in a single request
- Task-Specific Embeddings: Support for different embedding tasks (retrieval, classification, etc.)
- Docker Ready: Easy deployment to Hugging Face Spaces or any Docker environment
## Supported Models
| Model | Dimension | Max Tokens | Best For |
|---|---|---|---|
| JobBERT v2 | 768 | 512 | Job titles and descriptions |
| JobBERT v3 | 768 | 512 | Job titles (improved performance) |
| Jina AI v3 | 1024 | 8,192 | General text, long documents |
| Voyage AI | 1024 | 32,000 | High-quality embeddings (requires API key) |
## Quick Start

### Local Development
Install dependencies:

```bash
cd embedding
pip install -r requirements.txt
```

Run the API:

```bash
python api.py
```

Access the API:
- API: http://localhost:7860
- Docs: http://localhost:7860/docs
### Docker Deployment
Build the image:

```bash
docker build -t embedding-api .
```

Run the container:

```bash
docker run -p 7860:7860 embedding-api
```

With Voyage AI (optional):

```bash
docker run -p 7860:7860 -e VOYAGE_API_KEY=your_key_here embedding-api
```
## Hugging Face Spaces Deployment

### Option 1: Using the Hugging Face CLI
Install the Hugging Face CLI and log in:

```bash
pip install huggingface_hub
huggingface-cli login
```

Create a new Space:
- Go to https://huggingface.co/spaces
- Click "Create new Space"
- Choose "Docker" as the Space SDK
- Name your space (e.g., `your-username/embedding-api`)
Clone and push:

```bash
git clone https://huggingface.co/spaces/your-username/embedding-api
cd embedding-api
# Copy files from embedding folder
cp /path/to/embedding/Dockerfile .
cp /path/to/embedding/api.py .
cp /path/to/embedding/requirements.txt .
cp /path/to/embedding/README.md .
git add .
git commit -m "Initial commit"
git push
```

Configure environment (optional):
- Go to your Space settings
- Add a `VOYAGE_API_KEY` secret if using Voyage AI
### Option 2: Manual Upload
- Create a new Docker Space on Hugging Face
- Upload these files: `Dockerfile`, `api.py`, `requirements.txt`, `README.md`
- Add environment variables in Settings if needed
## API Usage

### Health Check
```bash
curl http://localhost:7860/health
```
Response:
```json
{
  "status": "healthy",
  "models_loaded": ["jobbertv2", "jobbertv3", "jina"],
  "voyage_available": false,
  "api_key_required": false
}
```
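Before routing traffic, a client can confirm that the models it needs are loaded. A minimal sketch of that check (the `is_ready` helper is illustrative, not part of the API):

```python
def is_ready(health_payload: dict, required_models: list) -> bool:
    """True when the service reports healthy and every required model is loaded."""
    return (
        health_payload.get("status") == "healthy"
        and all(m in health_payload.get("models_loaded", []) for m in required_models)
    )

# The health response shown above, parsed into a dict:
payload = {
    "status": "healthy",
    "models_loaded": ["jobbertv2", "jobbertv3", "jina"],
    "voyage_available": False,
    "api_key_required": False,
}
print(is_ready(payload, ["jobbertv3"]))  # True
print(is_ready(payload, ["voyage"]))     # False: voyage is not loaded
```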
### Generate Embeddings (Elasticsearch Compatible)
The main `/embed` endpoint uses the Elasticsearch inference API format, with model selection via a query parameter.
#### Single Text (JobBERT v3, default)
Without API key:
```bash
curl -X POST "http://localhost:7860/embed" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Software Engineer"
  }'
```
With API key:
```bash
curl -X POST "http://localhost:7860/embed" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "input": "Software Engineer"
  }'
```
Response:
```json
{
  "embedding": [0.123, -0.456, 0.789, ...]
}
```
#### Single Text with Model Selection
```bash
# JobBERT v2
curl -X POST "http://localhost:7860/embed?model=jobbertv2" \
  -H "Content-Type: application/json" \
  -d '{"input": "Data Scientist"}'

# JobBERT v3 (recommended)
curl -X POST "http://localhost:7860/embed?model=jobbertv3" \
  -H "Content-Type: application/json" \
  -d '{"input": "Product Manager"}'

# Jina AI
curl -X POST "http://localhost:7860/embed?model=jina" \
  -H "Content-Type: application/json" \
  -d '{"input": "Machine Learning Engineer"}'
```
#### Multiple Texts (Batch)
```bash
curl -X POST "http://localhost:7860/embed?model=jobbertv3" \
  -H "Content-Type: application/json" \
  -d '{
    "input": ["Software Engineer", "Data Scientist", "Product Manager"]
  }'
```
Response:
```json
{
  "embeddings": [
    [0.123, -0.456, ...],
    [0.234, -0.567, ...],
    [0.345, -0.678, ...]
  ]
}
```
#### Jina AI with Task Type
```bash
curl -X POST "http://localhost:7860/embed?model=jina&task=retrieval.query" \
  -H "Content-Type: application/json" \
  -d '{"input": "What is machine learning?"}'
```
Jina AI Tasks (query parameter):
- `retrieval.query`: for search queries
- `retrieval.passage`: for documents
- `text-matching`: for similarity (default)
#### Voyage AI (requires API key)
```bash
curl -X POST "http://localhost:7860/embed?model=voyage&input_type=document" \
  -H "Content-Type: application/json" \
  -d '{"input": "This is a document to embed"}'
```
Voyage AI Input Types (query parameter):
- `document`: for documents/passages
- `query`: for search queries
### Batch Endpoint (Original Format)

For compatibility, the original batch endpoint is still available at `/embed/batch`:
```bash
curl -X POST http://localhost:7860/embed/batch \
  -H "Content-Type: application/json" \
  -d '{
    "texts": ["Software Engineer", "Data Scientist"],
    "model": "jobbertv3"
  }'
```
Response includes metadata:
```json
{
  "embeddings": [[0.123, ...], [0.234, ...]],
  "model": "jobbertv3",
  "dimension": 768,
  "num_texts": 2
}
```
### List Available Models

```bash
curl http://localhost:7860/models
```
## Python Client Examples

### Elasticsearch-Compatible Format (Recommended)
```python
import requests

BASE_URL = "http://localhost:7860"
API_KEY = "your-api-key-here"  # Optional, only if API key is required

# Headers (include API key if required)
headers = {}
if API_KEY:
    headers["Authorization"] = f"Bearer {API_KEY}"

# Single embedding (JobBERT v3 - default)
response = requests.post(
    f"{BASE_URL}/embed",
    headers=headers,
    json={"input": "Software Engineer"}
)
result = response.json()
embedding = result["embedding"]  # Single vector
print(f"Embedding dimension: {len(embedding)}")

# Single embedding with model selection
response = requests.post(
    f"{BASE_URL}/embed?model=jina",
    headers=headers,
    json={"input": "Data Scientist"}
)
embedding = response.json()["embedding"]

# Batch embeddings
response = requests.post(
    f"{BASE_URL}/embed?model=jobbertv3",
    headers=headers,
    json={"input": ["Software Engineer", "Data Scientist", "Product Manager"]}
)
result = response.json()
embeddings = result["embeddings"]  # List of vectors
print(f"Generated {len(embeddings)} embeddings")

# Jina AI with task
response = requests.post(
    f"{BASE_URL}/embed?model=jina&task=retrieval.query",
    headers=headers,
    json={"input": "What is Python?"}
)

# Voyage AI with input type
response = requests.post(
    f"{BASE_URL}/embed?model=voyage&input_type=document",
    headers=headers,
    json={"input": "Document text here"}
)
```
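The snippets above assume every request succeeds. A hedged sketch of handling the two most likely failures, assuming the service returns 401 when API key auth is enabled and FastAPI's default 422 on validation errors (the `interpret_embed_response` helper is illustrative, not part of the API):

```python
def interpret_embed_response(status_code: int, body):
    """Map common /embed failure status codes to exceptions; return the body on success."""
    if status_code == 401:
        # Assumed auth-failure code when REQUIRE_API_KEY is enabled
        raise PermissionError("API key required or invalid; set the Authorization header")
    if status_code == 422:
        # FastAPI's default status for request validation errors
        raise ValueError(f"malformed request body: {body}")
    if status_code >= 400:
        raise RuntimeError(f"unexpected status {status_code}")
    return body

# On success, the parsed JSON passes through unchanged:
print(interpret_embed_response(200, {"embedding": [0.1, 0.2]}))
```

In practice you would call this as `interpret_embed_response(response.status_code, response.json())` after each `requests.post`.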
### Python Client Class with API Key Support
```python
import requests
from typing import List, Union, Optional


class EmbeddingClient:
    def __init__(self, base_url: str, api_key: Optional[str] = None, model: str = "jobbertv3"):
        self.base_url = base_url
        self.api_key = api_key
        self.model = model
        self.headers = {}
        if api_key:
            self.headers["Authorization"] = f"Bearer {api_key}"

    def embed(self, text: Union[str, List[str]]) -> Union[List[float], List[List[float]]]:
        """Get embeddings for a single text or a batch."""
        response = requests.post(
            f"{self.base_url}/embed?model={self.model}",
            headers=self.headers,
            json={"input": text}
        )
        response.raise_for_status()
        result = response.json()
        if isinstance(text, str):
            return result["embedding"]
        return result["embeddings"]


# Usage
client = EmbeddingClient(
    base_url="https://YOUR-SPACE.hf.space",
    api_key="your-api-key-here",  # Optional
    model="jobbertv3"
)

# Single embedding
embedding = client.embed("Software Engineer")
print(f"Dimension: {len(embedding)}")

# Batch embeddings
embeddings = client.embed(["Software Engineer", "Data Scientist"])
print(f"Generated {len(embeddings)} embeddings")
```
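Embeddings returned by any of these models can be compared with cosine similarity. A minimal, dependency-free sketch (the `cosine_similarity` helper is illustrative, not provided by the API):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# e.g. compare two job titles embedded with the client above:
# sim = cosine_similarity(client.embed("Software Engineer"), client.embed("Backend Developer"))
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0 (identical direction)
```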
### Batch Format (Original)
```python
import requests

url = "http://localhost:7860/embed/batch"
response = requests.post(url, json={
    "texts": ["Software Engineer", "Data Scientist"],
    "model": "jobbertv3"
})
result = response.json()
embeddings = result["embeddings"]
print(f"Model: {result['model']}, Dimension: {result['dimension']}")
```
## Environment Variables

- `PORT`: Server port (default: `7860`)
- `API_KEY`: Your API key for authentication (optional, but recommended for production)
- `REQUIRE_API_KEY`: Set to `true` to enable API key authentication (default: `false`)
- `VOYAGE_API_KEY`: Voyage AI API key (required only for Voyage embeddings)
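Server-side, these variables presumably combine into a simple authorization check. A sketch of that logic (the `request_is_authorized` function and its exact behavior are assumptions about `api.py`, not guaranteed):

```python
import os

def request_is_authorized(auth_header):
    """Allow the request unless REQUIRE_API_KEY=true and the bearer token doesn't match API_KEY."""
    if os.environ.get("REQUIRE_API_KEY", "false").lower() != "true":
        return True  # auth disabled: everything passes
    expected = os.environ.get("API_KEY", "")
    return expected != "" and auth_header == f"Bearer {expected}"

# Example: with auth enabled, only the matching bearer token is accepted
os.environ["REQUIRE_API_KEY"] = "true"
os.environ["API_KEY"] = "secret"
print(request_is_authorized("Bearer secret"))  # True
print(request_is_authorized("Bearer wrong"))   # False
```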
## Setting Up API Key Authentication

### Local Development
```bash
# Set environment variables
export API_KEY="your-secret-key-here"
export REQUIRE_API_KEY="true"

# Run the API
python api.py
```
### Hugging Face Spaces
- Go to your Space settings
- Click on "Variables and secrets"
- Add secrets:
  - Name: `API_KEY`, Value: `your-secret-key-here`
  - Name: `REQUIRE_API_KEY`, Value: `true`
- Restart your Space
### Docker
```bash
docker run -p 7860:7860 \
  -e API_KEY="your-secret-key-here" \
  -e REQUIRE_API_KEY="true" \
  embedding-api
```
## Interactive Documentation
Once the API is running, visit:
- Swagger UI: http://localhost:7860/docs
- ReDoc: http://localhost:7860/redoc
## Notes
- Models are downloaded automatically on first startup (~2-3GB total)
- Voyage AI requires an API key from https://www.voyageai.com/
- First request to each model may be slower due to model loading
- Use batch processing for better performance (send multiple texts at once)
## Troubleshooting

### Models not loading
- Check available disk space (need ~3GB)
- Ensure internet connection for model download
- Check logs for specific error messages
### Voyage AI not working
- Verify `VOYAGE_API_KEY` is set correctly
- Check that the API key has sufficient credits
- Ensure the `voyageai` package is installed
### Out of memory
- Reduce batch size (process fewer texts per request)
- Use smaller models (JobBERT v2 instead of Jina)
- Increase container memory limits
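Splitting a large batch into smaller requests can be done client-side; a small illustrative helper (the `chunked` function and batch size are not part of the API):

```python
def chunked(texts, size):
    """Yield successive sublists of at most `size` texts."""
    for i in range(0, len(texts), size):
        yield texts[i:i + size]

# Post each sublist to /embed separately and concatenate the results
titles = [f"Job title {i}" for i in range(10)]
batches = list(chunked(titles, 4))
print([len(b) for b in batches])  # [4, 4, 2]
```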
## License
This API uses models with different licenses:
- JobBERT v2/v3: Apache 2.0
- Jina AI: Apache 2.0
- Voyage AI: Subject to Voyage AI terms of service