---
title: Embedding Inference API
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
---
# Embedding Inference API
A FastAPI-based inference service for generating embeddings using JobBERT v2/v3, Jina AI, and Voyage AI.
## Features
- **Multiple Models**: JobBERT v2/v3 (job-specific), Jina AI v3 (general-purpose), Voyage AI (state-of-the-art)
- **RESTful API**: Easy-to-use HTTP endpoints
- **Batch Processing**: Process multiple texts in a single request
- **Task-Specific Embeddings**: Support for different embedding tasks (retrieval, classification, etc.)
- **Docker Ready**: Easy deployment to Hugging Face Spaces or any Docker environment
## Supported Models
| Model | Dimension | Max Tokens | Best For |
|-------|-----------|------------|----------|
| JobBERT v2 | 768 | 512 | Job titles and descriptions |
| JobBERT v3 | 768 | 512 | Job titles (improved performance) |
| Jina AI v3 | 1024 | 8,192 | General text, long documents |
| Voyage AI | 1024 | 32,000 | High-quality embeddings (requires API key) |
## Quick Start
### Local Development
1. **Install dependencies:**
```bash
cd embedding
pip install -r requirements.txt
```
2. **Run the API:**
```bash
python api.py
```
3. **Access the API:**
   - API: http://localhost:7860
   - Docs: http://localhost:7860/docs
### Docker Deployment
1. **Build the image:**
```bash
docker build -t embedding-api .
```
2. **Run the container:**
```bash
docker run -p 7860:7860 embedding-api
```
3. **With Voyage AI (optional):**
```bash
docker run -p 7860:7860 -e VOYAGE_API_KEY=your_key_here embedding-api
```
## Hugging Face Spaces Deployment
### Option 1: Using Hugging Face CLI
1. **Install Hugging Face CLI:**
```bash
pip install huggingface_hub
huggingface-cli login
```
2. **Create a new Space:**
   - Go to https://huggingface.co/spaces
   - Click "Create new Space"
   - Choose "Docker" as the Space SDK
   - Name your space (e.g., `your-username/embedding-api`)
3. **Clone and push:**
```bash
git clone https://huggingface.co/spaces/your-username/embedding-api
cd embedding-api
# Copy files from embedding folder
cp /path/to/embedding/Dockerfile .
cp /path/to/embedding/api.py .
cp /path/to/embedding/requirements.txt .
cp /path/to/embedding/README.md .
git add .
git commit -m "Initial commit"
git push
```
4. **Configure environment (optional):**
   - Go to your Space settings
   - Add a `VOYAGE_API_KEY` secret if using Voyage AI
### Option 2: Manual Upload
1. Create a new Docker Space on Hugging Face
2. Upload these files:
   - `Dockerfile`
   - `api.py`
   - `requirements.txt`
   - `README.md`
3. Add environment variables in Settings if needed
## API Usage
### Health Check
```bash
curl http://localhost:7860/health
```
Response:
```json
{
  "status": "healthy",
  "models_loaded": ["jobbertv2", "jobbertv3", "jina"],
  "voyage_available": false,
  "api_key_required": false
}
```
### Generate Embeddings (Elasticsearch Compatible)
The main `/embed` endpoint uses the Elasticsearch inference API format, with model selection via a query parameter.
#### Single Text (JobBERT v3 - default)
Without API key:
```bash
curl -X POST "http://localhost:7860/embed" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Software Engineer"
  }'
```
With API key:
```bash
curl -X POST "http://localhost:7860/embed" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "input": "Software Engineer"
  }'
```
Response:
```json
{
  "embedding": [0.123, -0.456, 0.789, ...]
}
```
#### Single Text with Model Selection
```bash
# JobBERT v2
curl -X POST "http://localhost:7860/embed?model=jobbertv2" \
  -H "Content-Type: application/json" \
  -d '{"input": "Data Scientist"}'

# JobBERT v3 (recommended)
curl -X POST "http://localhost:7860/embed?model=jobbertv3" \
  -H "Content-Type: application/json" \
  -d '{"input": "Product Manager"}'

# Jina AI
curl -X POST "http://localhost:7860/embed?model=jina" \
  -H "Content-Type: application/json" \
  -d '{"input": "Machine Learning Engineer"}'
```
#### Multiple Texts (Batch)
```bash
curl -X POST "http://localhost:7860/embed?model=jobbertv3" \
  -H "Content-Type: application/json" \
  -d '{
    "input": ["Software Engineer", "Data Scientist", "Product Manager"]
  }'
```
Response:
```json
{
  "embeddings": [
    [0.123, -0.456, ...],
    [0.234, -0.567, ...],
    [0.345, -0.678, ...]
  ]
}
```
#### Jina AI with Task Type
```bash
curl -X POST "http://localhost:7860/embed?model=jina&task=retrieval.query" \
  -H "Content-Type: application/json" \
  -d '{"input": "What is machine learning?"}'
```
**Jina AI Tasks (query parameter):**
- `retrieval.query`: For search queries
- `retrieval.passage`: For documents
- `text-matching`: For similarity (default)
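Embeddings produced with these task types are typically compared with cosine similarity: embed queries with `retrieval.query`, documents with `retrieval.passage`, then rank documents by their score against the query vector. A minimal, dependency-free helper (illustrative only, not part of the API):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# With real vectors from /embed, higher scores mean closer matches.
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # → 0.0 (orthogonal vectors)
```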
#### Voyage AI (requires API key)
```bash
curl -X POST "http://localhost:7860/embed?model=voyage&input_type=document" \
  -H "Content-Type: application/json" \
  -d '{"input": "This is a document to embed"}'
```
**Voyage AI Input Types (query parameter):**
- `document`: For documents/passages
- `query`: For search queries
### Batch Endpoint (Original Format)
For backward compatibility, the original batch endpoint remains available at `/embed/batch`:
```bash
curl -X POST http://localhost:7860/embed/batch \
  -H "Content-Type: application/json" \
  -d '{
    "texts": ["Software Engineer", "Data Scientist"],
    "model": "jobbertv3"
  }'
```
Response includes metadata:
```json
{
  "embeddings": [[0.123, ...], [0.234, ...]],
  "model": "jobbertv3",
  "dimension": 768,
  "num_texts": 2
}
```
### List Available Models
```bash
curl http://localhost:7860/models
```
## Python Client Examples
### Elasticsearch-Compatible Format (Recommended)
```python
import requests

BASE_URL = "http://localhost:7860"
API_KEY = None  # Set to your key if the server requires authentication

# Headers (include API key if required)
headers = {}
if API_KEY:
    headers["Authorization"] = f"Bearer {API_KEY}"

# Single embedding (JobBERT v3 - default)
response = requests.post(
    f"{BASE_URL}/embed",
    headers=headers,
    json={"input": "Software Engineer"}
)
result = response.json()
embedding = result["embedding"]  # Single vector
print(f"Embedding dimension: {len(embedding)}")

# Single embedding with model selection
response = requests.post(
    f"{BASE_URL}/embed?model=jina",
    headers=headers,
    json={"input": "Data Scientist"}
)
embedding = response.json()["embedding"]

# Batch embeddings
response = requests.post(
    f"{BASE_URL}/embed?model=jobbertv3",
    headers=headers,
    json={"input": ["Software Engineer", "Data Scientist", "Product Manager"]}
)
result = response.json()
embeddings = result["embeddings"]  # List of vectors
print(f"Generated {len(embeddings)} embeddings")

# Jina AI with task
response = requests.post(
    f"{BASE_URL}/embed?model=jina&task=retrieval.query",
    headers=headers,
    json={"input": "What is Python?"}
)

# Voyage AI with input type
response = requests.post(
    f"{BASE_URL}/embed?model=voyage&input_type=document",
    headers=headers,
    json={"input": "Document text here"}
)
```
### Python Client Class with API Key Support
```python
import requests
from typing import List, Union, Optional


class EmbeddingClient:
    def __init__(self, base_url: str, api_key: Optional[str] = None, model: str = "jobbertv3"):
        self.base_url = base_url
        self.api_key = api_key
        self.model = model
        self.headers = {}
        if api_key:
            self.headers["Authorization"] = f"Bearer {api_key}"

    def embed(self, text: Union[str, List[str]]) -> Union[List[float], List[List[float]]]:
        """Get embeddings for a single text or a batch."""
        response = requests.post(
            f"{self.base_url}/embed?model={self.model}",
            headers=self.headers,
            json={"input": text}
        )
        response.raise_for_status()
        result = response.json()
        if isinstance(text, str):
            return result["embedding"]
        return result["embeddings"]


# Usage
client = EmbeddingClient(
    base_url="https://YOUR-SPACE.hf.space",
    api_key="your-api-key-here",  # Optional
    model="jobbertv3"
)

# Single embedding
embedding = client.embed("Software Engineer")
print(f"Dimension: {len(embedding)}")

# Batch embeddings
embeddings = client.embed(["Software Engineer", "Data Scientist"])
print(f"Generated {len(embeddings)} embeddings")
```
### Batch Format (Original)
```python
import requests

url = "http://localhost:7860/embed/batch"
response = requests.post(url, json={
    "texts": ["Software Engineer", "Data Scientist"],
    "model": "jobbertv3"
})
result = response.json()
embeddings = result["embeddings"]
print(f"Model: {result['model']}, Dimension: {result['dimension']}")
```
## Environment Variables
- `PORT`: Server port (default: 7860)
- `API_KEY`: Your API key for authentication (optional, but recommended for production)
- `REQUIRE_API_KEY`: Set to `true` to enable API key authentication (default: `false`)
- `VOYAGE_API_KEY`: Voyage AI API key (optional, required for Voyage embeddings)
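These variables are presumably read at startup inside `api.py` with standard `os.environ` lookups, roughly like this sketch (the actual code may differ):

```python
import os

# Sketch of typical startup configuration; names match the documented variables,
# but the real parsing logic in api.py may differ.
PORT = int(os.environ.get("PORT", "7860"))
API_KEY = os.environ.get("API_KEY")  # None if not set
REQUIRE_API_KEY = os.environ.get("REQUIRE_API_KEY", "false").lower() == "true"
VOYAGE_API_KEY = os.environ.get("VOYAGE_API_KEY")  # enables the voyage model when set
```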
### Setting Up API Key Authentication
#### Local Development
```bash
# Set environment variables
export API_KEY="your-secret-key-here"
export REQUIRE_API_KEY="true"
# Run the API
python api.py
```
#### Hugging Face Spaces
1. Go to your Space settings
2. Click on "Variables and secrets"
3. Add secrets:
   - Name: `API_KEY`, Value: `your-secret-key-here`
   - Name: `REQUIRE_API_KEY`, Value: `true`
4. Restart your Space
#### Docker
```bash
docker run -p 7860:7860 \
  -e API_KEY="your-secret-key-here" \
  -e REQUIRE_API_KEY="true" \
  embedding-api
```
## Interactive Documentation
Once the API is running, visit:
- **Swagger UI**: http://localhost:7860/docs
- **ReDoc**: http://localhost:7860/redoc
## Notes
- Models are downloaded automatically on first startup (~2-3GB total)
- Voyage AI requires an API key from https://www.voyageai.com/
- First request to each model may be slower due to model loading
- Use batch processing for better performance (send multiple texts at once)
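The batching tip above can be sketched as a small helper that splits a large list of texts into chunks before posting each chunk to `/embed` in one request (the `batch_size` of 32 is an illustrative choice, not an API limit):

```python
from typing import Iterator, List

def chunked(texts: List[str], batch_size: int = 32) -> Iterator[List[str]]:
    """Yield successive batches from a list of texts."""
    for i in range(0, len(texts), batch_size):
        yield texts[i:i + batch_size]

# Each chunk can then be sent as one request, e.g.:
#   requests.post(f"{BASE_URL}/embed?model=jobbertv3", json={"input": chunk})
titles = [f"Job title {i}" for i in range(70)]
batches = list(chunked(titles, 32))
print([len(b) for b in batches])  # → [32, 32, 6]
```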
## Troubleshooting
### Models not loading
- Check available disk space (need ~3GB)
- Ensure internet connection for model download
- Check logs for specific error messages
### Voyage AI not working
- Verify `VOYAGE_API_KEY` is set correctly
- Check API key has sufficient credits
- Ensure `voyageai` package is installed
### Out of memory
- Reduce batch size (process fewer texts per request)
- Use smaller models (JobBERT v2 instead of Jina)
- Increase container memory limits
## License
This API uses models with different licenses:
- JobBERT v2/v3: Apache 2.0
- Jina AI: Apache 2.0
- Voyage AI: Subject to Voyage AI terms of service