---
title: Embedding Inference API
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
---
# Embedding Inference API
A FastAPI-based inference service for generating embeddings using JobBERT v2/v3, Jina AI, and Voyage AI.
## Features
- **Multiple Models**: JobBERT v2/v3 (job-specific), Jina AI v3 (general-purpose), Voyage AI (state-of-the-art)
- **RESTful API**: Easy-to-use HTTP endpoints
- **Batch Processing**: Process multiple texts in a single request
- **Task-Specific Embeddings**: Support for different embedding tasks (retrieval, classification, etc.)
- **Docker Ready**: Easy deployment to Hugging Face Spaces or any Docker environment
## Supported Models
| Model | Dimension | Max Tokens | Best For |
|-------|-----------|------------|----------|
| JobBERT v2 | 768 | 512 | Job titles and descriptions |
| JobBERT v3 | 768 | 512 | Job titles (improved performance) |
| Jina AI v3 | 1024 | 8,192 | General text, long documents |
| Voyage AI | 1024 | 32,000 | High-quality embeddings (requires API key) |
## Quick Start
### Local Development
1. **Install dependencies:**
```bash
cd embedding
pip install -r requirements.txt
```
2. **Run the API:**
```bash
python api.py
```
3. **Access the API:**
   - API: http://localhost:7860
   - Docs: http://localhost:7860/docs
### Docker Deployment
1. **Build the image:**
```bash
docker build -t embedding-api .
```
2. **Run the container:**
```bash
docker run -p 7860:7860 embedding-api
```
3. **With Voyage AI (optional):**
```bash
docker run -p 7860:7860 -e VOYAGE_API_KEY=your_key_here embedding-api
```
## Hugging Face Spaces Deployment
### Option 1: Using Hugging Face CLI
1. **Install Hugging Face CLI:**
```bash
pip install huggingface_hub
huggingface-cli login
```
2. **Create a new Space:**
   - Go to https://huggingface.co/spaces
   - Click "Create new Space"
   - Choose "Docker" as the Space SDK
   - Name your space (e.g., `your-username/embedding-api`)
3. **Clone and push:**
```bash
git clone https://huggingface.co/spaces/your-username/embedding-api
cd embedding-api
# Copy files from embedding folder
cp /path/to/embedding/Dockerfile .
cp /path/to/embedding/api.py .
cp /path/to/embedding/requirements.txt .
cp /path/to/embedding/README.md .
git add .
git commit -m "Initial commit"
git push
```
4. **Configure environment (optional):**
   - Go to your Space settings
   - Add a `VOYAGE_API_KEY` secret if using Voyage AI
### Option 2: Manual Upload
1. Create a new Docker Space on Hugging Face
2. Upload these files:
   - `Dockerfile`
   - `api.py`
   - `requirements.txt`
   - `README.md`
3. Add environment variables in Settings if needed
## API Usage
### Health Check
```bash
curl http://localhost:7860/health
```
Response:
```json
{
  "status": "healthy",
  "models_loaded": ["jobbertv2", "jobbertv3", "jina"],
  "voyage_available": false,
  "api_key_required": false
}
```
### Generate Embeddings (Elasticsearch Compatible)
The main `/embed` endpoint uses the Elasticsearch inference API format, with model selection via a query parameter.
#### Single Text (JobBERT v3 - default)
Without API key:
```bash
curl -X POST "http://localhost:7860/embed" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Software Engineer"
  }'
```
With API key:
```bash
curl -X POST "http://localhost:7860/embed" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "input": "Software Engineer"
  }'
```
Response:
```json
{
  "embedding": [0.123, -0.456, 0.789, ...]
}
```
#### Single Text with Model Selection
```bash
# JobBERT v2
curl -X POST "http://localhost:7860/embed?model=jobbertv2" \
  -H "Content-Type: application/json" \
  -d '{"input": "Data Scientist"}'

# JobBERT v3 (recommended)
curl -X POST "http://localhost:7860/embed?model=jobbertv3" \
  -H "Content-Type: application/json" \
  -d '{"input": "Product Manager"}'

# Jina AI
curl -X POST "http://localhost:7860/embed?model=jina" \
  -H "Content-Type: application/json" \
  -d '{"input": "Machine Learning Engineer"}'
```
#### Multiple Texts (Batch)
```bash
curl -X POST "http://localhost:7860/embed?model=jobbertv3" \
  -H "Content-Type: application/json" \
  -d '{
    "input": ["Software Engineer", "Data Scientist", "Product Manager"]
  }'
```
Response:
```json
{
  "embeddings": [
    [0.123, -0.456, ...],
    [0.234, -0.567, ...],
    [0.345, -0.678, ...]
  ]
}
```
#### Jina AI with Task Type
```bash
curl -X POST "http://localhost:7860/embed?model=jina&task=retrieval.query" \
  -H "Content-Type: application/json" \
  -d '{"input": "What is machine learning?"}'
```
**Jina AI Tasks (query parameter):**
- `retrieval.query`: For search queries
- `retrieval.passage`: For documents
- `text-matching`: For similarity (default)
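Embeddings produced with these task types are typically compared with cosine similarity: embed queries with `retrieval.query`, documents with `retrieval.passage`, then rank documents by their score against the query vector. A minimal, dependency-free helper (illustrative only, not part of the API):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# With real vectors from /embed, higher scores mean closer matches.
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # → 0.0 (orthogonal vectors)
```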
#### Voyage AI (requires API key)
```bash
curl -X POST "http://localhost:7860/embed?model=voyage&input_type=document" \
  -H "Content-Type: application/json" \
  -d '{"input": "This is a document to embed"}'
```
**Voyage AI Input Types (query parameter):**
- `document`: For documents/passages
- `query`: For search queries
### Batch Endpoint (Original Format)
For backward compatibility, the original batch endpoint remains available at `/embed/batch`:
```bash
curl -X POST http://localhost:7860/embed/batch \
  -H "Content-Type: application/json" \
  -d '{
    "texts": ["Software Engineer", "Data Scientist"],
    "model": "jobbertv3"
  }'
```
Response includes metadata:
```json
{
  "embeddings": [[0.123, ...], [0.234, ...]],
  "model": "jobbertv3",
  "dimension": 768,
  "num_texts": 2
}
```
### List Available Models
```bash
curl http://localhost:7860/models
```
## Python Client Examples
### Elasticsearch-Compatible Format (Recommended)
```python
import requests

BASE_URL = "http://localhost:7860"
API_KEY = None  # Set to your key if the server requires authentication

# Headers (include API key if required)
headers = {}
if API_KEY:
    headers["Authorization"] = f"Bearer {API_KEY}"

# Single embedding (JobBERT v3 - default)
response = requests.post(
    f"{BASE_URL}/embed",
    headers=headers,
    json={"input": "Software Engineer"}
)
result = response.json()
embedding = result["embedding"]  # Single vector
print(f"Embedding dimension: {len(embedding)}")

# Single embedding with model selection
response = requests.post(
    f"{BASE_URL}/embed?model=jina",
    headers=headers,
    json={"input": "Data Scientist"}
)
embedding = response.json()["embedding"]

# Batch embeddings
response = requests.post(
    f"{BASE_URL}/embed?model=jobbertv3",
    headers=headers,
    json={"input": ["Software Engineer", "Data Scientist", "Product Manager"]}
)
result = response.json()
embeddings = result["embeddings"]  # List of vectors
print(f"Generated {len(embeddings)} embeddings")

# Jina AI with task
response = requests.post(
    f"{BASE_URL}/embed?model=jina&task=retrieval.query",
    headers=headers,
    json={"input": "What is Python?"}
)

# Voyage AI with input type
response = requests.post(
    f"{BASE_URL}/embed?model=voyage&input_type=document",
    headers=headers,
    json={"input": "Document text here"}
)
```
### Python Client Class with API Key Support
```python
import requests
from typing import List, Union, Optional


class EmbeddingClient:
    def __init__(self, base_url: str, api_key: Optional[str] = None, model: str = "jobbertv3"):
        self.base_url = base_url
        self.api_key = api_key
        self.model = model
        self.headers = {}
        if api_key:
            self.headers["Authorization"] = f"Bearer {api_key}"

    def embed(self, text: Union[str, List[str]]) -> Union[List[float], List[List[float]]]:
        """Get embeddings for a single text or a batch."""
        response = requests.post(
            f"{self.base_url}/embed?model={self.model}",
            headers=self.headers,
            json={"input": text}
        )
        response.raise_for_status()
        result = response.json()
        if isinstance(text, str):
            return result["embedding"]
        return result["embeddings"]


# Usage
client = EmbeddingClient(
    base_url="https://YOUR-SPACE.hf.space",
    api_key="your-api-key-here",  # Optional
    model="jobbertv3"
)

# Single embedding
embedding = client.embed("Software Engineer")
print(f"Dimension: {len(embedding)}")

# Batch embeddings
embeddings = client.embed(["Software Engineer", "Data Scientist"])
print(f"Generated {len(embeddings)} embeddings")
```
### Batch Format (Original)
```python
import requests

url = "http://localhost:7860/embed/batch"
response = requests.post(url, json={
    "texts": ["Software Engineer", "Data Scientist"],
    "model": "jobbertv3"
})
result = response.json()
embeddings = result["embeddings"]
print(f"Model: {result['model']}, Dimension: {result['dimension']}")
```
## Environment Variables
- `PORT`: Server port (default: 7860)
- `API_KEY`: Your API key for authentication (optional, but recommended for production)
- `REQUIRE_API_KEY`: Set to `true` to enable API key authentication (default: `false`)
- `VOYAGE_API_KEY`: Voyage AI API key (optional, required for Voyage embeddings)
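These variables are presumably read at startup inside `api.py` with standard `os.environ` lookups, roughly like this sketch (the actual code may differ):

```python
import os

# Sketch of typical startup configuration; names match the documented variables,
# but the real parsing logic in api.py may differ.
PORT = int(os.environ.get("PORT", "7860"))
API_KEY = os.environ.get("API_KEY")  # None if not set
REQUIRE_API_KEY = os.environ.get("REQUIRE_API_KEY", "false").lower() == "true"
VOYAGE_API_KEY = os.environ.get("VOYAGE_API_KEY")  # enables the voyage model when set
```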
### Setting Up API Key Authentication
#### Local Development
```bash
# Set environment variables
export API_KEY="your-secret-key-here"
export REQUIRE_API_KEY="true"
# Run the API
python api.py
```
#### Hugging Face Spaces
1. Go to your Space settings
2. Click on "Variables and secrets"
3. Add secrets:
   - Name: `API_KEY`, Value: `your-secret-key-here`
   - Name: `REQUIRE_API_KEY`, Value: `true`
4. Restart your Space
#### Docker
```bash
docker run -p 7860:7860 \
  -e API_KEY="your-secret-key-here" \
  -e REQUIRE_API_KEY="true" \
  embedding-api
```
## Interactive Documentation
Once the API is running, visit:
- **Swagger UI**: http://localhost:7860/docs
- **ReDoc**: http://localhost:7860/redoc
## Notes
- Models are downloaded automatically on first startup (~2-3GB total)
- Voyage AI requires an API key from https://www.voyageai.com/
- First request to each model may be slower due to model loading
- Use batch processing for better performance (send multiple texts at once)
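The batching tip above can be sketched as a small helper that splits a large list of texts into chunks before posting each chunk to `/embed` in one request (the `batch_size` of 32 is an illustrative choice, not an API limit):

```python
from typing import Iterator, List

def chunked(texts: List[str], batch_size: int = 32) -> Iterator[List[str]]:
    """Yield successive batches from a list of texts."""
    for i in range(0, len(texts), batch_size):
        yield texts[i:i + batch_size]

# Each chunk can then be sent as one request, e.g.:
#   requests.post(f"{BASE_URL}/embed?model=jobbertv3", json={"input": chunk})
titles = [f"Job title {i}" for i in range(70)]
batches = list(chunked(titles, 32))
print([len(b) for b in batches])  # → [32, 32, 6]
```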
## Troubleshooting
### Models not loading
- Check available disk space (need ~3GB)
- Ensure internet connection for model download
- Check logs for specific error messages
### Voyage AI not working
- Verify `VOYAGE_API_KEY` is set correctly
- Check API key has sufficient credits
- Ensure `voyageai` package is installed
### Out of memory
- Reduce batch size (process fewer texts per request)
- Use smaller models (JobBERT v2 instead of Jina)
- Increase container memory limits
## License
This API uses models with different licenses:
- JobBERT v2/v3: Apache 2.0
- Jina AI: Apache 2.0
- Voyage AI: Subject to Voyage AI terms of service