jina-embeddings-v5-text-small-text-matching: Text-Matching-Targeted Embedding Distillation
Blog | Elastic Inference Service | ArXiv
Model Overview
jina-embeddings-v5-text-small-text-matching is a text embedding model specialized for the text-matching (semantic similarity) task. It is part of the jina-embeddings-v5-text model family, which also includes jina-embeddings-v5-text-nano, a smaller model for more resource-constrained use cases.
Trained using a novel approach that combines distillation with task-specific contrastive losses, jina-embeddings-v5-text-small-text-matching outperforms existing state-of-the-art models of similar size across diverse embedding benchmarks.
| Feature | Value |
|---|---|
| Parameters | 677M |
| Supported Tasks | text-matching |
| Max Sequence Length | 32768 |
| Embedding Dimension | 1024 |
| Matryoshka Dimensions | 32, 64, 128, 256, 512, 768, 1024 |
| Pooling Strategy | Last-token pooling |
| Base Model | jinaai/jina-embeddings-v5-text-small |
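The Matryoshka dimensions listed above mean the 1024-dimensional output can be truncated to a shorter prefix and re-normalized with only a small quality loss. The snippet below is a minimal illustration of that idea using NumPy; the placeholder vector is hypothetical, and the encode() examples further down achieve the same result via truncate_dim.

import numpy as np

# Hypothetical stand-in for one 1024-dimensional model output
full_embedding = np.random.randn(1024).astype(np.float32)

# Keep only the first 256 dimensions (any listed Matryoshka size works: 32 ... 1024)
truncated = full_embedding[:256]

# Re-normalize so cosine similarities stay on the same scale
truncated = truncated / np.linalg.norm(truncated)
print(truncated.shape)  # (256,)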
Training and Evaluation
For training details and evaluation results, see our technical report.
Usage
Requirements
The following Python packages are required (see the example install command after the lists below):
- transformers>=5.1.0
- torch>=2.8.0
- peft>=0.15.2
- vllm>=0.15.1
Optional / Recommended
- flash-attention: Installing flash-attention is recommended for improved inference speed and efficiency, but not mandatory.
- sentence-transformers: If you want to use the model via the sentence-transformers interface, install this package as well.
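For reference, a typical installation might look like the following; the version pins are taken from the lists above, and the optional packages can be skipped if you do not need them:

pip install "transformers>=5.1.0" "torch>=2.8.0" "peft>=0.15.2" "vllm>=0.15.1"

# Optional extras
pip install sentence-transformers
pip install flash-attn --no-build-isolation  # requires a CUDA build environment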
via Elastic Inference Service
Elastic Inference Service (EIS) is the fastest way to use v5-text in production. It provides managed embedding inference with built-in scaling, so you can generate embeddings directly within your Elastic deployment.
PUT _inference/text_embedding/jina-v5
{
  "service": "elastic",
  "service_settings": {
    "model_id": "jina-embeddings-v5-text-small"
  }
}
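Once the endpoint is created, you can generate embeddings through the standard Elasticsearch inference API. A minimal sketch using the jina-v5 endpoint name created above; the exact request shape may vary slightly between Elastic versions:

POST _inference/text_embedding/jina-v5
{
  "input": [
    "Overview of climate change impacts on coastal cities"
  ]
}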
See the Elastic Inference Service documentation for setup details.
via sentence-transformers
from sentence_transformers import SentenceTransformer
import torch

model = SentenceTransformer(
    "jinaai/jina-embeddings-v5-text-small-text-matching",
    model_kwargs={"dtype": torch.bfloat16},  # Recommended for GPUs
    config_kwargs={"_attn_implementation": "flash_attention_2"},  # Recommended but optional
)

# Optional: set truncate_dim in encode() to control embedding size
texts = [
    "A beautiful sunset over the beach",  # English
    "غروب جميل على الشاطئ",  # Arabic
    "海滩上美丽的日落",  # Chinese
    "Un beau coucher de soleil sur la plage",  # French
    "Ein wunderschöner Sonnenuntergang am Strand",  # German
    "Ένα όμορφο ηλιοβασίλεμα πάνω από την παραλία",  # Greek
    "समुद्र तट पर एक खूबसूरत सूर्यास्त",  # Hindi
    "Un bellissimo tramonto sulla spiaggia",  # Italian
    "浜辺に沈む美しい夕日",  # Japanese
    "해변 위로 아름다운 일몰",  # Korean
]

# Encode texts
embeddings = model.encode(texts)
print(embeddings.shape)
# (10, 1024)

similarity = model.similarity(embeddings[0], embeddings[1:])
print(similarity)
# tensor([[0.7833, 0.8926, 0.9333, 0.9421, 0.7588, 0.9068, 0.9301, 0.8521, 0.8768]])
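To use the Matryoshka dimensions from the table above, request shorter embeddings at encode time. A minimal sketch, assuming encode() accepts the truncate_dim argument mentioned in the comment above:

# Request 256-dimensional embeddings (any of 32, 64, 128, 256, 512, 768, 1024)
small_embeddings = model.encode(texts, truncate_dim=256)
print(small_embeddings.shape)
# Expected: (10, 256)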
via vLLM
from vllm import LLM
from vllm.config import PoolerConfig

# Initialize model
name = "jinaai/jina-embeddings-v5-text-small-text-matching"
model = LLM(
    model=name,
    dtype="float16",
    runner="pooling",
    pooler_config=PoolerConfig(seq_pooling_type="LAST", normalize=True),
)

# Create text prompts
document1 = "Overview of climate change impacts on coastal cities"
document1_prompt = f"Document: {document1}"

document2 = "The impacts of climate change on large cities"
document2_prompt = f"Document: {document2}"

# Encode all prompts
prompts = [document1_prompt, document2_prompt]
outputs = model.encode(prompts, pooling_task="embed")
embed_document1 = outputs[0].outputs.data
embed_document2 = outputs[1].outputs.data
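Because the embeddings are L2-normalized (normalize=True above), their dot product is the cosine similarity. A minimal follow-up sketch, assuming the .data fields returned above are torch tensors:

import torch

# Dot product of two normalized embeddings == cosine similarity
score = torch.dot(
    torch.as_tensor(embed_document1, dtype=torch.float32),
    torch.as_tensor(embed_document2, dtype=torch.float32),
)
print(float(score))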
via Text Embeddings Inference
- Via Docker on CPU:

docker run -p 8080:80 \
  ghcr.io/huggingface/text-embeddings-inference:cpu-1.9 \
  --model-id jinaai/jina-embeddings-v5-text-small-text-matching \
  --dtype float32 --pooling last-token

- Via Docker on NVIDIA GPU (Turing, Ampere, Ada Lovelace, Hopper or Blackwell):

docker run --gpus all --shm-size 1g -p 8080:80 \
  ghcr.io/huggingface/text-embeddings-inference:cuda-1.9 \
  --model-id jinaai/jina-embeddings-v5-text-small-text-matching \
  --dtype float16 --pooling last-token
Alternatively, you can run the server with cargo; more information can be found in the Text Embeddings Inference documentation.
Send a request to /v1/embeddings to generate embeddings via the OpenAI Embeddings API:
curl -X POST http://127.0.0.1:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "jinaai/jina-embeddings-v5-text-small-text-matching",
    "input": [
      "Document: The impacts of climate change on coastal cities are significant..."
    ]
  }'
Alternatively, use the native Text Embeddings Inference API, which applies the task prompt via prompt_name so you don't have to format the inputs manually:
curl -X POST http://127.0.0.1:8080/embed \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": "Overview of climate change impacts on coastal cities",
    "prompt_name": "document"
  }'
via llama.cpp (GGUF)
After installing llama.cpp, you can run llama-server to host the embedding model as an OpenAI-API-compatible HTTP server with the respective model version:

llama-server -hf jinaai/jina-embeddings-v5-text-small-text-matching:F16 --embedding --pooling last -ub 32768
Client:
curl -X POST "http://127.0.0.1:8080/v1/embeddings" \
-H "Content-Type: application/json" \
-d '{
"input": [
"Document: A beautiful sunset over the beach",
"Document: Un beau coucher de soleil sur la plage",
"Document: 海滩上美丽的日落",
"Document: 浜辺に沈む美しい夕日",
"Document: Golden sunlight melts into the horizon, painting waves in warm amber and rose, while the sky whispers goodnight to the quiet, endless sea."
]
}'
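Since llama-server exposes an OpenAI-compatible endpoint, you can also query it with the official openai Python client. A minimal sketch, assuming the server started above is listening on port 8080; the api_key value is a placeholder, as llama-server does not check it by default:

from openai import OpenAI

# Point the official OpenAI client at the local llama-server endpoint
client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="not-needed")

response = client.embeddings.create(
    model="jina-embeddings-v5-text-small-text-matching",  # informational; llama-server serves the loaded model
    input=["Document: A beautiful sunset over the beach"],
)
print(len(response.data[0].embedding))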
License
The model is licensed under CC BY-NC 4.0. For commercial use, please contact us.
Citation
If you find jina-embeddings-v5-text-small-text-matching useful in your research, please cite the following paper:
@misc{akram2026jinaembeddingsv5texttasktargetedembeddingdistillation,
  title={jina-embeddings-v5-text: Task-Targeted Embedding Distillation},
  author={Mohammad Kalim Akram and Saba Sturua and Nastia Havriushenko and Quentin Herreros and Michael Günther and Maximilian Werk and Han Xiao},
  year={2026},
  eprint={2602.15547},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2602.15547},
}