manu committed on
Commit e745b8d · verified · 1 Parent(s): 6462378

Update README.md

Files changed (1):
  1. README.md +26 -16

README.md CHANGED
@@ -12,24 +12,17 @@ base_model:
 
 # ModernBERT-embed-large + InSeNT
 
-This is a contextual model finetuned from [modernbert-embed-large](https://huggingface.co/answerdotai/ModernBERT-large) on the ConTEB training dataset. It was trained using the InSeNT training approach, detailed in the corresponding paper.
+[![arXiv](https://img.shields.io/badge/arXiv-2505.24782-b31b1b.svg?style=for-the-badge)](https://arxiv.org/abs/2505.24782)
+[![GitHub](https://img.shields.io/badge/Code_Repository-100000?style=for-the-badge&logo=github&logoColor=white)](https://github.com/illuin-tech/contextual-embeddings)
+[![Hugging Face](https://img.shields.io/badge/ConTEB_HF_Page-FFD21E?style=for-the-badge&logo=huggingface&logoColor=000)](https://huggingface.co/illuin-conteb)
 
-## Model Details
+<img src="https://cdn-uploads.huggingface.co/production/uploads/60f2e021adf471cbdf8bb660/jq_zYRy23bOZ9qey3VY4v.png" width="800">
 
-### Model Description
-- **Model Type:** Sentence Transformer
-- **Maximum Sequence Length:** 8192 tokens
-- **Output Dimensionality:** 768 dimensions
-- **Similarity Function:** cosine
-- **Training Dataset:**
-  - train
-<!-- - **Language:** Unknown -->
-<!-- - **License:** Unknown -->
-
-### Model Sources
-
-- **Repository:** [Contextual Embeddings](https://github.com/illuin-tech/contextual-embeddings)
-- **Hugging Face:** [Contextual Embeddings](https://huggingface.co/illuin-conteb)
+This is a contextual model finetuned from [modernbert-embed-large](https://huggingface.co/answerdotai/ModernBERT-large) on the ConTEB training dataset. It was trained using the InSeNT training approach, detailed in the corresponding paper.
+
+> [!WARNING]
+> This experimental model stems from the paper [*Context is Gold to find the Gold Passage: Evaluating and Training Contextual Document Embeddings*](https://arxiv.org/abs/2505.24782).
+> While results are promising, we have seen regression on standard embedding tasks, and using it in production will probably require further work on extending the training set to improve robustness and OOD generalization.
 
 ## Usage
 
@@ -69,9 +62,26 @@ print("Length of first document embedding:", len(embeddings[0])) # 3
 print(f"Shape of first chunk embedding: {embeddings[0][0].shape}") # torch.Size([768])
 ```
 
+
+## Model Details
+
+### Model Description
+- **Model Type:** Sentence Transformer
+- **Maximum Sequence Length:** 8192 tokens
+- **Output Dimensionality:** 768 dimensions
+- **Similarity Function:** cosine
+- **Training Dataset:**
+  - train
+<!-- - **Language:** Unknown -->
+<!-- - **License:** Unknown -->
+
+### Model Sources
+
+- **Repository:** [Contextual Embeddings](https://github.com/illuin-tech/contextual-embeddings)
+- **Hugging Face:** [Contextual Embeddings](https://huggingface.co/illuin-conteb)
+
 ## Citation
 
-### BibTeX
 
 ```bibtex
 @misc{conti2025contextgoldgoldpassage,
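For context on the model card being edited: it lists cosine as the similarity function and 768-dimensional chunk embeddings, and its (elided) usage snippet shows that each document embeds to a list of per-chunk vectors (`len(embeddings[0]) == 3`). A minimal, dependency-free sketch of scoring a query vector against such nested chunk embeddings follows; all names and data here are illustrative stand-ins, not the model's actual API or output:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Illustrative stand-ins: a query vector and one document split into 3 chunks,
# each chunk represented by a 768-dimensional embedding (toy values here).
dim = 768
query = [1.0] * dim
doc_chunks = [[1.0] * dim, [-1.0] * dim, [0.5] * dim]

# Score every chunk against the query; in retrieval, the best-scoring
# chunk determines where the relevant passage sits in the document.
scores = [cosine(query, chunk) for chunk in doc_chunks]
best = max(range(len(scores)), key=lambda i: scores[i])
print(best, [round(s, 2) for s in scores])  # 0 [1.0, -1.0, 1.0]
```

Note that cosine similarity is scale-invariant: the third chunk (`0.5` everywhere) scores identically to the first, since only direction matters.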