manu committed on
Commit e745b8d · verified · 1 Parent(s): 6462378

Update README.md

Files changed (1):
  1. README.md +26 -16

README.md CHANGED
@@ -12,24 +12,17 @@ base_model:
 
 # ModernBERT-embed-large + InSeNT
 
-This is a contextual model finetuned from [modernbert-embed-large](https://huggingface.co/answerdotai/ModernBERT-large) on the ConTEB training dataset. It was trained using the InSeNT training approach, detailed in the corresponding paper.
+[![arXiv](https://img.shields.io/badge/arXiv-2505.24782-b31b1b.svg?style=for-the-badge)](https://arxiv.org/abs/2505.24782)
+[![GitHub](https://img.shields.io/badge/Code_Repository-100000?style=for-the-badge&logo=github&logoColor=white)](https://github.com/illuin-tech/contextual-embeddings)
+[![Hugging Face](https://img.shields.io/badge/ConTEB_HF_Page-FFD21E?style=for-the-badge&logo=huggingface&logoColor=000)](https://huggingface.co/illuin-conteb)
 
-## Model Details
+<img src="https://cdn-uploads.huggingface.co/production/uploads/60f2e021adf471cbdf8bb660/jq_zYRy23bOZ9qey3VY4v.png" width="800">
 
-### Model Description
-- **Model Type:** Sentence Transformer
-- **Maximum Sequence Length:** 8192 tokens
-- **Output Dimensionality:** 768 dimensions
-- **Similarity Function:** cosine
-- **Training Dataset:**
-  - train
-<!-- - **Language:** Unknown -->
-<!-- - **License:** Unknown -->
-
-### Model Sources
-
-- **Repository:** [Contextual Embeddings](https://github.com/illuin-tech/contextual-embeddings)
-- **Hugging Face:** [Contextual Embeddings](https://huggingface.co/illuin-conteb)
+This is a contextual model finetuned from [modernbert-embed-large](https://huggingface.co/answerdotai/ModernBERT-large) on the ConTEB training dataset. It was trained using the InSeNT training approach, detailed in the corresponding paper.
+
+> [!WARNING]
+> This experimental model stems from the paper [*Context is Gold to find the Gold Passage: Evaluating and Training Contextual Document Embeddings*](https://arxiv.org/abs/2505.24782).
+> While results are promising, we have seen regression on standard embedding tasks, and using it in production will probably require further work on extending the training set to improve robustness and OOD generalization.
 
 ## Usage
 
@@ -69,9 +62,26 @@ print("Length of first document embedding:", len(embeddings[0])) # 3
 print(f"Shape of first chunk embedding: {embeddings[0][0].shape}") # torch.Size([768])
 ```
 
+
+## Model Details
+
+### Model Description
+- **Model Type:** Sentence Transformer
+- **Maximum Sequence Length:** 8192 tokens
+- **Output Dimensionality:** 768 dimensions
+- **Similarity Function:** cosine
+- **Training Dataset:**
+  - train
+<!-- - **Language:** Unknown -->
+<!-- - **License:** Unknown -->
+
+### Model Sources
+
+- **Repository:** [Contextual Embeddings](https://github.com/illuin-tech/contextual-embeddings)
+- **Hugging Face:** [Contextual Embeddings](https://huggingface.co/illuin-conteb)
+
 ## Citation
 
-### BibTeX
 
 ```bibtex
 @misc{conti2025contextgoldgoldpassage,
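For context on the model card being edited: it lists cosine as the similarity function and 768-dimensional chunk embeddings, and its (elided) usage snippet shows that each document embeds to a list of per-chunk vectors (`len(embeddings[0]) == 3`). A minimal, dependency-free sketch of scoring a query vector against such nested chunk embeddings follows; all names and data here are illustrative stand-ins, not the model's actual API or output:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Illustrative stand-ins: a query vector and one document split into 3 chunks,
# each chunk represented by a 768-dimensional embedding (toy values here).
dim = 768
query = [1.0] * dim
doc_chunks = [[1.0] * dim, [-1.0] * dim, [0.5] * dim]

# Score every chunk against the query; in retrieval, the best-scoring
# chunk determines where the relevant passage sits in the document.
scores = [cosine(query, chunk) for chunk in doc_chunks]
best = max(range(len(scores)), key=lambda i: scores[i])
print(best, [round(s, 2) for s in scores])  # 0 [1.0, -1.0, 1.0]
```

Note that cosine similarity is scale-invariant: the third chunk (`0.5` everywhere) scores identically to the first, since only direction matters.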