Update README.md
README.md
CHANGED
|
@@ -8983,6 +8983,25 @@ The core training code will be integrated into the rag-retrieval library(https:/
|
|
| 8983 |
|
| 8984 |
This work was accomplished during my free time; please give it a little time.
|
| 8985 |
| 8986 |
## Usage
|
| 8987 |
```python
|
| 8988 |
|
|
@@ -9110,5 +9129,10 @@ if __name__ == "__main__":
|
|
| 9110 |
|
| 9111 |
|
| 9112 |
```
|
| 9113 |
## License
|
| 9114 |
**This model should not be used for any commercial purpose!**
|
|
|
|
| 8983 |
|
| 8984 |
This work was accomplished during my free time; please give it a little time.
|
| 8985 |
|
| 8986 |
+
|
| 8987 |
+
Here's a short introduction to the training method:
|
| 8988 |
+
|
| 8989 |
+
The core idea of jasper and stella is distillation: **let the student model learn the teacher model's vectors.**
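
As a rough illustration of "learning the teacher's vectors", the sketch below scores a batch of student embeddings against teacher embeddings with a cosine loss. This is an assumption made for illustration: the README does not say which loss is used, and the function name `cosine_distillation_loss` and the toy array shapes are hypothetical.

```python
import numpy as np

def cosine_distillation_loss(student_vecs, teacher_vecs):
    """Mean (1 - cosine similarity) between paired rows.

    Hypothetical helper: the loss actually used for jasper/stella
    is not stated in this README.
    """
    # L2-normalize each row so that only the direction of a vector matters.
    s = student_vecs / np.linalg.norm(student_vecs, axis=1, keepdims=True)
    t = teacher_vecs / np.linalg.norm(teacher_vecs, axis=1, keepdims=True)
    # The row-wise dot product of unit vectors is the cosine similarity.
    return float(np.mean(1.0 - np.sum(s * t, axis=1)))

# Toy data: 4 texts, 8-dimensional vectors (sizes chosen only for the demo).
rng = np.random.default_rng(0)
teacher = rng.normal(size=(4, 8))
student = rng.normal(size=(4, 8))

loss_random = cosine_distillation_loss(student, teacher)      # untrained student
loss_matched = cosine_distillation_loss(teacher * 3.0, teacher)  # same directions
```

A student whose vectors point in the same direction as the teacher's gets a loss close to zero regardless of scale, which is the behaviour a training loop would push towards.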
|
| 8990 |
+
The training process of jasper has 4 stages:
|
| 8991 |
+
|
| 8992 |
+
Stage 1 & 2: Distill from teacher vectors. For the jasper model, the teacher models are nvidia/NV-Embed-v2 and dunzhang/stella_en_1.5B_v5. (Stage 1 and Stage 2 freeze different parameters.)
|
| 8993 |
+
|
| 8994 |
+
Stage 3: MRL (Matryoshka Representation Learning) training. I made some modifications to MRL to enable training on unsupervised text.
|
| 8995 |
+
|
| 8996 |
+
Stage 4: Alignment between *jasper token embeddings from an image's detailed caption* and *vision embeddings from google/siglip-so400m-patch14-384*.
|
| 8997 |
+
|
| 8998 |
+
I use an AdaptiveAvgPool2d to adjust the number and dimension of the vision tokens; this method does not need additional parameters.
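
A minimal sketch of that pooling step, assuming the vision tokens arrive as a `(batch, num_tokens, dim)` tensor. The helper name `resize_vision_tokens` and all sizes below are illustrative assumptions, not values taken from the jasper configuration.

```python
import torch
import torch.nn as nn

def resize_vision_tokens(vision_tokens, num_tokens, dim):
    """Pool a (batch, tokens, dim) tensor to (batch, num_tokens, dim).

    Hypothetical helper. A 3-D input is read by AdaptiveAvgPool2d as
    (C, H, W), so each batch item is pooled independently over its
    (tokens, dim) plane.
    """
    pool = nn.AdaptiveAvgPool2d((num_tokens, dim))
    return pool(vision_tokens)

# Illustrative sizes only: 2 images, 729 vision tokens of width 1152.
vision_tokens = torch.randn(2, 729, 1152)
resized = resize_vision_tokens(vision_tokens, num_tokens=64, dim=512)

# The layer only averages, so it has no learnable parameters.
num_params = sum(p.numel() for p in nn.AdaptiveAvgPool2d((64, 512)).parameters())
```

Because the layer only averages neighbouring values, the target token count and width can be changed without retraining anything.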
|
| 8999 |
+
|
| 9000 |
+
**The point of distillation is to achieve better results with smaller models, or to serve as a form of pre-training, not to top the leaderboards.**
|
| 9001 |
+
Actually, I have reached first place on MTEB (Chinese and English), but I will not release those two models; as I said above, that would be meaningless.
|
| 9002 |
+
|
| 9003 |
+
|
| 9004 |
+
|
| 9005 |
## Usage
|
| 9006 |
```python
|
| 9007 |
|
|
|
|
| 9129 |
|
| 9130 |
|
| 9131 |
```
|
| 9132 |
+
|
| 9133 |
+
## Evaluation on MTEB
|
| 9134 |
+
|
| 9135 |
+
Script: `./scripts/evaluate_en_mteb/run_evaluate_mteb.py`
|
| 9136 |
+
|
| 9137 |
## License
|
| 9138 |
**This model should not be used for any commercial purpose!**
|