tomaarsen (HF Staff) committed (verified)
Commit 8117fae · 1 Parent(s): 7a484f0

Add new SparseEncoder model
1_SpladePooling/config.json ADDED
@@ -0,0 +1,5 @@
{
    "pooling_strategy": "max",
    "activation_function": "relu",
    "word_embedding_dimension": 30522
}
README.md ADDED
@@ -0,0 +1,1760 @@
---
language:
- en
license: apache-2.0
tags:
- sentence-transformers
- sparse-encoder
- sparse
- splade
- generated_from_trainer
- dataset_size:90000
- loss:SpladeLoss
- loss:SparseMultipleNegativesRankingLoss
- loss:FlopsLoss
base_model: distilbert/distilbert-base-uncased
widget:
- text: yvette mimieux net worth
- text: is sweet potato fries a bad calorie for you
- text: 'The name Gunnar is a Swedish baby name. In Swedish the meaning of the name
    Gunnar is: Battle strong. American Meaning: The name Gunnar is an American baby
    name.In American the meaning of the name Gunnar is: Battle strong.Teutonic Meaning:
    The name Gunnar is a Teutonic baby name.In Teutonic the meaning of the name Gunnar
    is: Bold warrior. Norse Meaning: The name Gunnar is a Norse baby name. In Norse
    the meaning of the name Gunnar is: Iighter. Scandinavian Meaning: The name Gunnar
    is a Scandinavian baby name.he name Gunnar is a Teutonic baby name. In Teutonic
    the meaning of the name Gunnar is: Bold warrior. Norse Meaning: The name Gunnar
    is a Norse baby name. In Norse the meaning of the name Gunnar is: Iighter. Scandinavian
    Meaning: The name Gunnar is a Scandinavian baby name.'
- text: what your fsh test results indicate
- text: 1 Lymphoma, the most common canine cancer, usually requires only chemotherapy
    and its cost can come up to be around $450 to $500. 2 Osteosarcoma, another type
    of canine cancer, is usually treated with chemotherapy along with amputation surgery.3 This
    type of chemotherapy treatment costs approximately $450.nother factor is the type
    of drugs used in the process. The size of the dog that needs to undergo chemotherapy
    can also impact the cost. Even a dog very small in size with a single cancerous
    lesion can cost $200 for chemotherapy, while the same problem on a larger breed
    could cost more than $1,000 a month.
datasets:
- sentence-transformers/msmarco
pipeline_tag: feature-extraction
library_name: sentence-transformers
metrics:
- dot_accuracy@1
- dot_accuracy@3
- dot_accuracy@5
- dot_accuracy@10
- dot_precision@1
- dot_precision@3
- dot_precision@5
- dot_precision@10
- dot_recall@1
- dot_recall@3
- dot_recall@5
- dot_recall@10
- dot_ndcg@10
- dot_mrr@10
- dot_map@100
- query_active_dims
- query_sparsity_ratio
- corpus_active_dims
- corpus_sparsity_ratio
co2_eq_emissions:
  emissions: 53.738116037369885
  energy_consumed: 0.1382501660330276
  source: codecarbon
  training_type: fine-tuning
  on_cloud: false
  cpu_model: 13th Gen Intel(R) Core(TM) i7-13700K
  ram_total_size: 31.777088165283203
  hours_used: 0.458
  hardware_used: 1 x NVIDIA GeForce RTX 3090
model-index:
- name: splade-distilbert-base-uncased trained on MS MARCO triplets
  results:
  - task:
      type: sparse-information-retrieval
      name: Sparse Information Retrieval
    dataset:
      name: NanoMSMARCO
      type: NanoMSMARCO
    metrics:
    - type: dot_accuracy@1
      value: 0.38
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.66
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.72
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.86
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.38
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.22
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.14400000000000002
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.08599999999999998
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.38
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.66
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.72
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.86
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.6227350359947015
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.5469285714285713
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.5547419639747225
      name: Dot Map@100
    - type: query_active_dims
      value: 23.6200008392334
      name: Query Active Dims
    - type: query_sparsity_ratio
      value: 0.999226131942886
      name: Query Sparsity Ratio
    - type: corpus_active_dims
      value: 86.9286117553711
      name: Corpus Active Dims
    - type: corpus_sparsity_ratio
      value: 0.9971519359230925
      name: Corpus Sparsity Ratio
  - task:
      type: sparse-information-retrieval
      name: Sparse Information Retrieval
    dataset:
      name: NanoNFCorpus
      type: NanoNFCorpus
    metrics:
    - type: dot_accuracy@1
      value: 0.38
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.54
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.58
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.62
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.38
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.36666666666666664
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.32799999999999996
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.266
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.021751342131059177
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.07406240621283516
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.09358105221669372
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.11969365467146144
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.3145066096009473
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.4676904761904762
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.1297946574794519
      name: Dot Map@100
    - type: query_active_dims
      value: 18.780000686645508
      name: Query Active Dims
    - type: query_sparsity_ratio
      value: 0.9993847060911262
      name: Query Sparsity Ratio
    - type: corpus_active_dims
      value: 164.79444885253906
      name: Corpus Active Dims
    - type: corpus_sparsity_ratio
      value: 0.9946007978227986
      name: Corpus Sparsity Ratio
  - task:
      type: sparse-information-retrieval
      name: Sparse Information Retrieval
    dataset:
      name: NanoNQ
      type: NanoNQ
    metrics:
    - type: dot_accuracy@1
      value: 0.52
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.7
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.72
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.78
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.52
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.2333333333333333
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.14400000000000002
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.08199999999999999
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.5
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.66
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.68
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.75
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.6400441027431699
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.6203571428571428
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.6045362504066882
      name: Dot Map@100
    - type: query_active_dims
      value: 26.940000534057617
      name: Query Active Dims
    - type: query_sparsity_ratio
      value: 0.999117357953802
      name: Query Sparsity Ratio
    - type: corpus_active_dims
      value: 109.75908660888672
      name: Corpus Active Dims
    - type: corpus_sparsity_ratio
      value: 0.99640393530539
      name: Corpus Sparsity Ratio
  - task:
      type: sparse-nano-beir
      name: Sparse Nano BEIR
    dataset:
      name: NanoBEIR mean
      type: NanoBEIR_mean
    metrics:
    - type: dot_accuracy@1
      value: 0.4266666666666667
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.6333333333333334
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.6733333333333332
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.7533333333333333
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.4266666666666667
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.2733333333333333
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.20533333333333334
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.14466666666666664
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.3005837807103531
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.46468746873761174
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.49786035073889795
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.5765645515571538
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.5257619161129395
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.5449920634920634
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.4296909572869542
      name: Dot Map@100
    - type: query_active_dims
      value: 23.11333401997884
      name: Query Active Dims
    - type: query_sparsity_ratio
      value: 0.9992427319959382
      name: Query Sparsity Ratio
    - type: corpus_active_dims
      value: 113.39544145649826
      name: Corpus Active Dims
    - type: corpus_sparsity_ratio
      value: 0.9962847964924808
      name: Corpus Sparsity Ratio
    - type: dot_accuracy@1
      value: 0.531773940345369
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.6966718995290424
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.7506122448979593
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.8230455259026688
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.531773940345369
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.3291470434327577
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.25535321821036105
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.18231397174254316
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.3098648701114875
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.4649217800565723
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.5185635370811593
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.6012619916233037
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.570153816546152
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.6274141187610573
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.49205810209697926
      name: Dot Map@100
    - type: query_active_dims
      value: 40.89984672205474
      name: Query Active Dims
    - type: query_sparsity_ratio
      value: 0.9986599879849926
      name: Query Sparsity Ratio
    - type: corpus_active_dims
      value: 112.74959253323884
      name: Corpus Active Dims
    - type: corpus_sparsity_ratio
      value: 0.9963059566039827
      name: Corpus Sparsity Ratio
  - task:
      type: sparse-information-retrieval
      name: Sparse Information Retrieval
    dataset:
      name: NanoClimateFEVER
      type: NanoClimateFEVER
    metrics:
    - type: dot_accuracy@1
      value: 0.26
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.46
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.58
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.7
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.26
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.16
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.124
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.094
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.13166666666666665
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.22899999999999998
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.26966666666666667
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.3686666666666667
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.29349270164622415
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.3810238095238095
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.22256540249654586
      name: Dot Map@100
    - type: query_active_dims
      value: 52.70000076293945
      name: Query Active Dims
    - type: query_sparsity_ratio
      value: 0.9982733765558306
      name: Query Sparsity Ratio
    - type: corpus_active_dims
      value: 152.81748962402344
      name: Corpus Active Dims
    - type: corpus_sparsity_ratio
      value: 0.9949932019650081
      name: Corpus Sparsity Ratio
  - task:
      type: sparse-information-retrieval
      name: Sparse Information Retrieval
    dataset:
      name: NanoDBPedia
      type: NanoDBPedia
    metrics:
    - type: dot_accuracy@1
      value: 0.76
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.88
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.88
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.9
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.76
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.6066666666666667
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.5840000000000001
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.514
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.08858465342637668
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.17454996352915306
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.2491328586940767
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.358206868666866
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.632129041524978
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.8133333333333332
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.490256496112331
      name: Dot Map@100
    - type: query_active_dims
      value: 22.979999542236328
      name: Query Active Dims
    - type: query_sparsity_ratio
      value: 0.9992471004671307
      name: Query Sparsity Ratio
    - type: corpus_active_dims
      value: 103.23821258544922
      name: Corpus Active Dims
    - type: corpus_sparsity_ratio
      value: 0.9966175803490777
      name: Corpus Sparsity Ratio
  - task:
      type: sparse-information-retrieval
      name: Sparse Information Retrieval
    dataset:
      name: NanoFEVER
      type: NanoFEVER
    metrics:
    - type: dot_accuracy@1
      value: 0.8
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.9
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.96
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.98
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.8
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.30666666666666664
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.204
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.10399999999999998
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.7666666666666667
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.8666666666666667
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.9333333333333332
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.9433333333333332
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.8746461544855423
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.8663333333333334
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.8475954415954416
      name: Dot Map@100
    - type: query_active_dims
      value: 41.619998931884766
      name: Query Active Dims
    - type: query_sparsity_ratio
      value: 0.9986363934561338
      name: Query Sparsity Ratio
    - type: corpus_active_dims
      value: 154.98318481445312
      name: Corpus Active Dims
    - type: corpus_sparsity_ratio
      value: 0.9949222467461355
      name: Corpus Sparsity Ratio
  - task:
      type: sparse-information-retrieval
      name: Sparse Information Retrieval
    dataset:
      name: NanoFiQA2018
      type: NanoFiQA2018
    metrics:
    - type: dot_accuracy@1
      value: 0.32
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.54
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.6
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.7
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.32
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.24
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.172
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.10599999999999998
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.1620793650793651
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.36180158730158724
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.4036349206349206
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.4941031746031746
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.38562756897614053
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.43427777777777765
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.3209789709019607
      name: Dot Map@100
    - type: query_active_dims
      value: 22.68000030517578
      name: Query Active Dims
    - type: query_sparsity_ratio
      value: 0.9992569294179551
      name: Query Sparsity Ratio
    - type: corpus_active_dims
      value: 89.73922729492188
      name: Corpus Active Dims
    - type: corpus_sparsity_ratio
      value: 0.9970598510158272
      name: Corpus Sparsity Ratio
  - task:
      type: sparse-information-retrieval
      name: Sparse Information Retrieval
    dataset:
      name: NanoHotpotQA
      type: NanoHotpotQA
    metrics:
    - type: dot_accuracy@1
      value: 0.9
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.96
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.96
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.96
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.9
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.5199999999999999
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.324
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.172
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.45
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.78
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.81
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.86
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.8341414369684795
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.9266666666666665
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.7830202707785705
      name: Dot Map@100
    - type: query_active_dims
      value: 43.13999938964844
      name: Query Active Dims
    - type: query_sparsity_ratio
      value: 0.9985865932969776
      name: Query Sparsity Ratio
    - type: corpus_active_dims
      value: 108.24322509765625
      name: Corpus Active Dims
    - type: corpus_sparsity_ratio
      value: 0.9964535998591949
      name: Corpus Sparsity Ratio
  - task:
      type: sparse-information-retrieval
      name: Sparse Information Retrieval
    dataset:
      name: NanoQuoraRetrieval
      type: NanoQuoraRetrieval
    metrics:
    - type: dot_accuracy@1
      value: 0.84
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.94
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.96
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.96
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.84
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.35999999999999993
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.22399999999999995
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.12199999999999997
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.774
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.8853333333333333
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.902
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.93
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.891220122907666
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.8916666666666666
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.8739838721281753
      name: Dot Map@100
    - type: query_active_dims
      value: 21.780000686645508
      name: Query Active Dims
    - type: query_sparsity_ratio
      value: 0.999286416332919
      name: Query Sparsity Ratio
    - type: corpus_active_dims
      value: 24.841062545776367
      name: Corpus Active Dims
    - type: corpus_sparsity_ratio
      value: 0.9991861259895886
      name: Corpus Sparsity Ratio
  - task:
      type: sparse-information-retrieval
      name: Sparse Information Retrieval
    dataset:
      name: NanoSCIDOCS
      type: NanoSCIDOCS
    metrics:
    - type: dot_accuracy@1
      value: 0.44
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.56
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.62
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.76
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.44
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.2733333333333333
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.22399999999999998
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.162
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.09266666666666667
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.1696666666666667
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.22966666666666669
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.3306666666666666
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.32673772222029135
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.5299920634920635
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.24677603468739176
      name: Dot Map@100
    - type: query_active_dims
      value: 40.18000030517578
      name: Query Active Dims
    - type: query_sparsity_ratio
      value: 0.9986835724950798
      name: Query Sparsity Ratio
    - type: corpus_active_dims
      value: 145.2197265625
      name: Corpus Active Dims
    - type: corpus_sparsity_ratio
      value: 0.995242129396419
      name: Corpus Sparsity Ratio
  - task:
      type: sparse-information-retrieval
      name: Sparse Information Retrieval
    dataset:
      name: NanoArguAna
      type: NanoArguAna
    metrics:
    - type: dot_accuracy@1
      value: 0.12
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.44
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.54
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.72
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.12
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.14666666666666667
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.10800000000000003
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.07200000000000001
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.12
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.44
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.54
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.72
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.40190838047249483
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.30241269841269836
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.3152551920928219
      name: Dot Map@100
    - type: query_active_dims
      value: 142.10000610351562
      name: Query Active Dims
    - type: query_sparsity_ratio
      value: 0.9953443415862815
      name: Query Sparsity Ratio
    - type: corpus_active_dims
      value: 133.527099609375
      name: Corpus Active Dims
    - type: corpus_sparsity_ratio
      value: 0.995625217888429
      name: Corpus Sparsity Ratio
  - task:
      type: sparse-information-retrieval
      name: Sparse Information Retrieval
    dataset:
      name: NanoSciFact
      type: NanoSciFact
    metrics:
    - type: dot_accuracy@1
      value: 0.54
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.64
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.74
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.78
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.54
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.2333333333333333
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.16
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.08599999999999998
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.495
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.615
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.715
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.76
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.6395168665161247
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.6164444444444444
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.6051482940863745
      name: Dot Map@100
    - type: query_active_dims
      value: 54.2400016784668
      name: Query Active Dims
    - type: query_sparsity_ratio
      value: 0.9982229211166219
      name: Query Sparsity Ratio
    - type: corpus_active_dims
      value: 172.6594696044922
      name: Corpus Active Dims
    - type: corpus_sparsity_ratio
      value: 0.9943431141601305
      name: Corpus Sparsity Ratio
  - task:
      type: sparse-information-retrieval
      name: Sparse Information Retrieval
    dataset:
      name: NanoTouche2020
      type: NanoTouche2020
    metrics:
    - type: dot_accuracy@1
      value: 0.6530612244897959
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.8367346938775511
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.8979591836734694
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.9795918367346939
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.6530612244897959
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.6122448979591837
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.5795918367346938
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.5040816326530612
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.045827950812536176
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.1279025170251966
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.19531048384271374
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.32173552649478127
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.5552938710432158
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.7592565597667638
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.4021024805202534
      name: Dot Map@100
    - type: query_active_dims
      value: 20.53061294555664
      name: Query Active Dims
    - type: query_sparsity_ratio
      value: 0.9993273503392452
      name: Query Sparsity Ratio
    - type: corpus_active_dims
      value: 106.17720031738281
      name: Corpus Active Dims
    - type: corpus_sparsity_ratio
      value: 0.9965212895512292
      name: Corpus Sparsity Ratio
---

# splade-distilbert-base-uncased trained on MS MARCO triplets

This is a [SPLADE Sparse Encoder](https://www.sbert.net/docs/sparse_encoder/usage/usage.html) model finetuned from [distilbert/distilbert-base-uncased](https://huggingface.co/distilbert/distilbert-base-uncased) on the [msmarco](https://huggingface.co/datasets/sentence-transformers/msmarco) dataset using the [sentence-transformers](https://www.SBERT.net) library. It maps sentences & paragraphs to a 30522-dimensional sparse vector space and can be used for semantic search and sparse retrieval.

## Model Details

### Model Description
- **Model Type:** SPLADE Sparse Encoder
- **Base model:** [distilbert/distilbert-base-uncased](https://huggingface.co/distilbert/distilbert-base-uncased) <!-- at revision 12040accade4e8a0f71eabdb258fecc2e7e948be -->
- **Maximum Sequence Length:** 256 tokens
- **Output Dimensionality:** 30522 dimensions
- **Similarity Function:** Dot Product
- **Training Dataset:**
    - [msmarco](https://huggingface.co/datasets/sentence-transformers/msmarco)
- **Language:** en
- **License:** apache-2.0

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Documentation:** [Sparse Encoder Documentation](https://www.sbert.net/docs/sparse_encoder/usage/usage.html)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sparse Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=sparse-encoder)

### Full Model Architecture

```
SparseEncoder(
  (0): MLMTransformer({'max_seq_length': 256, 'do_lower_case': False}) with MLMTransformer model: DistilBertForMaskedLM
  (1): SpladePooling({'pooling_strategy': 'max', 'activation_function': 'relu', 'word_embedding_dimension': 30522})
)
```
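
For intuition, the `SpladePooling` step can be sketched in a few lines of plain PyTorch: the MLM head emits one logit per vocabulary term at every token position, and the `max` strategy keeps, per term, the largest log-saturated ReLU activation across positions. This is an illustrative approximation of the module's behavior, not the library code; the tensor names are hypothetical.

```python
import torch

def splade_max_pooling(mlm_logits: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Sketch of SPLADE 'max' pooling with a ReLU activation.

    mlm_logits: (batch, seq_len, 30522) logits from the MLM head (hypothetical input).
    attention_mask: (batch, seq_len), 1 for real tokens, 0 for padding.
    Returns a (batch, 30522) sparse term-weight vector.
    """
    # log-saturated ReLU, as in the SPLADE papers: log(1 + relu(logit))
    scores = torch.log1p(torch.relu(mlm_logits))
    # zero out padding positions before taking the per-term maximum
    scores = scores * attention_mask.unsqueeze(-1)
    return scores.max(dim=1).values  # the 'max' pooling_strategy over the sequence
```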

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SparseEncoder

# Download from the 🤗 Hub
model = SparseEncoder("tomaarsen/splade-distilbert-base-uncased-msmarco-mrl")
# Run inference
queries = [
    "canine ultrasound cost",
]
documents = [
    'VetInfo indicates that any type of canine ultrasound costs anywhere from $300 to $500 depending on what region of the country you live in and whether or not a veterinarian or a technician performs the procedure.',
    '1 Lymphoma, the most common canine cancer, usually requires only chemotherapy and its cost can come up to be around $450 to $500. 2 Osteosarcoma, another type of canine cancer, is usually treated with chemotherapy along with amputation surgery.3 This type of chemotherapy treatment costs approximately $450.nother factor is the type of drugs used in the process. The size of the dog that needs to undergo chemotherapy can also impact the cost. Even a dog very small in size with a single cancerous lesion can cost $200 for chemotherapy, while the same problem on a larger breed could cost more than $1,000 a month.',
    'Plant Life. There are many different plants in the rain forest. Some of the plants include vines, bromeliads, the passion fruit plant and the Victorian water lily. Vines in the rainforest can be as thick as the average human average human body and some can grow to be 3,000 ft long.',
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# [1, 30522] [3, 30522]

# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[33.7473, 25.6638, 0.2965]])
```
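
Because each dimension corresponds to a token of the underlying DistilBERT vocabulary, the embeddings are directly interpretable. As a minimal sketch (assuming `model`, `query_embeddings`, and the imports from the snippet above), the highest-weighted terms of a query embedding can be listed like this:

```python
import torch

emb = query_embeddings[0]
emb = emb.to_dense() if emb.is_sparse else emb  # densify in case a sparse COO tensor was returned
weights, indices = torch.topk(emb, k=10)
tokens = model.tokenizer.convert_ids_to_tokens(indices.tolist())
for token, weight in zip(tokens, weights.tolist()):
    if weight > 0:
        print(f"{token:>12}: {weight:.2f}")
```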

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

## Evaluation

### Metrics

#### Sparse Information Retrieval

* Datasets: `NanoMSMARCO`, `NanoNFCorpus`, `NanoNQ`, `NanoClimateFEVER`, `NanoDBPedia`, `NanoFEVER`, `NanoFiQA2018`, `NanoHotpotQA`, `NanoQuoraRetrieval`, `NanoSCIDOCS`, `NanoArguAna`, `NanoSciFact` and `NanoTouche2020`
* Evaluated with [<code>SparseInformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sparse_encoder/evaluation.html#sentence_transformers.sparse_encoder.evaluation.SparseInformationRetrievalEvaluator)

| Metric | NanoMSMARCO | NanoNFCorpus | NanoNQ | NanoClimateFEVER | NanoDBPedia | NanoFEVER | NanoFiQA2018 | NanoHotpotQA | NanoQuoraRetrieval | NanoSCIDOCS | NanoArguAna | NanoSciFact | NanoTouche2020 |
|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|
| dot_accuracy@1 | 0.38 | 0.38 | 0.52 | 0.26 | 0.76 | 0.8 | 0.32 | 0.9 | 0.84 | 0.44 | 0.12 | 0.54 | 0.6531 |
| dot_accuracy@3 | 0.66 | 0.54 | 0.7 | 0.46 | 0.88 | 0.9 | 0.54 | 0.96 | 0.94 | 0.56 | 0.44 | 0.64 | 0.8367 |
| dot_accuracy@5 | 0.72 | 0.58 | 0.72 | 0.58 | 0.88 | 0.96 | 0.6 | 0.96 | 0.96 | 0.62 | 0.54 | 0.74 | 0.898 |
| dot_accuracy@10 | 0.86 | 0.62 | 0.78 | 0.7 | 0.9 | 0.98 | 0.7 | 0.96 | 0.96 | 0.76 | 0.72 | 0.78 | 0.9796 |
| dot_precision@1 | 0.38 | 0.38 | 0.52 | 0.26 | 0.76 | 0.8 | 0.32 | 0.9 | 0.84 | 0.44 | 0.12 | 0.54 | 0.6531 |
| dot_precision@3 | 0.22 | 0.3667 | 0.2333 | 0.16 | 0.6067 | 0.3067 | 0.24 | 0.52 | 0.36 | 0.2733 | 0.1467 | 0.2333 | 0.6122 |
| dot_precision@5 | 0.144 | 0.328 | 0.144 | 0.124 | 0.584 | 0.204 | 0.172 | 0.324 | 0.224 | 0.224 | 0.108 | 0.16 | 0.5796 |
| dot_precision@10 | 0.086 | 0.266 | 0.082 | 0.094 | 0.514 | 0.104 | 0.106 | 0.172 | 0.122 | 0.162 | 0.072 | 0.086 | 0.5041 |
| dot_recall@1 | 0.38 | 0.0218 | 0.5 | 0.1317 | 0.0886 | 0.7667 | 0.1621 | 0.45 | 0.774 | 0.0927 | 0.12 | 0.495 | 0.0458 |
| dot_recall@3 | 0.66 | 0.0741 | 0.66 | 0.229 | 0.1745 | 0.8667 | 0.3618 | 0.78 | 0.8853 | 0.1697 | 0.44 | 0.615 | 0.1279 |
| dot_recall@5 | 0.72 | 0.0936 | 0.68 | 0.2697 | 0.2491 | 0.9333 | 0.4036 | 0.81 | 0.902 | 0.2297 | 0.54 | 0.715 | 0.1953 |
| dot_recall@10 | 0.86 | 0.1197 | 0.75 | 0.3687 | 0.3582 | 0.9433 | 0.4941 | 0.86 | 0.93 | 0.3307 | 0.72 | 0.76 | 0.3217 |
| **dot_ndcg@10** | **0.6227** | **0.3145** | **0.64** | **0.2935** | **0.6321** | **0.8746** | **0.3856** | **0.8341** | **0.8912** | **0.3267** | **0.4019** | **0.6395** | **0.5553** |
| dot_mrr@10 | 0.5469 | 0.4677 | 0.6204 | 0.381 | 0.8133 | 0.8663 | 0.4343 | 0.9267 | 0.8917 | 0.53 | 0.3024 | 0.6164 | 0.7593 |
| dot_map@100 | 0.5547 | 0.1298 | 0.6045 | 0.2226 | 0.4903 | 0.8476 | 0.321 | 0.783 | 0.874 | 0.2468 | 0.3153 | 0.6051 | 0.4021 |
| query_active_dims | 23.62 | 18.78 | 26.94 | 52.7 | 22.98 | 41.62 | 22.68 | 43.14 | 21.78 | 40.18 | 142.1 | 54.24 | 20.5306 |
| query_sparsity_ratio | 0.9992 | 0.9994 | 0.9991 | 0.9983 | 0.9992 | 0.9986 | 0.9993 | 0.9986 | 0.9993 | 0.9987 | 0.9953 | 0.9982 | 0.9993 |
| corpus_active_dims | 86.9286 | 164.7944 | 109.7591 | 152.8175 | 103.2382 | 154.9832 | 89.7392 | 108.2432 | 24.8411 | 145.2197 | 133.5271 | 172.6595 | 106.1772 |
| corpus_sparsity_ratio | 0.9972 | 0.9946 | 0.9964 | 0.995 | 0.9966 | 0.9949 | 0.9971 | 0.9965 | 0.9992 | 0.9952 | 0.9956 | 0.9943 | 0.9965 |
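
The `*_sparsity_ratio` rows follow directly from the active dimension counts: each ratio is the fraction of the 30522 vocabulary dimensions that are zero on average. A quick check of the NanoMSMARCO query figure (the helper below is illustrative, not part of the evaluator):

```python
def sparsity_ratio(active_dims: float, vocab_size: int = 30522) -> float:
    """Fraction of vocabulary dimensions that are inactive (zero)."""
    return 1 - active_dims / vocab_size

print(f"{sparsity_ratio(23.62):.4f}")  # 0.9992, matching query_sparsity_ratio above
```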

#### Sparse Nano BEIR

* Dataset: `NanoBEIR_mean`
* Evaluated with [<code>SparseNanoBEIREvaluator</code>](https://sbert.net/docs/package_reference/sparse_encoder/evaluation.html#sentence_transformers.sparse_encoder.evaluation.SparseNanoBEIREvaluator) with these parameters:
  ```json
  {
      "dataset_names": [
          "msmarco",
          "nfcorpus",
          "nq"
      ]
  }
  ```

| Metric | Value |
|:---|:---|
| dot_accuracy@1 | 0.4267 |
| dot_accuracy@3 | 0.6333 |
| dot_accuracy@5 | 0.6733 |
| dot_accuracy@10 | 0.7533 |
| dot_precision@1 | 0.4267 |
| dot_precision@3 | 0.2733 |
| dot_precision@5 | 0.2053 |
| dot_precision@10 | 0.1447 |
| dot_recall@1 | 0.3006 |
| dot_recall@3 | 0.4647 |
| dot_recall@5 | 0.4979 |
| dot_recall@10 | 0.5766 |
| **dot_ndcg@10** | **0.5258** |
| dot_mrr@10 | 0.545 |
| dot_map@100 | 0.4297 |
| query_active_dims | 23.1133 |
| query_sparsity_ratio | 0.9992 |
| corpus_active_dims | 113.3954 |
| corpus_sparsity_ratio | 0.9963 |

#### Sparse Nano BEIR

* Dataset: `NanoBEIR_mean`
* Evaluated with [<code>SparseNanoBEIREvaluator</code>](https://sbert.net/docs/package_reference/sparse_encoder/evaluation.html#sentence_transformers.sparse_encoder.evaluation.SparseNanoBEIREvaluator) with these parameters:
  ```json
  {
      "dataset_names": [
          "climatefever",
          "dbpedia",
          "fever",
          "fiqa2018",
          "hotpotqa",
          "msmarco",
          "nfcorpus",
          "nq",
          "quoraretrieval",
          "scidocs",
          "arguana",
          "scifact",
          "touche2020"
      ]
  }
  ```

| Metric | Value |
|:---|:---|
| dot_accuracy@1 | 0.5318 |
| dot_accuracy@3 | 0.6967 |
| dot_accuracy@5 | 0.7506 |
| dot_accuracy@10 | 0.823 |
| dot_precision@1 | 0.5318 |
| dot_precision@3 | 0.3291 |
| dot_precision@5 | 0.2554 |
| dot_precision@10 | 0.1823 |
| dot_recall@1 | 0.3099 |
| dot_recall@3 | 0.4649 |
| dot_recall@5 | 0.5186 |
| dot_recall@10 | 0.6013 |
| **dot_ndcg@10** | **0.5702** |
| dot_mrr@10 | 0.6274 |
| dot_map@100 | 0.4921 |
| query_active_dims | 40.8998 |
| query_sparsity_ratio | 0.9987 |
| corpus_active_dims | 112.7496 |
| corpus_sparsity_ratio | 0.9963 |

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### msmarco

* Dataset: [msmarco](https://huggingface.co/datasets/sentence-transformers/msmarco) at [9e329ed](https://huggingface.co/datasets/sentence-transformers/msmarco/tree/9e329ed2e649c9d37b0d91dd6b764ff6fe671d83)
* Size: 90,000 training samples
* Columns: <code>query</code>, <code>positive</code>, and <code>negative</code>
* Approximate statistics based on the first 1000 samples:
  | | query | positive | negative |
  |:---|:---|:---|:---|
  | type | string | string | string |
  | details | <ul><li>min: 4 tokens</li><li>mean: 9.1 tokens</li><li>max: 30 tokens</li></ul> | <ul><li>min: 24 tokens</li><li>mean: 79.91 tokens</li><li>max: 218 tokens</li></ul> | <ul><li>min: 22 tokens</li><li>mean: 77.09 tokens</li><li>max: 256 tokens</li></ul> |
* Samples:
  | query | positive | negative |
  |:---|:---|:---|
  | <code>when do manga complete editions</code> | <code>Volumes 9 and 10 the Sailor Moon Manga Complete Editions are now out. Last week, March 25th, volumes 9 and 10 of the Sailor Moon Manga Complete Editions were released in Japan. Volume 9 features Endymion and Serenity on the cover while volume 10 features all 10 Sailor Guardians.</code> | <code>Destiny: The Taken King will be released in standard download, collector’s edition download, and both “Collector’s” and “Legendary” game disc editions on September 15, 2015.</code> |
  | <code>the define of homograph</code> | <code>LINK / CITE ADD TO WORD LIST. noun. The definition of a homograph is a word that is spelled like another word or other words, but has a different meaning and sometimes sounds different. An example of a homograph is evening, which is the time of day after the sun has set or making something level or flat.</code> | <code>As verbs the difference between describe and define. is that describe is to represent in words while define is to determine. As a noun define is. (computing\|programming) a kind of macro in source code that replaces one text string with another wherever it occurs.</code> |
  | <code>what is a cv in resume writing</code> | <code>Curriculum Vitae (CV) is Latin for “course of life.” In contrast, resume is French for “summary.” Both CVs & Resumes: 1 Are tailored for the specific job/company you are applying to. 2 Should represent you as the best qualified candidate. 3 Are used to get you an interview. Do not usually include personal interests.</code> | <code>Resume Samples » Resume Objective » Legal Resume Objective » Legal Assistant Resume Objective. Job Description: Legal assistant is responsible to manage and handle various activities of legal department. Preparing legal documents such as contracts, wills and appeals.</code> |
* Loss: [<code>SpladeLoss</code>](https://sbert.net/docs/package_reference/sparse_encoder/losses.html#spladeloss) with these parameters:
  ```json
  {
      "loss": "SparseMultipleNegativesRankingLoss(scale=1.0, similarity_fct='dot_score')",
      "lambda_corpus": 0.001,
      "lambda_query": 5e-05
  }
  ```
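
`SpladeLoss` wraps a ranking loss with the FLOPS sparsity regularizers from the SPLADE papers: the ranking term pulls queries toward their positive documents against in-batch negatives, while the FLOPS terms, the sum over vocabulary dimensions of the squared mean activation in the batch, push the embeddings toward sparsity, weighted separately for queries (`lambda_query`) and documents (`lambda_corpus`). A rough sketch of the combined objective (illustrative, not the library implementation, and omitting the explicit negatives column):

```python
import torch
import torch.nn.functional as F

def flops(embeddings: torch.Tensor) -> torch.Tensor:
    # FLOPS regularizer: sum_j (mean_i |E_ij|)^2 over the vocabulary dimensions
    return torch.sum(torch.mean(torch.abs(embeddings), dim=0) ** 2)

def splade_loss(q: torch.Tensor, d: torch.Tensor,
                lambda_query: float = 5e-05, lambda_corpus: float = 0.001) -> torch.Tensor:
    """q: (batch, vocab) query embeddings; d: (batch, vocab) positive document embeddings."""
    scores = q @ d.T  # dot-product scores; off-diagonal entries act as in-batch negatives
    labels = torch.arange(q.size(0), device=q.device)
    ranking = F.cross_entropy(scores, labels)  # MultipleNegativesRankingLoss with scale=1.0
    return ranking + lambda_query * flops(q) + lambda_corpus * flops(d)
```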
1446
+
1447
+ ### Evaluation Dataset
1448
+
1449
+ #### msmarco
1450
+
1451
+ * Dataset: [msmarco](https://huggingface.co/datasets/sentence-transformers/msmarco) at [9e329ed](https://huggingface.co/datasets/sentence-transformers/msmarco/tree/9e329ed2e649c9d37b0d91dd6b764ff6fe671d83)
1452
+ * Size: 10,000 evaluation samples
1453
+ * Columns: <code>query</code>, <code>positive</code>, and <code>negative</code>
1454
+ * Approximate statistics based on the first 1000 samples:
1455
+ | | query | positive | negative |
1456
+ |:--------|:---------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|
1457
+ | type | string | string | string |
1458
+ | details | <ul><li>min: 5 tokens</li><li>mean: 9.24 tokens</li><li>max: 53 tokens</li></ul> | <ul><li>min: 13 tokens</li><li>mean: 80.9 tokens</li><li>max: 204 tokens</li></ul> | <ul><li>min: 17 tokens</li><li>mean: 78.24 tokens</li><li>max: 234 tokens</li></ul> |
1459
+ * Samples:
1460
+ | query | positive | negative |
1461
+ |:--------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
1462
+ | <code>the largest vietnamese population in the united states is in</code> | <code>The largest number of Vietnamese outside Vietnam is in Orange County, California (184,153, or 6.1 percent of the county's population), followed by Los Angeles and Santa Clara counties; the three counties accounted for 26 percent of the Vietnamese immigrant population in the United States.</code> | <code>Population by Place in the United States There are 29,257 places in the United States. This section compares Hibbing to the 50 most populous places in the United States. The least populous of the compared places has a population of 371,267.</code> |
1463
+ | <code>how many calories in a tablespoon of flaxseed</code> | <code>Calorie Content. A 2-tablespoon serving of ground flaxseed has about 75 calories, according to the U.S. Department of Agriculture. These calories consist of 2.6 grams of protein, 4 grams of carbohydrates -- almost all of which is fiber -- and 6 grams of fat.</code> | <code>You can also use flaxseed meal to replace an egg. Use 1 tablespoon flaxseed meal and 3 tablespoons water. You can replace up to two eggs in a recipe in this manner, but do not use flaxseed meal as an egg replacement if you are already using it as an oil replacement. Flaxseed has many health benefits.</code> |
1464
+ | <code>who wrote the house of seven gables</code> | <code>The author of The House Of Seven Gables is Nathaniel Hawthorne.</code> | <code>Abigail Adams Wrote To John In 1776: Remember The Ladies Or We'll Rebel Adams wrote a feminist letter to her husband just before U.S. independence.</code> |
1465
+ * Loss: [<code>SpladeLoss</code>](https://sbert.net/docs/package_reference/sparse_encoder/losses.html#spladeloss) with these parameters:
1466
+ ```json
1467
+ {
1468
+ "loss": "SparseMultipleNegativesRankingLoss(scale=1.0, similarity_fct='dot_score')",
1469
+ "lambda_corpus": 0.001,
1470
+ "lambda_query": 5e-05
1471
+ }
1472
+ ```
1473
+
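+ Both the training and evaluation splits come from the same Hub dataset, so a plausible way to reproduce this setup is to load the triplets once and hold out 10,000 rows for evaluation. This is a minimal sketch; the `"triplet"` subset name is an assumption, so check the dataset page for the exact configuration.
+ 
+ ```python
+ from datasets import load_dataset
+ 
+ # Load (query, positive, negative) triplets from the Hub.
+ dataset = load_dataset("sentence-transformers/msmarco", "triplet", split="train")
+ 
+ # Hold out 10,000 samples for evaluation, matching the size reported above.
+ splits = dataset.train_test_split(test_size=10_000, seed=42)
+ train_dataset, eval_dataset = splits["train"], splits["test"]
+ ```
+ 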
1474
+ ### Training Hyperparameters
1475
+ #### Non-Default Hyperparameters
1476
+
1477
+ - `eval_strategy`: steps
1478
+ - `per_device_train_batch_size`: 16
1479
+ - `per_device_eval_batch_size`: 16
1480
+ - `learning_rate`: 2e-05
1481
+ - `num_train_epochs`: 1
1482
+ - `warmup_ratio`: 0.1
1483
+ - `bf16`: True
1484
+ - `load_best_model_at_end`: True
1485
+ - `batch_sampler`: no_duplicates
1486
+
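+ Expressed in code, the non-default values above map onto the training arguments roughly as follows. This sketch assumes the `SparseEncoderTrainingArguments` class and an illustrative `output_dir`; every argument not listed keeps its default.
+ 
+ ```python
+ from sentence_transformers.sparse_encoder import SparseEncoderTrainingArguments
+ from sentence_transformers.training_args import BatchSamplers
+ 
+ args = SparseEncoderTrainingArguments(
+     output_dir="models/splade-distilbert-msmarco",  # illustrative path
+     eval_strategy="steps",
+     per_device_train_batch_size=16,
+     per_device_eval_batch_size=16,
+     learning_rate=2e-5,
+     num_train_epochs=1,
+     warmup_ratio=0.1,
+     bf16=True,
+     load_best_model_at_end=True,
+     batch_sampler=BatchSamplers.NO_DUPLICATES,  # avoid duplicate in-batch negatives
+ )
+ ```
+ 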
1487
+ #### All Hyperparameters
1488
+ <details><summary>Click to expand</summary>
1489
+
1490
+ - `overwrite_output_dir`: False
1491
+ - `do_predict`: False
1492
+ - `eval_strategy`: steps
1493
+ - `prediction_loss_only`: True
1494
+ - `per_device_train_batch_size`: 16
1495
+ - `per_device_eval_batch_size`: 16
1496
+ - `per_gpu_train_batch_size`: None
1497
+ - `per_gpu_eval_batch_size`: None
1498
+ - `gradient_accumulation_steps`: 1
1499
+ - `eval_accumulation_steps`: None
1500
+ - `torch_empty_cache_steps`: None
1501
+ - `learning_rate`: 2e-05
1502
+ - `weight_decay`: 0.0
1503
+ - `adam_beta1`: 0.9
1504
+ - `adam_beta2`: 0.999
1505
+ - `adam_epsilon`: 1e-08
1506
+ - `max_grad_norm`: 1.0
1507
+ - `num_train_epochs`: 1
1508
+ - `max_steps`: -1
1509
+ - `lr_scheduler_type`: linear
1510
+ - `lr_scheduler_kwargs`: {}
1511
+ - `warmup_ratio`: 0.1
1512
+ - `warmup_steps`: 0
1513
+ - `log_level`: passive
1514
+ - `log_level_replica`: warning
1515
+ - `log_on_each_node`: True
1516
+ - `logging_nan_inf_filter`: True
1517
+ - `save_safetensors`: True
1518
+ - `save_on_each_node`: False
1519
+ - `save_only_model`: False
1520
+ - `restore_callback_states_from_checkpoint`: False
1521
+ - `no_cuda`: False
1522
+ - `use_cpu`: False
1523
+ - `use_mps_device`: False
1524
+ - `seed`: 42
1525
+ - `data_seed`: None
1526
+ - `jit_mode_eval`: False
1527
+ - `use_ipex`: False
1528
+ - `bf16`: True
1529
+ - `fp16`: False
1530
+ - `fp16_opt_level`: O1
1531
+ - `half_precision_backend`: auto
1532
+ - `bf16_full_eval`: False
1533
+ - `fp16_full_eval`: False
1534
+ - `tf32`: None
1535
+ - `local_rank`: 0
1536
+ - `ddp_backend`: None
1537
+ - `tpu_num_cores`: None
1538
+ - `tpu_metrics_debug`: False
1539
+ - `debug`: []
1540
+ - `dataloader_drop_last`: False
1541
+ - `dataloader_num_workers`: 0
1542
+ - `dataloader_prefetch_factor`: None
1543
+ - `past_index`: -1
1544
+ - `disable_tqdm`: False
1545
+ - `remove_unused_columns`: True
1546
+ - `label_names`: None
1547
+ - `load_best_model_at_end`: True
1548
+ - `ignore_data_skip`: False
1549
+ - `fsdp`: []
1550
+ - `fsdp_min_num_params`: 0
1551
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
1552
+ - `fsdp_transformer_layer_cls_to_wrap`: None
1553
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
1554
+ - `deepspeed`: None
1555
+ - `label_smoothing_factor`: 0.0
1556
+ - `optim`: adamw_torch
1557
+ - `optim_args`: None
1558
+ - `adafactor`: False
1559
+ - `group_by_length`: False
1560
+ - `length_column_name`: length
1561
+ - `ddp_find_unused_parameters`: None
1562
+ - `ddp_bucket_cap_mb`: None
1563
+ - `ddp_broadcast_buffers`: False
1564
+ - `dataloader_pin_memory`: True
1565
+ - `dataloader_persistent_workers`: False
1566
+ - `skip_memory_metrics`: True
1567
+ - `use_legacy_prediction_loop`: False
1568
+ - `push_to_hub`: False
1569
+ - `resume_from_checkpoint`: None
1570
+ - `hub_model_id`: None
1571
+ - `hub_strategy`: every_save
1572
+ - `hub_private_repo`: None
1573
+ - `hub_always_push`: False
1574
+ - `gradient_checkpointing`: False
1575
+ - `gradient_checkpointing_kwargs`: None
1576
+ - `include_inputs_for_metrics`: False
1577
+ - `include_for_metrics`: []
1578
+ - `eval_do_concat_batches`: True
1579
+ - `fp16_backend`: auto
1580
+ - `push_to_hub_model_id`: None
1581
+ - `push_to_hub_organization`: None
1582
+ - `mp_parameters`:
1583
+ - `auto_find_batch_size`: False
1584
+ - `full_determinism`: False
1585
+ - `torchdynamo`: None
1586
+ - `ray_scope`: last
1587
+ - `ddp_timeout`: 1800
1588
+ - `torch_compile`: False
1589
+ - `torch_compile_backend`: None
1590
+ - `torch_compile_mode`: None
1591
+ - `include_tokens_per_second`: False
1592
+ - `include_num_input_tokens_seen`: False
1593
+ - `neftune_noise_alpha`: None
1594
+ - `optim_target_modules`: None
1595
+ - `batch_eval_metrics`: False
1596
+ - `eval_on_start`: False
1597
+ - `use_liger_kernel`: False
1598
+ - `eval_use_gather_object`: False
1599
+ - `average_tokens_across_devices`: False
1600
+ - `prompts`: None
1601
+ - `batch_sampler`: no_duplicates
1602
+ - `multi_dataset_batch_sampler`: proportional
1603
+ - `router_mapping`: {}
1604
+ - `learning_rate_mapping`: {}
1605
+
1606
+ </details>
1607
+
1608
+ ### Training Logs
1609
+ | Epoch | Step | Training Loss | Validation Loss | NanoMSMARCO_dot_ndcg@10 | NanoNFCorpus_dot_ndcg@10 | NanoNQ_dot_ndcg@10 | NanoBEIR_mean_dot_ndcg@10 | NanoClimateFEVER_dot_ndcg@10 | NanoDBPedia_dot_ndcg@10 | NanoFEVER_dot_ndcg@10 | NanoFiQA2018_dot_ndcg@10 | NanoHotpotQA_dot_ndcg@10 | NanoQuoraRetrieval_dot_ndcg@10 | NanoSCIDOCS_dot_ndcg@10 | NanoArguAna_dot_ndcg@10 | NanoSciFact_dot_ndcg@10 | NanoTouche2020_dot_ndcg@10 |
1610
+ |:----------:|:--------:|:-------------:|:---------------:|:-----------------------:|:------------------------:|:------------------:|:-------------------------:|:----------------------------:|:-----------------------:|:---------------------:|:------------------------:|:------------------------:|:------------------------------:|:-----------------------:|:-----------------------:|:-----------------------:|:--------------------------:|
1611
+ | 0.0178 | 100 | 173.8874 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1612
+ | 0.0356 | 200 | 11.8803 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1613
+ | 0.0533 | 300 | 1.0264 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1614
+ | 0.0711 | 400 | 0.3923 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1615
+ | 0.0889 | 500 | 0.32 | 0.2369 | 0.4932 | 0.3001 | 0.5819 | 0.4584 | - | - | - | - | - | - | - | - | - | - |
1616
+ | 0.1067 | 600 | 0.2483 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1617
+ | 0.1244 | 700 | 0.28 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1618
+ | 0.1422 | 800 | 0.2095 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1619
+ | 0.16 | 900 | 0.2093 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1620
+ | 0.1778 | 1000 | 0.1636 | 0.1898 | 0.6051 | 0.2845 | 0.6124 | 0.5006 | - | - | - | - | - | - | - | - | - | - |
1621
+ | 0.1956 | 1100 | 0.1661 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1622
+ | 0.2133 | 1200 | 0.1964 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1623
+ | 0.2311 | 1300 | 0.1937 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1624
+ | 0.2489 | 1400 | 0.1771 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1625
+ | 0.2667 | 1500 | 0.1643 | 0.1549 | 0.5868 | 0.3176 | 0.5769 | 0.4938 | - | - | - | - | - | - | - | - | - | - |
1626
+ | 0.2844 | 1600 | 0.1987 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1627
+ | 0.3022 | 1700 | 0.178 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1628
+ | 0.32 | 1800 | 0.1227 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1629
+ | 0.3378 | 1900 | 0.1478 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1630
+ | 0.3556 | 2000 | 0.1502 | 0.1563 | 0.6249 | 0.3309 | 0.6088 | 0.5215 | - | - | - | - | - | - | - | - | - | - |
1631
+ | 0.3733 | 2100 | 0.1623 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1632
+ | 0.3911 | 2200 | 0.1703 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1633
+ | 0.4089 | 2300 | 0.1804 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1634
+ | 0.4267 | 2400 | 0.121 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1635
+ | 0.4444 | 2500 | 0.1451 | 0.1325 | 0.5620 | 0.3233 | 0.6197 | 0.5017 | - | - | - | - | - | - | - | - | - | - |
1636
+ | 0.4622 | 2600 | 0.1609 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1637
+ | 0.48 | 2700 | 0.1415 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1638
+ | 0.4978 | 2800 | 0.1555 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1639
+ | 0.5156 | 2900 | 0.1581 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1640
+ | 0.5333 | 3000 | 0.1351 | 0.1546 | 0.5901 | 0.3187 | 0.6299 | 0.5129 | - | - | - | - | - | - | - | - | - | - |
1641
+ | 0.5511 | 3100 | 0.1308 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1642
+ | 0.5689 | 3200 | 0.1313 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1643
+ | 0.5867 | 3300 | 0.1248 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1644
+ | 0.6044 | 3400 | 0.1295 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1645
+ | 0.6222 | 3500 | 0.1398 | 0.1449 | 0.6096 | 0.3285 | 0.5975 | 0.5119 | - | - | - | - | - | - | - | - | - | - |
1646
+ | 0.64 | 3600 | 0.1105 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1647
+ | 0.6578 | 3700 | 0.0911 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1648
+ | 0.6756 | 3800 | 0.1683 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1649
+ | 0.6933 | 3900 | 0.1202 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1650
+ | 0.7111 | 4000 | 0.135 | 0.1592 | 0.5989 | 0.3109 | 0.6460 | 0.5186 | - | - | - | - | - | - | - | - | - | - |
1651
+ | 0.7289 | 4100 | 0.1205 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1652
+ | 0.7467 | 4200 | 0.1432 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1653
+ | 0.7644 | 4300 | 0.105 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1654
+ | 0.7822 | 4400 | 0.1028 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1655
+ | 0.8 | 4500 | 0.1386 | 0.1383 | 0.5859 | 0.3084 | 0.6276 | 0.5073 | - | - | - | - | - | - | - | - | - | - |
1656
+ | 0.8178 | 4600 | 0.1068 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1657
+ | 0.8356 | 4700 | 0.1262 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1658
+ | 0.8533 | 4800 | 0.1182 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1659
+ | 0.8711 | 4900 | 0.1331 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1660
+ | 0.8889 | 5000 | 0.1436 | 0.1279 | 0.6261 | 0.3136 | 0.6314 | 0.5237 | - | - | - | - | - | - | - | - | - | - |
1661
+ | 0.9067 | 5100 | 0.1182 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1662
+ | 0.9244 | 5200 | 0.1379 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1663
+ | 0.9422 | 5300 | 0.1343 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1664
+ | 0.96 | 5400 | 0.1475 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1665
+ | **0.9778** | **5500** | **0.0988** | **0.1311** | **0.6227** | **0.3145** | **0.64** | **0.5258** | **-** | **-** | **-** | **-** | **-** | **-** | **-** | **-** | **-** | **-** |
1666
+ | 0.9956 | 5600 | 0.1072 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
1667
+ | -1 | -1 | - | - | 0.6227 | 0.3145 | 0.6400 | 0.5702 | 0.2935 | 0.6321 | 0.8746 | 0.3856 | 0.8341 | 0.8912 | 0.3267 | 0.4019 | 0.6395 | 0.5553 |
1668
+
1669
+ * The bold row denotes the saved checkpoint.
1670
+
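+ The NanoMSMARCO, NanoNFCorpus, and NanoNQ columns come from an evaluator run periodically during training. A minimal sketch of reproducing such an evaluation, assuming the `SparseNanoBEIREvaluator` API and an illustrative model path:
+ 
+ ```python
+ from sentence_transformers import SparseEncoder
+ from sentence_transformers.sparse_encoder.evaluation import SparseNanoBEIREvaluator
+ 
+ model = SparseEncoder("path/to/this/model")  # illustrative path
+ 
+ # Score dot-product NDCG@10 on the three NanoBEIR subsets tracked during training.
+ evaluator = SparseNanoBEIREvaluator(dataset_names=["msmarco", "nfcorpus", "nq"])
+ results = evaluator(model)
+ print(results)  # per-dataset and mean NDCG@10 values
+ ```
+ 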
1671
+ ### Environmental Impact
1672
+ Carbon emissions were measured using [CodeCarbon](https://github.com/mlco2/codecarbon).
1673
+ - **Energy Consumed**: 0.138 kWh
1674
+ - **Carbon Emitted**: 0.054 kg of CO2
1675
+ - **Hours Used**: 0.458 hours
1676
+
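+ For reference, figures like these can be collected by wrapping the training run in a CodeCarbon tracker. This is a minimal sketch (with an assumed `trainer` object), not this model's actual instrumentation:
+ 
+ ```python
+ from codecarbon import EmissionsTracker
+ 
+ tracker = EmissionsTracker()
+ tracker.start()
+ try:
+     trainer.train()  # the training run being measured (assumed to exist)
+ finally:
+     emissions_kg = tracker.stop()  # emissions in kg of CO2-eq
+ print(f"Carbon emitted: {emissions_kg:.3f} kg")
+ ```
+ 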
1677
+ ### Training Hardware
1678
+ - **On Cloud**: No
1679
+ - **GPU Model**: 1 x NVIDIA GeForce RTX 3090
1680
+ - **CPU Model**: 13th Gen Intel(R) Core(TM) i7-13700K
1681
+ - **RAM Size**: 31.78 GB
1682
+
1683
+ ### Framework Versions
1684
+ - Python: 3.11.6
1685
+ - Sentence Transformers: 4.2.0.dev0
1686
+ - Transformers: 4.52.4
1687
+ - PyTorch: 2.7.1+cu126
1688
+ - Accelerate: 1.5.1
1689
+ - Datasets: 2.21.0
1690
+ - Tokenizers: 0.21.1
1691
+
1692
+ ## Citation
1693
+
1694
+ ### BibTeX
1695
+
1696
+ #### Sentence Transformers
1697
+ ```bibtex
1698
+ @inproceedings{reimers-2019-sentence-bert,
1699
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
1700
+ author = "Reimers, Nils and Gurevych, Iryna",
1701
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
1702
+ month = "11",
1703
+ year = "2019",
1704
+ publisher = "Association for Computational Linguistics",
1705
+ url = "https://arxiv.org/abs/1908.10084",
1706
+ }
1707
+ ```
1708
+
1709
+ #### SpladeLoss
1710
+ ```bibtex
1711
+ @misc{formal2022distillationhardnegativesampling,
1712
+ title={From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective},
1713
+ author={Thibault Formal and Carlos Lassance and Benjamin Piwowarski and Stéphane Clinchant},
1714
+ year={2022},
1715
+ eprint={2205.04733},
1716
+ archivePrefix={arXiv},
1717
+ primaryClass={cs.IR},
1718
+ url={https://arxiv.org/abs/2205.04733},
1719
+ }
1720
+ ```
1721
+
1722
+ #### SparseMultipleNegativesRankingLoss
1723
+ ```bibtex
1724
+ @misc{henderson2017efficient,
1725
+ title={Efficient Natural Language Response Suggestion for Smart Reply},
1726
+ author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
1727
+ year={2017},
1728
+ eprint={1705.00652},
1729
+ archivePrefix={arXiv},
1730
+ primaryClass={cs.CL}
1731
+ }
1732
+ ```
1733
+
1734
+ #### FlopsLoss
1735
+ ```bibtex
1736
+ @article{paria2020minimizing,
1737
+ title={Minimizing flops to learn efficient sparse representations},
1738
+ author={Paria, Biswajit and Yeh, Chih-Kuan and Yen, Ian EH and Xu, Ning and Ravikumar, Pradeep and P{\'o}czos, Barnab{\'a}s},
1739
+ journal={arXiv preprint arXiv:2004.05665},
1740
+ year={2020}
1741
+ }
1742
+ ```
1743
+
1744
+ <!--
1745
+ ## Glossary
1746
+
1747
+ *Clearly define terms in order to be accessible across audiences.*
1748
+ -->
1749
+
1750
+ <!--
1751
+ ## Model Card Authors
1752
+
1753
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
1754
+ -->
1755
+
1756
+ <!--
1757
+ ## Model Card Contact
1758
+
1759
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
1760
+ -->
config.json ADDED
@@ -0,0 +1,23 @@
1
+ {
2
+ "activation": "gelu",
3
+ "architectures": [
4
+ "DistilBertForMaskedLM"
5
+ ],
6
+ "attention_dropout": 0.1,
7
+ "dim": 768,
8
+ "dropout": 0.1,
9
+ "hidden_dim": 3072,
10
+ "initializer_range": 0.02,
11
+ "max_position_embeddings": 512,
12
+ "model_type": "distilbert",
13
+ "n_heads": 12,
14
+ "n_layers": 6,
15
+ "pad_token_id": 0,
16
+ "qa_dropout": 0.1,
17
+ "seq_classif_dropout": 0.2,
18
+ "sinusoidal_pos_embds": false,
19
+ "tie_weights_": true,
20
+ "torch_dtype": "float32",
21
+ "transformers_version": "4.52.4",
22
+ "vocab_size": 30522
23
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,14 @@
1
+ {
2
+ "model_type": "SparseEncoder",
3
+ "__version__": {
4
+ "sentence_transformers": "4.2.0.dev0",
5
+ "transformers": "4.52.4",
6
+ "pytorch": "2.7.1+cu126"
7
+ },
8
+ "prompts": {
9
+ "query": "",
10
+ "document": ""
11
+ },
12
+ "default_prompt_name": null,
13
+ "similarity_fn_name": "dot"
14
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ece0c5b4cbefbb81d45ad9c6445a55c17a666087262f987c6fb54e1c7e1a264d
3
+ size 267954768
modules.json ADDED
@@ -0,0 +1,14 @@
1
+ [
2
+ {
3
+ "idx": 0,
4
+ "name": "0",
5
+ "path": "",
6
+ "type": "sentence_transformers.sparse_encoder.models.MLMTransformer"
7
+ },
8
+ {
9
+ "idx": 1,
10
+ "name": "1",
11
+ "path": "1_SpladePooling",
12
+ "type": "sentence_transformers.sparse_encoder.models.SpladePooling"
13
+ }
14
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "max_seq_length": 256,
3
+ "do_lower_case": false
4
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
1
+ {
2
+ "cls_token": "[CLS]",
3
+ "mask_token": "[MASK]",
4
+ "pad_token": "[PAD]",
5
+ "sep_token": "[SEP]",
6
+ "unk_token": "[UNK]"
7
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
tokenizer_config.json ADDED
@@ -0,0 +1,56 @@
1
+ {
2
+ "added_tokens_decoder": {
3
+ "0": {
4
+ "content": "[PAD]",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "100": {
12
+ "content": "[UNK]",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "101": {
20
+ "content": "[CLS]",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "102": {
28
+ "content": "[SEP]",
29
+ "lstrip": false,
30
+ "normalized": false,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ },
35
+ "103": {
36
+ "content": "[MASK]",
37
+ "lstrip": false,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": true
42
+ }
43
+ },
44
+ "clean_up_tokenization_spaces": false,
45
+ "cls_token": "[CLS]",
46
+ "do_lower_case": true,
47
+ "extra_special_tokens": {},
48
+ "mask_token": "[MASK]",
49
+ "model_max_length": 512,
50
+ "pad_token": "[PAD]",
51
+ "sep_token": "[SEP]",
52
+ "strip_accents": null,
53
+ "tokenize_chinese_chars": true,
54
+ "tokenizer_class": "DistilBertTokenizer",
55
+ "unk_token": "[UNK]"
56
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff