nguyenhuucongzz01 committed
Commit 8acc7e3 · verified · 1 Parent(s): 4b6fdc3

Upload folder using huggingface_hub
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
{
  "word_embedding_dimension": 1536,
  "pooling_mode_cls_token": false,
  "pooling_mode_mean_tokens": true,
  "pooling_mode_max_tokens": false,
  "pooling_mode_mean_sqrt_len_tokens": false,
  "pooling_mode_weightedmean_tokens": false,
  "pooling_mode_lasttoken": false,
  "include_prompt": true
}
2_Dense/config.json ADDED
@@ -0,0 +1 @@
{"in_features": 1536, "out_features": 1024, "bias": true, "activation_function": "torch.nn.modules.linear.Identity"}
2_Dense/model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5d0447b2202b919e514a2311e310fbe1ffbb4e92755c9d8ecc927e96d5fae37a
size 3147944
README.md ADDED
@@ -0,0 +1,1136 @@
---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:8562
- loss:CachedMultipleNegativesRankingLoss
base_model: NovaSearch/stella_en_1.5B_v5
widget:
- source_sentence: 'Subject: Sharing in a Ratio

    Construct: Given information about one part, work out the whole

    Question: The ratio of cars to vans in a car park is \( 5: 3 \)


    If there are \( 80 \) cars, how many vehicles (cars and vans) are there in total?

    CorrectAnswer: \( 128 \)

    IncorrectAnswer: \( 83 \)

    IncorrectReason: The correct answer is \( 128 \) because the ratio of cars to
    vans is \( 5:3 \). This means for every 5 cars, there are 3 vans. Given there
    are 80 cars, we can determine the number of vans by setting up a proportion. Since
    \( 5 \) parts correspond to \( 80 \) cars, each part corresponds to \( \frac{80}{5}
    = 16 \) cars. Therefore, the number of vans, which is \( 3 \) parts, is \( 3 \times
    16 = 48 \) vans. The total number of vehicles is the sum of cars and vans, which
    is \( 80 + 48 = 128 \).


    The incorrect answer \( 83 \) likely comes from a misunderstanding of how to apply
    the ratio. One might incorrectly assume that the total number of vehicles is simply
    the sum of the given number of cars and the number of vans directly derived from
    the ratio without properly scaling the ratio to match the given number of cars.
    For example, someone might incorrectly add \( 80 \) cars and \( 3 \) vans (thinking
    the ratio directly applies without scaling), leading to \( 83 \). This is a common
    misconception when dealing with ratios and proportions.'
  sentences:
  - Thinks the number in the ratio is the total
  - Increases by the given percentage rather than finding the percentage of an amount
  - Thinks there are 50 weeks in a year
- source_sentence: 'Subject: Indirect (Inverse) Proportion

    Construct: Calculations using inverse proportion

    Question: It takes 4 workers 6 days to paint a house. How many days would it take
    3 workers?

    CorrectAnswer: 8

    IncorrectAnswer: 7

    IncorrectReason: The correct answer is 8 days because the problem involves understanding
    the relationship between the number of workers and the time taken to complete
    a task. If 4 workers take 6 days to paint a house, the total work required can
    be thought of as 4 workers * 6 days = 24 worker-days. This means the total amount
    of work needed to paint the house is 24 worker-days. If you have 3 workers, the
    number of days required to complete the same amount of work would be 24 worker-days
    / 3 workers = 8 days.


    The incorrect answer of 7 days likely stems from a common misconception about
    how work rates scale with the number of workers. Someone might incorrectly assume
    that reducing the number of workers from 4 to 3 would increase the time by a proportional
    amount, such as 6 days * (4/3) = 8 days, but then round down to 7 days due to
    a miscalculation or rounding error. This misunderstanding fails to account for
    the direct proportionality between the number of workers and the time required
    to complete the work.'
  sentences:
  - When working with inverse and direct proportion, forgets to apply the root or
    index to find k, but remembers when doing the second calculation
  - Confuses the radius with the radius squared in the equation of a circle
  - Believes unrelated acute/obtuse angles in a diagram are supplementary
- source_sentence: 'Subject: Experimental Probability and Relative Frequency

    Construct: Compare relative frequencies in order to determine which prediction
    is likely to be more reliable

    Question: A vegetarian restaurant wants to know people''s eating habits

    \( 50 \) people in Town \( A \) are asked if they eat meat. \begin{tabular}{|c|c|}

    \hline \multicolumn{2}{|c|}{ Town A } \\

    \hline Do you eat meat? & Frequency \\

    \hline Yes & \( 39 \) \\

    \hline No & \( 11 \) \\

    \hline

    \end{tabular} \( 200 \) people in Town B are asked if they eat meat. \begin{tabular}{|cc|}

    \hline \multicolumn{2}{|c|}{ Town B } \\

    \hline Do you eat meat? & Frequency \\

    \hline Yes & \( 143 \) \\

    \hline No & \( 57 \) \\

    \hline

    \end{tabular} Which results are more reliable?

    CorrectAnswer: Town B

    IncorrectAnswer: They are the same

    IncorrectReason: The correct answer is Town B because the reliability of survey
    results generally increases with a larger sample size. In this case, Town B has
    a larger sample size of 200 people compared to Town A''s 50 people. Larger samples
    tend to provide more accurate estimates of the population''s eating habits, reducing
    the impact of random variation and providing a more reliable representation of
    the true population proportion.


    The incorrect answer, "They are the same," is a misconception that assumes the
    reliability of survey results does not depend on the sample size. This misunderstanding
    ignores the statistical principle that larger samples generally yield more reliable
    estimates. The smaller sample size in Town A introduces more variability and potential
    bias, making the results less reliable compared to the larger sample in Town B.'
  sentences:
  - Has considered the percentage rather than the percentage of the amount
  - Does not know that sample size affects reliability
  - Believes sets are the same if the elements within them have a shared property
- source_sentence: 'Subject: Square Roots, Cube Roots, etc

    Construct: Recognise other roots of numbers

    Question: \( \sqrt[4]{16}=? \)

    CorrectAnswer: \( 2 \)

    IncorrectAnswer: \( 16 \)

    IncorrectReason: The correct answer to the problem \( \sqrt[4]{16} \) is \( 2
    \). This is because the fourth root of a number is the number that, when raised
    to the fourth power, gives the original number. In this case, \( 2^4 = 2 \times
    2 \times 2 \times 2 = 16 \). Therefore, \( 2 \) is the correct answer as it satisfies
    the equation \( 2^4 = 16 \).


    The incorrect answer \( 16 \) likely stems from a misunderstanding of what the
    fourth root operation means. Someone might have confused the fourth root with
    the fourth power, thinking that \( 16 \) raised to the fourth power equals \(
    16 \), which is not true. The fourth power of \( 16 \) would be \( 16^4 = 65536
    \), not \( 16 \). Thus, the misconception lies in confusing the operation of taking
    a root with raising a number to a power.'
  sentences:
  - 'Confuses the use of LCM and HCF in real life examples '
  - Estimated when not appropriate
  - Does not understand the root power of 4
- source_sentence: 'Subject: Adding and Subtracting with Decimals

    Construct: Subtract decimals where the numbers involved have a different number
    of decimal places

    Question: \( 0.55-0.2= \)

    CorrectAnswer: \( 0.35 \)

    IncorrectAnswer: \( 0.33 \)

    IncorrectReason: The correct answer to the problem \( 0.55 - 0.2 \) is \( 0.35
    \). This is correct because when you subtract \( 0.2 \) from \( 0.55 \), you are
    essentially performing the operation \( 0.55 - 0.20 \). This can be visualized
    as \( 55 \) hundredths minus \( 20 \) hundredths, which equals \( 35 \) hundredths,
    or \( 0.35 \).


    The incorrect answer \( 0.33 \) likely stems from a common misconception or a
    calculation error. One possible reason for this mistake is a misunderstanding
    of decimal subtraction or a misinterpretation of the place values. For example,
    someone might incorrectly think that \( 0.55 - 0.2 \) is the same as \( 0.55 -
    0.22 \), leading to \( 0.33 \). Alternatively, the error could be due to a simple
    arithmetic mistake, such as not properly aligning the decimal points during the
    subtraction process. It''s important to ensure that the decimal points are aligned
    correctly and to understand the value of each digit in the decimal places.'
  sentences:
  - Does not know that 7 and -7 are different
  - When subtracting decimals with a different number of decimals, subtracts one digit
    from more than one column
  - Underestimates the area of shapes when counting squares when some squares are
    neither wholes nor halves
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy@25
- cosine_precision@50
- cosine_precision@100
- cosine_precision@150
- cosine_precision@200
- cosine_recall@50
- cosine_recall@100
- cosine_recall@150
- cosine_recall@200
- cosine_ndcg@25
- cosine_mrr@25
- cosine_map@25
model-index:
- name: SentenceTransformer based on NovaSearch/stella_en_1.5B_v5
  results:
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: val
      type: val
    metrics:
    - type: cosine_accuracy@25
      value: 0.7408256880733946
      name: Cosine Accuracy@25
    - type: cosine_precision@50
      value: 0.01676605504587156
      name: Cosine Precision@50
    - type: cosine_precision@100
      value: 0.00901376146788991
      name: Cosine Precision@100
    - type: cosine_precision@150
      value: 0.006261467889908257
      name: Cosine Precision@150
    - type: cosine_precision@200
      value: 0.0047993119266055055
      name: Cosine Precision@200
    - type: cosine_recall@50
      value: 0.8371559633027523
      name: Cosine Recall@50
    - type: cosine_recall@100
      value: 0.9002293577981652
      name: Cosine Recall@100
    - type: cosine_recall@150
      value: 0.9380733944954128
      name: Cosine Recall@150
    - type: cosine_recall@200
      value: 0.9587155963302753
      name: Cosine Recall@200
    - type: cosine_ndcg@25
      value: 0.398580566924245
      name: Cosine Ndcg@25
    - type: cosine_mrr@25
      value: 0.3016886474683388
      name: Cosine Mrr@25
    - type: cosine_map@25
      value: 0.3016613429685574
      name: Cosine Map@25
---

# SentenceTransformer based on NovaSearch/stella_en_1.5B_v5

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [NovaSearch/stella_en_1.5B_v5](https://huggingface.co/NovaSearch/stella_en_1.5B_v5). It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [NovaSearch/stella_en_1.5B_v5](https://huggingface.co/NovaSearch/stella_en_1.5B_v5) <!-- at revision b467445fc9c39af69fdb1bda9e18416df4d19f3c -->
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 1024 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: PeftModelForFeatureExtraction
  (1): Pooling({'word_embedding_dimension': 1536, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Dense({'in_features': 1536, 'out_features': 1024, 'bias': True, 'activation_function': 'torch.nn.modules.linear.Identity'})
)
```
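
In other words, module (0) produces 1536-dimensional token embeddings, module (1) mean-pools them over the attention mask, and module (2) applies a bias-enabled linear layer with identity activation to project down to 1024 dimensions. A minimal sketch of what modules (1) and (2) compute — the function name and inputs are illustrative, not part of the library API:

```python
import torch

def pool_and_project(token_embeddings: torch.Tensor,
                     attention_mask: torch.Tensor,
                     dense: torch.nn.Linear) -> torch.Tensor:
    """Mean pooling over non-padding tokens, then the 1536 -> 1024 projection.

    token_embeddings: (batch, seq_len, 1536) hidden states from module (0)
    attention_mask:   (batch, seq_len), 1 for real tokens, 0 for padding
    dense:            torch.nn.Linear(1536, 1024, bias=True), as in 2_Dense/config.json
    """
    mask = attention_mask.unsqueeze(-1).to(token_embeddings.dtype)  # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(dim=1)   # sum embeddings of real tokens only
    counts = mask.sum(dim=1).clamp(min=1e-9)        # number of real tokens per sentence
    return dense(summed / counts)                   # (batch, 1024)
```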

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub ("sentence_transformers_model_id" is a placeholder;
# substitute this repository's ID)
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    "Subject: Adding and Subtracting with Decimals\nConstruct: Subtract decimals where the numbers involved have a different number of decimal places\nQuestion: \\( 0.55-0.2= \\)\nCorrectAnswer: \\( 0.35 \\)\nIncorrectAnswer: \\( 0.33 \\)\nIncorrectReason: The correct answer to the problem \\( 0.55 - 0.2 \\) is \\( 0.35 \\). This is correct because when you subtract \\( 0.2 \\) from \\( 0.55 \\), you are essentially performing the operation \\( 0.55 - 0.20 \\). This can be visualized as \\( 55 \\) hundredths minus \\( 20 \\) hundredths, which equals \\( 35 \\) hundredths, or \\( 0.35 \\).\n\nThe incorrect answer \\( 0.33 \\) likely stems from a common misconception or a calculation error. One possible reason for this mistake is a misunderstanding of decimal subtraction or a misinterpretation of the place values. For example, someone might incorrectly think that \\( 0.55 - 0.2 \\) is the same as \\( 0.55 - 0.22 \\), leading to \\( 0.33 \\). Alternatively, the error could be due to a simple arithmetic mistake, such as not properly aligning the decimal points during the subtraction process. It's important to ensure that the decimal points are aligned correctly and to understand the value of each digit in the decimal places.",
    'When subtracting decimals with a different number of decimals, subtracts one digit from more than one column',
    'Does not know that 7 and -7 are different',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
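
The bundled `config_sentence_transformers.json` also carries two query prompts inherited from the stella base model (`s2p_query` and `s2s_query`). Whether this fine-tune expects them is not stated in this card, but the mechanism looks like this — the query and document strings below are illustrative:

```python
# Encode queries with the "s2p_query" prompt defined in config_sentence_transformers.json;
# documents are typically encoded without a prompt.
query_embeddings = model.encode(
    ["Given a maths question and a wrong answer, which misconception explains it?"],
    prompt_name="s2p_query",
)
doc_embeddings = model.encode([
    "Thinks the number in the ratio is the total",
    "Does not know that sample size affects reliability",
])
print(model.similarity(query_embeddings, doc_embeddings))  # (1, 2) cosine scores
```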

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

## Evaluation

### Metrics

#### Information Retrieval

* Dataset: `val`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric               | Value      |
|:---------------------|:-----------|
| cosine_accuracy@25   | 0.7408     |
| cosine_precision@50  | 0.0168     |
| cosine_precision@100 | 0.009      |
| cosine_precision@150 | 0.0063     |
| cosine_precision@200 | 0.0048     |
| cosine_recall@50     | 0.8372     |
| cosine_recall@100    | 0.9002     |
| cosine_recall@150    | 0.9381     |
| cosine_recall@200    | 0.9587     |
| **cosine_ndcg@25**   | **0.3986** |
| cosine_mrr@25        | 0.3017     |
| cosine_map@25        | 0.3017     |
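
A sketch of how such numbers can be reproduced with `InformationRetrievalEvaluator`; the queries, corpus, and relevance judgments below are illustrative stand-ins, not the actual `val` split:

```python
from sentence_transformers.evaluation import InformationRetrievalEvaluator

queries = {"q1": "Subject: ... Question: ... IncorrectReason: ..."}   # diagnostic question texts
corpus = {"m1": "Thinks the number in the ratio is the total",        # misconception statements
          "m2": "Thinks there are 50 weeks in a year"}
relevant_docs = {"q1": {"m1"}}                                        # gold misconceptions per query

evaluator = InformationRetrievalEvaluator(
    queries=queries,
    corpus=corpus,
    relevant_docs=relevant_docs,
    name="val",
    accuracy_at_k=[25],
    precision_recall_at_k=[50, 100, 150, 200],
    mrr_at_k=[25],
    ndcg_at_k=[25],
    map_at_k=[25],
)
metrics = evaluator(model)  # returns a dict keyed like the table above
```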

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 8,562 training samples
* Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
* Approximate statistics based on the first 1000 samples:
  |         | anchor                                                                                | positive                                                                           | negative                                                                           |
  |:--------|:--------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
  | type    | string                                                                                | string                                                                             | string                                                                             |
  | details | <ul><li>min: 190 tokens</li><li>mean: 331.9 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 13.79 tokens</li><li>max: 42 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 15.33 tokens</li><li>max: 41 tokens</li></ul> |
* Samples:
  | anchor | positive | negative |
  |:-------|:---------|:---------|
  | <code>Subject: Function Machines<br>Construct: Calculate the square of a number<br>Question: ![A function machine with 3 rectangles in a row, joined by arrows pointing from left to right. The first rectangle on the left is empty and says "input" above it. The middle rectangle has "square root" written inside it and the final rectangle has "output" written above it and "16" written inside it.]() What is the input of this function machine?<br>CorrectAnswer: \( 256 \)<br>IncorrectAnswer: \( 8 \)<br>IncorrectReason: The correct answer is \( 256 \) because the function machine involves taking the square root of the input to produce the output. Given that the output is \( 16 \), we need to find a number whose square root is \( 16 \). Mathematically, this means solving the equation \( \sqrt{x} = 16 \). Squaring both sides, we get \( x = 16^2 = 256 \). Therefore, the input must be \( 256 \).<br><br>The incorrect answer \( 8 \) likely stems from a common misconception. Someone might have seen the output \( 16 \) and thou...</code> | <code>Mixes up squaring and multiplying by 2 or doubling</code> | <code>Confuses written 'teen' numbers with their corresponding single-digit number</code> |
  | <code>Subject: Function Machines<br>Construct: Calculate the square of a number<br>Question: ![A function machine with 3 rectangles in a row, joined by arrows pointing from left to right. The first rectangle on the left is empty and says "input" above it. The middle rectangle has "square root" written inside it and the final rectangle has "output" written above it and "16" written inside it.]() What is the input of this function machine?<br>CorrectAnswer: \( 256 \)<br>IncorrectAnswer: \( 8 \)<br>IncorrectReason: The correct answer is \( 256 \) because the function machine involves taking the square root of the input to produce the output. Given that the output is \( 16 \), we need to find a number whose square root is \( 16 \). Mathematically, this means solving the equation \( \sqrt{x} = 16 \). Squaring both sides, we get \( x = 16^2 = 256 \). Therefore, the input must be \( 256 \).<br><br>The incorrect answer \( 8 \) likely stems from a common misconception. Someone might have seen the output \( 16 \) and thou...</code> | <code>Mixes up squaring and multiplying by 2 or doubling</code> | <code>When multiplying multiples of ten and the answer requires an extra digit, leaves off that extra digit</code> |
  | <code>Subject: Ratio and Proportion<br>Construct: Convert between currencies given an exchange rate<br>Question: Convert 350 Thai baht to Australian Dollars.\n1 Australian dollar = 25 Thai baht<br>CorrectAnswer: 14<br>IncorrectAnswer: 350<br>IncorrectReason: The correct answer is 14 Australian dollars. This is because the conversion rate given is 1 Australian dollar (AUD) equals 25 Thai baht (THB). To convert 350 THB to AUD, you divide 350 by 25, which equals 14 AUD. This calculation correctly reflects the exchange rate and the amount of money being converted.<br><br>The incorrect answer of 350 is likely due to a misunderstanding of the conversion process. Someone might have mistakenly thought that the amount in Thai baht is the same in Australian dollars, not taking into account the exchange rate. This error occurs when the person fails to apply the conversion factor, instead assuming that the currency values are equivalent without adjustment. This misconception can lead to significant errors in financial trans...</code> | <code>Assumes a 1:1 conversion ratio between different currencies</code> | <code>Believes that the larger the divisor, the larger the answer.</code> |
* Loss: [<code>CachedMultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cachedmultiplenegativesrankingloss) with these parameters:
  ```json
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim"
  }
  ```
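
For reference, a minimal training sketch consistent with this setup — the triples are hypothetical stand-ins, and only the loss class and its parameters are taken from this card:

```python
from datasets import Dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer, losses

# trust_remote_code may be needed for the stella base model's custom tokenizer code
model = SentenceTransformer("NovaSearch/stella_en_1.5B_v5", trust_remote_code=True)

# Hypothetical (anchor, positive, negative) triples mirroring the columns above.
train_dataset = Dataset.from_dict({
    "anchor": ["Subject: ... IncorrectAnswer: ... IncorrectReason: ..."],
    "positive": ["Thinks the number in the ratio is the total"],
    "negative": ["Thinks there are 50 weeks in a year"],
})

# The cached variant keeps the effective in-batch-negatives pool large while
# bounding GPU memory; scale=20.0 matches the parameters shown above.
loss = losses.CachedMultipleNegativesRankingLoss(model, scale=20.0)

trainer = SentenceTransformerTrainer(model=model, train_dataset=train_dataset, loss=loss)
trainer.train()
```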

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 4
- `learning_rate`: 0.001
- `num_train_epochs`: 1.0
- `lr_scheduler_type`: cosine
- `save_only_model`: True
- `bf16`: True
- `load_best_model_at_end`: True
- `batch_sampler`: no_duplicates

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 4
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 0.001
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 1.0
- `max_steps`: -1
- `lr_scheduler_type`: cosine
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: True
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: True
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `tp_size`: 0
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional

</details>

### Training Logs
<details><summary>Click to expand</summary>

| Epoch | Step | Training Loss | val_cosine_ndcg@25 |
|:----------:|:-------:|:-------------:|:------------------:|
| 0.0019 | 1 | 0.7341 | - |
| 0.0037 | 2 | 1.1246 | - |
| 0.0056 | 3 | 1.1668 | - |
| 0.0075 | 4 | 1.2752 | - |
| 0.0093 | 5 | 1.2428 | - |
| 0.0112 | 6 | 0.8722 | - |
| 0.0131 | 7 | 0.9877 | - |
| 0.0149 | 8 | 0.3914 | - |
| 0.0168 | 9 | 1.6333 | - |
| 0.0187 | 10 | 0.3793 | 0.3873 |
| 0.0205 | 11 | 0.3277 | - |
| 0.0224 | 12 | 0.689 | - |
| 0.0243 | 13 | 1.4066 | - |
| 0.0261 | 14 | 0.7874 | - |
| 0.0280 | 15 | 0.7898 | - |
| 0.0299 | 16 | 1.0844 | - |
| 0.0317 | 17 | 1.0972 | - |
| 0.0336 | 18 | 1.7414 | - |
| 0.0354 | 19 | 0.9649 | - |
| 0.0373 | 20 | 0.9025 | 0.3383 |
| 0.0392 | 21 | 1.0195 | - |
| 0.0410 | 22 | 1.5774 | - |
| 0.0429 | 23 | 2.6835 | - |
| 0.0448 | 24 | 1.9685 | - |
| 0.0466 | 25 | 1.5736 | - |
| 0.0485 | 26 | 0.4385 | - |
| 0.0504 | 27 | 1.5777 | - |
| 0.0522 | 28 | 0.5438 | - |
| 0.0541 | 29 | 1.1351 | - |
| 0.0560 | 30 | 0.4636 | 0.3349 |
| 0.0578 | 31 | 1.749 | - |
| 0.0597 | 32 | 0.6608 | - |
| 0.0616 | 33 | 1.48 | - |
| 0.0634 | 34 | 0.6442 | - |
| 0.0653 | 35 | 1.2882 | - |
| 0.0672 | 36 | 1.5927 | - |
| 0.0690 | 37 | 0.819 | - |
| 0.0709 | 38 | 0.5842 | - |
| 0.0728 | 39 | 0.4818 | - |
| 0.0746 | 40 | 0.5143 | 0.3079 |
| 0.0765 | 41 | 1.4064 | - |
| 0.0784 | 42 | 0.924 | - |
| 0.0802 | 43 | 0.9097 | - |
| 0.0821 | 44 | 0.4214 | - |
| 0.0840 | 45 | 1.2579 | - |
| 0.0858 | 46 | 3.0192 | - |
| 0.0877 | 47 | 0.9019 | - |
| 0.0896 | 48 | 0.8331 | - |
| 0.0914 | 49 | 2.1336 | - |
| 0.0933 | 50 | 0.3793 | 0.3332 |
| 0.0951 | 51 | 0.6568 | - |
| 0.0970 | 52 | 0.7644 | - |
| 0.0989 | 53 | 1.1422 | - |
| 0.1007 | 54 | 1.1733 | - |
| 0.1026 | 55 | 1.1297 | - |
| 0.1045 | 56 | 0.7746 | - |
| 0.1063 | 57 | 1.2374 | - |
| 0.1082 | 58 | 1.0382 | - |
| 0.1101 | 59 | 0.8722 | - |
| 0.1119 | 60 | 1.6862 | 0.2076 |
| 0.1138 | 61 | 0.9489 | - |
| 0.1157 | 62 | 1.6074 | - |
| 0.1175 | 63 | 2.3639 | - |
| 0.1194 | 64 | 1.2994 | - |
| 0.1213 | 65 | 1.3806 | - |
| 0.1231 | 66 | 1.6077 | - |
| 0.125 | 67 | 1.2359 | - |
| 0.1269 | 68 | 1.2202 | - |
| 0.1287 | 69 | 0.8442 | - |
| 0.1306 | 70 | 0.8537 | 0.2768 |
| 0.1325 | 71 | 2.2377 | - |
| 0.1343 | 72 | 1.0657 | - |
| 0.1362 | 73 | 0.6213 | - |
| 0.1381 | 74 | 1.2029 | - |
| 0.1399 | 75 | 1.4392 | - |
| 0.1418 | 76 | 0.7116 | - |
| 0.1437 | 77 | 1.228 | - |
| 0.1455 | 78 | 0.9498 | - |
| 0.1474 | 79 | 1.1289 | - |
| 0.1493 | 80 | 1.6371 | 0.2504 |
| 0.1511 | 81 | 0.438 | - |
| 0.1530 | 82 | 1.0909 | - |
| 0.1549 | 83 | 0.8301 | - |
| 0.1567 | 84 | 0.9003 | - |
| 0.1586 | 85 | 1.8428 | - |
| 0.1604 | 86 | 2.7758 | - |
| 0.1623 | 87 | 3.7156 | - |
| 0.1642 | 88 | 2.6085 | - |
| 0.1660 | 89 | 2.2705 | - |
| 0.1679 | 90 | 1.4518 | 0.2239 |
| 0.1698 | 91 | 1.3423 | - |
| 0.1716 | 92 | 1.4066 | - |
| 0.1735 | 93 | 2.3138 | - |
| 0.1754 | 94 | 2.256 | - |
| 0.1772 | 95 | 1.2564 | - |
| 0.1791 | 96 | 1.477 | - |
| 0.1810 | 97 | 2.8484 | - |
| 0.1828 | 98 | 1.3257 | - |
| 0.1847 | 99 | 1.1516 | - |
| 0.1866 | 100 | 1.2892 | 0.2142 |
| 0.1884 | 101 | 1.7179 | - |
| 0.1903 | 102 | 2.2282 | - |
| 0.1922 | 103 | 0.9497 | - |
| 0.1940 | 104 | 0.9663 | - |
| 0.1959 | 105 | 1.2476 | - |
| 0.1978 | 106 | 1.0585 | - |
| 0.1996 | 107 | 1.565 | - |
| 0.2015 | 108 | 1.4498 | - |
| 0.2034 | 109 | 1.237 | - |
| 0.2052 | 110 | 1.9519 | 0.1239 |
| 0.2071 | 111 | 2.4816 | - |
| 0.2090 | 112 | 2.3602 | - |
| 0.2108 | 113 | 0.5189 | - |
| 0.2127 | 114 | 2.1441 | - |
| 0.2146 | 115 | 1.9018 | - |
| 0.2164 | 116 | 1.1875 | - |
| 0.2183 | 117 | 1.033 | - |
| 0.2201 | 118 | 1.7925 | - |
| 0.2220 | 119 | 1.1472 | - |
| 0.2239 | 120 | 1.0008 | 0.2699 |
| 0.2257 | 121 | 1.4836 | - |
| 0.2276 | 122 | 0.9753 | - |
| 0.2295 | 123 | 0.7691 | - |
| 0.2313 | 124 | 0.9119 | - |
| 0.2332 | 125 | 0.7913 | - |
| 0.2351 | 126 | 1.4574 | - |
| 0.2369 | 127 | 1.3908 | - |
| 0.2388 | 128 | 1.2722 | - |
| 0.2407 | 129 | 0.3513 | - |
| 0.2425 | 130 | 1.2904 | 0.2267 |
| 0.2444 | 131 | 1.1935 | - |
| 0.2463 | 132 | 2.024 | - |
| 0.2481 | 133 | 1.2138 | - |
| 0.25 | 134 | 1.909 | - |
| 0.2519 | 135 | 1.4939 | - |
| 0.2537 | 136 | 2.5559 | - |
| 0.2556 | 137 | 1.1896 | - |
| 0.2575 | 138 | 1.5372 | - |
| 0.2593 | 139 | 1.3159 | - |
| 0.2612 | 140 | 2.8622 | 0.1801 |
| 0.2631 | 141 | 2.2284 | - |
| 0.2649 | 142 | 1.1668 | - |
| 0.2668 | 143 | 1.5383 | - |
| 0.2687 | 144 | 1.6872 | - |
| 0.2705 | 145 | 1.3499 | - |
| 0.2724 | 146 | 1.7111 | - |
| 0.2743 | 147 | 0.8461 | - |
| 0.2761 | 148 | 1.0737 | - |
| 0.2780 | 149 | 1.2229 | - |
| 0.2799 | 150 | 1.4991 | 0.2705 |
| 0.2817 | 151 | 1.2098 | - |
| 0.2836 | 152 | 0.8411 | - |
| 0.2854 | 153 | 0.7454 | - |
| 0.2873 | 154 | 0.5295 | - |
| 0.2892 | 155 | 1.2309 | - |
| 0.2910 | 156 | 1.1437 | - |
| 0.2929 | 157 | 1.3461 | - |
| 0.2948 | 158 | 1.1028 | - |
| 0.2966 | 159 | 1.6687 | - |
| 0.2985 | 160 | 1.1048 | 0.2228 |
| 0.3004 | 161 | 1.4661 | - |
| 0.3022 | 162 | 2.3891 | - |
| 0.3041 | 163 | 2.0019 | - |
| 0.3060 | 164 | 1.9604 | - |
| 0.3078 | 165 | 2.1173 | - |
| 0.3097 | 166 | 1.2352 | - |
| 0.3116 | 167 | 1.0883 | - |
| 0.3134 | 168 | 1.0343 | - |
| 0.3153 | 169 | 0.6048 | - |
| 0.3172 | 170 | 1.2634 | 0.2747 |
| 0.3190 | 171 | 0.724 | - |
| 0.3209 | 172 | 0.5937 | - |
| 0.3228 | 173 | 0.9735 | - |
| 0.3246 | 174 | 1.1059 | - |
| 0.3265 | 175 | 0.5561 | - |
| 0.3284 | 176 | 0.9019 | - |
| 0.3302 | 177 | 0.6012 | - |
| 0.3321 | 178 | 0.6203 | - |
| 0.3340 | 179 | 0.4729 | - |
| 0.3358 | 180 | 0.488 | 0.2880 |
| 0.3377 | 181 | 0.5171 | - |
| 0.3396 | 182 | 1.2202 | - |
| 0.3414 | 183 | 0.4338 | - |
| 0.3433 | 184 | 0.2286 | - |
| 0.3451 | 185 | 1.5921 | - |
| 0.3470 | 186 | 0.9065 | - |
| 0.3489 | 187 | 0.7728 | - |
| 0.3507 | 188 | 0.6743 | - |
| 0.3526 | 189 | 0.6354 | - |
| 0.3545 | 190 | 1.0883 | 0.3092 |
| 0.3563 | 191 | 0.7866 | - |
| 0.3582 | 192 | 0.4465 | - |
| 0.3601 | 193 | 0.9169 | - |
| 0.3619 | 194 | 1.2751 | - |
| 0.3638 | 195 | 0.6479 | - |
| 0.3657 | 196 | 1.0898 | - |
| 0.3675 | 197 | 0.4064 | - |
| 0.3694 | 198 | 1.216 | - |
| 0.3713 | 199 | 0.5892 | - |
| 0.3731 | 200 | 0.9736 | 0.2627 |
| 0.375 | 201 | 1.8989 | - |
| 0.3769 | 202 | 1.4159 | - |
| 0.3787 | 203 | 1.4947 | - |
| 0.3806 | 204 | 1.6758 | - |
| 0.3825 | 205 | 1.1081 | - |
| 0.3843 | 206 | 1.1187 | - |
| 0.3862 | 207 | 1.7538 | - |
| 0.3881 | 208 | 2.3149 | - |
| 0.3899 | 209 | 0.7799 | - |
| 0.3918 | 210 | 0.7268 | 0.2772 |
| 0.3937 | 211 | 0.6603 | - |
| 0.3955 | 212 | 1.034 | - |
| 0.3974 | 213 | 0.765 | - |
| 0.3993 | 214 | 1.8519 | - |
| 0.4011 | 215 | 1.6521 | - |
| 0.4030 | 216 | 1.7584 | - |
| 0.4049 | 217 | 2.2637 | - |
| 0.4067 | 218 | 1.1289 | - |
| 0.4086 | 219 | 1.9741 | - |
| 0.4104 | 220 | 1.8754 | 0.1599 |
| 0.4123 | 221 | 1.8528 | - |
| 0.4142 | 222 | 2.1507 | - |
| 0.4160 | 223 | 2.1293 | - |
| 0.4179 | 224 | 0.9261 | - |
| 0.4198 | 225 | 1.2636 | - |
| 0.4216 | 226 | 1.7696 | - |
| 0.4235 | 227 | 1.0828 | - |
| 0.4254 | 228 | 1.533 | - |
| 0.4272 | 229 | 1.438 | - |
| 0.4291 | 230 | 0.9375 | 0.2517 |
| 0.4310 | 231 | 0.8709 | - |
| 0.4328 | 232 | 1.0026 | - |
| 0.4347 | 233 | 1.0076 | - |
| 0.4366 | 234 | 0.8922 | - |
| 0.4384 | 235 | 0.828 | - |
| 0.4403 | 236 | 1.111 | - |
| 0.4422 | 237 | 1.5364 | - |
| 0.4440 | 238 | 0.9463 | - |
| 0.4459 | 239 | 1.059 | - |
| 0.4478 | 240 | 1.4188 | 0.1832 |
| 0.4496 | 241 | 1.7641 | - |
| 0.4515 | 242 | 1.4712 | - |
| 0.4534 | 243 | 1.2123 | - |
| 0.4552 | 244 | 0.9881 | - |
| 0.4571 | 245 | 2.1159 | - |
| 0.4590 | 246 | 1.073 | - |
| 0.4608 | 247 | 0.3211 | - |
| 0.4627 | 248 | 1.7917 | - |
| 0.4646 | 249 | 0.6342 | - |
| 0.4664 | 250 | 1.3472 | 0.2687 |
| 0.4683 | 251 | 0.492 | - |
| 0.4701 | 252 | 1.0642 | - |
| 0.4720 | 253 | 0.6704 | - |
| 0.4739 | 254 | 0.6744 | - |
| 0.4757 | 255 | 1.7866 | - |
| 0.4776 | 256 | 1.2805 | - |
| 0.4795 | 257 | 1.0666 | - |
| 0.4813 | 258 | 2.4739 | - |
| 0.4832 | 259 | 2.7657 | - |
| 0.4851 | 260 | 2.4601 | 0.1183 |
| 0.4869 | 261 | 2.5174 | - |
| 0.4888 | 262 | 2.7207 | - |
| 0.4907 | 263 | 2.7801 | - |
| 0.4925 | 264 | 1.2408 | - |
| 0.4944 | 265 | 2.3538 | - |
| 0.4963 | 266 | 2.2384 | - |
| 0.4981 | 267 | 1.4689 | - |
| 0.5 | 268 | 1.6905 | - |
| 0.5019 | 269 | 1.4729 | - |
| 0.5037 | 270 | 1.2211 | 0.2667 |
| 0.5056 | 271 | 0.6759 | - |
| 0.5075 | 272 | 0.8592 | - |
| 0.5093 | 273 | 0.4822 | - |
| 0.5112 | 274 | 1.2476 | - |
| 0.5131 | 275 | 0.6806 | - |
| 0.5149 | 276 | 1.3813 | - |
| 0.5168 | 277 | 0.7919 | - |
| 0.5187 | 278 | 0.7511 | - |
| 0.5205 | 279 | 0.6702 | - |
| 0.5224 | 280 | 0.8166 | 0.3069 |
| 0.5243 | 281 | 0.3796 | - |
| 0.5261 | 282 | 0.7048 | - |
| 0.5280 | 283 | 1.2978 | - |
| 0.5299 | 284 | 0.7682 | - |
| 0.5317 | 285 | 0.554 | - |
| 0.5336 | 286 | 1.0344 | - |
| 0.5354 | 287 | 0.8375 | - |
| 0.5373 | 288 | 0.361 | - |
| 0.5392 | 289 | 0.3193 | - |
| 0.5410 | 290 | 0.7264 | 0.2902 |
| 0.5429 | 291 | 1.2829 | - |
| 0.5448 | 292 | 1.6457 | - |
| 0.5466 | 293 | 0.9561 | - |
| 0.5485 | 294 | 1.2187 | - |
| 0.5504 | 295 | 1.5597 | - |
| 0.5522 | 296 | 1.6294 | - |
| 0.5541 | 297 | 0.9754 | - |
| 0.5560 | 298 | 1.121 | - |
| 0.5578 | 299 | 1.0038 | - |
| 0.5597 | 300 | 1.472 | 0.2603 |
| 0.5616 | 301 | 1.1317 | - |
| 0.5634 | 302 | 0.678 | - |
| 0.5653 | 303 | 1.2261 | - |
| 0.5672 | 304 | 1.4552 | - |
| 0.5690 | 305 | 0.7346 | - |
| 0.5709 | 306 | 1.2259 | - |
| 0.5728 | 307 | 0.5651 | - |
| 0.5746 | 308 | 0.5246 | - |
| 0.5765 | 309 | 0.5817 | - |
| 0.5784 | 310 | 1.0662 | 0.2983 |
| 0.5802 | 311 | 1.2422 | - |
| 0.5821 | 312 | 0.9479 | - |
| 0.5840 | 313 | 0.8528 | - |
| 0.5858 | 314 | 0.9502 | - |
| 0.5877 | 315 | 1.0885 | - |
| 0.5896 | 316 | 1.4663 | - |
| 0.5914 | 317 | 0.6274 | - |
| 0.5933 | 318 | 1.0567 | - |
| 0.5951 | 319 | 1.4394 | - |
| 0.5970 | 320 | 0.455 | 0.2463 |
| 0.5989 | 321 | 0.5577 | - |
| 0.6007 | 322 | 0.7305 | - |
| 0.6026 | 323 | 1.3569 | - |
| 0.6045 | 324 | 1.9528 | - |
| 0.6063 | 325 | 0.7332 | - |
| 0.6082 | 326 | 1.6955 | - |
| 0.6101 | 327 | 1.5237 | - |
| 0.6119 | 328 | 2.0396 | - |
| 0.6138 | 329 | 1.913 | - |
| 0.6157 | 330 | 1.8478 | 0.0902 |
| 0.6175 | 331 | 2.7965 | - |
| 0.6194 | 332 | 2.4383 | - |
| 0.6213 | 333 | 3.3085 | - |
| 0.6231 | 334 | 2.4657 | - |
| 0.625 | 335 | 2.3933 | - |
| 0.6269 | 336 | 2.3603 | - |
| 0.6287 | 337 | 1.3248 | - |
| 0.6306 | 338 | 1.568 | - |
| 0.6325 | 339 | 1.6271 | - |
| 0.6343 | 340 | 1.3838 | 0.1664 |
| 0.6362 | 341 | 2.0098 | - |
| 0.6381 | 342 | 1.7105 | - |
| 0.6399 | 343 | 1.2461 | - |
| 0.6418 | 344 | 1.293 | - |
| 0.6437 | 345 | 1.4298 | - |
| 0.6455 | 346 | 1.7789 | - |
| 0.6474 | 347 | 1.0361 | - |
| 0.6493 | 348 | 0.6129 | - |
| 0.6511 | 349 | 1.5476 | - |
| 0.6530 | 350 | 0.8251 | 0.2059 |
| 0.6549 | 351 | 0.9453 | - |
| 0.6567 | 352 | 1.1893 | - |
| 0.6586 | 353 | 0.7976 | - |
| 0.6604 | 354 | 0.5457 | - |
| 0.6623 | 355 | 0.6489 | - |
| 0.6642 | 356 | 1.0474 | - |
| 0.6660 | 357 | 1.0201 | - |
| 0.6679 | 358 | 0.5917 | - |
| 0.6698 | 359 | 1.0068 | - |
| 0.6716 | 360 | 0.5708 | 0.2568 |
| 0.6735 | 361 | 0.6778 | - |
| 0.6754 | 362 | 0.5382 | - |
| 0.6772 | 363 | 0.9939 | - |
| 0.6791 | 364 | 0.7322 | - |
| 0.6810 | 365 | 1.1926 | - |
| 0.6828 | 366 | 1.5369 | - |
| 0.6847 | 367 | 0.9815 | - |
| 0.6866 | 368 | 0.8891 | - |
| 0.6884 | 369 | 1.2503 | - |
| 0.6903 | 370 | 0.9369 | 0.2584 |
| 0.6922 | 371 | 0.538 | - |
| 0.6940 | 372 | 0.7312 | - |
| 0.6959 | 373 | 1.1477 | - |
| 0.6978 | 374 | 1.9885 | - |
| 0.6996 | 375 | 0.9605 | - |
| 0.7015 | 376 | 0.7769 | - |
| 0.7034 | 377 | 0.7701 | - |
| 0.7052 | 378 | 0.7166 | - |
| 0.7071 | 379 | 0.9712 | - |
| 0.7090 | 380 | 0.2171 | 0.3315 |
| 0.7108 | 381 | 1.1501 | - |
| 0.7127 | 382 | 0.9079 | - |
| 0.7146 | 383 | 0.3611 | - |
| 0.7164 | 384 | 0.1937 | - |
| 0.7183 | 385 | 0.5164 | - |
| 0.7201 | 386 | 1.4014 | - |
| 0.7220 | 387 | 0.5033 | - |
| 0.7239 | 388 | 0.7722 | - |
| 0.7257 | 389 | 0.1686 | - |
| 0.7276 | 390 | 0.5965 | 0.3521 |
| 0.7295 | 391 | 0.2465 | - |
| 0.7313 | 392 | 0.2342 | - |
| 0.7332 | 393 | 0.6155 | - |
| 0.7351 | 394 | 0.6689 | - |
| 0.7369 | 395 | 0.4981 | - |
| 0.7388 | 396 | 0.4915 | - |
| 0.7407 | 397 | 0.5064 | - |
| 0.7425 | 398 | 1.244 | - |
| 0.7444 | 399 | 0.8528 | - |
| 0.7463 | 400 | 0.6747 | 0.3463 |
| 0.7481 | 401 | 0.3525 | - |
| 0.75 | 402 | 1.2951 | - |
| 0.7519 | 403 | 0.6925 | - |
| 0.7537 | 404 | 0.7087 | - |
| 0.7556 | 405 | 0.1436 | - |
| 0.7575 | 406 | 0.6327 | - |
| 0.7593 | 407 | 0.3393 | - |
| 0.7612 | 408 | 0.5633 | - |
| 0.7631 | 409 | 0.6249 | - |
| 0.7649 | 410 | 1.5898 | 0.3513 |
| 0.7668 | 411 | 0.6968 | - |
| 0.7687 | 412 | 0.9603 | - |
| 0.7705 | 413 | 0.4476 | - |
| 0.7724 | 414 | 0.9167 | - |
| 0.7743 | 415 | 1.2049 | - |
| 0.7761 | 416 | 0.4518 | - |
| 0.7780 | 417 | 0.6315 | - |
| 0.7799 | 418 | 0.2537 | - |
| 0.7817 | 419 | 0.6812 | - |
| 0.7836 | 420 | 0.6971 | 0.3573 |
| 0.7854 | 421 | 0.6064 | - |
| 0.7873 | 422 | 0.4359 | - |
| 0.7892 | 423 | 0.4889 | - |
| 0.7910 | 424 | 0.7253 | - |
| 0.7929 | 425 | 0.519 | - |
| 0.7948 | 426 | 0.2237 | - |
| 0.7966 | 427 | 0.3144 | - |
| 0.7985 | 428 | 0.7395 | - |
| 0.8004 | 429 | 0.5903 | - |
| 0.8022 | 430 | 1.3353 | 0.3664 |
| 0.8041 | 431 | 0.5381 | - |
| 0.8060 | 432 | 0.5692 | - |
| 0.8078 | 433 | 0.3789 | - |
| 0.8097 | 434 | 0.4091 | - |
| 0.8116 | 435 | 0.4686 | - |
| 0.8134 | 436 | 0.5685 | - |
| 0.8153 | 437 | 0.5923 | - |
| 0.8172 | 438 | 0.2288 | - |
| 0.8190 | 439 | 0.5233 | - |
| 0.8209 | 440 | 0.7775 | 0.3810 |
| 0.8228 | 441 | 1.1349 | - |
| 0.8246 | 442 | 0.3454 | - |
| 0.8265 | 443 | 0.3732 | - |
| 0.8284 | 444 | 0.2545 | - |
| 0.8302 | 445 | 0.6133 | - |
| 0.8321 | 446 | 0.3711 | - |
| 0.8340 | 447 | 0.2668 | - |
| 0.8358 | 448 | 0.9298 | - |
| 0.8377 | 449 | 0.5457 | - |
| 0.8396 | 450 | 0.5153 | 0.3762 |
| 0.8414 | 451 | 0.7944 | - |
| 0.8433 | 452 | 0.274 | - |
| 0.8451 | 453 | 0.1943 | - |
| 0.8470 | 454 | 0.865 | - |
| 0.8489 | 455 | 0.577 | - |
| 0.8507 | 456 | 0.1895 | - |
| 0.8526 | 457 | 0.284 | - |
| 0.8545 | 458 | 0.2472 | - |
| 0.8563 | 459 | 0.3254 | - |
| 0.8582 | 460 | 0.9113 | 0.3778 |
| 0.8601 | 461 | 0.4037 | - |
| 0.8619 | 462 | 0.2395 | - |
| 0.8638 | 463 | 0.9176 | - |
| 0.8657 | 464 | 0.1605 | - |
| 0.8675 | 465 | 0.2563 | - |
| 0.8694 | 466 | 0.403 | - |
| 0.8713 | 467 | 0.6036 | - |
| 0.8731 | 468 | 0.368 | - |
| 0.875 | 469 | 0.3447 | - |
| 0.8769 | 470 | 0.1836 | 0.3848 |
| 0.8787 | 471 | 0.4374 | - |
| 0.8806 | 472 | 0.1704 | - |
| 0.8825 | 473 | 0.326 | - |
| 0.8843 | 474 | 0.3527 | - |
| 0.8862 | 475 | 0.8108 | - |
| 0.8881 | 476 | 0.7219 | - |
| 0.8899 | 477 | 0.2727 | - |
| 0.8918 | 478 | 0.6034 | - |
| 0.8937 | 479 | 0.8513 | - |
| 0.8955 | 480 | 0.2772 | 0.3935 |
| 0.8974 | 481 | 0.4888 | - |
| 0.8993 | 482 | 0.6024 | - |
| 0.9011 | 483 | 1.1502 | - |
| 0.9030 | 484 | 0.5434 | - |
| 0.9049 | 485 | 0.2632 | - |
| 0.9067 | 486 | 0.0767 | - |
| 0.9086 | 487 | 0.5782 | - |
| 0.9104 | 488 | 0.6047 | - |
| 0.9123 | 489 | 0.7541 | - |
| 0.9142 | 490 | 0.2185 | 0.3965 |
| 0.9160 | 491 | 0.1558 | - |
| 0.9179 | 492 | 0.1106 | - |
| 0.9198 | 493 | 0.7286 | - |
| 0.9216 | 494 | 0.1932 | - |
| 0.9235 | 495 | 0.6639 | - |
| 0.9254 | 496 | 0.422 | - |
| 0.9272 | 497 | 0.7506 | - |
| 0.9291 | 498 | 0.1227 | - |
| 0.9310 | 499 | 0.8022 | - |
| **0.9328** | **500** | **0.2475** | **0.3951** |
| 0.9347 | 501 | 0.3068 | - |
| 0.9366 | 502 | 0.9188 | - |
| 0.9384 | 503 | 0.3704 | - |
| 0.9403 | 504 | 0.2393 | - |
| 0.9422 | 505 | 0.7569 | - |
| 0.9440 | 506 | 0.3823 | - |
| 0.9459 | 507 | 0.1712 | - |
| 0.9478 | 508 | 0.3331 | - |
| 0.9496 | 509 | 0.3538 | - |
| 0.9515 | 510 | 0.4431 | 0.3976 |
| 0.9534 | 511 | 0.422 | - |
| 0.9552 | 512 | 0.3282 | - |
| 0.9571 | 513 | 0.5834 | - |
| 0.9590 | 514 | 1.1424 | - |
| 0.9608 | 515 | 0.8699 | - |
| 0.9627 | 516 | 0.2811 | - |
| 0.9646 | 517 | 0.0964 | - |
| 0.9664 | 518 | 0.2971 | - |
| 0.9683 | 519 | 0.2435 | - |
| 0.9701 | 520 | 1.1154 | 0.3987 |
| 0.9720 | 521 | 0.2209 | - |
| 0.9739 | 522 | 0.1551 | - |
| 0.9757 | 523 | 0.3366 | - |
| 0.9776 | 524 | 0.5526 | - |
| 0.9795 | 525 | 0.3624 | - |
| 0.9813 | 526 | 0.3311 | - |
| 0.9832 | 527 | 0.7184 | - |
| 0.9851 | 528 | 0.893 | - |
| 0.9869 | 529 | 0.2642 | - |
| 0.9888 | 530 | 0.4994 | 0.3986 |
| 0.9907 | 531 | 0.6881 | - |
| 0.9925 | 532 | 0.2637 | - |
| 0.9944 | 533 | 0.6997 | - |
| 0.9963 | 534 | 0.3827 | - |
| 0.9981 | 535 | 0.4079 | - |
| 1.0 | 536 | 0.0003 | - |

* The bold row denotes the saved checkpoint.
</details>

### Framework Versions
- Python: 3.11.11
- Sentence Transformers: 3.4.1
- Transformers: 4.51.1
- PyTorch: 2.5.1+cu124
- Accelerate: 1.3.0
- Datasets: 3.5.0
- Tokenizers: 0.21.0

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### CachedMultipleNegativesRankingLoss
```bibtex
@misc{gao2021scaling,
    title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
    author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
    year={2021},
    eprint={2101.06983},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->
adapter_config.json ADDED
@@ -0,0 +1,37 @@
{
  "alpha_pattern": {},
  "auto_mapping": null,
  "base_model_name_or_path": "dunzhang/stella_en_1.5B_v5",
  "bias": "none",
  "eva_config": null,
  "exclude_modules": null,
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "layer_replication": null,
  "layers_pattern": null,
  "layers_to_transform": null,
  "loftq_config": {},
  "lora_alpha": 96,
  "lora_bias": false,
  "lora_dropout": 0.01,
  "megatron_config": null,
  "megatron_core": "megatron.core",
  "modules_to_save": null,
  "peft_type": "LORA",
  "r": 48,
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
    "q_proj",
    "k_proj",
    "up_proj",
    "gate_proj",
    "down_proj",
    "o_proj",
    "v_proj"
  ],
  "task_type": "FEATURE_EXTRACTION",
  "use_dora": false,
  "use_rslora": false
}
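
For readers reconstructing the setup: the adapter above is a LoRA of rank 48 (alpha 96, dropout 0.01) over all attention and MLP projections of the Qwen2 backbone. A hedged sketch using the standard `peft` API — an equivalent configuration, not the author's actual training script:

```python
from peft import LoraConfig, TaskType

# Mirrors adapter_config.json above.
lora_config = LoraConfig(
    task_type=TaskType.FEATURE_EXTRACTION,
    r=48,
    lora_alpha=96,
    lora_dropout=0.01,
    bias="none",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
# Typically applied with peft.get_peft_model(base_model, lora_config) before fine-tuning.
```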
adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8852785838c6c9efee5560b89cc3948f005cafa3a73fe7846630d0911de16ca9
size 221627416
added_tokens.json ADDED
@@ -0,0 +1,5 @@
{
  "<|endoftext|>": 151643,
  "<|im_end|>": 151645,
  "<|im_start|>": 151644
}
config_sentence_transformers.json ADDED
@@ -0,0 +1,13 @@
{
  "__version__": {
    "sentence_transformers": "3.4.1",
    "transformers": "4.51.1",
    "pytorch": "2.5.1+cu124"
  },
  "prompts": {
    "s2p_query": "Instruct: Given a web search query, retrieve relevant passages that answer the query.\nQuery: ",
    "s2s_query": "Instruct: Retrieve semantically similar text.\nQuery: "
  },
  "default_prompt_name": null,
  "similarity_fn_name": "cosine"
}
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
modules.json ADDED
@@ -0,0 +1,20 @@
[
  {
    "idx": 0,
    "name": "0",
    "path": "",
    "type": "sentence_transformers.models.Transformer"
  },
  {
    "idx": 1,
    "name": "1",
    "path": "1_Pooling",
    "type": "sentence_transformers.models.Pooling"
  },
  {
    "idx": 2,
    "name": "2",
    "path": "2_Dense",
    "type": "sentence_transformers.models.Dense"
  }
]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
{
  "max_seq_length": 512,
  "do_lower_case": false
}
special_tokens_map.json ADDED
@@ -0,0 +1,20 @@
{
  "additional_special_tokens": [
    "<|im_start|>",
    "<|im_end|>"
  ],
  "eos_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2f79052deba517b0663d877714e117a31a4a6243cddb85fc4443c80a2fa65a20
size 11419302
tokenizer_config.json ADDED
@@ -0,0 +1,50 @@
{
  "add_prefix_space": false,
  "added_tokens_decoder": {
    "151643": {
      "content": "<|endoftext|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151644": {
      "content": "<|im_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151645": {
      "content": "<|im_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "additional_special_tokens": [
    "<|im_start|>",
    "<|im_end|>"
  ],
  "auto_map": {
    "AutoTokenizer": [
      "dunzhang/stella_en_1.5B_v5--tokenization_qwen.Qwen2Tokenizer",
      "dunzhang/stella_en_1.5B_v5--tokenization_qwen.Qwen2TokenizerFast"
    ]
  },
  "bos_token": null,
  "chat_template": "{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}",
  "clean_up_tokenization_spaces": false,
  "eos_token": "<|endoftext|>",
  "errors": "replace",
  "extra_special_tokens": {},
  "model_max_length": 512,
  "pad_token": "<|endoftext|>",
  "split_special_tokens": false,
  "tokenizer_class": "Qwen2Tokenizer",
  "unk_token": null
}
training_args.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:09e2f89f3fd81076a08f68ebd8d8bcbea9efc13ef33bb2c17fb93326deefd102
size 5624
vocab.json ADDED
The diff for this file is too large to render. See raw diff