Commit 68c90ab (verified) by GphaHoa
1 Parent(s): 8412c6e

Upload 13 files

README.md CHANGED
@@ -1,3 +1,366 @@
1
- ---
2
- license: apache-2.0
3
- ---
1
+ ---
2
+ tags:
3
+ - sentence-transformers
4
+ - cross-encoder
5
+ - generated_from_trainer
6
+ - dataset_size:1122150
7
+ - loss:BinaryCrossEntropyLoss
8
+ base_model: cross-encoder/stsb-distilroberta-base
9
+ pipeline_tag: text-ranking
10
+ library_name: sentence-transformers
11
+ metrics:
12
+ - map
13
+ - mrr@50
14
+ - ndcg@50
15
+ model-index:
16
+ - name: CrossEncoder based on cross-encoder/stsb-distilroberta-base
17
+ results:
18
+ - task:
19
+ type: cross-encoder-reranking
20
+ name: Cross Encoder Reranking
21
+ dataset:
22
+ name: reranking dev
23
+ type: reranking-dev
24
+ metrics:
25
+ - type: map
26
+ value: 0.6701
27
+ name: Map
28
+ - type: mrr@50
29
+ value: 0.7572
30
+ name: Mrr@50
31
+ - type: ndcg@50
32
+ value: 0.775
33
+ name: Ndcg@50
34
+ ---
35
+
36
+ # CrossEncoder based on cross-encoder/stsb-distilroberta-base
37
+
38
+ This is a [Cross Encoder](https://www.sbert.net/docs/cross_encoder/usage/usage.html) model finetuned from [cross-encoder/stsb-distilroberta-base](https://huggingface.co/cross-encoder/stsb-distilroberta-base) using the [sentence-transformers](https://www.SBERT.net) library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.
39
+
40
+ ## Model Details
41
+
42
+ ### Model Description
43
+ - **Model Type:** Cross Encoder
44
+ - **Base model:** [cross-encoder/stsb-distilroberta-base](https://huggingface.co/cross-encoder/stsb-distilroberta-base) <!-- at revision 6b71347df6e2b34246b53e06d6bce70ef67de368 -->
45
+ - **Maximum Sequence Length:** 128 tokens
46
+ - **Number of Output Labels:** 1 label
47
+ <!-- - **Training Dataset:** Unknown -->
48
+ <!-- - **Language:** Unknown -->
49
+ <!-- - **License:** Unknown -->
50
+
51
+ ### Model Sources
52
+
53
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
54
+ - **Documentation:** [Cross Encoder Documentation](https://www.sbert.net/docs/cross_encoder/usage/usage.html)
55
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
56
+ - **Hugging Face:** [Cross Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=cross-encoder)
57
+
58
+ ## Usage
59
+
60
+ ### Direct Usage (Sentence Transformers)
61
+
62
+ First install the Sentence Transformers library:
63
+
64
+ ```bash
65
+ pip install -U sentence-transformers
66
+ ```
67
+
68
+ Then you can load this model and run inference.
69
+ ```python
70
+ from sentence_transformers import CrossEncoder
71
+
72
+ # Download from the 🤗 Hub
73
+ model = CrossEncoder("cross_encoder_model_id")  # replace with this repository's model id on the Hub
74
+ # Get scores for pairs of texts
75
+ pairs = [
76
+ ['Triangles and polygons. Space, shape and measurement. Form 1. Malawi Mathematics Syllabus. Learning outcomes: students must be able to solve problems involving angles, triangles and polygons including: types of triangles, calculate the interior and exterior angles of a triangle, different types of polygons, interior angles and', 'Discussing different types of quadrilaterals. Quadrilateral is a closed figure with four line segments. Each point where the two line segments meet is called a vertex. The closed figure also form four angles.. Discussing different types of quadrilaterals Albert Mhango, Mzimba Introduction: Quad'],
77
+ ['Triangles and polygons. Space, shape and measurement. Form 1. Malawi Mathematics Syllabus. Learning outcomes: students must be able to solve problems involving angles, triangles and polygons including: types of triangles, calculate the interior and exterior angles of a triangle, different types of polygons, interior angles and', 'Discussing properties of quadrilaterals. The common properties that you will see in every quadrilateral include; all quadrilaterals have four sides, they all consist of four vertices and the sum of interior angles is equal to 360 degrees.. Discussing properties of quadrilaterals Albert Mhango,'],
78
+ ['Triangles and polygons. Space, shape and measurement. Form 1. Malawi Mathematics Syllabus. Learning outcomes: students must be able to solve problems involving angles, triangles and polygons including: types of triangles, calculate the interior and exterior angles of a triangle, different types of polygons, interior angles and', 'Calculating interior and exterior angles of a triangle. The exterior angle of a triangle is equal to the sum of two opposite interior angles. This property will help you to find angles in a triangle and exterior angles.. Calculating interior and exterior angles of a triangle Albert Mhango, Mzimba Introduction: The'],
79
+ ['Triangles and polygons. Space, shape and measurement. Form 1. Malawi Mathematics Syllabus. Learning outcomes: students must be able to solve problems involving angles, triangles and polygons including: types of triangles, calculate the interior and exterior angles of a triangle, different types of polygons, interior angles and', 'Using properties of quadrilaterals to solve problems. A quadrilateral is a geometric figure with four sides. The general properties of quadrilaterals include; they all have four sides, have two diagonals, have four interior angles and the sum of their interior angles is equal to 360 degrees.. Using properties'],
80
+ ['Triangles and polygons. Space, shape and measurement. Form 1. Malawi Mathematics Syllabus. Learning outcomes: students must be able to solve problems involving angles, triangles and polygons including: types of triangles, calculate the interior and exterior angles of a triangle, different types of polygons, interior angles and', 'Solving Problems Involving Polygons. In this lesson, you will learn how we can use the exterior angles of polygon formula to solve problems.. Solving Problems Involving Polygons Mary Chagwa, Blantyre Introduction: In the previous lesson, you were deriving the formula for finding the sum'],
81
+ ]
82
+ scores = model.predict(pairs)
83
+ print(scores.shape)
84
+ # (5,)
85
+
86
+ # Or rank different texts based on similarity to a single text
87
+ ranks = model.rank(
88
+ 'Triangles and polygons. Space, shape and measurement. Form 1. Malawi Mathematics Syllabus. Learning outcomes: students must be able to solve problems involving angles, triangles and polygons including: types of triangles, calculate the interior and exterior angles of a triangle, different types of polygons, interior angles and',
89
+ [
90
+ 'Discussing different types of quadrilaterals. Quadrilateral is a closed figure with four line segments. Each point where the two line segments meet is called a vertex. The closed figure also form four angles.. Discussing different types of quadrilaterals Albert Mhango, Mzimba Introduction: Quad',
91
+ 'Discussing properties of quadrilaterals. The common properties that you will see in every quadrilateral include; all quadrilaterals have four sides, they all consist of four vertices and the sum of interior angles is equal to 360 degrees.. Discussing properties of quadrilaterals Albert Mhango,',
92
+ 'Calculating interior and exterior angles of a triangle. The exterior angle of a triangle is equal to the sum of two opposite interior angles. This property will help you to find angles in a triangle and exterior angles.. Calculating interior and exterior angles of a triangle Albert Mhango, Mzimba Introduction: The',
93
+ 'Using properties of quadrilaterals to solve problems. A quadrilateral is a geometric figure with four sides. The general properties of quadrilaterals include; they all have four sides, have two diagonals, have four interior angles and the sum of their interior angles is equal to 360 degrees.. Using properties',
94
+ 'Solving Problems Involving Polygons. In this lesson, you will learn how we can use the exterior angles of polygon formula to solve problems.. Solving Problems Involving Polygons Mary Chagwa, Blantyre Introduction: In the previous lesson, you were deriving the formula for finding the sum',
95
+ ]
96
+ )
97
+ # [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
98
+ ```
99
+
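Because training used a binary cross-entropy objective and the exported configuration applies a sigmoid activation at inference, the predicted scores fall in the 0–1 range and can be read as relevance estimates. The sketch below turns them into hard accept/reject decisions; the 0.5 cut-off and the pair texts are illustrative assumptions, not values taken from training.

```python
import numpy as np
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross_encoder_model_id")  # replace with this repository's model id

# Placeholder pairs; a 0.5 threshold is an assumption — tune it on your own validation data.
pairs = [
    ["a syllabus topic description", "a candidate lesson passage"],
    ["a syllabus topic description", "an unrelated passage"],
]
scores = model.predict(pairs)
relevant = np.asarray(scores) >= 0.5
print(list(zip(scores.tolist(), relevant.tolist())))
```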
100
+ <!--
101
+ ### Direct Usage (Transformers)
102
+
103
+ <details><summary>Click to see the direct usage in Transformers</summary>
104
+
105
+ </details>
106
+ -->
107
+
108
+ <!--
109
+ ### Downstream Usage (Sentence Transformers)
110
+
111
+ You can finetune this model on your own dataset.
112
+
113
+ <details><summary>Click to expand</summary>
114
+
115
+ </details>
116
+ -->
117
+
118
+ <!--
119
+ ### Out-of-Scope Use
120
+
121
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
122
+ -->
123
+
124
+ ## Evaluation
125
+
126
+ ### Metrics
127
+
128
+ #### Cross Encoder Reranking
129
+
130
+ * Dataset: `reranking-dev`
131
+ * Evaluated with [<code>CrossEncoderRerankingEvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CrossEncoderRerankingEvaluator) with these parameters:
132
+ ```json
133
+ {
134
+ "at_k": 50,
135
+ "always_rerank_positives": false
136
+ }
137
+ ```
138
+
139
+ | Metric | Value |
140
+ |:------------|:---------------------|
141
+ | map | 0.6701 (+0.0486) |
142
+ | mrr@50 | 0.7572 (+0.0196) |
143
+ | **ndcg@50** | **0.7750 (+0.0495)** |
144
+
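The figures above were produced with the `CrossEncoderRerankingEvaluator` linked above. A minimal sketch of how such an evaluation can be reproduced is shown below; the query, positive and negative texts are placeholders, since the development split itself is not distributed with this repository.

```python
from sentence_transformers import CrossEncoder
from sentence_transformers.cross_encoder.evaluation import CrossEncoderRerankingEvaluator

model = CrossEncoder("cross_encoder_model_id")  # replace with this repository's model id

# Placeholder samples: each entry pairs a topic (query) with passages known to be
# relevant ("positive") or irrelevant ("negative") to it.
samples = [
    {
        "query": "Triangles and polygons. Space, shape and measurement. Form 1. ...",
        "positive": ["Calculating interior and exterior angles of a triangle. ..."],
        "negative": ["A passage about an unrelated topic."],
    },
]

evaluator = CrossEncoderRerankingEvaluator(
    samples,
    at_k=50,
    always_rerank_positives=False,
    name="reranking-dev",
)
results = evaluator(model)
print(results)  # map, mrr@50 and ndcg@50 reported under the "reranking-dev" prefix
```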
145
+ <!--
146
+ ## Bias, Risks and Limitations
147
+
148
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
149
+ -->
150
+
151
+ <!--
152
+ ### Recommendations
153
+
154
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
155
+ -->
156
+
157
+ ## Training Details
158
+
159
+ ### Training Dataset
160
+
161
+ #### Unnamed Dataset
162
+
163
+ * Size: 1,122,150 training samples
164
+ * Columns: <code>topic</code>, <code>content</code>, and <code>label</code>
165
+ * Approximate statistics based on the first 1000 samples:
166
+ | | topic | content | label |
167
+ |:--------|:------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------|:-----------------------------------------------|
168
+ | type | string | string | int |
169
+ | details | <ul><li>min: 42 characters</li><li>mean: 147.6 characters</li><li>max: 336 characters</li></ul> | <ul><li>min: 5 characters</li><li>mean: 148.86 characters</li><li>max: 376 characters</li></ul> | <ul><li>0: ~90.70%</li><li>1: ~9.30%</li></ul> |
170
+ * Samples:
171
+ | topic | content | label |
172
+ |:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------|
173
+ | <code>Triangles and polygons. Space, shape and measurement. Form 1. Malawi Mathematics Syllabus. Learning outcomes: students must be able to solve problems involving angles, triangles and polygons including: types of triangles, calculate the interior and exterior angles of a triangle, different types of polygons, interior angles and</code> | <code>Discussing different types of quadrilaterals. Quadrilateral is a closed figure with four line segments. Each point where the two line segments meet is called a vertex. The closed figure also form four angles.. Discussing different types of quadrilaterals Albert Mhango, Mzimba Introduction: Quad</code> | <code>1</code> |
174
+ | <code>Triangles and polygons. Space, shape and measurement. Form 1. Malawi Mathematics Syllabus. Learning outcomes: students must be able to solve problems involving angles, triangles and polygons including: types of triangles, calculate the interior and exterior angles of a triangle, different types of polygons, interior angles and</code> | <code>Discussing properties of quadrilaterals. The common properties that you will see in every quadrilateral include; all quadrilaterals have four sides, they all consist of four vertices and the sum of interior angles is equal to 360 degrees.. Discussing properties of quadrilaterals Albert Mhango,</code> | <code>1</code> |
175
+ | <code>Triangles and polygons. Space, shape and measurement. Form 1. Malawi Mathematics Syllabus. Learning outcomes: students must be able to solve problems involving angles, triangles and polygons including: types of triangles, calculate the interior and exterior angles of a triangle, different types of polygons, interior angles and</code> | <code>Calculating interior and exterior angles of a triangle. The exterior angle of a triangle is equal to the sum of two opposite interior angles. This property will help you to find angles in a triangle and exterior angles.. Calculating interior and exterior angles of a triangle Albert Mhango, Mzimba Introduction: The</code> | <code>1</code> |
176
+ * Loss: [<code>BinaryCrossEntropyLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#binarycrossentropyloss) with these parameters:
177
+ ```json
178
+ {
179
+ "activation_fn": "torch.nn.modules.linear.Identity",
180
+ "pos_weight": 11.914752960205078
181
+ }
182
+ ```
183
+
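As a rough reconstruction, the parameters above correspond to instantiating the loss as follows; the `pos_weight` of roughly 11.9 up-weights the ~9% positive pairs so that they balance the far more frequent negatives.

```python
import torch
from sentence_transformers import CrossEncoder
from sentence_transformers.cross_encoder.losses import BinaryCrossEntropyLoss

model = CrossEncoder("cross-encoder/stsb-distilroberta-base")

# Identity activation inside the loss (logits go straight into BCE-with-logits),
# with positives up-weighted to offset the class imbalance in the training data.
loss = BinaryCrossEntropyLoss(model, pos_weight=torch.tensor(11.914752960205078))
```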
184
+ ### Training Hyperparameters
185
+ #### Non-Default Hyperparameters
186
+
187
+ - `eval_strategy`: steps
188
+ - `per_device_train_batch_size`: 128
189
+ - `per_device_eval_batch_size`: 128
190
+ - `learning_rate`: 2e-05
191
+ - `num_train_epochs`: 2
192
+ - `warmup_ratio`: 0.1
193
+ - `seed`: 12
194
+ - `bf16`: True
195
+ - `dataloader_num_workers`: 4
196
+ - `load_best_model_at_end`: True
197
+
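A minimal sketch of how the non-default values above map onto `CrossEncoderTrainingArguments` is given below; the output directory is a placeholder, and everything not listed is left at its default.

```python
from sentence_transformers.cross_encoder import CrossEncoderTrainingArguments

args = CrossEncoderTrainingArguments(
    output_dir="cross_encoder_distilroberta_base_all_data",  # placeholder path
    eval_strategy="steps",
    per_device_train_batch_size=128,
    per_device_eval_batch_size=128,
    learning_rate=2e-5,
    num_train_epochs=2,
    warmup_ratio=0.1,
    seed=12,
    bf16=True,
    dataloader_num_workers=4,
    load_best_model_at_end=True,
)
```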
198
+ #### All Hyperparameters
199
+ <details><summary>Click to expand</summary>
200
+
201
+ - `overwrite_output_dir`: False
202
+ - `do_predict`: False
203
+ - `eval_strategy`: steps
204
+ - `prediction_loss_only`: True
205
+ - `per_device_train_batch_size`: 128
206
+ - `per_device_eval_batch_size`: 128
207
+ - `per_gpu_train_batch_size`: None
208
+ - `per_gpu_eval_batch_size`: None
209
+ - `gradient_accumulation_steps`: 1
210
+ - `eval_accumulation_steps`: None
211
+ - `torch_empty_cache_steps`: None
212
+ - `learning_rate`: 2e-05
213
+ - `weight_decay`: 0.0
214
+ - `adam_beta1`: 0.9
215
+ - `adam_beta2`: 0.999
216
+ - `adam_epsilon`: 1e-08
217
+ - `max_grad_norm`: 1.0
218
+ - `num_train_epochs`: 2
219
+ - `max_steps`: -1
220
+ - `lr_scheduler_type`: linear
221
+ - `lr_scheduler_kwargs`: {}
222
+ - `warmup_ratio`: 0.1
223
+ - `warmup_steps`: 0
224
+ - `log_level`: passive
225
+ - `log_level_replica`: warning
226
+ - `log_on_each_node`: True
227
+ - `logging_nan_inf_filter`: True
228
+ - `save_safetensors`: True
229
+ - `save_on_each_node`: False
230
+ - `save_only_model`: False
231
+ - `restore_callback_states_from_checkpoint`: False
232
+ - `no_cuda`: False
233
+ - `use_cpu`: False
234
+ - `use_mps_device`: False
235
+ - `seed`: 12
236
+ - `data_seed`: None
237
+ - `jit_mode_eval`: False
238
+ - `use_ipex`: False
239
+ - `bf16`: True
240
+ - `fp16`: False
241
+ - `fp16_opt_level`: O1
242
+ - `half_precision_backend`: auto
243
+ - `bf16_full_eval`: False
244
+ - `fp16_full_eval`: False
245
+ - `tf32`: None
246
+ - `local_rank`: 0
247
+ - `ddp_backend`: None
248
+ - `tpu_num_cores`: None
249
+ - `tpu_metrics_debug`: False
250
+ - `debug`: []
251
+ - `dataloader_drop_last`: False
252
+ - `dataloader_num_workers`: 4
253
+ - `dataloader_prefetch_factor`: None
254
+ - `past_index`: -1
255
+ - `disable_tqdm`: False
256
+ - `remove_unused_columns`: True
257
+ - `label_names`: None
258
+ - `load_best_model_at_end`: True
259
+ - `ignore_data_skip`: False
260
+ - `fsdp`: []
261
+ - `fsdp_min_num_params`: 0
262
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
263
+ - `fsdp_transformer_layer_cls_to_wrap`: None
264
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
265
+ - `deepspeed`: None
266
+ - `label_smoothing_factor`: 0.0
267
+ - `optim`: adamw_torch
268
+ - `optim_args`: None
269
+ - `adafactor`: False
270
+ - `group_by_length`: False
271
+ - `length_column_name`: length
272
+ - `ddp_find_unused_parameters`: None
273
+ - `ddp_bucket_cap_mb`: None
274
+ - `ddp_broadcast_buffers`: False
275
+ - `dataloader_pin_memory`: True
276
+ - `dataloader_persistent_workers`: False
277
+ - `skip_memory_metrics`: True
278
+ - `use_legacy_prediction_loop`: False
279
+ - `push_to_hub`: False
280
+ - `resume_from_checkpoint`: None
281
+ - `hub_model_id`: None
282
+ - `hub_strategy`: every_save
283
+ - `hub_private_repo`: None
284
+ - `hub_always_push`: False
285
+ - `gradient_checkpointing`: False
286
+ - `gradient_checkpointing_kwargs`: None
287
+ - `include_inputs_for_metrics`: False
288
+ - `include_for_metrics`: []
289
+ - `eval_do_concat_batches`: True
290
+ - `fp16_backend`: auto
291
+ - `push_to_hub_model_id`: None
292
+ - `push_to_hub_organization`: None
293
+ - `mp_parameters`:
294
+ - `auto_find_batch_size`: False
295
+ - `full_determinism`: False
296
+ - `torchdynamo`: None
297
+ - `ray_scope`: last
298
+ - `ddp_timeout`: 1800
299
+ - `torch_compile`: False
300
+ - `torch_compile_backend`: None
301
+ - `torch_compile_mode`: None
302
+ - `include_tokens_per_second`: False
303
+ - `include_num_input_tokens_seen`: False
304
+ - `neftune_noise_alpha`: None
305
+ - `optim_target_modules`: None
306
+ - `batch_eval_metrics`: False
307
+ - `eval_on_start`: False
308
+ - `use_liger_kernel`: False
309
+ - `eval_use_gather_object`: False
310
+ - `average_tokens_across_devices`: False
311
+ - `prompts`: None
312
+ - `batch_sampler`: batch_sampler
313
+ - `multi_dataset_batch_sampler`: proportional
314
+
315
+ </details>
316
+
317
+ ### Training Logs
318
+ | Epoch | Step | Training Loss | reranking-dev_ndcg@50 |
319
+ |:------:|:-----:|:-------------:|:---------------------:|
320
+ | 1.0001 | 8768 | 0.5739 | 0.7669 (+0.0414) |
321
+ | 1.5002 | 13152 | 0.6846 | 0.7750 (+0.0495) |
322
+
323
+
324
+ ### Framework Versions
325
+ - Python: 3.11.13
326
+ - Sentence Transformers: 4.1.0
327
+ - Transformers: 4.52.4
328
+ - PyTorch: 2.6.0+cu124
329
+ - Accelerate: 1.7.0
330
+ - Datasets: 2.14.4
331
+ - Tokenizers: 0.21.1
332
+
333
+ ## Citation
334
+
335
+ ### BibTeX
336
+
337
+ #### Sentence Transformers
338
+ ```bibtex
339
+ @inproceedings{reimers-2019-sentence-bert,
340
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
341
+ author = "Reimers, Nils and Gurevych, Iryna",
342
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
343
+ month = "11",
344
+ year = "2019",
345
+ publisher = "Association for Computational Linguistics",
346
+ url = "https://arxiv.org/abs/1908.10084",
347
+ }
348
+ ```
349
+
350
+ <!--
351
+ ## Glossary
352
+
353
+ *Clearly define terms in order to be accessible across audiences.*
354
+ -->
355
+
356
+ <!--
357
+ ## Model Card Authors
358
+
359
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
360
+ -->
361
+
362
+ <!--
363
+ ## Model Card Contact
364
+
365
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
366
+ -->
config.json ADDED
@@ -0,0 +1,37 @@
1
+ {
2
+ "architectures": [
3
+ "RobertaForSequenceClassification"
4
+ ],
5
+ "attention_probs_dropout_prob": 0.1,
6
+ "bos_token_id": 0,
7
+ "classifier_dropout": null,
8
+ "eos_token_id": 2,
9
+ "gradient_checkpointing": false,
10
+ "hidden_act": "gelu",
11
+ "hidden_dropout_prob": 0.1,
12
+ "hidden_size": 768,
13
+ "id2label": {
14
+ "0": "LABEL_0"
15
+ },
16
+ "initializer_range": 0.02,
17
+ "intermediate_size": 3072,
18
+ "label2id": {
19
+ "LABEL_0": 0
20
+ },
21
+ "layer_norm_eps": 1e-05,
22
+ "max_position_embeddings": 514,
23
+ "model_type": "roberta",
24
+ "num_attention_heads": 12,
25
+ "num_hidden_layers": 6,
26
+ "pad_token_id": 1,
27
+ "position_embedding_type": "absolute",
28
+ "sentence_transformers": {
29
+ "activation_fn": "torch.nn.modules.activation.Sigmoid",
30
+ "version": "4.1.0"
31
+ },
32
+ "torch_dtype": "float32",
33
+ "transformers_version": "4.52.4",
34
+ "type_vocab_size": 1,
35
+ "use_cache": true,
36
+ "vocab_size": 50265
37
+ }
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:55fcb1227953f704f05ee9c7b79e775047b794021e25a3e8ddbb78945305bef0
3
+ size 328489204
optimizer.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ddac13df73512f6ec7dd409135468f93d79b7202904d70efb9d9824b2b9b4f27
3
+ size 657041466
rng_state.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:1542eb6fb59dc9864cd057e0dd24538894542520653336503f1ebcf817ebacb8
3
+ size 14244
scheduler.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:279c12b1c3f0487a2b66dee686cc0dadaa4f214680c384a16910d2e2fd3d627d
3
+ size 1064
special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
1
+ {
2
+ "bos_token": {
3
+ "content": "<s>",
4
+ "lstrip": false,
5
+ "normalized": true,
6
+ "rstrip": false,
7
+ "single_word": false
8
+ },
9
+ "cls_token": {
10
+ "content": "<s>",
11
+ "lstrip": false,
12
+ "normalized": true,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "eos_token": {
17
+ "content": "</s>",
18
+ "lstrip": false,
19
+ "normalized": true,
20
+ "rstrip": false,
21
+ "single_word": false
22
+ },
23
+ "mask_token": {
24
+ "content": "<mask>",
25
+ "lstrip": true,
26
+ "normalized": true,
27
+ "rstrip": false,
28
+ "single_word": false
29
+ },
30
+ "pad_token": {
31
+ "content": "<pad>",
32
+ "lstrip": false,
33
+ "normalized": true,
34
+ "rstrip": false,
35
+ "single_word": false
36
+ },
37
+ "sep_token": {
38
+ "content": "</s>",
39
+ "lstrip": false,
40
+ "normalized": true,
41
+ "rstrip": false,
42
+ "single_word": false
43
+ },
44
+ "unk_token": {
45
+ "content": "<unk>",
46
+ "lstrip": false,
47
+ "normalized": true,
48
+ "rstrip": false,
49
+ "single_word": false
50
+ }
51
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,59 @@
1
+ {
2
+ "add_prefix_space": false,
3
+ "added_tokens_decoder": {
4
+ "0": {
5
+ "content": "<s>",
6
+ "lstrip": false,
7
+ "normalized": true,
8
+ "rstrip": false,
9
+ "single_word": false,
10
+ "special": true
11
+ },
12
+ "1": {
13
+ "content": "<pad>",
14
+ "lstrip": false,
15
+ "normalized": true,
16
+ "rstrip": false,
17
+ "single_word": false,
18
+ "special": true
19
+ },
20
+ "2": {
21
+ "content": "</s>",
22
+ "lstrip": false,
23
+ "normalized": true,
24
+ "rstrip": false,
25
+ "single_word": false,
26
+ "special": true
27
+ },
28
+ "3": {
29
+ "content": "<unk>",
30
+ "lstrip": false,
31
+ "normalized": true,
32
+ "rstrip": false,
33
+ "single_word": false,
34
+ "special": true
35
+ },
36
+ "50264": {
37
+ "content": "<mask>",
38
+ "lstrip": true,
39
+ "normalized": true,
40
+ "rstrip": false,
41
+ "single_word": false,
42
+ "special": true
43
+ }
44
+ },
45
+ "bos_token": "<s>",
46
+ "clean_up_tokenization_spaces": false,
47
+ "cls_token": "<s>",
48
+ "eos_token": "</s>",
49
+ "errors": "replace",
50
+ "extra_special_tokens": {},
51
+ "full_tokenizer_file": null,
52
+ "mask_token": "<mask>",
53
+ "model_max_length": 128,
54
+ "pad_token": "<pad>",
55
+ "sep_token": "</s>",
56
+ "tokenizer_class": "RobertaTokenizer",
57
+ "trim_offsets": true,
58
+ "unk_token": "<unk>"
59
+ }
trainer_state.json ADDED
@@ -0,0 +1,110 @@
1
+ {
2
+ "best_global_step": 13152,
3
+ "best_metric": 0.7750120213490577,
4
+ "best_model_checkpoint": "content/cross_encoder_distilroberta_base_all_data/checkpoint-13152",
5
+ "epoch": 2.0,
6
+ "eval_steps": 4384,
7
+ "global_step": 17534,
8
+ "is_hyper_param_search": false,
9
+ "is_local_process_zero": true,
10
+ "is_world_process_zero": true,
11
+ "log_history": [
12
+ {
13
+ "epoch": 0.00011406410402646287,
14
+ "grad_norm": 11.324646949768066,
15
+ "learning_rate": 0.0,
16
+ "loss": 1.2833,
17
+ "step": 1
18
+ },
19
+ {
20
+ "epoch": 0.5000570320520132,
21
+ "grad_norm": 8.051932334899902,
22
+ "learning_rate": 1.1112801013941699e-05,
23
+ "loss": 0.8934,
24
+ "step": 4384
25
+ },
26
+ {
27
+ "epoch": 0.5000570320520132,
28
+ "eval_reranking-dev_base_map": 0.6214825536231217,
29
+ "eval_reranking-dev_base_mrr@50": 0.7375349668670806,
30
+ "eval_reranking-dev_base_ndcg@50": 0.725527756915131,
31
+ "eval_reranking-dev_map": 0.6377966435473792,
32
+ "eval_reranking-dev_mrr@50": 0.7335214157004677,
33
+ "eval_reranking-dev_ndcg@50": 0.7532420097623872,
34
+ "eval_runtime": 210.6845,
35
+ "eval_samples_per_second": 0.0,
36
+ "eval_steps_per_second": 0.0,
37
+ "step": 4384
38
+ },
39
+ {
40
+ "epoch": 1.0001140641040265,
41
+ "grad_norm": 9.359151840209961,
42
+ "learning_rate": 0.0,
43
+ "loss": 0.5739,
44
+ "step": 8768
45
+ },
46
+ {
47
+ "epoch": 1.0001140641040265,
48
+ "eval_reranking-dev_base_map": 0.6214825536231217,
49
+ "eval_reranking-dev_base_mrr@50": 0.7375349668670806,
50
+ "eval_reranking-dev_base_ndcg@50": 0.725527756915131,
51
+ "eval_reranking-dev_map": 0.658237243359361,
52
+ "eval_reranking-dev_mrr@50": 0.7475440647504192,
53
+ "eval_reranking-dev_ndcg@50": 0.7669254328128408,
54
+ "eval_runtime": 213.929,
55
+ "eval_samples_per_second": 0.0,
56
+ "eval_steps_per_second": 0.0,
57
+ "step": 8768
58
+ },
59
+ {
60
+ "epoch": 1.5001710961560397,
61
+ "grad_norm": 12.708351135253906,
62
+ "learning_rate": 5.555133079847909e-06,
63
+ "loss": 0.6846,
64
+ "step": 13152
65
+ },
66
+ {
67
+ "epoch": 1.5001710961560397,
68
+ "eval_reranking-dev_base_map": 0.6214825536231217,
69
+ "eval_reranking-dev_base_mrr@50": 0.7375349668670806,
70
+ "eval_reranking-dev_base_ndcg@50": 0.725527756915131,
71
+ "eval_reranking-dev_map": 0.6701267463119136,
72
+ "eval_reranking-dev_mrr@50": 0.7571781873839967,
73
+ "eval_reranking-dev_ndcg@50": 0.7750120213490577,
74
+ "eval_runtime": 212.3233,
75
+ "eval_samples_per_second": 0.0,
76
+ "eval_steps_per_second": 0.0,
77
+ "step": 13152
78
+ }
79
+ ],
80
+ "logging_steps": 4384,
81
+ "max_steps": 17534,
82
+ "num_input_tokens_seen": 0,
83
+ "num_train_epochs": 2,
84
+ "save_steps": 4384,
85
+ "stateful_callbacks": {
86
+ "EarlyStoppingCallback": {
87
+ "args": {
88
+ "early_stopping_patience": 3,
89
+ "early_stopping_threshold": 0.0
90
+ },
91
+ "attributes": {
92
+ "early_stopping_patience_counter": 0
93
+ }
94
+ },
95
+ "TrainerControl": {
96
+ "args": {
97
+ "should_epoch_stop": false,
98
+ "should_evaluate": false,
99
+ "should_log": false,
100
+ "should_save": true,
101
+ "should_training_stop": true
102
+ },
103
+ "attributes": {}
104
+ }
105
+ },
106
+ "total_flos": 0.0,
107
+ "train_batch_size": 128,
108
+ "trial_name": null,
109
+ "trial_params": null
110
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e578021e182d911578884f2a78658440e44dd70363c563d6a765c69118df4dfd
3
+ size 5624
vocab.json ADDED
The diff for this file is too large to render. See raw diff