Update README.md
**README.md** (changed):
```diff
@@ -40,6 +40,8 @@ More information needed
 
 ### Training hyperparameters
 
+**SESSION ONE**
+
 The following hyperparameters were used during training:
 - learning_rate: 4e-05
 - train_batch_size: 8
@@ -53,8 +55,20 @@ The following hyperparameters were used during training:
 - lr_scheduler_warmup_ratio: 0.01
 - num_epochs: 3
 
-
+**SESSION TWO**
 
+The following hyperparameters were used during training:
+- learning_rate: 1e-05
+- train_batch_size: 16
+- eval_batch_size: 16
+- seed: 42
+- distributed_type: multi-GPU
+- gradient_accumulation_steps: 4
+- total_train_batch_size: 64
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: cosine
+- lr_scheduler_warmup_ratio: 0.05
+- num_epochs: 4
 
 
 ### Framework versions
```
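The `total_train_batch_size: 64` listed for SESSION TWO is a derived value, not an independent setting. A minimal sketch of the usual relationship (the function name is mine, not from the card), assuming the standard HF Trainer formula of per-device batch size times gradient-accumulation steps times device count:

```python
def total_train_batch_size(per_device_batch: int,
                           grad_accum_steps: int,
                           n_devices: int = 1) -> int:
    """Effective (total) training batch size per optimizer step."""
    return per_device_batch * grad_accum_steps * n_devices

# SESSION TWO values from the card: 16 per device with 4 accumulation steps
print(total_train_batch_size(16, 4))  # 64
```

Note that with the listed values, 16 × 4 already equals 64, which is consistent with a single-device count in the formula even though `distributed_type: multi-GPU` is reported; the card itself does not state the device count.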