stefan-it committed · Commit d1a8ca0 · 1 Parent(s): aed1b30

readme: update model card

Files changed (1):
  1. README.md (+14 -13)
README.md CHANGED
@@ -9,9 +9,9 @@ language:
 - nl
 ---
 
-# hmByT5
+# hmByT5 - Preliminary Language Models
 
-Upcoming Historic Multilingual ByT5 Model. It covers the following languages:
+Preliminary Historic Multilingual and Monolingual ByT5 Models. The following languages are currently covered:
 
 * English (British Library Corpus - Books)
 * German (Europeana Newspaper)
@@ -20,6 +20,8 @@ Upcoming Historic Multilingual ByT5 Model. It covers the following languages:
 * Swedish (Europeana Newspaper)
 * Dutch (Delpher Corpus)
 
+More details can be found in [our GitHub repository](https://github.com/stefan-it/hmByT5).
+
 # Pretraining
 
 We pretrain hmByT5 on a v3-32 TPU Pod. Details about the training can be found
@@ -27,19 +29,18 @@ We pretrain hmByT5 on a v3-32 TPU Pod. Details about the training can be found
 
 # Evaluation on Downstream Tasks (NER)
 
-We use Flair to fine-tune hmByT5 on HIPE-2022 data. Details about the fine-tuning can be found
-[here](https://github.com/stefan-it/hmByT5/tree/main/bench).
+We evaluated the hmByT5 model that was pretrained for 200k steps on the English corpus by fine-tuning it on the English part of the AjMC corpus from HIPE-2022:
 
-# **New**: Logbook
+| Hyper-param Configuration                | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Avg.         |
+|------------------------------------------|-------|-------|-------|-------|-------|--------------|
+| `wsFalse-bs4-e10-lr0.00016-poolingfirst` | 83.80 | 84.78 | 83.74 | 83.35 | 84.37 | 84.01 ± 0.50 |
+| `wsFalse-bs4-e10-lr0.00015-poolingfirst` | 84.67 | 82.69 | 83.92 | 84.53 | 82.90 | 83.74 ± 0.82 |
+| `wsFalse-bs8-e10-lr0.00016-poolingfirst` | 82.12 | 83.82 | 83.37 | 83.00 | 83.70 | 83.20 ± 0.61 |
+| `wsFalse-bs8-e10-lr0.00015-poolingfirst` | 83.45 | 82.83 | 84.15 | 81.76 | 83.78 | 83.19 ± 0.84 |
 
-* 07.04.2022: Pretraining for 200k steps on the English corpus finished without crashes! TensorBoard logs can be found
-  on the [Model Hub](https://huggingface.co/stefan-it/byt5-small-historic-multilingual/tensorboard). We
-  also uploaded all checkpoints (we checkpoint every 25k steps)
-  [here](https://huggingface.co/stefan-it/byt5-small-historic-multilingual). Fine-Tuning experiments (NER)
-  on the English part of AjMC corpus from HIPE-2022 are running.
-* 03.04.2022: We start ByT5 pretraining with official ByT5 implementation on a v3-32 TPU Pod - thankfully provided by
-  [TPU Research Cloud](https://sites.research.google/trc/about/) (TRC) program. Plan is to pretrain on the
-  English corpus for 200k steps and use the original ByT5 Small model as init checkpoint.
+It turns out that the results are not on par with the current SOTA on the English AjMC corpus; see a comparison
+[here](https://github.com/stefan-it/blbooks-lms#model-zoo). Thus, we continue experiments with the Hugging Face
+Transformers JAX/FLAX implementation to pretrain ByT5 models on TPU.
 
 # Acknowledgements
 
 
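A note on the `Avg.` column in the table above: the reported values correspond to the mean ± the population standard deviation (ddof = 0) over the five runs. A minimal sketch that reproduces them from the per-run scores:

```python
# Reproduce the "Avg." column: mean ± population standard deviation over the
# five fine-tuning runs (ddof=0, which matches the reported 84.01 ± 0.50).
from statistics import mean, pstdev

runs = {
    "wsFalse-bs4-e10-lr0.00016-poolingfirst": [83.80, 84.78, 83.74, 83.35, 84.37],
    "wsFalse-bs4-e10-lr0.00015-poolingfirst": [84.67, 82.69, 83.92, 84.53, 82.90],
    "wsFalse-bs8-e10-lr0.00016-poolingfirst": [82.12, 83.82, 83.37, 83.00, 83.70],
    "wsFalse-bs8-e10-lr0.00015-poolingfirst": [83.45, 82.83, 84.15, 81.76, 83.78],
}

for config, scores in runs.items():
    # pstdev divides by n; stdev (sample) would divide by n-1 and give larger values.
    print(f"{config}: {mean(scores):.2f} ± {pstdev(scores):.2f}")
```

With the sample standard deviation (ddof = 1) instead, the first configuration would come out at roughly 84.01 ± 0.56 rather than ± 0.50.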
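The previous version of this README mentioned fine-tuning hmByT5 with Flair on HIPE-2022 data, and the configuration names in the table encode the hyper-parameters (`bs4` = mini-batch size 4, `e10` = 10 epochs, `lr0.00016`, `poolingfirst` = "first" subtoken pooling). The sketch below is a rough reconstruction under those assumptions; the `NER_HIPE_2022` loader arguments, output path and tagger settings are guesses, not the actual scripts from the linked `bench` directory:

```python
# Rough reconstruction of a Flair fine-tuning run on the English part of the
# AjMC corpus from HIPE-2022, mirroring `wsFalse-bs4-e10-lr0.00016-poolingfirst`.
from flair.datasets import NER_HIPE_2022
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# HIPE-2022 AjMC data, English subset (argument names may differ by Flair version).
corpus = NER_HIPE_2022(dataset_name="ajmc", language="en")
label_dictionary = corpus.make_label_dictionary(label_type="ner")

# hmByT5 as byte-level encoder; "first" subtoken pooling as in the config name.
# Assumes the checkpoint is loadable as a Transformers model.
embeddings = TransformerWordEmbeddings(
    "stefan-it/byt5-small-historic-multilingual",
    subtoken_pooling="first",
    fine_tune=True,
)

tagger = SequenceTagger(
    hidden_size=256,
    embeddings=embeddings,
    tag_dictionary=label_dictionary,
    tag_type="ner",
    use_crf=False,
    use_rnn=False,
    reproject_embeddings=False,
)

trainer = ModelTrainer(tagger, corpus)
trainer.fine_tune(
    "resources/taggers/hmbyt5-ajmc-en",  # hypothetical output directory
    learning_rate=0.00016,  # lr0.00016
    mini_batch_size=4,      # bs4
    max_epochs=10,          # e10
)
```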
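The earlier logbook entries point to checkpoints uploaded to [stefan-it/byt5-small-historic-multilingual](https://huggingface.co/stefan-it/byt5-small-historic-multilingual). Assuming a checkpoint in that repository is available as standard Transformers weights, a minimal loading sketch looks like this:

```python
# Load a hmByT5 checkpoint with Hugging Face Transformers.
# Assumption: the repository ships (or is converted to) standard PyTorch
# weights for a T5-style encoder-decoder model.
from transformers import AutoTokenizer, T5ForConditionalGeneration

model_id = "stefan-it/byt5-small-historic-multilingual"
tokenizer = AutoTokenizer.from_pretrained(model_id)  # byte-level ByT5 tokenizer
model = T5ForConditionalGeneration.from_pretrained(model_id)

# ByT5 consumes raw UTF-8 bytes, so there is no subword vocabulary involved.
inputs = tokenizer("Eine historische Zeitung aus dem Jahr 1871.", return_tensors="pt")
encoder_outputs = model.encoder(**inputs)
print(encoder_outputs.last_hidden_state.shape)  # one hidden state per byte token (+ EOS)
```

If the repository only contains the original T5X/ByT5 checkpoints, they would need to be converted first; the [hmByT5 GitHub repository](https://github.com/stefan-it/hmByT5) is the place to check for such details.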