readme: update model card
README.md CHANGED
@@ -9,9 +9,9 @@ language:
- nl
---

-# hmByT5
+# hmByT5 - Preliminary Language Models

-Upcoming Historic Multilingual ByT5 Model. It covers the following languages:
+Preliminary Historic Multilingual and Monolingual ByT5 Models. The following languages are currently covered:

* English (British Library Corpus - Books)
* German (Europeana Newspaper)
@@ -20,6 +20,8 @@ Upcoming Historic Multilingual ByT5 Model. It covers the following languages:
* Swedish (Europeana Newspaper)
* Dutch (Delpher Corpus)

+More details can be found in [our GitHub repository](https://github.com/stefan-it/hmByT5).
+
# Pretraining

We pretrain hmByT5 on a v3-32 TPU Pod. Details about the training can be found
@@ -27,19 +29,18 @@ We pretrain hmByT5 on a v3-32 TPU Pod. Details about the training can be found

# Evaluation on Downstream Tasks (NER)

-We
-[here](https://github.com/stefan-it/hmByT5/tree/main/bench).
-[here](https://huggingface.co/stefan-it/byt5-small-historic-multilingual). Fine-Tuning experiments (NER)
-on the English part of AjMC corpus from HIPE-2022 are running.
-* 03.04.2022: We start ByT5 pretraining with official ByT5 implementation on a v3-32 TPU Pod - thankfully provided by
-[TPU Research Cloud](https://sites.research.google/trc/about/) (TRC) program. Plan is to pretrain on the
-English corpus for 200k steps and use the original ByT5 Small model as init checkpoint.
+We evaluated the hmByT5 model that was pretrained on the English corpus for 200k steps and fine-tuned on the English part of the AjMC corpus:
+
+| Hyper-param Configuration                | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Avg.         |
+|------------------------------------------|-------|-------|-------|-------|-------|--------------|
+| `wsFalse-bs4-e10-lr0.00016-poolingfirst` | 83.80 | 84.78 | 83.74 | 83.35 | 84.37 | 84.01 ± 0.50 |
+| `wsFalse-bs4-e10-lr0.00015-poolingfirst` | 84.67 | 82.69 | 83.92 | 84.53 | 82.90 | 83.74 ± 0.82 |
+| `wsFalse-bs8-e10-lr0.00016-poolingfirst` | 82.12 | 83.82 | 83.37 | 83.00 | 83.70 | 83.20 ± 0.61 |
+| `wsFalse-bs8-e10-lr0.00015-poolingfirst` | 83.45 | 82.83 | 84.15 | 81.76 | 83.78 | 83.19 ± 0.84 |
+
+It turns out that the results are not on par with the current SOTA on the English AjMC corpus; see a comparison
+[here](https://github.com/stefan-it/blbooks-lms#model-zoo). Thus, we continue experiments with the Hugging Face
+Transformers JAX/FLAX implementation to pretrain ByT5 models on TPU.

# Acknowledgements
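The earlier card text links a first pretrained checkpoint at https://huggingface.co/stefan-it/byt5-small-historic-multilingual. As a usage sketch (assuming that checkpoint id is still valid and has not been superseded by newer hmByT5 uploads), it loads like any other ByT5 model with the Transformers Python API:

```python
# Minimal loading sketch. The model id is taken from the link in the earlier
# card text and may have been superseded by newer hmByT5 checkpoints; it is
# assumed to ship the standard ByT5 tokenizer files.
from transformers import AutoTokenizer, T5ForConditionalGeneration

model_id = "stefan-it/byt5-small-historic-multilingual"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = T5ForConditionalGeneration.from_pretrained(model_id)

# ByT5 operates directly on UTF-8 bytes, so there is no subword vocabulary:
# each byte of the input text becomes one input id (plus the final </s> token).
text = "Im Jahre 1847 wurde die Zeitung zum ersten Male gedruckt."
inputs = tokenizer(text, return_tensors="pt")
print(inputs["input_ids"].shape)  # roughly (1, number of UTF-8 bytes + 1)
```

For the NER fine-tuning runs reported above, typically only the encoder of this encoder-decoder checkpoint is used.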
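The configuration strings in the results table encode the fine-tuning hyper-parameters: `bs4`/`bs8` is the mini-batch size, `e10` the number of epochs, `lr0.00016`/`lr0.00015` the learning rate, and `poolingfirst` the sub-token (here: byte) pooling strategy. The following is a minimal sketch of such a run, assuming a Flair-based setup with AjMC data in CoNLL-style column format; the actual benchmark scripts live in the hmByT5 repository linked above and may differ in detail.

```python
# Hedged sketch of one fine-tuning run, e.g. `wsFalse-bs4-e10-lr0.00016-poolingfirst`
# (batch size 4, 10 epochs, learning rate 0.00016, "first" pooling). Data path and
# column layout below are placeholders, not part of the original card.
from flair.datasets import ColumnCorpus
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# AjMC (HIPE-2022) English data in CoNLL-style columns: token + NER tag.
corpus = ColumnCorpus("data/ajmc-en", column_format={0: "text", 1: "ner"})
label_dict = corpus.make_label_dictionary(label_type="ner")

embeddings = TransformerWordEmbeddings(
    model="stefan-it/byt5-small-historic-multilingual",
    subtoken_pooling="first",   # the "poolingfirst" part of the config name
    fine_tune=True,
)

tagger = SequenceTagger(
    hidden_size=256,
    embeddings=embeddings,
    tag_dictionary=label_dict,
    tag_type="ner",
    use_crf=False,
    use_rnn=False,
    reproject_embeddings=False,
)

trainer = ModelTrainer(tagger, corpus)
trainer.fine_tune(
    "resources/taggers/ajmc-en-hmbyt5",
    learning_rate=0.00016,      # lr0.00016
    mini_batch_size=4,          # bs4
    max_epochs=10,              # e10
)
```

Averages and standard deviations in the table are then computed over five such runs with different seeds.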
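The updated card states that pretraining continues with the Hugging Face Transformers JAX/FLAX implementation on TPU, and the removed changelog entry mentions using the original ByT5 Small model as the init checkpoint. A minimal sketch of loading that public starting checkpoint with the Flax classes is shown below; the span-corruption pretraining loop and TPU setup are not part of this card and are not shown.

```python
# Hedged sketch: load the public ByT5 Small checkpoint with the Flax classes in
# Hugging Face Transformers, e.g. as the starting point for continued pretraining.
from transformers import AutoTokenizer, FlaxT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/byt5-small")
model = FlaxT5ForConditionalGeneration.from_pretrained(
    "google/byt5-small"  # pass from_pt=True here if only PyTorch weights are available
)

# Quick forward-pass smoke test with byte-level inputs (NumPy arrays for Flax).
# A real pretraining setup would instead build corrupted spans and proper decoder inputs.
inputs = tokenizer("Sie sind die Tage des Frühlings gekommen.", return_tensors="np")
outputs = model(input_ids=inputs["input_ids"], decoder_input_ids=inputs["input_ids"])
print(outputs.logits.shape)  # (1, sequence length, vocab size)
```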