---
title: README
emoji: 📈
colorFrom: red
colorTo: purple
sdk: static
pinned: false
---

# hmBERT Tiny

Historical Multilingual Language Models for Named Entity Recognition. The following languages are covered by hmBERT:

* English (British Library Corpus - Books)
* German (Europeana Newspaper)
* French (Europeana Newspaper)
* Finnish (Europeana Newspaper)
* Swedish (Europeana Newspaper)

More details can be found in [our GitHub repository](https://github.com/dbmdz/clef-hipe) and in our
[hmBERT paper](https://ceur-ws.org/Vol-3180/paper-87.pdf).

<div class="course-tip course-tip-orange bg-gradient-to-br dark:bg-gradient-to-r before:border-orange-500 dark:before:border-orange-800 from-orange-50 dark:from-gray-900 to-white dark:to-gray-950 border border-orange-50 text-orange-700 dark:text-gray-400">
<p>
The hmBERT Tiny model is a 2-layer model with a hidden size of 128. It has only 4.58M parameters in total.
</p>
</div>
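
As a rough sanity check on the size quoted above, the parameter count of a standard BERT encoder can be estimated by hand. This is only a sketch: the 2 layers and the hidden size of 128 come from the text, while the vocabulary size (32,000), feed-forward size (512), maximum position count (512), number of token types (2) and the inclusion of a pooler layer are assumptions chosen for illustration, not values confirmed here. The helper name `bert_param_count` is hypothetical.

```python
def bert_param_count(vocab_size, hidden, layers, intermediate,
                     max_positions=512, type_vocab=2):
    """Estimate the parameter count of a standard BERT encoder with a pooler."""
    layer_norm = 2 * hidden  # one weight vector and one bias vector

    # Word, position and token-type embeddings, followed by a LayerNorm.
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm

    # Self-attention: query, key, value and output projections (weights + biases),
    # followed by a LayerNorm.
    attention = 4 * (hidden * hidden + hidden) + layer_norm

    # Feed-forward block: up-projection, down-projection, then a LayerNorm.
    feed_forward = (
        (hidden * intermediate + intermediate)
        + (intermediate * hidden + hidden)
        + layer_norm
    )

    # Pooler: a single dense layer applied to the first token.
    pooler = hidden * hidden + hidden

    return embeddings + layers * (attention + feed_forward) + pooler


# layers=2 and hidden=128 are stated above; the other values are assumptions.
total = bert_param_count(vocab_size=32_000, hidden=128, layers=2, intermediate=512)
print(f"{total:,} parameters (~{total / 1e6:.2f}M)")
```

Under these assumed values the estimate lands at about 4.58M, in line with the figure above, and most of it sits in the word embeddings rather than in the two transformer layers.
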
# Leaderboard

We test our pretrained language models on various datasets from HIPE-2020, HIPE-2022 and Europeana.
The following table gives an overview of the datasets used:

| Language | Datasets                                                         |
|----------|------------------------------------------------------------------|
| English  | [AjMC] - [TopRes19th]                                            |
| German   | [AjMC] - [NewsEye] - [HIPE-2020]                                 |
| French   | [AjMC] - [ICDAR-Europeana] - [LeTemps] - [NewsEye] - [HIPE-2020] |
| Finnish  | [NewsEye]                                                        |
| Swedish  | [NewsEye]                                                        |
| Dutch    | [ICDAR-Europeana]                                                |

[AjMC]: https://github.com/hipe-eval/HIPE-2022-data/blob/main/documentation/README-ajmc.md
[NewsEye]: https://github.com/hipe-eval/HIPE-2022-data/blob/main/documentation/README-newseye.md
[TopRes19th]: https://github.com/hipe-eval/HIPE-2022-data/blob/main/documentation/README-topres19th.md
[ICDAR-Europeana]: https://github.com/stefan-it/historic-domain-adaptation-icdar
[LeTemps]: https://github.com/hipe-eval/HIPE-2022-data/blob/main/documentation/README-letemps.md
[HIPE-2020]: https://github.com/hipe-eval/HIPE-2022-data/blob/main/documentation/README-hipe2020.md

All results can be found in the [`hmLeaderboard`](https://huggingface.co/spaces/hmbench/hmLeaderboard).

# Acknowledgements

We thank [Luisa März](https://github.com/LuisaMaerz), [Katharina Schmid](https://github.com/schmika) and
[Erion Çano](https://github.com/erionc) for their fruitful discussions about Historical Language Models.

Research supported with Cloud TPUs from Google's [TPU Research Cloud](https://sites.research.google/trc/about/) (TRC).
Many thanks for providing access to the TPUs ❤️