stefan-it

readme: add initial organization card \o/

8a24013 about 2 years ago


	---
	title: README
	emoji: 📈
	colorFrom: red
	colorTo: purple
	sdk: static
	pinned: false
	---

	# hmBERT Tiny

	hmBERT is a family of historical multilingual language models for named entity recognition. It covers the following languages, with the corpus for each given in parentheses:

	* English (British Library Corpus - Books)
	* German (Europeana Newspaper)
	* French (Europeana Newspaper)
	* Finnish (Europeana Newspaper)
	* Swedish (Europeana Newspaper)

	More details can be found in [our GitHub repository](https://github.com/dbmdz/clef-hipe) and in our
	[hmBERT paper](https://ceur-ws.org/Vol-3180/paper-87.pdf).

	<div class="course-tip course-tip-orange bg-gradient-to-br dark:bg-gradient-to-r before:border-orange-500 dark:before:border-orange-800 from-orange-50 dark:from-gray-900 to-white dark:to-gray-950 border border-orange-50 text-orange-700 dark:text-gray-400">
	<p>
	The hmBERT Tiny model is a 2-layer model with a hidden size of 128. It has only 4.58M parameters in total.
	</p>
	</div>
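
As a rough sanity check on the figure above, the parameter count of a standard BERT encoder can be computed by hand. The sketch below takes only the layer count (2) and hidden size (128) from this card. Everything else is an assumption made for illustration: a 32,000-token vocabulary, a feed-forward size of 4 × hidden, 512 position embeddings, 2 token types, and a pooler layer. The helper `bert_parameter_count` is hypothetical and not part of any library.

```python
from typing import Optional


def bert_parameter_count(
    num_layers: int,
    hidden_size: int,
    vocab_size: int = 32_000,                 # assumption, not stated in this card
    intermediate_size: Optional[int] = None,  # assumption: 4 * hidden_size
    max_positions: int = 512,                 # assumption
    type_vocab_size: int = 2,                 # assumption
    with_pooler: bool = True,                 # assumption
) -> int:
    """Count the weights and biases of a standard BERT encoder."""
    if intermediate_size is None:
        intermediate_size = 4 * hidden_size

    h, ff = hidden_size, intermediate_size
    layer_norm = 2 * h  # scale and bias vectors

    # Word, position and token-type embeddings, followed by one LayerNorm.
    embeddings = (vocab_size + max_positions + type_vocab_size) * h + layer_norm

    # Self-attention: query, key, value and output projections (with biases).
    attention = 4 * (h * h + h) + layer_norm
    # Feed-forward block: up-projection, down-projection, LayerNorm.
    feed_forward = (h * ff + ff) + (ff * h + h) + layer_norm
    encoder = num_layers * (attention + feed_forward)

    # Optional pooler: one dense layer applied to the first token.
    pooler = (h * h + h) if with_pooler else 0
    return embeddings + encoder + pooler


total = bert_parameter_count(num_layers=2, hidden_size=128)
print(f"{total:,} parameters (~{total / 1e6:.2f}M)")
```

Under these assumptions the count comes to 4,575,104, which rounds to the 4.58M stated above. Most of that total sits in the word embedding matrix rather than in the two Transformer layers.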

	# Leaderboard

	We evaluate our pretrained language models on various datasets from HIPE-2020, HIPE-2022 and Europeana.
	The following table gives an overview of the datasets used:

	| Language | Datasets                                                         |
	|----------|------------------------------------------------------------------|
	| English  | [AjMC] - [TopRes19th]                                            |
	| German   | [AjMC] - [NewsEye] - [HIPE-2020]                                 |
	| French   | [AjMC] - [ICDAR-Europeana] - [LeTemps] - [NewsEye] - [HIPE-2020] |
	| Finnish  | [NewsEye]                                                        |
	| Swedish  | [NewsEye]                                                        |
	| Dutch    | [ICDAR-Europeana]                                                |

	[AjMC]: https://github.com/hipe-eval/HIPE-2022-data/blob/main/documentation/README-ajmc.md
	[NewsEye]: https://github.com/hipe-eval/HIPE-2022-data/blob/main/documentation/README-newseye.md
	[TopRes19th]: https://github.com/hipe-eval/HIPE-2022-data/blob/main/documentation/README-topres19th.md
	[ICDAR-Europeana]: https://github.com/stefan-it/historic-domain-adaptation-icdar
	[LeTemps]: https://github.com/hipe-eval/HIPE-2022-data/blob/main/documentation/README-letemps.md
	[HIPE-2020]: https://github.com/hipe-eval/HIPE-2022-data/blob/main/documentation/README-hipe2020.md

	All results can be found in the [`hmLeaderboard`](https://huggingface.co/spaces/hmbench/hmLeaderboard).

	# Acknowledgements

	We thank [Luisa März](https://github.com/LuisaMaerz), [Katharina Schmid](https://github.com/schmika) and
	[Erion Çano](https://github.com/erionc) for their fruitful discussions about Historical Language Models.

	This research was supported with Cloud TPUs from Google's [TPU Research Cloud](https://sites.research.google/trc/about/) (TRC).
	Many thanks for providing access to the TPUs ❤️