Update README.md
README.md
CHANGED
@@ -1,3 +1,31 @@
---
license: cc-by-nc-4.0
---
## Model Specification
- This is the **state-of-the-art Twitter POS tagging model (95.38% accuracy)** on the Tweebank V2 (TB2) POS tagging benchmark, trained on the combined Tweebank V2 and English-EWT training data.
- For more details about the `TweebankNLP` project, please refer to [our paper](https://arxiv.org/pdf/2201.07281.pdf) and [GitHub](https://github.com/social-machines/TweebankNLP) page.
- In the paper, it is referred to as `HuggingFace-BERTweet (TB2+EWT)` in the POS table.

## How to use the model

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Load the tokenizer and the token-classification (POS tagging) model from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("TweebankNLP/bertweet-tb2_ewt-pos-tagging")
model = AutoModelForTokenClassification.from_pretrained("TweebankNLP/bertweet-tb2_ewt-pos-tagging")
```

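The snippet above only loads the checkpoint. As a minimal sketch of how it might be used for tagging, the example below runs one tweet through the model and picks the highest-scoring label per token. The helper names `decode_tags` and `tag_tweet` are hypothetical and not part of the TweebankNLP project. The label names come from whatever `id2label` mapping the checkpoint's config defines. Predictions are per subword token, including special tokens; aligning them back to whole words is left out here.

```python
from typing import Dict, List, Sequence, Tuple

MODEL_ID = "TweebankNLP/bertweet-tb2_ewt-pos-tagging"


def decode_tags(token_logits: Sequence[Sequence[float]], id2label: Dict[int, str]) -> List[str]:
    """Pick the highest-scoring label for each token (greedy argmax decoding)."""
    tags = []
    for scores in token_logits:
        best_id = max(range(len(scores)), key=lambda i: scores[i])
        tags.append(id2label[best_id])
    return tags


def tag_tweet(text: str) -> List[Tuple[str, str]]:
    """Download the model (network access required) and tag one tweet."""
    # Imported lazily so decode_tags can be used without torch/transformers installed.
    import torch
    from transformers import AutoTokenizer, AutoModelForTokenClassification

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForTokenClassification.from_pretrained(MODEL_ID)
    model.eval()

    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # logits has shape (num_subword_tokens, num_labels) for the single input
        logits = model(**inputs).logits[0]

    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    tags = decode_tags(logits.tolist(), model.config.id2label)
    return list(zip(tokens, tags))
```

Calling `tag_tweet("RT @user: loving the new update")` should return a list of `(subword_token, label)` pairs. The exact tokens and labels depend on the checkpoint and the installed `transformers` version.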
## References

If you use this repository in your research, please kindly cite [our paper](https://arxiv.org/pdf/2201.07281.pdf):

```bibtex
@article{jiang2022tweetnlp,
    title={Annotating the Tweebank Corpus on Named Entity Recognition and Building NLP Models for Social Media Analysis},
    author={Jiang, Hang and Hua, Yining and Beeferman, Doug and Roy, Deb},
    journal={In Proceedings of the 13th Language Resources and Evaluation Conference (LREC)},
    year={2022}
}
```