Horizon 1
A larger, more modern variant of Constellation-One for Cockatoo, fine-tuned from answerdotai/ModernBERT-large
This model is licensed under the Apache-2.0 license
Note:
lmsys/toxic-chat is licensed under CC-BY-NC-4.0, which means this model cannot legally be used for commercial purposes.
Hardware:
This model was fine-tuned on two NVIDIA A40s with a per-device batch size of 32 and gradient accumulation of 2, for an effective batch size of 32 × 2 (accumulation) × 2 (GPUs) = 128.
Fine-tuned on a dataset of 232k entries aggregated from:
- ealvaradob/phishing-dataset
- ucberkeley-dlab/measuring-hate-speech
- cardiffnlp/tweet_eval
- lmsys/toxic-chat
- tasksource/jigsaw_toxicity
Software:
Training was executed on the Cockatoo_ML_Training server. Metrics are publicly visible at Cockatoo.dev.
Techniques: label merging, using merge_labels on conflict. There was no manual intervention in data sanitization before or after merging.
Asymmetric loss:
γ- = 3.5
γ+ = 0.5
clipping = 0.05
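These parameters follow the Asymmetric Loss formulation for multi-label classification: γ⁻ down-weights easy negatives much harder than γ⁺ down-weights positives, and `clipping` shifts low negative probabilities out of the loss entirely. A minimal NumPy sketch, assuming the standard ASL definition (the exact training implementation is not shown in this card):

```python
import numpy as np

def asymmetric_loss(logits, targets, gamma_neg=3.5, gamma_pos=0.5,
                    clip=0.05, eps=1e-8):
    """Asymmetric multi-label loss, NumPy sketch.

    Negatives are focused with gamma_neg > gamma_pos, and easy negatives
    with probability below `clip` contribute (near) zero loss.
    """
    p = 1.0 / (1.0 + np.exp(-logits))        # per-label sigmoid probabilities
    p_neg = np.clip(p - clip, 0.0, 1.0)      # probability shifting for negatives
    loss_pos = targets * np.log(p + eps) * (1.0 - p) ** gamma_pos
    loss_neg = (1.0 - targets) * np.log(1.0 - p_neg + eps) * p_neg ** gamma_neg
    return -np.mean(loss_pos + loss_neg)
```

Confident, correct predictions drive the loss toward zero, while confident mistakes are penalized steeply on the negative side.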
Optimizer:
AdamW
betas = (0.9, 0.999)
eps = 1e-8
momentum = 0.9
LLRD:
decay_factor = 0.98
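Layer-wise learning-rate decay (LLRD) scales the learning rate down by `decay_factor` for every layer further from the classification head, so the lower, more general layers change less during fine-tuning. A minimal sketch, assuming the top layer keeps the base learning rate of 5e-5:

```python
def llrd_learning_rates(num_layers, base_lr=5e-5, decay_factor=0.98):
    """Per-layer learning rates for LLRD: layer i (0 = bottom/embeddings)
    gets base_lr * decay_factor ** (num_layers - 1 - i)."""
    return [base_lr * decay_factor ** (num_layers - 1 - i)
            for i in range(num_layers)]
```

In practice these rates would be attached to the optimizer as per-layer parameter groups.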
Hyperparameters:
epoch = 3
batch_size = 32
gradient_accumulation = 2
learning_rate = 5e-5
weight_decay = 0.1
warmup_ratio = 0.1
fp16 = false
bf16 = true
tf32 = true
gradient_checkpointing = false
gradient_clipping = true
gradient_clipping_val = 1.0
attention_implementation = "flash_attention_2"
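For reference, these hyperparameters map roughly onto Hugging Face `TrainingArguments` as follows. This is a hedged sketch with a placeholder `output_dir`; the actual training script lives in the Cockatoo_ML_Training repo and may differ:

```python
from transformers import TrainingArguments

# Sketch only: mirrors the hyperparameters listed above.
args = TrainingArguments(
    output_dir="horizon-1",            # placeholder
    num_train_epochs=3,
    per_device_train_batch_size=32,
    gradient_accumulation_steps=2,
    learning_rate=5e-5,
    weight_decay=0.1,
    warmup_ratio=0.1,
    fp16=False,
    bf16=True,
    tf32=True,
    gradient_checkpointing=False,
    max_grad_norm=1.0,                 # gradient clipping value
)
```

The attention implementation is not a `TrainingArguments` field; it would be passed as `attn_implementation="flash_attention_2"` when loading the model with `from_pretrained`.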
Available Labels:
```json
"id2label": {
  "0": "scam",
  "1": "violence",
  "2": "harassment",
  "3": "hate_speech",
  "4": "toxicity",
  "5": "obscenity",
  "6": "genocide"
}
```
The genocide label is a new addition compared to Constellation.
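At inference time the model emits one sigmoid probability per label, and a text can carry several labels at once. A minimal decoding sketch using the mapping above with the default 0.5 thresholds (`decode` is an illustrative helper, not part of the model's API):

```python
id2label = {
    0: "scam", 1: "violence", 2: "harassment", 3: "hate_speech",
    4: "toxicity", 5: "obscenity", 6: "genocide",
}

def decode(probs, threshold=0.5):
    """Return the names of all labels whose probability clears the threshold."""
    return [id2label[i] for i, p in enumerate(probs) if p >= threshold]
```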
Performance
All evaluation metrics use macro averaging and may deviate slightly from other reported figures due to discrepancies between evaluation runs. Metrics are from a zero-shot evaluation split (not present in the training data).
Horizon 1 achieves very high recall out of the box (0.94 raw), with precision comparable to Constellation (0.566 raw vs. 0.605).
However, this model really shines once the trigger thresholds are tuned:
Default:
| Category | Threshold | F1-Score |
|---|---|---|
| scam | 0.5 | 0.8758 |
| violence | 0.5 | 0.6891 |
| harassment | 0.5 | 0.8279 |
| hate_speech | 0.5 | 0.6581 |
| toxicity | 0.5 | 0.6430 |
| obscenity | 0.5 | 0.6428 |
| genocide | 0.5 | 0.5630 |
| Average | - | 0.7000 |
Tuned:
| Category | Threshold | F1-Score | Delta (vs. default) |
|---|---|---|---|
| scam | 0.7129 | 0.9131 | +0.0373 |
| violence | 0.6238 | 0.7252 | +0.0361 |
| harassment | 0.6535 | 0.8712 | +0.0433 |
| hate_speech | 0.6040 | 0.7082 | +0.0501 |
| toxicity | 0.6238 | 0.7371 | +0.0941 |
| obscenity | 0.6238 | 0.7309 | +0.0881 |
| genocide | 0.6337 | 0.5929 | +0.0299 |
| Average | - | 0.7541 | +0.0541 |
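The tuned thresholds above were chosen per label to maximize F1. A minimal sketch of that kind of sweep over held-out probabilities (the 0.01-step grid and the sweep procedure are assumptions, not the exact tuning code):

```python
def f1_at(probs, labels, t):
    """Binary F1 for one label at decision threshold t."""
    preds = [p >= t for p in probs]
    tp = sum(pr and y for pr, y in zip(preds, labels))
    fp = sum(pr and not y for pr, y in zip(preds, labels))
    fn = sum(not pr and y for pr, y in zip(preds, labels))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def tune_threshold(probs, labels):
    """Sweep a 0.01-step grid and keep the threshold with the best F1."""
    grid = [i / 100 for i in range(1, 100)]
    return max(grid, key=lambda t: f1_at(probs, labels, t))
```

Running this once per label on a validation split yields a per-category threshold table like the one above.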
Comparison with Constellation One (tuned):
| Metric | Constellation One | Horizon 1 | Delta (H1 - C1) |
|---|---|---|---|
| Loss | 0.1603 | 0.0245 | -0.1358 |
| Overall Precision | 0.6940 | 0.6809 | -0.0131 |
| Overall Recall | 0.8151 | 0.8554 | +0.0403 |
| Overall F1 | 0.7475 | 0.7448 | -0.0027 |
| Scam Precision | 0.9255 | 0.9330 | +0.0075 |
| Scam Recall | 0.9467 | 0.9009 | -0.0459 |
| Scam F1 | 0.9360 | 0.9167 | -0.0194 |
| Violence Precision | 0.5141 | 0.6293 | +0.1152 |
| Violence Recall | 0.7191 | 0.8828 | +0.1637 |
| Violence F1 | 0.5995 | 0.7348 | +0.1353 |
| Harassment Precision | 0.8238 | 0.8329 | +0.0091 |
| Harassment Recall | 0.8830 | 0.9240 | +0.0410 |
| Harassment F1 | 0.8524 | 0.8761 | +0.0237 |
| Hate Speech Precision | 0.5607 | 0.5965 | +0.0358 |
| Hate Speech Recall | 0.6960 | 0.8652 | +0.1692 |
| Hate Speech F1 | 0.6211 | 0.7061 | +0.0850 |
| Toxicity Precision | 0.6891 | 0.6946 | +0.0056 |
| Toxicity Recall | 0.8025 | 0.7481 | -0.0544 |
| Toxicity F1 | 0.7415 | 0.7204 | -0.0211 |
| Obscenity Precision | 0.6507 | 0.6828 | +0.0321 |
| Obscenity Recall | 0.8431 | 0.7160 | -0.1271 |
| Obscenity F1 | 0.7345 | 0.6990 | -0.0355 |
| Genocide Precision | N/A | 0.3972 | N/A |
| Genocide Recall | N/A | 0.9511 | N/A |
| Genocide F1 | N/A | 0.5604 | N/A |
This model is more "trigger-happy" than Constellation One, though this can be mitigated in production by raising the thresholds (the current values are optimized for macro F1).
A newer version is planned to mitigate this behavior.
Resources:
Training/Inferencing server: https://github.com/DominicTWHV/Cockatoo_ML_Training/
Training Metrics: https://cockatoo.dev/ml-training.html
Datasets Used | Citations
| Dataset | License | Link |
|---|---|---|
| Phishing Dataset | MIT | Hugging Face |
| Measuring Hate Speech | CC-BY-4.0 | Hugging Face |
| Tweet Eval (SemEval-2019) | [See Citation]* | Hugging Face |
| Toxic Chat | CC-BY-NC-4.0 | Hugging Face |
| Jigsaw Toxicity | Apache-2.0 | Hugging Face |
Citation: ucberkeley-dlab/measuring-hate-speech
@article{kennedy2020constructing,
title={Constructing interval variables via faceted Rasch measurement and multitask deep learning: a hate speech application},
author={Kennedy, Chris J and Bacon, Geoff and Sahn, Alexander and von Vacano, Claudia},
journal={arXiv preprint arXiv:2009.10277},
year={2020}
}
Citation: cardiffnlp/tweet_eval
@inproceedings{basile-etal-2019-semeval,
title = "{S}em{E}val-2019 Task 5: Multilingual Detection of Hate Speech Against Immigrants and Women in {T}witter",
author = "Basile, Valerio and Bosco, Cristina and Fersini, Elisabetta and Nozza, Debora and Patti, Viviana and Rangel Pardo, Francisco Manuel and Rosso, Paolo and Sanguinetti, Manuela",
booktitle = "Proceedings of the 13th International Workshop on Semantic Evaluation",
year = "2019",
address = "Minneapolis, Minnesota, USA",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/S19-2007",
doi = "10.18653/v1/S19-2007",
pages = "54--63"
}
Citation: lmsys/toxic-chat
@misc{lin2023toxicchat,
title={ToxicChat: Unveiling Hidden Challenges of Toxicity Detection in Real-World User-AI Conversation},
author={Zi Lin and Zihan Wang and Yongqi Tong and Yangkun Wang and Yuxin Guo and Yujia Wang and Jingbo Shang},
year={2023},
eprint={2310.17389},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Model: DominicTWHV/Horizon-1-Text-Large
Base model: answerdotai/ModernBERT-large