---
license: apache-2.0
language:
- en
- ru
library_name: gigacheck
tags:
- text-classification
- ai-detection
- multilingual
- gigacheck
datasets:
- iitolstykh/LLMTrace_classification
base_model:
- mistralai/Mistral-7B-v0.3
---
# GigaCheck-Classifier-Multi
<div align="center">
<img src="https://raw.githubusercontent.com/sweetdream779/LLMTrace-info/refs/heads/main/images/logo/GigaCheck-classifier-multi.PNG" width="40%"/>
</div>
<p align="center">
<a href="https://sweetdream779.github.io/LLMTrace-info"> 🌐 LLMTrace Website </a> |
<a href="http://arxiv.org/abs/2509.21269"> 📜 LLMTrace Paper on arXiv </a> |
<a href="https://huggingface.co/datasets/iitolstykh/LLMTrace_classification"> 🤗 LLMTrace - Classification Dataset </a> |
<a href="https://github.com/ai-forever/gigacheck"> Github </a> |
</p>
## Model Card
### Model Description
This is the official `GigaCheck-Classifier-Multi` model from the `LLMTrace` project. It is a multilingual transformer-based model trained for the **binary classification of text** as either `human` or `ai`.
The model was trained jointly on the English and Russian portions of the `LLMTrace Classification dataset`. It is designed to be a robust baseline for detecting AI-generated content across multiple domains, text lengths and prompt types.
For complete details on the training data, methodology, and evaluation, please refer to our research paper: [LLMTrace on arXiv](http://arxiv.org/abs/2509.21269).
### Intended Use & Limitations
This model is intended for academic research, analysis of AI-generated content, and as a baseline for developing more advanced detection tools.
**Limitations:**
* The model's performance may degrade on text generated by LLMs released after its training date (September 2025).
* It is not infallible and can produce false positives (flagging human text as AI) and false negatives.
* Performance may vary on domains or styles of text not well-represented in the training data.
## Evaluation
The model was evaluated on the test split of the `LLMTrace Classification dataset`, which was not seen during training. Performance metrics are reported below:
| Metric                | Value (%) |
|-----------------------|-----------|
| F1 Score (AI) | 98.64 |
| F1 Score (Human) | 98.00 |
| Mean Accuracy | 98.46 |
| TPR @ FPR=0.01 | 97.93 |
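For readers unfamiliar with the last metric, "TPR @ FPR=0.01" is the fraction of AI-generated texts correctly detected when the decision threshold is chosen so that at most 1% of human texts are misflagged. A minimal pure-Python sketch of the standard computation, using illustrative toy scores (not the model's real outputs):

```python
def tpr_at_fpr(scores_ai, scores_human, max_fpr=0.01):
    """Return the best true-positive rate achievable while keeping the
    false-positive rate at or below max_fpr.

    Scores are detector outputs where higher means "more likely AI".
    """
    # Candidate thresholds: every observed score, highest first.
    thresholds = sorted(set(scores_ai) | set(scores_human), reverse=True)
    best_tpr = 0.0
    for t in thresholds:
        # FPR: human texts scored at or above the threshold (misflagged).
        fpr = sum(s >= t for s in scores_human) / len(scores_human)
        if fpr <= max_fpr:
            # TPR: AI texts scored at or above the threshold (detected).
            tpr = sum(s >= t for s in scores_ai) / len(scores_ai)
            best_tpr = max(best_tpr, tpr)
    return best_tpr

# Toy data: 100 human texts (one outlier uses the whole 1% FPR budget)
# and 100 AI texts, 3 of which score low and escape detection.
human = [0.1] * 99 + [0.9]
ai = [0.8] * 97 + [0.05] * 3
print(tpr_at_fpr(ai, human))  # 0.97
```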
## Quick start
Requirements:
- python3.11
- [gigacheck](https://github.com/ai-forever/gigacheck)
```bash
pip install git+https://github.com/ai-forever/gigacheck
```
### Inference with transformers (with trust_remote_code=True)
```python
from transformers import AutoModel
import torch

# trust_remote_code is required because the model ships custom modeling code.
gigacheck_model = AutoModel.from_pretrained(
    "iitolstykh/GigaCheck-Classifier-Multi",
    trust_remote_code=True,
    device_map="cuda:0",
    torch_dtype=torch.bfloat16,
)

text = """To be, or not to be, that is the question:
Whether ’tis nobler in the mind to suffer
The slings and arrows of outrageous fortune,
Or to take arms against a sea of troubles
And by opposing end them.
"""

# The model accepts a list of texts; map the predicted label ids
# back to the "human"/"ai" class names via the config.
output = gigacheck_model([text.replace("\n", " ")])
print([gigacheck_model.config.id2label[int(c_id)] for c_id in output.pred_label_ids])
```
### Inference with gigacheck
```python
import torch
from transformers import AutoConfig

from gigacheck.inference.src.mistral_detector import MistralDetector

model_name = "iitolstykh/GigaCheck-Classifier-Multi"
config = AutoConfig.from_pretrained(model_name)

# Build the detector from the model config; use GPU when available.
model = MistralDetector(
    max_seq_len=config.max_length,
    with_detr=config.with_detr,
    id2label=config.id2label,
    device="cuda:0" if torch.cuda.is_available() else "cpu",
).from_pretrained(model_name)

text = """To be, or not to be, that is the question:
Whether ’tis nobler in the mind to suffer
The slings and arrows of outrageous fortune,
Or to take arms against a sea of troubles
And by opposing end them.
"""

output = model.predict(text.replace("\n", " "))
print(output)
```
## Citation
If you use this model in your research, please cite our papers:
```bibtex
@article{Layer2025LLMTrace,
  title={{LLMTrace: A Corpus for Classification and Fine-Grained Localization of AI-Written Text}},
  author={Irina Tolstykh and Aleksandra Tsybina and Sergey Yakubson and Maksim Kuprashevich},
  journal={arXiv preprint arXiv:2509.21269},
  year={2025}
}
@article{tolstykh2024gigacheck,
title={{GigaCheck: Detecting LLM-generated Content}},
author={Irina Tolstykh and Aleksandra Tsybina and Sergey Yakubson and Aleksandr Gordeev and Vladimir Dokholyan and Maksim Kuprashevich},
journal={arXiv preprint arXiv:2410.23728},
year={2024}
}
```