---
license: apache-2.0
language:
- en
- ru
library_name: gigacheck
tags:
- text-classification
- ai-detection
- multilingual
- gigacheck
datasets:
- iitolstykh/LLMTrace_classification
base_model:
  - mistralai/Mistral-7B-v0.3
---

# GigaCheck-Classifier-Multi

<p style="text-align: center;">
  <div align="center">
  <img src="https://raw.githubusercontent.com/sweetdream779/LLMTrace-info/refs/heads/main/images/logo/GigaCheck-classifier-multi.PNG" width="40%"/>
  </div>
  <p align="center">
  <a href="https://sweetdream779.github.io/LLMTrace-info"> 🌐 LLMTrace Website </a> | 
  <a href="http://arxiv.org/abs/2509.21269"> 📜 LLMTrace Paper on arXiv </a> | 
  <a href="https://huggingface.co/datasets/iitolstykh/LLMTrace_classification"> 🤗 LLMTrace - Classification Dataset </a> | 
  <a href="https://github.com/ai-forever/gigacheck"> Github </a> | 
</p>

## Model Card

### Model Description

This is the official `GigaCheck-Classifier-Multi` model from the `LLMTrace` project. It is a multilingual transformer-based model trained for the **binary classification of text** as either `human` or `ai`.

The model was trained jointly on the English and Russian portions of the `LLMTrace Classification dataset`. It is designed to be a robust baseline for detecting AI-generated content across multiple domains, text lengths, and prompt types.

For complete details on the training data, methodology, and evaluation, please refer to the [LLMTrace paper on arXiv](http://arxiv.org/abs/2509.21269).

### Intended Use & Limitations

This model is intended for academic research, analysis of AI-generated content, and as a baseline for developing more advanced detection tools.

**Limitations:**
*   The model's performance may degrade on text generated by LLMs released after its training date (September 2025).
*   It is not infallible and can produce false positives (flagging human text as AI) and false negatives.
*   Performance may vary on domains or styles of text not well-represented in the training data.


## Evaluation

The model was evaluated on the test split of the `LLMTrace Classification dataset`, which was not seen during training. Performance metrics are reported below:

| Metric                | Value   |
|-----------------------|---------|
| F1 Score (AI)         | 98.64   |
| F1 Score (Human)      | 98.00   |
| Mean Accuracy         | 98.46   |
| TPR @ FPR=0.01        | 97.93   |
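
For reference, the sketch below shows one common way to compute these quantities from per-example scores with scikit-learn. It is illustrative only, not the official evaluation script: the labels and scores are made up, the table appears to report values as percentages, and the exact definition of "Mean Accuracy" used above (e.g., averaging over classes or languages) may differ.

```python
# Illustrative sketch of the reported metrics (not the official evaluation code).
# Assumes `y_true` holds 0/1 labels (1 = "ai") and `y_score` the model's
# probability of the "ai" class for each test example.
import numpy as np
from sklearn.metrics import f1_score, accuracy_score, roc_curve

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])                       # hypothetical labels
y_score = np.array([0.05, 0.2, 0.9, 0.8, 0.65, 0.4, 0.97, 0.1])   # hypothetical scores

y_pred = (y_score >= 0.5).astype(int)
print("F1 (AI):   ", f1_score(y_true, y_pred, pos_label=1))
print("F1 (Human):", f1_score(y_true, y_pred, pos_label=0))
print("Accuracy:  ", accuracy_score(y_true, y_pred))

# TPR @ FPR=0.01: the highest true-positive rate reachable while keeping
# the false-positive rate at or below 1%.
fpr, tpr, _ = roc_curve(y_true, y_score)
print("TPR @ FPR=0.01:", tpr[fpr <= 0.01].max())
```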


## Quick start

Requirements:
- Python 3.11
- [gigacheck](https://github.com/ai-forever/gigacheck)

```bash
pip install git+https://github.com/ai-forever/gigacheck
```

### Inference with transformers (`trust_remote_code=True`)

```python
from transformers import AutoModel
import torch

# Load the detector together with its custom modeling code from the Hub.
gigacheck_model = AutoModel.from_pretrained(
    "iitolstykh/GigaCheck-Classifier-Multi",
    trust_remote_code=True,
    device_map="cuda:0",
    torch_dtype=torch.bfloat16
)

text = """To be, or not to be, that is the question:
Whether ’tis nobler in the mind to suffer
The slings and arrows of outrageous fortune,
Or to take arms against a sea of troubles
And by opposing end them.
"""

# The model takes a list of strings; newlines are collapsed to spaces first.
output = gigacheck_model([text.replace("\n", " ")])

# Map predicted class ids back to their string labels ("human" / "ai").
print([gigacheck_model.config.id2label[int(c_id)] for c_id in output.pred_label_ids])
```
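
Because the model is multilingual and the call above already takes a list of strings, several texts (English and/or Russian) can be scored in one pass. The snippet below simply reuses `gigacheck_model`, `output.pred_label_ids`, and `config.id2label` from the example above; the sample sentences are placeholders.

```python
# Score an English and a Russian text in a single batch.
texts = [
    "The quick brown fox jumps over the lazy dog.",
    "Съешь ещё этих мягких французских булок, да выпей чаю.",
]
output = gigacheck_model([t.replace("\n", " ") for t in texts])
print([gigacheck_model.config.id2label[int(c_id)] for c_id in output.pred_label_ids])
```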

### Inference with gigacheck

```python
import torch
from transformers import AutoConfig
from gigacheck.inference.src.mistral_detector import MistralDetector

model_name = "iitolstykh/GigaCheck-Classifier-Multi"

# Read max_length, with_detr, and id2label from the published model config.
config = AutoConfig.from_pretrained(model_name)
model = MistralDetector(
    max_seq_len=config.max_length,
    with_detr=config.with_detr,
    id2label=config.id2label,
    device="cuda:0" if torch.cuda.is_available() else "cpu",
).from_pretrained(model_name)

text = """To be, or not to be, that is the question:
Whether ’tis nobler in the mind to suffer
The slings and arrows of outrageous fortune,
Or to take arms against a sea of troubles
And by opposing end them.
"""

# Collapse newlines and run the detector on a single string.
output = model.predict(text.replace("\n", " "))
print(output)
```
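
The `MistralDetector` path exposes the same checkpoint through the `gigacheck` package and reads `max_length`, `with_detr`, and `id2label` straight from the model config, which is convenient for wrapping the detector in your own pipeline. `predict` is shown above on a single string; a minimal sketch for scoring several texts, assuming that single-string interface, could look like this (the sample texts are placeholders):

```python
# Minimal sketch: score several texts sequentially with the detector above.
# Assumes model.predict takes one string at a time, as in the example.
texts = [
    "A short note written by a person.",
    "Короткий текст на русском языке для проверки.",
]
for t in texts:
    print(model.predict(t.replace("\n", " ")))
```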


## Citation

If you use this model in your research, please cite our papers:

```bibtex
@article{Layer2025LLMTrace,
  title={{LLMTrace: A Corpus for Classification and Fine-Grained Localization of AI-Written Text}},
  author={Irina Tolstykh and Aleksandra Tsybina and Sergey Yakubson and Maksim Kuprashevich},
  journal={arXiv preprint arXiv:2509.21269},
  year={2025}
}
@article{tolstykh2024gigacheck,
  title={{GigaCheck: Detecting LLM-generated Content}},
  author={Irina Tolstykh and Aleksandra Tsybina and Sergey Yakubson and Aleksandr Gordeev and Vladimir Dokholyan and Maksim Kuprashevich},
  journal={arXiv preprint arXiv:2410.23728},
  year={2024}
}
```