---
license: apache-2.0
language:
- en
- ru
library_name: gigacheck
tags:
- text-classification
- ai-detection
- multilingual
- gigacheck
datasets:
- iitolstykh/LLMTrace_classification
base_model:
  - mistralai/Mistral-7B-v0.3
---

# GigaCheck-Classifier-Multi

<p style="text-align: center;">
  <div align="center">
  <img src="https://raw.githubusercontent.com/sweetdream779/LLMTrace-info/refs/heads/main/images/logo/GigaCheck-classifier-multi.PNG" width="40%"/>
  </div>
  <p align="center">
  <a href="https://sweetdream779.github.io/LLMTrace-info"> 🌐 LLMTrace Website </a> | 
  <a href="http://arxiv.org/abs/2509.21269"> 📜 LLMTrace Paper on arXiv </a> | 
  <a href="https://huggingface.co/datasets/iitolstykh/LLMTrace_classification"> 🤗 LLMTrace - Classification Dataset </a> | 
  <a href="https://github.com/ai-forever/gigacheck"> Github </a> | 
</p>

## Model Card

### Model Description

This is the official `GigaCheck-Classifier-Multi` model from the `LLMTrace` project. It is a multilingual transformer-based model trained for the **binary classification of text** as either `human` or `ai`.

The model was trained jointly on the English and Russian portions of the `LLMTrace Classification dataset`. It is designed to be a robust baseline for detecting AI-generated content across multiple domains, text lengths, and prompt types.

For complete details on the training data, methodology, and evaluation, please refer to the [LLMTrace paper on arXiv](http://arxiv.org/abs/2509.21269).

### Intended Use & Limitations

This model is intended for academic research, analysis of AI-generated content, and as a baseline for developing more advanced detection tools.

**Limitations:**
*   The model's performance may degrade on text generated by LLMs released after its training date (September 2025).
*   It is not infallible and can produce false positives (flagging human text as AI) and false negatives.
*   Performance may vary on domains or styles of text not well-represented in the training data.


## Evaluation

The model was evaluated on the test split of the `LLMTrace Classification dataset`, which was not seen during training. Performance metrics are reported below:

| Metric                | Value   |
|-----------------------|---------|
| F1 Score (AI)         | 98.64   |
| F1 Score (Human)      | 98.00   |
| Mean Accuracy         | 98.46   |
| TPR @ FPR=0.01        | 97.93   |
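
For reference, the sketch below shows one common way to compute these quantities from per-example scores with scikit-learn. It is illustrative only, not the official evaluation script: the labels and scores are made up, the table appears to report values as percentages, and the exact definition of "Mean Accuracy" used above (e.g., averaging over classes or languages) may differ.

```python
# Illustrative sketch of the reported metrics (not the official evaluation code).
# Assumes `y_true` holds 0/1 labels (1 = "ai") and `y_score` the model's
# probability of the "ai" class for each test example.
import numpy as np
from sklearn.metrics import f1_score, accuracy_score, roc_curve

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])                       # hypothetical labels
y_score = np.array([0.05, 0.2, 0.9, 0.8, 0.65, 0.4, 0.97, 0.1])   # hypothetical scores

y_pred = (y_score >= 0.5).astype(int)
print("F1 (AI):   ", f1_score(y_true, y_pred, pos_label=1))
print("F1 (Human):", f1_score(y_true, y_pred, pos_label=0))
print("Accuracy:  ", accuracy_score(y_true, y_pred))

# TPR @ FPR=0.01: the highest true-positive rate reachable while keeping
# the false-positive rate at or below 1%.
fpr, tpr, _ = roc_curve(y_true, y_score)
print("TPR @ FPR=0.01:", tpr[fpr <= 0.01].max())
```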


## Quick start

Requirements:
- Python 3.11
- [gigacheck](https://github.com/ai-forever/gigacheck)

```bash
pip install git+https://github.com/ai-forever/gigacheck
```

### Inference with transformers (`trust_remote_code=True`)

```python
from transformers import AutoModel
import torch

# Load the detector together with its custom modeling code from the Hub.
gigacheck_model = AutoModel.from_pretrained(
    "iitolstykh/GigaCheck-Classifier-Multi",
    trust_remote_code=True,
    device_map="cuda:0",
    torch_dtype=torch.bfloat16
)

text = """To be, or not to be, that is the question:
Whether ’tis nobler in the mind to suffer
The slings and arrows of outrageous fortune,
Or to take arms against a sea of troubles
And by opposing end them.
"""

# The model takes a list of strings; newlines are collapsed to spaces first.
output = gigacheck_model([text.replace("\n", " ")])

# Map predicted class ids back to their string labels ("human" / "ai").
print([gigacheck_model.config.id2label[int(c_id)] for c_id in output.pred_label_ids])
```
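
Because the model is multilingual and the call above already takes a list of strings, several texts (English and/or Russian) can be scored in one pass. The snippet below simply reuses `gigacheck_model`, `output.pred_label_ids`, and `config.id2label` from the example above; the sample sentences are placeholders.

```python
# Score an English and a Russian text in a single batch.
texts = [
    "The quick brown fox jumps over the lazy dog.",
    "Съешь ещё этих мягких французских булок, да выпей чаю.",
]
output = gigacheck_model([t.replace("\n", " ") for t in texts])
print([gigacheck_model.config.id2label[int(c_id)] for c_id in output.pred_label_ids])
```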

### Inference with gigacheck

```python
import torch
from transformers import AutoConfig
from gigacheck.inference.src.mistral_detector import MistralDetector

model_name = "iitolstykh/GigaCheck-Classifier-Multi"

# Read max_length, with_detr, and id2label from the published model config.
config = AutoConfig.from_pretrained(model_name)
model = MistralDetector(
    max_seq_len=config.max_length,
    with_detr=config.with_detr,
    id2label=config.id2label,
    device="cuda:0" if torch.cuda.is_available() else "cpu",
).from_pretrained(model_name)

text = """To be, or not to be, that is the question:
Whether ’tis nobler in the mind to suffer
The slings and arrows of outrageous fortune,
Or to take arms against a sea of troubles
And by opposing end them.
"""

# Collapse newlines and run the detector on a single string.
output = model.predict(text.replace("\n", " "))
print(output)
```
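
The `MistralDetector` path exposes the same checkpoint through the `gigacheck` package and reads `max_length`, `with_detr`, and `id2label` straight from the model config, which is convenient for wrapping the detector in your own pipeline. `predict` is shown above on a single string; a minimal sketch for scoring several texts, assuming that single-string interface, could look like this (the sample texts are placeholders):

```python
# Minimal sketch: score several texts sequentially with the detector above.
# Assumes model.predict takes one string at a time, as in the example.
texts = [
    "A short note written by a person.",
    "Короткий текст на русском языке для проверки.",
]
for t in texts:
    print(model.predict(t.replace("\n", " ")))
```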


## Citation

If you use this model in your research, please cite our papers:

```bibtex
@article{Layer2025LLMTrace,
  title={{LLMTrace: A Corpus for Classification and Fine-Grained Localization of AI-Written Text}},
  author={Irina Tolstykh and Aleksandra Tsybina and Sergey Yakubson and Maksim Kuprashevich},
  journal={arXiv preprint arXiv:2509.21269},
  year={2025}
}
@article{tolstykh2024gigacheck,
  title={{GigaCheck: Detecting LLM-generated Content}},
  author={Irina Tolstykh and Aleksandra Tsybina and Sergey Yakubson and Aleksandr Gordeev and Vladimir Dokholyan and Maksim Kuprashevich},
  journal={arXiv preprint arXiv:2410.23728},
  year={2024}
}
```