---
license: apache-2.0
language:
- en
- ru
library_name: gigacheck
tags:
- text-classification
- ai-detection
- multilingual
- gigacheck
datasets:
- iitolstykh/LLMTrace_classification
base_model:
- mistralai/Mistral-7B-v0.3
---

# GigaCheck-Classifier-Multi

🌐 LLMTrace Website | 📜 LLMTrace Paper on arXiv | 🤗 LLMTrace - Classification Dataset | GitHub

## Model Card

### Model Description

This is the official `GigaCheck-Classifier-Multi` model from the `LLMTrace` project. It is a multilingual transformer-based model trained for **binary classification of text** as either `human` or `ai`. The model was trained jointly on the English and Russian portions of the `LLMTrace Classification dataset` and is designed to be a robust baseline for detecting AI-generated content across multiple domains, text lengths, and prompt types.

For complete details on the training data, methodology, and evaluation, please refer to our research paper: link (coming soon)

### Intended Use & Limitations

This model is intended for academic research, analysis of AI-generated content, and as a baseline for developing more advanced detection tools.

**Limitations:**
* The model's performance may degrade on text generated by LLMs released after its training cutoff (September 2025).
* It is not infallible and can produce both false positives (flagging human text as AI) and false negatives.
* Performance may vary on domains or styles of text not well represented in the training data.

## Evaluation

The model was evaluated on the test split of the `LLMTrace Classification dataset`, which was not seen during training.
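For intuition on the TPR @ FPR=0.01 metric reported below: it is the fraction of AI-written texts correctly flagged when the decision threshold is set so that only 1% of human-written texts are misclassified as AI. A minimal sketch of how such a metric can be computed from per-text detector scores (the function name `tpr_at_fpr` and the toy score distributions are illustrative, not the project's evaluation code):

```python
import numpy as np

def tpr_at_fpr(human_scores, ai_scores, target_fpr=0.01):
    """True-positive rate on AI texts at the threshold where the
    false-positive rate on human texts equals `target_fpr`.
    Illustrative sketch only."""
    # Pick the threshold so that only `target_fpr` of human texts score above it.
    threshold = np.quantile(human_scores, 1.0 - target_fpr)
    return float(np.mean(np.asarray(ai_scores) > threshold))

# Toy example with well-separated "AI-ness" score distributions.
rng = np.random.default_rng(0)
human = rng.normal(0.2, 0.1, 10_000)  # scores for human-written texts
ai = rng.normal(0.9, 0.1, 10_000)     # scores for AI-written texts
print(tpr_at_fpr(human, ai))          # close to 1.0 for well-separated scores
```

The stronger the separation between the two score distributions, the higher the TPR that survives this strict 1% false-positive budget.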
Performance metrics are reported below:

| Metric           | Value |
|------------------|-------|
| F1 Score (AI)    | 98.64 |
| F1 Score (Human) | 98.00 |
| Mean Accuracy    | 98.46 |
| TPR @ FPR=0.01   | 97.93 |

## Quick start

Requirements:
- Python 3.11
- [gigacheck](https://github.com/ai-forever/gigacheck)

```bash
pip install git+https://github.com/ai-forever/gigacheck
```

### Inference with transformers (with trust_remote_code=True)

```python
from transformers import AutoModel
import torch

gigacheck_model = AutoModel.from_pretrained(
    "iitolstykh/GigaCheck-Classifier-Multi",
    trust_remote_code=True,
    device_map="cuda:0",
    torch_dtype=torch.bfloat16,
)

text = """To be, or not to be, that is the question:
Whether ’tis nobler in the mind to suffer
The slings and arrows of outrageous fortune,
Or to take arms against a sea of troubles
And by opposing end them.
"""

output = gigacheck_model([text])
print([gigacheck_model.config.id2label[int(c_id)] for c_id in output.pred_label_ids])
```

### Inference with gigacheck

```python
import torch
from transformers import AutoConfig

from gigacheck.inference.src.mistral_detector import MistralDetector

model_name = "iitolstykh/GigaCheck-Classifier-Multi"

config = AutoConfig.from_pretrained(model_name)
model = MistralDetector(
    max_seq_len=config.max_length,
    with_detr=config.with_detr,
    id2label=config.id2label,
    device="cpu" if not torch.cuda.is_available() else "cuda:0",
).from_pretrained(model_name)

text = """To be, or not to be, that is the question:
Whether ’tis nobler in the mind to suffer
The slings and arrows of outrageous fortune,
Or to take arms against a sea of troubles
And by opposing end them.
""" output = model.predict(text) print(output) ``` ## Citation If you use this model in your research, please cite our papers: ```bibtex @article{Layer2025LLMTrace, Title = {{LLMTrace: A Corpus for Classification and Fine-Grained Localization of AI-Written Text}}, Author = {Irina Tolstykh and Aleksandra Tsybina and Sergey Yakubson and Maksim Kuprashevich}, Year = {2025}, Eprint = {arXiv:2509.21269} } @article{tolstykh2024gigacheck, title={{GigaCheck: Detecting LLM-generated Content}}, author={Irina Tolstykh and Aleksandra Tsybina and Sergey Yakubson and Aleksandr Gordeev and Vladimir Dokholyan and Maksim Kuprashevich}, journal={arXiv preprint arXiv:2410.23728}, year={2024} } ```