title: SemanticCite
colorFrom: green
colorTo: green
sdk: static
pinned: false
license: mit
short_description: AI system for automated full-text citation verification

SemanticCite

The first AI system for automated full-text citation verification

📄 Paper | 💻 GitHub Repository | 🌐 Project Homepage

This Hugging Face Space hosts the complete SemanticCite project, including fine-tuned models, training dataset, and interactive demo for AI-powered citation verification.

[Figure: SemanticCite Input/Output]

SemanticCite transforms citation verification by analysing complete source documents and providing nuanced classification through four categories: Supported, Partially Supported, Unsupported, and Uncertain. Beyond simple validation, the system delivers detailed reasoning, confidence scores, and evidence reference snippets that show researchers exactly how their claims connect to the supporting literature.
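
As a rough illustration of the output described above, the following Python sketch models a single verification result. The class and field names (`Verdict`, `EvidenceSnippet`, `VerificationResult`) are hypothetical and are not taken from the SemanticCite codebase; only the four category labels come from this README. The confidence range of 0 to 1 is also an assumption, and the example claim is invented for demonstration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Verdict(Enum):
    # The four categories named in this README.
    SUPPORTED = "Supported"
    PARTIALLY_SUPPORTED = "Partially Supported"
    UNSUPPORTED = "Unsupported"
    UNCERTAIN = "Uncertain"


@dataclass
class EvidenceSnippet:
    text: str  # passage quoted from the source document
    rank: int  # 1 = most relevant (hypothetical field)


@dataclass
class VerificationResult:
    claim: str
    verdict: Verdict
    confidence: float  # assumed to lie in [0, 1]
    reasoning: str
    evidence: List[EvidenceSnippet] = field(default_factory=list)

    def __post_init__(self):
        # Reject confidence values outside the assumed range.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")


# Invented example values, for illustration only.
result = VerificationResult(
    claim="Method X improves accuracy on benchmark Y.",
    verdict=Verdict.PARTIALLY_SUPPORTED,
    confidence=0.72,
    reasoning="The source reports a gain, but only on a subset of Y.",
    evidence=[EvidenceSnippet(text="We observe gains on the small split.", rank=1)],
)
print(result.verdict.value)
```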

✨ Key Features

  • Deep Semantic Analysis: Full-text document analysis with 4-class classification (Supported, Partially Supported, Unsupported, Uncertain)
  • Lightweight AI Models: Fine-tuned Qwen3 models (1.7B & 4B parameters) with performance comparable to GPT-4
  • Triple Retrieval System: Dense vector search + sparse BM25 matching + neural reranking with FlashRank
  • Evidence-Based Reasoning: Ranked text snippets with transparent explanations and confidence scores
  • Multiple Deployment Options: Web interface, Python API, local/cloud deployment

🤗 Hugging Face Resources

Models

Dataset

  • SemanticCite-Dataset - 1,111 citation-reference pairs across 8 academic fields with expert annotations

🔧 Technical Architecture

  • Hybrid Retrieval: BM25 + Dense Vector Search
  • Reranking: FlashRank neural reranking
  • Classification: Fine-tuned Qwen3 models with structured output
  • Embeddings: Local SentenceTransformers or OpenAI embeddings
  • Storage: ChromaDB vector database
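
To make the hybrid retrieval idea concrete, here is a minimal, standard-library-only Python sketch that combines a sparse BM25 ranking with a second similarity ranking. It is not the SemanticCite implementation: the dense search is replaced by a toy cosine similarity over term counts (a real system would use embeddings), the two rankings are merged with reciprocal rank fusion (an assumption, since this README does not say how the signals are combined), and the FlashRank reranking step is omitted. All function names and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each document for the query (sparse signal)."""
    tokenized = [tokenize(d) for d in docs]
    avg_len = sum(len(t) for t in tokenized) / len(tokenized)
    df = Counter(term for t in tokenized for term in set(t))
    n = len(docs)
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine_scores(query, docs):
    """Toy stand-in for dense search: cosine similarity of term-count vectors."""
    q = Counter(tokenize(query))
    q_norm = math.sqrt(sum(v * v for v in q.values()))
    scores = []
    for d in docs:
        v = Counter(tokenize(d))
        dot = sum(q[t] * v[t] for t in q)
        d_norm = math.sqrt(sum(x * x for x in v.values()))
        scores.append(dot / (q_norm * d_norm) if q_norm and d_norm else 0.0)
    return scores


def rank(scores):
    """Document indices ordered from best to worst score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse rankings: each document accumulates 1 / (k + position)."""
    fused = Counter()
    for ranking in rankings:
        for position, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + position)
    return [doc_id for doc_id, _ in fused.most_common()]


# Made-up documents and query, for illustration only.
docs = [
    "citation verification with full text analysis",
    "weather forecast for the weekend",
    "neural reranking improves retrieval of evidence",
]
query = "full text citation verification"
fused = reciprocal_rank_fusion([
    rank(bm25_scores(query, docs)),
    rank(cosine_scores(query, docs)),
])
print(fused[0])  # index of the top-ranked document
```

Rank-based fusion is used here because BM25 scores and cosine similarities live on different scales; combining positions avoids having to normalise the raw scores.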

📦 Installation

# Clone repository
git clone https://github.com/sebhaan/SemanticCite
cd SemanticCite

# Setup environment
conda env create -f environment.yaml
conda activate cite

# Run web interface
streamlit run src/app.py

For local deployment with Ollama:

# Install models
ollama pull sebsigma/semanticcite-refiner-qwen3-1b
ollama pull sebsigma/semanticcite-checker-qwen3-4b
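
Once a model has been pulled, it could be queried through Ollama's local REST endpoint (`/api/generate` on port 11434 by default). The sketch below is only an assumption about how one might do this from Python: the helper names `build_payload` and `query_ollama` are hypothetical, and the prompt layout is a placeholder, since the input format the fine-tuned models expect is documented in the GitHub repository rather than here.

```python
import json
import urllib.request

# Ollama's default local generation endpoint.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(claim, passage, model="sebsigma/semanticcite-checker-qwen3-4b"):
    # Hypothetical prompt layout; the format the model was trained on may differ.
    prompt = f"Claim: {claim}\nSource passage: {passage}\nClassify the claim."
    # "stream": False asks Ollama for one JSON object instead of a token stream.
    return {"model": model, "prompt": prompt, "stream": False}


def query_ollama(payload, url=OLLAMA_URL, timeout=120):
    # Supplying `data` makes urllib send a POST request.
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]


# Invented inputs, for illustration only.
payload = build_payload(
    claim="Method X improves accuracy on benchmark Y.",
    passage="We observe gains on the small split of Y.",
)
# answer = query_ollama(payload)  # requires a running Ollama server
```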

Full documentation is available in the GitHub repository.

💼 Tailored Solutions

Need to verify entire documents automatically? Visit semanticcite.com for:

  • Complete citation system with automatic extraction and verification
  • Batch processing for large-scale workflows
  • API integration for editorial and publishing systems
  • On-premise deployment with custom model training

📄 Citation

If you use SemanticCite in your research, please cite:

@article{semanticcite2025,
  title={SemanticCite: Citation Verification with AI-Powered Full-Text Analysis and Evidence-Based Reasoning},
  author={Sebastian Haan},
  journal={ArXiv Preprint},
  year={2025},
  url={https://arxiv.org/abs/2511.16198}
}

SemanticCite - Enhancing research quality through AI-powered citation verification