title: OSINT Investigation Assistant
emoji: ๐
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
short_description: RAG-powered OSINT investigation assistant with 344+ tools
license: mit
OSINT Investigation Assistant
A RAG-powered AI assistant that helps investigators develop structured methodologies for open-source intelligence (OSINT) investigations. Built with LangChain, Supabase PGVector, and Hugging Face Inference Providers.
Features
- Structured Methodologies: Generate step-by-step investigation plans tailored to your query
- 344+ OSINT Tools: Get recommendations from a curated database of OSINT tools
- Context-Aware Retrieval: Semantic search finds the most relevant tools for your investigation
- API Access: Built-in REST API for integration with external applications
- Chat Interface: Conversational interface for asking investigation questions
- MCP Support: Can be extended to work with AI agents via the Model Context Protocol (MCP)
Architecture
┌───────────────────────────────────────┐
│       Gradio UI + API Endpoints       │
└───────────────────┬───────────────────┘
                    │
┌───────────────────▼───────────────────┐
│        LangChain RAG Pipeline         │
│  • Query Understanding                │
│  • Tool Retrieval (PGVector)          │
│  • Response Generation (LLM)          │
└───────────────────┬───────────────────┘
                    │
         ┌──────────┴──────────┐
         │                     │
┌────────▼────────┐   ┌────────▼────────┐
│    Supabase     │   │  HF Inference   │
│   PGVector DB   │   │    Providers    │
│   (344 tools)   │   │   (Llama 3.1)   │
└─────────────────┘   └─────────────────┘
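The three pipeline stages in the diagram can be sketched in a few lines of Python. This is a minimal illustration, not the code in src/rag_pipeline.py: the `Tool` dataclass, `build_prompt`, `investigate`, and the two `fake_*` stand-ins are hypothetical names chosen for the example, and the stand-ins replace the real PGVector retriever and LLM client so the sketch runs offline.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Tool:
    """A retrieved tool record (hypothetical subset of the stored fields)."""
    name: str
    url: str
    content: str


def build_prompt(query: str, tools: List[Tool]) -> str:
    """Combine the user query with the retrieved tools into one prompt."""
    tool_lines = "\n".join(f"- {t.name} ({t.url}): {t.content}" for t in tools)
    return (
        "You are an OSINT methodology assistant.\n"
        f"Question: {query}\n"
        f"Relevant tools:\n{tool_lines}\n"
        "Write a step-by-step investigation plan using these tools."
    )


def investigate(
    query: str,
    retrieve: Callable[[str, int], List[Tool]],
    generate: Callable[[str], str],
    k: int = 5,
) -> str:
    """Run the three stages: retrieve tools, build the prompt, generate."""
    tools = retrieve(query, k)
    prompt = build_prompt(query, tools)
    return generate(prompt)


# Hypothetical stand-ins so the sketch runs without a database or an LLM.
def fake_retrieve(query: str, k: int) -> List[Tool]:
    catalog = [
        Tool("ExampleWhois", "https://example.com/whois", "Domain registration lookup"),
        Tool("ExampleArchive", "https://example.com/archive", "Web page archiving"),
    ]
    return catalog[:k]


def fake_generate(prompt: str) -> str:
    return f"[plan based on a {len(prompt)}-character prompt]"


if __name__ == "__main__":
    print(investigate("How do I investigate a suspicious domain?",
                      fake_retrieve, fake_generate))
```

Passing the retriever and generator in as callables keeps the flow testable without network access; in the real app these would presumably be backed by the vector store and the inference client.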
Quick Start
Local Development
- Clone the repository
  git clone <your-repo-url>
  cd osint-llm
- Install dependencies
  pip install -r requirements.txt
- Set up environment variables
  cp .env.example .env
  # Edit .env with your credentials
  Required variables:
  - SUPABASE_CONNECTION_STRING: Your Supabase PostgreSQL connection string
  - HF_TOKEN: Your Hugging Face API token
- Run the application
  python app.py
  The app will be available at http://localhost:7860
Hugging Face Spaces Deployment
- Create a new Space on Hugging Face
- Push this repository to your Space
- Set environment variables in Space settings:
  - SUPABASE_CONNECTION_STRING
  - HF_TOKEN
- Deploy: the Space will build and launch automatically
Usage
Chat Interface
Ask your investigation question in plain language, for example:
"How do I investigate a suspicious domain?"
"What tools can I use to verify an image's authenticity?"
"How can I trace the origin of a social media account?"
The assistant will provide:
- Investigation overview
- Step-by-step methodology
- Recommended tools with descriptions and URLs
- Best practices and safety considerations
- Expected outcomes
Tool Search
Use the "Tool Search" tab to directly search for OSINT tools by category or purpose.
API Access
This app automatically exposes REST API endpoints for external integration.
Python Client:
from gradio_client import Client

client = Client("your-space-url")
result = client.predict(
    "How do I investigate a domain?",
    api_name="/investigate"
)
print(result)
JavaScript Client:
import { Client } from "@gradio/client";

const client = await Client.connect("your-space-url");
const result = await client.predict("/investigate", {
  message: "How do I investigate a domain?"
});
console.log(result.data);
cURL:
curl -X POST "https://your-space.hf.space/call/investigate" \
  -H "Content-Type: application/json" \
  -d '{"data": ["How do I investigate a domain?"]}'
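Available Endpoints:
- /call/investigate - Main investigation assistant
- /call/search_tools - Direct tool search
- /gradio_api/openapi.json - OpenAPI specification
Depending on the Gradio version, the /call routes may be served under the /gradio_api prefix (for example /gradio_api/call/investigate). The Space's "Use via API" page shows the exact paths.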
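When calling the /call endpoints directly (as in the cURL example), Gradio's documented flow has two steps: the POST returns an event ID, and a follow-up GET on the same path with that ID streams the result as server-sent events. The sketch below parses such a stream, assuming the usual `event:` / `data:` line layout; `parse_call_stream` and the sample body are illustrative, not part of this app.

```python
import json
from typing import Any, List, Optional


def parse_call_stream(body: str) -> Optional[List[Any]]:
    """Return the data payload of the first 'complete' event, or None."""
    event = None
    for line in body.splitlines():
        if line.startswith("event:"):
            # Remember which event the following data line belongs to.
            event = line[len("event:"):].strip()
        elif line.startswith("data:") and event == "complete":
            return json.loads(line[len("data:"):].strip())
    return None


# Made-up response body in the assumed server-sent-events layout.
sample = 'event: complete\ndata: ["Step 1: check WHOIS records"]\n\n'

if __name__ == "__main__":
    print(parse_call_stream(sample))
```

The gradio_client package handles this exchange for you, so a parser like this is only needed when integrating from an environment without a Gradio client library.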
Database
The app uses Supabase with the PGVector extension to store and retrieve OSINT tools.
Database Schema:
CREATE TABLE bellingcat_tools (
    id BIGINT PRIMARY KEY,
    name TEXT,
    category TEXT,
    content TEXT,
    url TEXT,
    cost TEXT,
    details TEXT,
    embedding VECTOR,
    created_at TIMESTAMP WITH TIME ZONE
);
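Retrieval over the embedding column amounts to ranking rows by vector distance to the query embedding (pgvector exposes cosine distance as the `<=>` operator). The following in-memory sketch shows the same ranking in plain Python. The two-dimensional vectors, the `rows` list, and the function names are made up for illustration; real embeddings have hundreds of dimensions and the ranking would run inside the database.

```python
import math
from typing import Dict, List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float], rows: List[Dict], k: int = 5) -> List[str]:
    """Return the names of the k rows closest to the query vector."""
    ranked = sorted(rows, key=lambda r: cosine_distance(query_vec, r["embedding"]))
    return [r["name"] for r in ranked[:k]]


# Toy rows standing in for records from the tools table.
rows = [
    {"name": "tool_a", "embedding": [1.0, 0.0]},
    {"name": "tool_b", "embedding": [0.0, 1.0]},
    {"name": "tool_c", "embedding": [0.7, 0.7]},
]

if __name__ == "__main__":
    print(top_k([1.0, 0.1], rows, k=2))
```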
Tool Categories:
- Archiving & Preservation
- Social Media Investigation
- Image & Video Analysis
- Domain & Network Investigation
- Geolocation
- Data Extraction
- Verification & Fact-Checking
- And more...
Technology Stack
- UI/API: Gradio - Automatic API generation
- RAG Framework: LangChain - Retrieval pipeline
- Vector Database: Supabase with PGVector extension
- Embeddings: HuggingFace sentence-transformers
- LLM: Hugging Face Inference Providers - Llama 3.1
- Language: Python 3.9+
Project Structure
osint-llm/
├── app.py                 # Main Gradio application
├── requirements.txt       # Python dependencies
├── .env.example           # Environment variables template
├── README.md              # This file
└── src/
    ├── __init__.py
    ├── vectorstore.py     # Supabase PGVector connection
    ├── rag_pipeline.py    # LangChain RAG logic
    ├── llm_client.py      # Inference Provider client
    └── prompts.py         # Investigation prompt templates
Configuration
Environment Variables
See .env.example for all available configuration options.
Required:
- SUPABASE_CONNECTION_STRING - PostgreSQL connection string
- HF_TOKEN - Hugging Face API token
Optional:
- LLM_MODEL - Model to use (default: meta-llama/Llama-3.1-8B-Instruct)
- LLM_TEMPERATURE - Generation temperature (default: 0.7)
- LLM_MAX_TOKENS - Max tokens to generate (default: 2000)
- RETRIEVAL_K - Number of tools to retrieve (default: 5)
- EMBEDDING_MODEL - Embedding model (default: sentence-transformers/all-MiniLM-L6-v2)
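One way to read these variables is a small settings loader that fails fast on missing required values and applies the documented defaults. This is a sketch under the assumption that the app reads plain environment variables. The `Settings` dataclass and `load_settings` function are hypothetical names, not the app's actual code.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    supabase_connection_string: str
    hf_token: str
    llm_model: str
    llm_temperature: float
    llm_max_tokens: int
    retrieval_k: int
    embedding_model: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from a mapping, using the defaults listed above."""
    required = ("SUPABASE_CONNECTION_STRING", "HF_TOKEN")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return Settings(
        supabase_connection_string=env["SUPABASE_CONNECTION_STRING"],
        hf_token=env["HF_TOKEN"],
        llm_model=env.get("LLM_MODEL", "meta-llama/Llama-3.1-8B-Instruct"),
        llm_temperature=float(env.get("LLM_TEMPERATURE", "0.7")),
        llm_max_tokens=int(env.get("LLM_MAX_TOKENS", "2000")),
        retrieval_k=int(env.get("RETRIEVAL_K", "5")),
        embedding_model=env.get(
            "EMBEDDING_MODEL", "sentence-transformers/all-MiniLM-L6-v2"
        ),
    )
```

Accepting the environment as a parameter makes the loader easy to exercise with a plain dictionary in tests.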
Supported LLM Models
- meta-llama/Llama-3.1-8B-Instruct (recommended)
- meta-llama/Meta-Llama-3-8B-Instruct
- Qwen/Qwen2.5-72B-Instruct
- mistralai/Mistral-7B-Instruct-v0.3
Cost Considerations
Hugging Face Inference Providers
- Free tier: $0.10/month credits
- PRO tier: $2.00/month credits + pay-as-you-go
- Typical cost: ~$0.001-0.01 per query
- Recommended budget: $10-50/month for moderate usage
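The per-query range above translates into a monthly estimate by simple multiplication. The helper below is a back-of-the-envelope sketch using the quoted ~$0.001-0.01 per query. The function name is made up, and actual billing depends on the model and provider.

```python
from typing import Tuple


def monthly_cost_range(
    queries_per_month: int,
    low_per_query: float = 0.001,
    high_per_query: float = 0.01,
) -> Tuple[float, float]:
    """Return (low, high) estimated monthly spend in USD."""
    return (queries_per_month * low_per_query, queries_per_month * high_per_query)


if __name__ == "__main__":
    low, high = monthly_cost_range(5000)
    print(f"Estimated monthly spend: ${low:.2f} to ${high:.2f}")
```

At 5,000 queries a month the estimate spans about $5 to $50, which is roughly where the recommended budget sits.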
Supabase
- Free tier sufficient for most use cases
- PGVector operations are standard database queries
Hugging Face Spaces
- Free CPU hosting available
- GPU upgrade: ~$0.60/hour (optional, not required)
Future Enhancements
- MCP server integration for AI agent tool use
- Multi-turn conversation with memory
- User authentication and query logging
- Additional tool databases and sources
- Export methodologies as PDF/markdown
- Tool usage examples and tutorials
- Community-contributed tool reviews
Contributing
Contributions are welcome! Please feel free to submit issues or pull requests.
License
MIT License - See LICENSE file for details
Acknowledgments
- Tool data sourced from Bellingcat's Online Investigation Toolkit
- Built with support from the OSINT community
Support
For issues or questions:
- Open an issue on GitHub
- Check the Hugging Face Spaces documentation
- Review the Gradio documentation
Built with ❤️ for the OSINT community