---
title: OSINT Investigation Assistant
emoji: 🔍
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
short_description: RAG-powered OSINT investigation assistant with 344+ tools
license: mit
---

🔍 OSINT Investigation Assistant

A RAG-powered AI assistant that helps investigators develop structured methodologies for open-source intelligence (OSINT) investigations. Built with LangChain, Supabase PGVector, and Hugging Face Inference Providers.

✨ Features

  • 🎯 Structured Methodologies: Generate step-by-step investigation plans tailored to your query
  • 🛠️ 344+ OSINT Tools: Access recommendations from a comprehensive database of curated OSINT tools
  • 🔍 Context-Aware Retrieval: Semantic search finds the most relevant tools for your investigation
  • 🚀 API Access: Built-in REST API for integration with external applications
  • 💬 Chat Interface: User-friendly conversational interface
  • 🔌 MCP Support: Can be extended to work with AI agents via the MCP protocol

๐Ÿ—๏ธ Architecture

```
┌────────────────────────────────────┐
│     Gradio UI + API Endpoints      │
└─────────────────┬──────────────────┘
                  │
┌─────────────────▼──────────────────┐
│     LangChain RAG Pipeline         │
│  • Query Understanding             │
│  • Tool Retrieval (PGVector)       │
│  • Response Generation (LLM)       │
└─────────────────┬──────────────────┘
                  │
        ┌─────────┴──────────┐
        │                    │
┌───────▼────────┐   ┌───────▼──────────┐
│ Supabase       │   │ HF Inference     │
│ PGVector DB    │   │ Providers        │
│ (344 tools)    │   │ (Llama 3.1)      │
└────────────────┘   └──────────────────┘
```
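
The flow above can be sketched in a few lines of Python. This is only an illustration of the RAG pattern, not the actual contents of src/rag_pipeline.py: it assumes a hypothetical get_vectorstore() helper for the Supabase connection and calls Hugging Face Inference Providers directly through huggingface_hub.

```python
# Illustrative sketch of the retrieval-augmented flow (not the app's real code).
import os

from huggingface_hub import InferenceClient

from src.vectorstore import get_vectorstore  # hypothetical helper returning a LangChain vector store


def investigate(query: str) -> str:
    # 1. Semantic search over the bellingcat_tools embeddings
    vectorstore = get_vectorstore()
    docs = vectorstore.similarity_search(query, k=int(os.getenv("RETRIEVAL_K", "5")))
    context = "\n\n".join(doc.page_content for doc in docs)

    # 2. Ask the hosted LLM for a structured methodology grounded in the retrieved tools
    client = InferenceClient(token=os.getenv("HF_TOKEN"))
    response = client.chat_completion(
        model=os.getenv("LLM_MODEL", "meta-llama/Llama-3.1-8B-Instruct"),
        messages=[
            {"role": "system", "content": "You are an OSINT investigation assistant."},
            {"role": "user", "content": f"Available tools:\n{context}\n\nInvestigation question: {query}"},
        ],
        max_tokens=int(os.getenv("LLM_MAX_TOKENS", "2000")),
        temperature=float(os.getenv("LLM_TEMPERATURE", "0.7")),
    )
    return response.choices[0].message.content
```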

🚀 Quick Start

Local Development

  1. Clone the repository

    git clone <your-repo-url>
    cd osint-llm
    
  2. Install dependencies

    pip install -r requirements.txt
    
  3. Set up environment variables

    cp .env.example .env
    # Edit .env with your credentials
    

    Required variables (a sample .env is sketched after these steps):

    • SUPABASE_CONNECTION_STRING: Your Supabase PostgreSQL connection string
    • HF_TOKEN: Your Hugging Face API token
  4. Run the application

    python app.py
    

    The app will be available at http://localhost:7860
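
For step 3, a minimal .env might look like the following; the connection string and token are placeholders, and the optional values shown are the documented defaults (see Configuration below):

```
# Required
SUPABASE_CONNECTION_STRING=postgresql://user:password@your-project.supabase.co:5432/postgres
HF_TOKEN=hf_xxxxxxxxxxxxxxxx

# Optional (defaults shown)
LLM_MODEL=meta-llama/Llama-3.1-8B-Instruct
LLM_TEMPERATURE=0.7
LLM_MAX_TOKENS=2000
RETRIEVAL_K=5
EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
```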

Hugging Face Spaces Deployment

  1. Create a new Space on Hugging Face
  2. Push this repository to your Space (see the git sketch after these steps)
  3. Set environment variables in Space settings:
    • SUPABASE_CONNECTION_STRING
    • HF_TOKEN
  4. Deploy - The Space will automatically build and launch
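
Step 2 typically amounts to adding the Space as a git remote and pushing (replace the placeholders with your Hugging Face username and Space name):

```bash
git remote add space https://huggingface.co/spaces/<your-username>/<your-space-name>
git push space main
```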

📚 Usage

Chat Interface

Simply ask your investigation questions:

"How do I investigate a suspicious domain?"
"What tools can I use to verify an image's authenticity?"
"How can I trace the origin of a social media account?"

The assistant will provide:

  1. Investigation overview
  2. Step-by-step methodology
  3. Recommended tools with descriptions and URLs
  4. Best practices and safety considerations
  5. Expected outcomes

Tool Search

Use the "Tool Search" tab to directly search for OSINT tools by category or purpose.

API Access

This app automatically exposes REST API endpoints for external integration.

Python Client:

```python
from gradio_client import Client

client = Client("your-space-url")
result = client.predict(
    "How do I investigate a domain?",
    api_name="/investigate"
)
print(result)
```

JavaScript Client:

```javascript
import { Client } from "@gradio/client";

const client = await Client.connect("your-space-url");
const result = await client.predict("/investigate", {
  message: "How do I investigate a domain?"
});
console.log(result.data);
```

cURL:

```bash
curl -X POST "https://your-space.hf.space/call/investigate" \
     -H "Content-Type: application/json" \
     -d '{"data": ["How do I investigate a domain?"]}'
```
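
With recent Gradio versions the /call endpoints are two-step: the POST above returns a JSON body containing an event_id, and the result is then streamed from a follow-up GET:

```bash
# Replace $EVENT_ID with the value returned by the POST request
curl -N "https://your-space.hf.space/call/investigate/$EVENT_ID"
```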

Available Endpoints:

  • /call/investigate - Main investigation assistant
  • /call/search_tools - Direct tool search
  • /gradio_api/openapi.json - OpenAPI specification
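
The /search_tools endpoint can be called the same way as /investigate. The exact parameter names depend on how the functions are defined in app.py, so treat the call below as a sketch and check client.view_api() or the OpenAPI specification listed above for the authoritative signature:

```python
from gradio_client import Client

client = Client("your-space-url")
client.view_api()  # prints the auto-generated signature of every endpoint

# Illustrative call; arguments must match the endpoint's actual signature
tools = client.predict("reverse image search", api_name="/search_tools")
print(tools)
```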

🗄️ Database

The app uses Supabase with the pgvector extension to store and retrieve OSINT tool records.

Database Schema:

```sql
CREATE TABLE bellingcat_tools (
  id BIGINT PRIMARY KEY,
  name TEXT,
  category TEXT,
  content TEXT,
  url TEXT,
  cost TEXT,
  details TEXT,
  embedding VECTOR,
  created_at TIMESTAMP WITH TIME ZONE
);
```
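
Retrieval boils down to a standard pgvector similarity query over the embedding column. An illustrative query using pgvector's cosine-distance operator, with the vector literal standing in for a query embedding:

```sql
-- Find the 5 tools whose embeddings are closest to the query embedding
-- (the literal below is a placeholder; real embeddings have the model's full dimensionality)
SELECT name, category, url
FROM bellingcat_tools
ORDER BY embedding <=> '[0.12, -0.03, 0.08]'::vector
LIMIT 5;
```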

Tool Categories:

  • Archiving & Preservation
  • Social Media Investigation
  • Image & Video Analysis
  • Domain & Network Investigation
  • Geolocation
  • Data Extraction
  • Verification & Fact-Checking
  • And more...

🛠️ Technology Stack

  • Gradio - chat UI and auto-generated REST API endpoints
  • LangChain - RAG pipeline (retrieval, prompting, generation)
  • Supabase PostgreSQL with pgvector - vector store for the 344-tool database
  • Hugging Face Inference Providers - hosted LLM inference (Llama 3.1 by default)
  • sentence-transformers/all-MiniLM-L6-v2 - embedding model for semantic search

📁 Project Structure

```
osint-llm/
├── app.py                   # Main Gradio application
├── requirements.txt         # Python dependencies
├── .env.example             # Environment variables template
├── README.md                # This file
└── src/
    ├── __init__.py
    ├── vectorstore.py       # Supabase PGVector connection
    ├── rag_pipeline.py      # LangChain RAG logic
    ├── llm_client.py        # Inference Provider client
    └── prompts.py           # Investigation prompt templates
```

⚙️ Configuration

Environment Variables

See .env.example for all available configuration options.

Required:

  • SUPABASE_CONNECTION_STRING - PostgreSQL connection string
  • HF_TOKEN - Hugging Face API token

Optional:

  • LLM_MODEL - Model to use (default: meta-llama/Llama-3.1-8B-Instruct)
  • LLM_TEMPERATURE - Generation temperature (default: 0.7)
  • LLM_MAX_TOKENS - Max tokens to generate (default: 2000)
  • RETRIEVAL_K - Number of tools to retrieve (default: 5)
  • EMBEDDING_MODEL - Embedding model (default: sentence-transformers/all-MiniLM-L6-v2)
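
A sketch of how these optional settings might be applied at startup (illustrative; the actual app code may read them differently):

```python
import os

# Optional settings fall back to the documented defaults
LLM_MODEL = os.getenv("LLM_MODEL", "meta-llama/Llama-3.1-8B-Instruct")
LLM_TEMPERATURE = float(os.getenv("LLM_TEMPERATURE", "0.7"))
LLM_MAX_TOKENS = int(os.getenv("LLM_MAX_TOKENS", "2000"))
RETRIEVAL_K = int(os.getenv("RETRIEVAL_K", "5"))
EMBEDDING_MODEL = os.getenv("EMBEDDING_MODEL", "sentence-transformers/all-MiniLM-L6-v2")
```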

Supported LLM Models

  • meta-llama/Llama-3.1-8B-Instruct (recommended)
  • meta-llama/Meta-Llama-3-8B-Instruct
  • Qwen/Qwen2.5-72B-Instruct
  • mistralai/Mistral-7B-Instruct-v0.3

💰 Cost Considerations

Hugging Face Inference Providers

  • Free tier: $0.10/month credits
  • PRO tier: $2.00/month credits + pay-as-you-go
  • Typical cost: ~$0.001-0.01 per query
  • Recommended budget: $10-50/month for moderate usage
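
As a rough check: at about $0.005 per query (the midpoint of the range above), a $10 monthly budget covers roughly 2,000 queries.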

Supabase

  • Free tier sufficient for most use cases
  • PGVector operations are standard database queries

Hugging Face Spaces

  • Free CPU hosting available
  • GPU upgrade: ~$0.60/hour (optional, not required)

🔮 Future Enhancements

  • MCP server integration for AI agent tool use
  • Multi-turn conversation with memory
  • User authentication and query logging
  • Additional tool databases and sources
  • Export methodologies as PDF/markdown
  • Tool usage examples and tutorials
  • Community-contributed tool reviews

🤝 Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.

📄 License

MIT License - See LICENSE file for details

🙏 Acknowledgments

📞 Support

For issues or questions, please open an issue on the repository (see Contributing above).


Built with ❤️ for the OSINT community