| title: OSINT Investigation Assistant | |
| emoji: ๐ | |
| colorFrom: blue | |
| colorTo: purple | |
| sdk: gradio | |
| sdk_version: 5.49.1 | |
| app_file: app.py | |
| pinned: false | |
| short_description: RAG-powered OSINT investigation assistant with 344+ tools | |
| license: mit | |
| # OSINT Investigation Assistant | |
| A RAG-powered AI assistant that helps investigators develop structured methodologies for open-source intelligence (OSINT) investigations. Built with LangChain, Supabase PGVector, and Hugging Face Inference Providers. | |
| ## Features | |
| - **Structured Methodologies**: Generate step-by-step investigation plans tailored to your query | |
| - **344+ OSINT Tools**: Get recommendations drawn from a curated database of OSINT tools | |
| - **Context-Aware Retrieval**: Semantic search finds the most relevant tools for your investigation | |
| - **API Access**: Built-in REST API for integration with external applications | |
| - **Chat Interface**: User-friendly conversational interface | |
| - **MCP Support**: Can be extended to work with AI agents via the Model Context Protocol (MCP) | |
| ## Architecture | |
| ``` | |
| ┌─────────────────────────────────────┐ | |
| │      Gradio UI + API Endpoints      │ | |
| └──────────────┬──────────────────────┘ | |
|                │ | |
| ┌──────────────▼──────────────────────┐ | |
| │       LangChain RAG Pipeline        │ | |
| │  • Query Understanding              │ | |
| │  • Tool Retrieval (PGVector)        │ | |
| │  • Response Generation (LLM)        │ | |
| └──────────────┬──────────────────────┘ | |
|                │ | |
|     ┌──────────┴──────────┐ | |
|     │                     │ | |
| ┌───▼───────────┐   ┌─────▼────────────┐ | |
| │   Supabase    │   │   HF Inference   │ | |
| │  PGVector DB  │   │    Providers     │ | |
| │  (344 tools)  │   │   (Llama 3.1)    │ | |
| └───────────────┘   └──────────────────┘ | |
| ``` | |
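The flow in the diagram can be sketched in plain Python. This is a minimal illustration and not the code in `src/rag_pipeline.py`: `Tool`, `retrieve_tools`, `build_prompt`, and `answer` are hypothetical names, the word-overlap retriever is only a stand-in for PGVector similarity search, and `generate` is any callable standing in for the LLM.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Tool:
    name: str
    url: str
    content: str


def retrieve_tools(query: str, tools: List[Tool], k: int = 5) -> List[Tool]:
    """Stand-in retriever: rank tools by word overlap with the query."""
    query_words = set(query.lower().split())

    def overlap(tool: Tool) -> int:
        return len(query_words & set(tool.content.lower().split()))

    return sorted(tools, key=overlap, reverse=True)[:k]


def build_prompt(query: str, tools: List[Tool]) -> str:
    """Place the retrieved tools in the prompt as context for the LLM."""
    context = "\n".join(f"- {t.name} ({t.url}): {t.content}" for t in tools)
    return f"Relevant tools:\n{context}\n\nQuestion: {query}"


def answer(query: str, tools: List[Tool],
           generate: Callable[[str], str], k: int = 5) -> str:
    """Retrieve, build the prompt, then hand it to the generator."""
    return generate(build_prompt(query, retrieve_tools(query, tools, k)))
```

Passing `generate=lambda prompt: prompt` returns the assembled prompt itself, which is a convenient way to inspect what the model would receive.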
| ## Quick Start | |
| ### Local Development | |
| 1. **Clone the repository** | |
| ```bash | |
| git clone <your-repo-url> | |
| cd osint-llm | |
| ``` | |
| 2. **Install dependencies** | |
| ```bash | |
| pip install -r requirements.txt | |
| ``` | |
| 3. **Set up environment variables** | |
| ```bash | |
| cp .env.example .env | |
| # Edit .env with your credentials | |
| ``` | |
| Required variables: | |
| - `SUPABASE_CONNECTION_STRING`: Your Supabase PostgreSQL connection string | |
| - `HF_TOKEN`: Your Hugging Face API token | |
| 4. **Run the application** | |
| ```bash | |
| python app.py | |
| ``` | |
| The app will be available at `http://localhost:7860` | |
| ### Hugging Face Spaces Deployment | |
| 1. **Create a new Space** on Hugging Face | |
| 2. **Push this repository** to your Space | |
| 3. **Set environment variables** in Space settings: | |
| - `SUPABASE_CONNECTION_STRING` | |
| - `HF_TOKEN` | |
| 4. **Deploy** - The Space will automatically build and launch | |
| ## Usage | |
| ### Chat Interface | |
| Simply ask your investigation questions: | |
| ``` | |
| "How do I investigate a suspicious domain?" | |
| "What tools can I use to verify an image's authenticity?" | |
| "How can I trace the origin of a social media account?" | |
| ``` | |
| The assistant will provide: | |
| 1. Investigation overview | |
| 2. Step-by-step methodology | |
| 3. Recommended tools with descriptions and URLs | |
| 4. Best practices and safety considerations | |
| 5. Expected outcomes | |
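One way to obtain that five-part structure is to request it explicitly in the prompt. The template below is a hypothetical sketch; the actual wording in `src/prompts.py` is not shown in this README, and `SECTIONS` and `build_investigation_prompt` are illustrative names.

```python
# Section titles mirror the list above.
SECTIONS = [
    "Investigation overview",
    "Step-by-step methodology",
    "Recommended tools with descriptions and URLs",
    "Best practices and safety considerations",
    "Expected outcomes",
]


def build_investigation_prompt(question: str, tool_context: str) -> str:
    """Ask the model to answer using a fixed, numbered section layout."""
    numbered = "\n".join(
        f"{i}. {title}" for i, title in enumerate(SECTIONS, start=1)
    )
    return (
        "You are an OSINT investigation assistant.\n"
        "Recommend only tools listed under CONTEXT.\n\n"
        f"CONTEXT:\n{tool_context}\n\n"
        f"QUESTION: {question}\n\n"
        "Answer with these numbered sections:\n"
        f"{numbered}"
    )
```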
| ### Tool Search | |
| Use the "Tool Search" tab to directly search for OSINT tools by category or purpose. | |
| ### API Access | |
| This app automatically exposes REST API endpoints for external integration. | |
| **Python Client:** | |
| ```python | |
| from gradio_client import Client | |
| client = Client("your-space-url") | |
| result = client.predict( | |
| "How do I investigate a domain?", | |
| api_name="/investigate" | |
| ) | |
| print(result) | |
| ``` | |
| **JavaScript Client:** | |
| ```javascript | |
| import { Client } from "@gradio/client"; | |
| const client = await Client.connect("your-space-url"); | |
| const result = await client.predict("/investigate", { | |
| message: "How do I investigate a domain?" | |
| }); | |
| console.log(result.data); | |
| ``` | |
| **cURL:** | |
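| ```bash | |
| # Gradio 5 serves API routes under /gradio_api; the POST returns an event ID | |
| EVENT_ID=$(curl -s -X POST "https://your-space.hf.space/gradio_api/call/investigate" \ | |
| -H "Content-Type: application/json" \ | |
| -d '{"data": ["How do I investigate a domain?"]}' | awk -F'"' '{ print $4 }') | |
| # A second request streams the result for that event | |
| curl -N "https://your-space.hf.space/gradio_api/call/investigate/$EVENT_ID" | |
| ``` | |
| **Available Endpoints:** | |
| - `/gradio_api/call/investigate` - Main investigation assistant | |
| - `/gradio_api/call/search_tools` - Direct tool search | |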
| - `/gradio_api/openapi.json` - OpenAPI specification | |
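For clients that cannot use `gradio_client`, the same call can be made with the standard library. This is a sketch under assumptions: the `/gradio_api` route prefix and the two-step POST-then-stream flow follow Gradio 5 behaviour (older versions omit the prefix), and `parse_sse_result` and `call_endpoint` are hypothetical helper names. Check the Space's "Use via API" page for the exact routes.

```python
import json
import urllib.request
from typing import Any, List, Optional


def parse_sse_result(stream_text: str) -> Optional[List[Any]]:
    """Return the JSON payload of the 'complete' event, or None if absent."""
    event = None
    for line in stream_text.splitlines():
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:") and event == "complete":
            return json.loads(line[len("data:"):].strip())
    return None


def call_endpoint(base_url: str, api_name: str,
                  inputs: List[Any]) -> Optional[List[Any]]:
    """POST the inputs, then read the event stream for the returned event ID."""
    url = f"{base_url}/gradio_api/call/{api_name}"
    request = urllib.request.Request(
        url,
        data=json.dumps({"data": inputs}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        event_id = json.load(response)["event_id"]
    with urllib.request.urlopen(f"{url}/{event_id}") as stream:
        return parse_sse_result(stream.read().decode("utf-8"))
```

A call would look like `call_endpoint("https://your-space.hf.space", "investigate", ["How do I investigate a domain?"])`.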
| ## Database | |
| The app uses Supabase with the PGVector extension to store and retrieve OSINT tools. | |
| **Database Schema:** | |
| ```sql | |
| CREATE TABLE bellingcat_tools ( | |
| id BIGINT PRIMARY KEY, | |
| name TEXT, | |
| category TEXT, | |
| content TEXT, | |
| url TEXT, | |
| cost TEXT, | |
| details TEXT, | |
| embedding VECTOR, | |
| created_at TIMESTAMP WITH TIME ZONE | |
| ); | |
| ``` | |
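Retrieval over this table amounts to a nearest-neighbour search on the `embedding` column. The sketch below mimics that in memory with cosine similarity; in PGVector the comparable ordering uses the cosine-distance operator `<=>`. The sample rows and their three-dimensional vectors are invented for illustration, and `cosine_similarity` and `top_k` are hypothetical helper names.

```python
import math
from typing import Any, Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_embedding: Sequence[float],
          rows: List[Dict[str, Any]], k: int = 5) -> List[Dict[str, Any]]:
    """Return the k rows whose embedding is most similar to the query."""
    def score(row: Dict[str, Any]) -> float:
        return cosine_similarity(query_embedding, row["embedding"])

    return sorted(rows, key=score, reverse=True)[:k]


# Invented sample rows using column names from the schema above.
rows = [
    {"name": "archive-tool", "category": "Archiving & Preservation",
     "embedding": [1.0, 0.0, 0.0]},
    {"name": "geo-tool", "category": "Geolocation",
     "embedding": [0.0, 1.0, 0.0]},
    {"name": "image-tool", "category": "Image & Video Analysis",
     "embedding": [0.0, 0.6, 0.8]},
]

print([row["name"] for row in top_k([0.0, 1.0, 0.1], rows, k=2)])
# prints ['geo-tool', 'image-tool']
```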
| **Tool Categories:** | |
| - Archiving & Preservation | |
| - Social Media Investigation | |
| - Image & Video Analysis | |
| - Domain & Network Investigation | |
| - Geolocation | |
| - Data Extraction | |
| - Verification & Fact-Checking | |
| - And more... | |
| ## Technology Stack | |
| - **UI/API**: [Gradio](https://gradio.app/) - Automatic API generation | |
| - **RAG Framework**: [LangChain](https://langchain.com/) - Retrieval pipeline | |
| - **Vector Database**: [Supabase](https://supabase.com/) with the PGVector extension | |
| - **Embeddings**: HuggingFace sentence-transformers | |
| - **LLM**: [Hugging Face Inference Providers](https://huggingface.co/docs/inference-providers/) - Llama 3.1 | |
| - **Language**: Python 3.10+ (required by Gradio 5) | |
| ## Project Structure | |
| ``` | |
| osint-llm/ | |
| ├── app.py               # Main Gradio application | |
| ├── requirements.txt     # Python dependencies | |
| ├── .env.example         # Environment variables template | |
| ├── README.md            # This file | |
| └── src/ | |
|     ├── __init__.py | |
|     ├── vectorstore.py   # Supabase PGVector connection | |
|     ├── rag_pipeline.py  # LangChain RAG logic | |
|     ├── llm_client.py    # Inference Provider client | |
|     └── prompts.py       # Investigation prompt templates | |
| ``` | |
| ## Configuration | |
| ### Environment Variables | |
| See `.env.example` for all available configuration options. | |
| **Required:** | |
| - `SUPABASE_CONNECTION_STRING` - PostgreSQL connection string | |
| - `HF_TOKEN` - Hugging Face API token | |
| **Optional:** | |
| - `LLM_MODEL` - Model to use (default: meta-llama/Llama-3.1-8B-Instruct) | |
| - `LLM_TEMPERATURE` - Generation temperature (default: 0.7) | |
| - `LLM_MAX_TOKENS` - Max tokens to generate (default: 2000) | |
| - `RETRIEVAL_K` - Number of tools to retrieve (default: 5) | |
| - `EMBEDDING_MODEL` - Embedding model (default: sentence-transformers/all-MiniLM-L6-v2) | |
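The variables above could be read and validated at startup along these lines. This is a sketch rather than the app's actual loader: `Settings` and `load_settings` are hypothetical names, while the variable names and defaults are the ones listed above.

```python
import os
from dataclasses import dataclass
from typing import Mapping

REQUIRED = ("SUPABASE_CONNECTION_STRING", "HF_TOKEN")


@dataclass(frozen=True)
class Settings:
    supabase_connection_string: str
    hf_token: str
    llm_model: str
    llm_temperature: float
    llm_max_tokens: int
    retrieval_k: int
    embedding_model: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Fail fast on missing required variables; apply documented defaults."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(
            f"Missing required environment variables: {', '.join(missing)}"
        )
    return Settings(
        supabase_connection_string=env["SUPABASE_CONNECTION_STRING"],
        hf_token=env["HF_TOKEN"],
        llm_model=env.get("LLM_MODEL", "meta-llama/Llama-3.1-8B-Instruct"),
        llm_temperature=float(env.get("LLM_TEMPERATURE", "0.7")),
        llm_max_tokens=int(env.get("LLM_MAX_TOKENS", "2000")),
        retrieval_k=int(env.get("RETRIEVAL_K", "5")),
        embedding_model=env.get(
            "EMBEDDING_MODEL", "sentence-transformers/all-MiniLM-L6-v2"
        ),
    )
```

Accepting any mapping instead of reading `os.environ` directly makes the loader easy to exercise with a plain dictionary.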
| ### Supported LLM Models | |
| - `meta-llama/Llama-3.1-8B-Instruct` (recommended) | |
| - `meta-llama/Meta-Llama-3-8B-Instruct` | |
| - `Qwen/Qwen2.5-72B-Instruct` | |
| - `mistralai/Mistral-7B-Instruct-v0.3` | |
| ## Cost Considerations | |
| ### Hugging Face Inference Providers | |
| - Free tier: $0.10/month credits | |
| - PRO tier: $2.00/month credits + pay-as-you-go | |
| - Typical cost: ~$0.001-0.01 per query | |
| - Recommended budget: $10-50/month for moderate usage | |
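To turn those figures into a rough monthly estimate, multiply the expected query volume by the per-query range and subtract the included credits. The function below is a back-of-the-envelope sketch: `monthly_cost_range` is a hypothetical helper, the defaults are the PRO-tier numbers quoted above, and treating credits as a flat deduction is a simplifying assumption.

```python
from typing import Tuple


def monthly_cost_range(queries_per_month: int,
                       monthly_credits: float = 2.00,
                       cost_low: float = 0.001,
                       cost_high: float = 0.01) -> Tuple[float, float]:
    """Estimate (low, high) out-of-pocket USD per month after credits."""
    low = max(0.0, queries_per_month * cost_low - monthly_credits)
    high = max(0.0, queries_per_month * cost_high - monthly_credits)
    return round(low, 2), round(high, 2)


print(monthly_cost_range(5000))
```

Under these assumptions, 5,000 queries per month comes to between $3 and $48 after credits, which is in line with the recommended budget above.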
| ### Supabase | |
| - Free tier sufficient for most use cases | |
| - PGVector similarity searches run as ordinary PostgreSQL queries | |
| ### Hugging Face Spaces | |
| - Free CPU hosting available | |
| - GPU upgrade: ~$0.60/hour (optional, not required) | |
| ## Future Enhancements | |
| - [ ] MCP server integration for AI agent tool use | |
| - [ ] Multi-turn conversation with memory | |
| - [ ] User authentication and query logging | |
| - [ ] Additional tool databases and sources | |
| - [ ] Export methodologies as PDF/markdown | |
| - [ ] Tool usage examples and tutorials | |
| - [ ] Community-contributed tool reviews | |
| ## Contributing | |
| Contributions are welcome! Please feel free to submit issues or pull requests. | |
| ## License | |
| MIT License - See LICENSE file for details | |
| ## Acknowledgments | |
| - Tool data sourced from [Bellingcat's Online Investigation Toolkit](https://www.bellingcat.com/) | |
| - Built with support from the OSINT community | |
| ## Support | |
| For issues or questions: | |
| - Open an issue on GitHub | |
| - Check the [Hugging Face Spaces documentation](https://huggingface.co/docs/hub/spaces) | |
| - Review the [Gradio documentation](https://gradio.app/docs/) | |
| --- | |
| Built with ❤️ for the OSINT community | |