| title: TraceMind AI | |
| emoji: 🧠 | |
| colorFrom: indigo | |
| colorTo: purple | |
| sdk: gradio | |
| sdk_version: 5.49.1 | |
| app_file: app.py | |
| short_description: AI agent evaluation with MCP-powered intelligence | |
| license: agpl-3.0 | |
| pinned: true | |
| tags: | |
| - mcp-in-action-track-enterprise | |
| - agent-evaluation | |
| - mcp-client | |
| - leaderboard | |
| - gradio | |
| # 🧠 TraceMind-AI | |
| <p align="center"> | |
| <img src="https://raw.githubusercontent.com/Mandark-droid/TraceMind-AI/assets/Logo.png" alt="TraceMind-AI Logo" width="200"/> | |
| </p> | |
| **Agent Evaluation Platform with MCP-Powered Intelligence** | |
| > **Track 2 Submission**: MCP in Action (Enterprise) | |
| > **MCP's 1st Birthday Hackathon**: November 14-30, 2025 | |
| --- | |
| ## Why TraceMind-AI? | |
| **The Challenge**: Evaluating AI agents generates complex data across models, providers, and configurations. Making sense of it all is overwhelming. | |
| **The Solution**: TraceMind-AI is your **intelligent agent evaluation command center**: | |
| - **Live leaderboard** with real-time performance data | |
| - **Autonomous agent chat** powered by MCP tools | |
| - **Smart cost estimation** before you run evaluations | |
| - **Deep trace analysis** to debug agent behavior | |
| - **Multi-cloud job submission** (HuggingFace Jobs + Modal) | |
| All powered by the **Model Context Protocol** for AI-driven insights at every step. | |
| --- | |
| ## Try It Now | |
| - **Live Demo**: [TraceMind-AI Space](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind) | |
| - **MCP Server**: [TraceMind-mcp-server](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server) (Track 1) | |
| - **Full Docs**: See [USER_GUIDE.md](USER_GUIDE.md) for complete walkthrough | |
| - **TraceMind-AI Full Demo (20 min)**: [Watch on Loom](https://www.loom.com/share/70b9689b57204da58b8fef0d23c304fe) | |
| - **MCP Server Quick Demo (5 min)**: [Watch on Loom](https://www.loom.com/share/d4d0003f06fa4327b46ba5c081bdf835) | |
| - **MCP Server Full Demo (20 min)**: [Watch on Loom](https://www.loom.com/share/de559bb0aef749559c79117b7f951250) | |
| --- | |
| ## The TraceMind Ecosystem | |
| TraceMind-AI is the **user-facing platform** in a complete 4-project agent evaluation ecosystem: | |
| <p align="center"> | |
| <img src="https://raw.githubusercontent.com/Mandark-droid/TraceMind-AI/assets/TraceVerse_Logo.png" alt="TraceVerse Ecosystem" width="400"/> | |
| <br/><br/> | |
| </p> | |
| ``` | |
|   TraceVerde                      SMOLTRACE | |
|   (genai_otel_instrument)         (Evaluation Engine) | |
|           │                               │ | |
|      Instruments                      Evaluates | |
|       LLM calls                        agents | |
|           │                               │ | |
|           └───────────────┬───────────────┘ | |
|                           ↓ | |
|                  Generates Datasets | |
|            (leaderboard, traces, metrics) | |
|                           │ | |
|           ┌───────────────┴───────────────┐ | |
|           ↓                               ↓ | |
|   TraceMind MCP Server            TraceMind-AI | |
|   (Track 1 - Building MCP)        (This Project - Track 2) | |
|   Provides AI Tools               Consumes MCP Tools | |
|           └──────── MCP Protocol ─────────┘ | |
| ``` | |
| ### The Foundation | |
| **TraceVerde** - Automatic OpenTelemetry instrumentation for LLM frameworks | |
| → Captures every LLM call, tool usage, and agent step | |
| → [GitHub](https://github.com/Mandark-droid/genai_otel_instrument) | [PyPI](https://pypi.org/project/genai-otel-instrument) | |
| **SMOLTRACE** - Lightweight evaluation engine with built-in tracing | |
| → Generates structured datasets (leaderboard, results, traces, metrics) | |
| → [GitHub](https://github.com/Mandark-droid/SMOLTRACE) | [PyPI](https://pypi.org/project/smoltrace/) | |
| ### The Platform | |
| **TraceMind MCP Server** - AI-powered analysis tools via MCP | |
| → [Live Demo](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server) | [GitHub](https://github.com/Mandark-droid/TraceMind-mcp-server) | |
| → **Track 1**: Building MCP (Enterprise) | |
| **TraceMind-AI** (This Project) - Interactive UI that consumes MCP tools | |
| → **Track 2**: MCP in Action (Enterprise) | |
| --- | |
| ## Why This Matters for Hugging Face | |
| This ecosystem is built **around** Hugging Face, not just "using it": | |
| - Every SMOLTRACE evaluation creates **4 structured `datasets` on the Hub** (leaderboard, results, traces, metrics) | |
| - TraceMind MCP Server and TraceMind-AI run as **Hugging Face Spaces**, using **Gradio's MCP integration** | |
| - The stack is designed for **`smolagents`**: agents are evaluated, traced, and analyzed using HF's own agent framework | |
| - Evaluations can be executed via **HF Jobs**, turning evaluations into real compute usage, not just local scripts | |
| So TraceMind isn't just another agent demo. | |
| **It's an opinionated blueprint for:** | |
| > **"How Hugging Face models + Datasets + Spaces + Jobs + smolagents + MCP can work together as a complete agent evaluation and observability platform."** | |
| --- | |
| ## Key Features | |
| ### MCP Integration (Track 2) | |
| TraceMind-AI demonstrates **enterprise MCP client usage** in two ways: | |
| **1. Direct MCP Client Integration** | |
| - Connects to TraceMind MCP Server via SSE transport | |
| - Uses 5 AI-powered tools: `analyze_leaderboard`, `estimate_cost`, `debug_trace`, `compare_runs`, `analyze_results` | |
| - Real-time insights powered by Google Gemini 2.5 Flash | |
| **2. Autonomous Agent with MCP Tools** | |
| - Built with `smolagents` framework | |
| - Agent has access to all MCP server tools | |
| - Natural language queries → autonomous tool execution | |
| - Example: *"What are the top 3 models and how much do they cost?"* | |
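The agent path above can be sketched roughly as follows. This is a minimal illustration, not the code in `app.py`: it assumes the MCP server Space exposes Gradio's SSE endpoint at `/gradio_api/mcp/sse` and that the Space subdomain follows the usual `owner-name.hf.space` pattern. The helper names (`space_sse_url`, `server_params`, `ask`) are hypothetical. `MCPClient`, `CodeAgent`, and `InferenceClientModel` are smolagents classes; `InferenceClientModel` stands in here for the Gemini 2.5 Flash model the project actually uses.

```python
import re


def space_sse_url(space_id: str) -> str:
    """Build the SSE endpoint URL for a Gradio MCP server hosted on a Space.

    Assumption: the Space subdomain is the lowercased "owner/name" with
    every run of non-alphanumeric characters replaced by a hyphen.
    """
    subdomain = re.sub(r"[^a-z0-9]+", "-", space_id.lower()).strip("-")
    return f"https://{subdomain}.hf.space/gradio_api/mcp/sse"


def server_params(space_id: str) -> dict:
    """Connection parameters in the dict form accepted by smolagents' MCPClient."""
    return {"url": space_sse_url(space_id), "transport": "sse"}


def ask(question: str, space_id: str = "MCP-1st-Birthday/TraceMind-mcp-server") -> str:
    """Run one natural-language query through an agent equipped with the MCP tools."""
    # Imported lazily so the helpers above work without smolagents installed.
    from smolagents import CodeAgent, InferenceClientModel, MCPClient

    # The context manager opens the SSE connection and closes it on exit.
    with MCPClient(server_params(space_id)) as tools:
        agent = CodeAgent(tools=tools, model=InferenceClientModel())
        return str(agent.run(question))
```

Calling `ask("What are the top 3 models and how much do they cost?")` would then let the agent decide which MCP tools to invoke. It needs network access and a valid Hugging Face token.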
| ### Agent Evaluation Features | |
| - **Live Leaderboard**: View all evaluation runs with sortable metrics | |
| - **Cost Estimation**: Auto-select hardware and predict costs before running | |
| - **Trace Visualization**: Deep-dive into OpenTelemetry traces with GPU metrics | |
| - **Multi-Cloud Jobs**: Submit evaluations to HuggingFace Jobs or Modal | |
| - **Performance Analytics**: GPU utilization, CO2 emissions, token tracking | |
| ### Smart Features | |
| - **Auto Hardware Selection**: Based on model size and provider | |
| - **Real-time Job Monitoring**: Track HuggingFace Jobs status | |
| - **Agent Reasoning Visibility**: See step-by-step tool execution | |
| - **Quick Action Buttons**: One-click common queries | |
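To make "based on model size and provider" concrete, here is a hypothetical sketch of such a selection rule. The provider names, size thresholds, and the mapping to flavors are illustrative assumptions, not the rules TraceMind-AI actually applies. The flavor strings follow Hugging Face's hardware naming.

```python
# Providers assumed to serve the model remotely over an API (illustrative list).
API_PROVIDERS = {"litellm", "inference", "openai", "anthropic"}


def select_hardware(provider: str, params_billions: float = 0.0) -> str:
    """Pick a hardware flavor from provider and model size (made-up thresholds)."""
    # API-hosted models run elsewhere, so the evaluation job only needs a CPU.
    if provider.lower() in API_PROVIDERS:
        return "cpu-basic"
    # Locally loaded models need a GPU sized to the parameter count.
    if params_billions <= 3:
        return "t4-small"
    if params_billions <= 14:
        return "a10g-large"
    return "a100-large"
```

For example, `select_hardware("transformers", 8)` picks the mid-size GPU tier under these assumed thresholds, while any API provider falls back to CPU.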
| --- | |
| ## Quick Start | |
| ### Option 1: Use the Live Demo (Recommended) | |
| 1. **Visit**: https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind | |
| 2. **Login**: Sign in with your HuggingFace account | |
| 3. **Explore**: Browse the leaderboard, chat with the agent, visualize traces | |
| ### Option 2: Run Locally | |
| ```bash | |
| # Clone and setup | |
| git clone https://github.com/Mandark-droid/TraceMind-AI.git | |
| cd TraceMind-AI | |
| pip install -r requirements.txt | |
| # Configure environment | |
| cp .env.example .env | |
| # Edit .env with your API keys (see Configuration section) | |
| # Run the app | |
| python app.py | |
| ``` | |
| Then open http://localhost:7860 in your browser (Gradio's default port). | |
| --- | |
| ## Configuration | |
| ### For Viewing (Free) | |
| **Required**: | |
| - HuggingFace account (free) | |
| - HuggingFace token with **Read** permissions | |
| ### For Submitting Jobs (Paid) | |
| **Required**: | |
| - **HuggingFace Pro** ($9/month) with credit card | |
| - HuggingFace token with **Read + Write + Run Jobs** permissions | |
| - LLM provider API keys (OpenAI, Anthropic, etc.) | |
| **Optional (Modal Alternative)**: | |
| - Modal account (pay-per-second, no subscription) | |
| - Modal API token (MODAL_TOKEN_ID + MODAL_TOKEN_SECRET) | |
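A quick pre-flight check can catch missing credentials before the app starts. The sketch below is an assumption-laden illustration: `MODAL_TOKEN_ID` and `MODAL_TOKEN_SECRET` come from the list above, while `HF_TOKEN` is the variable name `huggingface_hub` reads. Treating it as the name this project's `.env` uses, and requiring it for Modal runs, are guesses. The mode names are hypothetical.

```python
import os

# Variables each usage mode is assumed to need (see the lead-in for caveats).
REQUIRED_KEYS = {
    "view": ["HF_TOKEN"],
    "hf_jobs": ["HF_TOKEN"],
    "modal": ["HF_TOKEN", "MODAL_TOKEN_ID", "MODAL_TOKEN_SECRET"],
}


def missing_keys(mode, env=None):
    """Return the required variables that are unset or empty for the given mode."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS[mode] if not env.get(name)]
```

Running `missing_keys("modal")` before `python app.py` lists whatever still needs to be added to `.env`. LLM provider keys are left out because they depend on which provider you evaluate.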
| ### Using Your Own API Keys (Recommended for Judges) | |
| To prevent rate limits during evaluation: | |
| **Step 1: Configure MCP Server** (Required for AI tools) | |
| 1. Visit: https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server | |
| 2. Go to **Settings** tab | |
| 3. Enter: **Gemini API Key** + **HuggingFace Token** | |
| 4. Click **"Save & Override Keys"** | |
| **Step 2: Configure TraceMind-AI** (Optional) | |
| 1. Visit: https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind | |
| 2. Go to **Settings** tab | |
| 3. Enter: **Gemini API Key** + **HuggingFace Token** | |
| 4. Click **"Save API Keys"** | |
| **Get Free API Keys**: | |
| - **Gemini**: https://ai.google.dev/ (1,500 requests/day) | |
| - **HuggingFace**: https://huggingface.co/settings/tokens (unlimited for public datasets) | |
| --- | |
| ## For Hackathon Judges | |
| ### Track 2 Compliance | |
| - **MCP Client Integration**: Connects to remote MCP server via SSE transport | |
| - **Autonomous Agent**: `smolagents` agent with MCP tool access | |
| - **Enterprise Focus**: Cost optimization, job submission, performance analytics | |
| - **Production-Ready**: Deployed to HuggingFace Spaces with OAuth authentication | |
| - **Real Data**: Live HuggingFace datasets from SMOLTRACE evaluations | |
| ### Key Innovations | |
| 1. **Dual MCP Integration**: Both direct MCP client + autonomous agent with MCP tools | |
| 2. **Multi-Cloud Support**: HuggingFace Jobs + Modal for serverless compute | |
| 3. **Auto Hardware Selection**: Smart hardware recommendations based on model size | |
| 4. **Complete Ecosystem**: Part of 4-project platform demonstrating full evaluation workflow | |
| 5. **Agent Reasoning Visibility**: See step-by-step MCP tool execution | |
| ### Demo Materials | |
| - **TraceMind-AI Full Demo (20 min)**: [Watch on Loom](https://www.loom.com/share/70b9689b57204da58b8fef0d23c304fe) - Complete walkthrough of all features | |
| - **MCP Server Quick Demo (5 min)**: [Watch on Loom](https://www.loom.com/share/d4d0003f06fa4327b46ba5c081bdf835) - Quick intro to MCP tools | |
| - **MCP Server Full Demo (20 min)**: [Watch on Loom](https://www.loom.com/share/de559bb0aef749559c79117b7f951250) - Deep dive into MCP server | |
| - **Blog Post**: [Building TraceMind Ecosystem](https://huggingface.co/blog/kshitijthakkar/tracemind-ecosystem) - Technical deep-dive | |
| - **LinkedIn Post**: [TraceMind-AI Hackathon Submission](https://www.linkedin.com/posts/kshitij-thakkar-2061b924_mcp1stbirthdayhackathon-mcp-modelcontextprotocol-activity-7399775530218065920-owgR) - Final submission announcement | |
| ### Testing Suggestions | |
| **1. Try the Agent Chat** (Agent Chat tab): | |
| - "Analyze the current leaderboard and show me the top 5 models" | |
| - "Compare the costs of the top 3 models" | |
| - "Estimate the cost of running 100 tests with GPT-4" | |
| **2. Explore the Leaderboard** (Leaderboard tab): | |
| - Click "Load Leaderboard" to see live data | |
| - Read the AI-generated insights (powered by MCP server) | |
| - Click on a run to see detailed test results | |
| **3. Visualize Traces** (Select a run → View traces): | |
| - See OpenTelemetry waterfall diagrams | |
| - View GPU metrics overlay (for GPU jobs) | |
| - Ask questions about the trace (MCP-powered debugging) | |
| --- | |
| ## What Can You Do? | |
| ### View & Analyze | |
| - **Browse leaderboard** with AI-powered insights | |
| - **Compare models** side-by-side across metrics | |
| - **Analyze traces** with interactive visualization | |
| - **Ask questions** via autonomous agent | |
| ### Estimate & Plan | |
| - **Get cost estimates** before running evaluations | |
| - **Compare hardware options** (CPU vs GPU tiers) | |
| - **Preview duration** and CO2 emissions | |
| - **See recommendations** from AI analysis | |
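As a rough idea of what a pre-run estimate involves, the sketch below splits cost into LLM token spend and compute time. It is a simplified model under stated assumptions, not the logic behind the `estimate_cost` MCP tool: every rate is a caller-supplied parameter, and the function and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostEstimate:
    llm_usd: float      # spend on LLM tokens
    compute_usd: float  # spend on job hardware

    @property
    def total_usd(self) -> float:
        return self.llm_usd + self.compute_usd


def estimate_run_cost(
    num_tests: int,
    tokens_per_test: float,
    usd_per_1k_tokens: float,
    seconds_per_test: float,
    hardware_usd_per_hour: float,
) -> CostEstimate:
    """Estimate cost assuming tests run sequentially at a flat token price."""
    llm_usd = num_tests * tokens_per_test / 1000 * usd_per_1k_tokens
    compute_usd = num_tests * seconds_per_test / 3600 * hardware_usd_per_hour
    return CostEstimate(llm_usd=llm_usd, compute_usd=compute_usd)
```

Comparing hardware options then amounts to calling `estimate_run_cost` once per tier with that tier's hourly rate and expected per-test duration, and looking at `total_usd`.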
| ### Submit & Monitor | |
| - **Submit evaluation jobs** to HuggingFace or Modal | |
| - **Track job status** in real-time | |
| - **View results** automatically when complete | |
| - **Download datasets** for further analysis | |
| ### Generate & Customize | |
| - **Generate synthetic datasets** for custom domains and tools | |
| - **Create prompt templates** optimized for your use case | |
| - **Push to HuggingFace Hub** with one click | |
| - **Test evaluations** without writing code | |
| --- | |
| ## Documentation | |
| **For quick evaluation**: | |
| - Read this README for overview | |
| - Visit the [Live Demo](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind) to try it | |
| - Check out the **Agent Chat** tab for autonomous MCP usage | |
| **For deep dives**: | |
| - [USER_GUIDE.md](USER_GUIDE.md) - Complete screen-by-screen walkthrough | |
| - Leaderboard tab usage | |
| - Agent chat interactions | |
| - Synthetic data generator | |
| - Job submission workflow | |
| - Trace visualization guide | |
| - [MCP_INTEGRATION.md](MCP_INTEGRATION.md) - MCP client architecture | |
| - How TraceMind-AI connects to MCP server | |
| - Agent framework integration (smolagents) | |
| - MCP tool usage examples | |
| - [JOB_SUBMISSION.md](JOB_SUBMISSION.md) - Evaluation job guide | |
| - HuggingFace Jobs setup | |
| - Modal integration | |
| - Hardware selection guide | |
| - Cost optimization tips | |
| - [ARCHITECTURE.md](ARCHITECTURE.md) - Technical architecture | |
| - Project structure | |
| - Data flow | |
| - Authentication | |
| - Deployment | |
| --- | |
| ## Technology Stack | |
| - **UI Framework**: Gradio 5.49.1 | |
| - **Agent Framework**: smolagents 1.22.0+ | |
| - **MCP Integration**: MCP Python SDK + smolagents MCPClient | |
| - **Data Source**: HuggingFace Datasets API | |
| - **Authentication**: HuggingFace OAuth (planned) | |
| - **AI Models**: | |
| - Agent: Google Gemini 2.5 Flash | |
| - MCP Server: Google Gemini 2.5 Flash | |
| - **Cloud Platforms**: HuggingFace Jobs + Modal | |
| --- | |
| ## Example Workflows | |
| ### Workflow 1: Quick Analysis | |
| 1. Open TraceMind-AI | |
| 2. Go to **Agent Chat** | |
| 3. Click **"Quick: Top Models"** | |
| 4. See agent fetch leaderboard and analyze top performers | |
| 5. Ask follow-up: *"Which one is most cost-effective?"* | |
| ### Workflow 2: Submit Evaluation Job | |
| 1. Go to **Settings** → Configure API keys | |
| 2. Go to **New Evaluation** | |
| 3. Select model (e.g., `meta-llama/Llama-3.1-8B`) | |
| 4. Choose infrastructure (HuggingFace Jobs or Modal) | |
| 5. Click **"Estimate Cost"** to preview | |
| 6. Click **"Submit Evaluation"** | |
| 7. Monitor job in **Job Monitoring** tab | |
| 8. View results in leaderboard when complete | |
| ### Workflow 3: Debug Agent Behavior | |
| 1. Browse **Leaderboard** | |
| 2. Click on a run with failures | |
| 3. View **detailed test results** | |
| 4. Click on a failed test to see trace | |
| 5. Use MCP-powered Q&A: *"Why did this test fail?"* | |
| 6. Get AI analysis of the execution trace | |
| ### Workflow 4: Generate Custom Test Dataset | |
| 1. Go to **Synthetic Data Generator** | |
| 2. Configure: | |
| - Domain: `finance` | |
| - Tools: `get_stock_price,calculate_profit,send_alert` | |
| - Number of tasks: `20` | |
| - Difficulty: `balanced` | |
| 3. Click **"Generate Dataset"** | |
| 4. Review generated tasks and prompt template | |
| 5. Enter repository name: `yourname/smoltrace-finance-tasks` | |
| 6. Click **"Push to HuggingFace Hub"** | |
| 7. Use your custom dataset in evaluations | |
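The form values in Workflow 4 map naturally onto a small config object. The sketch below is a hypothetical illustration of how the comma-separated tools field and the repository name might be parsed and checked. The class and function names are assumptions, and the repository pattern is a simplified approximation of Hub naming rules.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneratorConfig:
    domain: str
    tools: tuple
    num_tasks: int
    difficulty: str


def parse_tools(raw: str) -> tuple:
    """Split the comma-separated tools field, dropping blanks and whitespace."""
    return tuple(name.strip() for name in raw.split(",") if name.strip())


def is_valid_repo_id(repo_id: str) -> bool:
    """Check for an "owner/name" shape (simplified; the Hub applies more rules)."""
    return re.fullmatch(r"[A-Za-z0-9][\w.-]*/[A-Za-z0-9][\w.-]*", repo_id) is not None


# The values from the workflow above.
config = GeneratorConfig(
    domain="finance",
    tools=parse_tools("get_stock_price,calculate_profit,send_alert"),
    num_tasks=20,
    difficulty="balanced",
)
```

Validating the repository name up front avoids generating a full dataset only to have the push step rejected.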
| --- | |
| ## Screenshots | |
| *See [SCREENSHOTS.md](SCREENSHOTS.md) for annotated screenshots of all screens* | |
| --- | |
| ## Quick Links | |
| ### Component Links | |
| | Component | Description | Links | | |
| |-----------|-------------|-------| | |
| **TraceVerde** | OTEL Instrumentation | [GitHub](https://github.com/Mandark-droid/genai_otel_instrument) • [PyPI](https://pypi.org/project/genai-otel-instrument) | |
| **SMOLTRACE** | Evaluation Engine | [GitHub](https://github.com/Mandark-droid/SMOLTRACE) • [PyPI](https://pypi.org/project/smoltrace/) | |
| **MCP Server** | Building MCP (Track 1) | [HF Space](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server) • [GitHub](https://github.com/Mandark-droid/TraceMind-mcp-server) | |
| **TraceMind-AI** | MCP in Action (Track 2) | [HF Space](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind) • [GitHub](https://github.com/Mandark-droid/TraceMind-AI) | |
| ### Community Posts | |
| - [**TraceMind-AI Hackathon Submission**](https://www.linkedin.com/posts/kshitij-thakkar-2061b924_mcp1stbirthdayhackathon-mcp-modelcontextprotocol-activity-7399775530218065920-owgR) - MCP's 1st Birthday Hackathon final submission | |
| - [**Building TraceMind Ecosystem Blog Post**](https://huggingface.co/blog/kshitijthakkar/tracemind-ecosystem) - Complete technical deep-dive into the TraceVerse ecosystem | |
| - [**TraceMind Teaser**](https://www.linkedin.com/posts/kshitij-thakkar-2061b924_mcpsfirstbirthdayhackathon-mcpsfirstbirthdayhackathon-activity-7395686529270013952-g_id) - MCP's 1st Birthday Hackathon announcement | |
| - [**SMOLTRACE Launch**](https://www.linkedin.com/posts/kshitij-thakkar-2061b924_ai-machinelearning-llm-activity-7394350375908126720-im_T) - Lightweight agent evaluation engine | |
| - [**TraceVerde Launch**](https://www.linkedin.com/posts/kshitij-thakkar-2061b924_genai-opentelemetry-observability-activity-7390339855135813632-wqEg) - Zero-code OTEL instrumentation for LLMs | |
| - [**TraceVerde 3K Downloads**](https://www.linkedin.com/posts/kshitij-thakkar-2061b924_thank-you-open-source-community-a-week-activity-7392205780592132096-nu6U) - Thank you to the community! | |
| --- | |
| ## Credits | |
| **Built for**: MCP's 1st Birthday Hackathon (Nov 14-30, 2025) | |
| **Track**: MCP in Action (Enterprise) | |
| **Author**: Kshitij Thakkar | |
| **Powered by**: TraceMind MCP Server + Gradio + smolagents | |
| **Built with**: Gradio 5.49.1 (MCP client integration) | |
| **Special Thanks**: | |
| - **[Eliseu Silva](https://huggingface.co/elismasilva)** - For the [gradio_htmlplus](https://huggingface.co/spaces/elismasilva/gradio_htmlplus) custom component that powers our interactive leaderboard table. Eliseu's timely help and collaboration during the hackathon was invaluable! | |
| **Sponsors**: HuggingFace • Google Gemini • Modal • Anthropic • Gradio • OpenAI • Nebius • Hyperbolic • ElevenLabs • SambaNova • Blaxel | |
| --- | |
| ## License | |
| AGPL-3.0 - See [LICENSE](LICENSE) for details | |
| --- | |
| ## Support | |
| - GitHub Issues: [TraceMind-AI/issues](https://github.com/Mandark-droid/TraceMind-AI/issues) | |
| - HF Discord: `#mcp-1st-birthday-official` | |
| - Tag: `mcp-in-action-track-enterprise` | |
| - Twitter: [@TraceMindAI](https://twitter.com/TraceMindAI) (placeholder) | |
| --- | |
| **Ready to evaluate your agents with AI-powered intelligence?** | |
| **Try the live demo**: https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind | |