---
title: TraceMind AI
emoji: 🧠
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
short_description: AI agent evaluation with MCP-powered intelligence
license: agpl-3.0
pinned: true
tags:
- mcp-in-action-track-enterprise
- agent-evaluation
- mcp-client
- leaderboard
- gradio
---
# 🧠 TraceMind-AI
**Agent Evaluation Platform with MCP-Powered Intelligence**
> **🎯 Track 2 Submission**: MCP in Action (Enterprise)
> **📅 MCP's 1st Birthday Hackathon**: November 14-30, 2025
---
## Why TraceMind-AI?
**The Challenge**: Evaluating AI agents generates complex data across models, providers, and configurations. Making sense of it all is overwhelming.
**The Solution**: TraceMind-AI is your **intelligent agent evaluation command center**:
- **Live leaderboard** with real-time performance data
- 🤖 **Autonomous agent chat** powered by MCP tools
- 💰 **Smart cost estimation** before you run evaluations
- **Deep trace analysis** to debug agent behavior
- ☁️ **Multi-cloud job submission** (HuggingFace Jobs + Modal)
All powered by the **Model Context Protocol** for AI-driven insights at every step.
---
## Try It Now
- **Live Demo**: [TraceMind-AI Space](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind)
- **🛠️ MCP Server**: [TraceMind-mcp-server](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server) (Track 1)
- **Full Docs**: See [USER_GUIDE.md](USER_GUIDE.md) for a complete walkthrough
- **🎥 TraceMind-AI Full Demo (20 min)**: [Watch on Loom](https://www.loom.com/share/70b9689b57204da58b8fef0d23c304fe)
- **🎬 MCP Server Quick Demo (5 min)**: [Watch on Loom](https://www.loom.com/share/d4d0003f06fa4327b46ba5c081bdf835)
- **📺 MCP Server Full Demo (20 min)**: [Watch on Loom](https://www.loom.com/share/de559bb0aef749559c79117b7f951250)
---
## The TraceMind Ecosystem
TraceMind-AI is the **user-facing platform** in a complete 4-project agent evaluation ecosystem:
```
       TraceVerde                       SMOLTRACE
 (genai_otel_instrument)           (Evaluation Engine)
            │                               │
       Instruments                      Evaluates
        LLM calls                        agents
            │                               │
            └───────────────┬───────────────┘
                            │
                   Generates Datasets
             (leaderboard, traces, metrics)
                            │
            ┌───────────────┴───────────────┐
            │                               │
 🛠️ TraceMind MCP Server             🧠 TraceMind-AI
(Track 1 - Building MCP)        (This Project - Track 2)
    Provides AI Tools              Consumes MCP Tools
            └──────── MCP Protocol ─────────┘
```
### The Foundation
**TraceVerde** - Automatic OpenTelemetry instrumentation for LLM frameworks
→ Captures every LLM call, tool usage, and agent step
→ [GitHub](https://github.com/Mandark-droid/genai_otel_instrument) | [PyPI](https://pypi.org/project/genai-otel-instrument)
**SMOLTRACE** - Lightweight evaluation engine with built-in tracing
→ Generates structured datasets (leaderboard, results, traces, metrics)
→ [GitHub](https://github.com/Mandark-droid/SMOLTRACE) | [PyPI](https://pypi.org/project/smoltrace/)
### The Platform
**🛠️ TraceMind MCP Server** - AI-powered analysis tools via MCP
→ [Live Demo](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server) | [GitHub](https://github.com/Mandark-droid/TraceMind-mcp-server)
→ **Track 1**: Building MCP (Enterprise)
**🧠 TraceMind-AI** (This Project) - Interactive UI that consumes MCP tools
→ **Track 2**: MCP in Action (Enterprise)
---
## Why This Matters for Hugging Face
This ecosystem is built **around** Hugging Face, not just "using it":
- Every SMOLTRACE evaluation creates **4 structured `datasets` on the Hub** (leaderboard, results, traces, metrics)
- TraceMind MCP Server and TraceMind-AI run as **Hugging Face Spaces**, using **Gradio's MCP integration**
- The stack is designed for **`smolagents`**: agents are evaluated, traced, and analyzed using HF's own agent framework
- Evaluations can be executed via **HF Jobs**, turning evaluations into real compute usage, not just local scripts
So TraceMind isn't just another agent demo.
**It's an opinionated blueprint for:**
> **"How Hugging Face models + Datasets + Spaces + Jobs + smolagents + MCP can work together as a complete agent evaluation and observability platform."**
---
## Key Features
### 🎯 MCP Integration (Track 2)
TraceMind-AI demonstrates **enterprise MCP client usage** in two ways:
**1. Direct MCP Client Integration**
- Connects to TraceMind MCP Server via SSE transport
- Uses 5 AI-powered tools: `analyze_leaderboard`, `estimate_cost`, `debug_trace`, `compare_runs`, `analyze_results`
- Real-time insights powered by Google Gemini 2.5 Flash
**2. Autonomous Agent with MCP Tools**
- Built with `smolagents` framework
- Agent has access to all MCP server tools
- Natural language queries → autonomous tool execution
- Example: *"What are the top 3 models and how much do they cost?"*
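The direct-client path could be sketched roughly as below. This is a minimal sketch rather than the project's actual code. It assumes the `MCPClient` class from `smolagents`, and the endpoint URL is a placeholder that follows the usual Gradio MCP path (`/gradio_api/mcp/sse`), so check the MCP server Space for the real address. The helper names `build_server_params`, `missing_tools`, and `list_remote_tools` are hypothetical.

```python
"""Sketch: connect to an MCP server over SSE and list the tools it exposes."""

# Placeholder endpoint (assumption, not a verified URL). Gradio apps that enable
# their MCP server typically serve SSE under /gradio_api/mcp/sse.
MCP_SSE_URL = "https://your-mcp-server.hf.space/gradio_api/mcp/sse"

# Tool names exactly as listed in this README.
EXPECTED_TOOLS = [
    "analyze_leaderboard",
    "estimate_cost",
    "debug_trace",
    "compare_runs",
    "analyze_results",
]


def build_server_params(url: str = MCP_SSE_URL) -> dict:
    """Build the parameter dict that smolagents' MCPClient takes for an SSE server."""
    return {"url": url, "transport": "sse"}


def missing_tools(available: list) -> list:
    """Return the expected tool names that the server did not advertise."""
    return [name for name in EXPECTED_TOOLS if name not in available]


def list_remote_tools(url: str = MCP_SSE_URL) -> list:
    """Open the connection, collect tool names, then disconnect.

    Needs `pip install "smolagents[mcp]"` and network access.
    """
    from smolagents import MCPClient  # lazy import: the helpers above run without it

    with MCPClient(build_server_params(url)) as tools:
        return [tool.name for tool in tools]
```

Calling `missing_tools(list_remote_tools())` against a reachable server is one quick way to confirm that all five tools are exposed before wiring them into an agent.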
### Agent Evaluation Features
- **Live Leaderboard**: View all evaluation runs with sortable metrics
- **Cost Estimation**: Auto-select hardware and predict costs before running
- **Trace Visualization**: Deep-dive into OpenTelemetry traces with GPU metrics
- **Multi-Cloud Jobs**: Submit evaluations to HuggingFace Jobs or Modal
- **Performance Analytics**: GPU utilization, CO2 emissions, token tracking
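To make the cost-estimation idea concrete, here is a back-of-the-envelope sketch. It is not how the `estimate_cost` MCP tool works internally. It only shows the kind of arithmetic involved, assuming cost splits into LLM token spend plus compute time. Every rate is supplied by the caller, and the numbers in the usage lines are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class CostEstimate:
    llm_usd: float      # spend on LLM tokens
    compute_usd: float  # spend on the hardware running the evaluation

    @property
    def total_usd(self) -> float:
        return self.llm_usd + self.compute_usd


def estimate_cost(
    num_tests: int,
    tokens_per_test: int,
    usd_per_million_tokens: float,
    seconds_per_test: float,
    hardware_usd_per_hour: float,
) -> CostEstimate:
    """Estimate evaluation cost from caller-supplied (hypothetical) rates."""
    llm_usd = num_tests * tokens_per_test / 1_000_000 * usd_per_million_tokens
    compute_usd = num_tests * seconds_per_test / 3600 * hardware_usd_per_hour
    return CostEstimate(llm_usd=llm_usd, compute_usd=compute_usd)


# Illustrative values only: 100 tests, 2,000 tokens each, $5 per million tokens,
# 18 seconds per test on hardware billed at $1 per hour.
estimate = estimate_cost(100, 2000, 5.0, 18, 1.0)
print(f"LLM ${estimate.llm_usd:.2f} + compute ${estimate.compute_usd:.2f}")
```

Keeping the rates as parameters means the same function covers API-hosted models (compute cost near zero) and self-hosted GPU runs (token cost near zero).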
### 💡 Smart Features
- **Auto Hardware Selection**: Based on model size and provider
- **Real-time Job Monitoring**: Track HuggingFace Jobs status
- **Agent Reasoning Visibility**: See step-by-step tool execution
- **Quick Action Buttons**: One-click common queries
---
## Quick Start
### Option 1: Use the Live Demo (Recommended)
1. **Visit**: https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind
2. **Login**: Sign in with your HuggingFace account
3. **Explore**: Browse the leaderboard, chat with the agent, visualize traces
### Option 2: Run Locally
```bash
# Clone and setup
git clone https://github.com/Mandark-droid/TraceMind-AI.git
cd TraceMind-AI
pip install -r requirements.txt
# Configure environment
cp .env.example .env
# Edit .env with your API keys (see Configuration section)
# Run the app
python app.py
```
Visit http://localhost:7860
---
## Configuration
### For Viewing (Free)
**Required**:
- HuggingFace account (free)
- HuggingFace token with **Read** permissions
### For Submitting Jobs (Paid)
**Required**:
- ⚠️ **HuggingFace Pro** ($9/month) with credit card
- HuggingFace token with **Read + Write + Run Jobs** permissions
- LLM provider API keys (OpenAI, Anthropic, etc.)
**Optional (Modal Alternative)**:
- Modal account (pay-per-second, no subscription)
- Modal API token (MODAL_TOKEN_ID + MODAL_TOKEN_SECRET)
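Before launching locally, a small preflight check can catch missing credentials. The sketch below assumes the Hugging Face token is read from `HF_TOKEN` (the variable `huggingface_hub` looks for). The Modal variable names come from the list above. The variable names this app actually reads are defined in `.env.example`, so treat the groupings here as hypothetical.

```python
import os

# Assumed variable names; confirm against .env.example before relying on them.
REQUIRED_FOR_VIEWING = ["HF_TOKEN"]
REQUIRED_FOR_MODAL = ["MODAL_TOKEN_ID", "MODAL_TOKEN_SECRET"]


def missing_vars(names, env=None):
    """Return the names that are unset or blank in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name, "").strip()]


def preflight(env=None):
    """Summarize which capabilities the current environment supports."""
    return {
        "can_view": not missing_vars(REQUIRED_FOR_VIEWING, env),
        "can_use_modal": not missing_vars(REQUIRED_FOR_MODAL, env),
    }


if __name__ == "__main__":
    for capability, ok in preflight().items():
        print(f"{capability}: {'ok' if ok else 'missing credentials'}")
```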
### Using Your Own API Keys (Recommended for Judges)
To prevent rate limits during evaluation:
**Step 1: Configure MCP Server** (Required for AI tools)
1. Visit: https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server
2. Go to the **⚙️ Settings** tab
3. Enter: **Gemini API Key** + **HuggingFace Token**
4. Click **"Save & Override Keys"**
**Step 2: Configure TraceMind-AI** (Optional)
1. Visit: https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind
2. Go to the **⚙️ Settings** tab
3. Enter: **Gemini API Key** + **HuggingFace Token**
4. Click **"Save API Keys"**
**Get Free API Keys**:
- **Gemini**: https://ai.google.dev/ (1,500 requests/day)
- **HuggingFace**: https://huggingface.co/settings/tokens (unlimited for public datasets)
---
## For Hackathon Judges
### ✅ Track 2 Compliance
- **MCP Client Integration**: Connects to remote MCP server via SSE transport
- **Autonomous Agent**: `smolagents` agent with MCP tool access
- **Enterprise Focus**: Cost optimization, job submission, performance analytics
- **Production-Ready**: Deployed to HuggingFace Spaces with OAuth authentication
- **Real Data**: Live HuggingFace datasets from SMOLTRACE evaluations
### 🎯 Key Innovations
1. **Dual MCP Integration**: Both direct MCP client + autonomous agent with MCP tools
2. **Multi-Cloud Support**: HuggingFace Jobs + Modal for serverless compute
3. **Auto Hardware Selection**: Smart hardware recommendations based on model size
4. **Complete Ecosystem**: Part of a 4-project platform demonstrating the full evaluation workflow
5. **Agent Reasoning Visibility**: See step-by-step MCP tool execution
### 📹 Demo Materials
- **🎥 TraceMind-AI Full Demo (20 min)**: [Watch on Loom](https://www.loom.com/share/70b9689b57204da58b8fef0d23c304fe) - Complete walkthrough of all features
- **🎬 MCP Server Quick Demo (5 min)**: [Watch on Loom](https://www.loom.com/share/d4d0003f06fa4327b46ba5c081bdf835) - Quick intro to MCP tools
- **📺 MCP Server Full Demo (20 min)**: [Watch on Loom](https://www.loom.com/share/de559bb0aef749559c79117b7f951250) - Deep dive into MCP server
- **Blog Post**: [Building TraceMind Ecosystem](https://huggingface.co/blog/kshitijthakkar/tracemind-ecosystem) - Technical deep-dive
- **LinkedIn Post**: [TraceMind-AI Hackathon Submission](https://www.linkedin.com/posts/kshitij-thakkar-2061b924_mcp1stbirthdayhackathon-mcp-modelcontextprotocol-activity-7399775530218065920-owgR) - Final submission announcement
### 🧪 Testing Suggestions
**1. Try the Agent Chat** (🤖 Agent Chat tab):
- "Analyze the current leaderboard and show me the top 5 models"
- "Compare the costs of the top 3 models"
- "Estimate the cost of running 100 tests with GPT-4"
**2. Explore the Leaderboard** (Leaderboard tab):
- Click "Load Leaderboard" to see live data
- Read the AI-generated insights (powered by MCP server)
- Click on a run to see detailed test results
**3. Visualize Traces** (Select a run → View traces):
- See OpenTelemetry waterfall diagrams
- View GPU metrics overlay (for GPU jobs)
- Ask questions about the trace (MCP-powered debugging)
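For readers unfamiliar with waterfall diagrams, the sketch below shows the basic idea in plain text: each span is drawn as a bar offset by its start time and sized by its duration. The span fields (`name`, `start_ms`, `end_ms`) and the sample spans are hypothetical and do not reflect the schema of the SMOLTRACE trace datasets.

```python
def render_waterfall(spans, width=40):
    """Render spans as text bars positioned on a shared timeline.

    `spans` is a list of dicts with integer `start_ms` and `end_ms` values.
    """
    t0 = min(span["start_ms"] for span in spans)
    t1 = max(span["end_ms"] for span in spans)
    total = max(t1 - t0, 1)  # avoid division by zero for instantaneous traces

    lines = []
    for span in sorted(spans, key=lambda s: s["start_ms"]):
        offset = (span["start_ms"] - t0) * width // total
        length = max(1, (span["end_ms"] - span["start_ms"]) * width // total)
        lines.append(f"{span['name']:<12}|{' ' * offset}{'#' * length}")
    return lines


# Made-up spans for illustration: one agent run wrapping an LLM call and a tool call.
example_spans = [
    {"name": "agent_run", "start_ms": 0, "end_ms": 100},
    {"name": "llm_call", "start_ms": 0, "end_ms": 50},
    {"name": "tool_call", "start_ms": 50, "end_ms": 100},
]
print("\n".join(render_waterfall(example_spans, width=20)))
```

Gaps between bars are where time goes unaccounted for, which is often the first thing to look at when a run is slower than expected.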
---
## What Can You Do?
### View & Analyze
- **Browse leaderboard** with AI-powered insights
- **Compare models** side-by-side across metrics
- **Analyze traces** with interactive visualization
- **Ask questions** via autonomous agent
### 💰 Estimate & Plan
- **Get cost estimates** before running evaluations
- **Compare hardware options** (CPU vs GPU tiers)
- **Preview duration** and CO2 emissions
- **See recommendations** from AI analysis
### Submit & Monitor
- **Submit evaluation jobs** to HuggingFace or Modal
- **Track job status** in real-time
- **View results** automatically when complete
- **Download datasets** for further analysis
### 🧪 Generate & Customize
- **Generate synthetic datasets** for custom domains and tools
- **Create prompt templates** optimized for your use case
- **Push to HuggingFace Hub** with one click
- **Test evaluations** without writing code
---
## Documentation
**For quick evaluation**:
- Read this README for an overview
- Visit the [Live Demo](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind) to try it
- Check out the **🤖 Agent Chat** tab for autonomous MCP usage
**For deep dives**:
- [USER_GUIDE.md](USER_GUIDE.md) - Complete screen-by-screen walkthrough
  - Leaderboard tab usage
  - Agent chat interactions
  - Synthetic data generator
  - Job submission workflow
  - Trace visualization guide
- [MCP_INTEGRATION.md](MCP_INTEGRATION.md) - MCP client architecture
  - How TraceMind-AI connects to MCP server
  - Agent framework integration (smolagents)
  - MCP tool usage examples
- [JOB_SUBMISSION.md](JOB_SUBMISSION.md) - Evaluation job guide
  - HuggingFace Jobs setup
  - Modal integration
  - Hardware selection guide
  - Cost optimization tips
- [ARCHITECTURE.md](ARCHITECTURE.md) - Technical architecture
  - Project structure
  - Data flow
  - Authentication
  - Deployment
---
## Technology Stack
- **UI Framework**: Gradio 5.49.1
- **Agent Framework**: smolagents 1.22.0+
- **MCP Integration**: MCP Python SDK + smolagents MCPClient
- **Data Source**: HuggingFace Datasets API
- **Authentication**: HuggingFace OAuth (planned)
- **AI Models**:
  - Agent: Google Gemini 2.5 Flash
  - MCP Server: Google Gemini 2.5 Flash
- **Cloud Platforms**: HuggingFace Jobs + Modal
---
## Example Workflows
### Workflow 1: Quick Analysis
1. Open TraceMind-AI
2. Go to **🤖 Agent Chat**
3. Click **"Quick: Top Models"**
4. Watch the agent fetch the leaderboard and analyze the top performers
5. Ask follow-up: *"Which one is most cost-effective?"*
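One simple way to reason about "most cost-effective" is success rate per dollar. The sketch below illustrates that ranking. The field names (`model`, `success_rate`, `total_cost_usd`) and the rows are invented for the example and are not the leaderboard's real schema, and the agent may weigh other factors when it answers.

```python
def cost_effectiveness(row):
    """Success rate earned per dollar spent (higher is better)."""
    return row["success_rate"] / row["total_cost_usd"]


def most_cost_effective(rows):
    """Pick the row with the best success-per-dollar ratio.

    Rows with zero or missing cost are skipped, since the ratio is undefined.
    Returns None when no row qualifies.
    """
    priced = [row for row in rows if row.get("total_cost_usd", 0) > 0]
    return max(priced, key=cost_effectiveness, default=None)


# Made-up rows for illustration only.
example_rows = [
    {"model": "model-a", "success_rate": 0.9, "total_cost_usd": 3.0},
    {"model": "model-b", "success_rate": 0.8, "total_cost_usd": 1.0},
    {"model": "model-c", "success_rate": 0.5, "total_cost_usd": 0.0},
]
best = most_cost_effective(example_rows)
print(best["model"])
```

Here the cheaper model wins despite a lower success rate, which is exactly the trade-off the follow-up question is meant to surface.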
### Workflow 2: Submit Evaluation Job
1. Go to **⚙️ Settings** → Configure API keys
2. Go to **New Evaluation**
3. Select model (e.g., `meta-llama/Llama-3.1-8B`)
4. Choose infrastructure (HuggingFace Jobs or Modal)
5. Click **"💰 Estimate Cost"** to preview
6. Click **"Submit Evaluation"**
7. Monitor the job in the **Job Monitoring** tab
8. View results in the leaderboard when complete
### Workflow 3: Debug Agent Behavior
1. Browse the **Leaderboard**
2. Click on a run with failures
3. View **detailed test results**
4. Click on a failed test to see trace
5. Use MCP-powered Q&A: *"Why did this test fail?"*
6. Get AI analysis of the execution trace
### Workflow 4: Generate Custom Test Dataset
1. Go to **🔬 Synthetic Data Generator**
2. Configure:
   - Domain: `finance`
   - Tools: `get_stock_price,calculate_profit,send_alert`
   - Number of tasks: `20`
   - Difficulty: `balanced`
3. Click **"Generate Dataset"**
4. Review generated tasks and prompt template
5. Enter repository name: `yourname/smoltrace-finance-tasks`
6. Click **"Push to HuggingFace Hub"**
7. Use your custom dataset in evaluations
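The form fields in this workflow map naturally onto a small validated structure. The sketch below shows one way to model them. The class and function names are hypothetical, the validation rules are assumptions, and the values are the ones used in the steps above. It does not call the generator itself.

```python
import re
from dataclasses import dataclass

# Hub repository IDs take the form "namespace/name".
REPO_ID_PATTERN = re.compile(r"[\w.-]+/[\w.-]+")


@dataclass(frozen=True)
class SyntheticDatasetRequest:
    domain: str
    tools: tuple
    num_tasks: int
    difficulty: str
    repo_id: str


def parse_tools(raw: str) -> tuple:
    """Split the comma-separated tools field, dropping blanks and stray spaces."""
    return tuple(part.strip() for part in raw.split(",") if part.strip())


def build_request(domain, tools_csv, num_tasks, difficulty, repo_id):
    """Validate the form values and bundle them into one request object."""
    tools = parse_tools(tools_csv)
    if not tools:
        raise ValueError("at least one tool name is required")
    if num_tasks <= 0:
        raise ValueError("number of tasks must be positive")
    if not REPO_ID_PATTERN.fullmatch(repo_id):
        raise ValueError("repository name must look like 'namespace/name'")
    return SyntheticDatasetRequest(domain, tools, num_tasks, difficulty, repo_id)


request = build_request(
    "finance",
    "get_stock_price,calculate_profit,send_alert",
    20,
    "balanced",
    "yourname/smoltrace-finance-tasks",
)
print(request.tools)
```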
---
## Screenshots
*See [SCREENSHOTS.md](SCREENSHOTS.md) for annotated screenshots of all screens*
---
## Quick Links
### 📦 Component Links
| Component | Description | Links |
|-----------|-------------|-------|
| **TraceVerde** | OTEL Instrumentation | [GitHub](https://github.com/Mandark-droid/genai_otel_instrument) • [PyPI](https://pypi.org/project/genai-otel-instrument) |
| **SMOLTRACE** | Evaluation Engine | [GitHub](https://github.com/Mandark-droid/SMOLTRACE) • [PyPI](https://pypi.org/project/smoltrace/) |
| **MCP Server** | Building MCP (Track 1) | [HF Space](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server) • [GitHub](https://github.com/Mandark-droid/TraceMind-mcp-server) |
| **TraceMind-AI** | MCP in Action (Track 2) | [HF Space](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind) • [GitHub](https://github.com/Mandark-droid/TraceMind-AI) |
### 📢 Community Posts
- [**TraceMind-AI Hackathon Submission**](https://www.linkedin.com/posts/kshitij-thakkar-2061b924_mcp1stbirthdayhackathon-mcp-modelcontextprotocol-activity-7399775530218065920-owgR) - MCP's 1st Birthday Hackathon final submission
- [**Building TraceMind Ecosystem Blog Post**](https://huggingface.co/blog/kshitijthakkar/tracemind-ecosystem) - Complete technical deep-dive into the TraceVerse ecosystem
- [**TraceMind Teaser**](https://www.linkedin.com/posts/kshitij-thakkar-2061b924_mcpsfirstbirthdayhackathon-mcpsfirstbirthdayhackathon-activity-7395686529270013952-g_id) - MCP's 1st Birthday Hackathon announcement
- [**SMOLTRACE Launch**](https://www.linkedin.com/posts/kshitij-thakkar-2061b924_ai-machinelearning-llm-activity-7394350375908126720-im_T) - Lightweight agent evaluation engine
- [**TraceVerde Launch**](https://www.linkedin.com/posts/kshitij-thakkar-2061b924_genai-opentelemetry-observability-activity-7390339855135813632-wqEg) - Zero-code OTEL instrumentation for LLMs
- [**TraceVerde 3K Downloads**](https://www.linkedin.com/posts/kshitij-thakkar-2061b924_thank-you-open-source-community-a-week-activity-7392205780592132096-nu6U) - Thank you to the community!
---
## Credits
**Built for**: MCP's 1st Birthday Hackathon (Nov 14-30, 2025)
**Track**: MCP in Action (Enterprise)
**Author**: Kshitij Thakkar
**Powered by**: TraceMind MCP Server + Gradio + smolagents
**Built with**: Gradio 5.49.1 (MCP client integration)
**Special Thanks**:
- **[Eliseu Silva](https://huggingface.co/elismasilva)** - For the [gradio_htmlplus](https://huggingface.co/spaces/elismasilva/gradio_htmlplus) custom component that powers our interactive leaderboard table. Eliseu's timely help and collaboration during the hackathon were invaluable!
**Sponsors**: HuggingFace • Google Gemini • Modal • Anthropic • Gradio • OpenAI • Nebius • Hyperbolic • ElevenLabs • SambaNova • Blaxel
---
## License
AGPL-3.0 - See [LICENSE](LICENSE) for details
---
## Support
- GitHub Issues: [TraceMind-AI/issues](https://github.com/Mandark-droid/TraceMind-AI/issues)
- 💬 HF Discord: `#mcp-1st-birthday-official`
- 🏷️ Tag: `mcp-in-action-track-enterprise`
- 🐦 Twitter: [@TraceMindAI](https://twitter.com/TraceMindAI) (placeholder)
---
**Ready to evaluate your agents with AI-powered intelligence?**
**Try the live demo**: https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind