---
title: TraceMind AI
emoji: 🧠
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
short_description: AI agent evaluation with MCP-powered intelligence
license: agpl-3.0
pinned: true
tags:
- mcp-in-action-track-enterprise
- agent-evaluation
- mcp-client
- leaderboard
- gradio
---
# 🧠 TraceMind-AI
<p align="center">
<img src="https://raw.githubusercontent.com/Mandark-droid/TraceMind-AI/assets/Logo.png" alt="TraceMind-AI Logo" width="200"/>
</p>
**Agent Evaluation Platform with MCP-Powered Intelligence**
> **🎯 Track 2 Submission**: MCP in Action (Enterprise)
> **📅 MCP's 1st Birthday Hackathon**: November 14-30, 2025
---
## Why TraceMind-AI?
**The Challenge**: Evaluating AI agents generates complex data across models, providers, and configurations. Making sense of it all is overwhelming.
**The Solution**: TraceMind-AI is your **intelligent agent evaluation command center**:
- 📊 **Live leaderboard** with real-time performance data
- 🤖 **Autonomous agent chat** powered by MCP tools
- 💰 **Smart cost estimation** before you run evaluations
- 🔍 **Deep trace analysis** to debug agent behavior
- ☁️ **Multi-cloud job submission** (HuggingFace Jobs + Modal)
All powered by the **Model Context Protocol** for AI-driven insights at every step.
---
## 🚀 Try It Now
- **🌐 Live Demo**: [TraceMind-AI Space](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind)
- **🛠️ MCP Server**: [TraceMind-mcp-server](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server) (Track 1)
- **📖 Full Docs**: See [USER_GUIDE.md](USER_GUIDE.md) for a complete walkthrough
- **🎥 TraceMind-AI Full Demo (20 min)**: [Watch on Loom](https://www.loom.com/share/70b9689b57204da58b8fef0d23c304fe)
- **🎬 MCP Server Quick Demo (5 min)**: [Watch on Loom](https://www.loom.com/share/d4d0003f06fa4327b46ba5c081bdf835)
- **📺 MCP Server Full Demo (20 min)**: [Watch on Loom](https://www.loom.com/share/de559bb0aef749559c79117b7f951250)
---
## The TraceMind Ecosystem
TraceMind-AI is the **user-facing platform** in a complete 4-project agent evaluation ecosystem:
<p align="center">
<img src="https://raw.githubusercontent.com/Mandark-droid/TraceMind-AI/assets/TraceVerse_Logo.png" alt="TraceVerse Ecosystem" width="400"/>
<br/><br/>
</p>
```
      🔭 TraceVerde                   📊 SMOLTRACE
 (genai_otel_instrument)           (Evaluation Engine)
            │                               │
       Instruments                      Evaluates
        LLM calls                        agents
            │                               │
            └───────────────┬───────────────┘
                            │
                   Generates Datasets
             (leaderboard, traces, metrics)
                            │
            ┌───────────────┴───────────────┐
            │                               │
 🛠️ TraceMind MCP Server             🧠 TraceMind-AI
 (Track 1 - Building MCP)       (This Project - Track 2)
    Provides AI Tools              Consumes MCP Tools
            └──────── MCP Protocol ─────────┘
```
### The Foundation
**🔭 TraceVerde** - Automatic OpenTelemetry instrumentation for LLM frameworks
→ Captures every LLM call, tool usage, and agent step
→ [GitHub](https://github.com/Mandark-droid/genai_otel_instrument) | [PyPI](https://pypi.org/project/genai-otel-instrument)
**📊 SMOLTRACE** - Lightweight evaluation engine with built-in tracing
→ Generates structured datasets (leaderboard, results, traces, metrics)
→ [GitHub](https://github.com/Mandark-droid/SMOLTRACE) | [PyPI](https://pypi.org/project/smoltrace/)
### The Platform
**🛠️ TraceMind MCP Server** - AI-powered analysis tools via MCP
→ [Live Demo](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server) | [GitHub](https://github.com/Mandark-droid/TraceMind-mcp-server)
→ **Track 1**: Building MCP (Enterprise)
**🧠 TraceMind-AI** (This Project) - Interactive UI that consumes MCP tools
→ **Track 2**: MCP in Action (Enterprise)
---
## Why This Matters for Hugging Face
This ecosystem is built **around** Hugging Face, not just "using it":
- Every SMOLTRACE evaluation creates **4 structured `datasets` on the Hub** (leaderboard, results, traces, metrics)
- TraceMind MCP Server and TraceMind-AI run as **Hugging Face Spaces**, using **Gradio's MCP integration**
- The stack is designed for **`smolagents`**: agents are evaluated, traced, and analyzed using HF's own agent framework
- Evaluations can be executed via **HF Jobs**, turning evaluations into real compute usage, not just local scripts
So TraceMind isn't just another agent demo.
**It's an opinionated blueprint for:**
> **"How Hugging Face models + Datasets + Spaces + Jobs + smolagents + MCP can work together as a complete agent evaluation and observability platform."**
---
## Key Features
### 🎯 MCP Integration (Track 2)
TraceMind-AI demonstrates **enterprise MCP client usage** in two ways:
**1. Direct MCP Client Integration**
- Connects to TraceMind MCP Server via SSE transport
- Uses 5 AI-powered tools: `analyze_leaderboard`, `estimate_cost`, `debug_trace`, `compare_runs`, `analyze_results`
- Real-time insights powered by Google Gemini 2.5 Flash
**2. Autonomous Agent with MCP Tools**
- Built with `smolagents` framework
- Agent has access to all MCP server tools
- Natural language queries → autonomous tool execution
- Example: *"What are the top 3 models and how much do they cost?"*
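
As a rough illustration of the direct-client path, the sketch below uses the `MCPClient` class from `smolagents` to open an SSE connection and list the tools a server exposes. It is not taken from this project's code: the helper names are hypothetical, and the endpoint URL you pass in (for a Gradio-hosted server, typically a path ending in `/gradio_api/mcp/sse`) is an assumption you should verify against your own deployment.

```python
"""Minimal sketch: list the tools exposed by an MCP server over SSE."""


def sse_server_params(url: str) -> dict:
    """Build the parameter dict that smolagents' MCPClient accepts for SSE."""
    return {"url": url, "transport": "sse"}


def list_tool_names(url: str) -> list:
    """Connect to the server and return the sorted names of its tools."""
    # Imported lazily so sse_server_params works without smolagents installed.
    from smolagents import MCPClient

    # Used as a context manager, MCPClient yields the server's tools and
    # closes the connection on exit.
    with MCPClient(sse_server_params(url)) as tools:
        return sorted(tool.name for tool in tools)
```

Calling `list_tool_names(...)` with the SSE URL of a running TraceMind MCP Server would be expected to include names such as `analyze_leaderboard` and `estimate_cost`. The same tool list can be handed to a `smolagents` agent for the autonomous path.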
### 📊 Agent Evaluation Features
- **Live Leaderboard**: View all evaluation runs with sortable metrics
- **Cost Estimation**: Auto-select hardware and predict costs before running
- **Trace Visualization**: Deep-dive into OpenTelemetry traces with GPU metrics
- **Multi-Cloud Jobs**: Submit evaluations to HuggingFace Jobs or Modal
- **Performance Analytics**: GPU utilization, CO2 emissions, token tracking
### 💡 Smart Features
- **Auto Hardware Selection**: Based on model size and provider
- **Real-time Job Monitoring**: Track HuggingFace Jobs status
- **Agent Reasoning Visibility**: See step-by-step tool execution
- **Quick Action Buttons**: One-click common queries
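
To make "based on model size and provider" concrete, here is a minimal sketch of what such a rule could look like. Everything in it is an assumption for illustration: the provider names, the tier labels, and the size cut-offs are hypothetical and are not the rules TraceMind-AI actually applies.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical set of providers whose models run behind a remote API.
API_PROVIDERS = {"openai", "anthropic", "litellm"}


@dataclass(frozen=True)
class HardwareChoice:
    tier: str    # hypothetical tier label, not a real hardware flavor name
    reason: str  # short explanation that could be shown to the user


def select_hardware(provider: str, params_billions: Optional[float]) -> HardwareChoice:
    """Pick a hardware tier from the provider and the model size in billions."""
    if provider.lower() in API_PROVIDERS:
        # Remote API models need no local accelerator.
        return HardwareChoice("cpu", "model is served by a remote API")
    if params_billions is None:
        return HardwareChoice("gpu-small", "unknown size, using the smallest GPU tier")
    if params_billions <= 8:
        return HardwareChoice("gpu-small", "model has at most 8B parameters")
    if params_billions <= 34:
        return HardwareChoice("gpu-medium", "model has at most 34B parameters")
    return HardwareChoice("gpu-large", "model has more than 34B parameters")
```

For example, `select_hardware("transformers", 70).tier` returns `"gpu-large"` under these made-up thresholds.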
---
## Quick Start
### Option 1: Use the Live Demo (Recommended)
1. **Visit**: https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind
2. **Login**: Sign in with your HuggingFace account
3. **Explore**: Browse the leaderboard, chat with the agent, visualize traces
### Option 2: Run Locally
```bash
# Clone and setup
git clone https://github.com/Mandark-droid/TraceMind-AI.git
cd TraceMind-AI
pip install -r requirements.txt
# Configure environment
cp .env.example .env
# Edit .env with your API keys (see Configuration section)
# Run the app
python app.py
```
Visit http://localhost:7860
---
## Configuration
### For Viewing (Free)
**Required**:
- HuggingFace account (free)
- HuggingFace token with **Read** permissions
### For Submitting Jobs (Paid)
**Required**:
- ⚠️ **HuggingFace Pro** ($9/month) with credit card
- HuggingFace token with **Read + Write + Run Jobs** permissions
- LLM provider API keys (OpenAI, Anthropic, etc.)
**Optional (Modal Alternative)**:
- Modal account (pay-per-second, no subscription)
- Modal API token (MODAL_TOKEN_ID + MODAL_TOKEN_SECRET)
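
When running locally, a quick pre-flight check can report which credentials are still missing before you try to view data or submit a job. The sketch below is an assumption-laden example: `MODAL_TOKEN_ID` and `MODAL_TOKEN_SECRET` come from the list above, while `HF_TOKEN` as the variable name for the HuggingFace token and the mode names are hypothetical. Check `.env.example` for the names the app really reads.

```python
import os
from typing import List, Mapping

# Variables needed per mode. HF_TOKEN and the mode names are assumptions;
# the two Modal variables are the ones listed in the README.
REQUIRED_VARS = {
    "view": ["HF_TOKEN"],
    "hf_jobs": ["HF_TOKEN"],
    "modal": ["MODAL_TOKEN_ID", "MODAL_TOKEN_SECRET"],
}


def missing_vars(mode: str, env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the variables for `mode` that are unset or empty in `env`."""
    return [name for name in REQUIRED_VARS[mode] if not env.get(name)]
```

An empty list means the mode is fully configured. Note that this only checks presence; it cannot tell whether a token carries the Read, Write, or Run Jobs permissions described above.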
### Using Your Own API Keys (Recommended for Judges)
To avoid hitting shared rate limits during evaluation, configure your own keys:
**Step 1: Configure MCP Server** (Required for AI tools)
1. Visit: https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server
2. Go to **⚙️ Settings** tab
3. Enter: **Gemini API Key** + **HuggingFace Token**
4. Click **"Save & Override Keys"**
**Step 2: Configure TraceMind-AI** (Optional)
1. Visit: https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind
2. Go to **⚙️ Settings** tab
3. Enter: **Gemini API Key** + **HuggingFace Token**
4. Click **"Save API Keys"**
**Get Free API Keys**:
- **Gemini**: https://ai.google.dev/ (1,500 requests/day)
- **HuggingFace**: https://huggingface.co/settings/tokens (unlimited for public datasets)
---
## For Hackathon Judges
### ✅ Track 2 Compliance
- **MCP Client Integration**: Connects to remote MCP server via SSE transport
- **Autonomous Agent**: `smolagents` agent with MCP tool access
- **Enterprise Focus**: Cost optimization, job submission, performance analytics
- **Production-Ready**: Deployed to HuggingFace Spaces with OAuth authentication
- **Real Data**: Live HuggingFace datasets from SMOLTRACE evaluations
### 🎯 Key Innovations
1. **Dual MCP Integration**: Both direct MCP client + autonomous agent with MCP tools
2. **Multi-Cloud Support**: HuggingFace Jobs + Modal for serverless compute
3. **Auto Hardware Selection**: Smart hardware recommendations based on model size
4. **Complete Ecosystem**: Part of 4-project platform demonstrating full evaluation workflow
5. **Agent Reasoning Visibility**: See step-by-step MCP tool execution
### 📹 Demo Materials
- **🎥 TraceMind-AI Full Demo (20 min)**: [Watch on Loom](https://www.loom.com/share/70b9689b57204da58b8fef0d23c304fe) - Complete walkthrough of all features
- **🎬 MCP Server Quick Demo (5 min)**: [Watch on Loom](https://www.loom.com/share/d4d0003f06fa4327b46ba5c081bdf835) - Quick intro to MCP tools
- **📺 MCP Server Full Demo (20 min)**: [Watch on Loom](https://www.loom.com/share/de559bb0aef749559c79117b7f951250) - Deep dive into MCP server
- **📝 Blog Post**: [Building TraceMind Ecosystem](https://huggingface.co/blog/kshitijthakkar/tracemind-ecosystem) - Technical deep-dive
- **🔗 LinkedIn Post**: [TraceMind-AI Hackathon Submission](https://www.linkedin.com/posts/kshitij-thakkar-2061b924_mcp1stbirthdayhackathon-mcp-modelcontextprotocol-activity-7399775530218065920-owgR) - Final submission announcement
### 🧪 Testing Suggestions
**1. Try the Agent Chat** (🤖 Agent Chat tab):
- "Analyze the current leaderboard and show me the top 5 models"
- "Compare the costs of the top 3 models"
- "Estimate the cost of running 100 tests with GPT-4"
**2. Explore the Leaderboard** (Leaderboard tab):
- Click "Load Leaderboard" to see live data
- Read the AI-generated insights (powered by MCP server)
- Click on a run to see detailed test results
**3. Visualize Traces** (Select a run → View traces):
- See OpenTelemetry waterfall diagrams
- View GPU metrics overlay (for GPU jobs)
- Ask questions about the trace (MCP-powered debugging)
---
## What Can You Do?
### 🔍 View & Analyze
- **Browse leaderboard** with AI-powered insights
- **Compare models** side-by-side across metrics
- **Analyze traces** with interactive visualization
- **Ask questions** via autonomous agent
### 💰 Estimate & Plan
- **Get cost estimates** before running evaluations
- **Compare hardware options** (CPU vs GPU tiers)
- **Preview duration** and CO2 emissions
- **See recommendations** from AI analysis
### 🚀 Submit & Monitor
- **Submit evaluation jobs** to HuggingFace or Modal
- **Track job status** in real-time
- **View results** automatically when complete
- **Download datasets** for further analysis
### 🧪 Generate & Customize
- **Generate synthetic datasets** for custom domains and tools
- **Create prompt templates** optimized for your use case
- **Push to HuggingFace Hub** with one click
- **Test evaluations** without writing code
---
## Documentation
**For quick evaluation**:
- Read this README for an overview
- Visit the [Live Demo](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind) to try it
- Check out the **🤖 Agent Chat** tab for autonomous MCP usage
**For deep dives**:
- [USER_GUIDE.md](USER_GUIDE.md) - Complete screen-by-screen walkthrough
  - Leaderboard tab usage
  - Agent chat interactions
  - Synthetic data generator
  - Job submission workflow
  - Trace visualization guide
- [MCP_INTEGRATION.md](MCP_INTEGRATION.md) - MCP client architecture
  - How TraceMind-AI connects to MCP server
  - Agent framework integration (smolagents)
  - MCP tool usage examples
- [JOB_SUBMISSION.md](JOB_SUBMISSION.md) - Evaluation job guide
  - HuggingFace Jobs setup
  - Modal integration
  - Hardware selection guide
  - Cost optimization tips
- [ARCHITECTURE.md](ARCHITECTURE.md) - Technical architecture
  - Project structure
  - Data flow
  - Authentication
  - Deployment
---
## Technology Stack
- **UI Framework**: Gradio 5.49.1
- **Agent Framework**: smolagents 1.22.0+
- **MCP Integration**: MCP Python SDK + smolagents MCPClient
- **Data Source**: HuggingFace Datasets API
- **Authentication**: HuggingFace OAuth (planned)
- **AI Models**:
  - Agent: Google Gemini 2.5 Flash
  - MCP Server: Google Gemini 2.5 Flash
- **Cloud Platforms**: HuggingFace Jobs + Modal
---
## Example Workflows
### Workflow 1: Quick Analysis
1. Open TraceMind-AI
2. Go to **🤖 Agent Chat**
3. Click **"Quick: Top Models"**
4. See agent fetch leaderboard and analyze top performers
5. Ask follow-up: *"Which one is most cost-effective?"*
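
The kind of ranking behind this workflow can be sketched in a few lines of plain Python. This is only an illustration of "top performers" and "most cost-effective", assuming leaderboard rows with hypothetical `model`, `success_rate`, and `total_cost_usd` fields; the real leaderboard dataset may use different column names, and the agent's own analysis is produced by the MCP tools rather than by code like this.

```python
from typing import Dict, List


def top_models(rows: List[Dict], n: int = 3) -> List[Dict]:
    """Rank by success rate (highest first), breaking ties by lower cost."""
    ranked = sorted(rows, key=lambda r: (-r["success_rate"], r["total_cost_usd"]))
    return ranked[:n]


def most_cost_effective(rows: List[Dict]) -> Dict:
    """Return the row with the best success rate per dollar spent."""
    # Skip zero-cost rows to avoid dividing by zero.
    paid = [r for r in rows if r["total_cost_usd"] > 0]
    return max(paid, key=lambda r: r["success_rate"] / r["total_cost_usd"])


# Made-up example rows, not real evaluation results.
rows = [
    {"model": "model-a", "success_rate": 0.9, "total_cost_usd": 2.0},
    {"model": "model-b", "success_rate": 0.8, "total_cost_usd": 0.5},
    {"model": "model-c", "success_rate": 0.6, "total_cost_usd": 0.1},
]
print([r["model"] for r in top_models(rows, n=2)])
print(most_cost_effective(rows)["model"])
```

The two questions can give different answers: the most accurate model is not necessarily the one with the best accuracy per dollar, which is why the follow-up question is worth asking.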
### Workflow 2: Submit Evaluation Job
1. Go to **⚙️ Settings** → Configure API keys
2. Go to **New Evaluation**
3. Select model (e.g., `meta-llama/Llama-3.1-8B`)
4. Choose infrastructure (HuggingFace Jobs or Modal)
5. Click **"💰 Estimate Cost"** to preview
6. Click **"Submit Evaluation"**
7. Monitor job in **Job Monitoring** tab
8. View results in leaderboard when complete
### Workflow 3: Debug Agent Behavior
1. Browse **Leaderboard**
2. Click on a run with failures
3. View **detailed test results**
4. Click on a failed test to see trace
5. Use MCP-powered Q&A: *"Why did this test fail?"*
6. Get AI analysis of the execution trace
### Workflow 4: Generate Custom Test Dataset
1. Go to **🔬 Synthetic Data Generator**
2. Configure:
   - Domain: `finance`
   - Tools: `get_stock_price,calculate_profit,send_alert`
   - Number of tasks: `20`
   - Difficulty: `balanced`
3. Click **"Generate Dataset"**
4. Review generated tasks and prompt template
5. Enter repository name: `yourname/smoltrace-finance-tasks`
6. Click **"Push to HuggingFace Hub"**
7. Use your custom dataset in evaluations
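
The form values in step 2 map naturally onto a small request object. The sketch below shows one way to validate them before generation, with the comma-separated tools string split into a list. The `DatasetRequest` class and `parse_request` function are hypothetical names used for illustration and are not part of TraceMind-AI's code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetRequest:
    domain: str
    tools: List[str]
    num_tasks: int
    difficulty: str


def parse_request(domain: str, tools_csv: str, num_tasks: int, difficulty: str) -> DatasetRequest:
    """Validate the form fields and split the comma-separated tool names."""
    tools = [name.strip() for name in tools_csv.split(",") if name.strip()]
    if not tools:
        raise ValueError("at least one tool name is required")
    if num_tasks <= 0:
        raise ValueError("number of tasks must be positive")
    return DatasetRequest(domain.strip(), tools, num_tasks, difficulty.strip())


request = parse_request(
    "finance", "get_stock_price,calculate_profit,send_alert", 20, "balanced"
)
print(request.tools)
```

Splitting and trimming up front means stray spaces after commas in the tools field do not end up inside tool names.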
---
## Screenshots
*See [SCREENSHOTS.md](SCREENSHOTS.md) for annotated screenshots of all screens*
---
## 🔗 Quick Links
### 📦 Component Links
| Component | Description | Links |
|-----------|-------------|-------|
| **TraceVerde** | OTEL Instrumentation | [GitHub](https://github.com/Mandark-droid/genai_otel_instrument) • [PyPI](https://pypi.org/project/genai-otel-instrument) |
| **SMOLTRACE** | Evaluation Engine | [GitHub](https://github.com/Mandark-droid/SMOLTRACE) • [PyPI](https://pypi.org/project/smoltrace/) |
| **MCP Server** | Building MCP (Track 1) | [HF Space](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server) • [GitHub](https://github.com/Mandark-droid/TraceMind-mcp-server) |
| **TraceMind-AI** | MCP in Action (Track 2) | [HF Space](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind) • [GitHub](https://github.com/Mandark-droid/TraceMind-AI) |
### 📢 Community Posts
- [**TraceMind-AI Hackathon Submission**](https://www.linkedin.com/posts/kshitij-thakkar-2061b924_mcp1stbirthdayhackathon-mcp-modelcontextprotocol-activity-7399775530218065920-owgR) - MCP's 1st Birthday Hackathon final submission
- [**Building TraceMind Ecosystem Blog Post**](https://huggingface.co/blog/kshitijthakkar/tracemind-ecosystem) - Complete technical deep-dive into the TraceVerse ecosystem
- [**TraceMind Teaser**](https://www.linkedin.com/posts/kshitij-thakkar-2061b924_mcpsfirstbirthdayhackathon-mcpsfirstbirthdayhackathon-activity-7395686529270013952-g_id) - MCP's 1st Birthday Hackathon announcement
- [**SMOLTRACE Launch**](https://www.linkedin.com/posts/kshitij-thakkar-2061b924_ai-machinelearning-llm-activity-7394350375908126720-im_T) - Lightweight agent evaluation engine
- [**TraceVerde Launch**](https://www.linkedin.com/posts/kshitij-thakkar-2061b924_genai-opentelemetry-observability-activity-7390339855135813632-wqEg) - Zero-code OTEL instrumentation for LLMs
- [**TraceVerde 3K Downloads**](https://www.linkedin.com/posts/kshitij-thakkar-2061b924_thank-you-open-source-community-a-week-activity-7392205780592132096-nu6U) - Thank you to the community!
---
## Credits
**Built for**: MCP's 1st Birthday Hackathon (Nov 14-30, 2025)
**Track**: MCP in Action (Enterprise)
**Author**: Kshitij Thakkar
**Powered by**: TraceMind MCP Server + Gradio + smolagents
**Built with**: Gradio 5.49.1 (MCP client integration)
**Special Thanks**:
- **[Eliseu Silva](https://huggingface.co/elismasilva)** - For the [gradio_htmlplus](https://huggingface.co/spaces/elismasilva/gradio_htmlplus) custom component that powers our interactive leaderboard table. Eliseu's timely help and collaboration during the hackathon were invaluable!
**Sponsors**: HuggingFace • Google Gemini • Modal • Anthropic • Gradio • OpenAI • Nebius • Hyperbolic • ElevenLabs • SambaNova • Blaxel
---
## License
AGPL-3.0 - See [LICENSE](LICENSE) for details
---
## Support
- 📧 GitHub Issues: [TraceMind-AI/issues](https://github.com/Mandark-droid/TraceMind-AI/issues)
- 💬 HF Discord: `#mcp-1st-birthday-official🏆`
- 🏷️ Tag: `mcp-in-action-track-enterprise`
- 🐦 Twitter: [@TraceMindAI](https://twitter.com/TraceMindAI) (placeholder)
---
**Ready to evaluate your agents with AI-powered intelligence?**
**Try the live demo**: https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind